Loop blocking in linear algebra codes often has three levels: register blocking, L2 cache blocking, and L3 cache (or TLB) blocking.
Some notes on blocking for different purposes in GEMM operations.
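As a rough illustration of nested blocking levels, here is a minimal sketch of a blocked GEMM (C += A * B, row-major) in C. The tile sizes `MC`, `KC`, `NC`, `MR`, `NR` and the function names `gemm_blocked` and `micro_kernel` are hypothetical and chosen only for the example. Which tile size lines up with which cache level is an assumption that depends on the target machine, and production libraries also pack tiles into contiguous buffers, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile sizes, for illustration only; real values depend on
   the target's register file, cache sizes, and TLB reach. */
enum { MC = 64, KC = 64, NC = 64, MR = 4, NR = 4 };

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Register-level block: accumulate an mr x nr tile of C in a small local
   array that the compiler can keep in registers, then add it to C. */
static void micro_kernel(size_t mr, size_t nr, size_t kc,
                         const double *A, size_t lda,
                         const double *B, size_t ldb,
                         double *C, size_t ldc)
{
    double acc[MR][NR] = {{0.0}};
    for (size_t p = 0; p < kc; ++p)
        for (size_t i = 0; i < mr; ++i)
            for (size_t j = 0; j < nr; ++j)
                acc[i][j] += A[i * lda + p] * B[p * ldb + j];
    for (size_t i = 0; i < mr; ++i)
        for (size_t j = 0; j < nr; ++j)
            C[i * ldc + j] += acc[i][j];
}

/* C += A * B, row-major; A is m x k, B is k x n, C is m x n. */
void gemm_blocked(size_t m, size_t n, size_t k,
                  const double *A, const double *B, double *C)
{
    assert(A && B && C);
    for (size_t jc = 0; jc < n; jc += NC) {            /* outermost block over columns */
        size_t nc = min_sz(NC, n - jc);
        for (size_t pc = 0; pc < k; pc += KC) {        /* cache block over the k dimension */
            size_t kc = min_sz(KC, k - pc);
            for (size_t ic = 0; ic < m; ic += MC) {    /* cache block over rows */
                size_t mc = min_sz(MC, m - ic);
                for (size_t jr = 0; jr < nc; jr += NR) {       /* register tiles */
                    size_t nr = min_sz(NR, nc - jr);
                    for (size_t ir = 0; ir < mc; ir += MR) {
                        size_t mr = min_sz(MR, mc - ir);
                        micro_kernel(mr, nr, kc,
                                     &A[(ic + ir) * k + pc], k,
                                     &B[pc * n + jc + jr], n,
                                     &C[(ic + ir) * n + jc + jr], n);
                    }
                }
            }
        }
    }
}
```

The caller is expected to initialize C (for example to zero) before calling `gemm_blocked(m, n, k, A, B, C)`. Edge tiles are handled by clamping each block size with `min_sz`, so the dimensions do not need to be multiples of the tile sizes.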
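General matrix multiplication (GEMM) optimization algorithms
Software optimization strategies: 1) improve memory-access locality; 2) exploit vector instructions.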
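The two strategies can be sketched together in a small C example. It is a minimal illustration, not a tuned kernel: the function name `gemm_ikj` is hypothetical, the matrices are assumed to be square and row-major, and the vectorization relies on the compiler's auto-vectorizer (which depends on the compiler and its flags) rather than on explicit intrinsics. Reordering the loops to i-k-j makes the innermost loop walk B and C with unit stride, which helps locality and gives the compiler a simple loop to vectorize.

```c
#include <assert.h>
#include <stddef.h>

/* C += A * B for row-major n x n matrices, using i-k-j loop order.
   The restrict qualifiers assert that the three arrays do not overlap. */
void gemm_ikj(size_t n, const double *restrict A,
              const double *restrict B, double *restrict C)
{
    assert(A && B && C);
    for (size_t i = 0; i < n; ++i) {
        for (size_t p = 0; p < n; ++p) {
            const double a = A[i * n + p];  /* loaded once, reused for the whole j loop */
            /* Unit-stride accesses to B and C: cache friendly, and a
               candidate for auto-vectorization (compiler dependent). */
            for (size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[p * n + j];
        }
    }
}
```

Compared with the textbook i-j-k order, this version avoids striding down a column of B in the innermost loop, so each cache line that is fetched gets fully used before it is evicted.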