mtGEMM: An Efficient GEMM Library for Modern Multi-Core DSPs
Apr 1, 2026 by Jianbin Fang, Kainan Yu, Peng Zhang, Dezun Dong, Xinxin Qi, Xingyu Hou, Ruibo Wang, Kai Lu (IEEE Transactions on Parallel and Distributed Systems)
DOI 10.1109/TPDS.2026.3664114
Presents mtGEMM, a GEMM library tuned end-to-end for modern multi-core DSPs. It redesigns the micro-kernels for heterogeneous on-chip memory, removes multi-core memory bottlenecks, and implements transpose-GEMM efficiently. The library reaches 92–96% of hardware peak with nearly linear scalability, bringing GEMM performance on DSPs in line with their raw compute and bandwidth.
source S2, crossref