A 28 nm 1.3 TFLOPS/mm2 Floating-Point SRAM-Based CIM Macro With Asynchronous Normalization and Parallel Sorting Alignment for AI-Edge Chip
Mar 1, 2026 by Zhiting Lin, Miao Long, Yang Yang, Xin Wang, Hao Li, Wenqiang Zhang, Lintao Chen, Yu Liu, Xin Li, Xiulong Wu (IEEE Transactions on Very Large Scale Integration (VLSI) Systems)
DOI 10.1109/TVLSI.2025.3630648
Built a 28 nm SRAM-based FP-CIM macro that rips through the usual floating-point baggage: exponent normalization runs asynchronously, overlapped with the max-exponent search, and mantissa alignment uses a parallel cross-structure sorter instead of the usual serial subtract-and-shift. The result is a tiny 6 kb, 0.067 mm2 macro hitting 1.3 TFLOPS/mm2 and 12.8 TFLOPS/W at 150 MHz—floating-point compute-in-memory with dramatic area and energy wins for edge AI.
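For context, the alignment step the macro accelerates is the standard one in FP accumulation: find the block's max exponent, then right-shift every mantissa by its distance from that max so all products sum in one fixed-point adder. A rough software sketch (illustrative only, not the paper's circuit; the function name and mantissa width are my own choices):

```python
import math

def fp_dot_aligned(a, b, mant_bits=10):
    """Dot product via max-exponent mantissa alignment (software sketch).

    Each product is decomposed into a quantized mantissa and an exponent;
    mantissas are shifted right by their distance to the block's max
    exponent so the whole sum runs in one fixed-point accumulator --
    the step FP-CIM macros do in hardware (here serially, for clarity).
    """
    terms = []
    for x, y in zip(a, b):
        m, e = math.frexp(x * y)  # x*y == m * 2**e with 0.5 <= |m| < 1
        terms.append((int(m * (1 << mant_bits)), e))  # quantize mantissa
    e_max = max(e for _, e in terms)
    # Align every mantissa to the max exponent, then accumulate.
    acc = sum(m >> (e_max - e) for m, e in terms)
    return acc * 2.0 ** (e_max - mant_bits)
```

The shift-by-difference loop is exactly the serial bottleneck the paper's parallel cross-structure replaces.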