TOM: A Ternary Read-only Memory Accelerator for LLM-powered Edge Intelligence
Feb 24, 2026 by Hongyi Guan, Yijia Zhang, Wenqiang Wang, Yizhao Gao, Shijie Cao, Chen Zhang, Ningyi Xu
We built TOM, a hybrid ROM-SRAM accelerator that combines ternary quantization with logic-synthesized ROM to fit large LLM weights on-chip, while keeping QLoRA-style adapters in SRAM for on-device tuning: the density and bandwidth of ROM with the flexibility of SRAM. What makes it fun is that sparsity-aware ROM cells, distributed ROM+SRAM banks, and dynamic power gating together reach real-time edge throughput (3,306 tokens per second on BitNet-2B) without giving up tunability.
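To make the ROM/SRAM split concrete, here is a minimal software sketch of the idea, not TOM's hardware design. It assumes absmean ternary quantization in the style of BitNet b1.58 (the summary does not say which scheme TOM uses) and a plain low-rank adapter standing in for the QLoRA-style adapter. All function names are hypothetical. The frozen ternary matrix plays the role of the ROM contents, and the small matrices `a` and `b` play the role of the tunable SRAM contents.

```python
def ternarize(weights):
    """Quantize a weight matrix to {-1, 0, +1} with one shared scale.

    Assumption: absmean scaling (scale = mean of |w|), as in BitNet b1.58.
    """
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat) or 1.0  # avoid dividing by zero
    q = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return q, scale


def ternary_matvec(q, scale, x):
    """Multiply a ternary matrix by a vector using only adds and subtracts.

    Zero weights are skipped, which is the software analogue of
    exploiting sparsity in the stored weights.
    """
    out = []
    for row in q:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
        out.append(scale * acc)
    return out


def adapter_delta(a, b, x):
    """Low-rank update b @ (a @ x); a is (r x n_in), b is (n_out x r)."""
    hidden = [sum(aij * xj for aij, xj in zip(row, x)) for row in a]
    return [sum(bij * hj for bij, hj in zip(row, hidden)) for row in b]


def hybrid_forward(q, scale, a, b, x):
    """Frozen ternary base ("ROM") plus tunable adapter ("SRAM")."""
    base = ternary_matvec(q, scale, x)
    delta = adapter_delta(a, b, x)
    return [u + v for u, v in zip(base, delta)]


if __name__ == "__main__":
    w = [[0.9, -0.05, -1.1], [0.02, 1.0, 0.0]]
    q, scale = ternarize(w)
    a = [[1.0, 0.0, 0.0]]   # rank-1 adapter, hypothetical values
    b = [[0.5], [0.0]]
    print(q)
    print(hybrid_forward(q, scale, a, b, [1.0, 2.0, 3.0]))
```

Only `a` and `b` would change during on-device tuning in this sketch; the ternary matrix never does, which is what makes a read-only store a plausible home for it.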