PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator
Jan 20, 2026 by Yue Jiet Chong, Yimin Wang, Zhen Wu, Xuanyao Fong (arXiv.org)
DOI 10.48550/arXiv.2601.13628
Built PRIMAL, a PIM-first LLM inference accelerator that runs LoRA natively by combining heterogeneous PIM processing elements, a 2D-mesh computational fabric, and an SRAM-reprogramming plus power-gating scheme that overlaps reconfiguration with compute. The result is a practical, pipelined LoRA flow that cuts communication and power while improving throughput and energy efficiency over an H100 running Llama-13B.
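For context, the LoRA computation such an accelerator has to map is the standard low-rank update y = Wx + (alpha/r)·B(Ax); a minimal NumPy sketch (illustrative dimensions, not the paper's hardware mapping):

```python
import numpy as np

# Standard LoRA forward pass: frozen base weight W plus a scaled
# low-rank correction B @ (A @ x). Dimensions are illustrative only.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # low-rank down-projection (r x d_in)
B = np.zeros((d_out, r))                 # up-projection, zero-init => no-op adapter
x = rng.standard_normal(d_in)

y = W @ x + (alpha / r) * (B @ (A @ x))

# With B zero-initialized, the adapter contributes nothing yet,
# so the output equals the base model's output.
assert np.allclose(y, W @ x)
```

The two small matmuls on the adapter path (A then B) are cheap relative to the base Wx, which is what makes overlapping adapter reconfiguration with base compute attractive on PIM hardware.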
source S2, openalex