PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator
Jan 20, 2026 by Yue Jiet Chong, Yimin Wang, Zhen Wu, Xuanyao Fong (arXiv.org)
DOI 10.48550/arXiv.2601.13628
Built PRIMAL, a PIM-first LLM inference accelerator that runs LoRA natively by combining heterogeneous PIM processing elements, a 2D-mesh computational fabric, and an SRAM-reprogramming plus power-gating scheme that overlaps reconfiguration with compute. The result is a practical, pipelined LoRA flow that cuts communication and power while improving throughput and energy efficiency over an H100 running Llama-13B.
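For context, the LoRA computation such an accelerator has to map is the standard low-rank update y = Wx + (alpha/r)·B(Ax); a minimal NumPy sketch (illustrative dimensions, not the paper's hardware mapping):

```python
import numpy as np

# Standard LoRA forward pass: frozen base weight W plus a scaled
# low-rank correction B @ (A @ x). Dimensions are illustrative only.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # low-rank down-projection (r x d_in)
B = np.zeros((d_out, r))                 # up-projection, zero-init => no-op adapter
x = rng.standard_normal(d_in)

y = W @ x + (alpha / r) * (B @ (A @ x))

# With B zero-initialized, the adapter contributes nothing yet,
# so the output equals the base model's output.
assert np.allclose(y, W @ x)
```

The two small matmuls on the adapter path (A then B) are cheap relative to the base Wx, which is what makes overlapping adapter reconfiguration with base compute attractive on PIM hardware.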
source S2, openalex