LoCaLUT: Harnessing Capacity–Computation Tradeoffs for LUT-Based Inference in DRAM-PIM
Jan 31, 2026 by Junguk Hong, Changmin Shin, Sukjin Kim, Si Ung Noh, Taehee Kwon, Seong-Yeol Park, Hanjun Kim, Youngsok Kim, Jinho Lee (International Symposium on High-Performance Computer Architecture)
DOI 10.1109/HPCA68181.2026.11408523
We turned DRAM-PIM’s memory abundance into compute by packing many MACs into LUT lookups, then shrank and streamed those LUTs to fit real devices: canonicalization kills redundant entries, a tiny remap LUT recovers the canonical forms, and slice streaming pulls only the useful columns into the buffer. The result, LoCaLUT, makes low-bit DNN inference on UPMEM-style PIMs actually fast and scalable without adding any logic. Check the repo if you want to steal the idea.
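A minimal sketch of the core trick, not the paper's code: replace low-bit multiply-accumulate with pure table lookups, then halve the table by storing only non-negative "canonical" weight rows and reapplying the sign with a small remap step. All names (`mac_via_lut`, `mac_via_canonical_lut`, the 4-bit width) are illustrative assumptions.

```python
# Toy LUT-based MAC for low-bit operands, with a sign canonicalization.
BITS = 4                       # assumed operand width
OFF = 1 << (BITS - 1)          # offset mapping a signed value to an index

# Full LUT: FULL[w + OFF][a + OFF] = w * a for every signed 4-bit pair.
FULL = [[w * a for a in range(-OFF, OFF)] for w in range(-OFF, OFF)]

# Canonical LUT: only non-negative |w| rows are stored (half the rows);
# a remap step reapplies the sign at lookup time.
HALF = [[m * a for a in range(-OFF, OFF)] for m in range(OFF)]

def mac_via_lut(weights, acts):
    """Dot product via full-table lookups: no multiplies at runtime."""
    return sum(FULL[w + OFF][a + OFF] for w, a in zip(weights, acts))

def mac_via_canonical_lut(weights, acts):
    """Same result from the halved table plus a sign remap.
    Toy restriction: |w| must fit the stored rows, so w != -2**(BITS-1)."""
    total = 0
    for w, a in zip(weights, acts):
        sign, mag = (-1, -w) if w < 0 else (1, w)
        total += sign * HALF[mag][a + OFF]
    return total

w = [3, -2, 0, 7, -5]
a = [1, 4, -8, -3, 6]
ref = sum(x * y for x, y in zip(w, a))
assert mac_via_lut(w, a) == ref
assert mac_via_canonical_lut(w, a) == ref
```

The point of the canonicalization is capacity: redundant table entries (here, the sign-symmetric half) are dropped, and a cheap remap restores them on the fly, which is the same capacity-for-computation trade the entry describes.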
source S2, crossref