Papernews

TeLLMe: An Efficient End-to-End Ternary LLM Prefill and Decode Accelerator with Table-Lookup Matmul on Edge FPGAs

Feb 21, 2026 by Ye Qiao, Zhiheng Chen, Yifan Zhang, Yian Wang, Sitao Huang (Symposium on Field Programmable Gate Arrays)

DOI 10.1145/3748173.3779191



TeLLMe is a low-power edge FPGA accelerator that runs ternary (1.58-bit) LLMs end-to-end, covering both prefill and decode. It replaces matrix multiplications with a table-lookup ternary matmul engine and combines that with streaming fused operators, URAM weight buffering, and memory-efficient attention for both the prefill and decode phases. The reported result is a sub-1 s time to first token (TTFT) and about 25 tokens/s at 5 W, which makes deploying real LLM workloads on wearables and other embedded platforms practical.
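To make the table-lookup idea concrete, here is a minimal software sketch of a ternary matrix-vector product done with lookups and additions only. It is not the paper's hardware design: the group size `GROUP = 3`, the base-3 index encoding, and every function name below are assumptions chosen for illustration. The idea is that a group of ternary weights has only 3^GROUP possible patterns, so the partial dot product of each activation group with every pattern can be precomputed once and then reused by every weight row.

```python
from itertools import product

GROUP = 3  # hypothetical group size: 3**3 = 27 table entries per activation group


def encode_group(ws):
    """Map a group of ternary weights in {-1, 0, 1} to a base-3 index."""
    idx = 0
    for w in ws:
        idx = idx * 3 + (w + 1)  # -1 -> digit 0, 0 -> digit 1, 1 -> digit 2
    return idx


def build_table(acts):
    """Precompute the dot product of one activation group with every ternary pattern."""
    # product() enumerates patterns in the same base-3 order that encode_group uses.
    return [sum(w * a for w, a in zip(pattern, acts))
            for pattern in product((-1, 0, 1), repeat=len(acts))]


def pack_weights(weights):
    """Offline step: convert each weight row into a list of table indices."""
    return [[encode_group(row[i:i + GROUP]) for i in range(0, len(row), GROUP)]
            for row in weights]


def tl_matvec(packed, x):
    """Matrix-vector product using table lookups and additions, no multiplies per weight."""
    # One table per activation group, shared by all weight rows.
    tables = [build_table(x[i:i + GROUP]) for i in range(0, len(x), GROUP)]
    return [sum(tables[g][idx] for g, idx in enumerate(row)) for row in packed]


def ref_matvec(weights, x):
    """Plain multiply-accumulate reference used to check the lookup version."""
    return [sum(w * a for w, a in zip(row, x)) for row in weights]


W = [[1, 0, -1, 1, 1, 0],
     [0, 0, 0, -1, 1, 1]]
x = [2, -3, 5, 7, 1, 4]
print(tl_matvec(pack_weights(W), x))  # [5, -2]
```

The table-building cost is paid once per activation group and amortized over all output rows, which is why the approach pays off when the weight matrix has many rows. How TeLLMe sizes its groups and tables on the FPGA is not stated in this summary.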

source S2, crossref


