Papernews

FARE: A Fine-grained Pipelined Reconfigurable FlashAttention Kernel

Feb 21, 2026 by Kaushikkumar S. Rathva, A. Alam, S. Srinivasan, S. K. Mandal (Symposium on Field Programmable Gate Arrays)

DOI 10.1145/3748173.3779572



We built FARE, a fine-grained, pipelined, reconfigurable FlashAttention kernel for FPGAs that splits attention into parallel pipeline stages, easing memory bottlenecks and giving end-to-end control over the datapath. It’s fun because you get GPU-style attention latency tradeoffs on tunable hardware: up to 2.9× speedups and substantial resource savings on a Zynq board, which makes FPGAs a practical, energy-efficient platform for scaled transformers.
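For intuition, here is a minimal NumPy sketch of the tiled, online-softmax attention that FlashAttention-style kernels compute; the staged loop over K/V tiles is what a pipelined hardware kernel like FARE maps to parallel stages. Function names, the block size, and all shapes are illustrative assumptions, not from the paper.

```python
import numpy as np

def flash_attention(Q, K, V, block=16):
    """Tiled attention with an online softmax (FlashAttention-style).
    Streams K/V in tiles so the full n x n score matrix is never
    materialized. Illustrative sketch only; not FARE's actual kernel."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((n, d))              # running (unnormalized) output
    m = np.full(n, -np.inf)           # running row-wise max of scores
    l = np.zeros(n)                   # running softmax denominator
    for j in range(0, n, block):      # one pipeline "stage" per K/V tile
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = (Q @ Kj.T) * scale        # scores for this tile only
        m_new = np.maximum(m, S.max(axis=1))
        p = np.exp(S - m_new[:, None])
        alpha = np.exp(m - m_new)     # rescale earlier partial results
        l = alpha * l + p.sum(axis=1)
        O = alpha[:, None] * O + p @ Vj
        m = m_new
    return O / l[:, None]
```

Because each tile only updates running statistics (max, denominator, partial output), the stages are independent of the sequence length's full score matrix, which is exactly what makes the computation amenable to a fixed-depth hardware pipeline.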

source S2, crossref


