Performer

Performers and FAVOR+: positive random features that make softmax attention linear-time while keeping the kernel estimate unbiased.

2 posts tagged #performer.

May 31, 2026

Cheap Attention in JAX/Flax NNX

A runnable companion to Cheap Attention: implement positive-feature linear attention in JAX and Flax NNX, watch the all-pairs ledger turn into a shared feature state, and see exactly where the N×N matrix disappears.

May 31, 2026

Cheap Attention: Linear-Time Kernel Approximation

A 128K-token context creates billions of pairwise questions per attention head. But the N×N matrix is not the essence of attention; it is the receipt for an infinite-dimensional feature map we never wrote down. Approximate that feature map with random features, reassociate the sum, and softmax attention becomes linear-time kernel attention. The whole argument is built from live in-browser visualizations.