Linear Attention

Linear attention and Performers: approximating the softmax kernel with random features so that the N×N attention matrix is never materialized and cost grows linearly, not quadratically, in the sequence length N.
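As a rough illustration of that idea, here is a minimal pure-Python sketch, not the code from any of the tagged posts. It assumes positive random features in the style of Performers, with any 1/sqrt(d) scaling already folded into the queries and keys. The names `random_features`, `linear_attention`, `softmax_attention`, and `omegas` are hypothetical and chosen only for this example.

```python
import math
import random


def random_features(x, omegas):
    """Positive random features phi(x) with E[phi(q) . phi(k)] = exp(q . k).

    Assumes each row of `omegas` was drawn from a standard normal N(0, I).
    """
    half_sq_norm = 0.5 * sum(xi * xi for xi in x)
    scale = 1.0 / math.sqrt(len(omegas))
    return [
        scale * math.exp(sum(w_i * x_i for w_i, x_i in zip(w, x)) - half_sq_norm)
        for w in omegas
    ]


def linear_attention(queries, keys, values, omegas):
    """Approximate softmax attention without building the N x N matrix."""
    m, d_v = len(omegas), len(values[0])
    kv = [[0.0] * d_v for _ in range(m)]  # sum_j phi(k_j) v_j^T, shape m x d_v
    k_sum = [0.0] * m                     # sum_j phi(k_j), shape m
    # One pass over keys/values: cost is linear in the sequence length.
    for k, v in zip(keys, values):
        phi_k = random_features(k, omegas)
        for r in range(m):
            k_sum[r] += phi_k[r]
            for c in range(d_v):
                kv[r][c] += phi_k[r] * v[c]
    # One pass over queries, reusing the two summaries above.
    out = []
    for q in queries:
        phi_q = random_features(q, omegas)
        denom = sum(p * s for p, s in zip(phi_q, k_sum))
        out.append([
            sum(phi_q[r] * kv[r][c] for r in range(m)) / denom
            for c in range(d_v)
        ])
    return out


def softmax_attention(queries, keys, values):
    """Exact reference: computes one row of the N x N matrix per query."""
    out = []
    for q in queries:
        weights = [math.exp(sum(a * b for a, b in zip(q, k))) for k in keys]
        total = sum(weights)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values)) / total
            for c in range(len(values[0]))
        ])
    return out
```

The keys and values are compressed into `kv` and `k_sum` once, so each query costs O(m) work per output dimension, independent of N. With `omegas` built as `[[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]`, the output of `linear_attention` should approach that of `softmax_attention` as the number of features m grows. The estimate is stochastic, and its variance increases with the norms of the queries and keys.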

2 posts tagged #linear-attention.