
Self-Attention

Self-attention as a learned bilinear relation and as a Nadaraya–Watson kernel smoother: why the separate query (Q) and key (K) projections matter, and how attention heads become readable.

4 posts tagged #self-attention.