#neural-networks
2 posts tagged #neural-networks.
-
What an MLP Knows, When It's a Kernel
The transformer MLP is illegible because its primitive does not carry a kernel. Give it one and the four objects that make attention legible follow for free — for the whole network.
-
Activations Are Bad for Geometry
ReLU, GELU, and friends factor into a layer's Jacobian as a diagonal modulation that wrecks the geometry of the data manifold. Why pointwise activations are a representational bug.