What LayerNorm really does for Attention in Transformers
2 things, not 1…

Normalization via LayerNorm has been part and parcel of the Transformer architecture for a while. If you asked most AI practitioners why we have LayerNorm, the generic answer would be that we use LayerNorm to normalize the activations on the forward pass and the gradients on the backward pass. But that default…
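For reference, the forward-pass normalization the excerpt mentions is the standard LayerNorm computation: each token's activation vector is normalized to zero mean and unit variance over the feature dimension, then rescaled. Below is a minimal NumPy sketch of that standard formulation; the names `gamma`, `beta`, and `eps` are the conventional parameter names, not identifiers from this article.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each token's activations over the feature (last) dimension.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Learnable scale (gamma) and shift (beta) restore expressive range.
    return gamma * x_hat + beta

# Example: batch of 2 sequences, 4 tokens each, model dimension 8.
x = np.random.randn(2, 4, 8)
out = layer_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=-1))  # ~0 for every token
print(out.std(axis=-1))   # ~1 for every token
```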
