What LayerNorm really does for Attention in Transformers: 2 things, not 1…


Normalization via LayerNorm has been part and parcel of the Transformer architecture for a while. If you asked most AI practitioners why we have LayerNorm, the generic answer would be that we use it to normalize the activations on the forward pass and the gradients on the backward pass. But that default…
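
As a quick illustration of the "normalize the activations" part of that generic answer, here is a minimal LayerNorm sketch in PyTorch. The function name, tensor shapes, and parameter initialisation are illustrative assumptions, not taken from the article.

```python
import torch

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each token's activation vector to zero mean and unit variance
    # across the feature dimension, then apply a learned scale and shift.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta

# Illustrative shapes: a batch of 2 sequences, 4 tokens each, model dimension 8
x = torch.randn(2, 4, 8)
gamma = torch.ones(8)   # learned scale, initialised to 1
beta = torch.zeros(8)   # learned shift, initialised to 0
out = layer_norm(x, gamma, beta)
print(out.mean(dim=-1))  # roughly 0 for every token
print(out.std(dim=-1))   # roughly 1 for every token
```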
