BatchNorm

Vision Transformer with BatchNorm

How integrating BatchNorm into a standard Vision Transformer architecture results in faster convergence and a more stable network

Consider d=2 (the top row of graphs): ViT and ViTBNFFN are comparably impacted...
