
Recent insights into training dynamics of deep classifiers

A recent study from researchers at MIT and Brown University characterizes several...

Techniques for training large neural networks

Pipeline parallelism splits a model “vertically” by layer. It’s also possible to “horizontally” split certain operations within a layer, which is usually called Tensor Parallel training. For many modern models (such as the Transformer), the...
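The “horizontal” split can be illustrated with a minimal NumPy sketch (not any library’s API): a linear layer’s weight matrix is sharded column-wise, each worker computes a slice of the output, and an all-gather concatenates the slices.

```python
import numpy as np

# Sketch of tensor parallelism for a linear layer y = x @ W.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # batch of activations (replicated on workers)
W = rng.standard_normal((8, 6))  # full weight matrix

# Each worker holds one column shard of W and computes a slice of the output.
W_shards = np.split(W, 2, axis=1)            # two (8, 3) shards
partials = [x @ shard for shard in W_shards] # would run in parallel on devices

# An all-gather concatenates the partial outputs into the full result.
y = np.concatenate(partials, axis=1)
assert np.allclose(y, x @ W)
```

The column-wise split needs no communication during the matmul itself; only the final concatenation requires an all-gather across workers.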

Efficient training of language models to fill in the middle

We show that autoregressive language models can learn to infill text after we apply a simple transformation to the dataset, which moves a span of text from the middle of a document to...
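A minimal sketch of that kind of transformation, assuming illustrative sentinel strings (the placeholder names below are not the paper’s actual token vocabulary):

```python
import random

# Hypothetical sentinel markers separating the document pieces.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def fim_transform(doc: str, rng: random.Random) -> str:
    """Move a random middle span of `doc` to the end, marked by sentinels."""
    # Pick two cut points dividing the document into prefix/middle/suffix.
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # The model then trains left-to-right on prefix + suffix, followed by
    # the relocated middle span, so ordinary autoregressive training
    # teaches it to infill.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

example = fim_transform("def add(a, b):\n    return a + b\n", random.Random(0))
```

Because the pieces are only rearranged, the original document is recoverable by parsing the sentinels and re-splicing prefix, middle, and suffix.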
