Techniques for training large neural networks


Pipeline parallelism splits a model “vertically” by layer. It’s also possible to “horizontally” split certain operations within a layer, which is usually called Tensor Parallel training. For many modern models (such as the Transformer), the computation bottleneck is multiplying an activation batch matrix with a large weight matrix. Matrix multiplication can be thought of as dot products between pairs of rows and columns; it’s possible to compute independent dot products on different GPUs, or to compute parts of each dot product on different GPUs and sum up the results. With either strategy, we can slice the weight matrix into even-sized “shards”, host each shard on a different GPU, and use that shard to compute the relevant part of the overall matrix product before later communicating to combine the results.
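
To make the two sharding strategies concrete, here is a minimal single-process NumPy sketch (the shard count and matrix shapes are arbitrary choices for illustration, not tied to any particular framework): each list element stands in for a shard that would live on its own GPU, and the final concatenation or summation stands in for the cross-GPU communication.

```python
# Minimal sketch of the two sharding strategies for C = A @ W.
# Each list element emulates a shard held on a separate GPU; the
# concatenate/sum steps emulate the cross-GPU communication.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 16))    # activation batch (batch x d_in)
W = rng.standard_normal((16, 32))   # weight matrix    (d_in x d_out)
n_shards = 4                        # number of GPUs (illustrative)

# Strategy 1: split W by columns -> each shard computes independent dot
# products; outputs are concatenated along the output dimension.
col_shards = np.split(W, n_shards, axis=1)
out_col = np.concatenate([A @ shard for shard in col_shards], axis=1)

# Strategy 2: split W by rows (and A by columns to match) -> each shard
# computes a partial result for every dot product; partials are summed
# (an all-reduce in a real multi-GPU setup).
row_shards = np.split(W, n_shards, axis=0)
A_shards = np.split(A, n_shards, axis=1)
out_row = sum(a @ w for a, w in zip(A_shards, row_shards))

assert np.allclose(out_col, A @ W)
assert np.allclose(out_row, A @ W)
```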

One example is Megatron-LM, which parallelizes matrix multiplications within the Transformer’s self-attention and MLP layers. PTD-P uses tensor, data, and pipeline parallelism; its pipeline schedule assigns multiple non-consecutive layers to each device, reducing bubble overhead at the cost of more network communication.
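
As a rough illustration of the column-then-row partitioning pattern that Megatron-LM applies to the Transformer MLP, the sketch below again emulates per-GPU shards with a list on a single process. The shard count, shapes, and the tanh GeLU approximation are illustrative assumptions, not the library’s actual code; the point is that the nonlinearity is applied shard-locally, so only one sum (one all-reduce) is needed per MLP block.

```python
# Single-process sketch of a column-parallel first projection followed by a
# row-parallel second projection (Megatron-style MLP partitioning pattern).
import numpy as np

def gelu(x):
    # tanh approximation of GeLU (elementwise, so it commutes with sharding)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(1)
n_shards, d_model, d_ff = 2, 8, 32
X = rng.standard_normal((4, d_model))       # token activations
A = rng.standard_normal((d_model, d_ff))    # first MLP weight
B = rng.standard_normal((d_ff, d_model))    # second MLP weight

A_shards = np.split(A, n_shards, axis=1)    # column-parallel split
B_shards = np.split(B, n_shards, axis=0)    # matching row-parallel split

# Each "GPU" applies the nonlinearity to its own slice, so no communication
# is needed between the two matmuls; partial outputs are summed at the end.
partials = [gelu(X @ a) @ b for a, b in zip(A_shards, B_shards)]
Y = sum(partials)

assert np.allclose(Y, gelu(X @ A) @ B)
```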

Sometimes the input to the network can be parallelized across a dimension with a high degree of parallel computation relative to cross-communication. Sequence parallelism is one such idea, where an input sequence is split across time into multiple sub-examples, proportionally decreasing peak memory consumption by allowing the computation to proceed with more granularly sized examples.
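
Below is a toy sketch of the sequence-splitting idea, assuming a purely position-wise layer so that chunks need no cross-communication; in a real setup the chunks would sit on different devices or be processed one at a time to cap peak activation memory. The chunk count and shapes are arbitrary illustrative choices.

```python
# Toy sketch: split an input across the time dimension and apply a
# position-wise layer chunk by chunk, so only one chunk's activations
# need to be live at once.
import numpy as np

rng = np.random.default_rng(2)
seq_len, d_model, n_chunks = 1024, 64, 8
x = rng.standard_normal((seq_len, d_model))
W = rng.standard_normal((d_model, d_model))

def position_wise_layer(chunk):
    # Acts on each time step independently, so chunks can be processed
    # separately (or on different devices) without cross-communication.
    return np.maximum(chunk @ W, 0.0)

chunks = np.split(x, n_chunks, axis=0)              # split across time
y = np.concatenate([position_wise_layer(c) for c in chunks], axis=0)

assert np.allclose(y, position_wise_layer(x))
```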
