
Efficient training of language models to fill in the middle


We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset, which simply moves a span of text from the middle of a document to its end. While this data augmentation has garnered much interest in recent years, we provide extensive evidence that training models with a large fraction of data transformed in this fashion does not harm the original left-to-right generative capability, as measured by perplexity and sampling evaluations across a wide range of scales. Given the usefulness, simplicity, and efficiency of training models to fill in the middle (FIM), we suggest that future autoregressive language models be trained with FIM by default. To this end, we run a series of ablations on key hyperparameters, such as the data transformation frequency, the structure of the transformation, and the method of selecting the infill span. We use these ablations to prescribe strong default settings and best practices to train FIM models. We have released our best infilling model trained with best practices in our API, and release our infilling benchmarks to aid future research.
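To make the transformation concrete, here is a minimal character-level sketch of the idea: move a span from the middle of a document to its end, delimited by sentinels, so the model can keep being trained with an ordinary next-token objective. The sentinel strings, the uniform choice of span boundaries, and the fim_rate parameter are illustrative assumptions, not details taken from the abstract.

```python
import random

# Hypothetical sentinel strings marking the three document pieces; real
# implementations use dedicated tokens, which the abstract does not specify.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def fim_transform(document: str, fim_rate: float = 0.5) -> str:
    """Randomly move a middle span of `document` to its end.

    With probability `fim_rate`, split the document into
    (prefix, middle, suffix) at two uniformly sampled cut points and
    rearrange it as prefix + suffix + middle, delimited by sentinels.
    Otherwise return the document unchanged, so a fraction of the data
    remains ordinary left-to-right text.
    """
    if random.random() >= fim_rate:
        return document
    # Two distinct uniform cut points define the span moved to the end.
    i, j = sorted(random.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

# Example: the model sees the suffix before the middle, so learning to
# continue after <MID> amounts to learning to infill.
print(fim_transform("def add(a, b):\n    return a + b\n", fim_rate=1.0))
```

At inference time, one would supply the known prefix and suffix in this layout and sample the continuation after the final sentinel to fill in the missing span.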
