Language models might be able to self-correct biases, if you ask them


The second test used a data set designed to check how likely a model is to assume the gender of someone in a particular profession, and the third tested how much race affected the chances of a would-be applicant’s acceptance to a law school if a language model was asked to do the selection, something that, thankfully, doesn’t happen in the real world.

The team found that just prompting a model to make sure its answers didn’t rely on stereotyping had a dramatically positive effect on its output, particularly in models that had completed enough rounds of RLHF and had more than 22 billion parameters, the variables in an AI system that get tweaked during training. (The more parameters, the larger the model. GPT-3 has around 175 billion parameters.) In some cases, the model even started to engage in positive discrimination in its output.

Crucially, as with much deep-learning work, the researchers don’t really know exactly why the models are able to do this, although they have some hunches. “As the models get larger, they also have larger training data sets, and in those data sets there are lots of examples of biased or stereotypical behavior,” says Ganguli. “That bias increases with model size.”

But at the same time, somewhere in the training data there must also be some examples of people pushing back against this biased behavior, perhaps in response to unpleasant posts on sites like Reddit or Twitter. Wherever that weaker signal originates, the human feedback helps the model boost it when prompted for an unbiased response, says Askell.

The work raises the obvious question of whether this “self-correction” could and should be baked into language models from the start.
