“I hope that people use [SHADES] as a diagnostic tool to identify where and how there might be issues in a model,” says Talat. “It’s a way of knowing what’s missing from a model, where we can’t be confident that a model performs well, and whether or not it’s accurate.”
To create the multilingual dataset, the team recruited native and fluent speakers of languages including Arabic, Chinese, and Dutch. They translated and wrote down all the stereotypes they could think of in their respective languages, which another native speaker then verified. Each stereotype was annotated by the speakers with the regions in which it was recognized, the group of people it targeted, and the type of bias it contained.
Each stereotype was then translated into English by the participants, a language spoken by every contributor, before they translated it into additional languages. The speakers then noted whether the translated stereotype was recognized in their language, creating a total of 304 stereotypes related to people’s physical appearance, personal identity, and social aspects like their occupation.
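To make that annotation process concrete, here is a minimal sketch of how a single SHADES-style entry could be represented; the field names and placeholder values are illustrative assumptions based on the description above, not the dataset’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class StereotypeEntry:
    """One annotated stereotype record (fields are illustrative, not the real SHADES schema)."""
    text_original: str              # stereotype as written by a native or fluent speaker
    language: str                   # language it was originally written in
    text_english: str               # English translation, the shared pivot language
    regions_recognized: list[str]   # regions where annotators said the stereotype is recognized
    target_group: str               # group of people the stereotype targets
    bias_type: str                  # kind of bias, e.g. appearance, identity, occupation
    recognized_in_translation: bool # whether speakers still recognized the translated version

# Hypothetical example record, for illustration only
entry = StereotypeEntry(
    text_original="...",
    language="Dutch",
    text_english="...",
    regions_recognized=["Netherlands"],
    target_group="...",
    bias_type="occupation",
    recognized_in_translation=True,
)
```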
The team is due to present its findings at the annual conference of the Nations of the Americas chapter of the Association for Computational Linguistics in May.
“It’s an exciting approach,” says Myra Cheng, a PhD student at Stanford University who studies social biases in AI. “There’s coverage of different languages and cultures that reflects their subtlety and nuance.”
Mitchell says she hopes other contributors will add new languages, stereotypes, and regions to SHADES, which is publicly available, leading to the development of better language models in the future. “It’s been an enormous collaborative effort from people who want to help make better technology,” she says.