
Where Are All of the Women?


Exploring large language models’ biases in historical knowledge

A few of the historical figures mentioned most often by GPT-4 and Claude. Individual images sourced from Wikipedia. Collage created by the author.

(This article was originally posted on my personal blog.)

Large language models (LLMs) such as ChatGPT are increasingly being used in educational and professional settings. It is important to understand and study the many biases present in such models before integrating them into existing applications and our daily lives.

One of the biases I studied in my previous article concerned historical events. I probed LLMs to understand what historical knowledge they encoded in the form of major historical events, and I found that they carried a strong Western bias in their understanding of those events.

In a similar vein, in this article I probe language models about their understanding of important historical figures. I asked two LLMs who the most important people in history were, repeating the process 10 times in each of 10 different languages. Some names, like Gandhi and Jesus, appeared extremely frequently. Other names, like Marie Curie or Cleopatra, appeared far less often. Compared to the number of male names generated by the models, there were extremely few female names.
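For illustration, here is a minimal sketch of what that probing loop could look like using the official OpenAI and Anthropic Python SDKs. The prompt wording, model identifiers, and language list are assumptions of mine for this example, not the exact configuration behind the numbers reported below.

```python
# A minimal sketch of the probing loop described above, using the official
# OpenAI and Anthropic Python SDKs. The prompt wording, model identifiers,
# and language list are illustrative placeholders, not the exact setup
# behind the numbers reported in this article.
from openai import OpenAI
import anthropic

openai_client = OpenAI()               # assumes OPENAI_API_KEY is set
claude_client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

# Hypothetical prompt set: one pre-translated prompt per language.
PROMPTS = {
    "en": "Who are the 10 most important people in history?",
    # ... the 9 other languages would go here
}
N_TRIALS = 10  # each prompt is repeated 10 times per language

responses = []
for lang, prompt in PROMPTS.items():
    for trial in range(N_TRIALS):
        gpt = openai_client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        claude = claude_client.messages.create(
            model="claude-3-opus-20240229",  # placeholder model version
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        responses.append({
            "language": lang,
            "trial": trial,
            "gpt4": gpt.choices[0].message.content,
            "claude": claude.content[0].text,
        })
```

The generated name lists can then be extracted from these raw responses and aggregated, which is how the comparisons in the rest of this article are framed.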

The biggest question I had was: where were all the women?

Continuing the theme of evaluating the historical biases encoded by language models, I probed OpenAI’s GPT-4 and Anthropic’s Claude about major historical figures. In this article, I show how both models exhibit:

  • Gender bias: Both models disproportionately predict male historical figures. GPT-4 generated the names of female historical figures 5.4% of the time and Claude did so 1.8% of the time. This pattern held across all 10 languages (see the tally sketch after this list).
  • Geographic bias: Regardless of the language the model was prompted in, there was a bias toward predicting Western historical figures. GPT-4 generated historical figures from Europe 60% of the time and Claude did so 52% of the time.
  • Language bias: Certain languages suffered more from gender or geographic biases. For example, when prompted in Russian, both GPT-4 and Claude generated zero women across all of my experiments. Moreover, language quality was lower for some languages. For example, when…
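As a rough illustration of how the percentages above could be tallied, the sketch below assumes each generated name has already been extracted and labeled with a gender and a region; the labeling step and the example records are hypothetical, not my actual data.

```python
# A rough illustration of how the headline percentages could be tallied,
# assuming each generated name has already been extracted and labeled with
# a gender and a region. The records below are hypothetical examples.
from collections import Counter

def summarize(figures):
    """figures: list of dicts like {"name": ..., "gender": ..., "region": ...}"""
    total = len(figures)
    genders = Counter(f["gender"] for f in figures)
    regions = Counter(f["region"] for f in figures)
    return {
        "pct_female": 100 * genders["female"] / total,
        "pct_europe": 100 * regions["Europe"] / total,
    }

# Hypothetical hand-labeled records for one model's responses:
gpt4_figures = [
    {"name": "Marie Curie", "gender": "female", "region": "Europe"},
    {"name": "Mahatma Gandhi", "gender": "male", "region": "Asia"},
    {"name": "Isaac Newton", "gender": "male", "region": "Europe"},
]
print(summarize(gpt4_figures))  # e.g. {'pct_female': 33.3, 'pct_europe': 66.7}
```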

