With coding and math, you have clear-cut, correct answers that you can check, William Isaac, a research scientist at Google DeepMind, told me when I met him and Julia Haas, a fellow research scientist at the firm, for an exclusive preview of their work, which is published today. That’s not the case for moral questions, which generally have a range of acceptable answers: “Morality is a crucial capability but hard to judge,” says Isaac.
“In the moral domain, there’s no right and wrong,” adds Haas. “But it’s not by any means a free-for-all. There are better answers and there are worse answers.”
The researchers have identified several key challenges and suggested ways to deal with them. But it is more a wish list than a set of ready-made solutions. “They do a nice job of bringing together different perspectives,” says Vera Demberg, who studies LLMs at Saarland University in Germany.
Better than “The Ethicist”
A number of studies have shown that LLMs can display remarkable moral competence. One study published last year found that people in the US rated ethical advice from OpenAI’s GPT-4o as more moral, trustworthy, thoughtful, and correct than advice given by the (human) author of “The Ethicist,” a popular advice column.
The problem is that it is hard to tell whether such behaviors are a performance (mimicking a memorized response, say) or evidence that some kind of moral reasoning is in fact taking place inside the model. In other words, is it virtue or virtue signaling?
This question matters because multiple studies also show just how unreliable LLMs can be. For a start, models can be too eager to please. They have been found to flip their answer to a moral question and say the exact opposite when a person disagrees or pushes back on their first response. Worse, the answers an LLM gives to a question can change depending on how it is presented or formatted. For instance, researchers have found that models quizzed about political values can give different, sometimes opposite, answers depending on whether the questions offer multiple-choice answers or instruct the model to reply in its own words.
In an even more striking case, Demberg and her colleagues presented several LLMs, including versions of Meta’s Llama 3 and Mistral, with a series of ethical dilemmas and asked them to pick which of two options was the better outcome. The researchers found that the models often reversed their choice when the labels for those two options were changed from “Case 1” and “Case 2” to “(A)” and “(B).”
They also showed that models changed their answers in response to other tiny formatting tweaks, including swapping the order of the options and ending the question with a colon instead of a question mark.
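To make the manipulation concrete, here is a minimal sketch of how such a format-sensitivity probe might be set up. The dilemma text, options, and labels below are illustrative assumptions for this example, not the researchers’ actual materials or code.

```python
# Illustrative sketch (not the study's code): build the same moral dilemma
# under two labeling schemes, so the only difference is the option labels.

def build_prompt(dilemma: str, options: tuple[str, str], labels: tuple[str, str]) -> str:
    """Format one dilemma with a given pair of option labels."""
    return (
        f"{dilemma}\n"
        f"{labels[0]} {options[0]}\n"
        f"{labels[1]} {options[1]}\n"
        f"Which is the better outcome? Answer with {labels[0]} or {labels[1]}."
    )

# Hypothetical dilemma and options, made up for this sketch.
dilemma = "A hospital has one ventilator and two patients who need it."
options = (
    "Give it to the patient with the higher chance of survival.",
    "Give it to the patient who arrived first.",
)

prompt_cases = build_prompt(dilemma, options, ("Case 1:", "Case 2:"))
prompt_letters = build_prompt(dilemma, options, ("(A)", "(B)"))

# In a real probe, both prompts would go to the same model and the chosen
# label would be mapped back to its underlying option; a consistent model
# should pick the same option either way. The finding reported above is
# that the choice often flips when only the labels change.
print(prompt_cases)
print()
print(prompt_letters)
```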
