
Seeing Our Reflection in LLMs

When LLMs give us outputs that reveal flaws in human society, can we choose to listen to what they tell us?

Photo by Vince Fleming on Unsplash

By now, I’m sure most of you have heard the news about Google’s latest LLM*, Gemini, generating pictures of racially diverse people in Nazi uniforms. This little news blip reminded me of something I’ve been meaning to discuss: what happens when models have blind spots, so we apply expert rules to the predictions they generate to avoid returning something wildly outlandish to the user.

This kind of thing isn’t that unusual in machine learning, in my experience, especially when you have flawed or limited training data. A good example that I remember from my own work was predicting when a package would be delivered to a business office. Mathematically, our model could be superb at estimating exactly when the package would get physically near the office, but sometimes truck drivers arrive at destinations late at night and then rest in their truck or in a hotel until morning. Why? Because nobody’s in the office to receive or sign for the package outside of business hours.

Teaching a model the concept of “business hours” can be very difficult, and the much easier solution was simply to say, “If the model says the delivery will arrive outside business hours, add enough time to the prediction that it changes to the next hour the office is listed as open.” Easy! It solves the problem and it reflects the actual circumstances on the ground. We’re just giving the model a little boost to help its results work better.
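To make that concrete, here is a minimal sketch of what such a post-prediction rule could look like, assuming a fixed Monday-to-Friday, 9-to-5 schedule. The function name and the hard-coded hours are my own illustration, not the actual delivery system’s logic; a real system would look up each office’s listed hours.

```python
from datetime import datetime, timedelta

# Hypothetical schedule for illustration: open 09:00-17:00, Monday through Friday.
OPEN_HOUR, CLOSE_HOUR = 9, 17

def apply_business_hours_rule(predicted_eta: datetime) -> datetime:
    """If the model's ETA falls outside business hours, push it forward
    to the next hour the office is listed as open."""
    eta = predicted_eta
    # Keep rolling forward until the ETA lands inside an open window.
    while eta.weekday() >= 5 or not (OPEN_HOUR <= eta.hour < CLOSE_HOUR):
        if eta.weekday() >= 5 or eta.hour >= CLOSE_HOUR:
            # Weekend or after closing: jump to opening time on the next day.
            eta = (eta + timedelta(days=1)).replace(
                hour=OPEN_HOUR, minute=0, second=0, microsecond=0)
        else:
            # Before opening on a weekday: wait until the office opens today.
            eta = eta.replace(hour=OPEN_HOUR, minute=0, second=0, microsecond=0)
    return eta

# Example: a prediction of Friday 22:30 rolls forward to Monday 09:00.
print(apply_business_hours_rule(datetime(2024, 3, 1, 22, 30)))  # 2024-03-04 09:00:00
```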

However, this does cause some issues. For one thing, now we have two different model predictions to manage. We can’t just throw away the original model prediction, because that’s what we use for model performance monitoring and metrics. You can’t assess a model on predictions after humans got their paws in there; that’s not mathematically sound. But to get a clear sense of the real-world model impact, you do want to look at the post-rule prediction, because that’s what the customer actually experienced or saw in your application. In ML, we’re used to a very simple framing, where every time you run a model you get one result or set of results, and that’s that, but once you start tweaking the results before you let them go, you need to think at a different scale.
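One simple way to manage both numbers is to record them side by side for every prediction, so the raw output feeds model monitoring while the post-rule value feeds the customer-facing view. This is a minimal sketch under my own assumptions: the class and function names are invented, and the model is assumed to expose a predict method that returns a datetime.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable

@dataclass
class DeliveryPrediction:
    raw_eta: datetime       # what the model actually predicted: used for model metrics
    adjusted_eta: datetime  # what the customer saw: used for real-world impact reporting

def predict_delivery(features, model,
                     adjust_fn: Callable[[datetime], datetime]) -> DeliveryPrediction:
    """adjust_fn is a post-prediction rule, e.g. the business-hours sketch above."""
    raw = model.predict(features)  # assumed to return a datetime
    return DeliveryPrediction(raw_eta=raw, adjusted_eta=adjust_fn(raw))
```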

I sort of suspect that this is a version of what’s happening with LLMs like Gemini. However, instead of a post-prediction rule, the smart money says that Gemini and other models are applying “secret” prompt augmentations to try to change the results the LLMs produce.
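Nobody outside these companies knows the actual wording or logic, so treat the following as pure speculation: a toy illustration of the general shape such a hidden prompt-augmentation layer might take, with invented phrases and an invented trigger condition.

```python
import random

# Entirely hypothetical: invented phrases and an invented trigger,
# only meant to show the general shape of a "secret" prompt augmentation.
DIVERSITY_PHRASES = [
    "depicted as a Black person",
    "depicted as an Indigenous person",
    "depicted as a woman",
]

def augment_prompt(user_prompt: str) -> str:
    """Quietly append an instruction before the prompt reaches the image model."""
    mentions_people = any(word in user_prompt.lower()
                          for word in ("person", "people", "doctor", "soldier"))
    if mentions_people:
        return f"{user_prompt}, {random.choice(DIVERSITY_PHRASES)}"
    return user_prompt

print(augment_prompt("a doctor in a lab coat"))
```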

In essence, without this nudging, the model will produce results that reflect the content it has been trained on. That is to say, the content produced by real people. Our social media posts, our history books, our museum paintings, our popular songs, our Hollywood movies, etc. The model takes in all that material, and it learns the underlying patterns in it, whether they are things we’re proud of or not. A model given all the media available in our contemporary society is going to get a whole lot of exposure to racism, sexism, and myriad other forms of discrimination and inequality, to say nothing of violence, war, and other horrors. While the model is learning what people look like, and how they sound, and what they say, and how they move, it’s learning the warts-and-all version.

Our social media posts, our history books, our museum paintings, our popular songs, our Hollywood movies, etc. The model takes in all that material, and it learns the underlying patterns in it, whether they are things we’re proud of or not.

This means that when you ask the underlying model to show you a doctor, it’s probably going to be a white guy in a lab coat. This isn’t just random; it’s because in our modern society white men have disproportionate access to high-status professions like being doctors, because they on average have access to more and better education, financial resources, mentorship, social privilege, and so on. The model is reflecting back at us an image that may make us uncomfortable because we don’t like to think about that reality.

The obvious argument is, “Well, we don’t want the model to reinforce the biases our society already has; we want it to improve representation of underrepresented populations.” I sympathize with this argument, quite a lot, and I care about representation in our media. However, there’s a problem.

It’s unlikely that applying these tweaks is going to be a sustainable solution. Recall the story I began with about Gemini. It’s like playing whac-a-mole, because the work never stops: now we have people of color being shown in Nazi uniforms, and that is understandably deeply offensive to plenty of folks. So, maybe where we started by randomly applying “as a black person” or “as an indigenous person” to our prompts, we have to add something more to exclude the cases where that’s inappropriate. But how do you phrase that in a way an LLM can understand? We probably need to go back to the beginning, think about how the original fix works, and revisit the whole approach. In the best case, applying a tweak like this fixes one narrow issue with outputs, while potentially creating more.

Let’s play out another very real example. What if we add to the prompt, “Never use explicit or profane language in your replies, including [list of bad words here]”? Maybe that works for a lot of cases, and the model will refuse to say bad words that a 13-year-old boy is requesting to be funny. But sooner or later, this has unexpected additional side effects. What if someone’s looking for the history of Sussex, England? Alternately, someone’s going to come up with a bad word you left out of the list, so that’s going to be constant work to maintain. What about bad words in other languages? Who judges what goes on the list? I have a headache just thinking about it.
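The prompt instruction above isn’t literally a substring filter, but the Sussex problem is easiest to see in one. This is a toy sketch with a hypothetical one-word blocklist and an invented function name, just to show the failure mode:

```python
# Toy blocklist filter (hypothetical list) showing why word lists misfire.
BLOCKED_WORDS = ["sex"]  # imagine a long, hand-maintained list across many languages

def is_clean(text: str) -> bool:
    """Naive substring check: flags innocent text that merely contains a blocked string."""
    lowered = text.lower()
    return not any(bad in lowered for bad in BLOCKED_WORDS)

print(is_clean("Tell me about the history of Sussex, England."))  # False, wrongly flagged
```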

These are just two examples, and I’m sure you can think of more such scenarios. It’s like putting band-aid patches on a leaky pipe: every time you patch one spot, another leak springs up.

So, what is it we actually want from LLMs? Do we want them to generate a highly realistic mirror image of what human beings are actually like and how our human society actually looks from the perspective of our media? Or do we want a sanitized version that cleans up the edges?

Honestly, I think we probably need something in the middle, and we have to continue to renegotiate the boundaries, even though it’s hard. We don’t want LLMs to reflect the true horrors and sewers of violence, hate, and more that human society contains; that is a part of our world that should not be amplified even slightly. Zero content moderation is not the answer. Fortunately, this motivation aligns with the desire of the large corporate entities running these models to be popular with the general public and make lots of money.

…we have to continue to renegotiate the boundaries, even though it’s hard. We don’t want LLMs to reflect the true horrors and sewers of violence, hate, and more that human society contains; that is a part of our world that should not be amplified even slightly. Zero content moderation is not the answer.

However, I do want to continue to make a gentle case for the fact that we can also learn something from this dilemma in the world of LLMs. Instead of simply being offended and blaming the technology when a model generates a bunch of images of a white male doctor, we should pause to understand why that’s what we received from the model. And then we should debate thoughtfully about whether the response from the model should be allowed, make a decision that is founded in our values and principles, and try to carry it out to the best of our ability.

As I’ve said before, an LLM isn’t an alien from another universe; it’s us. It’s trained on the things we wrote/said/filmed/recorded/did. If we want our model to show us doctors of various sexes, genders, races, etc., we need to make a society that allows all those different kinds of people to have access to that career and the education it requires. If we’re worrying about how the model mirrors us, but not taking to heart the fact that it’s us who need to be better, not just the model, then we’re missing the point.

If we want our model to show us doctors of various sexes, genders, races, etc., we need to make a society that allows all those different kinds of people to have access to that career and the education it requires.
