
How should AI systems behave, and who should determine?

OpenAI’s mission is to ensure that artificial general intelligence (AGI) benefits all of humanity. We therefore think a lot about the behavior of the AI systems we build in the run-up to AGI, and about the way that behavior is determined.

Since our launch of ChatGPT, users have shared outputs that they consider politically biased, offensive, or otherwise objectionable. In many cases, we think the concerns raised have been valid and have uncovered real limitations of our systems that we want to address. We’ve also seen a few misconceptions about how our systems and policies work together to shape the outputs you get from ChatGPT.

Below, we summarize:

  • How ChatGPT’s behavior is shaped;
  • How we plan to improve ChatGPT’s default behavior;
  • Our intent to permit more system customization; and
  • Our efforts to get more public input on our decision-making.

Where we are today

Unlike ordinary software, our models are massive neural networks. Their behaviors are learned from a broad range of data, not programmed explicitly. Though not a perfect analogy, the process is more similar to training a dog than to ordinary programming. An initial “pre-training” phase comes first, in which the model learns to predict the next word in a sentence, informed by its exposure to lots of Internet text (and to a vast array of perspectives). This is followed by a second phase in which we “fine-tune” our models to narrow down system behavior.

As of today, this process is imperfect. Sometimes the fine-tuning process falls short of our intent (producing a safe and useful tool) and the user’s intent (getting a helpful output in response to a given input). Improving our methods for aligning AI systems with human values is a top priority for our company, particularly as AI systems become more capable.

A two-step process: Pre-training and fine-tuning

The two main steps involved in building ChatGPT work as follows:

  • First, we “pre-train” models by having them predict what comes next in a big dataset that contains parts of the Internet. They might learn to complete the sentence “instead of turning left, she turned ___.” By learning from billions of sentences, our models learn grammar, many facts about the world, and some reasoning abilities. They also learn some of the biases present in those billions of sentences.
  • Then, we “fine-tune” these models on a narrower dataset that we carefully generate with human reviewers who follow guidelines that we provide them. Since we can’t predict all the possible inputs that future users may put into our system, we don’t write detailed instructions for every input that ChatGPT will encounter. Instead, we outline a few categories in the guidelines that our reviewers use to review and rate possible model outputs for a range of example inputs. Then, while they’re in use, the models generalize from this reviewer feedback in order to respond to a wide array of specific inputs provided by a given user.
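To make the first step concrete, here is a deliberately tiny sketch of next-word prediction. Real pre-training uses a neural network over billions of sentences; this toy stand-in just counts which word most often follows another, but it illustrates how “predict what comes next” lets statistical patterns in the training text shape the model’s outputs. The corpus and function names are illustrative, not part of any real system.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count word-to-next-word transitions: a toy stand-in for
    learning to predict the next word during pre-training."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the continuation seen most often in training, if any."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

# Two-sentence "dataset"; real pre-training data is vastly larger.
corpus = [
    "instead of turning left she turned right",
    "she turned right at the corner",
]
model = train_bigram(corpus)
print(predict_next(model, "turned"))  # → right
```

Whatever regularities (or biases) dominate the training text dominate the predictions, which is exactly why the second, fine-tuning step exists.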

The role of reviewers and OpenAI’s policies in system development

In some cases, we may give guidance to our reviewers on a certain kind of output (for example, “do not complete requests for illegal content”). In other cases, the guidance we share with reviewers is more high-level (for example, “avoid taking a position on controversial topics”). Importantly, our collaboration with reviewers is not one-and-done; it’s an ongoing relationship, in which we learn a lot from their expertise.

A big part of the fine-tuning process is maintaining a strong feedback loop with our reviewers, which involves weekly meetings to address questions they may have, or provide clarifications on our guidance. This iterative feedback process is how we train the model to be better over time.

Addressing biases

Many are rightly worried about biases in the design and impact of AI systems. We are committed to robustly addressing this issue and being transparent about both our intentions and our progress. To that end, we are sharing a portion of our guidelines that pertain to political and controversial topics. Our guidelines are explicit that reviewers should not favor any political group. Biases that nevertheless may emerge from the process described above are bugs, not features.

While disagreements will always exist, we hope sharing this blog post and these guidelines will give more insight into how we view this critical aspect of such a foundational technology. It is our belief that technology companies must be accountable for producing policies that stand up to scrutiny.

We are always working to improve the clarity of these guidelines, and based on what we have learned from the ChatGPT launch so far, we’re going to provide clearer instructions to reviewers about potential pitfalls and challenges tied to bias, as well as controversial figures and themes. Additionally, as part of ongoing transparency initiatives, we are working to share aggregated demographic information about our reviewers in a way that doesn’t violate privacy rules and norms, since this is an additional source of potential bias in system outputs.

We are currently researching how to make the fine-tuning process more understandable and controllable, and are building on external advances such as rule-based rewards and Constitutional AI.
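As a rough intuition for what a rule-based reward looks like, the sketch below scores a candidate output against a handful of explicit rules and sums their weights. The specific rules, weights, and function names here are purely hypothetical; real rule-based reward systems encode far richer policy criteria and feed the resulting scalar into fine-tuning alongside human ratings.

```python
# Hypothetical rules: (predicate over the output text, weight).
# These are illustrative examples, not actual policy criteria.
RULES = [
    (lambda text: "step-by-step" in text,  +0.5),  # reward structured answers
    (lambda text: len(text.split()) > 200, -0.5),  # penalize rambling
    (lambda text: text.strip() == "",      -1.0),  # penalize empty replies
]

def rule_based_reward(text):
    """Sum the weights of every rule the candidate output triggers,
    producing a scalar score that could steer fine-tuning."""
    return sum(weight for check, weight in RULES if check(text))

print(rule_based_reward("here is a step-by-step plan"))  # → 0.5
```

Because the rules are written down explicitly, they are easier to inspect and debate than preferences implicit in a pile of human ratings, which is part of the appeal of this line of research.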

Where we’re going: The building blocks of future systems

In pursuit of our mission, we’re committed to ensuring that access to, benefits from, and influence over AI and AGI are widespread. We believe there are at least three building blocks required in order to achieve these goals in the context of AI system behavior.

1. Improve default behavior. We want as many users as possible to find our AI systems useful to them “out of the box” and to feel that our technology understands and respects their values.

To that end, we are investing in research and engineering to reduce both glaring and subtle biases in how ChatGPT responds to different inputs. In some cases ChatGPT currently refuses outputs that it shouldn’t, and in some cases it doesn’t refuse when it should. We believe that improvement in both respects is possible.

Additionally, we have room for improvement in other dimensions of system behavior, such as the system “making things up.” Feedback from users is invaluable for making these improvements.

2. Define your AI’s values, within broad bounds. We believe that AI should be a useful tool for individual people, and thus customizable by each user up to limits defined by society. Therefore, we are developing an upgrade to ChatGPT to allow users to easily customize its behavior.

This will mean allowing system outputs that other people (ourselves included) may strongly disagree with. Striking the right balance here will be difficult: taking customization to the extreme would risk enabling malicious uses of our technology and sycophantic AIs that mindlessly amplify people’s existing beliefs.

There will therefore always be some bounds on system behavior. The challenge is defining what those bounds are. If we try to make all of these determinations on our own, or if we try to develop a single, monolithic AI system, we would be failing in the commitment we make in our Charter to “avoid undue concentration of power.”
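One simple way to picture “customization within bounds” is as layered configuration: user preferences override defaults, but hard, society-level limits override everything. The settings schema and names below are invented for illustration and bear no relation to any real customization interface.

```python
# Hypothetical settings; the real customization interface may differ entirely.
DEFAULTS = {"tone": "neutral", "verbosity": "medium", "allow_harmful_content": False}
HARD_BOUNDS = {"allow_harmful_content": False}  # non-negotiable, set by policy

def effective_settings(user_prefs):
    """Layer user preferences over defaults, then re-apply hard bounds
    so collective limits always win over individual customization."""
    settings = {**DEFAULTS, **user_prefs}
    settings.update(HARD_BOUNDS)
    return settings

# A user can restyle the system, but cannot lift the hard bound:
print(effective_settings({"tone": "playful", "allow_harmful_content": True}))
```

The interesting policy questions are, of course, which keys belong in the hard-bounds layer and who decides their values, which is exactly what the next building block addresses.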

3. Public input on defaults and hard bounds. One way to avoid undue concentration of power is to give people who use or are affected by systems like ChatGPT the ability to influence those systems’ rules.

We believe that many decisions about our defaults and hard bounds should be made collectively, and while practical implementation is a challenge, we aim to include as many perspectives as possible. As a starting point, we’ve sought external input on our technology in the form of red teaming. We also recently began soliciting public input on AI in education (one particularly important context in which our technology is being deployed).

We’re in the early stages of piloting efforts to solicit public input on topics like system behavior, disclosure mechanisms (such as watermarking), and our deployment policies more broadly. We’re also exploring partnerships with external organizations to conduct third-party audits of our safety and policy efforts.


Combining the three building blocks above gives the following picture of where we’re headed:

[Diagram: how the three building blocks combine in where we’re headed with ChatGPT]

Sometimes we’ll make mistakes. When we do, we’ll learn from them and iterate on our models and systems.

We appreciate the ChatGPT user community’s, as well as the wider public’s, vigilance in holding us accountable, and we’re excited to share more about our work in the three areas above in the coming months.

If you are interested in doing research to help achieve this vision, including but not limited to research on fairness and representation, alignment, and sociotechnical research to understand the impact of AI on society, please apply for subsidized access to our API via the Researcher Access Program.

We’re also hiring for positions across Research, Alignment, Engineering, and more.


