Now we know what OpenAI’s superalignment team has been up to

OpenAI’s approach to the superalignment problem. (Image: OpenAI)

The researchers point out that the problem is hard to study because superhuman machines don’t exist. So they used stand-ins. Instead of looking at how humans could supervise superhuman machines, they looked at how GPT-2, a model that OpenAI released five years ago, could supervise GPT-4, OpenAI’s newest and most powerful model. “If you can do that, it might be evidence that you can use similar techniques to have humans supervise superhuman models,” says Collin Burns, another researcher on the superalignment team.

The team took GPT-2 and trained it to perform a handful of different tasks, including a set of chess puzzles and 22 common natural-language-processing tests that assess inference, sentiment analysis, and so on. They used GPT-2’s responses to those tests and puzzles to train GPT-4 to perform the same tasks. It’s as if a twelfth grader were taught how to do a task by a third grader. The trick was to do it without GPT-4 taking too big a hit in performance.
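The recipe can be sketched in miniature. The toy below is illustrative only, not OpenAI’s code: a deliberately error-prone labeler stands in for GPT-2 and a simple logistic-regression student stands in for GPT-4. The student is trained only on the weak model’s noisy labels, yet it can end up more accurate than its teacher, which is the effect the experiment probes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary task: the true label depends on both features.
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# "Weak supervisor" (GPT-2 stand-in): its labels are the truth
# with roughly 20% of them randomly flipped.
weak_labels = np.where(rng.random(len(y)) < 0.2, 1 - y, y)

# "Strong student" (GPT-4 stand-in): logistic regression trained
# by gradient descent on the weak supervisor's labels only.
w = np.zeros(2)
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ w))          # predicted probabilities
    w -= 0.1 * X.T @ (p - weak_labels) / len(X)

student_preds = (X @ w > 0).astype(float)
weak_acc = (weak_labels == y).mean()       # teacher's label accuracy
student_acc = (student_preds == y).mean()  # student vs. ground truth
print(f"weak supervisor accuracy: {weak_acc:.2f}")
print(f"strong student accuracy:  {student_acc:.2f}")
```

Because the label noise is roughly symmetric, the student’s fitted decision boundary averages it out, so the student recovers more of the true task than the teacher could express. With actual language models the mechanism is the strong model’s pretrained capabilities rather than noise averaging, but the shape of the experiment is the same.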

The results were mixed. The team measured the gap in performance between GPT-4 trained on GPT-2’s best guesses and GPT-4 trained on correct answers. They found that GPT-4 trained by GPT-2 performed 20% to 70% better than GPT-2 on the language tasks but did less well on the chess puzzles.
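Results like these are naturally summarized as the fraction of the weak-to-strong gap that the training recovers. A minimal version of that calculation (the accuracies below are made-up placeholders, not figures from OpenAI’s paper):

```python
def performance_gap_recovered(weak, weak_to_strong, strong_ceiling):
    """Fraction of the gap between the weak supervisor and the strong
    model's ceiling that is closed by training on weak labels."""
    return (weak_to_strong - weak) / (strong_ceiling - weak)

# Hypothetical accuracies: the weak supervisor, the strong model
# trained on weak labels, and the strong model trained on truth.
pgr = performance_gap_recovered(weak=0.60, weak_to_strong=0.72,
                                strong_ceiling=0.90)
print(f"{pgr:.0%} of the gap recovered")  # 0.12 of a 0.30 gap
```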

The fact that GPT-4 outdid its teacher at all is impressive, says team member Pavel Izmailov: “It is a really surprising and positive result.” But it fell far short of what it could do on its own, he says. They conclude that the approach is promising but needs more work.

“It’s an interesting idea,” says Thilo Hagendorff, an AI researcher at the University of Stuttgart in Germany who works on alignment. But he thinks that GPT-2 may be too dumb to be a good teacher. “GPT-2 tends to give nonsensical responses to any task that’s slightly complex or requires reasoning,” he says. Hagendorff would like to know what would happen if GPT-3 were used instead.

He also notes that this approach doesn’t address Sutskever’s hypothetical scenario in which a superintelligence hides its true behavior and pretends to be aligned when it isn’t. “Future superhuman models will likely possess emergent abilities that are unknown to researchers,” says Hagendorff. “How can alignment work in these cases?”

But it is easy to point out shortcomings, he says. He’s pleased to see OpenAI moving from speculation to experiment: “I applaud OpenAI for their effort.”

OpenAI now wants to recruit others to its cause. Alongside this research update, the company announced a new $10 million pot of money that it plans to use to fund people working on superalignment. It will offer grants of up to $2 million to university labs, nonprofits, and individual researchers, and one-year fellowships of $150,000 to graduate students. “We’re really excited about this,” says Aschenbrenner. “We really think there’s a lot that new researchers can contribute.”
