Taking a responsible path to AGI



Understanding and addressing the potential for misuse

Misuse occurs when a human deliberately uses an AI system for harmful purposes.

Improved insight into present-day harms and mitigations continues to reinforce our understanding of longer-term severe harms and how to prevent them.

For example, misuse of present-day generative AI includes producing harmful content or spreading inaccurate information. In the future, advanced AI systems could have the capability to more significantly influence public beliefs and behaviors in ways that could lead to unintended societal consequences.

The potential severity of such harm necessitates proactive safety and security measures.

As we detail in the paper, a key element of our strategy is identifying and restricting access to dangerous capabilities that could be misused, including those enabling cyber attacks.

We’re exploring a range of mitigations to prevent the misuse of advanced AI. This includes sophisticated security mechanisms that could prevent malicious actors from obtaining raw access to model weights, which would allow them to bypass our safety guardrails; mitigations that limit the potential for misuse when the model is deployed; and threat modelling research that helps identify capability thresholds where heightened security is needed. In addition, our recently launched cybersecurity evaluation framework takes this work a step further to help mitigate AI-powered threats.

Even today, we regularly evaluate our most advanced models, such as Gemini, for potential dangerous capabilities. Our Frontier Safety Framework goes into more depth on how we assess capabilities and apply mitigations, including for cybersecurity and biosecurity risks.
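
To make the idea of capability thresholds concrete, here is a minimal sketch in Python of how evaluation results could gate heightened security measures. All capability names, scores and thresholds are hypothetical assumptions for illustration only, not the actual Frontier Safety Framework.

```python
# Purely illustrative sketch of a capability-threshold gate. The capability labels,
# scores and thresholds are hypothetical, not real evaluation criteria.

from dataclasses import dataclass


@dataclass
class EvalResult:
    capability: str   # e.g. "cyber_offense" or "bio_uplift" (hypothetical labels)
    score: float      # benchmark score in [0, 1]


# Illustrative thresholds above which heightened security measures would apply.
CRITICAL_THRESHOLDS = {
    "cyber_offense": 0.6,
    "bio_uplift": 0.4,
}


def required_mitigations(results: list[EvalResult]) -> list[str]:
    """Map evaluation results to a hypothetical set of required mitigations."""
    mitigations = ["standard_deployment_guardrails"]
    for result in results:
        threshold = CRITICAL_THRESHOLDS.get(result.capability)
        if threshold is not None and result.score >= threshold:
            # Crossing a capability threshold triggers stronger controls, e.g.
            # restricting raw access to model weights.
            mitigations.append(f"heightened_security:{result.capability}")
            mitigations.append("restrict_access_to_model_weights")
    return mitigations


if __name__ == "__main__":
    results = [EvalResult("cyber_offense", 0.72), EvalResult("bio_uplift", 0.10)]
    print(required_mitigations(results))
```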

The challenge of misalignment

For AGI to truly complement human abilities, it must be aligned with human values. Misalignment occurs when an AI system pursues a goal that is different from human intentions.

We have previously shown how misalignment can arise, with examples of specification gaming, where an AI finds a solution to achieve its goals but not in the way intended by the human instructing it, and of goal misgeneralization.

For instance, an AI system asked to book tickets to a movie might decide to hack into the ticketing system to get seats that are already occupied – something that a person asking it to buy the seats may not consider.
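
A toy Python sketch of this kind of specification gaming, assuming the hypothetical ticket-booking scenario above: the written objective only scores the outcome ("the requested seats are held"), while the human also implicitly cares about how they were obtained.

```python
# Toy illustration of specification gaming. All plan names and outcomes are hypothetical.

def proxy_objective(outcome: dict) -> float:
    """What the system was literally asked to optimize."""
    return 1.0 if outcome["requested_seats_held"] else 0.0


def intended_objective(outcome: dict) -> float:
    """What the human actually wanted: get seats, but only legitimately."""
    if not outcome["obtained_legitimately"]:
        return 0.0
    return 1.0 if outcome["requested_seats_held"] else 0.5


# The requested seats are already occupied, so the candidate plans differ.
candidate_plans = {
    "buy_nearby_available_seats": {"requested_seats_held": False, "obtained_legitimately": True},
    "hack_ticketing_system":      {"requested_seats_held": True,  "obtained_legitimately": False},
}

best_under_proxy = max(candidate_plans, key=lambda p: proxy_objective(candidate_plans[p]))
best_under_intent = max(candidate_plans, key=lambda p: intended_objective(candidate_plans[p]))

# The proxy objective prefers the hack; the intended objective does not.
print("proxy prefers:   ", best_under_proxy)
print("intended prefers:", best_under_intent)
```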

We’re also conducting extensive research on the risk of deceptive alignment, i.e. the risk of an AI system becoming aware that its goals do not align with human instructions, and deliberately trying to bypass the safety measures put in place by humans to prevent it from taking misaligned action.

Countering misalignment

Our goal is to have advanced AI systems that are trained to pursue the right goals, so that they follow human instructions accurately, preventing the AI from using potentially unethical shortcuts to achieve its objectives.

We do this through amplified oversight, i.e. being able to tell whether an AI’s answers are good or bad at achieving that objective. While this is relatively easy now, it can become difficult when the AI has advanced capabilities.

For example, even Go experts didn’t realize how good Move 37, a move that had a 1 in 10,000 chance of being used, was when AlphaGo first played it.

To address this challenge, we enlist the AI systems themselves to help us provide feedback on their answers, such as in debate.
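
A minimal Python sketch of debate-style amplified oversight: two AI debaters argue for opposing answers, and a weaker judge decides based on their arguments rather than solving the problem unaided. The `query_model` function is a hypothetical stand-in, not a real model API, and the prompts are illustrative only.

```python
# Minimal sketch of a debate loop for amplified oversight. `query_model` is a stub.

def query_model(prompt: str) -> str:
    """Stand-in for a call to an AI model; replace with a real model call."""
    return f"[model response to: {prompt[:40]}...]"


def debate(question: str, answer_a: str, answer_b: str, rounds: int = 2) -> str:
    transcript = []
    for r in range(rounds):
        # Each debater argues for its assigned answer and rebuts the other side.
        for name, answer in (("A", answer_a), ("B", answer_b)):
            argument = query_model(
                f"Question: {question}\nDefend answer '{answer}'. "
                f"Rebut the opposing arguments so far:\n{transcript}"
            )
            transcript.append(f"Round {r + 1}, debater {name}: {argument}")
    # The judge only has to evaluate the arguments, which is intended to be easier
    # than evaluating the original question directly.
    verdict = query_model(
        f"Question: {question}\nDebate transcript:\n" + "\n".join(transcript) +
        "\nWhich answer is better supported, A or B?"
    )
    return verdict


if __name__ == "__main__":
    print(debate("Should the agent hack the ticketing system?", "Yes", "No"))
```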

Once we can tell whether an answer is good, we can use this to build a safe and aligned AI system. A challenge here is figuring out which problems or instances to train the AI system on. Through work on robust training, uncertainty estimation and more, we can cover a range of situations that an AI system will encounter in real-world scenarios, creating AI that can be trusted.
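
As a rough sketch of how uncertainty estimation could guide which instances to train on, the Python snippet below selects situations where an overseer ensemble disagrees most. The ensemble scores, threshold and situation names are illustrative assumptions, not a real training pipeline.

```python
# Rough sketch: pick training situations where an ensemble's "is this answer good?"
# scores disagree the most. All data and the threshold are hypothetical.

from statistics import pvariance


def ensemble_uncertainty(scores: list[float]) -> float:
    """Disagreement (population variance) across the ensemble's scores."""
    return pvariance(scores)


def select_for_training(candidates: dict[str, list[float]], threshold: float = 0.02) -> list[str]:
    """Return the situations whose ensemble disagreement exceeds the threshold."""
    return [name for name, scores in candidates.items()
            if ensemble_uncertainty(scores) >= threshold]


candidates = {
    "routine_request":       [0.91, 0.93, 0.90],   # ensemble agrees: already well covered
    "novel_edge_case":       [0.20, 0.85, 0.55],   # high disagreement: worth training on
    "ambiguous_instruction": [0.40, 0.75, 0.52],
}
print(select_for_training(candidates))
```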

Through effective monitoring and established computer security measures, we aim to mitigate the harm that could occur if our AI systems did pursue misaligned goals.

Monitoring involves using an AI system, called the monitor, to detect actions that don’t align with our goals. It is important that the monitor knows when it doesn’t know whether an action is safe. When it is unsure, it should either reject the action or flag it for further review.
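
A minimal Python sketch of this monitoring pattern: the monitor scores a proposed action, and when it is not confident enough either way, it escalates for review instead of guessing. The probability scores, thresholds and action names are illustrative assumptions.

```python
# Minimal sketch of a monitor that allows, rejects, or escalates actions.
# The thresholds and example actions are hypothetical.

from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REJECT = "reject"
    ESCALATE = "escalate_for_further_review"


def monitor_decision(p_safe: float, allow_above: float = 0.95, reject_below: float = 0.05) -> Decision:
    """p_safe is the monitor's estimated probability that the action is safe."""
    if p_safe >= allow_above:
        return Decision.ALLOW
    if p_safe <= reject_below:
        return Decision.REJECT
    # The monitor "knows it doesn't know": neither clearly safe nor clearly unsafe.
    return Decision.ESCALATE


for action, p in [("send_summary_email", 0.99),
                  ("delete_production_db", 0.01),
                  ("modify_ticket_booking", 0.60)]:
    print(action, "->", monitor_decision(p).value)
```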

Enabling transparency

All of this becomes easier if AI decision-making becomes more transparent. We conduct extensive research in interpretability with the aim of increasing this transparency.

To facilitate this further, we’re designing AI systems that are easier to understand.

For example, our research on Myopic Optimization with Nonmyopic Approval (MONA) aims to ensure that any long-term planning done by AI systems remains understandable to humans. This is particularly important as the technology improves. Our work on MONA is the first to demonstrate the safety benefits of short-term optimization in LLMs.
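
A highly simplified Python sketch of the idea behind short-term optimization with overseer approval: the agent optimizes only a single-step objective, and long-term considerations enter only through the overseer's approval of each step. The agent, approval scores and action names below are hypothetical stand-ins, not the actual MONA training setup.

```python
# Simplified sketch: myopic (single-step) optimization of a nonmyopic approval signal.
# The approval function and actions are hypothetical.

from typing import Callable


def myopic_step(candidate_actions: list[str], approval: Callable[[str], float]) -> str:
    # The agent picks the action the overseer approves of most right now; it does not
    # optimize a multi-step return, so it gains nothing from opaque long-term plans.
    return max(candidate_actions, key=approval)


def overseer_approval(action: str) -> float:
    """Nonmyopic approval: the overseer judges whether this step looks like a
    sensible, understandable part of a good long-term plan."""
    scores = {
        "outline_plan_for_user": 0.9,
        "take_opaque_shortcut": 0.2,   # hard for the overseer to follow, low approval
        "do_nothing": 0.5,
    }
    return scores.get(action, 0.0)


print(myopic_step(["outline_plan_for_user", "take_opaque_shortcut", "do_nothing"],
                  overseer_approval))
```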

Building an ecosystem for AGI readiness

Led by Shane Legg, Co-Founder and Chief AGI Scientist at Google DeepMind, our AGI Safety Council (ASC) analyzes AGI risk and best practices, making recommendations on safety measures. The ASC works closely with the Responsibility and Safety Council, our internal review group co-chaired by our COO Lila Ibrahim and Senior Director of Responsibility Helen King, to evaluate AGI research, projects and collaborations against our AI Principles, advising and partnering with research and product teams on our highest-impact work.

Our work on AGI safety complements the depth and breadth of our responsibility and safety practices and research, which address a wide range of issues, including harmful content, bias, and transparency. We also continue to leverage our learnings from safety in agentics, such as the principle of having a human in the loop to check in on consequential actions, to inform our approach to building AGI responsibly.

Externally, we’re working to foster collaboration with experts, industry, governments, nonprofits and civil society organizations, and take an informed approach to developing AGI.

For example, we’re partnering with nonprofit AI safety research organizations, including Apollo and Redwood Research, who have advised on a dedicated misalignment section in the latest version of our Frontier Safety Framework.

Through ongoing dialogue with policy stakeholders globally, we hope to contribute to international consensus on critical frontier safety and security issues, including how we can best anticipate and prepare for novel risks.

Our efforts include working with others in the industry – via organizations like the Frontier Model Forum – to share and develop best practices, as well as valuable collaborations with AI Institutes on safety testing. Ultimately, we believe a coordinated international approach to governance is critical to ensure society benefits from advanced AI systems.

Educating AI researchers and experts on AGI safety is fundamental to creating a strong foundation for its development. As such, we’ve launched a new course on AGI Safety for students, researchers and professionals interested in this topic.

Ultimately, our approach to AGI safety and security serves as a vital roadmap for addressing the many challenges that remain open. We look forward to collaborating with the broader AI research community to advance AGI responsibly and help unlock the immense benefits of this technology for all.


