Google DeepMind strengthens the Frontier Safety Framework



We’re expanding our risk domains and refining our risk assessment process.

AI breakthroughs are transforming our everyday lives, from advancing mathematics, biology and astronomy to realizing the potential of personalized education. As we build increasingly powerful AI models, we’re committed to responsibly developing our technologies and taking an evidence-based approach to staying ahead of emerging risks.

Today, we’re publishing the third iteration of our Frontier Safety Framework (FSF) — our most comprehensive approach yet to identifying and mitigating severe risks from advanced AI models.

This update builds upon our ongoing collaborations with experts across industry, academia and government. We’ve also incorporated lessons learned from implementing previous versions and evolving best practices in frontier AI safety.

Key updates to the Framework

Addressing the risks of harmful manipulation

With this update, we’re introducing a Critical Capability Level (CCL) focused on harmful manipulation — specifically, AI models with powerful manipulative capabilities that could be misused to systematically and substantially change beliefs and behaviors in identified high-stakes contexts over the course of interactions with the model, reasonably resulting in additional expected harm at severe scale.

This addition builds on and operationalizes research we’ve conducted to identify and evaluate mechanisms that drive manipulation from generative AI. Going forward, we will continue to invest in this domain to better understand and measure the risks associated with harmful manipulation.

Adapting our approach to misalignment risks

We’ve also expanded our Framework to address potential future scenarios where misaligned AI models might interfere with operators’ ability to direct, modify or shut down their operations.

While our previous version of the Framework included an exploratory approach centered on instrumental reasoning CCLs (i.e., warning levels specific to when an AI model starts to reason deceptively), with this update we now provide further protocols for our machine learning research and development CCLs, focused on models that could accelerate AI research and development to potentially destabilizing levels.

Along with the misuse risks arising from these capabilities, there are also misalignment risks stemming from a model’s potential for undirected action at these capability levels, and the likely integration of such models into AI development and deployment processes.

To address risks posed by CCLs, we conduct safety case reviews before external launches when relevant CCLs are reached. This involves performing detailed analyses demonstrating how risks have been reduced to manageable levels. For advanced machine learning research and development CCLs, large-scale internal deployments can also pose risk, so we are now expanding this approach to include such deployments.

Sharpening our risk assessment process

Our Framework is designed to address risks in proportion to their severity. We’ve sharpened our CCL definitions specifically to identify the most critical threats that warrant the most rigorous governance and mitigation strategies. We continue to apply safety and security mitigations before specific CCL thresholds are reached and as part of our standard model development approach.

Lastly, in this update, we go into more detail about our risk assessment process. Building on our core early-warning evaluations, we describe how we conduct holistic assessments that include systematic risk identification, comprehensive analyses of model capabilities and explicit determinations of risk acceptability.

Advancing our commitment to frontier safety

This latest update to our Frontier Safety Framework represents our continued commitment to taking a scientific and evidence-based approach to tracking and staying ahead of AI risks as capabilities advance toward AGI. By expanding our risk domains and strengthening our risk assessment processes, we aim to ensure that transformative AI benefits humanity while minimizing potential harms.

Our Framework will continue to evolve based on new research, stakeholder input and lessons learned from implementation. We remain committed to working collaboratively across industry, academia and government.

The path to beneficial AGI requires not only technical breakthroughs, but also robust frameworks to mitigate risks along the way. We hope that our updated Frontier Safety Framework contributes meaningfully to this collective effort.


