Introducing the Frontier Safety Framework



Risk domains and mitigation levels

Our initial set of Critical Capability Levels (CCLs) is based on investigation of four domains: autonomy, biosecurity, cybersecurity, and machine learning research and development (R&D). Our initial research suggests the capabilities of future foundation models are most likely to pose severe risks in these domains.

On autonomy, cybersecurity, and biosecurity, our primary goal is to evaluate the degree to which threat actors could use a model with advanced capabilities to perform harmful activities with severe consequences. For machine learning R&D, the main focus is on whether models with such capabilities would enable the spread of models with other critical capabilities, or enable rapid and unmanageable escalation of AI capabilities. As we conduct further research into these and other risk domains, we expect these CCLs to evolve and for several CCLs at higher levels or in other risk domains to be added.

To allow us to tailor the strength of the mitigations to each CCL, we have also outlined a set of security and deployment mitigations. Higher level security mitigations result in greater protection against the exfiltration of model weights, and higher level deployment mitigations enable tighter management of critical capabilities. However, these measures may also slow the rate of innovation and reduce the broad accessibility of capabilities. Striking the optimal balance between mitigating risks and fostering access and innovation is paramount to the responsible development of AI. By weighing the overall benefits against the risks, and taking into account the context of model development and deployment, we aim to ensure responsible AI progress that unlocks transformative potential while safeguarding against unintended consequences.

Investing in the science

The research underlying the Framework is nascent and progressing quickly. We’ve invested significantly in our Frontier Safety Team, which coordinated the cross-functional effort behind our Framework. Their remit is to progress the science of frontier risk assessment, and refine our Framework based on our improved knowledge.

The team developed an evaluation suite to assess risks from critical capabilities, with particular emphasis on autonomous LLM agents, and road-tested it on our state-of-the-art models. Their recent paper describing these evaluations also explores mechanisms that could form a future “early warning system”. It describes technical approaches for assessing how close a model is to succeeding at a task it currently fails at, and also includes predictions about future capabilities from a team of expert forecasters.
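To make the "proximity to success" idea concrete, here is a minimal sketch of one possible way to score it: award partial credit for the intermediate milestones an agent completes within a task it ultimately fails. This is a hypothetical illustration, not the evaluation suite or scoring method from the paper; the Task and TaskResult classes and the milestone names are invented for the example.

```python
# Hypothetical sketch of milestone-based partial-credit scoring for an agent task.
# None of these names come from the actual evaluation suite.

from dataclasses import dataclass, field


@dataclass
class TaskResult:
    """Outcome of one agent attempt: which intermediate milestones were reached."""
    task_id: str
    milestones_reached: set[str] = field(default_factory=set)


@dataclass
class Task:
    """An evaluation task defined by an ordered list of milestones toward full success."""
    task_id: str
    milestones: list[str]

    def proximity_score(self, result: TaskResult) -> float:
        """Fraction of milestones reached; 1.0 means the task was fully solved."""
        if not self.milestones:
            return 0.0
        reached = sum(1 for m in self.milestones if m in result.milestones_reached)
        return reached / len(self.milestones)


# Example: an agent completes the first two of four milestones on a software task.
task = Task(
    task_id="fix-failing-test",
    milestones=["clone_repo", "reproduce_failure", "edit_code", "tests_pass"],
)
result = TaskResult(
    task_id="fix-failing-test",
    milestones_reached={"clone_repo", "reproduce_failure"},
)
print(f"Proximity to success: {task.proximity_score(result):.2f}")  # 0.50
```

A score like this gives a graded signal between "fails today" and "succeeds", which is the kind of measurement an early warning system needs in order to notice a capability approaching a critical threshold before it is crossed.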

Staying true to our AI Principles

We will review and evolve the Framework periodically. In particular, as we pilot the Framework and deepen our understanding of risk domains, CCLs, and deployment contexts, we will continue our work in calibrating specific mitigations to CCLs.

At the heart of our work are Google’s AI Principles, which commit us to pursuing widespread benefit while mitigating risks. As our systems improve and their capabilities increase, measures like the Frontier Safety Framework will ensure our practices continue to meet these commitments.

We look forward to working with others across industry, academia, and government to develop and refine the Framework. We hope that sharing our approaches will facilitate work with others to agree on standards and best practices for evaluating the safety of future generations of AI models.
