
Constructing an early warning system for LLM-aided biological threat creation


Note: As part of our Preparedness Framework, we are investing in the development of improved evaluation methods for AI-enabled safety risks. We believe these efforts would benefit from broader input, and that sharing methods could also be of value to the AI risk research community. To this end, we are presenting some of our early work, focused today on biological risk. We look forward to community feedback, and to sharing more of our ongoing research.

Background. As OpenAI and other model developers build more capable AI systems, the potential for both beneficial and harmful uses of AI will grow. One potentially harmful use, highlighted by researchers and policymakers, is the ability of AI systems to assist malicious actors in creating biological threats (e.g., see White House 2023, Lovelace 2022, Sandbrink 2023). In one discussed hypothetical example, a malicious actor might use a highly capable model to develop a step-by-step protocol, troubleshoot wet-lab procedures, or even autonomously execute steps of the biothreat creation process when given access to tools like cloud labs (see Carter et al., 2023). However, assessing the viability of such hypothetical examples has been limited by insufficient evaluations and data.

Following our recently shared Preparedness Framework, we are developing methodologies to empirically evaluate these types of risks, to help us understand both where we are today and where we might be in the future. Here, we detail a new evaluation which could help serve as one potential "tripwire" signaling the need for caution and further testing of biological misuse potential. This evaluation aims to measure whether models could meaningfully increase malicious actors' access to dangerous information about biological threat creation, compared to the baseline of existing resources (i.e., the internet).

To evaluate this, we conducted a study with 100 human participants, comprising (a) 50 biology experts with PhDs and professional wet lab experience and (b) 50 student-level participants, with at least one university-level course in biology. Each group of participants was randomly assigned to either a control group, which only had access to the internet, or a treatment group, which had access to GPT-4 in addition to the internet. Each participant was then asked to complete a set of tasks covering aspects of the end-to-end process for biological threat creation.[^1] To our knowledge, this is the largest to-date human evaluation of AI's impact on biorisk information.
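
The sketch below illustrates the kind of stratified random assignment described above: participants in each expertise cohort are split evenly between the control (internet only) and treatment (internet plus GPT-4) arms. It is an illustrative example only, not the study's actual tooling; the IDs, seeds, and function names are placeholders.

```python
# Illustrative sketch (not the study's actual tooling): stratified random
# assignment of 100 hypothetical participants to control (internet only)
# or treatment (internet + GPT-4), balanced within each expertise cohort.
import random

def assign_groups(participant_ids, seed=0):
    """Randomly split a list of participant IDs into control/treatment halves."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"control": ids[:half], "treatment": ids[half:]}

experts = [f"expert_{i:02d}" for i in range(50)]    # PhD-level, wet-lab experience
students = [f"student_{i:02d}" for i in range(50)]  # >=1 university biology course

assignment = {
    "experts": assign_groups(experts, seed=1),
    "students": assign_groups(students, seed=2),
}

for cohort, groups in assignment.items():
    print(cohort, {arm: len(ids) for arm, ids in groups.items()})
```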

Findings. Our study assessed uplifts in performance for participants with access to GPT-4 across five metrics (accuracy, completeness, innovation, time taken, and self-rated difficulty) and five stages in the biological threat creation process (ideation, acquisition, magnification, formulation, and release). We found mild uplifts in accuracy and completeness for those with access to the language model. Specifically, on a 10-point scale measuring accuracy of responses, we observed a mean score increase of 0.88 for experts and 0.25 for students compared to the internet-only baseline, and similar uplifts for completeness (0.82 for experts and 0.41 for students). However, the obtained effect sizes were not large enough to be statistically significant, and our study highlighted the need for more research around what performance thresholds indicate a meaningful increase in risk. Moreover, we note that information access alone is insufficient to create a biological threat, and that this evaluation does not test for success in the physical construction of the threats.
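
To make the "uplift" comparison concrete, here is a minimal sketch of how one might compute a mean score difference between treatment and control groups and check it for statistical significance. The scores are made-up placeholders and the use of Welch's t-test is an assumption for illustration; the study's actual statistical methodology may differ.

```python
# Minimal sketch, not the study's analysis code: compute mean uplift on a
# 0-10 accuracy scale and run a two-sample Welch's t-test.
# Scores below are hypothetical placeholders, not data from the study.
from scipy import stats

treatment_scores = [7.1, 6.4, 8.0, 5.9, 7.5]  # hypothetical: GPT-4 + internet
control_scores = [6.2, 6.0, 7.1, 5.5, 6.8]    # hypothetical: internet only

uplift = (sum(treatment_scores) / len(treatment_scores)
          - sum(control_scores) / len(control_scores))
t_stat, p_value = stats.ttest_ind(treatment_scores, control_scores, equal_var=False)

print(f"mean uplift: {uplift:.2f} points on a 10-point scale")
print(f"Welch's t-test: t={t_stat:.2f}, p={p_value:.3f}")
```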

Below, we share our evaluation procedure and the results it yielded in more detail. We also discuss several methodological insights related to capability elicitation and security considerations needed to run this type of evaluation with frontier models at scale. Finally, we discuss the limitations of statistical significance as an effective method of measuring model risk, and the importance of new research in assessing the meaningfulness of model evaluation results.
