A hazard analysis framework for code synthesis large language models


Codex, a large language model (LLM) trained on a variety of codebases, exceeds the previous state of the art in its capability to synthesize and generate code. Although Codex provides a plethora of benefits, models that can generate code at such scale have significant limitations, alignment problems, the potential to be misused, and the possibility to increase the rate of progress in technical fields that may themselves have destabilizing impacts or misuse potential. Yet such safety impacts are not yet known or remain to be explored. In this paper, we outline a hazard analysis framework constructed at OpenAI to uncover the hazards or safety risks that the deployment of models like Codex may impose technically, socially, politically, and economically. The analysis is informed by a novel evaluation framework that determines the capability of advanced code generation techniques against the complexity and expressivity of specification prompts, and their capability to understand and execute them relative to human ability.
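To make the idea of such an evaluation concrete, the sketch below shows, in broad strokes, what a harness of this kind could look like: specification prompts are bucketed by complexity, a code synthesis model generates candidate solutions, and each candidate is checked against tests to estimate a pass rate per bucket. This is a minimal illustration under our own assumptions, not the framework described in the paper; the names (SpecPrompt, stub_model, the complexity labels) are hypothetical, and a real harness would sandbox execution and sample many completions per prompt.

```python
# Minimal sketch of an evaluation harness for a code synthesis model.
# Prompts are grouped by specification complexity; generated code is
# executed and checked against tests to compute a pass rate per group.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SpecPrompt:
    """A natural-language specification plus tests that define success."""
    description: str                    # the specification shown to the model
    complexity: str                     # e.g. "single-step", "multi-step"
    entry_point: str                    # function name the model must produce
    tests: Callable[[Callable], bool]   # returns True if the candidate passes


def stub_model(prompt: str) -> str:
    """Placeholder for a real code synthesis model (an assumption here)."""
    return "def add(a, b):\n    return a + b\n"


def run_candidate(source: str, entry_point: str):
    """Execute generated source in an isolated namespace; return the function."""
    namespace: dict = {}
    exec(source, namespace)             # a real harness must sandbox this
    return namespace.get(entry_point)


def evaluate(model: Callable[[str], str], prompts: List[SpecPrompt]) -> Dict[str, float]:
    """Return the fraction of passing candidates per complexity bucket."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for p in prompts:
        total[p.complexity] = total.get(p.complexity, 0) + 1
        try:
            candidate = run_candidate(model(p.description), p.entry_point)
            ok = candidate is not None and p.tests(candidate)
        except Exception:
            ok = False
        passed[p.complexity] = passed.get(p.complexity, 0) + int(ok)
    return {c: passed.get(c, 0) / total[c] for c in total}


if __name__ == "__main__":
    prompts = [
        SpecPrompt(
            description="Write a function add(a, b) that returns the sum of two integers.",
            complexity="single-step",
            entry_point="add",
            tests=lambda f: f(2, 3) == 5 and f(-1, 1) == 0,
        ),
    ]
    print(evaluate(stub_model, prompts))
```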
