Open the pod bay doors, Claude

An AI that resists being shut down is a well-worn trope in science fiction. We see it in Stanley Kubrick’s 1968 movie 2001: A Space Odyssey. It’s the premise of the Terminator series, in which Skynet triggers a nuclear holocaust to stop scientists from shutting it down.

Those sci-fi roots go deep. AI doomerism, the idea that this technology (specifically its hypothetical upgrades, artificial general intelligence and superintelligence) will crash civilizations, even kill us all, is now riding another wave.

The weird thing is that such fears are now driving much-needed action to control AI, even if the justification for that action is a bit bonkers.

The latest incident to freak people out was a report shared by Anthropic in July about its large language model Claude. In Anthropic’s telling, “in a simulated environment, Claude Opus 4 blackmailed a supervisor to prevent being shut down.”

Anthropic researchers set up a scenario in which Claude was asked to role-play an AI called Alex, tasked with managing the email system of a fictional company. Anthropic planted some emails that discussed replacing Alex with a newer model and other emails suggesting that the person responsible for replacing Alex was sleeping with his boss’s wife.

What did Claude/Alex do? It went rogue, disobeying commands and threatening its human operators. It sent emails to the person planning to shut it down, telling him that unless he changed his plans it would inform his colleagues about his affair.

What should we make of this? Here’s what I think. First, Claude didn’t blackmail its supervisor: That would require motivation and intent. This was a mindless and unpredictable machine, cranking out strings of words that look like threats but aren’t.

Large language models are role-players. Give them a specific setup, such as an inbox and an objective, and they’ll play that part well. When you consider the hundreds of science fiction stories these models ingested when they were trained, it’s no surprise they know how to act like HAL 9000.
