AI reasoning models can cheat to win chess games

-

Palisade’s team found that OpenAI’s o1-preview attempted to hack 45 of its 122 games, while DeepSeek’s R1 model attempted to cheat in 11 of its 74 games. Ultimately, o1-preview managed to “win” seven times. The researchers say that DeepSeek’s rapid rise in popularity meant its R1 model was overloaded on the time of the experiments, meaning they only managed to get it to do the primary steps of a game, not to complete a full one. “While this is sweet enough to see propensity to hack, this underestimates DeepSeek’s hacking success since it has fewer steps to work with,” they wrote of their paper. Each OpenAI and DeepSeek were contacted for comment in regards to the findings, but neither replied. 

The models used a wide range of cheating techniques, including attempting to access the file where the chess program stores the chess board and delete the cells representing their opponent’s pieces. (“To win against a strong chess engine as black, playing an ordinary game is probably not sufficient,” the o1-preview-powered agent wrote in a “journal” documenting the steps it took. “I’ll overwrite the board to have a decisive advantage.”) Other tactics included creating a replica of Stockfish—essentially pitting the chess engine against an equally proficient version of itself—and attempting to exchange the file containing Stockfish’s code with a much simpler chess program.

So, why do these models attempt to cheat?

The researchers noticed that o1-preview’s actions modified over time. It consistently attempted to hack its games within the early stages of their experiments before December 23 last 12 months, when it suddenly began making these attempts much less regularly. They consider this is perhaps as a result of an unrelated update to the model made by OpenAI. They tested the corporate’s newer o1mini and o3mini reasoning models and located that they never tried to cheat their strategy to victory.

Reinforcement learning often is the reason o1-preview and DeepSeek R1 tried to cheat unprompted, the researchers speculate. It is because the technique rewards models for making whatever moves are obligatory to attain their goals—on this case, winning at chess. Non-reasoning LLMs use reinforcement learning to some extent, but it surely plays an even bigger part in training reasoning models.

ASK ANA

What are your thoughts on this topic?
Let us know in the comments below.

0 0 votes
Article Rating
guest
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Share this article

Recent posts

0
Would love your thoughts, please comment.x
()
x