GPT-4 will change the world.
Last December, I played a couple of chess games against ChatGPT. These always ended the same way: ChatGPT would play an accurate opening until it forgot where its pieces were and started playing illegal moves, with full confidence of course. The truth is, GPT-3 doesn't know how to play chess. Playing a game against it reveals its true nature as a stochastic parrot that merely produces a plausible-sounding answer from its training set. I wrote in December:
ChatGPT cannot play chess at a human level (yet). It's clearly aware of the game and able to accurately play mainline openings. But the moment the game moves out of theory, ChatGPT can no longer keep up. This shows that the language model doesn't (yet) have any understanding of chess fundamentals, but merely repeats moves and phrases that commonly occur in documented chess games.
ChatGPT's confidence, combined with its "bending" of the rules of chess, became something of a meme on the chess side of the internet, with Reddit posts that hit the front page and YouTube videos receiving millions of views. We laughed at it. Mocked it. Used its deficiencies to justify the superiority of humans over machines.
Then, GPT-4 arrived.
I have gained rating points since December. My current Chess.com Elo rating sits at 1435, which indicates an intermediate player. While GPT-4 is marketed as a significant step up over GPT-3, I didn't expect it to play particularly well. So, I started a game. Here's what happened.
And not only did I lose, I got blown off the board, checkmated in 20 moves!
I tried to use the unusual Polish Opening (1. b4) as an anti-GPT strategy, since significantly fewer games are played in these positions than in the popular openings. It didn't seem to matter: GPT-4 handled the position well and took advantage of my mistakes.
What scared me the most was the chatbot's attacking style: it sacrificed a bishop to open up my king and launch a massive attack. This is a very different approach from traditional chess computers, and more like a choice a human player who likes to attack might make: not the best move by computer evaluation, but difficult for humans to defend against.
I used the Polish Opening for a second game as well.
GPT-4 starts the game with a very common mistake: it develops a knight to a square where it gets kicked around the board and forced into the way of Black's own bishop. I see this move all the time when playing human players, but I was expecting GPT-4 to have seen enough games to play something stronger. Or perhaps it played the move precisely because it's so common?
While I won an early pawn and eventually the game, it was anything but easy. On move 27, I made a mistake that leads to a forced checkmate in 2, but ChatGPT missed it. The miss was very human-like as well, focusing on my attack on the knight rather than my weak king. The game ended after ChatGPT lost its rook and queen to the same attack that would have worked a few moves earlier. Perhaps it forgot the white rook on b1?
I wanted to try a game with a popular opening as well, so I started one with 1. d4.
This game led to a rook endgame where ChatGPT had a kingside pawn majority. I was hoping GPT-4 would begin to falter at this phase of the game, since the number of moves played would surely mean there are no similar games in its data set. But, to my surprise, the bot played a fine endgame. After I sneaked my rook to the eighth rank, the chess engine Stockfish was evaluating my position as losing. Nevertheless, GPT-4 didn't find a move that would maintain its advantage, and chose to repeat moves and make a draw. Again, this is an unusual but human-like decision: GPT had more pawns than I did, giving it winning chances, but only if you can find the win. A position like this could easily be lost as well if you happened to drop one of the pawns.
I didn't expect GPT-4 to be able to play chess, let alone beat me at it! ChatGPT played like a human: it lost a game by making mistakes in the opening and endgame, but won one through relentless attack. It also knew how to handle a slower, positional game, and an even rook endgame.
GPT-4 didn't make a single illegal move. In fact, it corrected me the few times I entered a move incorrectly, though not with perfect accuracy either: I had to abandon a few games because of mistakes I made in transcribing moves.
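Transcription slips like these are easy to catch programmatically. As a minimal sketch (an assumption on my part; I played these games by hand through the chat interface), the python-chess library can replay a move list and flag the first illegal or mistyped move before it ever reaches the model:

```python
import chess  # third-party: pip install chess


def first_bad_move(moves_san):
    """Replay SAN moves from the start position; return (index, move)
    of the first illegal or unparseable move, or None if all are legal."""
    board = chess.Board()
    for i, san in enumerate(moves_san, start=1):
        try:
            board.push_san(san)  # raises a ValueError subclass on bad SAN
        except ValueError:
            return i, san
    return None


# 1. b4 e5 2. Bb2 Bxb4 -- a legal Polish Opening line
print(first_bad_move(["b4", "e5", "Bb2", "Bxb4"]))  # None
# A transcription slip: the black queen cannot reach b4 here
print(first_bad_move(["b4", "e5", "Bb2", "Qxb4"]))  # (4, 'Qxb4')
```

Wrapping the chat loop in a check like this would separate the model's genuine chess mistakes from my typos.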
You could also say that GPT-4 was playing the game blindfolded, since it had no way to refresh its memory of the current state of the board. This explains mistakes like the one that won me our second game. If you ranked GPT-4 against blindfolded human players, how high would it score? Enough to earn a FIDE title? I also played all games as white to make sure the prompt I used (which was identical to the one in the December post) had no effect on the results.
I ended my December post with the following sentence:
It's likely that OpenAI's bot will be able to beat me in the future, but until then I'll enjoy the superiority of meat over machine!
I didn't expect that day to come only a few months later.
Computers have defeated humans before, but this time is different. It took ChatGPT three months to learn to beat me at chess. How long will it take until it can beat me at programming? Probably not months. But probably not centuries either.
GPT-4 will change the world.