that the capabilities of LLMs have progressed dramatically in the last couple of years, but it’s hard to quantify just how good they’ve become.
That got me thinking back to a geometrical problem I came across on a YouTube channel last year. This was in June 2024, and I tried to get the leading large language model at the time (GPT-4o) to solve the puzzle. It didn’t go that well and required a lot of effort to find a solution, and I wondered how the latest LLMs would fare with the same puzzle.
The puzzle
Here’s a quick reminder of what I was asking the LLM to solve back then. Assume we have the following grid of dots/nodes. In the x and y plane, each node is exactly one unit away from its adjacent neighbour. It looks like this,
Now, the question I wanted to answer was this,
How many distinct squares can be drawn on this diagram?
It quickly became clear that GPT-4o didn’t know the answer, so I changed tack slightly and instead asked it this.
I would like a Python program that plots out all the squares we can
draw on the attached diagram, assuming that the corners of any square
must lie on one of the spots on the diagram. Assume each adjacent spot is
1 unit apart in both the x and y directions. Also print out a summary of
the number of squares of the same size and what their side lengths are
Long story short, I got GPT-4o to come up with a correct Python-based solution. Still, it took me around two hours and well over 40 iterations of going back and forth with the model to refine its answer until it came up with something that worked.
NB, have you worked out the answer yet? Even now, I still can’t quite believe that there are 21 squares you can draw on this graph.
Interaction with Sonnet 4.5
This brings us back to the present day and Anthropic’s recent release of what is considered the best coding model available today, namely Claude Sonnet 4.5. It’s been around 16 months since I last looked at this, so just how much have LLMs improved in the intervening period? Let’s see how Sonnet 4.5 copes with the puzzle.
To use Sonnet 4.5, I visited Anthropic’s homepage.
You should see a page like this.

Click the large black “Try Claude” button located near the top right of the screen. You will be asked to log in at this point. If you don’t already have an account with Anthropic, please set one up now. It’s completely free to do so.
Once you’re logged in, you will be taken to a screen displaying a familiar chat-based window that allows you to interact with and ask questions of Sonnet 4.5.
Before I asked the model to provide a Python program to solve the puzzle, I asked it a straight question to see if it already knew the answer.

It came back with,
Looking at this grid, I need to count all possible squares of different
sizes that can be formed by connecting the dots.
Let me first identify the grid structure. The dots form:
2 dots in the top row
2 dots in the second row
6 dots in the third row
6 dots in the fourth row
2 dots in the fifth row
2 dots in the sixth row
This appears to be a cross or plus-sign shaped grid pattern.
Now I'll count squares by size:
1×1 squares: These are formed by any 4 dots that make a unit square.
In the horizontal middle section (rows 3-4): 5 squares per row × 1 = 5 squares
In the vertical middle section (columns where the cross intersects): counting
the vertical segments gives us 5 more squares
Total: 10 small squares
2×2 squares: These need 4×4 arrangements of dots.
In the horizontal section: 4 squares
In the vertical section: 4 squares
Overlap in center: -1 (counted twice)
Total: 7 squares of size 2×2
3×3 squares and larger: The grid pattern is only 6 dots wide at its maximum, but the cross shape limits where larger squares can fit. Checking the geometry:
One 3×3 square can fit in the center where the arms of the cross overlap
Total: 1 square of size 3×3
Total: 10 + 7 + 1 = 18 squares
So, confidently incorrect. Intrigued by the model’s reference to a 3×3 square (which is not possible on this grid), I asked it to show me the square on the diagram. It replied with this.
Looking at the grid more carefully, I need to reconsider whether a 3×3 square
actually exists.
For a 3×3 square to exist, I would need a 4×4 arrangement of dots
(since a 3×3 square requires 4 dots on each side). Let me check the grid
structure:
The grid appears to be:
Rows 1-2: only 2 dots each (centered)
Rows 3-4: 6 dots each (full width)
Rows 5-6: only 2 dots each (centered)
For a 3×3 square, I would need 4 consecutive rows and 4 consecutive columns all with dots at the intersection points. However, the cross/plus pattern:
The top only has 2 columns of dots
The middle has 6 columns of dots
The bottom only has 2 columns of dots
Since rows 1-2 and rows 5-6 only have 2 dots each (not spanning the full width),
there's not actually a 4×4 block of dots anywhere on this grid.
I need to correct my answer: There's NO 3×3 square possible on this grid.
The total should be: 10 (1×1 squares) + 7 (2×2 squares) = 17 squares
I apologize for the error in my initial count. The cross-shaped pattern
prevents any square larger than 2×2 from being formed.
Alright, it semi-redeemed itself with that answer, but fundamentally, it still failed to say correctly that there were 21 squares in total.
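To see where the model's tally goes astray, here is a minimal sketch that counts only the axis-aligned squares on the grid. The coordinates are an assumption: they follow the row layout the model itself described (2, 2, 6, 6, 2, 2 dots, with the short rows centred), and the helper name `count_axis_aligned` is purely illustrative.

```python
# Assumed dot coordinates: a cross with 2-dot rows centred above and below
# two full-width 6-dot rows, each dot 1 unit from its neighbours.
POINTS = {
    (3, 6), (4, 6),
    (3, 5), (4, 5),
    (1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4),
    (1, 3), (2, 3), (3, 3), (4, 3), (5, 3), (6, 3),
    (3, 2), (4, 2),
    (3, 1), (4, 1),
}


def count_axis_aligned(points, side):
    """Count squares of the given side whose edges are parallel to the axes."""
    count = 0
    for (x, y) in points:
        # Treat (x, y) as the bottom-left corner and look for the other three.
        corners = [(x + side, y), (x, y + side), (x + side, y + side)]
        if all(corner in points for corner in corners):
            count += 1
    return count


axis_aligned_counts = {side: count_axis_aligned(POINTS, side) for side in (1, 2, 3)}
print(axis_aligned_counts)  # {1: 9, 2: 0, 3: 0}
```

Under these assumptions there are nine unit squares (not ten, since the centre square belongs to both arms of the cross) and no axis-aligned 2×2 or 3×3 squares at all. The other twelve squares in the total of 21 are all tilted, which is exactly the case the model's reasoning never considered.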
The final answer
To keep the comparison fair across the 16 months, I asked Sonnet 4.5 the exact same question I had asked GPT-4o, which, as a reminder, was this.
I would like a Python program that plots out all the squares we can draw on
the attached diagram, assuming that the corners of any square must lie on one
of the spots on the diagram. Assume each adjacent spot is 1 unit apart in
both the x and y directions. Also print out a summary of the number of
squares of the same size and what their side lengths are
Sonnet 4.5 thought for all of 5 seconds before generating its Python code.
import matplotlib.pyplot as plt
import numpy as np
from itertools import combinations
from collections import defaultdict

# Define the grid points based on the image
points = [
    # Top two rows (2 dots each, centered)
    (3, 6), (4, 6),
    (3, 5), (4, 5),
    # Middle two rows (6 dots each, full width)
    (1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4),
    (1, 3), (2, 3), (3, 3), (4, 3), (5, 3), (6, 3),
    # Bottom two rows (2 dots each, centered)
    (3, 2), (4, 2),
    (3, 1), (4, 1),
]

def is_square(p1, p2, p3, p4):
    """Check if 4 points form a square and return side length if true."""
    points = [p1, p2, p3, p4]
    # Calculate all pairwise distances
    distances = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = np.sqrt((points[i][0] - points[j][0])**2 +
                           (points[i][1] - points[j][1])**2)
            distances.append(dist)
    distances.sort()
    # For a square: 4 equal sides and 2 equal diagonals
    # Diagonals should be sqrt(2) times the side length
    if len(distances) == 6:
        sides = distances[:4]
        diagonals = distances[4:]
        # Check if all sides are equal
        if np.allclose(sides, [sides[0]] * 4, atol=1e-6):
            # Check if diagonals are equal
            if np.allclose(diagonals, [diagonals[0]] * 2, atol=1e-6):
                # Check if diagonal = side * sqrt(2)
                if np.allclose(diagonals[0], sides[0] * np.sqrt(2), atol=1e-6):
                    return True, sides[0]
    return False, 0

def order_square_points(p1, p2, p3, p4):
    """Order points of a square counterclockwise by angle around the centroid."""
    points = [p1, p2, p3, p4]
    # Find centroid
    cx = sum(p[0] for p in points) / 4
    cy = sum(p[1] for p in points) / 4
    # Sort by angle from centroid
    def angle(p):
        return np.arctan2(p[1] - cy, p[0] - cx)
    points.sort(key=angle)
    return points

# Find all squares by testing every combination of 4 points
squares = []
squares_by_size = defaultdict(list)
for combo in combinations(points, 4):
    is_sq, side_length = is_square(*combo)
    if is_sq:
        ordered = order_square_points(*combo)
        squares.append(ordered)
        squares_by_size[round(side_length, 2)].append(ordered)

# Print summary
print("=" * 50)
print("SUMMARY OF SQUARES FOUND")
print("=" * 50)
total_squares = 0
for side_length in sorted(squares_by_size.keys()):
    count = len(squares_by_size[side_length])
    total_squares += count
    print(f"Side length {side_length}: {count} square(s)")
print(f"\nTotal squares: {total_squares}")
print("=" * 50)

# Create visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 14))
axes = axes.flatten()

# Plot 1: All dots
ax = axes[0]
x_coords = [p[0] for p in points]
y_coords = [p[1] for p in points]
ax.scatter(x_coords, y_coords, c='black', s=100, zorder=3)
ax.set_xlim(0, 7)
ax.set_ylim(0, 7)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
ax.set_title('Original Grid', fontsize=14, fontweight='bold')
ax.set_xlabel('X')
ax.set_ylabel('Y')

# Plot 2: All squares overlaid
ax = axes[1]
ax.scatter(x_coords, y_coords, c='black', s=100, zorder=3)
colours = plt.cm.rainbow(np.linspace(0, 1, len(squares)))
for square, color in zip(squares, colours):
    square_closed = square + [square[0]]
    xs = [p[0] for p in square_closed]
    ys = [p[1] for p in square_closed]
    ax.plot(xs, ys, c=color, linewidth=2, alpha=0.6)
ax.set_xlim(0, 7)
ax.set_ylim(0, 7)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
ax.set_title(f'All {total_squares} Squares Overlaid', fontsize=14, fontweight='bold')
ax.set_xlabel('X')
ax.set_ylabel('Y')

# Plot 3 & 4: Squares by size (only two panels remain in the 2x2 grid)
plot_idx = 2
for side_length in sorted(squares_by_size.keys()):
    if plot_idx >= len(axes):
        break
    ax = axes[plot_idx]
    ax.scatter(x_coords, y_coords, c='lightgray', s=100, zorder=2)
    squares_of_size = squares_by_size[side_length]
    colours = plt.cm.viridis(np.linspace(0, 1, len(squares_of_size)))
    for square, color in zip(squares_of_size, colours):
        square_closed = square + [square[0]]
        xs = [p[0] for p in square_closed]
        ys = [p[1] for p in square_closed]
        ax.plot(xs, ys, c=color, linewidth=2.5, marker='o', markersize=8)
    ax.set_xlim(0, 7)
    ax.set_ylim(0, 7)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)
    ax.set_title(f'Squares with side length {side_length} ({len(squares_of_size)} total)',
                 fontsize=14, fontweight='bold')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    plot_idx += 1

plt.tight_layout()
plt.show()
At first sight, I thought there was no way Sonnet could have come up with a correct solution so quickly. GPT-4o hadn’t. Nevertheless, I installed the libraries the code needed, pasted it into a Jupyter Notebook cell, ran it, and obtained the following output. I have omitted the graphs for now, which were also output.
==================================================
SUMMARY OF SQUARES FOUND
==================================================
Side length 1.0: 9 square(s)
Side length 1.41: 4 square(s)
Side length 2.24: 2 square(s)
Side length 2.83: 4 square(s)
Side length 3.61: 2 square(s)
Total squares: 21
==================================================
#
# Plus some graphs that I'm not showing here
#
That shocked me. The answer was absolutely spot on.
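As an independent cross-check on that total, here is a minimal sketch of a different counting approach from the brute-force one Sonnet used. Instead of testing every set of four points, it takes each ordered pair of dots as one edge and rotates that edge by 90 degrees to locate the other two corners. The function name is illustrative, and squared side lengths are used as keys so everything stays in exact integer arithmetic (1, 2, 5, 8 and 13 correspond to sides of 1, √2, √5, 2√2 and √13).

```python
from collections import Counter

# Same dot coordinates as in the generated program above.
POINTS = {
    (3, 6), (4, 6),
    (3, 5), (4, 5),
    (1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4),
    (1, 3), (2, 3), (3, 3), (4, 3), (5, 3), (6, 3),
    (3, 2), (4, 2),
    (3, 1), (4, 1),
}


def count_squares_by_side_squared(points):
    """Return a Counter mapping squared side length -> number of distinct squares."""
    found = {}
    for (x1, y1) in points:
        for (x2, y2) in points:
            if (x1, y1) == (x2, y2):
                continue
            dx, dy = x2 - x1, y2 - y1
            # Rotating the edge vector (dx, dy) by 90 degrees gives (-dy, dx);
            # adding it to both endpoints yields the remaining two corners.
            p3 = (x2 - dy, y2 + dx)
            p4 = (x1 - dy, y1 + dx)
            if p3 in points and p4 in points:
                # A frozenset key removes the duplicates found from other edges.
                key = frozenset([(x1, y1), (x2, y2), p3, p4])
                found[key] = dx * dx + dy * dy
    return Counter(found.values())


counts = count_squares_by_side_squared(POINTS)
for side_squared in sorted(counts):
    print(f"side^2 = {side_squared}: {counts[side_squared]} square(s)")
print("Total:", sum(counts.values()))
```

This agrees with the summary above: 9, 4, 2, 4 and 2 squares for the five side lengths, 21 in total. It also checks only 20 × 19 ordered pairs rather than all 4,845 four-point combinations, though at this grid size either approach finishes instantly.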
The only thing the model didn’t quite get right was that it didn’t output a plot for each set of differently sized squares. It just did the nine 1x1s and the four √2x√2 ones. I solved that by asking Sonnet to include those, too.
Can you print the graphs in square side order. Also can you have two graphs
side by side on each "line"
This is what it produced.



Beautiful.
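The revised script isn't reproduced here, so the following is only a rough sketch of one way to get that layout, not the code Sonnet actually returned. The original program used a fixed 2×2 grid and gave two of its four panels to the raw grid and the overlay, which is why only two side lengths were plotted. The sketch sizes the grid from the number of side lengths instead. The names `panel_grid` and `plot_by_side` are hypothetical, and `squares_by_size` is assumed to have the same shape as in the generated program (side length mapped to a list of ordered corner lists).

```python
import math


def panel_grid(n_panels, per_row=2):
    """Return (rows, cols) for a grid that fits n_panels, per_row to a row."""
    return math.ceil(n_panels / per_row), per_row


def plot_by_side(squares_by_size, points):
    """Draw one panel per side length, smallest first, two panels per row."""
    # Imported here so the layout helper above can be used without matplotlib.
    import matplotlib.pyplot as plt

    sides = sorted(squares_by_size)
    rows, cols = panel_grid(len(sides))
    # squeeze=False keeps axes two-dimensional even when there is a single row.
    fig, axes = plt.subplots(rows, cols, figsize=(7 * cols, 7 * rows), squeeze=False)
    flat_axes = axes.flatten()
    for ax, side in zip(flat_axes, sides):
        ax.scatter([p[0] for p in points], [p[1] for p in points],
                   c="lightgray", s=100)
        for square in squares_by_size[side]:
            closed = square + [square[0]]  # repeat the first corner to close the outline
            ax.plot([p[0] for p in closed], [p[1] for p in closed], linewidth=2)
        ax.set_aspect("equal")
        ax.set_title(f"Side length {side} ({len(squares_by_size[side])} total)")
    # Hide any unused panel, such as the sixth slot when there are five sizes.
    for ax in flat_axes[len(sides):]:
        ax.set_visible(False)
    plt.tight_layout()
    plt.show()


layout = panel_grid(5)
print(layout)  # (3, 2)
```

With the five side lengths found here, this gives three rows of two panels with the last slot left blank. Calling `plot_by_side(squares_by_size, points)` after the search loop in the generated program would draw them in ascending side order.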
Summary
To show just how dramatically LLMs have advanced in about a year, I decided to revisit a challenging geometric puzzle I first tried to solve with GPT-4o back in June 2024. The puzzle was to write a Python program that finds and plots all possible squares on a specific cross-shaped grid of dots.
My experience a little over a year ago was a struggle; it took me roughly two hours and over 40 prompts to guide GPT-4o to a correct Python solution.
Fast forward to today, and I tested the new Claude Sonnet 4.5. When I first asked the model the question directly, it failed to calculate the correct number of squares. Not a great start. However, the real test was giving it the exact same prompt I used on GPT-4o.
To my surprise, it produced a complete, correct Python solution in one shot. The code it generated not only found all 21 squares but also correctly categorised them by their unique side lengths and generated detailed plots to visualise them. While I needed one quick follow-up prompt to perfect the plots, the core problem was solved instantly.
Could it be that the very act of my trying to solve this puzzle last year and publishing my findings introduced it to the web-o-sphere, meaning Anthropic have simply crawled it and incorporated it into their model knowledge base? Yes, I suppose that could be it, but then why couldn’t the model correctly answer the first direct question I asked it about the total number of squares?
To me, this experiment starkly illustrates the incredible leap in LLM capability. What was once a two-hour iterative struggle with the leading model of its time 16 months ago is now a five-second, one-shot success with the leading model today.