In these times of AI hype, it seems like everyone is using enormous models for every problem in Computer Vision. Many people see these tools as one-size-fits-all solutions and immediately reach for the newest, shiniest model instead of understanding the underlying signal they wish to extract. But oftentimes there’s beauty in simplicity. It’s one of the most important lessons I’ve learned as an engineer: don’t overcomplicate solutions to simple problems.

Let me show you a practical application of some simple classical Computer Vision techniques to detect rectangular objects on flat surfaces and apply a perspective transformation to unwarp the skewed rectangle. Similar methods are widely used, for instance, in document scanning and extraction applications.
Along the way, you’ll learn some interesting concepts, from standard classical Computer Vision techniques to how to order polygon points and why this is related to a combinatorial assignment problem.
- Detection
- Grayscale
- Edge Detection
- Dilation
- Contour Detection
- Perspective Transformation
- Variant A: Simple sort based on sum/diff
- Variant B: Assignment Optimization Problem
- Variant C: Cyclic sorting with anchor
- Applying the Perspective Transformation
- Conclusion
Detection
To detect Sudoku grids, I considered many different approaches, ranging from simple thresholding, Hough line transforms, or some form of edge detection, to training a deep learning model for segmentation or keypoint detection.
Let’s define some assumptions to scope the problem:
- The Sudoku grid is clearly and fully visible within the frame, with a clear quadrilateral border and strong contrast from the background.
- The surface on which the Sudoku grid is printed has to be flat, but it can be captured from an angle and appear skewed or rotated.

I’ll show you a simple pipeline with a few filtering steps to detect the bounds of our Sudoku grid. At a high level, the processing pipeline looks as follows: grayscale conversion → Canny edge detection → dilation → contour detection and filtering → perspective transformation.


Grayscale
In this first step, we simply convert the input image from its three color channels to a single-channel grayscale image, since we don’t need any color information to process these images.
import cv2
import numpy as np

def find_sudoku_grid(
    image: np.ndarray,
) -> np.ndarray | None:
    """
    Finds the largest square-like contour in an image, likely the Sudoku grid.
    Returns:
        The contour of the found grid as a numpy array, or None if not found.
    """
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Edge Detection
After converting the image to grayscale, we can use the Canny edge detection algorithm to extract edges. There are two thresholds to choose for this algorithm that determine whether pixels are accepted as edges: pixels with a gradient above the upper threshold are accepted as strong edges, pixels below the lower threshold are rejected, and pixels in between are only kept if they are connected to a strong edge.

In our case of detecting Sudoku grids, we assume very strong edges on the border lines of our grid. We can select a high upper threshold to reject noise from appearing in our mask, and a lower threshold that is not too low, to prevent small noisy edges connected to the main border from showing up in our mask.
A blur filter is usually applied before passing images to Canny to reduce noise, but in this case the edges are very strong but narrow, hence the blur is omitted.
def find_sudoku_grid(
    image: np.ndarray,
    canny_threshold_1: int = 100,
    canny_threshold_2: int = 255,
) -> np.ndarray | None:
    """
    Finds the largest square-like contour in an image, likely the Sudoku grid.
    Args:
        image: The input image.
        canny_threshold_1: Lower threshold for the Canny edge detector.
        canny_threshold_2: Upper threshold for the Canny edge detector.
    Returns:
        The contour of the found grid as a numpy array, or None if not found.
    """
    ...
    canny = cv2.Canny(gray, threshold1=canny_threshold_1, threshold2=canny_threshold_2)

Dilation
In this next step, we post-process the edge detection mask with a dilation kernel to close small gaps in the mask.
def find_sudoku_grid(
    image: np.ndarray,
    canny_threshold_1: int = 100,
    canny_threshold_2: int = 255,
    morph_kernel_size: int = 3,
) -> np.ndarray | None:
    """
    Finds the largest square-like contour in an image, likely the Sudoku grid.
    Args:
        image: The input image.
        canny_threshold_1: Lower threshold for the Canny edge detector.
        canny_threshold_2: Upper threshold for the Canny edge detector.
        morph_kernel_size: Size of the morphological operation kernel.
    Returns:
        The contour of the found grid as a numpy array, or None if not found.
    """
    ...
    kernel = cv2.getStructuringElement(
        shape=cv2.MORPH_RECT, ksize=(morph_kernel_size, morph_kernel_size)
    )
    mask = cv2.morphologyEx(canny, op=cv2.MORPH_DILATE, kernel=kernel, iterations=1)

Contour Detection
Now that the binary mask is prepared, we can run a contour detection algorithm to find coherent blobs and filter them down to a single contour with four points.
    contours, _ = cv2.findContours(
        mask, mode=cv2.RETR_EXTERNAL, method=cv2.CHAIN_APPROX_SIMPLE
    )

This initial contour detection returns a list of contours that contain every pixel that is part of the contour. We can use the Douglas–Peucker algorithm to iteratively reduce the number of points in each contour and approximate it with a simpler polygon. For this algorithm, we choose a tolerance epsilon: the maximum distance the approximated polygon may deviate from the original contour.


If we assume that, even for the most heavily skewed rectangles, the shortest side is at least 10% of the perimeter of the shape, we can filter the contours down to polygons with exactly four points.
    contour_candidates: list[np.ndarray] = []
    for cnt in contours:
        # Approximate the contour to a polygon
        epsilon = 0.1 * cv2.arcLength(curve=cnt, closed=True)
        approx = cv2.approxPolyDP(curve=cnt, epsilon=epsilon, closed=True)
        # Keep only polygons with 4 vertices
        if len(approx) == 4:
            contour_candidates.append(approx)
Finally, we take the largest detected contour, presumably the Sudoku grid. We sort the contours by area in descending order and take the first element, which corresponds to the largest contour area.
    if not contour_candidates:
        return None
    best_contour = sorted(contour_candidates, key=cv2.contourArea, reverse=True)[0]
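With all the steps assembled into find_sudoku_grid, the detection can be tried on a sample image. Below is a minimal usage sketch; the file names and the green contour drawing are my own additions for illustration:

img = cv2.imread("sudoku.jpg")
grid_contour = find_sudoku_grid(img)

if grid_contour is not None:
    # Draw the detected quadrilateral in green for visual inspection
    cv2.drawContours(img, [grid_contour], contourIdx=-1, color=(0, 255, 0), thickness=3)
    cv2.imwrite("sudoku_detected.jpg", img)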

Perspective Transformation
Finally, we need to transform the detected grid back into a square. To achieve this, we can use a perspective transformation. The transformation matrix can be calculated by specifying where the four points of our Sudoku grid contour should end up: the four corners of the output image.
rect_dst = np.array(
    [[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]],
    dtype=np.float32,
)

To match the contour points to the corners, they have to be ordered first so that they can be assigned correctly. Let’s define the following order for our corner points: top-left, top-right, bottom-right, bottom-left.

Variant A: Simple sort based on sum/diff
To sort the extracted corners and assign them to these target points, a simple algorithm can look at the sums and differences of the x and y coordinates of each corner:
p_sum = p_x + p_y
p_diff = p_x - p_y
Based on these values, it’s now possible to distinguish the corners (see the small worked example after this list):
- The top-left corner has both a small x and a small y value, so it has the smallest sum: argmin(p_sum)
- The bottom-right corner has the largest sum: argmax(p_sum)
- The top-right corner has the largest difference: argmax(p_diff)
- The bottom-left corner has the smallest difference: argmin(p_diff)
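As a quick sanity check, here is a small worked example with four hypothetical corner points of a slightly rotated square (the coordinates are made up for illustration):

pts = np.array([[90, 8], [6, 88], [10, 12], [94, 91]])  # unordered corners

p_sum = pts.sum(axis=1)         # [98, 94, 22, 185]
p_diff = pts[:, 0] - pts[:, 1]  # [82, -82, -2, 3]

print(pts[np.argmin(p_sum)])   # [10 12] -> top-left (smallest sum)
print(pts[np.argmax(p_sum)])   # [94 91] -> bottom-right (largest sum)
print(pts[np.argmax(p_diff)])  # [90  8] -> top-right (largest difference)
print(pts[np.argmin(p_diff)])  # [ 6 88] -> bottom-left (smallest difference)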
In the following animation, I attempted to visualize this assignment for the four corners of a rotating square. The colored lines represent the respective image corner assigned to each square corner.

def order_points(pts: np.ndarray) -> np.ndarray:
    """
    Orders the 4 corner points of a contour in a consistent
    top-left, top-right, bottom-right, bottom-left sequence.
    Args:
        pts: A numpy array of shape (4, 2) representing the 4 corners.
    Returns:
        A numpy array of shape (4, 2) with the points ordered.
    """
    # Reshape from (4, 1, 2) to (4, 2) if needed
    pts = pts.reshape(4, 2)
    rect = np.zeros((4, 2), dtype=np.float32)
    # The top-left point will have the smallest sum, whereas
    # the bottom-right point will have the largest sum
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    # Note: np.diff computes y - x here, the opposite sign of p_diff above.
    # The top-right point will have the smallest difference,
    # whereas the bottom-left will have the largest difference
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect
This works well unless the rectangle is heavily skewed, like the following one. In this case, you can clearly see that the method is flawed, as the same rectangle corner is assigned to multiple image corners.
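To make this failure mode concrete, here is a hypothetical, heavily skewed quadrilateral (my own made-up coordinates, not the one from the animation) where the sum/diff heuristic assigns one point to two image corners and drops another point entirely:

# A heavily skewed quadrilateral: [2, 40] has both the largest sum (42)
# and the largest y - x difference (38), so order_points assigns it to
# the bottom-right AND the bottom-left slot, while [11, 12] is never used.
pts = np.array([[0, 0], [10, 1], [11, 12], [2, 40]])
print(order_points(pts))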

Variant B: Assignment Optimization Problem
Another approach would be to minimize the distances between each point and its assigned corner. This can be implemented using a pairwise_distances calculation between each point and the corners, and the linear_sum_assignment function from scipy, which solves the assignment problem while minimizing a cost function.
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import pairwise_distances

def order_points_simplified(pts: np.ndarray, pts_corner: np.ndarray) -> np.ndarray:
    """
    Orders a set of points to best match a target set of corner points.
    Args:
        pts: A numpy array of shape (N, 2) representing the points to order.
        pts_corner: A numpy array of shape (N, 2) with the target corners.
    Returns:
        A numpy array of shape (N, 2) with the points ordered.
    """
    # Reshape from (N, 1, 2) to (N, 2) if needed
    pts = pts.reshape(-1, 2)
    # Calculate the distance between each point and each target corner
    D = pairwise_distances(pts, pts_corner)
    # Find the optimal one-to-one assignment
    # row_ind[i] should be matched with col_ind[i]
    row_ind, col_ind = linear_sum_assignment(D)
    # Create an empty array to hold the sorted points
    ordered_pts = np.zeros_like(pts)
    # Place each point in the correct slot based on the corner it was matched to.
    # For example, the point matched to pts_corner[0] goes into ordered_pts[0].
    ordered_pts[col_ind] = pts[row_ind]
    return ordered_pts
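Assuming the target corners are the rect_dst points defined above and the detected contour is in grid_contour, a call could look like this:

# Order the detected contour points against the target image corners
rect_src = order_points_simplified(grid_contour, rect_dst)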

Even though this solution works, it is not ideal, since it relies on the image distance between the shape points and the corners, and it is computationally more expensive because a distance matrix has to be constructed. Of course, in our case of assigning four points this is negligible, but this approach would not be well suited for a polygon with many points!
Variant C: Cyclic sorting with anchor
This third variant is a very lightweight and efficient way to sort and assign the points of the shape to the image corners. The idea is to calculate an angle for each point of the shape relative to the centroid position.

Since the angles only define a cyclic order, we need to choose an anchor to guarantee the absolute order of the points. We simply choose the point with the lowest sum of x and y as the top-left anchor.
def order_points(pts: np.ndarray) -> np.ndarray:
    """
    Orders points by angle around the centroid, then rotates to start from top-left.
    Args:
        pts: A numpy array of shape (4, 2).
    Returns:
        A numpy array of shape (4, 2) with points ordered.
    """
    pts = pts.reshape(4, 2)
    # Angle of each point around the centroid of the shape
    center = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - center[1], pts[:, 0] - center[0])
    pts_cyclic = pts[np.argsort(angles)]
    # Anchor the cyclic order at the top-left point (smallest x + y)
    sum_of_coords = pts_cyclic.sum(axis=1)
    top_left_idx = np.argmin(sum_of_coords)
    return np.roll(pts_cyclic, -top_left_idx, axis=0)

We can now use this function to sort our contour points:
rect_src = order_points(grid_contour)
Applying the Perspective Transformation
Now that we know which points have to go where, we can finally move on to the most interesting part: creating and actually applying the perspective transformation to the image.

Since we already have our list of points for the detected quadrilateral sorted in rect_src, and we have our target corner points in rect_dst, we can use the OpenCV method for calculating the transformation matrix:
# getPerspectiveTransform expects float32 point arrays
warp_mat = cv2.getPerspectiveTransform(rect_src.astype(np.float32), rect_dst)
The result is a 3×3 perspective transformation matrix, defining how to transform from a skewed 3D perspective view to a flat 2D top-down view. To get this flat top-down view of our Sudoku grid, we can apply this perspective transformation to our original image:
warped = cv2.warpPerspective(img, warp_mat, (side_len, side_len))
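For the mathematically inclined: the 3×3 matrix maps points in homogeneous coordinates, and the division by the third coordinate w is exactly what produces the perspective foreshortening. A sketch of the mapping, with h_ij denoting the matrix entries:

$$\begin{pmatrix} x' \\ y' \\ w \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \qquad (u, v) = \left( \frac{x'}{w}, \frac{y'}{w} \right)$$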
And voilà, we now have our perfectly square Sudoku grid!
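Putting everything together, the whole pipeline fits in a few lines. Below is a minimal end-to-end sketch under the assumptions of this post; the file names and the 450-pixel output size are placeholders of my own choosing:

# Minimal end-to-end sketch: detect, order, and unwarp the grid
# (uses find_sudoku_grid and the cyclic order_points from Variant C)
side_len = 450
img = cv2.imread("sudoku.jpg")

grid_contour = find_sudoku_grid(img)
if grid_contour is not None:
    rect_src = order_points(grid_contour).astype(np.float32)
    rect_dst = np.array(
        [[0, 0], [side_len - 1, 0], [side_len - 1, side_len - 1], [0, side_len - 1]],
        dtype=np.float32,
    )
    warp_mat = cv2.getPerspectiveTransform(rect_src, rect_dst)
    warped = cv2.warpPerspective(img, warp_mat, (side_len, side_len))
    cv2.imwrite("sudoku_flat.jpg", warped)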

Conclusion
In this project, we walked through a simple pipeline using classical Computer Vision techniques to extract Sudoku grids from images. These methods provide a straightforward way to detect the bounds of the Sudoku grid. Of course, due to its simplicity, this approach has some limitations in how well it generalizes to different settings and extreme environments such as low light or hard shadows. A deep-learning-based approach can make sense if the detection needs to generalize to a huge variety of settings.
Next, a perspective transformation is used to get a flat top-down view of the grid. This image can now be used for further processing, such as extracting the numbers in the grid and actually solving the Sudoku. In a future article, we will look further into these natural next steps of this project.
Check out the source code of the project below and let me know if you have any questions or thoughts on this project. Until then, happy coding!
For more details and the full implementation, including the code for all of the animations and visualizations, check out the source code of this project on my GitHub:
https://github.com/trflorian/sudoku-extraction
All visualizations in this post were created by the author.