In these times of AI hype, it seems like everyone is using enormous models for every problem in Computer Vision. Many people see these tools as one-size-fits-all solutions and immediately reach for the newest, shiniest model instead of understanding the underlying signal they wish to extract. But oftentimes there’s beauty in simplicity. It’s one of the most important lessons I’ve learned as an engineer: don’t overcomplicate solutions to simple problems.

Let me show you a practical application of some simple classical Computer Vision techniques to detect rectangular objects on flat surfaces and apply a perspective transformation to unwarp the skewed rectangle. Similar methods are widely used, for instance, in document scanning and extraction applications.
Along the way, you’ll learn some interesting concepts, from standard classical Computer Vision techniques to how to order polygon points and why this is related to a combinatorial assignment problem.
- Detection
- Grayscale
- Edge Detection
- Dilation
- Contour Detection
- Perspective Transformation
- Variant A: Simple sort based on sum/diff
- Variant B: Assignment Optimization Problem
- Variant C: Cyclic sorting with anchor
- Applying the Perspective Transformation
- Conclusion
Detection
To detect Sudoku grids, I considered many different approaches, ranging from simple thresholding, Hough line transforms, or some form of edge detection, to training a deep learning model for segmentation or keypoint detection.
Let’s define some assumptions to scope the problem:
- The Sudoku grid is clearly and fully visible within the frame, with a clear quadrilateral border and strong contrast from the background.
- The surface on which the Sudoku grid is printed has to be flat, but it can be captured from an angle and appear skewed or rotated.

I’ll show you a simple pipeline with a few filtering steps to detect the bounds of our Sudoku grid. At a high level, the processing pipeline looks as follows: grayscale conversion → Canny edge detection → dilation → contour detection and filtering → perspective transformation.


Grayscale
In this first step, we simply convert the input image from its three color channels to a single-channel grayscale image, since we don’t need any color information to process these images.
import cv2
import numpy as np

def find_sudoku_grid(
    image: np.ndarray,
) -> np.ndarray | None:
    """
    Finds the largest square-like contour in an image, likely the Sudoku grid.
    Returns:
        The contour of the found grid as a numpy array, or None if not found.
    """
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Edge Detection
After converting the image to grayscale, we can use the Canny edge detection algorithm to extract edges. There are two thresholds to choose for this algorithm that determine whether pixels are accepted as edges: pixels with a gradient above the upper threshold are accepted as strong edges, pixels below the lower threshold are rejected, and pixels in between are only kept if they are connected to a strong edge.

In our case of detecting Sudoku grids, we assume very strong edges on the border lines of our grid. We can select a high upper threshold to reject noise from appearing in our mask, and a lower threshold that is not too low, to prevent small noisy edges connected to the main border from showing up in our mask.
A blur filter is usually applied before passing images to Canny to reduce noise, but in this case the edges are very strong but narrow, hence the blur is omitted.
def find_sudoku_grid(
    image: np.ndarray,
    canny_threshold_1: int = 100,
    canny_threshold_2: int = 255,
) -> np.ndarray | None:
    """
    Finds the largest square-like contour in an image, likely the Sudoku grid.
    Args:
        image: The input image.
        canny_threshold_1: Lower threshold for the Canny edge detector.
        canny_threshold_2: Upper threshold for the Canny edge detector.
    Returns:
        The contour of the found grid as a numpy array, or None if not found.
    """
    ...
    canny = cv2.Canny(gray, threshold1=canny_threshold_1, threshold2=canny_threshold_2)

Dilation
In this next step, we post-process the edge detection mask with a dilation kernel to close small gaps in the mask.
def find_sudoku_grid(
    image: np.ndarray,
    canny_threshold_1: int = 100,
    canny_threshold_2: int = 255,
    morph_kernel_size: int = 3,
) -> np.ndarray | None:
    """
    Finds the largest square-like contour in an image, likely the Sudoku grid.
    Args:
        image: The input image.
        canny_threshold_1: Lower threshold for the Canny edge detector.
        canny_threshold_2: Upper threshold for the Canny edge detector.
        morph_kernel_size: Size of the morphological operation kernel.
    Returns:
        The contour of the found grid as a numpy array, or None if not found.
    """
    ...
    kernel = cv2.getStructuringElement(
        shape=cv2.MORPH_RECT, ksize=(morph_kernel_size, morph_kernel_size)
    )
    mask = cv2.morphologyEx(canny, op=cv2.MORPH_DILATE, kernel=kernel, iterations=1)

Contour Detection
Now that the binary mask is prepared, we can run a contour detection algorithm to find coherent blobs and filter them down to a single contour with four points.
    contours, _ = cv2.findContours(
        mask, mode=cv2.RETR_EXTERNAL, method=cv2.CHAIN_APPROX_SIMPLE
    )

This initial contour detection returns a list of contours that contain every pixel that is part of the contour. We can use the Douglas–Peucker algorithm to iteratively reduce the number of points in each contour and approximate it with a simpler polygon. For this algorithm, we choose a tolerance epsilon: the maximum distance the approximated polygon may deviate from the original contour.


If we assume that, even for the most heavily skewed rectangles, the shortest side is at least 10% of the perimeter of the shape, we can filter the contours down to polygons with exactly four points.
    contour_candidates: list[np.ndarray] = []
    for cnt in contours:
        # Approximate the contour to a polygon
        epsilon = 0.1 * cv2.arcLength(curve=cnt, closed=True)
        approx = cv2.approxPolyDP(curve=cnt, epsilon=epsilon, closed=True)
        # Keep only polygons with 4 vertices
        if len(approx) == 4:
            contour_candidates.append(approx)
Finally, we take the largest detected contour, presumably the Sudoku grid. We sort the contours by area in descending order and take the first element, which corresponds to the largest contour area.
    if not contour_candidates:
        return None
    best_contour = sorted(contour_candidates, key=cv2.contourArea, reverse=True)[0]
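With all the steps assembled into find_sudoku_grid, the detection can be tried on a sample image. Below is a minimal usage sketch; the file names and the green contour drawing are my own additions for illustration:

img = cv2.imread("sudoku.jpg")
grid_contour = find_sudoku_grid(img)

if grid_contour is not None:
    # Draw the detected quadrilateral in green for visual inspection
    cv2.drawContours(img, [grid_contour], contourIdx=-1, color=(0, 255, 0), thickness=3)
    cv2.imwrite("sudoku_detected.jpg", img)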

Perspective Transformation
Finally, we need to transform the detected grid back into a square. To achieve this, we can use a perspective transformation. The transformation matrix can be calculated by specifying where the four points of our Sudoku grid contour should end up: the four corners of the output image.
rect_dst = np.array(
    [[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]],
    dtype=np.float32,
)

To match the contour points to the corners, they have to be ordered first so that they can be assigned correctly. Let’s define the following order for our corner points: top-left, top-right, bottom-right, bottom-left.

Variant A: Simple sort based on sum/diff
To sort the extracted corners and assign them to these target points, a simple algorithm can look at the sums and differences of the x and y coordinates of each corner:
p_sum = p_x + p_y
p_diff = p_x - p_y
Based on these values, it’s now possible to distinguish the corners (see the small worked example after this list):
- The top-left corner has both a small x and a small y value, so it has the smallest sum: argmin(p_sum)
- The bottom-right corner has the largest sum: argmax(p_sum)
- The top-right corner has the largest difference: argmax(p_diff)
- The bottom-left corner has the smallest difference: argmin(p_diff)
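As a quick sanity check, here is a small worked example with four hypothetical corner points of a slightly rotated square (the coordinates are made up for illustration):

pts = np.array([[90, 8], [6, 88], [10, 12], [94, 91]])  # unordered corners

p_sum = pts.sum(axis=1)         # [98, 94, 22, 185]
p_diff = pts[:, 0] - pts[:, 1]  # [82, -82, -2, 3]

print(pts[np.argmin(p_sum)])   # [10 12] -> top-left (smallest sum)
print(pts[np.argmax(p_sum)])   # [94 91] -> bottom-right (largest sum)
print(pts[np.argmax(p_diff)])  # [90  8] -> top-right (largest difference)
print(pts[np.argmin(p_diff)])  # [ 6 88] -> bottom-left (smallest difference)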
In the following animation, I attempted to visualize this assignment for the four corners of a rotating square. The colored lines represent the respective image corner assigned to each square corner.

def order_points(pts: np.ndarray) -> np.ndarray:
    """
    Orders the 4 corner points of a contour in a consistent
    top-left, top-right, bottom-right, bottom-left sequence.
    Args:
        pts: A numpy array of shape (4, 2) representing the 4 corners.
    Returns:
        A numpy array of shape (4, 2) with the points ordered.
    """
    # Reshape from (4, 1, 2) to (4, 2) if needed
    pts = pts.reshape(4, 2)
    rect = np.zeros((4, 2), dtype=np.float32)
    # The top-left point will have the smallest sum, whereas
    # the bottom-right point will have the largest sum
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    # Note: np.diff computes y - x here, the opposite sign of p_diff above.
    # The top-right point will have the smallest difference,
    # whereas the bottom-left will have the largest difference
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect
This works well unless the rectangle is heavily skewed, like the following one. In this case, you can clearly see that the method is flawed, as the same rectangle corner is assigned to multiple image corners.
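To make this failure mode concrete, here is a hypothetical, heavily skewed quadrilateral (my own made-up coordinates, not the one from the animation) where the sum/diff heuristic assigns one point to two image corners and drops another point entirely:

# A heavily skewed quadrilateral: [2, 40] has both the largest sum (42)
# and the largest y - x difference (38), so order_points assigns it to
# the bottom-right AND the bottom-left slot, while [11, 12] is never used.
pts = np.array([[0, 0], [10, 1], [11, 12], [2, 40]])
print(order_points(pts))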

Variant B: Assignment Optimization Problem
Another approach would be to minimize the distances between each point and its assigned corner. This can be implemented using a pairwise_distances calculation between each point and the corners, and the linear_sum_assignment function from scipy, which solves the assignment problem while minimizing a cost function.
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import pairwise_distances

def order_points_simplified(pts: np.ndarray, pts_corner: np.ndarray) -> np.ndarray:
    """
    Orders a set of points to best match a target set of corner points.
    Args:
        pts: A numpy array of shape (N, 2) representing the points to order.
        pts_corner: A numpy array of shape (N, 2) with the target corners.
    Returns:
        A numpy array of shape (N, 2) with the points ordered.
    """
    # Reshape from (N, 1, 2) to (N, 2) if needed
    pts = pts.reshape(-1, 2)
    # Calculate the distance between each point and each target corner
    D = pairwise_distances(pts, pts_corner)
    # Find the optimal one-to-one assignment
    # row_ind[i] should be matched with col_ind[i]
    row_ind, col_ind = linear_sum_assignment(D)
    # Create an empty array to hold the sorted points
    ordered_pts = np.zeros_like(pts)
    # Place each point in the correct slot based on the corner it was matched to.
    # For example, the point matched to pts_corner[0] goes into ordered_pts[0].
    ordered_pts[col_ind] = pts[row_ind]
    return ordered_pts
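Assuming the target corners are the rect_dst points defined above and the detected contour is in grid_contour, a call could look like this:

# Order the detected contour points against the target image corners
rect_src = order_points_simplified(grid_contour, rect_dst)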

Even though this solution works, it is not ideal, since it relies on the image distance between the shape points and the corners, and it is computationally more expensive because a distance matrix has to be constructed. Of course, in our case of assigning four points this is negligible, but this approach would not be well suited for a polygon with many points!
Variant C: Cyclic sorting with anchor
This third variant is a very lightweight and efficient way to sort and assign the points of the shape to the image corners. The idea is to calculate an angle for each point of the shape relative to the centroid position.

Since the angles only define a cyclic order, we need to choose an anchor to guarantee the absolute order of the points. We simply choose the point with the lowest sum of x and y as the top-left anchor.
def order_points(pts: np.ndarray) -> np.ndarray:
    """
    Orders points by angle around the centroid, then rotates to start from top-left.
    Args:
        pts: A numpy array of shape (4, 2).
    Returns:
        A numpy array of shape (4, 2) with points ordered.
    """
    pts = pts.reshape(4, 2)
    # Angle of each point around the centroid of the shape
    center = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - center[1], pts[:, 0] - center[0])
    pts_cyclic = pts[np.argsort(angles)]
    # Anchor the cyclic order at the top-left point (smallest x + y)
    sum_of_coords = pts_cyclic.sum(axis=1)
    top_left_idx = np.argmin(sum_of_coords)
    return np.roll(pts_cyclic, -top_left_idx, axis=0)

We can now use this function to sort our contour points:
rect_src = order_points(grid_contour)
Applying the Perspective Transformation
Now that we know which points have to go where, we can finally move on to the most interesting part: creating and actually applying the perspective transformation to the image.

Since we already have our list of points for the detected quadrilateral sorted in rect_src, and we have our target corner points in rect_dst, we can use the OpenCV method for calculating the transformation matrix:
# getPerspectiveTransform expects float32 point arrays
warp_mat = cv2.getPerspectiveTransform(rect_src.astype(np.float32), rect_dst)
The result is a 3×3 perspective transformation matrix, defining how to transform from a skewed 3D perspective view to a flat 2D top-down view. To get this flat top-down view of our Sudoku grid, we can apply this perspective transformation to our original image:
warped = cv2.warpPerspective(img, warp_mat, (side_len, side_len))
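For the mathematically inclined: the 3×3 matrix maps points in homogeneous coordinates, and the division by the third coordinate w is exactly what produces the perspective foreshortening. A sketch of the mapping, with h_ij denoting the matrix entries:

$$\begin{pmatrix} x' \\ y' \\ w \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \qquad (u, v) = \left( \frac{x'}{w}, \frac{y'}{w} \right)$$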
And voilà, we now have our perfectly square Sudoku grid!
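Putting everything together, the whole pipeline fits in a few lines. Below is a minimal end-to-end sketch under the assumptions of this post; the file names and the 450-pixel output size are placeholders of my own choosing:

# Minimal end-to-end sketch: detect, order, and unwarp the grid
# (uses find_sudoku_grid and the cyclic order_points from Variant C)
side_len = 450
img = cv2.imread("sudoku.jpg")

grid_contour = find_sudoku_grid(img)
if grid_contour is not None:
    rect_src = order_points(grid_contour).astype(np.float32)
    rect_dst = np.array(
        [[0, 0], [side_len - 1, 0], [side_len - 1, side_len - 1], [0, side_len - 1]],
        dtype=np.float32,
    )
    warp_mat = cv2.getPerspectiveTransform(rect_src, rect_dst)
    warped = cv2.warpPerspective(img, warp_mat, (side_len, side_len))
    cv2.imwrite("sudoku_flat.jpg", warped)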

Conclusion
In this project, we walked through a simple pipeline using classical Computer Vision techniques to extract Sudoku grids from images. These methods provide a straightforward way to detect the bounds of the Sudoku grid. Of course, due to its simplicity, this approach has some limitations in how well it generalizes to different settings and extreme environments such as low light or hard shadows. A deep-learning-based approach can make sense if the detection needs to generalize to a huge variety of settings.
Next, a perspective transformation is used to get a flat top-down view of the grid. This image can now be used for further processing, such as extracting the numbers in the grid and actually solving the Sudoku. In a future article, we will look further into these natural next steps of this project.
Check out the source code of the project below and let me know if you have any questions or thoughts on this project. Until then, happy coding!
For more details and the full implementation, including the code for all of the animations and visualizations, check out the source code of this project on my GitHub:
https://github.com/trflorian/sudoku-extraction
All visualizations in this post were created by the author.