0. Introduction
Space-filling curves (SFCs) are fascinating mathematical constructs with many practical applications in data science and data engineering. While they might sound abstract, they are often hiding in plain sight, behind terms like Z-ordering or Liquid Clustering (used, for instance, in platforms like Databricks). If you have worked with large-scale data platforms, chances are high that you have already used SFCs without realizing it.
Despite their relevance in modern systems, information on this topic is often fragmented, making it difficult to bridge theory and practice. This article aims to close that gap, focusing on the Hilbert curve.
My goal is to provide a condensed and accessible overview of SFCs: starting with their mathematical origins, moving through practical implementation techniques, and ending with real-world applications in data processing and optimization. The plan is not to replace existing sources but rather to reference them for more detailed information; further sources for terminology and details are referenced throughout the text.
You might ask: what is so fascinating about curves? After all, a regular curve is easy to understand and probably not the first topic I would pick up a book about. But SFCs are different. They traverse every point in a continuous space, have fractal properties, and produce visually striking patterns when plotted in 2D or 3D, especially at lower iterations. So let us take a closer look.
(If you want to start with visualizations and animations right away, check out my GitHub repository.)
1. History and Theory of Space-Filling Curves
The study of SFCs dates back to the nineteenth century, when Georg Cantor made a groundbreaking discovery. He showed that "two finite-dimensional smooth manifolds have the same cardinality, regardless of their dimensions." [1]
To illustrate this, consider the unit interval [0, 1] ⊂ R and the unit square [0, 1]² ⊂ R². Intuitively, one might expect the square to have a larger cardinality than the line segment. However, Cantor demonstrated that both sets in fact have the same cardinality, using his technique of interleaving decimals.
This result implies the existence of a bijection between the interval and the square, meaning there is a one-to-one correspondence between their elements. Following Cantor's discovery, a natural question arose: Is there also a continuous bijection between these sets? Eugen Netto answered this question in the negative.
In this context, continuity can be interpreted geometrically: a continuous mapping would allow one to "draw" the image in 2D or 3D without lifting the pen, forming a curve. This insight laid the groundwork for the later development of SFCs: curves that, while continuous, come arbitrarily close to filling a higher-dimensional space.
2. Peano Curve: The Discovery of Space-Filling Curves
After Netto's sobering discovery, the question arose as to whether such a mapping, if not bijective, could at least be surjective. The first person to define such a mapping was G. Peano, who constructed the so-called Peano curve.
The Peano curve is defined recursively. Its domain is the unit interval [0, 1] ⊂ R, and its image lies in the unit square [0, 1]² ⊂ R². By repeatedly subdividing the interval [0, 1] into thirds, and correspondingly partitioning the square in R² into a 3 × 3 grid, the construction converges to the actual space-filling curve as the number of iterations tends to infinity. [1]
The basic pattern of the first-order Peano curve reappears in higher orders: it is copied and mirrored in every second iteration. This alternating process of mirroring and rotating the basic element is a feature shared by other SFCs as well.
(Image from Wikipedia under public domain license, modified by the author)
Thus, the graphs of the Peano curve at finite iterations (Figure 1) do not represent the "final" SFC. Only in the limit, as the number of iterations of this recursive mapping approaches infinity, does the actual SFC emerge, which traverses every point in [0, 1]². Visually, in this limit, the curve would essentially appear as a filled square spanning from (0, 0) to (1, 1).
This observation raises an initially counterintuitive question: by definition, a curve is one-dimensional. While it can be embedded in a higher-dimensional space (n > 1), its intrinsic parameter domain remains one-dimensional. Yet, if the Peano curve passes through every point in [0, 1]² and thus completely fills the plane, can its image still be considered one-dimensional? The answer is no: the image of the Peano curve has Hausdorff dimension 2. Another characteristic of an SFC is that its image has positive Jordan content (Peano-Jordan measure). These facts may seem surprising, but they align with the properties of fractals: many such sets have Hausdorff dimension greater than 1, and some even have non-integer Hausdorff dimension.
3. The Hilbert Curve – Still Popular Today!
Although Peano was the first to construct an SFC, a far more famous example is the Hilbert curve, defined by David Hilbert in 1891. Its definition is slightly simpler and begins with a 2 × 2 grid. Like the Peano curve, the mapping of the Hilbert curve recursively subdivides each interval in [0, 1] and each square in [0, 1]² into four smaller intervals/squares at each step. As with the Peano curve, the Hilbert curve converges to a true SFC in the limit as the number of iterations approaches infinity.

For the purposes of this article, we will focus on the Hilbert curve, as its properties make it a valuable tool in modern data platforms.
3.1 Formal Definition of the Hilbert Curve
Starting with the interval [0, 1] as the domain of the Hilbert curve, each recursion step divides the current interval into four equal subintervals. If a is the left endpoint and h the interval width, the subintervals are:

[a, a + h/4], [a + h/4, a + h/2], [a + h/2, a + 3h/4], [a + 3h/4, a + h]
For any chosen point in [0, 1], exactly one of these subintervals contains the point. This interval can then be subdivided again using the same rule, producing a finer interval that still contains the point. This process can be continued indefinitely, yielding an arbitrarily precise location of the point along the curve, as the small sketch below illustrates. The same recursive subdivision is applied in [0, 1]² in parallel, splitting each square into four smaller squares of half the side length, one for each subinterval.
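To make the nesting tangible, here is a tiny Python sketch (the helper name is mine, purely illustrative) that records which of the four subintervals contains a given point at each recursion depth:

```python
def locate(t: float, depth: int) -> list[int]:
    """Record which of the four subintervals (numbered 0-3) contains t at each level."""
    a, h, path = 0.0, 1.0, []
    for _ in range(depth):
        k = min(int((t - a) / (h / 4)), 3)   # index of the quarter containing t
        path.append(k)
        a, h = a + k * h / 4, h / 4          # zoom into that subinterval
    return path

print(locate(0.7, 3))  # [2, 3, 0]: 0.7 lies in [0.5, 0.75], then in [0.6875, 0.75], ...
```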
General properties:
- Surjective: From its recursive definition it follows that the Hilbert curve is surjective: every point in [0, 1]² is covered in the limit. The nested intervals are compact, and adjacent intervals share boundary points (e.g., a + h/4 is both the right endpoint of the first subinterval and the left endpoint of the second). Thus the whole square is filled. The mapping, however, is not injective; attempts to enforce bijectivity (e.g., by using open intervals) break continuity.
- Continuous: This property is evident from visual representations: the curve can be drawn without lifting the pen. Formally, it can be established by showing that the Hilbert curve arises as the uniform limit of continuous functions, and uniform convergence preserves continuity.
- Nowhere differentiable: Looking at graphs of the Hilbert curve, it seems apparent that this curve is not differentiable. A proof of this property was given by H. Sagan using the difference quotient.
- Locality preserving: In contrast to simpler mappings such as the Z-order curve, the Hilbert curve tends to preserve locality: points that are close in the one-dimensional parameter are often mapped to nearby points in the plane. This aspect is crucial for applications in big data platforms.
- Positive Jordan content: In the limit of infinitely many iterations, the image of the Hilbert curve has positive Jordan (Peano-Jordan) measure, meaning that it occupies a nonzero area of the plane.
- Hausdorff dimension of 2: Correspondingly, the Hilbert curve does not behave like a usual one-dimensional line but has Hausdorff dimension 2, reflecting that it completely fills the unit square.
Though early definitions of the Hilbert curve were given in 2D, higher dimensions are also feasible. The algorithm we discuss in the next section works in any finite dimension.
4. Computing the Hilbert Curve with Skilling's Algorithm
The definition of the Hilbert curve was given in a geometric manner, without an algebraic rule for computing coordinates on a given grid for a given point in I = [0, 1]. It took almost 100 years after Hilbert published his idea before mathematicians devised ways to compute points for a given Hilbert index. Who could blame them? After all, for a long time there were no computers that could draw curves with hundreds or thousands of points. While researching, I found multiple ways to compute the Hilbert curve, from complex numbers to L-systems. While some are quite involved, others preserve the iterative approach for computing single points of the curve. What I was looking for was something simple:
- A function that takes a Hilbert index (i.e., a number like 1, 2, 3 in 1D space) and returns its coordinates. You can think of the Hilbert index as the number of the interval, counted from left to right, for a Hilbert curve of finite order.
- A function that does the inverse, mapping a coordinate back to its Hilbert index.
While searching the web for possible implementations, I came across a GitHub repository from Princeton University implementing the algorithm of John Skilling, published in a 2004 paper called Programming the Hilbert Curve. Unfortunately, this paper is not freely available to the public, so I decided to analyze the code from the Princeton repository.
4.1 Skilling’s Algorithm – Overview
Skilling observed that mapping Hilbert indices to coordinates can be expressed elegantly in terms of binary operations. For example, consider the indices 0, 1, 2, 3 in one dimension. These correspond to the coordinates (0, 0), (1, 0), (1, 1), (0, 1) in a 2 × 2 grid. Here, the values 0, 1, 2, 3 no longer represent fractional points in the unit interval (like 1/3) but instead discrete interval numbers. With a 2 × 2 grid, there are exactly four intervals in [0, 1] and four corresponding squares in [0, 1]². Skilling's algorithm generalizes this idea. It computes the mapping from a Hilbert index to its corresponding coordinate (and vice versa) in any finite dimension using binary transformations. The basic steps are:
- Convert the Hilbert index from decimal to binary.
- Transform the binary number into its Gray code representation.
- Disentangle the Gray code into a coordinate structure.
- Apply rotations and reflections using XOR operations.
- Convert the binary coordinates back to decimal.
4.2 Binary Representation
To understand why binary representations are much better suited to computing points of the Hilbert curve from Hilbert indices (and vice versa), the following examples might help (we discuss everything in 2D, but the algorithm works in any dimension):
The Hilbert curve is defined on a 2×2, 4×4, 8×8, 16×16, etc. grid (remember the definition above and its recursive approach).
Looking at the numbers, one might notice that the number of intervals per axis grows as 2^n, where n is the order of the curve. This matches perfectly with binary encoding: for an n-th order curve, we need exactly n bits per axis to describe the grid.
Take the 4 × 4 grid (second order) as an example. Two bits per axis are sufficient:
- The first bit identifies the major quadrant (lower left, upper left, lower right, or upper right).
- The second bit specifies the position within that quadrant.
For example, Hilbert index 2 has the binary form 0010. Interpreting this:
- 00 selects the lower-left quadrant.
- 10 selects the upper-right subquadrant inside it.

The first bit pair selects the quadrant, the second the subquadrant. Note the repetitive pattern of 00, 01, 10, 11 in every quadrant, forming a Hilbert curve of order 1. (Image by the author)
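In code, this naive interpretation simply splits the index into 2-bit groups, one per recursion level. A toy sketch (the helper name is mine):

```python
def bit_pairs(index: int, order: int) -> list[int]:
    """Split a Hilbert index into 2-bit groups, one per recursion level (MSB first)."""
    return [(index >> (2 * level)) & 0b11 for level in reversed(range(order))]

print(bit_pairs(2, order=2))  # [0b00, 0b10]: quadrant 00, then cell 10 within it
```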
However, if we continue this process for indices greater than 3, we encounter a challenge: the orientation of the curve changes from one quadrant to the next. Correctly handling these rotations and reflections is exactly where Gray code and XOR operations (as in Skilling's algorithm) become essential.
4.3 Gray Code Representation
The next step in Skilling's algorithm is a transformation from binary to Gray code. The key difference is that in Gray code, consecutive numbers differ in exactly one bit. This property is crucial: it ensures that the curve moves smoothly from one quadrant to the next (even though the orientation of the curve within each quadrant is still not correct).
Looking at the binary numbers and the orientation of the different sections of the curve, we can see that the curve is still not correct, but the end of each quadrant now connects to the start of the next. A minimal sketch of the conversion follows the figure below.

The end of each quadrant now connects to the first cell of the next (Image by the author)
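The binary-to-Gray conversion is a single XOR; the inverse unwinds it. A minimal sketch:

```python
def binary_to_gray(n: int) -> int:
    """Gray code: consecutive values differ in exactly one bit."""
    return n ^ (n >> 1)

def gray_to_binary(g: int) -> int:
    """Invert the Gray coding by XOR-folding the bits down."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

print([format(binary_to_gray(i), "02b") for i in range(4)])  # ['00', '01', '11', '10']
```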
4.4 Disentanglement of the Bits
The real "magic" of Skilling's method begins with a reordering of the Gray-coded bits. In our 4 × 4 example, we originally interpreted the four bits as (bitx, bity, bitx, bity), where the first pair encodes the major quadrant and the second pair the sub-quadrant. However, for coordinate computation we need a structure of the form (bitx, bitx, bity, bity), so that all x-bits and all y-bits can later be combined into the respective decimal coordinates (x, y). This step is called disentanglement of the bits.
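In code, the de-interleaving is just a stride slice over the bit list. A minimal sketch (the helper is hypothetical; Skilling's actual code keeps the bits in numpy arrays):

```python
def disentangle(bits: list[int], ndims: int = 2) -> list[list[int]]:
    """Regroup interleaved bits (x, y, x, y, ...) into one group per dimension.

    bits is MSB first, e.g. [x1, y1, x0, y0] -> [[x1, x0], [y1, y0]].
    """
    return [bits[d::ndims] for d in range(ndims)]

print(disentangle([0, 1, 1, 0]))  # [[0, 1], [1, 0]] -> x = 0b01, y = 0b10
```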

4.5 Corrective Transformations
After disentangling the bits, the final step of Skilling's algorithm is to rotate and mirror the subcurves within each quadrant so that they connect seamlessly into the Hilbert curve of order n.
Figure 6 illustrates this process for the 4 × 4 case. The table on the left shows how Gray-coded coordinates are converted into standard binary numbers by applying simple transformations: swaps and bit-flips.
The diagram on the right visualizes the effect: the upper quadrants are rotated by 180°, the lower quadrants are mirrored along the diagonal, and in some cases (e.g., the yellow quadrant) no transformation is required at all.
The key insight is that after these corrective transformations, the coordinates are once again in standard binary form. This means the output of Skilling's algorithm can be converted directly to decimal coordinates in the format (x, y), without further adjustment.

Skilling's algorithm, key transformations. Input: Gray code, disentangled into the form (bitx, bitx, bity, bity). In Python, the array shape can be [-1, ndims, nbits]. Example: the number 4 (Gray code 0110) can be represented as the following list/np-array: [[0, 1], [1, 0]]. For the x-dimension, 1 is the least significant bit (LSB) and 0 the most significant bit (MSB). The steps are:
- Loop from the most significant bit (MSB) to the least significant bit (LSB).
- Inner loop from the highest dimension (y in 2D) down to the lowest dimension.
- Look at the current bit. If it is 1: flip every lower bit in dimension 0 (usually x). If it is 0: swap the lower bits of the current dimension with the lower bits of dimension 0 (where they differ).
Step 3 can be computed easily with numpy using XOR operations. The whole process of flipping and swapping bits in each iteration is visualized in the following animations, and a compact reference sketch of the full algorithm follows at the end of this section.


If you want to analyze the algorithm in more detail or simply generate your own animations in 2D or 3D, check out my GitHub repository.
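For reference, here is a compact, runnable sketch of both directions (the two functions from the wish list above) in plain Python, following Skilling's published pseudocode. The Princeton repository vectorizes the same steps with numpy on arrays of shape [-1, ndims, nbits]; the helper names here are my own, and the resulting orientation may differ from the figures above by a reflection, as Hilbert-curve conventions vary:

```python
def index_to_transpose(index: int, ndims: int, nbits: int) -> list[int]:
    """Deal the index bits out cyclically over ndims coordinates, MSB first
    (Skilling's 'transposed' bit layout; this is the disentanglement step)."""
    x = [0] * ndims
    for b in range(ndims * nbits):
        bit = (index >> (ndims * nbits - 1 - b)) & 1
        x[b % ndims] = (x[b % ndims] << 1) | bit
    return x

def transpose_to_index(x: list[int], nbits: int) -> int:
    """Inverse of index_to_transpose: re-interleave the per-dimension bits."""
    index = 0
    for b in range(nbits - 1, -1, -1):          # from the MSB plane down
        for xi in x:
            index = (index << 1) | ((xi >> b) & 1)
    return index

def hilbert_index_to_coords(index: int, ndims: int, nbits: int) -> list[int]:
    """Skilling's TransposetoAxes: Hilbert index -> grid coordinates."""
    x = index_to_transpose(index, ndims, nbits)
    t = x[ndims - 1] >> 1                       # Gray decode (H ^ (H >> 1)),
    for i in range(ndims - 1, 0, -1):           # expressed on the transposed layout
        x[i] ^= x[i - 1]
    x[0] ^= t
    q = 2                                       # corrective transformations,
    while q != 2 << (nbits - 1):                # one bit plane at a time
        p = q - 1
        for i in range(ndims - 1, -1, -1):
            if x[i] & q:                        # bit is 1: flip low bits of dim 0
                x[0] ^= p
            else:                               # bit is 0: swap low bits of dims i and 0
                t = (x[0] ^ x[i]) & p
                x[0] ^= t
                x[i] ^= t
        q <<= 1
    return x

def coords_to_hilbert_index(coords: list[int], nbits: int) -> int:
    """Skilling's AxestoTranspose: grid coordinates -> Hilbert index."""
    x = list(coords)
    ndims = len(x)
    q = 1 << (nbits - 1)                        # undo the corrections, MSB down
    while q > 1:
        p = q - 1
        for i in range(ndims):
            if x[i] & q:
                x[0] ^= p
            else:
                t = (x[0] ^ x[i]) & p
                x[0] ^= t
                x[i] ^= t
        q >>= 1
    for i in range(1, ndims):                   # Gray encode
        x[i] ^= x[i - 1]
    t, q = 0, 1 << (nbits - 1)
    while q > 1:
        if x[ndims - 1] & q:
            t ^= q - 1
        q >>= 1
    for i in range(ndims):
        x[i] ^= t
    return transpose_to_index(x, nbits)

# Order-2 curve in 2D: 16 cells on a 4x4 grid
path = [hilbert_index_to_coords(i, ndims=2, nbits=2) for i in range(16)]
print(path)  # consecutive coordinates always differ by exactly one grid step
assert all(coords_to_hilbert_index(c, nbits=2) == i for i, c in enumerate(path))
```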
5. Applications of Space-Filling Curves
After discussing theoretical aspects and implementation details of the Hilbert curve, the question arises where it can be applied. During the implementation we saw how to transform Hilbert indices into coordinates. For the following applications, the inverse of this process is more interesting.
One valuable aspect of the Hilbert curve is that it maps a 1D ordered set (i.e., 1, 2, 3, …) to coordinates in an n-dimensional space. It gives an order to the points it traverses, and it can live in vector spaces of arbitrary dimension. Thus, the Hilbert curve is used for data partitioning and clustering, image compression, and also for constructing features in machine learning when dealing with spatial data.
5.1 Data Partitioning/Clustering using SFCs
One of the most prominent applications of SFCs is data partitioning. For instance, in Databricks, Z-ordering is based on the Z-curve, while Liquid Clustering relies on the Hilbert curve. The reason is simple: the Hilbert curve preserves locality better than the Z-curve, which is crucial when indexing and partitioning multidimensional data. In Figure 9 you can see how some exemplary data points are mapped to points of the Hilbert curve, by assigning each point to one partition given by the curve.

Exemplary data points assigned to partitions along the Hilbert curve (Image by the author)
When a query is applied to the data (e.g., SELECT * FROM table WHERE x IN (1,2) AND y IN (2,3)), all points in this range ((1,2), (1,3), (2,2), (2,3)) are converted to Hilbert indices, and the system can directly retrieve all matching entries. The key advantage is that this mapping enables fast and flexible data retrieval. Unlike traditional indexing, Hilbert-based partitioning adapts naturally to updates or growth in the dataset, without requiring the whole index to be recomputed.
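As a toy illustration of that retrieval path, reusing coords_to_hilbert_index from the sketch in Section 4 (the grid size is chosen arbitrarily for the example):

```python
# Points matched by WHERE x IN (1, 2) AND y IN (2, 3), on an 8x8 grid (nbits=3)
query_points = [(1, 2), (1, 3), (2, 2), (2, 3)]
indices = sorted(coords_to_hilbert_index(p, nbits=3) for p in query_points)
print(indices)  # runs of consecutive indices can be fetched with sequential reads
```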
5.2 Data Indexing: Hilbert Curve vs. Z-Curve
To highlight the practical benefits of the Hilbert curve, I compared its performance with the Z-curve on a set of synthetic range queries.
For the experiment, I generated 100 random range queries of fixed size. For each query, I computed the Hilbert and Z-curve indices and counted the number of clusters, where a cluster is a run of consecutive indices. For example, if the query returned the indices [1,2,3,5,6,8,9], this would form three clusters: [1,2,3], [5,6], and [8,9].
If the data is stored in index order, clusters correspond to sequential reads, whereas gaps between clusters imply costly jumps to new storage addresses.

(Image by the author)
To quantify performance, I used two metrics (a minimal sketch of both follows the list):
- Cluster count: Fewer clusters imply less fragmentation and fewer storage jumps.
- Intra-cluster spread: The average number of indices per cluster.
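Both metrics are easy to compute from the sorted index list; here is a minimal sketch (the function names are mine):

```python
def count_clusters(indices: list[int]) -> int:
    """Number of runs of consecutive integers, e.g. [1,2,3,5,6,8,9] -> 3."""
    s = sorted(indices)
    return sum(1 for k, v in enumerate(s) if k == 0 or v != s[k - 1] + 1)

def intra_cluster_spread(indices: list[int]) -> float:
    """Average number of indices per cluster."""
    return len(indices) / count_clusters(indices)

print(count_clusters([1, 2, 3, 5, 6, 8, 9]))        # 3
print(intra_cluster_spread([1, 2, 3, 5, 6, 8, 9]))  # 7/3 = 2.33...
```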
The worst-case scenario would be extreme fragmentation: every point forming a cluster of its own. Figure 11 compares the performance of the Z-curve and the Hilbert curve for two, three, and four dimensions, a query size of seven (7×7 in 2D, 7×7×7 in 3D, etc.), and 6 bits per axis (i.e., 64 values per axis).

The results clearly show that the Hilbert curve preserves locality much better than the Z-curve. Across all tested dimensions, queries result in fewer clusters and thus higher intra-cluster density with Hilbert indices. In practice, this translates into more efficient data retrieval and reduced I/O costs, particularly for multidimensional range queries.
6. Beyond Space-Filling Curves
The goal of this article was to illustrate the elegance of SFCs and to offer a glimpse into their applications in data indexing. However, the latest research in this field goes beyond classical SFCs.
The main limitation of all space-filling curves is their fixed mapping. Once defined, their structure offers little room for adaptation to different datasets or workload patterns. In practice, this rigidity can limit performance.
To overcome this, researchers such as Chen et al. (University of Electronic Science and Technology of China & Huawei) have proposed AdaCurve, a machine learning-based approach. Instead of relying on a predetermined mapping, AdaCurve trains a model to generate a one-dimensional index directly from high-dimensional data points, optimized according to both the dataset and the query workload. [3]
This idea is highly promising: while Hilbert and other SFCs offer elegant but rigid mappings, AdaCurve adapts dynamically, producing an indexing scheme tailored to the data and queries at hand. Such adaptability could pave the way for significantly more efficient indexing in large-scale data platforms in the future.
References
[1] H. Sagan, Space-Filling Curves. Springer-Verlag, 1994.
[2] M. Bader, Space-Filling Curves – An Introduction with Applications in Scientific Computing. Springer-Verlag, 2013.
[3] X. Chen, "Optimizing Block Skipping for High-Dimensional Data with Learned Adaptive Curve," SIGMOD, vol. 3, 2025. [Online]. Available: https://zheng-kai.com/paper/2025_sigmod_chen.pdf