Introduction
The Mandelbrot set is one of the most beautiful mathematical objects ever discovered, a fractal so intricate that no matter how far you zoom in, you keep finding infinite detail. But what if we asked a neural network to learn it?
At first glance, this seems like an odd question. The Mandelbrot set is fully deterministic: there’s no data, no noise, no hidden rules. But this simplicity is exactly what makes it an ideal sandbox for studying how neural networks represent complex functions.
In this article, we’ll explore how a simple neural network can learn to approximate the Mandelbrot set, and how Gaussian Fourier Features completely transform its performance, turning blurry approximations into sharp fractal boundaries.
Along the way, we’ll dive into why vanilla multi-layer perceptrons (MLPs) struggle with high-frequency patterns (the spectral bias problem) and how Fourier features solve it.
The Mandelbrot Set
The Mandelbrot set is defined over the complex plane. For every complex number (c \in \mathbb{C}), we consider the iterative sequence:
$$z_{n+1} = z_n^2 + c, \quad z_0 = 0$$
If this sequence stays bounded, then (c) belongs to the Mandelbrot set.
In practice, we approximate this condition using the escape-time algorithm. The escape-time algorithm iterates the sequence up to a fixed maximum number of steps and monitors the magnitude (|z_n|). If (|z_n|) exceeds a chosen escape radius (typically 2), the sequence is guaranteed to diverge, and (c) is classified as outside the Mandelbrot set. If the sequence doesn’t escape within the maximum number of iterations, (c) is assumed to belong to the Mandelbrot set, with the iteration count often used for visualization purposes.
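For reference, here is a minimal Python sketch of this escape-time test (the function name and defaults are illustrative, not taken from the accompanying repository):

def in_mandelbrot(c: complex, max_iter: int = 1000, escape_radius: float = 2.0) -> bool:
    # Iterate z -> z^2 + c and check whether the orbit stays bounded
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > escape_radius:
            return False  # escaped: guaranteed to diverge
    return True  # never escaped within max_iter: assumed inside the set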
Turning the Mandelbrot Set into a Learning Problem
To train our neural network, we need two things. First, we must define a learning problem, that is, what the model should predict and from what inputs. Second, we need data: a large collection of input-output pairs drawn from that problem.
Defining the Learning Problem
At its core, the Mandelbrot set defines a function over the complex plane. Each point (c = x + iy \in \mathbb{C}) is mapped to an outcome: either the sequence generated by the iteration stays bounded, or it diverges. This immediately suggests a binary classification problem, where the input is a complex number and the output indicates whether the number is inside the set or not.
However, this formulation poses difficulties for learning. The boundary of the Mandelbrot set is infinitely complex, and arbitrarily small perturbations in (c) can change the classification outcome. From a learning perspective, this leads to a highly discontinuous target function, making optimization unstable and data-inefficient.
To obtain a smoother and more informative learning target, we instead reformulate the problem using the escape-time information introduced in the previous section. Rather than predicting a binary label, the model is trained to predict a continuous variable derived from the iteration count at which the sequence escapes.
To produce a continuous target function, we don’t use the raw escape iteration count directly. The raw count is a discrete quantity, and using it would introduce discontinuities, particularly near the boundary of the Mandelbrot set, where small changes in (c) can cause large jumps in the iteration count. To handle this, we employ a smooth escape-time value, which combines the escape iteration count with the final magnitude of (z) to produce a continuous target. In addition, we apply a logarithmic scaling that spreads early escape values and compresses larger ones, yielding a more balanced target distribution.
import math
import numpy as np

def smooth_escape(x: float, y: float, max_iter: int = 1000) -> float:
    c = complex(x, y)
    z = 0j
    for n in range(max_iter):
        z = z*z + c
        r2 = z.real*z.real + z.imag*z.imag
        if r2 > 4.0:
            r = math.sqrt(r2)
            mu = n + 1 - math.log(math.log(r)) / math.log(2.0)  # smooth escape time
            # log-scale to spread small mu values
            v = math.log1p(mu) / math.log1p(max_iter)
            return float(np.clip(v, 0.0, 1.0))
    return 1.0  # never escaped: assumed inside the set
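As a quick sanity check of this target (the sample points below are chosen purely for illustration): points that never escape map to 1.0, points that escape immediately map to values near 0, and points near the boundary fall in between.

print(smooth_escape(0.0, 0.0))    # c = 0 never escapes -> 1.0
print(smooth_escape(2.0, 2.0))    # escapes on the first iteration -> value near 0
print(smooth_escape(-0.75, 0.1))  # near the boundary -> intermediate value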
With this definition, the Mandelbrot set becomes a regression problem. The neural network is trained to approximate a function $$f : \mathbb{R}^2 \rightarrow [0,1]$$
mapping spatial coordinates ((x, y)) in the complex plane to a smooth escape-time value.
Data Sampling Strategy
A uniform sampling of the complex plane would be highly inefficient: most points lie far from the boundary and carry little information. To address this, the dataset is biased towards boundary regions by oversampling and filtering points whose escape values fall within a predefined band.
def build_boundary_biased_dataset(
    n_total=800_000,
    frac_boundary=0.7,
    xlim=(-2.4, 1.0),
    res_for_ylim=(3840, 2160),
    ycenter=0.0,
    max_iter=1000,
    band=(0.35, 0.95),
    seed=0,
):
    """
    - Mixture of uniform samples + boundary-band samples.
    - 'band' selects points with target in (low, high), which tends to concentrate near the boundary.
    """
    rng = np.random.default_rng(seed)
    ylim = compute_ylim_from_x(xlim, res_for_ylim, ycenter=ycenter)
    n_boundary = int(n_total * frac_boundary)
    n_uniform = n_total - n_boundary
    # Uniform set
    Xu = sample_uniform(n_uniform, xlim, ylim, seed=seed)
    # Boundary pool: oversample, then filter by band
    pool_factor = 20
    pool = sample_uniform(n_boundary * pool_factor, xlim, ylim, seed=seed + 1)
    yp = np.empty((pool.shape[0],), dtype=np.float32)
    for i, (x, y) in enumerate(pool):
        yp[i] = smooth_escape(float(x), float(y), max_iter=max_iter)
    mask = (yp > band[0]) & (yp < band[1])
    Xb = pool[mask]
    yb = yp[mask]
    if len(Xb) < n_boundary:
        # If the band is too strict, fall back to fewer boundary points and more uniform samples
        keep = min(len(Xb), n_boundary)
        print(f"[warn] Boundary band too strict; got {len(Xb)} boundary points, using {keep}.")
        Xb = Xb[:keep]
        yb = yb[:keep]
        n_boundary = keep
        n_uniform = n_total - n_boundary
        Xu = sample_uniform(n_uniform, xlim, ylim, seed=seed)
    else:
        Xb = Xb[:n_boundary]
        yb = yb[:n_boundary]
    yu = np.empty((Xu.shape[0],), dtype=np.float32)
    for i, (x, y) in enumerate(Xu):
        yu[i] = smooth_escape(float(x), float(y), max_iter=max_iter)
    X = np.concatenate([Xu, Xb], axis=0).astype(np.float32)
    y = np.concatenate([yu, yb], axis=0).astype(np.float32)
    # Shuffle once
    perm = rng.permutation(X.shape[0])
    return X[perm], y[perm], ylim
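The function above relies on two small helpers, compute_ylim_from_x and sample_uniform, that are not shown in this article. A plausible sketch of what they do (the exact implementations in the accompanying repository may differ) is:

def compute_ylim_from_x(xlim, res, ycenter=0.0):
    # Derive the y-range so the view keeps the aspect ratio of the target resolution
    width = xlim[1] - xlim[0]
    height = width * res[1] / res[0]
    return (ycenter - height / 2, ycenter + height / 2)

def sample_uniform(n, xlim, ylim, seed=0):
    # Draw n points uniformly from the rectangle xlim x ylim
    rng = np.random.default_rng(seed)
    xs = rng.uniform(xlim[0], xlim[1], size=n)
    ys = rng.uniform(ylim[0], ylim[1], size=n)
    return np.stack([xs, ys], axis=1).astype(np.float32)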
Baseline Model: A Deep Residual MLP
Our first attempt uses a deep residual MLP that takes raw Cartesian coordinates ((x, y)) as input and predicts the smooth escape value.
import torch
from torch import nn

# Baseline model
class MLPRes(nn.Module):
    def __init__(
        self,
        hidden_dim=256,
        num_blocks=8,
        act="silu",
        dropout=0.0,
        out_dim=1,
    ):
        super().__init__()
        activation = nn.ReLU if act.lower() == "relu" else nn.SiLU
        self.in_proj = nn.Linear(2, hidden_dim)
        self.in_act = activation()
        self.blocks = nn.Sequential(*[
            ResidualBlock(hidden_dim, act=act, dropout=dropout)
            for _ in range(num_blocks)
        ])
        self.out_ln = nn.LayerNorm(hidden_dim)
        self.out_act = activation()
        self.out_proj = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        x = self.in_proj(x)
        x = self.in_act(x)
        x = self.blocks(x)
        x = self.out_act(self.out_ln(x))
        return self.out_proj(x)
# Residual block
class ResidualBlock(nn.Module):
    def __init__(self, dim: int, act: str = "silu", dropout: float = 0.0):
        super().__init__()
        activation = nn.ReLU if act.lower() == "relu" else nn.SiLU
        # pre-norm-ish (LayerNorm helps a lot for stability with deep residual MLPs)
        self.ln1 = nn.LayerNorm(dim)
        self.fc1 = nn.Linear(dim, dim)
        self.ln2 = nn.LayerNorm(dim)
        self.fc2 = nn.Linear(dim, dim)
        self.act = activation()
        self.drop = nn.Dropout(dropout) if dropout and dropout > 0 else nn.Identity()
        # optional: zero-init the last layer so each block starts near-identity
        nn.init.zeros_(self.fc2.weight)
        nn.init.zeros_(self.fc2.bias)

    def forward(self, x):
        h = self.ln1(x)
        h = self.act(self.fc1(h))
        h = self.drop(h)
        h = self.ln2(h)
        h = self.fc2(h)
        return x + h
This network has ample capacity: it is deep, residual, and trained on a large dataset with stable optimization.
Result
The global shape of the Mandelbrot set is clearly recognizable. However, fine details near the boundary are visibly blurred. Regions that should exhibit intricate fractal structure appear overly smooth, and thin filaments are either poorly defined or entirely missing.
This is not a matter of resolution, data, or depth. So what is going wrong?
The Spectral Bias Problem
Neural networks have a well-known problem called spectral bias: they tend to learn low-frequency functions first, and struggle to represent functions with rapid oscillations or fine detail.
The Mandelbrot set is highly irregular and full of small-scale structure, especially near its boundary. To capture it, the network would need to represent very high-frequency variations in the output as (x) and (y) change.
Fourier Features: Encoding Coordinates in Frequency Space
One of the most elegant solutions to the spectral bias problem was introduced in 2020 by Tancik et al. in their paper “Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains”.
The idea is to transform the input coordinates before feeding them into the neural network. Instead of giving the network the raw ((x, y)), we pass their sinusoidal projections onto random directions in a higher-dimensional space.
Formally:
$$\gamma(x)=[\sin(2 \pi Bx), \cos(2 \pi Bx)]$$
where (B \in \mathbb{R}^{d_{in} \times d_{feat}}) is a random Gaussian matrix.
This mapping acts like a random Fourier basis expansion, letting the network more easily represent high-frequency details.
class GaussianFourierFeatures(nn.Module):
    def __init__(self, in_dim=2, num_feats=256, sigma=5.0):
        super().__init__()
        # Fixed random projection matrix B; sigma controls the frequency scale
        B = torch.randn(in_dim, num_feats) * sigma
        self.register_buffer("B", B)

    def forward(self, x):
        proj = (2 * torch.pi) * (x @ self.B)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
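A quick shape check makes the encoding concrete: a batch of 2-D coordinates is lifted to 2 * num_feats sinusoidal features (this snippet is purely illustrative).

ff = GaussianFourierFeatures(in_dim=2, num_feats=256, sigma=5.0)
coords = torch.rand(8, 2)  # batch of 8 (x, y) points
feats = ff(coords)
print(feats.shape)  # torch.Size([8, 512]): sin and cos of 256 random projections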
Multi-Scale Gaussian Fourier Features
A single frequency scale may not be sufficient. The Mandelbrot set exhibits structure at all resolutions (a defining feature of fractal geometry).
To capture this, we use multi-scale Gaussian Fourier features, combining several frequency bands:
class MultiScaleGaussianFourierFeatures(nn.Module):
    def __init__(self, in_dim=2, num_feats=512, sigmas=(2.0, 6.0, 10.0), seed=0):
        super().__init__()
        # Split the feature budget across scales
        k = len(sigmas)
        per = [num_feats // k] * k
        per[0] += num_feats - sum(per)
        Bs = []
        g = torch.Generator()
        g.manual_seed(seed)
        for s, m in zip(sigmas, per):
            B = torch.randn(in_dim, m, generator=g) * s
            Bs.append(B)
        self.register_buffer("B", torch.cat(Bs, dim=1))

    def forward(self, x):
        proj = (2 * torch.pi) * (x @ self.B)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
This effectively provides the network with a multi-resolution frequency basis, perfectly aligned with the self-similar nature of fractals.
Final Model
The final model has the same architecture as the baseline model; the only difference is that it uses MultiScaleGaussianFourierFeatures to encode the input coordinates.
class MLPFourierRes(nn.Module):
    def __init__(
        self,
        num_feats=256,
        sigma=5.0,
        hidden_dim=256,
        num_blocks=8,
        act="silu",
        dropout=0.0,
        out_dim=1,
    ):
        super().__init__()
        self.ff = MultiScaleGaussianFourierFeatures(
            2,
            num_feats=num_feats,
            sigmas=(2.0, 6.0, sigma),
            seed=0,
        )
        # The Fourier mapping outputs sin and cos, hence 2 * num_feats input features
        self.in_proj = nn.Linear(2 * num_feats, hidden_dim)
        self.blocks = nn.Sequential(*[
            ResidualBlock(hidden_dim, act=act, dropout=dropout)
            for _ in range(num_blocks)
        ])
        self.out_ln = nn.LayerNorm(hidden_dim)
        activation = nn.ReLU if act.lower() == "relu" else nn.SiLU
        self.out_act = activation()
        self.out_proj = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        x = self.ff(x)
        x = self.in_proj(x)
        x = self.blocks(x)
        x = self.out_act(self.out_ln(x))
        return self.out_proj(x)
Training Dynamics
Training Without Fourier Features
The model slowly learns the overall shape of the Mandelbrot set, but then plateaus. Further training fails to add more detail.
Training With Fourier Features
Here the coarse structures appear first, followed by progressively finer details. The model continues to refine its prediction instead of plateauing.
Final Results
Both models used the same architecture, dataset, and training procedure. The network is a deep residual multilayer perceptron (MLP) trained as a regression model on the smooth escape-time formulation.
- Dataset: 1,000,000 samples from the complex plane, with 70% of points concentrated near the fractal boundary thanks to the biased data sampling.
- Architecture: Residual MLP with 20 residual blocks and a hidden dimension of 512 units.
- Activation Function: SiLU
- Training: 100 epochs, batch size 4096, Adam-based optimizer, Cosine Annealing scheduler.
The only difference between the two models is the representation of the input coordinates. The baseline model uses the raw Cartesian coordinates, while the second model uses the multi-scale Fourier features representation.
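For reference, a minimal training loop consistent with this setup might look like the sketch below; the learning rate, the MSE loss, and the data loading details are assumptions, not the exact script from the repository.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

X, y, _ = build_boundary_biased_dataset(n_total=1_000_000, frac_boundary=0.7)
dataset = TensorDataset(torch.from_numpy(X), torch.from_numpy(y).unsqueeze(1))
loader = DataLoader(dataset, batch_size=4096, shuffle=True)

model = MLPFourierRes(hidden_dim=512, num_blocks=20)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed learning rate
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
loss_fn = nn.MSELoss()  # assumed regression loss

for epoch in range(100):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()  # cosine annealing stepped once per epoch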
Global View


Zoom 1 View


Zoom 2 View


Conclusions
Fractals such as the Mandelbrot set are an extreme example of functions dominated by high-frequency structure. Approximating them directly from raw coordinates forces neural networks to synthesize increasingly detailed oscillations, a task for which MLPs are poorly suited.
What this article shows is that the limitation is not architectural capacity, data volume, or optimization. It is representation.
By encoding input coordinates with multi-scale Gaussian Fourier Features, we shift much of the problem’s complexity into the input space. High-frequency structure becomes explicit, allowing an otherwise ordinary neural network to approximate a function that would be too complex in its original form.
This idea extends beyond fractals. Coordinate-based neural networks are used in computer graphics, physics-informed learning, and signal processing. In all of these settings, the choice of input encoding can be the difference between smooth approximations and rich, highly detailed structure.
Note on visual assets
All images, animations, and videos shown in this article were generated by the author from the outputs of the neural network models described above. No external fractal renderers or third-party visual assets were used. The full code used for training, rendering images, and generating animations is available in the accompanying repository.
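For completeness, rendering a global view from a trained model amounts to evaluating it on a dense pixel grid. The sketch below illustrates the idea; the grid size, view limits, and colormap are arbitrary choices here, not the rendering code from the repository.

import numpy as np
import torch
import matplotlib.pyplot as plt

@torch.no_grad()
def render(model, xlim=(-2.4, 1.0), ylim=(-0.96, 0.96), width=960, height=540):
    # Evaluate the model on every pixel of a regular grid over the complex plane
    xs = np.linspace(xlim[0], xlim[1], width, dtype=np.float32)
    ys = np.linspace(ylim[0], ylim[1], height, dtype=np.float32)
    xx, yy = np.meshgrid(xs, ys)
    coords = torch.from_numpy(np.stack([xx.ravel(), yy.ravel()], axis=1))
    values = model(coords).clamp(0.0, 1.0).reshape(height, width).numpy()
    plt.imshow(values, extent=(*xlim, *ylim), origin="lower", cmap="magma")
    plt.axis("off")
    plt.show()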
