First of all, we need some synthetic data to work with. The data should exhibit some non-linear dependency. Let's define it like this:
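$$
y(x) =
\begin{cases}
2x + 5, & x < -2 \\
7.3\sin(x), & -2 \le x < 2 \\
-0.03x^3 + 2, & x \ge 2
\end{cases}
\;+\; \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, 1)
$$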
In Python it takes the following shape:
import numpy as np

np.random.seed(42)

# 10,000 points drawn from a normal distribution
X = np.random.normal(1, 4.5, 10000)

# Piecewise non-linear target plus Gaussian noise
y = np.piecewise(
    X,
    [X < -2, (X >= -2) & (X < 2), X >= 2],
    [lambda X: 2 * X + 5,
     lambda X: 7.3 * np.sin(X),
     lambda X: -0.03 * X**3 + 2],
) + np.random.normal(0, 1, X.shape)
After visualization:
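If you want to reproduce the figure yourself, here is a minimal sketch (matplotlib is my choice here; any plotting library will do):

import matplotlib.pyplot as plt

# A quick scatter plot of the synthetic data
plt.figure(figsize=(10, 5))
plt.scatter(X, y, s=1, alpha=0.3)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Synthetic data')
plt.show()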
Since we're visualizing a 3D space, our neural network will have only 2 weights. That means the ANN will consist of a single hidden neuron. Implementing this in PyTorch is quite intuitive:
import torch
import torch.nn as nn

class ANN(nn.Module):
    def __init__(self, input_size, N, output_size):
        super().__init__()
        self.net = nn.Sequential()
        self.net.add_module(name='Layer_1', module=nn.Linear(input_size, N, bias=False))
        self.net.add_module(name='Tanh', module=nn.Tanh())
        self.net.add_module(name='Layer_2', module=nn.Linear(N, output_size, bias=False))

    def forward(self, x):
        return self.net(x)
Important! Don't forget to switch off the biases in your layers; otherwise you'll end up with twice as many parameters.
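One more thing before we move on: the snippets below also use a loss function, data loaders, and a model instance that aren't shown explicitly. Here is a minimal sketch of that setup (MSE loss, a batch size of 64, and an 80/20 train/test split are my assumptions):

from torch.utils.data import DataLoader, TensorDataset

# Hypothetical setup: the loss, batch size, and split here are assumptions
X_t = torch.tensor(X, dtype=torch.float32)
y_t = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

split = int(0.8 * len(X_t))
train_loader = DataLoader(TensorDataset(X_t[:split], y_t[:split]),
                          batch_size=64, shuffle=True)
test_loader = DataLoader(TensorDataset(X_t[split:], y_t[split:]),
                         batch_size=64, shuffle=False)

loss = nn.MSELoss()
model = ANN(input_size=1, N=1, output_size=1)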
To construct the error surface, we first need to create a grid of possible values for W1 and W2. Then, for each weight combination, we update the parameters of the network and calculate the error:
# Grid of candidate values for each weight
W1, W2 = np.arange(-2, 2, 0.05), np.arange(-2, 2, 0.05)
LOSS = np.zeros((len(W1), len(W2)))

for i, w1 in enumerate(W1):
    model.net._modules['Layer_1'].weight.data = torch.tensor([[w1]], dtype=torch.float32)
    for j, w2 in enumerate(W2):
        model.net._modules['Layer_2'].weight.data = torch.tensor([[w2]], dtype=torch.float32)
        model.eval()
        total_loss = 0
        with torch.no_grad():
            for x, y in test_loader:
                preds = model(x.reshape(-1, 1))
                total_loss += loss(preds, y).item()
        # Average test loss for this (w1, w2) combination
        LOSS[i, j] = total_loss / len(test_loader)
This could take a while. If you make the resolution of this grid too coarse (i.e., the step size between possible weight values), you might miss local minima and maxima. Remember how the learning rate is often scheduled to decay over time? When we do that, the absolute change in weight values can be as small as 1e-3 or less (for example, with a learning rate of 1e-2 and a gradient of magnitude 0.1, a single SGD step moves a weight by only 1e-3). A grid with a 0.5 step simply won't capture these fine details of the error surface!
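As a side note, here is a sketch of one possible speed-up (a variation of mine, not part of the original recipe): concatenate the test set into two tensors once, then evaluate each weight combination in a single forward pass.

# Speed-up sketch: one forward pass per grid point instead of a batch loop
X_test = torch.cat([x.reshape(-1, 1) for x, _ in test_loader])
y_test = torch.cat([y for _, y in test_loader])

for i, w1 in enumerate(W1):
    model.net._modules['Layer_1'].weight.data = torch.tensor([[w1]], dtype=torch.float32)
    for j, w2 in enumerate(W2):
        model.net._modules['Layer_2'].weight.data = torch.tensor([[w2]], dtype=torch.float32)
        with torch.no_grad():
            LOSS[i, j] = loss(model(X_test), y_test).item()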
At this point, we don't care at all about the quality of the trained model. However, we do want to pay attention to the learning rate, so let's keep it between 1e-1 and 1e-2. We'll simply collect the weight values and errors throughout the training process and store them in separate lists:
import torch.optim as optim
from tqdm import tqdm

model = ANN(1, 1, 1)
epochs = 25
lr = 1e-2
optimizer = optim.SGD(model.parameters(), lr=lr)

# Fix the starting point so the trajectory always begins at (-1, -1)
model.net._modules['Layer_1'].weight.data = torch.tensor([[-1]], dtype=torch.float32)
model.net._modules['Layer_2'].weight.data = torch.tensor([[-1]], dtype=torch.float32)

errors, weights_1, weights_2 = [], [], []

# Record the loss at the starting point before any training happens
model.eval()
with torch.no_grad():
    total_loss = 0
    for x, y in test_loader:
        preds = model(x.reshape(-1, 1))
        error = loss(preds, y)
        total_loss += error.item()
weights_1.append(model.net._modules['Layer_1'].weight.data.item())
weights_2.append(model.net._modules['Layer_2'].weight.data.item())
errors.append(total_loss / len(test_loader))
for epoch in tqdm(range(epochs)):
    # One epoch of SGD on the training set
    model.train()
    for x, y in train_loader:
        pred = model(x.reshape(-1, 1))
        error = loss(pred, y)
        optimizer.zero_grad()
        error.backward()
        optimizer.step()

    # Evaluate on the test set and record the new point on the surface
    model.eval()
    test_preds, true = [], []
    with torch.no_grad():
        total_loss = 0
        for x, y in test_loader:
            preds = model(x.reshape(-1, 1))
            error = loss(preds, y)
            test_preds.append(preds)
            true.append(y)
            total_loss += error.item()
    weights_1.append(model.net._modules['Layer_1'].weight.data.item())
    weights_2.append(model.net._modules['Layer_2'].weight.data.item())
    errors.append(total_loss / len(test_loader))
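A quick sanity check before plotting (my addition, not in the original): the lists should contain the starting point plus one point per epoch.

# The trajectory holds the initial point plus one entry per epoch
assert len(errors) == len(weights_1) == len(weights_2) == epochs + 1
print(f'final point: w1={weights_1[-1]:.3f}, w2={weights_2[-1]:.3f}, loss={errors[-1]:.3f}')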
Finally, we can visualize the data we have collected using plotly. The plot will have two parts: the surface and the SGD trajectory. One way to do the first part is to create a figure with a plotly Surface. After that, we'll style it slightly by updating the layout.
The second part is as simple as it gets: just use the Scatter3d function and specify all three axes.
import plotly.graph_objects as go
import plotly.io as pio

plotly_template = pio.templates["plotly_dark"]

# plotly expects z[i][j] to correspond to (x[j], y[i]), while LOSS is indexed
# as LOSS[w1_index, w2_index], so transpose it to keep the axis labels honest
fig = go.Figure(data=[go.Surface(z=LOSS.T, x=W1, y=W2)])
fig.update_layout(
    template=plotly_template,
    title='Loss Surface',
    scene=dict(
        xaxis_title='w1',
        yaxis_title='w2',
        zaxis_title='Loss',
        aspectmode='manual',
        aspectratio=dict(x=1, y=1, z=0.5),
        xaxis=dict(showgrid=False),
        yaxis=dict(showgrid=False),
        zaxis=dict(showgrid=False),
    ),
    width=800,
    height=800
)
fig.add_trace(go.Scatter3d(
    x=weights_1, y=weights_2, z=errors,
    mode='lines+markers',
    line=dict(color='red', width=2),
    marker=dict(size=4, color='yellow')
))
fig.show()
Running it in Google Colab or locally in a Jupyter Notebook will let you investigate the error surface more closely. Honestly, I spent a bunch of time just playing with this figure :)
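And if you want to share the interactive figure outside a notebook, plotly can export it as a standalone HTML file (the filename below is arbitrary):

# Save the interactive figure as a standalone HTML file
fig.write_html('loss_surface.html')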
I'd love to see your surfaces, so please feel free to share them in the comments. I strongly believe that the more imperfect the surface is, the more interesting it is to analyze!
===========================================
All my publications on Medium are free and open-access, so I'd really appreciate it if you followed me here!
P.S. I'm extremely passionate about (Geo)Data Science, ML/AI, and Climate Change. If you'd like to work together on some project, please contact me on LinkedIn and check out my website!
🛰️Follow for more🛰️