Pytorch Introduction — Enter NonLinear Functions

With our data in place, it’s time to train our first neural network. We’ll use the same architecture as in the previous blog post of the series: a linear version of our neural network, which can only handle linear patterns:

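import torch
from torch import nn

class LinearModel(nn.Module):
    def __init__(self):
        # Required so that nn.Module can register the layers below
        super().__init__()
        self.layer_1 = nn.Linear(in_features=12, out_features=5)
        self.layer_2 = nn.Linear(in_features=5, out_features=1)

    def forward(self, x):
        # Two stacked linear layers with no activation in between
        return self.layer_2(self.layer_1(x))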

This neural network uses the nn.Linear module from PyTorch to create a network with one hidden layer (an input layer, a hidden layer and an output layer).
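As a quick sanity check, here is a minimal sketch of a forward pass through two stacked nn.Linear layers with the same sizes as above. The batch of 8 random rows is a made-up stand-in, since the real data is not shown in this section.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Same layer sizes as LinearModel: 12 inputs -> 5 hidden units -> 1 output
layer_1 = nn.Linear(in_features=12, out_features=5)
layer_2 = nn.Linear(in_features=5, out_features=1)

# Hypothetical batch: 8 rows with 12 features each (random, not the real dataset)
x = torch.randn(8, 12)

hidden = layer_1(x)   # one value per hidden neuron for every row
out = layer_2(hidden)  # one raw output per row

print(hidden.shape)  # torch.Size([8, 5])
print(out.shape)     # torch.Size([8, 1])
```

The trailing dimension of size 1 is why the training code later calls .squeeze() on the model output.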

Although we can create our own class inheriting from nn.Module, we can also use (more elegantly) the nn.Sequential constructor to do the same:

model_0 = nn.Sequential(
    nn.Linear(in_features=12, out_features=5),
    nn.Linear(in_features=5, out_features=1)
)

model_0 Neural Network Architecture (Image by Author)

Cool! So our neural network contains a single hidden layer with 5 neurons (this can be seen from out_features=5 on the first layer).

Each neuron in this hidden layer receives a connection from every input neuron. The 12 in in_features of the first layer reflects the number of features, and the 1 in out_features of the second layer reflects the output (a single logit, which a sigmoid turns into a value ranging from 0 to 1).
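To make those connections concrete, the sketch below counts the learnable parameters of the same two-layer architecture. The variable names are only illustrative.

```python
from torch import nn

model = nn.Sequential(
    nn.Linear(in_features=12, out_features=5),
    nn.Linear(in_features=5, out_features=1),
)

# First layer:  12 * 5 weights + 5 biases = 65
# Second layer:  5 * 1 weights + 1 bias   = 6
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 71

# Weight matrices are stored as (out_features, in_features)
for name, p in model.named_parameters():
    print(name, tuple(p.shape))
```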

To train our neural network, we’ll define a loss function and an optimizer. We’ll use BCEWithLogitsLoss (PyTorch 2.1 documentation) as the loss function (the torch implementation of binary cross-entropy that takes raw logits, appropriate for binary classification problems) and Stochastic Gradient Descent as the optimizer (using torch.optim.SGD).

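# Binary cross-entropy on raw logits
loss_fn = nn.BCEWithLogitsLoss()

# Stochastic Gradient Descent for the optimizer
# The learning rate is cut off in this listing; lr=0.01 is an assumed placeholder
optimizer = torch.optim.SGD(params=model_0.parameters(),
                            lr=0.01)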
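The "with logits" part of the name matters: the loss expects raw model outputs and applies the sigmoid internally. A small sketch, using made-up logits and labels, suggests it matches a sigmoid followed by nn.BCELoss:

```python
import torch
from torch import nn

# Made-up logits and labels, only for illustration
logits = torch.tensor([2.0, -1.0, 0.5, -3.0])
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])

# One step: sigmoid is applied inside the loss
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Two steps: explicit sigmoid, then plain binary cross-entropy
loss_two_steps = nn.BCELoss()(torch.sigmoid(logits), targets)

same = torch.allclose(loss_with_logits, loss_two_steps, atol=1e-6)
print(same)  # True
```

The one-step version is generally preferred because it is more numerically stable for large positive or negative logits.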

Finally, as I also want to calculate the accuracy for each epoch of the training process, we’ll write a function to compute it:

def compute_accuracy(y_true, y_pred):
    # Number of correct predictions (true positives + true negatives)
    tp_tn = torch.eq(y_true, y_pred).sum().item()
    acc = (tp_tn / len(y_pred)) * 100
    return acc
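Note that this function compares hard 0/1 predictions, not logits. A short usage sketch with made-up values (the logits and labels below are illustrative only) shows the conversion from logits to predictions:

```python
import torch

def compute_accuracy(y_true, y_pred):
    # Count positions where prediction and label agree
    correct = torch.eq(y_true, y_pred).sum().item()
    return (correct / len(y_pred)) * 100

# Made-up logits and labels for four samples
logits = torch.tensor([1.2, -0.7, -0.1, 3.0])
y_true = torch.tensor([1.0, 0.0, 1.0, 1.0])

# Logits -> probabilities -> hard 0/1 predictions
y_pred = torch.round(torch.sigmoid(logits))

print(y_pred.tolist())                   # [1.0, 0.0, 0.0, 1.0]
print(compute_accuracy(y_true, y_pred))  # 75.0
```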

Time to train our model! Let’s train it for 1000 epochs and see how a simple linear network copes with this data:


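epochs = 1000

train_acc_ev = []
test_acc_ev = []

# Construct training and evaluation loop
# (call arguments cut off in this listing are completed with the usual sigmoid + round step)
for epoch in range(epochs):

    model_0.train()

    # Forward pass: raw logits, then hard 0/1 predictions
    y_logits = model_0(X_train).squeeze()
    y_pred = torch.round(torch.sigmoid(y_logits))

    loss = loss_fn(y_logits, y_train)
    # Calculating accuracy using predictions derived from the logits
    acc = compute_accuracy(y_true=y_train, y_pred=y_pred)
    train_acc_ev.append(acc)

    # Training steps
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    model_0.eval()

    # Inference mode for prediction on the test data
    with torch.inference_mode():

        test_logits = model_0(X_test).squeeze()
        test_pred = torch.round(torch.sigmoid(test_logits))
        test_loss = loss_fn(test_logits, y_test)
        test_acc = compute_accuracy(y_true=y_test, y_pred=test_pred)
        test_acc_ev.append(test_acc)

    # Print out accuracy and loss every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch: {epoch}, Loss: {loss:.5f}, Accuracy: {acc:.2f}% | Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%")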

Unfortunately, the neural network we’ve just built isn’t good enough to solve this problem. Let’s see the evolution of training and test accuracy:

Train and Test Accuracy through the Epochs (Image by Author)

(I’m plotting accuracy instead of loss because it is easier to interpret in this problem.)

Interestingly, our neural network isn’t able to improve the test set accuracy much.

With the knowledge we have from previous blog posts, we can try to add more layers and neurons to our neural network. Let’s do both and see the outcome:

deeper_model = nn.Sequential(
    nn.Linear(in_features=12, out_features=20),
    nn.Linear(in_features=20, out_features=20),
    nn.Linear(in_features=20, out_features=1)
)

deeper_model Neural Network Architecture (Image by Author)

Although our deeper model is a bit more complex, with an additional layer and more neurons, that doesn’t translate into better performance:

Train and Test Accuracy through the Epochs for deeper model (Image by Author)

Even though our model is more complex, that doesn’t really bring more accuracy to our classification problem. This is expected: stacking linear layers with nothing in between still produces a single linear transformation of the input, so extra depth alone adds no expressive power.
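One way to see why extra linear layers do not help is to fold them into a single layer by hand. The sketch below rebuilds the same three-layer architecture with random weights and a made-up random batch, then checks that one weight matrix and one bias vector reproduce its output.

```python
import torch
from torch import nn

torch.manual_seed(0)

deep = nn.Sequential(
    nn.Linear(in_features=12, out_features=20),
    nn.Linear(in_features=20, out_features=20),
    nn.Linear(in_features=20, out_features=1),
)

W1, b1 = deep[0].weight, deep[0].bias
W2, b2 = deep[1].weight, deep[1].bias
W3, b3 = deep[2].weight, deep[2].bias

with torch.no_grad():
    # Fold the three layers into one weight matrix and one bias vector
    W = W3 @ W2 @ W1              # shape (1, 12)
    b = W3 @ (W2 @ b1 + b2) + b3  # shape (1,)

    x = torch.randn(8, 12)        # hypothetical random batch
    out_deep = deep(x)
    out_single = x @ W.T + b

collapsed = torch.allclose(out_deep, out_single, atol=1e-5)
print(collapsed)  # True
```

Whatever the optimizer learns for the three layers, an equivalent single linear layer exists, so the deeper model can only draw the same kind of linear decision boundary.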

To achieve better performance, we need to unlock a new feature of neural networks: activation functions!

If making our model wider and deeper didn’t bring much improvement, there has to be something else we can do with neural networks to improve their performance, right?

That’s where activation functions come in! In our example, we’ll return to our simpler model, but this time with a twist:

model_non_linear = nn.Sequential(
    nn.Linear(in_features=12, out_features=5),
    nn.ReLU(),
    nn.Linear(in_features=5, out_features=1)
)

What’s the difference between this model and the first one? The difference is that we added a new block to our neural network: nn.ReLU. The rectified linear unit is an activation function that changes the value computed by each neuron of the layer it follows:

ReLU Illustrative Example (Image by Author)

Every value produced by the weights of the neural network is passed through this function. If the value of the feature times the weight (summed over the inputs, plus the bias) is negative, the value is set to 0; otherwise the calculated value is kept. Just this small change adds a lot of power to a neural network architecture. In torch we have different activation functions we can use, such as nn.ReLU, nn.Tanh or nn.ELU. For an overview of all activation functions, check this link.
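The rule is simply max(0, z). A tiny sketch with made-up input values shows nn.ReLU zeroing the negatives and leaving the rest untouched:

```python
import torch
from torch import nn

relu = nn.ReLU()

# Made-up pre-activation values, for illustration only
z = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

activated = relu(z)
print(activated.tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]

# Equivalent formulation: clamp everything below zero up to zero
same_as_clamp = torch.equal(activated, torch.clamp(z, min=0))
print(same_as_clamp)  # True
```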

Our neural network architecture now contains a small twist:

Neural Network Architecture with ReLU (Image by Author)

With this small twist in the neural network, every value coming from the first layer (represented by nn.Linear(in_features=12, out_features=5)) will have to go through the “ReLU” test.
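To watch that test happen, the sketch below indexes into an nn.Sequential with the same layout and inspects the hidden values before and after the activation. The random batch is hypothetical and stands in for the real data.

```python
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(in_features=12, out_features=5),
    nn.ReLU(),
    nn.Linear(in_features=5, out_features=1),
)

x = torch.randn(8, 12)  # hypothetical random batch

with torch.no_grad():
    pre_activation = model[0](x)                # output of the first linear layer
    post_activation = model[1](pre_activation)  # same values after ReLU

# Negative values may appear before the activation, never after it
print((pre_activation < 0).any().item())
print((post_activation >= 0).all().item())  # True
```

Only the non-negative hidden values reach the second linear layer, which is what breaks the purely linear mapping from input to output.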

Let’s see the impact of fitting this architecture on our data:

Train and Test Accuracy through the Epochs for non-linear model (Image by Author)

Cool! Although we see some of the performance degrading after 800 epochs, this model doesn’t exhibit overfitting like the previous ones. Keep in mind that our dataset is very small, so there’s a chance that our results are better just by randomness. Nevertheless, adding activation functions to your torch models definitely has a big impact in terms of performance, training and generalization, particularly when you have a lot of data to train on.

Now that you know the power of non-linear activation functions, it’s also relevant to know:


