Introduction to PyTorch: from training loop to prediction

Image by author.

In this post we’ll cover how to implement a logistic regression model using PyTorch in Python.

PyTorch is one of the most famous and widely used deep learning frameworks among data scientists and machine learning engineers worldwide, so learning this tool is an important step in your learning path if you want to build a career in the field of applied AI.

It joins TensorFlow, another very famous deep learning framework, developed by Google.

There are no notable fundamental differences between the two, aside from the structure and organization of their APIs, which can be very different.

While both frameworks allow us to create very complex neural networks, PyTorch is often preferred because of its more pythonic style and the freedom it gives the developer to integrate custom logic into the software.

We’ll use the Sklearn breast cancer dataset, an open source dataset already used in some of my previous articles, to train a binary classification model.

The goal is to explain how to:

  • go from a pandas dataframe to PyTorch’s Datasets and DataLoaders
  • create a neural network for binary classification in PyTorch
  • create predictions
  • evaluate the performance of our model with utility functions and matplotlib
  • use this network to make predictions

By the end of this article we will have a clear idea of how to create a neural network in PyTorch and how the training loop works.

Let’s start!

We start our project by creating a virtual environment in a dedicated folder.

Visit this link to learn how to create a virtual environment with Conda.

Once our virtual environment has been created, we can run the command

$ pip install torch -U

in the terminal. This command will install the latest version of PyTorch, which as of this writing is version 2.0.

Starting a notebook, we can check the library version using torch.__version__ after doing import torch.

We can confirm that PyTorch is correctly installed in the environment by importing it and running a small test script, as shown in the official guide.

import torch

x = torch.rand(5, 3)
print(x)

>>> tensor([[0.3890, 0.6087, 0.2300],
[0.1866, 0.4871, 0.9468],
[0.2254, 0.7217, 0.4173],
[0.1243, 0.1482, 0.6797],
[0.2430, 0.4608, 0.8886]])

If the script executes correctly, we’re ready to proceed with the project. Otherwise I suggest the reader consult the official guide located here https://pytorch.org/get-started/locally/.

Let’s proceed with the installation of the extra dependencies:

  • Sklearn; pip install scikit-learn
  • Pandas; pip install pandas
  • Matplotlib; pip install matplotlib

Libraries like NumPy are installed automatically as dependencies of the packages above.

Let’s start by importing the installed libraries and the breast cancer dataset from Sklearn with the following code snippet

import torch
import pandas as pd
import numpy as np

from sklearn.datasets import load_breast_cancer

import matplotlib.pyplot as plt

breast_cancer_dataset = load_breast_cancer(as_frame=True, return_X_y=True)

Let’s create a dataframe dedicated to holding our X and y like this

df = breast_cancer_dataset[0]
df['target'] = breast_cancer_dataset[1]
Example of the dataframe. Image by author.

Our goal is to create a model that can predict the target column based on the characteristics in the other columns.

Let’s do a minimum of exploratory analysis to get some awareness of the dataset. We’ll use the sweetviz library to automatically create an analysis report.

We can install sweetviz with the command pip install sweetviz and create an EDA (exploratory data analysis) report with this piece of code

import sweetviz as sv

eda_report = sv.analyze(df)
eda_report.show_notebook()  # render the report inside the notebook

Sweetviz analyzing our dataset. Image by author.

Sweetviz will create a report right in our notebook for us to explore.

“Association” tab in Sweetviz. Image by author.

We see how several columns are highly associated with a value of 0 or 1 in our target column.

Since the dataset is multidimensional and its variables have different distributions, a neural network is a valid choice to model this data. That said, this dataset can also be modeled by simpler models, such as decision trees.

We’ll now import two other libraries in order to visualize the dataset. We will use PCA (Principal Component Analysis) from Sklearn and Seaborn to visualize the multidimensional dataset.

PCA will help us compress the large number of variables into just two, which we’ll use as the X and Y axes in a Seaborn scatterplot. Seaborn takes an additional parameter called hue to color the dots based on an additional variable. We’ll use our target.

import seaborn as sns
from sklearn import decomposition

pca = decomposition.PCA(n_components=2)

X = df.drop("target", axis=1).values
y = df['target'].values

vecs = pca.fit_transform(X)
x0 = vecs[:, 0]
x1 = vecs[:, 1]

sns.scatterplot(x=x0, y=x1, hue=y)
plt.title("PCA Projection")
plt.xlabel("PCA 1")
plt.ylabel("PCA 2")

PCA projection of the breast cancer dataset. Image by author.

We see how class 1 data points group together based on common characteristics. It will be the goal of our neural network to classify the rows as target 0 or 1.
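To get a feeling for how much information the two components retain, we can look at PCA’s explained_variance_ratio_ attribute. The snippet below is a minimal sketch and not part of the original walkthrough: the names X_demo, X_std and pca_demo are made up for illustration, and it standardizes the features first, which is an assumption on my side because the plot above works on the raw values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# load the features only; the target is not needed for PCA
X_demo, _ = load_breast_cancer(return_X_y=True)

# PCA on raw values is dominated by the columns with the largest scale,
# so this sketch standardizes first (assumption: not done for the plot above)
X_std = StandardScaler().fit_transform(X_demo)

pca_demo = PCA(n_components=2)
components = pca_demo.fit_transform(X_std)

# fraction of the total variance captured by each of the two components
ratios = pca_demo.explained_variance_ratio_
print(components.shape, ratios, ratios.sum())
```

If the two ratios sum to a value well below 1, the scatterplot hides part of the structure of the data, so it should be read as a rough picture rather than a faithful one.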

PyTorch provides Dataset and DataLoader objects to allow us to efficiently organize and load our data into the neural network.

It would be possible to use pandas directly, but this would have disadvantages because it would make our code less efficient.

The Dataset class allows us to specify the right format for our data and apply the retrieval and transformation logic that is often fundamental (think of the data augmentation applied to images).

Let’s see how to create a PyTorch Dataset object.

from torch.utils.data import Dataset

class BreastCancerDataset(Dataset):
    def __init__(self, X, y):
        # create feature tensors
        self.features = torch.tensor(X, dtype=torch.float32)
        # create label tensors (float, because the binary cross-entropy loss used later expects float targets)
        self.labels = torch.tensor(y, dtype=torch.float32)

    def __len__(self):
        # we define a method to retrieve the length of the dataset
        return self.features.shape[0]

    def __getitem__(self, idx):
        # mandatory override of the __getitem__ method, which lets us index our data
        x = self.features[idx]
        y = self.labels[idx]
        return x, y

This is a class that inherits from Dataset and allows the DataLoader, which we’ll create shortly, to efficiently retrieve batches of data.

The class takes X and y as input.
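To make the role of __len__ and __getitem__ concrete, here is a small sketch on made-up data. TinyDataset is a hypothetical stand-in with the same structure as the class above, and X_toy and y_toy are invented arrays, not the breast cancer data.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class TinyDataset(Dataset):
    """Hypothetical stand-in with the same structure as BreastCancerDataset."""

    def __init__(self, X, y):
        self.features = torch.tensor(X, dtype=torch.float32)
        self.labels = torch.tensor(y, dtype=torch.float32)

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]


# made-up data: 4 rows, 3 features
X_toy = np.arange(12).reshape(4, 3)
y_toy = np.array([0, 1, 0, 1])

toy_dataset = TinyDataset(X_toy, y_toy)
n_rows = len(toy_dataset)            # calls __len__
first_x, first_y = toy_dataset[0]    # calls __getitem__
print(n_rows, first_x, first_y)
```

These two calls are all a DataLoader needs: it asks for the length to know how many indices exist, and it fetches rows by index to assemble each batch.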

Before proceeding to the next steps, it is crucial to create training, validation and test sets.

These will help us evaluate the performance of our model and understand the quality of the predictions.

For the interested reader, I suggest reading the articles 6 Things You Should Do Before Training Your Model and What is cross-validation in machine learning to better understand why splitting our data into three partitions is an effective method for performance evaluation.

With Sklearn this becomes easy with the train_test_split function.

from sklearn import model_selection

train_ratio = 0.50
validation_ratio = 0.25  # the three ratios now sum to 1; the resulting split sizes are unchanged
test_ratio = 0.25

x_train, x_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=1 - train_ratio)
x_val, x_test, y_val, y_test = model_selection.train_test_split(x_test, y_test, test_size=test_ratio/(test_ratio + validation_ratio))

print(x_train.shape, x_val.shape, x_test.shape)

>>> (284, 30) (142, 30) (143, 30)

With this small snippet of code we created our training, validation and test sets according to controllable splits.
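The split above changes on every run and does not guarantee the same share of positive cases in each partition. As an optional variant, train_test_split accepts the random_state and stratify parameters. The sketch below uses made-up data (X_toy, y_toy) and an arbitrary seed of 42; it is meant as an illustration, not as the setup used in this article.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# made-up data: 100 rows, 5 features, 30% positive labels
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 5))
y_toy = np.array([1] * 30 + [0] * 70)

# first split: 50% train, 50% held out, keeping the class proportions
X_tr, X_hold, y_tr, y_hold = train_test_split(
    X_toy, y_toy, test_size=0.5, stratify=y_toy, random_state=42
)
# second split: halve the held-out part into validation and test
X_va, X_te, y_va, y_te = train_test_split(
    X_hold, y_hold, test_size=0.5, stratify=y_hold, random_state=42
)

print(X_tr.shape, X_va.shape, X_te.shape)
print(y_tr.mean(), y_va.mean(), y_te.mean())
```

Fixing the seed makes the experiment repeatable, and stratifying keeps the positive rate of each partition close to the 30% of the full toy dataset.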

When doing deep learning, even for a simple task like binary classification, it is almost always necessary to normalize our data.

Normalizing means bringing all the values of the various columns in the dataset to the same numerical scale. This helps the neural network converge more effectively and thus make accurate predictions faster.

We’ll use Sklearn’s StandardScaler.

from sklearn import preprocessing

scaler = preprocessing.StandardScaler()

x_train_scaled = scaler.fit_transform(x_train)
x_val_scaled = scaler.transform(x_val)
x_test_scaled = scaler.transform(x_test)

Notice how fit_transform is applied only to the training set, while transform is applied to the other two datasets. This is to avoid data leakage, which occurs when information from our validation or test set is unintentionally leaked into our training set. We want our training set to be the only source of learning, unaffected by test data.
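A tiny example makes the difference between fit_transform and transform visible. The arrays train and test below are made up, and scaler_demo is a throwaway name so as not to clash with the scaler used in the article.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# made-up single-feature data
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[10.0]])

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train)  # learns mean and std from train only
test_scaled = scaler_demo.transform(test)        # reuses the training statistics

# the same result by hand: (x - train_mean) / train_std
manual = (test - train.mean()) / train.std()
print(scaler_demo.mean_, scaler_demo.scale_, test_scaled, manual)
```

The test value is scaled with the mean and standard deviation of the training rows, so nothing about the test set influences the statistics the model is trained on.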

This data is now ready to be passed to the BreastCancerDataset class.

train_dataset = BreastCancerDataset(x_train_scaled, y_train)
val_dataset = BreastCancerDataset(x_val_scaled, y_val)
test_dataset = BreastCancerDataset(x_test_scaled, y_test)

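We import the DataLoader and initialize the objects.

from torch.utils.data import DataLoader

# batch_size=32 and the shuffle settings are assumed values; tune them for your own experiments
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)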

The power of the DataLoader is that it allows us to specify whether to shuffle our data and in what batch size the data should be supplied to the model. The batch size is to be considered a hyperparameter of the model and can therefore impact the results we obtain.
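To see what a DataLoader actually yields, here is a minimal sketch on made-up tensors. It uses PyTorch’s TensorDataset for brevity instead of a custom class, and the batch size of 4 is an arbitrary choice for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# made-up data: 10 rows, 3 features
features_toy = torch.arange(30, dtype=torch.float32).reshape(10, 3)
labels_toy = torch.zeros(10)

toy_loader = DataLoader(
    TensorDataset(features_toy, labels_toy), batch_size=4, shuffle=False
)

# collect the shape of every batch of features the loader yields
batch_shapes = [tuple(xb.shape) for xb, yb in toy_loader]
print(len(toy_loader), batch_shapes)  # 3 [(4, 3), (4, 3), (2, 3)]
```

Ten rows with a batch size of 4 give two full batches and a final smaller one, which is why the code in a training loop should not assume that every batch has the same number of rows.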

Creating a model in PyTorch might sound complex, but it really only requires understanding a few basic concepts.

  1. When writing a model in PyTorch, we’ll use an object-oriented approach, like with datasets. This means that we’ll create a class like class MyModel which inherits from PyTorch’s nn.Module class.
  2. PyTorch is an automatic differentiation library. This means that when we write a neural network trained with the backpropagation algorithm, the derivatives of the loss with respect to the weights are calculated automatically behind the scenes. This requires writing some dedicated code, which can get confusing the first time around.
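The second point can be illustrated with a toy example that is not tied to our model: a single made-up parameter w and a made-up loss whose derivative we can check by hand.

```python
import torch

# a single trainable value, tracked by autograd
w = torch.tensor(3.0, requires_grad=True)

loss = (w - 1.0) ** 2   # toy loss, minimal at w = 1
loss.backward()          # autograd computes d(loss)/dw = 2 * (w - 1)

grad_value = w.grad.item()
print(grad_value)  # 4.0
```

We never wrote the derivative ourselves: calling backward() fills w.grad, and an optimizer then uses that value to update the parameter.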

I advise the reader who wants to learn the basics of how neural networks work to consult the article Introduction to neural networks — weights, biases and activation

That said, let’s see what the code for writing a logistic regression model looks like.

from torch import nn

class LogisticRegression(nn.Module):
    """
    Our neural network accepts num_features and num_classes.

    num_features: number of features to learn from
    num_classes: number of outputs to expect (1 here: a single output is enough because the target is binary, 0 or 1)
    """

    def __init__(self, num_features, num_classes):
        super().__init__()  # run the __init__ method of nn.Module

        self.num_features = num_features
        self.num_classes = num_classes

        # create a single layer of neurons on which to apply the logistic regression
        self.linear1 = nn.Linear(in_features=num_features, out_features=num_classes)

    def forward(self, x):
        logits = self.linear1(x)  # pass our data through the layer
        probs = torch.sigmoid(logits)  # apply a sigmoid to obtain the probability of belonging to class 1
        return probs  # return probabilities

Our class inherits from nn.Module. This class provides the methods behind the scenes that make the model work.

__init__ method

The __init__ method of a class contains the logic that runs when instantiating the class in Python. Here we pass two arguments: the number of features and the number of classes to predict.

num_features corresponds to the number of columns that make up our dataset minus our target variable, while num_classes corresponds to the number of outputs that the neural network must return.

In addition to the two arguments and the corresponding instance attributes, we see the super().__init__() line. It calls the __init__ method of the parent class, which gives our model the functionality of nn.Module.

Still in the __init__ block, we implement a linear layer called self.linear1, which takes as arguments the number of features and the number of outputs to return.
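To see what such a linear layer contains, we can inspect a standalone nn.Linear with 30 inputs (the number of feature columns in our dataset) and 1 output. The variable names below are chosen only for this illustration.

```python
import torch
from torch import nn

layer = nn.Linear(in_features=30, out_features=1)

# one weight per input feature, plus one bias
weight_shape = tuple(layer.weight.shape)   # (1, 30)
bias_shape = tuple(layer.bias.shape)       # (1,)
n_params = sum(p.numel() for p in layer.parameters())  # 31

# a batch of 8 rows with 30 features each gives 8 outputs
out = layer(torch.zeros(8, 30))
print(weight_shape, bias_shape, n_params, tuple(out.shape))
```

The whole logistic regression therefore has only 31 trainable numbers, which helps explain why it trains so quickly.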

forward() method

By writing the forward method we tell Python to override the method of the same name inside PyTorch’s nn.Module parent class. In fact, this method is called when performing a forward pass, that is, when our data passes from one layer to another.

forward accepts the input x, which contains the features on which the model will calibrate its performance.

The input passes through the first layer, creating the logits variable. The logits are the network’s raw outputs, not yet converted into probabilities by the final activation function, which in this case is a sigmoid. In other words, they are the internal representation of the neural network before being mapped by a function that allows it to be interpreted.

In this case the sigmoid function maps the logits to probabilities between 0 and 1. If the probability is lower than 0.5, the predicted class will be 0, otherwise it will be 1. The sigmoid is applied in the line probs = torch.sigmoid(logits).
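The relationship between logits, probabilities and classes can be checked on three made-up logits. The sketch below also shows that cutting the probability at 0.5 gives the same classes as cutting the logit at 0, since the sigmoid of 0 is exactly 0.5.

```python
import torch

logits_toy = torch.tensor([-2.0, 0.0, 2.0])
probs_toy = torch.sigmoid(logits_toy)   # 1 / (1 + exp(-z))

# thresholding the probabilities at 0.5 ...
classes_from_probs = torch.where(probs_toy > 0.5, 1, 0)
# ... gives the same classes as thresholding the logits at 0
classes_from_logits = torch.where(logits_toy > 0, 1, 0)

print(probs_toy, classes_from_probs, classes_from_logits)
```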

Let’s create two utility functions to use in the training loop that we’ll see shortly. They are used to compute the accuracy at the end of each epoch and to display the performance curves at the end of the training.

def compute_accuracy(model, dataloader):
    """
    This function puts the model in evaluation mode (model.eval()) and calculates the accuracy with respect to the input dataloader
    """
    model = model.eval()
    correct = 0
    total_examples = 0
    for idx, (features, labels) in enumerate(dataloader):
        with torch.no_grad():
            probs = model(features)  # the model already applies the sigmoid
        predictions = torch.where(probs > 0.5, 1, 0)
        lab = labels.view(predictions.shape)
        comparison = lab == predictions

        correct += torch.sum(comparison).item()
        total_examples += len(comparison)
    return correct / total_examples

def plot_results(train_loss, val_loss, train_acc, val_acc):
    """This function takes lists of values and creates side-by-side graphs to show training and validation performance"""
    fig, ax = plt.subplots(1, 2, figsize=(15, 5))
    # panel layout is assumed: loss on the left, accuracy on the right
    ax[0].plot(train_loss, label="train", color="red", linestyle="--", linewidth=2, alpha=0.5)
    ax[0].plot(val_loss, label="val", color="blue", linestyle="--", linewidth=2, alpha=0.5)
    ax[0].set_title("Loss")
    ax[0].legend()
    ax[1].plot(train_acc, label="train", color="red", linestyle="--", linewidth=2, alpha=0.5)
    ax[1].plot(val_acc, label="val", color="blue", linestyle="--", linewidth=2, alpha=0.5)
    ax[1].set_title("Accuracy")
    ax[1].legend()
    plt.show()

Now we come to the part where most deep learning newcomers struggle: the PyTorch training loop.

Let’s look at the code and then comment on it

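import torch.nn.functional as F

model = LogisticRegression(num_features=x_train_scaled.shape[1], num_classes=1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

num_epochs = 10

train_losses, val_losses = [], []
train_accs, val_accs = [], []

for epoch in range(num_epochs):

    model = model.train()
    t_loss_list, v_loss_list = [], []
    for batch_idx, (features, labels) in enumerate(train_loader):

        train_probs = model(features)
        # binary_cross_entropy expects float targets with the same shape as the predictions
        train_loss = F.binary_cross_entropy(train_probs, labels.view(train_probs.shape).float())

        optimizer.zero_grad()   # reset the gradients of the previous batch
        train_loss.backward()   # backpropagation: compute the new gradients
        optimizer.step()        # update the weights

        if batch_idx % 10 == 0:
            print(
                f"Epoch {epoch+1:02d}/{num_epochs:02d}"
                f" | Batch {batch_idx:02d}/{len(train_loader):02d}"
                f" | Train Loss {train_loss:.3f}"
            )

        t_loss_list.append(train_loss.item())

    model = model.eval()
    for batch_idx, (features, labels) in enumerate(val_loader):
        with torch.no_grad():
            val_probs = model(features)
            val_loss = F.binary_cross_entropy(val_probs, labels.view(val_probs.shape).float())
            v_loss_list.append(val_loss.item())

    # average the batch losses to get one value per epoch
    train_losses.append(np.mean(t_loss_list))
    val_losses.append(np.mean(v_loss_list))

    train_acc = compute_accuracy(model, train_loader)
    val_acc = compute_accuracy(model, val_loader)

    train_accs.append(train_acc)
    val_accs.append(val_acc)

    print(
        f"Train accuracy: {train_acc:.2f}"
        f" | Val accuracy: {val_acc:.2f}"
    )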

Unlike TensorFlow, whose Keras API offers a ready-made fit method, PyTorch requires us to write the training loop in pure Python.

Let’s see the procedure step-by-step:

  1. We instantiate the model and the optimizer
  2. We decide on a number of epochs
  3. We create a for loop that iterates through the epochs
  4. For each epoch, we set the model to training mode with model.train() and cycle through the train_loader
  5. For each batch of the train_loader, we calculate the loss, reset the gradients to 0 with optimizer.zero_grad(), compute the new gradients with loss.backward() and update the weights of the network with optimizer.step()

At this point the training loop is complete, and if you want you can apply the same logic to the validation dataloader, as written in the code.
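The five steps can be condensed into a stripped-down loop that runs on its own. Everything in this sketch is made up for illustration: the data (X_toy, y_toy) is random and linearly separable, the whole dataset is used as a single batch, and the learning rate of 0.5 and the 50 epochs are arbitrary choices.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# made-up, linearly separable data: the label is 1 when the feature sum is positive
X_toy = torch.randn(200, 2)
y_toy = (X_toy.sum(dim=1) > 0).float().view(-1, 1)

# step 1: model and optimizer
toy_model = nn.Linear(2, 1)
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.5)

losses = []
for epoch in range(50):                           # steps 2 and 3: loop over the epochs
    toy_model.train()                             # step 4: training mode
    probs = torch.sigmoid(toy_model(X_toy))       # forward pass
    loss = F.binary_cross_entropy(probs, y_toy)   # step 5: compute the loss
    toy_optimizer.zero_grad()                     # clear the old gradients
    loss.backward()                               # backpropagation
    toy_optimizer.step()                          # update the weights
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The last loss should be clearly lower than the first one. The order of the three optimizer-related calls matters: gradients are cleared, recomputed for the current batch, and only then used for the update.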

Here is the result of the training after running this code

Training in progress. Image by author.

We use the previously created utility function to plot loss in training and validation.

plot_results(train_losses, val_losses, train_accs, val_accs)
Performance of the neural network. Image by author.

Our binary classification model quickly converges to high accuracy, and we see how the loss drops at the end of each epoch.

The dataset turns out to be easy to model, and the low number of examples doesn’t help us see a more gradual convergence towards high performance by the network.

I emphasize that it is possible to integrate TensorBoard into PyTorch in order to log performance metrics automatically across the various experiments.

We have reached the end of this guide. Let’s see the code to create predictions for our entire dataset.

# we transform all our features with the scaler
X_scaled_all = scaler.transform(X)

# convert to tensors
X_scaled_all_tensors = torch.tensor(X_scaled_all, dtype=torch.float32)

# we set the model in inference mode and create the predictions
with torch.inference_mode():
    probs = model(X_scaled_all_tensors)  # the model returns probabilities, not logits
    predictions = torch.where(probs > 0.5, 1, 0)

df['predictions'] = predictions.numpy().flatten()

Now let’s import the metrics package from Sklearn which allows us to quickly calculate the confusion matrix and classification report directly on our pandas dataframe.

from sklearn import metrics
from pprint import pprint

pprint(metrics.classification_report(y_pred=df.predictions, y_true=df.target))

Summary of performance on the entire dataset with a classification report. Image by author.

And the confusion matrix, which shows the number of correct predictions on the diagonal

metrics.confusion_matrix(y_pred=df.predictions, y_true=df.target)

>>> array([[197, 15],
[ 13, 344]])
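The main figures of the classification report can be recomputed by hand from this matrix. The sketch below hard-codes the four numbers shown above and relies on Sklearn’s convention that rows are the true classes and columns the predicted ones; the variable names are chosen for illustration.

```python
import numpy as np

# confusion matrix reported above: rows are true classes, columns are predictions
cm = np.array([[197, 15],
               [13, 344]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all rows classified correctly
precision = tp / (tp + fp)        # of the rows predicted as 1, how many are truly 1
recall = tp / (tp + fn)           # of the rows that are truly 1, how many were found

print(round(accuracy, 3), round(precision, 3), round(recall, 3))  # 0.951 0.958 0.964
```

Keep in mind that these numbers are computed on the entire dataset, training rows included, so they are more optimistic than a score on the test set alone.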

Here’s a small function to create a classification line that separates the classes within the PCA graph

def plot_boundary(model):
    # note: only the first two weights of the model are used here
    w1 = model.linear1.weight[0][0].detach()
    w2 = model.linear1.weight[0][1].detach()
    b = model.linear1.bias[0].detach()

    # solve w1 * x1 + w2 * x2 + b = 0 for x2 at two distant values of x1
    x1_min = -1000
    x2_min = (-(w1 * x1_min) - b) / w2

    x1_max = 1000
    x2_max = (-(w1 * x1_max) - b) / w2

    return x1_min, x1_max, x2_min, x2_max

x1_min, x1_max, x2_min, x2_max = plot_boundary(model)

sns.scatterplot(x=x0, y=x1, hue=y)
plt.title("PCA Projection")
plt.xlabel("PCA 1")
plt.ylabel("PCA 2")
plt.plot([x1_min, x1_max], [x2_min, x2_max], color="k", label="Classification", linestyle="--")

And here’s how the model separates benign from malignant cells

Classification boundary visualized. Image by author.
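The formula inside plot_boundary comes from setting the logit to zero: for a model with two inputs, the boundary is the set of points where w1 * x1 + w2 * x2 + b = 0, which is where the sigmoid returns exactly 0.5. The following sketch checks this with hypothetical weights (w1, w2 and b are invented values, not the trained ones).

```python
import math

# hypothetical weights and bias for a two-feature logistic regression
w1, w2, b = 0.8, -1.5, 0.3


def boundary_x2(x1):
    """Solve w1 * x1 + w2 * x2 + b = 0 for x2."""
    return (-(w1 * x1) - b) / w2


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# any point on the line should get a probability of 0.5
x1_point = 2.0
x2_point = boundary_x2(x1_point)
prob_on_line = sigmoid(w1 * x1_point + w2 * x2_point + b)
print(x2_point, prob_on_line)
```

Points on one side of the line get a probability above 0.5 and are assigned to class 1, while points on the other side are assigned to class 0.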

In this article we have seen how to create a binary classification model with PyTorch, starting from a Pandas dataframe.

We’ve seen what the training loop looks like, how to evaluate the model, and how to create predictions and visualizations to help interpretation.

With PyTorch it is possible to create very complex neural networks. Just think that Tesla, the manufacturer of AI-powered electric cars, uses PyTorch to create its models.

For those who want to start their deep learning journey, learning PyTorch as early as possible becomes a high priority task, because it allows you to build important technologies that can solve complex data-driven problems.


