Before LLMs became hyped, there was a clear divide separating Machine Learning frameworks from Deep Learning frameworks.
The conversation usually pointed to Scikit-Learn, XGBoost, and similar libraries for ML, while PyTorch and TensorFlow dominated the scene when Deep Learning was the matter.
After the AI explosion, though, I have been seeing PyTorch dominate the scene far more than TensorFlow. Both frameworks are really powerful, enabling Data Scientists to solve different kinds of problems, Natural Language Processing being one of them, therefore increasing the popularity of Deep Learning once again.
Well, in this post, my idea is not to talk about NLP. Instead, I'll work with a multivariable linear regression problem with two objectives in mind:
- Teaching how to create a model using PyTorch
- Sharing knowledge about Linear Regression that is not always present in other tutorials.
Let’s dive in.
Preparing the Data
Alright, let me spare you a fancy definition of Linear Regression. You have most likely seen that over and over in countless tutorials all over the Web. So, suffice it to say that Linear Regression models the relationship between a set of explanatory variables and a continuous target by fitting the line (or hyperplane) that minimizes the squared errors.
Dataset
For this exercise, let’s use the Abalone dataset [1].
Nash, W., Sellers, T., Talbot, S., Cawthorn, A., & Ford, W. (1994). Abalone [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C55C7W.
According to the dataset documentation, the age of an abalone is determined by cutting the shell and counting the number of rings, a tedious process, so the task is to predict the number of Rings from other, easier-to-obtain physical measurements.
So, let us go ahead and load the data. Moreover, we will One-Hot Encode the variable Sex, since it is the only categorical one.
# Data Load
from ucimlrepo import fetch_ucirepo
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
from feature_engine.encoding import OneHotEncoder
# fetch dataset
abalone = fetch_ucirepo(id=1)
# data (as pandas dataframes)
X = abalone.data.features
y = abalone.data.targets
# One Hot Encode Sex
ohe = OneHotEncoder(variables=['Sex'])
X = ohe.fit_transform(X)
# View
df = pd.concat([X,y], axis=1)
Here’s the dataset.
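If you are reproducing this locally, a quick head() call shows the same view (the original post displays it as a table image):
# Quick look at the combined dataframe
print(df.shape)
df.head()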
So, in order to create a better model, let's explore the data.
Exploring the Data
The first steps I like to perform when exploring a dataset are:
1. Checking the target variable's distribution.
# Looking at our Target variable
plt.hist(y)
plt.title('Rings [Target Variable] Distribution');
The plot shows that the target variable is not normally distributed. That can impact the regression, but it can usually be corrected with a power transformation, such as log or Box-Cox.
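As a quick sketch of what such a transformation could look like — we apply np.log later in this post; Box-Cox via scipy.stats.boxcox is an alternative that is not used here:
# Log transform (used later in the post)
y_log = np.log(df['Rings'])
# Box-Cox alternative (requires strictly positive values; lambda is estimated from the data)
from scipy.stats import boxcox
y_bc, lam = boxcox(df['Rings'])
print(f'Estimated Box-Cox lambda: {lam:.3f}')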

2. Look at the statistical description.
The stats show us important information like mean and standard deviation, and let us easily spot discrepancies in the minimum or maximum values. The explanatory variables are pretty much okay, within a small range and on the same scale. The target variable (Rings) is on a different scale.
# Statistical description
df.describe()

Next, let’s check the correlations.
# Looking at the correlations
(df
.drop(['Sex_M', 'Sex_I', 'Sex_F'],axis=1)
.corr()
.style
.background_gradient(cmap='coolwarm')
)

The explanatory variables have a moderate to strong correlation with Rings. We can also see that there is some collinearity between Whole_weight and Shucked_weight, Viscera_weight, and Shell_weight. Length and Diameter are also collinear. We can test removing them later.
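One way to quantify that collinearity is the Variance Inflation Factor (VIF). This is a sketch using statsmodels, which is not part of the original code:
# VIF: values above ~10 are a common rule of thumb for problematic collinearity
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
X_num = df.drop(['Rings', 'Sex_M', 'Sex_I', 'Sex_F'], axis=1)
X_const = sm.add_constant(X_num)
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(1, X_const.shape[1])],
    index=X_num.columns
)
print(vif.sort_values(ascending=False))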
sns.pairplot(df);
When we plot the pairwise scatterplots and look at the relationship of the variables with Rings, we can quickly identify some problems:
- The assumption of homoscedasticity is violated. This means that the relationship is not homogeneous in terms of variance. Look how the plots form a cone shape, with the variance of Y increasing as the X values increase. When estimating the value of Rings for higher values of the X variables, the estimate will not be very accurate.
- The variable Height has at least two outliers that are clearly visible in the scatterplots.

Removing the outliers and transforming the target variable to logarithms results in the next pairs plot. It is better, but it still does not solve the homoscedasticity problem.
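For reference, that plot can be reproduced with roughly the same filters used later in the modeling section (Height < 0.3 and Rings > 2) plus a log transform — a sketch:
# Drop the visible outliers and log-transform the target before re-plotting the pairs
df_plot = df.query('Height < 0.3 and Rings > 2').copy()
df_plot['Rings'] = np.log(df_plot['Rings'])
sns.pairplot(df_plot);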

Another quick exploration we can do is plotting some graphs to check the relationship of the variables when grouped by the Sex variable.
The variable Diameter has the most linear relationship when Sex=I, but that is about it.
# Create a FacetGrid with scatterplots
sns.lmplot(x="Diameter", y="Rings", hue="Sex", col="Sex", order=2, data=df);

On the other hand, Shell_weight shows too much dispersion for high values, distorting the linear relationship.
# Create a FacetGrid with scatterplots
sns.lmplot(x="Shell_weight", y="Rings", hue="Sex", col="Sex", data=df);

All of this shows that a Linear Regression model will have a hard time with this dataset and will probably fall short. But we still want to do it.
By the way, I don't remember seeing a post where we actually go through what went wrong. So, by doing this, we will also learn valuable lessons.
Modeling: Using Scikit-Learn
Let’s run the sklearn model and evaluate it using Root Mean Squared Error.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import root_mean_squared_error
df2 = df.query('Height < 0.3 and Rings > 2 ').copy()
X = df2.drop(['Rings'], axis=1)
y = np.log(df2['Rings'])
lr = LinearRegression()
lr.fit(X, y)
predictions = lr.predict(X)
df2['Predictions'] = np.exp(predictions)
print(root_mean_squared_error(df2['Rings'], df2['Predictions']))
2.2383762717104916
If we look at the first rows, we can confirm that the model struggles with estimates for higher values (e.g., rows 0, 6, 7, and 9).
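To reproduce that check, something like this will show the actual and predicted values side by side:
# Compare actual vs. predicted for the first rows
df2[['Rings', 'Predictions']].head(10)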

One Step Back: Trying Other Transformations
Alright. So what can we do now?
We could probably remove more outliers and try again. Let's use an unsupervised algorithm to find some more of them. We will apply the Local Outlier Factor, dropping 5% of the observations as outliers.
We will also remove the multicollinearity, dropping Whole_weight and Length.
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# fetch dataset
abalone = fetch_ucirepo(id=1)
# data (as pandas dataframes)
X = abalone.data.features
y = abalone.data.targets
# One Hot Encode Sex
ohe = OneHotEncoder(variables=['Sex'])
X = ohe.fit_transform(X)
# Drop Whole_weight and Length (multicollinearity)
X.drop(['Whole_weight', 'Length'], axis=1, inplace=True)
# View
df = pd.concat([X,y], axis=1)
# Let's create a Pipeline to scale the data and find outliers using Local Outlier Factor
steps = [
('scale', StandardScaler()),
('LOF', LocalOutlierFactor(contamination=0.05))
]
# Fit and predict
outliers = Pipeline(steps).fit_predict(X)
# Add column
df['outliers'] = outliers
# Modeling
df2 = df.query('Height < 0.3 and Rings > 2 and outliers != -1').copy()
X = df2.drop(['Rings', 'outliers'], axis=1)
y = np.log(df2['Rings'])
lr = LinearRegression()
lr.fit(X, y)
predictions = lr.predict(X)
df2['Predictions'] = np.exp(predictions)
print(root_mean_squared_error(df2['Rings'], df2['Predictions']))
2.238174395913869
Same result. Hmm…
Okay. We can keep fiddling with the variables and feature engineering, and we will start seeing some improvements here and there, like when we add the squares of Height, Diameter, and Shell_weight. That, combined with the outlier treatment, drops the RMSE to 2.196.
# Second Order Variables
X['Diameter_2'] = X['Diameter'] ** 2
X['Height_2'] = X['Height'] ** 2
X['Shell_2'] = X['Shell_weight'] ** 2
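For completeness, here is a sketch of re-fitting with those extra features and recomputing the RMSE (same X and y as in the previous block):
# Refit the linear model with the second-order features included
lr = LinearRegression()
lr.fit(X, y)
df2['Predictions'] = np.exp(lr.predict(X))
print(root_mean_squared_error(df2['Rings'], df2['Predictions']))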
Actually, it is fair to note that every variable added to a Linear Regression model will impact the R² and sometimes inflate the result, giving the false impression that the model is improving when it is not. In this case, the model is actually improving, since the second-order variables add some non-linear components to it. We can confirm that by calculating the adjusted R², which went from 0.495 to 0.517.
# Adjusted R²
from sklearn.metrics import r2_score
r2 = r2_score(df2['Rings'], df2['Predictions'])
n = df2.shape[0]
p = X.shape[1]  # number of predictors used in the fit
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f'R²: {r2}')
print(f'Adjusted R²: {adj_r2}')
On the other hand, bringing back Whole_weight and Length can improve the numbers slightly more, but I would not recommend it. If we do that, we are adding multicollinearity and inflating the importance of some variables' coefficients, leading to potential estimation errors in the future.
Modeling: Using PyTorch
Okay. Now that we have a base model, the idea is to create a linear model using Deep Learning and try to beat the RMSE of 2.196.
Right. To begin, let me state this upfront: Deep Learning models work better with scaled data. However, as our X variables are all on the same scale, we won't need to worry about that. So let's keep moving.
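If the features were on very different scales, a scaling step would be the usual fix. A minimal sketch of what that would look like, using the StandardScaler imported earlier (not applied in this post):
# Not needed here, but this is how scaling would fit in before building the tensors
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)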
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
We need to prepare the data for modeling with PyTorch. Here, we need a few adjustments to make the data acceptable to the PyTorch framework, since it won't take regular pandas dataframes.
- Let's use the same data frame from our base model.
- Split X and Y.
- Transform the Y variable to log.
- Transform both to NumPy arrays, since PyTorch won't take dataframes.
df2 = df.query('Height < 0.3 and Rings > 2 and outliers != -1').copy()
X = df2.drop(['Rings', 'outliers'], axis=1)
y = np.log(df2[['Rings']])
# X and Y to Numpy
X = X.to_numpy()
y = y.to_numpy()
Next, using TensorDataset, we turn X and Y into a tensor-backed dataset and print one sample.
# Prepare with TensorDataset
# TensorDataset wraps the arrays as tensors so PyTorch can consume them
dataset = TensorDataset(torch.tensor(X).float(), torch.tensor(y).float())
input_sample, label_sample = dataset[0]
print(f'** Input sample: {input_sample},\n** Label sample: {label_sample}')
** Input sample: tensor([0.3650, 0.0950, 0.2245, 0.1010, 0.1500, 1.0000,
0.0000, 0.0000, 0.1332, 0.0090, 0.0225]),
** Label sample: tensor([2.7081])
Then, using the DataLoader class, we can create batches of data. This means the Neural Network will handle batch_size observations at a time.
# Next, let's use DataLoader
batch_size = 500
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
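To see what a single batch looks like, we can pull one from the dataloader:
# Peek at one batch: shapes are (batch_size, n_features) and (batch_size, 1)
features_batch, targets_batch = next(iter(dataloader))
print(features_batch.shape, targets_batch.shape)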
PyTorch models are best defined as classes.
- The class is based on nn.Module, which is PyTorch's base class for neural networks.
- We define the model layers we want to use in the __init__ method. super().__init__() ensures the class behaves like a torch module.
- The forward method describes what happens to the input when it is passed through the model.
Here, we pass the input through the Linear layers defined in the __init__ method and use ReLU activation functions to add some non-linearity to the model in the forward pass.
# 2. Creating a class
class AbaloneModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(in_features=X.shape[1], out_features=128)
        self.linear2 = nn.Linear(128, 64)
        self.linear3 = nn.Linear(64, 32)
        self.linear4 = nn.Linear(32, 1)

    def forward(self, x):
        x = self.linear1(x)
        x = nn.functional.relu(x)
        x = self.linear2(x)
        x = nn.functional.relu(x)
        x = self.linear3(x)
        x = nn.functional.relu(x)
        x = self.linear4(x)
        return x

# Instantiate model
model = AbaloneModel()
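A quick sanity check that the untrained network accepts our data and returns one value per sample:
# Forward one sample through the untrained model (output is meaningless before training)
with torch.no_grad():
    print(model(input_sample.unsqueeze(0)))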
Next, let's exercise the model for the first time using a script that simulates a Random Search.
- Create an error criterion for model evaluation.
- Create a list to hold the data from each iteration and set best_loss to a high value, so it gets replaced by better (lower) loss values during the search.
- Set up the range for the learning rate. We will sample exponents from -2 to -5 (i.e., learning rates from 0.01 down to 0.00001).
- Set up a range for the momentum from 0.90 to 0.99.
- Get the data.
- Zero the gradients to clear calculations from previous iterations.
- Fit the model.
- Compute the loss and register the best model's numbers.
- Compute the gradients of the weights and biases with the backward pass.
- Iterate N times and print the best model.
# Mean Squared Error (MSE) is standard for regression
criterion = nn.MSELoss()

# Random Search
values = []
best_loss = 999

for idx in range(1000):
    # Randomly sample a learning rate exponent between 2 and 5
    factor = np.random.uniform(2, 5)
    lr = 10 ** -factor
    # Randomly select a momentum between 0.90 and 0.99
    momentum = np.random.uniform(0.90, 0.99)
    # 1. Get Data
    feature, target = dataset[:]
    # 2. Zero Gradients: Clear old gradients before the backward pass
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    optimizer.zero_grad()
    # 3. Forward Pass: Compute prediction
    y_pred = model(feature)
    # 4. Compute Loss
    loss = criterion(y_pred, target)
    # 4.1 Register best loss and hyperparameters
    if loss < best_loss:
        best_loss = loss
        best_lr = lr
        best_momentum = momentum
        best_idx = idx
    # 5. Backward Pass: Compute gradient of the loss w.r.t. W and b
    loss.backward()
    # 6. Update Parameters: Adjust W and b using the calculated gradients
    optimizer.step()
    values.append([idx, lr, momentum, loss])
    print(f'n: {idx},lr: {lr}, momentum: {momentum}, loss: {loss}')
n: 999,lr: 0.004782946959508322, momentum: 0.9801209929050066, loss: 0.06135804206132889
Once we have the best learning rate and momentum, we can move on.
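The loop above already keeps track of the winning combination, so we can simply print it:
# Best hyperparameters found by the random search
print(f'Best iteration: {best_idx}, lr: {best_lr}, momentum: {best_momentum}, loss: {best_loss}')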
# --- 3. Loss Function and Optimizer ---
# Mean Squared Error (MSE) is standard for regression
criterion = nn.MSELoss()
# Stochastic Gradient Descent (SGD) with a small learning rate (lr)
optimizer = optim.SGD(model.parameters(), lr=0.004, momentum=0.98)
Then, we will re-train this model, using the same steps as before, but this time keeping the learning rate and momentum fixed.
Fitting a PyTorch model requires a longer script than the familiar fit() method from Scikit-Learn, but it is not a big deal. The structure will always be similar to these steps:
- Activate training mode with model.train().
- Create a loop for the number of iterations you want. Each iteration is called an epoch.
- Get the batches from the dataloader.
- Zero the gradients from previous passes with optimizer.zero_grad().
- Compute the predictions with model(X).
- Calculate the loss using criterion(y_pred, target).
- Do the backward pass to compute the gradients of the weights and biases: loss.backward().
- Update the weights and biases with optimizer.step().
We will train this model for 1000 epochs (iterations). Here, we are only adding a step to keep the best model seen during training, so we make sure to use the model with the lowest loss.
# 4. Training
torch.manual_seed(42)
NUM_EPOCHS = 1001
loss_history = []
best_loss = 999

# Put model in training mode
model.train()

for epoch in range(NUM_EPOCHS):
    for data in dataloader:
        # 1. Get Data
        feature, target = data
        # 2. Zero Gradients: Clear old gradients before the backward pass
        optimizer.zero_grad()
        # 3. Forward Pass: Compute prediction
        y_pred = model(feature)
        # 4. Compute Loss
        loss = criterion(y_pred, target)
        loss_history.append(loss.item())
        # Get Best Model
        if loss < best_loss:
            best_loss = loss
            best_model_state = model.state_dict()  # save best model
        # 5. Backward Pass: Compute gradient of the loss w.r.t. W and b
        loss.backward()
        # 6. Update Parameters: Adjust W and b using the calculated gradients
        optimizer.step()

    # Load the best model before returning predictions
    model.load_state_dict(best_model_state)

    # Print status every 200 epochs
    if epoch % 200 == 0:
        print(epoch, loss.item())
        print(f'Best Loss: {best_loss}')
0 0.061786893755197525
Best Loss: 0.06033024191856384
200 0.036817338317632675
Best Loss: 0.03243456035852432
400 0.03307393565773964
Best Loss: 0.03077109158039093
600 0.032522525638341904
Best Loss: 0.030613820999860764
800 0.03488151729106903
Best Loss: 0.029514113441109657
1000 0.0369877889752388
Best Loss: 0.029514113441109657
Nice. The model is trained. Now it is time to evaluate it.
Evaluation
Let's check whether this model did better than the regular regression. For that, I will put the model in evaluation mode using model.eval(), so PyTorch knows it needs to switch from training behavior to inference mode. That turns off dropout and switches batch normalization to its running statistics, for instance.
# Get features
features, targets = dataset[:]
# Get Predictions
model.eval()
with torch.no_grad():
predictions = model(features)
# Add to dataframe
df2['Predictions'] = np.exp(predictions.detach().numpy())
# RMSE
print(root_mean_squared_error(df2['Rings'], df2['Predictions']))
2.1108551025390625
The improvement was modest, about 4%.
Let's look at some predictions from both models.
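The original post shows this comparison as a table. Here is a sketch of how to build it, assuming the sklearn predictions were kept in a separate, hypothetical column (e.g., Predictions_sklearn) before being overwritten:
# Hypothetical side-by-side table; 'Predictions_sklearn' is an assumed column name
comparison = df2[['Rings', 'Predictions_sklearn', 'Predictions']].rename(
    columns={'Predictions': 'Predictions_pytorch'}
)
comparison.head(10)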

Both models get very similar results. They struggle more as the number of Rings grows, which is due to the cone shape of the relationship with the target variable.
If we think that through for a moment:
- As the number of Rings increases, there is more variance coming from the explanatory variables.
- An abalone with 15 rings sits within a much wider range of measurement values than one with 4 rings.
- This confuses the model, because it has to draw a single line through the middle of data that is not that linear (see the residual sketch below).
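One way to see this in the numbers is to plot the residuals against the actual ring counts; the spread widens as Rings grows. A sketch, not part of the original post:
# Residuals vs. actual values: the spread widens as Rings grows
residuals = df2['Rings'] - df2['Predictions']
plt.scatter(df2['Rings'], residuals, alpha=0.3)
plt.xlabel('Rings')
plt.ylabel('Residual (actual - predicted)')
plt.title('Residuals vs. Rings');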
Before You Go
We learned a lot in this project:
- How to explore data.
- How to check whether a linear model would be a good option.
- How to create a PyTorch model for multivariable Linear Regression.
In the end, we saw that a target variable that is not homogeneous, even after power transformations, can lead to a low-performing model. Our model is still better than predicting the average value for every observation, but the error remains high, at about 20% of the mean value.
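To put that in perspective, here is a quick sketch comparing against the naive baseline of always predicting the mean ring count:
# RMSE of always predicting the mean; the model's ~2.1 is roughly 20% of the mean Rings value
naive = np.full(len(df2), df2['Rings'].mean())
print(root_mean_squared_error(df2['Rings'], naive))
print(df2['Rings'].mean())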
We tried to use Deep Learning to improve the result, but all that power was not enough to lower the error considerably. I would probably go with the Scikit-Learn model, since it is simpler and more explainable.
Another option to try to improve the results would be creating a custom ensemble model with a Random Forest + Linear Regression. But that is a task I leave to you, if you want.
If you liked this content, find me on my website.
https://gustavorsantos.me
GitHub Repository
The code for this exercise.
https://github.com/gurezende/Linear-Regression-PyTorch
References
[1] Abalone Dataset – UCI Machine Learning Repository, CC BY 4.0 license. https://archive.ics.uci.edu/dataset/1/abalone
[2] Eval mode. https://stackoverflow.com/questions/60018578/what-does-model-eval-do-in-pytorch and https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval
[3] PyTorch Docs. https://docs.pytorch.org/docs/stable/nn.html
[4] Kaggle Notebook. https://www.kaggle.com/code/samlakhmani/s4e4-deeplearning-with-oof-strategy
[5] GitHub Repo. https://github.com/gurezende/Linear-Regression-PyTorch
