Before LLMs became hyped, there was a clear divide separating Machine Learning frameworks from Deep Learning frameworks.
The conversation usually pointed to Scikit-Learn, XGBoost, and similar libraries for ML, while PyTorch and TensorFlow dominated the scene when Deep Learning was the matter.
After the AI explosion, though, I have been seeing PyTorch dominate the scene far more than TensorFlow. Both frameworks are really powerful, enabling Data Scientists to solve different kinds of problems, Natural Language Processing being one of them, therefore increasing the popularity of Deep Learning once again.
Well, in this post, my idea is not to talk about NLP. Instead, I'll work with a multivariable linear regression problem with two objectives in mind:
- Teaching how to create a model using PyTorch
- Sharing knowledge about Linear Regression that is not always present in other tutorials.
Let’s dive in.
Preparing the Data
Alright, let me spare you a fancy definition of Linear Regression. You have most likely seen that over and over in countless tutorials all over the Web. So, suffice it to say that Linear Regression models the relationship between a set of explanatory variables and a continuous target by fitting the line (or hyperplane) that minimizes the squared errors.
Dataset
For this exercise, let’s use the Abalone dataset [1].
Nash, W., Sellers, T., Talbot, S., Cawthorn, A., & Ford, W. (1994). Abalone [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C55C7W.
According to the dataset documentation, the age of an abalone is determined by cutting the shell and counting the number of rings, a tedious process, so the task is to predict the number of Rings from other, easier-to-obtain physical measurements.
So, let us go ahead and load the data. Moreover, we will One-Hot Encode the variable Sex, since it is the only categorical one.
# Data Load
from ucimlrepo import fetch_ucirepo
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
from feature_engine.encoding import OneHotEncoder
# fetch dataset
abalone = fetch_ucirepo(id=1)
# data (as pandas dataframes)
X = abalone.data.features
y = abalone.data.targets
# One Hot Encode Sex
ohe = OneHotEncoder(variables=['Sex'])
X = ohe.fit_transform(X)
# View
df = pd.concat([X,y], axis=1)
Here’s the dataset.
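If you are reproducing this locally, a quick head() call shows the same view (the original post displays it as a table image):
# Quick look at the combined dataframe
print(df.shape)
df.head()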
So, in order to create a better model, let's explore the data.
Exploring the Data
The first steps I like to perform when exploring a dataset are:
1. Checking the target variable's distribution.
# Looking at our Target variable
plt.hist(y)
plt.title('Rings [Target Variable] Distribution');
The plot shows that the target variable is not normally distributed. That can impact the regression, but it can usually be corrected with a power transformation, such as log or Box-Cox.
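As a quick sketch of what such a transformation could look like — we apply np.log later in this post; Box-Cox via scipy.stats.boxcox is an alternative that is not used here:
# Log transform (used later in the post)
y_log = np.log(df['Rings'])
# Box-Cox alternative (requires strictly positive values; lambda is estimated from the data)
from scipy.stats import boxcox
y_bc, lam = boxcox(df['Rings'])
print(f'Estimated Box-Cox lambda: {lam:.3f}')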

2. Look at the statistical description.
The stats show us important information like mean and standard deviation, and let us easily spot discrepancies in the minimum or maximum values. The explanatory variables are pretty much okay, within a small range and on the same scale. The target variable (Rings) is on a different scale.
# Statistical description
df.describe()

Next, let’s check the correlations.
# Looking at the correlations
(df
.drop(['Sex_M', 'Sex_I', 'Sex_F'],axis=1)
.corr()
.style
.background_gradient(cmap='coolwarm')
)

The explanatory variables have a moderate to strong correlation with Rings. We can also see that there is some collinearity between Whole_weight and Shucked_weight, Viscera_weight, and Shell_weight. Length and Diameter are also collinear. We can test removing them later.
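One way to quantify that collinearity is the Variance Inflation Factor (VIF). This is a sketch using statsmodels, which is not part of the original code:
# VIF: values above ~10 are a common rule of thumb for problematic collinearity
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
X_num = df.drop(['Rings', 'Sex_M', 'Sex_I', 'Sex_F'], axis=1)
X_const = sm.add_constant(X_num)
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(1, X_const.shape[1])],
    index=X_num.columns
)
print(vif.sort_values(ascending=False))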
sns.pairplot(df);
When we plot the pairwise scatterplots and look at the relationship of the variables with Rings, we can quickly identify some problems:
- The assumption of homoscedasticity is violated. This means that the relationship is not homogeneous in terms of variance. Look how the plots form a cone shape, with the variance of Y increasing as the X values increase. When estimating the value of Rings for higher values of the X variables, the estimate will not be very accurate.
- The variable Height has at least two outliers that are clearly visible in the scatterplots.

Removing the outliers and transforming the target variable to logarithms results in the next pairs plot. It is better, but it still does not solve the homoscedasticity problem.
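For reference, that plot can be reproduced with roughly the same filters used later in the modeling section (Height < 0.3 and Rings > 2) plus a log transform — a sketch:
# Drop the visible outliers and log-transform the target before re-plotting the pairs
df_plot = df.query('Height < 0.3 and Rings > 2').copy()
df_plot['Rings'] = np.log(df_plot['Rings'])
sns.pairplot(df_plot);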

Another quick exploration we can do is plotting some graphs to check the relationship of the variables when grouped by the Sex variable.
The variable Diameter has the most linear relationship when Sex=I, but that is about it.
# Create a FacetGrid with scatterplots
sns.lmplot(x="Diameter", y="Rings", hue="Sex", col="Sex", order=2, data=df);

On the other hand, Shell_weight shows too much dispersion for high values, distorting the linear relationship.
# Create a FacetGrid with scatterplots
sns.lmplot(x="Shell_weight", y="Rings", hue="Sex", col="Sex", data=df);

All of this shows that a Linear Regression model will have a hard time with this dataset and will probably fall short. But we still want to do it.
By the way, I don't remember seeing a post where we actually go through what went wrong. So, by doing this, we will also learn valuable lessons.
Modeling: Using Scikit-Learn
Let’s run the sklearn model and evaluate it using Root Mean Squared Error.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import root_mean_squared_error
df2 = df.query('Height < 0.3 and Rings > 2 ').copy()
X = df2.drop(['Rings'], axis=1)
y = np.log(df2['Rings'])
lr = LinearRegression()
lr.fit(X, y)
predictions = lr.predict(X)
df2['Predictions'] = np.exp(predictions)
print(root_mean_squared_error(df2['Rings'], df2['Predictions']))
2.2383762717104916
If we look at the first rows, we can confirm that the model struggles with estimates for higher values (e.g., rows 0, 6, 7, and 9).
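To reproduce that check, something like this will show the actual and predicted values side by side:
# Compare actual vs. predicted for the first rows
df2[['Rings', 'Predictions']].head(10)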

One Step Back: Trying Other Transformations
Alright. So what can we do now?
We could probably remove more outliers and try again. Let's use an unsupervised algorithm to find some more of them. We will apply the Local Outlier Factor, dropping 5% of the observations as outliers.
We will also remove the multicollinearity, dropping Whole_weight and Length.
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# fetch dataset
abalone = fetch_ucirepo(id=1)
# data (as pandas dataframes)
X = abalone.data.features
y = abalone.data.targets
# One Hot Encode Sex
ohe = OneHotEncoder(variables=['Sex'])
X = ohe.fit_transform(X)
# Drop Whole_weight and Length (multicollinearity)
X.drop(['Whole_weight', 'Length'], axis=1, inplace=True)
# View
df = pd.concat([X,y], axis=1)
# Let's create a Pipeline to scale the data and find outliers using Local Outlier Factor
steps = [
('scale', StandardScaler()),
('LOF', LocalOutlierFactor(contamination=0.05))
]
# Fit and predict
outliers = Pipeline(steps).fit_predict(X)
# Add column
df['outliers'] = outliers
# Modeling
df2 = df.query('Height < 0.3 and Rings > 2 and outliers != -1').copy()
X = df2.drop(['Rings', 'outliers'], axis=1)
y = np.log(df2['Rings'])
lr = LinearRegression()
lr.fit(X, y)
predictions = lr.predict(X)
df2['Predictions'] = np.exp(predictions)
print(root_mean_squared_error(df2['Rings'], df2['Predictions']))
2.238174395913869
Same result. Hmm…
Okay. We can keep fiddling with the variables and feature engineering, and we will start seeing some improvements here and there, like when we add the squares of Height, Diameter, and Shell_weight. That, combined with the outlier treatment, drops the RMSE to 2.196.
# Second Order Variables
X['Diameter_2'] = X['Diameter'] ** 2
X['Height_2'] = X['Height'] ** 2
X['Shell_2'] = X['Shell_weight'] ** 2
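For completeness, here is a sketch of re-fitting with those extra features and recomputing the RMSE (same X and y as in the previous block):
# Refit the linear model with the second-order features included
lr = LinearRegression()
lr.fit(X, y)
df2['Predictions'] = np.exp(lr.predict(X))
print(root_mean_squared_error(df2['Rings'], df2['Predictions']))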
Actually, it is fair to note that every variable added to a Linear Regression model will impact the R² and sometimes inflate the result, giving the false impression that the model is improving when it is not. In this case, the model is actually improving, since the second-order variables add some non-linear components to it. We can confirm that by calculating the adjusted R², which went from 0.495 to 0.517.
# Adjusted R²
from sklearn.metrics import r2_score
r2 = r2_score(df2['Rings'], df2['Predictions'])
n = df2.shape[0]
p = X.shape[1]  # number of predictors used in the fit
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f'R²: {r2}')
print(f'Adjusted R²: {adj_r2}')
On the other hand, bringing back Whole_weight and Length can improve the numbers slightly more, but I would not recommend it. If we do that, we are adding multicollinearity and inflating the importance of some variables' coefficients, leading to potential estimation errors in the future.
Modeling: Using PyTorch
Okay. Now that we have a base model, the idea is to create a linear model using Deep Learning and try to beat the RMSE of 2.196.
Right. To begin, let me state this upfront: Deep Learning models work better with scaled data. However, as our X variables are all on the same scale, we won't need to worry about that. So let's keep moving.
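If the features were on very different scales, a scaling step would be the usual fix. A minimal sketch of what that would look like, using the StandardScaler imported earlier (not applied in this post):
# Not needed here, but this is how scaling would fit in before building the tensors
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)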
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
We need to prepare the data for modeling with PyTorch. Here, we need a few adjustments to make the data acceptable to the PyTorch framework, since it won't take regular pandas dataframes.
- Let's use the same data frame from our base model.
- Split X and Y.
- Transform the Y variable to log.
- Transform both to NumPy arrays, since PyTorch won't take dataframes.
df2 = df.query('Height < 0.3 and Rings > 2 and outliers != -1').copy()
X = df2.drop(['Rings', 'outliers'], axis=1)
y = np.log(df2[['Rings']])
# X and Y to Numpy
X = X.to_numpy()
y = y.to_numpy()
Next, using TensorDataset, we turn X and Y into a tensor-backed dataset and print one sample.
# Prepare with TensorDataset
# TensorDataset wraps the arrays as tensors so PyTorch can consume them
dataset = TensorDataset(torch.tensor(X).float(), torch.tensor(y).float())
input_sample, label_sample = dataset[0]
print(f'** Input sample: {input_sample},\n** Label sample: {label_sample}')
** Input sample: tensor([0.3650, 0.0950, 0.2245, 0.1010, 0.1500, 1.0000,
0.0000, 0.0000, 0.1332, 0.0090, 0.0225]),
** Label sample: tensor([2.7081])
Then, using the DataLoader class, we can create batches of data. This means the Neural Network will handle batch_size observations at a time.
# Next, let's use DataLoader
batch_size = 500
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
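To see what a single batch looks like, we can pull one from the dataloader:
# Peek at one batch: shapes are (batch_size, n_features) and (batch_size, 1)
features_batch, targets_batch = next(iter(dataloader))
print(features_batch.shape, targets_batch.shape)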
PyTorch models are best defined as classes.
- The class is based on nn.Module, which is PyTorch's base class for neural networks.
- We define the model layers we want to use in the __init__ method. super().__init__() ensures the class behaves like a torch module.
- The forward method describes what happens to the input when it is passed through the model.
Here, we pass the input through the Linear layers defined in the __init__ method and use ReLU activation functions to add some non-linearity to the model in the forward pass.
# 2. Creating a class
class AbaloneModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(in_features=X.shape[1], out_features=128)
        self.linear2 = nn.Linear(128, 64)
        self.linear3 = nn.Linear(64, 32)
        self.linear4 = nn.Linear(32, 1)

    def forward(self, x):
        x = self.linear1(x)
        x = nn.functional.relu(x)
        x = self.linear2(x)
        x = nn.functional.relu(x)
        x = self.linear3(x)
        x = nn.functional.relu(x)
        x = self.linear4(x)
        return x

# Instantiate model
model = AbaloneModel()
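A quick sanity check that the untrained network accepts our data and returns one value per sample:
# Forward one sample through the untrained model (output is meaningless before training)
with torch.no_grad():
    print(model(input_sample.unsqueeze(0)))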
Next, let's exercise the model for the first time using a script that simulates a Random Search.
- Create an error criterion for model evaluation.
- Create a list to hold the data from each iteration and set best_loss to a high value, so it gets replaced by better (lower) loss values during the search.
- Set up the range for the learning rate. We will sample exponents from -2 to -5 (i.e., learning rates from 0.01 down to 0.00001).
- Set up a range for the momentum from 0.90 to 0.99.
- Get the data.
- Zero the gradients to clear calculations from previous iterations.
- Fit the model.
- Compute the loss and register the best model's numbers.
- Compute the gradients of the weights and biases with the backward pass.
- Iterate N times and print the best model.
# Mean Squared Error (MSE) is standard for regression
criterion = nn.MSELoss()

# Random Search
values = []
best_loss = 999

for idx in range(1000):
    # Randomly sample a learning rate exponent between 2 and 5
    factor = np.random.uniform(2, 5)
    lr = 10 ** -factor
    # Randomly select a momentum between 0.90 and 0.99
    momentum = np.random.uniform(0.90, 0.99)
    # 1. Get Data
    feature, target = dataset[:]
    # 2. Zero Gradients: Clear old gradients before the backward pass
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    optimizer.zero_grad()
    # 3. Forward Pass: Compute prediction
    y_pred = model(feature)
    # 4. Compute Loss
    loss = criterion(y_pred, target)
    # 4.1 Register best loss and hyperparameters
    if loss < best_loss:
        best_loss = loss
        best_lr = lr
        best_momentum = momentum
        best_idx = idx
    # 5. Backward Pass: Compute gradient of the loss w.r.t. W and b
    loss.backward()
    # 6. Update Parameters: Adjust W and b using the calculated gradients
    optimizer.step()
    values.append([idx, lr, momentum, loss])
    print(f'n: {idx},lr: {lr}, momentum: {momentum}, loss: {loss}')
n: 999,lr: 0.004782946959508322, momentum: 0.9801209929050066, loss: 0.06135804206132889
Once we have the best learning rate and momentum, we can move on.
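The loop above already keeps track of the winning combination, so we can simply print it:
# Best hyperparameters found by the random search
print(f'Best iteration: {best_idx}, lr: {best_lr}, momentum: {best_momentum}, loss: {best_loss}')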
# --- 3. Loss Function and Optimizer ---
# Mean Squared Error (MSE) is standard for regression
criterion = nn.MSELoss()
# Stochastic Gradient Descent (SGD) with a small learning rate (lr)
optimizer = optim.SGD(model.parameters(), lr=0.004, momentum=0.98)
Then, we will re-train this model, using the same steps as before, but this time keeping the learning rate and momentum fixed.
Fitting a PyTorch model requires a longer script than the familiar fit() method from Scikit-Learn, but it is not a big deal. The structure will always be similar to these steps:
- Activate training mode with model.train().
- Create a loop for the number of iterations you want. Each iteration is called an epoch.
- Get the batches from the dataloader.
- Zero the gradients from previous passes with optimizer.zero_grad().
- Compute the predictions with model(X).
- Calculate the loss using criterion(y_pred, target).
- Do the backward pass to compute the gradients of the weights and biases: loss.backward().
- Update the weights and biases with optimizer.step().
We will train this model for 1000 epochs (iterations). Here, we are only adding a step to keep the best model seen during training, so we make sure to use the model with the lowest loss.
# 4. Training
torch.manual_seed(42)
NUM_EPOCHS = 1001
loss_history = []
best_loss = 999

# Put model in training mode
model.train()

for epoch in range(NUM_EPOCHS):
    for data in dataloader:
        # 1. Get Data
        feature, target = data
        # 2. Zero Gradients: Clear old gradients before the backward pass
        optimizer.zero_grad()
        # 3. Forward Pass: Compute prediction
        y_pred = model(feature)
        # 4. Compute Loss
        loss = criterion(y_pred, target)
        loss_history.append(loss.item())
        # Get Best Model
        if loss < best_loss:
            best_loss = loss
            best_model_state = model.state_dict()  # save best model
        # 5. Backward Pass: Compute gradient of the loss w.r.t. W and b
        loss.backward()
        # 6. Update Parameters: Adjust W and b using the calculated gradients
        optimizer.step()

    # Load the best model before returning predictions
    model.load_state_dict(best_model_state)

    # Print status every 200 epochs
    if epoch % 200 == 0:
        print(epoch, loss.item())
        print(f'Best Loss: {best_loss}')
0 0.061786893755197525
Best Loss: 0.06033024191856384
200 0.036817338317632675
Best Loss: 0.03243456035852432
400 0.03307393565773964
Best Loss: 0.03077109158039093
600 0.032522525638341904
Best Loss: 0.030613820999860764
800 0.03488151729106903
Best Loss: 0.029514113441109657
1000 0.0369877889752388
Best Loss: 0.029514113441109657
Nice. The model is trained. Now it is time to evaluate it.
Evaluation
Let's check whether this model did better than the regular regression. For that, I will put the model in evaluation mode using model.eval(), so PyTorch knows it needs to switch from training behavior to inference mode. That turns off dropout and switches batch normalization to its running statistics, for instance.
# Get features
features, targets = dataset[:]
# Get Predictions
model.eval()
with torch.no_grad():
predictions = model(features)
# Add to dataframe
df2['Predictions'] = np.exp(predictions.detach().numpy())
# RMSE
print(root_mean_squared_error(df2['Rings'], df2['Predictions']))
2.1108551025390625
The improvement was modest, about 4%.
Let's look at some predictions from both models.
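The original post shows this comparison as a table. Here is a sketch of how to build it, assuming the sklearn predictions were kept in a separate, hypothetical column (e.g., Predictions_sklearn) before being overwritten:
# Hypothetical side-by-side table; 'Predictions_sklearn' is an assumed column name
comparison = df2[['Rings', 'Predictions_sklearn', 'Predictions']].rename(
    columns={'Predictions': 'Predictions_pytorch'}
)
comparison.head(10)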

Both models get very similar results. They struggle more as the number of Rings grows, which is due to the cone shape of the relationship with the target variable.
If we think that through for a moment:
- As the number of Rings increases, there is more variance coming from the explanatory variables.
- An abalone with 15 rings sits within a much wider range of measurement values than one with 4 rings.
- This confuses the model, because it has to draw a single line through the middle of data that is not that linear (see the residual sketch below).
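One way to see this in the numbers is to plot the residuals against the actual ring counts; the spread widens as Rings grows. A sketch, not part of the original post:
# Residuals vs. actual values: the spread widens as Rings grows
residuals = df2['Rings'] - df2['Predictions']
plt.scatter(df2['Rings'], residuals, alpha=0.3)
plt.xlabel('Rings')
plt.ylabel('Residual (actual - predicted)')
plt.title('Residuals vs. Rings');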
Before You Go
We learned a lot in this project:
- How to explore data.
- How to check whether a linear model would be a good option.
- How to create a PyTorch model for multivariable Linear Regression.
In the end, we saw that a target variable that is not homogeneous, even after power transformations, can lead to a low-performing model. Our model is still better than predicting the average value for every observation, but the error remains high, at about 20% of the mean value.
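To put that in perspective, here is a quick sketch comparing against the naive baseline of always predicting the mean ring count:
# RMSE of always predicting the mean; the model's ~2.1 is roughly 20% of the mean Rings value
naive = np.full(len(df2), df2['Rings'].mean())
print(root_mean_squared_error(df2['Rings'], naive))
print(df2['Rings'].mean())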
We tried to use Deep Learning to improve the result, but all that power was not enough to lower the error considerably. I would probably go with the Scikit-Learn model, since it is simpler and more explainable.
Another option to try to improve the results would be creating a custom ensemble model with a Random Forest + Linear Regression. But that is a task I leave to you, if you want.
If you liked this content, find me on my website.
https://gustavorsantos.me
GitHub Repository
The code for this exercise.
https://github.com/gurezende/Linear-Regression-PyTorch
References
[1] Abalone Dataset – UCI Machine Learning Repository, CC BY 4.0 license. https://archive.ics.uci.edu/dataset/1/abalone
[2] Eval mode. https://stackoverflow.com/questions/60018578/what-does-model-eval-do-in-pytorch and https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval
[3] PyTorch Docs. https://docs.pytorch.org/docs/stable/nn.html
[4] Kaggle Notebook. https://www.kaggle.com/code/samlakhmani/s4e4-deeplearning-with-oof-strategy
[5] GitHub Repo. https://github.com/gurezende/Linear-Regression-PyTorch
