Neural networks are sometimes presented as black boxes.
Layers, activations, gradients, backpropagation… it can feel overwhelming, especially when everything is hidden behind model.fit().
We are going to build a neural network regressor from scratch in Excel. Every computation will be explicit. Every intermediate value will be visible. Nothing will be hidden.
By the end of this article, you will understand how a neural network performs regression, how forward propagation works, and how the model can approximate non-linear functions using just a few parameters.
Before starting, if you have not already read my previous articles, you should first take a look at the implementations of linear regression and logistic regression.
You will see that a neural network is not a new kind of object. It is a natural extension of those models.
As usual, we are going to follow these steps:
- First, we will look at how the model of a Neural Network Regressor works. In the case of neural networks, this step is called forward propagation.
- Then we will train this function using gradient descent. This process is called backpropagation.
1. Forward propagation
In this part, we will define our model, then implement it in Excel to see how the prediction works.
1.1 A Simple Dataset
We will use a very simple dataset that I generated. It consists of just 12 observations and a single feature.
As you can see, the target variable has a nonlinear relationship with x.
And for this dataset, we will use two neurons in the hidden layer.
1.2 Neural Network Structure
Our example neural network has:
- One input layer with the feature x as input
- One hidden layer with two neurons; these two neurons will allow us to create a nonlinear relationship
- One output layer, which is just a linear regression
Here is the diagram representing this neural network, together with all the parameters that need to be estimated. There are a total of 7 parameters.
Hidden layer:
- a11: weight from x to hidden neuron 1
- b11: bias of hidden neuron 1
- a12: weight from x to hidden neuron 2
- b12: bias of hidden neuron 2
Output layer:
- a21: weight from hidden neuron 1 to output
- a22: weight from hidden neuron 2 to output
- b2: output bias
At its core, a neural network is just a function. A composed function.
If you write it explicitly, there is nothing mysterious about it:
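Written out explicitly, with σ denoting the sigmoid activation of the hidden neurons, the model I reconstruct from the structure described above is:

ŷ(x) = a21 · σ(a11·x + b11) + a22 · σ(a12·x + b12) + b2,   with σ(z) = 1 / (1 + e^(−z))

The two inner expressions are the hidden activations A1 and A2, and the outer expression is simply the linear regression of the output layer.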

We often represent this function with a diagram made of “neurons”.
In my opinion, the best way to interpret this diagram is as a visual representation of a composed mathematical function, not as a claim that it literally reproduces how biological neurons work.

Why does this function work?
Each sigmoid behaves like a smooth step.
With two sigmoids, the model can increase, decrease, bend, and flatten the output curve.
By combining them linearly, the network can approximate smooth non-linear curves.
This is why, for this dataset, two neurons are already enough. But would you be able to find a dataset for which this structure is not suitable?
1.3 Implementation of the function in Excel
In this section, we will assume that the 7 coefficients have already been found, and we can then implement the formula we saw just before.
To visualize the neural network, we can use new continuous values of x ranging from -2 to 2 with a step of 0.02.
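For readers who want to cross-check the spreadsheet, here is a minimal Python sketch of the same forward pass over that grid. The coefficient values below are placeholders for illustration, not the ones used in the article's sheet.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, a11, b11, a12, b12, a21, a22, b2):
    """Forward propagation of the 1-feature, 2-hidden-neuron regressor."""
    A1 = sigmoid(a11 * x + b11)      # hidden neuron 1
    A2 = sigmoid(a12 * x + b12)      # hidden neuron 2
    return a21 * A1 + a22 * A2 + b2  # linear output layer

# Continuous grid from -2 to 2 with a step of 0.02, as in the Excel sheet
x_grid = np.arange(-2, 2 + 0.02, 0.02)
# Placeholder coefficients (illustration only, not the article's values)
y_grid = forward(x_grid, a11=4.0, b11=-2.0, a12=4.0, b12=2.0, a21=1.0, a22=-1.0, b2=0.5)
```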
Here is the screenshot, and we can see that the resulting function fits the shape of the input data quite well.

2. Backpropagation (Gradient descent)
At this point, the model is fully defined.
Since this is a regression problem, we will use the MSE (mean squared error), just as for linear regression.
Now, we have to find the 7 parameters that minimize the MSE.
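For our 12 observations (x_i, y_i), the cost function is:

MSE = (1/12) · Σ_i (ŷ_i − y_i)²

where ŷ_i is the prediction of the network for observation i. (Some presentations multiply by an extra 1/2 to simplify the derivatives; this does not change which parameters minimize the cost.)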
2.1 Details of the backpropagation algorithm
The principle is simple. BUT, since there are many composed functions and many parameters, we have to be organized with the derivatives.
I will not derive all 7 partial derivatives explicitly. I will just give the results.
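For reference, here is my reconstruction of these results with the chain rule, writing e_i = ŷ_i − y_i for the error on observation i and A1_i, A2_i for the hidden activations; each gradient is proportional to the corresponding sum (the constant factor depends on the MSE convention used in the sheet):

- ∂MSE/∂b2 ∝ Σ e_i
- ∂MSE/∂a21 ∝ Σ e_i · A1_i
- ∂MSE/∂a22 ∝ Σ e_i · A2_i
- ∂MSE/∂a11 ∝ Σ e_i · a21 · A1_i · (1 − A1_i) · x_i
- ∂MSE/∂b11 ∝ Σ e_i · a21 · A1_i · (1 − A1_i)
- ∂MSE/∂a12 ∝ Σ e_i · a22 · A2_i · (1 − A2_i) · x_i
- ∂MSE/∂b12 ∝ Σ e_i · a22 · A2_i · (1 − A2_i)

The factor A·(1 − A) is just the derivative of the sigmoid, and x_i appears because the weights a11 and a12 multiply the input.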

As we can see, the error term appears in each gradient. So, in order to implement the whole process, we have to follow this loop (a compact sketch in code is given after the list):
- initialize the weights,
- compute the output (forward propagation),
- compute the error,
- compute gradients using partial derivatives,
- update the weights,
- repeat until convergence.
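To make the loop concrete, here is a minimal Python sketch of the whole procedure, vectorized over the observations. It mirrors the steps of the loop above; the learning rate, the number of iterations, and the way the parameters are packed are my own choices, not taken from the spreadsheet.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(x, y, params, lr=0.1, n_iter=5000):
    """Gradient descent on the 7 parameters of the 2-neuron regressor."""
    a11, b11, a12, b12, a21, a22, b2 = params
    history = []  # cost at each iteration, useful for the convergence plot
    for _ in range(n_iter):
        # Forward propagation
        A1 = sigmoid(a11 * x + b11)
        A2 = sigmoid(a12 * x + b12)
        y_hat = a21 * A1 + a22 * A2 + b2
        # Error and cost
        e = y_hat - y
        history.append(np.mean(e ** 2))
        # Gradients (partial derivatives averaged over the observations)
        g_b2 = 2 * np.mean(e)
        g_a21 = 2 * np.mean(e * A1)
        g_a22 = 2 * np.mean(e * A2)
        g_a11 = 2 * np.mean(e * a21 * A1 * (1 - A1) * x)
        g_b11 = 2 * np.mean(e * a21 * A1 * (1 - A1))
        g_a12 = 2 * np.mean(e * a22 * A2 * (1 - A2) * x)
        g_b12 = 2 * np.mean(e * a22 * A2 * (1 - A2))
        # Parameter updates
        a11 -= lr * g_a11
        b11 -= lr * g_b11
        a12 -= lr * g_a12
        b12 -= lr * g_b12
        a21 -= lr * g_a21
        a22 -= lr * g_a22
        b2 -= lr * g_b2
    return (a11, b11, a12, b12, a21, a22, b2), history
```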
2.2 Initialization
Let's start by putting the input dataset in a column format, which will make it easier to implement the formulas in Excel.

In theory, we can start with random values to initialize the parameters. But in practice, the number of iterations needed to reach full convergence can be large. And since the cost function is not convex, we can get stuck in a local minimum.
So we have to choose the initial values “properly”. I have prepared some for you. You can make small changes to see what happens.

2.3 Forward propagation
In the columns from AG to BP, we perform the forward propagation phase. We compute A1 and A2 first, followed by the output. These are the same formulas used in the earlier part on forward propagation.
To simplify the computations and make them more manageable, we perform the calculations for each observation individually. This means we have 12 columns for each hidden neuron (A1 and A2) and for the output layer. Instead of using a summation formula, we calculate the values for each observation separately.
To facilitate the loop process during the gradient descent phase, we organize the training dataset in columns, and we can then extend the formulas in Excel row by row.

2.4 Errors and the Cost function
In columns BQ to CN, we can now compute the values of the cost function.

2.5 Partial derivatives
We will be computing 7 partial derivatives corresponding to the parameters of our neural network. For each of these partial derivatives, we have to compute the values for all 12 observations, resulting in a total of 84 columns. However, the sheet is organized with color coding and formulas to keep this process manageable.

So we will start with the output layer, for the parameters a21, a22, and b2. They can be found in the columns from CO to DX.

Then, the parameters a11 and a12 can be found in columns DY to EV:

And finally, for the bias parameters b11 and b12, we use columns EW to FT.

And to wrap it up, we sum all of the partial derivatives across the 12 observations. These aggregated gradients are neatly arranged in columns Z to AF. The parameter updates are then performed in columns R to X, using these values.
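In formula form, each of the 7 parameters is updated with the standard gradient descent step, using the learning rate η chosen in the sheet:

new value = old value − η · (aggregated partial derivative)

For example, a21 ← a21 − η · ∂MSE/∂a21, and similarly for a11, b11, a12, b12, a22, and b2.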

2.6 Visualization of the convergence
To better understand the training process, we visualize how the parameters evolve during gradient descent using a graph. At the same time, the decrease of the cost function is tracked in column Y, making the convergence of the model clearly visible.
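If you reproduce the experiment in Python with the sketch from section 2.1, the equivalent of this graph is a simple plot of the cost history returned by that train function. The dataset and initial parameters below are placeholders for illustration, not the article's values.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder dataset and initial parameters (illustration only)
x = np.linspace(-2, 2, 12)
y = np.tanh(2 * x)
params0 = (1.0, 0.0, -1.0, 0.0, 1.0, 1.0, 0.0)

# train() is the sketch given after the loop description in section 2.1
final_params, history = train(x, y, params0, lr=0.5, n_iter=5000)

plt.plot(history)
plt.xlabel("Iteration")
plt.ylabel("MSE")
plt.title("Decrease of the cost function during gradient descent")
plt.show()
```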

Conclusion
A neural network regressor is not magic.
It is simply a composition of elementary functions, controlled by a certain number of parameters and trained by minimizing a well-defined mathematical objective.
By building the model explicitly in Excel, every step becomes visible. Forward propagation, error computation, partial derivatives, and parameter updates are no longer abstract concepts, but concrete calculations that you can inspect and modify.
The implementation of our neural network, from forward propagation to backpropagation, is now complete. You are encouraged to experiment by changing the dataset, the initial parameter values, or the learning rate, and to observe how the model behaves during training.
Through this hands-on exercise, we have seen how gradients drive learning, how parameters are updated iteratively, and how a neural network gradually shapes itself to fit the data. This is exactly what happens inside modern machine learning libraries, only hidden behind a few lines of code.
Once you understand it this way, neural networks stop being black boxes.
