Learning machine learning often starts with linear regression, not simply because it's easy, but because it introduces the key concepts that we later use in neural networks and deep learning.
We already know that linear regression is used to predict continuous values.
Now that we have this data, we'd like to build a simple linear regression model to predict the price of a home from its size.
We'll use Python to implement this algorithm.
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# 1. Data
X = np.array([1, 2, 3]).reshape(-1, 1)
y = np.array([11, 12, 19])
# 2. Train the scikit-learn model
model = LinearRegression()
model.fit(X, y)
# 3. Extract the parameters and predictions
intercept = model.intercept_
slope = model.coef_[0]
y_pred = model.predict(X)
errors = y - y_pred
print("--- Scikit-Learn Results ---")
print(f"Intercept (Beta 0): {intercept:.0f}")
print(f"Slope (Beta 1): {slope:.0f}")
print(f"Predictions: {y_pred}")
print(f"Errors (Residuals): {errors}")
# 4. Create the 2D Scatterplot
plt.figure(figsize=(8, 6))
# Plot the actual data points
plt.scatter(X, y, color='blue', s=100, label='Actual Data (y)')
# Plot the scikit-learn line of best fit
plt.plot(X, y_pred, color='red', linewidth=2, label='scikit-learn Best Fit Line')
# Draw the vertical residual lines (errors)
for i in range(len(X)):
    plt.plot([X[i][0], X[i][0]], [y[i], y_pred[i]], color='green', linestyle='--', linewidth=2)
    plt.text(X[i][0] + 0.05, (y[i] + y_pred[i]) / 2, f'e={errors[i]:.0f}', color='green', fontsize=12)
plt.xlabel('Size (x, in 1000 sq ft)')
plt.ylabel('Price (y, in $100k)')
plt.title('scikit-learn Simple Linear Regression')
plt.legend()
plt.grid(True, linestyle=':', alpha=0.7)
# Display the plot
plt.show()
Then we get these values:
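For reference, these fitted values can be cross-checked by hand. Here is a minimal sketch that re-derives them with the closed-form least-squares formulas instead of scikit-learn, assuming the same toy data:

```python
import numpy as np

# Same data as above
x = np.array([1, 2, 3], dtype=float)
y = np.array([11, 12, 19], dtype=float)

# Closed-form simple linear regression:
# slope = Cov(x, y) / Var(x), intercept = mean(y) - slope * mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

y_pred = intercept + slope * x
errors = y - y_pred

print(intercept, slope)  # 6.0 4.0
print(y_pred)            # [10. 14. 18.]
print(errors)            # [ 1. -2.  1.]
```

So the fitted line is y = 6 + 4x, with residuals (1, -2, 1), matching what the scikit-learn model reports.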

Prepare!
This time we're going to take a different route to solve this problem.
Since we're taking a less explored route, let's prepare ourselves so we don't get lost midway through our journey of understanding.
The route we're going to take is vector projection, and for that, let's refresh our basics on vectors.
In this first part, we're going to build our geometric intuition around vectors, dot products, and projections. These are the fundamentals we need in order to understand linear regression clearly.
Once we have that foundation down, Part 2 will dive into the actual implementation.
Let’s go.
What’s a Vector?
Let's return to high school, where we were first introduced to vectors.
One of the first examples we learn about vectors is speed vs. velocity.
This example tells us that 50 km/h is a speed, a scalar quantity, since it only has magnitude, whereas 50 km/h in the east direction is a velocity, a vector quantity, since it has both magnitude and direction.
Now, let’s draw it on a graph.

If we plot the coordinates (2, 4), we consider it a point in 2D space.
But when we connect the origin to that point with an arrow, we now consider it a vector, since it has both magnitude and direction.
We can think of (2, 4) as a set of instructions: take 2 steps to the right along the x-axis, then 4 steps up parallel to the y-axis.
The way the arrow points gives us the direction.
The length of the arrow gives us the magnitude of the vector.

\[
\begin{gathered}
\text{From the plot we can observe the formation of a right-angled triangle.} \\
\text{From the Pythagorean theorem we know that } c = \sqrt{a^2 + b^2}. \\
\text{For a vector } v = (x, y), \text{ the magnitude is } \|v\| = \sqrt{x^2 + y^2}. \\
\text{Substituting the values of the vector } (2, 4): \\
\|v\| = \sqrt{2^2 + 4^2} = \sqrt{4 + 16} = \sqrt{20} \approx 4.47 \text{ units}
\end{gathered}
\]
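As a quick numerical check, here is a sketch of the same magnitude calculation with NumPy:

```python
import numpy as np

v = np.array([2, 4])

# Magnitude via the Pythagorean theorem: sqrt(2^2 + 4^2) = sqrt(20)
magnitude = np.sqrt(np.sum(v ** 2))

# np.linalg.norm computes the same quantity
assert np.isclose(magnitude, np.linalg.norm(v))

print(round(magnitude, 2))  # 4.47
```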
Now let's draw another vector, (6, 2), on our graph.

Looking at the vectors, we can see that they're generally pointing up and to the right.
They aren't pointing in exactly the same direction, but they're clearly leaning the same way.
The angle between the vectors is small.
Instead of just observing and stating this, we can measure how much two vectors actually agree with each other. For that, we use the dot product.
From the plot, we have two vectors:
\[
\mathbf{A} = (2, 4), \qquad \mathbf{B} = (6, 2)
\]
We already know that we are able to interpret these numbers as movements along the axes.
Vector \(A = (2, 4)\) means moving 2 units in the \(x\)-direction and 4 units in the \(y\)-direction.

Vector \(B = (6, 2)\) means moving 6 units in the \(x\)-direction and 2 units in the \(y\)-direction.
To measure how much the two vectors agree with each other along each axis, we multiply their corresponding components.
Along the \(x\)-axis: \(2 \times 6\)

Along the \(y\)-axis: \(4 \times 2\)

Then we add these contributions together:
\[
2 \times 6 + 4 \times 2 = 12 + 8 = 20
\]
This operation is called the dot product.
In general, for two vectors
\[
\mathbf{A} = (a_1, a_2), \qquad \mathbf{B} = (b_1, b_2)
\]
the dot product is defined as
\[
\mathbf{A} \cdot \mathbf{B} = a_1 b_1 + a_2 b_2
\]
We got a dot product of 20, but what does that mean?
Since 20 is a positive number, we know that the angle between the vectors is less than 90°.
We can also think of it as a positive relationship between the two variables represented by these vectors. This idea will become clearer once we start discussing the simple linear regression problem.
You might wonder how the dot product is related to the angle between two vectors, and how we can claim that the angle is less than 90°.
Before exploring that relationship, we'll look at two more cases of the dot product, so that it becomes clearer what the dot product is actually measuring. After that, we'll move on to the angle between vectors.

When two vectors have a dot product equal to 0, we say they're orthogonal, meaning they're perpendicular to each other. In this case, the vectors have no linear relationship, which corresponds to zero correlation.
When two vectors have a negative dot product, the angle between them is obtuse, which means the vectors are leaning in opposite directions, and this represents a negative correlation.
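A short sketch can verify all three cases; the perpendicular vector (-4, 2) and the opposite-leaning vector (-3, -5) are chosen here purely for illustration:

```python
import numpy as np

a = np.array([2, 4])

# Acute angle: both vectors lean the same way -> positive dot product
d_pos = np.dot(a, np.array([6, 2]))

# Orthogonal: (-4, 2) is perpendicular to (2, 4) -> dot product is 0
d_zero = np.dot(a, np.array([-4, 2]))

# Obtuse angle: (-3, -5) leans the opposite way -> negative dot product
d_neg = np.dot(a, np.array([-3, -5]))

print(d_pos, d_zero, d_neg)  # 20 0 -26
```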
Now, once more, let's consider the two vectors (2, 4) and (6, 2).

We got a dot product of 20.
There is another way to find the dot product, one that involves the lengths of the vectors and the angle between them.
This shows that 20 is not a random number; it indicates that the vectors are leaning in the same direction.
Let the two vectors be
\[
A = (2, 4), \qquad B = (6, 2)
\]
First compute the lengths (magnitudes) of the vectors.
\[
\|A\| = \sqrt{2^2 + 4^2} = \sqrt{4 + 16} = \sqrt{20}
\]
\[
\|B\| = \sqrt{6^2 + 2^2} = \sqrt{36 + 4} = \sqrt{40}
\]
Now using the dot product formula
\[
A \cdot B = \|A\| \, \|B\| \cos(\theta)
\]
From the component formula of the dot product
\[
A \cdot B = 2 \times 6 + 4 \times 2 = 12 + 8 = 20
\]
Substitute into the angle formula
\[
20 = \sqrt{20} \times \sqrt{40} \times \cos(\theta)
\]
\[
\cos(\theta) = \frac{20}{\sqrt{20}\,\sqrt{40}}
\]
Now simplify the denominator:
\[
\sqrt{20} = 2\sqrt{5}, \qquad \sqrt{40} = 2\sqrt{10}
\]
\[
\sqrt{20}\,\sqrt{40} = (2\sqrt{5})(2\sqrt{10}) = 4\sqrt{50} = 4(5\sqrt{2}) = 20\sqrt{2}
\]
So,
\[
\cos(\theta) = \frac{20}{20\sqrt{2}} = \frac{1}{\sqrt{2}}
\]
\[
\theta = 45^\circ
\]
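We can confirm the 45° result numerically; here is a sketch using NumPy's arccos:

```python
import numpy as np

A = np.array([2, 4])
B = np.array([6, 2])

# cos(theta) = (A . B) / (|A| |B|)
cos_theta = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))

# Convert the angle from radians to degrees
theta_deg = np.degrees(np.arccos(cos_theta))
print(round(theta_deg, 1))  # 45.0
```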
We get this formula from the Law of Cosines; it is the geometric way of computing the dot product.
From this equation, we can see that if we have the lengths of two vectors and the angle between them, we can easily find their dot product.
So far, we've built a basic idea of what vectors are, their dot products, and the angles between them.
I know everything has been mostly mathematical up to this point. However, we're now going to use what we've learned to discuss projections, and things will become even clearer when we finally solve a simple linear regression problem.
Vector Projections
Now imagine we're driving along a highway through a forest, on our way to a home somewhere deep inside the forest, far away from the highway.
Let's say our home is at a fixed point, (2, 4). There is a mud road through the forest that leads directly there, but because of incessant rains, we cannot take that route.
We have one other option: the highway through the forest, which runs in the direction of (6, 2).
What we have to do is travel along the highway, park the car beside it, and then take our bags and walk home.
We're carrying heavy luggage, and we don't want to walk much. So we need to stop the car at the point on the highway where the walking distance to our house is shortest.

The question now is: how far do we need to travel along that highway (the (6, 2) direction) to get as close as possible to our home at (2, 4)?

Looking at the visual above, let's see what we can observe.
If we stop our car at point A, it is too early; the red line connecting to our house is a long walk.
If we stop the car at point C, we have already gone past our home, so we need to turn back, and that is also a long walk.
We can see that point B is the best spot to stop the car, because our walk home forms a perfect 90° angle with the highway.
We need to find that exact point on the highway to park our car.
Let's start from the origin and first find the direct distance to our home, which is located at (2, 4).
In linear algebra, this distance is just the length of the vector. Here, the length is \(\sqrt{20}\), which is about 4.47. We can say that our house is 4.47 kilometers from the origin in the direction of (2, 4).
But we cannot take that direct route because of the rain; it's a muddy, unpaved road. We only have one option: drive along the highway in the direction of (6, 2).

We have a highway pointing in the direction of (6, 2), which we call a vector.
On this highway, we can only move forward or backward; that is the only dimension we have.
Every point we can possibly reach makes up the span of the vector.
It's important to understand that the highway is an infinite road. It doesn't actually start at (0, 0); we are only beginning our journey from that specific point in the middle of it.
To minimize our walk through the mud, we need to find the spot on the highway closest to our home, which always lies along a perpendicular path.
To keep track of our driving distance along the highway, let's consider a milestone signpost on the highway at (6, 2), which we use as a reference for direction.
If we calculate the physical distance from our starting point (0, 0) to this signpost, the length is \(\sqrt{40}\), which is about 6.32. So our reference signpost is about 6.32 km down the road.
There are several ways to find our exact parking point. First, if we look at any two known points on the highway, such as our start at (0, 0) and our signpost at (6, 2), we can calculate the slope of the road:
$$
m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{2 - 0}{6 - 0} = \frac{2}{6} = \frac{1}{3}
$$
A slope of 1/3 means that for every 3 units of increase in x, there is 1 unit of increase in y. Because every point on the highway follows this exact rule, we can write the equation of our road as:
$$
y = \frac{1}{3}x
$$
This means every point on the highway, including the parking point, has coordinates \((x, \tfrac{1}{3}x)\).
We just need to find the value of \(x\) that minimizes the walk between our car at \((x, \tfrac{1}{3}x)\) and our home at (2, 4).
To avoid dealing with square roots in our calculus and to make the calculation easier, we'll minimize the squared distance \(f(x)\) instead.
The squared distance formula is:
$$
f(x) = (x_2 - x_1)^2 + (y_2 - y_1)^2
$$
$$
f(x) = (x - 2)^2 + \left(\frac{1}{3}x - 4\right)^2
$$
Expanding the binomials:
$$
f(x) = (x^2 - 4x + 4) + \left(\frac{1}{9}x^2 - \frac{8}{3}x + 16\right)
$$
Grouping the terms together:
$$
f(x) = \left(1 + \frac{1}{9}\right)x^2 - \left(4 + \frac{8}{3}\right)x + (4 + 16)
$$
$$
f(x) = \frac{10}{9}x^2 - \left(\frac{12}{3} + \frac{8}{3}\right)x + 20
$$
$$
f(x) = \frac{10}{9}x^2 - \frac{20}{3}x + 20
$$
If we graph the error function f(x), it forms a U-shaped parabola.
To find the minimum point, we take the derivative and set it to zero.
$$
f'(x) = \frac{d}{dx}\left(\frac{10}{9}x^2 - \frac{20}{3}x + 20\right)
$$
$$
f'(x) = \frac{20}{9}x - \frac{20}{3}
$$
Setting the derivative equal to zero:
$$
\frac{20}{9}x - \frac{20}{3} = 0
$$
$$
\frac{20}{9}x = \frac{20}{3}
$$
$$
x = \frac{20}{3} \times \frac{9}{20} = 3
$$
Plug \(x = 3\) back into the highway equation:
$$
y = \frac{1}{3}x = \frac{1}{3}(3) = 1
$$
The perfect parking spot is (3, 1).
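As a sanity check on the calculus, here is a sketch that searches a grid of candidate parking points along the highway for the one with the smallest squared distance:

```python
import numpy as np

# Candidate x-positions along the highway y = x/3
xs = np.linspace(0, 6, 6001)
ys = xs / 3

# Squared distance from each candidate (x, x/3) to the home at (2, 4)
f = (xs - 2) ** 2 + (ys - 4) ** 2

# Index of the candidate with the smallest squared distance
best = np.argmin(f)
print(round(xs[best], 3), round(ys[best], 3))  # 3.0 1.0
```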
Using the calculus method, we found the parking spot at (3, 1). If we compare this to our milestone signpost at (6, 2), we can see that the parking point is exactly half the distance to the signpost.
This means that if we drive halfway to the signpost, we reach the exact point where we can park and take the shortest path home.
\[
\begin{gathered}
\text{Our Parking Spot: } \mathbf{P} = (3, 1) \\
\text{Our Signpost: } \mathbf{V} = (6, 2) \\
\text{Relationship: } (3, 1) = 0.5 \times (6, 2) \\
\text{Therefore, our optimal multiplier } c \text{ is } 0.5.
\end{gathered}
\]
This 0.5 is exactly the kind of coefficient we find in linear regression. We'll get an even clearer idea of this when we apply these concepts to solve a real-world regression problem.
From the plot, we can say that the vector from the origin (0, 0) to the parking point (3, 1) is the projection of the home vector onto the highway, and its length is
$$
\text{Driving Distance} = \sqrt{x^2 + y^2} = \sqrt{3^2 + 1^2} = \sqrt{9 + 1} = \sqrt{10} \approx 3.16 \text{ km}
$$
That is how we calculate a vector projection.
We also have a shortcut to find the parking point.
Earlier, we calculated the dot product of these two vectors, which is 20.
Now, let's multiply the length of the projection vector by the length of the highway vector (from the origin to the signpost):
\(3.16 \times 6.32\), which also equals 20 (exactly: \(\sqrt{10} \times \sqrt{40} = \sqrt{400} = 20\)). From this, we can see that the dot product gives us the length of the projection multiplied by the length of the highway.
We have a dot product of 20, and the squared length of the highway vector is 40. We use the squared length because the dot product itself has squared units; when we multiply \(a_1 b_1\) and \(a_2 b_2\) and add them, the units get multiplied too.
Now, if we divide the dot product by the squared length (20 / 40), we get 0.5. We call this the scaling factor.
Because we want to find the exact point along the highway, we scale the highway vector by 0.5, which gives us (3, 1).
In vector vocabulary, we call the highway the base vector and the home vector the target vector.
And that's how we get our parking point at (3, 1).
What we've discussed so far can be expressed with a simple mathematical formula called the projection formula.
$$
\text{proj}_{\mathbf{b}}(\mathbf{a}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{b}\|^2}\,\mathbf{b}
$$
Let
$$
A = (2, 4), \qquad B = (6, 2)
$$
First compute the dot product.
$$
A \cdot B = 2 \times 6 + 4 \times 2 = 12 + 8 = 20
$$
Now compute the squared length of the highway vector.
$$
\|B\|^2 = 6^2 + 2^2 = 36 + 4 = 40
$$
Now divide the dot product by the squared length.
$$
\frac{A \cdot B}{\|B\|^2} = \frac{20}{40} = 0.5
$$
This value is the scaling factor.
Now scale the highway vector.
$$
\text{proj}_{B}(A) = 0.5 \times (6, 2) = (3, 1)
$$
So the projection point (the parking point) is
$$
(3,1)
$$
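The whole projection computation fits in a few lines; here is a sketch of the formula above:

```python
import numpy as np

home = np.array([2.0, 4.0])     # target vector A
highway = np.array([6.0, 2.0])  # base vector B

# Scaling factor: (A . B) / |B|^2
c = np.dot(home, highway) / np.dot(highway, highway)
print(c)  # 0.5

# Projection: scale the base vector by the scaling factor
parking = c * highway
print(parking)  # [3. 1.]

# The walk from the parking spot to the home is perpendicular to the highway
walk = home - parking
assert np.isclose(np.dot(walk, highway), 0)
```

The final assertion checks the geometric fact from the story: the shortest walk from the highway to the home meets the highway at a right angle.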
Now, what can we say about this 3.16 km distance along the highway?
Suppose we take the direct mud route and ignore the highway. As we move along our path home, we are actually traveling in two directions at once: parallel to the highway and sideways toward our home.
By the time we finally reach our home, we have effectively traveled 3.16 km in the direction of the highway.
On the other hand, what if we travel along the highway? If we drive exactly 3.16 km along the highway, we reach our parking point at (3, 1).
This specific point is where the path to our house is perfectly perpendicular to the highway.
Most importantly, this means it represents the absolute shortest walking path from the highway to our home!
I hope you're walking away with an intuitive understanding of vectors, dot products, and projections!
In Part 2, we'll take exactly what we learned today and use it to solve an actual linear regression problem.
If anything in this post felt unclear, feel free to comment.
Meanwhile, I recently wrote a deep dive on the Chi-Square test. If you're interested, you can read it here.
Thanks so much for reading!
