Yes, you read that right. There is already a lot of open-source content available on the web on this topic, but I still chose to write this article. The one reason behind that is I still feel something is missing. What if I want to know how to perform simple linear regression in Python without using a machine learning library, and what if I also want to learn how it works internally?

In simple words, everyone knows how to make Maggi 🍝 in 2 minutes, but we do not know what is in that masala which makes it unique. Frankly speaking, nowadays nobody cares what is in that "Maggi Masala" 🍝; it's just about how good a Maggi you can make using the masala available. So in this article we will do both: first we will learn how to make Maggi (how to apply simple linear regression using Python code), and then we will learn how that masala is made (the math behind linear regression).

So basically, I will divide this post into three sections.

  1. Introduction
  2. How to apply SLR in Python using the sklearn library, and how to calculate it manually using simple Python code.
  3. The math behind the m (slope/coefficient) and c (constant/intercept) values.


So suppose someone gave you past data of employees with two columns: the first is salary, and the other is years of experience. Your job is to develop a system that takes years of experience as input and predicts the salary. Now the question is how you will do it. As a data scientist, what you are supposed to do is train a model on the past data which accepts years of experience as input and gives salary as output. Now the question is how to train the model. How can I solve this problem? In how many ways can I solve this problem?

There are many ways to solve this problem. Below are the names of the most popular methods.

  1. Least Square Method
  2. Gradient Descent Method
  3. Adam Method
  4. Singular Value Decomposition Method

In this post we will study the first method, which is the least square method.

Let's say you got the data of salary and years of experience. When you plot a normal scatter plot, it will look something like the image below.

Figure 1

Now what linear regression does is try to find the linear relation between the target variable (salary) and the independent variable (years of experience).

As the name suggests, in order to predict the salary we depend on years of experience, so salary is our dependent variable and years of experience is the independent variable. Note that there can be multiple independent variables, but for now we are keeping just one.

So what linear regression does is create a line passing through the points, as shown in Fig 2. Its goal is not simply to create some line; it has to create a line that passes as close as possible to all the points.

Figure 2

As you can see in Fig 2, we have created a line. We have studied the line equation in school, right? Let me write it again. Fig 3 is the equation of the line.

Figure 3

Now let's write the same equation for our example.

Figure 4

So basically, if we are able to find the values of m and c using past data, we will be able to predict the salary by taking years of experience as input. And if the predicted salary matches the actual salary, then our model is correct.
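To make the line equation concrete, here is a minimal sketch. The helper name and the input value are hypothetical; the m = 2 and c = 4.5 values are the ones the article arrives at later.

```python
def predict_salary(years_of_experience, m, c):
    """Predict salary from years of experience using the line equation y = m*x + c."""
    return m * years_of_experience + c

# m and c as found later in the article; the input is a hypothetical example
m, c = 2, 4.5
print(predict_salary(3, m, c))  # → 10.5
```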

Now, before moving ahead, let's understand two important terms: the loss function and the cost function.

What is a loss function? In simple terms, it is the difference between the predicted value and the original value for a single row/entry. And the cost function is the average loss over all the points in the dataset.
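As a sketch of these two definitions (assuming the common squared-error convention, so that positive and negative differences don't cancel out; the article itself only says "difference"):

```python
def loss(y_actual, y_predicted):
    # Loss for a single row: squared difference between predicted and actual value.
    return (y_predicted - y_actual) ** 2

def cost(y_actual_list, y_predicted_list):
    # Cost: average of the loss over all points in the dataset.
    n = len(y_actual_list)
    return sum(loss(a, p) for a, p in zip(y_actual_list, y_predicted_list)) / n

# Hypothetical values for illustration only
print(cost([10, 12], [11, 12]))  # → 0.5
```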

Figure 5

But wait, how are these two terms relevant to this problem? Ultimately, our objective is to calculate the m and c values while keeping the cost function as minimal as we can.

Now consider that by minimizing the cost function we got the equations for m and c, as shown in the image below. The later part of the article will explore how these values came about, but for now assume that somehow we got the equations for m and c.

As shown in the figure, let's consider we got some data (x and y). Taking x as input, we have to predict y. For that we have to calculate the m and c values, and we already have the equations for them, as shown in Figure 6 below.

Figure 6
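Figure 6 is not reproduced here, but the standard least-squares closed form it presumably shows is m = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)² and c = ȳ − m·x̄. A minimal sketch, using hypothetical data chosen to lie exactly on y = 2x + 4.5 so it matches the article's final result:

```python
# Hypothetical data (not the article's): lies exactly on y = 2x + 4.5
x = [1, 2, 3, 4, 5]                       # years of experience
y = [6.5, 8.5, 10.5, 12.5, 14.5]          # salary

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# m = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2)
m = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) \
    / sum((xi - x_mean) ** 2 for xi in x)
# c = ȳ - m·x̄
c = y_mean - m * x_mean

print(m, c)  # → 2.0 4.5
```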


So in this section, before starting the Python code, we will calculate m and c with a simple pen-and-paper calculation, as shown in Figures 7 and 8 below.

Figure 7
Figure 8

Now we will do the same using Python code, both with the help of an available Python library (sklearn) and without it. Figure 9 shows the same calculation using the sklearn library.

Figure 9
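Figure 9 itself is not reproduced here; a sketch of the sklearn calculation it describes would look like the following, again using hypothetical data that lies exactly on y = 2x + 4.5:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data (not the article's): lies exactly on y = 2x + 4.5
X = np.array([[1], [2], [3], [4], [5]])        # years of experience (2-D, as sklearn expects)
y = np.array([6.5, 8.5, 10.5, 12.5, 14.5])     # salary

model = LinearRegression()
model.fit(X, y)

print(model.coef_[0], model.intercept_)        # → 2.0 4.5
print(model.predict([[6]])[0])                 # predicted salary for 6 years of experience
```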

Figure 10 is the Python code for the Figure 7 and 8 calculation, which we did with the pen-and-paper method, but here we are using built-in functions like std (for standard deviation), corrcoef (for correlation), and mean (for the mean value).

Figure 10
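Figure 10 is not shown here, but a sketch of the same calculation using the NumPy functions the article names (std, corrcoef, mean), on the same hypothetical data, would be:

```python
import numpy as np

# Hypothetical data (not the article's): lies exactly on y = 2x + 4.5
x = np.array([1, 2, 3, 4, 5])                  # years of experience
y = np.array([6.5, 8.5, 10.5, 12.5, 14.5])     # salary

# m = correlation(x, y) * (std of y / std of x)
m = np.corrcoef(x, y)[0, 1] * (np.std(y) / np.std(x))
# c = mean(y) - m * mean(x)
c = np.mean(y) - m * np.mean(x)

print(m, c)  # → 2.0 4.5
```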

As you can see, m = 2 and c = 4.5 with both ways of calculating.


