## Master the elements of linear algebra: start with simple and visual explanations of basic concepts

Often the main difficulty one faces when beginning a journey into machine learning is having to grasp math concepts. This can be hard if you don't have a solid background in subjects such as linear algebra, statistics, probability, or optimization theory. 🤔💭🔢✖️🧮

In this article, I would like to start by giving **intuitive explanations of basic linear algebra concepts** that are essential before delving into the world of Machine Learning. Obviously, this article isn't meant to be exhaustive; there is a lot to learn about this subject, but it can be a first approach to it!

- Introduction
- What’s a vector?
- Simple Vector Operations
- Projections
- Basis, Vector Space and Linear Independence
- Matrices and Solving Equations

## Introduction

**Why is Linear Algebra important for Data Science?**

**Linear algebra** allows us to **solve real-life problems**, especially problems that are very common in data science.

Assume we go to the market to buy 3 avocados and 4 broccoli, and we pay $8. The next day we buy 11 avocados and 2 broccoli, and we pay $12.

Now we would like to find out how much a single avocado and a single broccoli cost. We have to solve the following expressions simultaneously, where *a* is the price of an avocado and *b* the price of a broccoli:

*3a + 4b = 8*

*11a + 2b = 12*
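As a sanity check, this small system can be solved numerically; here is a minimal NumPy sketch (the variable names are just illustrative):

```python
import numpy as np

# Coefficient matrix: each row is one shopping trip
# 3 avocados + 4 broccoli = $8
# 11 avocados + 2 broccoli = $12
A = np.array([[3.0, 4.0],
              [11.0, 2.0]])
prices_paid = np.array([8.0, 12.0])

# Solve A @ x = prices_paid for x = [avocado_price, broccoli_price]
x = np.linalg.solve(A, prices_paid)
print(x)  # avocado ≈ $0.84, broccoli ≈ $1.37
```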

Another typical problem is to **find the best parameters of a function** so that it **fits the data we have collected**. Suppose we already know which type of function we want to use, but this **function can change its shape because it depends on some parameters**. We want to **find the best shape and therefore the best parameters**.

Let's for instance call *µ = param1* and *θ = param2*.

Often, in Machine Learning, we want to **iteratively update both [µ, θ]** to eventually find a good curve that fits our data.

Let's say that **a curve far away from the optimal green curve has a high error**, while **a curve similar to the green one has a low error**. We usually say that we want to find those parameters [µ, θ] that **minimize the error**, i.e., find the curve that is as close as possible to the green one.

Let’s see how linear algebra may also help us with these problems!

## What’s a vector?

A **vector** in physics is a **mathematical entity** that has a direction, a sign, and a magnitude. So it is usually represented visually with an arrow.

Often in **computer science, the concept of a vector is generalized**. In fact, you will hear the term *list* used instead of *vector* over and over. In this conception, a vector is nothing more than a **list of properties** that we can use to represent anything.

Suppose we want to represent houses according to 3 of their properties:

1. The number of rooms

2. The number of bathrooms

3. Square meters

For instance, in the image above we have two vectors. The first represents a house with 4 rooms, 2 bathrooms and 85 square meters. The second, on the other hand, represents a house with 3 rooms, 1 bathroom and 60 square meters.

Of course, if we are interested in other properties of the house, we can create a much longer vector. In this case, we will say that the vector, instead of having 3 dimensions, has *n* dimensions. **In machine learning, we can often have hundreds or thousands of dimensions**!

## Simple Vector Operations

There are operations we can perform with vectors, the simplest of which are certainly addition between two vectors and multiplication of a vector by a **scalar** (**i.e., a simple number**).

To **add 2 vectors you can use the parallelogram rule**. That is, you draw vectors parallel to the ones we want to add and then draw the diagonal. The diagonal will be the resulting vector of the addition. Believe me, it is much easier to understand this by looking directly at the following example.

Multiplication by a scalar, on the other hand, **stretches the vector by n units**. See the following example.
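Both operations are one-liners in NumPy; the example vectors below are arbitrary:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 1.0])

# Vector addition: component-wise, equivalent to the parallelogram rule
w = u + v          # [4.0, 3.0]

# Scalar multiplication stretches the vector by a factor of 2
stretched = 2 * u  # [2.0, 4.0]
```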

## Modulus and Inner Product

**A vector is actually always expressed in terms of other vectors**. For example, let us take as **reference vectors two vectors *i* and *j*, each with length 1 and orthogonal to each other**.

Now we define a new vector *r*, which **starts from the origin**, that is, from the point where *i* and *j* meet, and which is *a* times longer than *i* and *b* times longer than *j*.

More commonly, **we refer to a vector using its coordinates, r = [a, b]**; in this way we can identify various vectors in a **vector space**.

Now we are ready to define a new operation, the **modulus of a vector**, **that is, its length**, which can be derived from its coordinates: for *r = [a, b]*, the modulus is *|r| = √(a² + b²)*.

The **inner product**, on the other hand, is another operation which, given two vectors, multiplies their corresponding components and returns the sum: *r·s = r₁s₁ + r₂s₂ + … + rₙsₙ*.

The inner product has some properties that can be useful in some cases:

- commutative: *r·s = s·r*
- distributive over addition: *r·(s + t) = r·s + r·t*
- associative over scalar multiplication: *r·(a·s) = a·(r·s)*, where *a* is a scalar

Notice that if you compute the inner product of a vector with itself, you will get its modulus squared!
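A short NumPy sketch of the modulus, the inner product, and the modulus-squared property (the example vectors are arbitrary):

```python
import numpy as np

r = np.array([3.0, 4.0])
s = np.array([2.0, 1.0])

# Modulus (length): square root of the sum of squared components
modulus = np.linalg.norm(r)   # sqrt(3^2 + 4^2) = 5.0

# Inner product: multiply corresponding components and sum
inner = np.dot(r, s)          # 3*2 + 4*1 = 10.0

# The inner product of a vector with itself is its modulus squared
assert np.isclose(np.dot(r, r), modulus ** 2)
```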

## Cosine (dot) Product

So far we have only seen a mathematical definition of the inner product based on the coordinates of vectors. Now let us **see a geometric interpretation** of it. Let us create 3 vectors, *r*, *s* and their difference *r − s*, so as to form a triangle with 3 sides *a*, *b*, *c*.

We know from our high-school days that **we can derive *c* using a simple rule of trigonometry**, the law of cosines: *c² = a² + b² − 2ab·cos θ*.

But then we can derive from the above that *r·s = |r| |s| cos θ*.

So the angle between the vectors has a strong effect on the result of this operation. In fact, in the special cases where the angle is 90°, 0°, or 180°, the cosine will be 0, 1, or −1 respectively, and the dot product changes accordingly. So, for example, **2 vectors that are at 90 degrees to each other will always have a dot product of 0**.
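A small NumPy sketch of recovering the angle from the dot product (the two example vectors are chosen to be orthogonal):

```python
import numpy as np

r = np.array([1.0, 0.0])
s = np.array([0.0, 2.0])

# r . s = |r| |s| cos(theta), so we can recover the angle
cos_theta = np.dot(r, s) / (np.linalg.norm(r) * np.linalg.norm(s))
theta_degrees = np.degrees(np.arccos(cos_theta))

# Orthogonal vectors: dot product 0, angle 90 degrees
print(cos_theta, theta_degrees)
```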

## Projection

Let's consider two vectors *r* and *s*. These two vectors are close to each other on one side and make an angle *θ* between them. **Let's put a torch on top of *s*, and we will see a shadow of *s* on *r***.

**That is the projection of *s* on *r*.** There are 2 basic projection operations:

- **Scalar projection**: gives us the magnitude of the projection
- **Vector projection**: gives us the projection vector itself
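Here is a minimal NumPy sketch of both operations, assuming the standard formulas *s·r / |r|* (scalar projection) and *(s·r / |r|²) r* (vector projection); the example vectors are arbitrary:

```python
import numpy as np

r = np.array([2.0, 0.0])   # the vector we project onto
s = np.array([3.0, 4.0])   # the vector being projected

# Scalar projection of s onto r: the length of the "shadow"
scalar_proj = np.dot(s, r) / np.linalg.norm(r)       # 3.0

# Vector projection of s onto r: the shadow as a vector
vector_proj = (np.dot(s, r) / np.dot(r, r)) * r      # [3.0, 0.0]
```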

## Changing Basis

Changing basis in linear algebra refers to the **process of expressing a vector in a different set of coordinates**, **called a basis**. A **basis is a set of linearly independent vectors that can be used to express any vector in a vector space.** **When a vector is expressed in a different basis, its coordinates change.**

We have seen, for example, that in two dimensions every vector can be represented as a sum of the two basis vectors [0,1] and [1,0]. These two vectors are the basis of our space. But **can we use two other vectors as the basis, and not just these two? Certainly, but in this case the coordinates of every vector in our space will change**. Let's see how.

In the image above, I have two bases: the basis (e1, e2) and the basis (b1, b2). In addition, I have a vector *r* (in red). This vector has coordinates [3,4] when expressed in terms of (e1, e2), which is the basis we have always used by default. But what do its coordinates become when expressed in terms of (b1, b2)?

To find these coordinates we need to go step by step. First, we need to find the projections of the vector *r* onto the vectors of the new basis (b1, b2).

It's easy to see that the sum of these projections is just *r*.

r = p1 + p2.

Moreover, in order to change the basis, **I have to check that the new basis is also orthogonal**, meaning that the vectors are at 90 degrees to each other, so that they can define the whole space.

**To check this, just see whether the cosine of the angle between them is 0, which means an angle of 90 degrees.**

Now we go on to **calculate the vector projections of *r* onto the vectors (b1, b2)**, with the formula we saw in the previous chapter.

The value circled in red in the vector projection gives us the coordinates of the vector *r* expressed in the basis b: (b1, b2) instead of e: (e1, e2).

To check that the calculations are right, we need to verify that the sum of the projections is just *r* in the basis e: (e1, e2).

**[4,2] + [-1,2] = [3,4]**
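The whole procedure can be sketched in NumPy. The basis vectors b1 = [2, 1] and b2 = [−2, 4] are an assumption on my part (the originals were in a figure), chosen because they reproduce the projections [4, 2] and [−1, 2] above:

```python
import numpy as np

r = np.array([3.0, 4.0])      # coordinates in the default basis e
b1 = np.array([2.0, 1.0])     # assumed new basis vectors; they
b2 = np.array([-2.0, 4.0])    # reproduce the projections above

# The new basis must be orthogonal for this projection trick to work
assert np.isclose(np.dot(b1, b2), 0)

# Vector projections of r onto b1 and b2
p1 = (np.dot(r, b1) / np.dot(b1, b1)) * b1   # [4.0, 2.0]
p2 = (np.dot(r, b2) / np.dot(b2, b2)) * b2   # [-1.0, 2.0]

# The coordinates of r in the new basis are the scalar factors
r_in_b = np.array([np.dot(r, b1) / np.dot(b1, b1),
                   np.dot(r, b2) / np.dot(b2, b2)])  # [2.0, 0.5]

# Sanity check: the projections sum back to r in the basis e
assert np.allclose(p1 + p2, r)
```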

## Basis, Vector Space and Linear Independence

We have already seen and talked about bases. But let's define more precisely what a vector basis is in a vector space.

**A basis is a set of n vectors** that:

- **are not linear combinations of each other** (linearly independent)
- **span the space**: the space is n-dimensional

The first point means that if, for example, I have 3 vectors *a, b, c* forming a basis, there is no way to multiply these vectors by scalars and add them together to get zero!

If I denote by *x*, *y* and *z* any three scalars, it means that:

*xa + yb + zc ≠ 0*

(obviously excluding the trivial case where x = y = z = 0). In this case, we say that the vectors are linearly independent.

This means, for example, that **there is no way to multiply *a* and *b* by scalars and add them together to get *c***: if *a* and *b* lie in a two-dimensional plane, *c* lies in a third dimension instead.

The second point means that I can multiply these vectors by scalars and sum them together to get any possible vector in the 3-dimensional space. **So these 3 basis vectors are enough to define the whole space of dimension n = 3**.
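A quick way to test linear independence numerically is to check the rank of the matrix whose columns are the vectors; here is a NumPy sketch with illustrative vectors:

```python
import numpy as np

# Three vectors in 3-D space, stacked as columns of a matrix
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
c = np.array([0.0, 0.0, 1.0])
M = np.column_stack([a, b, c])

# Full rank (3) means the vectors are linearly independent
# and therefore form a basis of the 3-D space
print(np.linalg.matrix_rank(M))      # 3

# Replace c with a combination of a and b: the rank drops to 2
M_dep = np.column_stack([a, b, 2 * a + 3 * b])
print(np.linalg.matrix_rank(M_dep))  # 2
```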

## Matrices and solving simultaneous equations

By now you should be pretty good at handling vectors and doing operations with them. But what are they used for in real life? We saw at the beginning that one of our goals was to solve multiple equations simultaneously, for example, to figure out the prices of vegetables at the supermarket.

But now that we know about vectors, we can rewrite these equations in a simpler way. We put the vectors of coefficients [3, 11] and [4, 2] next to each other to form a matrix (a set of vectors). Then we have the vector of unknowns [a, b] and finally the result [8, 12].

Now you may ask whether this new way of writing the problem is really better or not. **How do you do multiplication between a matrix and a vector?** It is very easy: just multiply each row of the matrix by the vector. If we had a multiplication between two matrices, we would have to multiply each row of the first matrix by each column of the second matrix.

**So by applying this row-by-column rule we should recover the original equations.**
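A small NumPy sketch of the row-by-column rule, using the coefficient matrix of the avocado/broccoli system from the introduction and an arbitrary example vector:

```python
import numpy as np

A = np.array([[3.0, 4.0],
              [11.0, 2.0]])   # coefficient matrix from the introduction
x = np.array([1.0, 2.0])      # an arbitrary example vector [a, b]

# Row-by-column rule: each entry is one row of A dotted with x
manual = np.array([A[0] @ x, A[1] @ x])  # [3*1 + 4*2, 11*1 + 2*2]

# It matches NumPy's built-in matrix-vector product
assert np.allclose(manual, A @ x)
```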

**This notation, however, has other benefits as well. It gives us a geometric interpretation of what is happening. Every matrix defines a transformation of space. So if I have a point in a space and I apply a matrix to it, my point will move in some way.**

But then we can also say that **a matrix is nothing more than a function that takes a point as input and generates a new one as output.**

So our initial problem can be interpreted as follows: "What is the original vector [a, b] which the transformation maps to [8, 12]?"

In this way, **you can think of solving simultaneous equations as transformations over vectors in a vector space**. Moreover, operations with matrices have the following properties, which can be very useful.

Given a matrix *A*, vectors *r* and *s*, and a scalar *n*:

- *A(nr) = nA(r)*
- *A(r + s) = A(r) + A(s)*
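These two linearity properties are easy to verify numerically; the matrix and vectors below are arbitrary examples:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 3.0]])       # an arbitrary example matrix
r = np.array([1.0, 2.0])
s = np.array([-1.0, 4.0])
n = 5.0

# A(n r) = n A(r): scaling before or after the transformation is the same
assert np.allclose(A @ (n * r), n * (A @ r))

# A(r + s) = A(r) + A(s): the transformation distributes over addition
assert np.allclose(A @ (r + s), A @ r + A @ s)
```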

## Matrices and space transformations

To understand the effect of a matrix, then, we can look at how it transforms the vectors to which it is applied. In particular, we can see what the impact of a matrix is when applied to the basis vectors.

If we have a 2×2 matrix and we are in a two-dimensional space, the first column of the matrix tells us what the effect will be on the vector e1 = [1, 0], and the second column tells us what the effect will be on the vector e2 = [0, 1].

We can then see the effect of some well-known matrices. These transformations are often useful in Machine Learning for data augmentation on images: you can stretch or shrink those images, for example.

**We can also apply multiple consecutive transformations to a vector**. So if we have two transformations represented by the matrices A1 and A2, we can apply them consecutively: A2(A1(vector)).

But this is different from applying them in the opposite order, i.e., A1(A2(vector)). That is why **the product between matrices does not enjoy the commutative property.**
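A minimal NumPy sketch showing that the order matters (a rotation and a stretch, chosen as arbitrary examples):

```python
import numpy as np

A1 = np.array([[0.0, -1.0],      # 90-degree counterclockwise rotation
               [1.0, 0.0]])
A2 = np.array([[2.0, 0.0],       # stretch along the x axis
               [0.0, 1.0]])
v = np.array([1.0, 1.0])

# Applying A1 first, then A2 ...
first = A2 @ (A1 @ v)
# ... differs from applying A2 first, then A1
second = A1 @ (A2 @ v)

assert not np.allclose(first, second)
```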

In this first part of my articles on linear algebra, you should have understood why this subject is so essential for Machine Learning, and perhaps you have learned some basic concepts quickly and intuitively:

what a vector and a matrix are, how to represent these entities in a vector space, and how to do operations with these elements. Follow along so you don't miss the continuation of this article! 😊

*Marcello Politi*
