## 4 different ways to look at matrix multiplication

Data science is the field that extracts knowledge and insights from structured or unstructured data using scientific computing methods, processes, and algorithms.

The data, be it structured (such as data tables) or unstructured (such as images), is represented as matrices.

The operations this data undergoes, as it is processed by computational routines such as machine learning models, predominantly involve matrix multiplications.

So, a deeper insight into the matrix multiplication operation benefits those pursuing data science and machine learning.

The multiplication of two matrices can be viewed in 4 different ways:

- Dot Products of Rows and Columns
- Linear Combination of Columns
- Linear Combination of Rows
- Sum of Rank-1 Matrices

Let *A* and *B* be the matrices being multiplied as *AB*. Let the sizes of *A* and *B* be *m*×*p* and *p*×*n*, respectively. We know that for *A* and *B* to be multiplied, the number of columns in *A* must match the number of rows in *B*.

Let us consider the example dimensions below for simplicity and without loss of generality.

As we know, *AB* below is the product of the two matrices *A* and *B*.

In matrix *AB*, `element-11` can be seen as the dot product of `row-1` of *A* and `column-1` of *B*.

Similarly, `element-12` can be seen as the dot product of `row-1` of *A* and `column-2` of *B*.

`Element-ij` in *AB* is the dot product of `row-i` of *A* and `column-j` of *B*.
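As a minimal sketch of this element-wise view (the small 2×2 example matrices here are my own, not the ones from the article's figures), each entry of *AB* can be computed as a row-column dot product in NumPy:

```python
import numpy as np

# Illustrative matrices: A is m x p, B is p x n (here m = p = n = 2).
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

m, p = A.shape
_, n = B.shape

# Build AB element by element: element-ij is the dot product
# of row-i of A and column-j of B.
AB = np.empty((m, n))
for i in range(m):
    for j in range(n):
        AB[i, j] = np.dot(A[i, :], B[:, j])

# The loop agrees with NumPy's built-in matrix product.
assert np.array_equal(AB, A @ B)
```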

To form the column perspective of matrix multiplication, reorganize the matrix *AB* as below.

`Column-1` of *AB* can be seen as the sum of *b11* times `column-1` of *A* and *b21* times `column-2` of *A*. That is, `column-1` of *AB* is the linear combination (weighted sum) of the columns of *A*, where the weights of the combination are the elements of `column-1` of *B*.

Similarly, `column-2` of *AB* is the linear combination of the columns of *A*, where the weights of the combination are the elements of `column-2` of *B*.

Each column in *AB* is a linear combination of columns of *A*, where the weights of the mix are the weather of the corresponding column in *B*.
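The column perspective can be sketched the same way (again with illustrative matrices of my own choosing): each column of *AB* is built as a weighted sum of the columns of *A*.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# column-j of AB = sum over k of B[k, j] * column-k of A,
# i.e. a linear combination of A's columns weighted by column-j of B.
cols = [sum(B[k, j] * A[:, k] for k in range(A.shape[1]))
        for j in range(B.shape[1])]
AB = np.column_stack(cols)

assert np.array_equal(AB, A @ B)
```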

Now, let us look at *AB* from the row perspective by rewriting it as below.

`Row-1` of *AB* can be seen as the sum of *a11* times `row-1` of *B* and *a12* times `row-2` of *B*. That is, `row-1` of *AB* is the linear combination (weighted sum) of the rows of *B*, where the weights of the combination are the elements of `row-1` of *A*.

Similarly, `row-2` of *AB* is the linear combination of the rows of *B*, where the weights of the combination are the elements of `row-2` of *A*.

Each row in *AB* is a linear combination of rows of *B*, where the weights of the mix are the weather of the corresponding row in *A*.
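The mirror-image sketch for the row perspective (same illustrative matrices): each row of *AB* is a weighted sum of the rows of *B*.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# row-i of AB = sum over k of A[i, k] * row-k of B,
# i.e. a linear combination of B's rows weighted by row-i of A.
rows = [sum(A[i, k] * B[k, :] for k in range(A.shape[1]))
        for i in range(A.shape[0])]
AB = np.vstack(rows)

assert np.array_equal(AB, A @ B)
```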

Rewriting *AB* as below gives us two rank-1 matrices, each with the same size as *AB*.

It is easy to see that the above two matrices are of rank 1, since their rows (and columns) are all linearly dependent, i.e., all the other rows (columns) are multiples of one row (column). Hence rank-1.

Matrix *AB* is a sum of *p* rank-1 matrices of size *m*×*n*, where the *i*-th matrix (amongst the *p*) is the result of multiplying `column-i` of *A* with `row-i` of *B*.
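This rank-1 decomposition can be sketched with outer products (the example matrices are again my own; `np.outer` forms the product of a column of *A* with a row of *B*):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# The i-th term is the outer product of column-i of A and row-i of B,
# an m x n matrix of rank 1.
terms = [np.outer(A[:, i], B[i, :]) for i in range(A.shape[1])]

# Each term is rank-1, and the p terms sum to the full product AB.
assert all(np.linalg.matrix_rank(t) == 1 for t in terms)
assert np.array_equal(sum(terms), A @ B)
```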

Each of these perspectives proves relevant on different occasions.

For instance, in the *Attention* mechanism of the Transformer neural network architecture, the attention matrix calculation can be seen as a matrix multiplication from the 'dot product of rows and columns' perspective.
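As a hedged sketch of that connection (the shapes and random values here are hypothetical, chosen only for illustration): in scaled dot-product attention, each score is the dot product of one query row with one key row, i.e. a row of *Q* with a column of *K*ᵀ.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical shapes: 4 tokens, key dimension 8.
Q = rng.standard_normal((4, 8))  # query vectors, one per row
K = rng.standard_normal((4, 8))  # key vectors, one per row

# Score-ij is the dot product of row-i of Q and column-j of K.T,
# scaled by sqrt(d_k) as in scaled dot-product attention.
scores = Q @ K.T / np.sqrt(Q.shape[1])

# A row-wise softmax turns the scores into attention weights.
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Each row of weights is a probability distribution over the tokens.
assert np.allclose(weights.sum(axis=1), 1.0)
```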

More about the Attention mechanism and the Transformer can be found in the article below.

I hope these perspectives on matrix multiplication enable readers to gain a more intuitive understanding of data flow in machine learning and data science algorithms and models.