4 Different Ways to Look at Matrix Multiplication
Data science is the field that extracts knowledge and insights from structured or unstructured data using scientific computing methods, processes, and algorithms.
The data — be it structured (such as data tables) or unstructured (such as images) — is represented as matrices.
The operations this data undergoes, as processed by computational procedures such as machine learning models, predominantly involve matrix multiplications.
So, a deeper insight into the matrix multiplication operation benefits those pursuing data science and machine learning.
The multiplication of two matrices can be viewed in four different ways:
- Dot Products of Rows and Columns
- Linear Combination of Columns
- Linear Combination of Rows
- Sum of Rank-1 Matrices
Let A and B be the matrices being multiplied as AB, and let the sizes of A and B be m×p and p×n, respectively. We know that for A and B to be multiplicable, the number of columns in A must match the number of rows in B.
Let us consider the example dimensions below for simplicity and without loss of generality.
And as we know, AB below is the product of the two matrices, A and B.
In matrix AB, element-11 can be seen as the dot product of row-1 of A and column-1 of B.
Similarly, element-12 can be seen as the dot product of row-1 of A and column-2 of B.
In general, element-ij of AB is the dot product of row-i of A and column-j of B.
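The entrywise view above can be verified with a minimal NumPy sketch (the values of A and B here are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical example matrices (m = p = n = 2)
A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[5., 6.],
              [7., 8.]])

AB = A @ B

# Element (i, j) of AB is the dot product of row i of A and column j of B
manual = np.array([[np.dot(A[i, :], B[:, j]) for j in range(B.shape[1])]
                   for i in range(A.shape[0])])

assert np.allclose(AB, manual)
```

Computing every entry as an explicit dot product reproduces `A @ B` exactly.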
To form the column perspective of matrix multiplication, reorganize the matrix AB as below.
Column-1 of AB can be seen as the sum of b11 times column-1 of A and b21 times column-2 of A. That is, column-1 of AB is the linear combination (weighted sum) of the columns of A, where the weights of the combination are the elements of column-1 of B.
Similarly, column-2 of AB is the linear combination of the columns of A, where the weights of the combination are the elements of column-2 of B.
Each column in AB is a linear combination of the columns of A, where the weights of the combination are the elements of the corresponding column in B.
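The column perspective can be checked the same way (again with hypothetical example values): each column of AB is rebuilt as a weighted sum of the columns of A.

```python
import numpy as np

# Hypothetical example matrices
A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[5., 6.],
              [7., 8.]])

AB = A @ B

# Column j of AB is a linear combination of the columns of A,
# with weights taken from column j of B.
col_view = np.column_stack([
    sum(B[k, j] * A[:, k] for k in range(A.shape[1]))
    for j in range(B.shape[1])
])

assert np.allclose(AB, col_view)
```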
Now, let us look at AB from the row perspective by rewriting it as below.
Row-1 of AB can be seen as the sum of a11 times row-1 of B and a12 times row-2 of B. That is, row-1 of AB is the linear combination (weighted sum) of the rows of B, where the weights of the combination are the elements of row-1 of A.
Similarly, row-2 of AB is the linear combination of the rows of B, where the weights of the combination are the elements of row-2 of A.
Each row in AB is a linear combination of the rows of B, where the weights of the combination are the elements of the corresponding row in A.
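The row perspective admits a symmetric sketch (same hypothetical matrices): each row of AB is rebuilt as a weighted sum of the rows of B.

```python
import numpy as np

# Hypothetical example matrices
A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[5., 6.],
              [7., 8.]])

AB = A @ B

# Row i of AB is a linear combination of the rows of B,
# with weights taken from row i of A.
row_view = np.vstack([
    sum(A[i, k] * B[k, :] for k in range(A.shape[1]))
    for i in range(A.shape[0])
])

assert np.allclose(AB, row_view)
```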
Rewriting AB as below gives us two rank-1 matrices, each with the same size as AB.
It is clear that the above two matrices are of rank 1, since their rows (and columns) are all linearly dependent; i.e., every other row (column) is a multiple of one row (column). Hence rank 1.
Matrix AB is a sum of p rank-1 matrices of size m×n, where the i-th matrix (among the p) is the result of multiplying column-i of A with row-i of B, i.e., their outer product.
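The outer-product view can be sketched as follows (hypothetical values again); `np.outer` forms each column-times-row rank-1 matrix, and their sum recovers AB:

```python
import numpy as np

# Hypothetical example matrices (p = 2, so AB is a sum of 2 rank-1 matrices)
A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[5., 6.],
              [7., 8.]])

AB = A @ B

# The k-th term is the outer product of column k of A and row k of B
terms = [np.outer(A[:, k], B[k, :]) for k in range(A.shape[1])]
rank1_sum = sum(terms)

assert np.allclose(AB, rank1_sum)
# Each term is indeed a rank-1 matrix
assert all(np.linalg.matrix_rank(t) == 1 for t in terms)
```

This decomposition is the same structure exploited by low-rank approximations such as the truncated SVD.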
Each of these perspectives finds its relevance on different occasions.
For instance, in the Attention mechanism of the Transformer neural network architecture, the attention matrix calculation can be seen as a matrix multiplication from the 'dot product of rows and columns' perspective.
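As a minimal sketch of that connection (the names Q, K, and d_k follow the standard attention notation, but the shapes and random values here are hypothetical), each attention score is the dot product of a query row of Q and a column of K-transpose:

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 4
Q = rng.standard_normal((3, d_k))  # 3 query vectors, one per row (hypothetical)
K = rng.standard_normal((5, d_k))  # 5 key vectors, one per row (hypothetical)

# scores[i, j] is the dot product of row i of Q and column j of K.T,
# i.e., the similarity between query i and key j, scaled by sqrt(d_k).
scores = (Q @ K.T) / np.sqrt(d_k)

assert scores.shape == (3, 5)
assert np.isclose(scores[0, 0], np.dot(Q[0], K[0]) / np.sqrt(d_k))
```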
More about the Attention mechanism and the Transformer can be found in the article below.
I hope these perspectives on matrix multiplication enable readers to gain a more intuitive understanding of data flow in machine learning and data science algorithms and models.