In the first story [1] of this series, we have:
- Addressed multiplication of a matrix by a vector,
- Introduced the concept of the X-diagram for a given matrix,
- Observed how several special matrices behave when multiplied by a vector.
In this second story, we’ll grasp the physical meaning of matrix-matrix multiplication, understand why multiplication is not a symmetrical operation (i.e., why “A*B ≠ B*A”), and finally, we’ll see how several special matrices behave when multiplied by one another.
So let’s start, and we’ll do it by recalling the definitions that I use throughout this series:
- Matrices are denoted with uppercase letters (like ‘A’ and ‘B’), while vectors and scalars are denoted with lowercase letters (like ‘x’ and ‘y’).
- |x| – the length of vector ‘x’,
- rows(M) – the number of rows of matrix ‘M’,
- columns(M) – the number of columns of matrix ‘M’.
The concept of multiplying matrices
Multiplication of two matrices “A” and “B” is probably the most common operation in matrix analysis. A known fact is that “A” and “B” can be multiplied only if “columns(A) = rows(B)”. At the same time, “A” can have any number of rows, and “B” can have any number of columns. Cells of the product matrix “C = A*B” are calculated by the following formula:
\begin{equation*}
c_{i,j} = \sum_{k=1}^{p} a_{i,k}*b_{k,j}
\end{equation*}
where “p = columns(A) = rows(B)”. The result matrix “C” will have the dimensions:
rows(C) = rows(A),
columns(C) = columns(B).
Following the multiplication formula, when calculating “c_{i,j}” we should scan the i-th row of “A” in parallel with the j-th column of “B”, and after summing up all the products “a_{i,k}*b_{k,j}” we will have the value of “c_{i,j}”.
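To make this concrete, here is a minimal sketch of the formula in Python with NumPy, checked against the library’s built-in product (the function name is mine, chosen just for illustration):

```python
import numpy as np

def matmul_by_formula(A, B):
    """Compute C = A*B cell by cell: c[i,j] = sum over k of a[i,k]*b[k,j]."""
    assert A.shape[1] == B.shape[0], "columns(A) must equal rows(B)"
    p = A.shape[1]                   # p = columns(A) = rows(B)
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(A.shape[0]):      # scan the i-th row of A ...
        for j in range(B.shape[1]):  # ... against the j-th column of B
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(p))
    return C

A = np.random.rand(2, 3)             # rows(A) = 2, columns(A) = 3
B = np.random.rand(3, 4)             # rows(B) = 3, columns(B) = 4
print(np.allclose(matmul_by_formula(A, B), A @ B))   # prints: True
```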
Another well-known fact is that matrix multiplication is not a symmetrical operation, i.e., “A*B ≠ B*A”. Without going into details, we can already see this when multiplying two rectangular matrices:

For newcomers, the fact that matrix multiplication is not a symmetrical operation often seems strange, as multiplication defined for almost any other kind of object is symmetrical. Another fact that is often unclear is why matrix multiplication is performed by such an odd formula.
In this story, I’m going to give my answers to both of those questions, and not only to them…
Derivation of the matrix multiplication formula
Multiplying “A*B” should produce such a matrix ‘C’ that:
y = C*x = (A*B)*x = A*(B*x).
In other words, multiplying any vector ‘x’ by the product matrix “C = A*B” should result in the same vector ‘y’ that we receive if we at first multiply ‘x’ by ‘B’, and then multiply ‘A’ by that intermediate result.
This already explains why in “C = A*B”, the condition “columns(A) = rows(B)” should hold. That’s because of the length of the intermediate vector. Let’s denote it as ‘t’:
t = B*x,
y = C*x = (A*B)*x = A*(B*x) = A*t.
Obviously, as “t = B*x”, we receive a vector ‘t’ of length “|t| = rows(B)”. But later, matrix ‘A’ is going to be multiplied by ‘t’, which requires ‘t’ to have the length “|t| = columns(A)”. From those two facts, we can already determine that:
columns(A) = |t| = rows(B), or
columns(A) = rows(B).
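As a quick numerical sketch of this reasoning (with sizes picked arbitrarily for illustration):

```python
import numpy as np

A = np.random.rand(2, 3)   # columns(A) = 3
B = np.random.rand(3, 4)   # rows(B) = 3, so columns(A) = rows(B) holds
x = np.random.rand(4)      # |x| = columns(B)

t = B @ x                  # intermediate vector, |t| = rows(B) = 3
y = A @ t                  # valid only because |t| = columns(A)
print(np.allclose((A @ B) @ x, y))   # prints: True, as (A*B)*x = A*(B*x)
```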
In the first story [1] of this series, we learned the “X-way interpretation” of matrix-vector multiplication “A*x”. Considering that for “y = (A*B)*x”, vector ‘x’ at first goes through the transformation of matrix ‘B’ and then continues through the transformation of matrix ‘A’, we can extend the concept of the “X-way interpretation” and present matrix-matrix multiplication “A*B” as two adjoined X-diagrams:

Now, what should a certain cell “c_{i,j}” of matrix ‘C’ be equal to? From part 1 – “matrix-vector multiplication” [1], we remember that the physical meaning of “c_{i,j}” is how much the input value ‘x_j’ affects the output value ‘y_i’. Considering the image above, let’s see how some input value ‘x_j’ can affect some output value ‘y_i’. It can act through the intermediate value ‘t_1’, i.e., through the arrows “b_{1,j}” and “a_{i,1}”. The influence can also happen through the intermediate value ‘t_2’, i.e., through the arrows “b_{2,j}” and “a_{i,2}”. Generally, the influence of ‘x_j’ on ‘y_i’ can happen through any value ‘t_k’ of the intermediate vector ‘t’, i.e., through the arrows “b_{k,j}” and “a_{i,k}”.

So there are ‘p’ possible ways in which the value ‘x_j’ influences ‘y_i’, where ‘p’ is the length of the intermediate vector: “p = |t| = |B*x|”. The influences are:
\begin{equation*}
\begin{matrix}
a_{i,1}*b_{1,j}, \\
a_{i,2}*b_{2,j}, \\
a_{i,3}*b_{3,j}, \\
\dots \\
a_{i,p}*b_{p,j}
\end{matrix}
\end{equation*}
All those ‘p’ influences are independent of one another, which is why in the matrix multiplication formula they participate as a sum:
\begin{equation*}
c_{i,j} =
a_{i,1}*b_{1,j} + a_{i,2}*b_{2,j} + \dots + a_{i,p}*b_{p,j} =
\sum_{k=1}^{p} a_{i,k}*b_{k,j}
\end{equation*}
That is my visual explanation of the matrix-matrix multiplication formula. By the way, interpreting “A*B” as a concatenation of the X-diagrams of “A” and “B” explicitly shows why the condition “columns(A) = rows(B)” must hold. It’s simple: otherwise, it would not be possible to concatenate the two X-diagrams:

Why “A*B ≠ B*A”
Interpreting matrix multiplication “A*B” as a concatenation of the X-diagrams of “A” and “B” also explains why multiplication is not symmetrical for matrices, i.e., why “A*B ≠ B*A”. Let me show that on two specific matrices:
\begin{equation*}
A =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\
a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4}
\end{bmatrix}
, \quad B =
\begin{bmatrix}
b_{1,1} & b_{1,2} & 0 & 0 \\
b_{2,1} & b_{2,2} & 0 & 0 \\
b_{3,1} & b_{3,2} & 0 & 0 \\
b_{4,1} & b_{4,2} & 0 & 0
\end{bmatrix}
\end{equation*}
Here, matrix ‘A’ has its upper half filled with zeroes, while ‘B’ has zeroes in its right half. The corresponding X-diagrams are:

What will happen if we try to multiply “A*B”? Then A’s X-diagram should be placed to the left of B’s X-diagram.

With such a placement, we see that the input values ‘x_1’ and ‘x_2’ can affect both output values ‘y_3’ and ‘y_4’. In particular, this means that the product matrix “A*B” is non-zero.
\begin{equation*}
A*B =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
c_{3,1} & c_{3,2} & 0 & 0 \\
c_{4,1} & c_{4,2} & 0 & 0
\end{bmatrix}
\end{equation*}
Now, what will happen if we try to multiply these two matrices in the opposite order? To present the product “B*A”, B’s X-diagram should be drawn to the left of A’s diagram:

We see that now there is no connected path by which any input value “x_j” can affect any output value “y_i”. In other words, in the product matrix “B*A” there is no influence at all, and it is actually a zero matrix.
\begin{equation*}
B*A =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\end{equation*}
This example clearly illustrates why order matters in matrix-matrix multiplication. Of course, many other such examples can be found; a quick numerical check of this one is sketched below.
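Here is a small sketch with arbitrary random values in place of the non-zero cells of ‘A’ and ‘B’:

```python
import numpy as np

A = np.zeros((4, 4))
A[2:, :] = np.random.rand(2, 4)   # only the bottom two rows of A are non-zero
B = np.zeros((4, 4))
B[:, :2] = np.random.rand(4, 2)   # only the left two columns of B are non-zero

print(np.any(A @ B != 0))   # True: A*B has non-zero cells in its bottom-left
print(np.all(B @ A == 0))   # True: B*A is exactly the zero matrix
```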
Multiplying a chain of matrices
X-diagrams can also be concatenated when we multiply three or more matrices. For example, for the case of:
G = A*B*C,
we can draw the concatenation in the following way:

Here we have two intermediate vectors:
u = C*x, and
v = (B*C)*x = B*(C*x) = B*u,
while the result vector is:
y = (A*B*C)*x = A*(B*(C*x)) = A*(B*u) = A*v.
The number of possible ways in which some input value “x_j” can affect some output value “y_i” grows here by an order of magnitude.

More precisely, the influence of a certain “x_j” over “y_i” can come through any item of the first intermediate stack “u” and any item of the second intermediate stack “v”. So the number of ways of influence becomes “|u|*|v| = t*s”, and the formula for “g_{i,j}” becomes:
\begin{equation*}
g_{i,j} = \sum_{v=1}^{s} \sum_{u=1}^{t} a_{i,v}*b_{v,u}*c_{u,j}
\end{equation*}
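As a sanity check, here is a short sketch that evaluates the double sum for one arbitrary cell and compares it with the chained product (sizes chosen just for illustration):

```python
import numpy as np

A = np.random.rand(2, 3)   # s = columns(A) = 3
B = np.random.rand(3, 5)   # t = columns(B) = 5
C = np.random.rand(5, 4)

i, j = 1, 2                # an arbitrary cell of G = A*B*C
g_ij = sum(A[i, v] * B[v, u] * C[u, j]
           for v in range(3)    # v runs over the second intermediate stack
           for u in range(5))   # u runs over the first intermediate stack
print(np.isclose(g_ij, (A @ B @ C)[i, j]))   # prints: True
```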
Multiplying matrices of special types
We can now visually interpret matrix-matrix multiplication. In the first story of this series [1], we also learned about several special types of matrices – the scale matrix, shift matrix, permutation matrix, and others. So let’s take a look at how multiplication works for those types of matrices.
Multiplication of scale matrices
A scale matrix has non-zero values only on its diagonal:

From theory, we know that multiplying two scale matrices results in another scale matrix. Why is that? Let’s concatenate the X-diagrams of two scale matrices:

The concatenated X-diagram clearly shows that any input item “x_i” can still affect only the corresponding output item “y_i”. It has no way of influencing any other output item. Therefore, the resulting structure behaves the same way as any other scale matrix.
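A minimal sketch of this fact, assuming NumPy’s diag as a convenient way to build scale matrices:

```python
import numpy as np

D1 = np.diag([1.0, 2.0, 3.0])   # scale matrix with diagonal (1, 2, 3)
D2 = np.diag([4.0, 5.0, 6.0])   # scale matrix with diagonal (4, 5, 6)

P = D1 @ D2                     # diagonals multiply element-wise
print(np.allclose(P, np.diag([4.0, 10.0, 18.0])))   # prints: True
```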
Multiplication of shift matrices
A shift matrix is one which, when multiplied by some input vector ‘x’, shifts the values of ‘x’ upwards or downwards by some ‘k’ positions, filling the emptied slots with zeroes. To achieve that, a shift matrix ‘S’ should have 1(s) on a line parallel to its main diagonal, and 0(s) in all other cells.

Theory says that multiplying two shift matrices results in another shift matrix. Interpretation with X-diagrams gives a clear explanation of that. Multiplying the shift matrices ‘A’ and ‘B’ corresponds to concatenating their X-diagrams:

We see that if shift matrix ‘A’ shifts the values of its input vector by ‘p’ positions upwards, and shift matrix ‘B’ shifts the values of its input vector by ‘q’ positions upwards, then the result matrix “C = A*B” will shift the values of the input vector by ‘p+q’ positions upwards, which means that “C” is also a shift matrix.
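A small sketch, assuming np.eye with its k offset as a convenient way to build shift matrices; here a shift by 1 and a shift by 2 compose into a shift by 3:

```python
import numpy as np

S1 = np.eye(5, k=1)   # shifts the input vector up by 1 position
S2 = np.eye(5, k=2)   # shifts the input vector up by 2 positions

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(S1 @ (S2 @ x))                          # [4. 5. 0. 0. 0.]
print(np.allclose(S1 @ S2, np.eye(5, k=3)))   # True: a shift by 1 + 2 = 3
```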
Multiplication of permutation matrices
A permutation matrix is one which, when multiplied by an input vector ‘x’, rearranges the order of the values in ‘x’. To act like that, an N x N permutation matrix ‘P’ must satisfy the following criteria:
- it must have exactly N 1(s),
- no two 1(s) should be on the same row or the same column,
- all remaining cells should be 0(s).

According to theory, multiplying two permutation matrices ‘A’ and ‘B’ results in another permutation matrix ‘C’. While the reason for this may not be clear if we do matrix multiplication in the ordinary way (scanning rows of ‘A’ and columns of ‘B’), it becomes much clearer if we look at it through the interpretation of X-diagrams. Multiplying “A*B” is the same as concatenating the X-diagrams of ‘A’ and ‘B’.

We see that every input value ‘x_j’ of the right stack still has exactly one path for reaching some position ‘y_i’ on the left stack. So “A*B” still acts as a rearrangement of all the values of the input vector ‘x’; in other words, “A*B” is also a permutation matrix.
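A minimal sketch, building permutation matrices by reordering the rows of the identity matrix:

```python
import numpy as np

I = np.eye(4)
P1 = I[[2, 0, 3, 1]]   # a permutation matrix: the identity with rows reordered
P2 = I[[1, 3, 0, 2]]

P = P1 @ P2            # still exactly one 1 in every row and every column
print(np.all(P.sum(axis=0) == 1) and np.all(P.sum(axis=1) == 1))   # True
```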
Multiplication of triangular matrices
A triangular matrix has all zeroes either above or below its main diagonal. Here, let’s consider upper-triangular matrices, where the zeroes are below the main diagonal. The case of lower-triangular matrices is analogous.

The fact that the non-zero values of ‘A’ are either on its main diagonal or above it makes all the arrows of its X-diagram either horizontal or directed upwards. This, in turn, means that any input value ‘x_j’ of the right stack can affect only those output values ‘y_i’ of the left stack which have a lesser or equal index (i.e., “i ≤ j”). That’s one of the properties of an upper-triangular matrix.
According to theory, multiplying two upper-triangular matrices results in another upper-triangular matrix. And here too, interpretation with X-diagrams provides a clear explanation of that fact. Multiplying two upper-triangular matrices ‘A’ and ‘B’ is the same as concatenating their X-diagrams:

We see that putting the two X-diagrams of the triangular matrices ‘A’ and ‘B’ next to one another results in a diagram where every input value ‘x_j’ of the right stack can still affect only those output values ‘y_i’ of the left stack that are either on its level or above it (in other words, “i ≤ j”). This means that the product “A*B” also behaves like an upper-triangular matrix; thus, it must have zeroes below its main diagonal.
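A final sketch, assuming np.triu as a convenient way to build upper-triangular matrices:

```python
import numpy as np

A = np.triu(np.random.rand(4, 4))   # zeroes below the main diagonal
B = np.triu(np.random.rand(4, 4))

C = A @ B
print(np.allclose(C, np.triu(C)))   # True: the product is upper-triangular too
```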
Conclusion
In this second story of the series, we observed how matrix-matrix multiplication can be presented visually, with the help of so-called “X-diagrams”. We have learned that performing the multiplication “C = A*B” is the same as concatenating the X-diagrams of those two matrices. This method clearly illustrates various properties of matrix multiplication, like why it is not a symmetrical operation (“A*B ≠ B*A”), and it also explains the formula:
\begin{equation*}
c_{i,j} = \sum_{k=1}^{p} a_{i,k}*b_{k,j}
\end{equation*}
We have also observed why multiplication behaves in certain ways when the operands are matrices of special types (scale, shift, permutation, and triangular matrices).
I hope you enjoyed reading this story!
In the coming story, we’ll address how matrix transposition “Aᵀ” can be interpreted with X-diagrams, and what we can gain from such an interpretation, so subscribe to my page to not miss the updates!
References
[1] Understanding matrices | Part 1: matrix-vector multiplication – https://towardsdatascience.com/understanding-matrices-part-1-matrix-vector-multiplication/