Introduction
When dealing with high-dimensional data, it is common to use methods such as Principal Component Analysis (PCA) to reduce the dimension of the data. This converts the data to a different (lower-dimensional) set of features. This contrasts with feature subset selection, which selects a subset of the original features (see [1] for a tutorial on feature selection).
PCA is a linear transformation of the data to a lower-dimensional space. In this article we start off by explaining what a linear transformation is. Then we show with Python examples how PCA works. The article concludes with an overview of Linear Discriminant Analysis (LDA), a supervised linear transformation method. Python code for the methods presented in that paper is available on GitHub.
Linear Transformations
Imagine that after a vacation Bill owes Mary £5 and $15, which has to be paid in euro (€). The rates of exchange are: £1 = €1.15 and $1 = €0.93. So the debt in € is:
(5 × 1.15) + (15 × 0.93) = 5.75 + 13.95 = €19.70
Here we’re converting a debt in two dimensions (£, $) to one dimension (€). Three examples of this are illustrated in Figure 1: the original (£5, $15) debt and two other debts of (£15, $20) and (£20, $35). The green dots are the original debts and the red dots are the debts projected onto a single dimension. The red line is this new dimension.
On the left in the figure we can see how this can be represented as matrix multiplication. The original dataset is a 3 by 2 matrix (3 samples, 2 features), the rates of exchange form a 1D matrix of two components and the output is a 1D matrix of three components. The exchange rate matrix is the transformation; if the exchange rates are changed then the transformation changes.
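Before turning to the article's own code, the matrix view described above can be sketched as follows. This is a minimal illustration that assumes NumPy is installed; the names debts, rates and debts_in_euro are chosen here for illustration and are not taken from the article's code.

```python
import numpy as np

# Debts as a 3 x 2 matrix: one row per debt, columns are (GBP, USD).
# Variable names are illustrative assumptions, not the article's own.
debts = np.array([[5, 15],
                  [15, 20],
                  [20, 35]])

# Exchange rates to euro for (GBP, USD); this vector is the transformation.
rates = np.array([1.15, 0.93])

# Matrix-vector product: each 2D debt is mapped to a single euro value.
debts_in_euro = debts @ rates

print(debts_in_euro)  # approximately [19.70, 35.85, 55.55]
```

Changing the values in rates changes the transformation, and so changes where every debt lands on the euro axis, while the original debts matrix stays the same.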
We can perform this matrix multiplication in Python using the code below. The matrices are represented as numpy arrays; the final line calls the dot method on the cur matrix to perform matrix multiplication (dot product). This…