Introduction
When dealing with high-dimensional data, it is common to use methods such as Principal Component Analysis (PCA) to reduce the dimension of the data. This transforms the data into a different (lower-dimensional) set of features. This contrasts with feature subset selection, which selects a subset of the original features (see [1] for a tutorial on feature selection).
PCA is a linear transformation of the data to a lower-dimensional space. In this article we start off by explaining what a linear transformation is. Then we show with Python examples how PCA works. The article concludes with a description of Linear Discriminant Analysis (LDA), a supervised linear transformation method. Python code for the methods presented in this article is available on GitHub.
Linear Transformations
Imagine that after a vacation Bill owes Mary £5 and $15, to be paid in euro (€). The exchange rates are £1 = €1.15 and $1 = €0.93. So the debt in € is:

5 × €1.15 + 15 × €0.93 = €5.75 + €13.95 = €19.70
Here we are converting a debt in two dimensions (£, $) to a single dimension (€). Three examples of this are illustrated in Figure 1: the original (£5, $15) debt and two other debts of (£15, $20) and (£20, $35). The green dots are the original debts and the red dots are the debts projected into a single dimension. The red line is this new dimension.
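Because the conversion is just a weighted sum, it can be written as a matrix-vector product. Below is a minimal NumPy sketch (the variable names are ours, for illustration only) that applies this transformation to the three debts in Figure 1:

```python
import numpy as np

# Exchange rates from the example: £1 = €1.15 and $1 = €0.93
rates = np.array([1.15, 0.93])

# The three (£, $) debts shown in Figure 1
debts = np.array([[5, 15],
                  [15, 20],
                  [20, 35]])

# The linear transformation is a matrix-vector product:
# each 2-D debt is projected onto a single € dimension.
euro_debts = debts @ rates
print(euro_debts)  # [19.7  35.85 55.55]
```

Each row of debts is mapped to a single number, which is exactly the projection onto the red line in Figure 1.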