Feature Transformations: A Tutorial on PCA and LDA

-

Reducing the dimension of a dataset using methods akin to PCA

Photo by Nicole Cagnina on Unsplash

Introduction

When coping with high-dimension data, it is not uncommon to make use of methods akin to Principal Component Evaluation (PCA) to cut back the dimension of the info. This converts the info to a unique (lower dimension) set of features. This contrasts with feature subset selection which selects a subset of the unique features (see [1] for a turorial on feature selection).

PCA is a linear transformation of the info to a lower dimension space. In this text we start off by explaining what a linear transformation is. Then we show with Python examples how PCA works. The article concludes with an outline of Linear Discriminant Evaluation (LDA) a supervised linear transformation method. Python code for the methods presented in that paper is out there on GitHub.

Linear Transformations

Imagine that after a vacation Bill owes Mary £5 and $15 that should be paid in euro (€). The rates of exchange are; £1 = €1.15 and $1 = €0.93. So the debt in € is:

Here we’re converting a debt in two dimensions (£,$) to at least one dimension (€). Three examples of this are illustrated in Figure 1, the unique (£5, $15) debt and two other debts of (£15, $20) and (£20, $35). The green dots are the unique debts and the red dots are the debts projected right into a single dimension. The red line is that this latest dimension.

A depiction of example currency conversions (£,$ -> €).” class=”bg nv nw c” width=”700″ height=”248″ loading=”lazy”></picture></div>
</div><figcaption class=Figure 1. An illustration of how converting £,$ debts to € is a linear transformation. Image by writer.

On the left within the figure we will see how this will be represented as matrix multiplication. The unique dataset is a 3 by 2 matrix (3 samples, 2 features), the rates of exchange form a 1D matrix of two components and the output is a 1D matrix of three components. The exchange rate matrix is the transformation; if the exchange rates are modified then the transformation changes.

We are able to perform this matrix multiplication in Python using the code below. The matrices are represented as numpy arrays; the ultimate line calls the dot method on the cur matrix to perform matrix multiplication (dot product). This…

ASK DUKE

What are your thoughts on this topic?
Let us know in the comments below.

1 COMMENT

0 0 votes
Article Rating
guest
1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Share this article

Recent posts

1
0
Would love your thoughts, please comment.x
()
x