A tutorial on PCA

https://arxiv.org/pdf/1404.1100.pdf

You have an m × n data matrix X: m features measured for each of n samples. m, the number of features, is a large number. How do you visualize it or understand it?

Answer: Reduce the dimensions. Basically. Right now the data lives in a space spanned by m basis vectors; reduce it to 5 or whatever. But what should our approach to reducing be?

Basically, find a matrix P such that PX = Y. The rows of P are the new (principal) basis vectors.
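
To fix the shapes in mind, here is a minimal numpy sketch (the sizes and the contents of P here are made up for illustration; PCA is precisely the recipe for choosing P):

```python
import numpy as np

m, n = 10, 500                 # m features, n samples (toy sizes)
X = np.random.randn(m, n)      # data matrix: each column is one sample

k = 5                          # target dimensionality
P = np.random.randn(k, m)      # rows of P = new basis vectors (placeholder values)

Y = P @ X                      # the same samples re-expressed in the new basis
print(Y.shape)                 # (5, 500): fewer features, same number of samples
```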

Now, those basis vectors should be chosen such that

Y (the data expressed in the new basis) has

  1. low redundancy (one measurement in inches, another in centimeters: having both is pointless, so the covariance between distinct features should be zero; see the sketch after this list)

  2. directions along which the maximum signal-to-noise ratio is captured
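
To make the redundancy point concrete, here is a small sketch using a hypothetical inches/centimeters pair: two rows that measure the same quantity produce a large off-diagonal entry in the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
height_in = 60 + 10 * rng.random(1000)                    # heights in inches
height_cm = 2.54 * height_in + rng.normal(0, 0.1, 1000)   # same heights in cm, tiny noise

X = np.vstack([height_in, height_cm])      # 2 x n data matrix
Xc = X - X.mean(axis=1, keepdims=True)     # mean-center each feature
C_X = (Xc @ Xc.T) / X.shape[1]             # covariance matrix, (1/n) X X^T

print(C_X)   # large off-diagonal entries: the two features are redundant
```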

Following the first condition, the covariance matrix of Y should be diagonal. (Assuming each row of X has zero mean, the covariance matrix of X is $C_X = \frac{1}{n} X X^T$, and likewise for Y.)

$$
\begin{aligned}
C_Y &= \frac{1}{n} Y Y^T \\
    &= \frac{1}{n} (PX)(PX)^T \\
    &= \frac{1}{n} P X X^T P^T \\
    &= P C_X P^T
\end{aligned}
$$
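
A quick numerical sanity check of this identity, with an arbitrary mean-centered X and an arbitrary P (just a sketch, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 1000
X = rng.standard_normal((m, n))
X = X - X.mean(axis=1, keepdims=True)   # mean-center so (1/n) X X^T is the covariance

P = rng.standard_normal((3, m))         # any linear change of basis

C_X = (X @ X.T) / n
C_Y = ((P @ X) @ (P @ X).T) / n

print(np.allclose(C_Y, P @ C_X @ P.T))  # True: C_Y = P C_X P^T
```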

We are free to choose P, and it should be chosen such that $C_Y$ is diagonal. Also, it would be really convenient if the directions of the new basis (the rows of P) were orthogonal.

Why? We will see that soon.

According to a theorem in linear algebra (the spectral theorem), a symmetric matrix can be diagonalized in this manner:

$A = E D E^T$

where $A$ is symmetric, the columns of $E$ are its orthonormal eigenvectors, and $D$ is the diagonal matrix of its eigenvalues.
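
$C_X$ is symmetric, so it can be written as $C_X = E D E^T$. Choosing $P = E^T$ gives $C_Y = P C_X P^T = E^T (E D E^T) E = D$, which is diagonal. A minimal numpy sketch of that choice (variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 2000
X = rng.standard_normal((m, n))
X = X - X.mean(axis=1, keepdims=True)    # zero-mean features

C_X = (X @ X.T) / n                      # symmetric covariance matrix

eigvals, E = np.linalg.eigh(C_X)         # C_X = E D E^T, columns of E orthonormal

P = E.T                                  # rows of P are the principal basis vectors
Y = P @ X
C_Y = (Y @ Y.T) / n

print(np.allclose(C_Y, np.diag(eigvals)))   # True: C_Y is diagonal
```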