A tutorial on PCA
https://arxiv.org/pdf/1404.1100.pdf
You have an m x n data matrix X: “n” samples, each with “m” features collected. “m” is large, so the data lives in a high-dimensional space. How do you visualize it or understand it?
Answer: reduce the dimensions. Basically. Right now the data sits on “m” basis vectors; reduce it to 5 or whatever. But what should our approach to reducing be?
Basically, find a matrix P such that PX = Y. The rows of P are the new (principal) basis vectors.
Now, those basis vectors should be chosen such that Y (the data in the new basis) has:
- Low redundancy: if one measurement is in inches and another is in centimeters, having both is pointless. So the covariance between different features should be zero (see the sketch right after this list).
- High signal: the directions of the basis vectors should be such that the maximum signal-to-noise ratio is captured along each direction.
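A minimal NumPy sketch of the redundancy point (my own example, not from the tutorial; the variable names and the synthetic inches/centimeters data are assumptions): two features that measure the same length in different units are almost perfectly correlated, and the off-diagonal covariance is far from zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# One underlying quantity (length in inches), measured twice in different units
inches = rng.normal(loc=70, scale=4, size=1000)
cm = 2.54 * inches + rng.normal(scale=0.1, size=1000)  # same quantity plus tiny noise

X = np.vstack([inches, cm])            # m x n: 2 features, 1000 samples
X = X - X.mean(axis=1, keepdims=True)  # center each feature

C_X = (X @ X.T) / X.shape[1]           # covariance matrix (1/n convention, as in the notes)
print(C_X)                             # large off-diagonal entry -> the two features are redundant
```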
Following the first condition, the covariance matrix of Y should be diagonal:
$$
\begin{aligned}
C_Y &= \frac{1}{n} Y Y^T \\
    &= \frac{1}{n} (PX)(PX)^T \\
    &= \frac{1}{n} P X X^T P^T \\
    &= P C_X P^T
\end{aligned}
$$
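A quick numerical check of this identity (a sketch under my own assumptions: random mean-centered data and an arbitrary orthogonal P from a QR factorization, none of which come from the tutorial): computing the covariance of Y = PX directly gives the same matrix as P C_X P^T.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 500
X = rng.normal(size=(m, n))
X = X - X.mean(axis=1, keepdims=True)     # assume mean-centered data

C_X = (X @ X.T) / n

# Any change-of-basis matrix P satisfies the identity; here a random orthogonal one
P, _ = np.linalg.qr(rng.normal(size=(m, m)))

Y = P @ X
C_Y = (Y @ Y.T) / n

print(np.allclose(C_Y, P @ C_X @ P.T))    # True: C_Y = P C_X P^T
```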
We are free to choose P. P should be chosen such that $C_Y$ is diagonal. Also, it would be really convenient if the directions of the new basis (the rows of P) are orthogonal.
Why? We will see that soon.
According to a theorem in linear algebra, a symmetric matrix $A$ can be diagonalized in this manner:
$A = E D E^T$
where $E$ is an orthogonal matrix whose columns are the eigenvectors of $A$, and $D$ is a diagonal matrix of its eigenvalues.
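A small sketch of that theorem using NumPy's `eigh` (my own example; the random symmetric matrix is an assumption, not something from the tutorial): the eigenvectors form an orthogonal $E$, and $A = E D E^T$ holds.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2                       # build a symmetric matrix

eigvals, E = np.linalg.eigh(A)          # E: orthonormal eigenvectors as columns
D = np.diag(eigvals)

print(np.allclose(A, E @ D @ E.T))      # True: A = E D E^T
print(np.allclose(E.T @ E, np.eye(4)))  # True: columns of E are orthonormal
```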