Clustering: PCA vs t-SNE on the Fashion MNIST dataset

Principal Component Analysis

Recently I’ve been working on projects involving high-dimensional datasets with hundreds or thousands of variables, which naturally led me to dimensionality reduction techniques as a way to better visualise and model the data (e.g. for cluster analysis). The first port of call for most people will be Principal Component Analysis (“PCA”). In simple terms, PCA finds the directions (principal components) along which the data varies the most by decomposing the sample covariance matrix, \(S\), into its eigenvectors and eigenvalues....
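To make the eigen-decomposition idea concrete, here is a minimal NumPy sketch. It is not the implementation used in the post: the function name `pca_eig` and the synthetic data are assumptions chosen purely for illustration, standing in for a real dataset such as Fashion MNIST.

```python
import numpy as np


def pca_eig(X, n_components=2):
    """Illustrative PCA via eigen-decomposition of the sample covariance matrix.

    `pca_eig` is a hypothetical helper name, not taken from the original post.
    """
    # Centre each variable so the covariance is computed about the mean.
    X_centred = X - X.mean(axis=0)

    # Sample covariance matrix S (variables in columns, divisor n - 1).
    S = np.cov(X_centred, rowvar=False)

    # S is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)

    # Keep the eigenvectors with the largest eigenvalues (most variance).
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained_variance = eigvals[order]

    # Project the centred data onto the principal components.
    scores = X_centred @ components
    return scores, components, explained_variance


# Synthetic data (an assumption for this sketch): 500 samples, 5 variables
# with deliberately different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])

scores, components, explained_variance = pca_eig(X, n_components=2)
```

Each column of `components` is an eigenvector of \(S\), and the matching entry of `explained_variance` is its eigenvalue, which equals the sample variance of the data projected onto that direction.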

April 26, 2022 · 13 min · Josh Cheema