Clustering: PCA vs t-SNE on the Fashion MNIST dataset

Principal Component Analysis: Recently I’ve been working on projects involving high-dimensional datasets with hundreds or thousands of variables, which naturally led me to dimensionality reduction techniques to better visualise and model the data (e.g. for cluster analysis). The first port of call for most people will be Principal Component Analysis (“PCA”). In simple terms, PCA finds the directions (principal components) along which the data varies the most by decomposing the sample covariance matrix, \(S\), into its eigenvectors and eigenvalues....
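As a rough illustration of the decomposition described in this excerpt, here is a minimal NumPy sketch. The function name `pca_eig` and the synthetic data are assumptions made for illustration; they are not taken from the article itself, which works with Fashion MNIST.

```python
import numpy as np

def pca_eig(X, n_components=2):
    """Hypothetical helper: PCA via eigendecomposition of the sample covariance matrix S."""
    # Centre each variable so that S describes variation about the mean.
    Xc = X - X.mean(axis=0)
    # Sample covariance matrix S (variables in columns, divisor n - 1).
    S = np.cov(Xc, rowvar=False)
    # eigh is appropriate for symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    # Reorder so the direction with the largest variance comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep the leading eigenvectors as the principal components.
    components = eigvecs[:, :n_components]
    # Project the centred data onto those components.
    scores = Xc @ components
    return scores, eigvals, components

# Synthetic data (an assumption): five variables with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
scores, eigvals, components = pca_eig(X)
```

Each eigenvalue is the variance of the data along its eigenvector, so the share of total variance captured by the first two components could be read off as `eigvals[:2].sum() / eigvals.sum()`.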

April 26, 2022 · 13 min · Josh Cheema

LDA vs QDA

Introduction: When looking at classification problems, a common modelling approach is logistic regression, which uses the logistic function to model the probability that an observation belongs to a given class (one of \(K\) classes in the general case). However, while logistic regression is a valid approach, alternative methods may be required, in particular for datasets where the classes are completely (or almost completely) separated, since the logistic regression parameter estimates then become unstable. In this article, we discuss two methods that do not suffer from this class separation issue: linear discriminant analysis (“LDA”) and quadratic discriminant analysis (“QDA”)....

April 20, 2022 · 5 min · Josh Cheema