High-dimensional data

Oct 2021

library(factoextra)
library(pheatmap)
library(directlabels)
library(GGally)
library(vegan)
library(andrews)

Data

We usually think of high-dimensional data as consisting of multiple measures on a group of samples:

dimension means “number of variables”

may be divided into predictors and responses
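As a small illustration of this layout, the sketch below uses R's built-in `iris` data. The dataset and the predictor/response split are assumptions made for the example, not something taken from these notes.

```r
# iris is a built-in example dataset: rows are samples, columns are variables
data(iris)

n_samples <- nrow(iris)   # number of samples
n_vars    <- ncol(iris)   # "dimension" in the sense used here

# One possible split (an assumption for illustration):
# the four measurements are predictors, the species label is the response
predictors <- iris[, 1:4]
response   <- iris$Species

dim(predictors)
```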

Types of measures

Many scientists traditionally think of high-dimensional data as having parallel, continuous measures:

These may be complemented by a smaller number of “metadata” variables, which may be more diverse in type (count, categorical, etc.):

More and more datasets don’t follow this pattern:

Goals

Approaches

radar chart
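One way to draw a radar chart in base R is `graphics::stars()`. The sketch below is only an illustration: the `mtcars` data, the choice of five variables, and the rescaling to [0, 1] are all assumptions, not the example used in these notes.

```r
# Six cars, five measures (an arbitrary subset of a built-in dataset)
x <- mtcars[1:6, c("mpg", "hp", "wt", "qsec", "disp")]

# Rescale each variable to [0, 1] so that the spokes are comparable
x01 <- as.data.frame(lapply(x, function(v) (v - min(v)) / (max(v) - min(v))))
rownames(x01) <- rownames(x)

# scale = FALSE because the data were already rescaled above;
# each sample becomes one star, each variable one spoke
locs <- stars(x01, scale = FALSE, main = "Radar (star) plots, one per sample")
```

Radar charts get hard to read as the number of variables grows, which is one reason to look at the other approaches in these notes.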

Duality

We study the rows (samples) using the columns (measures)

But we can also do the opposite!
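The duality can be sketched by transposing the data matrix: the same distance and clustering functions then describe measures instead of samples. The `USArrests` dataset and the default Euclidean distance are assumptions chosen for the example.

```r
x <- scale(USArrests)          # 50 states (rows) by 4 measures (columns)

# Rows studied through columns: distances between samples
d_samples <- dist(x)

# Columns studied through rows: distances between measures
d_measures <- dist(t(x))

# The same clustering method applies to either view
hc_samples  <- hclust(d_samples)
hc_measures <- hclust(d_measures)
```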

Heatmaps
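A heatmap shows the whole data matrix at once, and by clustering both rows and columns it uses the duality described above. The notes load `pheatmap`; the minimal sketch below uses base R's `heatmap()` instead, so that it does not depend on an extra package, and the `mtcars` data are an assumption for illustration.

```r
x <- as.matrix(mtcars)         # 32 cars (rows) by 11 measures (columns)

# scale = "column" standardizes each measure so that variables with
# large units do not dominate the colour scale;
# rows and columns are both reordered by hierarchical clustering
hm <- heatmap(x, scale = "column", margins = c(5, 8))

# The returned object records the row and column orderings that were used
head(hm$rowInd)
hm$colInd
```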

PCA

A beautiful decomposition based on the idea that data points are points in a Euclidean space

We can think about PCA as a decomposition (making observed points from idealized points)
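The decomposition view can be checked numerically with `stats::prcomp()`: the standardized data equal the scores multiplied by the transposed loadings. The sketch assumes the `USArrests` data and standardized variables, which are choices made for the example.

```r
x <- as.matrix(USArrests)
pca <- prcomp(x, center = TRUE, scale. = TRUE)

scores   <- pca$x          # coordinates of each sample on the components
loadings <- pca$rotation   # the components themselves (orthonormal directions)

# Rebuild the standardized data from scores and loadings
z         <- scale(x)
z_rebuilt <- scores %*% t(loadings)

max(abs(z - z_rebuilt))    # numerically zero when all components are kept
```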

Or we can think about it as minimizing distances:
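A sketch of the distance-minimizing view, under the same assumptions as before (`USArrests`, standardized variables): projecting onto the first `k` components leaves a smaller total squared distance between the points and their projections than projecting onto an arbitrary plane through the origin. The random plane here is only a comparison device.

```r
z   <- scale(as.matrix(USArrests))
pca <- prcomp(z)
k   <- 2

# Project onto the plane spanned by the first k components
v       <- pca$rotation[, 1:k]
z_hat   <- z %*% v %*% t(v)
err_pca <- sum((z - z_hat)^2)      # total squared distance to the PCA plane

# Compare with a random orthonormal plane of the same dimension
set.seed(1)
q          <- qr.Q(qr(matrix(rnorm(ncol(z) * k), ncol = k)))
err_random <- sum((z - z %*% q %*% t(q))^2)

# The PCA error equals the variance of the discarded components times (n - 1)
err_expected <- (nrow(z) - 1) * sum(pca$sdev[-(1:k)]^2)

c(pca = err_pca, random = err_random, expected = err_expected)
```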

Accurate dataviz

To what extent can we make visual distances reflect data distances?
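One rough way to approach this question is to compare pairwise distances in the full data with pairwise distances in a two-dimensional PCA plot. The sketch below assumes the `USArrests` data and Euclidean distance; the correlation is just one possible summary, not a measure prescribed by these notes.

```r
z   <- scale(as.matrix(USArrests))
pca <- prcomp(z)

d_data <- as.vector(dist(z))               # distances in the full space
d_plot <- as.vector(dist(pca$x[, 1:2]))    # distances as seen in a 2-D plot

# How closely do plotted distances track the data distances?
agreement <- cor(d_data, d_plot)

# A projection can only shrink distances, never stretch them
max_ratio <- max(d_plot / d_data)

c(agreement = agreement, max_ratio = max_ratio)
```

Because the plot is an orthogonal projection, points that look far apart really are far apart, while points that look close may be separated along the discarded components.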