further principles of data visualization

Packages

library(ggplot2); theme_set(theme_bw())
library(rainbow)  ## bagplots etc.
library(ggthemes)
library(directlabels)
theme_update(panel.spacing=grid::unit(0,"lines"))
library(cowplot) ## for arranging multiple plots, labeling, etc.
library(Hmisc)
library(dplyr)

John Tukey:
exploratory data analysis

Tukey (1915-2000): principles

stem-and-leaf plot

Distribution of horsepower:

stem(mtcars$hp)

## 
##   The decimal point is 2 digit(s) to the right of the |
## 
##   0 | 5677799
##   1 | 0011111122
##   1 | 55888888
##   2 | 123
##   2 | 556
##   3 | 4
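To read the display above: each stem is the hundreds digit and each leaf is a tens digit, with every stem split into a low row (leaves 0-4) and a high row (leaves 5-9). The sketch below rebuilds the rows by hand. It assumes values are rounded half-up to the nearest ten, which is consistent with the output shown but is not a statement about stem()'s internals; the variable names are illustrative only.

```r
hp <- sort(mtcars$hp)

# Round to the nearest ten, halves up (an assumption; it matches the display above)
tens <- floor(hp / 10 + 0.5)

stems  <- tens %/% 10   # hundreds digit
leaves <- tens %% 10    # tens digit

# Each stem gets two rows: even key = leaves 0-4, odd key = leaves 5-9
row_key <- stems * 2 + (leaves >= 5)

# Paste the leaves of each row together, in sorted order
rows <- tapply(leaves, row_key, paste, collapse = "")

# Print in the same "stem | leaves" layout as stem()
cat(sprintf("%d | %s\n", as.integer(names(rows)) %/% 2, rows), sep = "")
```

Comparing the printed rows with the stem() output is a quick check that the reading of stems and leaves is right.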

boxplot

ggplot(mtcars,aes(cyl,hp,group=cyl))+geom_boxplot()
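The numbers behind each box can be computed directly. The sketch below assumes the usual Tukey-style rule: the box spans the quartiles, and the whiskers reach to the most extreme observation within 1.5 × IQR of the box. tukey_box is a hypothetical helper written for this illustration, not a function from any package.

```r
# Summarise one numeric vector the way a Tukey-style boxplot does
tukey_box <- function(x, coef = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.5, 0.75)))
  iqr <- q[3] - q[1]
  lo_fence <- q[1] - coef * iqr
  hi_fence <- q[3] + coef * iqr
  inside <- x >= lo_fence & x <= hi_fence   # points that are not outliers
  c(lower_whisker = min(x[inside]),
    q1 = q[1], median = q[2], q3 = q[3],
    upper_whisker = max(x[inside]),
    n_outliers = sum(!inside))
}

# One row per cylinder class, matching the three boxes in the plot
box_summary <- t(sapply(split(mtcars$hp, mtcars$cyl), tukey_box))
print(box_summary)
```

Points counted in n_outliers are the ones a boxplot draws individually beyond the whiskers.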

bag plot (2D boxplot)

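Bagplot of Rousseeuw et al. (1999), drawn with ggplot. Note that geom_bag() is not part of ggplot2; it is defined in code hidden from these slides:

## geom_bag() comes from the hidden code, not from ggplot2 itself
ggplot(iris, aes(Sepal.Length, Sepal.Width, colour=Species,
                 shape=Species, fill=Species)) +
  geom_point() + geom_bag()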


The rainbow package implements functional boxplots and bagplots for high-dimensional (functional) data; the fda and roahd packages offer related tools. It first applies some form of projection or dimension reduction to the curves, then draws a bagplot of the first two projected dimensions.

rainbow::fboxplot(data = ElNino_ERSST_region_1and2,
                  plot.type = "bivariate",
                  type = "bag", projmethod="PCAproj")
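The projection step can be illustrated without the rainbow package. The sketch below is not what fboxplot() does internally: it swaps in classical PCA via prcomp() for the robust projection requested by projmethod="PCAproj", and it uses simulated curves rather than the El Niño data. All object names are illustrative.

```r
set.seed(1)
n_curves <- 30
t_grid <- seq(0, 1, length.out = 12)   # 12 time points per curve

# Each row is one simulated curve: random amplitude and phase, plus noise
curves <- t(sapply(seq_len(n_curves), function(i) {
  amp   <- rnorm(1, mean = 1, sd = 0.2)
  phase <- rnorm(1, mean = 0, sd = 0.05)
  amp * sin(2 * pi * (t_grid + phase)) + rnorm(length(t_grid), sd = 0.05)
}))

# Classical PCA on the curves (rows = curves, columns = time points)
pca <- prcomp(curves, center = TRUE, scale. = FALSE)

# Each curve is reduced to a point in the plane of the first two components;
# a bivariate bagplot would then be drawn on these scores
scores <- pca$x[, 1:2]
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

plot(scores, xlab = "PC1 score", ylab = "PC2 score")
```

Curves whose scores fall far from the bulk of the point cloud are the candidates for functional outliers.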

is Tukey still relevant?

Cleveland:
quantifying viz efficacy

principles

perceptual experiments

perceptual experiments: results

is Cleveland still relevant?


Heer et al. (2010)

Edward Tufte

Tufte principles

“Understand that Tufte’s ideas are a good starting point, not a religion” (Robert Kosara)

data ink

g0 <- ggplot(OrchardSprays,aes(treatment,decrease))+scale_y_log10()
print(plot_grid(g0 + geom_boxplot(),  g0 + geom_tufteboxplot(stat="boxplot")))

ggthemes::geom_tufteboxplot() draws the reduced-ink boxplot (use stat="boxplot" to get the Tukey-style definition of the box and whiskers)

information at the point of need

g1 <- ggplot(iris,aes(Sepal.Length,Petal.Length,colour=Species,
                shape=Species))+geom_point()
print(plot_grid(g1,direct.label(g1)))

direct labeling
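Direct labels can also be placed by hand when the directlabels package is not available. The sketch below is a simplified stand-in, not the positioning method directlabels uses: it puts each species name at the group centroid and drops the legend. label_pos and g_manual are illustrative names.

```r
library(ggplot2)

# Centroid of each species in the plotted coordinates
label_pos <- aggregate(cbind(Sepal.Length, Petal.Length) ~ Species,
                       data = iris, FUN = mean)

# Points coloured by species, with the species name drawn at each centroid;
# the legend is removed because the labels carry the same information
g_manual <- ggplot(iris, aes(Sepal.Length, Petal.Length, colour = Species)) +
  geom_point(aes(shape = Species), alpha = 0.5) +
  geom_text(data = label_pos, aes(label = Species), fontface = "bold") +
  theme(legend.position = "none")

print(g_manual)
```

Centroid labels can overlap dense point clouds, which is one reason to prefer a package that searches for a clear position.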

other

Rules of thumb

Rules of thumb (continued)

Rules of thumb (continued)

Data presentation scales with data size

examples

Some examples (from a screed on “dynamite plots”):

Notes

  1. the dreaded “dynamite plot”. Problems:
  2. inferential (point ± 2 SE) plot

Notes (continued)

  3. points ± 1 and 2 SE
  4. points alone

Notes (continued)

  5. boxplots
  6. violin plots
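Two of the alternatives listed in the notes above, the inferential (point ± 2 SE) plot and the points-alone plot, can be sketched as follows. The example uses the OrchardSprays data from the data-ink slide rather than the data behind the original figures; std_err, spray_summary, g_se, and g_points are illustrative names.

```r
library(ggplot2)

# Standard error of the mean
std_err <- function(x) sd(x) / sqrt(length(x))

# Per-treatment mean and standard error of the response
means <- aggregate(decrease ~ treatment, data = OrchardSprays, FUN = mean)
ses   <- aggregate(decrease ~ treatment, data = OrchardSprays, FUN = std_err)
spray_summary <- data.frame(treatment = means$treatment,
                            mean = means$decrease,
                            se = ses$decrease)

# Inferential plot: a point at the mean with a bar spanning mean ± 2 SE
g_se <- ggplot(spray_summary, aes(treatment, mean)) +
  geom_pointrange(aes(ymin = mean - 2 * se, ymax = mean + 2 * se))

# Descriptive plot: the raw observations, with no summary at all
g_points <- ggplot(OrchardSprays, aes(treatment, decrease)) +
  geom_point(alpha = 0.5)

print(g_se)
print(g_points)
```

With only eight observations per treatment, the points-alone version shows everything the summary hides at no cost in clutter.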

Example

References

Dawson, R. 2011. Journal of Statistics Education 19 (2): 1–12.

Elliott, K. 2016. Medium. https://medium.com/@kennelliott/39-studies-about-human-perception-in-30-minutes-4728f9e31a73.

Heer, J et al. 2010. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 203–212. ACM.

McGill, R et al. 1978. The American Statistician 32 (1): 12–16. doi:10.2307/2683468. http://www.jstor.org/stable/2683468.

Rousseeuw, PJ et al. 1999. The American Statistician 53 (4) (November): 382–387. doi:10.1080/00031305.1999.10474494.

Sciani, M. 2018. https://github.com/marcosci/cividis.