tables vs graphs

setup

load packages

## graphics
library(tidyverse)
theme_set(theme_bw() + theme(panel.spacing = grid::unit(0, "lines")))
library(ggh4x) ## for nested facets

turning tables into graphs

why graphs instead of tables?

Gelman, Pasarica, and Dodhia (2002); Gelman (2011)

tables are best suited for looking up specific information, and graphs are better for perceiving trends and making comparisons and predictions

easier to read and compare
easier to perceive magnitudes
less prone to dichotomization

why not tables instead of graphs?

looking up specific values (dynamic graphs?)
cultural familiarity
includes all the information
- include data separately/machine-readably?

principles

use small multiples
use appropriate scales
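Cleveland hierarchy of perceptual accuracy (position along a common scale is judged most accurately; area and colour least), etc.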

example: Wei (2017) Table 5.5

rearranged data

dd <- read_table("../data/wei_tab5.5.txt")
head(dd)

## # A tibble: 6 × 11
##   dataset     r type  MGHD.ERR MGHD.ARI MST.ERR MST.ARI `MI/MGHD.ERR`
##   <chr>   <dbl> <chr>    <dbl>    <dbl>   <dbl>   <dbl>         <dbl>
## 1 sim1     0.05 est     0.0608   0.774   0.0688  0.771         0.121 
## 2 sim1     0.05 sd      0.0292   0.0925  0.0557  0.0998        0.0302
## 3 sim1     0.1  est     0.0578   0.782   0.277   0.456         0.188 
## 4 sim1     0.1  sd      0.0116   0.0412  0.0895  0.215         0.0392
## 5 sim1     0.2  est     0.0674   0.752   0.231   0.562         0.311 
## 6 sim1     0.2  sd      0.0335   0.108   0.0604  0.105         0.0552
## # … with 3 more variables: MI/MGHD.ARI <dbl>, MI/MST.ERR <dbl>,
## #   MI/MST.ARI <dbl>

rearrange

dd2 <- (dd
  %>% pivot_longer(names_to = "model", values_to = "val",
                   cols = -c(dataset, r, type))
  %>% separate(model, into = c("model", "stat"), sep = "\\.")
  ## est + sd in a single row
  %>% pivot_wider(names_from = type, values_from = val)
)
head(dd2, 4)

## # A tibble: 4 × 6
##   dataset     r model stat     est     sd
##   <chr>   <dbl> <chr> <chr>  <dbl>  <dbl>
## 1 sim1     0.05 MGHD  ERR   0.0608 0.0292
## 2 sim1     0.05 MGHD  ARI   0.774  0.0925
## 3 sim1     0.05 MST   ERR   0.0688 0.0557
## 4 sim1     0.05 MST   ARI   0.771  0.0998

add auxiliary information

simtab <- read.table(header=TRUE,text="
dataset distribution covstruc separation
sim1 MGHD VEE well-separated
sim2 MGHD VEE overlapping
sim3 MST VEI well-separated
sim4 MST VEI overlapping
sim5 GMM VEE well-separated
sim6 GMM VEE overlapping
")
dd3 <- dd2 %>% full_join(simtab,by="dataset")
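
A full join keeps unmatched rows from both sides and fills the gaps with NA, so a mistyped dataset label can slip through unnoticed. The sketch below shows one way to check for this with dplyr::anti_join. The data frames `res` and `sims` are hypothetical stand-ins for dd2 and simtab, and "sim7" is a made-up label used only to trigger a mismatch.

```r
library(dplyr)

## hypothetical stand-in for simtab (dataset labels and separation only)
sims <- tibble(
  dataset    = paste0("sim", 1:6),
  separation = rep(c("well-separated", "overlapping"), 3)
)
## hypothetical stand-in for dd2; "sim7" is a deliberately bad label
res <- tibble(dataset = c("sim1", "sim2", "sim7"), id = 1:3)

## full_join keeps everything: 3 result rows + 4 unmatched sims
joined <- full_join(res, sims, by = "dataset")

## rows of the results with no matching entry in the lookup table
unmatched_results <- anti_join(res, sims, by = "dataset")
## lookup entries with no results
unmatched_sims <- anti_join(sims, res, by = "dataset")
```

When both anti-joins return zero rows, the full join adds no NA-filled rows.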

code

gg1 <- (ggplot(dd3,aes(factor(r),est,colour=model)) 
  + geom_point()+geom_line(aes(group=model))   ## points and lines
  ## transparent ribbons, +/- 1 SD:
  + geom_ribbon(aes(ymin=est-sd,ymax=est+sd,group=model,fill=model),
                colour=NA,alpha=0.3)
  ## limit y-axis, compress out-of-bounds values
  + scale_y_continuous(limits=c(0,1),oob=scales::squish)
  + ggh4x::facet_nested(stat~distribution+covstruc+separation)
  + labs(x="r (proportion missing)",y="")
  + scale_colour_brewer(palette="Dark2")
  + scale_fill_brewer(palette="Dark2"))

picture

possible improvements?

order models
colour panel backgrounds according to whether well-separated or not (geom_rect)
direct labeling (only in rightmost facets?)
could collapse sim labels
change x-axis to continuous?
invert ARI or ERR so rankings are the same?
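
A minimal sketch of three of these improvements (explicit model order, continuous x-axis, inverting ERR so that larger is better for both statistics). It uses only the sim1 rows visible in the head(dd) output, so it runs without the data file. The names `toy`, `toy2`, and `gg2`, and the relabelling of ERR as "1-ERR", are illustrative assumptions rather than part of the original analysis.

```r
library(tidyverse)

## toy subset: sim1 values copied from the head(dd) output
toy <- tribble(
  ~r,   ~model, ~stat, ~est,   ~sd,
  0.05, "MGHD", "ERR", 0.0608, 0.0292,
  0.05, "MGHD", "ARI", 0.774,  0.0925,
  0.05, "MST",  "ERR", 0.0688, 0.0557,
  0.05, "MST",  "ARI", 0.771,  0.0998,
  0.1,  "MGHD", "ERR", 0.0578, 0.0116,
  0.1,  "MGHD", "ARI", 0.782,  0.0412,
  0.1,  "MST",  "ERR", 0.277,  0.0895,
  0.1,  "MST",  "ARI", 0.456,  0.215,
  0.2,  "MGHD", "ERR", 0.0674, 0.0335,
  0.2,  "MGHD", "ARI", 0.752,  0.108,
  0.2,  "MST",  "ERR", 0.231,  0.0604,
  0.2,  "MST",  "ARI", 0.562,  0.105
)

toy2 <- (toy
  %>% mutate(
        ## set the model order explicitly rather than alphabetically
        model = factor(model, levels = c("MGHD", "MST")),
        ## invert ERR so that higher is better, as for ARI;
        ## the SD is unchanged by this transformation
        est = ifelse(stat == "ERR", 1 - est, est),
        stat = ifelse(stat == "ERR", "1-ERR", stat))
)

gg2 <- (ggplot(toy2, aes(r, est, colour = model))
  + geom_point() + geom_line()
  + geom_ribbon(aes(ymin = est - sd, ymax = est + sd, fill = model),
                colour = NA, alpha = 0.3)
  ## continuous x-axis, with ticks only at the observed values of r
  + scale_x_continuous(breaks = unique(toy2$r))
  + scale_y_continuous(limits = c(0, 1), oob = scales::squish)
  + facet_wrap(~stat)
  + labs(x = "r (proportion missing)", y = "")
  + scale_colour_brewer(palette = "Dark2")
  + scale_fill_brewer(palette = "Dark2"))
```

With a continuous x-axis the unequal spacing between 0.05, 0.1, and 0.2 becomes visible, which the factor(r) version hides.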

table-to-graph tricks

tidyr::pivot_longer (wide to long format)
tabulizer package (extract_tables, extract_areas)
expand.grid (or tidyr::expand_grid) to create hierarchical columns
zoo::na.locf (or tidyr::fill) to fill in blank spaces
dealing with multi-row headers
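
A rough sketch of how these tricks might fit together for a table with a two-row header and blank cells. The layout (top-level model labels shown once per spanned block, the dataset label shown once per block of rows) is an assumption about how such a table could look after extraction. The numbers are the sim1 estimates from the head(dd) output, and names such as `h1`, `h2`, `body`, and `long` are made up for illustration.

```r
library(tidyverse)

## two header rows: the top-level label appears only above the
## first column it spans (assumed layout)
h1 <- c(NA, NA, "MGHD", NA, "MST", NA)
h2 <- c("dataset", "r", "ERR", "ARI", "ERR", "ARI")

## carry top-level labels forward (tidyr::fill works on data frame columns)
h1_filled <- fill(tibble(h1 = h1), h1)$h1
## combine levels; columns without a top-level label keep h2 only
nm <- ifelse(is.na(h1_filled), h2, paste(h1_filled, h2, sep = "."))

## alternative: build the hierarchical names directly
## (expand_grid varies the first variable slowest)
nm2 <- with(expand_grid(model = c("MGHD", "MST"), stat = c("ERR", "ARI")),
            paste(model, stat, sep = "."))

## table body: dataset label given only on the first row of its block
body <- tribble(
  ~V1,           ~V2,  ~V3,    ~V4,   ~V5,    ~V6,
  "sim1",        0.05, 0.0608, 0.774, 0.0688, 0.771,
  NA_character_, 0.1,  0.0578, 0.782, 0.277,  0.456,
  NA_character_, 0.2,  0.0674, 0.752, 0.231,  0.562
)
names(body) <- nm

long <- (body
  %>% fill(dataset)            ## fill in blank dataset labels
  %>% pivot_longer(cols = -c(dataset, r),
                   names_to = c("model", "stat"), names_sep = "\\.",
                   values_to = "est")
)
```

zoo::na.locf could replace tidyr::fill here; fill is used only to avoid an extra dependency.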

references

Gelman, Andrew. 2011. “Why Tables Are Really Much Better Than Graphs.” Journal of Computational and Graphical Statistics 20 (1): 3–7. https://doi.org/10.1198/jcgs.2011.09166.

Gelman, Andrew, Cristian Pasarica, and Rahul Dodhia. 2002. “Let’s Practice What We Preach: Turning Tables into Graphs.” The American Statistician 56 (2): 121–30. http://www.tandfonline.com/doi/abs/10.1198/000313002317572790.

Wei, Yuhong. 2017. “Extending Growth Mixture Models and Handling Missing Values via Mixtures of Non-Elliptical Distributions.” Thesis. https://macsphere.mcmaster.ca/handle/11375/21987.