setup

load packages

## graphics
library(tidyverse)
theme_set(theme_bw() + theme(panel.spacing = grid::unit(0, "lines")))
library(ggh4x) ## for nested facets

turning tables into graphs

why graphs instead of tables?

  • Gelman, Pasarica, and Dodhia (2002); Gelman (2011)

tables are best suited for looking up specific information, and graphs are better for perceiving trends and making comparisons and predictions

  • easier to read and compare
  • easier to perceive magnitudes
  • less prone to dichotomization

why not tables instead of graphs?

  • looking up specific values (dynamic graphs?)
  • cultural familiarity
  • includes all the information
    • include data separately/machine-readably?

principles

  • use small multiples
  • use appropriate scales
  • Cleveland hierarchy etc.

example: Wei (2017) Table 5.5

rearranged data

dd <- read_table("../data/wei_tab5.5.txt")
head(dd)
## # A tibble: 6 × 11
##   dataset     r type  MGHD.ERR MGHD.ARI MST.ERR MST.ARI `MI/MGHD.ERR`
##   <chr>   <dbl> <chr>    <dbl>    <dbl>   <dbl>   <dbl>         <dbl>
## 1 sim1     0.05 est     0.0608   0.774   0.0688  0.771         0.121 
## 2 sim1     0.05 sd      0.0292   0.0925  0.0557  0.0998        0.0302
## 3 sim1     0.1  est     0.0578   0.782   0.277   0.456         0.188 
## 4 sim1     0.1  sd      0.0116   0.0412  0.0895  0.215         0.0392
## 5 sim1     0.2  est     0.0674   0.752   0.231   0.562         0.311 
## 6 sim1     0.2  sd      0.0335   0.108   0.0604  0.105         0.0552
## # … with 3 more variables: MI/MGHD.ARI <dbl>, MI/MST.ERR <dbl>,
## #   MI/MST.ARI <dbl>

rearrange

dd2 <- (dd
  %>% pivot_longer(names_to = "model", values_to = "val",
                   cols = -c(dataset, r, type))
  %>% separate(model, into = c("model", "stat"), sep = "\\.")
  ## est + sd in a single row
  %>% pivot_wider(names_from = type, values_from = val)
)
head(dd2, 4)
## # A tibble: 4 × 6
##   dataset     r model stat     est     sd
##   <chr>   <dbl> <chr> <chr>  <dbl>  <dbl>
## 1 sim1     0.05 MGHD  ERR   0.0608 0.0292
## 2 sim1     0.05 MGHD  ARI   0.774  0.0925
## 3 sim1     0.05 MST   ERR   0.0688 0.0557
## 4 sim1     0.05 MST   ARI   0.771  0.0998

add auxiliary information

simtab <- read.table(header=TRUE,text="
dataset distribution covstruc separation
sim1 MGHD VEE well-separated
sim2 MGHD VEE overlapping
sim3 MST VEI well-separated
sim4 MST VEI overlapping
sim5 GMM VEE well-separated
sim6 GMM VEE overlapping
")
dd3 <- dd2 %>% full_join(simtab,by="dataset")

code

gg1 <- (ggplot(dd3,aes(factor(r),est,colour=model)) 
  + geom_point()+geom_line(aes(group=model))   ## points and lines
  ## transparent ribbons, +/- 1 SD:
  + geom_ribbon(aes(ymin=est-sd,ymax=est+sd,group=model,fill=model),
                colour=NA,alpha=0.3)
  ## limit y-axis, compress out-of-bounds values
  + scale_y_continuous(limits=c(0,1),oob=scales::squish)
  + ggh4x::facet_nested(stat~distribution+covstruc+separation)
  + labs(x="r (proportion missing)",y="")
  + scale_colour_brewer(palette="Dark2")
  + scale_fill_brewer(palette="Dark2"))

picture

possible improvements?

  • order models
  • colour panel backgrounds according to whether well-separated or not (geom_rect)
  • direct labeling (only in rightmost facets?)
  • could collapse sim labels
  • change x-axis to continuous?
  • invert ARI or ERR so rankings are the same?

table-to-graph tricks

  • tidyr::pivot_longer (wide to long format)
  • tabulizer package (extract_tables, extract_area)
  • expand.grid (or tidyr::expand_grid) to create hierarchical columns
  • zoo::na.locf (or tidyr::fill) to fill in blank spaces
  • dealing with multi-row headers

references

Gelman, Andrew. 2011. “Why Tables Are Really Much Better Than Graphs.” Journal of Computational and Graphical Statistics 20 (1): 3–7. https://doi.org/10.1198/jcgs.2011.09166.

Gelman, Andrew, Cristian Pasarica, and Rahul Dodhia. 2002. “Let’s Practice What We Preach: Turning Tables into Graphs.” The American Statistician 56 (2): 121–30. http://www.tandfonline.com/doi/abs/10.1198/000313002317572790.

Wei, Yuhong. 2017. “Extending Growth Mixture Models and Handling Missing Values via Mixtures of Non-Elliptical Distributions.” Thesis. https://macsphere.mcmaster.ca/handle/11375/21987.