Principles

Validation

  • Does my model work as expected?

  • Baseline coverage: In a null simulation, does my confidence interval contain the right answer (β=0) \(1-\alpha\) of the time?

  • General coverage: In a range of simulations, does my confidence interval contain the right answer (β=0) \(1-\alpha\) of the time?

Power

  • Do I have good design and sample size?

  • Are my confidence intervals likely to be small enough to provide useful information?

  • If my scientific hypothesis is correct, are my confidence intervals likely to exclude a null effect?

  • What are some dangers of not having enough power?

Approach

  • Simulate some data under several hypotheses (null, one or more alternatives)
  • Do a statistical test and examine the results
  • Once you’re happy, wrap these steps in a function and repeat a couple of thousand times

Evaluation

  • If you have validated your model approach, it is consistent with your assumptions, but not necessarily with the real world
  • This is what diagnostic plots are for
  • But we can also “evaluate” models to some extent by simulating data that don’t exactly match our assumptions
    • Don’t do this until you are happy with what you’re getting with data that do match your assumptions

Simulation details

stages

  • simulate experimental design (sample size, distribution of covariates, balance, missing values, etc.): expand.grid, sample, seq(), rep(), r* functions
  • compute moments (means/SDs of response variables) from covariates and parameters (effects)
  • sample means/SDs of response variables (r* functions)

parameterizations

  • need to know something about the parameters of distributions you’re going to smulate from
  • often need to translate from mean+SD or mean+CV (coefficient of variance) to the parameters that R uses
  • distributions
    • Gaussian is easy/familiar (mean, SD)
    • Poisson is easy (mean)
    • negative binomial: need to specify mean as mu; size is the dispersion parameter, smaller means more variability (mean = \(\mu (1+\mu/k)\)).
    • log-Normal: meanlog, sdlog are the mean and SD on the log scale
    • Gamma: shape parameter is \(1/\sqrt{\textrm{CV}}\); scale parameter is (mean/shape)

tools

  • in general simulate samples new values based on a fitted model
  • however, simulate() (in lme4) and simulate_new() (in glmmTMB) can simulate values de novo, by specifying covariates and parameter values

For example:

data("sleepstudy", package = "lme4")
set.seed(101)
library(glmmTMB)
y <- simulate_new( ~ Days + (Days | Subject),
                  nsim = 1,
                  family = "gaussian",
                  newdata = sleepstudy, ## covariates
                  newparams = list(
                      beta = c(250, 10),   ## intercept and slope (pop-level)
                      ## log-SD of intercept and slope; transformed correlation
                      theta = c(2, 1, 0),
                      betad = 2)  ## log-SD of residual variation
                  )[[1]]  ## simulate_new always returns a list

The hardest part is understanding the random-effects parameters (theta); see here or ask us.

Since glmmTMB can handle models with or without random effects, and a wide range of GLM-type distributions, this may be a good way to simulate responses once you’ve simulated your experimental design/covariate distribution.