Principles

Validation

Does my model work as expected?
Baseline coverage: In a null simulation, does my confidence interval contain the right answer (β=0) \(1-\alpha\) of the time?
General coverage: In a range of simulations, does my confidence interval contain the right answer (β=0) \(1-\alpha\) of the time?

Power

Do I have good design and sample size?
Are my confidence intervals likely to be small enough to provide useful information?
If my scientific hypothesis is correct, are my confidence intervals likely to exclude a null effect?
What are some dangers of not having enough power?

Approach

Simulate some data under several hypotheses (null, one or more alternatives)
Do a statistical test and examine the results
Once you’re happy, wrap these steps in a function and repeat a couple of thousand times

Evaluation

If you have validated your model approach, it is consistent with your assumptions, but not necessarily with the real world
This is what diagnostic plots are for
But we can also “evaluate” models to some extent by simulating data that don’t exactly match our assumptions
- Don’t do this until you are happy with what you’re getting with data that do match your assumptions

Simulation details

stages

simulate experimental design (sample size, distribution of covariates, balance, missing values, etc.): expand.grid, sample, seq(), rep(), r* functions
compute moments (means/SDs of response variables) from covariates and parameters (effects)
sample means/SDs of response variables (r* functions)

parameterizations

need to know something about the parameters of distributions you’re going to smulate from
often need to translate from mean+SD or mean+CV (coefficient of variance) to the parameters that R uses
distributions
- Gaussian is easy/familiar (mean, SD)
- Poisson is easy (mean)
- negative binomial: need to specify mean as mu; size is the dispersion parameter, smaller means more variability (mean = \(\mu (1+\mu/k)\)).
- log-Normal: meanlog, sdlog are the mean and SD on the log scale
- Gamma: shape parameter is \(1/\sqrt{\textrm{CV}}\); scale parameter is (mean/shape)

tools

in general simulate samples new values based on a fitted model
however, simulate() (in lme4) and simulate_new() (in glmmTMB) can simulate values de novo, by specifying covariates and parameter values

For example:

data("sleepstudy", package = "lme4")
set.seed(101)
library(glmmTMB)
y <- simulate_new( ~ Days + (Days | Subject),
                  nsim = 1,
                  family = "gaussian",
                  newdata = sleepstudy, ## covariates
                  newparams = list(
                      beta = c(250, 10),   ## intercept and slope (pop-level)
                      ## log-SD of intercept and slope; transformed correlation
                      theta = c(2, 1, 0),
                      betad = 2)  ## log-SD of residual variation
                  )[[1]]  ## simulate_new always returns a list

The hardest part is understanding the random-effects parameters (theta); see here or ask us.

Since glmmTMB can handle models with or without random effects, and a wide range of GLM-type distributions, this may be a good way to simulate responses once you’ve simulated your experimental design/covariate distribution.

Generalized linear models

Ben Bolker and Jonathan Dushoff

27 March 2024