Introduction

History

  • \(t\) tests, ANOVA, and regression are all (parts of) the same thing: the general linear model
  • called the “linear model” (lm in R) to distinguish it from the generalized linear model (glm)
  • foundation of many other statistical methods

Cheat sheet

Basic theory

Assumptions

  • Independence
  • Response (dependent) variables are linear functions of predictor (derived) variables, in turn based on input (observed, “independent”) variables
  • One or more predictor variables per input variable
  • Estimate a parameter for the effect of each predictor variable
  • Errors or residuals have constant variance and are Normally distributed
  • In other words, the difference between our model predictions and our observations is Normal
  • not assuming the marginal distribution is Normal
  • (Predictor variables measured without error)
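
For example, a minimal simulated sketch (hypothetical data, not from the notes): with a strong effect of a two-level group, the marginal distribution of the response is bimodal, but the residuals are still Normal.

set.seed(101)
dd <- data.frame(group = rep(c("a", "b"), each = 50))
dd$y <- ifelse(dd$group == "a", 0, 10) + rnorm(nrow(dd))  ## Normal errors, big group effect
hist(dd$y)                                  ## marginal distribution: bimodal, clearly not Normal
hist(residuals(lm(y ~ group, data = dd)))   ## conditional distribution (residuals): Normal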

Marginal vs conditional distributions

Linear or non-linear models?

  • \(y = a + b x\)
  • \(y = a + b x + c x^2 + d x^3\)
  • \(y = \frac{a}{b+x}\)
  • \(y = a + b \log(x)\)
  • \(y = \log(a + b x)\)
  • \(y = a x^b\)
  • \(\log(y) = a + b \log(x)\)
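
Which of the models above can lm() fit? Anything that is linear in the parameters, possibly after transforming variables. A sketch, assuming a hypothetical data frame dd with positive columns x and y:

lm(y ~ x, data = dd)                    ## y = a + b*x
lm(y ~ x + I(x^2) + I(x^3), data = dd)  ## cubic polynomial: still linear in the parameters
lm(y ~ log(x), data = dd)               ## y = a + b*log(x)
lm(log(y) ~ log(x), data = dd)          ## power law on the original scale, after log-log transformation
## y = a/(b+x) and y = log(a + b*x) are nonlinear in their parameters; use nls() instead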

Machinery

  • Least squares fit - minimize the squared differences between predictions and observations
  • lots of nice properties (variance partitioning etc.)
  • Solution is computationally easy (linear algebra)
  • Sensitive to some violations of assumptions - e.g. outliers have a strong effect

One-parameter variables

  • Continuous predictor: straight line, one parameter
  • \(Y = a+bX\): \(Y\) is the response, \(X\) is the input variable
    (\(b\) is the slope - expected change in \(Y\) per unit change in \(X\))
  • Categorical predictor with two categories: one parameter
    • difference in predicted value between levels, or code the categories as 0, 1 … (dummy variables)
  • Parameters are (usually) easy to interpret
  • can get \(p\)-values, CIs for each parameter
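
A minimal simulated sketch of both cases (one slope parameter for a continuous predictor; one difference parameter for a two-level factor):

set.seed(101)
dd <- data.frame(x = rnorm(100), g = rep(c("control", "treated"), 50))
dd$y <- 1 + 2*dd$x + 3*(dd$g == "treated") + rnorm(100)
coef(lm(y ~ x, data = dd))          ## "x": slope, expected change in y per unit change in x
coef(lm(y ~ g, data = dd))          ## "gtreated": difference between the two category means
confint(lm(y ~ x + g, data = dd))   ## CIs for each parameter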

Multi-parameter variables

  • Often have several predictor variables (parameters) associated with each input variable
  • Non-linear response to an input variable
    • Linear models can work for non-linear responses! e.g. \(Y = a + bX + cX^2\) (predictor vars \(\{1, X, X^2 \}\))
      • For complex smooth curves better to use splines (splines::ns(), or mgcv package)
  • Categorical variables with \(>2\) levels
    • Why can’t we just code them, for example as 0, 1, 2?
    • dummy variables or contrasts
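
A sketch of these cases (hypothetical data frame dd with continuous x, response y, and a three-level factor f):

library(splines)
lm(y ~ x + I(x^2), data = dd)     ## quadratic: two predictor variables (x, x^2) for one input variable
lm(y ~ ns(x, df = 3), data = dd)  ## natural spline: three parameters for one smooth curve
lm(y ~ f, data = dd)              ## three-level factor: two dummy-variable parameters
model.matrix(~ f, data = dd)      ## shows the dummy-variable (contrast) coding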

Effect sizes/\(p\)-values for multi-parameter variables

  • If you’re just trying to control for it, include it in the model and then ignore it!
  • Otherwise, you can (and should) get a \(p\) value for the variable as a whole (it’s not in summary() output)
  • But we can only get CIs on the parameters — and there are different ways to parameterize (contrasts)
    • If you do have (a) clear scientific question(s), you can set up contrasts that allow you to test it/them (we can help)
    • If you must, do pairwise comparisons: test each pair of levels for differences and make an aabbc list (compact letter display; more later)

Interactions

  • Interactions allow the value of one predictor to affect the relationship between another predictor and the response variable
  • Interpreting main effects in the presence of interactions is tricky (principle of marginality)
  • Estimated effect of \(A\) depends on baseline value of \(B\): pick a fixed point (Schielzeth 2010), or average in some way
  • Example: \(Y = a + b_1 X_1 + b_2 X_2 + b_{12} X_1 X_2\)
  • The response to \(X_1\) is \(Y = (a+b_2 X_2) + (b_1+b_{12}X_2) X_1\): the response to \(X_1\) depends on the value of \(X_2\) (see the sketch below)
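
In R, x1*x2 expands to x1 + x2 + x1:x2. A sketch (hypothetical data frame dd with response y and continuous predictors x1, x2):

m <- lm(y ~ x1 * x2, data = dd)   ## equivalent to y ~ x1 + x2 + x1:x2
coef(m)
## slope of y with respect to x1 at a chosen value of x2:
coef(m)["x1"] + coef(m)["x1:x2"] * 0.5   ## here x2 fixed at 0.5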

Example (nlme::Orthodont data)

An experimental example

  • You want to know whether (how!) a drug treatment changes the metabolism of some rabbits
  • You’re using adult, prime-aged rabbits and keeping them under controlled conditions, so you don’t expect their metabolism to change without the drug.
    • Not good enough!
  • You also introduce some control rabbits and treat them exactly the same, including giving them fake pills. You find no significant change in the metabolism of the control rabbits through time
    • Still not good enough! Why?

Testing interactions

  • Use an interaction: \[ M = a + B_x X + B_t t + B_{xt} Xt \]
  • The interaction coefficient \(B_{xt}\) represents the difference in the change over time between the two groups.
  • It asks: how different are the changes in the treatment group from the changes in the control group?
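
A sketch of how this might look in R (hypothetical data frame rabbits with columns metab, treatment (control/drug), and time):

m_rabbit <- lm(metab ~ treatment * time, data = rabbits)
summary(m_rabbit)  ## the treatment:time coefficient estimates how the change over time
                   ## differs between drug and control rabbits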

Interactions and parameters

  • In simple cases (2×2), the interaction uses only one parameter
  • We can use CIs, and coefficient plots, and understand what’s going on
  • In complicated cases, interaction terms have many parameters
  • These have all the interpretation problems of other multi-parameter variables, or more
  • Think about “differences in differences”

Interactions: example

  • Bear movement behaviour (Karelus et al. 2017)
  • Predictor variables: sex (categorical), road type (categorical: major/minor), road length (continuous)
  • Two-way interactions
    • sex \(\times\) road length: “are females more or less sensitive to amount of road than males?”
    • sex \(\times\) road type: “do females vary behaviour between road type more or less than males?”
    • road type \(\times\) road length: “how differently does amount of road affect crossings for different road types?”
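
A sketch of the corresponding model formula (hypothetical variable names, not the authors' exact model):

lm(crossings ~ sex * road_length + sex * road_type + road_type * road_length,
   data = bears)   ## all two-way interactions; equivalent to (sex + road_type + road_length)^2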

Statistical philosophy (again!)

  • Don’t accept the null hypothesis
    • Don’t throw out predictors you wanted to test because they’re not significant
    • Don’t throw out interactions you wanted to test because they’re not significant
  • This may make your life harder, but it’s worth it for the karma
    • There are techniques to deal with ‘too many’ predictors (e.g., ridge or lasso regression)
    • good ways to estimate sensible main effects in the presence of interactions (centering variables (Schielzeth 2010); “sum-to-zero contrasts”)
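
A sketch of centering and sum-to-zero contrasts (hypothetical data frame dd with continuous x, factor f, response y):

dd$x_c <- dd$x - mean(dd$x)          ## centre the continuous predictor (Schielzeth 2010)
options(contrasts = c("contr.sum", "contr.poly"))   ## sum-to-zero contrasts for factors (global setting!)
m <- lm(y ~ x_c * f, data = dd)      ## main effects now estimate effects at 'average' values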

Diagnostics

  • Because the linear model depends (somewhat!) on assumptions, it is good to evaluate them
  • Concerns (in order):
    • Bias/linearity: did you miss a nonlinear relationship?
    • Heteroscedasticity (does variance change across the data set, e.g. increasing variance with increasing mean?)
    • Independence is hard to test (unless your observations are ordered, e.g. in time or space)
    • Normality (assuming no overall problems, do your residuals look Normal?) [the least important of these assumptions]

Diagnostic plots: base R

Data from Paczolt et al. (2015)

skewdata <- read.csv("../data/skewdat.csv")
m1 <- lm(skew~Size, data=skewdata)
plot(m1, id.n=4)

performance::check_model()

performance::check_model(m1)

DHARMa

library(DHARMa); plot(simulateResiduals(m1))

Using significance tests for diagnostics

  • another mine field
  • small data sets: tests not very powerful
  • large data sets: tests too powerful
  • answering the wrong question! not “are my data Normal?” (no), but “are my data sufficiently non-Normal to give me a (sufficiently) wrong answer”?
  • problems with two-stage approaches (using a test to decide what to do) (Shamsudheen and Hennig 2023)
  • what should you do??

Transformations

  • Can deal with some problems in model assumptions by transforming variables
  • A transformed scale may be just as natural a way to think about your data as the original scale, or even more natural
  • The linear scale (no transformation) often has direct meaning, if you are adding things up or scaling them
  • The log scale is often the best scale for thinking about physical quantities

Transformation tradeoffs

  • A transformation may help you meet model assumptions
    • Homoscedasticity
    • Linearity
    • Normality
  • No guarantee that you can fix them all at once
  • Hard: Counts, data with zeros, fat tails …

Transformations to consider

  • log \(y\)-lin \(x\) for \(y \propto e^x\); log \(y\)-log \(x\) for \(y \propto x^b\)
  • Box-Cox and related transformations
  • GLM may be preferable to classical transform + LM for:
    • proportions (logistic, arcsin-sqrt) (Warton and Hui 2011; Smithson and Verkuilen 2006; Kubinec 2022)
    • count data (log, log(1+x)) (O’Hara and Kotze 2010; Ives 2015; Warton et al. 2016; Morrissey and Ruxton 2020)
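
Sketches of the GLM alternatives (hypothetical data frame and column names; see the GLM material for details):

glm(cbind(successes, failures) ~ x, family = binomial, data = dd)  ## proportions
glm(count ~ x, family = poisson, data = dd)                        ## counts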

Deciding whether to transform

  • It’s not OK to pick transformations based on trying different ones and looking at \(p\) values
  • Box-Cox transformation tries out transformations of the form \((y^\lambda-1)/\lambda\) (\(\lambda=0\) corresponds to log-transformation) (positive data only)
  • tricky for data \(\le 0\)
  • see MASS::boxcox() or car::powerTransform()
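
For example, for a fitted lm object m with a strictly positive response:

MASS::boxcox(m)          ## profiles the likelihood over lambda; look for the maximizing value
car::powerTransform(m)   ## estimates lambda directly (family = "yjPower" can handle non-positive data)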

Collinearity

  • ignore common advice about “variance inflation factors”
  • simply throwing away correlated predictors is snoop-y and unstable
  • other solutions: PCA, a priori choice
  • Dormann et al. (2012), Graham (2003), Morrissey and Ruxton (2018), Vanhove (2021)
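
A sketch of the PCA approach (hypothetical correlated predictors x1, x2, x3 in data frame dd):

pcs <- prcomp(dd[, c("x1", "x2", "x3")], scale. = TRUE)  ## principal components of the predictors
dd2 <- data.frame(dd, pcs$x)                             ## add PC1, PC2, ... as columns
lm(y ~ PC1 + PC2, data = dd2)   ## uncorrelated predictors, but coefficients are harder to interpret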

Tools for fitting and inference

Basic tools

  • lm fits a linear model
  • summary prints statistics associated with the fitted parameters
  • dotwhisker::dwplot() plots a coefficient table (Gelman, Pasarica, and Dodhia 2002); by_2sd=TRUE automatically scales continuous predictors
  • car::Anova() and drop1() find \(p\) values for multi-parameter effects; anova(model1, model2) tests the difference between two specific models
  • broom::tidy() gives you tidy coefficient tables; consider conf.int = TRUE
  • parameters package
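
A usage sketch, for any fitted lm object m:

summary(m)                             ## coefficient table
broom::tidy(m, conf.int = TRUE)        ## tidy coefficient table with CIs
dotwhisker::dwplot(m, by_2sd = TRUE)   ## coefficient plot, continuous predictors scaled by 2 SD
car::Anova(m)                          ## whole-variable (multi-parameter) tests
drop1(m, test = "F")                   ## F tests for dropping each marginal term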

Plotting

  • plot.lm() (S3 method): diagnostic plots
  • base R:
    • predict to predict values (newdata, se.fit args)
    • simulate to simulate values from the fitted model
    • methods(class="lm") shows lots of possibilities
  • ggplot: geom_smooth(method="lm") fits a linear model to each group of data (i.e. by facet/colour/etc.) and plots predictions with CIs
  • effects, emmeans, sjPlot, marginaleffects, ggpredict, …
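
A prediction sketch, assuming the data are in dd and m <- lm(y ~ x, data = dd):

newdat <- data.frame(x = seq(min(dd$x), max(dd$x), length.out = 101))
pp <- predict(m, newdata = newdat, se.fit = TRUE)
newdat$fit <- pp$fit
newdat$lwr <- pp$fit - 1.96*pp$se.fit   ## approximate 95% CI
newdat$upr <- pp$fit + 1.96*pp$se.fit
## or let ggplot do the work:
## ggplot(dd, aes(x, y)) + geom_point() + geom_smooth(method = "lm")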

Examples

Simple model

data (Schoener 1970)

library(tidyverse)
lizards <- read.csv("../data/lizards.csv", stringsAsFactors=TRUE) |>
  mutate(across(time, ~ factor(., levels=c("early","midday","late"))))
lm_int   <- lm(grahami~light*time, data=lizards)
lm_add   <- update(lm_int, . ~ light + time)
lm_light <- update(lm_add, . ~ . - time)
lm_time  <-  update(lm_add, . ~ . - light)

Summary

summary(lm_add)
## 
## Call:
## lm(formula = grahami ~ light + time, data = lizards)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.4250 -11.6937   0.2125   6.6250  28.5750 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   27.288      5.631   4.846 0.000112 ***
## lightsunny   -19.325      5.734  -3.370 0.003213 ** 
## timemidday    13.137      7.106   1.849 0.080099 .  
## timelate      -9.500      6.853  -1.386 0.181743    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.71 on 19 degrees of freedom
## Multiple R-squared:  0.5466, Adjusted R-squared:  0.475 
## F-statistic: 7.634 on 3 and 19 DF,  p-value: 0.001513

Test the effect of time (2 df) by comparing the additive model with the light-only model (i.e., dropping time)

anova(lm_add, lm_light)
## Analysis of Variance Table
## 
## Model 1: grahami ~ light + time
## Model 2: grahami ~ light
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     19 3569.6                              
## 2     21 5481.9 -2   -1912.3 5.0895 0.01698 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Test light

Because light has only two levels (sunny/shady), this gives the same \(p\)-value as shown in summary() for lightsunny. drop1 and car::Anova give the same \(p\)-value too …

anova(lm_add, lm_time)
## Analysis of Variance Table
## 
## Model 1: grahami ~ light + time
## Model 2: grahami ~ time
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1     19 3569.6                                
## 2     20 5703.6 -1     -2134 11.359 0.003213 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Test both light and time

drop1(lm_add, test="F")
## Single term deletions
## 
## Model:
## grahami ~ light + time
##        Df Sum of Sq    RSS    AIC F value   Pr(>F)   
## <none>              3569.6 124.03                    
## light   1    2134.0 5703.6 132.81 11.3589 0.003213 **
## time    2    1912.3 5481.9 129.90  5.0895 0.016983 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Better anova table

car::Anova(lm_add)
## Anova Table (Type II tests)
## 
## Response: grahami
##           Sum Sq Df F value   Pr(>F)   
## light     2134.0  1 11.3589 0.003213 **
## time      1912.3  2  5.0895 0.016983 * 
## Residuals 3569.6 19                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interaction model

print(summary(lm_int))
## 
## Call:
## lm(formula = grahami ~ light * time, data = lizards)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.250  -4.125   1.250   6.875  17.750 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             23.500      5.375   4.372 0.000416 ***
## lightsunny             -11.750      7.602  -1.546 0.140606    
## timemidday              27.750      7.602   3.650 0.001980 ** 
## timelate               -12.750      7.602  -1.677 0.111798    
## lightsunny:timemidday  -32.833     11.190  -2.934 0.009266 ** 
## lightsunny:timelate      6.500     10.751   0.605 0.553432    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.75 on 17 degrees of freedom
## Multiple R-squared:  0.7504, Adjusted R-squared:  0.677 
## F-statistic: 10.22 on 5 and 17 DF,  p-value: 0.0001179

drop1 for interaction model

Drops only the interaction term (respects marginality)

drop1(lm_int, test = "F")
## Single term deletions
## 
## Model:
## grahami ~ light * time
##            Df Sum of Sq    RSS    AIC F value   Pr(>F)   
## <none>                  1964.9 114.30                    
## light:time  2    1604.7 3569.6 124.03  6.9416 0.006254 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA for models with interactions

From ?car::Anova [emphasis added]:

Type-II tests are calculated according to the principle of marginality, testing each term after all others, except ignoring the term’s higher-order relatives; so-called type-III tests violate marginality, testing each term in the model after all of the others. … Be very careful in formulating the model for type-III tests, or the hypotheses tested will not make sense.

Probably best to set options(contrasts = c("contr.sum", "contr.poly")) if you want to use type 3 ANOVA!

compare …

Three different questions, three different answers …

car::Anova(lm_int, type = "II")
car::Anova(lm_int, type = "III")
## alternative to setting contrasts globally
lm_int_sum <- update(lm_int,
                     contrasts = list(light = contr.sum,
                                      time = contr.sum))
car::Anova(lm_int_sum, type = "III")

Contrasts

Contrasts allow you to ask specific questions (rather than “is the overall term significant?” or “all pairwise contrasts”)

help("contrast-methods", package = "emmeans")

library(emmeans)
em_time <- emmeans(lm_int, ~ time)
## NOTE: Results may be misleading due to involvement in interactions
## default does a multiple comparisons correction
print(em_contr <- contrast(em_time, method = "eff", adjust = "none"))
##  contrast      estimate   SE df t.ratio p.value
##  early effect    -0.611 3.15 17  -0.194  0.8483
##  midday effect   10.722 3.27 17   3.278  0.0044
##  late effect    -10.111 3.15 17  -3.214  0.0051
## 
## Results are averaged over the levels of: light

Plot contrasts

plot(em_contr) + geom_vline(xintercept = 0, lty = 2)

Multiple comparisons

  • Multiple comparisons are tricky
  • can get away without them for a linear model (use variable-level \(p\) values) unless you want to do pairwise comparisons.
  • Unlike variable-level tests, MCs depend on how you set up your model (i.e. centering of continuous variables, contrasts of categorical variables)

compact letter displays

There is a tradition of figuring out which pairs of treatments are significantly different and assigning a letter to each such “not-clearly-not-equivalent” group. If you need to do this to make your supervisor happy, you can use ggpubr. The multcomp package can do the basic computations too.

From Russ Lenth (emmeans author):

These displays promote visually the idea that two means that are “not significantly different” are to be judged as being equal; and that is a very wrong interpretation. In addition, they draw an artificial “bright line” between P values on either side of alpha, even ones that are very close.

Pairwise comparisons with emmeans

See emmeans vignette on comparisons

library(emmeans)
library(ggplot2); theme_set(theme_bw())
e1 <- emmeans(lm_add, spec = ~ time)
plot(pairs(e1)) + geom_vline(xintercept = 0, lty = 2)

effects plot

library(effects)
## Loading required package: carData
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
plot(allEffects(lm_int))

dot-whisker plot

  • depends on contrasts set in the model!
  • probably works best for numeric covariates: by_2sd standardization available
  • good for comparing models (show coefficients side-by-side, dodged)

dot-whisker plot

library(dotwhisker)
dotwhisker::dwplot(
              list(interaction = lm_int,
                   additive = lm_add)) + geom_vline(xintercept=0, lty =2)

Diagnostic vs prediction vs inferential

  • diagnostic plots: base-R plot(), performance::check_model(), DHARMa
  • inferential: coefficient plots, contrast plots
  • predictive: emmeans, effects, ggpredict

Prediction plots

  • For multi-parameter models, you have to decide how to set the non-focal predictors
  • this is what the emmeans and effects packages were originally designed for

emmeans (1)

plot(emmeans(lm_int, ~ time*light))

emmeans (2)

plot(emmeans(lm_int, ~ time), comparison = TRUE)
## NOTE: Results may be misleading due to involvement in interactions

Adds 83.4% “non-overlap” arrows (Knol, Pestman, and Grobbee 2011)

Generalizations

(part of) the statistical universe

Extensions (Faraway 2016)

  • Generalized linear models
    • (Some) non-linear relationships
    • Non-normal response families (binomial, Poisson, …)
  • Mixed models incorporate random effects
    • Categories in data represent samples from a population
    • e.g. species, sites, genes … experimental blocks
  • Generalized additive models (GAMs)
    • much broader range of non-linear relationships
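
Sketches of the corresponding calls (hypothetical data frame dd with response y, predictor x, and grouping variable site):

glm(y ~ x, family = binomial, data = dd)   ## generalized linear model (binary/binomial response)
lme4::lmer(y ~ x + (1 | site), data = dd)  ## mixed model: random intercept for site
mgcv::gam(y ~ s(x), data = dd)             ## generalized additive model: smooth function of x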

References

Dormann, Carsten F., Jane Elith, Sven Bacher, Carsten Buchmann, Gudrun Carl, Gabriel Carré, Jaime R. García Marquéz, et al. 2012. “Collinearity: A Review of Methods to Deal with It and a Simulation Study Evaluating Their Performance.” Ecography. https://doi.org/10.1111/j.1600-0587.2012.07348.x.

Faraway, Julian J. 2016. Extending Linear Models with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. 2nd ed. Chapman & Hall/CRC.

Gelman, Andrew, Cristian Pasarica, and Rahul Dodhia. 2002. “Let’s Practice What We Preach: Turning Tables into Graphs.” The American Statistician 56 (2): 121–30. http://www.jstor.org/stable/3087382.

Graham, Michael H. 2003. “Confronting Multicollinearity in Ecological Multiple Regression.” Ecology 84 (11): 2809–15. https://doi.org/10.1890/02-3114.

Ives, Anthony R. 2015. “For Testing the Significance of Regression Coefficients, Go Ahead and Log-Transform Count Data.” Methods in Ecology and Evolution 6 (7): 828–35.

Karelus, Dana L, J Walter McCown, Brian K Scheick, Madelon van de Kerk, Benjamin M Bolker, and Madan K Oli. 2017. “Effects of Environmental Factors and Landscape Features on Movement Patterns of Florida Black Bears.” Journal of Mammalogy 98 (5): 1463–78. https://doi.org/10.1093/jmammal/gyx066.

Knol, Mirjam J., Wiebe R. Pestman, and Diederick E. Grobbee. 2011. “The (Mis)use of Overlap of Confidence Intervals to Assess Effect Modification.” European Journal of Epidemiology 26 (4): 253–54. https://doi.org/10.1007/s10654-011-9563-8.

Kubinec, Robert. 2022. “Ordered Beta Regression: A Parsimonious, Well-Fitting Model for Continuous Data with Lower and Upper Bounds.” Political Analysis, July, 1–18. https://doi.org/10.1017/pan.2022.20.

Morrissey, Michael B., and Graeme D. Ruxton. 2018. “Multiple Regression Is Not Multiple Regressions: The Meaning of Multiple Regression and the Non-Problem of Collinearity.” Philosophy, Theory, and Practice in Biology 10 (3).

———. 2020. “Revisiting Advice on the Analysis of Count Data.” Methods in Ecology and Evolution 11 (9): 1133–40.

O’Hara, Robert, and Johan Kotze. 2010. “Do Not Log-Transform Count Data.” Nature Precedings, 1–1.

Paczolt, Kimberly A., Courtney N. Passow, Pablo J. Delclos, Holly K. Kindsvater, Adam G. Jones, and Gil G. Rosenthal. 2015. “Multiple Mating and Reproductive Skew in Parental and Introgressed Females of the Live-Bearing Fish Xiphophorus Birchmanni.” Journal of Heredity 106 (1): 57–66. https://doi.org/10.1093/jhered/esu066.

Schielzeth, Holger. 2010. “Simple Means to Improve the Interpretability of Regression Coefficients.” Methods in Ecology and Evolution 1: 103–13. https://doi.org/10.1111/j.2041-210X.2010.00012.x.

Schoener, Thomas W. 1970. “Nonsynchronous Spatial Overlap of Lizards in Patchy Habitats.” Ecology 51 (3): 408–18. https://doi.org/10.2307/1935376.

Shamsudheen, Iqbal, and Christian Hennig. 2023. “Should We Test the Model Assumptions Before Running a Model-Based Test?” Journal of Data Science, Statistics, and Visualisation 3 (3). https://doi.org/10.52933/jdssv.v3i3.73.

Smithson, Michael, and Jay Verkuilen. 2006. “A Better Lemon Squeezer? Maximum-likelihood Regression with Beta-Distributed Dependent Variables.” Psychological Methods 11 (1): 54–71. https://doi.org/10.1037/1082-989X.11.1.54.

Vanhove, Jan. 2021. “Collinearity Isn’t a Disease That Needs Curing.” Meta-Psychology 5 (April). https://doi.org/10.15626/MP.2021.2548.

Warton, David I., and Francis K. C. Hui. 2011. “The Arcsine Is Asinine: The Analysis of Proportions in Ecology.” Ecology 92 (January): 3–10. https://doi.org/10.1890/10-0340.1.

Warton, David I., Mitchell Lyons, Jakub Stoklosa, and Anthony R. Ives. 2016. “Three Points to Consider When Choosing a LM or GLM Test for Count Data.” Methods in Ecology and Evolution 7 (8): 882–90.