The linear model (lm in R), to distinguish it from the generalized linear model (glm); see https://lindeloev.github.io/tests-as-linear/ for how common statistical tests are special cases of linear models.
Example: reproductive skew data from Paczolt et al. (2015).
skewdata <- read.csv("../data/skewdat.csv")
m1 <- lm(skew ~ Size, data = skewdata)
plot(m1, id.n = 4)               ## base-R diagnostic plots, labelling the 4 most extreme points
performance::check_model(m1)     ## all-in-one diagnostic panel
library(DHARMa); plot(simulateResiduals(m1))   ## simulation-based residual diagnostics
- MASS::boxcox() or car::powerTransform() can suggest a response transformation
- lm fits a linear model
- summary prints statistics associated with the fitted parameters
- dotwhisker::dwplot() plots a coefficient table (Gelman, Pasarica, and Dodhia 2002); by_2sd=TRUE automatically scales continuous predictors
- car::Anova() and drop1() find \(p\) values for multi-parameter effects; anova(model1, model2) tests the difference between two specific models
- broom::tidy() gives you tidy coefficient tables; consider conf.int = TRUE (see the sketch after this list)
- plot.lm() (S3 method): diagnostic plots
- predict to predict values (newdata, se.fit args)
- simulate to simulate values from the fitted model
- methods(class="lm") shows lots of possibilities
- ggplot: geom_smooth(method="lm") fits a linear model to each group of data (i.e. by facet/colour/etc.) and plots predictions with CIs
- downstream packages: effects, emmeans, sjPlot, marginaleffects, ggpredict, …
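A quick sketch applying a few of these tools to the skew-data model m1 fitted above (the particular newdata values and nsim are arbitrary, illustrative choices):

library(broom)
tidy(m1, conf.int = TRUE)          ## tidy coefficient table with confidence intervals
predict(m1, newdata = data.frame(Size = range(skewdata$Size)),
        se.fit = TRUE)             ## predictions and standard errors at new Size values
head(simulate(m1, nsim = 2))       ## simulate new responses from the fitted model
methods(class = "lm")              ## see what else is available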
## Anolis lizard perching data (Schoener 1970)
library(tidyverse)
lizards <- read.csv("../data/lizards.csv", stringsAsFactors = TRUE) |>
    mutate(across(time, ~ factor(., levels = c("early", "midday", "late"))))
lm_int <- lm(grahami ~ light * time, data = lizards)  ## interaction model
lm_add <- update(lm_int, . ~ light + time)            ## additive model (main effects only)
lm_light <- update(lm_add, . ~ . - time)              ## light only
lm_time <- update(lm_add, . ~ . - light)              ## time only
summary(lm_add)
## 
## Call:
## lm(formula = grahami ~ light + time, data = lizards)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.4250 -11.6937   0.2125   6.6250  28.5750 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   27.288      5.631   4.846 0.000112 ***
## lightsunny   -19.325      5.734  -3.370 0.003213 ** 
## timemidday    13.137      7.106   1.849 0.080099 .  
## timelate      -9.500      6.853  -1.386 0.181743    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.71 on 19 degrees of freedom
## Multiple R-squared:  0.5466, Adjusted R-squared:  0.475 
## F-statistic: 7.634 on 3 and 19 DF,  p-value: 0.001513
anova(lm_add, lm_light)
## Analysis of Variance Table
## 
## Model 1: grahami ~ light + time
## Model 2: grahami ~ light
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     19 3569.6                              
## 2     21 5481.9 -2   -1912.3 5.0895 0.01698 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Because light has only two levels (sunny/shady), this gives the same \(p\)-value as shown in summary() for lightsunny. drop1 and car::Anova give the same \(p\)-value too …
anova(lm_add, lm_time)
## Analysis of Variance Table
## 
## Model 1: grahami ~ light + time
## Model 2: grahami ~ time
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1     19 3569.6                                
## 2     20 5703.6 -1     -2134 11.359 0.003213 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
drop1(lm_add, test="F")
## Single term deletions
## 
## Model:
## grahami ~ light + time
##        Df Sum of Sq    RSS    AIC F value   Pr(>F)   
## <none>              3569.6 124.03                    
## light   1    2134.0 5703.6 132.81 11.3589 0.003213 **
## time    2    1912.3 5481.9 129.90  5.0895 0.016983 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
car::Anova(lm_add)
## Anova Table (Type II tests)
## 
## Response: grahami
##           Sum Sq Df F value   Pr(>F)   
## light     2134.0  1 11.3589 0.003213 **
## time      1912.3  2  5.0895 0.016983 * 
## Residuals 3569.6 19                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(summary(lm_int))
## 
## Call:
## lm(formula = grahami ~ light * time, data = lizards)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.250  -4.125   1.250   6.875  17.750 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             23.500      5.375   4.372 0.000416 ***
## lightsunny             -11.750      7.602  -1.546 0.140606    
## timemidday              27.750      7.602   3.650 0.001980 ** 
## timelate               -12.750      7.602  -1.677 0.111798    
## lightsunny:timemidday  -32.833     11.190  -2.934 0.009266 ** 
## lightsunny:timelate      6.500     10.751   0.605 0.553432    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.75 on 17 degrees of freedom
## Multiple R-squared:  0.7504, Adjusted R-squared:  0.677 
## F-statistic: 10.22 on 5 and 17 DF,  p-value: 0.0001179
drop1 on the interaction model drops only the interaction term (respecting marginality):
drop1(lm_int, test = "F")
## Single term deletions
## 
## Model:
## grahami ~ light * time
##            Df Sum of Sq    RSS    AIC F value   Pr(>F)   
## <none>                  1964.9 114.30                    
## light:time  2    1604.7 3569.6 124.03  6.9416 0.006254 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From ?car::Anova [emphasis added]:
Type-II tests are calculated according to the principle of marginality, testing each term after all others, except ignoring the term’s higher-order relatives; so-called type-III tests violate marginality, testing each term in the model after all of the others. … Be very careful in formulating the model for type-III tests, or the hypotheses tested will not make sense.
Probably best to set options(contrasts = c("contr.sum", "contr.poly")) if you want to use type 3 ANOVA!
Three different questions, three different answers …
car::Anova(lm_int, type = "II")
car::Anova(lm_int, type = "III")
## alternative to setting contrasts globally
lm_int_sum <- update(lm_int,
                     contrasts = list(light = contr.sum,
                                      time = contr.sum))
car::Anova(lm_int_sum, type = "III")
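Or set the contrasts globally, as suggested above; a sketch that restores the previous setting afterwards (lm_int_sum2 is just an illustrative name):

old_opts <- options(contrasts = c("contr.sum", "contr.poly"))
lm_int_sum2 <- update(lm_int)   ## re-fit so the new global contrasts take effect
car::Anova(lm_int_sum2, type = "III")
options(old_opts)               ## restore the previous contrasts setting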
Contrasts allow you to ask specific questions (rather than “is the overall term significant?” or “all pairwise contrasts”)
help("contrast-methods", package = "emmeans")
library(emmeans)
em_time <- emmeans(lm_int, ~ time)
## NOTE: Results may be misleading due to involvement in interactions
## default does a multiple comparisons correction
print(em_contr <- contrast(em_time, method = "eff", adjust = "none"))
##  contrast      estimate   SE df t.ratio p.value
##  early effect    -0.611 3.15 17  -0.194  0.8483
##  midday effect   10.722 3.27 17   3.278  0.0044
##  late effect    -10.111 3.15 17  -3.214  0.0051
## 
## Results are averaged over the levels of: light
plot(em_contr) + geom_vline(xintercept = 0, lty = 2)
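Custom contrasts can also be supplied as a named list of weight vectors (in the order of the factor levels: early, midday, late); for example, a hypothetical comparison of midday against the average of the other two periods:

contrast(em_time, method = list("midday vs rest" = c(-1/2, 1, -1/2)))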
There is a tradition of figuring out which pairs of treatments are significantly different and assigning a letter to each such “not-clearly-not-equivalent” group. If you need to do this to make your supervisor happy, you can use ggpubr. The multcomp package can do the basic computations too.
From Russ Lenth (emmeans author):
These displays promote visually the idea that two means that are “not significantly different” are to be judged as being equal; and that is a very wrong interpretation. In addition, they draw an artificial “bright line” between P values on either side of alpha, even ones that are very close.
See emmeans vignette on comparisons
library(emmeans)
library(ggplot2); theme_set(theme_bw())
e1 <- emmeans(lm_add, spec = ~ time)
plot(pairs(e1)) + geom_vline(xintercept = 0, lty = 2)
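If you do need the compact letter display mentioned above, here is a sketch using the cld() generic from multcomp (emmeans supplies the method; the multcompView package must also be installed):

library(multcomp)            ## provides the cld() generic
cld(e1, Letters = letters)   ## means sharing a letter are "not significantly different"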
library(effects)
## Loading required package: carData
## lattice theme set by effectsTheme()
##   See ?effectsTheme for details.
plot(allEffects(lm_int))
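allEffects() plots every high-order term; to look at a single term, effect() can be used instead (effects is already loaded here), e.g.:

plot(effect("light:time", lm_int))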
by_2sd standardization available
library(dotwhisker)
dotwhisker::dwplot(
    list(interaction = lm_int,
         additive = lm_add)) +
    geom_vline(xintercept = 0, lty = 2)
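With a continuous predictor (e.g. m1 above), by_2sd = TRUE rescales coefficients by twice the predictor's standard deviation so that effect sizes are comparable across predictors; a sketch:

dotwhisker::dwplot(m1, by_2sd = TRUE) +
    geom_vline(xintercept = 0, lty = 2)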
- diagnostics: plot(), performance::check_model(), DHARMa
- prediction/effects plots: emmeans, effects, ggpredict
- the emmeans and effects packages are originally designed for …

plot(emmeans(lm_int, ~ time*light))
plot(emmeans(lm_int, ~ time), comparisons = TRUE)
## NOTE: Results may be misleading due to involvement in interactions
comparisons = TRUE adds ≈83.4% “comparison” arrows, scaled so that non-overlapping arrows correspond approximately to a significant pairwise difference (Knol, Pestman, and Grobbee 2011)
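The ggpredict() function (from the ggeffects package) mentioned above gives similar prediction plots; a minimal sketch:

library(ggeffects)
plot(ggpredict(lm_int, terms = c("time", "light")))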
Dormann, Carsten F., Jane Elith, Sven Bacher, Carsten Buchmann, Gudrun Carl, Gabriel Carré, Jaime R. García Marquéz, et al. 2012. “Collinearity: A Review of Methods to Deal with It and a Simulation Study Evaluating Their Performance.” Ecography. https://doi.org/10.1111/j.1600-0587.2012.07348.x.
Faraway, Julian J. 2016. Extending Linear Models with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. 2nd ed. Chapman & Hall/CRC.
Gelman, Andrew, Cristian Pasarica, and Rahul Dodhia. 2002. “Let’s Practice What We Preach: Turning Tables into Graphs.” The American Statistician 56 (2): 121–30. http://www.jstor.org/stable/3087382.
Graham, Michael H. 2003. “Confronting Multicollinearity in Ecological Multiple Regression.” Ecology 84 (11): 2809–15. https://doi.org/10.1890/02-3114.
Ives, Anthony R. 2015. “For Testing the Significance of Regression Coefficients, Go Ahead and Log-Transform Count Data.” Methods in Ecology and Evolution 6 (7): 828–35.
Karelus, Dana L, J Walter McCown, Brian K Scheick, Madelon van de Kerk, Benjamin M Bolker, and Madan K Oli. 2017. “Effects of Environmental Factors and Landscape Features on Movement Patterns of Florida Black Bears.” Journal of Mammalogy 98 (5): 1463–78. https://doi.org/10.1093/jmammal/gyx066.
Knol, Mirjam J., Wiebe R. Pestman, and Diederick E. Grobbee. 2011. “The (Mis)use of Overlap of Confidence Intervals to Assess Effect Modification.” European Journal of Epidemiology 26 (4): 253–54. https://doi.org/10.1007/s10654-011-9563-8.
Kubinec, Robert. 2022. “Ordered Beta Regression: A Parsimonious, Well-Fitting Model for Continuous Data with Lower and Upper Bounds.” Political Analysis, July, 1–18. https://doi.org/10.1017/pan.2022.20.
Morrissey, Michael B., and Graeme D. Ruxton. 2018. “Multiple Regression Is Not Multiple Regressions: The Meaning of Multiple Regression and the Non-Problem of Collinearity.” Philosophy, Theory, and Practice in Biology 10 (3).
———. 2020. “Revisiting Advice on the Analysis of Count Data.” Methods in Ecology and Evolution 11 (9): 1133–40.
O’Hara, Robert, and Johan Kotze. 2010. “Do Not Log-Transform Count Data.” Nature Precedings, 1–1.
Paczolt, Kimberly A., Courtney N. Passow, Pablo J. Delclos, Holly K. Kindsvater, Adam G. Jones, and Gil G. Rosenthal. 2015. “Multiple Mating and Reproductive Skew in Parental and Introgressed Females of the Live-Bearing Fish Xiphophorus Birchmanni.” Journal of Heredity 106 (1): 57–66. https://doi.org/10.1093/jhered/esu066.
Schielzeth, Holger. 2010. “Simple Means to Improve the Interpretability of Regression Coefficients.” Methods in Ecology and Evolution 1: 103–13. https://doi.org/10.1111/j.2041-210X.2010.00012.x.
Schoener, Thomas W. 1970. “Nonsynchronous Spatial Overlap of Lizards in Patchy Habitats.” Ecology 51 (3): 408–18. https://doi.org/10.2307/1935376.
Shamsudheen, Iqbal, and Christian Hennig. 2023. “Should We Test the Model Assumptions Before Running a Model-Based Test?” Journal of Data Science, Statistics, and Visualisation 3 (3). https://doi.org/10.52933/jdssv.v3i3.73.
Smithson, Michael, and Jay Verkuilen. 2006. “A Better Lemon Squeezer? Maximum-likelihood Regression with Beta-Distributed Dependent Variables.” Psychological Methods 11 (1): 54–71. https://doi.org/10.1037/1082-989X.11.1.54.
Vanhove, Jan. 2021. “Collinearity Isn’t a Disease That Needs Curing.” Meta-Psychology 5 (April). https://doi.org/10.15626/MP.2021.2548.
Warton, David I., and Francis K. C. Hui. 2011. “The Arcsine Is Asinine: The Analysis of Proportions in Ecology.” Ecology 92 (January): 3–10. https://doi.org/10.1890/10-0340.1.
Warton, David I., Mitchell Lyons, Jakub Stoklosa, and Anthony R. Ives. 2016. “Three Points to Consider When Choosing a LM or GLM Test for Count Data.” Methods in Ecology and Evolution 7 (8): 882–90.