The linear model is called `lm` in R, to distinguish it from the generalized linear model (`glm` in R); SAS's `PROC GLM`, confusingly, fits the linear model. More flexible curves can be fitted with splines (`splines::ns()`, or the `mgcv` package). Pairwise differences among groups are often reported as a letter-based grouping (an "aabbc" list; more later).

```r
skewdata <- read.csv("../data/skewdat.csv")
m1 <- lm(skew ~ Size, data = skewdata)
```
Diagnostics:

```r
plot(m1, id.n = 4)               ## base-R diagnostic plots; label the 4 most extreme points
performance::check_model(m1)     ## a more complete set of visual checks
```

If the diagnostics suggest transforming the response, `MASS::boxcox()` or `car::powerTransform()` can help choose a transformation.
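As a rough illustration of the Box-Cox approach, here is a minimal sketch with made-up, positively skewed data (not the course data sets); `MASS` ships with base R:

```r
## made-up log-normal response: the "true" Box-Cox lambda is near 0 (log transform)
set.seed(101)
d <- data.frame(x = 1:50)
d$y <- exp(1 + 0.05 * d$x + rnorm(50, sd = 0.3))
m <- lm(y ~ x, data = d)
## profile log-likelihood over the Box-Cox parameter lambda
## (plotit = FALSE returns the grid instead of drawing the usual plot)
bc <- MASS::boxcox(m, plotit = FALSE)
bc$x[which.max(bc$y)]  ## best lambda on the default grid
```

A value of lambda near 0 suggests a log transformation; near 1, no transformation.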
- `lm` fits a linear model
- `summary` prints statistics associated with the parameters that were fitted
- `dotwhisker::dwplot()` plots a coefficient table (Gelman, Pasarica, and Dodhia 2002); `by_2sd=TRUE` automatically scales continuous predictors
- `car::Anova()` and `drop1()` will find \(p\) values that test the effect of variables that have more than one parameter
- `anova(model1, model2)` tests the difference between two specific models (only needed for specialized comparisons)
- `broom::tidy()` gives you tidy coefficient tables; consider `conf.int = TRUE`
- `plot` applied to an `lm` object gives you diagnostic plots
- in `ggplot`, `geom_smooth(method = "lm")` fits a linear model to each group of data (i.e. each group that you have identified by plotting it in a different colour, within a different facet, etc.)
- the `effects`, `emmeans`, and `sjPlot` packages …
- `predict` gives predicted values, and standard errors
- `simulate` simulates values from the fitted model
- `methods(class = "lm")` shows lots of possibilities
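For example, `predict` and `simulate` look roughly like this; a minimal sketch with made-up data (not the course data sets):

```r
## toy data: y is linear in x plus noise
set.seed(101)
d <- data.frame(x = 1:10)
d$y <- 2 + 3 * d$x + rnorm(10)
m <- lm(y ~ x, data = d)
## predicted values and standard errors at new x values
p <- predict(m, newdata = data.frame(x = c(2.5, 7.5)), se.fit = TRUE)
p$fit     ## predictions
p$se.fit  ## standard errors
## three simulated response vectors from the fitted model
s <- simulate(m, nsim = 3)
dim(s)    ## 10 rows (observations) x 3 columns (simulations)
```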
shows lots of possibilitieslizards <- read.csv("../data/lizards.csv", stringsAsFactors=TRUE) lizards$time <- factor(lizards$time,levels=c("early","midday","late")) lmint <- lm(grahami~light*time, data=lizards) lmboth <- lm(grahami~light + time, data=lizards) lmlight <- lm(grahami~light, data=lizards) lmtime <- lm(grahami~time, data=lizards)
```r
summary(lmboth)
```

```
## 
## Call:
## lm(formula = grahami ~ light + time, data = lizards)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.4250 -11.6937   0.2125   6.6250  28.5750 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   27.288      5.631   4.846 0.000112 ***
## lightsunny   -19.325      5.734  -3.370 0.003213 ** 
## timemidday    13.137      7.106   1.849 0.080099 .  
## timelate      -9.500      6.853  -1.386 0.181743    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.71 on 19 degrees of freedom
## Multiple R-squared:  0.5466, Adjusted R-squared:  0.475 
## F-statistic: 7.634 on 3 and 19 DF,  p-value: 0.001513
```
```r
anova(lmboth, lmlight)
```

```
## Analysis of Variance Table
## 
## Model 1: grahami ~ light + time
## Model 2: grahami ~ light
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     19 3569.6                              
## 2     21 5481.9 -2   -1912.3 5.0895 0.01698 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
```r
anova(lmboth, lmtime)
```

```
## Analysis of Variance Table
## 
## Model 1: grahami ~ light + time
## Model 2: grahami ~ time
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1     19 3569.6                                
## 2     20 5703.6 -1     -2134 11.359 0.003213 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
```r
## In this simple case, drop1 and car::Anova
## will do exactly the same thing we did above
drop1(lmboth, test = "F")
```

```
## Single term deletions
## 
## Model:
## grahami ~ light + time
##        Df Sum of Sq    RSS    AIC F value   Pr(>F)   
## <none>              3569.6 124.03                    
## light   1    2134.0 5703.6 132.81 11.3589 0.003213 **
## time    2    1912.3 5481.9 129.90  5.0895 0.016983 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
```r
car::Anova(lmboth)
```

```
## Anova Table (Type II tests)
## 
## Response: grahami
##           Sum Sq Df F value   Pr(>F)   
## light     2134.0  1 11.3589 0.003213 **
## time      1912.3  2  5.0895 0.016983 * 
## Residuals 3569.6 19                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
```r
print(summary(lmint))
```

```
## 
## Call:
## lm(formula = grahami ~ light * time, data = lizards)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.250  -4.125   1.250   6.875  17.750 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             23.500      5.375   4.372 0.000416 ***
## lightsunny             -11.750      7.602  -1.546 0.140606    
## timemidday              27.750      7.602   3.650 0.001980 ** 
## timelate               -12.750      7.602  -1.677 0.111798    
## lightsunny:timemidday  -32.833     11.190  -2.934 0.009266 ** 
## lightsunny:timelate      6.500     10.751   0.605 0.553432    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.75 on 17 degrees of freedom
## Multiple R-squared:  0.7504, Adjusted R-squared:  0.677 
## F-statistic: 10.22 on 5 and 17 DF,  p-value: 0.0001179
```
If you want to compare the magnitudes of parameters, you should centre continuous input variables and use sum-to-zero contrasts for factors:
```r
lmint_c <- update(lmint, contrasts = list(time = contr.sum, light = contr.sum))
print(summary(lmint_c))
```

```
## 
## Call:
## lm(formula = grahami ~ light * time, data = lizards, contrasts = list(time = contr.sum, 
##     light = contr.sum))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.250  -4.125   1.250   6.875  17.750 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   18.2361     2.2547   8.088 3.14e-07 ***
## light1        10.2639     2.2547   4.552 0.000282 ***
## time1         -0.6111     3.1463  -0.194 0.848299    
## time2         10.7222     3.2714   3.278 0.004440 ** 
## light1:time1  -4.3889     3.1463  -1.395 0.181001    
## light1:time2  12.0278     3.2714   3.677 0.001870 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.75 on 17 degrees of freedom
## Multiple R-squared:  0.7504, Adjusted R-squared:  0.677 
## F-statistic: 10.22 on 5 and 17 DF,  p-value: 0.0001179
```
There is a tradition of figuring out which pairs of treatments are significantly different and assigning a letter to each such "not-clearly-not-equivalent" group. If you need to do this to make your supervisor happy, you can use the `ggpubr` package; the `multcomp` package can do the basic computations too.
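The `multcomp` computations look roughly like this; a sketch with made-up data (not the lizard data), assuming the `multcomp` package is installed:

```r
## all pairwise (Tukey) comparisons among levels of a factor,
## with a multiplicity adjustment; data are made up
library(multcomp)
set.seed(101)
d <- data.frame(g = factor(rep(c("a", "b", "c"), each = 8)))
d$y <- rnorm(24, mean = rep(c(0, 1, 3), each = 8))
m <- lm(y ~ g, data = d)
summary(glht(m, linfct = mcp(g = "Tukey")))
```

The summary reports an adjusted \(p\) value for each pairwise difference; pairs whose differences are not clearly nonzero would share a letter in the grouping display.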
See the emmeans vignette on comparisons.
```r
library(emmeans)
library(ggplot2); theme_set(theme_bw())
e1 <- emmeans(lmboth, spec = ~ time)
plot(pairs(e1)) + geom_vline(xintercept = 0, lty = 2)
```
Or you could just look at the picture:
```r
plot(e1)
```
Dormann, Carsten F., Jane Elith, Sven Bacher, Carsten Buchmann, Gudrun Carl, Gabriel Carré, Jaime R. García Marquéz, et al. 2012. “Collinearity: A Review of Methods to Deal with It and a Simulation Study Evaluating Their Performance.” Ecography. https://doi.org/10.1111/j.1600-0587.2012.07348.x.
Faraway, Julian J. 2016. Extending Linear Models with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. 2nd ed. Chapman & Hall/CRC.
Gelman, Andrew, Cristian Pasarica, and Rahul Dodhia. 2002. “Let’s Practice What We Preach: Turning Tables into Graphs.” The American Statistician 56 (2): 121–30. http://www.jstor.org/stable/3087382.
Graham, Michael H. 2003. “Confronting Multicollinearity in Ecological Multiple Regression.” Ecology 84 (11): 2809–15. https://doi.org/10.1890/02-3114.
Ives, Anthony R. 2015. “For Testing the Significance of Regression Coefficients, Go Ahead and Log-Transform Count Data.” Methods in Ecology and Evolution 6 (7): 828–35.
Morrissey, Michael B., and Graeme D. Ruxton. 2018. “Multiple Regression Is Not Multiple Regressions: The Meaning of Multiple Regression and the Non-Problem of Collinearity.” Philosophy, Theory, and Practice in Biology 10 (3).
———. 2020. “Revisiting Advice on the Analysis of Count Data.” Methods in Ecology and Evolution 11 (9): 1133–40.
O’Hara, Robert, and Johan Kotze. 2010. “Do Not Log-Transform Count Data.” Nature Precedings, 1–1.
Schielzeth, Holger. 2010. “Simple Means to Improve the Interpretability of Regression Coefficients.” Methods in Ecology and Evolution 1: 103–13. https://doi.org/10.1111/j.2041-210X.2010.00012.x.
Vanhove, Jan. 2021. “Collinearity Isn’t a Disease That Needs Curing.” Meta-Psychology 5 (April). https://doi.org/10.15626/MP.2021.2548.
Warton, David I., and Francis K. C. Hui. 2011. “The Arcsine Is Asinine: The Analysis of Proportions in Ecology.” Ecology 92 (January): 3–10. https://doi.org/10.1890/10-0340.1.
Warton, David I., Mitchell Lyons, Jakub Stoklosa, and Anthony R. Ives. 2016. “Three Points to Consider When Choosing a LM or GLM Test for Count Data.” Methods in Ecology and Evolution 7 (8): 882–90.