lm
in R), to distinguish it from the
generalized linear model (glm
in R)PROC GLM
splines::ns()
, or mgcv
package)aabbc
list (more later)skewdata <- read.csv("../data/skewdat.csv")
m1 <- lm(skew~Size, data=skewdata)
plot(m1, id.n=4)
performance::check_model(m1)
MASS::boxcox()
or
car::powerTransform()
lm
fits a linear modelsummary
prints statistics associated with the
parameters that were fitteddotwhisker::dwplot()
plots a coefficient
table (Gelman, Pasarica, and Dodhia
2002); by_2sd=TRUE
automatically scales continuous
predictorscar::Anova()
and drop1()
will find \(p\) values that test the effect of
variables that have more than one parameter.
anova(model1, model2)
tests the difference between two
specific models (only needed for specialized comparisons)broom::tidy()
gives you tidy coefficient tables;
consider conf.int = TRUE
plot
applied to an lm
object gives you
diagnostic plotsggplot
, geom_smooth(method="lm")
fits a
linear model to each group of data (i.e. each group that you have
identified by plotting it in a different colour, within a different
facet, etc.effects
, emmeans
, sjPlot
packages …predict
gives predicted values, and standard
errors.simulate
simulates values from the fitted modelmethods(class="lm")
shows lots of possibilitieslizards <- read.csv("../data/lizards.csv", stringsAsFactors=TRUE)
lizards$time <- factor(lizards$time,levels=c("early","midday","late"))
lmint <- lm(grahami~light*time, data=lizards)
lmboth <- lm(grahami~light + time, data=lizards)
lmlight <- lm(grahami~light, data=lizards)
lmtime <- lm(grahami~time, data=lizards)
summary(lmboth)
##
## Call:
## lm(formula = grahami ~ light + time, data = lizards)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.4250 -11.6937 0.2125 6.6250 28.5750
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.288 5.631 4.846 0.000112 ***
## lightsunny -19.325 5.734 -3.370 0.003213 **
## timemidday 13.137 7.106 1.849 0.080099 .
## timelate -9.500 6.853 -1.386 0.181743
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.71 on 19 degrees of freedom
## Multiple R-squared: 0.5466, Adjusted R-squared: 0.475
## F-statistic: 7.634 on 3 and 19 DF, p-value: 0.001513
anova(lmboth, lmlight)
## Analysis of Variance Table
##
## Model 1: grahami ~ light + time
## Model 2: grahami ~ light
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 19 3569.6
## 2 21 5481.9 -2 -1912.3 5.0895 0.01698 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lmboth, lmtime)
## Analysis of Variance Table
##
## Model 1: grahami ~ light + time
## Model 2: grahami ~ time
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 19 3569.6
## 2 20 5703.6 -1 -2134 11.359 0.003213 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## In this simple case, drop1 and car::Anova
## will do exactly the same thing we did above
drop1(lmboth, test="F")
## Single term deletions
##
## Model:
## grahami ~ light + time
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 3569.6 124.03
## light 1 2134.0 5703.6 132.81 11.3589 0.003213 **
## time 2 1912.3 5481.9 129.90 5.0895 0.016983 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
car::Anova(lmboth)
## Anova Table (Type II tests)
##
## Response: grahami
## Sum Sq Df F value Pr(>F)
## light 2134.0 1 11.3589 0.003213 **
## time 1912.3 2 5.0895 0.016983 *
## Residuals 3569.6 19
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(summary(lmint))
##
## Call:
## lm(formula = grahami ~ light * time, data = lizards)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.250 -4.125 1.250 6.875 17.750
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.500 5.375 4.372 0.000416 ***
## lightsunny -11.750 7.602 -1.546 0.140606
## timemidday 27.750 7.602 3.650 0.001980 **
## timelate -12.750 7.602 -1.677 0.111798
## lightsunny:timemidday -32.833 11.190 -2.934 0.009266 **
## lightsunny:timelate 6.500 10.751 0.605 0.553432
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.75 on 17 degrees of freedom
## Multiple R-squared: 0.7504, Adjusted R-squared: 0.677
## F-statistic: 10.22 on 5 and 17 DF, p-value: 0.0001179
If you want to make comparisons you should centre continuous input variables and use sum-to-zero contrasts for factors:
lmint_c <- update(lmint, contrasts = list(time = contr.sum, light = contr.sum))
print(summary(lmint_c))
##
## Call:
## lm(formula = grahami ~ light * time, data = lizards, contrasts = list(time = contr.sum,
## light = contr.sum))
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.250 -4.125 1.250 6.875 17.750
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.2361 2.2547 8.088 3.14e-07 ***
## light1 10.2639 2.2547 4.552 0.000282 ***
## time1 -0.6111 3.1463 -0.194 0.848299
## time2 10.7222 3.2714 3.278 0.004440 **
## light1:time1 -4.3889 3.1463 -1.395 0.181001
## light1:time2 12.0278 3.2714 3.677 0.001870 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.75 on 17 degrees of freedom
## Multiple R-squared: 0.7504, Adjusted R-squared: 0.677
## F-statistic: 10.22 on 5 and 17 DF, p-value: 0.0001179
There is a tradition of figuring out which pairs of
treatments are significantly different and assigning a letter to each
such “not-clearly-not-equivalent” group. If you need to do this to make
your supervisor happy, you can use
ggpubr. The multcomp
package can do the basic
computations too.
See emmeans vignette on comparisons
library(emmeans)
library(ggplot2); theme_set(theme_bw())
e1 <- emmeans(lmboth, spec = ~ time)
plot(pairs(e1)) + geom_vline(xintercept = 0, lty = 2)
Or you could just look at the picture:
plot(e1)