The previously explored ant-colony example:
Define data:
forest <- c(9, 6, 4, 6, 7, 10)
field <- c(12, 9, 12, 10)
ants <- data.frame(
place=rep(c("field","forest"),
c(length(field), length(forest))),
colonies=c(field,forest)
)
## utility function for pretty printing
pr <- function(m) printCoefmat(coef(summary(m)),
digits=3,signif.stars=FALSE)
pr(lm1 <- lm(colonies~place,data=ants))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.75 0.98 10.97 4.2e-06
## placeforest -3.75 1.27 -2.96 0.018
(Intercept)
row refers to \(\beta_1\), which is the mean density in the
“field” sites (“field” comes before “forest”).placeforest
row indicates we are looking at the
effect of forest
level of the place
variable,
i.e. the difference between “forest” and “field” sites. (To know that
“field” is the baseline level we must (1) remember, or look at
levels(ants$place)
or (2) notice which level is
missing from the list of parameter estimates.)R’s behaviour may seem annoying at first – it seems like the
estimated values of the groups are what we’re really interested in – but
it is really designed for testing differences among groups. To
get the estimates per group (what SAS calls LSMEANS
), you
could:
colonies~place-1
, or
equivalently colonies~place+0
, to suppress the implicit
intercept term:## Estimate Std. Error t value Pr(>|t|)
## placefield 10.75 0.98 10.97 4.2e-06
## placeforest 7.00 0.80 8.75 2.3e-05
When you use the colonies~place-1
formula, the meanings
of the parameters change: \(\beta_1\)
is the same (mean of “field”), but \(\beta_2\) is ‘mean of “field”’ rather than
(“(field)-(forest)”).
predict
function:predict(lm1,newdata=data.frame(place=c("field","forest")),
interval="confidence")
effects
package:library("effects")
summary(allEffects(lm1))
emmeans
package:library("emmeans")
emmeans(lm1,specs=~place)
Graphical summaries:
plot(allEffects(lm1))
or
library("dotwhisker")
dwplot(lm0)
Some data on lizard perching behaviour (brglm
package;
Schoener 1970 Ecology 51:408-418). (Ignore the
fact that these are proportions/GLM would be better …)
lizards <- read.csv("../data/lizards.csv")
Response is number of Anolis grahami lizards found on perches in particular conditions.
Start with the time
variable.
If we leave the factors alphabetical then \(\beta_1\)=“early”, \(\beta_2\)=“late”-“early”, \(\beta_3\)=“midday”-“early”. Change the order of the levels:
lizards <- mutate(lizards,
light=factor(light),
time=factor(time,
levels=c("early","midday","late")))
This just swaps the definitions of \(\beta_2\) and \(\beta_3\).
We could also use sum-to-zero contrasts:
pr(lm(grahami~time,data=lizards,contrasts=list(time=contr.sum)))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.30 3.53 5.47 2.4e-05
## time1 -1.67 4.93 -0.34 0.74
## time2 12.85 5.10 2.52 0.02
Now the (Intercept)
parameter is the overall mean:
time1
and time2
are the deviations of the
first (“early”) and second (“midday”) groups from the overall mean. (See
also car::contr.Sum
.)
There are other ways to change the contrasts (i.e., use the
contrasts()
function to change the contrasts for a
particular variable permanently, or use
options(contrasts=c("contr.sum","contr.poly")))
to change
the contrasts for all variables), but this way may be the most
transparent.
There are other options for contrasts such as
MASS::contr.sdif()
, which gives the successive differences
between levels.
library("MASS")
pr(lm(grahami~time,data=lizards,contrasts=list(time=contr.sdif)))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.30 3.53 5.47 2.4e-05
## timemidday-early 14.52 8.74 1.66 0.112
## timelate-midday -24.02 8.74 -2.75 0.012
You might have particular contrasts in mind (e.g. “control” vs. all other treatments, then “low” vs “high” within treatments), in which case it is probably worth learning how to set contrasts. (We will talk about testing all pairwise differences later, when we discuss multiple comparisons. This approach is probably not as useful as it is common.)
Consider the light
variable in addition to
time
.
pr(lmTL1 <- lm(grahami~time+light,data=lizards))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.29 5.63 4.85 0.00011
## timemidday 13.14 7.11 1.85 0.08010
## timelate -9.50 6.85 -1.39 0.18174
## lightsunny -19.32 5.73 -3.37 0.00321
\(\beta_1\) is the intercept
(“early”,“sunny”); \(\beta_2\) and
\(\beta_3\) are the differences from
the baseline level (“early”) of the first variable
(time
) in the baseline level of the other
parameter(s) (light
=“shady”); \(\beta_4\) is the difference from the
baseline level (“sunny”) of the second variable
(light
) in the baseline level of time
(“early”).
pr(lmTL2 <- lm(grahami~time*light,data=lizards))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.50 5.38 4.37 0.00042
## timemidday 27.75 7.60 3.65 0.00198
## timelate -12.75 7.60 -1.68 0.11180
## lightsunny -11.75 7.60 -1.55 0.14061
## timemidday:lightsunny -32.83 11.19 -2.93 0.00927
## timelate:lightsunny 6.50 10.75 0.60 0.55343
Parameters \(\beta_1\) to \(\beta_4\) have the same meanings as before. Now we also have \(\beta_5\) and \(\beta_6\), labelled “timemidday:lightsunny” and “timelate:lightsunny”, which describe the difference between the expected mean value of these treatment combinations based on the additive model (which are \(\beta_1 + \beta_2 + \beta_4\) and \(\beta_1 + \beta_3 + \beta_4\) respectively) and their actual values.
The fits are easy:
pr(lmTL1S <- update(lmTL1,contrasts=list(time=contr.sum,light=contr.sum)))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.84 2.87 6.57 2.7e-06
## time1 -1.21 4.01 -0.30 0.7654
## time2 11.92 4.15 2.87 0.0097
## light1 9.66 2.87 3.37 0.0032
pr(lmTL2S <- update(lmTL2,contrasts=list(time=contr.sum,light=contr.sum)))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.236 2.255 8.09 3.1e-07
## time1 -0.611 3.146 -0.19 0.84830
## time2 10.722 3.271 3.28 0.00444
## light1 10.264 2.255 4.55 0.00028
## time1:light1 -4.389 3.146 -1.39 0.18100
## time2:light1 12.028 3.271 3.68 0.00187
(The intercept doesn’t stay exactly the same when we add the
interaction because the data are unbalanced: try
with(lizards,table(light,time))
)
I didn’t do the pictures.
dwplot(list(additive=lmTL1,interaction=lmTL2))+
geom_vline(xintercept=0,lty=2)
plot(allEffects(lmTL2))
Session info:
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 23.10
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
##
## locale:
## [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
## [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
## [7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/Toronto
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] MASS_7.3-60 dplyr_1.1.4 tidyr_1.3.0 dotwhisker_0.7.4
## [5] ggplot2_3.4.4 emmeans_1.8.9 effects_4.2-2 carData_3.0-5
## [9] knitr_1.45
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.4 xfun_0.41 bslib_0.6.1 bayestestR_0.13.1
## [5] insight_0.19.7 lattice_0.22-5 vctrs_0.6.5 tools_4.3.1
## [9] generics_0.1.3 datawizard_0.9.0 sandwich_3.0-2 tibble_3.2.1
## [13] fansi_1.0.6 highr_0.10 pkgconfig_2.0.3 Matrix_1.6-4
## [17] lifecycle_1.0.4 farver_2.1.1 compiler_4.3.1 stringr_1.5.1
## [21] munsell_0.5.0 ggstance_0.3.6 mitools_2.4 codetools_0.2-19
## [25] survey_4.2-1 htmltools_0.5.7 sass_0.4.8 yaml_2.3.7
## [29] pillar_1.9.0 nloptr_2.0.3 jquerylib_0.1.4 cachem_1.0.8
## [33] boot_1.3-28.1 multcomp_1.4-25 nlme_3.1-164 tidyselect_1.2.0
## [37] digest_0.6.33 stringi_1.8.2 mvtnorm_1.2-4 purrr_1.0.2
## [41] labeling_0.4.3 splines_4.3.1 fastmap_1.1.1 colorspace_2.1-0
## [45] cli_3.6.1 magrittr_2.0.3 survival_3.5-7 utf8_1.2.4
## [49] TH.data_1.1-2 withr_2.5.2 scales_1.3.0 estimability_1.4.1
## [53] rmarkdown_2.25 nnet_7.3-19 lme4_1.1-35.1 zoo_1.8-12
## [57] coda_0.19-4 evaluate_0.23 parameters_0.21.3 rlang_1.1.2
## [61] Rcpp_1.0.11 xtable_1.8-4 glue_1.6.2 DBI_1.1.3
## [65] minqa_1.2.6 jsonlite_1.8.8 plyr_1.8.9 R6_2.5.1
gmodels::fit.contrast()
(show parameter estimates based on re-fitting models with new
contrasts), rms::contrast.rms()
(ditto, for rms
-based fits)