Up to now, univariate models; have to decide which is “the” response variable.
Might measure lots of different characteristics for each individual. Could model characteristics separately, but we don’t get any information about the overall changes among individuals. We might want:
\[ \left( \begin{array}{cccc} \sigma^2_1 & \sigma_{12} & \sigma_{13} & \dots \\ . & \sigma^2_{2} & \sigma_{23} & \dots \\ . & . & \sigma^2_{3} & \dots \\ \vdots & \vdots & \vdots & \ddots \end{array} \right) \]
library(lme4)
library(glmmTMB)
## Warning in check_dep_version(dep_pkg = "TMB"): package version mismatch:
## glmmTMB was built with TMB package version 1.9.19
## Current TMB package version is 1.9.20
## Please re-install glmmTMB from source or restore original 'TMB' package (see '?reinstalling' for more information)
library(tidyverse); theme_set(theme_bw())
library(corrplot)
library(broom.mixed)
library(dotwhisker)
library(car)
## install.packages('gllvm',
## repos = c('https://jenniniku.r-universe.dev',
## 'https://cloud.r-project.org'))
library(gllvm)
## maybe?
library(MCMCglmm)
library(brms)
We’ll use a Drosophila melanogaster data set from Ian Dworkin’s PhD work (Dworkin 2005). This was from a study that was meant to test predictions of a model on how mutational and environmental variation can influence the overall structure of phenotypic variation. The data include several traits (lengths) on the first leg as well as the number of sex comb teeth (a structure used to clasp females during copulation) for different wild type strains (line) reared at different developmental temperatures (temp), with and without a mutation that effects proximal-distal axis development in limbs (genotype).
Get data from https://doi.org/10.5061/dryad.8376 or this repo:
dll_data <- read_csv("../data/dll.csv", show_col_types = FALSE) |>
mutate(across(c(temp, replicate, where(is.character)), factor))
morph_vars <- c("femur","tibia","tarsus","SCT")
dll_data
## # A tibble: 1,973 × 8
## replicate line genotype temp femur tibia tarsus SCT
## <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 1 line-1 Dll 25 0.590 0.499 0.219 9
## 2 1 line-1 Dll 25 0.550 0.501 0.214 13
## 3 1 line-1 Dll 25 0.588 0.488 0.211 11
## 4 1 line-1 Dll 25 0.588 0.515 0.211 NA
## 5 1 line-1 Dll 25 0.596 0.502 0.207 12
## 6 1 line-1 Dll 25 0.577 0.499 0.207 14
## 7 1 line-1 Dll 25 0.618 0.494 0.204 11
## 8 1 line-1 Dll 25 0.582 0.528 0.203 12
## 9 1 line-1 Dll 25 0.572 0.479 0.202 10
## 10 1 line-1 Dll 25 0.568 0.479 0.202 12
## # ℹ 1,963 more rows
## tabulate and show only the first five lines
with(dll_data, table(temp, line, genotype))[,1:5,]
## , , genotype = Dll
##
## line
## temp line-1 line-11 line-12 line-13 line-15
## 25 20 16 20 17 19
## 30 9 30 20 6 14
##
## , , genotype = wt
##
## line
## temp line-1 line-11 line-12 line-13 line-15
## 25 42 22 22 19 20
## 30 21 32 14 21 16
grp <- as.numeric(as.factor(dll_data$temp)) ## numeric line value
pairs(dll_data[morph_vars], gap = 0,
col = grp)
(there’s a GGally::ggpairs() function but it’s
clunky)
form <- ~ femur + tibia + tarsus + SCT | interaction(genotype, temp)
car::scatterplotMatrix( form,
ellipse = TRUE, data = dll_data, gap = 0,
plot.points = TRUE, pch = 20, cex = 0.5)
dll_data <- (dll_data |>
## scale & center (and convert back from matrix to vector)
mutate(across(any_of(morph_vars), ~drop(scale(.)),
.names = "{.col}_s"))
)
morph_vars_s <- paste0(morph_vars, "_s")
\[ \mathbf{Y} = \mathbf{XB} + \mathbf{E} \]
mlm_fit1 <- lm(as.matrix(dll_data[morph_vars_s]) ~ genotype,
data = dll_data)
print(mlm_fit1)
##
## Call:
## lm(formula = as.matrix(dll_data[morph_vars_s]) ~ genotype, data = dll_data)
##
## Coefficients:
## femur_s tibia_s tarsus_s SCT_s
## (Intercept) 0.0861 0.2204 -0.0799 0.1422
## genotypewt -0.1514 -0.3931 0.1419 -0.2948
summary() gives a list of summaries, one for each
responsecoef() gives the coefficient matrixbroom::tidy() works toocoef(mlm_fit1)
## femur_s tibia_s tarsus_s SCT_s
## (Intercept) 0.0861 0.220 -0.0799 0.142
## genotypewt -0.1514 -0.393 0.1419 -0.295
head(broom::tidy(mlm_fit1))
## # A tibble: 6 × 6
## response term estimate std.error statistic p.value
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 femur_s (Intercept) 0.0861 0.0331 2.60 9.40e- 3
## 2 femur_s genotypewt -0.151 0.0442 -3.43 6.22e- 4
## 3 tibia_s (Intercept) 0.220 0.0339 6.50 1.03e-10
## 4 tibia_s genotypewt -0.393 0.0453 -8.69 7.91e-18
## 5 tarsus_s (Intercept) -0.0799 0.0343 -2.33 2.01e- 2
## 6 tarsus_s genotypewt 0.142 0.0458 3.10 1.97e- 3
car::Anova(mlm_fit1)
##
## Type II MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## genotype 1 0.102 54 4 1913 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can include multiple predictors, interactions, etc..
mlm_fit2 <- update(mlm_fit1, . ~ genotype*temp)
car::Anova(mlm_fit2)
##
## Type II MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## genotype 1 0.1042 55.6 4 1911 <2e-16 ***
## temp 1 0.3077 212.3 4 1911 <2e-16 ***
## genotype:temp 1 0.0761 39.3 4 1911 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The previous expression for the multivariate model was
\[ \mathbf{Y} = \mathbf{XB} + \mathbf{E} \]
In contrast, the expression for the mixed model is
\[ y = \mathbf{X\beta} + \mathbf{Zb} + \epsilon \]
where \(\mathbf{b}\) is a set of Gaussian variables with a variance-covariance matrix \(\mathbf{\Sigma}\) which we estimate.
tidyr::pivot_longer() the data to be
\(mp\) observations longunits)Generate a single column with the numeric values of the responses, and a second variable that stores the trait type.
dll_melt <- (dll_data
|> select(-any_of(morph_vars))
|> mutate(units=factor(1:n()))
|> pivot_longer(names_to = "trait", cols = morph_vars_s)
|> drop_na()
|> arrange(units)
)
## Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
## ℹ Please use `all_of()` or `any_of()` instead.
## # Was:
## data %>% select(morph_vars_s)
##
## # Now:
## data %>% select(all_of(morph_vars_s))
##
## See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
## saveRDS(dll_melt, file="dll_melt.rds")
Look at how this has changed the structure of the data from a wide format:
head(dll_data) # original wide
## # A tibble: 6 × 12
## replicate line genotype temp femur tibia tarsus SCT femur_s tibia_s
## <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 line-1 Dll 25 0.590 0.499 0.219 9 1.51 0.597
## 2 1 line-1 Dll 25 0.550 0.501 0.214 13 0.138 0.654
## 3 1 line-1 Dll 25 0.588 0.488 0.211 11 1.45 0.194
## 4 1 line-1 Dll 25 0.588 0.515 0.211 NA 1.44 1.16
## 5 1 line-1 Dll 25 0.596 0.502 0.207 12 1.70 0.712
## 6 1 line-1 Dll 25 0.577 0.499 0.207 14 1.04 0.599
## # ℹ 2 more variables: tarsus_s <dbl>, SCT_s <dbl>
to a long format:
head(dll_melt)
## # A tibble: 6 × 7
## replicate line genotype temp units trait value
## <fct> <fct> <fct> <fct> <fct> <chr> <dbl>
## 1 1 line-1 Dll 25 1 femur_s 1.51
## 2 1 line-1 Dll 25 1 tibia_s 0.597
## 3 1 line-1 Dll 25 1 tarsus_s 1.75
## 4 1 line-1 Dll 25 1 SCT_s -1.22
## 5 1 line-1 Dll 25 2 femur_s 0.138
## 6 1 line-1 Dll 25 2 tibia_s 0.654
lmer_form <- value ~ 0 + trait:(genotype*temp) +
(0+trait|line) + (0+trait|units)
+0 (or -1) to drop intercept0 + trait:(genotype*temp) is equivalent totrait:(1+genotype+temp+genotype:temp)
trait + trait:genotype + trait:temp + trait:genotype:temp(trait+0|line) means “variance/covariances of traits
among lines”+0 so we consider traits, not differences
among traits
(trait+0|units) specifies “var/cov of traits among
units (individuals)”lmer always includes a residual variance term. This is
redundant because we have only one data point per individual per trait:
lmerControl(...) tells lmer not complainlmer1 <- lmer(lmer_form,
data=dll_melt,
control=lmerControl(
check.nobs.vs.nlev="ignore",
check.nobs.vs.nRE="ignore"))
print(lmer1), summary(lmer1) useful but
awkward (16 fixed effect coefficients, we’ll look at graphical summaries
insteadisSingular() tells you if the model is singularfixef() (fixed effects), ranef()
(line/indiv deviations from pop mean), coef()
(line/indiv-level estimates), VarCorr (random effects
var/cov)vv1 <- VarCorr(lmer1)
print(vv1)
## Groups Name Std.Dev. Corr
## units traitfemur_s 0.654
## traitSCT_s 0.728 0.05
## traittarsus_s 0.565 0.58 0.20
## traittibia_s 0.670 0.91 0.12 0.47
## line traitfemur_s 0.489
## traitSCT_s 0.392 0.14
## traittarsus_s 0.462 0.81 0.37
## traittibia_s 0.472 0.87 0.22 0.83
## Residual 0.463
par(mfrow=c(1,2))
## fix unit variance-covariance by adding residual variance:
diag(vv1$units) <- diag(vv1$units)+sigma(lmer1)^2
corrplot.mixed(cov2cor(vv1$line),upper="ellipse")
corrplot.mixed(cov2cor(vv1$units),upper="ellipse")
Correlations of traits across lines are stronger than correlations within individuals. In both cases correlations are all positive (i.e. first axis of variation is size variation?)
cc1 <- tidy(lmer1,effect="fixed") |>
tidyr::separate(term,into=c("trait","fixeff"),extra="merge",
remove=FALSE)
dwplot(cc1)+facet_wrap(~fixeff,scale="free",ncol=2)+
geom_vline(xintercept=0,lty=2)
These results tell approximately the same story (coefficients are consistent across traits within a fixed effect, e.g. effect of higher temperature is to reduce scores on all traits).
This is very slow, you probably don’t want to evaluate it on your computer
t1_CI <- system.time(cc2 <- tidy(lmer1,
effect = "ran_pars",
conf.int = TRUE,
conf.method = "profile",
parallel="multicore",
ncpus=8))
saveRDS(cc2, file="../data/cc2.rds")
(about 40 minutes to run on 8 cores on a fast machine)
glmmTMBglmmTMB only at present)
rr(0+trait|line), d) flattens the 4-D covariance into a
\(d\)-D spaceMCMCglmm, brmsMCMCglmm, gllvm?