the big picture

Up to now, univariate models; have to decide which is “the” response variable.

Might measure lots of different characteristics for each individual. Could model characteristics separately, but we don’t get any information about the overall changes among individuals. We might want:

Alternatives

examples

what is a covariance matrix?

\[ \left( \begin{array}{cccc} \sigma^2_1 & \sigma_{12} & \sigma_{13} & \dots \\ . & \sigma^2_{2} & \sigma_{23} & \dots \\ . & . & \sigma^2_{3} & \dots \\ \vdots & \vdots & \vdots & \ddots \end{array} \right) \]

Useful packages

library(lme4)
library(glmmTMB)
## Warning in check_dep_version(dep_pkg = "TMB"): package version mismatch: 
## glmmTMB was built with TMB package version 1.9.19
## Current TMB package version is 1.9.20
## Please re-install glmmTMB from source or restore original 'TMB' package (see '?reinstalling' for more information)
library(tidyverse); theme_set(theme_bw())
library(corrplot)
library(broom.mixed)
library(dotwhisker)
library(car)
## install.packages('gllvm',
##                  repos = c('https://jenniniku.r-universe.dev',
##                            'https://cloud.r-project.org'))
library(gllvm)
## maybe?
library(MCMCglmm)
library(brms)

data

We’ll use a Drosophila melanogaster data set from Ian Dworkin’s PhD work (Dworkin 2005). This was from a study that was meant to test predictions of a model on how mutational and environmental variation can influence the overall structure of phenotypic variation. The data include several traits (lengths) on the first leg as well as the number of sex comb teeth (a structure used to clasp females during copulation) for different wild type strains (line) reared at different developmental temperatures (temp), with and without a mutation that effects proximal-distal axis development in limbs (genotype).

Get data from https://doi.org/10.5061/dryad.8376 or this repo:

dll_data <- read_csv("../data/dll.csv", show_col_types = FALSE) |>
  mutate(across(c(temp, replicate, where(is.character)), factor))
morph_vars <- c("femur","tibia","tarsus","SCT")

dll_data
## # A tibble: 1,973 × 8
##    replicate line   genotype temp  femur tibia tarsus   SCT
##    <fct>     <fct>  <fct>    <fct> <dbl> <dbl>  <dbl> <dbl>
##  1 1         line-1 Dll      25    0.590 0.499  0.219     9
##  2 1         line-1 Dll      25    0.550 0.501  0.214    13
##  3 1         line-1 Dll      25    0.588 0.488  0.211    11
##  4 1         line-1 Dll      25    0.588 0.515  0.211    NA
##  5 1         line-1 Dll      25    0.596 0.502  0.207    12
##  6 1         line-1 Dll      25    0.577 0.499  0.207    14
##  7 1         line-1 Dll      25    0.618 0.494  0.204    11
##  8 1         line-1 Dll      25    0.582 0.528  0.203    12
##  9 1         line-1 Dll      25    0.572 0.479  0.202    10
## 10 1         line-1 Dll      25    0.568 0.479  0.202    12
## # ℹ 1,963 more rows

## tabulate and show only the first five lines
with(dll_data, table(temp, line, genotype))[,1:5,]
## , , genotype = Dll
## 
##     line
## temp line-1 line-11 line-12 line-13 line-15
##   25     20      16      20      17      19
##   30      9      30      20       6      14
## 
## , , genotype = wt
## 
##     line
## temp line-1 line-11 line-12 line-13 line-15
##   25     42      22      22      19      20
##   30     21      32      14      21      16

viz 1

grp <- as.numeric(as.factor(dll_data$temp)) ## numeric line value
pairs(dll_data[morph_vars], gap = 0,
      col = grp)

(there’s a GGally::ggpairs() function but it’s clunky)

viz 2

form <- ~ femur + tibia + tarsus + SCT | interaction(genotype, temp)
car::scatterplotMatrix( form,
                  ellipse = TRUE, data = dll_data, gap = 0,
                  plot.points = TRUE, pch = 20, cex  = 0.5)

Should you scale the response variables?

scaling

dll_data <- (dll_data |>
               ## scale & center (and convert back from matrix to vector)
               mutate(across(any_of(morph_vars), ~drop(scale(.)),
                             .names = "{.col}_s"))
)
morph_vars_s <- paste0(morph_vars, "_s")

Multivariate linear model

\[ \mathbf{Y} = \mathbf{XB} + \mathbf{E} \]

fit the model

mlm_fit1 <- lm(as.matrix(dll_data[morph_vars_s]) ~ genotype,
               data = dll_data)
print(mlm_fit1)
## 
## Call:
## lm(formula = as.matrix(dll_data[morph_vars_s]) ~ genotype, data = dll_data)
## 
## Coefficients:
##              femur_s  tibia_s  tarsus_s  SCT_s  
## (Intercept)   0.0861   0.2204  -0.0799    0.1422
## genotypewt   -0.1514  -0.3931   0.1419   -0.2948

coef(mlm_fit1)
##             femur_s tibia_s tarsus_s  SCT_s
## (Intercept)  0.0861   0.220  -0.0799  0.142
## genotypewt  -0.1514  -0.393   0.1419 -0.295
head(broom::tidy(mlm_fit1))
## # A tibble: 6 × 6
##   response term        estimate std.error statistic  p.value
##   <chr>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 femur_s  (Intercept)   0.0861    0.0331      2.60 9.40e- 3
## 2 femur_s  genotypewt   -0.151     0.0442     -3.43 6.22e- 4
## 3 tibia_s  (Intercept)   0.220     0.0339      6.50 1.03e-10
## 4 tibia_s  genotypewt   -0.393     0.0453     -8.69 7.91e-18
## 5 tarsus_s (Intercept)  -0.0799    0.0343     -2.33 2.01e- 2
## 6 tarsus_s genotypewt    0.142     0.0458      3.10 1.97e- 3

car::Anova(mlm_fit1)
## 
## Type II MANOVA Tests: Pillai test statistic
##          Df test stat approx F num Df den Df Pr(>F)    
## genotype  1     0.102       54      4   1913 <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

a more complicated model

We can include multiple predictors, interactions, etc..

mlm_fit2 <- update(mlm_fit1, . ~ genotype*temp)
car::Anova(mlm_fit2)
## 
## Type II MANOVA Tests: Pillai test statistic
##               Df test stat approx F num Df den Df Pr(>F)    
## genotype       1    0.1042     55.6      4   1911 <2e-16 ***
## temp           1    0.3077    212.3      4   1911 <2e-16 ***
## genotype:temp  1    0.0761     39.3      4   1911 <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Effect sizes?

Mixed models

The previous expression for the multivariate model was

\[ \mathbf{Y} = \mathbf{XB} + \mathbf{E} \]

In contrast, the expression for the mixed model is

\[ y = \mathbf{X\beta} + \mathbf{Zb} + \epsilon \]

where \(\mathbf{b}\) is a set of Gaussian variables with a variance-covariance matrix \(\mathbf{\Sigma}\) which we estimate.


melting

Generate a single column with the numeric values of the responses, and a second variable that stores the trait type.

melting

dll_melt <- (dll_data
    |> select(-any_of(morph_vars))
    |> mutate(units=factor(1:n()))
    |> pivot_longer(names_to = "trait", cols = morph_vars_s)
    |> drop_na()
    |> arrange(units)
)
## Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
## ℹ Please use `all_of()` or `any_of()` instead.
##   # Was:
##   data %>% select(morph_vars_s)
## 
##   # Now:
##   data %>% select(all_of(morph_vars_s))
## 
## See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
## saveRDS(dll_melt, file="dll_melt.rds")

Look at how this has changed the structure of the data from a wide format:

head(dll_data) # original wide
## # A tibble: 6 × 12
##   replicate line   genotype temp  femur tibia tarsus   SCT femur_s tibia_s
##   <fct>     <fct>  <fct>    <fct> <dbl> <dbl>  <dbl> <dbl>   <dbl>   <dbl>
## 1 1         line-1 Dll      25    0.590 0.499  0.219     9   1.51    0.597
## 2 1         line-1 Dll      25    0.550 0.501  0.214    13   0.138   0.654
## 3 1         line-1 Dll      25    0.588 0.488  0.211    11   1.45    0.194
## 4 1         line-1 Dll      25    0.588 0.515  0.211    NA   1.44    1.16 
## 5 1         line-1 Dll      25    0.596 0.502  0.207    12   1.70    0.712
## 6 1         line-1 Dll      25    0.577 0.499  0.207    14   1.04    0.599
## # ℹ 2 more variables: tarsus_s <dbl>, SCT_s <dbl>

to a long format:

head(dll_melt)
## # A tibble: 6 × 7
##   replicate line   genotype temp  units trait     value
##   <fct>     <fct>  <fct>    <fct> <fct> <chr>     <dbl>
## 1 1         line-1 Dll      25    1     femur_s   1.51 
## 2 1         line-1 Dll      25    1     tibia_s   0.597
## 3 1         line-1 Dll      25    1     tarsus_s  1.75 
## 4 1         line-1 Dll      25    1     SCT_s    -1.22 
## 5 1         line-1 Dll      25    2     femur_s   0.138
## 6 1         line-1 Dll      25    2     tibia_s   0.654

lmer formula

lmer_form <- value ~ 0 + trait:(genotype*temp) +
  (0+trait|line) + (0+trait|units)

Fixed-effect component

Random-effect component

lme4: control specificatons

Now fit the model

lmer1 <- lmer(lmer_form,
              data=dll_melt,
              control=lmerControl(
                check.nobs.vs.nlev="ignore",
                check.nobs.vs.nRE="ignore"))

Notes on lme4 results

lme4: random-effects var/cov

vv1 <- VarCorr(lmer1)
print(vv1)
##  Groups   Name          Std.Dev. Corr           
##  units    traitfemur_s  0.654                   
##           traitSCT_s    0.728    0.05           
##           traittarsus_s 0.565    0.58 0.20      
##           traittibia_s  0.670    0.91 0.12 0.47 
##  line     traitfemur_s  0.489                   
##           traitSCT_s    0.392    0.14           
##           traittarsus_s 0.462    0.81 0.37      
##           traittibia_s  0.472    0.87 0.22 0.83 
##  Residual               0.463

lme4: correlation plot

par(mfrow=c(1,2))
## fix unit variance-covariance by adding residual variance:
diag(vv1$units) <- diag(vv1$units)+sigma(lmer1)^2
corrplot.mixed(cov2cor(vv1$line),upper="ellipse")
corrplot.mixed(cov2cor(vv1$units),upper="ellipse")

Correlations of traits across lines are stronger than correlations within individuals. In both cases correlations are all positive (i.e. first axis of variation is size variation?)

lme4: coefficient plot

cc1 <- tidy(lmer1,effect="fixed") |>
  tidyr::separate(term,into=c("trait","fixeff"),extra="merge",
                  remove=FALSE)
dwplot(cc1)+facet_wrap(~fixeff,scale="free",ncol=2)+
  geom_vline(xintercept=0,lty=2)

These results tell approximately the same story (coefficients are consistent across traits within a fixed effect, e.g. effect of higher temperature is to reduce scores on all traits).

lme4: random effects confidence intervals

This is very slow, you probably don’t want to evaluate it on your computer

t1_CI <- system.time(cc2 <- tidy(lmer1,
            effect = "ran_pars",
            conf.int = TRUE,
            conf.method = "profile",
            parallel="multicore",
            ncpus=8))
saveRDS(cc2, file="../data/cc2.rds")

(about 40 minutes to run on 8 cores on a fast machine)

miscellaneous

what the heck are eigenvalues (eigenvectors)?

References

Dworkin, Ian. 2005. “Evidence for Canalization of Distal-Less Function in the Leg of Drosophila Melanogaster.” Evolution & Development 7 (2): 89–100. https://doi.org/10.1111/j.1525-142X.2005.05010.x.