glm(), model specification as before:
glm(y~f1+x1+f2+x2, data=..., family=..., ...)family=binomial: proportions, out of a total number of
counts; includes binary (Bernoulli) (“logistic regression”)family=poisson: Poisson (independent counts, no set
maximum, or far from the maximum)"gaussian"), Gamma)glm is Gaussianfamily really doing?
- on what scale are the relationship(s) between predictor(s) and response linear?
- link function goes from data scale (counts, proportions etc.) to effect or link or linear predictor scale
- Poisson=log; binomial=logit
- inverse link function goes from effect scale to data scale
- Poisson=exponential; binomial=logistic
- each family has a default “canonical” link (sensible + nice math)
- default is usually fine (except: use
family=Gamma(link="log"))
- “linear predictor” \(\eta = \beta_0 + \beta_1 x_1 + \ldots\) describes patterns on the link scale
- Fit doesn’t transform the responses: instead applies inverse link function to the linear predictor
- instead of \(\log(y) \sim x\), we analyze \(y \sim \mathrm{Poisson}(\exp(x))\)
- this is good, because the observed value of \(y\) might be zero
- e.g. count (Poisson) phenotype vs. temperature (centered at 20°C)
- with \(\beta=\{1,0.1\}\), \(T=15\):
- counts \(\sim \textrm{Poisson}(\lambda=\exp(1 + 0.1 \cdot (15-20)))\)
- = \(\textrm{Poisson}(\lambda = \exp(0.5))=\textrm{Poisson}(\lambda=1.649)\)
Model setup is almost the same as linear models
but the linear relationship is assumed on the linear predictor (link) scale
plot is still somewhat
usefulcor(observed,predict(model, type="response")))(obs-exp)/sqrt(V(exp))]type="response" to back-transformperformance::check_model(), DHARMa package
are OK (simulateResiduals(model,plot=TRUE))
- too much variance: (residual deviance)/(residual df) should be \(\approx 1\). (Ratio >1.2 worrisome; ratio>3, v. worrisome (check your model & data!)
- quasi-likelihood models (e.g.
family=quasipoisson); fit, then adjust CIs/p-values- alternatives:
- Poisson \(\to\) negative binomial (
MASS::glm.nb,glmmTMB)- binomial \(\to\) beta-binomial (
glmmTMB)- overdispersion not relevant for
- binary responses
- families/models that estimate dispersion (Gaussian, Gamma, neg binom, …)
qlogis(0.5) == 0)qlogis(0.01)
## [1] -4.59512
qlogis(0.99)
## [1] 4.59512
plogis(0)= 0.5plogis(0+1)= 0.73summary()),
drop1(model,test="Chisq"),
anova(model1,model2)), profile confidence intervals
(MASS::confint.glm)
lmweights
argument)
cbind(successes,failures) [not
cbind(successes,total)]weights=Nglm(p~...,data,weights=rep(N,nrow(data))))... + offset(log(A)) in R formulalink="cloglog" (see here)ggplot2
geom_smooth(method="glm", method.args=list(family=...))dotwhisker, emmeans, effects,
sjPlot, marginaleffects
stat_sum,
position="jitter", geom_dotplot, (beeswarm
plot)glm() problemsfamily= in glm()predict() uses linear predictor scale by
defaultoffset(A) rather than
offset(log(A))data here
aids <- read.csv("../data/aids.csv")
aids <- transform(aids, date=year+(quarter-1)/4)
gg0 <- ggplot(aids,aes(date,cases))+geom_point()
gg1 <- gg0 + geom_smooth(method="glm",colour="red",
formula=y~x,
method.args=list(family="quasipoisson"))
g1 <- glm(cases~date, data = aids, family=quasipoisson(link="log"))
summary(g1)
plot(g1))performance::check_model(g1)
library(DHARMa)
g0 <- update(g1, family=poisson)
plot(simulateResiduals(g0))
## Warning in newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L,
## : Fitting terminated with step failure - check results carefully
print(gg2 <- gg1+geom_smooth(method="glm",formula=y~poly(x,2),
method.args=list(family="quasipoisson")))
g2 <- update(g1, . ~ poly(date,2))
g3 <- update(g2, data = transform(aids, date = date - min(date)))
summary(g3)
##
## Call:
## glm(formula = cases ~ poly(date, 2), family = quasipoisson(link = "log"),
## data = transform(aids, date = date - min(date)))
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.86859 0.05004 77.311 < 2e-16 ***
## poly(date, 2)1 3.82934 0.25162 15.219 2.46e-11 ***
## poly(date, 2)2 -0.68335 0.19716 -3.466 0.00295 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 1.657309)
##
## Null deviance: 677.264 on 19 degrees of freedom
## Residual deviance: 31.992 on 17 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
anova(g1,g2,test="F") ## for quasi-models specifically
## Analysis of Deviance Table
##
## Model 1: cases ~ date
## Model 2: cases ~ poly(date, 2)
## Resid. Df Resid. Dev Df Deviance F Pr(>F)
## 1 18 53.020
## 2 17 31.992 1 21.028 12.688 0.002399 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1