library("lattice") ## built-in
## model-fitting
library("R2jags")
library("rstanarm")
library("arm") ## for coefplot, bayesglm
## handling fitted models (diagnostics & inference)
library("coda")
library("emdbook") ## for lump.mcmc.list() and as.mcmc.bugs()
library("dotwhisker")
library("broom.mixed")
library("ggplot2"); theme_set(theme_bw())
namedList <- lme4:::namedList ## utility
set.seed(411)
N <- 40
## predictor variables
a <- runif(N)
b <- runif(N)
c <- runif(N)
y <- rnorm(N,mean=2+1*a+4*b+1*c,sd=1)
dat <- data.frame(y,a,b,c)
print(summary(dat))
##        y               a                  b                  c          
##  Min.   :1.780   Min.   :0.04418   Min.   :0.03454   Min.   :0.02538  
##  1st Qu.:3.321   1st Qu.:0.21733   1st Qu.:0.18682   1st Qu.:0.30197  
##  Median :5.270   Median :0.52227   Median :0.54248   Median :0.59167  
##  Mean   :5.096   Mean   :0.51398   Mean   :0.48042   Mean   :0.53889  
##  3rd Qu.:6.484   3rd Qu.:0.73486   3rd Qu.:0.77407   3rd Qu.:0.80773  
##  Max.   :8.528   Max.   :0.93434   Max.   :0.99014   Max.   :0.97959
The nice thing about Bayesian models in JAGS and similar tools is that the forward specification (simulating data given parameters) is essentially the same as the backward specification (estimating parameters given data). For comparison, here is the standard least-squares fit:
summary(lm(y~a+b+c,data=dat))
## 
## Call:
## lm(formula = y ~ a + b + c, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.11619 -0.46816  0.06305  0.56538  2.05915 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.6664     0.4456   3.740  0.00064 ***
## a             0.6909     0.6129   1.127  0.26708    
## b             5.0782     0.5332   9.523 2.25e-11 ***
## c             1.1771     0.5353   2.199  0.03440 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9981 on 36 degrees of freedom
## Multiple R-squared:  0.7574, Adjusted R-squared:  0.7371 
## F-statistic: 37.46 on 3 and 36 DF,  p-value: 3.645e-11
We need priors for the four regression parameters, and for the error variance. The BUGS/JAGS framework does not let us use improper priors, so we have to pick things that we think are “wide” enough.
For the error variance, we also have to pick something that’s always positive.
Finally, BUGS/JAGS thinks in terms of “precision” \(\tau=1/\sigma^2\), which is confusing, but you can get used to it. It’s easiest to start by guessing what a “large” standard deviation for your parameter would be, then taking \(1/\sigma^2\) to get the corresponding value of \(\tau\). A plausible rule of thumb is to make the prior standard deviation about 10 times as large as the value you expect for the parameter. For example, if you expected the difference between the masses of two species to be about 100 g, you could use a prior with a standard deviation of 1000 g, i.e. a precision of \(10^{-6}\). The normal priors below have low precision, and hence high variance.
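As a tiny numerical sketch of that rule of thumb (using the illustrative mass example above, nothing to do with the simulated data):

expected_diff <- 100              ## expected difference in mass (g)
prior_sd <- 10 * expected_diff    ## "wide" prior standard deviation (1000 g)
prior_tau <- 1 / prior_sd^2       ## corresponding precision for JAGS
prior_tau
## [1] 1e-06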
The BUGS model file (`../code/bayes.bug`) is as follows:
## model {
##   for (i in 1:N){
##     y[i] ~ dnorm(pred[i],tau)
##     pred[i] <- ma*a[i] + mb*b[i] + mc*c[i] + int
##   }
##   ma ~ dnorm(0, .0001)
##   mb ~ dnorm(0, .0001)
##   mc ~ dnorm(0, .0001)
##   int ~ dnorm(0, .0001)
##   tau ~ dgamma(.001, .001)
## }
jags1 <- jags(model.file='../code/bayes.bug',
              parameters=c("ma","mb","mc","int"),
              data = namedList(a, b, c, N, y),
              n.chains = 4,
              inits=NULL)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 40
## Unobserved stochastic nodes: 5
## Total graph size: 329
##
## Initializing model
We set `inits=NULL` to have JAGS pick the initial values randomly from the priors. For more complex models you might want to pick starting values for the chains yourself (see the `?jags` documentation).
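For example, here is a minimal sketch of user-specified starting values (the particular distributions used here are arbitrary choices for illustration, not anything required by the model):

my_inits <- function() {
  ## one list of starting values, drawn anew for each chain
  list(ma = rnorm(1), mb = rnorm(1), mc = rnorm(1),
       int = rnorm(1), tau = rgamma(1, shape = 1, rate = 1))
}
## then call jags(..., inits = my_inits) instead of inits = NULL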
bb <- jags1$BUGSoutput ## extract the "BUGS output" component
mm <- as.mcmc.bugs(bb) ## convert it to an "mcmc" object that coda can handle
plot(jags1) ## large-format graph
## plot(mm) ## trace + density plots, same as above
xyplot(mm,layout=c(2,3)) ## prettier trace plot
densityplot(mm,layout=c(2,3)) ## prettier density plot
## print(dwplot(jags1)) ## estimate + credible interval plot
tt_jags <- tidy(jags1, conf.int = TRUE)
gg_jags <- ggplot(tt_jags, aes(estimate, term, xmin = conf.low, xmax = conf.high)) + geom_pointrange()
print(gg_jags)
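Because `mm` is a multi-chain coda object, the standard coda convergence diagnostics can also be applied directly (a quick sketch; output not shown):

gelman.diag(mm)    ## Gelman-Rubin potential scale reduction factor for each parameter
effectiveSize(mm)  ## effective sample size for each parameter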
The `lump.mcmc.list` function from the `emdbook` package can be useful for converting a set of MCMC chains into a single long chain.
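For example, a small sketch using the objects created above:

mm_long <- lump.mcmc.list(mm)  ## collapse the four chains into one long mcmc object
summary(mm_long)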
Implementing models with categorical variables (say, a t-test or an ANOVA) is a little bit more tedious than the multiple regression analysis shown above. There are two basic strategies:

1. Code the categorical variable as numeric (i.e. pass `as.numeric(f)` rather than `f` to JAGS in your data list), and make your parameters a simple vector of the means corresponding to each level; e.g. you could have a vector of parameters `catparam` and specify the predicted value as `pred[i] = catparam[f[i]]`.
2. Use the `model.matrix` function in R and pass the whole model matrix (`X`) to JAGS. Then, to get the predicted value for each observation, just add up the relevant columns of the model matrix, e.g. `pred[i] = beta[1]*X[i,1]+beta[2]*X[i,2]+beta[3]*X[i,3]`. You can also use the built-in `inprod` (inner product) function: `pred[i] = inprod(X[i,],beta)`.

The second approach is a little bit harder to understand, but it generalizes to more complicated situations and gives you answers that will more closely match the analogous analyses in R (e.g. using `lm`); a sketch follows below.
`bayesglm` fits a “cheap” (maximum a posteriori) model. See the `prior.mean`, `prior.scale`, and `prior.df` arguments for specifying independent priors on the parameters (and also the `scaled` argument).
summary(b1 <- bayesglm(y~a+b+c,data=dat)) ## LINEAR model
## 
## Call:
## bayesglm(formula = y ~ a + b + c, data = dat)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.6791     0.4448   3.775 0.000578 ***
## a             0.6943     0.6104   1.137 0.262851    
## b             5.0555     0.5319   9.505 2.37e-11 ***
## c             1.1704     0.5337   2.193 0.034842 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.9962802)
## 
##     Null deviance: 147.805  on 39  degrees of freedom
## Residual deviance:  35.866  on 36  degrees of freedom
## AIC: 119.15
## 
## Number of Fisher Scoring iterations: 9
b2 <- bayesglm(y~a+b+c,data=dat, prior.scale=0.1)
dotwhisker::dwplot(list(flat=b1,shrunk=b2)) + geom_vline(xintercept=0,lty=2)
lms <- stan_lm(y~a+b+c,data=dat, prior=R2(0.2),
               seed=101,
               refresh=0 ## quiet!
               )
## dwplot(lms)
tt_rstanarm <- tidy(lms, conf.int = TRUE)
gg_rstanarm <- gg_jags %+% tt_rstanarm
print(gg_rstanarm)
(warnings suppressed).
As the `rstanarm` documentation describes `stan_lm`: “Similar to ‘lm’ or ‘aov’ but with regularizing priors on the model parameters that are driven by prior beliefs about \(R^2\), the proportion of variance in the outcome attributable to the predictors in a linear model.”
Beware “divergent transitions”!
Other options for Bayesian (or MAP) model fitting in R include:

- `MCMCglmm`: mixed models, including phylogenetic models etc.
- `brms`: `rstanarm` on steroids (compile it yourself); see the sketch after this list
- `blme`: `lme4` + priors (deterministic/MAP)
- `INLA`: fancy spatial stuff (deterministic/MAP)
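As a minimal `brms` sketch of the same regression (not run here: `brm` generates and compiles its own Stan code, which takes a while, and the settings shown are just illustrative):

library("brms")
fit_brms <- brm(y ~ a + b + c, data = dat, seed = 101)  ## with brms's default priors
summary(fit_brms)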
Bayesian task view (see here for a posterior joke)