11 March 2024

setup

library("lattice") ## built-in
## model-fitting
library("R2jags")
library("rstanarm")
library("arm")        ## for coefplot, bayesglm
## handling fitted models (diagnostics & inference)
library("coda")
library("emdbook")    ## for lump.mcmc.list() and as.mcmc.bugs()
library("dotwhisker")
library("broom.mixed")
library("ggplot2"); theme_set(theme_bw())
namedList <- lme4:::namedList  ## utility

Generate fake data

set.seed(411)
N <- 40
## predictor variables
a <- runif(N)
b <- runif(N)
c <- runif(N)
y <- rnorm(N,mean=2+1*a+4*b+1*c,sd=1)
dat <- data.frame(y,a,b,c)

print(summary(dat))
##        y               a                 b                 c          
##  Min.   :1.780   Min.   :0.04418   Min.   :0.03454   Min.   :0.02538  
##  1st Qu.:3.321   1st Qu.:0.21733   1st Qu.:0.18682   1st Qu.:0.30197  
##  Median :5.270   Median :0.52227   Median :0.54248   Median :0.59167  
##  Mean   :5.096   Mean   :0.51398   Mean   :0.48042   Mean   :0.53889  
##  3rd Qu.:6.484   3rd Qu.:0.73486   3rd Qu.:0.77407   3rd Qu.:0.80773  
##  Max.   :8.528   Max.   :0.93434   Max.   :0.99014   Max.   :0.97959

The nice thing about Bayesian models in JAGS and similar tools is that the forward specification (simulating data given parameters) is essentially the same as the backward specification (estimating parameters given data).

Fit a (frequentist) linear model

summary(lm(y~a+b+c,data=dat))
## 
## Call:
## lm(formula = y ~ a + b + c, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.11619 -0.46816  0.06305  0.56538  2.05915 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.6664     0.4456   3.740  0.00064 ***
## a             0.6909     0.6129   1.127  0.26708    
## b             5.0782     0.5332   9.523 2.25e-11 ***
## c             1.1771     0.5353   2.199  0.03440 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9981 on 36 degrees of freedom
## Multiple R-squared:  0.7574, Adjusted R-squared:  0.7371 
## F-statistic: 37.46 on 3 and 36 DF,  p-value: 3.645e-11

Bayesian analysis

With BUGS/JAGS

We need priors for the four regression parameters and for the error variance. The BUGS/JAGS framework does not let us use improper priors, so we have to pick priors that we think are “wide” enough.

For the error variance, we also have to pick something that’s always positive.

Finally, BUGS/JAGS thinks in terms of “precision” \(\tau=1/\sigma^2\), which is confusing, but you can get used to it. It’s best to start by guessing what you think a “large” standard deviation would be, then taking \(1/\sigma^2\) to get the corresponding value of \(\tau\). A plausible rule of thumb is a prior standard deviation of about 10 times the expected magnitude of your parameter. For example, if you expected the difference between the masses of two species to be about 100g, you could use a prior with a standard deviation of 1000g, i.e. a precision of \(10^{-6}\). The normal priors below have low precision, and therefore high variance.
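
Translating a guessed standard deviation into a JAGS precision is a one-liner (a sketch repeating the mass example above; the object names are just illustrative):

prior_sd  <- 1000          ## a "large" standard deviation on the parameter's scale
prior_tau <- 1/prior_sd^2  ## the corresponding JAGS precision
prior_tau                  ## 1e-06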

R2jags

The bugs file is here.

## model {
##  for (i in 1:N){
##      y[i] ~ dnorm(pred[i],tau)
##      pred[i] <- ma*a[i] + mb*b[i] + mc*c[i] + int
##  }
##  ma ~ dnorm(0, .0001)
##  mb ~ dnorm(0, .0001)
##  mc ~ dnorm(0, .0001)
##  int ~ dnorm(0, .0001)
##  tau ~ dgamma(.001, .001)
## }
jags1 <- jags(model.file='../code/bayes.bug',
              parameters=c("ma","mb","mc","int"),
              data = namedList(a, b, c, N, y),
              n.chains = 4,
              inits=NULL)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 40
##    Unobserved stochastic nodes: 5
##    Total graph size: 329
## 
## Initializing model
  • You can use inits=NULL to have JAGS pick the initial values randomly from the priors. For more complex models you might want to pick starting values for the chains yourself (see the ?jags documentation).
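
A minimal sketch of hand-picked starting values, using a hypothetical inits_fun (the particular distributions are arbitrary; the list names must match the unobserved stochastic nodes in the model file):

inits_fun <- function() {
  list(ma = rnorm(1), mb = rnorm(1), mc = rnorm(1), int = rnorm(1),
       tau = runif(1, 0.5, 2))
}
## passing inits = inits_fun to jags() draws one set of starting values per chain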

Examine the chains and output

bb <- jags1$BUGSoutput  ## extract the "BUGS output" component
mm <- as.mcmc.bugs(bb)  ## convert it to an "mcmc" object that coda can handle
plot(jags1)             ## large-format graph
## plot(mm)                ## trace + density plots, same as above
xyplot(mm,layout=c(2,3))  ## prettier trace plot
densityplot(mm,layout=c(2,3)) ## prettier density plot
## print(dwplot(jags1))              ## estimate + credible interval plot
tt_jags <- tidy(jags1, conf.int = TRUE)
gg_jags <- ggplot(tt_jags, aes(estimate, term, xmin = conf.low, xmax = conf.high)) + geom_pointrange()
print(gg_jags)

The lump.mcmc.list function from the emdbook package can be useful for converting a set of MCMC chains into a single long chain.
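
For example (a minimal usage sketch):

mm_long <- lump.mcmc.list(mm)  ## collapse the four chains into a single mcmc object
summary(mm_long)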

Further notes

Categorical variables

Implementing models with categorical variables (say, a t-test or an ANOVA) is a little bit more tedious than the multiple regression analysis shown above. There are two basic strategies:

  • pass the categorical variable as a vector of numeric codes (i.e. pass as.numeric(f) rather than f to JAGS in your data list), and make your parameters a simple vector of the means corresponding to each level, e.g. you could have a vector of parameters catparam and specify the predicted value as pred[i] = catparam[f[i]]
  • construct a design matrix using the model.matrix function in R and pass the whole model matrix (X) to JAGS. Then, to get the predicted value for each observation, just add up the relevant columns of the model matrix: e.g. pred[i] = beta[1]*X[i,1]+beta[2]*X[i,2]+beta[3]*X[i,3]. You can also use the built-in inprod (inner product) function: pred[i] = inprod(X[i,],beta)

The second approach is a little harder to understand, but it generalizes to more complicated situations and gives answers that will more closely match the analogous analyses in R (e.g. using lm); a sketch is given below.
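
A minimal sketch of the design-matrix approach, assuming a hypothetical data frame dat2 containing a factor f and a response y (the model fragment is shown as comments, in the same style as the bug file above):

X <- model.matrix(~ f, data = dat2)  ## design matrix; first column is the intercept
jags_data <- list(X = X, y = dat2$y, N = nrow(X), p = ncol(X))
## corresponding JAGS model fragment:
##  for (i in 1:N) {
##      y[i] ~ dnorm(pred[i], tau)
##      pred[i] <- inprod(X[i,], beta)
##  }
##  for (j in 1:p) { beta[j] ~ dnorm(0, .0001) }
##  tau ~ dgamma(.001, .001)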

Built-in Bayesian modeling techniques

With bayesglm

bayesglm fits a “cheap” (maximum a posteriori) model. See the prior.mean, prior.scale, and prior.df arguments for specifying independent priors on the parameters (and see also the scaled argument).

summary(b1 <- bayesglm(y~a+b+c,data=dat)) ## LINEAR model
## 
## Call:
## bayesglm(formula = y ~ a + b + c, data = dat)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.6791     0.4448   3.775 0.000578 ***
## a             0.6943     0.6104   1.137 0.262851    
## b             5.0555     0.5319   9.505 2.37e-11 ***
## c             1.1704     0.5337   2.193 0.034842 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.9962802)
## 
##     Null deviance: 147.805  on 39  degrees of freedom
## Residual deviance:  35.866  on 36  degrees of freedom
## AIC: 119.15
## 
## Number of Fisher Scoring iterations: 9
b2 <- bayesglm(y~a+b+c,data=dat, prior.scale=0.1)
dotwhisker::dwplot(list(flat=b1,shrunk=b2)) + geom_vline(xintercept=0,lty=2)
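
The prior arguments mentioned above can also be set explicitly; a minimal sketch (the values are illustrative, not recommendations):

b3 <- bayesglm(y~a+b+c, data=dat,
               prior.mean  = 0,    ## center of the prior on each coefficient
               prior.scale = 2.5,  ## prior scale
               prior.df    = 1)    ## df = 1 corresponds to a Cauchy prior
coef(b3)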

rstanarm

lms <- stan_lm(y~a+b+c,data=dat, prior=R2(0.2),
               seed=101,
               refresh=0  ## quiet!
               )
## dwplot(lms)
tt_rstanarm <- tidy(lms, conf.int = TRUE)
gg_rstanarm <- gg_jags %+% tt_rstanarm
print(gg_rstanarm)

(warnings suppressed).

stan_lm is similar to ‘lm’ or ‘aov’, but with regularizing priors on the model parameters that are driven by prior beliefs about \(R^2\), the proportion of variance in the outcome attributable to the predictors in a linear model.

Beware “divergent transitions”! These are warnings from Stan’s sampler indicating that it had trouble exploring part of the posterior, so the results may be unreliable.
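
If such warnings appear, a common first step (a sketch, not a cure-all) is to ask the sampler to take smaller steps via adapt_delta:

lms2 <- stan_lm(y~a+b+c, data=dat, prior=R2(0.2),
                seed=101, refresh=0,
                adapt_delta=0.99)  ## values closer to 1 mean more careful (but slower) sampling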

Others

  • MCMCglmm: mixed models including phylogenetic etc.

  • brms: rstanarm on steroids (each model’s Stan code is generated and compiled on the fly, so you need a working C++ toolchain)

  • blme: lme4 + priors (deterministic/MAP)

  • INLA: fancy spatial stuff (deterministic/MAP)

  • Bayesian task view (see here for a posterior joke)