Hierarchical models

We return to the socks-in-the-box example, this time recognizing that we have data from 105 people with differing abilities. We’ll treat the data as 105 blocks each with 8 observations.

The ‘person’ effect

We’ll define a person’s “throwing ability” as the probability the sock goes in the box when throwing from 3.5m with their bad hand. Recall that we centred the distance data by subtracting the mean, 3.5m, so that the intercept corresponds to the probability of success when distance is 3.5m. We modify the model so that each person has their own intercept, corresponding to their ability. The new code looks like this:


# Save this in the file "sox_DHPfixed.jags"
model{
   # Likelihood
   for(i in 1:N) {
     logit(p[i]) <- b0[person[i]] + 
                          bDist * distance[i] +
                          bHand * hand[i]
     y[i] ~ dbern(p[i])
   }
   # Priors
   for(j in 1:Nperson) {
     p0[j] ~ dbeta(1, 1)
     b0[j] <- logit(p0[j])
   }
   
   bDist ~ dunif(-5, 5)
   bHand ~ dunif(-5, 5)
 }

JAGS syntax: [person[i]] is an example of nested indexing, a useful technique in JAGS: person is vector of length 840 with index numbers indicating who made each throw. The first person in the data set is Laura, but the indices are in alphabetical order, so Laura is #50. Thus person[1] = 50 and JAGS uses b0[50] as the intercept. Notice that we have also wrapped the long line, leaving the + sign at the end of the incomplete line.

Now here’s the R code to prepare the data and run JAGS. It follows on from the code on the previous page:

# Prepare data for hierarchical model
( Nperson <- nlevels(sox$Name) )
# [1] 105
table(table(sox$Name))
#   8 
# 105 # 8 observations for each of the 105 participants

# Convert Name to an index number
person <- as.numeric(sox$Name)
head(person, 20)
#  [1]  50  50  50  50  50  50  50  50  74  74  74
# [12]  74  74  74  74  74 104 104 104 104

jagsData <- list(N=N, y = sox$result, distance = distanceC, 
    hand = hand, Nperson = Nperson, person = person)
str(jagsData)
# List of 6
# $ N       : int 840
# $ y       : int [1:840] 1 1 0 0 0 0 0 0 1 1 ...
# $ distance: num [1:840] -1.5 -1.5 -0.5 -0.5 0.5 0.5 ...
# $ hand    : num [1:840] 1 0 1 0 1 0 1 0 1 0 ...
# $ Nperson : int 105
# $ person  : num [1:840] 50 50 50 50 50 50 50 50 74 74 ...

wanted <- c("bDist", "bHand", "p0")

( soxDHPfixed <- jags(jagsData, NULL, wanted, "sox_DHPfixed.jags",
  n.chains=3, n.iter=10000, parallel=TRUE, DIC=FALSE) )
# ...
#           mean    sd   2.5%    50%  97.5%   Rhat n.eff
# bDist   -1.135 0.086 -1.308 -1.135 -0.968  1.000 30000
# bHand    0.257 0.154 -0.043  0.258  0.561  1.000 30000
# p0[1]    0.479 0.164  0.176  0.475  0.797  1.000 15214
# p0[2]    0.719 0.145  0.388  0.739  0.940  1.000  6152
# ...
diagPlot(soxDHPfixed)
jags.View(soxDHPfixed)# Opens new 'spreadsheet' window

The default adaptation (100 iterations) is adequate and we don’t need extra burn-in. We now have 105 values for throwing ability, p0, one for each person, though we have only shown the results for the first two above. The diagnostics are all good, and there’s no sign that the posteriors for bDist and bHand are constrained by the range of the priors.

Now we’ll look in more detail at the ability parameter. The next bit of R code pulls out the MCMC chains for p0 and gets the posterior means:

ability <- plogis(soxDHPfixed$sims.list$b0)
abilityMean <- colMeans(ability)
hist(abilityMean, col='lightgrey')
range(abilityMean)
# [1] 0.1464365 0.8256393

The histogram looks a bit ‘blocky’ because people who got the sock in the box the same number of times have about the same ability. For example, the 23 people who got it in 4 times have ability ratings close to 0.48. The range is very wide: do we really believe that some have only a 14% chance of getting the sock in the box from 3.5m and some have 83%?

Each of those ability ratings is based on only 8 observations for each person. That’s a ridiculously small sample for binary data. Some of those people who only scored once would likely do better if they tried again, and Kumbu, who scored 7, might not do so well. We can get better estimates by making throwing ability a random variable.

Random variables and partial pooling

For our first analysis, we ignored the fact that we had 105 people and pooled all the data to calculate a single intercept. In the last analysis, we did no pooling, treating each person separately. Now we are going to use an approach sometimes called “partial pooling”: we’ll still calculate a value for each person, but it will be based on that person’s results combined with the overall parameter values.

For this we need a model to describe how an individual’s throwing ability relates to everyone else’s. Many things in nature are normally distributed, such as human heights, and we’ll assume that throwing ability on the logit scale is also normally distributed with some mean and standard deviation, which we will estimate. We are treating throwing ability as a random variable. We modify the JAGS code so that each b0[j] is drawn from a normal distribution with mean mu and standard deviation sigma. The dnorm distribution in JAGS works with precision (tau) instead of standard deviation, and we’ll need to do the conversion.

for(j in 1:Nperson) {
   b0[j] ~ dnorm(mu, tau)
 }

Now we need priors for mu and sigma; these are sometimes called “hyper-priors”. Actually, mu is the logit-scale value of the average ability of the participants. and that’s a probability; so we can again use a Beta prior for this and convert to the logit scale to get mu.

p0 ~ dbeta(1,1)
mu <- logit(p0)

For sigma we’ll use a uniform prior over a suitable range and check that the posterior is not constrained. We then calculate tau.

sigma ~ dunif(0, 5)
tau <- 1/sigma^2

So new we can put together the code for the JAGS model:

# Save this in the file "sox_DHPrand.jags"
model{
   # Likelihood
   for(i in 1:N) {
     logit(p[i]) <- b0[person[i]] + 
                          bDist * distance[i] +
                          bHand * hand[i]
     y[i] ~ dbern(p[i])
   }
   # Priors
   for(j in 1:Nperson) {
     b0[j] ~ dnorm(mu, tau)
   }  
   bDist ~ dunif(-5, 5)
   bHand ~ dunif(-5, 5)
   # Hyperpriors
   pMean ~ dbeta(1,1)
   mu <- logit(pMean)
   sigma ~ dunif(0, 5)
   tau <- 1/sigma^2
 }

We can use the same data but need to change the parameters wanted to include mu and sigma. Models with random variables always seem mix poorly and to need lots more iterations to get decent results. Here we use 1000 adaptation/burn-in iterations, followed by 100,000 thinned by 10, so we retain 10,000 per chain. This takes about 6 mins to run.

wanted <- c("bDist", "bHand", "mu", "sigma", "b0")

( soxDHPrand <- jags(jagsData, NULL, wanted, "sox_DHPrand.jags",
  n.chains=3, n.adapt=1000,n.iter=1e5, n.thin=10, parallel=TRUE, DIC=FALSE
# ...
#           mean    sd   2.5%    50%  97.5%   Rhat n.eff
# bDist   -0.967 0.081 -1.130 -0.966 -0.812  1.000 30000
# bHand    0.399 0.160  0.083  0.399  0.712  1.001  1910
# mu      -0.472 0.120 -0.706 -0.471 -0.241  1.001  2150
# sigma    0.337 0.164  0.020  0.343  0.644  1.001  1922
# b0[1]   -0.427 0.344 -1.134 -0.435  0.300  1.000 13461
# b0[2]   -0.214 0.385 -0.855 -0.274  0.674  1.000 24808
# ...
diagPlot(soxDHPrand)
jags.View(soxDHPrand)

The diagnostic plots look ok, though we note that the green chain has sigma going down to zero and sticking there for hundreds of iterations. With random effects, sigma can be zero and sigma often mixes poorly. The effect of sigma = 0 shows up in the trace plots for the b0‘s.

Now we look at the new results for throwing ability:

# Extract the b0's and convert to probabilities
p0 <- plogis(soxDHPrand$sims.list$b0)
abilityMean <- colMeans(p0)
hist(abilityMean, col='lightgrey')
range(abilityMean)
# [1] 0.3288152 0.4686253

Now our estimates of throwing ability are much more closely grouped. People with very low or very high raw scores are probably closer to the average than the 8 data points indicate.

The effect of gender

Now that we have a model which includes persons, we can add a gender effect. Gender is an attribute of each person. Average throwing ability may differ between men and women, and the spread may also differ. We modify the JAGS code above to have separate values for pMean and sigma for males and females.


# Save this in the file "sox_DHPGrand.jags"
model{
   # Likelihood
   for(i in 1:N) {
     logit(p[i]) <- b0[person[i]] + 
                          bDist * distance[i] +
                          bHand * hand[i]
     y[i] ~ dbern(p[i])
   }
   # Priors
   for(j in 1:Nperson) {
     b0[j] ~ dnorm(mu[gender[j]], tau[gender[j]])
   }  
   bDist ~ dunif(-5, 5)
   bHand ~ dunif(-5, 5)
   # Hyperpriors
   for(g in 1:2) {   # Different for male and female
     pMean[g] ~ dbeta(1,1)
     mu[g] <- logit(pMean[g])
     sigma[g] ~ dunif(0, 5)
     tau[g] <- 1/sigma[g]^2
   }
 }

The R code is similar, but we need to prepare the gender vector of length 105 and pass this to JAGS:

# Organise gender data
tmp <- as.numeric(sox$gender) # Convert factor to numbers
gender <- tapply(tmp, sox$Name, mean)
table(gender)
# gender
#  1  2 
# 46 59
jagsData <- list(N=N, y = sox$result, distance = distanceC, 
    hand = hand, Nperson = Nperson, person = person, gender=gender)
str(jagsData)
# List of 7
#  $ N       : int 840
#  $ y       : int [1:840] 1 1 0 0 0 0 0 0 1 1 ...
#  $ distance: num [1:840] -1.5 -1.5 -0.5 -0.5 0.5 0.5 1.5 1.5 -1.5 -1.5 ...
#  $ hand    : num [1:840] 1 0 1 0 1 0 1 0 1 0 ...
#  $ Nperson : int 105
#  $ person  : num [1:840] 50 50 50 50 50 50 50 50 74 74 ...
#  $ gender  : num [1:105(1d)] 2 2 2 2 2 2 2 2 1 1 ...
#   ..- attr(*, "dimnames")=List of 1
#   .. ..$ : chr [1:105] "Abid" "Aiyat" "Aji" "Akmal" ...

The output from tapply is a one-dimensional array, which is an odd beast, but JAGS copes with it. It also has the names of the people as an attribute, which will be ignored by JAGS.

wanted <- c("bDist", "bHand", "pMean", "mu", "sigma", "b0")
( soxDHPGrand <- jags(jagsData, NULL, wanted, "sox_DHPGrand.jags",
  n.chains=3, n.adapt=1000,n.iter=1e5, n.thin=10, parallel=TRUE, DIC=FALSE) )
# ...
#            mean    sd   2.5%    50%  97.5%   Rhat n.eff
# bDist    -0.979 0.080 -1.137 -0.979 -0.825  1.000 22757
# bHand     0.400 0.160  0.087  0.401  0.713  1.000  6872
# pMean[1]  0.311 0.034  0.247  0.310  0.378  1.001  2158
# pMean[2]  0.445 0.035  0.378  0.445  0.514  1.000 30000
# mu[1]    -0.801 0.158 -1.112 -0.798 -0.497  1.001  2340
# mu[2]    -0.223 0.141 -0.498 -0.220  0.054  1.001 30000
# sigma[1]  0.280 0.190  0.011  0.258  0.692  1.009   239
# sigma[2]  0.295 0.183  0.013  0.281  0.673  1.002   846
# b0[1]    -0.220 0.325 -0.905 -0.219  0.452  1.000 29214
# b0[2]    -0.042 0.364 -0.629 -0.102  0.829  1.001  6628
# ...
diagPlot(soxDHPGrand)
jags.View(soxDHPGrand)

The density plots for sigma are ugly and the n.eff figure is far too low. I ran the model overnight with 5 million iterations (thinned by 500 to keep the output reasonably sized) and got adequate n.eff‘s and good plots for sigma:

Here are the key results for the longer run:

           mean    sd   2.5%    50%  97.5%    Rhat n.eff
bDist    -0.979 0.081 -1.139 -0.978 -0.826   1.000 30000
bHand     0.400 0.158  0.092  0.399  0.714   1.000  9144
pMean[1]  0.311 0.033  0.247  0.310  0.378   1.000 30000
pMean[2]  0.445 0.034  0.379  0.445  0.513   1.000 23439
mu[1]    -0.801 0.157 -1.112 -0.800 -0.498   1.000 30000
mu[2]    -0.220 0.139 -0.493 -0.220  0.054   1.000 24150
sigma[1]  0.283 0.186  0.014  0.261  0.688   1.000 20583
sigma[2]  0.302 0.183  0.016  0.289  0.686   1.000 22767
b0[1]    -0.217 0.329 -0.896 -0.219  0.476   1.000 30000
b0[2]    -0.031 0.367 -0.628 -0.089  0.852   1.000 18008

Just looking at the values for pMean – and in particular at the credible intervals – it’s clear that for participants in our 2016 Boot Camps, the men are better at throwing socks than the women. But the spread of abilities, sigma, is the same. We can plot pMean, and also calculate the difference and plot that:

# Convert to Bwiqid and plot
library(wiqid)
bw <- as.Bwiqid(soxDHPGrand)
par(mfrow = 2:1)
plot(bw, "pMean.1.", xlim=c(0.2, 0.55), xlab = "Female")
plot(bw, "pMean.2.", xlim=c(0.2, 0.55), xlab = "Male")
par(mfrow = c(1, 1))
diff <- bw$pMean.2. - bw$pMean.1.
plotPost(diff)

Plot the effect of distance and gender

It’s not easy to grasp what these numbers mean, and plotting them is a always a good idea. We’ll plot the probability of success versus distance from the box for men and women.

# Get the MCMC chains we need
mu <- soxDHPGrand$sims.list$mu
head(mu)
#            [,1]       [,2]
# [1,] -0.7581018 -0.2172109
# [2,] -0.8666499 -0.1790211
# [3,] -0.8636711 -0.2588688
# [4,] -0.7853515 -0.2554216
# [5,] -0.7410710 -0.2940089
# [6,] -0.7133230 -0.3651503
bDist <- soxDHPGrand$sims.list$bDist
head(bDist)
# [1] -0.8936494 -0.9633547 -0.9108005 -0.9209869 -1.0160170 -0.9343064
# Do values for the x axis and centre them
xx <- seq(2, 5, length=51)
xxC <- xx - 3.5
lf <- mean(mu[, 1]) + mean(bDist) * xxC
pf <- plogis(lf)
lm <- mean(mu[, 2]) + mean(bDist) * xxC
pm <- plogis(lm)
plot(xx, pm, type='l', lwd=3, col='blue', 
  ylim=range(pf, pm), las=1,
  xlab="Distance", ylab="Probability of success")
lines(xx, pf, lwd=3, col='red')
legend('topright', c("women", "men"), col=c("red", "blue"),
  lwd=3, bty='n')

We can see how the probability of getting the sock in the box changes with distance for both men and women according to the prediction of our model. But that plot does not give an indication of the uncertainty of the predictions. We could calculate the credible interval and add those to the plot, but an alternative is to plot a selection of posterior predictive curves. Each of these uses the values of mu and bDist from one step in the MCMC chain.

# Add 50 posterior predictive curves
toplot <- sample.int(length(bDist), 50)
for(i in 1:50) {
  pf1 <- plogis(mu[toplot[i], 1] + bDist[toplot[i]] * xxC)
  lines(xx, pf1, col='pink')
  pm1 <- plogis(mu[toplot[i], 2] + bDist[toplot[i]] * xxC)
  lines(xx, pm1, col='skyblue')
}
# Redraw the main lines
lines(xx, pf, lwd=3, col='red')
lines(xx, pm, lwd=3, col='blue')

We could continue to extend our model, for example to allow for differences in the bDist or bHand coefficients with gender or interaction between distance and hand. However, we’ll leave the socks here and go on to look at another type of hierarchical model, a simple occupancy model.

Leave a Reply

Your email address will not be published. Required fields are marked *