Hierarchical models

We return to the socks-in-the-box example, this time recognizing that we have data from 105 people with differing abilities. We’ll treat the data as 105 blocks each with 8 observations.

The ‘person’ effect

We’ll define a person’s “throwing ability” as the probability the sock goes in the box when throwing from 3.5m with their bad hand. Recall that we centred the distance data by subtracting the mean, 3.5m, so that the intercept corresponds to the probability of success when distance is 3.5m. We modify the model so that each person has their own intercept, corresponding to their ability. $${\rm logit}(p_i) = \beta_{0, person_i} + \beta_{distance} \times distance_i + \beta_{hand} \times hand_i$$

This model is very similar to the previous one except that we have 105 different values for $\beta_0$ and $p_0$:

The new JAGS code looks like this:


# Save this in the file "sox_DHPfixed.jags"
model{
   # Likelihood
   for(i in 1:N) {
     logit(p[i]) <- b0[person[i]] + 
                          bDist * distance[i] +
                          bHand * hand[i]
     y[i] ~ dbern(p[i])
   }
   # Priors
   for(j in 1:Nperson) {
     p0[j] ~ dbeta(1, 1)
     b0[j] <- logit(p0[j])
   }
   
   bDist ~ dunif(-5, 5)
   bHand ~ dunif(-5, 5)
 }

JAGS syntax: [person[i]] is an example of nested indexing, a useful technique in JAGS: person is a vector of length 840 with index numbers indicating who made each throw. The first person in the data set is Laura, but the indices are in alphabetical order, so Laura is #50. Thus person[1] = 50 and JAGS uses b0[50] as the intercept. Notice that we have also wrapped the long line, leaving the + sign at the end of the incomplete line.

Now here’s the R code to prepare the data and run JAGS. It follows on from the code on the previous page:

# Prepare data for hierarchical model
( Nperson <- nlevels(sox$Name) )
# [1] 105
table(table(sox$Name))
#   8 
# 105 # 8 observations for each of the 105 participants

# Convert Name to an index number
person <- as.numeric(sox$Name)
head(person, 20)
#  [1]  50  50  50  50  50  50  50  50  74  74  74
# [12]  74  74  74  74  74 104 104 104 104

jagsData <- list(N=N, y = sox$result, distance = distanceC, 
    hand = hand, Nperson = Nperson, person = person)
str(jagsData)
# List of 6
# $ N       : int 840
# $ y       : int [1:840] 1 1 0 0 0 0 0 0 1 1 ...
# $ distance: num [1:840] -1.5 -1.5 -0.5 -0.5 0.5 0.5 ...
# $ hand    : num [1:840] 1 0 1 0 1 0 1 0 1 0 ...
# $ Nperson : int 105
# $ person  : num [1:840] 50 50 50 50 50 50 50 50 74 74 ...

wanted <- c("bDist", "bHand", "p0")

( soxDHPfixed <- jags(jagsData, NULL, wanted,
  "sox_DHPfixed.jags",
  n.chains=3, n.iter=10000, parallel=TRUE, DIC=FALSE) )
# ...
#           mean    sd   2.5%    50%  97.5%   Rhat n.eff
# bDist   -1.135 0.086 -1.308 -1.135 -0.968  1.000 30000
# bHand    0.257 0.154 -0.043  0.258  0.561  1.000 30000
# p0[1]    0.479 0.164  0.176  0.475  0.797  1.000 15214
# p0[2]    0.719 0.145  0.388  0.739  0.940  1.000  6152
# ...
diagPlot(soxDHPfixed)
jags.View(soxDHPfixed)# Opens new 'spreadsheet' window

The default adaptation (100 iterations) is adequate and we don’t need extra burn-in. We now have 105 values for throwing ability, p0, one for each person, though we have only shown the results for the first two above. The diagnostics are all good, and there’s no sign that the posteriors for bDist and bHand are constrained by the range of the priors.

Now we’ll look in more detail at the ability parameter. The next bit of R code pulls out the MCMC chains for p0 and gets the posterior means:

ability <- soxDHPfixed$sims.list$p0  # corrected
abilityMean <- colMeans(ability)
hist(abilityMean, col='lightgrey')
range(abilityMean)
# [1] 0.1464365 0.8256393

The histogram looks a bit ‘blocky’ because people who got the sock in the box the same number of times have about the same ability. For example, the 23 people who got it in 4 times have ability ratings close to 0.48. The range is very wide: do we really believe that some have only a 14% chance of getting the sock in the box from 3.5m and some have 83%?

Each of those ability ratings is based on only 8 observations for each person. That’s a ridiculously small sample for binary data. Some of those people who only scored once would likely do better if they tried again, and Kumbu, who scored 7, might not do so well. We can get better estimates by making throwing ability a random variable.

Random variables and partial pooling

For our first analysis, we ignored the fact that we had 105 people and pooled all the data to calculate a single intercept. In the last analysis, we did no pooling, treating each person separately. Now we are going to use an approach sometimes called “partial pooling”: we’ll still calculate a value for each person, but it will be based on that person’s results combined with the overall parameter values.

For this we need a model to describe how an individual’s throwing ability relates to everyone else’s. Many things in nature are normally distributed, such as human heights, and we’ll assume that throwing ability on the logit scale is also normally distributed with some mean and standard deviation, which we will estimate. $$\beta_{0,person} \sim {\rm Normal}(\mu, \sigma)$$Now we are treating throwing ability as a random variable.

We modify the JAGS code so that each b0[j] is drawn from a normal distribution with mean mu and standard deviation sigma. The dnorm distribution in JAGS works with precision (tau) instead of standard deviation, and we’ll need to do the conversion.

for(j in 1:Nperson) {
   b0[j] ~ dnorm(mu, tau)
}

Now we need priors for mu and sigma; these are sometimes called “hyper-priors”. Actually, mu is the logit-scale value of the average ability of the participants, and that’s a probability; so we can again use a Beta prior for this and convert to the logit scale to get mu.

p0 ~ dbeta(1,1)
mu <- logit(p0)

For sigma we’ll use a uniform prior over a suitable range and check that the posterior is not constrained. We then calculate tau.

sigma ~ dunif(0, 5)
tau <- 1/sigma^2

Let’s look at the diagram:

So now we can put together the code for the JAGS model:

# Save this in the file "sox_DHPrand.jags"
model{
   # Likelihood
   for(i in 1:N) {
     logit(p[i]) <- b0[person[i]] + 
                          bDist * distance[i] +
                          bHand * hand[i]
     y[i] ~ dbern(p[i])
   }
   # Priors
   for(j in 1:Nperson) {
     b0[j] ~ dnorm(mu, tau)
   }  
   bDist ~ dunif(-5, 5)
   bHand ~ dunif(-5, 5)
   # Hyperpriors
   pMean ~ dbeta(1,1)
   mu <- logit(pMean)
   sigma ~ dunif(0, 5)
   tau <- 1/sigma^2
 }

We can use the same data but need to change the parameters wanted to include mu and sigma. Models with random variables always seem to mix poorly and to need lots more iterations to get decent results. Here we use 1000 adaptation/burn-in iterations, followed by 100,000 thinned by 10, so we retain 10,000 per chain. This takes about 6 mins to run.

wanted <- c("bDist", "bHand", "mu", "sigma", "b0")

( soxDHPrand <- jags(jagsData, NULL, wanted, "sox_DHPrand.jags",
  n.chains=3, n.adapt=1000,n.iter=1e5, n.thin=10, parallel=TRUE, DIC=FALSE
# ...
#           mean    sd   2.5%    50%  97.5%   Rhat n.eff
# bDist   -0.967 0.081 -1.130 -0.966 -0.812  1.000 30000
# bHand    0.399 0.160  0.083  0.399  0.712  1.001  1910
# mu      -0.472 0.120 -0.706 -0.471 -0.241  1.001  2150
# sigma    0.337 0.164  0.020  0.343  0.644  1.001  1922
# b0[1]   -0.427 0.344 -1.134 -0.435  0.300  1.000 13461
# b0[2]   -0.214 0.385 -0.855 -0.274  0.674  1.000 24808
# ...
diagPlot(soxDHPrand)
jags.View(soxDHPrand)

The diagnostic plots look ok, though we note that the green chain has sigma going down to zero and sticking there for hundreds of iterations. With random effects, sigma can be zero and sigma often mixes poorly. The effect of sigma = 0 shows up in the trace plots for the b0‘s.

Now we look at the new results for throwing ability:

# Extract the b0's and convert to probabilities
p0 <- plogis(soxDHPrand$sims.list$b0)
abilityMean <- colMeans(p0)
hist(abilityMean, col='lightgrey')
range(abilityMean)
# [1] 0.3288152 0.4686253

Now our estimates of throwing ability are much more closely grouped. People with very low or very high raw scores are probably closer to the average than the 8 data points indicate.

The effect of gender

Now that we have a model which includes persons, we can add a gender effect. Gender is an attribute of each person. Average throwing ability $\mu$ may differ between men and women, and the spread $\sigma$ may also differ. $$\beta_{0,person} \sim {\rm Normal}(\mu_g, \sigma_g)$$ where $g$ indexes gender.

We modify the JAGS code above to have separate values for pMean and sigma for males and females.


# Save this in the file "sox_DHPGrand.jags"
model{
   # Likelihood
   for(i in 1:N) {
     logit(p[i]) <- b0[person[i]] + 
                          bDist * distance[i] +
                          bHand * hand[i]
     y[i] ~ dbern(p[i])
   }
   # Priors
   for(j in 1:Nperson) {
     b0[j] ~ dnorm(mu[gender[j]], tau[gender[j]])
   }  
   bDist ~ dunif(-5, 5)
   bHand ~ dunif(-5, 5)
   # Hyperpriors
   for(g in 1:2) {   # Different for male and female
     pMean[g] ~ dbeta(1,1)
     mu[g] <- logit(pMean[g])
     sigma[g] ~ dunif(0, 5)
     tau[g] <- 1/sigma[g]^2
   }
 }

The R code is similar, but we need to prepare the gender vector of length 105 and pass this to JAGS:

# Organise gender data
tmp <- as.numeric(sox$gender) # Convert factor to numbers
gender <- tapply(tmp, sox$Name, mean)
table(gender)
# gender
#  1  2 
# 46 59
jagsData <- list(N=N, y = sox$result, distance = distanceC, 
    hand = hand, Nperson = Nperson, person = person, gender=gender)
str(jagsData)
# List of 7
#  $ N       : int 840
#  $ y       : int [1:840] 1 1 0 0 0 0 0 0 1 1 ...
#  $ distance: num [1:840] -1.5 -1.5 -0.5 -0.5 0.5 0.5 1.5 1.5 -1.5 -1.5 ...
#  $ hand    : num [1:840] 1 0 1 0 1 0 1 0 1 0 ...
#  $ Nperson : int 105
#  $ person  : num [1:840] 50 50 50 50 50 50 50 50 74 74 ...
#  $ gender  : num [1:105(1d)] 2 2 2 2 2 2 2 2 1 1 ...
#   ..- attr(*, "dimnames")=List of 1
#   .. ..$ : chr [1:105] "Abid" "Aiyat" "Aji" "Akmal" ...

The output from tapply is a one-dimensional array, which is an odd beast, but JAGS copes with it. It also has the names of the people as an attribute, which will be ignored by JAGS.

wanted <- c("bDist", "bHand", "pMean", "mu", "sigma", "b0")
( soxDHPGrand <- jags(jagsData, NULL, wanted,
    "sox_DHPGrand.jags",
    n.chains=3, n.adapt=1000,n.iter=1e5, n.thin=10,
    parallel=TRUE, DIC=FALSE) )
# ...
#            mean    sd   2.5%    50%  97.5%   Rhat n.eff
# bDist    -0.979 0.080 -1.137 -0.979 -0.825  1.000 22757
# bHand     0.400 0.160  0.087  0.401  0.713  1.000  6872
# pMean[1]  0.311 0.034  0.247  0.310  0.378  1.001  2158
# pMean[2]  0.445 0.035  0.378  0.445  0.514  1.000 30000
# mu[1]    -0.801 0.158 -1.112 -0.798 -0.497  1.001  2340
# mu[2]    -0.223 0.141 -0.498 -0.220  0.054  1.001 30000
# sigma[1]  0.280 0.190  0.011  0.258  0.692  1.009   239
# sigma[2]  0.295 0.183  0.013  0.281  0.673  1.002   846
# b0[1]    -0.220 0.325 -0.905 -0.219  0.452  1.000 29214
# b0[2]    -0.042 0.364 -0.629 -0.102  0.829  1.001  6628
# ...
diagPlot(soxDHPGrand)
jags.View(soxDHPGrand)

The density plots for sigma are ugly and the n.eff figure is far too low. I ran the model overnight with 5 million iterations (thinned by 500 to keep the output reasonably sized) and got adequate n.eff‘s and good plots for sigma:

Here are the key results for the longer run:

           mean    sd   2.5%    50%  97.5%    Rhat n.eff
bDist    -0.979 0.081 -1.139 -0.978 -0.826   1.000 30000
bHand     0.400 0.158  0.092  0.399  0.714   1.000  9144
pMean[1]  0.311 0.033  0.247  0.310  0.378   1.000 30000
pMean[2]  0.445 0.034  0.379  0.445  0.513   1.000 23439
mu[1]    -0.801 0.157 -1.112 -0.800 -0.498   1.000 30000
mu[2]    -0.220 0.139 -0.493 -0.220  0.054   1.000 24150
sigma[1]  0.283 0.186  0.014  0.261  0.688   1.000 20583
sigma[2]  0.302 0.183  0.016  0.289  0.686   1.000 22767
b0[1]    -0.217 0.329 -0.896 -0.219  0.476   1.000 30000
b0[2]    -0.031 0.367 -0.628 -0.089  0.852   1.000 18008

Just looking at the values for pMean – and in particular at the credible intervals – it’s clear that for participants in our 2016 Boot Camps, the men are better at throwing socks than the women. But the spread of abilities, sigma, is the same. We can plot pMean, and also calculate the difference and plot that:

# Convert to mcmcOutput and plot
library(mcmcOutput)
bw <- mcmcOutput(soxDHPGrand)
plot(bw, "pMean", xlim=c(0.2, 0.55))
diff <- bw$pMean[,2] - bw$pMean[,1]
postPlot(diff)

Plot the effect of distance and gender

It’s not easy to grasp what these numbers mean, and plotting them is a always a good idea. We’ll plot the probability of success versus distance from the box for men and women.

# Get the MCMC chains we need
mu <- soxDHPGrand$sims.list$mu
head(mu)
#            [,1]       [,2]
# [1,] -0.7581018 -0.2172109
# [2,] -0.8666499 -0.1790211
# [3,] -0.8636711 -0.2588688
# [4,] -0.7853515 -0.2554216
# [5,] -0.7410710 -0.2940089
# [6,] -0.7133230 -0.3651503
bDist <- soxDHPGrand$sims.list$bDist
head(bDist)
# [1] -0.8936494 -0.9633547 -0.9108005 -0.9209869 -1.0160170 -0.9343064
# Do values for the x axis and centre them
xx <- seq(2, 5, length=51)
xxC <- xx - 3.5
lf <- mean(mu[, 1]) + mean(bDist) * xxC
pf <- plogis(lf)
lm <- mean(mu[, 2]) + mean(bDist) * xxC
pm <- plogis(lm)
plot(xx, pm, type='l', lwd=3, col='blue', 
  ylim=range(pf, pm), las=1,
  xlab="Distance", ylab="Probability of success")
lines(xx, pf, lwd=3, col='red')
legend('topright', c("women", "men"), col=c("red", "blue"),
  lwd=3, bty='n')

We can see how the probability of getting the sock in the box changes with distance for both men and women according to the prediction of our model. But that plot does not give an indication of the uncertainty of the predictions. We could calculate the credible intervals and add those to the plot, but an alternative is to plot a selection of posterior predictive curves. Each of these uses the values of mu and bDist from one step in the MCMC chain.

# Add 50 posterior predictive curves
toplot <- sample.int(length(bDist), 50)
for(i in 1:50) {
  pf1 <- plogis(mu[toplot[i], 1] + bDist[toplot[i]] * xxC)
  lines(xx, pf1, col='pink')
  pm1 <- plogis(mu[toplot[i], 2] + bDist[toplot[i]] * xxC)
  lines(xx, pm1, col='skyblue')
}
# Redraw the main lines
lines(xx, pf, lwd=3, col='red')
lines(xx, pm, lwd=3, col='blue')

We could continue to extend our model, for example to allow for differences in the bDist or bHand coefficients with gender or interaction between distance and hand. However, we’ll leave the socks here and go on to look at another type of hierarchical model, a simple occupancy model.

Download a ZIP file with the code here.

12 thoughts on “Hierarchical models”

  1. Hi Mike,
    I wonder, is it useful to have algebra formulas?
    Like
    y = a + b*(dist)
    Z~dbern(y)
    I feel sometime it help me, ask when read the paper.
    Best,
    Soy

    1. I must say I prefer to look at the JAGS code rather than algebra. I’ve put the algebra in now, with considerable simplification. In the last equation, $\mu_g$ should really be $\mu_{g(person(i))}$. I find Kruschke-style diagrams more helpful and plan to add those.

  2. Hi Mike,
    When I try to run the following code I obtain the following error:

    ability <- plogis(soxDHPfixed$sims.list$b0)
    Error in plogis(soxDHPfixed$sims.list$b0) : 
      Non-numeric argument to mathematical function

    Maybe, is this related to the fact that in the following code we have p0 values and not b0, even if b0 is calculated starting from p0?

    str(soxDHPfixed)
    List of 22
     $ sims.list  :List of 3
      ..$ p0   : num [1:30000, 1:105] 0.276 0.177 0.487 0.337 0.622 ...
      ..$ bDist: num [1:30000] -1.23 -1.22 -1.13 -1.26 -1.16 ...
      ..$ bHand: num [1:30000] 0.312 0.308 0.356 0.507 0.412 ...
     $ mean       :List of 3
      ..$ p0   : num [1:105(1d)] 0.479 0.721 0.359 0.149 0.6 ...
      ..$ bDist: num -1.14
      ..$ bHand: num 0.257
     $ sd         :List of 3
      ..$ p0   : num [1:105(1d)] 0.164 0.145 0.157 0.107 0.163 ...
      ..$ bDist: num 0.0869
      ..$ bHand: num 0.153

    Thank you.

    1. Thanks for reporting that mistake in the web page code Marcello. For the ‘fixed effect’ model, we can monitor either p0 or b0, and chose p0, but then b0 appeared in the post-processing. Now fixed.
      The code in the downloadable ZIP file is correct.

  3. Hi Mike,
    I’m curious about one thing and I would like to have your opinion. As far as the effect of the gender is concerned, to solve the problem regarding the absence of convergence for sigma, I tried to run a model with 20 chains and 10.000 adaptations (sorry, I don’t have the figure). Results showed that the issue was almost solved (i.e., Rhat ~ 1 and MCE% = 1.54). So maybe through the implementation of further chains (along with adaptations) such issue may have been solved even without increasing the number of iteractions and thinning. I assume that, as reported in your previous posts, the solution needs to be evaluated case by case, but may the increase of number of chains be considered as an adequate solution to solve convergence problems? or increasing the number of iteractions along with thinning still represent the best solution? Many thanks!

    1. When mixing is poor, the key number if the total iterations after adaptation and burn-in and before thinning across all chains. For the overnight run I used 3 chains with 5 million iterations (after adaptation), total 15 million.

      You are right that I could have used more chains, say 15 chains with 1 million iterations each, and still got 15 million total. But on my little laptop, with only 4 cores (aka “logical processors”), that would not have been more efficient. It would have been a good strategy on my 24-core desktop.

      For maximum efficiency (ie, minimum time for a fixed number of iterations), the number of chains should be somewhat less than the number of cores available on the machine – “somewhat less” because the operating system needs a bit of CPU capacity too. I find running 20 chains on the 24-core box gives 99% CPU loading.

      Running more chains than cores is less efficient, because each chain takes time for compilation and adaptation before it starts producing usable iterations. That becomes an issue if chains are “queuing up” to use the CPU.

  4. Thank you so much for these tutorials. I started learning bayesian just because of your tutorials. Easy explanations, no jargon and replicable codes. With this current tutorial, I am trying to use partial pooling for one of my own datasets. I am able to run the models but I have not been able to plot SE or CI on the plots. If you can show me how to do that it will be a great help.

    [comment edited by Admin]

    Thank you
    Jenis

    1. Glad you are finding the tutorials useful Jenis.

      We don’t use confidence intervals – that’s a strange frequentist concept – and use credible intervals instead. Here’s code to plot the credible intervals, following on from the main code:

      # Get the data to plot (mean and credible interval)
      mcrif <- mcrim <- matrix(NA, 51, 3)
      for(i in 1:51) {
        # for females
        lf2 <- mu[,1] + bDist * xxC[i]
        pf2 <- plogis(lf2)
        mcrif[i, 1] <- mean(pf2)
        mcrif[i, 2:3] <- hdi(pf2)
        # for males
        lm2 <- mu[,2] + bDist * xxC[i]
        pm2 <- plogis(lm2)
        mcrim[i, 1] <- mean(pm2)
        mcrim[i, 2:3] <- hdi(pm2)
      }
      
      # Plot semi-transparent polygons first, then add the lines
      plot(xx, pm, type='n', ylim=range(mcrif, mcrim), las=1,
        xlab="Distance", ylab="Probability of success")
      polygon(x=c(xx, rev(xx)), y=c(mcrif[, 2], rev(mcrif[, 3])),
        col=adjustcolor('pink', 0.5), border=NA)
      polygon(x=c(xx, rev(xx)), y=c(mcrim[, 2], rev(mcrim[, 3])),
        col=adjustcolor('skyblue', 0.5), border=NA)
      # Now add the lines
      lines(xx, mcrif[,1], lwd=3, col='red')
      lines(xx, mcrim[,1], lwd=3, col='blue')
      legend('topright', c("women", "men"), col=c("red", "blue"),
        lwd=3, bty='n')

Leave a Reply to Marcello Franchini Cancel reply

Your email address will not be published.

The maximum upload file size: 1 MB. You can upload: image, document, text, archive. Drop file here