# Hierarchical models

We return to the socks-in-the-box example, this time recognizing that we have data from 105 people with differing abilities. We’ll treat the data as 105 blocks each with 8 observations.

### The ‘person’ effect

We’ll define a person’s “throwing ability” as the probability the sock goes in the box when throwing from 3.5m with their bad hand. Recall that we centred the distance data by subtracting the mean, 3.5m, so that the intercept corresponds to the probability of success when distance is 3.5m. We modify the model so that each person has their own intercept, corresponding to their ability. $${\rm logit}(p_i) = \beta_{0, person_i} + \beta_{distance} \times distance_i + \beta_{hand} \times hand_i$$

This model is very similar to the previous one except that we have 105 different values for $\beta_0$ and $p_0$: The new JAGS code looks like this:


# Save this in the file "sox_DHPfixed.jags"
model{
# Likelihood
for(i in 1:N) {
logit(p[i]) <- b0[person[i]] +
bDist * distance[i] +
bHand * hand[i]
y[i] ~ dbern(p[i])
}
# Priors
for(j in 1:Nperson) {
p0[j] ~ dbeta(1, 1)
b0[j] <- logit(p0[j])
}

bDist ~ dunif(-5, 5)
bHand ~ dunif(-5, 5)
}

JAGS syntax: [person[i]] is an example of nested indexing, a useful technique in JAGS: person is a vector of length 840 with index numbers indicating who made each throw. The first person in the data set is Laura, but the indices are in alphabetical order, so Laura is #50. Thus person = 50 and JAGS uses b0 as the intercept. Notice that we have also wrapped the long line, leaving the + sign at the end of the incomplete line.

Now here’s the R code to prepare the data and run JAGS. It follows on from the code on the previous page:

# Prepare data for hierarchical model
( Nperson <- nlevels(sox$Name) ) #  105 table(table(sox$Name))
#   8
# 105 # 8 observations for each of the 105 participants

# Convert Name to an index number
person <- as.numeric(sox$Name) head(person, 20) #  50 50 50 50 50 50 50 50 74 74 74 #  74 74 74 74 74 104 104 104 104 jagsData <- list(N=N, y = sox$result, distance = distanceC,
hand = hand, Nperson = Nperson, person = person)
str(jagsData)
# List of 6
# $N : int 840 #$ y       : int [1:840] 1 1 0 0 0 0 0 0 1 1 ...
# $distance: num [1:840] -1.5 -1.5 -0.5 -0.5 0.5 0.5 ... #$ hand    : num [1:840] 1 0 1 0 1 0 1 0 1 0 ...
# $Nperson : int 105 #$ person  : num [1:840] 50 50 50 50 50 50 50 50 74 74 ...

wanted <- c("bDist", "bHand", "p0")

( soxDHPfixed <- jags(jagsData, NULL, wanted,
"sox_DHPfixed.jags",
n.chains=3, n.iter=10000, parallel=TRUE, DIC=FALSE) )
# ...
#           mean    sd   2.5%    50%  97.5%   Rhat n.eff
# bDist   -1.135 0.086 -1.308 -1.135 -0.968  1.000 30000
# bHand    0.257 0.154 -0.043  0.258  0.561  1.000 30000
# p0    0.479 0.164  0.176  0.475  0.797  1.000 15214
# p0    0.719 0.145  0.388  0.739  0.940  1.000  6152
# ...
diagPlot(soxDHPfixed)
jags.View(soxDHPfixed)# Opens new 'spreadsheet' window

The default adaptation (100 iterations) is adequate and we don’t need extra burn-in. We now have 105 values for throwing ability, p0, one for each person, though we have only shown the results for the first two above. The diagnostics are all good, and there’s no sign that the posteriors for bDist and bHand are constrained by the range of the priors.

Now we’ll look in more detail at the ability parameter. The next bit of R code pulls out the MCMC chains for p0 and gets the posterior means:

ability <- plogis(soxDHPfixed$sims.list$b0)
abilityMean <- colMeans(ability)
hist(abilityMean, col='lightgrey')
range(abilityMean)
#  0.1464365 0.8256393

The histogram looks a bit ‘blocky’ because people who got the sock in the box the same number of times have about the same ability. For example, the 23 people who got it in 4 times have ability ratings close to 0.48. The range is very wide: do we really believe that some have only a 14% chance of getting the sock in the box from 3.5m and some have 83%?

Each of those ability ratings is based on only 8 observations for each person. That’s a ridiculously small sample for binary data. Some of those people who only scored once would likely do better if they tried again, and Kumbu, who scored 7, might not do so well. We can get better estimates by making throwing ability a random variable.

### Random variables and partial pooling

For our first analysis, we ignored the fact that we had 105 people and pooled all the data to calculate a single intercept. In the last analysis, we did no pooling, treating each person separately. Now we are going to use an approach sometimes called “partial pooling”: we’ll still calculate a value for each person, but it will be based on that person’s results combined with the overall parameter values.

For this we need a model to describe how an individual’s throwing ability relates to everyone else’s. Many things in nature are normally distributed, such as human heights, and we’ll assume that throwing ability on the logit scale is also normally distributed with some mean and standard deviation, which we will estimate. $$\beta_{0,person} \sim {\rm Normal}(\mu, \sigma)$$Now we are treating throwing ability as a random variable.

We modify the JAGS code so that each b0[j] is drawn from a normal distribution with mean mu and standard deviation sigma. The dnorm distribution in JAGS works with precision (tau) instead of standard deviation, and we’ll need to do the conversion.

for(j in 1:Nperson) {
b0[j] ~ dnorm(mu, tau)
}

Now we need priors for mu and sigma; these are sometimes called “hyper-priors”. Actually, mu is the logit-scale value of the average ability of the participants, and that’s a probability; so we can again use a Beta prior for this and convert to the logit scale to get mu.

p0 ~ dbeta(1,1)
mu <- logit(p0)

For sigma we’ll use a uniform prior over a suitable range and check that the posterior is not constrained. We then calculate tau.

sigma ~ dunif(0, 5)
tau <- 1/sigma^2

Let’s look at the diagram: So now we can put together the code for the JAGS model:

# Save this in the file "sox_DHPrand.jags"
model{
# Likelihood
for(i in 1:N) {
logit(p[i]) <- b0[person[i]] +
bDist * distance[i] +
bHand * hand[i]
y[i] ~ dbern(p[i])
}
# Priors
for(j in 1:Nperson) {
b0[j] ~ dnorm(mu, tau)
}
bDist ~ dunif(-5, 5)
bHand ~ dunif(-5, 5)
# Hyperpriors
pMean ~ dbeta(1,1)
mu <- logit(pMean)
sigma ~ dunif(0, 5)
tau <- 1/sigma^2
}

We can use the same data but need to change the parameters wanted to include mu and sigma. Models with random variables always seem to mix poorly and to need lots more iterations to get decent results. Here we use 1000 adaptation/burn-in iterations, followed by 100,000 thinned by 10, so we retain 10,000 per chain. This takes about 6 mins to run.

wanted <- c("bDist", "bHand", "mu", "sigma", "b0")

( soxDHPrand <- jags(jagsData, NULL, wanted, "sox_DHPrand.jags",
# ...
#           mean    sd   2.5%    50%  97.5%   Rhat n.eff
# bDist   -0.967 0.081 -1.130 -0.966 -0.812  1.000 30000
# bHand    0.399 0.160  0.083  0.399  0.712  1.001  1910
# mu      -0.472 0.120 -0.706 -0.471 -0.241  1.001  2150
# sigma    0.337 0.164  0.020  0.343  0.644  1.001  1922
# b0   -0.427 0.344 -1.134 -0.435  0.300  1.000 13461
# b0   -0.214 0.385 -0.855 -0.274  0.674  1.000 24808
# ...
diagPlot(soxDHPrand)
jags.View(soxDHPrand)

The diagnostic plots look ok, though we note that the green chain has sigma going down to zero and sticking there for hundreds of iterations. With random effects, sigma can be zero and sigma often mixes poorly. The effect of sigma = 0 shows up in the trace plots for the b0‘s.

Now we look at the new results for throwing ability:

# Extract the b0's and convert to probabilities
p0 <- plogis(soxDHPrand$sims.list$b0)
abilityMean <- colMeans(p0)
hist(abilityMean, col='lightgrey')
range(abilityMean)
#  0.3288152 0.4686253

Now our estimates of throwing ability are much more closely grouped. People with very low or very high raw scores are probably closer to the average than the 8 data points indicate.

### The effect of gender

Now that we have a model which includes persons, we can add a gender effect. Gender is an attribute of each person. Average throwing ability $\mu$ may differ between men and women, and the spread $\sigma$ may also differ. $$\beta_{0,person} \sim {\rm Normal}(\mu_g, \sigma_g)$$ where $g$ indexes gender.

We modify the JAGS code above to have separate values for pMean and sigma for males and females.


# Save this in the file "sox_DHPGrand.jags"
model{
# Likelihood
for(i in 1:N) {
logit(p[i]) <- b0[person[i]] +
bDist * distance[i] +
bHand * hand[i]
y[i] ~ dbern(p[i])
}
# Priors
for(j in 1:Nperson) {
b0[j] ~ dnorm(mu[gender[j]], tau[gender[j]])
}
bDist ~ dunif(-5, 5)
bHand ~ dunif(-5, 5)
# Hyperpriors
for(g in 1:2) {   # Different for male and female
pMean[g] ~ dbeta(1,1)
mu[g] <- logit(pMean[g])
sigma[g] ~ dunif(0, 5)
tau[g] <- 1/sigma[g]^2
}
}

The R code is similar, but we need to prepare the gender vector of length 105 and pass this to JAGS:

# Organise gender data
tmp <- as.numeric(sox$gender) # Convert factor to numbers gender <- tapply(tmp, sox$Name, mean)
table(gender)
# gender
#  1  2
# 46 59
jagsData <- list(N=N, y = sox$result, distance = distanceC, hand = hand, Nperson = Nperson, person = person, gender=gender) str(jagsData) # List of 7 #$ N       : int 840
#  $y : int [1:840] 1 1 0 0 0 0 0 0 1 1 ... #$ distance: num [1:840] -1.5 -1.5 -0.5 -0.5 0.5 0.5 1.5 1.5 -1.5 -1.5 ...
#  $hand : num [1:840] 1 0 1 0 1 0 1 0 1 0 ... #$ Nperson : int 105
#  $person : num [1:840] 50 50 50 50 50 50 50 50 74 74 ... #$ gender  : num [1:105(1d)] 2 2 2 2 2 2 2 2 1 1 ...
#   ..- attr(*, "dimnames")=List of 1