We return to the socks-in-the-box example, this time recognizing that we have data from 105 people with differing abilities. We’ll treat the data as 105 **blocks** each with 8 observations.

### The ‘person’ effect

We’ll define a person’s “throwing ability” as the probability the sock goes in the box when throwing from 3.5m with their bad hand. Recall that we centred the distance data by subtracting the mean, 3.5m, so that the intercept corresponds to the probability of success when distance is 3.5m. We modify the model so that each person has their own intercept, corresponding to their ability. $${\rm logit}(p_i) = \beta_{0, person_i} + \beta_{distance} \times distance_i + \beta_{hand} \times hand_i$$

This model is very similar to the previous one except that we have 105 different values for $\beta_0$ and $p_0$:

The new JAGS code looks like this:

**
# Save this in the file "sox_DHPfixed.jags"
model{
# Likelihood
for(i in 1:N) {
logit(p[i]) <- b0[person[i]] +
bDist * distance[i] +
bHand * hand[i]
y[i] ~ dbern(p[i])
}
# Priors
for(j in 1:Nperson) {
p0[j] ~ dbeta(1, 1)
b0[j] <- logit(p0[j])
}
bDist ~ dunif(-5, 5)
bHand ~ dunif(-5, 5)
}**

*JAGS syntax: *`[person[i]]`

* is an example of nested indexing, a useful technique in JAGS: *`person `

*is a vector of length 840 with index numbers indicating who made each throw. The first person in the data set is Laura, but the indices are in alphabetical order, so Laura is #50. Thus *`person[1] = 50`

* and JAGS uses *`b0[50]`

* as the intercept. Notice that we have also wrapped the long line, leaving the *`+`

* sign at the end of the incomplete line.*

Now here’s the R code to prepare the data and run JAGS. It follows on from the code on the previous page:

**# Prepare data for hierarchical model
( Nperson <- nlevels(sox$Name) )
# [1] 105
table(table(sox$Name))
# 8
# 105 # 8 observations for each of the 105 participants
# Convert Name to an index number
person <- as.numeric(sox$Name)
head(person, 20)
# [1] 50 50 50 50 50 50 50 50 74 74 74
# [12] 74 74 74 74 74 104 104 104 104
jagsData <- list(N=N, y = sox$result, distance = distanceC,
hand = hand, Nperson = Nperson, person = person)
str(jagsData)
# List of 6
# $ N : int 840
# $ y : int [1:840] 1 1 0 0 0 0 0 0 1 1 ...
# $ distance: num [1:840] -1.5 -1.5 -0.5 -0.5 0.5 0.5 ...
# $ hand : num [1:840] 1 0 1 0 1 0 1 0 1 0 ...
# $ Nperson : int 105
# $ person : num [1:840] 50 50 50 50 50 50 50 50 74 74 ...
wanted <- c("bDist", "bHand", "p0")
( soxDHPfixed <- jags(jagsData, NULL, wanted,
"sox_DHPfixed.jags",
n.chains=3, n.iter=10000, parallel=TRUE, DIC=FALSE) )
# ...
# mean sd 2.5% 50% 97.5% Rhat n.eff
# bDist -1.135 0.086 -1.308 -1.135 -0.968 1.000 30000
# bHand 0.257 0.154 -0.043 0.258 0.561 1.000 30000
# p0[1] 0.479 0.164 0.176 0.475 0.797 1.000 15214
# p0[2] 0.719 0.145 0.388 0.739 0.940 1.000 6152
# ...
diagPlot(soxDHPfixed)
jags.View(soxDHPfixed)# Opens new 'spreadsheet' window**

The default adaptation (100 iterations) is adequate and we don’t need extra burn-in. We now have 105 values for throwing ability, `p0`

, one for each person, though we have only shown the results for the first two above. The diagnostics are all good, and there’s no sign that the posteriors for `bDist `

and `bHand`

are constrained by the range of the priors.

Now we’ll look in more detail at the ability parameter. The next bit of R code pulls out the MCMC chains for `p0 `

and gets the posterior means:

**ability <- soxDHPfixed$sims.list$p0 # corrected
abilityMean <- colMeans(ability)
hist(abilityMean, col='lightgrey')
range(abilityMean)
# [1] 0.1464365 0.8256393**

The histogram looks a bit ‘blocky’ because people who got the sock in the box the same number of times have about the same ability. For example, the 23 people who got it in 4 times have ability ratings close to 0.48. The range is very wide: do we really believe that some have only a 14% chance of getting the sock in the box from 3.5m and some have 83%?

Each of those ability ratings is based on only 8 observations for each person. That’s a ridiculously small sample for binary data. Some of those people who only scored once would likely do better if they tried again, and Kumbu, who scored 7, might not do so well. We can get better estimates by making throwing ability a random variable.

### Random variables and partial pooling

For our first analysis, we ignored the fact that we had 105 people and pooled all the data to calculate a single intercept. In the last analysis, we did no pooling, treating each person separately. Now we are going to use an approach sometimes called “partial pooling”: we’ll still calculate a value for each person, but it will be based on that person’s results combined with the overall parameter values.

For this we need a model to describe how an individual’s throwing ability relates to everyone else’s. Many things in nature are normally distributed, such as human heights, and we’ll assume that throwing ability on the logit scale is also normally distributed with some mean and standard deviation, which we will estimate. $$\beta_{0,person} \sim {\rm Normal}(\mu, \sigma)$$Now we are treating throwing ability as a *random variable*.

We modify the JAGS code so that each `b0[j]`

is drawn from a normal distribution with mean `mu`

and standard deviation `sigma`

. The `dnorm`

distribution in JAGS works with precision (`tau`

) instead of standard deviation, and we’ll need to do the conversion.

**for(j in 1:Nperson) {
b0[j] ~ dnorm(mu, tau)
}**

Now we need priors for `mu`

and `sigma`

; these are sometimes called “hyper-priors”. Actually, `mu`

is the logit-scale value of the average ability of the participants, and that’s a probability; so we can again use a Beta prior for this and convert to the logit scale to get `mu`

.

**p0 ~ dbeta(1,1)
mu <- logit(p0)**

For `sigma`

we’ll use a uniform prior over a suitable range and check that the posterior is not constrained. We then calculate `tau`

.

**sigma ~ dunif(0, 5)
tau <- 1/sigma^2**

Let’s look at the diagram:

So now we can put together the code for the JAGS model:

**# Save this in the file "sox_DHPrand.jags"
model{
# Likelihood
for(i in 1:N) {
logit(p[i]) <- b0[person[i]] +
bDist * distance[i] +
bHand * hand[i]
y[i] ~ dbern(p[i])
}
# Priors
for(j in 1:Nperson) {
b0[j] ~ dnorm(mu, tau)
}
bDist ~ dunif(-5, 5)
bHand ~ dunif(-5, 5)
# Hyperpriors
pMean ~ dbeta(1,1)
mu <- logit(pMean)
sigma ~ dunif(0, 5)
tau <- 1/sigma^2
}**

We can use the same data but need to change the parameters wanted to include `mu`

and `sigma`

. Models with random variables always seem to mix poorly and to need lots more iterations to get decent results. Here we use 1000 adaptation/burn-in iterations, followed by 100,000 thinned by 10, so we retain 10,000 per chain. This takes about 6 mins to run.

**wanted <- c("bDist", "bHand", "mu", "sigma", "b0")
( soxDHPrand <- jags(jagsData, NULL, wanted, "sox_DHPrand.jags",
n.chains=3, n.adapt=1000,n.iter=1e5, n.thin=10, parallel=TRUE, DIC=FALSE
# ...
# mean sd 2.5% 50% 97.5% Rhat n.eff
# bDist -0.967 0.081 -1.130 -0.966 -0.812 1.000 30000
# bHand 0.399 0.160 0.083 0.399 0.712 1.001 1910
# mu -0.472 0.120 -0.706 -0.471 -0.241 1.001 2150
# sigma 0.337 0.164 0.020 0.343 0.644 1.001 1922
# b0[1] -0.427 0.344 -1.134 -0.435 0.300 1.000 13461
# b0[2] -0.214 0.385 -0.855 -0.274 0.674 1.000 24808
# ...
diagPlot(soxDHPrand)
jags.View(soxDHPrand)**

The diagnostic plots look ok, though we note that the green chain has sigma going down to zero and sticking there for hundreds of iterations. With random effects, sigma can be zero and sigma often mixes poorly. The effect of sigma = 0 shows up in the trace plots for the `b0`

‘s.

Now we look at the new results for throwing ability:

**# Extract the b0's and convert to probabilities
p0 <- plogis(soxDHPrand$sims.list$b0)
abilityMean <- colMeans(p0)
hist(abilityMean, col='lightgrey')
range(abilityMean)
# [1] 0.3288152 0.4686253**

Now our estimates of throwing ability are much more closely grouped. People with very low or very high raw scores are probably closer to the average than the 8 data points indicate.

### The effect of gender

Now that we have a model which includes persons, we can add a gender effect. Gender is an attribute of each person. Average throwing ability $\mu$ may differ between men and women, and the spread $\sigma$ may also differ. $$\beta_{0,person} \sim {\rm Normal}(\mu_g, \sigma_g)$$ where $g$ indexes gender.

We modify the JAGS code above to have separate values for `pMean`

and `sigma`

for males and females.

**
# Save this in the file "sox_DHPGrand.jags"
model{
# Likelihood
for(i in 1:N) {
logit(p[i]) <- b0[person[i]] +
bDist * distance[i] +
bHand * hand[i]
y[i] ~ dbern(p[i])
}
# Priors
for(j in 1:Nperson) {
b0[j] ~ dnorm(mu[gender[j]], tau[gender[j]])
}
bDist ~ dunif(-5, 5)
bHand ~ dunif(-5, 5)
# Hyperpriors
for(g in 1:2) { # Different for male and female
pMean[g] ~ dbeta(1,1)
mu[g] <- logit(pMean[g])
sigma[g] ~ dunif(0, 5)
tau[g] <- 1/sigma[g]^2
}
}**

The R code is similar, but we need to prepare the gender vector of length 105 and pass this to JAGS:

**# Organise gender data
tmp <- as.numeric(sox$gender) # Convert factor to numbers
gender <- tapply(tmp, sox$Name, mean)
table(gender)
# gender
# 1 2
# 46 59
jagsData <- list(N=N, y = sox$result, distance = distanceC,
hand = hand, Nperson = Nperson, person = person, gender=gender)
str(jagsData)
# List of 7
# $ N : int 840
# $ y : int [1:840] 1 1 0 0 0 0 0 0 1 1 ...
# $ distance: num [1:840] -1.5 -1.5 -0.5 -0.5 0.5 0.5 1.5 1.5 -1.5 -1.5 ...
# $ hand : num [1:840] 1 0 1 0 1 0 1 0 1 0 ...
# $ Nperson : int 105
# $ person : num [1:840] 50 50 50 50 50 50 50 50 74 74 ...
# $ gender : num [1:105(1d)] 2 2 2 2 2 2 2 2 1 1 ...
# ..- attr(*, "dimnames")=List of 1
# .. ..$ : chr [1:105] "Abid" "Aiyat" "Aji" "Akmal" ...**

The output from `tapply`

is a one-dimensional array, which is an odd beast, but JAGS copes with it. It also has the names of the people as an attribute, which will be ignored by JAGS.

**wanted <- c("bDist", "bHand", "pMean", "mu", "sigma", "b0")
( soxDHPGrand <- jags(jagsData, NULL, wanted,
"sox_DHPGrand.jags",
n.chains=3, n.adapt=1000,n.iter=1e5, n.thin=10,
parallel=TRUE, DIC=FALSE) )
# ...
# mean sd 2.5% 50% 97.5% Rhat n.eff
# bDist -0.979 0.080 -1.137 -0.979 -0.825 1.000 22757
# bHand 0.400 0.160 0.087 0.401 0.713 1.000 6872
# pMean[1] 0.311 0.034 0.247 0.310 0.378 1.001 2158
# pMean[2] 0.445 0.035 0.378 0.445 0.514 1.000 30000
# mu[1] -0.801 0.158 -1.112 -0.798 -0.497 1.001 2340
# mu[2] -0.223 0.141 -0.498 -0.220 0.054 1.001 30000
# sigma[1] 0.280 0.190 0.011 0.258 0.692 1.009 239
# sigma[2] 0.295 0.183 0.013 0.281 0.673 1.002 846
# b0[1] -0.220 0.325 -0.905 -0.219 0.452 1.000 29214
# b0[2] -0.042 0.364 -0.629 -0.102 0.829 1.001 6628
# ...
diagPlot(soxDHPGrand)
jags.View(soxDHPGrand)**

The density plots for `sigma`

are ugly and the `n.eff`

figure is far too low. I ran the model overnight with 5 million iterations (thinned by 500 to keep the output reasonably sized) and got adequate `n.eff`

‘s and good plots for `sigma`

:

Here are the key results for the longer run:

** mean sd 2.5% 50% 97.5% Rhat n.eff
bDist -0.979 0.081 -1.139 -0.978 -0.826 1.000 30000
bHand 0.400 0.158 0.092 0.399 0.714 1.000 9144
pMean[1] 0.311 0.033 0.247 0.310 0.378 1.000 30000
pMean[2] 0.445 0.034 0.379 0.445 0.513 1.000 23439
mu[1] -0.801 0.157 -1.112 -0.800 -0.498 1.000 30000
mu[2] -0.220 0.139 -0.493 -0.220 0.054 1.000 24150
sigma[1] 0.283 0.186 0.014 0.261 0.688 1.000 20583
sigma[2] 0.302 0.183 0.016 0.289 0.686 1.000 22767
b0[1] -0.217 0.329 -0.896 -0.219 0.476 1.000 30000
b0[2] -0.031 0.367 -0.628 -0.089 0.852 1.000 18008**

Just looking at the values for `pMean`

– and in particular at the credible intervals – it’s clear that for participants in our 2016 Boot Camps, the men are better at throwing socks than the women. But the spread of abilities, `sigma`

, is the same. We can plot `pMean`

, and also calculate the difference and plot that:

**# Convert to mcmcOutput and plot
library(mcmcOutput)
bw <- mcmcOutput(soxDHPGrand)
plot(bw, "pMean", xlim=c(0.2, 0.55))**

**diff <- bw$pMean[,2] - bw$pMean[,1]
postPlot(diff)**

### Plot the effect of distance and gender

It’s not easy to grasp what these numbers mean, and plotting them is a always a good idea. We’ll plot the probability of success versus distance from the box for men and women.

**# Get the MCMC chains we need
mu <- soxDHPGrand$sims.list$mu
head(mu)
# [,1] [,2]
# [1,] -0.7581018 -0.2172109
# [2,] -0.8666499 -0.1790211
# [3,] -0.8636711 -0.2588688
# [4,] -0.7853515 -0.2554216
# [5,] -0.7410710 -0.2940089
# [6,] -0.7133230 -0.3651503
bDist <- soxDHPGrand$sims.list$bDist
head(bDist)
# [1] -0.8936494 -0.9633547 -0.9108005 -0.9209869 -1.0160170 -0.9343064
# Do values for the x axis and centre them
xx <- seq(2, 5, length=51)
xxC <- xx - 3.5
lf <- mean(mu[, 1]) + mean(bDist) * xxC
pf <- plogis(lf)
lm <- mean(mu[, 2]) + mean(bDist) * xxC
pm <- plogis(lm)
plot(xx, pm, type='l', lwd=3, col='blue',
ylim=range(pf, pm), las=1,
xlab="Distance", ylab="Probability of success")
lines(xx, pf, lwd=3, col='red')
legend('topright', c("women", "men"), col=c("red", "blue"),
lwd=3, bty='n')**

We can see how the probability of getting the sock in the box changes with distance for both men and women according to the prediction of our model. But that plot does not give an indication of the uncertainty of the predictions. We could calculate the credible intervals and add those to the plot, but an alternative is to plot a selection of posterior predictive curves. Each of these uses the values of `mu`

and `bDist`

from one step in the MCMC chain.

**# Add 50 posterior predictive curves
toplot <- sample.int(length(bDist), 50)
for(i in 1:50) {
pf1 <- plogis(mu[toplot[i], 1] + bDist[toplot[i]] * xxC)
lines(xx, pf1, col='pink')
pm1 <- plogis(mu[toplot[i], 2] + bDist[toplot[i]] * xxC)
lines(xx, pm1, col='skyblue')
}
# Redraw the main lines
lines(xx, pf, lwd=3, col='red')
lines(xx, pm, lwd=3, col='blue')**

We could continue to extend our model, for example to allow for differences in the `bDist`

or `bHand`

coefficients with gender or interaction between distance and hand. However, we’ll leave the socks here and go on to look at another type of hierarchical model, a simple occupancy model.

Download a ZIP file with the code here.

Hi Mike,

I wonder, is it useful to have algebra formulas?

Like

y = a + b*(dist)

Z~dbern(y)

I feel sometime it help me, ask when read the paper.

Best,

Soy

I must say I prefer to look at the JAGS code rather than algebra. I’ve put the algebra in now, with considerable simplification. In the last equation, $\mu_g$ should really be $\mu_{g(person(i))}$. I find Kruschke-style diagrams more helpful and plan to add those.

Hi Mike,

When I try to run the following code I obtain the following error:

Maybe, is this related to the fact that in the following code we have p0 values and not b0, even if b0 is calculated starting from p0?

Thank you.

sorry, the command is: str(soxDHPfixed)

Thanks for reporting that mistake in the web page code Marcello. For the ‘fixed effect’ model, we can monitor either p0 or b0, and chose p0, but then b0 appeared in the post-processing. Now fixed.

The code in the downloadable ZIP file is correct.

Hi Mike, thanks to you for clarifying.

Hi Mike,

I’m curious about one thing and I would like to have your opinion. As far as the effect of the gender is concerned, to solve the problem regarding the absence of convergence for sigma, I tried to run a model with 20 chains and 10.000 adaptations (sorry, I don’t have the figure). Results showed that the issue was almost solved (i.e., Rhat ~ 1 and MCE% = 1.54). So maybe through the implementation of further chains (along with adaptations) such issue may have been solved even without increasing the number of iteractions and thinning. I assume that, as reported in your previous posts, the solution needs to be evaluated case by case, but may the increase of number of chains be considered as an adequate solution to solve convergence problems? or increasing the number of iteractions along with thinning still represent the best solution? Many thanks!

When mixing is poor, the key number if the total iterations

afteradaptation and burn-in andbeforethinning across all chains. For the overnight run I used 3 chains with 5 million iterations (after adaptation), total 15 million.You are right that I could have used more chains, say 15 chains with 1 million iterations each, and still got 15 million total. But on my little laptop, with only 4 cores (aka “logical processors”), that would not have been more efficient. It would have been a good strategy on my 24-core desktop.

For maximum efficiency (ie, minimum time for a fixed number of iterations), the number of chains should be somewhat less than the number of cores available on the machine – “somewhat less” because the operating system needs a bit of CPU capacity too. I find running 20 chains on the 24-core box gives 99% CPU loading.

Running more chains than cores is less efficient, because each chain takes time for compilation and adaptation before it starts producing usable iterations. That becomes an issue if chains are “queuing up” to use the CPU.

Thank you very much for your exaustive response!