In this section we’ll work with a simple example with only one parameter to be estimated. This will allow us to explore different estimation methods, some of which work with only 1 or 2 parameters.

### Orangutan abundance

The example concerns the number of adult orangutan in a fictitious national park. Orangutan are charismatic creatures which occur in our home state of Sarawak, but we choose them here because they are solitary and not territorial. Youngsters stay close to their mums, but the adults move around independently, so we can use the Poisson distribution to handle counts of adults.

We have some prior information about the number of adults in the park: our best guess is 450, and we’re 95% certain the number is between 380 and 525.

A new survey has just been conducted. The park was divided into plots, and 1/8 of the plots were searched for orangutan. They are easy to detect in this area, as they do not flee from humans, instead they respond by breaking branches and ‘scolding’ the intruders. A total of 45 adults were counted, so a rough estimate would be 45 x 8 = 360 adults in the whole park, quite a bit less than the former estimate. But both are based on samples, so we would be surprised if they agreed exactly. Now we want to combine the new data with the old information.

### The gamma distribution

Before going further, we’ll introduce the gamma distribution, which as we’ll see has a special relationship with the Poisson distribution. The gamma distribution deals with continuous variables that are **non-negative** but with no upper limit. Lots of variables in nature are like this, including the density parameter of the Poisson distribution. The plot below shows a series of gamma distributions with mean = 5 and various SDs.

When the SD is small relative to the mean, the curve is almost a symmetrical bell-curve. As the SD increases, the mode moves towards the y axis with a long right tail; eventually the mode is at 0.

The prior probability for the orangutan is a gamma distribution with a mean of 450 and SD of 37. That’s shown in the plot below, together with the 95% credible interval:

## The model

Here are the equations and the “Kruschke-style” diagram describing our model:

The diagram is best read from the bottom upwards. We have our count data, $C$, which comes from a Poisson distribution with parameter $\lambda$. $\lambda$ is calculated from the number of animals in the park, $N$, and the proportion of the area surveyed, $a$. At the top is our prior for $N$, a gamma distribution with parameters $S$ (shape) and $R$ (rate).

### Conjugacy

The gamma distribution is the conjugate prior distribution for a Poisson likelihood. That means that the posterior distribution will also be a gamma distribution, and moreover it will be easy to calculate its parameters. The following code shows how:

**library(wiqid) # for parameter conversion functions
# Plot the prior
# --------------
# this has mean 450 and SD 37
xx <- 300:600
priorDens <- dgamma2(xx, 450, 37)
plot(xx, priorDens, main="Prior probability", xlab="N",
ylab="Probability", type='l', lwd=2)
abline(v=c(380, 525), lty=2, col='red')
# Convert prior to shape/rate
( priorPars <- getGammaPar(450, 37) )
# shape rate
# [1,] 147.9182 0.3287071
# Calculate the posterior
# -----------------------
# the data: 45 adults counted in 1/8 of the area of the park
# Add the count to the shape and the area to the rate
( postPars <- priorPars + c(45, 1/8) )
# shape rate
# [1,] 192.9182 0.4537071
# Now calculate and plot the posterior
# ------------------------------------
postDens <- dgamma(xx, shape=postPars[1], rate=postPars[2])
plot(xx, postDens, main="", xlab="N", ylab="Probability",
type='l', lwd=2, col='brown')
# add the prior
lines(xx, priorDens, lwd=2, col='blue')
# and the likelihood, which we need to scale
llh <- dpois(45, xx/8)
llh2plot <- llh/sum(llh)
lines(xx, llh2plot)
# Get posterior mean and 95% HDI and add to the plot
# ---------------------------------------------------
( mean <- postPars[1] / postPars[2] )
# [1] 425.2043
abline(v=mean, col='red')
( hdi <- hdi(qgamma, shape=postPars[1], rate=postPars[2]) )
# lower upper
# 365.9196 485.7290
abline(v=hdi, lty=2, col='red')**

The shape and rate of the prior distribution give us a hint as to how strong is our prior: it’s equivalent to having surveyed one-third of the park and counting 148 adults. That’s a much bigger area surveyed than for the new study, so the prior is relatively strong, as can be seen in the plot below.

The prior is the blue curve on the right of the plot above, and the scaled likelihood is the broad curve on the left. This prior has more information than the data. The posterior is between the prior and the likelihood – as usual – and is higher and narrow, as it combines the old and new information.

### The comb or grid method

With a single parameter we can use a brute-force approach: calculate the prior probability and likelihood for every possible value of the parameter, multiply them together as dictated by Bayes’ Rule, then fix the product (which won’t add to 1) so that the posterior does add up to 1.

We won’t consider every possible value, we’ll restrict ourselves to the range 300 to 600, though we could expand the range if necessary. The code below shows how to do this.

**library(wiqid) # for plotComb and HDI
# Set up the "comb" of plausible values
N <- 300:600
# Calculate the prior probability for each value of N
prior <- dgamma2(N, 450, 37)
sum(prior)
# [1] 0.9998955
# This is slightly less than 1 because values <300 or >600 have a tiny probability.
# Put those into a data frame so that we can 'View' it as a spreadsheet
ou <- data.frame(N, prior)
View(ou)
# Calculate the expected number of orangutan in the area surveyed
area <- 1/8
lambda <- N * area
# Calculate the Poisson likelihood for the count
count <- 45
ou$llh <- dpois(count, lambda)
# Multiply prior * likelihood
ou$product <- ou$prior * ou$llh
# Scale this so that it adds to 1
ou$post <- ou$product / sum(ou$product)
sum(ou$post)
# [1] 1
View(ou)
# Calculate the posterior mean
with(ou, sum(N * post))
# [1] 425.2045
# Now do some nice plots
with(ou, plot(N, post, type='l', lwd=2, col='brown',
ylab="Probability"))
with(ou, lines(N, prior, lwd=2, col='blue'))
llh2plot <- ou$llh / sum(ou$llh)
lines(N, llh2plot)
# Use the 'wiqid::plotComb' function to get an HDI
with(ou, plotComb(N, post))**

The image below shows the first part of the ‘spreadsheet’ with the step-by-step calculations. The plot of the three curves – prior, likelihood and posterior – is practically identical to that for the conjugate method above, so we only show the plot of the posterior with the 95% credible interval.

### Analysis with JAGS

Now we come to the method of choice for complex models, and we will see how it works for this simple estimation problem. First we set up the JAGS model code and save it in a file called “orangutan.jags”:

**# File name "orangutan.jags"
model {
# Likelihood:
lambda <- N * a # Expected no. of ou in area a
C ~ dpois(lambda)
# Prior:
N ~ dgamma(shape, rate) # No. of ou in whole area
}**

*JAGS syntax: The code above looks a lot like R code but there are several differences. JAGS does not work through the lines of code in order, they can be in any order; often we prefer to put the likelihood first, then define the necessary priors. There are two kinds of relations in JAGS: the *`<-`

* symbol defines a deterministic relation, where the left hand side is exactly determined by the values on the right; the *`~`

* symbol is for stochastic relations, and indicates the distribution a parameter is drawn from.*

The main parameter we want to estimate is $N$, the number of orangutan in the park, and we calculate $\lambda$, the expected number in the area surveyed. Then we use a Poisson likelihood with the count, $C$, and $\lambda$. We will include shape, rate, $a$ and $C$ in the data, so that we can change these values without editing the JAGS model file.

Next we need R code to run the model:

**library(jagsUI)
library(wiqid) # for plotting functions
# Prepare the data:
# -----------------
( priorPars <- getGammaPar(450, 37) )
# shape rate
# [1,] 147.9182 0.3287071
jagsData <- list(shape=priorPars[1], rate=priorPars[2],
a = 1/8, C = 45)
# Run the model in JAGS
# ------------------------
jagsout <- jags(jagsData, inits=NULL,
parameters.to.save="N",
model.file="orangutan.jags",
n.chains=3, n.iter=20000, n.burnin=0, DIC=FALSE)
jagsout
# JAGS output for model 'orangutan.jags', generated by jagsUI
# ...
# mean sd 2.5% 50% 97.5% Rhat n.eff
# N 425.279 30.653 367.282 424.478 487.843 1 60000
# ... (some output omitted)
diagPlot(jagsout) # Check all went well**

We don’t need to provide initial values and it runs quickly and converges well. We will look at the details of the output in a later post, but now we want to plot the posterior distribution for N and see what JAGS has produced.

**# Convert to an mcmcOutput object and plot
ouOut <- mcmcOutput(jagsout, default="N")
dim(ouOut)
# [1] 60000 1
head(ouOut)
# N
# 1 442.1731
# 2 428.6705
# 3 408.7989
# 4 450.6569
# 5 469.5928
# 6 469.7083
plot(ouOut)**

$\tt ouOut$ has one column and 60,000 rows with values for $N$. These are random draws from the posterior distribution for $N$ and we can use these to plot curves or histograms or to calculate summary statistics such as the mean or credible interval.

Note that the results for all three methods are practically the same, with some differences due to rounding.

### Three ways to represent a probability distribution

We have seen all three ways to describe a posterior distribution in this section:

**Algebra**: The result of the conjugacy analysis is a gamma distribution with known parameters. We just need to plug the values into the equation to get whatever summary we want. Ok, the equation for the gamma distribution is a mess, but R provides functions such as $\tt dgamma$ to take care of the arithmetic. Most parameters in complex models do not have such simple posteriors.**Graphic**: The comb method produced a vector of values for $N$ and a corresponding vector with the probability of each. We can use these to plot a graph, and we can also get various summaries. We do not need an equation.**Numeric**: JAGS produces a “mountain of numbers”^{1}thanks to Ngumbang Juat for this nice phrase! to describe the posterior distribution of $N$. With a large enough mountain of draws we can get very close to an accurate representation of the posterior, and can calculate all the summaries we need. A big advantage of this representation is the ease of calculating derived values.

Download a ZIP file with the code here.

Good evening,

I looked with great interest to this page and almost everything was clear. However, I have a couple of questions:

1. On what grounds a sd = 37 was considered? Is it the product of a calculation or just an assumption? Using Software R I calculated both the mean and sd of the interval represented by the minimum and maximum number of orangutans estimated within the area (N = 300:600) and, the mean corresponds to 450 (as defined by you), but the sd value is quite different (87.03). Please, may anyone explain?

2. The script used for JAGS analysis implemented in R is very inresting, but there are several functions of which I would like to know the meaning. Considering the following script:

jagsout <- jags(jagsData, inits=NULL, parameters.to.save="N", model.file="orangutan.jags", n.chains=3, n.iter=20000, n.burnin=0, DIC=FALSE)

what does inits=NULL, n.burnin=0, DIC=FALSE stand for? Furthermore, based on what criteria both the number of chains (n.chains) and interactions (n.iter) were defined? I searched in other tutorials but I dind't find such information. If such explanations will be provided later, please, don't waste your time to explain.

Sorry for the several queries that I did (and, probably, I will do) but, as I find such tutorials very useful and interesting, I would like to have all the steps clear in my notes.

Many thanks.

Hi Marcello,

1. The prior mean and SD are based on prior information about the population. Please see para 3. SD = 37 corresponds to “95% certain the number is between 380 and 525.” The range

`N = 300:600`

is arbitrary, see the para just above that block of code, and the mean and SD are unimportant. We could have chosen`N = 100:800`

or`0:100000`

(try it), but there’d be lots of zeros in`prior`

and`post`

.2. You can get information on any function in R with

`help`

or`?`

, ie,`help(jags)`

or`?jags`

once the`jagsUI`

package is loaded.The reason for using >1 chain and the number of

iterations(not interactions!) is on this page.Good morning,

Thank you very much!

Marcello