# Basic occupancy modelling

In this section we will look at the simplest occupancy model: one species and one time period. The species may be present at a site but not be detected, but we assume that the species is not recorded when it is absent, ie, no false positives. We also assume closure, that is the species is either present or absent at a site throughout the study; individual animals can come and go, but the status of the species must not change. To estimate (and correct for) probability of detection, we need replicate data for some of the sites, and we assume these replicate observations are independent.

### Salamanders

For our example we’ll use the data collected in 2001 for Blue Ridge two-lined salamanders (Eurycea wilderae) in the Great Smokey Mountains National Park in the USA. These data are used by MacKenzie et al (2018, p135)1MacKenzie, D.I., Nichols, J.D., Royle, A.J., Pollock, K.H., Bailey, L.L., & Hines, J.E. (2018) Occupancy estimation and modeling : inferring patterns and dynamics of species occurrence (2nd edn) Elsevier Publishing.

Each site consisted of a 50 x 3m plot where natural cover objects were examined and a coverboard transect with 5 coverboards at 10m intervals. The 39 sites were visited 5 times at 2-week intervals. The data consist of the number of visits where the target species was detected.

# Create vector of number of detections
( y <- rep(4:0, c(1,4,1,12,21)) )
#  [1] 4 3 3 3 3 2 1 1 1 1 1 1 1 1 1 1 1 1 0 0
# [21] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
table(y)
# y
#  0  1  2  3  4
# 21 12  1  4  1
n <- 5
( nSites <- length(y) )   # No. of surveys at each site
# [1] 39
mean(y > 0) # naive occupancy estimate
# [1] 0.4615

Salamanders were detected at least once at 18 of the 39 sites, while 21 sites had no detections. If we were to assume no detection means absence, ie, probability of detection if present = 1, we would estimate occupancy as 18/39 = 0.46. But that would be naïve; if it were true, the data would consist entirely of 5s and 0s.

## The model

#### Likelihood

Our likelihood model has 2 parts. The biological model describes the way we think presence works, with a parameter $z_i = 1$ if site $i$ is occupied and $z_i = 0$ if not occupied. For this simple model, $z_i$ is drawn from a Bernoulli distribution with probability parameter $\psi$: $$z_i \sim {\rm Bernoulli}(\psi)$$

The second part of the model describes the observation process. A detection is a success, and we model it with a binomial distribution. If the site is occupied ($z_i = 1$) the probability of success is $p$, but if it’s not occupied ($z_i = 0$) probability of success is 0. Thus: $$y_i \sim {\rm Binomial}(n, pz_i)$$

#### Priors

We need priors for both $\psi$ and $p$. They are probabilities, so we use the Beta distribution. If we had good information about these salamanders at these sites we could use informative priors, but here we’ll use minimally informative uniform priors: $$\psi \sim {\rm Beta}(1, 1)$$ $$p \sim {\rm Beta}(1, 1)$$

## The model diagram

### The JAGS code

# File name "salamanders.jags"
model {
# Likelihood
for(i in 1:nSites) {
# Biological model
z[i] ~ dbern(psi)  # z=1 if occupied
# Observation model
y[i] ~ dbin(p * z[i], n)
}

# Priors
p ~ dbeta(1, 1) # Uninformative prior
psi ~ dbeta(1, 1)

# Derived values
N <- sum(z)
}

JAGS syntax: Note that the dbin distribution has probability as the first argument and the number of trials as the second argument; this differs from the conventional order.

## R code to run the model

# Organise the data etc
jagsData <- list(y = y, n = n, nSites = nSites)
str(jagsData)
# List of 3
# $y : int [1:39] 4 3 3 3 3 2 1 1 1 1 ... #$ n     : num 5
# $nSites: int 39 wanted <- c("p", "psi", "N") ( outSal <- jags(jagsData, NULL, wanted, model="salamanders.jags", DIC=FALSE, n.chains=3, n.iter=10000) ) # ... # Error in jags.model(file = model.file, data = data, inits = inits, n.chains = n.chains, : # Error in node y[1] # Node inconsistent with parents y[1] # [1] 4 Hmm, that didn’t work. The node y[1] is inconsistent with parents; I checked y[1] and it’s 4, ie, salamanders were detected on 4 out of 5 visits to site #1. In our model we have y[i] ~ dbin(p * z[i], n), so the parents of y[1] are p, z[1] and n. The problem is due to the starting value of z[1], which we left to JAGS to generate – we didn’t specify any initial values. JAGS draws random 0/1 values from a Bernoulli distribution. If it sets z[1] = 0, ie, site #1 is not occupied, then detecting the species on 4 visits is obviously not possible, hence the error. We can solve this by specifying initial values for z. The simplest solution is just to set all the z to 1, but then we should discard the first values in the MCMC chains as they will be affected by our arbitrary starting value: we need to specify a burn-in value. # Specify initial values for 'z' inits <- function() list(z = rep(1, nSites)) inits() #$z
#  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# [21] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

( outSal <- jags(jagsData, inits, wanted,
model="salamanders.jags", DIC=FALSE,
n.chains=3, n.iter=11000, n.burn=1000) )
# ...
#       mean    sd   2.5%    50%  97.5% Rhat n.eff
# p    0.259 0.056  0.157  0.257  0.374    1 11013
# psi  0.613 0.123  0.395  0.604  0.884    1  9768
# N   24.103 4.061 19.000 23.000 34.000    1  9065
wiqid::diagPlot(outSal)

### Looking at the results

# convert to Bwiqid and do nice plots of marginal probabilities
library(wiqid)
salB <- as.Bwiqid(outSal, default='psi')
par(mfrow=c(1, 3))
plot(salB)
plot(salB, 'p')
plot(salB, 'N')
par(mfrow=c(1, 1))

The plots above show the marginal probabilities: the plot for psi ignores the values of p. Looking at those plots it’s easy to forget that the estimates of psi and p are not independent. We’ll plot the two together and estimate the correlation:

# Look at joint probability
plot(salB$psi, salB$p, pch=19, cex=0.2, col="#00000033",
xlab="psi", ylab="p", las=1)
abline(v=mean(salB$psi), h=mean(salB$p), col='blue')
# ... and correlation
cor(salB$psi, salB$p)
# [1] -0.5487149

We see a negative correlation between the posterior distributions of psi and p, with high p related to low psi. The correlation of -0.5 is moderate, but we should still take it into account if we are including both in a calculation.

There’s not a lot we can explore with this simple data set. In the next section we’ll work with a data set for willow tits in Switzerland, where we have covariates.