Occupancy with a categorical covariate

For the willow tit data, both the habitat covariates were continuous measures: elevation and percentage forest cover. We often want to use a categorical covariate such as forest type, and it’s easy to incorporate one such covariate in our JAGS code.

The example data set

For this module we’ll use camera trap data for Cape leopards from the Boland area in South Africa. Thanks to Rajan Amin and Zoological Society of London (ZSL) for making the data available. After a first survey, the area was affected by wildfires, so the team returned after the fires to compare leopard occupancy – or rather habitat use – in areas burnt vs unburnt.

You can load and inspect the data with the following code:

# Read in data
wt <- read.csv("http://bcss.org.my/data/Leopards.csv",
    comment="#")
str(leo)
# 'data.frame':   47 obs. of  7 variables:
# $ site: int  1 2 3 4 5 6 7 8 9 10 ...
# $ n1  : int  130 130 130 130 130 130 130 130 130 130 ...
# $ y1  : int  8 5 4 0 2 2 0 0 0 5 ...
# $ n2  : int  104 130 85 130 130 130 130 39 112 130 ...
# $ y2  : int  7 12 1 0 1 1 1 0 0 6 ...
# $ fire: chr  "Unburnt" "Unburnt" "Unburnt" "Unburnt" ...
# $ elev: int  286 894 426 1044 405 807 496 1174 653 793 ...

# Convert 'fire' to a factor
leo$fire <- factor(leo$fire)
summary(leo)
#    site            n1              y1               n2
# Min.   : 1.0   Min.   :  0.0   Min.   : 0.000   Min.   : 18.0
# 1st Qu.:12.5   1st Qu.:113.0   1st Qu.: 0.000   1st Qu.: 95.5
# Median :24.0   Median :130.0   Median : 2.000   Median :115.0
# Mean   :24.0   Mean   :114.1   Mean   : 3.128   Mean   :108.0
# 3rd Qu.:35.5   3rd Qu.:130.0   3rd Qu.: 5.000   3rd Qu.:130.0
# Max.   :47.0   Max.   :130.0   Max.   :20.000   Max.   :130.0
#     y2              fire         elev
# Min.   : 0.000   Burnt  :20   Min.   : 285.0
# 1st Qu.: 0.000   Unburnt:27   1st Qu.: 414.0
# Median : 2.000                Median : 592.0
# Mean   : 2.617                Mean   : 641.8
# 3rd Qu.: 3.000                3rd Qu.: 800.0
# Max.   :16.000                Max.   :1487.0

The rows correspond to 47 camera trap locations. n1 is the number of days that the camera trap was operating at the site during the first survey and y1 is the number of days when leopards were detected; n2 and y2 give the same data for the second survey. The fires occurred between the first and second surveys, and fire indicates whether or not the site was affected by the fire. elev is the elevation of the site in metres.

Let’s check the distribution of burnt sites across elevation so see if they may be confounded.


range(leo$elev)  # 285 1487
breaks <- seq(200, 1500, by=100)
op <- par(mfrow=2:1)
hist(leo$elev[leo$fire=="Burnt"], breaks=breaks, xlim=c(200, 1500),
    xlab="Elevation", main="Burnt")
hist(leo$elev[leo$fire=="Unburnt"], breaks=breaks, xlim=c(200, 1500),
    xlab="Elevation", main="Unburnt")
par(op)

Below 1100m there’s a good mix of burnt and unburnt sites, but all 5 sites above 1100m were burnt. If leopards avoid those 5 sites after the fires, we won’t be able to tell whether it’s because of the burning or the elevation. Fortunately we have data from before the fires, which will tell us about the effect of elevation alone.

For the next section we’ll only use the data from the second survey, after the fires.

Just one categorical covariate

With one covariate, which is categorical, we can simplify the model by estimating one value for occupancy for each category. We do not need to use a linear model.

This is almost like fitting a separate model to each category, and we need enough observations for each category to be able to do this. With a single model fit, we have the advantage of using the data from all the sites to estimate probability of detection.

Here’s the JAGS code for this model:

# File name "burn1.jags"
model{
  # likelihood
  for(i in 1:nSite) {
    z[i] ~ dbern(psi[fire[i]])
    y[i] ~ dbin(p * z[i], nOcc[i])
  }
  # priors
  for(i in 1:nCat) {
    psi[i] ~ dbeta(1,1)
  }
  p ~ dbeta(1,1)
}

Now we prepare the data. We need to convert the fire covariate to a vector with integers for the categories: 1 for burnt, 2 for unburnt (in alphabetical order, by default). If you had more categories, the numbers would be 1, 2, 3, …

fireI <- as.integer(leo$fire)
z <- ifelse(leo$y2 > 0, 1, NA)  # Known occupancy
bdata <- with(leo, list(nSite=47, nOcc=n2, y=y2,  fire=fireI,
    nCat=2, z=z))
str(bdata)
# List of 6
# $ nSite: num 47
# $ nOcc : int [1:47] 104 130 85 130 130 130 130 39 112 130 ...
# $ y    : int [1:47] 7 12 1 0 1 1 1 0 0 6 ...
# $ fire : int [1:47] 2 2 2 2 2 2 2 1 2 2 ...
# $ nCat : num 2
# $ z    : num [1:47] 1 1 1 NA 1 1 1 NA NA 1 ...

Now we can run the model:

library(jagsUI)
library(mcmcOutput)
wanted <- c("p", "psi")

outA1 <- jags(bdata, NULL, wanted, "burn1.jags", DIC=FALSE,
        n.chains=3, n.iter=1e4)
mcoA1 <- mcmcOutput(outA1)
diagPlot(mcoA1)
summary(mcoA1)
#         mean    sd median   l95   u95  Rhat MCEpc
# p      0.032 0.003  0.032 0.026 0.038 0.999 0.810
# psi[1] 0.595 0.111  0.596 0.377 0.806 1.000 0.689
# psi[2] 0.848 0.071  0.856 0.706 0.971 1.001 0.701

plot(mcoA1)

That ran very quickly and converged well. And it seems clear that leopards do prefer the unburnt sites (psi[2]) to the burnt sites (psi[1]). But let’s look at the effect of elevation before discussing the results further.

Categorical and continuous covariates

Once we have multiple covariates, things become a bit less simple. We need to use a linear predictor and a link function, usually the logistic (“logit”) link. But we can still use a different intercept for each category. And provided we standardise the continuous covariates, the intercept is the logit of the probability of occupancy at the mean values of all the continuous covariates. So we can use a prior on the probability. Here’s the JAGS code for this model:

# File name "burn2.jags"
model{
  # likelihood
  for(i in 1:nSite) {
    logit(psi[i]) <- b0[fire[i]] + bElev * elev[i]
    z[i] ~ dbern(psi[i])
    y[i] ~ dbin(p * z[i], nOcc[i])
  }
  # priors
  for(i in 1:nCat) {
    psi0[i] ~ dbeta(1,1)
    b0[i] <- logit(psi0[i])
  }
  bElev ~ dunif(-5, 5)
  p ~ dbeta(1,1)
}

We standardise the elevation to mean 0 and SD 1, then bundle the data and run the model:

library(wiqid)
bdata <- with(leo, list(nSite=47, nOcc=n2, y=y2, fire=fireI,
    nCat=2, elev=standardize(elev), z=z))
str(bdata)
# List of 7
# $ nSite: num 47
# $ nOcc : int [1:47] 104 130 85 130 130 130 130 39 112 130 ...
# $ y    : int [1:47] 7 12 1 0 1 1 1 0 0 6 ...
# $ fire : int [1:47] 2 2 2 2 2 2 2 1 2 2 ...
# $ nCat : num 2
# $ elev : num [1:47] -1.227 0.87 -0.744 1.388 -0.817 ...
# $ z    : num [1:47] 1 1 1 NA 1 1 NA NA NA 1 ...

wanted = c("p", "psi0", "bElev")

outA2 <- jags(bdata, NULL, wanted, "burn2.jags", DIC=FALSE,
        n.chains=3, n.iter=1e4)
mcoA2 <- mcmcOutput(outA2)
diagPlot(mcoA2)
plot(mcoA2)
summary(mcoA2)
#         mean    sd median   l95   u95  Rhat MCEpc
# p        0.032 0.003  0.032  0.026  0.038 0.999 0.739
# psi0[1]  0.661 0.125  0.666  0.419  0.899 1.000 1.105
# psi0[2]  0.850 0.076  0.860  0.702  0.983 0.999 0.965
# bElev   -1.020 0.532 -0.975 -2.074 -0.018 1.000 1.212

The elevation coefficient is definitely negative – occupancy declines with elevation. We’ll plot the curves for both burnt and unburnt sites. The code is similar to that for the willow tits, but we have to convert the two intercepts, psi0[1] and psi0[2], back to the logit scale.

# Convert psi0 back to b0
b0 <- qlogis(mcoA2$psi0)

# Create vector of elevations for the x axis
hist(leo$elev)  # only 1 site above 1200m
xx <- seq(250, 1200, , 101)

# Standardise in the same way that we standardised the original data
xxS <- standardize2match(xx, leo$elev)

# Get the values of the posterior mean and CrI for each value in xx
burnt <- unburnt <- matrix(NA, nrow=101, ncol=3)
for(i in 1:101) {
  logitPsi1 <- b0[,1] + mcoA2$bElev * xxS[i]
  psi1 <- plogis(logitPsi1)
  burnt[i, ] <- c(mean(psi1), hdi(psi1))
  logitPsi2 <- b0[,2] + mcoA2$bElev * xxS[i]
  psi2 <- plogis(logitPsi2)
  unburnt[i, ] <- c(mean(psi2), hdi(psi2))
}

plot(xx, unburnt[,1], type='n', ylim=range(burnt, unburnt),
    xlab="Elevation, m", ylab="Occupancy", las=1)
polygon(c(xx, rev(xx)), c(burnt[,2], rev(burnt[,3])), border=NA,
    col=adjustcolor('gray', 0.5))
polygon(c(xx, rev(xx)), c(unburnt[,2], rev(unburnt[,3])), border=NA,
    col=adjustcolor('skyblue', 0.5))
lines(xx, burnt[,1], lwd=2)
lines(xx, unburnt[,1], lwd=2, col='blue')
abline(v=mean(leo$elev), lty=3)
text(610, 0.3, 'mean elevation', pos=2) 
arrows(600, 0.3, 641, 0.3, length=0.1)
legend('bottomleft', c("burnt", "unburnt"),
    col=c('black', 'blue'), lwd=2, bty='n')

A couple of things are clear from the graph. First, the credible intervals are very wide; that should not be surprising given that we only have 20 sites in the burnt area and 27 in the unburnt area. Second, the effect of burning is modest compared to elevation: burnt sites at 250m have higher occupancy than unburnt sites at 1000m.

Results from before the fires

Now let’s turn to the data from before the fires, and see how occupancy compared between the soon-to-be-burnt sites and the unburnt sites.

Three of the 47 sites used for the second survey did not have camera traps during the first survey. They are the last 3 sites in the data table, and have n1 = 0 and y1 = 0. We can leave them in the data set even though they provide no information, or we can set nsites = 44, so that JAGS ignores the last 3 sites. We chose the latter for the code below.

z <- ifelse(leo$y1 > 0, 1, NA)  # Known occupancy
bdata <- with(leo, list(nSite=44, nOcc=n1, y=y1, fire=fireI,
    nCat=2, elev=standardize(elev), z=z))
str(bdata)
# List of 7
# $ nSite: num 44
# $ nOcc : int [1:47] 130 130 130 130 130 130 130 130 130 130 ...
# $ y    : int [1:47] 8 5 4 0 2 2 0 0 0 5 ...
# $ fire : int [1:47] 2 2 2 2 2 2 2 1 2 2 ...
# $ nCat : num 2
# $ elev : num [1:47] -1.227 0.87 -0.744 1.388 -0.817 ...
# $ z    : num [1:47] 1 1 1 NA 1 1 NA NA NA 1 ...

The code to run the model is the same, convergence is good, and here are the posterior plots:

The results are very similar, with higher occupancy at the burned sites even before the fires! Let’s get MCMC chains for the differences before and after and plot them together:

diffA2 <- mcoA2$psi0[,2] - mcoA2$psi0[,1]
diffB2 <- mcoB2$psi0[,2] - mcoB2$psi0[,1]
differences <- cbind(before=diffB2, after=diffA2)
postPlot(differences, compVal=0)

So, sure, occupancy was higher in the unburnt sites after the fires, but it was also higher in those sites before the fires. Indeed, the difference in occupancy has not changed as a result of the fires. Whatever is causing the difference, we can’t blame the fire.

The major lesson: don’t draw conclusions just by comparing impacted sites with controls. Look at data from before and after the impact. This is Before-After-Control-Impact (BACI) design.

Download a ZIP file with the code and data here.

Leave a Reply

Your email address will not be published. Required fields are marked *