Occupancy with survey covariates

In the last tutorial we worked with site covariates as predictors for the probability of occupancy. We continue with the Swiss willow tits example, now adding survey covariates as predictors for the probability of detecting the species if it is present.

We have two survey-level covariates: day is the date of the survey, coded as an integer with day 1 = 1 April. Detectability will likely vary across the breeding season depending on whether the birds are courting, incubating, or feeding chicks, and we'll use a quadratic function of day to capture this. We also have dur, the duration of the visit to the site, which will also affect the probability of detection.

Data preparation

Because the probability of detection varies across visits to the same site, we can no longer use the aggregated detection data; we need to deal with the visits one by one, working with the three-column detection matrix, Y.

The day and duration covariates are also three-column matrices, and we standardise these. Some sites were only visited twice, so the day matrix has some missing values, NA, in the third column. JAGS won't use these visits in the analysis, but it does complain if there are missing values in covariates, so we replace those with zeros. The dur matrix has a few more NAs because the survey duration was not always recorded; there are only four of these. We replace all the covariate NAs with zero after standardising.

# Attach packages
library(jagsUI)
library(wiqid)

# Read in data
wt <- read.csv("http://bcss.org.my/data/willowtits.csv",
    comment="#")
    
# Extract the detection data and convert to a matrix:
Y <- as.matrix(wt[, 1:3])
# Number of visits to each site (2 or 3)
n <- rowSums(!is.na(Y))

# Standardise the site covariates as before
elevS <- standardize(wt$elev)
forestS <- standardize(wt$forest)

# Extract the survey covariates as matrices and standardise
day <- as.matrix(wt[, 12:14])
dayS <- standardize(day)
head(dayS)
#          day.1      day.2     day.3
# [1,] -1.110455  0.1955548 0.8710771
# [2,] -1.831012 -0.6601068 0.3756941
# [3,] -1.065420 -0.2998282 0.9161119
# [4,] -1.380664 -0.4349327 0.7810075
# [5,] -1.155490  0.1054851 0.8710771
# [6,] -1.650873  0.1054851 0.8710771
dur <- as.matrix(wt[, 9:11])
durS <- standardize(dur)
head(durS)
#           dur.1       dur.2       dur.3
# [1,]  0.1469488  0.14694879  0.14694879
# [2,] -1.1170247 -1.19602301 -1.43301803
# [3,] -1.7490114 -1.98600642 -2.30199979
# [4,] -0.8010313 -0.95902798 -1.35401969
# [5,] -0.3270413 -0.09004623  0.06795045
# [6,] -1.2750213 -0.80103130 -0.56403628

# Replace missing values with zero
dayS[is.na(dayS)] <- 0
durS[is.na(durS)] <- 0
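
As a quick check (not part of the original code), we can confirm that no missing values remain in the covariate matrices, while Y keeps its NAs for the visits that never took place:

# Sanity check after replacing covariate NAs
sum(is.na(dayS))  # should be 0
sum(is.na(durS))  # should be 0
sum(is.na(Y))     # NAs remain for the visits that did not take place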

The model

Likelihood

The biological process is exactly the same as in the previous tutorial.

We modify the observation process so that detection probability for site $i$ on occasion $j$ is a function of an intercept and the predictors: $${\rm logit}(p_{ij}) = \alpha_0 + \alpha_{dur} Dur_{ij} + \alpha_{day} Day_{ij} + \alpha_{day2} Day_{ij}^2$$

(We follow the usual convention of using $\beta$ for the occupancy coefficients and $\alpha$ for the detection coefficients.)

Detection probability is now different for each visit, so we can no longer use the binomial distribution. Instead, the individual 1/0 data for each visit are drawn from a Bernoulli distribution: $$y_{ij} \sim {\rm Bernoulli}(p_{ij} z_i)$$
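
To see what this means in practice, here is a minimal R sketch (with illustrative coefficient values, not estimates from the data) simulating the observation model for a single site with three visits: if the site is unoccupied, $z = 0$ forces the detection probability to zero for every visit.

# Toy simulation of the observation model for one site
# (illustrative values only, not taken from the analysis)
set.seed(42)
alpha0 <- 1 ; alphaDur <- 0.6 ; alphaDay <- 0.1 ; alphaDay2 <- -0.05
durS1 <- c(-0.5, 0.2, 1.0)   # standardised durations for 3 visits
dayS1 <- c(-1.0, 0.0, 0.9)   # standardised days for 3 visits
z1 <- rbinom(1, 1, 0.5)      # occupancy state of the site
p1 <- plogis(alpha0 + alphaDur*durS1 + alphaDay*dayS1 + alphaDay2*dayS1^2)
y1 <- rbinom(3, 1, p1 * z1)  # all zeros if the site is unoccupied (z1 = 0)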

Priors

We use the same priors for the coefficients in the submodel for occupancy as in the previous tutorial.

And we use the same approach to specifying priors for the intercept and coefficients for detection probability.
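
As a reminder of what that approach does, the small sketch below (not in the original tutorial) draws from a Beta(1, 1) prior on the probability scale and transforms to the logit scale, which is what p0 ~ dbeta(1, 1) together with a0 <- logit(p0) achieves in the JAGS code.

# Visualise the prior induced on the logit-scale intercept
prior.p0 <- rbeta(1e5, 1, 1)   # uniform prior on the probability scale
prior.a0 <- qlogis(prior.p0)   # the logit transform used in the model
hist(prior.a0, breaks=100, xlab="a0",
  main="Induced prior on the logit-scale intercept")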

Model diagram

The JAGS code

# Save this in the file "wt_surveyCovs.jags"
model{
  # Likelihood
  for(i in 1:nSites) {
    # biological model
    logit(psi[i]) <- b0 + bFor * fst[i] + 
        bElev * ele[i] + bElev2 * ele2[i]
    z[i] ~ dbern(psi[i])
    # observation model
    for(j in 1:n[i]) {
      logit(p[i, j]) <- a0 + aDur * dur[i, j] + 
          aDay * day[i, j] + aDay2 * day2[i, j]
      Y[i, j] ~ dbern(p[i, j] * z[i])
    }
  }

  # Priors
  psi0 ~ dbeta(1,1)
  b0 <- logit(psi0)
  bFor ~ dunif(-5, 5)    # forest
  bElev ~ dunif(-5, 5)   # elevation
  bElev2 ~ dunif(-5, 5)  # elevation^2
  
  p0 ~ dbeta(1, 1)
  a0 <- logit(p0)
  aDur ~ dunif(-5, 5)    # duration
  aDay ~ dunif(-5, 5)    # day
  aDay2 ~ dunif(-5, 5)   # day^2
  
  # Derived variable
  N <- sum(z)
}

R code for running the model

This is very similar to the code in the previous tutorial. To get n.eff values around 10,000 I had to double the number of iterations (and also increase the thinning). It took about 3 minutes on my laptop.

# Prepare stuff for JAGS
# Number of detections at each site, used to fix z = 1 where the species was seen
y <- rowSums(Y, na.rm=TRUE)
jagsData <- list(Y = Y, n = n, nSites = length(n),
                z = ifelse(y > 0, 1, NA),
                fst = forestS, ele = elevS, ele2 = elevS^2,
                dur = durS, day = dayS, day2 = dayS^2)

wanted <- c("p0", "a0", "aDur", "aDay", "aDay2",
            "psi0", "b0", "bFor", "bElev", "bElev2", "N")

# Run the model (3 mins)
( out2 <- jags(jagsData, NULL, wanted,
    model="wt_surveyCovs.jags", DIC=FALSE,
    n.chains=3, n.iter=20000, n.thin=2, parallel=TRUE) )
#          mean    sd   2.5%    50%  97.5%  Rhat n.eff
# p0      0.741 0.047  0.642  0.743  0.825  1.000 14360
# a0      1.065 0.250  0.583  1.063  1.553  1.000 11346
# aDur    0.598 0.220  0.170  0.598  1.031  1.000 30000
# aDay    0.070 0.185 -0.298  0.071  0.428  1.000 30000
# aDay2  -0.039 0.155 -0.338 -0.042  0.270  1.000  9946
# psi0    0.473 0.072  0.336  0.472  0.618  1.000 12272
# b0     -0.109 0.296 -0.682 -0.113  0.482  1.000 12365
# bFor    0.904 0.261  0.422  0.895  1.439  1.000 26672
# bElev   2.107 0.332  1.495  2.090  2.796  1.001 30000
# bElev2 -1.178 0.292 -1.785 -1.167 -0.635  1.000 13367
# N      83.145 2.781 79.000 83.000 90.000  1.000 30000

wiqid::diagPlot(out2)

The diagnostics all look fine. Only the parameters for detection are shown; those for occupancy look much the same as before. From the print-out and the plots, duration seems to be important, but the day effect is small and probably not quadratic.
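
One way to quantify that (a small addition, using the posterior draws already stored in out2$sims.list) is the posterior probability that each detection coefficient is positive; from the summaries above, this should be close to 1 for aDur but much less decisive for aDay and aDay2.

# Posterior probability that each detection effect is positive
mean(out2$sims.list$aDur > 0)
mean(out2$sims.list$aDay > 0)
mean(out2$sims.list$aDay2 > 0)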

Plotting the output

Let’s plot the day effect in the same way that we did elevation in the last tutorial.

# Plot effect of day on detection at the average visit duration
range(day, na.rm=TRUE)
# [1]  13 107
xx <- seq(10, 110)
# Need to scale xx the same way we scaled the day data
xxS <- standardize2match(xx, day)

# Do a plot with credible interval
toplot <- matrix(NA, 101, 3)
for(i in 1:101) {
  logit.p.tmp <- with(out2$sims.list,
      a0 + aDay * xxS[i] + aDay2 * xxS[i]^2)
  p.tmp <- plogis(logit.p.tmp)
  toplot[i, 1] <- mean(p.tmp)
  toplot[i, 2:3] <- hdi(p.tmp)
}

plot(xx, toplot[, 1], ylim=range(toplot), type='n', las=1,
  xlab="Day", ylab="Detection probability")
polygon(x=c(xx, rev(xx)),
  y=c(toplot[, 2], rev(toplot[, 3])),
  col=adjustcolor('skyblue', 0.5), border=NA)
lines(xx, toplot[, 1], lwd=2, col='blue')
# Add intercept
abline(h=mean(out2$sims.list$p0))
abline(h=hdi(out2$sims.list$p0), lty=2)

The horizontal black lines show the mean and CrI for the intercept, p0. The day effect is tiny, well within the credible interval we'd get if we ignored day and just used the intercept, which would be the best strategy for willow tits (though not for all species).
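
For comparison, the same approach can be applied to the duration effect, which the output suggests is the important one. This is a sketch following the pattern above (not part of the original tutorial), plotting detection against visit duration at the average survey day.

# Plot effect of visit duration on detection at the average survey day
xx2 <- seq(min(dur, na.rm=TRUE), max(dur, na.rm=TRUE), length=101)
xx2S <- standardize2match(xx2, dur)

toplot2 <- matrix(NA, 101, 3)
for(i in 1:101) {
  logit.p.tmp <- with(out2$sims.list, a0 + aDur * xx2S[i])
  p.tmp <- plogis(logit.p.tmp)
  toplot2[i, 1] <- mean(p.tmp)
  toplot2[i, 2:3] <- hdi(p.tmp)
}

plot(xx2, toplot2[, 1], ylim=range(toplot2), type='n', las=1,
  xlab="Duration", ylab="Detection probability")
polygon(x=c(xx2, rev(xx2)),
  y=c(toplot2[, 2], rev(toplot2[, 3])),
  col=adjustcolor('skyblue', 0.5), border=NA)
lines(xx2, toplot2[, 1], lwd=2, col='blue')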

Download a ZIP file with the code for this tutorial here.

Comments

  1. Hi Mike, is this similar to using ROPE, except that we’re checking if the intercept + covariate is practically equivalent to the intercept only?

    1. Not really. For ROPE you specify the limits of what you consider to be practically equivalent to zero, i.e., too small to matter. If we are 95% sure that the coefficient is too small to matter, we discard it. Here we are comparing the predictions from two models, with and without the day covariate; we get better predictions if we ignore day.
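
    (If you do want to try the ROPE idea on these results, here is a minimal sketch; the limits of ±0.1 on the logit scale are arbitrary and chosen only for illustration.)

    # Proportion of the posterior for aDay inside an arbitrary ROPE of +/- 0.1
    rope <- c(-0.1, 0.1)
    mean(out2$sims.list$aDay > rope[1] & out2$sims.list$aDay < rope[2])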
