Occupancy with survey covariates

In the last tutorial we worked with site covariates and predictors for probability of occupancy. We continue with the Swiss willow tits example, now adding survey covariates and predictors for the probability of detecting the species if it is present.

We have two survey-level covariates: day is the date of the survey, coded as an integer with day 1 = 1 April. Detectability will likely vary across the breeding season depending whether the birds are courting, incubating, or feeding chicks, and we’ll use a quadratic function for this. We also have dur, the duration of the visit to the site, which will affect probability of detection.

Only the occupied sites provide information for estimation of $p$, the probability of detecting the species given that the site is occupied; unoccupied sites provide no information. We estimated that 80 of the sites in the data set are occupied (N in the results from the previous analysis), and each visit to an occupied site provides one bit of information. All 80 sites had 3 visits, so the sample size for $p$ is 240. That should be plenty to estimate 3 coefficients.

Data preparation

Because the probability of detection varies across visits to the same site, we can no longer use the aggregated data, but need to deal with visits one by one. So we need to work with the three-column matrix, Y.

The day and duration covariates are also three-column matrices, and we standardise these. Some sites were only visited twice, so the day matrix has some missing values, NA, in the third column. JAGS won’t use these for the analysis, but it does complain if there are missing values in covariates, so we replace those with zeros. The dur matrix has a few more NAs because the survey duration was not recorded; there are only four of these, so we’ll replace those with zero after standardising.

# Attach packages
library(jagsUI)
library(wiqid)

# Read in data
wt <- read.csv("http://bcss.org.my/data/willowtits.csv",
    comment="#")
    
# Extract the detection data and convert to a matrix:
Y <- as.matrix(wt[, 1:3])
n <- rowSums(!is.na(Y))

# Standardise the site covariates as before
elevS <- standardize(wt$elev)
forestS <- standardize(wt$forest)

# Extract the survey covariates as matrices and standardise
day <- as.matrix(wt[, 12:14])
dayS <- standardize(day)
head(dayS)
#          day.1      day.2     day.3
# [1,] -1.110455  0.1955548 0.8710771
# [2,] -1.831012 -0.6601068 0.3756941
# [3,] -1.065420 -0.2998282 0.9161119
# [4,] -1.380664 -0.4349327 0.7810075
# [5,] -1.155490  0.1054851 0.8710771
# [6,] -1.650873  0.1054851 0.8710771
dur <- as.matrix(wt[, 9:11])
durS <- standardize(dur)
head(durS)
#           dur.1       dur.2       dur.3
# [1,]  0.1469488  0.14694879  0.14694879
# [2,] -1.1170247 -1.19602301 -1.43301803
# [3,] -1.7490114 -1.98600642 -2.30199979
# [4,] -0.8010313 -0.95902798 -1.35401969
# [5,] -0.3270413 -0.09004623  0.06795045
# [6,] -1.2750213 -0.80103130 -0.56403628

# Replace missing values with zero
dayS[is.na(dayS)] <- 0
durS[is.na(durS)] <- 0

The model

Likelihood

The biological process is exactly the same as in the previous tutorial.

We modify the observation process so that detection probability for site $i$ on occasion $j$ is a function of an intercept and the predictors. $${\rm logit}(p_{ij}) = \alpha_0 + \alpha_{dur} Dur_{ij} + \alpha_{day} Day_{ij} + \alpha_{day2} Day_{ij}^2$$(We follow the usual convention of using $\beta$ for the occupancy coefficients and $\alpha$ for the detection coefficients.)

Detection probability is now different for each visit, so we can no longer use the binomial distribution. Instead, the individual 1/0 data for each visit are drawn from a Bernoulli distribution: $$y_{ij} \sim {\rm Bernoulli}(p_{ij} z_i)$$

Priors

We use the same priors for the coefficients in the submodel for occupancy as in the previous tutorial.

And we use the same approach to specifying priors for the intercept and coefficients for detection probability.

Model diagram

The JAGS code

# Save this in the file "wt_surveyCovs.jags"
model{
  # Likelihood
  for(i in 1:nSites) {
    # biological model
    logit(psi[i]) <- b0 + bFor * fst[i] + 
        bElev * ele[i] + bElev2 * ele2[i]
    z[i] ~ dbern(psi[i])
    # observation model
    for(j in 1:n[i]) {
      logit(p[i, j]) <- a0 + aDur * dur[i, j] + 
          aDay * day[i, j] + aDay2 * day2[i, j]
      Y[i, j] ~ dbern(p[i, j] * z[i])
    }
  }

  # Priors
  psi0 ~ dbeta(1,1)
  b0 <- logit(psi0)
  bFor ~ dunif(-5, 5)    # forest
  bElev ~ dunif(-5, 5)   # elevation
  bElev2 ~ dunif(-5, 5)  # elevation^2
  
  p0 ~ dbeta(1, 1)
  a0 <- logit(p0)
  aDur ~ dunif(-5, 5)    # duration
  aDay ~ dunif(-5, 5)    # day
  aDay2 ~ dunif(-5, 5)   # day^2
  
  # Derived variable
  N <- sum(z)
}

R code for running the model

This is very similar to the code in the previous tutorial. To get n.eff values around 10,000 I had to double the number of iterations (and also increased thinning). It took about 3 mins on my laptop.

# Prepare stuff for JAGS
y <- rowSums(Y, na.rm=TRUE)
jagsData <- list(Y = Y, n = n, nSites = length(n),
                z = ifelse(y > 0, 1, NA),
                fst = forestS, ele = elevS, ele2 = elevS^2,
                dur = durS, day = dayS, day2 = dayS^2)

wanted <- c("p0", "a0", "aDur", "aDay", "aDay2",
            "psi0", "b0", "bFor", "bElev", "bElev2", "N")

# Run the model (3 mins)
( out2 <- jags(jagsData, NULL, wanted,
    model="wt_surveyCovs.jags", DIC=FALSE,
    n.chains=3, n.iter=20000, n.thin=2, parallel=TRUE) )
#          mean    sd   2.5%    50%  97.5%  Rhat n.eff
# p0      0.741 0.047  0.642  0.743  0.825  1.000 14360
# a0      1.065 0.250  0.583  1.063  1.553  1.000 11346
# aDur    0.598 0.220  0.170  0.598  1.031  1.000 30000
# aDay    0.070 0.185 -0.298  0.071  0.428  1.000 30000
# aDay2  -0.039 0.155 -0.338 -0.042  0.270  1.000  9946
# psi0    0.473 0.072  0.336  0.472  0.618  1.000 12272
# b0     -0.109 0.296 -0.682 -0.113  0.482  1.000 12365
# bFor    0.904 0.261  0.422  0.895  1.439  1.000 26672
# bElev   2.107 0.332  1.495  2.090  2.796  1.001 30000
# bElev2 -1.178 0.292 -1.785 -1.167 -0.635  1.000 13367
# N      83.145 2.781 79.000 83.000 90.000  1.000 30000

mcmcOutput::diagPlot(out2)

The diagnostics all look fine. Only the parameters for detection are shown, those for occupancy look much the same as before. From the print-out and the plots, duration seems to be important, but the day effect is small and probably not quadratic.

Plotting the output

Let’s plot the day effect in the same way that we did elevation in the last tutorial.

# Plot effect of day on detection at the average visit duration
range(day, na.rm=TRUE)
# [1]  13 107
xx <- seq(10, 110)
# Need to scale xx the same way we scaled the day data
xxS <- standardize2match(xx, day)

# Do a plot with credible interval
toplot <- matrix(NA, 101, 3)
for(i in 1:101) {
  logit.p.tmp <- with(out2$sims.list,
      a0 + aDay * xxS[i] + aDay2 * xxS[i]^2)
  p.tmp <- plogis(logit.p.tmp)
  toplot[i, 1] <- mean(p.tmp)
  toplot[i, 2:3] <- hdi(p.tmp)
}

plot(xx, toplot[, 1], ylim=range(toplot), type='n', las=1,
  xlab="Day", ylab="Detection probability")
polygon(x=c(xx, rev(xx)),
  y=c(toplot[, 2], rev(toplot[, 3])),
  col=adjustcolor('skyblue', 0.5), border=NA)
lines(xx, toplot[, 1], lwd=2, col='blue')
# Add intercept
abline(h=mean(out2$sims.list$p0))
abline(h=hdi(out2$sims.list$p0), lty=2)

I added the black lines showing the mean and CrI for the intercept. The day effect is tiny, well within the credible interval we’d get if we ignored it and just used the intercept, which would be the best strategy for willow tits (though not for all species).

Download a ZIP file with the code for this tutorial here (includes NIMBLE code).

11 thoughts on “Occupancy with survey covariates”

  1. Hi Mike, is this similar to using ROPE, except that we’re checking if the intercept + covariate is practically equivalent to the intercept only?

    1. Not really, for ROPE you specify the limits of what you consider to be practically equivalent to zero, ie, too small to matter. If we are 95% that the coefficient is too small to matter, we discard it. Here we are comparing the predictions from 2 models, with and without the day covariate; we get better predictions if we ignore day.

  2. Hi Mike,

    As always, thanks for your very informative posts! I just have a couple of questions:

    (1) Why didn’t you consider even the “ni “value in the calculation of “y(i, j)”? Would I be correct in saying that is because it came from a Bernoulli and not a Binomial distribution?

    (2) Just out of curiosity, I tryed to plot the relation between probability of occurrence and duration of each visit (attached file). It seems that the trend follows the results of the output, i.e., the higher the duration of a visit within a site, the greater is the occupancy probability. In the case of willow tits, for instance, if we consider a group of bird watchers, they will have higher probabilities to see such species if they remain stationed for a longer time during a single visit. Does it make sense? Many thanks and have a nice day.

    All the best,
    Marcello

    Plot

    1. Hi Marcello,

      (1) The Bernoulli distribution describes success/failure in a single trial; dbern in JAGS has only a single parameter, p. BTW R has no dbern function, so you have to use dbinom with size=1.

      (2) No, what you plotted is the probability of detection given that the site is occupied. The y axis label is correct. (Better if you use p instead of psi in the code.) That has nothing to do with the probability of occupancy, ie, the probability that willow tits are nesting in the quadrat, which is the biological process. It makes no sense to imagine that willow tits come to build nests because they somehow know that the observer will stay for a long time! It’s very important to clearly distinguish probability of occupancy (the biological process, which is not affected by the observer) and probability of detection if occupied (the observation process).

  3. Hi,
    So in this example, “day” is considered as a continuous variable meaning 4>3>2>1. Isn’t date a nominal covariate and, for instance, 2nd April doesn’t mean it’s a large value than 1st April? Shouldn’t “date” be considered as a categorical covariate here?

    1. Time is inherently continuous. Measurements of continuous variables are never perfectly precise and there’s always some degree of “granularity” or rounding. Here we choose to round to days. 2nd April is later than 1st April and hence “bigger” on the time scale, and it’s “smaller” that 3rd April, even though we don’t usually think of time as being big or small.

      Although detectability varies through the breeding season, we have no reason to think that detectability on 2nd April will be very different to detectability on 1st April or 3rd April. We’d expect it to be intermediate between the two. So we model the trend in detectability with time. And we use a quadratic relationship, so that detectability can increase initially, then decrease towards the end of the breeding season.

      Willow tits (and researchers!) are diurnal, so in some contexts it might make sense to treat each day separately and estimate a probability of detection if present, p, for each day. Here though you would have a categorical variable with >100 levels, and you don’t have enough data to estimate >100 values for p. It rarely makes sense to chop up an inherently continuous variable into arbitrary categories for modelling.

  4. Hi Mike,

    I had a question about my own data. So I have site covariates on detection (not survey covariates). So detection prob varies but not for each survey period. I would still switch to using the Bernoulli rather than the binomial distribution to measure detection probability, correct?

    Thank you!

    1. Jordan,
      If the survey covars vary among sites but do not change across occasions, you don’t need the Bernoulli or to loop through the occasions. Just use the binomial and set up p as a logistic function of the covariates.

Leave a Reply to Yan Ru Cancel reply

Your email address will not be published.

The maximum upload file size: 1 MB. You can upload: image, document, text, archive. Drop file here