This tutorial explores the use of NEON breeding landbird data for occupancy modeling. The focus is on building detection histories from observational data, fitting a simple occupancy model, and testing how site and survey conditions may influence occupancy and detection.
Although birds are used here, the same workflow could be applied to other observational sampling datasets that provide species-level detections, such as small mammals or fish. Environmental covariates for these models can also be drawn from other NEON data products, such as weather and climate measurements collected by NEON’s Automated Instrument Systems.After completing this tutorial you will be able to:
You will need a current version of R to complete this tutorial. We also recommend the RStudio IDE to work with R.
NEON’s breeding landbird point count dataset provides taxonomic identifications, detection methods, and associated survey and site information for birds observed during standardized six-minute surveys. The protocol focuses on diurnal, terrestrial breeding landbirds, a group chosen because they respond strongly to changes in climate, habitat, and land use.
Field crews survey spatially distributed plots across NEON field sites during the breeding season every year, recording every bird seen or heard. These repeated surveys produce rich detection–nondetection data suited for occupancy modeling.
NEON’s terrestrial field sites are designated as large or small, which determines the sampling design. Large sites are surveyed once per year and contain 5–10 plots each containing a bird grid, made up of a 3×3 array of nine points spaced 250 m apart (see Figure 1). Field crews visit every point in a grid and conduct a 6-minute point-count survey. Smaller sites, which cannot accommodate full grids, contain 2–25 point-count locations randomly distributed across the site and spaced at least 250 m apart. These are surveyed twice per year.
Data Tip: NEON’s bird data follow a hierarchical structure: DomainID → SiteID → PlotID → PointID. Large sites use points labeled 1–9, while small sites use point label 21 for all locations. For example: (Large Site) D07 → GRSM → GRSM_002 → 5 | (Small Site) D20 → PUUM → PUUM_001 → 21.
More details on NEON’s bird monitoring program are available on the NEON bird resource page. Protocols referenced in the data can be found in the Document Library.
When searching for a species, there are technically three possible outcomes:
Traditional presence–absence data cannot distinguish missed detections from true absences. Occupancy modeling addresses this challenge by explicitly modeling two linked processes: the true occupancy of the species and the probability of detecting the species given that it is present. A key assumption of occupancy modeling is closure during the sampling period: the true occupancy state of a site does not change between repeated surveys.
This framework estimates two key parameters:
To tease apart ψ and p, occupancy models require repeated surveys of the same site. NEON’s observational sampling design is ideal for this approach, because surveys include repeated, standardized visits to locations across time.
We’ll need several R packages in this tutorial. Install the packages, if not already installed, and load the libraries for each.
# run once to get the package, and re-run if you need to get updates
install.packages("neonUtilities") # work with NEON data
install.packages('RPresence', repo='https://www.mbr-pwrc.usgs.gov/mbrCRAN') # library for occupancy modelling
install.packages("dplyr") # data manipulation
install.packages("tidyr") # data reshaping and tidying
install.packages("stringr") # string handling and text processing
install.packages("ggplot2") # visualization and plotting
# load the libraries into your environment
library(neonUtilities)
library(RPresence)
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
First, we’ll download the breeding landbird data product (DP1.10003.001) for all NEON terrestrial field sites included in RELEASE-2026.
On the first run, this code block will download the full dataset for all sites. Because the data product is large and spans multiple years-including years before some sites became operational-the initial download may take some time. A cached version of the dataset is saved locally, so on re-run, the code simply loads the local file instead of downloading the data again.
A NEON token is used to download the original data for this workflow. Tokens are optional but recommended, as they allow authenticated access and help avoid API rate limits when downloading large datasets. For more information, see the Using an API Token when Accessing NEON Data with neonUtilities tutorial and the NEON API token documentation.
# Define NEON sites
large_sites <- c(
"BARR", "BART", "BONA", "CLBJ", "CPER", "DEJU", "DSNY", "GRSM",
"GUAN", "HARV", "HEAL", "JERC", "JORN", "KONZ", "MLBS", "MOAB",
"NIWO", "OAES", "ONAQ", "ORNL", "OSBS", "RMNP", "SJER", "SRER",
"STEI", "TALL", "TEAK", "UKFS", "UNDE", "WOOD", "WREF", "YELL"
)
small_sites <- c(
"DELA", "LENO", "TOOL", "SOAP", "STER", "PUUM", "KONA",
"SERC", "DCFS", "NOGP", "LAJA", "BLAN", "SCBI", "ABBY", "TREE"
)
sites <- c(large_sites, small_sites)
# Set up directories
if (!dir.exists("data")) {
dir.create("data")
}
if (!dir.exists("outputs")) {
dir.create("outputs")
}
# Path to cached dataset
bird_counts_path <- file.path("data", "bird_counts.RData")
# Load cached data if available; otherwise download and save
if (file.exists(bird_counts_path)) {
load(bird_counts_path)
} else {
bird.counts <- loadByProduct(
dpID = "DP1.10003.001",
package = "basic",
release = "RELEASE-2026",
token = Sys.getenv("NEON_TOKEN"),
site = sites,
check.size = FALSE
)
save(bird.counts, file = bird_counts_path)
}
NEON provides separate tables for point-count survey metadata
(brd_perpoint) and the bird counts themselves
(brd_countdata).
Take a look at their structure (that is, what fields are included)
and the data in them. We see that brd_perpoint contains
fields describing the location, elevation, date and field-crew recorded
weather conditions for each point-count survey.
str(bird.counts$brd_perpoint)
## 'data.frame': 27076 obs. of 32 variables:
## $ uid : chr "30d14ecc-65a8-4e73-bcb3-d5132c153da2" "f97b6fa3-5719-4c0d-9922-1b23ec949911" "6fdac273-77b0-4e30-86d1-f6ff94f92e66" "ff9f059e-46fa-471a-a41d-2ac759881ff0" ...
## $ namedLocation : chr "BART_025.birdGrid.brd" "BART_025.birdGrid.brd" "BART_025.birdGrid.brd" "BART_025.birdGrid.brd" ...
## $ domainID : chr "D01" "D01" "D01" "D01" ...
## $ siteID : chr "BART" "BART" "BART" "BART" ...
## $ plotID : chr "BART_025" "BART_025" "BART_025" "BART_025" ...
## $ plotType : chr "distributed" "distributed" "distributed" "distributed" ...
## $ pointID : chr "3" "2" "1" "4" ...
## $ nlcdClass : chr "evergreenForest" "evergreenForest" "evergreenForest" "evergreenForest" ...
## $ decimalLatitude : num 44.1 44.1 44.1 44.1 44.1 ...
## $ decimalLongitude : num -71.3 -71.3 -71.3 -71.3 -71.3 ...
## $ geodeticDatum : chr "WGS84" "WGS84" "WGS84" "WGS84" ...
## $ coordinateUncertainty : num 250 250 250 250 250 ...
## $ elevation : num 576 576 576 576 576 ...
## $ elevationUncertainty : num 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.8 0.8 0.8 ...
## $ startDate : POSIXct, format: "2015-06-14 09:23:00" "2015-06-14 09:43:00" ...
## $ boutNumber : num 1 1 1 1 1 1 1 1 1 1 ...
## $ samplingImpracticalRemarks: chr NA NA NA NA ...
## $ samplingImpractical : chr "OK" "OK" "OK" "OK" ...
## $ eventID : chr "BRD.BART.2015.1" "BRD.BART.2015.1" "BRD.BART.2015.1" "BRD.BART.2015.1" ...
## $ startCloudCoverPercentage : num 20 20 20 20 20 20 20 100 100 100 ...
## $ endCloudCoverPercentage : num 40 40 40 40 40 40 40 100 100 100 ...
## $ startRH : num 72 72 72 72 72 72 72 93 93 93 ...
## $ endRH : num 56 56 56 56 56 56 56 85 85 85 ...
## $ observedHabitat : chr "evergreen forest" "deciduous forest" "mixed deciduous/evergreen forest" "deciduous forest" ...
## $ observedAirTemp : num 18 19 17 19 16 18 18 13 13 14 ...
## $ kmPerHourObservedWindSpeed: num 1 3 0 0 0 0 1 0 0 0 ...
## $ laboratoryName : chr "Bird Conservancy of the Rockies" "Bird Conservancy of the Rockies" "Bird Conservancy of the Rockies" "Bird Conservancy of the Rockies" ...
## $ samplingProtocolVersion : chr "NEON.DOC.014041vG" "NEON.DOC.014041vG" "NEON.DOC.014041vG" "NEON.DOC.014041vG" ...
## $ remarks : chr NA NA NA NA ...
## $ measuredBy : chr "JRUEB" "JRUEB" "JRUEB" "JRUEB" ...
## $ publicationDate : chr "20260106T223419Z" "20260106T223419Z" "20260106T223419Z" "20260106T223419Z" ...
## $ release : chr "RELEASE-2026" "RELEASE-2026" "RELEASE-2026" "RELEASE-2026" ...
head(bird.counts$brd_perpoint, n=5)
## uid namedLocation domainID siteID
## 1 30d14ecc-65a8-4e73-bcb3-d5132c153da2 BART_025.birdGrid.brd D01 BART
## 2 f97b6fa3-5719-4c0d-9922-1b23ec949911 BART_025.birdGrid.brd D01 BART
## 3 6fdac273-77b0-4e30-86d1-f6ff94f92e66 BART_025.birdGrid.brd D01 BART
## 4 ff9f059e-46fa-471a-a41d-2ac759881ff0 BART_025.birdGrid.brd D01 BART
## 5 e3f73906-fe6f-4f88-a05e-067a567d6516 BART_025.birdGrid.brd D01 BART
## plotID plotType pointID nlcdClass decimalLatitude decimalLongitude
## 1 BART_025 distributed 3 evergreenForest 44.06015 -71.31548
## 2 BART_025 distributed 2 evergreenForest 44.06015 -71.31548
## 3 BART_025 distributed 1 evergreenForest 44.06015 -71.31548
## 4 BART_025 distributed 4 evergreenForest 44.06015 -71.31548
## 5 BART_025 distributed 5 evergreenForest 44.06015 -71.31548
## geodeticDatum coordinateUncertainty elevation elevationUncertainty
## 1 WGS84 250.3 575.8 0.3
## 2 WGS84 250.3 575.8 0.3
## 3 WGS84 250.3 575.8 0.3
## 4 WGS84 250.3 575.8 0.3
## 5 WGS84 250.3 575.8 0.3
## startDate boutNumber samplingImpracticalRemarks samplingImpractical
## 1 2015-06-14 09:23:00 1 <NA> OK
## 2 2015-06-14 09:43:00 1 <NA> OK
## 3 2015-06-14 10:31:00 1 <NA> OK
## 4 2015-06-14 11:24:00 1 <NA> OK
## 5 2015-06-14 12:20:00 1 <NA> OK
## eventID startCloudCoverPercentage endCloudCoverPercentage startRH
## 1 BRD.BART.2015.1 20 40 72
## 2 BRD.BART.2015.1 20 40 72
## 3 BRD.BART.2015.1 20 40 72
## 4 BRD.BART.2015.1 20 40 72
## 5 BRD.BART.2015.1 20 40 72
## endRH observedHabitat observedAirTemp
## 1 56 evergreen forest 18
## 2 56 deciduous forest 19
## 3 56 mixed deciduous/evergreen forest 17
## 4 56 deciduous forest 19
## 5 56 deciduous forest 16
## kmPerHourObservedWindSpeed laboratoryName
## 1 1 Bird Conservancy of the Rockies
## 2 3 Bird Conservancy of the Rockies
## 3 0 Bird Conservancy of the Rockies
## 4 0 Bird Conservancy of the Rockies
## 5 0 Bird Conservancy of the Rockies
## samplingProtocolVersion remarks measuredBy publicationDate release
## 1 NEON.DOC.014041vG <NA> JRUEB 20260106T223419Z RELEASE-2026
## 2 NEON.DOC.014041vG <NA> JRUEB 20260106T223419Z RELEASE-2026
## 3 NEON.DOC.014041vG <NA> JRUEB 20260106T223419Z RELEASE-2026
## 4 NEON.DOC.014041vG <NA> JRUEB 20260106T223419Z RELEASE-2026
## 5 NEON.DOC.014041vG <NA> JRUEB 20260106T223419Z RELEASE-2026
brd_countdata contains data for the birds observed
during each point-count survey, including identification, observation
method, distance, and observed minute.
str(bird.counts$brd_countdata)
## 'data.frame': 373518 obs. of 26 variables:
## $ uid : chr "08b95360-854d-4d74-a276-93fc7d2423ca" "f2d2d4ce-c978-46a5-befb-4d3b749b1cdb" "754c7b57-1892-4f1f-9f49-594c40d47bfa" "6212a830-6d2d-4999-b169-19273526790e" ...
## $ namedLocation : chr "BART_025.birdGrid.brd" "BART_025.birdGrid.brd" "BART_025.birdGrid.brd" "BART_025.birdGrid.brd" ...
## $ domainID : chr "D01" "D01" "D01" "D01" ...
## $ siteID : chr "BART" "BART" "BART" "BART" ...
## $ plotID : chr "BART_025" "BART_025" "BART_025" "BART_025" ...
## $ plotType : chr "distributed" "distributed" "distributed" "distributed" ...
## $ pointID : chr "3" "3" "3" "3" ...
## $ startDate : POSIXct, format: "2015-06-14 09:23:00" "2015-06-14 09:23:00" ...
## $ boutNumber : num 1 1 1 1 1 1 1 1 1 1 ...
## $ eventID : chr "BRD.BART.2015.1" "BRD.BART.2015.1" "BRD.BART.2015.1" "BRD.BART.2015.1" ...
## $ pointCountMinute : num 1 1 2 2 1 3 4 1 1 6 ...
## $ targetTaxaPresent : chr "Y" "Y" "Y" "Y" ...
## $ taxonID : chr "REVI" "BTNW" "BTNW" "BAWW" ...
## $ scientificName : chr "Vireo olivaceus" "Setophaga virens" "Setophaga virens" "Mniotilta varia" ...
## $ taxonRank : chr "species" "species" "species" "species" ...
## $ vernacularName : chr "Red-eyed Vireo" "Black-throated Green Warbler" "Black-throated Green Warbler" "Black-and-white Warbler" ...
## $ observerDistance : num 9 12 50 17 42 27 62 26 21 46 ...
## $ detectionMethod : chr "singing" "singing" "singing" "singing" ...
## $ visualConfirmation : chr "No" "No" "No" "No" ...
## $ sexOrAge : chr "Male" "Male" "Male" "Male" ...
## $ clusterSize : num 1 1 1 1 1 1 1 1 1 1 ...
## $ clusterCode : chr NA NA NA NA ...
## $ identifiedBy : chr "JRUEB" "JRUEB" "JRUEB" "JRUEB" ...
## $ identificationHistoryID: chr NA NA NA NA ...
## $ publicationDate : chr "20260106T223419Z" "20260106T223419Z" "20260106T223419Z" "20260106T223419Z" ...
## $ release : chr "RELEASE-2026" "RELEASE-2026" "RELEASE-2026" "RELEASE-2026" ...
head(bird.counts$brd_countdata, n=5)
## uid namedLocation domainID siteID
## 1 08b95360-854d-4d74-a276-93fc7d2423ca BART_025.birdGrid.brd D01 BART
## 2 f2d2d4ce-c978-46a5-befb-4d3b749b1cdb BART_025.birdGrid.brd D01 BART
## 3 754c7b57-1892-4f1f-9f49-594c40d47bfa BART_025.birdGrid.brd D01 BART
## 4 6212a830-6d2d-4999-b169-19273526790e BART_025.birdGrid.brd D01 BART
## 5 192585ee-0a3d-4994-856f-3da8090a37c3 BART_025.birdGrid.brd D01 BART
## plotID plotType pointID startDate boutNumber eventID
## 1 BART_025 distributed 3 2015-06-14 09:23:00 1 BRD.BART.2015.1
## 2 BART_025 distributed 3 2015-06-14 09:23:00 1 BRD.BART.2015.1
## 3 BART_025 distributed 3 2015-06-14 09:23:00 1 BRD.BART.2015.1
## 4 BART_025 distributed 3 2015-06-14 09:23:00 1 BRD.BART.2015.1
## 5 BART_025 distributed 3 2015-06-14 09:23:00 1 BRD.BART.2015.1
## pointCountMinute targetTaxaPresent taxonID scientificName taxonRank
## 1 1 Y REVI Vireo olivaceus species
## 2 1 Y BTNW Setophaga virens species
## 3 2 Y BTNW Setophaga virens species
## 4 2 Y BAWW Mniotilta varia species
## 5 1 Y BCCH Poecile atricapillus species
## vernacularName observerDistance detectionMethod
## 1 Red-eyed Vireo 9 singing
## 2 Black-throated Green Warbler 12 singing
## 3 Black-throated Green Warbler 50 singing
## 4 Black-and-white Warbler 17 singing
## 5 Black-capped Chickadee 42 singing
## visualConfirmation sexOrAge clusterSize clusterCode identifiedBy
## 1 No Male 1 <NA> JRUEB
## 2 No Male 1 <NA> JRUEB
## 3 No Male 1 <NA> JRUEB
## 4 No Male 1 <NA> JRUEB
## 5 No Male 1 <NA> JRUEB
## identificationHistoryID publicationDate release
## 1 <NA> 20260106T223419Z RELEASE-2026
## 2 <NA> 20260106T223419Z RELEASE-2026
## 3 <NA> 20260106T223419Z RELEASE-2026
## 4 <NA> 20260106T223419Z RELEASE-2026
## 5 <NA> 20260106T223419Z RELEASE-2026
Before we can build detection histories and fit an occupancy model, we need to clean the data a bit. This includes:
Each point-count survey needs a unique identifier so that we can
match each point-count survey record (brd_perpoint) to its corresponding
bird data (brd_countdata). We extract the survey year from eventID and
then combine the site, plot, point, year, and bout into a single key
called pointSurveyID for each of the tables. Note, because the siteID is
already included in the plotID (e.g., BART_012), siteID is not actually
included in the construction of pointSurveyID.
brd_perpoint_clean <- bird.counts$brd_perpoint %>%
mutate(year = str_extract(eventID, "\\d{4}")) %>%
mutate(pointSurveyID = paste(plotID, "point", pointID, year, "bout", boutNumber, sep = "_"))
brd_countdata_clean <- bird.counts$brd_countdata %>%
mutate(year = str_extract(eventID, "\\d{4}")) %>%
mutate(pointSurveyID = paste(plotID, "point", pointID, year, "bout", boutNumber, sep = "_"))
The samplingImpractical field flags point-count surveys
that could not be completed (e.g., weather, access issues). For our
purposes, we keep only surveys marked as “OK”.
brd_perpoint_clean <- brd_perpoint_clean %>%
filter(samplingImpractical == "OK")
Point-count surveys were not conducted consistently at all sites in the early years of the NEON program. To ensure a consistent survey effort across sites and years, we remove surveys conducted before 2017.
brd_perpoint_clean <- brd_perpoint_clean %>%
filter(year >= 2017)
For this workflow, we only focus on detections identified to the
species level. Records that are not identified to species (or have
NA in taxonRank) are dropped.
brd_countdata_clean <- brd_countdata_clean %>%
filter(taxonRank == "species")
Finally, we join the cleaned detection data
(brd_countdata_clean) with the cleaned per-point metadata
(brd_perpoint_clean) using the pointSurveyID
identifer we created earlier as the primary key. After the join, we drop
duplicate columns and keep a single, combined dataset.
#Join the tables
brd_joineddata_clean <- inner_join(
brd_countdata_clean,
brd_perpoint_clean,
by = "pointSurveyID",
suffix = c(".count", ".perpoint")
)
#Remove duplicate columns between tables
brd_joineddata_clean <- brd_joineddata_clean %>%
select(-matches("\\.perpoint$"))
names(brd_joineddata_clean) <- gsub("\\.count$", "", names(brd_joineddata_clean))
head(brd_joineddata_clean, n=5)
## uid namedLocation domainID siteID
## 1 28587fd5-e9c6-4dd9-ac2f-2a80d9d69ec2 BART_018.birdGrid.brd D01 BART
## 2 09f4496e-72c5-42ff-a4bf-aadcbf739397 BART_018.birdGrid.brd D01 BART
## 3 8721c42a-1e64-4c60-a40f-26b41ec3f51a BART_018.birdGrid.brd D01 BART
## 4 8ea10abd-5835-4fda-b592-ba3533c3311a BART_018.birdGrid.brd D01 BART
## 5 39915ebd-b013-4681-9903-9debb0ab7166 BART_018.birdGrid.brd D01 BART
## plotID plotType pointID startDate boutNumber eventID
## 1 BART_018 distributed 1 2017-06-21 09:03:00 1 BRD.BART.2017.1
## 2 BART_018 distributed 1 2017-06-21 09:03:00 1 BRD.BART.2017.1
## 3 BART_018 distributed 1 2017-06-21 09:03:00 1 BRD.BART.2017.1
## 4 BART_018 distributed 1 2017-06-21 09:03:00 1 BRD.BART.2017.1
## 5 BART_018 distributed 1 2017-06-21 09:03:00 1 BRD.BART.2017.1
## pointCountMinute targetTaxaPresent taxonID scientificName taxonRank
## 1 1 Y BTNW Setophaga virens species
## 2 2 Y RBWO Melanerpes carolinus species
## 3 1 Y REVI Vireo olivaceus species
## 4 1 Y OVEN Seiurus aurocapilla species
## 5 1 Y HETH Catharus guttatus species
## vernacularName observerDistance detectionMethod
## 1 Black-throated Green Warbler 40 singing
## 2 Red-bellied Woodpecker 20 calling
## 3 Red-eyed Vireo 45 singing
## 4 Ovenbird 40 singing
## 5 Hermit Thrush 50 singing
## visualConfirmation sexOrAge clusterSize clusterCode identifiedBy
## 1 No Male 1 <NA> JRUEB
## 2 No Unknown 1 <NA> JRUEB
## 3 No Male 1 <NA> JRUEB
## 4 No Male 1 <NA> JRUEB
## 5 No Male 1 <NA> JRUEB
## identificationHistoryID publicationDate release year
## 1 <NA> 20260106T223356Z RELEASE-2026 2017
## 2 <NA> 20260106T223356Z RELEASE-2026 2017
## 3 <NA> 20260106T223356Z RELEASE-2026 2017
## 4 <NA> 20260106T223356Z RELEASE-2026 2017
## 5 <NA> 20260106T223356Z RELEASE-2026 2017
## pointSurveyID nlcdClass decimalLatitude decimalLongitude
## 1 BART_018_point_1_2017_bout_1 mixedForest 44.05997 -71.2779
## 2 BART_018_point_1_2017_bout_1 mixedForest 44.05997 -71.2779
## 3 BART_018_point_1_2017_bout_1 mixedForest 44.05997 -71.2779
## 4 BART_018_point_1_2017_bout_1 mixedForest 44.05997 -71.2779
## 5 BART_018_point_1_2017_bout_1 mixedForest 44.05997 -71.2779
## geodeticDatum coordinateUncertainty elevation elevationUncertainty
## 1 WGS84 250.1 297.2 0.2
## 2 WGS84 250.1 297.2 0.2
## 3 WGS84 250.1 297.2 0.2
## 4 WGS84 250.1 297.2 0.2
## 5 WGS84 250.1 297.2 0.2
## samplingImpracticalRemarks samplingImpractical startCloudCoverPercentage
## 1 <NA> OK 100
## 2 <NA> OK 100
## 3 <NA> OK 100
## 4 <NA> OK 100
## 5 <NA> OK 100
## endCloudCoverPercentage startRH endRH observedHabitat observedAirTemp
## 1 100 73 78 deciduous forest 15
## 2 100 73 78 deciduous forest 15
## 3 100 73 78 deciduous forest 15
## 4 100 73 78 deciduous forest 15
## 5 100 73 78 deciduous forest 15
## kmPerHourObservedWindSpeed laboratoryName
## 1 1 Bird Conservancy of the Rockies
## 2 1 Bird Conservancy of the Rockies
## 3 1 Bird Conservancy of the Rockies
## 4 1 Bird Conservancy of the Rockies
## 5 1 Bird Conservancy of the Rockies
## samplingProtocolVersion remarks measuredBy
## 1 NEON.DOC.014041vH <NA> JRUEB
## 2 NEON.DOC.014041vH <NA> JRUEB
## 3 NEON.DOC.014041vH <NA> JRUEB
## 4 NEON.DOC.014041vH <NA> JRUEB
## 5 NEON.DOC.014041vH <NA> JRUEB
Occupancy models require detection histories, which record whether a species was detected or not. For single-species occupancy models, detection histories are typically represented as a table of 0s and 1s, where:
A value of 1 indicates the species was detected during that survey, while 0 indicates it was not detected.
The way detection histories are constructed depends on the research question. For NEON observational data, locations could be defined at the plot or site level, depending on the scale of the analysis. Survey could also be defined in different ways. For example, individual surveys could be aggregated within a year, month, or other time period.
In this tutorial, we construct detection histories at the site level and define survey occasions by year (i.e., each cell in our table will correspond to a site-year combination)
First, select a species to model. Here we use Melanerpes carolinus (Red-bellied Woodpecker), which is widespread and frequently detected.
We filter the combined dataset we just prepared to include only observations of this species.
# Filter table by species
bird_species <- "Melanerpes carolinus"
brd_single_species <- brd_joineddata_clean %>%
filter(scientificName == bird_species)
Next, we create a table of all site–year combinations where surveys actually occurred - Remember that surveys marked as incomplete were removed above. This ensures that when we build the detection history, site–year surveys that did not have valid surveys well be recorded as NA, not 0.
# Get all valid site-year combinations across sites (so we can fill in NAs)
valid_site_years <- brd_perpoint_clean %>%
distinct(siteID, year)
brd_single_species contains only presence records for
our species. Here, we summarize those observations so that each
site–year combination where the species was detected at least once is
recorded with a value of 1. We will use this table to fill in the
presence values when constructing the detection history.
# Get detections (so we can fill in 0/1s)
detections <- brd_single_species %>%
group_by(siteID, year) %>%
summarize(present = 1, .groups = "drop")
Finally, we will combine the tables and reshape the data. Any site–year combinations without a detection are filled with 0, indicating the species was not detected. The data are then reshaped so that each row represents a site and each column represents a year, as explained above. We also sort the table alphabetically by siteID and chronologically by year.
# Join and fill
detection_history <- valid_site_years %>%
left_join(detections, by = c("siteID", "year")) %>%
mutate(present = replace_na(present, 0)) %>%
pivot_wider(
names_from = year,
values_from = present
) %>%
arrange(siteID) %>% # sort siteIDs
select(siteID, sort(names(.)[-1])) # sort surveys
head(detection_history, n=10)
## # A tibble: 10 × 9
## siteID `2017` `2018` `2019` `2020` `2021` `2022` `2023` `2024`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ABBY 0 0 0 NA 0 0 0 0
## 2 BARR 0 0 0 NA 0 0 0 0
## 3 BART 1 0 0 0 0 0 0 0
## 4 BLAN 1 1 1 1 1 1 1 1
## 5 BONA 0 0 0 0 0 0 0 0
## 6 CLBJ 1 1 1 1 1 1 1 1
## 7 CPER 0 0 0 0 0 0 0 0
## 8 DCFS 0 0 0 0 0 0 NA 0
## 9 DEJU 0 0 0 0 0 0 0 0
## 10 DELA 1 1 1 1 1 1 1 1
Now that we have constructed a detection history, we can fit a simple occupancy model! In this example, we fit a single-species, single-season occupancy model for our woodpecker.
A single-season occupancy model assumes that the occupancy state of each site does not change during the period represented by the surveys. Remember this model will estimate two key parameters:
The basic input for nearly every occupancy analysis in RPresence is
an object of class pao, which stands for
presence–absence object. This object contains the detection
history and any associated data needed to fit an occupancy model.
A pao object is created using the
createPao() function. At a minimum, this function requires
a detection history containing the detection (=1) and non-detection (=0)
data with observations from a sampling unit along a row, and surveys in
the columns. Missing values are allowed (=NA).
Additional arguments allow users to include site-level or survey-level covariates but we will not include those in this initial model.
In this example, we provide two arguments:
birds_pao_simple <- createPao(
data = detection_history[, -1], # detection/nondetection data (survey columns only) from detection_history
unitnames = detection_history$siteID # siteIDs from detection_history
)
Once the detection history has been converted into a pao
object, we can fit the occupancy model using the occMod()
function from the RPresence package.
There are several arguments to the occMod() function, but here are the ones needed for this model:
pao data object.The model formulas follow the standard R formula syntax:
parameter ~ predictors
In this first example, we fit a null model, meaning that both occupancy and detection probability for our bird are assumed to be constant across all sites and surveys. This means we will have one resulting value for ψ (psi) and one resulting value for p. This is written as:
psi ~ 1, p ~ 1
The formula can also include predictors to allow parameters to vary
across surveys or sites. In RPresence, a factor called
SURVEY is automatically included in the pao object to allow
for survey-specific detection probabilities.
For example, we could use the formula p ~ SURVEY to
specify a model in which detection probability varies by survey,
resulting in a separate value of p for each survey.
In practice, detection and occupancy are often modeled as functions of site or survey covariates, such as habitat characteristics or weather conditions. We will explore how to include these types of predictors later in the tutorial.
birds_occupancy_simple <- occMod(
data = birds_pao_simple,
model = list(
psi ~ 1,
p ~ 1), # p ~ SURVEY
type = 'so'
)
summary(birds_occupancy_simple)
## Model name=psi(1)p(1)
## AIC=207.9761
## -2*log-likelihood=203.9761
## num. par=2
## Warning: Numerical convergence may not have been reached. Parameter estimates converged to approximately 7 signifcant digits.
The model summary includes a few statistics describing model fit.
The values reported are:
In later sections, we will fit additional models and use AIC comparisons to determine whether including environmental covariates improves model performance.
Now that we have a model, we can examine our estimated values. We can
use the print_one_site_estimates() function to retrieve our
estimates, and then do some light data wrangling to shape the results
into a presentable table.
ests <- as.data.frame(print_one_site_estimates(mod = birds_occupancy_simple))
#Rename some values for readability
ests <- ests[1:2,] %>%
mutate(parm = c("psi", "p")) %>%
select(parm, est, se, lower, upper) %>%
rename(.,
c(
Parameter = parm,
Estimate = est,
SE = se,
L95 = lower,
U95 = upper
)
) %>%
`rownames<-`(seq_len(nrow(ests[1:2,])))
ests
## Parameter Estimate SE L95 U95
## 1 psi 0.4681362 0.07279173 0.3316722 0.6095392
## 2 p 0.8604286 0.02643372 0.8001831 0.9046737
From these results, we see that Melanerpes carolinus has an estimated occupancy probability (ψ) of about 0.47. This means that if you randomly select a NEON site, there is roughly a 47% chance that the species can be found at that site—equivalently, the species is present at about 47% of NEON sites in this dataset. We can see that if we overlay a range map for Melanerpes carolinus, it covers roughly half of NEON sites.
U.S. Geological Survey (USGS) - Gap Analysis Project (GAP), 2018, Red-bellied Woodpecker (Melanerpes carolinus) bRBWOx_CONUS_2001v1 Range Map: U.S. Geological Survey data release, https://doi.org/10.5066/F7JW8CX7.
library(sf)
library(rnaturalearth)
melanerpes_carolinus_range <- st_read("data/bRBWOx_CONUS_Range_2001v1/bRBWOx_CONUS_Range_2001v1.shp", quiet = TRUE) %>%
st_transform(crs = 4326)
site_map <- brd_perpoint_clean %>%
select(siteID, decimalLatitude, decimalLongitude) %>%
distinct(siteID, .keep_all = TRUE) %>%
left_join(
brd_single_species %>%
distinct(siteID) %>%
mutate(present = 1),
by = "siteID"
) %>%
mutate(present = ifelse(is.na(present), 0, present))
usa <- bind_rows(
ne_states(country = "United States of America", returnclass = "sf"),
ne_states(country = "Puerto Rico", returnclass = "sf")
) %>%
st_transform(4326)
ggplot() +
geom_sf(data = usa, fill = NA, color = "black") +
geom_sf(data = melanerpes_carolinus_range,
fill = "lightgreen",
color = NA,
alpha = 0.4) +
geom_point(data = site_map,
aes(x = decimalLongitude,
y = decimalLatitude,
color = factor(present)),
size = 2) +
scale_color_manual(
values = c("0" = "gray70", "1" = "darkgreen"),
labels = c("Not detected", "Detected"),
name = "Species observed"
) +
coord_sf(
xlim = c(-170, -60),
ylim = c(15, 72),
expand = FALSE
) +
labs(
x = "Longitude",
y = "Latitude",
title = "Red-bellied Woodpecker Range and NEON Site Detections"
) +
theme_bw()
At this broad scale, this estimate is mostly descriptive and could be approximated directly from the data. Occupancy estimates become more informative when applied at finer spatial scales or when combined with covariates, which help explain why species occurs in some locations but not others.
The estimated detection probability (p) is about 0.86, meaning that if the species is present at a site, there is about a 86% chance it will be detected.
Now that we have demonstrated a simple occupancy model, we can repeat the same workflow for additional species and compare their occupancy and detection probabilities.
Below we fit the same model for:
For each species, we construct a detection history, create a pao
object, and fit the same simple occupancy model
(psi ~ 1, p ~ 1), following the same steps as above.
# Function to run the simple occupancy model workflow for any species
run_simple_species_model <- function(species_name, common_name){
brd_species <- brd_joineddata_clean %>%
filter(siteID %in% sites,
scientificName == species_name)
detections <- brd_species %>%
group_by(siteID, year) %>%
summarize(present = 1, .groups = "drop")
detection_history <- valid_site_years %>%
left_join(detections, by = c("siteID", "year")) %>%
mutate(present = replace_na(present, 0)) %>%
pivot_wider(names_from = year, values_from = present) %>%
arrange(siteID) %>%
select(siteID, sort(names(.)[-1]))
pao <- createPao(
data = detection_history[, -1],
unitnames = detection_history$siteID,
title = common_name
)
model <- occMod(
data = pao,
model = list(
psi ~ 1,
p ~ 1
),
type = "so"
)
return(model)
}
Now we run the model for both species.
owl_model <- run_simple_species_model(
species_name = "Bubo virginianus",
common_name = "Great Horned Owl"
)
jay_model <- run_simple_species_model(
species_name = "Cyanocitta cristata",
common_name = "Blue Jay"
)
Next, we extract the estimated occupancy (ψ) and detection probability (p) for each species.
# Function to extract and rename estimates for simple model
extract_estimates <- function(model){
ests <- as.data.frame(print_one_site_estimates(mod = model))
ests <- ests[1:2,] %>%
mutate(Parameter = c("psi", "p")) %>%
select(Parameter, est, se, lower, upper) %>%
rename(
Estimate = est,
SE = se,
L95 = lower,
U95 = upper
)
return(ests)
}
owl_estimates <- extract_estimates(owl_model) %>%
mutate(Species = "Great Horned Owl")
jay_estimates <- extract_estimates(jay_model) %>%
mutate(Species = "Blue Jay")
# Combine estimates for plotting
species_estimates <- bind_rows(owl_estimates, jay_estimates)
We’ll visualize the estimates in a plot.
ggplot(species_estimates,
aes(x = Parameter, y = Estimate, fill = Species)) +
geom_col(position = position_dodge(width = 0.7), width = 0.6) +
geom_errorbar(aes(ymin = L95, ymax = U95),
position = position_dodge(width = 0.7),
width = 0.15) +
scale_x_discrete(labels = c(
psi = "Occupancy (ψ)",
p = "Detection (p)"
)) +
scale_y_continuous(limits = c(0, 1)) +
labs(
x = NULL,
y = "Estimated Probability",
title = "Occupancy and Detection Estimates by Species"
) +
theme_bw()
Here we can see that although Blue Jays and Great Horned Owls have similar occupancy probabilities (.51 and .56), their detection probabilities differ dramatically (.90 and .22). Blue Jays are highly detectable during point-count surveys, whereas Great Horned Owls are much more difficult to detect under the same protocol.
In reality, occupancy and detection probabilities are rarely constant across all sites and surveys. Many factors can influence whether a species occurs at a site and whether it is detected during a survey. These factors can be included in occupancy models as covariates.
Site characteristics can influence occupancy or detection. For example:
Survey characteristics can only affect detection. For example:
By including these variables in our model, we can either:
So far, our model assumed that occupancy and detection probabilities were constant across all sites and surveys. In reality, site characteristics often influence whether a species occupies a location or is more easily detected. We can test these effects by including site covariates in the occupancy model. Remember that site characteristics can influence both occupancy and/or detection.
In this example, we examine whether elevation influences the occupancy probability of Melanerpes carolinus (Red-bellied Woodpecker).
Site-level covariates must be provided to createPao() as
a separate table with one row per site. The rows must correspond to the
same sites used in the detection history.
Here we extract the elevation for each site from the combined dataset
brd_joineddata_clean we prepared at the beginning of the
tutorial.
Data Tip: For the bird dataset, NEON includes several common environmental variables, making it easy to incorporate these characteristics directly into analyses. More broadly, NEON collects a wide range of co-located environmental measurements. These additional data products can be used to create covariates describing habitat, climate, or other conditions for occupancy models.
site_cov <- brd_joineddata_clean %>%
select(siteID, elevation) %>%
distinct(siteID, .keep_all = TRUE) %>%
arrange(siteID) %>%
tibble::column_to_rownames("siteID")
head(site_cov, n=10)
## elevation
## ABBY 386.6
## BARR 5.6
## BART 297.2
## BLAN 137.4
## BONA 403.6
## CLBJ 312.5
## CPER 1637.1
## DCFS 590.4
## DEJU 453.3
## DELA 23.7
Note that other types of site covariates could also be used. For example, categorical variables such as land cover class could be included instead. When including categorical variables as covariates, they must be converted to factors.
# Example categorical covariate
site_cov_categorical <- brd_perpoint_clean %>%
select(siteID, nlcdClass) %>%
distinct(siteID, .keep_all = TRUE) %>%
mutate(nlcdClass = as.factor(nlcdClass)) %>%
arrange(siteID) %>%
tibble::column_to_rownames("siteID")
site_cov_categorical$nlcdClass <- factor(site_cov_categorical$nlcdClass)
head(site_cov_categorical, n=10)
## nlcdClass
## ABBY grasslandHerbaceous
## BARR emergentHerbaceousWetlands
## BART mixedForest
## BLAN deciduousForest
## BONA shrubScrub
## CLBJ deciduousForest
## CPER grasslandHerbaceous
## DCFS grasslandHerbaceous
## DEJU shrubScrub
## DELA woodyWetlands
Next, we create a new pao object that includes the site covariate
table. We will use the same detection_history table as
before.
birds_pao_sitecov <- createPao(
data = detection_history[, -1], # detection/nondetection data (survey columns only) from detection_history
unitnames = detection_history$siteID, # site IDs from detection_history
unitcov = site_cov # add the site covariates
)
We now fit a model where occupancy varies with elevation, while detection probability remains constant.
In this model, the formula psi ~ elevation specifies
that occupancy probability is modeled as a linear function of elevation,
meaning the model estimates how occupancy probability changes as
elevation increases or decreases. As a result, occupancy probability
will not be a single value but will vary depending on the elevation of
each site.
Other functional forms can also be used, but aren’t included in this tutorial. For example:
psi ~ elevation + I(elevation^2) could model a
polynomial relationship. For example, a species might prefer
mid-elevation forests but be less common at both low and high
elevations.psi ~ elevation + forest_cover could include multiple
covariates influencing occupancy. For example, a bird may be more likely
to occur at higher elevations plus areas with greater forest cover.psi ~ elevation * forest_cover could model an
interaction between variables. For example, forest cover might increase
occupancy at low elevations but have little effect at high
elevations.Detection probability could also be modeled as a function of site
covariates in the same way. In this example, however, we keep detection
probability constant using p ~ 1 so that we can focus on
how elevation influences occupancy.
birds_occupancy_sitecov <- occMod(
data = birds_pao_sitecov,
model = list(psi ~ elevation,
p ~ 1),
type = 'so'
)
Now that the model has been fit, we can examine the results to see whether elevation improves model performance and how it influences occupancy probability. We will do this by:
We can compare the new model with elevation to the original model without to see whether including elevation improves the model fit. One common way to compare models is using AIC (Akaike Information Criterion), which balances model fit with model complexity. In general, models with lower AIC values are preferred.
The output table also includes additional statistics, such as the −2 log-likelihood (neg2ll) and other model parameters. These metrics can also be used to evaluate model performance, but we will focus on AIC for this tutorial.
mod.list <- list(
birds_occupancy_simple, # the model with formula psi ~ 1, p ~ 1
birds_occupancy_sitecov # the model with formula psi ~ elevation, p ~ 1
)
aictable <- createAicTable(mod.list = mod.list)
aictable$table
## Model AIC neg2ll npar warn.conv warn.VC DAIC modlike
## 2 psi(elevation)p(1) 191.7794 185.7794 3 7 0 0.0000 1e+00
## 1 psi(1)p(1) 207.9761 203.9761 2 7 0 16.1967 3e-04
## wgt
## 2 0.9997
## 1 0.0003
Looking at the two models, the model including elevation has a lower AIC value (191.7794 vs 207.9761). Because AIC balances model fit with the number of parameters, the lower value suggests that the model including elevation provides a better and more parsimonious explanation of the data.
We can also calculate a p-value using a likelihood ratio test (LRT)
to quantify whether the model including elevation provides a
significantly better fit than the simpler model without it. This method
is appropriate for “nested” models only, meaning the simpler model is a
special case of the more complex one. In this case, the simple model
(psi ~ 1) is a reduced version of the model that includes
elevation (psi ~ elevation).
The likelihood ratio test compares how well the two models explain the data by examining the difference in their −2 log-likelihood values. The difference in −2 log-likelihood values is compared to a chi-square distribution, where the degrees of freedom equals the difference in the number of parameters between the two models.
#get the difference in the neg2loglike between the two models
diff <- abs(birds_occupancy_simple$neg2loglike - birds_occupancy_sitecov$neg2loglike)
#look up our observed difference on a chi-square distribution with 1 degree of freedom (the difference between the number of parameters)
pchisq(diff,
df = abs(birds_occupancy_simple$npar - birds_occupancy_sitecov$npar),
lower.tail = FALSE)
## [1] 1.992237e-05
The resulting p-value tests the null hypothesis that adding elevation does not improve the model fit. Because the p-value is very small (1.992237e-05), we reject the null hypothesis and conclude that including elevation significantly improves the model. Therefore, elevation appears to be an important predictor of occupancy for this species in our dataset!
We can also examine the estimated coefficients from the model to understand exactly how the covariate influences occupancy probability. In this case, the coefficient for elevation tells us the direction and strength of the relationship between elevation and the probability that a site is occupied.
coef(birds_occupancy_sitecov, 'psi', prob = 0.95)
## Beta_est se lowerCI upperCI
## psi_A1.Int 1.477853 0.378737 0.735542120 2.220163880
## psi_A1.Int.psi.elevation -0.002936 0.000668 -0.004245256 -0.001626744
The beta estimate for elevation is slightly negative (-0.002936), meaning that occupancy probability decreases as elevation increases.
The output also includes 95% confidence intervals for each estimate. In this case, the confidence interval for elevation (-0.004245256 to -0.001626744) does not include 0 and lies entirely below it. This indicates that the estimated relationship between elevation and occupancy is consistently negative across the range of plausible values for the coefficient. In other words, it is quite likely that occupancy probability decreases as elevation increases.
It can also be helpful to visually examine how occupancy probability changes across the elevation gradient. To do this, we extract the predicted occupancy probabilities for each site from the model output and plot them against site elevation.
data <- cbind(site_cov, birds_occupancy_sitecov$real$psi) %>%
distinct(.keep_all = TRUE) %>%
arrange(elevation)
ggplot(data, aes(x = elevation, y = est)) +
geom_point(size = 2, alpha = 0.8) +
labs(
x = "Elevation",
y = "Predicted Occupancy Probability"
) +
scale_y_continuous(limits = c(0, 1)) +
theme_bw()
Data Tip: An occupancy model with a linear effect will produce a curved (sigmoidal) prediction line when plotting probability. This is because technically the model is linear on the logit (log-odds) scale, and transforming log-odds back to probabilities creates a non-linear curve.
In addition to site characteristics, the conditions under which a survey is conducted can also influence whether a species is detected. Factors such as weather, time of day, or observer differences can affect how likely it is that an observer detects a species during a survey.
Unlike site covariates, survey covariate can only influence detection probability, not whether a species occupies a site.
In this example, we examine whether survey effort influences the detection probability of Melanerpes carolinus. Survey effort here is defined as the total number of minutes spent conducting bird point-count surveys at a site within a year. Although each survey lasts six minutes, the number of surveys conducted can vary among site–year combinations due to differences in sampling design (e.g., large vs. small sites) and the number of plots sampled at each site. Greater survey effort could provide more opportunities to detect a species if it is present.
NEON bird surveys follow a standardized protocol in which each
point-count survey lasts 6 minutes. To estimate survey effort for each
site–year combination, we count the number of point-count surveys
conducted and multiply by six minutes. Survey covariates supplied to
createPao() must be provided in the same order as the rows
and columns of detection_history, so we also sort by siteID
and year to match.
survey_effort <- brd_perpoint_clean %>% # we start with brd_perpoint_clean since we want all point-count surveys conducted, not just those that detected birds
distinct(pointSurveyID, siteID, year) %>%
group_by(siteID, year) %>%
summarize(
survey_effort = n() * 6, # the number of surveys multiplied by 6
.groups = "drop"
) %>%
pivot_wider(
names_from = year,
values_from = survey_effort,
values_fill = 0 # We can fill in NAs as 0s since no time was spend conducting surveys for incomplete surveys
) %>%
tibble::column_to_rownames("siteID") %>% # sort siteIDs alphabetically
select(sort(names(.))) # sort years
head(survey_effort, n=10)
## 2017 2018 2019 2020 2021 2022 2023 2024
## ABBY 258 156 240 0 228 228 228 204
## BARR 432 426 426 0 378 318 378 378
## BART 474 474 468 474 480 474 480 480
## BLAN 144 132 120 132 132 132 132 132
## BONA 810 522 540 534 540 540 534 540
## CLBJ 486 486 486 480 486 486 486 486
## CPER 810 540 540 540 540 540 540 540
## DCFS 240 240 240 240 240 240 0 240
## DEJU 540 540 540 540 540 540 540 540
## DELA 216 180 180 168 168 168 168 168
createPao() expects survey covariates in a single column
vector rather than a matrix, so we flatten the survey_effort matrix into
a vector.
# flatten the data frame
surveycov <- data.frame(
survey_effort = unlist(survey_effort)
#it is possible to add other survey covariates here
)
# remove the rownames
row.names(surveycov) <- NULL
head(surveycov)
## survey_effort
## 1 258
## 2 432
## 3 474
## 4 144
## 5 810
## 6 486
Next we create a new pao object that includes both the
site covariate (elevation) from before (since adding elevation provides
a better model fit) and the survey covariate (survey effort).
birds_pao_surveycov <- createPao(
data = detection_history[, -1], # detection/nondetection data (survey columns only) from detection_history
unitnames = detection_history$siteID, # site IDs from detection_history
unitcov = site_cov, # add the site covariates
survcov = surveycov # add the survey covariates
)
We now fit a model where:
birds_occupancy_surveycov <- occMod(
data = birds_pao_surveycov,
model = list(psi ~ elevation, p ~ survey_effort),
type = 'so'
)
summary(birds_occupancy_surveycov)
## Model name=psi(elevation)p(survey_effort)
## AIC=193.1907
## -2*log-likelihood=185.1907
## num. par=4
## Warning: Numerical convergence may not have been reached. Parameter estimates converged to approximately 7 signifcant digits.
We can evaluate the survey covariate model using the same approaches used above.
We can again compare models using AIC. Here we compare the simple original model, the site covariate elevation model, and the survey covariate survey_effort model.
mod.list <- list(
birds_occupancy_simple, # the model with formula psi ~ 1, p ~ 1
birds_occupancy_sitecov, # the model with formula psi ~ elevation, p ~ 1
birds_occupancy_surveycov # the model with formula psi ~ elevation, p ~ survey_effort
)
aictable <- createAicTable(mod.list = mod.list)
aictable$table
## Model AIC neg2ll npar warn.conv warn.VC
## 2 psi(elevation)p(1) 191.7794 185.7794 3 7 0
## 3 psi(elevation)p(survey_effort) 193.1907 185.1907 4 7 0
## 1 psi(1)p(1) 207.9761 203.9761 2 7 0
## DAIC modlike wgt
## 2 0.0000 1.0000 0.6693
## 3 1.4113 0.4938 0.3305
## 1 16.1967 0.0003 0.0002
Looking at the three models, the model including elevation only has the lowest AIC value (191.7794), followed by the model including elevation and survey effort (193.1907), and finally the simple model with no covariates (207.9761). This suggests that adding survey effort as a detection covariate does not improve the model relative to the elevation-only model, and the additional parameter does not provide enough improvement in model fit to justify the more complex model.
We can also calculate a p-value using a likelihood ratio test to
quantify whether adding survey effort improves the model relative to the
elevation-only model. Remember, we can do this because the
psi ~ elevation, p ~ 1 model is a simpler version of the
psi ~ elevation, p ~ survey_effort model.
#get the difference in the neg2loglike between the two models
diff <- abs(birds_occupancy_sitecov$neg2loglike - birds_occupancy_surveycov$neg2loglike)
#look up our observed difference on a chi-square distribution with 1 degree of freedom (the difference between the number of parameters)
pchisq(diff,
df = abs(birds_occupancy_sitecov$npar - birds_occupancy_surveycov$npar),
lower.tail = FALSE)
## [1] 0.4429223
The resulting p-value tests the null hypothesis that including survey effort does not improve the model fit. In this case, the p-value (0.4429223) is relatively large, meaning we do not have evidence that adding survey effort improves the model.
Lets take a look at the coeffients associated with our model.
coef(birds_occupancy_surveycov, 'p', prob = 0.95)
## Beta_est se lowerCI upperCI
## p1_B1.Int 2.279714 0.332403 1.628216092 2.9312119084
## p1_B1.Int.p.survey_effort -0.001102 0.000728 -0.002528854 0.0003248538
The coefficient for survey effort is slightly negative (-0.001102), meaning the model estimates that detection probability decreases slightly as survey effort increases. However, the 95% confidence interval for this estimate (-0.00253 to 0.00032) includes 0. This means the data are consistent with either a small negative effect, no effect, or even a very small positive effect. In other words, the coefficient estimates do not provide strong evidence that survey effort meaningfully influences detection probability either.
Finally, we can visualize how predicted detection probability changes across the range of survey effort values, just to see what it looks like, even though we are fairly confident that survey_effort does not have any real effect on detection for Melanerpes carolinus.
data <- cbind(surveycov, birds_occupancy_surveycov$real$p) %>%
distinct(.keep_all = TRUE) %>%
arrange(survey_effort)
ggplot(data, aes(x = survey_effort, y = est)) +
geom_point(size = 2, alpha = 0.8) +
labs(
x = "Survey Effort",
y = "Predicted Detection Probability"
) +
scale_y_continuous(limits = c(0, 1)) +
theme_bw()
Biologically, the fact that survey_effort does not have
any real effect on detection likely has to do with Melanerpes
carolinus’s high detection rate (80%-90%). Since the species is
already highly detectable, observers are likely to detect it quickly
when it is present, so increasing survey effort does not meaningfully
increase detection probability.
In this tutorial, we demonstrated how NEON breeding landbird data can be used to build detection histories and fit a single-species, single-season occupancy model. We estimated occupancy and detection probabilities, incorporated both site-level and survey-level covariates and examined ways to test model performance.
While this tutorial focused on a relatively simple, but powerful, model, occupancy modeling can be extended in several ways. For example, multi-season occupancy models allow occupancy probability to change over time (we assumed it was constant here), enabling researchers to estimate local colonization and extinction rates and their predictors for a species. Multi-species occupancy models can also be used to analyze patterns across entire communities at a location, allowing researchers to examine how species traits influence occupancy and detection probabilities.