R: Monte Carlo Goodness-of-fit for SECR Models

MCgof {secr}

R Documentation

Monte Carlo Goodness-of-fit for SECR Models

Description

MCgof implements and extends the Monte Carlo resampling method of Choo et al. (2024) to emulate Bayesian posterior predictive checks (Gelman et al. 1996, Royle et al. 2014). Initial results suggest the approach is more informative than the deviance-based test proposed by Borchers and Efford (2008) and implemented in secr.test. However, the tests have limited power.

MCgof is under development. The structure of the output may change and bugs may be found. See Warning below for exclusions.

Usage


## S3 method for class 'secr'
MCgof(object, nsim = 100, statfn = NULL, testfn = NULL, seed = NULL, 
    ncores = 1, clustertype = c("PSOCK", "FORK"), usefxi = TRUE, 
    useMVN = TRUE, Ndist = NULL, quiet = FALSE, debug = FALSE, ...)

## S3 method for class 'secrlist'
MCgof(object, nsim = 100, statfn = NULL, testfn = NULL, seed = NULL, 
    ncores = 1, clustertype = c("PSOCK", "FORK"), usefxi = TRUE, 
    useMVN = TRUE, Ndist = NULL, quiet = FALSE, debug = FALSE, ...)

Arguments

`object`	secr fitted model or `secrlist` object
`nsim`	integer number of replicates
`statfn`	function to extract summary statistics from capture histories
`testfn`	function to compare observed and expected counts
`seed`	integer seed
`ncores`	integer for number of parallel cores
`clustertype`	character cluster type for parallel::makeCluster
`usefxi`	logical; if FALSE then activity centres (AC) are simulated de novo from the density process rather than using information on the detected individuals
`useMVN`	logical; if FALSE parameter values are fixed at the MLE rather than drawn from multivariate normal distribution
`Ndist`	character; distribution of number of unobserved AC (optional)
`quiet`	logical; if FALSE then a progress bar (ncores=1) and final timing are shown
`debug`	integer; if >0 then the browser is started at one of 4 points in code
`...`	other arguments passed to testfn

Details

At each replicate parameter values are sampled from the multivariate-normal sampling distribution of the fitted model. The putative location of each detected individual is drawn from the spatial distribution implied by its observations and the resampled parameters (see fxi); locations of undetected individuals are simulated from the complement of pdot(x) times D(x).
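The parameter-resampling step can be sketched in base R. This is an illustration only, not the code used inside MCgof: `draw_mvn` is a hypothetical helper, and `beta_hat` and `V` are made-up stand-ins for the estimated coefficients of a fitted model and their variance-covariance matrix.

```r
# Hypothetical helper: draw n parameter vectors from N(mu, Sigma)
# using the Cholesky factor of Sigma (base R only).
draw_mvn <- function(n, mu, Sigma) {
    p <- length(mu)
    R <- chol(Sigma)                      # upper triangular, t(R) %*% R = Sigma
    z <- matrix(rnorm(n * p), nrow = n)   # independent standard normal deviates
    sweep(z %*% R, 2, mu, "+")            # one draw per row
}

# Made-up estimates on the link scale (assumed values, not secr output)
beta_hat <- c(D = 1.7, g0 = -1.0, sigma = 3.4)
V <- diag(c(0.04, 0.02, 0.01))
V[1, 2] <- V[2, 1] <- -0.01

set.seed(123)
draws <- draw_mvn(1000, beta_hat, V)
colnames(draws) <- names(beta_hat)
round(colMeans(draws), 1)   # close to beta_hat
```

Each row of `draws` plays the role of one replicate's resampled parameter vector; setting useMVN = FALSE corresponds to using `beta_hat` itself in every replicate.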

New detections are simulated under the model for individuals at the simulated locations, along with the expected numbers. Detections form a capthist object, a 3-D array with dimensions for individual $i$, occasion $j$ and detector $k$*. Thus for each replicate and detected individual there are the original observations $y_{ijk}$, simulated observations $Y_{ijk}$, and expected counts $\mathrm E(y_{ijk})$. Two discrepancy statistics are calculated for each replicate – observed vs expected counts, and simulated vs expected counts – and a record is kept of which of these discrepancy statistics is the larger (indicating poorer fit).

* Notation differs slightly from Choo et al. (2024), using $j$ for occasion and $k$ for detector to be consistent with usage in secr and elsewhere (e.g., Borchers and Fewster 2016).

The default discrepancy (testfn) is the Freeman-Tukey statistic as in Choo et al. (2024) and Royle et al. (2014) (see also Brooks, Catchpole and Morgan 2000). The statistic has this general form for $M$ counts $y_m$ with expected value $\mathrm E(y_m)$:

$T = \sum_{m=1}^{M} \left(\sqrt{y_m} - \sqrt{\mathrm E(y_m)}\right)^2.$
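As a minimal sketch, the statistic can be written directly in R. The function name `freeman_tukey` is hypothetical (it is not the function used internally by secr), and the counts and expected values are made up for illustration.

```r
# Freeman-Tukey discrepancy between counts y and their expected values Ey
freeman_tukey <- function(y, Ey) {
    stopifnot(length(y) == length(Ey), all(y >= 0), all(Ey >= 0))
    sum((sqrt(y) - sqrt(Ey))^2)
}

y  <- c(0, 1, 4)     # illustrative counts
Ey <- c(1, 1, 1)     # illustrative expected values
freeman_tukey(y, Ey)   # (0-1)^2 + (1-1)^2 + (2-1)^2 = 2
```

The statistic is zero only when every count equals its expected value, and grows as the counts depart from expectation.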

The key output of MCgof is the proportion of replicates in which the simulated discrepancy exceeds the observed discrepancy. For perfect fit this will be about 0.5, and for poor fit it will approach zero.

By default, tests are performed separately for three types of count: the numbers of detections of each individual (yi), at each detector (yk), and for each individual at each detector (yik). These are extracted by the default statfn from the margins of the observed and simulated capture histories.

$y_{ik} = \sum_j y_{ijk}$		individual x detector
$y_{i} = \sum_j \sum_k y_{ijk}$		individual
$y_{k} = \sum_i \sum_j y_{ijk}$		detector
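The three margins can be illustrated with base R on a toy 3-D array. This is a sketch of the sums defined above, not the default statfn itself; the array and its counts are made up.

```r
# Toy capture history: 3 individuals x 2 occasions x 4 detectors (made-up counts)
y <- array(0, dim = c(3, 2, 4))
y[1, 1, 2] <- 1
y[1, 2, 2] <- 1
y[2, 1, 3] <- 2
y[3, 2, 4] <- 1

yik <- apply(y, c(1, 3), sum)   # individual x detector
yi  <- apply(y, 1, sum)         # individual
yk  <- apply(y, 3, sum)         # detector

yi   # 2 2 1
yk   # 0 2 2 1
```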

Parallel processing is offered using multiple cores (CPUs) through the package parallel when ncores > 1. This differs from the usual multithreading paradigm in secr and does not rely on the environment variable set by setNumThreads except that, if ncores = NULL, ncores will be set to the value from setNumThreads. The cluster type "FORK" is available only on Unix-like systems; it can require large amounts of memory, but is generally fast. A small value of ncores>1 may be optimal, especially with cluster type "PSOCK".

‘usefxi’ and ‘useMVN’ may be used to drop key elements of the Choo et al. (2024) approach; they are provided for demonstration only.

‘Ndist’ refers to the distribution of the number of unobserved AC, conditional on the expected number $q = D^*A - n$ where $D^*$ is the resampled density, $A$ the mask area, and $n$ the number of detected individuals. By default ‘Ndist’ depends on the distribution component of the ‘details’ argument of the fitted model ("poisson" for Poisson $n$, "fixed" for binomial $n$).
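The arithmetic behind $q$ can be sketched as follows. The helper `n_unobserved` is hypothetical and rests on two assumptions that are not taken from the secr source: $q$ is truncated at zero, and "fixed" is interpreted as using the rounded expected number. The density, area and count are made up.

```r
# Hypothetical helper: number of unobserved AC for one replicate.
# Assumptions (not from the secr source): q is truncated at zero, and
# "fixed" means the rounded expected number is used without variation.
n_unobserved <- function(Dstar, A, n, Ndist = c("poisson", "fixed")) {
    Ndist <- match.arg(Ndist)
    q <- max(0, Dstar * A - n)          # expected number of unobserved AC
    switch(Ndist,
        poisson = rpois(1, q),          # random number with mean q
        fixed   = round(q))             # no additional variation
}

set.seed(1)
n_unobserved(Dstar = 5.5, A = 20, n = 76, Ndist = "fixed")     # 5.5 * 20 - 76 = 34
n_unobserved(Dstar = 5.5, A = 20, n = 76, Ndist = "poisson")   # varies about 34
```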

‘debug’ may be used to view intermediate data at certain points in MCgof() numbered 1 to 5. Examine the code of secr:::MCgof.secr or secr:::simfxiAC for these points. Debugging requires ‘ncores = 1’.

The RNGkind of the random number generator is set internally for consistency across platforms.

The ... argument may be used to pass ‘np’ and ‘verbose’ to ‘Fletcher.chat’ used as ‘testfn’.

Value

Invisibly returns an object of class ‘MCgof’ with components -

`nsim`	as input
`statfn`	as input or default
`testfn`	as input or default
`all`	list of outputs: for each statistic, a 3 x nsim matrix. Rows correspond to Tobs, Tsim, and a binary indicator for Tsim > Tobs
`proctime`	execution time in seconds

For secrlist input the value returned is a list of ‘MCgof’ objects.
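The way the key proportion relates to one 3 x nsim matrix in ‘all’ can be sketched with a mock-up. The matrix below is simulated from arbitrary distributions, and its row names are assumed for readability; it is not output from MCgof.

```r
# Mock-up of one element of the 'all' component: a 3 x nsim matrix
# (row names are assumed here for readability)
set.seed(42)
nsim <- 100
Tobs <- rchisq(nsim, df = 10)     # made-up observed discrepancies
Tsim <- rchisq(nsim, df = 10)     # made-up simulated discrepancies
out  <- rbind(Tobs = Tobs, Tsim = Tsim, p = as.numeric(Tsim > Tobs))

# proportion of replicates in which the simulated discrepancy is the larger
mean(out[3, ])
# equivalently, recomputed from the first two rows
mean(out[2, ] > out[1, ])
```

Because the two rows of discrepancies here come from the same distribution, the proportion is expected to be near 0.5, the value associated with good fit.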

Warning

Not all models are covered and some are untested. These models are specifically excluded -

multi-session models
models with groups
conditional likelihood
polygon, transect, telemetry or signal detectors
non-binary behavioural responses

Notes

This implementation extends the work of Choo et al. (2024) in these respects -

detector types ‘multi’ and ‘count’ are allowed
the model may include variation among detectors
the model may include behavioural responses
2-class finite mixture and hybrid mixture models are both allowed.

Author(s)

Murray Efford and Yan Ru Choo

References

Borchers, D. L. and Efford, M. G. (2008) Spatially explicit maximum likelihood methods for capture–recapture studies. Biometrics 64, 377–385.

Borchers, D. L. and Fewster, R. M. (2016) Spatial capture–recapture models. Statistical Science 31, 219–232.

Brooks, S. P., Catchpole, E. A. and Morgan, B. J. T. (2000) Bayesian animal survival estimation. Statistical Science 15, 357–376.

Choo, Y. R., Sutherland, C. and Johnston, A. (2024) A Monte Carlo resampling framework for implementing goodness-of-fit tests in spatial capture-recapture models. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.14386.

Gelman, A., Meng, X.-L., and Stern, H. (1996) Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica 6, 733–807.

Royle, J. A., Chandler, R. B., Sollmann, R. and Gardner, B. (2014) Spatial capture–recapture. Academic Press.

Examples



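library(secr)

# default tests for a fitted model (nsim = 100 replicates)
tmp <- MCgof(secrdemo.0)
summary(tmp)
# square plotting regions, three panels in one row
par(mfrow = c(1,3), pty = 's')
plot(tmp)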

[Package secr version 5.2.1 Index]