Title: | Evaluating Geographic Sampling Bias in Biological Collections |
---|---|
Description: | Evaluating the biasing impact of geographic features such as airports, cities, roads, and rivers in coordinate-based biological collection datasets, by Bayesian estimation of the parameters of a Poisson process. Also enables spatial visualization of sampling bias and includes a set of convenience functions for publication-level plotting. Also available as a shiny app. |
Authors: | Alexander Zizka [aut, cre], Daniele Silvestro [aut], Bruno Vilela [ctb] (Bruno updated the code to use new spatial packages) |
Maintainer: | Alexander Zizka <[email protected]> |
License: | GPL-3 |
Version: | 2.0.0 |
Built: | 2025-03-06 03:18:55 UTC |
Source: | https://github.com/azizka/sampbias |
An example of the format needed to provide custom areas for calculate_bias
area_example
An object of class sf (inherits from data.frame) with 2 rows and 1 column.
data(area_example)
The outline of Borneo, as example data for the user-defined study area option of calculate_bias. From https://www.naturalearthdata.com.
borneo
An object of class sf (inherits from data.frame) with 1 row and 3 columns.
data(borneo)
The main function of the package, calculating the effect of sampling bias due to geographic structures, such as the vicinity to cities, airports, rivers and roads. Results are projected in space and can be compared numerically.
calculate_bias(
  x,
  gaz = NULL,
  res = 1,
  buffer = NULL,
  restrict_sample = NULL,
  terrestrial = TRUE,
  inp_raster = NULL,
  mcmc_rescale_distances = 1000,
  mcmc_iterations = 1e+05,
  mcmc_burnin = 20000,
  mcmc_outfile = NULL,
  prior_q = c(1, 0.01),
  prior_w = c(1, 1),
  plot_raster = FALSE,
  verbose = FALSE,
  run_null_model = FALSE,
  use_hyperprior = TRUE
)
x |
an object of the class |
gaz |
a list of geographic gazetteers as |
res |
numerical. The raster resolution for the distance calculation to
the geographic features and the data visualization, in decimal degrees. The
default is one degree, but a higher resolution will be desirable for most
analyses. |
buffer |
numerical. The size of the geographic buffer around the extent
of |
restrict_sample |
a |
terrestrial |
logical. If TRUE, the empirical distribution (and the
output maps) are restricted to terrestrial areas. Uses the
|
inp_raster |
an object of class |
mcmc_rescale_distances |
numerical. rescaling factor for the distance calculation |
mcmc_iterations |
numerical. The number of iterations for the MCMC, by default 100,000. |
mcmc_burnin |
numerical. The burn-in for the MCMC, by default 20,000. |
mcmc_outfile |
character string. The path where the results of the MCMC are written, optional. |
prior_q |
the gamma prior for the sampling rate $q$, which represents the expected number of occurrences per cell in the absence of biases. In the format c(shape,rate). |
prior_w |
the gamma prior for the steepness of the Poisson rate decline, such that w approaching 0 results in a null model of uniform sampling rate q across cells. In the format c(shape,rate). |
plot_raster |
logical. If TRUE, a plot of the occurrence raster is shown for diagnostic purposes. Default = FALSE |
verbose |
logical. If TRUE, progress is reported. Default = FALSE. |
run_null_model |
logical. Run a null model with bias weights set to zero. |
use_hyperprior |
logical. If TRUE a hyperprior on the bias weights is used for regularization to avoid over-parametrization. |
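The c(shape, rate) format of prior_q and prior_w matches the usual gamma parametrization in base R. As a rough sketch (base R only, not the package's internal code; the helper prior_mean is hypothetical), the default priors from the usage above can be inspected like this:

```r
# Default priors as given in the usage above, in the format c(shape, rate)
prior_q <- c(1, 0.01)
prior_w <- c(1, 1)

# Hypothetical helper: the mean of a gamma distribution is shape / rate
prior_mean <- function(p) p[1] / p[2]
mean_q <- prior_mean(prior_q)  # prior mean of q: 100 occurrences per cell
mean_w <- prior_mean(prior_w)  # prior mean of w: 1

# Central 95% prior interval for each parameter
ci_q <- qgamma(c(0.025, 0.975), shape = prior_q[1], rate = prior_q[2])
ci_w <- qgamma(c(0.025, 0.975), shape = prior_w[1], rate = prior_w[2])
```

A smaller rate widens the prior, so the default prior on q is far more diffuse than the default prior on w.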
The default gazetteers delivered with the package are simplified from http://www.naturalearthdata.com/downloads/. They include major features only; for small-scale analyses, custom gazetteers should be used.
For computational convenience, the gazetteers are cropped to the extent of the point occurrence data set. Relevant structures may lie just outside this extent and still influence the distribution of samples in the study area. To account for this, the buffer option sets the size of the area around the extent that is included in the distance calculation.
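The effect of the buffer can be illustrated with a small base R sketch. It assumes the buffer is expressed in the same units as the coordinates (decimal degrees); the helpers buffered_extent and inside and the simulated data are hypothetical and not part of the package.

```r
# Simulated occurrence coordinates (hypothetical data)
set.seed(1)
occ <- data.frame(decimalLongitude = runif(50, 12, 20),
                  decimalLatitude  = runif(50, -4, 4))

# Hypothetical helper: bounding box of the points, widened by `buffer`
buffered_extent <- function(lon, lat, buffer = 0) {
  c(xmin = min(lon) - buffer, xmax = max(lon) + buffer,
    ymin = min(lat) - buffer, ymax = max(lat) + buffer)
}

ext0 <- buffered_extent(occ$decimalLongitude, occ$decimalLatitude)
ext1 <- buffered_extent(occ$decimalLongitude, occ$decimalLatitude, buffer = 1)

# A feature located half a degree east of the data extent
airport <- c(lon = ext0[["xmax"]] + 0.5, lat = mean(occ$decimalLatitude))

# Hypothetical helper: is point `p` inside extent `e`?
inside <- function(p, e) {
  p[["lon"]] >= e[["xmin"]] && p[["lon"]] <= e[["xmax"]] &&
    p[["lat"]] >= e[["ymin"]] && p[["lat"]] <= e[["ymax"]]
}

kept_without_buffer <- inside(airport, ext0)  # cropped away
kept_with_buffer    <- inside(airport, ext1)  # retained
```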
Visit https://github.com/azizka/sampbias/wiki for more information on distance calculation and the algorithm behind sampbias.
An object of the S3-class ‘sampbias’, which is a list including the following objects:
summa |
A list of summary statistics
for the sampbias analyses, including the total number of occurrence points
in |
occurrences |
a |
species |
a |
biasmaps |
a list of |
biastable |
a |
Check https://github.com/azizka/sampbias/wiki for a tutorial on sampbias.
summary.sampbias
is.sampbias
plot.sampbias
## Not run: 
#simulate data
x <- data.frame(species = rep(sample(x = LETTERS, size = 5), times = 20),
                decimalLongitude = runif(n = 100, min = 0, max = 20),
                decimalLatitude = runif(n = 100, min = -4, max = 4))

out <- calculate_bias(x, terrestrial = TRUE, buffer = 0)
summary(out)
plot(out)
## End(Not run)
Creates a list of distances rasters based on a list of geographic gazetteers, as SpatVector objects, and a template SpatRaster, indicating the desired extent and resolution.
dis_rast(gaz, ras, buffer = NULL)
gaz |
an object of the class |
ras |
an object of the class |
buffer |
numerical. The size of the geographic buffer around the
extent of |
A list of SpatRaster objects of the same length as gaz. The values in each raster correspond to the planar geographic distance to the nearest feature in gaz, given the resolution of ras.
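To make the returned values concrete, the following base R sketch computes a planar distance-to-nearest-feature grid by hand. It is an illustration only and does not reproduce how dis_rast works internally; the grid, the two-point gazetteer and all object names are hypothetical.

```r
# Cell centres of a 10 x 8 grid with 1-degree resolution
# (extent -5..5 in x and -4..4 in y)
cells <- expand.grid(x = seq(-4.5, 4.5, by = 1), y = seq(-3.5, 3.5, by = 1))

# Hypothetical point gazetteer with two features
pts <- data.frame(x = c(-4.5, 2.5), y = c(-3.5, 1.5))

# Planar (Euclidean) distance from every cell centre to every feature
dx <- outer(cells$x, pts$x, "-")
dy <- outer(cells$y, pts$y, "-")
d_all <- sqrt(dx^2 + dy^2)

# Keep the distance to the nearest feature for each cell
cells$dist <- apply(d_all, 1, min)

# Arrange as a matrix with one row per latitude band, north at the top
dist_mat <- matrix(cells$dist, nrow = 8, ncol = 10, byrow = TRUE)[8:1, ]
```

Cells that contain a feature receive a distance of zero, and values grow with the distance from the closest feature.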
Check https://github.com/azizka/sampbias/wiki for a tutorial on sampbias.
#create raster for resolution and extent
ras <- terra::rast(terra::ext(-5,5,-4,4), res = 1)

#create point gazetteer
pts <- data.frame(long = runif(n = 5, min = -5, max = 5),
                  lat = runif(n = 5, min = -4, max = 4),
                  dat = rep("A", 5))
pts <- terra::vect(pts, geom = c("long", "lat"))

#create line gazetteer
lin <- as.matrix(data.frame(long = seq(-5, 5, by = 1),
                            lat = rep(2, times = 11)))
lin <- terra::vect(lin, type = "line")

gaz <- list(point.structure = pts, lines.structure = lin)

out <- dis_rast(gaz, ras)

## Not run: 
plot(out[[1]])
## End(Not run)
An example of a global equal-area raster (in Behrmann projection) in the format needed for a custom grid provided to calculate_bias.
ea_raster
An object of class PackedSpatRaster of length 1.
data(ea_raster)
ea_raster <- terra::unwrap(ea_raster)
An example of the format needed to provide custom areas for calculate_bias
based on a publicly available set of global ecoregions.
ecoregion_example
An object of class sf (inherits from data.frame) with 7 rows and 22 columns.
https://www.worldwildlife.org/publications/terrestrial-ecoregions-of-the-world
data(ecoregion_example)
Check class of sampbias objects.
## S3 method for class 'sampbias'
is(object, class2 = "sampbias")
object |
an object of the class |
class2 |
the name of the class for which the is relation is to be examined, or (more efficiently) the class definition object for that class. |
With two arguments, tests whether object can be treated as from class2. With one argument, returns all the super-classes of this object's class.
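The two-argument behaviour can be tried without running an analysis. The sketch below uses a mock list that merely carries the class attribute (it is not a real calculate_bias result) and relies on methods::is and inherits from base R; all object names are illustrative.

```r
# Mock object carrying the S3 class 'sampbias' (not a real analysis result)
mock <- structure(list(summa = list()), class = "sampbias")

# Two arguments: can `mock` be treated as a 'sampbias' object?
is_sb <- methods::is(mock, "sampbias")

# Base R equivalent for S3 classes
inh_sb <- inherits(mock, "sampbias")

# A plain list does not pass the check
is_list_sb <- inherits(list(), "sampbias")
```

Such a check is useful at the top of custom functions that expect the output of calculate_bias.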
## Not run: 
#simulate data
occ <- data.frame(species = rep(sample(x = LETTERS, size = 5), times = 10),
                  decimalLongitude = runif(n = 50, min = 12, max = 20),
                  decimalLatitude = runif(n = 50, min = -4, max = 4))

out <- calculate_bias(x = occ, terrestrial = TRUE)
is(out)
## End(Not run)
A plotting function to visualize the effect of accessibility bias caused by different biasing factors in space.
map_bias(x, gaz = NULL, sealine = TRUE, type = "sampling_rate")
x |
a raster stack as generated by |
gaz |
a list of SpatialObjects, to be printed on the maps. Should be
the same objects provided to |
sealine |
logical. Should the coastline be added to the plots? Default is TRUE. |
type |
character vector. One of c("sampling_rate", "log_sampling_rate", "diff_to_max"). If "sampling_rate",
the plot shows the raw projected sampling rate depending on the
biasing factors; if "log_sampling_rate", the plot shows the log10-transformed sampling rate; and if
"diff_to_max", the plot shows the relative deviation of the sampling rate from the maximum
rate as calculated using |
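The three type options can be pictured as transformations of the same projected rates. The sketch below is base R only and uses made-up rates; in particular, the formula for "diff_to_max" is an assumption (deviation from the maximum, divided by the maximum) and may differ from what map_bias computes.

```r
# Hypothetical projected sampling rates for five raster cells
rate <- c(0.5, 2, 10, 40, 100)

# "sampling_rate": the raw projected rate
sampling_rate <- rate

# "log_sampling_rate": log10-transformed rate
log_sampling_rate <- log10(rate)

# "diff_to_max": ASSUMED here to be the deviation from the maximum rate,
# expressed relative to that maximum (0 at the best-sampled cell)
diff_to_max <- (rate - max(rate)) / max(rate)
```

The log10 scale helps when rates span several orders of magnitude, as they do in this toy vector.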
A series of R plots based on ggplot2.
## Not run: 
#simulate data
occ <- data.frame(species = rep(sample(x = LETTERS, size = 5), times = 10),
                  decimalLongitude = runif(n = 50, min = 12, max = 20),
                  decimalLatitude = runif(n = 50, min = -4, max = 4))

out <- calculate_bias(x = occ, terrestrial = TRUE)
proj <- project_bias(out)
map_bias(proj)
## End(Not run)
Plotting method for class sampbias, generating a box-and-whisker plot of the bias weights, which indicate the effect strength of each gazetteer provided to calculate_bias.
## S3 method for class 'sampbias'
plot(x, ...)
x |
an object of the class |
... |
Additional arguments passed to summary. |
A plot
calculate_bias, summary.sampbias
## Not run: 
#simulate data
occ <- data.frame(species = rep(sample(x = LETTERS, size = 5), times = 10),
                  decimalLongitude = runif(n = 50, min = 12, max = 20),
                  decimalLatitude = runif(n = 50, min = -4, max = 4))

out <- calculate_bias(x = occ, terrestrial = TRUE)
summary(out)
plot(out)
## End(Not run)
Uses the estimated bias weights from a sampbias object to project the bias through space, using the same raster as used for the distance calculation.
project_bias(x, factors = NULL)
x |
an object of the class |
factors |
a character vector indicating which biasing factors to project |
A raster stack, with the same length as the number of biasing factors used in calculate_bias. The names indicate the factors included for each layer.
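As a rough illustration of what projecting bias weights onto distances means, the base R sketch below assumes a Poisson rate that declines exponentially with the weighted distance to each factor, which is consistent with the statement that w close to 0 yields a uniform rate q. The function project_rate, the distances and the parameter values are hypothetical and are not taken from the package.

```r
# Hypothetical distances (rescaled units) from four cells to two factors
dists <- cbind(cities = c(0, 1, 2, 5), roads = c(0, 0.5, 3, 4))

# Hypothetical estimates: baseline rate q and one weight per factor
q <- 10
w <- c(cities = 0.4, roads = 0.1)

# ASSUMED form: rate = q * exp(-sum of weight * distance over factors)
project_rate <- function(d, w, q) as.vector(q * exp(-(d %*% w)))

# One projection per combination of factors
rate_all    <- project_rate(dists, w, q)
rate_cities <- project_rate(dists[, "cities", drop = FALSE], w["cities"], q)

# Setting all weights to zero recovers a uniform rate q (the null model)
rate_null <- project_rate(dists, c(cities = 0, roads = 0), q)
```

Under this assumed form, adding a factor can only lower the projected rate of a cell, never raise it.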
calculate_bias, summary.sampbias
## Not run: 
#simulate data
occ <- data.frame(species = rep(sample(x = LETTERS, size = 5), times = 10),
                  decimalLongitude = runif(n = 50, min = 12, max = 20),
                  decimalLatitude = runif(n = 50, min = -4, max = 4))

out <- calculate_bias(x = occ, terrestrial = TRUE)
proj <- project_bias(out)
## End(Not run)
Summary method for objects of the class sampbias.
## S3 method for class 'sampbias'
summary(object, ...)
object |
An object of the class |
... |
Additional arguments passed to summary. |
Summary printed to screen.
calculate_bias
is.sampbias
plot.sampbias
## Not run: 
#simulate data
occ <- data.frame(species = rep(sample(x = LETTERS, size = 5), times = 10),
                  decimalLongitude = runif(n = 50, min = 12, max = 20),
                  decimalLatitude = runif(n = 50, min = -4, max = 4))

out <- calculate_bias(x = occ, terrestrial = TRUE)
summary(out)
## End(Not run)