Title: | Prepare and Explore Data for Palaeobiological Analyses |
---|---|
Description: | Provides functionality to support data preparation and exploration for palaeobiological analyses, improving code reproducibility and accessibility. The wider aim of 'palaeoverse' is to bring the palaeobiological community together to establish agreed standards. The package currently includes functionality for data cleaning, binning (time and space), exploration, summarisation and visualisation. Reference datasets (i.e. Geological Time Scales <https://stratigraphy.org/chart>) and auxiliary functions are also provided. Details can be found in: Jones et al., (2023) <doi: 10.1111/2041-210X.14099>. |
Authors: | Lewis A. Jones [aut, cre] , William Gearty [aut] , Bethany J. Allen [aut] , Kilian Eichenseer [aut] , Christopher D. Dean [aut] , Sofia Galvan [ctb] , Miranta Kouvari [ctb] , Pedro L. Godoy [ctb] , Cecily Nicholl [ctb] , Lucas Buffan [ctb] , Erin M. Dillon [ctb] , Joseph T. Flannery-Sutherland [aut] , A. Alessandro Chiarenza [ctb] |
Maintainer: | Lewis A. Jones <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.4.0 |
Built: | 2024-11-22 05:54:39 UTC |
Source: | https://github.com/palaeoverse/palaeoverse |
axis_geo behaves similarly to axis in that it adds an axis to the specified side of a base R plot. The main difference is that it also adds a geological timescale between the plot and the axis. The default scale includes international epochs from the Geological Timescale 2020 (GTS2020). However, international stages, periods, eras, and eons are also available. Interval data hosted by Macrostrat are also available (see time_bins). A custom interval dataset can also be used (see Details below). The appearance of the axis is highly customisable (see Usage below), with the intent that plots will be publication-ready.
axis_geo( side = 1, intervals = "epoch", height = 0.05, fill = NULL, lab = TRUE, lab_col = NULL, lab_size = 1, rot = 0, abbr = TRUE, center_end_labels = TRUE, skip = c("Quaternary", "Holocene", "Late Pleistocene"), bord_col = "black", lty = par("lty"), lwd = par("lwd"), bkgd = "grey90", neg = FALSE, exact = FALSE, round = FALSE, tick_at = NULL, tick_labels = TRUE, phylo = FALSE, root.time = NULL, ... ) axis_geo_phylo(...)
side |
|
intervals |
The interval information to use to plot the axis: either A)
a |
height |
|
fill |
|
lab |
|
lab_col |
|
lab_size |
|
rot |
|
abbr |
|
center_end_labels |
|
skip |
A |
bord_col |
|
lty |
|
lwd |
|
bkgd |
|
neg |
|
exact |
|
round |
|
tick_at |
A |
tick_labels |
Either a) a |
phylo |
|
root.time |
|
... |
Further arguments that are passed directly to
|
If a custom data.frame is provided (with intervals), it should consist of at least 3 columns of data. See GTS2020 for an example.
The interval_name column (name is also allowed) lists the names of each time interval. These will be used as labels if no abbreviations are provided.
The max_ma column (max_age is also allowed) lists the oldest boundary of each time interval. Values should always be positive.
The min_ma column (min_age is also allowed) lists the youngest boundary of each time interval. Values should always be positive.
The abbr column is optional and lists abbreviations that may be used as labels.
The colour column (color is also allowed) is also optional and lists a background colour for each time interval (see the Color Specification section of the par help page).
The font column (lab_color is also allowed) is also optional and lists a label colour for each time interval (see the Color Specification section of the par help page).
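The column rules above can be checked before a custom scale is passed to axis_geo(). The sketch below is a minimal base-R illustration under stated assumptions: normalise_intervals() is a hypothetical helper, not a palaeoverse function, and it only renames the alternative column names listed above and checks that the three required columns exist.

```r
# Hypothetical helper (NOT part of palaeoverse): rename the alternative
# column names listed above to their canonical forms and check that the
# three required columns are present.
normalise_intervals <- function(df) {
  aliases <- c(name = "interval_name", max_age = "max_ma", min_age = "min_ma",
               color = "colour", lab_color = "font")
  nms <- names(df)
  hit <- nms %in% names(aliases)
  nms[hit] <- aliases[nms[hit]]
  names(df) <- nms
  required <- c("interval_name", "max_ma", "min_ma")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Ages are in Ma before present, so negative values are rejected
  if (any(df$max_ma < 0 | df$min_ma < 0)) {
    stop("Interval ages must not be negative")
  }
  df
}

# A custom scale using the alternative names plus the optional columns
custom <- data.frame(
  name = c("A", "B", "C", "D"),
  max_age = c(10, 25, 32, 40),
  min_age = c(0, 10, 25, 32),
  abbr = c("a", "b", "c", "d"),
  color = c("#FFF2AE", "#FEE08B", "#FDAE61", "#F46D43"),
  lab_color = c("black", "black", "black", "white")
)
custom <- normalise_intervals(custom)
names(custom)
```

The normalised data.frame could then be supplied as axis_geo(side = 1, intervals = custom).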
intervals may also be a list if multiple time scales should be added to a single side of the plot. In this case, height, fill, lab, lab_col, lab_size, rot, abbr, center_end_labels, skip, bord_col, lty, and lwd can also be lists. If these lists are not as long as intervals, the elements will be recycled. If individual values (or vectors, e.g. for skip) are used for these parameters, they will be applied to all time scales (and recycled as necessary). If multiple scales are requested, they will be added sequentially outwards starting from the plot border. The axis will always be placed on the outside of the last scale.
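The recycling rule for list-valued styling arguments can be illustrated with a short base-R sketch. recycle_param() is a hypothetical helper written for this example; it is not part of palaeoverse and may not mirror how axis_geo() handles its arguments internally.

```r
# Hypothetical helper (NOT the package's internal code): expand one styling
# argument so that there is exactly one entry per time scale.
recycle_param <- function(x, n_scales) {
  # A single value or a plain vector (e.g. for skip) applies to every scale
  if (!is.list(x)) x <- list(x)
  # Lists shorter than the number of scales are recycled element-wise
  x[rep_len(seq_along(x), n_scales)]
}

intervals <- list("epoch", "period", "era")
abbr <- recycle_param(list(TRUE, FALSE), length(intervals))
skip <- recycle_param(c("Quaternary", "Holocene"), length(intervals))
str(abbr)
str(skip)
```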
If you would like to use intervals from the Geological Time Scale 2012 (GTS2012), you can use time_bins and supply the returned data.frame to the intervals argument.
axis_geo_phylo(...) is shorthand for axis_geo(..., phylo = TRUE).
No return value. Function is used for its side effect, which is to add an axis of the geological timescale to an already existing plot.
William Gearty & Kilian Eichenseer
Lewis A. Jones
# track user par
oldpar <- par(no.readonly = TRUE)
# single scale on bottom
par(mar = c(6.1, 4.1, 4.1, 2.1)) # modify margin
plot(0:100, axes = FALSE, xlim = c(100, 0), ylim = c(100, 0),
     xlab = NA, ylab = "Depth (m)")
box()
axis(2)
axis_geo(side = 1, intervals = "period")
# the line argument here depends on the absolute size of the plot
title(xlab = "Time (Ma)", line = 4)
# stack multiple scales, abbreviate only one set of labels
par(mar = c(7.1, 4.1, 4.1, 2.1)) # further expand bottom margin
plot(0:100, axes = FALSE, xlim = c(100, 0), ylim = c(100, 0),
     xlab = NA, ylab = "Depth (m)")
box()
axis(2)
axis_geo(side = 1, intervals = list("epoch", "period"),
         abbr = list(TRUE, FALSE))
# the line argument here depends on the absolute size of the plot
title(xlab = "Time (Ma)", line = 6)
# scale with Macrostrat intervals
par(mar = c(6.1, 4.1, 4.1, 2.1)) # modify margin
plot(0:30, axes = FALSE, xlim = c(30, 0), ylim = c(30, 0),
     xlab = NA, ylab = "Depth (m)")
box()
axis(2)
axis_geo(side = 1, intervals = "North American land mammal ages")
# the line argument here depends on the absolute size of the plot
title(xlab = "Time (Ma)", line = 4)
# scale with custom intervals
intervals <- data.frame(min_ma = c(0, 10, 25, 32),
                        max_ma = c(10, 25, 32, 40),
                        interval_name = c("A", "B", "C", "D"))
par(mar = c(6.1, 4.1, 4.1, 2.1)) # modify margin
plot(0:40, axes = FALSE, xlim = c(40, 0), ylim = c(40, 0),
     xlab = NA, ylab = "Depth (m)")
box()
axis(2)
axis_geo(side = 1, intervals = intervals)
# the line argument here depends on the absolute size of the plot
title(xlab = "Time (Ma)", line = 4)
# scale with phylogeny
library(phytools)
data(mammal.tree)
plot(mammal.tree)
axis_geo_phylo()
title(xlab = "Time (Ma)", line = 4)
# scale with fossil phylogeny
library(paleotree)
data(RaiaCopesRule)
plot(ceratopsianTreeRaia)
axis_geo_phylo()
title(xlab = "Time (Ma)", line = 4)
# reset user par
par(oldpar)
A function to assign fossil occurrences to user-specified latitudinal bins.
bin_lat(occdf, bins, lat = "lat", boundary = FALSE)
occdf |
|
bins |
|
lat |
|
boundary |
|
A dataframe of the original input occdf with appended columns containing respective latitudinal bin information.
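As a rough illustration of what latitudinal binning involves, the base-R sketch below assigns each latitude to the bin whose range contains it using findInterval(). It is not the bin_lat() implementation: the bins table is built by hand in the bin/min/mid/max layout described for lat_bins_degrees(), the lat_bin column name follows the group_apply() example, and lat_min, lat_mid and lat_max are hypothetical names. Latitudes falling exactly on a boundary go to the northern bin here, which may differ from how bin_lat() treats them (see the boundary argument).

```r
# Minimal base-R sketch of latitudinal binning (NOT palaeoverse's code).
# Hand-built 30-degree bins in a bin/min/mid/max layout.
bins <- data.frame(bin = 1:6,
                   min = seq(-90, 60, by = 30),
                   max = seq(-60, 90, by = 30))
bins$mid <- (bins$min + bins$max) / 2

occ <- data.frame(taxon = c("a", "b", "c", "d"),
                  lat = c(-75.2, 0, 12.5, 90))

# findInterval() returns the index of the last break <= lat, i.e. the bin
# whose lower edge the latitude has reached; rightmost.closed = TRUE keeps
# lat == 90 in the last bin instead of returning an out-of-range index.
idx <- findInterval(occ$lat, c(bins$min, max(bins$max)),
                    rightmost.closed = TRUE)
occ$lat_bin <- bins$bin[idx]
occ$lat_min <- bins$min[idx]   # hypothetical column name
occ$lat_mid <- bins$mid[idx]   # hypothetical column name
occ$lat_max <- bins$max[idx]   # hypothetical column name
occ
```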
Lewis A. Jones
Sofia Galvan
# Load occurrence data
occdf <- tetrapods
# Generate latitudinal bins
bins <- lat_bins_degrees(size = 10)
# Bin data
occdf <- bin_lat(occdf = occdf, bins = bins, lat = "lat")
A function to assign fossil occurrences (or localities) to spatial bins/samples using a hexagonal equal-area grid.
bin_space( occdf, lng = "lng", lat = "lat", spacing = 100, sub_grid = NULL, return = FALSE, plot = FALSE )
occdf |
|
lng |
|
lat |
|
spacing |
|
sub_grid |
|
return |
|
plot |
|
This function assigns fossil occurrence data into equal-area grid cells using discrete hexagonal grids via the h3jsr package. This package relies on Uber's H3 library, a geospatial indexing system that partitions the world into hexagonal cells. In H3, 16 different resolutions are available (see the H3 documentation). In the implementation of the bin_space() function, the resolution is defined by the user-input spacing, which represents the distance between the centroids of adjacent cells. Using this distance, the function identifies which resolution is most similar to the input spacing, and uses this resolution.
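The resolution-matching step can be sketched as a nearest-value search. This is an assumption-laden illustration: the spacing values in res_spacing_km are made-up placeholders rather than real H3 cell spacings, nearest_resolution() is a hypothetical helper, and similarity is measured here as an absolute difference in kilometres, which may not be the criterion bin_space() uses.

```r
# PLACEHOLDER spacings (km) per resolution: made-up numbers for illustration
# only, NOT the real H3 centroid spacings.
res_spacing_km <- c(`0` = 2000, `1` = 800, `2` = 300, `3` = 110, `4` = 40)

# Hypothetical helper: pick the resolution whose spacing differs least
# (in absolute km) from the requested spacing.
nearest_resolution <- function(spacing, res_spacing) {
  i <- which.min(abs(res_spacing - spacing))
  as.integer(names(res_spacing)[i])
}

nearest_resolution(500, res_spacing_km)
nearest_resolution(100, res_spacing_km)
```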
Additional functionality allows the user to simultaneously assign occurrence data to equal-area grid cells of a finer-scale grid (i.e. a ‘sub-grid’) within the primary grid via the sub_grid argument. This might be desirable for users wishing to evaluate differences in the area occupied by occurrences within their primary grid cells. This functionality also allows the user to easily rarefy across sub-grid cells within primary cells to further standardise spatial sampling (see example for basic implementation).
Note: prior to implementation, the coordinate reference system (CRS) for input data is defined as EPSG:4326 (World Geodetic System 1984). The user should transform their data accordingly if this is not appropriate. If you are unfamiliar with working with geographic data, we highly recommend checking out Geocomputation with R.
If the return argument is set to FALSE, a dataframe is returned of the original input occdf with cell information. If return is set to TRUE, a list is returned with both the input occdf and grid information and polygons.
Lewis A. Jones
Bethany Allen & Kilian Eichenseer
# Get internal data
data("reefs")
# Reduce data for plotting
occdf <- reefs[1:250, ]
# Bin data using a hexagonal equal-area grid
ex1 <- bin_space(occdf = occdf, spacing = 500, plot = TRUE)
# Bin data using a hexagonal equal-area grid and sub-grid
ex2 <- bin_space(occdf = occdf, spacing = 1000, sub_grid = 250, plot = TRUE)
# EXAMPLE: rarefy
# Load data
occdf <- tetrapods[1:250, ]
# Assign to spatial bin
occdf <- bin_space(occdf = occdf, spacing = 1000, sub_grid = 250)
# Get unique bins
bins <- unique(occdf$cell_ID)
# n reps
n <- 10
# Rarefy data across sub-grid cells
# Returns a list with each element a bin with respective mean genus richness
df <- lapply(bins, function(x) {
  # subset occdf for respective grid cell
  tmp <- occdf[which(occdf$cell_ID == x), ]
  # Which sub-grid cells are there within this bin?
  sub_bin <- unique(tmp$cell_ID_sub)
  # Sample 1 sub-grid cell n times
  s <- sample(sub_bin, size = n, replace = TRUE)
  # Count the number of unique genera within each sub-grid cell for each rep
  counts <- sapply(s, function(i) {
    # Number of unique genera within each sample
    length(unique(tmp[which(tmp$cell_ID_sub == i), ]$genus))
  })
  # Mean richness across subsamples
  mean(counts)
})
A function to assign fossil occurrences to specified time bins based on different approaches commonly applied in palaeobiology.
bin_time( occdf, min_ma = "min_ma", max_ma = "max_ma", bins, method = "mid", reps = 100, fun = dunif, ... )
occdf |
|
min_ma |
|
max_ma |
|
bins |
|
method |
|
reps |
|
fun |
|
... |
Additional arguments available in the called function ( |
Five approaches (methods) exist in the bin_time() function for assigning occurrences to time bins:
Midpoint: The "mid" method is the simplest approach and uses the midpoint of the fossil occurrence age range to bin the occurrence.
Majority: The "majority" method bins an occurrence into the bin with which it most overlaps. As part of this implementation, the majority percentage overlap of the occurrence is also calculated and returned as an additional column in occdf. If desired, these percentages can be used to further filter an occurrence dataset.
All: The "all" method bins an occurrence into every bin its age range covers. For occurrences with age ranges spanning more than one bin, the occurrence row is duplicated. Each occurrence is assigned an ID in the column occdf$id so that duplicates can be tracked. Additionally, occdf$n_bins records the number of bins each occurrence appears within.
Random: The "random" method randomly samples bins (with replacement, one per replicate) from the bins that the fossil occurrence age range covers, with equal probability regardless of bin length. The reps argument determines the number of times the sampling process is repeated. All replications are stored as individual elements within the returned list, with bin_assignment and bin_midpoint columns appended to the original input occdf. If desired, users can easily bind this list using do.call(rbind, x).
Point: The "point" method randomly samples reps point age estimates from the age range of the fossil occurrence. Sampling follows a user-input probability density function such as dnorm (see example 5). Users should also provide any additional arguments for the probability density function (see ...). However, x (vector of quantiles) values should not be provided, as these values are input from the age range of each occurrence. These values range between 0 and 1, and therefore function arguments should be scaled to be within these bounds. The reps argument determines the number of times the sampling process is repeated. All replications are stored as individual elements within the returned list, with bin_assignment and point_estimates columns appended to the original input occdf. If desired, users can easily bind this list using do.call(rbind, x).
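To make three of the rules above concrete, the sketch below implements simplified base-R versions of the "mid", "majority" and "point" methods on a toy bins table with hypothetical values. It is not palaeoverse's implementation; in particular, point_ages() evaluates the density on a regular grid over [0, 1] and maps 0 to min_ma and 1 to max_ma, both of which are assumptions of this sketch.

```r
# Simplified sketches of three assignment rules (NOT the package's code).
# Toy bins with hypothetical ages (Ma), oldest first.
bins <- data.frame(bin = 1:4,
                   max_ma = c(100, 80, 60, 40),
                   min_ma = c(80, 60, 40, 20))
occ <- data.frame(max_ma = c(95, 85, 70), min_ma = c(85, 50, 65))

# "mid": bin the midpoint of each occurrence's age range
assign_mid <- function(occ, bins) {
  mid <- (occ$max_ma + occ$min_ma) / 2
  sapply(mid, function(m)
    bins$bin[which(m <= bins$max_ma & m >= bins$min_ma)[1]])
}

# "majority": bin with the largest overlap, plus the percentage of the
# occurrence's age range that falls inside that bin
assign_majority <- function(occ, bins) {
  res <- t(sapply(seq_len(nrow(occ)), function(i) {
    overlap <- pmax(0, pmin(occ$max_ma[i], bins$max_ma) -
                       pmax(occ$min_ma[i], bins$min_ma))
    j <- which.max(overlap)
    c(bins$bin[j], 100 * overlap[j] / (occ$max_ma[i] - occ$min_ma[i]))
  }))
  data.frame(bin_assignment = res[, 1], overlap_percentage = res[, 2])
}

# "point": draw 'reps' values from a density on [0, 1] (evaluated on a grid),
# then rescale them onto the age range (0 -> min_ma, 1 -> max_ma: assumption)
point_ages <- function(max_ma, min_ma, reps, fun = dunif, ...) {
  grid <- seq(0, 1, length.out = 101)
  w <- fun(grid, ...)
  x <- sample(grid, size = reps, replace = TRUE, prob = w)
  min_ma + x * (max_ma - min_ma)
}

assign_mid(occ, bins)
assign_majority(occ, bins)
set.seed(1)
point_ages(95, 85, reps = 5, fun = dnorm, mean = 0.5, sd = 0.25)
```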
For methods "mid", "majority" and "all", a dataframe of the original input occdf with the following appended columns is returned: occurrence id (id), number of bins that the occurrence age range covers (n_bins), bin assignment (bin_assignment), and bin midpoint (bin_midpoint). In the case of the "majority" method, an additional column of the majority percentage overlap (overlap_percentage) is also appended. For the "random" and "point" methods, a list is returned (of length reps) with each element a copy of the occdf and appended columns (random: bin_assignment and bin_midpoint; point: bin_assignment and point_estimates).
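The columns returned by the "all" method (id, n_bins, bin_assignment and bin_midpoint) can be reproduced with a small base-R sketch. assign_all() is hypothetical and uses toy bins; it treats bins that merely touch an occurrence's age range at a boundary as non-overlapping, which is an assumption rather than documented package behaviour.

```r
# Sketch of the "all" method (NOT the package's code): one row per bin that
# an occurrence's age range overlaps, with id and n_bins for tracking.
bins <- data.frame(bin = 1:4,
                   max_ma = c(100, 80, 60, 40),
                   min_ma = c(80, 60, 40, 20))
bins$mid_ma <- (bins$max_ma + bins$min_ma) / 2

occ <- data.frame(taxon = c("a", "b"), max_ma = c(95, 85), min_ma = c(85, 50))

assign_all <- function(occ, bins) {
  occ$id <- seq_len(nrow(occ))
  out <- lapply(occ$id, function(i) {
    # A bin overlaps if it starts before the range ends and ends after it starts
    hits <- which(bins$max_ma > occ$min_ma[i] & bins$min_ma < occ$max_ma[i])
    if (length(hits) == 0) return(NULL)  # occurrence outside all bins
    rows <- occ[rep(i, length(hits)), , drop = FALSE]
    rows$n_bins <- length(hits)
    rows$bin_assignment <- bins$bin[hits]
    rows$bin_midpoint <- bins$mid_ma[hits]
    rows
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

assign_all(occ, bins)
```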
Christopher D. Dean & Lewis A. Jones
William Gearty
# Grab internal tetrapod data
occdf <- tetrapods[1:100, ]
bins <- time_bins()
# Assign via midpoint age of fossil occurrence data
ex1 <- bin_time(occdf = occdf, bins = bins, method = "mid")
# Assign to all bins that age range covers
ex2 <- bin_time(occdf = occdf, bins = bins, method = "all")
# Assign via majority overlap based on fossil occurrence age range
ex3 <- bin_time(occdf = occdf, bins = bins, method = "majority")
# Assign randomly to overlapping bins based on fossil occurrence age range
ex4 <- bin_time(occdf = occdf, bins = bins, method = "random", reps = 5)
# Assign point estimates following a normal distribution
ex5 <- bin_time(occdf = occdf, bins = bins, method = "point", reps = 5,
                fun = dnorm, mean = 0.5, sd = 0.25)
A function to apply palaeoverse functionality across subsets (groups) of data, delineated using one or more variables. Functions which receive a data.frame as input (e.g. nrow, ncol, lengths, unique) may also be used.
group_apply(occdf, group, fun, ...)
occdf |
|
group |
|
fun |
|
... |
Additional arguments available in the called function. These arguments may be required for function arguments without default values, or if you wish to overwrite the default argument value (see examples). |
group_apply applies functions to subgroups of data within a supplied dataset, enabling the separate analysis of occurrences or taxa from different time intervals, spatial regions, or trait values. The function serves as a wrapper around palaeoverse functions. Other functions which can be applied to a data.frame (e.g. nrow, ncol, lengths, unique) may also be used.
All palaeoverse functions which require a dataframe input can be used in conjunction with the group_apply function. However, this is unnecessary for many functions (e.g. bin_time) as groups do not need to be partitioned before binning. The following palaeoverse functions may be useful to apply across group(s):
tax_unique: return the number of unique taxa per grouping variable.
tax_range_time: return the temporal range of taxa per grouping variable.
tax_range_space: return the geographic range of taxa per grouping variable.
tax_check: return potential spelling variations of the same taxon per grouping variable. Note: verbose needs to be set to FALSE.
A data.frame of the outputs from the selected function, with appended column(s) indicating the user-defined groups. If a single vector is returned via the called function, it will be transformed to a data.frame with the column name equal to the input function.
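The behaviour described above is essentially split-apply-combine. The hypothetical group_apply_sketch() below shows the idea in base R (split by the grouping columns, apply the function, and wrap a bare vector result in a column named after the function); it is an illustration only and omits the checks and options of the real group_apply().

```r
# Hypothetical split-apply-combine sketch (NOT palaeoverse's group_apply()).
group_apply_sketch <- function(occdf, group, fun, ...) {
  fun_name <- deparse(substitute(fun))
  # Split rows by every observed combination of the grouping columns
  parts <- split(occdf, occdf[group], drop = TRUE)
  out <- lapply(parts, function(df) {
    res <- fun(df, ...)
    # A bare vector result becomes a column named after the function
    if (!is.data.frame(res)) {
      res <- data.frame(res)
      names(res) <- fun_name
    }
    # Append the group value(s) of this subset to every output row
    cbind(res, df[rep(1, nrow(res)), group, drop = FALSE])
  })
  res <- do.call(rbind, unname(out))
  rownames(res) <- NULL
  res
}

occdf <- data.frame(cc = c("US", "US", "CA", "GB", "GB", "GB"),
                    genus = c("A", "B", "A", "C", "C", "D"))
group_apply_sketch(occdf, group = "cc", fun = nrow)
```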
Lewis A. Jones & William Gearty
Kilian Eichenseer & Bethany Allen
# Examples
# Get tetrapods data
occdf <- tetrapods[1:100, ]
# Remove NA data
occdf <- subset(occdf, !is.na(genus))
# Count number of occurrences from each country
ex1 <- group_apply(occdf = occdf, group = "cc", fun = nrow)
# Unique genera per collection with group_apply and input arguments
ex2 <- group_apply(occdf = occdf, group = c("collection_no"), fun = tax_unique,
                   genus = "genus", family = "family", order = "order",
                   class = "class", resolution = "genus")
# Use multiple variables (number of occurrences per collection and formation)
ex3 <- group_apply(occdf = occdf, group = c("collection_no", "formation"),
                   fun = nrow)
# Compute counts of occurrences per latitudinal bin
# Set up lat bins
bins <- lat_bins_degrees()
# bin occurrences
occdf <- bin_lat(occdf = occdf, bins = bins)
# Calculate number of occurrences per bin
ex4 <- group_apply(occdf = occdf, group = "lat_bin", fun = nrow)
A dataframe of the Geological Timescale 2012. Age data from the International Commission on Stratigraphy. Supplementary information is also included in the dataset for plotting functionality (e.g. GTS2012 colour scheme).
GTS2012
A data frame with 186 rows and 9 variables:
Index number for the temporal order of all intervals present in the dataset.
Names of intervals in the dataset.
The temporal rank of intervals in the dataset.
The maximum age of the interval in millions of years before present.
The midpoint age of the interval in millions of years before present.
The minimum age of the interval in millions of years before present.
The duration of the interval in millions of years.
Colour of font to use for plotting in conjunction with the colour column.
Colours of stages based on the ICS timescale.
Standard abbreviations of interval names where appropriate.
Gradstein, F.M., Ogg, J.G., Schmitz, M.D. and Ogg, G.M. eds. (2012).
Geologic Timescale 2012. Elsevier.
Compiled by Lewis A. Jones (2022-07-02) from the ICS.
A dataframe of the Geological Timescale 2020. Age data from the International Commission on Stratigraphy. Supplementary information is included in the dataset for plotting functionality (e.g. GTS2020 colour scheme).
GTS2020
A data frame with 189 rows and 9 variables:
Index number for the temporal order of all intervals present in the dataset.
Names of intervals in the dataset.
The temporal rank of intervals in the dataset.
The maximum age of the interval in millions of years before present.
The midpoint age of the interval in millions of years before present.
The minimum age of the interval in millions of years before present.
The duration of the interval in millions of years.
Colour of font to use for plotting in conjunction with the colour column.
Colours of stages based on the ICS timescale.
Standard abbreviations of interval names where appropriate.
Gradstein, F.M., Ogg, J.G., Schmitz, M.D. and Ogg, G.M. eds. (2020).
Geologic Timescale 2020. Elsevier.
Compiled by Lewis A. Jones (2022-07-02) from the ICS.
A table of geological intervals and the earliest and latest corresponding international geological stages from the International Commission on Stratigraphy (ICS). The table was compiled using regional stratigraphies, the GeoWhen Database, temporal information from the Paleobiology Database and the Geological Timescale 2022. Some assignments were made with incomplete information on the stratigraphic provenance of intervals. The assignments in this table should be verified before research use. They are provided here as an example of functionality only.
interval_key
A data frame with 1323 rows and 3 variables:
Stratigraphic interval
Earliest (oldest) geological stage which overlaps with the interval
Latest (youngest) geological stage which overlaps with the interval
Compiled by Kilian Eichenseer and Lewis Jones for assigning geological stages to occurrences from the Paleobiology Database and the PaleoReefs Database.
lat_bins() was renamed to lat_bins_degrees() to be consistent with lat_bins_area().
lat_bins(size = 10, min = -90, max = 90, fit = FALSE, plot = FALSE)
size |
|
min |
|
max |
|
fit |
|
plot |
|
A function to generate approximately equal-area latitudinal bins for a user-specified number of bins and latitudinal range. This approach is based on calculating the curved surface area of spherical segments bounded by two parallel discs.
lat_bins_area(n = 12, min = -90, max = 90, r = 6371, plot = FALSE)
n |
|
min |
|
max |
|
r |
|
plot |
|
A data.frame of the user-defined number of latitudinal bins. The data.frame contains the following columns: bin (bin number), min (minimum latitude of the bin), mid (midpoint latitude of the bin), max (maximum latitude of the bin), area (the area of the bin in km²), area_prop (the proportional area of the bin across all bins).
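The spherical-segment approach mentioned above has a closed form: the area of the band between latitudes φ1 < φ2 on a sphere of radius r is 2πr²(sin φ2 − sin φ1), so spacing sin(latitude) evenly gives exactly equal areas. The sketch below uses this to build a table with the same columns; equal_area_bins() is hypothetical, the mid column is taken as the arithmetic mean of min and max (an assumption), and lat_bins_area() may compute its bins differently, since it is described as approximately equal-area.

```r
# Closed-form sketch of equal-area latitudinal bins (hypothetical helper,
# not necessarily how lat_bins_area() works). Band area between phi1 < phi2
# on a sphere of radius r: 2 * pi * r^2 * (sin(phi2) - sin(phi1)).
equal_area_bins <- function(n = 12, min = -90, max = 90, r = 6371) {
  to_rad <- pi / 180
  # Evenly spaced values of sin(latitude) give bands of equal area
  s <- seq(sin(min * to_rad), sin(max * to_rad), length.out = n + 1)
  edges <- asin(s) / to_rad
  lo <- edges[-(n + 1)]
  hi <- edges[-1]
  area <- 2 * pi * r^2 * (sin(hi * to_rad) - sin(lo * to_rad))
  data.frame(bin = seq_len(n), min = lo, mid = (lo + hi) / 2, max = hi,
             area = area, area_prop = area / sum(area))
}

bins <- equal_area_bins(n = 12)
round(bins$min, 2)
# Bins are narrow near the poles and wide near the equator
tropics <- equal_area_bins(n = 6, min = -30, max = 30)
```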
Lewis A. Jones & Kilian Eichenseer
Kilian Eichenseer & Bethany Allen
For bins with unequal area, but equal latitudinal range, see lat_bins_degrees.
# Generate 12 latitudinal bins
bins <- lat_bins_area(n = 12)
# Generate latitudinal bins for just the (sub-)tropics
bins <- lat_bins_area(n = 6, min = -30, max = 30)
# Generate latitudinal bins and a plot
bins <- lat_bins_area(n = 24, plot = TRUE)
A function to generate latitudinal bins of a given size for a user-defined latitudinal range. If the desired size of the bins is not compatible with the defined latitudinal range, bin size can be updated to the nearest integer which is divisible into this range.
lat_bins_degrees(size = 10, min = -90, max = 90, fit = FALSE, plot = FALSE)
size |
|
min |
|
max |
|
fit |
|
plot |
|
A dataframe of latitudinal bins of user-defined size. The data.frame contains the following columns: bin (bin number), min (minimum latitude of the bin), mid (midpoint latitude of the bin), max (maximum latitude of the bin).
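The fit behaviour described above, choosing the nearest integer bin size that divides the latitudinal range, can be sketched in base R as follows. degree_bins() is hypothetical; tie-breaking towards the smaller size, stopping with an error when fit = FALSE and the size does not divide the range, and the assumption of an integer-degree range are choices of this sketch, not documented behaviour of lat_bins_degrees().

```r
# Hypothetical sketch of degree-based bins with a 'fit' option
# (assumes an integer-degree latitudinal range).
degree_bins <- function(size = 10, min = -90, max = 90, fit = FALSE) {
  span <- max - min
  if (span %% size != 0) {
    if (!fit) stop("size does not divide the latitudinal range; try fit = TRUE")
    # All integer sizes that divide the range exactly
    divisors <- which(span %% seq_len(span) == 0)
    # Nearest divisor to the requested size (which.min keeps the smaller on ties)
    size <- divisors[which.min(abs(divisors - size))]
  }
  lo <- seq(min, max - size, by = size)
  data.frame(bin = seq_along(lo), min = lo, mid = lo + size / 2,
             max = lo + size)
}

nrow(degree_bins(size = 20))
b <- degree_bins(size = 13, fit = TRUE)
unique(b$max - b$min)
```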
Lewis A. Jones
Bethany Allen
For equal-area latitudinal bins, see lat_bins_area.
# Generate 20 degrees latitudinal bins
bins <- lat_bins_degrees(size = 20)
# Generate latitudinal bins with closest fit to 13 degrees
bins <- lat_bins_degrees(size = 13, fit = TRUE)
# Generate latitudinal bins for defined latitudinal range
bins <- lat_bins_degrees(size = 10, min = -50, max = 50)
A function that uses interval names to assign either international geological stages and numeric ages from the International Commission on Stratigraphy (ICS), or user-defined intervals, to fossil occurrences.
look_up( occdf, early_interval = "early_interval", late_interval = "late_interval", int_key = FALSE, assign_with_GTS = "GTS2020", return_unassigned = FALSE )
occdf |
|
early_interval |
|
late_interval |
|
int_key |
Optionally, named
If set to |
assign_with_GTS |
|
return_unassigned |
|
If int_key is set to FALSE (default), this function can be used to assign numerical ages solely based on stages from a GTS table, and to assign stages based on GTS interval names.
Instead of geological stages, the user can supply any names in the early_stage and late_stage columns of int_key. assign_with_GTS should then be set to FALSE.
An exemplary int_key has been included within this package (interval_key). This key works well for assigning geological stages to many of the intervals from the Paleobiology Database and the PaleoReefs Database. palaeoverse cannot guarantee that all of the stage assignments with the exemplary key are accurate. The table corresponding to this key can be loaded with palaeoverse::interval_key.
A dataframe of the original input data with the following appended columns is returned: early_stage and late_stage, corresponding to the earliest and latest international geological stage which could be assigned to the occurrences based on the given interval names. interval_max_ma and interval_min_ma return maximum and minimum interval ages if provided in the interval key, or if they can be fetched from GTS2012 or GTS2020. A column interval_mid_ma is appended to provide the midpoint ages of the intervals.
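The look-up logic can be sketched with match(): the early interval provides the earliest stage, the late interval provides the latest stage, and the numeric range runs from the oldest boundary of the earliest stage to the youngest boundary of the latest stage. In the sketch below, gts and key are toy tables with made-up stage names and ages (not GTS2012 or GTS2020 values), look_up_sketch() is hypothetical, and unmatched intervals simply return NA.

```r
# Toy stage table with MADE-UP names and ages (not GTS values)
gts <- data.frame(interval_name = c("S1", "S2", "S3", "S4"),
                  max_ma = c(40, 30, 20, 10),
                  min_ma = c(30, 20, 10, 0))
# Toy key playing the role of int_key
key <- data.frame(interval_name = c("Lower X", "Upper X", "Whole X"),
                  early_stage = c("S1", "S3", "S1"),
                  late_stage = c("S2", "S4", "S4"))
occdf <- data.frame(early_interval = c("Lower X", "Whole X", "Unknown"),
                    late_interval = c("Lower X", "Upper X", "Unknown"))

# Hypothetical sketch of the look-up (NOT palaeoverse's look_up())
look_up_sketch <- function(occdf, key, gts) {
  # Earliest stage from the early interval, latest stage from the late one
  occdf$early_stage <- key$early_stage[match(occdf$early_interval,
                                             key$interval_name)]
  occdf$late_stage <- key$late_stage[match(occdf$late_interval,
                                           key$interval_name)]
  # Oldest boundary of the earliest stage, youngest of the latest stage
  occdf$interval_max_ma <- gts$max_ma[match(occdf$early_stage,
                                            gts$interval_name)]
  occdf$interval_min_ma <- gts$min_ma[match(occdf$late_stage,
                                            gts$interval_name)]
  occdf$interval_mid_ma <- (occdf$interval_max_ma + occdf$interval_min_ma) / 2
  occdf
}

look_up_sketch(occdf, key, gts)
```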
Kilian Eichenseer & William Gearty
Lewis A. Jones & Christopher D. Dean
## Just use GTS2020 (default):
# create exemplary dataframe
taxdf <- data.frame(name = c("A", "B", "C"),
                    early_interval = c("Maastrichtian", "Campanian", "Sinemurian"),
                    late_interval = c("Maastrichtian", "Campanian", "Bartonian"))
# assign stages and numerical ages
taxdf <- look_up(taxdf)
## Use exemplary int_key
# Get internal reef data
occdf <- reefs
# assign stages and numerical ages
occdf <- look_up(occdf, early_interval = "interval", late_interval = "interval",
                 int_key = interval_key)
## Use exemplary int_key and return unassigned
# Get internal tetrapod data
occdf <- tetrapods
# assign stages and numerical ages
occdf <- look_up(occdf, int_key = palaeoverse::interval_key)
# return unassigned intervals
unassigned <- look_up(occdf, int_key = palaeoverse::interval_key,
                      return_unassigned = TRUE)
## Use own key and GTS2012:
# create example data
occdf <- data.frame(
  stage = c("any Permian", "first Permian stage", "any Permian", "Roadian"))
# create example key
interval_key <- data.frame(
  interval_name = c("any Permian", "first Permian stage"),
  early_stage = c("Asselian", "Asselian"),
  late_stage = c("Changhsingian", "Asselian"))
# assign stages and numerical ages:
occdf <- look_up(occdf, early_interval = "stage", late_interval = "stage",
                 int_key = interval_key, assign_with_GTS = "GTS2012")
A function to estimate palaeocoordinates for fossil occurrence data (i.e. reconstruct the geographic distribution of organisms' remains at time of deposition). Each occurrence is assigned palaeocoordinates based on its current geographic position and age estimate.
palaeorotate( occdf, lng = "lng", lat = "lat", age = "age", model = "MERDITH2021", method = "point", uncertainty = TRUE, round = 3 )
Arguments: occdf, lng, lat, age, model, method, uncertainty, round
This function can estimate palaeocoordinates using two different approaches, selected via the method argument:
Reconstruction files: The "grid" method uses reconstruction files from Jones & Domeier (2024) to spatiotemporally link present-day geographic coordinates and age estimates with a discrete global grid rotated at one-million-year time steps throughout the Phanerozoic (540–0 Ma). Here, resolution 3 (~119 km spacing) of the reconstruction files is used. All files, and the process used to generate them, are available and documented in Jones & Domeier (2024). If fine-scale spatial analyses are being conducted, the "point" method (see GPlates API below) may be preferred, particularly if occurrences are close to plate boundaries. When using the "grid" method, coordinates within the same grid cell will be assigned equivalent palaeocoordinates due to spatial aggregation. However, this approach enables efficient estimation of the past distribution of fossil occurrences. Note: each reconstruction file is ~45 MB in size.
GPlates API: The "point" method uses the GPlates Web Service to reconstruct palaeocoordinates for point data. This method is slower than the "grid" method if your dataset contains many unique time intervals. However, it provides palaeocoordinates with higher precision.
Available models and timespans for each method:
"MERDITH2021" (Merdith et al., 2021): 0–1000 Ma (point); 0–540 Ma (grid)
"TorsvikCocks2017" (Torsvik and Cocks, 2016): 0–540 Ma (point/grid)
"PALEOMAP" (Scotese, 2016): 0–1100 Ma (point); 0–540 Ma (grid)
"MATTHEWS2016_pmag_ref" (Matthews et al., 2016): 0–410 Ma (point/grid)
"GOLONKA" (Wright et al., 2013): 0–540 Ma (point/grid)
A data.frame containing the original input occurrence data.frame and the reconstructed coordinates (i.e. "p_lng", "p_lat"). The "grid" method also returns the age of rotation ("rot_age") and the reference coordinates rotated ("rot_lng" and "rot_lat"). If only one model is requested, a column containing the rotation model used ("rot_model") is also appended. Otherwise, the name of each model is appended to the name of each column containing palaeocoordinates (e.g. "p_lng_GOLONKA"). If uncertainty is set to TRUE, the palaeolatitudinal range ("range_p_lat") and the maximum geographic distance ("max_dist") in km between palaeocoordinates will also be returned (the latter calculated via geosphere::distGeo()).
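The two uncertainty columns can be sketched from per-model palaeocoordinate columns named as described above (e.g. "p_lat_GOLONKA"). This is an illustration under assumptions, with made-up coordinates: it substitutes a spherical haversine distance (mean Earth radius 6371.0088 km) computed in base R for geosphere::distGeo(), which works on an ellipsoid, so distances will differ slightly from palaeorotate() output.

```r
# Spherical haversine distance in km; a stand-in for geosphere::distGeo()
haversine_km <- function(lng1, lat1, lng2, lat2, r = 6371.0088) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Made-up output with one palaeocoordinate pair per model
rot <- data.frame(
  p_lng_MERDITH2021 = c(10, 0), p_lat_MERDITH2021 = c(20, 0),
  p_lng_GOLONKA = c(12, 0), p_lat_GOLONKA = c(25, 10),
  p_lng_PALEOMAP = c(11, 0), p_lat_PALEOMAP = c(18, 5)
)
models <- c("MERDITH2021", "GOLONKA", "PALEOMAP")
lng <- as.matrix(rot[paste0("p_lng_", models)])
lat <- as.matrix(rot[paste0("p_lat_", models)])

# Palaeolatitudinal range across models, per occurrence
rot$range_p_lat <- apply(lat, 1, function(x) max(x) - min(x))

# Largest distance between any two models' palaeocoordinates, per occurrence
pairs <- combn(length(models), 2)
rot$max_dist <- sapply(seq_len(nrow(rot)), function(i) {
  max(haversine_km(lng[i, pairs[1, ]], lat[i, pairs[1, ]],
                   lng[i, pairs[2, ]], lat[i, pairs[2, ]]))
})
rot[c("range_p_lat", "max_dist")]
```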
Jones, L.A., & Domeier, M. (2024). A Phanerozoic gridded dataset for palaeogeographic reconstructions. Scientific Data, 11, 710. doi:10.1038/s41597-024-03468-w.
Matthews, K.J., Maloney, K.T., Zahirovic, S., Williams, S.E., Seton, M., and Müller, R.D. (2016). Global plate boundary evolution and kinematics since the late Paleozoic. Global and Planetary Change, 146, 226-250. doi:10.1016/j.gloplacha.2016.10.002.
Merdith, A., Williams, S.E., Collins, A.S., Tetley, M.G., Mulder, J.A., Blades, M.L., Young, A., Armistead, S.E., Cannon, J., Zahirovic, S., & Müller, R.D. (2021). Extending full-plate tectonic models into deep time: Linking the Neoproterozoic and the Phanerozoic. Earth-Science Reviews, 214, 103477. doi:10.1016/j.earscirev.2020.103477.
Scotese, C., & Wright, N. M. (2018). PALEOMAP Paleodigital Elevation Models (PaleoDEMs) for the Phanerozoic. PALEOMAP Project.
Torsvik, T. H., & Cocks, L. R. M. (2016). Earth History and Palaeogeography. Cambridge University Press.
Wright, N., Zahirovic, S., Müller, R. D., & Seton, M. (2013). Towards community-driven paleogeographic reconstructions: integrating open-access paleogeographic and paleobiology data with plate tectonics. Biogeosciences, 10(3), 1529-1541. doi:10.5194/bg-10-1529-2013.
See GPlates documentation for additional information and details.
Lewis A. Jones
Kilian Eichenseer, Lucas Buffan & Will Gearty
## Not run:
# Generic example with a few occurrences
occdf <- data.frame(lng = c(2, -103, -66),
                    lat = c(46, 35, -7),
                    age = c(88, 125, 200))
# Calculate palaeocoordinates using reconstruction files
ex1 <- palaeorotate(occdf = occdf, method = "grid")
# Calculate palaeocoordinates using the GPlates API
ex2 <- palaeorotate(occdf = occdf, method = "point")
# Calculate uncertainty in palaeocoordinates from models
ex3 <- palaeorotate(occdf = occdf, method = "grid",
                    model = c("MERDITH2021", "GOLONKA", "PALEOMAP"),
                    uncertainty = TRUE)
# Now with some real fossil occurrence data!
# Grab some data from the Paleobiology Database
data(tetrapods)
# Assign midpoint age of fossil occurrence data for reconstruction
tetrapods$age <- (tetrapods$max_ma + tetrapods$min_ma) / 2
# Rotate the data
ex4 <- palaeorotate(occdf = tetrapods)
# Calculate uncertainty in palaeocoordinates from models
ex5 <- palaeorotate(occdf = tetrapods,
                    model = c("MERDITH2021", "GOLONKA", "PALEOMAP"),
                    uncertainty = TRUE)
## End(Not run)
A function to check the list of tip names in a phylogeny against a vector of taxon names, and if desired, to trim the phylogeny to only include taxon names within the vector.
phylo_check(tree = NULL, list = NULL, out = "full_table", sort = "presence")
Arguments: tree, list, out, sort
Phylogenies can be read into R from .txt or .tree files containing the Newick formatted tree using ape::read.tree(), and can be saved as files using ape::write.tree(). When out = "tree", tips are trimmed using ape::drop.tip(); if your tree is not ultrametric (i.e. the tip dates are not all the same), we recommend using paleotree::fixRootTime() to readjust your branch lengths following pruning.
If out = "full_table", a dataframe describing whether taxon names are present in the list and/or the tree. If out = "diff_table", a dataframe describing which taxon names are present in the list or the tree, but not both. If out = "counts", a summary table containing the number of taxa in the list but not the tree, in the tree but not the list, and in both. If out = "tree", a phylo object consisting of the input phylogeny trimmed to only include the tips present in the list.
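The "counts" summary is essentially a set comparison between tip labels and the name vector. A base R sketch, assuming the tip labels have already been extracted as a character vector; the helper compare_names() and its column names are hypothetical and need not match the layout phylo_check() returns.

```r
# Hypothetical helper: set comparison of tip labels against a name list
compare_names <- function(tips, taxa) {
  data.frame(list_not_tree = length(setdiff(taxa, tips)),
             tree_not_list = length(setdiff(tips, taxa)),
             both = length(intersect(tips, taxa)))
}

# For an ape "phylo" object the tips would be tree$tip.label
tips <- c("Triceratops_horridus", "Leptoceratops_gracilis",
          "Zuniceratops_christopheri", "Protoceratops_andrewsi")
dinosaurs <- c("Triceratops_horridus", "Triceratops_prorsus",
               "Leptoceratops_gracilis")
counts <- compare_names(tips, dinosaurs)
counts
```

With ape loaded and a phylo object in hand, trimming to the list corresponds roughly to ape::drop.tip(tree, setdiff(tree$tip.label, taxa)).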
Bethany Allen
William Gearty & Pedro Godoy
# track user par
oldpar <- par(no.readonly = TRUE)
# Read in example tree of ceratopsians from paleotree
library(paleotree)
data(RaiaCopesRule)
# Set smaller margins for plotting
par(mar = rep(0.5, 4))
plot(ceratopsianTreeRaia)
# Specify list of names
dinosaurs <- c("Nasutoceratops_titusi", "Diabloceratops_eatoni",
               "Zuniceratops_christopheri", "Psittacosaurus_major",
               "Psittacosaurus_sinensis", "Avaceratops_lammersi",
               "Xenoceratops_foremostensis", "Leptoceratops_gracilis",
               "Triceratops_horridus", "Triceratops_prorsus")
# Table of taxon names in list, tree or both
ex1 <- phylo_check(tree = ceratopsianTreeRaia, list = dinosaurs)
# Counts of taxa in list, tree or both
ex2 <- phylo_check(tree = ceratopsianTreeRaia, list = dinosaurs, out = "counts")
# Trim tree to tips in the list
my_ceratopsians <- phylo_check(tree = ceratopsianTreeRaia, list = dinosaurs,
                               out = "tree")
plot(my_ceratopsians)
# reset user par
par(oldpar)
A dataset of Phanerozoic reef occurrences from the
PaleoReefs Database (PARED).
This example dataset includes a subset of the available data from PARED,
but can be used to demonstrate how the functions in the palaeoverse
package might be applied.
reefs
A data frame with 4363 rows and 14 variables:
Reference number given to the particular fossil reef in PARED
Reference name given to the particular fossil reef in PARED
The geological formation to which the fossil reef belongs
The stratigraphic system to which the fossil reef belongs
The stratigraphic series to which the fossil reef belongs
The stratigraphic interval to which the fossil reef belongs
The main biota present within the fossil reef
The secondary biota present within the fossil reef
The modern-day longitude of the fossil reef
The modern-day latitude of the fossil reef
The country or ocean the fossil reef is located in
The authors of the publication documenting the fossil reef
The title of the publication documenting the fossil reef
The year of the publication documenting the fossil reef
Kiessling, W. & Krause, M. C. (2022). PaleoReefs Database (PARED) (1.0) Data set. doi:10.5281/zenodo.6037852
Compiled by Lewis A. Jones. Downloaded on the 25th July 2022. doi:10.5281/zenodo.6037852
A function to check for and count potential spelling variations of the same taxon. Spelling variations are checked within alphabetical groups (default), or within higher taxonomic groups if provided.
tax_check( taxdf, name = "genus", group = NULL, dis = 0.05, start = 1, verbose = TRUE )
Arguments: taxdf, name, group, dis, start, verbose
When higher taxonomy is provided but some entries are missing, comparisons will still be made within alphabetical groups of taxa which lack higher taxonomic affiliations. The function also checks for non-letter characters, which are not expected in correctly formatted taxon names; the results of this check can be returned to the user via the verbose argument. Comparisons are performed using the Jaro dissimilarity metric via stringdist::stringdistmatrix().
As all string distance metrics rely on approximate string matching, different metrics can produce different results. This function uses Jaro distance as it was designed with short, typed strings in mind, but good practice should include comparisons using multiple metrics and, ultimately, specific taxonomic vetting where possible. A more complete implementation and workflow for cleaning taxonomic occurrence data is available in the fossilbrush R package on CRAN.
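For readers unfamiliar with the metric, the sketch below implements the textbook Jaro similarity in base R and flags pairs of names within the same first-letter group whose dissimilarity (1 minus similarity) is at most a threshold. It is an illustration under assumptions: jaro_sim() and flag_pairs() are hypothetical helpers, this is not the stringdist implementation, the "at most" cut-off is assumed, and tax_check() additionally counts synonyms and checks for non-letter characters.

```r
# Textbook Jaro similarity: 1 = identical strings, 0 = no matching characters
jaro_sim <- function(s1, s2) {
  c1 <- strsplit(s1, "")[[1]]
  c2 <- strsplit(s2, "")[[1]]
  n1 <- length(c1)
  n2 <- length(c2)
  if (n1 == 0 && n2 == 0) return(1)
  if (n1 == 0 || n2 == 0) return(0)
  # Characters only count as matching within this distance of each other
  window <- max(floor(max(n1, n2) / 2) - 1, 0)
  m1 <- logical(n1)
  m2 <- logical(n2)
  for (i in seq_len(n1)) {
    lo <- max(1, i - window)
    hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!m2[j] && c1[i] == c2[j]) {
        m1[i] <- TRUE
        m2[j] <- TRUE
        break
      }
    }
  }
  m <- sum(m1)
  if (m == 0) return(0)
  # Half the matched characters that appear in a different order
  trans <- sum(c1[m1] != c2[m2]) / 2
  (m / n1 + m / n2 + (m - trans) / m) / 3
}

# Hypothetical helper: flag name pairs within first-letter groups whose
# Jaro dissimilarity is at most `dis`; returns NULL if nothing is flagged
flag_pairs <- function(taxa, dis = 0.05) {
  taxa <- unique(taxa)
  out <- list()
  for (g in split(taxa, substr(taxa, 1, 1))) {
    if (length(g) < 2) next
    pr <- combn(g, 2)
    d <- apply(pr, 2, function(p) 1 - jaro_sim(p[1], p[2]))
    keep <- d <= dis
    if (any(keep)) {
      out[[length(out) + 1]] <- data.frame(name_1 = pr[1, keep], name_2 = pr[2, keep],
                                           dis = d[keep], stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, out)
}

taxa <- c("Tyrannosaurus", "Tyranosaurus", "Triceratops", "Allosaurus", "Alosaurus")
flag_pairs(taxa, dis = 0.05)
```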
If verbose = TRUE (default), a list with three elements. The first element in the list (synonyms) is a data.frame with each row reporting a pair of potential synonyms. The first column ("group") contains the higher group in which they occur (alphabetical groupings if group is not provided). The second column ("greater") contains the more common synonym in each pair, and the third column ("lesser") contains the less common synonym. The fourth and fifth columns (count_greater, count_lesser) contain the respective counts of each synonym in a pair. If no matches were found for the filtering arguments, this element is NULL instead. The second element (non_letter_name) is a vector of taxon names which contain non-letter characters, or NULL if none were detected. The third element (non_letter_group) is a vector of taxon groups which contain non-letter characters, or NULL if none were detected. If verbose = FALSE, a data.frame as described above is returned, or NULL if no matches were found.
van der Loo, M. P. J. (2014). The stringdist package for approximate string matching. The R Journal 6, 111-122.
Joseph T. Flannery-Sutherland & Lewis A. Jones
Lewis A. Jones, Kilian Eichenseer & Christopher D. Dean
## Not run:
# load occurrence data
data("tetrapods")
# Check taxon names alphabetically
ex1 <- tax_check(taxdf = tetrapods, name = "genus", dis = 0.1)
# Check taxon names by group
ex2 <- tax_check(taxdf = tetrapods, name = "genus", group = "family", dis = 0.1)
## End(Not run)
A function to generate pseudo-occurrences for taxa based on latitudinal ranges (e.g. the output of the 'lat' method in tax_range_space). While the resulting pseudo-occurrences should not be treated as equivalent to actual occurrence data (e.g. data from the Paleobiology Database), such pseudo-occurrences may be useful for performing statistical analyses where the row representing a taxon must be replicated for each latitudinal bin through which the taxon ranges.
tax_expand_lat(taxdf, bins, max_lat = "max_lat", min_lat = "min_lat")
Arguments: taxdf, bins, max_lat, min_lat
A dataframe where each row represents a latitudinal bin which a taxon ranges through. The columns are identical to those in the user-supplied data, with additional columns included to identify bins. Output will be returned in the order of supplied bins.
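The row replication can be sketched in base R. The bins table and its column names (bin, max, min) are hypothetical stand-ins rather than the exact output of lat_bins_degrees(), expand_lat() is a hypothetical helper, and treating a range that merely touches a bin edge as outside that bin is an assumption of this sketch.

```r
# Hypothetical 30-degree bins; column names bin/max/min are assumptions
bins <- data.frame(bin = 1:6,
                   max = seq(90, -60, by = -30),
                   min = seq(60, -90, by = -30))
taxdf <- data.frame(name = c("A", "B", "C"),
                    max_lat = c(60, 20, -10),
                    min_lat = c(20, -40, -60))

# Hypothetical helper: one row per taxon per bin its range overlaps
expand_lat <- function(taxdf, bins) {
  rows <- lapply(seq_len(nrow(taxdf)), function(i) {
    # Strict inequalities: merely touching a bin edge does not count
    hit <- bins$max > taxdf$min_lat[i] & bins$min < taxdf$max_lat[i]
    cbind(taxdf[rep(i, sum(hit)), , drop = FALSE],
          bins[hit, c("bin", "max", "min")])
  })
  out <- do.call(rbind, rows)
  out <- out[order(out$bin), ]  # return rows in the order of supplied bins
  rownames(out) <- NULL
  out
}
ex <- expand_lat(taxdf, bins)
ex
```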
Lewis A. Jones & William Gearty
Christopher D. Dean
bins <- lat_bins_degrees()
taxdf <- data.frame(name = c("A", "B", "C"),
                    max_lat = c(60, 20, -10),
                    min_lat = c(20, -40, -60))
ex <- tax_expand_lat(taxdf = taxdf, bins = bins,
                     max_lat = "max_lat", min_lat = "min_lat")
A function to generate interval-level pseudo-occurrences for taxa based on temporal ranges (e.g. the output of tax_range_time). While the resulting pseudo-occurrences should not be treated as equivalent to actual occurrence data (e.g. data from the Paleobiology Database), such pseudo-occurrences may be useful for performing statistical analyses where the row representing a taxon must be replicated for each interval through which the taxon persisted.
tax_expand_time( taxdf, max_ma = "max_ma", min_ma = "min_ma", bins = NULL, scale = "GTS2020", rank = "stage", ext_orig = TRUE )
Arguments: taxdf, max_ma, min_ma, bins, scale, rank, ext_orig
A dataframe where each row represents an interval during which a taxon in the original user-supplied data persisted. The columns are identical to those in the user-supplied data, with additional columns included to identify the intervals. If ext_orig is TRUE, two additional columns are added to identify in which intervals taxa originated and went extinct.
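A base R sketch of the interval expansion and the origination/extinction flags follows. The bins, the helper expand_time(), and the flag column names orig and ext are hypothetical; it assumes bins are ordered from oldest to youngest and counts only strict overlaps, which may differ from how tax_expand_time() treats shared boundaries.

```r
# Hypothetical bins (ages in Ma), ordered oldest to youngest
bins <- data.frame(bin = c("I1", "I2", "I3", "I4"),
                   max_ma = c(200, 150, 100, 50),
                   min_ma = c(150, 100, 50, 0),
                   stringsAsFactors = FALSE)
taxdf <- data.frame(name = c("A", "B"), max_ma = c(160, 90), min_ma = c(110, 10))

# Hypothetical helper: one row per taxon per interval it persisted through
expand_time <- function(taxdf, bins) {
  rows <- lapply(seq_len(nrow(taxdf)), function(i) {
    # Intervals that strictly overlap the taxon's range
    hit <- which(bins$max_ma > taxdf$min_ma[i] & bins$min_ma < taxdf$max_ma[i])
    out <- taxdf[rep(i, length(hit)), , drop = FALSE]
    out$bin <- bins$bin[hit]
    # Oldest overlapped interval = origination, youngest = extinction
    out$orig <- seq_along(hit) == 1
    out$ext <- seq_along(hit) == length(hit)
    out
  })
  res <- do.call(rbind, rows)
  rownames(res) <- NULL
  res
}
ex <- expand_time(taxdf, bins)
ex
```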
William Gearty & Lewis A. Jones
Lewis A. Jones
taxdf <- data.frame(name = c("A", "B", "C"),
                    max_ma = c(150, 60, 30),
                    min_ma = c(110, 20, 0))
ex <- tax_expand_time(taxdf)
bins <- time_bins(scale = "GTS2012", rank = "stage")
ex2 <- tax_expand_time(taxdf, bins = bins)
A function to calculate the geographic range of fossil taxa from occurrence data. The function can calculate geographic range in four ways: convex hull, latitudinal range, maximum Great Circle Distance, and the number of occupied equal-area hexagonal grid cells.
tax_range_space( occdf, name = "genus", lng = "lng", lat = "lat", method = "lat", spacing = 100, coords = FALSE )
Arguments: occdf, name, lng, lat, method, spacing, coords
Four commonly applied approaches (Darroch et al. 2020) for calculating ranges are available in the tax_range_space function:
Convex hull: the "con" method calculates the geographic range of taxa using a convex hull for each taxon in occdf, and calculates the area of the convex hull (in km²) using geosphere::areaPolygon(). The convex hull method works by creating a polygon that encompasses all occurrence points of the taxon.
Latitudinal: the "lat" method calculates the palaeolatitudinal range of a taxon. It does so for each taxon in occdf by finding its maximum and minimum latitudinal occurrence (from input lat). The palaeolatitudinal range of each taxon is also calculated (i.e. the difference between the minimum and maximum latitude).
Maximum Great Circle Distance: the "gcd" method calculates the maximum Great Circle Distance between occurrences for each taxon in occdf. It does so using geosphere::distHaversine(). This function calculates Great Circle Distance using the Haversine method with the radius of the Earth set to 6378.137 km. Great Circle Distance represents the shortest distance between two points on the surface of a sphere. This is different from Euclidean Distance, which represents the distance between two points on a plane.
Occupied cells: the "occ" method calculates the number and proportion of occupied equal-area grid cells. It does so using discrete hexagonal grids via the h3jsr package. This package relies on Uber's H3 library, a geospatial indexing system that partitions the world into hexagonal cells. In H3, 16 different resolutions are available (see the H3 documentation). In the implementation of the tax_range_space() function, the resolution is defined by the user-input spacing, which represents the distance between the centroids of adjacent cells. Using this distance, the function identifies which resolution is most similar to the input spacing, and uses this resolution.
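The "gcd" calculation can be sketched in base R: compute the haversine distance between every pair of a taxon's occurrences, using the Earth radius quoted above (6378.137 km), and keep the largest. This reimplements the haversine formula rather than calling geosphere::distHaversine(), and the helpers haversine_km() and max_gcd() are hypothetical.

```r
# Haversine great-circle distance (km) with an Earth radius of 6378.137 km
haversine_km <- function(lng1, lat1, lng2, lat2, r = 6378.137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: largest distance between any two occurrences of a taxon
max_gcd <- function(lng, lat) {
  if (length(lng) < 2) return(0)  # a singleton has a range of zero
  pr <- combn(length(lng), 2)
  max(haversine_km(lng[pr[1, ]], lat[pr[1, ]], lng[pr[2, ]], lat[pr[2, ]]))
}

occdf <- data.frame(genus = c("A", "A", "A", "B"),
                    lng = c(0, 0, 90, 10),
                    lat = c(0, 45, 0, 10))
gcd <- sapply(split(occdf, occdf$genus), function(d) max_gcd(d$lng, d$lat))
gcd
```

Here taxon A's most distant occurrences are a quarter of a great circle apart, so its value is the radius times pi/2.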
A dataframe with method-specific columns:
For the "con" method, a dataframe with each unique taxon (taxon) and taxon ID (taxon_id) by convex hull coordinate (lng & lat) combination, and area (area) in km² is returned.
For the "lat" method, a dataframe with unique taxa (taxon), taxon ID (taxon_id), maximum latitude of occurrence (max_lat), minimum latitude of occurrence (min_lat), and latitudinal range (range_lat) is returned.
For the "gcd" method, a dataframe with each unique taxon (taxon) and taxon ID (taxon_id) by coordinate combination (lng & lat) of the two most distant points, and the Great Circle Distance (gcd) between these points in km is returned.
For the "occ" method, a dataframe with unique taxa (taxon), taxon ID (taxon_id), the number of occupied cells (n_cells), the proportion of occupied cells out of all cells occupied by occurrences (proportional_occ), and the spacing between cells (spacing) in km is returned. Note: the number and proportion of occupied cells are highly dependent on the user-defined spacing.
For the "con", "lat" and "gcd" methods, values of zero indicate that the respective taxon is a singleton (i.e. represented by only one occurrence).
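The "lat" output can be approximated with a few lines of base R, which also shows why a singleton yields a range of zero. The toy data are made up, and numbering taxon_id alphabetically is an assumption of this sketch rather than documented behaviour.

```r
# Toy occurrences; latitudes in decimal degrees (values are made up)
occdf <- data.frame(genus = c("A", "A", "B", "C", "C"),
                    lat = c(10, -20, 5, 40, 55))
lats <- split(occdf$lat, occdf$genus)
lat_range <- data.frame(taxon = names(lats),
                        taxon_id = seq_along(lats),  # alphabetical order (assumption)
                        max_lat = sapply(lats, max),
                        min_lat = sapply(lats, min),
                        row.names = NULL)
# Difference between the northernmost and southernmost occurrence
lat_range$range_lat <- lat_range$max_lat - lat_range$min_lat
lat_range
```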
Darroch, S. A., Casey, M. M., Antell, G. S., Sweeney, A., & Saupe, E. E. (2020). High preservation potential of paleogeographic range size distributions in deep time. The American Naturalist, 196(4), 454-471.
Lewis A. Jones
Bethany Allen & Christopher D. Dean
# Grab internal data
occdf <- tetrapods[1:100, ]
# Remove NAs
occdf <- subset(occdf, !is.na(genus))
# Convex hull
ex1 <- tax_range_space(occdf = occdf, name = "genus", method = "con")
# Latitudinal range
ex2 <- tax_range_space(occdf = occdf, name = "genus", method = "lat")
# Great Circle Distance
ex3 <- tax_range_space(occdf = occdf, name = "genus", method = "gcd")
# Occupied grid cells
ex4 <- tax_range_space(occdf = occdf, name = "genus", method = "occ", spacing = 500)
# Convex hull with coordinates
ex5 <- tax_range_space(occdf = occdf, name = "genus", method = "con", coords = TRUE)
A function to plot the stratigraphic ranges of fossil taxa from occurrence data.
tax_range_strat( occdf, name = "genus", level = "bed", certainty = NULL, by = "FAD", plot_args = NULL, x_args = NULL, y_args = NULL )
Arguments: occdf, name, level, certainty, by
plot_args: A list of optional arguments that are passed directly to
x_args: A list of optional arguments that are passed directly to
y_args: A list of optional arguments that are passed directly to
Note that the default spacing for the x-axis title may cause it to overlap with the x-axis tick labels. To avoid this, you can call graphics::title() after running tax_range_strat() and specify both xlab and line to add the x-axis title farther from the axis (see examples).
The styling of the points and line segments can be adjusted by supplying named arguments to plot_args. col (segment and point color), lwd (segment width), pch (point symbol), bg (background point color for some values of pch), lty (segment line type), and cex (point size) are supported. If a column is supplied to the certainty argument, these arguments may be vectors of length two, in which case the first value will be used for the "certain" points and segments, and the second value for the "uncertain" points and segments. If only a single value is supplied, it will be used for both. The default values for these arguments are as follows:
col = c("black", "black")
lwd = c(1.5, 1.5)
pch = c(19, 21)
bg = c("black", "white")
lty = c(1, 2)
cex = c(1, 1)
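The length-two convention can be illustrated with a small helper that recycles a single value and then picks the "certain" or "uncertain" style for each record. The helper resolve_style() and the 1 = certain / 0 = uncertain coding are assumptions for illustration, not the function's internals.

```r
# Hypothetical helper: recycle a style value to length two, then pick
# element 1 for certain records and element 2 for uncertain ones
resolve_style <- function(value, certain) {
  value <- rep_len(value, 2)
  ifelse(certain, value[1], value[2])
}

certainty <- c(1, 0, 1)  # assumed coding: 1 = certain, 0 = uncertain
certain <- certainty == 1
pch <- resolve_style(c(19, 21), certain)  # default pch quoted above
lty <- resolve_style(c(1, 2), certain)    # default lty quoted above
col <- resolve_style("black", certain)    # single value used for both
pch
```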
Invisibly returns a data.frame of the calculated taxonomic stratigraphic ranges.
The function is usually used for its side effect, which is to create a plot showing the stratigraphic ranges of taxa in a section, with levels at which the taxon was sampled indicated with a point.
Bethany Allen, William Gearty & Alexander Dunhill
William Gearty & Lewis A. Jones
# track user par
oldpar <- par(no.readonly = TRUE)
# Load tetrapod dataset
data(tetrapods)
# Sample tetrapod occurrences
tetrapod_names <- tetrapods$accepted_name[1:50]
# Simulate bed numbers and certainty values (seed set for reproducibility)
set.seed(1)
beds_sampled <- sample.int(n = 10, size = 50, replace = TRUE)
certainty_sampled <- sample(x = 0:1, size = 50, replace = TRUE)
# Combine into data frame
occdf <- data.frame(taxon = tetrapod_names,
                    bed = beds_sampled,
                    certainty = certainty_sampled)
# Plot stratigraphic ranges
par(mar = c(12, 5, 2, 2))
tax_range_strat(occdf, name = "taxon")
tax_range_strat(occdf, name = "taxon", certainty = "certainty",
                plot_args = list(ylab = "Stratigraphic height (m)"))
# Plot stratigraphic ranges with more labelling
tax_range_strat(occdf, name = "taxon", certainty = "certainty", by = "name",
                plot_args = list(main = "Section A",
                                 ylab = "Stratigraphic height (m)"))
eras_custom <- data.frame(name = c("Mesozoic", "Cenozoic"),
                          max_age = c(0.5, 3.5),
                          min_age = c(3.5, 10.5),
                          color = c("#67C5CA", "#F2F91D"))
axis_geo(side = 4, intervals = eras_custom, tick_labels = FALSE)
title(xlab = "Taxon", line = 10.5)
# reset user par
par(oldpar)
A function to calculate the temporal range of fossil taxa from occurrence data.
tax_range_time( occdf, name = "genus", min_ma = "min_ma", max_ma = "max_ma", by = "FAD", plot = FALSE, plot_args = NULL, intervals = "periods" )
Arguments: occdf, name, min_ma, max_ma, by, plot, plot_args, intervals
The temporal range(s) of taxa are calculated by extracting all unique taxa (name column) from the input occdf, and checking their first and last appearance. The temporal duration of each taxon is also calculated. If the input data columns contain NAs, these must be removed prior to calling the function. A plot of the temporal range of each taxon is also produced if plot = TRUE. Customisable argument options (i.e. graphics::par()) to pass to plot_args as a list (and their defaults) for plotting include:
xlab = "Time (Ma)"
ylab = "Taxon ID"
col = "black"
bg = "black"
pch = 20
cex = 1
lty = 1
lwd = 1
Note: this function provides output based solely on the user input data. The true duration of a taxon is likely confounded by uncertainty in dating occurrences, and incomplete sampling and preservation.
A dataframe is returned containing the following columns: unique taxa (taxon), taxon ID (taxon_id), first appearance of taxon (max_ma), last appearance of taxon (min_ma), duration of temporal range (range_myr), and number of occurrences per taxon (n_occ).
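The core calculation can be sketched in base R: per taxon, the first appearance is the oldest max_ma, the last appearance is the youngest min_ma, and the duration is their difference. The toy data are made up, and numbering taxon_id after sorting by first appearance is an assumption of this sketch rather than documented behaviour.

```r
# Toy occurrences with age ranges in Ma (values are made up)
occdf <- data.frame(genus = c("A", "A", "A", "B"),
                    max_ma = c(150, 140, 120, 90),
                    min_ma = c(145, 130, 110, 80))
by_taxon <- split(occdf, occdf$genus)
ranges <- data.frame(
  taxon = names(by_taxon),
  max_ma = sapply(by_taxon, function(d) max(d$max_ma)),  # first appearance
  min_ma = sapply(by_taxon, function(d) min(d$min_ma)),  # last appearance
  n_occ = sapply(by_taxon, nrow),
  row.names = NULL
)
ranges$range_myr <- ranges$max_ma - ranges$min_ma
# Order by first appearance (oldest first), then number the taxa
ranges <- ranges[order(ranges$max_ma, decreasing = TRUE), ]
ranges$taxon_id <- seq_len(nrow(ranges))
ranges
```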
Lewis A. Jones
Bethany Allen, Christopher D. Dean & Kilian Eichenseer
# Grab internal data
occdf <- tetrapods
# Remove NAs
occdf <- subset(occdf, !is.na(order) & order != "NO_ORDER_SPECIFIED")
# Temporal range
ex <- tax_range_time(occdf = occdf, name = "order", plot = TRUE)
# Customise appearance
ex <- tax_range_time(occdf = occdf, name = "order", plot = TRUE,
                     plot_args = list(ylab = "Orders", pch = 21, col = "black",
                                      bg = "blue", lty = 2),
                     intervals = list("periods", "eras"))
A function to filter a list of taxonomic occurrences to unique taxa of a predefined resolution. Occurrences identified to a coarser taxonomic resolution than the desired level are retained if they belong to a clade which is not otherwise represented in the dataset (see details section for further information). This has previously been described as "cryptic diversity" (e.g. Mannion et al. 2011).
tax_unique( occdf = NULL, binomial = NULL, species = NULL, genus = NULL, ..., name = NULL, resolution = "species", append = FALSE )
Arguments: occdf, binomial, species, genus, ..., name, resolution, append
Palaeobiologists usually count unique taxa by retaining only unique occurrences identified to a given taxonomic resolution; however, this function also retains occurrences identified to a coarser taxonomic resolution which are not already represented within the dataset. For example, consider the following set of occurrences:
Albertosaurus sarcophagus
Ankylosaurus sp.
Aves indet.
Ceratopsidae indet.
Hadrosauridae indet.
Ornithomimus sp.
Tyrannosaurus rex
A filter for species-level identifications would reduce the species richness to two. However, none of these clades are nested within one another, so each of the indeterminately identified occurrences represents at least one species not already represented in the dataset. This function is designed to deal with such taxonomic data, and would retain all seven 'species' in this example.
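The retention rule in this example can be sketched in base R: a record identified only to a coarse rank is kept unless some more finely identified record shares the same name at that rank. The helper flag_unique(), the made-up higher column, and the extra "Tyrannosauridae indet." record (added to show a coarse identification being dropped) are hypothetical; like the function, the sketch matches names at face value, and it is not tax_unique()'s actual implementation.

```r
# Toy occurrences mirroring the list above, plus a Tyrannosauridae indet. record
occ <- data.frame(
  species = c("Albertosaurus sarcophagus", NA, NA, NA, NA, NA, "Tyrannosaurus rex", NA),
  genus = c("Albertosaurus", "Ankylosaurus", NA, NA, NA, "Ornithomimus",
            "Tyrannosaurus", NA),
  family = c("Tyrannosauridae", "Ankylosauridae", NA, "Ceratopsidae",
             "Hadrosauridae", "Ornithomimidae", "Tyrannosauridae", "Tyrannosauridae"),
  higher = c("Dinosauria", "Dinosauria", "Aves", rep("Dinosauria", 5)),
  stringsAsFactors = FALSE
)

# Hypothetical helper; `levels` lists columns from finest to coarsest rank
flag_unique <- function(df, levels) {
  ids <- !is.na(as.matrix(df[levels]))
  # Finest rank at which each row is identified (NA if not identified at all)
  finest <- unname(apply(ids, 1, function(x) which(x)[1]))
  unique_name <- rep(NA_character_, nrow(df))
  for (i in seq_len(nrow(df))) {
    k <- finest[i]
    if (is.na(k)) next
    nm <- df[[levels[k]]][i]
    # Rows identified at a finer rank than row i
    finer <- !is.na(finest) & finest < k
    # Keep a coarse identification only if no finer record shares its name
    if (k == 1 || !any(df[[levels[k]]][finer] %in% nm)) unique_name[i] <- nm
  }
  df$unique_name <- unique_name
  df
}
res <- flag_unique(occ, c("species", "genus", "family", "higher"))
unique(res$unique_name[!is.na(res$unique_name)])
```

All seven taxa from the list survive, while the extra Tyrannosauridae record gets NA because Albertosaurus and Tyrannosaurus already represent that family.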
Taxonomic information is supplied within a dataframe, in which columns provide identifications at different taxonomic levels. Occurrence data can be filtered to retain either unique species or unique genera. If a species-level filter is desired, the minimum input requires either (1) binomial, (2) species and genus, or (3) name and genus columns to be entered, as well as at least one column of a higher taxonomic level. In a standard Paleobiology Database occurrence dataframe, species names are only captured in the 'accepted_name' column, so a species-level filter should use 'genus = "genus"' and 'name = "accepted_name"' arguments. If a genus-level filter is desired, the minimum input requires either (1) binomial or (2) genus columns to be entered, as well as at least one column of a higher taxonomic level.
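The minimum-input rules above can be written out as a small base-R predicate. This is a hedged sketch: has_minimum_input() is a hypothetical helper whose argument names mirror tax_unique(), and it is not the check the package performs internally.

```r
# Hypothetical helper encoding the minimum-input rules described above;
# NOT palaeoverse's own argument checking.
has_minimum_input <- function(resolution = "species", binomial = NULL,
                              species = NULL, genus = NULL, name = NULL,
                              higher = character(0)) {
  # At least one column of a higher taxonomic level is always required
  if (length(higher) < 1) return(FALSE)
  given <- function(x) !is.null(x)
  if (resolution == "species") {
    given(binomial) || (given(species) && given(genus)) ||
      (given(name) && given(genus))
  } else if (resolution == "genus") {
    given(binomial) || given(genus)
  } else {
    stop("resolution must be \"species\" or \"genus\"")
  }
}

# Species-level filter on a PBDB-style dataframe: genus + name + higher ranks
has_minimum_input("species", genus = "genus", name = "accepted_name",
                  higher = c("family", "order", "class"))  # TRUE
# A genus column alone is not enough for a species-level filter
has_minimum_input("species", genus = "genus", higher = "family")  # FALSE
```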
Missing data should be indicated with NAs, although the function can handle common labels such as "NO_FAMILY_SPECIFIED" within Paleobiology Database datasets.
The function matches taxonomic names at face value, so homonyms may be falsely filtered out.
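If placeholder labels need to be converted to NA before other processing, a minimal base-R sketch is shown below. It assumes placeholders follow the pattern of "NO_FAMILY_SPECIFIED" (i.e. NO_<RANK>_SPECIFIED); the helper name and the regular expression are assumptions, not the package's internal handling. Note that, like the function itself, this matches labels at face value.

```r
# Hypothetical helper: replace PBDB-style placeholders such as
# "NO_FAMILY_SPECIFIED" with NA (pattern assumed: NO_<RANK>_SPECIFIED).
# Only character columns are changed (the data.frame() default since R 4.0.0).
placeholders_to_na <- function(df, pattern = "^NO_[A-Z]+_SPECIFIED$") {
  df[] <- lapply(df, function(x) {
    if (is.character(x)) x[grepl(pattern, x)] <- NA
    x
  })
  df
}

occ <- data.frame(genus  = c("Tyrannosaurus", "NO_GENUS_SPECIFIED"),
                  family = c("Tyrannosauridae", "NO_FAMILY_SPECIFIED"))
out <- placeholders_to_na(occ)
out
```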
A dataframe of taxa, with each row corresponding to a unique "species" or "genus" in the dataset (depending on the chosen resolution). The dataframe will include the taxonomic information provided to the function, as well as a column providing the 'unique' names of each taxon. If append is TRUE, the original dataframe (occdf) will be returned with these 'unique' names appended as a new column. Occurrences that are identified to a coarse taxonomic resolution and belong to a clade which is already represented within the dataset will have their 'unique' names listed as NA.
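The two return forms described above are related: dropping NA entries and duplicated names from an appended result leaves one row per unique taxon. The base-R sketch below illustrates this on a hypothetical appended dataframe; the column name unique_name and the toy values are assumptions for illustration only.

```r
# Hypothetical appended output: one row per occurrence, with NA for a
# coarse identification whose clade is already represented
appended <- data.frame(
  genus       = c("Tyrannosaurus", "Tyrannosaurus", NA, "Ornithomimus"),
  family      = c("Tyrannosauridae", "Tyrannosauridae", "Tyrannosauridae",
                  "Ornithomimidae"),
  unique_name = c("Tyrannosaurus", "Tyrannosaurus", NA, "Ornithomimus")
)

# Keep one row per non-NA unique name
keep <- !is.na(appended$unique_name) & !duplicated(appended$unique_name)
unique_taxa <- appended[keep, ]
nrow(unique_taxa)  # 2
```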
Mannion, P. D., Upchurch, P., Carrano, M. T., and Barrett, P. M. (2011). Testing the effect of the rock record on diversity: a multidisciplinary approach to elucidating the generic richness of sauropodomorph dinosaurs through time. Biological Reviews, 86, 157-181. doi:10.1111/j.1469-185X.2010.00139.x.
Bethany Allen & William Gearty
Lewis A. Jones & William Gearty
#Retain unique species
occdf <- tetrapods[1:100, ]
species <- tax_unique(occdf = occdf, genus = "genus", family = "family",
                      order = "order", class = "class", name = "accepted_name")
#Retain unique genera
genera <- tax_unique(occdf = occdf, genus = "genus", family = "family",
                     order = "order", class = "class", resolution = "genus")
#Append unique names to the original occurrences
genera_append <- tax_unique(occdf = occdf, genus = "genus", family = "family",
                            order = "order", class = "class",
                            resolution = "genus", append = TRUE)
#Create dataframe from lists
occdf2 <- data.frame(species = c("rex", "aegyptiacus", NA),
                     genus = c("Tyrannosaurus", "Spinosaurus", NA),
                     family = c("Tyrannosauridae", "Spinosauridae", "Diplodocidae"))
dinosaur_species <- tax_unique(occdf = occdf2, species = "species",
                               genus = "genus", family = "family")
#Retain unique genera per collection with group_apply
genera <- group_apply(occdf = occdf, group = c("collection_no"),
                      fun = tax_unique, genus = "genus", family = "family",
                      order = "order", class = "class", resolution = "genus")
A dataset of tetrapod occurrences ranging from the Carboniferous through to the Early Triassic, from the Paleobiology Database. The dataset includes a range of variables relevant to common palaeobiological analyses, relating to identification, geography, environmental context, traits and more. Additional information on these variables can be found in the Paleobiology Database documentation. The downloaded data are unaltered, with the exception of the removal of some superfluous variables, and can be used to demonstrate how the functions in the palaeoverse package might be applied.
tetrapods
A data frame with 5270 rows and 32 variables:
Reference number given to the particular occurrence in the Paleobiology Database
Reference number given to the Paleobiology Database collection (locality) that the occurrence belongs to
Taxon name as it appears in the original publication, which may include expressions of uncertainty (e.g. "cf.", "aff.", "?") or novelty (e.g. "n. gen.", "n. sp.")
The taxonomic rank, or resolution, of the identified name
Taxon name once the identified name has passed through the Paleobiology Database's internal taxonomy, which collapses synonyms, amends binomials which have been altered (e.g. species moving to another genus) and updates taxa which are no longer valid (e.g. nomina dubia)
The taxonomic rank, or resolution, of the accepted name
The oldest (or only) time interval within which the occurrence is thought to have been deposited
The youngest time interval within which the occurrence is thought to have been deposited
The age range given to the occurrence
The taxa (of decreasing taxonomic level) which the occurrence is identified as belonging to
The number (and units) of fossils attributed to the occurrence
The modern-day longitude and latitude of the fossil locality
The name of the Paleobiology Database collection which the occurrence belongs to, typically a spatio-temporally restricted locality
The country (code) where the fossils were discovered
The geological units from which the fossils were collected
The biozone which the occurrence is attributed to
The main lithology of the beds in the section where the fossils were collected
The inferred environmental conditions in the place of deposition
The mode of preservation of the fossils found in the collection (not necessarily of that specific occurrence), which will include information on whether they are body or trace fossils
The environment within which the taxon is thought to have lived, collated within the Paleobiology Database
Various types of trait data for the taxon, collated within the Paleobiology Database
Uhen, M. D., et al. (2023). Paleobiology Database User Guide Version 1.0. PaleoBios, 40(11). doi:10.5070/P9401160531.
Compiled by Bethany Allen, current version downloaded on 14th July 2022. See item descriptions for details.
A function to generate time bins for a given study interval and geological timescale. This function is flexible in that either stage-level or higher stratigraphic-level (e.g. period) time bins can be called, valid timescales from Macrostrat can be used, or a data.frame of a geological timescale can be provided. In addition, near-equal-length time bins can be generated by grouping intervals together. For example, for a target bin size of 10 Myr, the function will generate bins that have a mean bin length close to 10 Myr. Similarly, for a specified number of bins (n_bins), the function will generate this number of bins with a bin duration as uniform as possible. However, users may also want to consider grouping stages based on other reasoning, e.g. availability of outcrop (see Dean et al. 2020).
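One way to group consecutive intervals into near-equal-length bins is a greedy pass from oldest to youngest, closing a bin when adding the next interval would move its length further from the target. The base-R sketch below uses that greedy rule as an illustrative stand-in; it is an assumption and not necessarily the algorithm used by time_bins(). The interval boundaries are those of the user-input data.frame in the examples for this function, and group_intervals() is a hypothetical helper.

```r
# Hypothetical greedy grouping; NOT necessarily time_bins()'s algorithm.
# max_ma/min_ma: interval boundaries ordered from oldest to youngest.
group_intervals <- function(max_ma, min_ma, size) {
  dur <- max_ma - min_ma
  bin <- integer(length(dur))
  current <- 1L
  acc <- 0  # length (Myr) accumulated in the current bin
  for (i in seq_along(dur)) {
    # Close the bin if adding interval i moves its length further from the target
    if (acc > 0 && abs(acc + dur[i] - size) > abs(acc - size)) {
      current <- current + 1L
      acc <- 0
    }
    bin[i] <- current
    acc <- acc + dur[i]
  }
  bin
}

max_ma <- c(53, 45, 38, 32, 18)
min_ma <- c(45, 38, 32, 18, 0)
bin <- group_intervals(max_ma, min_ma, size = 20)
bin  # 1 1 1 2 3

# Summarise the grouped bins
bins <- data.frame(bin = sort(unique(bin)),
                   max_ma = as.vector(tapply(max_ma, bin, max)),
                   min_ma = as.vector(tapply(min_ma, bin, min)))
bins$duration_myr <- bins$max_ma - bins$min_ma
bins  # durations 21, 14 and 18 Myr (mean ~17.7 Myr)
```

For a target number of bins, a comparable sketch could set size to the total duration divided by n_bins, although this greedy rule does not guarantee exactly n_bins groups.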
time_bins( interval = "Phanerozoic", rank = "stage", size = NULL, n_bins = NULL, assign = NULL, scale = "GTS2020", plot = FALSE )
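Argument | Description |
---|---|
interval | The study interval: a numeric age, a numeric age range (e.g. c(50, 100)), a single interval name, or a range of interval names (e.g. c("Fortunian", "Meghalayan")). Default "Phanerozoic". |
rank | The stratigraphic rank of the time bins: "stage" (default) or a higher rank such as "period". |
size | Target bin length (Myr) used to generate near-equal-length bins. |
n_bins | Number of near-equal-length bins to generate. |
assign | Numeric age estimates to assign to the generated time bins. |
scale | The timescale: "GTS2020" (default), "GTS2012", the name of a valid Macrostrat timescale, or a user-input data.frame. |
plot | Logical. Should the time bins be plotted? Default FALSE. |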
This function uses either the Geological Time Scale 2020, the Geological Time Scale 2012, a valid timescale from Macrostrat, or a user-input data.frame (see scale argument) to generate time bins.
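The examples for this function pass a user-input data.frame with interval_name, min_ma and max_ma columns. A minimal base-R sketch for sanity-checking such a table before passing it as scale might look like this; the helper name and the specific checks (required columns, positive durations, no gaps or overlaps between adjacent intervals) are assumptions rather than documented requirements of time_bins().

```r
# Hypothetical validator for a user-input timescale; the checks are assumptions
check_scale <- function(scale) {
  required <- c("interval_name", "min_ma", "max_ma")
  missing <- setdiff(required, names(scale))
  if (length(missing) > 0) {
    stop("missing columns: ", paste(missing, collapse = ", "))
  }
  if (any(scale$max_ma <= scale$min_ma)) stop("max_ma must exceed min_ma")
  # Adjacent intervals should share boundaries (sorted youngest to oldest)
  s <- scale[order(scale$min_ma), ]
  if (nrow(s) > 1 && any(s$max_ma[-nrow(s)] != s$min_ma[-1])) {
    stop("intervals have gaps or overlaps")
  }
  invisible(TRUE)
}

scale <- data.frame(interval_name = 1:5,
                    min_ma = c(0, 18, 32, 38, 45),
                    max_ma = c(18, 32, 38, 45, 53))
check_scale(scale)  # passes silently
```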
Note that timescales from Macrostrat tend to contain the most up-to-date information (e.g. the Geological Time Scale). Additional information on the included Geological Time Scales and their sources can be accessed via:
Available interval names are accessible via the interval_name column in GTS2012 and GTS2020. Data of the Geological Timescale 2020 and 2012 were compiled by Lewis A. Jones (2022-07-02).
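When a pair of interval names is given (e.g. c("Fortunian", "Meghalayan") in the examples), the study interval presumably spans both named intervals and everything between them. The base-R sketch below shows one way to select such a range by matching the interval_name column of a timescale table; the helper select_range() is hypothetical and not part of palaeoverse, and the toy table contains only four Phanerozoic periods rather than the full GTS2020 data.

```r
# Hypothetical helper: select all intervals from the older named interval
# to the younger one (inclusive), matching on the interval_name column
select_range <- function(scale, from, to) {
  i <- match(c(from, to), scale$interval_name)
  if (anyNA(i)) stop("interval name not found")
  oldest <- max(scale$max_ma[i])
  youngest <- min(scale$min_ma[i])
  scale[scale$max_ma <= oldest & scale$min_ma >= youngest, ]
}

# Toy timescale table (boundaries in Ma)
periods <- data.frame(
  interval_name = c("Quaternary", "Neogene", "Paleogene", "Cretaceous"),
  min_ma = c(0, 2.58, 23.03, 66),
  max_ma = c(2.58, 23.03, 66, 145)
)
select_range(periods, "Paleogene", "Quaternary")  # Quaternary to Paleogene
```

Because the oldest and youngest boundaries are taken from both names, the order in which the two names are supplied does not matter in this sketch.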
A data.frame of time bins for the specified intervals, or, if assign is specified, a list with a data.frame of time bins and a named numeric vector (bin number) of binned age estimates (midpoint of specified bins). By default, the time bins data.frame contains the following columns: bin, interval_name, rank, max_ma, mid_ma, min_ma, duration_myr, abbr (interval abbreviation), colour and font (colour). If size or n_bins is specified, the time bins data.frame contains the following columns: bin, max_ma, mid_ma, min_ma, duration_myr, grouping_rank, intervals, colour and font.
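The assign output described above can be illustrated with a small base-R sketch: each age estimate is matched to the bin whose max_ma and min_ma bracket it, and that bin's mid_ma is returned, named by bin number. The helper assign_to_bins() and the toy bins are hypothetical, and assigning ages that fall exactly on a boundary to the first matching (older) bin is an assumption rather than documented behaviour.

```r
# Hypothetical helper: assign each age to the bin bracketing it and return
# the bin midpoints, named by bin number (boundary ages go to the first match;
# ages outside all bins give NA)
assign_to_bins <- function(ages, bins) {
  idx <- vapply(ages, function(a) {
    hit <- which(a <= bins$max_ma & a >= bins$min_ma)
    if (length(hit) > 0) hit[1] else NA_integer_
  }, integer(1))
  stats::setNames(bins$mid_ma[idx], bins$bin[idx])
}

# Toy bins, ordered from oldest to youngest
bins <- data.frame(bin = 1:3,
                   max_ma = c(30, 20, 10),
                   mid_ma = c(25, 15, 5),
                   min_ma = c(20, 10, 0))
assign_to_bins(c(25, 12, 3), bins)  # bins 1, 2, 3 -> midpoints 25, 15, 5
```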
Dean, C. D., Chiarenza, A. A., and Maidment, S. C. (2020). Formation binning: a new method for increased temporal resolution in regional studies, applied to the Late Cretaceous dinosaur fossil record of North America. Palaeontology, 63(6), 881-901. doi:10.1111/pala.12492.
Lewis A. Jones & Kilian Eichenseer
Kilian Eichenseer & William Gearty
#Using numeric age
ex1 <- time_bins(interval = 10, plot = TRUE)
#Using numeric age range
ex2 <- time_bins(interval = c(50, 100), plot = TRUE)
#Using a single interval name
ex3 <- time_bins(interval = c("Maastrichtian"), plot = TRUE)
#Using a range of intervals and near-equal duration bins
ex4 <- time_bins(interval = c("Fortunian", "Meghalayan"), size = 10, plot = TRUE)
#Assign bins based on given age estimates
ex5 <- time_bins(interval = c("Fortunian", "Meghalayan"),
                 assign = c(232, 167, 33))
#Use user-input data.frame to generate near-equal length bins
scale <- data.frame(interval_name = 1:5,
                    min_ma = c(0, 18, 32, 38, 45),
                    max_ma = c(18, 32, 38, 45, 53))
ex6 <- time_bins(scale = scale, size = 20, plot = TRUE)
#Use North American land mammal ages from Macrostrat and specify a desired
#number of bins
ex7 <- time_bins(scale = "North American land mammal ages", n_bins = 7)