Abstract

This study is a reproduction of: Tuholske, C., Lynch, V.D., Spriggs, R. et al. Hazardous heat exposure among incarcerated people in the United States. Nat Sustain 7, 394–398 (2024). https://doi.org/10.1038/s41893-024-01293-y Git Repository

By reproducing the exploratory data analysis done in Tuholske et al. (2024), we seek to…

1. Identify the temporal resolution at which authors use population data to calculate population weighted hazardous heat days

2. Understand the population weighting mechanism in state-level hazardous heat calculations.

3. Evaluate the effectiveness of the author’s methods compared to similar research.

Study metadata

Key words: Comma-separated list of keywords (tags) for searchability. Geographers often use one or two keywords each for: theory, geographic context, and methods.
Subject: Social and Behavioral Sciences: Geography: Geographic Information Sciences, Human Geography, Nature and Society Relations
Date created: 04/14/25
Date modified: 04/21/25
Spatial Coverage: United States Lower 48
Spatial Resolution: Carceral facility points and States
Spatial Reference System: Specify the geographic or projected coordinate system for the study, e.g. EPSG:4326
Temporal Coverage: 1982-2020
Temporal Resolution: 1 year

Original study spatio-temporal metadata

Spatial Coverage: United States Lower 48
Spatial Resolution: Carceral facility points and States
Spatial Reference System: spatial reference system of original study
Temporal Coverage: 1982-2020
Temporal Resolution: 1 year

Study design

This study is a reproduction study, and the original study is more exploratory in design. The primary objective for exploration through research and analysis investigated in the original study was exposure to hazardous heat in carceral facilities in the continental U.S. The authors wanted to examine how exposure to hazardous heat changed over time from 1982-2020, as well as how exposure within carceral facilities compared to exposure in the rest of the state. In general, determining the spatial distribution of carceral facilities with higher levels of hazardous heat exposure was also an objective of the original paper

Materials and procedure

Computational environment

Original study computational environment

The original study data transformations and analysis were completed primarily in R using Rmd documents, as well as in Python. The versions of R and Python used are not disclosed, but would have been R 4.3.3 or earlier, and Python 3.12 or earlier.

In the original study, R packages are called in across different scripts. However, it seems that the important ones for this study are:

original_study_packages <- c(
  "dplyr", 
  "data.table", 
  "maptools",
  "mapproj",
  "rgeos",
  "rgdal",
  "RColorBrewer",
  "ggplot2",
  "raster", # planned deviation: we will be using `stars` in our reproduction
  "sp", # planned deviation: we will be using `sf` in our reproduction
  "plyr",
  "graticule",
  "zoo",
  "purrr",
  "cowplot",
  "janitor"
  )

Prepare reproduction computational environment

For the reproduction study, we will be using R version 4.4.2, and the groundhog package to maintain package consistency. All packages used will be up to date as of 2025-02-01.

We plan on using the packages tidyverse, here, markdown, htmltools, dplyr, sf, and stars. As we encounter the need for other packages in our implementation of the code, we will make note of them as unplanned deviations.

Data and variables

We are going to use data from the original study’s git repository (linked on top level readme). This includes:

- Population data for the study period

- Prison boundary polygons with facility information

- State polygons

- WBGT data, at prison point and state levels

Population Data

Title: population (pre_1990 & vintage_2020)
Abstract: Population data representing different age groups (10 year increments) from 0-5 years old up to 85 years old by sex
Spatial Coverage: Continental U.S.
Spatial Resolution: County by FIPS Code
Spatial Representation Type: N/A
Spatial Reference System: N/A
Temporal Coverage: Each year, 1982-2020
Temporal Resolution: Month
Lineage: Acquired from census, pre-1990 and post-1990 data standardized
Distribution: Data available in original study’s git repository
Constraints: Public domain
Data Quality: Unclear lineage documentation

Label	Alias	Definition	Type	Accuracy	Domain	Missing Data Value(s)	Missing Data Frequency
year	…	observance year	integer	…	…	…	…
fips	…	county FIPS code	integer	…	…	…	…
sex	…	1 is male, 2 is female	integer	…	…	…	…
age	…	10-year age group	integer	…	…	…	…
month	…	month of year	integer	…	…	…	…
pop	…	group’s population in county	integer	…	…	…	…

Prison Boundaries

Title: Prison_Boundaries.shp
Abstract: Shapefile containing prison boundary polygons including geographic, type, operation, population, capacity, and other data
Spatial Coverage: United States of America (including Alaska, Hawaii, DC, and territories)
Spatial Resolution: parcel/building sized polygon (effectively points)
Spatial Representation Type: vector MULTIPOLYGON
Spatial Reference System: CRS 3857 Spherical/Web Mercator
Temporal Coverage: unclear - appears to represent data as of 6/6/2020
Temporal Resolution: n/a
Lineage: Refer to metadata_Prison_Boundaries_WebDownload.pdf
Distribution: Data available in original study’s git repository
Constraints: Public Domain
Data Quality: Unclear lineage documentation, many missing facility information
Variables: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)

Label	Alias	Definition	Type	Accuracy	Domain	Missing Data Value(s)	Missing Data Frequency
status	…	describes facility status as open, closed or not available	…	…	…	…	…
population	…	population of facility, -999 represents missing data	…	…	…	…	…
capacity	…	total capacity of facility, -999 represents missing data	…	…	…	…	…

State Boundaries

Title: states.shp
Abstract: state boundary polygons with region
Spatial Coverage: The 50 US states and Washington DC
Spatial Resolution: US state
Spatial Representation Type: vector MULTIPOLYGON
Spatial Reference System: EPSG 4269
Temporal Coverage: n/a
Temporal Resolution: n/a
Lineage: unknown
Distribution: Data available in original study’s git repository
Constraints: Public domain
Data Quality: n/a
Variables:

Label	Alias	Definition	Type	Accuracy	Domain	Missing Data Value(s)	Missing Data Frequency
STATE_NAME	Name	Name of state	character string	n/a	US state names	n/a	n/a
DRAWSEQ	Draw Sequence	unknown	integer	n/a	1-51	n/a	n/a
STATE_FIPS	FIPS Code	State FIPS code (two digit)	integer	n/a	01-56	n/a	n/a
SUB_REGION	Sub-Region	Sub-Region of the US	character string	n/a	n/a	n/a	n/a
STATE_ABBR	Abbreviation	Two letter abbreviation	character string	n/a	two-letter postal abbreviations	…	…

WBGT Data

WBGT Data - Prison Level

Title: wbgt_raw/prison/weighted_area_raster_prison_wbgtmax_daily_(year).rds
Abstract: Daily WBGTmax, weighted by area, from 1982-2020
Spatial Coverage: United States Lower 48
Spatial Resolution: Prison by prison ID
Spatial Representation Type: N/A
Spatial Reference System: N/A
Temporal Coverage: Each year, 1982-2020
Temporal Resolution: Day of year
Lineage:
- Heat data acquired from Parameter-elevation Regressions on Independent Slopes Model (PRISM) dataset
- Prison data acquired from Homeland Infrastructure Foundation-Level Data (HIFLD), produced by the Department of Homeland Security
- WBGTmax estimated by authors using high-resolution (4 km) daily maximum 2 m air temperature data (Tmax), and maximum vapour pressure deficit data (VPDmax)
Distribution: Data available in original study’s git repository
Constraints: Open access Creative Commons Attribution 4.0 International License,
Data Quality: Data lacks sufficient documentation in the original study repository/resources. The original link to the HIFLD data (from the citation) no longer works, however HIFLD data can now be found here.

Label	Alias	Definition	Type	Accuracy	Domain	Missing Data Value(s)	Missing Data Frequency
prison_id	…	unqiue prison id	integer	…	6640 prisons	…	…
wbgtmax	…	wbgtmax estimated for specified day	integer	…	…	missing data not included	…
date	…	day of year (dd/mm/yyyy)	character string	…	…	…	…
day	…	day	integer	…	…	…	…
month	…	month	integer	…	…	…	…
year	…	year	integer	…	…	…	…

WBGT Data - State Level

Title: wbgt_raw/state/weighted_area_raster_fips_wbgtmax_daily_(year).rds
Abstract: Daily WBGTmax, weighted by area, from 1982-2020
Spatial Coverage: United States Lower 48
Spatial Resolution: County by FIPS Code
Spatial Representation Type: N/A
Spatial Reference System: N/A
Temporal Coverage: Each year, 1982-2020
Temporal Resolution: Day of year
Lineage:
- Acquired from Parameter-elevation Regressions on Independent Slopes Model (PRISM) dataset
- Unknown exactly how WBGTmax 4 km raster was summarized by county polygons using the daily maximum 2 m air temperature data (Tmax) and maximum vapour pressure deficit data (VPDmax)
Distribution: Data available in original study’s git repository
Constraints: Open access Creative Commons Attribution 4.0 International License,
Data Quality: Data lacks sufficient documentation in the original study repository/resources.

Label	Alias	Definition	Type	Accuracy	Domain	Missing Data Value(s)	Missing Data Frequency
fips	…	county fips code	integer	…	…	…	…
wbgtmax	…	wbgtmax estimated for specified day	integer	…	…	missing data not included	…
date	…	day of year (dd/mm/yyyy)	character string	…	…	…	…
day	…	day	integer	…	…	…	…
month	…	month	integer	…	…	…	…
year	…	year	integer	…	…	…	…

Prior observations

At the time of this pre-analysis plan, we have the derived data from the original study to work off of, and examined some of the csv tables. We have neither visualized nor analyzed prison data or WBGTmax temperature data before.

Bias and threats to validity

There are no statistical tests in this study, so issues such as spatial heterogeneity/anisotropy/autocorrelation do not matter. Scale could be a threat to validity, because county populations are aggregated to calculate the number of population-weighted heat days in each state. There is also a scale issue measuring micro-climate conditions at prison boundaries compared to 4 km temperature data. Further, there is no specification of how heat days are calculated within each county given that counties do no map neatly to 4 km by 4km grids used to calculate hazardous heat days. The ways in which the county boundaries are drawn also supports the argument that there is a Modifiable Area Unit Problem.

Both the scale and boundary issues also have a temporal component that may create threats to validity.

Data transformations

Planned deviation:

We will not attempt to produce the original study’s WBGTmax grid because the methods are unclear, and therefore we will skip to joining the author-provided WBGTmax by day grid data to the prison points.

(When implementing plan) Explain what we believe the authors did to produce the WBGTmax grid and preliminary steps

Transform data to create Figures 1a, 1b

(More descriptive segment of original study’s workflow)

Step 1: Join author-provided WBGTmax by day grid data to author-provided carceral facility point data

Define years and data paths

Unplanned Deviation: Read in WBGTmax by prison data as single RDS frame.

(This step takes a while to run!)

Code is adapted from original repository

Unplanned Deviation: Data still needs to be joined to prison points

Result: Rds table of WBGTmax by day by prison

Step 2: Filter result by days when WBGTmax exceeded 28 degrees C

begin here to avoid running first two chunks! Step 3: Group by carceral facility type and year

The group by is already included in the study’s filter/summary code from 02 –MM Count to produce summary of days exceeded per year by facility

Result: Rds table with variables
- prison facility

- facility type

- prison population

- n days exceeding 28 degrees

- year

Unplanned Deviation: **Taken from original code Take number of days in first year, last year and last 5 years of data prepares data for figure 1 and figure 2b (and a little bit for 2c) –MM

Trends in growth of number of day per year over time

Totals are calculated by multiplying the regression slope by the total number of years -MM

Merge regression analyses into one data frame

## Joining with `by = join_by(prison_id)`
## Joining with `by = join_by(prison_id)`
## Joining with `by = join_by(prison_id)`
## Joining with `by = join_by(prison_id)`
## Joining with `by = join_by(prison_id)`
## Joining with `by = join_by(prison_id)`

Save regression file

Step 4: combine state level WBGT data into a single file

Iterate across years of WBGT data and combine into one large file

Step 4b: summarize hazardous heat day count by county

## `summarise()` has grouped output by 'fips'. You can override using the
## `.groups` argument.

Unplanned Deviation: Per original study, fix counties to be consistent all the way through

Next, summarize by the fixed counties

## `summarise()` has grouped output by 'fips'. You can override using the
## `.groups` argument.

Step 4c: load county populations

Step 4d: join population data with summarized, county-level hazardous heat data

## Joining with `by = join_by(fips, year)`

Unplanned Deviation: Report list of counties with NA population values.

## # A tibble: 34 × 7
## # Groups:   fips [34]
##    fips   year wbgt_26 wbgt_28 wbgt_30 wbgt_35   pop
##    <chr> <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
##  1 55121  2012      10       3       0       0    NA
##  2 55123  2012      11       4       0       0    NA
##  3 55125  2012       2       0       0       0    NA
##  4 55127  2012      13       5       0       0    NA
##  5 55129  2012       4       1       0       0    NA
##  6 55131  2012      10       3       0       0    NA
##  7 55133  2012      10       4       0       0    NA
##  8 55135  2012      10       3       0       0    NA
##  9 55137  2012      13       6       0       0    NA
## 10 55139  2012      13       5       0       0    NA
## # ℹ 24 more rows

Join to county shapefile and visualize

## Retrieving data for the year 2022

## ℹ tmap mode set to "plot".

Filter entire dataset by NA fips and see if they have population data dplyr filter where fips is %IN% group of fips code

Take average of year before and year after dplyr filter where fips is in group of fips code, and in year 2011 and 2013 group by average of pop

Step 4e: Take weighted average of each state in each year by population

## `summarise()` has grouped output by 'state'. You can override using the
## `.groups` argument.

Take weighted average of entire country in each year by population

Combine state-specific with national

Save state file over time

Analysis

Step 4: Begin with result from Step 3

Step 5: Spatial join county population by year data

Result: Rds table with variables
- prison facility

- facility type

- prison population

- county

- population

- n days exceeding 28 degrees

- year

Step 6: Population-weighted aggregation

Aggregate data into states

Weighted sum of days exceeded across all counties of the state

Sum of days exceeded multiplied by (Ratio of county population / state population)

Planned deviations for reproduction:

Results: Present figures (reproduced from original study and planned deviations).

Create Fig. 1a and b using results from step 3

Make Figure 1a

Work through author’s code for Figure 1

Load WBGT regression file

Prepare map structure (using ggplot)

Load US prison shapefile to load details and mapping

Merge prison over time file with shapefile (for geometry or prison data?)

## `summarise()` has grouped output by 'STATE', 'STATEFP', 'TYPE'. You can
## override using the `.groups` argument.

Calculate 5-year averages

## `summarise()` has grouped output by 'STATE', 'STATEFP'. You can override using
## the `.groups` argument.
## `summarise()` has grouped output by 'STATE', 'STATEFP'. You can override using
## the `.groups` argument.

Create Figure 1a

## Joining with `by = join_by(STATE)`
## Joining with `by = join_by(STATE)`
## Joining with `by = join_by(STATE)`

Create 1b

## [conflicted] Will prefer dplyr::filter over any other package.

Extract coordinates

Plot figure 1a

Plot figure 1b

Unplanned Deviation: using geom_sf() to plot states as basemap and geom_point() functions to plot carceral facility points

Deviation 2: Investigating sources of uncertainty/error/bias

Step 7: Select facilities with population of -999 from author-provided carceral facilities data.

Report number of facilities compared to authors’ number.

### Results Text

Ultimately, we were able to produce nearly identical versions of figures 1a and 1b but not figures 2a, 2b, or 2c. This indicates that the study provided sufficient materials to be partially reproduced. However, difficulties in the reproduction process made reproducing subsequent figures unfeasible given time and resource constraints. The reproduction process fortified trust in the authors’ findings, though the process revealed furhter avenues for improvements to reproducibility and open science practices.

We were also able to map NA and excluded values for both county and prison population. County population values are only missing in 2012, presenting minimal impact on study results. There does not appear to be a significantly different geographic distribution of excluded carceral facilities from included ones, though further investigation is necessary.

Discussion

The study showed important first steps towards participating in open source conventions. Including a linked github repository containing the labeled, ordered study code and a supplementary materials document improved the study’s legibility and provided essential information that greatly improved reproducibility. However, these documents could be improved to ensure better reproducibility and open access.

Key barriers to reproducibility identified were:

Brief Methods Section - The methods section of the published paper does not explain key information about the data transformation and analysis process, selection of prison facilities to include in the study, or the extent of the study and choice to remove Alaska, Hawaii, and non-state US territories.
Missing Climate Data - The github repository does not include raw PRISM data used to calculate WBGT max, any form of the 4x4km climate raster, or a download script to retrieve data. It is therefore very difficult to reproduce their climate analysis or determine any methods used to resolve MAUP/boundary issues.
Raster to Polygon Join - The methods section and repository do not include a method for joining the raster data to the prison and county geometries. Therefore, it is unclear what method was used to assign WBGTmax values to counties spanning multiple grid cells.
Prison Population Metadata - The study does not specify the date the prison data were downloaded or how the prison boundaries file was edited.
Repository Organization and Labeling - While there was a clear order to the R code files, spreading the analysis and visualization across a large number of files made it more difficult to trace what was being used where. Mislabeled and extraneous files in the repository added challenges to navigation. Some files throughout the repository are named “raster” but do not include raster data.
Code Documentation - Code files could be more explicitly and thoroughly documented for readability. For example, the README.txt file included in the python folder outlines a different series of code files than is present in the repository. Recording of intermediate results including attribute counts would improve reproducibility. Without a clear workflow in the methods section, the data transformation process was difficult to understand.
Packages - The use of a variety of packages, including rgdal which is no longer operational, impedes reproducibility. Confusing use of packages like using the geom_polygon() function to plot points made code less readable. Furthermore, there use dates and package versions are not specified which impedes reproducibility. Standardization of open source package use and the groundhog package that stores and loads dates and versions for all packages would improve reproducibility.

Key threats to validity identified were:

Barriers to Reproducibility - The study only partially reproducible which presents a threat to validity because its results are difficult to verified.
WBGTmax Raster - Hazardous heat exposure is calculated using a 4x4km grid of daytime highs to county populations, presenting scale and boundary issues.

Scale - Both of these grid and county geometries are large enough to house significant variation in heat and population density which could skew results, especially in densely populated areas.
Boundary - The lack of a documented method to resolve boundary issues joining the raster to prison and county geometries (e.g. centroid, area of overlap, area-weighted reaggregation etc.) presents a threat to the validity of the hazardous heat exposure and relative risk calculations.
Heat Risk Metric - Recent studies of heat risk have indicated that nighttime lows predict heat related mortality. When nighttime lows are high, the body is unable to recover from heat stress, exacerbating health risks.*

Prison Data - The study and its repository consistently refer to carceral facilities as “prisons” despite also analyzing data from jails, detention centers, and other facilities. Missing data and documentation and the temporal resolution threaten validity.

Missing Population Data - While the study focuses on the 4078 open and populated facilities, the HIFLD prison data file contains data on 6738 facilities. 1817 of these facilities reported being open but did not have available population data. 12 reported a population greater than 0 but did not have available data on their status (open or closed). 80 facilities did not have available status or population data.
Temporal Resolution - The study uses carceral facility population data from a single, unspecified date to calculate changes in heat risk over 38 years. This threatens validity because these populations are not stable and the lack of a specific date makes it impossible to check biases.

Selective Reporting Bias - While the supplementary materials and github repository included additional figures and analyses testing different temperature thresholds and data visualizations, there is minimal documentation of the reasoning used to select the figures in the published study.

*Murage P, Hajat S, Kovats RS. Effect of night-time temperatures on cause and age-specific mortality in London. Environ Epidemiol. 2017 Dec;1(2):e005. doi: 10.1097/EE9.0000000000000005. Epub 2017 Dec 13. PMID: 33195962; PMCID: PMC7608908.

Through this reproduction in progress, we were able to identify key barriers to reproducibility and threats to validity in the study. The reproduction was limited by the time and energy of the reproducers (Sam and Matthew) and our limited expertise handling large datasets containing raster data and time series data. However, our inability to reproduce or verify all of the figures indicates that reproducibility attempts in the study were not sufficient to reduce barriers. Producing a more thorough reproduction could be accomplished by: Recreating Figure 2 Mapping out the authors’ workflow and study design based on the code Reanalyzing and visualizing the data using open source packages Visualizing and investigating prisons with missing data Replicating the study using nighttime low temperatures

Ultimately, the study provides a template for exploring general trends in extreme heat risk for incarcerated populations. With further attention to clarity, accessibility, and reproducibility, this study and its reproductions could contribute to a reporting or monitoring system for heat risk in carceral facilities. Despite threats to validity and reproducibility, the study provides a basis for critical future research in environmental hazards and abolition.

Integrity Statement

This is the first version of our report Any deviations from our pre-analysis plan in our workflow will be documented as unplanned deviations.

Acknowledgements

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](DOI:%5B10.17605/OSF.IO/W29MQ){.uri}

References

Kedron, P., & Holler, J. (2023). Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences. https://doi.org/10.17605/OSF.IO/W29MQ

Barrett, Tyson, Matt Dowle, Arun Srinivasan, Jan Gorecki, Michael Chirico, Toby Hocking, and Benjamin Schwendinger. 2024. Data.table: Extension of ‘Data.frame‘. https://r-datatable.com.

Bivand, Roger, and Nicholas Lewin-Koh. 2023. Maptools: Tools for Handling Spatial Objects. http://maptools.r-forge.r-project.org/.

Cheng, Joe, Carson Sievert, Barret Schloerke, Winston Chang, Yihui Xie, and Jeff Allen. 2024. Htmltools: Tools for HTML. https://github.com/rstudio/htmltools.

Firke, Sam. 2024. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://github.com/sfirke/janitor.

McIlroy, Doug, Ray Brownrigg, Thomas P Minka, and Roger Bivand. 2023. Mapproj: Map Projections. https://CRAN.R-project.org/package=mapproj.

Müller, Kirill. 2020. Here: A Simpler Way to Find Your Files. https://here.r-lib.org/.

Neuwirth, Erich. 2022. RColorBrewer: ColorBrewer Palettes. https://CRAN.R-project.org/package=RColorBrewer.

Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.

———. 2024a. Sf: Simple Features for r. https://r-spatial.github.io/sf/.

———. 2024b. Stars: Spatiotemporal Arrays, Raster and Vector Data Cubes. https://r-spatial.github.io/stars/.

Pebesma, Edzer, and Roger Bivand. 2023a. Spatial Data Science: With applications in R. Chapman and Hall/CRC. https://doi.org/10.1201/9780429459016.

———. 2023b. Spatial Data Science: With applications in R. London: Chapman; Hall/CRC. https://doi.org/10.1201/9780429459016.

R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Sumner, Michael D. 2023. Graticule: Meridional and Parallel Lines for Maps. https://github.com/hypertidy/graticule.

Tennekes, Martijn. 2018. “tmap: Thematic Maps in R.” Journal of Statistical Software 84 (6): 1–39. https://doi.org/10.18637/jss.v084.i06.

———. 2025. Tmap: Thematic Maps. https://github.com/r-tmap/tmap.

Walker, Kyle. 2024. Tigris: Load Census TIGER/Line Shapefiles. https://github.com/walkerke/tigris.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

———. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://tidyverse.tidyverse.org.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, Dewey Dunnington, and Teun van den Brand. 2024. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://ggplot2.tidyverse.org.

Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://dplyr.tidyverse.org.

Wickham, Hadley, and Lionel Henry. 2023. Purrr: Functional Programming Tools. https://purrr.tidyverse.org/.

Wilke, Claus O. 2024. Cowplot: Streamlined Plot Theme and Plot Annotations for Ggplot2. https://wilkelab.org/cowplot/.

Xie, Yihui, JJ Allaire, and Jeffrey Horner. 2024. Markdown: Render Markdown with Commonmark. https://github.com/rstudio/markdown.

Zeileis, Achim, and Gabor Grothendieck. 2005. “Zoo: S3 Infrastructure for Regular and Irregular Time Series.” Journal of Statistical Software 14 (6): 1–27. https://doi.org/10.18637/jss.v014.i06.

Zeileis, Achim, Gabor Grothendieck, and Jeffrey A. Ryan. 2023. Zoo: S3 Infrastructure for Regular and Irregular Time Series (z’s Ordered Observations). https://zoo.R-Forge.R-project.org/.

Analysis Plan Version 1

Samuel Barnard and Matthew Mills

2025-05-21