This study is a reproduction of: Tuholske, C., Lynch, V.D., Spriggs, R. et al. Hazardous heat exposure among incarcerated people in the United States. Nat Sustain 7, 394–398 (2024). https://doi.org/10.1038/s41893-024-01293-y Git Repository
By reproducing the exploratory data analysis done in Tuholske et al. (2024), we seek to…
1. Identify the temporal resolution at which authors use population data to calculate population weighted hazardous heat days
2. Understand the population weighting mechanism in state-level hazardous heat calculations.
3. Evaluate the effectiveness of the authors’ methods compared to similar research.
Key words
: Comma-separated list of keywords (tags) for searchability. Geographers often use one or two keywords each for: theory, geographic context, and methods.
Subject
: Social and Behavioral Sciences: Geography: Geographic Information Sciences, Human Geography, Nature and Society Relations
Date created
: 04/14/25
Date modified
: 04/21/25
Spatial Coverage
: United States Lower 48
Spatial Resolution
: Carceral facility points and States
Spatial Reference System
: Specify the geographic or projected coordinate system for the study, e.g. EPSG:4326
Temporal Coverage
: 1982-2020
Temporal Resolution
: 1 year
Spatial Coverage
: United States Lower 48
Spatial Resolution
: Carceral facility points and States
Spatial Reference System
: spatial reference system of original study
Temporal Coverage
: 1982-2020
Temporal Resolution
: 1 year

This is a reproduction study; the original study is exploratory in design. The primary objective of the original study was to examine exposure to hazardous heat in carceral facilities in the continental U.S. The authors examined how exposure to hazardous heat changed from 1982 to 2020, as well as how exposure within carceral facilities compared with exposure in the rest of each state. More generally, the original paper also sought to determine the spatial distribution of carceral facilities with higher levels of hazardous heat exposure.
The original study data transformations and analysis were completed primarily in R using Rmd documents, as well as in Python. The versions of R and Python used are not disclosed, but would have been R 4.3.3 or earlier, and Python 3.12 or earlier.
In the original study, R packages are loaded across several different scripts. The ones that appear most important for this study are:
original_study_packages <- c(
"dplyr",
"data.table",
"maptools",
"mapproj",
"rgeos",
"rgdal",
"RColorBrewer",
"ggplot2",
"raster", # planned deviation: we will be using `stars` in our reproduction
"sp", # planned deviation: we will be using `sf` in our reproduction
"plyr",
"graticule",
"zoo",
"purrr",
"cowplot",
"janitor"
)
For the reproduction study, we will be using R version 4.4.2 and the `groundhog` package to maintain package consistency. All packages used will be up to date as of 2025-02-01.

We plan on using the packages `tidyverse`, `here`, `markdown`, `htmltools`, `dplyr`, `sf`, and `stars`. As we encounter the need for other packages in our implementation of the code, we will make note of them as unplanned deviations.
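
As a minimal sketch of how the version pinning described above might look, the planned packages can be loaded through `groundhog.library()` with the snapshot date from this plan. The vector name `pkgs` is ours, and the call assumes `groundhog` is already installed and that the snapshot is available for the R version in use.

```r
# Sketch only: load each planned package as it existed on CRAN on the
# snapshot date, so later runs use the same package versions.
library(groundhog)

# Planned packages, as listed in this pre-analysis plan
pkgs <- c("tidyverse", "here", "markdown", "htmltools", "dplyr", "sf", "stars")

# Snapshot date stated in this plan
groundhog.library(pkgs, "2025-02-01")
```

Packages added later as unplanned deviations could be appended to `pkgs` so that they are pinned to the same date.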
We are going to use data from the original study’s git repository (linked in the top-level README). This includes:
- Population data for the study period
- Prison boundary polygons with facility information
- State polygons
- WBGT data, at prison point and state levels
Title
: population (pre_1990 & vintage_2020)
Abstract
: Population data representing different age groups (10 year increments) from 0-5 years old up to 85 years old by sex
Spatial Coverage
: Continental U.S.
Spatial Resolution
: County by FIPS Code
Spatial Representation Type
: N/A
Spatial Reference System
: N/A
Temporal Coverage
: Each year, 1982-2020
Temporal Resolution
: Month
Lineage
: Acquired from census, pre-1990 and post-1990 data standardized
Distribution
: Data available in original study’s git repository
Constraints
: Public domain
Data Quality
: Unclear lineage documentation

Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
year | … | observance year | integer | … | … | … | … |
fips | … | county FIPS code | integer | … | … | … | … |
sex | … | 1 is male, 2 is female | integer | … | … | … | … |
age | … | 10-year age group | integer | … | … | … | … |
month | … | month of year | integer | … | … | … | … |
pop | … | group’s population in county | integer | … | … | … | … |
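
Because our first objective concerns the temporal resolution of the population data, a quick check along the following lines may help. This is a sketch on invented rows that only mimic the column names in the table above; the real file contents and the object name `pop` are assumptions.

```r
# Toy rows shaped like the population table (all values invented)
pop <- data.frame(
  year  = c(1982L, 1982L, 1982L, 1982L),
  fips  = c(1001L, 1001L, 1001L, 1001L),
  sex   = c(1L, 2L, 1L, 2L),
  age   = c(0L, 0L, 0L, 0L),
  month = c(1L, 1L, 2L, 2L),
  pop   = c(50, 55, 51, 56)
)

# How many distinct months are recorded in each year?
months_per_year <- tapply(pop$month, pop$year, function(m) length(unique(m)))

# Collapse sex and age groups to one total per county, year, and month
county_pop <- aggregate(pop ~ year + fips + month, data = pop, FUN = sum)
```

If every year in the real data reports 12 distinct months, the population weights could in principle vary monthly rather than yearly, which is one of the things we want to confirm.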
Title
: Prison_Boundaries.shp
Abstract
: Shapefile containing prison boundary polygons including geographic, type, operation, population, capacity, and other data
Spatial Coverage
: United States of America (including Alaska, Hawaii, DC, and territories)
Spatial Resolution
: parcel/building sized polygon (effectively points)
Spatial Representation Type
: vector MULTIPOLYGON
Spatial Reference System
: EPSG:3857 (Spherical/Web Mercator)
Temporal Coverage
: unclear - appears to represent data as of 6/6/2020
Temporal Resolution
: n/a
Lineage
: Refer to metadata_Prison_Boundaries_WebDownload.pdf
Distribution
: Data available in original study’s git repository
Constraints
: Public Domain
Data Quality
: Unclear lineage documentation; facility information is missing for many records
Variables
: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)

Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
status | … | describes facility status as open, closed or not available | … | … | … | … | … |
population | … | population of facility, -999 represents missing data | … | … | … | … | … |
capacity | … | total capacity of facility, -999 represents missing data | … | … | … | … | … |
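
Since `population` and `capacity` use -999 as a missing-data code, any sum or mean would be distorted unless the code is recoded first. A minimal sketch, using invented values and a plain data frame in place of the shapefile's attribute table:

```r
# Toy stand-in for the attribute table of Prison_Boundaries.shp
# (status labels and numbers are invented for illustration)
prisons <- data.frame(
  status     = c("OPEN", "CLOSED", "NOT AVAILABLE"),
  population = c(1200, -999, 350),
  capacity   = c(1500, 800, -999)
)

# Recode the -999 sentinel as NA in both numeric columns
for (col in c("population", "capacity")) {
  prisons[[col]][prisons[[col]] == -999] <- NA
}

# Share of facilities with no recorded population
missing_share <- mean(is.na(prisons$population))
```

Reporting `missing_share` for the real data would also fill in the "Missing Data Frequency" column of the table above.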
Title
: states.shp
Abstract
: state boundary polygons with region
Spatial Coverage
: The 50 US states and Washington DC
Spatial Resolution
: US state
Spatial Representation Type
: vector MULTIPOLYGON
Spatial Reference System
: EPSG:4269
Temporal Coverage
: n/a
Temporal Resolution
: n/a
Lineage
: unknown
Distribution
: Data available in original study’s git repository
Constraints
: Public domain
Data Quality
: n/a
Variables
:

Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
STATE_NAME | Name | Name of state | character string | n/a | US state names | n/a | n/a |
DRAWSEQ | Draw Sequence | unknown | integer | n/a | 1-51 | n/a | n/a |
STATE_FIPS | FIPS Code | State FIPS code (two digit) | integer | n/a | 01-56 | n/a | n/a |
SUB_REGION | Sub-Region | Sub-Region of the US | character string | n/a | n/a | n/a | n/a |
STATE_ABBR | Abbreviation | Two letter abbreviation | character string | n/a | two-letter postal abbreviations | … | … |
Title
: wbgt_raw/prison/weighted_area_raster_prison_wbgtmax_daily_(year).rds
Abstract
: Daily WBGTmax, weighted by area, from 1982-2020
Spatial Coverage
: United States Lower 48
Spatial Resolution
: Prison by prison ID
Spatial Representation Type
: N/A
Spatial Reference System
: N/A
Temporal Coverage
: Each year, 1982-2020
Temporal Resolution
: Day of year
Lineage
:
Distribution
: Data available in original study’s git repository
Constraints
: Open access Creative Commons Attribution 4.0 International License
Data Quality
: Data lacks sufficient documentation in the original study repository/resources. The original link to the HIFLD data (from the citation) no longer works; however, HIFLD data can now be found here.

Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
prison_id | … | unique prison id | integer | … | 6640 prisons | … | … |
wbgtmax | … | wbgtmax estimated for specified day | integer | … | … | missing data not included | … |
date | … | day of year (dd/mm/yyyy) | character string | … | … | … | … |
day | … | day | integer | … | … | … | … |
month | … | month | integer | … | … | … | … |
year | … | year | integer | … | … | … | … |
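
The `date` column is stored as a character string in dd/mm/yyyy order, so it needs an explicit format when parsed. The sketch below uses invented rows with the column names from the table above and rebuilds `day`, `month`, and `year` as a cross-check against the file's own columns.

```r
# Toy rows shaped like one year's prison-level WBGTmax table (values invented)
wbgt <- data.frame(
  prison_id = c(1L, 1L, 2L),
  wbgtmax   = c(27, 29, 30),
  date      = c("01/07/1982", "02/07/1982", "01/07/1982"),
  stringsAsFactors = FALSE
)

# Parse dd/mm/yyyy strings; the explicit format avoids day/month confusion
wbgt$date_parsed <- as.Date(wbgt$date, format = "%d/%m/%Y")

# Rebuild the calendar fields from the parsed date
wbgt$day   <- as.integer(format(wbgt$date_parsed, "%d"))
wbgt$month <- as.integer(format(wbgt$date_parsed, "%m"))
wbgt$year  <- as.integer(format(wbgt$date_parsed, "%Y"))
```

Any row where the rebuilt fields disagree with the stored ones would suggest the date format is not what the metadata table assumes.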
Title
: wbgt_raw/state/weighted_area_raster_fips_wbgtmax_daily_(year).rds
Abstract
: Daily WBGTmax, weighted by area, from 1982-2020
Spatial Coverage
: United States Lower 48
Spatial Resolution
: County by FIPS Code
Spatial Representation Type
: N/A
Spatial Reference System
: N/A
Temporal Coverage
: Each year, 1982-2020
Temporal Resolution
: Day of year
Lineage
:
Distribution
: Data available in original study’s git repository
Constraints
: Open access Creative Commons Attribution 4.0 International License
Data Quality
: Data lacks sufficient documentation in the original study repository/resources.

Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
fips | … | county fips code | integer | … | … | … | … |
wbgtmax | … | wbgtmax estimated for specified day | integer | … | … | missing data not included | … |
date | … | day of year (dd/mm/yyyy) | character string | … | … | … | … |
day | … | day | integer | … | … | … | … |
month | … | month | integer | … | … | … | … |
year | … | year | integer | … | … | … | … |
At the time of this pre-analysis plan, we have the derived data to work from, and we have examined some of the CSV tables. We have neither visualized nor analyzed prison data or WBGTmax temperature data before.
There are no statistical tests in this study, so issues such as spatial heterogeneity, anisotropy, and autocorrelation do not matter. Scale could be a threat to validity, because county populations are aggregated to calculate the number of population-weighted heat days in each state. There is also a scale mismatch between micro-climate conditions at prison boundaries and the 4 km temperature data. Further, the original study does not specify how heat days are calculated within each county, given that counties do not map neatly onto the 4 km by 4 km grid cells used to calculate hazardous heat days. The way in which county boundaries are drawn also supports the argument that there is a Modifiable Areal Unit Problem.
Both the scale and boundary issues also have a temporal component that may create threats to validity.
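
To illustrate why the grid-to-county step matters, the sketch below compares an area-weighted county mean with a simple mean of the overlapping grid cells. The file names describe the data as "weighted by area", but the exact procedure is not documented, so this is our assumption about one plausible approach; the cell values and overlap areas are invented.

```r
# Three hypothetical 4 km x 4 km cells that overlap one county
cells <- data.frame(
  wbgtmax     = c(27, 27.5, 31),  # daily WBGTmax in each cell (invented)
  overlap_km2 = c(16, 8, 4)       # area of each cell inside the county (invented)
)

# Area-weighted county value: cells count in proportion to their overlap
county_weighted <- weighted.mean(cells$wbgtmax, w = cells$overlap_km2)

# Unweighted alternative: every overlapping cell counts equally
county_simple <- mean(cells$wbgtmax)
```

In this toy case the weighted value falls below 28 while the simple mean falls above it, so the same day would or would not count as a hazardous heat day depending on the aggregation rule.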
We will not attempt to reproduce the original study’s WBGTmax grid because the methods are unclear. Instead, we will skip to joining the author-provided daily WBGTmax data to the prison points.
(When implementing plan) Explain what we believe the authors did to produce the WBGTmax grid and preliminary steps
(More descriptive segment of original study’s workflow)
Count the days exceeding the threshold to produce a summary of days exceeded per year by facility
Result: Rds table with variables
- prison facility
- facility type
- prison population
- n days exceeding 28 degrees
- year
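
The counting step above can be sketched as follows. The rows are invented, the object and column names are ours, and treating "exceeding" as strictly greater than the threshold is an assumption we will need to check against the original code.

```r
# Toy daily records for two facilities (values invented)
wbgt <- data.frame(
  prison_id = c(1L, 1L, 1L, 2L, 2L, 2L),
  year      = c(1982L, 1982L, 1983L, 1982L, 1982L, 1983L),
  wbgtmax   = c(28.4, 27.9, 29.0, 30.1, 28.0, 26.5)
)

threshold <- 28  # degrees C

# Flag each day, then count flagged days per facility and year.
# Assumption: "exceeding" means strictly greater than the threshold.
wbgt$exceeds <- wbgt$wbgtmax > threshold
days_exceeded <- aggregate(exceeds ~ prison_id + year, data = wbgt, FUN = sum)
names(days_exceeded)[names(days_exceeded) == "exceeds"] <- "n_days_exceeded"
```

Facility type and prison population would then be joined on by `prison_id` to complete the result table.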
Figure 2b, 2c - results are based on linear regression models
Result: Rds table with variables
- prison facility
- facility type
- prison population
- county
- population
- n days exceeding 28 degrees
- year
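
For the regression-based panels, one plausible reading is a separate linear trend in annual days exceeded for each facility. The sketch below fits `lm()` per facility on invented counts; whether the original study fits one model per facility or uses another specification is something we still need to confirm.

```r
# Toy annual counts for two facilities (values invented)
days_exceeded <- data.frame(
  prison_id       = rep(c(1L, 2L), each = 5),
  year            = rep(1982:1986, times = 2),
  n_days_exceeded = c(10, 12, 14, 16, 18, 30, 30, 29, 31, 30)
)

# Fit a linear trend per facility and keep the slope (days per year)
slopes <- sapply(
  split(days_exceeded, days_exceeded$prison_id),
  function(d) unname(coef(lm(n_days_exceeded ~ year, data = d))["year"])
)
```

Here `slopes` is a named vector keyed by `prison_id`, with each value giving the estimated change in days exceeded per year.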
Aggregate data into states
Weighted sum of days exceeded across all counties of the state
Sum, across counties, of days exceeded multiplied by (county population / state population)
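
The weighting described above can be sketched as follows: each county's days exceeded are multiplied by that county's share of the state population, and the products are summed within the state. All values, state labels, and FIPS codes are invented, and this reflects our current understanding of the mechanism rather than confirmed behavior of the original code.

```r
# Toy county-level table for one year (values invented)
county <- data.frame(
  state           = c("A", "A", "B"),
  fips            = c(1001L, 1003L, 2001L),
  pop             = c(100, 300, 50),
  n_days_exceeded = c(10, 20, 5)
)

# County share of its state's population
state_pop <- ave(county$pop, county$state, FUN = sum)
county$weight <- county$pop / state_pop

# Population-weighted days exceeded, summed to the state level
county$weighted_days <- county$n_days_exceeded * county$weight
state_days <- aggregate(weighted_days ~ state, data = county, FUN = sum)
```

For state A this gives 10 × 0.25 + 20 × 0.75 = 17.5 days, compared with an unweighted county mean of 15, which shows how populous counties dominate the state figure.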
Repeat workflow for reproducing figures 1 and 2, instead filtering for days when WBGTmax exceeded 29.4 degrees C (85 degrees F standard informed by other literature)
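
Repeating the workflow at a second threshold is simpler if the threshold is a parameter rather than a hard-coded value. The sketch below shows the Fahrenheit to Celsius conversion behind the 29.4 figure and a small helper, `count_days_over()`, which is our own hypothetical name; the daily values are invented.

```r
# Convert degrees Fahrenheit to degrees Celsius
f_to_c <- function(deg_f) (deg_f - 32) * 5 / 9

# 85 degrees F is about 29.44 degrees C, rounded here to one decimal place
threshold_c <- round(f_to_c(85), 1)

# Hypothetical helper: count days strictly above a given threshold
count_days_over <- function(wbgtmax, threshold) {
  sum(wbgtmax > threshold, na.rm = TRUE)
}

wbgtmax <- c(27.9, 28.4, 29.0, 29.5, 31.0)  # invented daily values
n_28  <- count_days_over(wbgtmax, 28)
n_294 <- count_days_over(wbgtmax, threshold_c)
```

Running the same function with both thresholds keeps the two sets of figures directly comparable.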
What are the implications of us being able to recreate or not recreate the figures? Why does it matter for the original study to be reproducible? Mention significance of groundhog usage for sustainable reproduction. Discuss research suggesting maximum daily temperature doesn’t matter as much for heat stress, and how long stretches of night time lows may be more serious.
This is the first version of our pre-analysis plan. Any deviations in our workflow will be documented as unplanned deviations.
This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI: 10.17605/OSF.IO/W29MQ
Kedron, P., & Holler, J. (2023). Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences. https://doi.org/10.17605/OSF.IO/W29MQ