This study is a reproduction of: Tuholske, C., Lynch, V.D., Spriggs, R. et al. Hazardous heat exposure among incarcerated people in the United States. Nat Sustain 7, 394–398 (2024). https://doi.org/10.1038/s41893-024-01293-y Git Repository
By reproducing the exploratory data analysis done in Tuholske et al. (2024), we seek to…
1. Identify the temporal resolution at which the authors use population data to calculate population-weighted hazardous heat days.
2. Understand the population weighting mechanism in state-level hazardous heat calculations.
3. Evaluate the effectiveness of the authors’ methods compared to similar research.
Key words
: Comma-separated list of keywords (tags) for searchability. Geographers often use one or two keywords each for: theory, geographic context, and methods.
Subject
: Social and Behavioral Sciences: Geography: Geographic Information Sciences, Human Geography, Nature and Society Relations
Date created
: 04/14/25
Date modified
: 04/21/25
Spatial Coverage
: United States Lower 48
Spatial Resolution
: Carceral facility points and States
Spatial Reference System
: Specify the geographic or projected coordinate system for the study, e.g. EPSG:4326
Temporal Coverage
: 1982-2020
Temporal Resolution
: 1 year
Spatial Coverage
: United States Lower 48
Spatial Resolution
: Carceral facility points and States
Spatial Reference System
: spatial reference system of original study
Temporal Coverage
: 1982-2020
Temporal Resolution
: 1 year

This is a reproduction study, and the original study is largely exploratory in design. The original study’s primary objective was to explore exposure to hazardous heat in carceral facilities in the continental U.S. The authors examined how exposure to hazardous heat changed from 1982 to 2020, and how exposure within carceral facilities compared to exposure in the rest of each state. A further objective of the original paper was to determine the spatial distribution of carceral facilities with higher levels of hazardous heat exposure.
The original study data transformations and analysis were completed primarily in R using Rmd documents, as well as in Python. The versions of R and Python used are not disclosed, but would have been R 4.3.3 or earlier, and Python 3.12 or earlier.
In the original study, R packages are loaded across different scripts. The ones that appear most important for this study are:
original_study_packages <- c(
"dplyr",
"data.table",
"maptools",
"mapproj",
"rgeos",
"rgdal",
"RColorBrewer",
"ggplot2",
"raster", # planned deviation: we will be using `stars` in our reproduction
"sp", # planned deviation: we will be using `sf` in our reproduction
"plyr",
"graticule",
"zoo",
"purrr",
"cowplot",
"janitor"
)
For the reproduction study, we will be using R version 4.4.2 and the groundhog package to maintain package consistency. All packages used will be up to date as of 2025-02-01.
We plan on using the packages tidyverse, here, markdown, htmltools, dplyr, sf, and stars. As we encounter the need for other packages in our implementation of the code, we will make note of them as unplanned deviations.
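As a minimal sketch of the package pinning described above, the snippet below wraps `groundhog.library()` in a helper. The helper name `load_planned_packages` is hypothetical and is not part of our repository. The call is left commented out because it needs groundhog installed and may download packages.

```r
# Snapshot date stated in the plan; groundhog resolves each package to the
# version that was current on CRAN on this date.
snapshot_date <- "2025-02-01"

# Packages named in the plan.
planned_packages <- c("tidyverse", "here", "markdown", "htmltools",
                      "dplyr", "sf", "stars")

# Hypothetical helper: loads the planned packages as of the snapshot date.
load_planned_packages <- function(pkgs = planned_packages,
                                  date = snapshot_date) {
  if (!requireNamespace("groundhog", quietly = TRUE)) {
    stop("Install groundhog first: install.packages('groundhog')")
  }
  groundhog::groundhog.library(pkgs, date)
}

# load_planned_packages()  # uncomment to load the pinned package set
```

Packages added later as unplanned deviations could be appended to `planned_packages` so that they are pinned to the same date.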
We are going to use data from the original study’s git repository (linked on top level readme). This includes:
- Population data for the study period
- Prison boundary polygons with facility information
- State polygons
- WBGT data, at prison point and state levels
Title
: population (pre_1990 & vintage_2020)
Abstract
: Population data representing different age groups (10 year increments) from 0-5 years old up to 85 years old by sex
Spatial Coverage
: Continental U.S.
Spatial Resolution
: County by FIPS Code
Spatial Representation Type
: N/A
Spatial Reference System
: N/A
Temporal Coverage
: Each year, 1982-2020
Temporal Resolution
: Month
Lineage
: Acquired from census, pre-1990 and post-1990 data standardized
Distribution
: Data available in original study’s git repository
Constraints
: Public domain
Data Quality
: Unclear lineage documentation

Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
year | … | observance year | integer | … | … | … | … |
fips | … | county FIPS code | integer | … | … | … | … |
sex | … | 1 is male, 2 is female | integer | … | … | … | … |
age | … | 10-year age group | integer | … | … | … | … |
month | … | month of year | integer | … | … | … | … |
pop | … | group’s population in county | integer | … | … | … | … |
Title
: Prison_Boundaries.shp
Abstract
: Shapefile containing prison boundary polygons including geographic, type, operation, population, capacity, and other data
Spatial Coverage
: United States of America (including Alaska, Hawaii, DC, and territories)
Spatial Resolution
: parcel/building sized polygon (effectively points)
Spatial Representation Type
: vector MULTIPOLYGON
Spatial Reference System
: CRS 3857 Spherical/Web Mercator
Temporal Coverage
: unclear - appears to represent data as of 6/6/2020
Temporal Resolution
: n/a
Lineage
: Refer to metadata_Prison_Boundaries_WebDownload.pdf
Distribution
: Data available in original study’s git repository
Constraints
: Public Domain
Data Quality
: Unclear lineage documentation, much facility information missing
Variables
:

Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
status | … | describes facility status as open, closed or not available | … | … | … | … | … |
population | … | population of facility, -999 represents missing data | … | … | … | … | … |
capacity | … | total capacity of facility, -999 represents missing data | … | … | … | … | … |
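Because `population` and `capacity` use -999 as a missing-data sentinel, the values need recoding before any summary is computed. The following is a minimal sketch on a made-up attribute table. The rows and status labels are illustrative assumptions and are not taken from Prison_Boundaries.shp.

```r
# Toy stand-in for the shapefile's attribute table; all values are made up.
prisons <- data.frame(
  status     = c("OPEN", "CLOSED", "NOT AVAILABLE"),
  population = c(850, -999, 120),
  capacity   = c(1000, 400, -999)
)

# Replace the -999 sentinel with NA in both count columns.
sentinel_cols <- c("population", "capacity")
prisons[sentinel_cols] <- lapply(
  prisons[sentinel_cols],
  function(x) replace(x, x == -999, NA)
)

# Share of facilities missing each value.
missing_share <- colMeans(is.na(prisons[sentinel_cols]))
```

Recoding first keeps the sentinel from being summed or averaged as if it were a real count.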
Title
: states.shp
Abstract
: state boundary polygons with region
Spatial Coverage
: The 50 US states and Washington DC
Spatial Resolution
: US state
Spatial Representation Type
: vector MULTIPOLYGON
Spatial Reference System
: EPSG 4269
Temporal Coverage
: n/a
Temporal Resolution
: n/a
Lineage
: unknown
Distribution
: Data available in original study’s git repository
Constraints
: Public domain
Data Quality
: n/a
Variables
:

Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
STATE_NAME | Name | Name of state | character string | n/a | US state names | n/a | n/a |
DRAWSEQ | Draw Sequence | unknown | integer | n/a | 1-51 | n/a | n/a |
STATE_FIPS | FIPS Code | State FIPS code (two digit) | integer | n/a | 01-56 | n/a | n/a |
SUB_REGION | Sub-Region | Sub-Region of the US | character string | n/a | n/a | n/a | n/a |
STATE_ABBR | Abbreviation | Two letter abbreviation | character string | n/a | two-letter postal abbreviations | … | … |
Title
: wbgt_raw/prison/weighted_area_raster_prison_wbgtmax_daily_(year).rds
Abstract
: Daily WBGTmax, weighted by area, from 1982-2020
Spatial Coverage
: United States Lower 48
Spatial Resolution
: Prison by prison ID
Spatial Representation Type
: N/A
Spatial Reference System
: N/A
Temporal Coverage
: Each year, 1982-2020
Temporal Resolution
: Day of year
Lineage
:
Distribution
: Data available in original study’s git repository
Constraints
: Open access Creative Commons Attribution 4.0 International License
Data Quality
: Data lacks sufficient documentation in the original study repository/resources. The original link to the HIFLD data (from the citation) no longer works, however HIFLD data can now be found here.

Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
prison_id | … | unique prison id | integer | … | 6640 prisons | … | … |
wbgtmax | … | wbgtmax estimated for specified day | integer | … | … | missing data not included | … |
date | … | day of year (dd/mm/yyyy) | character string | … | … | … | … |
day | … | day | integer | … | … | … | … |
month | … | month | integer | … | … | … | … |
year | … | year | integer | … | … | … | … |
Title
: wbgt_raw/state/weighted_area_raster_fips_wbgtmax_daily_(year).rds
Abstract
: Daily WBGTmax, weighted by area, from 1982-2020
Spatial Coverage
: United States Lower 48
Spatial Resolution
: County by FIPS Code
Spatial Representation Type
: N/A
Spatial Reference System
: N/A
Temporal Coverage
: Each year, 1982-2020
Temporal Resolution
: Day of year
Lineage
:
Distribution
: Data available in original study’s git repository
Constraints
: Open access Creative Commons Attribution 4.0 International License
Data Quality
: Data lacks sufficient documentation in the original study repository/resources.

Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
fips | … | county fips code | integer | … | … | … | … |
wbgtmax | … | wbgtmax estimated for specified day | integer | … | … | missing data not included | … |
date | … | day of year (dd/mm/yyyy) | character string | … | … | … | … |
day | … | day | integer | … | … | … | … |
month | … | month | integer | … | … | … | … |
year | … | year | integer | … | … | … | … |
At the time of this pre-analysis plan, we have the derived data from the original study to work off of, and examined some of the csv tables. We have neither visualized nor analyzed prison data or WBGTmax temperature data before.
There are no statistical tests in this study, so issues such as spatial heterogeneity, anisotropy, and autocorrelation are not a concern. Scale could be a threat to validity, because county populations are aggregated to calculate the number of population-weighted heat days in each state. There is also a scale mismatch between micro-climate conditions at prison boundaries and the 4 km temperature data. Further, the study does not specify how heat days are calculated within each county, given that counties do not map neatly onto the 4 km by 4 km grid used to calculate hazardous heat days. The way county boundaries are drawn also raises the Modifiable Areal Unit Problem.
Both the scale and boundary issues also have a temporal component that may create threats to validity.
We will not attempt to produce the original study’s WBGTmax grid because the methods are unclear, and therefore we will skip to joining the author-provided WBGTmax by day grid data to the prison points.
(When implementing plan) Explain what we believe the authors did to produce the WBGTmax grid and preliminary steps
(More descriptive segment of original study’s workflow)
Unplanned Deviation: Read in WBGTmax by prison data as single RDS frame.
(This step takes a while to run!)
Code is adapted from original repository
Unplanned Deviation: Data still needs to be joined to prison points
Result: Rds table of WBGTmax by day by prison
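Reading the yearly files into a single frame can be sketched as follows. This is an illustration under stated assumptions, not the code adapted from the original repository: it writes two toy files to a temporary folder (the folder name `wbgt_prison_toy` and all values are made up) and then stacks them, mirroring the file naming pattern listed in the data description.

```r
# Write two toy yearly files to a temporary folder so the sketch runs anywhere.
# The real files live under wbgt_raw/prison/ in the original repository.
toy_dir <- file.path(tempdir(), "wbgt_prison_toy")
dir.create(toy_dir, showWarnings = FALSE)
for (yr in c(1982, 1983)) {
  toy <- data.frame(
    prison_id = c(1L, 2L),
    wbgtmax   = c(27.5, 29.1),  # made-up values
    year      = yr
  )
  saveRDS(toy, file.path(
    toy_dir,
    paste0("weighted_area_raster_prison_wbgtmax_daily_", yr, ".rds")
  ))
}

# List the yearly files, read each one, and stack them into a single frame.
rds_files <- list.files(toy_dir, pattern = "wbgtmax_daily_[0-9]{4}\\.rds$",
                        full.names = TRUE)
wbgt_prison <- do.call(rbind, lapply(rds_files, readRDS))
```

With 39 years of daily values for several thousand facilities, the real stacking step is slow, which is consistent with the note above that it takes a while to run.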
The group-by is already included in the study’s filter/summary code from 02 –MM
Count to produce summary of days exceeded per year by facility
Result: Rds table with variables
- prison facility
- facility type
- prison population
- n days exceeding 28 degrees
- year
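The count of days exceeding 28 degrees per facility and year can be sketched in base R as below. The data are made up, and treating "exceeding" as strictly greater than the threshold is our assumption. We have not confirmed whether the original code uses `>` or `>=`.

```r
# Toy daily series: two facilities, a handful of days (values are made up).
daily <- data.frame(
  prison_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  year      = c(1982, 1982, 1983, 1983, 1982, 1982, 1983, 1983),
  wbgtmax   = c(27.0, 28.5, 29.0, 30.2, 25.0, 26.1, 28.0, 27.9)
)

# Flag days above the threshold (strict inequality is an assumption).
threshold <- 28
daily$exceeds <- daily$wbgtmax > threshold

# Sum the flags within each facility-year to get the number of days exceeded.
days_exceeded <- aggregate(exceeds ~ prison_id + year, data = daily, FUN = sum)
names(days_exceeded)[names(days_exceeded) == "exceeds"] <- "n_days_over_28"
```

Facility-years with no exceedance keep a count of zero rather than dropping out, which matters for the later trend calculation.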
Unplanned Deviation: Taken from original code. Take the number of days in the first year, the last year, and the last 5 years of data. This prepares data for Figure 1 and Figure 2b (and a little bit for 2c) –MM
Trends in the number of days per year over time
Totals are calculated by multiplying the regression slope by the total number of years -MM
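A minimal sketch of that total, assuming an ordinary least squares fit of days per year on year. The series is made up, and the variable names are ours rather than the original code's.

```r
# Toy series for one facility: days exceeded rise by 0.5 per year (made up).
yearly <- data.frame(
  year   = 1982:1991,
  n_days = 10 + 0.5 * (0:9)
)

# Linear trend in the number of days per year.
fit   <- lm(n_days ~ year, data = yearly)
slope <- unname(coef(fit)["year"])

# Total change over the record: slope multiplied by the number of years.
n_years      <- length(unique(yearly$year))
total_change <- slope * n_years
```

Note that multiplying by the number of years, as described above, gives a slightly larger total than multiplying by the span between the first and last year, which is one year shorter.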
Unplanned Deviation: Per original study, fix counties to be consistent all the way through
Next, summarize by the fixed counties
Unplanned Deviation: Report list of counties with NA population values.
## # A tibble: 34 × 7
## # Groups: fips [34]
## fips year wbgt_26 wbgt_28 wbgt_30 wbgt_35 pop
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 55121 2012 10 3 0 0 NA
## 2 55123 2012 11 4 0 0 NA
## 3 55125 2012 2 0 0 0 NA
## 4 55127 2012 13 5 0 0 NA
## 5 55129 2012 4 1 0 0 NA
## 6 55131 2012 10 3 0 0 NA
## 7 55133 2012 10 4 0 0 NA
## 8 55135 2012 10 3 0 0 NA
## 9 55137 2012 13 6 0 0 NA
## 10 55139 2012 13 5 0 0 NA
## # ℹ 24 more rows
Join to county shapefile and visualize
Filter the entire dataset to the fips codes with NA values and see whether they have population data in other years (dplyr filter where fips is %in% the group of fips codes).
Take the average of the year before and the year after (dplyr filter where fips is in the group of fips codes and year is 2011 or 2013, then group by fips and average pop).
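The two steps above can be sketched in base R as follows. The population figures are made up, and the object names (`na_fips`, `fill`) are hypothetical. Only the fips code 55121 and the gap year 2012 come from the table of NA values.

```r
# Toy county-year table; fips 55121 is missing its 2012 population.
county <- data.frame(
  fips = c("55121", "55121", "55121", "55001", "55001", "55001"),
  year = c(2011, 2012, 2013, 2011, 2012, 2013),
  pop  = c(1000, NA, 1100, 500, 510, 520)  # made-up populations
)

# Step 1: which counties have NA population, and what do their other years hold?
na_fips    <- unique(county$fips[is.na(county$pop)])
neighbours <- county[county$fips %in% na_fips & county$year %in% c(2011, 2013), ]

# Step 2: average the year before and the year after, per county.
fill <- aggregate(pop ~ fips, data = neighbours, FUN = mean)

# Write the averages back into the 2012 gaps.
idx <- which(is.na(county$pop) & county$year == 2012)
county$pop[idx] <- fill$pop[match(county$fips[idx], fill$fips)]
```

Counties with complete records are left untouched, so the imputation only affects the rows reported as NA.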
Take weighted average of entire country in each year by population
Combine state-specific with national
Result: Rds table with variables
- prison facility
- facility type
- prison population
- county
- population
- n days exceeding 28 degrees
- year
Aggregate data into states
Weighted sum of days exceeded across all counties of the state
Sum of days exceeded multiplied by (Ratio of county population / state population)
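Written out, the state value for state $s$ in a given year is $\sum_{c \in s} d_c \, p_c / P_s$, where $d_c$ is the number of days exceeded in county $c$, $p_c$ is the county population, and $P_s$ is the state population. The sketch below applies this to made-up data. The state labels, county identifiers, and column names are illustrative assumptions.

```r
# Toy county table for one year (all values and identifiers are made up).
county_days <- data.frame(
  state   = c("A", "A", "B", "B"),
  county  = c("c1", "c2", "c3", "c4"),
  year    = 2000,
  wbgt_28 = c(10, 20, 0, 8),       # days exceeding 28 degrees
  pop     = c(100, 300, 50, 50)
)

# State population per state-year, repeated on every county row.
county_days$state_pop <- ave(county_days$pop, county_days$state,
                             county_days$year, FUN = sum)

# Each county contributes its days scaled by its share of state population.
county_days$weighted <- county_days$wbgt_28 *
  county_days$pop / county_days$state_pop

# Sum the contributions within each state-year.
state_days <- aggregate(weighted ~ state + year, data = county_days, FUN = sum)
```

In this toy case state A comes out at 17.5 days rather than the unweighted mean of 15, because the more populous county has more days exceeded.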
Work through author’s code for Figure 1
Merge prison over time file with shapefile (for geometry or prison data?)
Calculate 5-year averages
Extract coordinates
The study took important first steps towards following open source conventions. Including a linked GitHub repository containing the labeled, ordered study code and a supplementary materials document improved the study’s legibility and provided essential information that greatly improved reproducibility. However, these documents could be improved to ensure better reproducibility and open access.
Key barriers to reproducibility identified were:
Use of rgdal, which is no longer operational, impedes reproducibility. Confusing use of packages, such as using the geom_polygon() function to plot points, made the code less readable. Furthermore, dates of use and package versions are not specified, which impedes reproducibility. Standardization of open source package use and the groundhog package, which stores and loads dates and versions for all packages, would improve reproducibility.
Key threats to validity identified were:
*Murage P, Hajat S, Kovats RS. Effect of night-time temperatures on cause and age-specific mortality in London. Environ Epidemiol. 2017 Dec;1(2):e005. doi: 10.1097/EE9.0000000000000005. Epub 2017 Dec 13. PMID: 33195962; PMCID: PMC7608908.
Through this reproduction in progress, we were able to identify key barriers to reproducibility and threats to validity in the study. The reproduction was limited by the time and energy of the reproducers (Sam and Matthew) and our limited expertise handling large datasets containing raster data and time series data. However, our inability to reproduce or verify all of the figures indicates that reproducibility attempts in the study were not sufficient to reduce barriers. Producing a more thorough reproduction could be accomplished by:
- Recreating Figure 2
- Mapping out the authors’ workflow and study design based on the code
- Reanalyzing and visualizing the data using open source packages
- Visualizing and investigating prisons with missing data
- Replicating the study using nighttime low temperatures
Ultimately, the study provides a template for exploring general trends in extreme heat risk for incarcerated populations. With further attention to clarity, accessibility, and reproducibility, this study and its reproductions could contribute to a reporting or monitoring system for heat risk in carceral facilities. Despite threats to validity and reproducibility, the study provides a basis for critical future research in environmental hazards and abolition.
This is the first version of our report. Any deviations from our pre-analysis plan in our workflow will be documented as unplanned deviations.
This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI: 10.17605/OSF.IO/W29MQ
Kedron, P., & Holler, J. (2023). Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences. https://doi.org/10.17605/OSF.IO/W29MQ