Abstract

This is a study of gerrymandering in Alabama. We will test three shape-based compactness scores and assess the representativeness of districts based on prior presidential election results and race. We will then extend prior studies by calculating the representativeness of the convex hull of each district polygon.

Study Metadata

Study design

This is an original study based on literature on gerrymandering metrics.

This study is exploratory in design. Its goal is to evaluate the usefulness of a new gerrymandering metric that compares representation inside the convex hull of a congressional district with representation inside the district itself.

Materials and procedure

Computational environment

I plan on using the following R packages:

  • groundhog for reproducible computational environments (consistent versions of R and its packages)
  • here for reproducible path names
  • tidyverse, which includes dplyr for database-style data frames
  • sf for spatial vector data, implementing the OSGeo simple features standards we are accustomed to
  • stars for spatial-temporal raster data in R
  • tmap for thematic maps, including static maps or interactive leaflet maps

## Loading required package: conflicted
## Loading required package: groundhog
## groundhog says: No default repository found, setting to 'http://cran.r-project.org/'
## Attached: 'Groundhog' (Version: 3.2.2)
## Tips and troubleshooting: https://groundhogR.com
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## here() starts at /Users/samuelbarnard/Desktop/springTerm25/openGISci/OR-Gerrymander-Alabama
## 
## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
## 
## Linking to liblwgeom 3.0.0beta1 r16016, GEOS 3.11.0, PROJ 9.1.0
## Warning in fun(libname, pkgname): GEOS versions differ: lwgeom has 3.11.0 sf
## has 3.13.0
## Warning in fun(libname, pkgname): PROJ versions differ: lwgeom has 9.1.0 sf has
## 9.5.1
## 
## Attaching package: 'lwgeom'
## 
## The following object is masked from 'package:sf':
## 
##     st_perimeter
## 
## Successfully attached 'tidyverse_2.0.0'
## Successfully attached 'here_1.0.1'
## Successfully attached 'sf_1.0-19'
## Successfully attached 'tmap_4.0'
## Successfully attached 'tidycensus_1.7.1'
## Successfully attached 'knitr_1.49'
## Successfully attached 'lwgeom_0.2-14'
## Successfully attached 'markdown_1.13'
## Successfully attached 'htmltools_0.5.8.1'

Data and variables

We plan on using data sources…

Set up districts file from districts.gpkg:

## Driver: GPKG 
## Available layers:
##    layer_name geometry_type features fields crs_name
## 1 districts21 Multi Polygon        7      4   WGS 84
## 2 districts23 Multi Polygon        7      4    NAD83
## 3 precincts20 Multi Polygon     1972      8    NAD83

Layers from districts.gpkg

precincts20

  • Title: Voting Precincts 2020
  • Abstract: Alabama voting data for 2020 elections by precinct.
  • Spatial Coverage: Alabama OSM:161950
  • Spatial Resolution: voting precincts
  • Spatial Reference System: EPSG: 4269, NAD 1983 geographic coordinate system
  • Temporal Coverage: voting precincts used for tabulating the 2020 election
  • Temporal Resolution: annual election (2020)
  • Lineage: Saved as geopackage format. Processing prior to download is explained in validation report and readme
  • Distribution: Data available at Redistricting Data Hub with free login.
  • Constraints: Permitted for noncommercial and nonpartisan use only. Copyright and use constraints explained here
  • Data Quality: State any planned quality assessment
  • Variables: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
    • Label: variable name as used in the data or code
    • Alias: intuitive natural language name
    • Definition: Short description or definition of the variable. Include measurement units in description.
    • Type: data type, e.g. character string, integer, real
    • Accuracy: e.g. uncertainty of measurements
    • Domain: Expected range of Maximum and Minimum of numerical data, or codes or categories of nominal data, or reference to a standard codebook
    • Missing Data Value(s): Values used to represent missing data and frequency of missing data observations
    • Missing Data Frequency: Frequency of missing data observations: not yet known for data to be collected
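| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
| --- | --- | --- | --- | --- | --- | --- | --- |
| VTDST20 | Voting district ID | | | | | | |
| GEOID20 | Unique geographic ID | | | | | | |
| G20PRETRU | total votes for Trump in 2020 | | | | | | |
| G20PREBID | total votes for Biden in 2020 | | | | | | |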

Load variables:

## Reading layer `precincts20' from data source 
##   `/Users/samuelbarnard/Desktop/springTerm25/openGISci/OR-Gerrymander-Alabama/data/raw/public/alabama_dataset/districts.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 1972 features and 8 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.47323 ymin: 30.14442 xmax: -84.88825 ymax: 35.00803
## Geodetic CRS:  NAD83

districts23

  • Title: US Congressional Districts 2023
  • Abstract: Alabama congressional districts for the 2024 election.
  • Spatial Coverage: Alabama OSM:161950
  • Spatial Resolution: congressional districts
  • Spatial Reference System: EPSG: 3857, WGS 1984 Web Mercator projection
  • Temporal Coverage: districts approved in 2023 for use in 2024.
  • Temporal Resolution:
  • Lineage: Loaded into QGIS as an ArcGIS feature service layer and saved in geopackage format. Extraneous data fields were removed and the Fix Geometries tool was used to correct geometry errors.
  • Distribution: Alabama State GIS via ESRI feature service
  • Constraints: Public Domain data free for use and redistribution.
  • Data Quality: State any planned quality assessment
  • Variables:
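| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DISTRICT | US Congressional District Number | | | | | | |
| POPULATION | total population (2020 census) | | | | | | |
| WHITE | total white population (2020 census) | | | | | | |
| BLACK | total Black or African American population (2020 census) | | | | | | |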

Load variables:

## Reading layer `districts23' from data source 
##   `/Users/samuelbarnard/Desktop/springTerm25/openGISci/OR-Gerrymander-Alabama/data/raw/public/alabama_dataset/districts.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 7 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.47323 ymin: 30.14443 xmax: -84.88825 ymax: 35.00803
## Geodetic CRS:  NAD83

Map 2023 districts

## ℹ tmap mode set to "plot".
## 
## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## 
## [v3->v4] `tm_text()`: migrate the layer options 'just' to 'options =
## opt_tm_text(<HERE>)'
## [tm_text()] Argument `on_surface` unknown.

blockgroups2020

  • Title: Block Groups 2020
  • Abstract: Vector polygon geopackage layer of Census block groups and demographic data.
  • Spatial Coverage: Alabama OSM:161950
  • Spatial Resolution: census block groups
  • Spatial Reference System: EPSG: 4269, NAD 1983 geographic coordinate system
  • Temporal Coverage: 2020 census
  • Temporal Resolution: 10 year census (2020)
  • Lineage: Data downloaded from US Census API “pl” public law summary file using tidycensus in R
  • Distribution: US Census API
  • Constraints: Public Domain data free for use and redistribution.
  • Data Quality: State any planned quality assessment
  • Variables:
| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GEOID | code to uniquely identify block groups | | | | | | |
| P4_001N | total population, 18 years or older | | | | | | |
| P4_006N | total: not Hispanic or Latino, Population of one race, Black or African American alone, 18 years or older | | | | | | |
| P5_003N | Total institutionalized population in correctional facilities for adults, 18 years or older | | | | | | |

Load data:

Acquire decennial census data in block groups using the tidycensus package. First, query metadata for the pl public law data series.

The issue in the 2023 court cases on Alabama’s gerrymandering was a racial gerrymander discriminating against people identifying as Black or African American. Therefore, we will analyze people of voting age (18 or older) identifying as Black or African American, either as one race or in any combination with other races.

This data is found in public law data series table P3.

Query table P3 on "race for the population 18 years and over".

## Reading layer `block_groups' from data source 
##   `/Users/samuelbarnard/Desktop/springTerm25/openGISci/OR-Gerrymander-Alabama/data/raw/public/block_groups.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 3925 features and 83 fields (with 1 geometry empty)
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.47323 ymin: 30.22333 xmax: -84.88908 ymax: 35.00803
## Geodetic CRS:  NAD83

Prior observations

I have previously conducted an analysis with this data using QGIS to determine compactness along with race and party affiliation data.

However, I only conducted my analysis with an area-weighted re-aggregation approach, and did not incorporate convex hulls.

Bias and threats to validity

“This study is explicitly an investigation to the modifiable areal unit problem. Aspects of the study are extremely sensitive to the combination of edge effects and scale, whereby complex borders formed by natural features, e.g. coastlines or rivers, vary greatly in perimeter depending on the scale of analysis. We hope that in part, this study establishes a method that is more robust (less sensitive) to the threats to validity caused by scale and edge effects in studies of gerrymandering and district shapes.”


Data transformations

Step 1:

districts23 needs to be re-projected to the NAD 1983 geographic coordinate system (EPSG:4269) using st_transform() so that geodesic analysis can be applied.

From here, we can calculate the percentage of population identifying as Black using mutate().
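The re-projection and percentage calculation described in Step 1 might look like the minimal sketch below, assuming the sf and dplyr packages. The square polygon and the object names `districts_toy` and `districts_nad83` are hypothetical stand-ins, not the real districts23 layer; the POPULATION and BLACK values are copied from district 7 in the results table later in this report, and rounding to one decimal is an assumption.

```r
library(sf)
library(dplyr)

# Hypothetical stand-in for one district: a square in Web Mercator (EPSG:3857)
toy_geom <- st_sfc(
  st_polygon(list(rbind(
    c(-9700000, 3800000), c(-9600000, 3800000),
    c(-9600000, 3900000), c(-9700000, 3900000),
    c(-9700000, 3800000)
  ))),
  crs = 3857
)
districts_toy <- st_sf(
  DISTRICT = 7, POPULATION = 717754, BLACK = 378364,
  geometry = toy_geom
)

# Re-project to NAD 1983 (EPSG:4269), then compute the percentage Black
districts_nad83 <- districts_toy |>
  st_transform(4269) |>
  mutate(pctBlack = round(BLACK / POPULATION * 100, 1))
```

Because EPSG:4269 is a geographic coordinate system, later area and length calculations on this layer would be geodesic rather than planar.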

Step 2:

Census data (blockgroups2020) also needs to be re-projected from the WGS 1984 geographic coordinate system to the NAD 1983 geographic coordinate system.

Step 3:

Find the total of people identifying as Black or African American as one race or any combination of multiple races.

a. First, make a list of all the variables inclusive of people identifying as Black or African American.

X name label
151 P3_004N !!Total:!!Population of one race:!!Black or African American alone
158 P3_011N !!Total:!!Population of two or more races:!!Population of two races:!!White; Black or African American
163 P3_016N !!Total:!!Population of two or more races:!!Population of two races:!!Black or African American; American Indian and Alaska Native
164 P3_017N !!Total:!!Population of two or more races:!!Population of two races:!!Black or African American; Asian
165 P3_018N !!Total:!!Population of two or more races:!!Population of two races:!!Black or African American; Native Hawaiian and Other Pacific Islander
166 P3_019N !!Total:!!Population of two or more races:!!Population of two races:!!Black or African American; Some Other Race
174 P3_027N !!Total:!!Population of two or more races:!!Population of three races:!!White; Black or African American; American Indian and Alaska Native
175 P3_028N !!Total:!!Population of two or more races:!!Population of three races:!!White; Black or African American; Asian
176 P3_029N !!Total:!!Population of two or more races:!!Population of three races:!!White; Black or African American; Native Hawaiian and Other Pacific Islander
177 P3_030N !!Total:!!Population of two or more races:!!Population of three races:!!White; Black or African American; Some Other Race
184 P3_037N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; American Indian and Alaska Native; Asian
185 P3_038N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; American Indian and Alaska Native; Native Hawaiian and Other Pacific Islander
186 P3_039N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; American Indian and Alaska Native; Some Other Race
187 P3_040N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; Asian; Native Hawaiian and Other Pacific Islander
188 P3_041N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; Asian; Some Other Race
189 P3_042N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; Native Hawaiian and Other Pacific Islander; Some Other Race
195 P3_048N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; American Indian and Alaska Native; Asian
196 P3_049N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; American Indian and Alaska Native; Native Hawaiian and Other Pacific Islander
197 P3_050N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; American Indian and Alaska Native; Some Other Race
198 P3_051N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; Asian; Native Hawaiian and Other Pacific Islander
199 P3_052N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; Asian; Some Other Race
200 P3_053N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; Native Hawaiian and Other Pacific Islander; Some Other Race
205 P3_058N !!Total:!!Population of two or more races:!!Population of four races:!!Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander
206 P3_059N !!Total:!!Population of two or more races:!!Population of four races:!!Black or African American; American Indian and Alaska Native; Asian; Some Other Race
207 P3_060N !!Total:!!Population of two or more races:!!Population of four races:!!Black or African American; American Indian and Alaska Native; Native Hawaiian and Other Pacific Islander; Some Other Race
208 P3_061N !!Total:!!Population of two or more races:!!Population of four races:!!Black or African American; Asian; Native Hawaiian and Other Pacific Islander; Some Other Race
211 P3_064N !!Total:!!Population of two or more races:!!Population of five races:!!White; Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander
212 P3_065N !!Total:!!Population of two or more races:!!Population of five races:!!White; Black or African American; American Indian and Alaska Native; Asian; Some Other Race
213 P3_066N !!Total:!!Population of two or more races:!!Population of five races:!!White; Black or African American; American Indian and Alaska Native; Native Hawaiian and Other Pacific Islander; Some Other Race
214 P3_067N !!Total:!!Population of two or more races:!!Population of five races:!!White; Black or African American; Asian; Native Hawaiian and Other Pacific Islander; Some Other Race
216 P3_069N !!Total:!!Population of two or more races:!!Population of five races:!!Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander; Some Other Race
218 P3_071N !!Total:!!Population of two or more races:!!Population of six races:!!White; Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander; Some Other Race
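One way to build such a list is to filter the variable metadata by its label text. The sketch below uses base R on a small hypothetical excerpt of the metadata; the rows for P3_001N, P3_003N, and P3_012N are illustrative assumptions, and in the real workflow the full table would presumably come from a metadata query such as `tidycensus::load_variables(2020, "pl")`.

```r
# Hypothetical excerpt of the public law variable metadata (name, label)
pl_vars <- data.frame(
  name = c("P3_001N", "P3_003N", "P3_004N", "P3_011N", "P3_012N"),
  label = c(
    "!!Total:",
    "!!Total:!!Population of one race:!!White alone",
    "!!Total:!!Population of one race:!!Black or African American alone",
    "!!Total:!!Population of two or more races:!!Population of two races:!!White; Black or African American",
    "!!Total:!!Population of two or more races:!!Population of two races:!!White; American Indian and Alaska Native"
  )
)

# Keep every variable whose label mentions Black or African American
is_black <- grepl("Black or African American", pl_vars$label, fixed = TRUE)
black_vars <- pl_vars$name[is_black]
```

Applied to the full P3 table, the same filter should return the 32 variables listed above.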

b. Next, calculate new columns.

  • Black is the sum of all 32 columns shown above, in which any of the racial categories by which someone identifies is Black or African American.
  • Total is a copy of the population 18 years or over, variable P3_001N.
  • PctBlack is calculated as Black / Total * 100.
  • CheckPct is calculated as the percentage of the population 18 years or older that is either white of one race only (P3_003N) or Black or African American as calculated above. In Alabama, we can expect that this will be close to 100% for most block groups, and it should never exceed 100%.
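These four columns could be computed roughly as follows, assuming dplyr. The two block groups and their counts are invented for illustration, and `black_vars` is shortened to two of the 32 variables to keep the sketch small.

```r
library(dplyr)

# Hypothetical block group counts (not real census values)
bg <- data.frame(
  GEOID   = c("bg1", "bg2"),
  P3_001N = c(1000, 500),  # population 18 years or over
  P3_003N = c(600, 100),   # white alone
  P3_004N = c(300, 350),   # Black or African American alone
  P3_011N = c(20, 10)      # white; Black or African American
)

# Shortened stand-in for the list of 32 variables
black_vars <- c("P3_004N", "P3_011N")

bg_calc <- bg |>
  mutate(
    Black    = rowSums(across(all_of(black_vars))),
    Total    = P3_001N,
    PctBlack = Black / Total * 100,
    CheckPct = (Black + P3_003N) / Total * 100
  )
```

A quick validity check is that `max(bg_calc$CheckPct)` does not exceed 100.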

c. Save the results in blockgroups_calc.gpkg

## Deleting layer `blockgroups_calc' using driver `GPKG'
## Writing layer `blockgroups_calc' to data source 
##   `/Users/samuelbarnard/Desktop/springTerm25/openGISci/OR-Gerrymander-Alabama/data/derived/public/blockgroups_calc.gpkg' using driver `GPKG'
## Writing 3925 features with 6 fields and geometry type Multi Polygon.

Step 4:

Map the percentage of the population 18 or over that is Black or African American.

## ℹ tmap mode set to "plot".

Map approved 2023 districts over the Black population

## ℹ tmap mode set to "view".
## Registered S3 method overwritten by 'jsonify':
##   method     from    
##   print.json jsonlite
## 
## Variable bgcol and bgcol_alpha not supported by view mode

Analysis

Approach 1: AWR

Use area-weighted re-aggregation (AWR) to estimate the white and Black voting-age populations of each district from block group data.

Why do this when the POPULATION, BLACK, and WHITE variables are already in the table? First, those variables count the total population, but we should care more about the voting-age population. Second, we may want to categorize and calculate BLACK differently from the state of Alabama.

It turns out that sf builds a spatial index on the first dataset in a spatial query or overlay, but not on the second. Therefore, pass the more complex dataset to st_intersection() first, and you’ll see remarkably different run times.

Spatial indices in R (sf)

## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
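The area-weighted re-aggregation can be sketched on toy data as below, assuming sf and dplyr. The squares, counts, and district labels are hypothetical, and a projected CRS is used so that areas are planar; the real analysis works on NAD 1983 geometries. The output column names bgTotal, bgBlack, and pctBlackbg follow the results table in this report.

```r
library(sf)
library(dplyr)

# Helper to build a rectangle from its corner coordinates (toy geometry)
rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Two hypothetical block groups with voting-age counts
blockgroups <- st_sf(
  GEOID = c("bg1", "bg2"),
  Total = c(100, 200),
  Black = c(40, 20),
  geometry = st_sfc(rect(0, 0, 2, 2), rect(2, 0, 4, 2), crs = 3857)
)
blockgroups$bg_area <- as.numeric(st_area(blockgroups))

# Two hypothetical districts; district A splits block group bg2 in half
districts <- st_sf(
  DISTRICT = c("A", "B"),
  geometry = st_sfc(rect(0, 0, 3, 2), rect(3, 0, 4, 2), crs = 3857)
)

# The more complex layer (block groups) goes first in st_intersection().
# sf warns that attributes are assumed spatially constant, as in the output above.
pieces <- st_intersection(blockgroups, districts)
pieces$weight <- as.numeric(st_area(pieces)) / pieces$bg_area

awr <- pieces |>
  st_drop_geometry() |>
  group_by(DISTRICT) |>
  summarize(
    bgTotal = sum(Total * weight),
    bgBlack = sum(Black * weight)
  ) |>
  mutate(pctBlackbg = bgBlack / bgTotal * 100)
```

Each block group fragment contributes population in proportion to the share of the block group's area that falls inside the district, which is the assumption the warning refers to.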

Report results. The percentage of Black or African American people varies widely from district to district, from 6.9% in district 4 to 52.7% in district 7 (pctBlack).

| DISTRICT | POPULATION | WHITE | BLACK | pctBlack | bgTotal | bgBlack | pctBlackbg |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 717754 | 527330 | 116462 | 16.2 | 556559.0 | 90288.76 | 16.2 |
| 2 | 717754 | 303461 | 353228 | 49.2 | 559640.2 | 272078.98 | 48.6 |
| 3 | 717754 | 508080 | 146376 | 20.4 | 564183.0 | 116778.64 | 20.7 |
| 4 | 717754 | 585183 | 49721 | 6.9 | 558441.9 | 42522.09 | 7.6 |
| 5 | 717754 | 495427 | 126226 | 17.6 | 560573.5 | 102750.67 | 18.3 |
| 6 | 717755 | 517634 | 125785 | 17.5 | 552556.9 | 96865.75 | 17.5 |
| 7 | 717754 | 283337 | 378364 | 52.7 | 564941.7 | 293014.56 | 51.9 |

Approach 2: Convex hull

## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries

Join convex hull estimates to districts with blockgroup estimates.
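A rough sketch of the convex hull comparison on toy data follows, assuming sf. The L-shaped district, the four block groups, and the helper `pct_black_in()` are hypothetical. Defining the difference as the hull percentage minus the district percentage is also an assumption, since this report does not spell out how the difference is calculated.

```r
library(sf)

# Helper to build a unit square with its lower-left corner at (x, y)
square <- function(x, y) {
  st_polygon(list(rbind(
    c(x, y), c(x + 1, y), c(x + 1, y + 1), c(x, y + 1), c(x, y)
  )))
}

# Four hypothetical block groups in a 2 x 2 grid
blockgroups <- st_sf(
  Total = c(100, 100, 100, 100),
  Black = c(10, 10, 10, 80),
  geometry = st_sfc(
    square(0, 0), square(1, 0), square(0, 1), square(1, 1),
    crs = 3857
  )
)
blockgroups$bg_area <- as.numeric(st_area(blockgroups))

# L-shaped district that excludes the upper-right block group
district <- st_sf(
  DISTRICT = "A",
  geometry = st_sfc(st_polygon(list(rbind(
    c(0, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 2), c(0, 2), c(0, 0)
  ))), crs = 3857)
)
hull <- st_convex_hull(district)

# Area-weighted percentage Black inside any zone
pct_black_in <- function(zone) {
  pieces <- suppressWarnings(st_intersection(blockgroups, zone))
  pieces <- pieces[st_dimension(pieces) == 2, ]  # drop shared edges
  w <- as.numeric(st_area(pieces)) / pieces$bg_area
  sum(pieces$Black * w) / sum(pieces$Total * w) * 100
}

pct_district <- pct_black_in(district)
pct_hull <- pct_black_in(hull)
diff_pct <- pct_hull - pct_district  # assumed definition of the difference
```

In this toy case the hull pulls in half of the excluded block group, so the hull percentage is higher than the district percentage.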

Calculate compactness scores based on:

  1. the area and perimeter
  2. the area and the area of the convex hull
  3. the area and the area of the minimum bounding circle

Note: This code block takes some time to run due to the st_minimum_bounding_circle function.

Also, to knit, will we need to replace st_perimeter() with st_length(st_cast(geom, "MULTILINESTRING"))?
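The three scores could be computed along these lines, assuming sf and lwgeom. The L-shaped polygon is hypothetical and uses a projected CRS for planar measurements. The names compact_shp, compact_hull, and compact_circ follow the correlation matrix in the results; the specific area-and-perimeter formula (4πA/P², often called Polsby-Popper) is an assumption, because this report only says the score is based on area and perimeter.

```r
library(sf)
library(lwgeom)  # provides st_minimum_bounding_circle()

# Hypothetical L-shaped district
l_shape <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 2), c(0, 2), c(0, 0)
))), crs = 3857)

area <- as.numeric(st_area(l_shape))

# Perimeter via the boundary line, the alternative to st_perimeter() noted above
perimeter <- as.numeric(st_length(st_cast(l_shape, "MULTILINESTRING")))

hull_area <- as.numeric(st_area(st_convex_hull(l_shape)))
circle_area <- as.numeric(st_area(st_minimum_bounding_circle(l_shape)))

compact_shp  <- 4 * pi * area / perimeter^2  # 1. area and perimeter (assumed formula)
compact_hull <- area / hull_area             # 2. area over convex hull area
compact_circ <- area / circle_area           # 3. area over bounding circle area
```

All three scores fall between 0 and 1, with values near 1 indicating a more compact shape. The bounding circle is returned as a many-sided polygon, so compact_circ is a close approximation rather than an exact ratio.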


Results

Visualization 1: Correlation matrix and small plots for gerrymandering indicators

| | pctBlackbg | diffPct | absdiffPct | compact_shp | compact_hull | compact_circ |
| --- | --- | --- | --- | --- | --- | --- |
| pctBlackbg | 1.0000000 | 0.8656916 | 0.2755931 | -0.2814174 | -0.1247704 | 0.0389629 |
| diffPct | 0.8656916 | 1.0000000 | -0.0363323 | 0.1128061 | 0.1850875 | 0.1185339 |
| absdiffPct | 0.2755931 | -0.0363323 | 1.0000000 | -0.6801495 | -0.5295192 | 0.2456776 |
| compact_shp | -0.2814174 | 0.1128061 | -0.6801495 | 1.0000000 | 0.9557498 | 0.4197650 |
| compact_hull | -0.1247704 | 0.1850875 | -0.5295192 | 0.9557498 | 1.0000000 | 0.5039445 |
| compact_circ | 0.0389629 | 0.1185339 | 0.2456776 | 0.4197650 | 0.5039445 | 1.0000000 |
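A matrix of this form can be produced with base R's `cor()`, as in the sketch below. The indicator values are hypothetical and are not results from this study. `cor()` defaults to Pearson correlation; this report does not state which method was used.

```r
# Hypothetical indicator values for four districts (not study results)
indicators <- data.frame(
  absdiffPct   = c(0.5, 1.0, 2.0, 4.0),
  compact_shp  = c(0.40, 0.35, 0.20, 0.15),
  compact_hull = c(0.85, 0.80, 0.70, 0.60)
)

# Pairwise correlations between all indicator columns
cor_matrix <- cor(indicators)
round(cor_matrix, 2)
```

With only seven districts in the real data, each coefficient rests on very few observations and should be read as descriptive.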

Visualization 2: Plot with representational difference and compactness

This is a scatterplot with the absolute difference in representation on the x axis and compactness on the y axis. The plot presents the three compactness scores together, each in a different color.

## `geom_smooth()` using formula = 'y ~ x'

Minimum bounding circle compactness has a weak positive relationship with the absolute convex hull representational difference (r = 0.25), while convex hull compactness (r = -0.53) and shape compactness (r = -0.68) both have a negative relationship with it. Shape compactness and convex hull compactness are strongly positively correlated with each other (r = 0.96).

Districts 1 and 2 are the least compact across all scores.

Discussion

Describe how the results are to be interpreted vis a vis each hypothesis or research question.

Integrity Statement

The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research.

Acknowledgements

This analysis is based on work by Professor Joseph Holler of Middlebury College, and concepts and workflows from the course Open GIScience GEOG 0361.

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI: 10.17605/OSF.IO/W29MQ

References

Cheng, Joe, Carson Sievert, Barret Schloerke, Winston Chang, Yihui Xie, and Jeff Allen. 2024. Htmltools: Tools for HTML. https://github.com/rstudio/htmltools.
Müller, Kirill. 2020. Here: A Simpler Way to Find Your Files. https://here.r-lib.org/.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.
———. 2024a. Lwgeom: Bindings to Selected Liblwgeom Functions for Simple Features. https://r-spatial.github.io/lwgeom/.
———. 2024b. Sf: Simple Features for R. https://r-spatial.github.io/sf/.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With applications in R. Chapman and Hall/CRC. https://doi.org/10.1201/9780429459016.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Tennekes, Martijn. 2018. “tmap: Thematic Maps in R.” Journal of Statistical Software 84 (6): 1–39. https://doi.org/10.18637/jss.v084.i06.
———. 2025. Tmap: Thematic Maps. https://github.com/r-tmap/tmap.
Walker, Kyle, and Matt Herman. 2025. Tidycensus: Load US Census Boundary and Attribute Data as Tidyverse and Sf-Ready Data Frames. https://walker-data.com/tidycensus/.
Wickham, Hadley. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://tidyverse.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.org/knitr/.
———. 2024. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, JJ Allaire, and Jeffrey Horner. 2024. Markdown: Render Markdown with Commonmark. https://github.com/rstudio/markdown.