This is a study of gerrymandering in Alabama. We will test three shape-based compactness scores and assess the representativeness of districts based on prior presidential elections and race. We will then extend prior studies by calculating the representativeness of the convex hull of district polygons.
Key words
: gerrymandering, compactness, convex hull, Alabama, political representation

Subject
: Social and Behavioral Sciences: Geography: Geographic Information Sciences

Date created
: 2025-02-17

Date modified
: 2025-02-17

Spatial Coverage
: Alabama OSM:161950

Spatial Resolution
: Census Block Groups

Spatial Reference System
: EPSG: 4269, NAD 1983 geographic coordinate system

Temporal Coverage
: 2020-2024 population and voting data

Temporal Resolution
: Decennial census

This is an original study based on literature on gerrymandering metrics.
This study is exploratory in design. Its goal is to evaluate the usefulness of a new gerrymandering metric that compares the representative capability of the area inside a congressional district's convex hull with that of the district itself.
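To make the shape side of this design concrete, the following is a minimal Python sketch of two common shape-based scores: the Polsby-Popper score and the ratio of district area to convex hull area. It is an illustration only, not the study's implementation. It assumes planar (projected) coordinates rather than the geodesic measurements planned for the study, the function names and the L-shaped test polygon are hypothetical, and the pre-registration does not state which three compactness scores will be used.

```python
import math


def shoelace_area(ring):
    """Planar area of a simple polygon given as [(x, y), ...] (ring not closed)."""
    n = len(ring)
    total = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def perimeter(ring):
    """Planar perimeter of the same ring, including the closing edge."""
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polsby_popper(ring):
    """4 * pi * area / perimeter^2; equals 1 for a circle, lower for ragged shapes."""
    return 4 * math.pi * shoelace_area(ring) / perimeter(ring) ** 2


def hull_ratio(ring):
    """District area divided by the area of its convex hull (1 for convex shapes)."""
    return shoelace_area(ring) / shoelace_area(convex_hull(ring))


# Hypothetical L-shaped "district" in arbitrary planar units.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
pp = polsby_popper(l_shape)
ratio = hull_ratio(l_shape)
print(round(pp, 4), round(ratio, 4))
```

A low hull ratio means the hull takes in a lot of territory that the district excludes. That excluded territory is where the population comparison proposed in this study would apply.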
I plan on using package … for …
We plan on using data sources…
districts.gpkg
Title
: Voting Precincts 2020

Abstract
: Alabama voting data for 2020 elections by precinct.

Spatial Coverage
: Alabama OSM:161950

Spatial Resolution
: voting precincts

Spatial Reference System
: EPSG: 4269, NAD 1983 geographic coordinate system

Temporal Coverage
: voting precincts used for tabulating the 2020 election

Temporal Resolution
: annual election (2020)

Lineage
: Saved as geopackage format. Processing prior to download is explained in validation report and readme

Distribution
: Data available at Redistricting Data Hub with free login.

Constraints
: Permitted for noncommercial and nonpartisan use only. Copyright and use constraints explained here

Data Quality
: State any planned quality assessment

Variables
: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
Label
: variable name as used in the data or code

Alias
: intuitive natural language name

Definition
: Short description or definition of the variable. Include measurement units in description.

Type
: data type, e.g. character string, integer, real

Accuracy
: e.g. uncertainty of measurements

Domain
: Expected range of Maximum and Minimum of numerical data, or codes or categories of nominal data, or reference to a standard codebook

Missing Data Value(s)
: Values used to represent missing data and frequency of missing data observations

Missing Data Frequency
: Frequency of missing data observations: not yet known for data to be collected

Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
VTDST20 | … | Voting district ID | … | … | … | … | … |
GEOID20 | … | Unique geographic ID | … | … | … | … | … |
G20PRETRU | … | total votes for Trump in 2020 | … | … | … | … | … |
G20PREBID | … | total votes for Biden in 2020 | … | … | … | … | … |
Title
: US Congressional Districts 2023

Abstract
: Alabama congressional districts for the 2024 election.

Spatial Coverage
: Alabama OSM:161950

Spatial Resolution
: congressional districts

Spatial Reference System
: EPSG: 3857, WGS 1984 Web Mercator projection

Temporal Coverage
: districts approved in 2023 for use in 2024.

Temporal Resolution
:

Lineage
: Loaded into QGIS as an ArcGIS feature service layer and saved in geopackage format. Extraneous data fields were removed and the Fix Geometries tool was used to correct geometry errors.

Distribution
: Alabama State GIS via ESRI feature service

Constraints
: Public Domain data free for use and redistribution.

Data Quality
: State any planned quality assessment

Variables
: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
DISTRICT | … | US Congressional District Number | … | … | … | … | … |
POPULATION | … | total population (2020 census) | … | … | … | … | … |
WHITE | … | total white population (2020 census) | … | … | … | … | … |
BLACK | … | total Black or African American population (2020 census) | … | … | … | … | … |
Title
: Block Groups 2020

Abstract
: Vector polygon geopackage layer of Census block groups and demographic data.

Spatial Coverage
: Alabama OSM:161950

Spatial Resolution
: census block groups

Spatial Reference System
: EPSG: 4269, NAD 1983 geographic coordinate system

Temporal Coverage
: 2020 census

Temporal Resolution
: 10 year census (2020)

Lineage
: Data downloaded from US Census API “pl” public law summary file using tidycensus in R

Distribution
: US Census API

Constraints
: Public Domain data free for use and redistribution.

Data Quality
: State any planned quality assessment

Variables
: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
GEOID | … | code to uniquely identify block groups | … | … | … | … | … |
P4_001N | … | total population, 18 years or older | … | … | … | … | … |
P4_006N | … | total: not Hispanic or Latino, Population of one race, Black or African American alone, 18 years or older | … | … | … | … | … |
P5_003N | … | Total institutionalized population in correctional facilities for adults, 18 years or older | … | … | … | … | … |
I have previously conducted an analogous analysis of these data in QGIS, calculating compactness alongside race and party affiliation. However, that analysis used only an area-weighted re-aggregation approach and did not incorporate the convex hull.
At the time of this study pre-registration, the authors had _____ prior knowledge of the geography of the study region with regards to the ____ phenomena to be studied. This study is related to ____ prior studies by the authors
For each primary data source, declare the extent to which authors had already engaged with the data:
For each secondary source, declare the extent to which authors had already engaged with the data:
If pilot test data has been collected or acquired, describe how the researchers observed and analyzed the pilot test, and the extent to which the pilot test influenced the research design.
Given the research design and primary data to be collected and/or secondary data to be used, discuss common threats to validity and the approach to mitigating those threats, with an emphasis on geographic threats to validity.
These include:

- uneven primary data collection due to geographic inaccessibility or other constraints
- multiple hypothesis testing
- edge or boundary effects
- the modifiable areal unit problem
- nonstationarity
- spatial dependence or autocorrelation
- temporal dependence or autocorrelation
- spatial scale dependency
- spatial anisotropies
- confusion of spatial and a-spatial causation
- ecological fallacy
- uncertainty e.g. from spatial disaggregation, anonymization, differential privacy
- `blockgroups2020` needs to be acquired using the `tidycensus` package in R
- `districts23` needs to be reprojected to EPSG:4269 for geodesic analysis
- Area needs to be calculated for `districts23` and `blockgroups2020`
- Area-weighted re-aggregation needs to be conducted for `blockgroups2020` and `districts23`
- Compactness needs to be calculated for `districts23`
- Convex hull needs to be calculated for `districts23`
- Race, compactness and voting data need to be joined together to produce a final table.
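The area-weighted re-aggregation step can be sketched as follows. This is a hedged illustration under stated assumptions, not the planned workflow. It assumes the block-group/district intersection areas have already been computed in GIS software, and the function name, block group IDs, zone IDs, areas, and counts are all hypothetical. Only the field names `P4_001N` and `P4_006N` come from the block group variable table. The same function is run twice, once against district polygons and once against their convex hulls, so the two population shares can be compared.

```python
from collections import defaultdict


def area_weighted_reaggregate(block_groups, intersections, fields):
    """
    Estimate zone totals from block-group counts.

    block_groups:  {bg_id: {"area": float, <field>: count, ...}}
    intersections: iterable of (bg_id, zone_id, intersect_area)
    fields:        names of the count fields to re-aggregate

    Each block group contributes count * (intersect_area / block-group area),
    which assumes population is spread evenly within the block group.
    """
    totals = defaultdict(lambda: {f: 0.0 for f in fields})
    for bg_id, zone_id, intersect_area in intersections:
        bg = block_groups[bg_id]
        weight = intersect_area / bg["area"]
        for f in fields:
            totals[zone_id][f] += weight * bg[f]
    return dict(totals)


def black_vap_share(row):
    """Share of the voting-age population that is Black alone, not Hispanic."""
    return row["P4_006N"] / row["P4_001N"]


# Hypothetical block groups (areas in arbitrary units, invented counts).
block_groups = {
    "A": {"area": 10.0, "P4_001N": 1000, "P4_006N": 400},
    "B": {"area": 20.0, "P4_001N": 500, "P4_006N": 100},
}
fields = ["P4_001N", "P4_006N"]

# Hypothetical intersections with districts and with one district's convex hull.
district_parts = [("A", "district_1", 10.0), ("B", "district_1", 5.0),
                  ("B", "district_2", 15.0)]
hull_parts = [("A", "hull_1", 10.0), ("B", "hull_1", 10.0)]

by_district = area_weighted_reaggregate(block_groups, district_parts, fields)
by_hull = area_weighted_reaggregate(block_groups, hull_parts, fields)

# Positive gap: the district holds a larger Black share than its convex hull.
gap = black_vap_share(by_district["district_1"]) - black_vap_share(by_hull["hull_1"])
print(round(gap, 4))
```

The even-spread assumption is the main weakness of this approach, so block groups split by a district or hull boundary are the ones most exposed to the disaggregation uncertainty listed among the threats to validity.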
Describe all data transformations planned to prepare data sources for analysis. This section should explain with the fullest detail possible how to transform data from the raw state at the time of acquisition or observation, to the pre-processed derived state ready for the main analysis. Including steps to check and mitigate sources of bias and threats to validity. The method may anticipate contingencies, e.g. tests for normality and alternative decisions to make based on the results of the test. More specifically, all the geographic and variable transformations required to prepare input data as described in the data and variables section above to match the study’s spatio-temporal characteristics as described in the study metadata and study design sections. Visual workflow diagrams may help communicate the methodology in this section.
Examples of geographic transformations include coordinate system transformations, aggregation, disaggregation, spatial interpolation, distance calculations, zonal statistics, etc.
Examples of variable transformations include standardization, normalization, constructed variables, imputation, classification, etc.
Be sure to include any steps planned to exclude observations with missing or outlier data, to group observations by attribute or geographic criteria, or to impute missing data or apply spatial or temporal interpolation.
Describe the methods of analysis that will directly test the hypotheses or provide results to answer the research questions. This section should explicitly define any spatial / statistical models and their parameters, including grouping criteria, weighting criteria, and significance thresholds. Also explain any follow-up analyses or validations.
Describe how results are to be presented.
Describe how the results are to be interpreted vis a vis each hypothesis or research question.
Include an integrity statement - The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research. If a prior registration does exist, explain the rationale for revising the registration here.
Funding Name
: name of funding for the project

Funding Title
: title of project grant

Award info URI
: web address for award information

Award number
: award number

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI: [10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)