Version 0.1 | First Created March 10, 2025 | Updated March 10, 2025
Chakraborty (2021) investigates the relationships between COVID-19 incidence rates and several demographic characteristics of people with disabilities by county in the continental United States. The aim of the study is to investigate whether people with disabilities (PwDs) face disproportionate challenges due to COVID-19.
Continued, from “Reproduction of Chakraborty 2021: An intracategorical analysis of COVID-19 and people with disabilities” (Holler et al.): To do so, Chakraborty examines the statistical relationship between county incidence rates of COVID-19 cases and county-level percentages of people with disabilities and different socio-demographic characteristics. Specifically, Chakraborty tests county-level bivariate correlations between COVID-19 incidence and the percentage of people with disabilities as one hypothesis, and tests correlations between COVID-19 incidence and the percentage of people with disabilities in 18 different socio-demographic categories of race, ethnicity, poverty status, age, and biological sex. Chakraborty then re-tests for the same county-level associations while controlling for spatial dependence. Spatial dependence is controlled by constructing generalized estimating equation (GEE) models using a combination of state and spatial clusters of COVID-19 incidence to define the GEE clusters. One GEE model is constructed for each of the four types of socio-demographic category: race, ethnicity, age, and biological sex. Chakraborty (2021) finds significant positive relationships between COVID-19 rates and socially vulnerable demographic categories of race, ethnicity, poverty status, age, and biological sex.
This study is a reproduction, with the goal of examining Chakraborty’s study design and its impact particularly in public policy as well as in fields such as research and education. This reproduction will attempt to reproduce the original study’s results.
This will include the map of the county-level distribution of COVID-19 incidence rates (Fig. 1), the summary statistics for disability and socio-demographic variables and their bivariate correlations with the county-level COVID-19 incidence rate (Table 1), and the GEE models for predicting the county-level COVID-19 incidence rate (Table 2). A successful reproduction should generate results identical to those published by Chakraborty (2021).
Chakraborty, J. 2021. Social inequities in the distribution of COVID-19: An intra-categorical analysis of people with disabilities in the U.S. Disability and Health Journal 14:1-5. https://doi.org/10.1016/j.dhjo.2020.101007
The American Community Survey (ACS) five-year estimate (2014-2018) variables used in the study are outlined in the table below. Details on ACS data collection can be found at https://www.census.gov/topics/health/disability/guidance/data-collection-acs.html and details on sampling methods and accuracy can be found at https://www.census.gov/programs-surveys/acs/technical-documentation/code-lists.html.
Variable Name in Study | ACS Variable name |
---|---|
percent of total civilian non-institutionalized population with a disability | S1810_C03_001E |
Race | |
percent w disability: White alone | S1810_C03_004E |
percent w disability: Black alone | S1810_C03_005E |
percent w disability: Native American | S1810_C03_006E |
percent w disability: Asian alone | S1810_C03_007E |
percent w disability: Other race | S1810_C03_009E |
Ethnicity | |
percent w disability: Non-Hispanic White | S1810_C03_011E |
percent w disability: Hispanic | S1810_C03_012E |
percent w disability: Non-Hispanic non-White | (S1810_C02_001E - S1810_C02_011E - S1810_C02_012E) / (S1810_C01_001E - S1810_C01_011E - S1810_C01_012E) * 100 |
percent w disability: Other race | S1810_C03_009E |
Poverty | |
percent w disability: Below poverty level | (C18130_004E + C18130_011E + C18130_018E) / C18130_001E * 100 |
percent w disability: Above poverty level | (C18130_005E + C18130_012E + C18130_019E) / C18130_001E * 100 |
Age | |
percent w disability: 5-17 | S1810_C03_014E |
percent w disability: 18-34 | S1810_C03_015E |
percent w disability: 35-64 | S1810_C03_016E |
percent w disability: 65-74 | S1810_C03_017E |
percent w disability: 75+ | S1810_C03_018E |
Biological sex | |
percent w disability: male | S1810_C03_002E |
percent w disability: female | S1810_C03_003E |
American Community Survey (ACS) data for sociodemographic subcategories of people with disabilities can be accessed by using the `tidycensus` package to query the Census API. This requires an API key, which can be acquired at api.census.gov/data/key_signup.html.
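A query of this kind might look like the following minimal sketch. It assumes the API key is stored in the `CENSUS_API_KEY` environment variable and that `tidycensus` is installed; the object names (`s1810_vars`, `acs_ids`, `acs_s1810`) are illustrative and not taken from the original workflow. The variable IDs are a subset of those in the table above, with the trailing `E` (estimate) suffix stripped because `tidycensus` variable IDs are usually written without it.

```r
# Variable IDs copied from the table above (total disability plus the age rows).
s1810_vars <- c(
  "S1810_C03_001E",
  "S1810_C03_014E", "S1810_C03_015E", "S1810_C03_016E",
  "S1810_C03_017E", "S1810_C03_018E"
)

# Strip the trailing "E" so the IDs follow the usual tidycensus form.
acs_ids <- sub("E$", "", s1810_vars)

# Only query the API when a key is available (assumption: key is in CENSUS_API_KEY).
if (nzchar(Sys.getenv("CENSUS_API_KEY"))) {
  acs_s1810 <- tidycensus::get_acs(
    geography = "county",
    variables = acs_ids,
    year      = 2018,     # end year of the 2014-2018 five-year estimates
    survey    = "acs5",
    output    = "wide"    # one row per county, one column per variable
  )
}
```

With `output = "wide"`, the returned columns carry `E` (estimate) and `M` (margin of error) suffixes, which matches the naming used in the table above.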
Data on COVID-19 cases from the Johns Hopkins University dashboard
have been provided directly with the research compendium because the
data is no longer available online in the state in which it was
downloaded on August 1, 2020. The dashboard and cumulative counts of
COVID-19 cases and deaths were continually updated, so an exact
reproduction required communication with the original author, Jayajit
Chakraborty, for assistance with provision of data from August 1, 2020.
The data includes an estimate of the total population (`POP_ESTIMA`) and confirmed COVID-19 cases (`Confirmed`). The COVID-19 case data expresses the cumulative count of reported COVID-19 cases from 1/22/2020 to 8/1/2020. Although metadata for this particular resource is no longer available from the original source, one can reasonably assume that the total population estimate was based on the 2014-2018 5-year ACS estimate, as the 2019 estimates had not yet been released.
Versions of the data can be found at the Johns Hopkins CSSE COVID-19 Data Repository (https://github.com/CSSEGISandData/COVID-19). However, archived data only provides summaries at the national scale. We received the county-level COVID-19 case data through 8/1/2020 from the author, as there is no readily apparent way to access archived data from the Johns Hopkins University Center for Systems Science and Engineering database.
As of the release of this pre-analysis plan, I have not observed the data past recording the metadata above. My experience with the methods of this study stems from an in-class analysis and breakdown of the original study to better understand data sources and transformations. In-class discussion included exposure to some workflow from a previous reproduction study (“Reproduction of Chakraborty 2021: An intracategorical analysis of COVID-19 and people with disabilities”), which among other things helped to solidify important steps in data acquisition and transformation, for instance the correct data tables from the ACS to query. This prior reproduction was also completed by contacting the original study’s researchers for clarification of their process.
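Use data at the county level, for the continental US only.
Calculate the COVID-19 incidence rate as cases per 100,000 people: `Confirmed / POP_ESTIMA * 100000`.
Use `ggplot()` or the `tmap` package to map the incidence rate.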
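The incidence rate step can be sketched in base R as follows. The column names `Confirmed` and `POP_ESTIMA` come from the data description above; the `fips` column and all values are made up for illustration and are not data from the study.

```r
# Toy rows for illustration only; not data from the study.
covid <- data.frame(
  fips       = c("01001", "01003", "01005"),
  Confirmed  = c(1000, 250, 0),
  POP_ESTIMA = c(50000, 200000, 25000)
)

# Cumulative confirmed cases per 100,000 people
covid$covid_rate <- covid$Confirmed / covid$POP_ESTIMA * 100000
```

Dividing by the county population before scaling keeps the rate comparable between small and large counties.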
Join variables from `acs_vars_S1810` and `acs5_c18130`.
Calculate each percentage: number of PwDs / total for each PwD variable * 100.
Use `ggplot()` or the `tmap` package to map or plot the result.
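As a rough sketch of the join and the derived percentages, the example below applies the formulas from the variable table to a single made-up county. The data frame names, the new column names, and every value are hypothetical; only the ACS variable IDs and the formulas come from the table. `GEOID` is the county identifier column that `tidycensus` returns.

```r
# One toy county per table; values are illustrative, not ACS estimates.
s1810 <- data.frame(
  GEOID = "01001",
  S1810_C01_001E = 1000, S1810_C01_011E = 700, S1810_C01_012E = 100,
  S1810_C02_001E = 100,  S1810_C02_011E = 60,  S1810_C02_012E = 10
)
c18130 <- data.frame(
  GEOID = "01001",
  C18130_001E = 1000,
  C18130_004E = 5,  C18130_011E = 10, C18130_018E = 5,
  C18130_005E = 10, C18130_012E = 40, C18130_019E = 30
)

# Join the two ACS tables on the county identifier
acs <- merge(s1810, c18130, by = "GEOID")

# Derived percentages, following the formulas in the variable table
acs <- within(acs, {
  pct_nonhisp_nonwhite <-
    (S1810_C02_001E - S1810_C02_011E - S1810_C02_012E) /
    (S1810_C01_001E - S1810_C01_011E - S1810_C01_012E) * 100
  pct_below_pov <- (C18130_004E + C18130_011E + C18130_018E) / C18130_001E * 100
  pct_above_pov <- (C18130_005E + C18130_012E + C18130_019E) / C18130_001E * 100
})
```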
Use `summarise()` to calculate summary statistics for the PwD variables (Min, Max, Mean, SD).
Use `cor()` to correlate the COVID-19 incidence rate with the percentage with a disability for each disability variable.
Use `mutate()` to produce the Pearson’s r column in Table 1, then derive the test statistic and p-value: `mutate(t = abs(r) / sqrt((1 - r^2) / df), p = pt(t, df, lower.tail = FALSE))`.
Subtract the results from those in Chakraborty’s Table 1.
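The correlation test in these steps can be sketched in base R as shown below. The data values and column names are invented for illustration, and the degrees of freedom are assumed to be n - 2, the usual choice for Pearson's r, since the plan does not define `df`. The statistic is stored as `t_stat` rather than `t` to avoid masking R's `t()` function.

```r
# Toy values for illustration only; not data from the study.
county <- data.frame(
  covid_rate = c(120, 340, 210, 560, 430, 610),
  pct_dis    = c(10.2, 13.5, 11.1, 15.8, 13.9, 15.0)
)

r  <- cor(county$covid_rate, county$pct_dis)  # Pearson's r (default method)
df <- nrow(county) - 2                        # assumed degrees of freedom

# Same expressions as in the plan's mutate() step
t_stat <- abs(r) / sqrt((1 - r^2) / df)
p_one  <- pt(t_stat, df, lower.tail = FALSE)  # upper-tail (one-sided) p-value

# Cross-check against base R's built-in two-sided test
check <- cor.test(county$covid_rate, county$pct_dis)
```

Because `pt(..., lower.tail = FALSE)` is applied to the absolute value of the statistic, this expression yields a one-sided p-value. It equals half of the two-sided `check$p.value`, which is worth keeping in mind when comparing against published significance levels.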
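Use the `SpatialEpi` package to identify spatial clusters of COVID-19 incidence.
Use `ggplot()` or the `tmap` package to map the clusters.
Calculate relative risk: (local cases / local population) / ((global cases - local cases) / (global population - local population)).
Classify counties using `mutate(case_when())`.
Combine state and relative risk, c(state + relative risk), to produce unique cluster identification codes.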
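The relative risk formula and the cluster identification codes can be illustrated with the sketch below. It treats each county as the local unit, which is an assumption; the plan may instead compute risk per detected cluster. The class breakpoints (1 and 2), the class labels, and all data values are hypothetical placeholders, since the plan does not state its classification thresholds.

```r
# Toy counties for illustration only; not data from the study.
counties <- data.frame(
  state = c("AL", "AL", "GA"),
  cases = c(100, 50, 10),
  pop   = c(1000, 1000, 1000)
)

total_cases <- sum(counties$cases)
total_pop   <- sum(counties$pop)

# Risk inside the unit divided by risk everywhere outside it
counties$rr <- (counties$cases / counties$pop) /
  ((total_cases - counties$cases) / (total_pop - counties$pop))

# Hypothetical risk classes; the breakpoints are placeholders, not the study's
counties$rr_class <- cut(
  counties$rr,
  breaks = c(-Inf, 1, 2, Inf),
  labels = c("low", "mid", "high")
)

# Combine state and risk class into a cluster identification code
counties$cluster_id <- paste(counties$state, counties$rr_class, sep = "-")
```

Subtracting the local counts from the global totals in both the numerator and denominator of the outside rate ensures the unit is compared with the rest of the study area rather than with a total that includes itself.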
`Aug1GEE`)
Present maps and takeaways from comparing the data with the original study results.
Explain weaknesses in the original study design and data, and points for improvement.
This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI: [10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)