Version 0.1 | First Created March 10, 2025 | Updated March 10, 2025

Abstract

Chakraborty (2021) investigates the relationships between COVID-19 incidence rates and several demographic characteristics of people with disabilities by county in the continental United States. The aim of the study is to investigate whether people with disabilities (PwDs) face disproportionate challenges due to COVID-19.

Continued from “Reproduction of Chakraborty 2021: An intracategorical analysis of COVID-19 and people with disabilities” (Holler et al.): To do so, Chakraborty examines the statistical relationship between county incidence rates of COVID-19 cases and county-level percentages of people with disabilities in different socio-demographic categories. Specifically, Chakraborty tests county-level bivariate correlations between COVID-19 incidence and the percentage of people with disabilities as one hypothesis, and tests correlations between COVID-19 incidence and the percentage of people with disabilities in 18 different socio-demographic categories of race, ethnicity, poverty status, age, and biological sex. Chakraborty then re-tests the same county-level associations while controlling for spatial dependence. Spatial dependence is controlled by constructing generalized estimating equation (GEE) models that use a combination of states and spatial clusters of COVID-19 incidence to define the GEE clusters. One GEE model is constructed for each of the four types of socio-demographic category: race, ethnicity, age, and biological sex. Chakraborty (2021) finds significant positive relationships between COVID-19 rates and socially vulnerable demographic categories of race, ethnicity, poverty status, age, and biological sex.

Study Design

This study is a reproduction that examines Chakraborty’s study design and its implications, particularly for public policy and for fields such as research and education. It will attempt to reproduce the original study’s results.

These results include the map of the county-level distribution of COVID-19 incidence rates (Fig. 1), the summary statistics for disability and socio-demographic variables and their bivariate correlations with county-level COVID-19 incidence rates (Table 1), and the GEE models predicting county-level COVID-19 incidence rates (Table 2). A successful reproduction should generate results identical to those published by Chakraborty (2021).

Chakraborty, J. 2021. Social inequities in the distribution of COVID-19: An intra-categorical analysis of people with disabilities in the U.S. Disability and Health Journal 14:1-5. https://doi.org/10.1016/j.dhjo.2020.101007

Study data

Study metadata

ACS Socio-demographic data

The American Community Survey (ACS) five-year estimate (2014-2018) variables used in the study are outlined in the table below. Details on ACS data collection can be found at https://www.census.gov/topics/health/disability/guidance/data-collection-acs.html and details on sampling methods and accuracy can be found at https://www.census.gov/programs-surveys/acs/technical-documentation/code-lists.html.

Disability Subgroup Variables
Variable Name in Study	ACS Variable name
percent of total civilian non-institutionalized population with a disability	S1810_C03_001E
Race
percent w disability: White alone	S1810_C03_004E
percent w disability: Black alone	S1810_C03_005E
percent w disability: Native American	S1810_C03_006E
percent w disability: Asian alone	S1810_C03_007E
percent w disability: Other race	S1810_C03_009E
Ethnicity
percent w disability: Non-Hispanic White	S1810_C03_011E
percent w disability: Hispanic	S1810_C03_012E
percent w disability: Non-Hispanic non-White	(S1810_C02_001E - S1810_C02_011E - S1810_C02_012E) / (S1810_C01_001E - S1810_C01_011E - S1810_C01_012E) * 100
Poverty
percent w disability: Below poverty level	(C18130_004E + C18130_011E + C18130_018E) / C18130_001E * 100
percent w disability: Above poverty level	(C18130_005E + C18130_012E + C18130_019E) / C18130_001E * 100
Age
percent w disability: 5-17	S1810_C03_014E
percent w disability: 18-34	S1810_C03_015E
percent w disability: 35-64	S1810_C03_016E
percent w disability: 65-74	S1810_C03_017E
percent w disability: 75+	S1810_C03_018E
Biological sex
percent w disability: male	S1810_C03_002E
percent w disability: female	S1810_C03_003E

American Community Survey (ACS) data for socio-demographic subcategories of people with disabilities can be accessed by using the tidycensus R package to query the Census API. This requires an API key, which can be requested at api.census.gov/data/key_signup.html.
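A minimal sketch of the ACS query, assuming tidycensus is installed and the API key is stored in the `CENSUS_API_KEY` environment variable; the wrapper name `get_pwd_acs()` is hypothetical. With `year = 2018` and `survey = "acs5"`, tidycensus requests the 2014-2018 five-year estimates, and `output = "wide"` returns one row per county with estimate (`E`) and margin-of-error (`M`) columns such as `S1810_C03_001E`.

```r
# Sketch (hypothetical helper): download the two ACS tables used in the study.
# Assumes the tidycensus package is installed and the Census API key is stored
# in the CENSUS_API_KEY environment variable.
get_pwd_acs <- function(year = 2018) {
  key <- Sys.getenv("CENSUS_API_KEY")
  # Subject table S1810: disability characteristics (percent columns are C03).
  acs_vars_S1810 <- tidycensus::get_acs(
    geography = "county", table = "S1810", year = year,
    survey = "acs5", output = "wide", key = key
  )
  # Detailed table C18130: age by disability status by poverty status.
  acs5_c18130 <- tidycensus::get_acs(
    geography = "county", table = "C18130", year = year,
    survey = "acs5", output = "wide", key = key
  )
  list(acs_vars_S1810 = acs_vars_S1810, acs5_c18130 = acs5_c18130)
}
```

Calling `get_pwd_acs()` requires network access; both results include `GEOID` (5-digit county FIPS) and `NAME` columns that can be used for joining.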

COVID-19 data

Data on COVID-19 cases from the Johns Hopkins University dashboard are provided directly with the research compendium because the data are no longer available online in the state in which they were downloaded on August 1, 2020. The dashboard and its cumulative counts of COVID-19 cases and deaths were continually updated, so an exact reproduction required communication with the original author, Jayajit Chakraborty, who provided the data as of August 1, 2020. The data include an estimate of the total population (POP_ESTIMA) and confirmed COVID-19 cases (Confirmed). The case data give the cumulative count of reported COVID-19 cases from 1/22/2020 to 8/1/2020. Although metadata for this resource is no longer available from the original source, one can reasonably assume that the total population estimate was based on the 2014-2018 5-year ACS estimate, as the 2019 estimates had not yet been released.

Versions of the data can be found in the Johns Hopkins CSSE COVID-19 Data Repository (https://github.com/CSSEGISandData/COVID-19). However, the archived data only provide summaries at the national scale. We received the county-level COVID-19 case data through 8/1/2020 from the author, as there is no readily apparent way to access archived data from the Johns Hopkins University Center for Systems Science and Engineering (CSSE) database.

Materials and procedure

Computational environment

Prior observations

As of the release of this pre-analysis plan, I have not observed the data beyond recording the metadata above. My experience with the methods of this study stems from an in-class analysis and breakdown of the original study, intended to better understand its data sources and transformations. In-class discussion included exposure to parts of the workflow from a previous reproduction study (“Reproduction of Chakraborty 2021: An intracategorical analysis of COVID-19 and people with disabilities”), which, among other things, helped to clarify important steps in data acquisition and transformation, for instance which ACS data tables to query. This prior reproduction was also completed with help from the original study’s author, who was contacted to clarify the original process.

Data transformations and Analysis

Step 1: Acquire Johns Hopkins COVID-19 data

Data at the county level, for the continental US only.
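A sketch of the continental filter, assuming the COVID-19 file has a county FIPS column (the default name `FIPS` and the helper `filter_continental()` are hypothetical). It keeps the 48 contiguous states plus the District of Columbia; treating DC as part of the continental US here is an assumption about the original analysis.

```r
# Sketch (hypothetical helper): keep counties in the 48 contiguous states + DC.
filter_continental <- function(df, fips_col = "FIPS") {
  # Works whether FIPS is stored as text ("01001") or as a number (1001).
  state_num <- as.integer(as.character(df[[fips_col]])) %/% 1000L
  # State FIPS 1-56 excluding Alaska (2) and Hawaii (15). This also drops
  # territories (60 and above), any out-of-range pseudo-codes, and missing FIPS.
  keep <- !is.na(state_num) & state_num <= 56L & !(state_num %in% c(2L, 15L))
  df[keep, , drop = FALSE]
}
```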

Step 2: Calculate incidence rate by county

Confirmed / POP_ESTIMA * 100,000 (COVID-19 cases per 100,000 people)
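The incidence rate divides cumulative confirmed cases by the county population and scales to 100,000 people. A minimal base-R sketch using the `Confirmed` and `POP_ESTIMA` columns described above; the output column name `covid_rate` and the helper name are hypothetical.

```r
# Sketch (hypothetical helper): COVID-19 cases per 100,000 people per county.
add_incidence_rate <- function(df) {
  df$covid_rate <- df$Confirmed / df$POP_ESTIMA * 100000
  df
}
```

In a dplyr pipeline the same step is `mutate(covid_rate = Confirmed / POP_ESTIMA * 100000)`.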

Map: COVID-19 incidence rate by county

Use ggplot() or tmap’s tm_shape()
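A sketch of a county choropleth with ggplot2, assuming `counties` is an sf data frame of county polygons already joined to the data and that ggplot2 and sf are installed; the helper `map_county_rate()` and its arguments are hypothetical.

```r
# Sketch (hypothetical helper): choropleth of one numeric county variable.
map_county_rate <- function(counties, fill_col, legend_title = fill_col) {
  ggplot2::ggplot(counties) +
    # .data[[fill_col]] selects the column named by the fill_col string.
    ggplot2::geom_sf(ggplot2::aes(fill = .data[[fill_col]]), colour = NA) +
    ggplot2::scale_fill_distiller(palette = "Reds", direction = 1) +
    ggplot2::labs(fill = legend_title) +
    ggplot2::theme_void()
}
```

Passing the column name as a string lets the same function draw the PwD maps later in the plan, e.g. `map_county_rate(counties, "pct_disability")` with a hypothetical column name.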

Step 3: Acquire ACS data

Join variables from acs_vars_S1810 and acs5_c18130 by county GEOID
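A sketch of the ACS-to-ACS join, assuming both tables carry `GEOID` and `NAME` columns as returned by tidycensus; `join_acs_tables()` is a hypothetical helper. Dropping `NAME` from the second table avoids duplicated `NAME.x`/`NAME.y` columns.

```r
# Sketch (hypothetical helper): combine the two ACS tables into one row per county.
join_acs_tables <- function(acs_vars_S1810, acs5_c18130) {
  acs5_c18130$NAME <- NULL  # keep a single county name column
  merge(acs_vars_S1810, acs5_c18130, by = "GEOID")
}
```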

Step 4: Join ACS to COVID data
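A sketch of the ACS-to-COVID join. It assumes the COVID-19 file stores county FIPS in a column named `FIPS` (hypothetical), possibly as a number such as `1001`, so the code is zero-padded to the 5-character `GEOID` used by the Census API before joining; `join_acs_covid()` is a hypothetical helper.

```r
# Sketch (hypothetical helper): attach ACS variables to the COVID-19 table.
join_acs_covid <- function(covid, acs, fips_col = "FIPS") {
  # Zero-pad FIPS (e.g. 1001 -> "01001") so it matches the ACS GEOID.
  covid$GEOID <- sprintf("%05d", as.integer(as.character(covid[[fips_col]])))
  # Inner join: counties missing from either table are dropped,
  # so compare row counts before and after.
  merge(covid, acs, by = "GEOID")
}
```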

Step 5: Mutate

Number of PwDs in each subgroup / total population of that subgroup * 100
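Most S1810 percent columns (`C03`) are used as published, but three variables in the variable table are derived from counts. A sketch of those derivations, assuming wide-format ACS estimate columns; the output names `pct_nhnw`, `pct_below_pov`, and `pct_above_pov` and the helper name are hypothetical.

```r
# Sketch (hypothetical helper): derived PwD percentages from the variable table.
add_derived_pwd <- function(df) {
  # Non-Hispanic non-White: total minus non-Hispanic White minus Hispanic,
  # for both the disability counts (C02) and the population totals (C01).
  df$pct_nhnw <- with(df,
    (S1810_C02_001E - S1810_C02_011E - S1810_C02_012E) /
    (S1810_C01_001E - S1810_C01_011E - S1810_C01_012E) * 100)
  # PwDs below / at or above poverty, summed over the three C18130 age groups.
  df$pct_below_pov <- with(df,
    (C18130_004E + C18130_011E + C18130_018E) / C18130_001E * 100)
  df$pct_above_pov <- with(df,
    (C18130_005E + C18130_012E + C18130_019E) / C18130_001E * 100)
  df
}
```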

Map: PwD rates by county

Use ggplot() or tmap’s tm_shape()

Step 6: `summarise()`

Calculate summary statistics for PwD variables (Min, Max, Mean, SD)
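A base-R sketch of the summary statistics, equivalent in intent to `summarise(across(...))` in dplyr; `summarise_pwd()` is a hypothetical helper and missing values are ignored.

```r
# Sketch (hypothetical helper): Min, Max, Mean, and SD for each PwD column.
summarise_pwd <- function(df, vars) {
  stat <- function(f) unname(sapply(vars, function(v) f(df[[v]], na.rm = TRUE)))
  data.frame(variable = vars, min = stat(min), max = stat(max),
             mean = stat(mean), sd = stat(sd), stringsAsFactors = FALSE)
}
```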

Step 7: Bivariate Pearson product-moment correlations

Use cor() to correlate the county-level COVID-19 incidence rate with the percentage of PwDs for each disability variable

`mutate()` to add the t statistic and p-value for each Pearson’s r in Table 1

`mutate(t = abs(r) / sqrt((1 - r^2) / df), p = pt(t, df, lower.tail = FALSE))`, where df = n - 2 for n counties; this yields a one-tailed p-value
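A sketch of Step 7 in base R, assuming a data frame with an incidence-rate column and one column per PwD variable; `cor_table()` is a hypothetical helper. It applies the t and one-tailed p formula above, with df counted from complete pairs.

```r
# Sketch (hypothetical helper): Pearson's r, t, and one-tailed p per variable.
cor_table <- function(df, rate_col, vars) {
  out <- data.frame(variable = vars, stringsAsFactors = FALSE)
  out$r <- unname(sapply(vars, function(v)
    cor(df[[rate_col]], df[[v]], use = "complete.obs")))
  # Degrees of freedom: number of complete pairs minus two.
  out$df <- unname(sapply(vars, function(v)
    sum(complete.cases(df[[rate_col]], df[[v]])) - 2))
  out$t <- abs(out$r) / sqrt((1 - out$r^2) / out$df)
  out$p <- pt(out$t, out$df, lower.tail = FALSE)
  out
}
```

For positive r this matches `cor.test(x, y, alternative = "greater")`, which is a convenient cross-check.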

Check results with Table 1

Digitize Table 1 from original study

Subtract results from Chakraborty’s Table 1
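A sketch of the comparison against the digitized Table 1. It assumes the digitized values are entered in a data frame with columns `variable` and `r_orig`, and that published coefficients are rounded to `digits` decimals (the default of 3 is an assumption); `compare_to_original()` is a hypothetical helper.

```r
# Sketch (hypothetical helper): difference between reproduced and published r.
compare_to_original <- function(reproduced, original, digits = 3) {
  cmp <- merge(reproduced[, c("variable", "r")], original, by = "variable")
  # Round to the published precision before subtracting.
  cmp$diff <- round(cmp$r, digits) - cmp$r_orig
  # Tolerance guards against floating-point noise in the subtraction.
  cmp$match <- abs(cmp$diff) < 1e-9
  cmp
}
```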

Planned Deviation

Step 8: Kulldorff method with `SpatialEpi` package
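A sketch of the scan statistic with `SpatialEpi::kulldorff()`, assuming SpatialEpi is installed and `coords` is a two-column matrix of projected county centroid coordinates. Passing expected counts based on the overall rate requests a Poisson model; `pop.upper.bound = 0.5` (a 50% population cap) and `n.simulations = 999` are placeholders, not values taken from the original study. `run_kulldorff()` is a hypothetical helper.

```r
# Sketch (hypothetical helper): Kulldorff spatial scan for high-incidence clusters.
run_kulldorff <- function(coords, cases, population,
                          pop.upper.bound = 0.5, n.simulations = 999,
                          alpha.level = 0.05) {
  # Expected cases if every county had the overall incidence rate.
  expected <- population * sum(cases) / sum(population)
  SpatialEpi::kulldorff(coords, cases, population,
                        expected.cases = expected,
                        pop.upper.bound = pop.upper.bound,
                        n.simulations = n.simulations,
                        alpha.level = alpha.level, plot = FALSE)
}
```

The returned list holds the most likely cluster (`most.likely.cluster$location.IDs.included`) and any `secondary.clusters`, whose county indices can be mapped and used in Step 9. With about 3,100 counties and 999 simulations, expect a long run time.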

Map: Spatial Clusters

Use ggplot() or tmap’s tm_shape()

Step 9: Calculate relative risk

(local cases / local population) / ((global cases - local cases) / (global population - local population))

Classify counties using mutate(case_when())
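A sketch of the relative-risk formula and a base-R stand-in for the `case_when()` classification. `relative_risk()` and `classify_rr()` are hypothetical helpers, and the break points and labels are left as arguments because the plan does not specify the classes.

```r
# Sketch (hypothetical helper): incidence inside the local area divided by
# incidence everywhere else.
relative_risk <- function(local_cases, local_pop, global_cases, global_pop) {
  (local_cases / local_pop) /
    ((global_cases - local_cases) / (global_pop - local_pop))
}

# Sketch (hypothetical helper): bin RR values into classes; intervals are
# closed on the left, e.g. [1, 2).
classify_rr <- function(rr, breaks, labels) {
  cut(rr, breaks = breaks, labels = labels, right = FALSE)
}
```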

Step 10: Concatenate

paste() the state code and relative-risk class together to produce unique cluster identification codes

Check number of clusters - compare to total of 102 in original study
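In R, `c()` combines values into a vector rather than joining strings, so the cluster codes are built with `paste()`. A sketch with a hypothetical helper `make_cluster_ids()`; counting the distinct codes gives the number to compare with the 102 clusters in the original study.

```r
# Sketch (hypothetical helper): one ID per state x relative-risk class.
make_cluster_ids <- function(state_fips, rr_class) {
  paste(state_fips, rr_class, sep = "_")  # factors are pasted by their labels
}
```

Usage: `length(unique(make_cluster_ids(state_fips, rr_class)))`.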

Map: Qualitative map of clusters

Step 11: Compare data with GEE results (`Aug1GEE`)
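To compare against the original GEE results, a GEE model can be fit with `geepack::geeglm()`. This sketch assumes geepack is installed and uses a Gaussian family with an exchangeable working correlation, which may differ from the original specification; `fit_gee()` and its arguments are hypothetical.

```r
# Sketch (hypothetical helper): one GEE model per socio-demographic category.
fit_gee <- function(df, rate_col, vars, id_col) {
  # geepack treats consecutive rows with the same id as one cluster,
  # so convert cluster codes to integers and sort rows by them.
  df$gee_id <- as.integer(factor(df[[id_col]]))
  df <- df[order(df$gee_id), , drop = FALSE]
  f <- reformulate(vars, response = rate_col)
  geepack::geeglm(f, data = df, id = gee_id,
                  family = gaussian, corstr = "exchangeable")
}
```

`summary()` on the fitted model reports coefficients with robust standard errors, which can be set against `Aug1GEE`.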

Results

Present maps, takeaways from comparing data with original study results.

Discussion

Explain weaknesses in original study design/data, points for improvement.

Acknowledgements

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI: 10.17605/OSF.IO/W29MQ

References

github.com/HEGSRR/RPr-Chakraborty2021

Pre Analysis Plan: Chakraborty (2021) Reproduction Study

Samuel Barnard

2025-03-10
