Volume 34 Issue 9
Sep.  2021

James B. Hittner, Folorunso O. Fasina, Almira L. Hoogesteijn, Renata Piccinini, Dawid Maciorowski, Prakasha Kempaiah, Stephen D. Smith, Ariel L. Rivas. Testing-Related and Geo-Demographic Indicators Strongly Predict COVID-19 Deaths in the United States during March of 2020[J]. Biomedical and Environmental Sciences, 2021, 34(9): 734-738. doi: 10.3967/bes2021.102

Testing-Related and Geo-Demographic Indicators Strongly Predict COVID-19 Deaths in the United States during March of 2020

doi: 10.3967/bes2021.102
More Information
  • Author Bio:

    James B. Hittner, male, born in 1965, PhD Degree, majoring in clinical and applied psychology, risky behavior, statistical methodology, and infectious disease dynamics

  • Corresponding author: Folorunso O. Fasina, E-mail: Folorunso.fasina@fao.org, Tel: 255-686-132-852
  • Received Date: 2020-08-10
  • Accepted Date: 2021-01-18
  • [1] Haleem A, Javaid M, Vaishya R. Effects of COVID-19 pandemic in daily life. Curr Med Res Prac, 2020; 10, 78−9. doi:  10.1016/j.cmrp.2020.03.011
    [2] Holmes EA, O’Connor RC, Perry VH, et al. Multidisciplinary research priorities for the COVID-19 pandemic: a call for action for mental health science. Lancet Psychiatry, 2020; 7, 547−60. doi:  10.1016/S2215-0366(20)30168-1
    [3] Bishara AJ, Hittner JB. Testing the significance of a correlation with nonnormal data: comparison of Pearson, Spearman, transformation, and resampling approaches. Psychol Methods, 2012; 17, 399−417. doi:  10.1037/a0028087
    [4] Hainmueller J, Hazlett C. Kernel Regularized Least Squares: reducing misspecification bias with a flexible and interpretable machine learning approach. Pol Anal, 2014; 22, 143−68. doi:  10.1093/pan/mpt019
    [5] Vaishya R, Javaid M, Haleem Khan I, et al. Artificial intelligence (AI) applications for COVID-19 pandemic. Diabetes Metab Syndr: Clin Res Rev, 2020; 14, 337−9. doi:  10.1016/j.dsx.2020.04.012
    [6] Rivas AL, Fasina FO, Hoogesteyn AL, et al. Connecting network properties of rapidly disseminating epizoonotics. PLoS One, 2012; 7, e39778. doi:  10.1371/journal.pone.0039778
    [7] Hittner JB, May K, Silver NC. A Monte Carlo evaluation of tests for comparing dependent correlations. J Gen Psychol, 2003; 130, 149−68. doi:  10.1080/00221300309601282
    [8] Padula WV. Why only test symptomatic patients? Consider random screening for COVID-19. Appl Health Econ Health Policy, 2020; 18, 333−4. doi:  10.1007/s40258-020-00579-4
    [9] World Health Organization. COVID 19: Public Health Emergency of International Concern (PHEIC). Global research and innovation forum: towards a research roadmap. https://www.who.int/blueprint/priority-diseases/key-action/Global_Research_Forum_FINAL_VERSION_for_web_14_feb_2020.pdf?ua=1. [2020-04-04].
    [10] Rocklöv J, Sjödin H. High population densities catalyse the spread of COVID-19. J Travel Med, 2020; 27, taaa038. doi:  10.1093/jtm/taaa038
    The COVID-19 pandemic has wreaked havoc around the globe and caused significant disruptions across multiple domains[1]. Moreover, different countries have been differentially impacted by COVID-19, a phenomenon that is due to a multitude of complex and often interacting determinants[2]. Understanding such complexity and interacting factors requires both compelling theory and appropriate data analytic techniques. Regarding data analysis, one question that arises is how to analyze extremely non-normal data, such as those variables evidencing L-shaped distributions. A second question concerns the appropriate selection of a predictive modelling technique when the predictors derive from multiple domains (e.g., testing-related variables, population density), and both main effects and interactions are examined.

    To address these questions, we propose a novel statistical approach for analyzing and understanding complex data interactions. Using data collected in the USA during the first month in which COVID-19 testing was performed (March of 2020; see Supplementary Table S1, available at www.besjournal.com), we examined the following six predictors of COVID-19 related deaths: (i) the proportion of all tests conducted during the first week of testing; (ii) the cumulative number of (test-positive) cases through March 31, 2020; (iii) the number of tests performed per million inhabitants; (iv) the cumulative number of inhabitants tested; (v) the number of cases per million inhabitants (cases/mill inh); and (vi) the number of diagnostic tests performed in week one of testing per million inhabitants, divided by state-specific population density (w1DT/MI/PD), where “population density” is defined as the number of inhabitants per square kilometer.

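    State | Tests wk I | Total tests | Wk I/all tests (%) | State pop/mill | Total tests/mill inh | Total cases | Cases/mill inh | State pop dens | Wk I tests/mill inh/state pop dens | Total deaths (count) | Deaths/mill inh
    AK | 227 | 3,654 | 6.21 | 0.734 | 4978.20 | 114 | 155.31 | 0.425 | 726.115 | 3 | 4.08
    AL | 352 | 5,014 | 7.02 | 4.908 | 1021.60 | 830 | 169.11 | 36.149 | 1.984 | 4 | 0.81
    AR | 115 | 3,453 | 3.33 | 3.038 | 1136.60 | 426 | 140.22 | 22.057 | 1.716 | 6 | 1.97
    AZ | 373 | 13,872 | 2.68 | 7.378 | 1880.18 | 919 | 124.56 | 24.990 | 2.023 | 17 | 2.30
    CA | 4,260 | 90,657 | 4.69 | 39.937 | 2270.00 | 5,708 | 142.93 | 94.198 | 1.132 | 123 | 3.07
    CO | 938 | 14,470 | 6.48 | 5.845 | 2475.62 | 2,307 | 394.70 | 21.680 | 7.402 | 47 | 8.04
    CT | 551 | 11,900 | 4.63 | 3.563 | 3339.88 | 1,993 | 559.36 | 248.188 | 0.623 | 34 | 9.54
    DE | 129 | 3,701 | 3.48 | 0.982 | 3768.84 | 236 | 240.33 | 152.342 | 0.862 | 6 | 6.11
    FL | 1,380 | 48,998 | 2.81 | 21.992 | 2227.99 | 4,950 | 225.08 | 129.126 | 0.485 | 59 | 2.68
    GA | 48 | 12,596 | 0.38 | 3.991 | 3156.10 | 2,683 | 672.26 | 25.930 | 0.463 | 80 | 20.04
    HI | 12 | 8,013 | 0.14 | 1.412 | 5674.93 | 175 | 123.93 | 49.869 | 0.170 | 0 | 0
    ID | 427 | 4,706 | 9.07 | 1.826 | 2577.22 | 310 | 169.77 | 8.436 | 27.718 | 6 | 3.28
    IL | 1,622 | 27,762 | 5.84 | 12.659 | 2193.06 | 4,596 | 363.06 | 84.395 | 1.518 | 66 | 5.21
    IN | 149 | 9,830 | 1.51 | 6.745 | 1457.38 | 1,514 | 224.46 | 71.505 | 0.308 | 32 | 4.74
    IO | 330 | 5,349 | 6.16 | 3.179 | 1682.60 | 336 | 105.69 | 21.811 | 4.759 | 4 | 1.25
    KS | 167 | 4,513 | 3.70 | 2.910 | 1550.86 | 319 | 109.62 | 13.655 | 4.202 | 6 | 2.06
    KY | 207 | 6,018 | 3.43 | 4.499 | 1337.63 | 439 | 97.57 | 42.988 | 1.070 | 9 | 2.00
    LA | 368 | 27,871 | 1.32 | 4.645 | 6000.22 | 3,540 | 762.11 | 34.240 | 2.313 | 151 | 32.50
    MA | 171 | 39,066 | 0.43 | 6.976 | 5600.06 | 4,955 | 710.29 | 255.203 | 0.096 | 48 | 6.88
    MD | 447 | 13,593 | 3.28 | 6.083 | 2234.59 | 1,239 | 203.68 | 189.312 | 0.388 | 10 | 1.64
    ME | 272 | 3,647 | 7.45 | 1.345 | 2711.52 | 253 | 188.10 | 14.677 | 13.777 | 3 | 2.23
    MI | 274 | 17,379 | 1.57 | 10.045 | 1730.11 | 5,486 | 546.14 | 40.101 | 0.680 | 132 | 13.14
    MN | 889 | 17,657 | 5.03 | 5.700 | 3097.72 | 503 | 88.24 | 25.314 | 6.161 | 9 | 1.57
    MO | 236 | 14,107 | 1.67 | 6.169 | 2286.76 | 903 | 146.37 | 34.169 | 1.119 | 12 | 1.94
    MS | 969 | 3,318 | 29.20 | 2.989 | 1110.07 | 758 | 253.59 | 23.828 | 13.605 | 14 | 4.68
    MT | 160 | 4,069 | 3.93 | 1.086 | 3746.78 | 161 | 148.25 | 2.851 | 51.664 | 1 | 0.92
    NC | 17 | 19,072 | 0.08 | 10.611 | 1797.38 | 1,167 | 109.98 | 76.123 | 0.021 | 5 | 0.47
    ND | 393 | 3,724 | 10.55 | 0.761 | 4893.56 | 98 | 128.77 | 4.156 | 124.260 | 1 | 1.31
    NE | 272 | 2,345 | 11.59 | 1.952 | 1201.33 | 120 | 61.47 | 8.859 | 15.728 | 2 | 1.02
    NH | 232 | 5,396 | 4.29 | 1.371 | 3935.81 | 258 | 188.18 | 56.622 | 2.988 | 3 | 2.18
    NJ | 284 | 35,602 | 0.79 | 8.936 | 3984.11 | 13,386 | 1497.99 | 395.555 | 0.080 | 161 | 18.01
    NM | 488 | 11,179 | 4.36 | 2.096 | 5333.49 | 237 | 113.07 | 5.944 | 39.168 | 2 | 0.95
    NY | 1,661 | 172,360 | 0.96 | 19.440 | 8866.26 | 59,513 | 3061.37 | 137.582 | 0.621 | 965 | 49.63
    NV | 262 | 10,534 | 2.48 | 3.139 | 3355.85 | 920 | 293.08 | 10.960 | 7.614 | 15 | 4.77
    OH | 148 | 20,665 | 0.71 | 11.747 | 1759.17 | 1,653 | 140.72 | 101.180 | 0.124 | 29 | 2.46
    OK | 206 | 1,634 | 12.60 | 3.954 | 413.25 | 429 | 108.50 | 21.840 | 2.385 | 16 | 4.04
    OR | 1,023 | 11,426 | 8.95 | 4.301 | 2656.59 | 548 | 127.41 | 16.879 | 14.090 | 13 | 3.02
    PA | 403 | 33,455 | 1.20 | 12.820 | 2609.59 | 3,394 | 264.74 | 107.478 | 0.292 | 38 | 2.96
    RI | 550 | 3,134 | 17.54 | 1.056 | 2967.80 | 294 | 278.40 | 263.934 | 1.973 | 3 | 2.84
    SC | 97 | 3,789 | 2.56 | 5.210 | 727.26 | 774 | 148.56 | 62.822 | 0.296 | 16 | 3.07
    SD | 185 | 3,218 | 5.74 | 0.903 | 3563.68 | 90 | 99.66 | 4.521 | 45.315 | 1 | 1.10
    TN | 73 | 20,574 | 0.35 | 6.897 | 2983.04 | 1,537 | 222.85 | 63.186 | 0.167 | 7 | 1.01
    TX | 48 | 25,760 | 0.18 | 29.472 | 874.05 | 2,552 | 86.59 | 42.365 | 0.038 | 34 | 1.15
    UT | 279 | 13,993 | 1.99 | 3.282 | 4263.56 | 719 | 219.07 | 14.926 | 5.695 | 2 | 0.60
    VA | 314 | 10,609 | 2.95 | 8.626 | 1229.89 | 890 | 103.17 | 77.861 | 0.467 | 22 | 2.55
    VT | 291 | 3,701 | 7.86 | 0.628 | 5893.31 | 235 | 374.20 | 25.215 | 18.376 | 12 | 19.10
    WA | 4,415 | 59,206 | 7.45 | 7.797 | 7593.43 | 4,310 | 552.78 | 42.223 | 13.410 | 189 | 24.24
    WI | 259 | 17,662 | 1.46 | 5.851 | 3018.63 | 1,112 | 190.05 | 34.491 | 1.283 | 13 | 2.22
    WV | 38 | 3,108 | 1.22 | 1.778 | 1748.03 | 124 | 69.74 | 28.331 | 0.754 | 0 | 0
    WY | 308 | 1,641 | 18.76 | 0.567 | 2894.18 | 87 | 153.43 | 2.238 | 242.707 | 0 | 0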
      Note. Abbreviations of USA states: AK (Alaska), AL (Alabama), AR (Arkansas), AZ (Arizona), CA (California), CO (Colorado), CT (Connecticut), DE (Delaware), FL (Florida), GA (Georgia), HI (Hawaii), ID (Idaho), IL (Illinois), IN (Indiana), IO (Iowa), KS (Kansas), KY (Kentucky), LA (Louisiana), MA (Massachusetts), MD (Maryland), ME (Maine), MI (Michigan), MN (Minnesota), MO (Missouri), MS (Mississippi), MT (Montana), NC (North Carolina), ND (North Dakota), NE (Nebraska), NH (New Hampshire), NJ (New Jersey), NM (New Mexico), NY (New York), NV (Nevada), OH (Ohio), OK (Oklahoma), OR (Oregon), PA (Pennsylvania), RI (Rhode Island), SC (South Carolina), SD (South Dakota), TN (Tennessee), TX (Texas), UT (Utah), VA (Virginia), VT (Vermont), WA (Washington), WI (Wisconsin), WV (West Virginia), WY (Wyoming). Variables: Tests wk I = number of tests performed in the first 7 days of testing; Total tests = total number of people tested; Wk I/all tests (%) = tests wk I/total tests (i.e., the proportion of all tests conducted during the first week of testing, expressed as a percentage of all tests performed in March, 2020); State pop/mill = the population of each state, expressed in million inhabitants; Total tests/mill inh = number of tests performed per 1 million inhabitants by March 31, 2020; Total cases = cumulative number of confirmed (test-positive) infections by March 31, 2020; Cases/mill inh = the number of cases in the population (expressed in million inhabitants); State pop dens = the state-specific number of inhabitants per square kilometer; Wk I tests/mill inh/state pop dens = the number of tests performed during week one/million inhabitants/state population density; Total deaths (count) = cumulative number of deaths through March 31, 2020; Deaths/mill inh = cumulative number of deaths per 1 million inhabitants through March 31, 2020.

    Table S1.  Epidemic data collected in all states of the USA in March, 2020
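
As a rough illustration of how the derived columns of Table S1 follow from the raw counts, the sketch below recomputes them for one state. The class and function names (`StateCounts`, `derived_indicators`) are hypothetical and not part of the original analysis; the input values are the Alaska (AK) row of Table S1.

```python
from dataclasses import dataclass


@dataclass
class StateCounts:
    """Raw per-state inputs, named after the Table S1 columns (hypothetical type)."""
    tests_week1: int      # Tests wk I
    total_tests: int      # Total tests
    pop_millions: float   # State pop/mill
    total_cases: int      # Total cases
    pop_density: float    # State pop dens (inhabitants per square kilometer)
    total_deaths: int     # Total deaths (count)


def derived_indicators(s: StateCounts) -> dict:
    """Recompute the ratio-based columns of Table S1 from the raw counts."""
    return {
        "wk1_pct_of_all_tests": 100.0 * s.tests_week1 / s.total_tests,
        "tests_per_million": s.total_tests / s.pop_millions,
        "cases_per_million": s.total_cases / s.pop_millions,
        # w1DT/MI/PD: week-one tests per million inhabitants, divided by density
        "w1dt_mi_pd": s.tests_week1 / s.pop_millions / s.pop_density,
        "deaths_per_million": s.total_deaths / s.pop_millions,
    }


# Alaska row of Table S1
alaska = StateCounts(227, 3654, 0.734, 114, 0.425, 3)
ak = derived_indicators(alaska)
for name, value in ak.items():
    print(f"{name}: {value:.3f}")
```

Recomputing from the rounded table entries gives values close to, but not always identical with, the tabulated ones (for Alaska the geo-demographic ratio differs from the tabulated 726.115 by roughly 0.2%), presumably because the published figures were derived from unrounded inputs.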

    The purpose of this study was to examine the ability of the six variables to predict COVID-19 related deaths in the United States during March of 2020. We ran the predictive model twice, once for each dependent variable: mortality count (overall number of deaths), and deaths per million inhabitants. Because our model (a) uses predictors that leverage information from multiple domains, (b) captures both nationwide and state-specific dimensions, and (c) examines two different mortality-related outcomes, the results are expected to have relevance for policy-makers.

    All data used in this study were obtained from three sources in the public domain: Worldometer (https://www.worldometers.info/coronavirus/), World Population Review (https://worldpopulationreview.com/states), and Covidtracking (https://covidtracking.com/). The data were processed and analyzed using IBM SPSS, Minitab, and R. Univariate skewness and kurtosis values indicated that all predictors and outcomes were non-normally distributed, with a few variables evidencing L-shaped distributions. The L-shaped variables were normalized using the rank-based inverse normal (RIN) transformation[3]. For extremely non-normal data, the RIN method is a highly effective normalizing transformation[3].
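
The rank-based inverse normal transformation can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the general form z = Φ⁻¹((rank − c)/(n − 2c + 1)) with the offset c = 0.5 (the Rankit variant) and average ranks for ties, since the article does not state which offset or tie rule was used. The function names are hypothetical; the sample values are the death counts of the first five rows of Table S1.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rin_transform(values, c=0.5):
    """Map values to normal scores via their ranks (offset c is an assumption)."""
    n = len(values)
    std_normal = NormalDist()
    return [
        std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0))
        for r in average_ranks(values)
    ]


# Death counts for AK, AL, AR, AZ, CA from Table S1 (a strongly skewed sample)
deaths = [3, 4, 6, 17, 123]
normal_scores = rin_transform(deaths)
print([round(z, 3) for z in normal_scores])
```

Because only the ranks enter the transformation, the extreme value (123) is pulled in to the same distance from the center as the smallest value, which is what makes the method effective for L-shaped variables.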

    The prediction models were first examined using linear multiple regression, with the RIN-transformed versions of all variables used in the regressions. Because the homoscedasticity assumption (i.e., constant residual variance across the range of predicted Y-values) was not met, we re-ran the prediction models using a non-parametric approach known as Kernel Regularized Least Squares (KRLS) Regression[4]. KRLS is an appropriate method to use when the assumptions of linear regression are not met and the precise functional forms between the predictors and outcomes are unknown. All KRLS regressions used the RIN-transformed variables and all analyses were performed using the KRLS package for R. The use of non-parametric, machine learning-based methods such as KRLS is consistent with recent calls to place greater reliance on artificial intelligence systems for understanding the causes and consequences of the COVID-19 pandemic[5].
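
To make the idea behind KRLS concrete, the sketch below fits a Gaussian-kernel regularized least squares model and computes sample-average partial derivatives, the kind of estimate reported for each predictor. It is a simplified, dependency-free illustration and not the KRLS package for R: the regularization parameter `lam` is fixed by hand rather than tuned, the bandwidth defaults to the number of predictors, no standard errors are computed, and all function names and the toy data are hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma2):
    """k(a, b) = exp(-||a - b||^2 / sigma2)."""
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / sigma2)


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def krls_fit(X, y, lam, sigma2=None):
    """Return coefficients c solving (K + lam * I) c = y, plus the bandwidth used."""
    sigma2 = float(len(X[0])) if sigma2 is None else sigma2
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma2) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam
    return solve_linear(K, y), sigma2


def krls_predict(x_new, X, coefs, sigma2):
    """Fitted value at x_new: sum_i c_i * k(x_new, x_i)."""
    return sum(c * gaussian_kernel(x_new, xi, sigma2) for c, xi in zip(coefs, X))


def average_partial_derivatives(X, coefs, sigma2):
    """Average, over the sample points, of the fitted surface's partial derivatives."""
    n, d = len(X), len(X[0])
    avg = [0.0] * d
    for xj in X:
        for dim in range(d):
            avg[dim] += sum(
                c * gaussian_kernel(xj, xi, sigma2) * (-2.0 / sigma2) * (xj[dim] - xi[dim])
                for c, xi in zip(coefs, X)
            ) / n
    return avg


# Toy data (hypothetical): one predictor with a linear relation y = 2x
X = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
y = [2.0 * row[0] for row in X]
coefs, sigma2 = krls_fit(X, y, lam=1e-6)
avg_derivs = average_partial_derivatives(X, coefs, sigma2)
print("average partial derivative:", avg_derivs[0])
```

No functional form is imposed on the relation between predictor and outcome; the fitted surface is a weighted sum of kernels centered on the observations, and the regularization term controls how closely it follows the data.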

    The KRLS regression results are presented in Table 1. For number of deaths, the six predictors accounted for 98.8% of the variance. Five of the predictors were statistically significant (P-values ≤ 0.002). Two of the significant predictors (i.e., number of test-positive cases, Cohen’s d = 2.3; and cases per million inhabitants, Cohen’s d = 1.3) represent different ways of quantifying the illness burden due to SARS-CoV-2 infection. The ratio of the two d values (2.3/1.3 ≈ 1.77) indicated that the predictive strength of number of test-positive cases was 77% greater than that of cases per million inhabitants. Regarding the second dependent variable, the six predictors accounted for 92.6% of the variance in deaths per million inhabitants. Five of the predictors were significant (P-values ≤ 0.031). For this regression analysis, the number of test-positive cases (d = 1.1) and cases per million inhabitants (d = 1.4) were similar in predictive strength.
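
The effect-size comparison in the paragraph above is simple arithmetic, shown here as a small sketch. The helper name `percent_greater` is hypothetical; the d values are those reported in the text.

```python
def percent_greater(d_larger, d_smaller):
    """Percentage by which one effect size exceeds another."""
    return (d_larger / d_smaller - 1.0) * 100.0


# Number of deaths: test-positive cases (d = 2.3) vs. cases per million (d = 1.3)
deaths_count_gap = percent_greater(2.3, 1.3)

# Deaths per million: cases per million (d = 1.4) vs. test-positive cases (d = 1.1)
deaths_per_million_gap = percent_greater(1.4, 1.1)

print(f"{deaths_count_gap:.1f}% and {deaths_per_million_gap:.1f}%")
```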

    Items | Estimate | Std. Error | t value | P-value
    Predictors of number of deaths
     Totaltests RIN | 0.111 | 0.033 | 3.326 | 0.002
     Testedpermil RIN | −0.153 | 0.026 | −5.782 | < 0.001
     Wkonepropalltests RIN | 0.044 | 0.030 | 1.452 | 0.153
     Wkonepermilcitperpopden RIN | 0.169 | 0.032 | 5.262 | < 0.001
     Confircases RIN | 0.568 | 0.035 | 16.340 | < 0.001
     Casespermil RIN | 0.215 | 0.023 | 9.185 | < 0.001
    Predictors of deaths per million inhabitants
     Totaltests RIN | −0.138 | 0.058 | −2.352 | 0.023
     Testedpermil RIN | 0.004 | 0.048 | 0.091 | 0.928
     Wkonepropalltests RIN | 0.136 | 0.061 | 2.234 | 0.031
     Wkonepermilcitperpopden RIN | 0.161 | 0.063 | 2.570 | 0.014
     Confircases RIN | 0.408 | 0.055 | 7.353 | < 0.001
     Casespermil RIN | 0.441 | 0.045 | 9.748 | < 0.001
      Note. All predictors were normalized using the rank-based inverse normal (RIN) transformation. Estimates are sample-average partial derivatives. The set of predictors accounted for 98.8% of the variance in number of deaths (R² = 0.9875). For deaths per million citizens, the predictors accounted for 92.6% of the variance (R² = 0.9264). Description of predictors: totaltests = number of tests performed in March of 2020; testedpermil = number of all tests conducted per million inhabitants, in March of 2020; wkonepropalltests = all tests conducted during the first week of testing, expressed as the percentage of all tests performed in March 2020; wkonepermilcitperpopden = the number of tests performed during week one per million inhabitants, divided by state-specific population density; confircases = total number of test-positive individuals, in March of 2020; casespermil = number of test-positive individuals per million inhabitants, in March of 2020.

    Table 1.  KRLS regression of potential predictors of COVID-19 related mortality

    In addition to number of test-positive cases and cases per million inhabitants, another interesting predictor was our geo-demographic variable (i.e., the number of diagnostic tests/million inhabitants/population density performed in week one of testing, or w1DT/MI/PD). This predictor was significantly associated with both dependent variables. Because w1DT/MI/PD is a complex, ratio-based predictor, discerning the precise nature of its predictive association from a single regression estimate alone is challenging. To further enhance the interpretation of this variable, we created two scatterplots showing the association between w1DT/MI/PD and each dependent variable. Both scatterplots include a best fitting linear regression line and a lowess line (with accompanying 95% confidence interval). Lowess stands for locally weighted scatterplot smoothing. The lowess line is the best fitting non-linear curve that tracks the data points in the scatterplot. The lowess curves allow us to make inferences about COVID-19 related deaths at low and high levels of w1DT/MI/PD. Such inferences are tantamount to examining COVID-19 related deaths for U.S. states scoring low versus high on the geo-demographic predictor variable. The scatterplots were created using the car package for R.
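
For readers unfamiliar with lowess, the following sketch shows the core computation: at each point, a straight line is fitted by weighted least squares to the nearest neighbors, with tricube weights that decline with distance. It is a bare-bones illustration under stated assumptions, not the routine used by the car package: there are no robustness iterations and no confidence band, the span `frac` of 0.67 is an arbitrary choice, and the function names and toy data are hypothetical.

```python
def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess(x, y, frac=0.67):
    """Return locally weighted linear fits evaluated at each x value."""
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of points in each local window
    fitted = []
    for x0 in x:
        distances = sorted(abs(xi - x0) for xi in x)
        h = max(distances[k - 1], 1e-12)  # window half-width (k-th nearest point)
        w = [tricube((xi - x0) / h) for xi in x]
        sw = sum(w)
        mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
        mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted


# Toy data (hypothetical): a rise followed by a decline
x_vals = [float(i) for i in range(10)]
y_vals = [0.0, 1.0, 2.5, 3.0, 3.2, 3.1, 2.4, 1.5, 0.7, 0.1]
smooth = lowess(x_vals, y_vals)
print([round(v, 2) for v in smooth])
```

Because each fitted value depends only on nearby observations, the curve can bend where a single regression line cannot, which is why it helps when the association differs between low and high levels of the predictor.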

    As the lowess curve in the top panel of Figure 1 indicates, at higher and medium levels of w1DT/MI/PD, the association between the geo-demographic predictor and death count was strongly negative and moderately negative, respectively. In contrast, at lower levels of w1DT/MI/PD, there was little if any association between the geo-demographic variable and number of fatalities. The bottom panel of Figure 1 indicates that at lower levels of w1DT/MI/PD, the association between the geo-demographic variable and deaths per million inhabitants was moderately positive. At medium levels of w1DT/MI/PD, there was little if any association between the two variables. Finally, at higher levels of w1DT/MI/PD, there was a moderately strong negative association between the geo-demographic variable and deaths per million inhabitants.

    Figure 1.  Scatterplots depicting lowess curves (the middle dashed lines) and accompanying 95% confidence intervals (top and bottom dashed lines) for the association between number of tests during week 1/million inhabitants/population density and (A) number of COVID-19 related deaths (top panel) and (B) number of COVID-19 related deaths per million inhabitants (bottom panel). All variables were normalized using the rank-based inverse normal (RIN) transformation.

    In constructing our geo-demographic predictor variable, we controlled for population density because it is an important factor associated with disease transmission[6]. Moreover, because there typically is a lag time of several weeks or more between being infected with SARS-CoV-2 and showing disease-related symptoms, the association between population density and disease-related deaths should strengthen over time. To highlight this point, Figure 2 presents scatterplots showing the Pearson correlations between population density and cumulative COVID-19 related deaths per million inhabitants through March 31st and June 17th, 2020, respectively. The correlations were as follows: March 31st (r = 0.228, P > 0.05); June 17th (r = 0.800, P < 0.01). The difference between the two statistically dependent correlations was evaluated using Hittner, May and Silver’s modification of Dunn and Clark’s z test[7]. The two correlations were significantly different (z = 5.85, P < 0.0001), thereby supporting the prediction that the association between population density and COVID-19 related deaths will strengthen over time.
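
The test for two dependent correlations that share one variable (here, population density) can be sketched as follows. The formula is Dunn and Clark's z with both correlations in the covariance term replaced by the back-transformed average of their Fisher Z values, which is how the Hittner, May and Silver modification is commonly stated. Two inputs are assumptions: n = 50 (one observation per state in Table S1) and, in particular, `r_kh`, the correlation between the March and June death rates, which the article does not report. The value 0.50 below is purely hypothetical, so the output is not expected to reproduce the published z = 5.85.

```python
import math
from statistics import NormalDist


def hittner2003_z(r_jk, r_jh, r_kh, n):
    """z test for dependent correlations r_jk and r_jh sharing variable j.

    r_kh is the correlation between the two non-shared variables.
    Returns the z statistic and its two-sided P-value.
    """
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)
    r_bar = math.tanh((z_jk + z_jh) / 2.0)  # back-transformed average Fisher Z
    psi = r_kh * (1 - 2 * r_bar ** 2) - 0.5 * r_bar ** 2 * (1 - 2 * r_bar ** 2 - r_kh ** 2)
    s = psi / (1 - r_bar ** 2) ** 2
    z = (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * s)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# June 17 vs. March 31 correlations from the text; r_kh = 0.50 is HYPOTHETICAL
z_stat, p_value = hittner2003_z(r_jk=0.800, r_jh=0.228, r_kh=0.50, n=50)
print(f"z = {z_stat:.2f}, P = {p_value:.2g}")
```

The statistic grows with the correlation between the two outcome measures: the more strongly the March and June death rates are related, the smaller the sampling variability of the difference between the two correlations.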

    Figure 2.  Scatterplots showing the Pearson correlations between population density and cumulative COVID-19 related deaths per million inhabitants through (A) March 31, 2020, top panel (r = 0.228, 95% CI: −0.054, 0.476) and (B) June 17, 2020, bottom panel (r = 0.80, 95% CI: 0.671, 0.882).
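
The confidence intervals in the Figure 2 caption are consistent with the standard Fisher Z interval for a Pearson correlation, as the sketch below illustrates. The sample size n = 50 is an assumption (one observation per state in Table S1), and the function name `pearson_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Fisher Z confidence interval for a Pearson correlation."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


march_ci = pearson_ci(0.228, 50)  # through March 31, 2020
june_ci = pearson_ci(0.80, 50)    # through June 17, 2020
print([round(v, 3) for v in march_ci], [round(v, 3) for v in june_ci])
```

Under that assumption the computed intervals agree with the caption to three decimals: (−0.054, 0.476) for March 31 and (0.671, 0.882) for June 17.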

    To the best of our knowledge, this is the first study that examines testing-, case count- and geo-demographic variables as predictors of COVID-19 related deaths. Using a flexible, machine learning-based approach (KRLS regression), we found that our predictors accounted for very high percentages of outcome variance (98.8% and 92.6% for number of deaths and deaths per million inhabitants, respectively). Furthermore, with very few exceptions, our predictors were both statistically significant and practically important.

    One novel contribution of this study was our examination of a complex, ratio-based geo-demographic predictor variable. This variable—the number of diagnostic tests performed in week one of testing/million inhabitants/state-specific population density (w1DT/MI/PD)—significantly predicted COVID-19 related deaths, but did so differently depending on where, along the continuum of geo-demographic values, the predictive association was examined. At the lower end of the geo-demographic predictor, more tests during week one per million inhabitants, normalized by population density, were associated with more deaths per million citizens. In contrast, at the higher end of the geo-demographic predictor, more tests during week one per million inhabitants, normalized by population density, were associated with fewer deaths per million inhabitants. These different quantitative patterns could reflect different qualitative situations. In the first case (lower values on the geo-demographic variable, where more tests are associated with more deaths), testing seems to pursue a confirmatory purpose. In contrast, for the second case (higher values on the geo-demographic variable, where more tests are associated with fewer deaths), diagnostic testing appears to be emphasized[8]. One implication of these findings is that when examining our geo-demographic variable as a predictor of deaths, the inflection points along the lowess curves (the positions where the slope rises and falls) can serve as approximate cut-points demarcating three types of testing: confirmatory, diagnostic, and other.
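
One possible way to locate the positions where the slope of a smoothed curve changes direction is sketched below. This is an illustrative reading of the cut-point idea, not a procedure described in the article: the function name `slope_sign_changes` and the toy curve are hypothetical, and flat segments (slopes of exactly zero) are not treated as sign changes.

```python
def slope_sign_changes(x, fitted):
    """Return x positions where the slope of a smoothed curve changes sign."""
    pairs = sorted(zip(x, fitted))
    slopes = []  # (segment midpoint, segment slope)
    for (x0, f0), (x1, f1) in zip(pairs, pairs[1:]):
        if x1 > x0:
            slopes.append(((x0 + x1) / 2.0, (f1 - f0) / (x1 - x0)))
    cut_points = []
    for (m0, s0), (m1, s1) in zip(slopes, slopes[1:]):
        if s0 * s1 < 0:  # rising then falling, or falling then rising
            cut_points.append((m0 + m1) / 2.0)
    return cut_points


# Toy smoothed curve (hypothetical): rises up to x = 0, then falls
curve_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
curve_fit = [-4.0, -1.0, 0.0, -1.0, -4.0]
cuts = slope_sign_changes(curve_x, curve_fit)
print(cuts)
```

Applied to fitted lowess values, the returned positions would split the predictor's range into segments with positive and negative local associations.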

    When testing prioritizes symptomatic cases, it is expected that most tested individuals will test positive (infection will be confirmed). Because deaths will occur within a subset of infected individuals, when testing is confirmatory (when only symptomatic patients are tested), more tests will be associated with more deaths. In contrast, when asymptomatic individuals are also tested, more tests, conducted earlier, will allow clinicians to detect, treat, and isolate infections earlier and prevent further viral dissemination which, in turn, will result in fewer deaths/million inhabitants. Our findings thus support an important recommendation from the World Health Organization, which is that early and frequent testing helps to prevent deaths[9].

    In addition to the contributions described above, we performed supplemental analyses examining the association between population density and COVID-19 related deaths. The role of population density in predicting epidemic dispersal and epidemic-related deaths is receiving increased research attention[10]. To the best of our knowledge, the present study is the first to demonstrate that the magnitude of association between population density and COVID-19 related deaths strengthens as the time since first infection increases. Understanding how factors such as testing frequency, the relative proportion of confirmatory versus diagnostic testing, and sociodemographic composition influence the temporal association between population density and COVID-19 related deaths is an important priority for future research.

    Overall, our findings highlight the importance of considering predictor variables from multiple domains. When ratio-based predictors such as our geo-demographic variable are analyzed, we recommend examining lowess curves as a visual interpretational aid for explicating the (often) complex non-linear associations between such ratio-based predictors and various outcomes of interest. An important direction for future research on epidemic dissemination and potential control is to examine both ratio-based composite variables—such as our geo-demographic measure—and traditional multiplicative interaction terms (created as linear products of two or more variables). The joint examination of both types of complex variables might result in greater predictive power and/or might foster additional insights into the dynamics of infectious diseases, such as COVID-19.

    This work was previously released as a preprint by J. B. Hittner, F. O. Fasina, A. L. Hoogesteijn, R. Piccinini, P. Kempaiah, S. D. Smith, and A. L. Rivas, with the title ‘Early and massive testing saves lives: COVID-19 related infections and deaths in the United States during March of 2020’ medRxiv 2020.05.14.20102483; https://doi.org/10.1101/2020.05.14.20102483.

    Acknowledgements The authors appreciate the data gathering efforts of those citizens who contributed to COVID-19 tracking (https://covidtracking.com). FOF is currently funded by the United States Agency for International Development (USAID) grant to the Food and Agriculture Organization of the United Nations (Global Health Security Agenda - Zoonotic Diseases and Animal Health in Africa). The views and opinions expressed in this paper are those of the authors and not necessarily the views and opinions of the United States Agency for International Development and the Food and Agriculture Organization of the United Nations.

    Financial Support This research received no specific grant from any funding agency, commercial or not-for-profit entity.

    Conflict of Interest The authors declare that they have no conflict of interest.

    Author Contributions JBH and ALR designed the study. DM curated the original data. ALH, RP, PK, SDS, JBH, and FOF reviewed the literature and extracted and filtered the available data from online repositories. JBH conducted the statistical analyses. All authors contributed to writing of the report.
