Volume 37 Issue 3
Mar.  2024

Estimating the Scarlet Fever Epidemics Using a Seasonal Autoregressive Fractionally Integrated Moving Average Model

doi: 10.3967/bes2024.038
Funds:  This work was supported by the Key Scientific Research Project of Universities in Henan (21A330004) and the Open Project Program of the First Hospital of Xinxiang Medical University (XZZX2022002).
  • Author Bio:

    WANG Yong Bin, male, born in 1989, MD, majoring in infectious disease epidemiology

  • Corresponding author: WANG Yong Bin, MD, Tel: 86-373-3831646, E-mail: wybwho@163.com; XU Chun Jie, Tel: 86-373-3831646, E-mail: xuchunjie@imb.pumc.edu.cn
  • Received Date: 2023-10-23
  • Accepted Date: 2023-12-11
  • [1] Brouwer S, Rivera-Hernandez T, Curren BF, et al. Pathogenesis, epidemiology and control of Group A Streptococcus infection. Nat Rev Microbiol, 2023; 21, 431−47. doi:  10.1038/s41579-023-00865-7
    [2] Lamagni T, Guy R, Chand M, et al. Resurgence of scarlet fever in England, 2014-16: a population-based surveillance study. Lancet Infect Dis, 2018; 18, 180−7. doi:  10.1016/S1473-3099(17)30693-X
    [3] Hurst JR, Brouwer S, Walker MJ, et al. Streptococcal superantigens and the return of scarlet fever. PLoS Pathogens, 2021; 17, e1010097. doi:  10.1371/journal.ppat.1010097
    [4] Ceylan Z. Estimation of COVID-19 prevalence in Italy, Spain, and France. Sci Total Environ, 2020; 729, 138817. doi:  10.1016/j.scitotenv.2020.138817
    [5] Wu WW, Li Q, Tian DC, et al. Forecasting the monthly incidence of scarlet fever in Chongqing, China using the SARIMA model. Epidemiol Infect, 2022; 150, e90. doi:  10.1017/S0950268822000693
    [6] Veenstra JQ. Persistence and anti-persistence: theory and software. Western University. 2013.
    [7] Liu YH, Ding H, Chang ST, et al. Exposure to air pollution and scarlet fever resurgence in China: a six-year surveillance study. Nat Commun, 2020; 11, 4229. doi:  10.1038/s41467-020-17987-8
    [8] Kim JH, Cheong HK. Increasing number of scarlet fever cases, South Korea, 2011-2016. Emerg Infect Dis, 2018; 24, 172−3. doi:  10.3201/eid2401.171027
    [9] You YH, Davies MR, Protani M, et al. Scarlet fever epidemic in China caused by Streptococcus pyogenes serotype M12: epidemiologic and molecular analysis. EBioMedicine, 2018; 28, 128−35. doi:  10.1016/j.ebiom.2018.01.010
    [10] Staszewska-Jakubik E, Czarkowski MP, Kondej B. Scarlet fever in Poland in 2014. Przegl Epidemiol, 2016; 70, 195−202.
    Scarlet fever (SF) is a common infectious disease caused by group A streptococcus (GAS)[1]. During the 18th and 19th centuries, SF was a significant cause of mortality in children aged 5–15 years worldwide[2]. The incidence and fatality rates of SF have decreased remarkably due to the widespread use of effective antibiotics and improvements in diet and sanitation[3]. However, the recent resurgence of SF has sparked significant interest in infectious diseases[1,3]. Given the insufficient understanding of the triggers that cause SF outbreaks and the absence of available vaccines to prevent GAS infection to date[1], effective prevention and control programs are needed to manage the ongoing spread of SF.

    Time-series analysis is an invaluable tool for decision-making and strategic planning because it uncovers patterns, trends, and relationships in data points collected over a period[4]. The seasonal autoregressive integrated moving average (SARIMA) model is the most common model in the health field because of its straightforward structure, rapid applicability, and ability to provide meaningful insights into datasets[4]. SARIMA has proven successful in estimating the prevalence, morbidity, and mortality of contagious diseases by capturing temporal dependency and accounting for changing trends, periodic fluctuations, and random variations in a time series[4,5]. However, SARIMA is designed to model short-term fluctuations and therefore fails to capture long-term temporal dependencies. Moreover, integer differencing in SARIMA can lead to over-differencing, which may remove valuable information and thereby affect parameter estimation and model fitting. In contrast, the seasonal autoregressive fractionally integrated moving average (SARFIMA) model, which incorporates fractional differencing into SARIMA, overcomes these limitations[6]. It is therefore well suited to analyzing both short- and long-term memory in time series from fields such as finance, economics, hydrology, and meteorology[6]. Additionally, SARFIMA does not involve complex mathematical concepts; this transparency enables end users to understand how the model is built and to rely more on its results for decision-making. Despite these promising attributes, no published work has used this method to analyze and evaluate SF epidemics.

    Therefore, this study aimed to examine the flexibility and efficacy of SARFIMA in estimating SF epidemics in Liaoning, where the yearly average morbidity was 10.363 per 100,000 people, considerably higher than the national average in China (3.26 per 100,000 people per year)[7]. This study also aimed to compare the predictive accuracy and reliability of SARFIMA with those of SARIMA.
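
To make the contrast between integer and fractional differencing concrete, the following minimal Python sketch expands the operator $(1 - B)^d$ into its lag weights using the standard binomial recursion. It is an illustration only, not the implementation used by the R "arfima" package; the function names `frac_diff_weights` and `frac_diff` and the value d = 0.4 are hypothetical choices made for the example.

```python
def frac_diff_weights(d, n_terms):
    """Coefficients pi_k of (1 - B)^d = sum_k pi_k B^k.

    Uses the recursion pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - d) / k.
    """
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply (1 - B)^d to a series, truncating the expansion at the first observation."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]


# Integer differencing (d = 1) uses only the previous value: weights 1, -1, 0, 0, ...
print(frac_diff_weights(1, 4))
# A fractional order (illustrative d = 0.4) keeps slowly decaying weights on all past values.
print([round(x, 4) for x in frac_diff_weights(0.4, 6)])
```

With d = 1 every weight beyond the first lag vanishes, whereas a fractional d leaves small, slowly decaying weights on distant lags, which is how long memory is retained in the differenced series.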

    We retrospectively gathered monthly SF incident cases and population data for Liaoning between January 2004 and December 2019 from the National Notifiable Infectious Disease Surveillance System and the Statistical Yearbook. The series was split into two segments: a training subset from January 2004 to December 2018 to establish the SARIMA and SARFIMA models, and a test subset from January to December 2019 to validate their generalization. To assess the reliability of the models, a second split was used in which the models were fitted to data from January 2004 to December 2017 and the remaining 24 months were held out. The trend, seasonal, and irregular components were estimated by Seasonal and Trend decomposition using Loess (STL). The seasonal relative, which indicates how much the incidence in a given month tends to be higher (or lower) than the average, was computed using multiplicative decomposition. Changes in SF epidemic patterns were estimated using the average annual percentage change (AAPC) in Joinpoint (version 4.8.0.1). SARIMA and SARFIMA were built using the “forecast” and “arfima” packages in R 4.2. Moreover, the incidence rate ratio (IRR) with a 95% confidence interval (CI) between the pre- and post-outbreak periods was computed by comparing the two proportions[2]. To determine predictive quality, two types of indicators were calculated: scale-dependent metrics, namely mean absolute deviation (MAD) and root mean square error (RMSE), and metrics based on percentage errors, namely mean absolute percentage error (MAPE), mean error rate (MER), and root mean square percentage error (RMSPE)[5]. The model with the lower values of these metrics was considered the better one.
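
The five accuracy measures can be sketched in a few lines of Python. The formulas below are the commonly used definitions and are an assumption; the authors' exact formulas are in the supplementary materials and may differ. The function name `forecast_errors` is hypothetical, and the inputs are the rounded 2019 observations and forecasts reported in Table 1.

```python
import math


def forecast_errors(observed, predicted):
    """Common scale-dependent and percentage-based forecast error measures."""
    n = len(observed)
    abs_err = [abs(o - p) for o, p in zip(observed, predicted)]
    rel_err = [e / o for e, o in zip(abs_err, observed)]
    return {
        "MAD": sum(abs_err) / n,                               # mean absolute deviation
        "RMSE": math.sqrt(sum(e * e for e in abs_err) / n),    # root mean square error
        "MAPE": sum(rel_err) / n,                              # mean absolute percentage error
        "RMSPE": math.sqrt(sum(r * r for r in rel_err) / n),   # root mean square percentage error
        "MER": sum(abs_err) / sum(observed),                   # total absolute error / total observed
    }


# Rounded monthly values for January-December 2019 taken from Table 1.
observed = [450, 182, 343, 418, 538, 587, 380, 119, 217, 300, 559, 617]
sarima = [397, 78, 206, 280, 594, 688, 300, 42, 106, 195, 505, 672]
sarfima = [416, 134, 228, 298, 540, 603, 305, 99, 155, 235, 488, 599]

for name, forecast in (("SARIMA", sarima), ("SARFIMA", sarfima)):
    metrics = forecast_errors(observed, forecast)
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Because the inputs are rounded, the results differ slightly from Table 2. Under these definitions the MAD, RMSE, and MAPE land close to the tabulated values, while the quantities this sketch labels RMSPE and MER correspond to the Table 2 columns headed MER and RMSPE, respectively, which may reflect a different definition in the authors' supplementary materials.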

    The study included a total of 70,020 incident cases in Liaoning during 2004–2019. SF morbidity showed an overall increase that was not statistically significant, with AAPC = 4.493 (95% CI: −22.278 to 40.485; t = 0.291, P = 0.771). The highest number of cases, 6,728 (15.818 per 100,000 people), was recorded in 2011, which was 2.085 times higher than the lowest number of 2,181 cases (5.142 per 100,000 people) in 2013 (Supplementary Figure S1, available in www.besjournal.com). The decomposition results indicated that SF incidence tended to increase during 2004–2010 (average 9.256 per 100,000 people annually), with an AAPC of 6.758 (95% CI: −13.399 to 31.607; t = 0.613, P = 0.54) (Supplementary Figure S2, available in www.besjournal.com). However, an unexpected escalation was noted in 2011, and incidence has remained relatively steady since then (average 11.024 per 100,000 people annually), with AAPC = 1.899 (95% CI: −12.966 to 19.302; t = 0.234, P = 0.815). SF morbidity during 2011–2019 was higher by a factor of 1.191 than that during 2004–2010 (IRR = 1.191, 95% CI: 1.188 to 1.193) (Supplementary Figures S1–S2). These results align with the SF resurgence in Hong Kong, China and South Korea[8]. However, the timing differs from the resurgence of SF in England, which began in 2014[2]. Moreover, the exact factors driving the increased pathogenicity of GAS are not fully understood. One probable explanation is the acquisition of novel prophages carrying new combinations of toxin and antimicrobial resistance genes, which is associated with the emergence and spread of the predominant genotypes emm12 and emm1 in China[3]. Another explanation could be the natural cyclic pattern of SF: epidemics in China have exhibited an approximately six-year cycle[9], and the unexpected surge since 2011 may indicate an emerging trend distinct from the previous phase of low morbidity. A third plausible reason is the relaxation of China’s two-child policy in 2011, which increased the size of the susceptible population. Fourth, improvements in diagnostic capacity and increased awareness among medical workers may have contributed to the observed increase. Finally, worsening air quality in China may also be a contributing factor[7].
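
The reported IRR point estimate can be checked directly from the two average annual rates quoted above. The sketch below also shows one common way to attach a confidence interval to a rate ratio (a Wald interval on the log scale). That interval method is an assumption, since the source states only that two proportions were compared; the function name `incidence_rate_ratio` and the counts in the last call are hypothetical and are not study data.

```python
import math


def incidence_rate_ratio(cases_a, person_years_a, cases_b, person_years_b, z=1.96):
    """Rate ratio of period A versus period B with a log-scale Wald interval."""
    irr = (cases_a / person_years_a) / (cases_b / person_years_b)
    se_log = math.sqrt(1 / cases_a + 1 / cases_b)  # standard error of log(IRR)
    return irr, irr * math.exp(-z * se_log), irr * math.exp(z * se_log)


# Point estimate from the reported average annual rates (per 100,000 people):
# 11.024 for 2011-2019 versus 9.256 for 2004-2010.
print(round(11.024 / 9.256, 3))  # 1.191

# Hypothetical counts, used only to show the shape of the interval calculation.
print(incidence_rate_ratio(120, 1_000_000, 100, 1_000_000))
```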

    Figure S1.  Yearly incident cases and incidence rate in Liaoning during 2004–2019. This plot pinpoints that the SF outbreak occurred in 2011 and there is a periodic cycle pattern of around 4–7 years.

    Figure S2.  A seasonal decomposition of the SF incidence series based on the STL technique. The (A) SF series is decomposed into (B) seasonal, (C) trend, and (D) irregular parts. It seems that there is a periodic outbreak pattern and a clear seasonality in SF incidence.

    Marked semi-annual seasonality was observed in this study, with one peak in May–June and another in November–December (Supplementary Figure S3, available in www.besjournal.com). Different climatic features and the start of the spring and autumn semesters might drive these peaks[2,7]. Our seasonal profile concurs with prior reports from Hong Kong, China; South Korea[8]; and mainland China[7]. However, it disagrees with earlier findings in England (peaking in February–March)[2] and Poland (peaking in January–March)[10]. This disagreement might be attributed to differences in school breaks, population density, socioeconomic status, lifestyle, climatic and ecological features, and predominant GAS emm gene types[1,2,7]. In East Asia (including China), emm1 and emm12 are the predominant gene types responsible for SF outbreaks[3], whereas in England, emm3, emm4, and emm12 are more prevalent[2]. These variations in gene types may contribute to differences in the timing of SF peaks between East Asia and Europe[3]. Additionally, SF incidence remains low in February each year; the winter holidays and the Spring Festival in China could explain this reduction[7].

    Figure S3.  The decomposed seasonal relative (SR) for the SF morbidity series using the multiplicative decomposition method. A value of SR = 1 means that the incidence for that period is exactly the same as the average. A value of SR > 1 means the incidence is higher than the average (indicating a high-risk season), and a value of SR < 1 means this period’s incidence is lower than the average (indicating a low-risk season). As shown, SF epidemics present pronounced dual seasonal patterns per year.
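
The seasonal relative described in this caption can be illustrated with a classical ratio-to-moving-average calculation. The sketch below is one textbook form of multiplicative decomposition and is not necessarily the exact procedure the authors' software applied; the function name `seasonal_relatives` and the toy series are hypothetical.

```python
def seasonal_relatives(series, period=12):
    """Ratio-to-moving-average seasonal indices for a series that starts at season 0.

    Needs at least two full periods of data; `period` is assumed to be even.
    """
    half = period // 2
    ratios = [[] for _ in range(period)]
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        # Centred 2 x period moving average: the two end points get half weight.
        trend = (sum(window) - 0.5 * (window[0] + window[-1])) / period
        ratios[t % period].append(series[t] / trend)
    raw = [sum(r) / len(r) for r in ratios]
    scale = period / sum(raw)  # normalise so the indices average to 1
    return [x * scale for x in raw]


# Toy data: a flat level of 100 cases with a made-up two-peak monthly pattern.
pattern = [1.0, 0.4, 0.8, 1.0, 1.4, 1.5, 0.9, 0.3, 0.5, 0.8, 1.4, 2.0]
toy_series = [100 * p for p in pattern] * 3
print([round(x, 2) for x in seasonal_relatives(toy_series)])
```

Values above 1 mark months whose incidence exceeds the trend level, and values below 1 mark low-risk months, matching the reading of SR given in the caption.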

    The SARFIMA parameters (p, q, P, and Q) were estimated, and the differencing orders and preferred modes were selected by eliminating candidates with a lower log-likelihood (refer to Supplementary Materials [SARFIMA method], available in www.besjournal.com). Supplementary Table S1 (available in www.besjournal.com) summarizes the candidate modes of the best SARFIMA (3, 0, 1)(3, −0.347, 0)12; mode 1 was selected as the preferred model. This model reported the maximum log-likelihood (−1193.7), coupled with the minimum Akaike’s information criterion (2411.391) and Bayesian information criterion (2449.706). In addition, the autocorrelation function (ACF) and partial ACF (PACF) of the forecast errors are provided in Supplementary Figure S4 (available in www.besjournal.com), showing that most correlations were within the 95% CI. The Ljung-Box Q test ($\chi^2$ = 0.158, P = 0.691) indicated no serial correlation in the residuals. These checks indicated that the forecast errors behaved as white noise. Similarly, SARFIMA (2, −0.302, 1)(1, 0.471, 2)12 was selected as the optimal model for the SF incidence series during 2004–2017; the diagnostics for its key parameters and residuals are given in Supplementary Table S2 and Supplementary Figure S5 (available in www.besjournal.com). Forecasts for the next 12 and 24 months were then generated from these two best SARFIMA models (Table 1 and Supplementary Table S3, available in www.besjournal.com).
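
The Ljung-Box check on the residuals can be sketched as follows. The single-lag form with one degree of freedom is an assumption: the source does not state the lag used, and this form is chosen because it reproduces the reported P value from the reported statistic. The function names and the alternating toy residuals are hypothetical.

```python
import math


def chi2_sf_df1(q):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2))


def ljung_box_lag1(residuals):
    """Ljung-Box Q statistic at lag 1 and its chi-square(1) p-value."""
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    denom = sum(c * c for c in centred)
    # Lag-1 sample autocorrelation of the residuals.
    r1 = sum(centred[t] * centred[t - 1] for t in range(1, n)) / denom
    q = n * (n + 2) * r1 * r1 / (n - 1)
    return q, chi2_sf_df1(q)


# The reported statistic 0.158 maps to the reported P value under this assumption.
print(round(chi2_sf_df1(0.158), 3))  # 0.691

# Toy residuals with strong negative autocorrelation give a small p-value.
print(ljung_box_lag1([1, -1, 1, -1, 1, -1, 1, -1]))
```

A large P value, as reported here, means the lag-1 autocorrelation of the forecast errors is indistinguishable from zero, which is consistent with white noise.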

    | Time | Observations | SARIMA (3, 0, 1)(3, 1, 0)12 forecasts | SARIMA 95% CI | SARFIMA (3, 0, 1)(3, −0.347, 0)12 forecasts | SARFIMA 95% CI |
    | --- | --- | --- | --- | --- | --- |
    | January | 450 | 397 | 222 to 573 | 416 | 238 to 593 |
    | February | 182 | 78 | −171 to 327 | 134 | −123 to 392 |
    | March | 343 | 206 | −64 to 475 | 228 | −48 to 505 |
    | April | 418 | 280 | 5 to 555 | 298 | 19 to 577 |
    | May | 538 | 594 | 314 to 875 | 540 | 259 to 822 |
    | June | 587 | 688 | 398 to 977 | 603 | 315 to 891 |
    | July | 380 | 300 | 2 to 598 | 305 | 8 to 602 |
    | August | 119 | 42 | −263 to 347 | 99 | −204 to 403 |
    | September | 217 | 106 | −204 to 417 | 155 | −153 to 462 |
    | October | 300 | 195 | −119 to 509 | 235 | −75 to 545 |
    | November | 559 | 505 | 187 to 823 | 488 | 176 to 801 |
    | December | 617 | 672 | 351 to 993 | 599 | 284 to 914 |
      Note. SARIMA, seasonal autoregressive integrated moving average; SARFIMA, seasonal autoregressive fractionally integrated moving average; CI, confidence interval.

    Table 1.  Forecasts between January and December 2019 using SARIMA and SARFIMA

    Modes AIC BIC LL
    Mode 1 1654.925 1690.048 −816.463
    Mode 2 1656.486 1691.608 −817.243
    Mode 3 1656.844 1691.967 −817.422
    Mode 4 1658.588 1693.71 −818.294
    Mode 5 1658.738 1693.861 −818.369
    Mode 6 1659.058 1694.18 −818.529
    Mode 7 1659.949 1695.072 −818.975
    Mode 8 1660.268 1695.39 −819.134
    Mode 9 1662.806 1697.928 −820.403
    Mode 10 1663.734 1698.856 −820.867
    Mode 11 1665.047 1700.17 −821.524
    Mode 12 1666.785 1701.907 −822.392
    Mode 13 1666.866 1701.989 −822.433
    Mode 14 1667.475 1702.597 −822.737
    Mode 15 1672.506 1707.629 −825.253
    Mode 16 1675.773 1710.896 −826.887
    Mode 17 1683.103 1718.225 −830.551
    Mode 18 1692.033 1727.156 −835.017
    Mode 19 1692.24 1727.362 −835.12
    Mode 20 1692.442 1727.565 −835.221
    Mode 21 1700.832 1735.954 −839.416
    Mode 22 1702.176 1737.299 −840.088
    Mode 23 1703.389 1738.512 −840.695
    Mode 24 1714.219 1749.341 −846.109
    Mode 25 1714.768 1749.89 −846.384
    Mode 26 1732.879 1768.002 −855.44
    Mode 27 1738.179 1773.301 −858.089
      Note. SARFIMA, seasonal autoregressive fractionally integrated moving average; AIC, Akaike’s information criterion; BIC, Bayesian information criterion; LL, log-likelihood.

    Table S1.  Resultant candidate modes under the SARFIMA(3, 0, 1)(3, −0.347, 0)12
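
The three columns of Table S1 are linked by the standard definitions AIC = 2k − 2LL and BIC = k ln(n) − 2LL. The sketch below checks this for mode 1. The parameter count k = 11 is not stated in the source and is inferred here from the tabulated values; n = 180 is the number of months in the 2004–2018 training set. The function names are hypothetical.

```python
import math


def aic(log_lik, k):
    """Akaike's information criterion for k estimated parameters."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_lik


# Mode 1 of Table S1: LL = -816.463, with assumed k = 11 and n = 180 months.
print(round(aic(-816.463, 11), 3))
print(round(bic(-816.463, 11, 180), 3))
```

Both results agree with the tabulated AIC (1654.925) and BIC (1690.048) to within the rounding of the log-likelihood, and since k is the same for every mode, ranking the modes by AIC, BIC, or LL gives the same order.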

    Figure S4.  ACF and PACF plots for the residual series under the SARIMA and SARFIMA. (A) ACF and (B) PACF plots for the residual series under the SARIMA; (C) ACF and (D) PACF plots for the residual series under the SARFIMA. The correlograms show that most spikes fall within the 95% CI, with only a few outside the significance bounds (which is reasonable because some high-order correlations can exceed the bounds by chance alone), indicating little evidence of non-white noise in the forecast errors.

    Modes AIC BIC LL
    Mode 1 1561.067 1595.431 −769.534
    Mode 2 1561.444 1595.808 −769.722
    Mode 3 1561.448 1595.811 −769.724
    Mode 4 1561.448 1595.811 −769.724
    Mode 5 1561.45 1595.814 −769.725
    Mode 6 1561.486 1595.85 −769.743
    Mode 7 1561.495 1595.859 −769.748
    Mode 8 1561.499 1595.863 −769.75
    Mode 9 1561.502 1595.865 −769.751
    Mode 10 1561.506 1595.87 −769.753
    Mode 11 1561.509 1595.873 −769.755
    Mode 12 1561.511 1595.874 −769.755
    Mode 13 1561.511 1595.875 −769.755
    Mode 14 1561.512 1595.875 −769.756
    Mode 15 1561.512 1595.875 −769.756
    Mode 16 1561.512 1595.876 −769.756
    Mode 17 1561.512 1595.876 −769.756
    Mode 18 1561.514 1595.878 −769.757
    Mode 19 1561.74 1596.103 −769.87
    Mode 20 1561.753 1596.117 −769.877
    Mode 21 1561.776 1596.14 −769.888
    Mode 22 1561.791 1596.154 −769.895
    Mode 23 1561.811 1596.174 −769.905
    Mode 24 1561.83 1596.194 −769.915
    Mode 25 1561.843 1596.207 −769.922
    Mode 26 1561.846 1596.21 −769.923
    Mode 27 1561.855 1596.218 −769.927
    Mode 28 1561.86 1596.224 −769.93
    Mode 29 1561.865 1596.229 −769.933
    Mode 30 1561.867 1596.231 −769.934
    Mode 31 1561.883 1596.247 −769.941
    Mode 32 1561.968 1596.331 −769.984
    Mode 33 1562.251 1596.615 −770.126
    Mode 34 1562.261 1596.625 −770.131
    Mode 35 1562.871 1597.235 −770.436
    Mode 36 1563.323 1597.687 −770.661
    Mode 37 1563.362 1597.726 −770.681
    Mode 38 1563.447 1597.81 −770.723
    Mode 39 1563.761 1598.124 −770.88
    Mode 40 1564.014 1598.378 −771.007
    Mode 41 1564.144 1598.507 −771.072
    Mode 42 1564.149 1598.512 −771.074
    Mode 43 1564.15 1598.513 −771.075
    Mode 44 1564.158 1598.521 −771.079
    Mode 45 1564.159 1598.523 −771.08
    Mode 46 1564.162 1598.526 −771.081
    Mode 47 1564.189 1598.553 −771.095
    Mode 48 1564.213 1598.576 −771.106
    Mode 49 1564.377 1598.741 −771.189
    Mode 50 1564.683 1599.047 −771.342
    Mode 51 1564.755 1599.119 −771.377
    Mode 52 1564.992 1599.356 −771.496
    Mode 53 1565.249 1599.612 −771.624
    Mode 54 1565.656 1600.02 −771.828
    Mode 55 1566.507 1600.87 −772.253
    Mode 56 1567.261 1601.625 −772.631
    Mode 57 1567.358 1601.722 −772.679
    Mode 58 1567.595 1601.959 −772.798
    Mode 59 1567.683 1602.047 −772.842
    Mode 60 1567.746 1602.109 −772.873
    Mode 61 1568.846 1603.21 −773.423
    Mode 62 1569.285 1603.649 −773.643
    Mode 63 1570.262 1604.625 −774.131
    Mode 64 1570.905 1605.268 −774.452
    Mode 65 1570.961 1605.325 −774.481
    Mode 66 1571.379 1605.742 −774.689
    Mode 67 1573.295 1607.658 −775.647
    Mode 68 1574.893 1609.257 −776.447
    Mode 69 1575.204 1609.567 −776.602
    Mode 70 1575.716 1610.079 −776.858
    Mode 71 1576.168 1610.531 −777.084
    Mode 72 1576.184 1610.547 −777.092
    Mode 73 1576.558 1610.921 −777.279
    Mode 74 1576.932 1611.296 −777.466
    Mode 75 1580.158 1614.521 −779.079
    Mode 76 1590.366 1624.73 −784.183
    Mode 77 1590.854 1625.218 −784.427
    Mode 78 1595.913 1630.276 −786.956
    Mode 79 1598.18 1632.544 −788.09
    Mode 80 1600.319 1634.682 −789.159
    Mode 81 1611.598 1645.962 −794.799
      Note. SARFIMA, seasonal autoregressive fractionally integrated moving average; AIC, Akaike's information criterion; BIC, Bayesian information criterion; LL, log-likelihood.

    Table S2.  Resultant candidate modes under the SARFIMA(2, −0.302, 1)(1, 0.471, 2)12

    Figure S5.  ACF and PACF plots for the residual series under the SARIMA and SARFIMA based on the data during 2004–2017 in Liaoning. (A) ACF and (B) PACF plots for the residual series under the SARIMA; (C) ACF and (D) PACF plots for the residual series under the SARFIMA. The correlograms show that most spikes fall within the 95% CI, with only a few outside the significance bounds (which is reasonable because some high-order correlations can exceed the bounds by chance alone), indicating little evidence of non-white noise in the forecast errors.

    Additionally, we identified SARIMA (3, 0, 1)(3, 1, 0)12 and SARIMA (2, 1, 2)(1, 1, 2)12 as the best models for the incidence series during 2004–2018 and 2004–2017, respectively (refer to Supplementary Materials [SARIMA method], Supplementary Table S4, and Supplementary Figure S6, available in www.besjournal.com). Further diagnostic checks for residuals are provided in Supplementary Figures S4–S5 and Supplementary Table S4. Figure 1 and Table 2 compare the forecasting accuracy and reliability of SARIMA and SARFIMA. The MAD, MAPE, RMSE, RMSPE, and MER values under SARFIMA were lower than those under SARIMA for both datasets, indicating that SARFIMA captured the dynamic dependency structure in the spread of SF better than SARIMA. Previous literature has also indicated that SARFIMA is adequate for forecasting oil supply, road fatality rate, temperature, and hemorrhagic fever with renal syndrome, in some cases generating more accurate results than SARIMA[6]. These studies provide additional support for our findings and reinforce the usefulness of SARFIMA as a promising alternative for analyzing SF trends and seasonality.

    Figure 1.  Comparison of the observed curves with the forecast curves under the SARIMA and SARFIMA. (A) 12 hold-out data forecasts under the SARIMA in Liaoning, (B) 12 hold-out data forecasts under the SARIMA in mainland China, (C) 12 hold-out data forecasts under the SARFIMA in Liaoning, and (D) 12 hold-out data forecasts under the SARFIMA in mainland China. The grey shaded area signifies the forecasted curve with 95% CI in plots. It appears that the forecasts under the SARFIMA are closer to the observed curves.

    | Models | AIC | CAIC | BIC | LL | Ljung-Box Q test $\chi^2$ | Ljung-Box Q test P |
    | --- | --- | --- | --- | --- | --- | --- |
    | SARIMA(3, 0, 1)(0, 1, 1)12 | 2012.96 | 2013.48 | 2031.70 | −1000.48 | 0.108 | 0.743 |
    | SARIMA(3, 0, 1)(3, 1, 0)12 | 2007.61 | 2008.51 | 2032.60 | −995.80 | 0.047 | 0.829 |
    | SARIMA(3, 0, 1)(2, 1, 0)12 | 2018.84 | 2019.54 | 2040.71 | −1002.71 | 0.130 | 0.718 |
    | SARIMA(3, 0, 1)(1, 1, 0)12 | 2050.74 | 2051.26 | 2069.49 | −1019.37 | 0.053 | 0.818 |
    | SARIMA(3, 0, 1)(1, 1, 1)12 | 2014.59 | 2015.29 | 2036.46 | −1000.30 | 0.120 | 0.729 |
    | SARIMA(2, 0, 1)(3, 1, 0)12 | 2014.53 | 2015.23 | 2036.40 | −1000.27 | 0.004 | 0.949 |
    | SARIMA(1, 0, 1)(3, 1, 0)12 | 2014.55 | 2015.07 | 2033.29 | −1001.27 | 0.114 | 0.735 |
      Note. SARIMA, seasonal autoregressive integrated moving average; AIC, Akaike’s information criterion; CAIC, corrected Akaike’s information criterion; BIC, Bayesian information criterion; LL, log-likelihood.

    Table S4.  Identified possible SARIMA with the AIC, CAIC, BIC, and LL values

    Figure S6.  ACF and PACF plots for the seasonally differenced series in Liaoning. (A) ACF plot, and (B) PACF plot. The significant spike at lag 3 in the PACF indicates that the maximum order may be 3 in the non-seasonal AR component; the significant spikes at lags 10, 11, and 12, along with 23, 24, and 25, in the ACF suggest that the maximum order may be 3 in the seasonal AR component. The significant spikes at lags 12, 24, and 36 in the ACF suggest that the maximum order may be 1 in the seasonal MA component.

    SARFIMA extends SARIMA by replacing the integer differencing order with a fractional one. With the introduction of fractional integration[6], SARFIMA has the potential to capture long-range dependence and long memory effects in the SF incidence series; it enables better capture of nonlinear patterns and complexities; it offers more flexibility in modeling the dependence structure of SF epidemics; it can accommodate both seasonal and non-seasonal series, allowing for the modeling of multiple seasonal patterns; and it becomes more robust to outliers because the inclusion of long memory helps to smooth out their effects. These properties may explain why SARFIMA generated more accurate and flexible predictions than SARIMA. Given these strengths, SARFIMA deserves attention as a forecasting tool for analyzing the temporal levels of other communicable diseases. However, further validation is required to confirm this finding. Furthermore, several advanced statistical techniques, such as Bayesian structural time series, flexible transmitter networks, error-trend-seasonal frameworks, and age-structure mathematical models, have recently emerged and show potential for time series forecasting. Consequently, studies that specifically compare the predictive quality of SARFIMA with these techniques are essential.

    This study has several limitations. First, because SF has become a milder disease with a low fatality rate since the 20th century[10], patients with mild symptoms may not seek medical attention, leading to underreporting and underdiagnosis of SF cases. Second, owing to the unavailability of data from 2020–2023, we only used data from 2004–2019 to assess model performance. The COVID-19 outbreak significantly changed the SF epidemic trend from 2020–2023. Thus, the model should be regularly updated by incorporating new data to ensure reliable forecasting. Third, the findings of this study pertain specifically to how well SARFIMA estimates SF epidemics. Additional studies are needed to validate the efficiency of this method in estimating epidemics of other communicable diseases. Fourth, constructing a SARFIMA model in practice generally requires 100 or more observations. Finally, although SARFIMA with exogenous variables could potentially offer higher forecasting accuracy, we could not obtain these SF-related variables; hence, such analyses were not performed.

    In summary, SARFIMA is a versatile model that can capture both short- and long-term dependencies in SF incidence as well as seasonal patterns. Its ability to capture complex dynamics and accurately forecast SF epidemics gives it advantages over SARIMA. Consequently, SARFIMA should be considered a valuable alternative for estimating SF epidemics to support informed decisions, resource optimization, and future planning. Furthermore, the incidence of SF remains high in Liaoning under current interventions, which highlights the need for additional preventive and control measures to address this evolving situation.

    | Hold-out data (SF incidence in Liaoning) | Models | MAD | MAPE | RMSE | RMSPE | MER |
    | --- | --- | --- | --- | --- | --- | --- |
    | 12 hold-out data forecasts | SARIMA (3, 0, 1)(3, 1, 0)12 | 89.124 | 0.300 | 94.142 | 0.227 | 0.355 |
    | 12 hold-out data forecasts | SARFIMA (3, 0, 1)(3, −0.347, 0)12 | 53.747 | 0.168 | 64.959 | 0.137 | 0.200 |
    | 24 hold-out data forecasts | SARIMA (2, 1, 2)(1, 1, 2)12 | 133.123 | 0.486 | 153.433 | 0.325 | 0.684 |
    | 24 hold-out data forecasts | SARFIMA (2, −0.302, 1)(1, 0.471, 2)12 | 63.199 | 0.239 | 81.777 | 0.154 | 0.384 |
      Note. SARIMA, seasonal autoregressive integrated moving average; SARFIMA, seasonal autoregressive fractionally integrated moving average; MAD, mean absolute deviation; MAPE, mean absolute percentage error; RMSE, root mean square error; RMSPE, root mean square percentage error; MER, mean error rate.

    Table 2.  Comparison of forecast accuracy and reliability under SARIMA and SARFIMA

Supplements:
23357+Supplementary Materials.pdf
