
Scarlet fever (SF) is a common communicable disease that results from group A Streptococcus (GAS) infections^{[1]}. SF accounted for the global loss of life among children 5–15 years of age in the 18th and 19th centuries^{[2]}. A rapid reduction in SF morbidity and mortality occurred due to the scaleup of effective antibiotics and improvements in sanitation and nutrition^{[3]}. The unexpected increase in the incidence of SF has attracted a renewed interest in infectious diseases^{[3]}. Because the triggers that cause SF outbreaks are not fully understood and there is a scarcity of available vaccines protecting susceptible populations from GAS infections, effective prevention and control plans are required to stop the continued spread of SF.
Time series analysis assists in the development of hypotheses to explain the temporal patterns of different diseases and to analyze the spread, therefore, facilitating the creation of a quality forecasting system. The seasonal autoregressive integrated moving average (SARIMA) model has been widely applied to estimate the epidemiological patterns of contagious diseases because this model has a simple structure, fast applicability, and a relatively high forecasting reliability level ^{[4]}. It has been shown that the SARIMA model is able to satisfactorily estimate a simple time series ^{[4]}, but it is difficult to manage complex time series, such as the data with multiple seasonal periods, highfrequency seasonality, noninteger seasonality, and dualcalendar effects. By comparison, the innovation statespace modelling framework that combines BoxCox transformations, Fourier series with timevarying coefficients, and autoregressive moving average (ARMA) error correction (known as the TBATS method) is customized for use with the patterns included in a complex time series described above ^{[5]}. In addition, the TBATS model is used for linearity and some types of nonlinearity in a complex time series based on BoxCox transformations ^{[5]}, which makes it possible to perform a multistep ahead prediction. Moreover, the TBATS model is able to decompose a complex seasonal time series into its trend, seasonal, and irregular components ^{[5]}, which is not able to be undertaken by use of the SARIMA model. Importantly, SF morbidity has been shown to display dual seasonal patterns in some countries. Therefore, this study analyzed the longterm epidemic patterns using the TBATS model. The forecasting power under the TBATS model was compared with the SARIMA model.
We obtained the monthly SF incidence and population data between January 2004 and December 2019 from the Chinese CDC and the Statistical Yearbook of China, respectively. Then, we partitioned the SF morbidity series into two segments comprising a training dataset from January 2004 to December 2017 to construct the SARIMA and TBATS models and a testing dataset from January 2018 to December 2019 to test the generalization of both models. Two additional datasets were provided to test the robustness of both models: the first 180 data sets from January 2004 to December 2018 and 156 data sets from January 2004 to December 2016 were treated as training datasets, respectively; and the remainder were testing datasets. We estimated the changing epidemiologic trends of SF based on annual percentage change (APC) and average APC (AAPC) using the Joinpoint regression program (version 4.8.0.1). We constructed the TBATS model with the “forecast,” “tseries,”, and “FinTS” packages in R software (version 3.4.3). The incidence rate ratio (IRR) with a 95% uncertainty limit (UL) before and after the SF outbreak was calculated using the method proposed by Armitage and Berry ^{[6]}. The mean absolute deviation (MAD), root mean square error (RMSE), mean absolute percentage error (MAPE), mean error rate (MER), and root mean square percentage error (RMSPE) were computed to compare the forecasting ability of both models.
During the study period, a notable increase in SF morbidity was detected, with an AAPC = 8.942 (95% UL: 5.995–11.971; t = 6.697, P < 0.001; Supplementary Figure S1, available in www.besjournal.com), and the highest morbidity (5.930/100,000 persons) occurred in 2019, which increased by a factor of nearly four compared with the lowest level (1.489/100,000 population) in 2004 (Supplementary Figure S2, available in www.besjournal.com). The SF epidemics remained relatively steady from 2004–2010 (average, 1.937/100,000 persons annually), with an AAPC = −0.840 (95% UL: −7.678 to 6.505; t = −0.231, P = 0.817). An unexpected outbreak was witnessed in 2011, and since then a rapidly increasing trend occurred (average, 4.580/100,000 persons annually [excluding 2013]; the exact causes regarding this annual drop are unknown; Supplementary Figure S2), with an AAPC = 5.952 (95% UL: 0.239–11.992; t = 2.466, P = 0.043), which showed good agreement with a resurgence of SF in Hong Kong, China^{[7]}, but inconsistent with England where the resurgence occurred in 2014 ^{[8]}. There has been a doubling in the SF incidence during the postresurgence periods compared with the preresurgence periods (IRR = 2.364, 95% UL: 2.358–2.370). Nevertheless, the driving force associated with the increased pathogenicity of GAS fails to be elucidated. A possible explanation may be due to the acquisition of novel prophages harboring new hybridizations of toxin genes and antimicrobial resistance genes, which is related to the emergence and expansion of the predominant genotypes of emm12 and emm1 in China ^{[3]}. Another explanation may be associated with the natural periodicity of the SF incidence (SF epidemics are characterized by a cyclic change of approximately 6 years ^{[3]}). A third explanation may be linked to the relaxation of the 2child policy in 2011 ^{[3]}, which led to an increase in the number of susceptible individuals. A fourth reason may be attributed to improvements in the diagnostic capacity and the increased awareness of medical workers in reporting SF ^{[3]}. A fifth reason may be a result of the deterioration of air quality in China ^{[9]}, despite the gradual improvements in the last 2 years. Finally, there are no vaccines available to prevent infections with GAS until now.
Figure S1. Joinpoint regression showing the scarlet fever epidemic trends in the period 2004–2019. *Showed that the annual percent change (APC) is significantly different from zero at the significance level. This plot was plotted using Joinpoint regression program (Version 4.8.0.1). Joinpoint is a statistical tool for the description and analysis of trends with the Joinpoint methods. This Joinpoint Regression Program is designed as a Windowsbased statistical software package to analyze Joinpoint methods. Users can use this software to investigate whether there is a statistically notable change in trend (Available from https://surveillance.cancer.gov/joinpoint/. Accessed on 16 May 2022).
Figure S2. The yearly incidence of SF from 2004 to 2019. It was seen that SF had the lowest incidence in 2004 (1.489 per 100,000 population) and the highest morbidity in 2019 (5.930 per 100,000 persons).
A marked semiannual seasonal behavior occurred in the monthly SF incidence, with a strong peak between May and June, and a weak peak between November and December (Supplementary Figures S3–S4, available in www.besjournal.com). We surmised that different climatic features and beginning of spring and autumn semesters contributed to this difference in the margin of peak activities. Our seasonal profile correlates well with previous findings from Hong Kong, China ^{[7]}; however, discordant with that in England, which peaked between February and March ^{[8]}. This inconsistency may be due to the different school breaks, population density, and different GAS emm gene types in east Asia and Europe ^{[1, 8]}. In addition, the SF epidemics retain the lowest level in February every year (Supplementary Figure S3), attributable to the winter holidays and the Spring Festival.
Figure S3. Time series plot showing the monthly SF incidence rate and the decomposed seasonal, trend, and irregular components based on the STL technique. (A) Original monthly SF morbidity series. (B) Decomposed trend component. (C) Decomposed seasonal component. (D) Decomposed random component. As illustrated, overall the SF morbidity showed a rising tendency, with a sudden increase in 2011, and having notable seasonal patterns.
Figure S4. The decomposed seasonal factors for the scarlet fever morbidity series using the multiplicative decomposition method (The classical Seasonal Decomposition technique can be applied to decompose a time series into a seasonal pattern, a trend component, and a random fluctuation component. This technique is implemented based on the ratiotomovingaverage method. The Seasonal Decomposition technique can be divided into two types including a multiplicative and an additive method. For a multiplicative decomposition method, the seasonal component is a factor by which the seasonally adjusted series is multiplied to produce original observed values. For a time series without seasonal variation, the seasonal pattern is deemed as one. For an additive decomposition method, the seasonal adjustments are added to the seasonally adjusted series to yield the original series. For the morbidity series of infectious diseases, the multiplicative decomposition method typically produces a more reliable decomposed result in application. By employing a multiplicative decomposition method to an object series, the seasonal pattern or seasonal factors included in the object series can be obtained. Such a decomposition is very helpful in removing the seasonal effect from a series so that we can visibly look into the other characteristics of interest in the series that may be masked by the seasonal trait^{[7,8]}).
The forecasts under the TBATS approach rely largely on the number of harmonics k_{i} applied for each seasonal pattern. As a result, in selecting the number of harmonics k_{i}, considering one seasonal component each time, we then fitted the model on the target data repeatedly via gradually increasing the number of harmonics k_{i}, but holding the remaining harmonics constant for each i until the optimal AIC is obtained. In determining the most suitable orders (p and q) of the ARMA model, we used the automatic procedure proposed by Hyndman and colleagues^{[10]} to fit the forecasting residuals. If the selected model with the ARMA (p, q) residual component generates a smaller AIC than the one model without the ARMA (p, q) residual component, this selected specification would be considered as the best possible model; otherwise, the ARMA (p, q) residual component is deleted. After modelling by trial and error, the TBATS (0.04, {4,0}, 0.882, {<12,5>}) specification was selected as the optimal model in that the minimum AIC (−197.965) was detected in this model, and the identified key parameters of this best TBATS model are reported in Supplementary Table S1, available in www.besjournal.com. Additional statistical diagnoses for the forecast errors are provided in Supplementary Table S2 and Supplementary Figures S5–S6, available in www.besjournal.com. The LjungBox Q statistics of the forecast errors produced a Q_{(18)} = 11.442 with a Pvalue of 0.875, indicating no serial correlations in this residual series. Moreover, the ARCH effect was largely removed because the LM_{(18)} = 23.808 with a Pvalue of 0.302. These results confirmed the adequacy of the model specifications. Similarly, based on the modelling steps described above, the TBATS (0.01, {0,0}, 0.898, {<12,5>}) and TBATS (0.048, {0,0}, 0.902, {<12,5>}) specifications tended to be the preferred models for forecasting the 12 and 36holdout periods (Supplementary Tables S3–S4 and Supplementary Figures S7–S10, available in www.besjournal.com).
Parameters 24step ahead forecasting 12step ahead forecasting 36step ahead forecasting TBATS (0.04, {4,0}, 0.882,
{<12,5>}) modelTBATS (0.01, {0,0}, 0.898,
{<12,5>}) modelTBATS (0.048, {0,0}, 0.902,
{<12,5>}) modelLambda 0.0401 0.0105 0.0480 Alpha 0.1652 0.6489 0.6795 Beta −0.0252 −0.0362 −0.0432 Damping Parameter 0.8818 0.8982 0.9024 Gamma1 Values 0.0002 −0.0046 −0.0048 Gamma2 Values −1.2682 0.0050 0.0072 AR coefficients AR1 = 0.4772, AR2 = 0.2318,
AR3 = −0.1894, AR4 = 0.3222− − Seed states 1 −2.3530 −2.4373 −2.3215 2 0.1089 0.1130 0.1015 3 −0.0246 −0.0185 −0.0251 4 −0.0748 −0.0757 −0.0731 5 −0.0069 −0.0077 −0.0065 6 0.0213 0.0266 0.0207 7 0.0296 0.0312 0.0279 8 −0.1547 0.1489 0.1353 9 −0.5953 −0.6307 −0.5756 10 −0.0602 −0.0649 −0.0598 11 0.1463 −0.1521 −0.1404 12 −0.0231 −0.0248 −0.0218 13 0.0000 − − 14 0.0000 − − 15 0.0000 − − 16 0.0000 − − Sigma 0.1686 0.1909 0.1797 AIC −197.9650 −175.3322 −198.4532 Parameters 48step ahead forecasting 72step ahead forecasting Future 36step ahead forecasting TBATS (0.037, {0,0}, 0.901,{<12,5>}) TBATS (0.029, {0,0}, 0.87,{<12,4>}) TBATS (0.023, {0,0}, 0.895, {<12,5>}) Lambda 0.0368 0.0292 0.0225 Alpha 0.6900 0.7253 0.6444 Beta −0.0410 −0.1128 −0.0378 Damping Parameter 0.9013 0.8702 0.8955 Gamma1 Values −0.0069 −0.0115 −0.0050 Gamma2 Values 0.0052 0.0035 0.0037 AR coefficients − − − Table S1. The best TBATS models obtained on the basis of different datasets and their key parameters
Lags SARIMA model TBATS model BoxLjung Q P LMtest P BoxLjung Q P LMtest P 1 0.044 0.833 0.001 0.979 0.005 0.943 52.651 < 0.001 3 1.944 0.584 2.364 0.500 0.235 0.972 22.234 < 0.001 6 3.720 0.715 4.223 0.647 1.246 0.975 23.555 0.001 9 6.093 0.731 5.981 0.742 4.538 0.873 23.624 0.005 12 13.021 0.368 9.427 0.666 7.320 0.836 23.416 0.024 15 13.094 0.595 12.379 0.650 10.456 0.790 23.533 0.073 18 13.663 0.751 19.872 0.340 11.442 0.875 24.550 0.138 21 17.313 0.692 20.262 0.505 19.031 0.583 23.808 0.302 24 23.237 0.506 29.352 0.207 23.061 0.516 24.015 0.461 27 26.060 0.515 25.492 0.547 23.466 0.660 23.924 0.635 30 26.082 0.671 26.422 0.653 25.230 0.714 24.918 0.729 33 29.133 0.660 29.863 0.624 29.004 0.667 24.941 0.842 36 41.417 0.246 37.230 0.412 30.707 0.718 24.391 0.929 Note. SARIMA seasonal autoregressive integrated moving average method, TBATS an advanced innovation statespace modelling framework by combining BoxCox transformations, Fourier series with timevarying coefficients, and autoregressive moving average (ARMA) error correction. Table S2. BoxLjung Q and LM tests for the residual series of the SARIMA and TBATS models
Figure S5. Estimated autocorrelation function (ACF) and partial ACF (PACF) plots to display the statistical checking for the residual series of the SARIMA and TBATS models. (A) ACF plot for the SARIMA model. (B) PACF plot for the SARIMA model. (C) ACF plot for the TBATS model. (D) PACF plot for the TBATS model. We can see that the correlation coefficients at lags 11, 23, and 36 in Figure 2A and 2B, and at lag 19 in Figure 2C and 2D touched the confidence levels (this is reasonable as the correlation coefficients at higherorder may exceed the significance bounds by chance alone), but all other correlation coefficients at different lags failed to touch the confidence levels, seemingly suggesting effective and sufficient models.
Figure S6. The extracted level, slope, and seasonal components of the best TBATS method developed on the basis of the data from January 2004 to December 2017.
Lags SARIMA (0,1,0)(0,1,1)_{12} model TBATS(0.01, {0,0}, 0.898, {<12,5>}) model BoxLjung Q P LMtest P BoxLjung Q P LMtest P 1 0.024 0.878 0.000 0.993 0.018 0.892 44.923 < 0.001 3 2.353 0.503 2.196 0.533 9.428 0.024 17.323 < 0.001 6 4.529 0.605 4.287 0.638 27.713 < 0.001 18.742 0.005 9 7.720 0.563 6.063 0.734 35.665 < 0.001 22.380 0.008 12 15.292 0.226 9.747 0.638 39.404 < 0.001 22.715 0.030 15 15.349 0.427 13.651 0.552 48.268 < 0.001 22.695 0.091 18 15.980 0.594 21.290 0.265 53.640 < 0.001 25.463 0.113 21 18.952 0.588 21.569 0.425 70.249 < 0.001 24.902 0.251 24 24.734 0.420 30.292 0.175 82.013 < 0.001 24.863 0.413 27 27.525 0.436 26.218 0.507 89.634 < 0.001 26.398 0.497 30 27.536 0.595 27.588 0.592 97.136 < 0.001 27.268 0.609 33 29.788 0.628 31.192 0.557 107.120 < 0.001 29.080 0.663 36 42.128 0.223 37.860 0.384 113.480 < 0.001 29.872 0.754 Note. SARIMA seasonal autoregressive integrated moving average method, TBATS an advanced innovation statespace modelling framework by combining BoxCox transformations, Fourier series with timevarying coefficients and autoregressive moving average (ARMA) error correction. Table S3. The BoxLjung Q and LM tests for the best SARIMA (0,1,0)(0,1,1)_{12} model and TBATS (0.01, {0,0}, 0.898, {<12,5>}) model built based on the date from January 2004 to December 2018
Lags SARIMA (0,1,0)(0,1,1)_{12} model TBATS (0.048, {0,0}, 0.902, {<12,5>}) model BoxLjung Q P LMtest P BoxLjung Q P LMtest P 1 0.055 0.815 0.001 0.975 0.023 0.879 41.160 < 0.001 3 1.921 0.589 2.011 0.570 6.733 0.081 14.042 0.003 6 3.835 0.699 3.951 0.683 22.488 0.001 15.202 0.019 9 6.482 0.691 5.727 0.767 27.233 0.001 18.045 0.035 12 13.434 0.338 8.305 0.761 29.988 0.003 18.208 0.110 15 13.478 0.565 11.083 0.747 35.413 0.002 18.302 0.247 18 14.064 0.725 17.475 0.491 38.712 0.003 21.100 0.274 21 17.679 0.669 17.610 0.674 51.174 0.002 20.863 0.467 24 22.746 0.535 25.236 0.393 60.873 < 0.001 20.972 0.640 27 25.130 0.567 21.988 0.738 65.748 < 0.001 22.446 0.714 30 25.270 0.712 22.799 0.823 70.290 < 0.001 23.178 0.808 33 28.759 0.678 25.691 0.814 78.981 < 0.001 24.382 0.861 36 40.773 0.269 32.064 0.656 83.260 < 0.001 24.538 0.926 Note. SARIMA seasonal autoregressive integrated moving average method, TBATS, an advanced innovation statespace modelling framework by combining BoxCox transformations, Fourier series with timevarying coefficients and autoregressive moving average (ARMA) error correction. Table S4. The BoxLjung Q and LM tests for the best SARIMA (0,1,0)(0,1,1)_{12} model and TBATS (0.048, {0,0}, 0.902, {<12,5>}) model built based on the date from January 2004 to December 2016
Figure S7. Statistical checking for the residual series of the TBATS model developed on the basis of the data from January 2004 to December 2018. (A) Time series plot showing the residual series. (B) Autocorrelation function (ACF) plot. (C) Partial autocorrelation function (PACF) plot. The correlogram shows that although there are some residual ACF and PACF for the forecast errors that exceed the significance bounds, this is the best model used to estimate the upcoming 12month epidemics of scarlet fever.
Figure S10. The extracted level, slope, and seasonal components of the best TBATS method developed on the basis of the data from January 2004 to December 2016.
Similarly, following the SARIMA modelling steps, the optimal SARIMA models on different datasets were identified (Supplementary Tables S2–S5, and Supplementary Figures S5, S11–S12, available in www.besjournal.com). Subsequently, the best SARIMA and TBATS models could be used to perform multistep ahead predictions (Figure 1 and Supplementary Figures S13–S14, available in www.besjournal.com). Table 1 lists the measurement metrics, which indicate the forecasting reliability levels on different time windows under the preferred SARIMA and TBATS models. The optimal TBATS models provided a smaller MAD, MAPE, RMSE, RMSPE, and MER compared with the optimal SARIMA models, with a performance improvement of almost 50% in the forecasting abilities for estimating both short and longterm epidemiological trends, albeit the predictive potential showed a slight reduction with the increase in prediction time windows. We further compared the forecasting abilities of both methods for 48 and 72step ahead predictions, and the comparative results are listed in Supplementary Tables S1, S6–S7, and Supplementary Figures S15–S16 (available in www.besjournal.com), which show a similar finding, but the predictive results deviated from the epidemic trajectories. In addition, we used the SF incidence data in Liaoning, Heilongjiang, and Shandong provinces, and Inner Mongolia (which are the hardest hit areas by SF in China in the last decade ^{[2, 3]}) to assess the predictive quality of these two methods. Likewise, the TBATS method produced lower error rates in all the datasets (Supplementary Table S8, available in www.besjournal.com). Our recent study indicated that the ErrorTrendSeasonal (ETS) model also has a powerful potential in estimating the longterm epidemic behaviors of diseases ^{[4]}. As a result, we further developed the ETS model based on the SF morbidity to predict the epidemiological trends, and the results also showed similar findings (the computed MAPE values were 16.101% vs. 38.511%, 21.142% vs. 28.273%, and 23.984% vs. 26.735% in the 12, 24, and 36step ahead forecasts, respectively; Supplementary Table S9, available in www.besjournal.com). These findings further substantiated the utility of the TBATS model. The TBATS model was introduced by adding the trigonometric representation of seasonal components based on the Fourier series into the traditional BATS model, which enabled handling of all complex time series, as well as linear and nonlinear information ^{[5]}, thus indicating the suitability and adequacy of this model. Considering the attractive advantages of the TBATS model, this model can be recommended as a flexible and useful longterm predictive tool in assessing the epidemic patterns of SF in other countries or other contagious diseases; however, further work is required for validation. Moreover, with the rapid advances in the forecasting domain of time series, many hybrid prediction models (e.g., SARIMABPNN, SARIMAGRNN, and SARIMALSTM) have also been reported to show an attractive advantage in estimating the longterm epidemic trajectories of diseases. Therefore, what is now needed are studies involving comparisons of the predictive reliable level between the TBATS model and the abovementioned models.
Figure 1. Comparative results of the forecasts based on the SARIMA and TBATS models. (A) The comparison between the 24step ahead forecasts of the SARIMA model and the observed values. (B) The predicted upcoming 36month values from January 2020 to December 2022 using the SARIMA model. (C) The comparison between the 24step ahead forecasts of the TBATS model and the observed values. (D) The predicted upcoming 36month values from January 2020 to December 2022 using the TBATS model.
Models Testing horizons MAD MAPE RMSE MER RMSPE 24step ahead predictions SARIMA 0.295 64.346 0.382 0.607 0.551 TBATS 0.114 21.142 0.160 0.235 0.061 Reduced percentage (%) SARIMA vs. TBATS 61.356 67.143 58.115 61.285 88.929 12step ahead predictions SARIMA 0.212 42.951 0.257 0.429 0.467 TBATS 0.087 16.101 0.113 0.177 0.193 Reduced percentage (%) SARIMA vs. TBATS 58.962 62.513 56.031 58.741 58.672 36step ahead predictions SARIMA 0.258 55.527 0.396 0.546 0.758 TBATS 0.133 23.984 0.174 0.282 0.271 Reduced percentage (%) SARIMA vs. TBATS 48.450 56.807 56.061 48.352 64.248 Note. SARIMA, seasonal autoregressive integrated moving average method; TBATS, an advanced innovation statespace modelling framework by combining BoxCox transformations, Fourier series with timevarying coefficients and autoregressive moving average (ARMA) error correction; MAD, mean absolute deviation; MAPE, mean absolute percentage error; RMSE, root mean square error; MER, mean error rate; RMSPE, root mean square percentage error. Table 1. The comparisons of the predicted results between the SARIMA model and the TBATS model on different testing datasets
Variables Estimate S.E. t P Stationary R^{2} R^{2} AIC BIC LjungBox Q _{(18)} Statistics P SARIMA (0,1,0)(0,1,1)_{12}method developed on the basis of the data from January 2004 to December 2018 SMA1 0.782 0.072 10.873 < 0.001 0.300 0.944 −6.159 −6.146 15.980 0.525 SARIMA (0,1,0)(0,1,1)_{ 12} method developed on the basis of the data from January 2004 to December 2017 SMA1 0.795 0.075 10.600 < 0.001 0.292 0.935 −6.166 −6.152 13.663 0.691 SARIMA (0,1,0)(0,1,1)_{ 12} method developed on the basis of the data from January 2004 to December 2016 SMA1 0.771 0.084 9.230 < 0.001 0.288 0.922 −6.162 −6.127 14.064 0.663 Note. SARIMA seasonal autoregressive integrated moving average method, S.E. standard error, AIC Akaike's Information Criterion, BIC Schwarz's Bayesian Information Criterion, SMA1 seasonal moving average method at lag 1. Table S5. The obtained best SARIMA models with their key parameters on the basis of different datasets
Figure S13. Comparative results of the forecasts based on the SARIMA and TBATS models. (A) The comparison between the 12step ahead forecasts of the SARIMA model and the observed values. (B) The comparison between the 12step ahead forecasts of the TBATS model and the observed values.
Figure S14. Comparative results of the forecasts based on the SARIMA and TBATS models. (A) The comparison between the 36step ahead forecasts of the SARIMA model and the observed values. (B) The comparison between the 36step ahead forecasts of the TBATS model and the observed values.
Figure S15. Comparative results of the forecasts based on the SARIMA and TBATS models. (A) The comparison between the 48step ahead forecasts of the SARIMA model and the observed values. (B) The comparison between the 48step ahead forecasts of the TBATS model and the observed values.
Figure S16. Comparative results of the forecasts based on the SARIMA and TBATS models. (A) The comparison between the 72step ahead forecasts of the SARIMA model and the observed values. (B) The comparison between the 72step ahead forecasts of the TBATS model and the observed values.
Sites Models Testing horizons MAD MAPE RMSE MER Liaoning province 12stepahead forecast SARIMA (0,1,0)(0,1,1)_{12} 0.681 8.983 0.858 0.697 TBATS (0.073, {4,0}, 0.888, {<12,4>}) 0.358 5.830 0.432 0.367 Percentage reductions (%) TBATS vs. SARIMA 47.430 35.100 49.650 47.346 24stepahead forecast SARIMA (0,1,0)(0,1,1)_{12} 0.330 42.453 0.509 0.326 TBATS(0.121, {0,0}, 0.868, {<12,4>}) 0.262 25.144 0.372 0.259 Percentage reductions (%) TBATS vs. SARIMA 20.606 40.772 26.916 20.552 36stepahead forecast SARIMA (0,1,0)(0,1,1)_{12} 2.303 275.181 3.207 2.393 TBATS (0.052, {0,0}, 0.875, {<12,5>}) 0.224 26.448 0.305 0.233 Percentage reductions (%) TBATS vs. SARIMA 90.274 90.389 90.490 90.263 Heilongjiang province 12stepahead forecast SARIMA (0,1,0)(0,1,1)_{12} 0.227 51.808 0.262 0.383 TBATS (0.151, {0,0}, 0.889, {<12,5>}) 0.130 32.657 0.167 0.219 Percentage reductions (%) TBATS vs. SARIMA 42.731 36.965 36.260 42.820 24stepahead forecast SARIMA (0,1,0)(0,1,1)_{12} 0.402 74.176 0.562 0.655 TBATS (0.17, {0,0}, 0.889, {<12,5>}) 0.119 25.819 0.162 0.194 Percentage reductions (%) TBATS vs. SARIMA 70.398 65.192 71.174 70.382 36stepahead forecast SARIMA (0,1,0)(0,1,1)_{12} 1.448 253.072 2.114 2.357 TBATS(0.161, {0,0}, 0.894, {<12,4>}) 0.155 20.235 0.252 0.446 Percentage reductions (%) TBATS vs. SARIMA 89.296 92.004 88.079 81.078 Shandong province 12stepahead forecast SARIMA (0,1,1)(0,1,1)_{12} 0.236 39.823 0.282 0.319 TBATS(0, {0,0}, 0.932, {<12,5>}) 0.175 19.390 0.280 0.237 Percentage reductions (%) TBATS vs. SARIMA 25.847 51.310 0.709 25.705 24stepahead forecast SARIMA (0,1,1)(0,1,1)_{12} 0.509 76.930 0.670 0.717 TBATS (0, {0,0}, 0.932, {<12,5>}) 0.168 19.222 0.268 0.237 Percentage reductions (%) TBATS vs. SARIMA 66.994 75.014 60.000 66.946 36stepahead forecast SARIMA (0,1,1)(0,1,1)_{12} 1.683 246.865 2.376 2.491 TBATS (0, {0,0}, 0.926, {<12,5>}) 0.312 43.277 0.401 0.461 Percentage reductions (%) TBATS vs. SARIMA 81.462 82.469 83.123 81.493 Inner Mongolia 12stepahead forecast SARIMA (1,0,2)(0,1,1)_{12} 0.401 51.026 0.496 0.397 TBATS (0.232, {0,0}, , {<12,4>}) 0.130 32.657 0.167 0.219 Percentage reductions (%) TBATS vs. SARIMA 67.581 35.999 66.331 44.836 24stepahead forecast SARIMA (1,0,2)(0,1,1)_{12} 0.319 38.984 0.418 0.308 TBATS (0.232, {0,0}, , {<12,4>}) 0.166 16.508 0.236 0.161 Percentage reductions (%) TBATS vs. SARIMA 47.962 57.654 43.541 47.727 36stepahead forecast SARIMA (1,0,2)(0,1,1)_{12} 0.484 56.502 0.573 0.494 TBATS (0.21, {0,0}, , {<12,4>}) 0.158 17.002 0.224 0.161 Percentage reductions (%) TBATS vs. SARIMA 67.355 69.909 60.908 67.409 Note. SARIMA seasonal autoregressive integrated moving average method, TBATS an advanced innovation statespace modelling framework by combining BoxCox transformations, Fourier series with timevarying coefficients and autoregressive moving average (ARMA) error correction, MAD mean absolute deviation, MAPE mean absolute percentage error, RMSE root mean square error, MER mean error rate. Table S8. The comparisons of the predicted results between the SARIMA model and the TBATS model on different testing datasets
Models Testing power MAD MAPE RMSE MER RMSPE 12data ahead forecasting SARIMA 144.734 25.049 161.671 0.203 0.296 TBATS 91.799 14.772 123.653 0.129 0.193 ETS 148.635 25.835 177.092 0.209 0.325 Decreased percentage (%) TBATS vs. SARIMA 36.574 41.028 23.516 36.453 34.797 TBATS vs. ETS 38.239 42.822 30.176 38.278 40.615 24data ahead forecasting SARIMA 357.734 53.753 406.278 0.460 0.625 TBATS 260.619 38.407 301.441 0.335 0.469 ETS 279.014 44.604 307.731 0.359 0.536 Decreased percentage (%) TBATS vs. SARIMA 27.147 28.549 25.804 27.174 24.960 TBATS vs. ETS 6.593 13.893 2.044 6.685 12.500 36data ahead forecasting SARIMA 290.782 42.156 365.570 0.334 0.560 TBATS 204.151 28.685 254.418 0.235 0.387 ETS 297.359 44.118 347.689 0.342 0.564 Decreased percentage (%) TBATS vs. SARIMA 29.792 31.955 30.405 29.641 30.893 TBATS vs. ETS 31.345 34.981 26.826 31.287 31.383 60data ahead forecasting SARIMA 224.931 28.874 310.043 0.258 0.424 TBATS 183.780 21.558 235.665 0.211 0.250 ETS 146.437 16.449 212.028 0.168 0.223 Decreased percentage (%) TBATS vs. SARIMA 18.295 25.338 23.990 18.217 41.038 TBATS vs. ETS 20.319 23.699 10.030 20.379 10.800 84data ahead forecasting SARIMA 931.622 134.915 1016.813 1.040 1.679 TBATS 167.432 20.081 204.172 0.187 0.240 ETS 183.777 21.010 227.113 0.205 0.250 Decreased percentage (%) TBATS vs. SARIMA 82.028 85.116 79.920 82.019 85.706 TBATS vs. ETS 8.894 4.422 10.101 8.780 4.000 108data ahead forecasting SARIMA 1483.868 202.729 1588.876 1.560 2.500 TBATS 208.775 22.849 287.536 0.220 0.310 ETS 268.726 30.177 354.045 0.283 0.386 Decreased percentage (%) TBATS vs. SARIMA 85.930 88.729 81.903 85.897 87.600 TBATS vs. ETS 22.309 24.283 18.785 22.261 19.689 Note. SARIMA seasonal autoregressive integrated moving average method, TBATS an advanced innovation statespace modeling framework by combining BoxCox transformations, Fourier series with timevarying coefficients and autoregressive moving average error correction, ETS ErrorTrendSeasonal framework, MAD mean absolute deviation, MAPE mean absolute percentage error, RMSE root mean square error, MER mean error rate, RMSPE root mean square percentage error. Table S9. Comparisons of the outofsample forecasting powers between TBATS models and SARIMA models, along with ETS models
This study had some limitations. First, SF is a mild illness and has rarely led to death since the 20^{th} century ^{[3]}. Therefore, infected individuals with mild clinical manifestations sometimes fail to seek medical aid, resulting in underreporting and underdiagnosis. Second, to ensure that the model obtained a satisfactory forecasting result, it is important to note that this model should be updated with new incidence data. Third, to investigate whether our TBATS model was adequate for estimating the SF epidemics in other study regions or other infectious diseases, much work is still needed. Finally, integrating the factors influencing the SF epidemics may improve the predictive power. Nevertheless, we are not able to perform such an analysis due to the unavailability of a multivariate TBATS method and SFrelated factors.
In summary, SF had dual seasonal behaviors, peaking in May–June and November–December, with a recurrence in 2011 in China; since then it started to be increasing in the SF incidence. The TBATS method was advantageous in analyzing the longterm epidemiological seasonality and trends of SF, which can be considered a useful and flexible alternative to aid stakeholders to develop practical solutions to stop the ongoing spread of SF in China. In addition, we reestablished the preferred TBATS (0.023, {0,0}, 0.895, {<12,5>}) specification based on the 16 years of data to predict the SF incidence into 2022, although the SF incidence was predicted to reach a plateau in the next 3 years [Supplementary Tables S1 and S10 (available in www.besjournal.com), and Figure 1], the SF incidence remained at a high level, suggesting that additional or comprehensive interventions must be developed to manage this evolving scenario.
Time Forecasts Lower 95% CI Upper 95% CI Time Forecasts Lower 95% CI Upper 95% CI Jan20 0.479 0.330 0.695 Jul21 0.426 0.147 1.204 Feb20 0.224 0.143 0.349 Aug21 0.220 0.073 0.647 Mar20 0.380 0.229 0.627 Sep21 0.250 0.081 0.751 Apr20 0.627 0.360 1.085 Oct21 0.412 0.132 1.249 May20 0.843 0.463 1.521 Nov21 0.616 0.195 1.890 Jun20 0.930 0.489 1.750 Dec21 0.816 0.254 2.543 Jul20 0.443 0.221 0.879 Jan22 0.434 0.130 1.400 Aug20 0.228 0.108 0.474 Feb22 0.204 0.059 0.686 Sep20 0.259 0.119 0.557 Mar22 0.351 0.100 1.186 Oct20 0.424 0.190 0.935 Apr22 0.584 0.166 1.988 Nov20 0.632 0.275 1.429 May22 0.791 0.222 2.722 Dec20 0.835 0.354 1.938 Jun22 0.878 0.242 3.073 Jan21 0.443 0.180 1.069 Jul22 0.421 0.111 1.533 Feb21 0.208 0.081 0.524 Aug22 0.218 0.055 0.823 Mar21 0.357 0.137 0.914 Sep22 0.248 0.062 0.953 Apr21 0.593 0.223 1.543 Oct22 0.409 0.102 1.574 May21 0.801 0.296 2.126 Nov22 0.612 0.151 2.369 Jun21 0.889 0.320 2.412 Dec22 0.811 0.199 3.172 Note. TBATS an advanced innovation statespace modelling framework by combining BoxCox transformations, Fourier series with timevarying coefficients and autoregressive moving average (ARMA) error correction, CI, confidence interval. Table S10. Projection into the future 36 months from January 2020 to December 2022 using the best TBATS (0.023, {0,0}, 0.895, {<12,5>}) model
Data Availability All the data supporting the findings of the work are contained within the Supplementary Material (Supplementary Table S11, available in www.besjournal.com).
Figure S8. The extracted level, slope, and seasonal components of the best TBATS method developed on the basis of the data from January 2004 to December 2018.
Figure S9. Statistical checking for the residual series of the TBATS model developed on the basis of the data from January 2004 to December 2016. (A) Time series plot showing the residual series. (B) Autocorrelation function (ACF) plot. (C) Partial autocorrelation function (PACF) plot. The correlogram shows that although there are some residual ACF and PACF for the forecast errors that exceed the significance bounds, this is the best model used to estimate the upcoming 36month epidemics of scarlet fever.
Figure S11. Statistical checking for the residual series of the SSARIMA model developed on the basis of the data from January 2004 to December 2018. (A) Time series plot showing the residual series. (B) Autocorrelation function (ACF) plot. (C) Partial autocorrelation function (PACF) plot. The correlogram shows that the ACF and PACF for the forecast errors do not exceed the significance bounds except for lags 11, 23, and 36, indicating that there is little evidence of nonzero autocorrelations at lags 1–36.
Figure S12. Statistical checking for the residual series of the SARIMA model developed on the basis of the data from January 2004 to December 2016. (A) Time series plot showing the residual series. (B) Autocorrelation function (ACF) plot. (C) Partial autocorrelation function (PACF) plot. The correlogram shows that the ACF and PACF for the forecast errors do not exceed the significance bounds except for lags 11 and 36, indicating that there is little evidence of nonzero autocorrelations at lags 1–36.
Continued Parameters 48step ahead forecasting 72step ahead forecasting Future 36step ahead forecasting TBATS (0.037, {0,0}, 0.901, {<12,5>}) TBATS (0.029, {0,0}, 0.87, {<12,4>}) TBATS (0.023, {0,0}, 0.895, {<12,5>}) Seed states 1 −2.3577 −2.3794 −2.3993 2 0.1062 0.1064 0.1117 3 −0.0314 −0.0546 −0.0067 4 −0.0815 −0.1004 −0.0690 5 −0.0103 −0.0196 −0.0068 6 0.0220 0.0160 0.0300 7 0.0268 0.1411 0.0336 8 0.1425 −0.5940 0.1460 9 −0.5902 −0.0762 −0.6235 10 −0.0647 −0.1476 −0.0634 11 −0.1437 − −0.1522 12 −0.0212 − −0.0232 Sigma 0.1868 0.1947 0.1871 AIC −197.7402 −203.9513 −160.8382 Note. TBATS, an advanced innovation statespace modelling framework by combining BoxCox transformations, Fourier series with timevarying coefficients and autoregressive moving average (ARMA) error correction, AR autoregressive method, AIC Akaike's Information Criterion. Variables Estimate S.E. t P Stationary R^{2} R^{2} AIC BIC LjungBox Q (18) Statistics P SARIMA (0,1,0)(0,1,1)_{12} method developed on the basis of the data from January 2004 to December 2015 SMA1 0.790 0.094 8.394 < 0.001 0.277 0.915 −6.086 −6.072 12.564 0.765 SARIMA (0,1,0)(0,1,1)_{ 12} method developed on the basis of the data from January 2004 to December 2013 SMA1 0.813 0.135 6.025 < 0.001 0.276 0.883 −6.055 −6.032 12.510 0.768 Note. SARIMA seasonal autoregressive integrated moving average method, S.E. standard error, AIC Akaike's Information Criterion, BIC Schwarz's Bayesian Information Criterion, SMA1 seasonal moving average method at lag 1. Table S6. The obtained best SARIMA methods with their key parameters on the basis of different datasets
Models Testing horizons MAD MAPE RMSE MER RMSPE 48step ahead predictions SARIMA 1.186 252.348 1.850 2.670 3.440 TBATS 0.098 18.307 0.139 0.220 0.222 Reduced percentage (%) SARIMA vs. TBATS 91.737 92.745 92.486 91.760 93.547 72step ahead predictions SARIMA 0.800 171.381 1.511 1.901 2.810 TBATS 0.207 46.042 0.248 0.491 0.477 Reduced percentage (%) SARIMA vs. TBATS 74.125 73.135 83.587 74.171 83.025 Note. SARIMA seasonal autoregressive integrated moving average method, TBATS an advanced innovation statespace modelling framework by combining BoxCox transformations, Fourier series with timevarying coefficients and autoregressive moving average (ARMA) error correction, MAD mean absolute deviation, MAPE mean absolute percentage error, RMSE root mean square error, MER mean error rate, RMSPE root mean square percentage error. Table S7. The comparisons of the predicted results between the SARIMA model and the TBATS model on different testing datasets
Date Incidence Date Incidence Date Incidence Date Incidence Date Incidence Date Incidence 012004 0.0298 092006 0.101 052009 0.2623 012012 0.3294 092014 0.1772 052017 0.8192 022004 0.0452 102006 0.1647 062009 0.2782 022012 0.1675 102014 0.3011 062017 0.8006 032004 0.0945 112006 0.2809 072009 0.1508 032012 0.2466 112014 0.5149 072017 0.3811 042004 0.1639 122006 0.3291 082009 0.0821 042012 0.3281 122014 0.6298 082017 0.1711 052004 0.1809 012007 0.231 092009 0.0946 052012 0.5325 012015 0.499 092017 0.2162 062004 0.1896 022007 0.095 102009 0.0988 062012 0.5018 022015 0.2079 102017 0.3078 072004 0.113 032007 0.1356 112009 0.0798 072012 0.2523 032015 0.2756 112017 0.5717 082004 0.0619 042007 0.2474 122009 0.0961 082012 0.1047 042015 0.4342 122017 0.7641 092004 0.0742 052007 0.3435 012010 0.069 092012 0.1323 052015 0.6642 012018 0.5421 102004 0.103 062007 0.39 022010 0.0353 102012 0.2071 062015 0.7269 022018 0.1547 112004 0.2033 072007 0.1992 032010 0.0636 112012 0.3264 072015 0.3765 032018 0.2705 122004 0.2299 082007 0.0932 042010 0.1008 122012 0.3741 082015 0.1568 042018 0.4862 012005 0.1686 092007 0.1122 052010 0.1692 012013 0.2437 092015 0.2076 052018 0.7702 022005 0.0582 102007 0.1695 062010 0.1788 022013 0.0762 102015 0.3096 062018 0.768 032005 0.1182 112007 0.2543 072010 0.1138 032013 0.1536 112015 0.5007 072018 0.3859 042005 0.2182 122007 0.3153 082010 0.0714 042013 0.2258 122015 0.6459 082018 0.1602 052005 0.2504 012008 0.193 092010 0.0872 052013 0.3381 012016 0.4449 092018 0.1967 062005 0.2707 022008 0.0724 102010 0.1149 062013 0.3106 022016 0.15 102018 0.3738 072005 0.1344 032008 0.1655 112010 0.2399 072013 0.1756 032016 0.2843 112018 0.7114 082005 0.0801 042008 0.2286 122010 0.3053 082013 0.0858 042016 0.3584 122018 0.9025 092005 0.0891 052008 0.367 012011 0.2333 092013 0.1077 052016 0.5771 012019 0.6313 102005 0.1296 062008 0.2816 022011 0.0742 102013 0.158 062016 0.5923 022019 0.1851 112005 0.2129 072008 0.1343 032011 0.2074 112013 0.2689 072016 0.3078 032019 0.3657 122005 0.2361 082008 0.0688 042011 0.3728 122013 0.3929 082016 0.1396 042019 0.4974 012006 0.1239 092008 0.087 052011 0.6908 012014 0.2568 092016 0.1898 052019 0.649 022006 0.0645 102008 0.129 062011 0.7253 022014 0.0899 102016 0.2734 062019 0.718 032006 0.1345 112008 0.1831 072011 0.3839 032014 0.2113 112016 0.4409 072019 0.4188 042006 0.2117 122008 0.2363 082011 0.1614 042014 0.3171 122016 0.5504 082019 0.1711 052006 0.2406 012009 0.1079 092011 0.2088 052014 0.52 012017 0.3333 092019 0.2481 062006 0.2701 022009 0.0625 102011 0.369 062014 0.5525 022017 0.168 102019 0.3839 072006 0.1454 032009 0.1393 112011 0.63 072014 0.2832 032017 0.3394 112019 0.7296 082006 0.0768 042009 0.2292 122011 0.7196 082014 0.1336 042017 0.4895 122019 0.9323 Table 11. The SF incidence time series data from January 2004 to December 2019
HTML
21432Supplementary Materials.pdf 