Now- and Fore-casting the Secular Epidemiological Trends and Seasonality of the Comeback of Scarlet Fever in China: A 16-year Time Series Analysis

WANG Yong Bin; LI Yan Yan; LU Hao; TAO Ying Jun; LI Yu Hong; WANG Lei; LIANG Wen Juan

doi:10.3967/bes2022.076

HTML

Scarlet fever (SF) is a common communicable disease that results from group A Streptococcus (GAS) infections^[1]. SF accounted for the global loss of life among children 5–15 years of age in the 18th and 19th centuries^[2]. A rapid reduction in SF morbidity and mortality occurred due to the scale-up of effective antibiotics and improvements in sanitation and nutrition^[3]. The unexpected increase in the incidence of SF has attracted a renewed interest in infectious diseases^[3]. Because the triggers that cause SF outbreaks are not fully understood and there is a scarcity of available vaccines protecting susceptible populations from GAS infections, effective prevention and control plans are required to stop the continued spread of SF.

Time series analysis assists in the development of hypotheses to explain the temporal patterns of different diseases and to analyze the spread, therefore, facilitating the creation of a quality forecasting system. The seasonal autoregressive integrated moving average (SARIMA) model has been widely applied to estimate the epidemiological patterns of contagious diseases because this model has a simple structure, fast applicability, and a relatively high forecasting reliability level ^[4]. It has been shown that the SARIMA model is able to satisfactorily estimate a simple time series ^[4], but it is difficult to manage complex time series, such as the data with multiple seasonal periods, high-frequency seasonality, non-integer seasonality, and dual-calendar effects. By comparison, the innovation state-space modelling framework that combines Box-Cox transformations, Fourier series with time-varying coefficients, and autoregressive moving average (ARMA) error correction (known as the TBATS method) is customized for use with the patterns included in a complex time series described above ^[5]. In addition, the TBATS model is used for linearity and some types of non-linearity in a complex time series based on Box-Cox transformations ^[5], which makes it possible to perform a multistep ahead prediction. Moreover, the TBATS model is able to decompose a complex seasonal time series into its trend, seasonal, and irregular components ^[5], which is not able to be undertaken by use of the SARIMA model. Importantly, SF morbidity has been shown to display dual seasonal patterns in some countries. Therefore, this study analyzed the long-term epidemic patterns using the TBATS model. The forecasting power under the TBATS model was compared with the SARIMA model.

We obtained the monthly SF incidence and population data between January 2004 and December 2019 from the Chinese CDC and the Statistical Yearbook of China, respectively. Then, we partitioned the SF morbidity series into two segments comprising a training dataset from January 2004 to December 2017 to construct the SARIMA and TBATS models and a testing dataset from January 2018 to December 2019 to test the generalization of both models. Two additional datasets were provided to test the robustness of both models: the first 180 data sets from January 2004 to December 2018 and 156 data sets from January 2004 to December 2016 were treated as training datasets, respectively; and the remainder were testing datasets. We estimated the changing epidemiologic trends of SF based on annual percentage change (APC) and average APC (AAPC) using the Joinpoint regression program (version 4.8.0.1). We constructed the TBATS model with the “forecast,” “tseries,”, and “FinTS” packages in R software (version 3.4.3). The incidence rate ratio (IRR) with a 95% uncertainty limit (UL) before and after the SF outbreak was calculated using the method proposed by Armitage and Berry ^[6]. The mean absolute deviation (MAD), root mean square error (RMSE), mean absolute percentage error (MAPE), mean error rate (MER), and root mean square percentage error (RMSPE) were computed to compare the forecasting ability of both models.

During the study period, a notable increase in SF morbidity was detected, with an AAPC = 8.942 (95% UL: 5.995–11.971; t = 6.697, P < 0.001; Supplementary Figure S1, available in www.besjournal.com), and the highest morbidity (5.930/100,000 persons) occurred in 2019, which increased by a factor of nearly four compared with the lowest level (1.489/100,000 population) in 2004 (Supplementary Figure S2, available in www.besjournal.com). The SF epidemics remained relatively steady from 2004–2010 (average, 1.937/100,000 persons annually), with an AAPC = −0.840 (95% UL: −7.678 to 6.505; t = −0.231, P = 0.817). An unexpected outbreak was witnessed in 2011, and since then a rapidly increasing trend occurred (average, 4.580/100,000 persons annually [excluding 2013]; the exact causes regarding this annual drop are unknown; Supplementary Figure S2), with an AAPC = 5.952 (95% UL: 0.239–11.992; t = 2.466, P = 0.043), which showed good agreement with a resurgence of SF in Hong Kong, China^[7], but inconsistent with England where the resurgence occurred in 2014 ^[8]. There has been a doubling in the SF incidence during the post-resurgence periods compared with the pre-resurgence periods (IRR = 2.364, 95% UL: 2.358–2.370). Nevertheless, the driving force associated with the increased pathogenicity of GAS fails to be elucidated. A possible explanation may be due to the acquisition of novel prophages harboring new hybridizations of toxin genes and antimicrobial resistance genes, which is related to the emergence and expansion of the predominant genotypes of emm12 and emm1 in China ^[3]. Another explanation may be associated with the natural periodicity of the SF incidence (SF epidemics are characterized by a cyclic change of approximately 6 years ^[3]). A third explanation may be linked to the relaxation of the 2-child policy in 2011 ^[3], which led to an increase in the number of susceptible individuals. A fourth reason may be attributed to improvements in the diagnostic capacity and the increased awareness of medical workers in reporting SF ^[3]. A fifth reason may be a result of the deterioration of air quality in China ^[9], despite the gradual improvements in the last 2 years. Finally, there are no vaccines available to prevent infections with GAS until now.

Figure S1. Joinpoint regression showing the scarlet fever epidemic trends in the period 2004–2019. *Showed that the annual percent change (APC) is significantly different from zero at the significance level. This plot was plotted using Joinpoint regression program (Version 4.8.0.1). Joinpoint is a statistical tool for the description and analysis of trends with the Joinpoint methods. This Joinpoint Regression Program is designed as a Windows-based statistical software package to analyze Joinpoint methods. Users can use this software to investigate whether there is a statistically notable change in trend (Available from https://surveillance.cancer.gov/joinpoint/. Accessed on 16 May 2022).

Figure S2. The yearly incidence of SF from 2004 to 2019. It was seen that SF had the lowest incidence in 2004 (1.489 per 100,000 population) and the highest morbidity in 2019 (5.930 per 100,000 persons).

A marked semi-annual seasonal behavior occurred in the monthly SF incidence, with a strong peak between May and June, and a weak peak between November and December (Supplementary Figures S3–S4, available in www.besjournal.com). We surmised that different climatic features and beginning of spring and autumn semesters contributed to this difference in the margin of peak activities. Our seasonal profile correlates well with previous findings from Hong Kong, China ^[7]; however, discordant with that in England, which peaked between February and March ^[8]. This inconsistency may be due to the different school breaks, population density, and different GAS emm gene types in east Asia and Europe ^{[1, 8]}. In addition, the SF epidemics retain the lowest level in February every year (Supplementary Figure S3), attributable to the winter holidays and the Spring Festival.

Figure S3. Time series plot showing the monthly SF incidence rate and the decomposed seasonal, trend, and irregular components based on the STL technique. (A) Original monthly SF morbidity series. (B) Decomposed trend component. (C) Decomposed seasonal component. (D) Decomposed random component. As illustrated, overall the SF morbidity showed a rising tendency, with a sudden increase in 2011, and having notable seasonal patterns.

Figure S4. The decomposed seasonal factors for the scarlet fever morbidity series using the multiplicative decomposition method (The classical Seasonal Decomposition technique can be applied to decompose a time series into a seasonal pattern, a trend component, and a random fluctuation component. This technique is implemented based on the ratio-to-moving-average method. The Seasonal Decomposition technique can be divided into two types including a multiplicative and an additive method. For a multiplicative decomposition method, the seasonal component is a factor by which the seasonally adjusted series is multiplied to produce original observed values. For a time series without seasonal variation, the seasonal pattern is deemed as one. For an additive decomposition method, the seasonal adjustments are added to the seasonally adjusted series to yield the original series. For the morbidity series of infectious diseases, the multiplicative decomposition method typically produces a more reliable decomposed result in application. By employing a multiplicative decomposition method to an object series, the seasonal pattern or seasonal factors included in the object series can be obtained. Such a decomposition is very helpful in removing the seasonal effect from a series so that we can visibly look into the other characteristics of interest in the series that may be masked by the seasonal trait^[7,8]).

The forecasts under the TBATS approach rely largely on the number of harmonics k_i applied for each seasonal pattern. As a result, in selecting the number of harmonics k_i, considering one seasonal component each time, we then fitted the model on the target data repeatedly via gradually increasing the number of harmonics k_i, but holding the remaining harmonics constant for each i until the optimal AIC is obtained. In determining the most suitable orders (p and q) of the ARMA model, we used the automatic procedure proposed by Hyndman and colleagues^[10] to fit the forecasting residuals. If the selected model with the ARMA (p, q) residual component generates a smaller AIC than the one model without the ARMA (p, q) residual component, this selected specification would be considered as the best possible model; otherwise, the ARMA (p, q) residual component is deleted. After modelling by trial and error, the TBATS (0.04, {4,0}, 0.882, {<12,5>}) specification was selected as the optimal model in that the minimum AIC (−197.965) was detected in this model, and the identified key parameters of this best TBATS model are reported in Supplementary Table S1, available in www.besjournal.com. Additional statistical diagnoses for the forecast errors are provided in Supplementary Table S2 and Supplementary Figures S5–S6, available in www.besjournal.com. The Ljung-Box Q statistics of the forecast errors produced a Q₍₁₈₎ = 11.442 with a P-value of 0.875, indicating no serial correlations in this residual series. Moreover, the ARCH effect was largely removed because the LM₍₁₈₎ = 23.808 with a P-value of 0.302. These results confirmed the adequacy of the model specifications. Similarly, based on the modelling steps described above, the TBATS (0.01, {0,0}, 0.898, {<12,5>}) and TBATS (0.048, {0,0}, 0.902, {<12,5>}) specifications tended to be the preferred models for forecasting the 12- and 36-holdout periods (Supplementary Tables S3–S4 and Supplementary Figures S7–S10, available in www.besjournal.com).

Parameters	24-step ahead forecasting	12-step ahead forecasting	36-step ahead forecasting
Parameters	TBATS (0.04, {4,0}, 0.882, {<12,5>}) model	TBATS (0.01, {0,0}, 0.898, {<12,5>}) model	TBATS (0.048, {0,0}, 0.902, {<12,5>}) model
Lambda	0.0401	0.0105	0.0480
Alpha	0.1652	0.6489	0.6795
Beta	−0.0252	−0.0362	−0.0432
Damping Parameter	0.8818	0.8982	0.9024
Gamma-1 Values	0.0002	−0.0046	−0.0048
Gamma-2 Values	−1.2682	0.0050	0.0072
AR coefficients	AR1 = 0.4772, AR2 = 0.2318, AR3 = −0.1894, AR4 = 0.3222	−	−
Seed states
1	−2.3530	−2.4373	−2.3215
2	0.1089	0.1130	0.1015
3	−0.0246	−0.0185	−0.0251
4	−0.0748	−0.0757	−0.0731
5	−0.0069	−0.0077	−0.0065
6	0.0213	0.0266	0.0207
7	0.0296	0.0312	0.0279
8	−0.1547	0.1489	0.1353
9	−0.5953	−0.6307	−0.5756
10	−0.0602	−0.0649	−0.0598
11	0.1463	−0.1521	−0.1404
12	−0.0231	−0.0248	−0.0218
13	0.0000	−	−
14	0.0000	−	−
15	0.0000	−	−
16	0.0000	−	−
Sigma	0.1686	0.1909	0.1797
AIC	−197.9650	−175.3322	−198.4532
Parameters	48-step ahead forecasting	72-step ahead forecasting	Future 36-step ahead forecasting
Parameters	TBATS (0.037, {0,0}, 0.901,{<12,5>})	TBATS (0.029, {0,0}, 0.87,{<12,4>})	TBATS (0.023, {0,0}, 0.895, {<12,5>})
Lambda	0.0368	0.0292	0.0225
Alpha	0.6900	0.7253	0.6444
Beta	−0.0410	−0.1128	−0.0378
Damping Parameter	0.9013	0.8702	0.8955
Gamma-1 Values	−0.0069	−0.0115	−0.0050
Gamma-2 Values	0.0052	0.0035	0.0037
AR coefficients	−	−	−

Table S1. The best TBATS models obtained on the basis of different datasets and their key parameters

Lags	SARIMA model				TBATS model
Lags	Box-Ljung Q	P	LM-test	P	Box-Ljung Q	P	LM-test	P
1	0.044	0.833	0.001	0.979	0.005	0.943	52.651	< 0.001
3	1.944	0.584	2.364	0.500	0.235	0.972	22.234	< 0.001
6	3.720	0.715	4.223	0.647	1.246	0.975	23.555	0.001
9	6.093	0.731	5.981	0.742	4.538	0.873	23.624	0.005
12	13.021	0.368	9.427	0.666	7.320	0.836	23.416	0.024
15	13.094	0.595	12.379	0.650	10.456	0.790	23.533	0.073
18	13.663	0.751	19.872	0.340	11.442	0.875	24.550	0.138
21	17.313	0.692	20.262	0.505	19.031	0.583	23.808	0.302
24	23.237	0.506	29.352	0.207	23.061	0.516	24.015	0.461
27	26.060	0.515	25.492	0.547	23.466	0.660	23.924	0.635
30	26.082	0.671	26.422	0.653	25.230	0.714	24.918	0.729
33	29.133	0.660	29.863	0.624	29.004	0.667	24.941	0.842
36	41.417	0.246	37.230	0.412	30.707	0.718	24.391	0.929
Note. SARIMA seasonal autoregressive integrated moving average method, TBATS an advanced innovation state-space modelling framework by combining Box-Cox transformations, Fourier series with time-varying coefficients, and autoregressive moving average (ARMA) error correction.

Table S2. Box-Ljung Q and LM tests for the residual series of the SARIMA and TBATS models

Figure S5. Estimated autocorrelation function (ACF) and partial ACF (PACF) plots to display the statistical checking for the residual series of the SARIMA and TBATS models. (A) ACF plot for the SARIMA model. (B) PACF plot for the SARIMA model. (C) ACF plot for the TBATS model. (D) PACF plot for the TBATS model. We can see that the correlation coefficients at lags 11, 23, and 36 in Figure 2A and 2B, and at lag 19 in Figure 2C and 2D touched the confidence levels (this is reasonable as the correlation coefficients at higher-order may exceed the significance bounds by chance alone), but all other correlation coefficients at different lags failed to touch the confidence levels, seemingly suggesting effective and sufficient models.

Figure S6. The extracted level, slope, and seasonal components of the best TBATS method developed on the basis of the data from January 2004 to December 2017.

Lags	SARIMA (0,1,0)(0,1,1)₁₂ model				TBATS(0.01, {0,0}, 0.898, {<12,5>}) model
Lags	Box-Ljung Q	P	LM-test	P	Box-Ljung Q	P	LM-test	P
1	0.024	0.878	0.000	0.993	0.018	0.892	44.923	< 0.001
3	2.353	0.503	2.196	0.533	9.428	0.024	17.323	< 0.001
6	4.529	0.605	4.287	0.638	27.713	< 0.001	18.742	0.005
9	7.720	0.563	6.063	0.734	35.665	< 0.001	22.380	0.008
12	15.292	0.226	9.747	0.638	39.404	< 0.001	22.715	0.030
15	15.349	0.427	13.651	0.552	48.268	< 0.001	22.695	0.091
18	15.980	0.594	21.290	0.265	53.640	< 0.001	25.463	0.113
21	18.952	0.588	21.569	0.425	70.249	< 0.001	24.902	0.251
24	24.734	0.420	30.292	0.175	82.013	< 0.001	24.863	0.413
27	27.525	0.436	26.218	0.507	89.634	< 0.001	26.398	0.497
30	27.536	0.595	27.588	0.592	97.136	< 0.001	27.268	0.609
33	29.788	0.628	31.192	0.557	107.120	< 0.001	29.080	0.663
36	42.128	0.223	37.860	0.384	113.480	< 0.001	29.872	0.754
Note. SARIMA seasonal autoregressive integrated moving average method, TBATS an advanced innovation state-space modelling framework by combining Box-Cox transformations, Fourier series with time-varying coefficients and autoregressive moving average (ARMA) error correction.

Table S3. The Box-Ljung Q and LM tests for the best SARIMA (0,1,0)(0,1,1)₁₂ model and TBATS (0.01, {0,0}, 0.898, {<12,5>}) model built based on the date from January 2004 to December 2018

Lags	SARIMA (0,1,0)(0,1,1)₁₂ model				TBATS (0.048, {0,0}, 0.902, {<12,5>}) model
Lags	Box-Ljung Q	P	LM-test	P	Box-Ljung Q	P	LM-test	P
1	0.055	0.815	0.001	0.975	0.023	0.879	41.160	< 0.001
3	1.921	0.589	2.011	0.570	6.733	0.081	14.042	0.003
6	3.835	0.699	3.951	0.683	22.488	0.001	15.202	0.019
9	6.482	0.691	5.727	0.767	27.233	0.001	18.045	0.035
12	13.434	0.338	8.305	0.761	29.988	0.003	18.208	0.110
15	13.478	0.565	11.083	0.747	35.413	0.002	18.302	0.247
18	14.064	0.725	17.475	0.491	38.712	0.003	21.100	0.274
21	17.679	0.669	17.610	0.674	51.174	0.002	20.863	0.467
24	22.746	0.535	25.236	0.393	60.873	< 0.001	20.972	0.640
27	25.130	0.567	21.988	0.738	65.748	< 0.001	22.446	0.714
30	25.270	0.712	22.799	0.823	70.290	< 0.001	23.178	0.808
33	28.759	0.678	25.691	0.814	78.981	< 0.001	24.382	0.861
36	40.773	0.269	32.064	0.656	83.260	< 0.001	24.538	0.926
Note. SARIMA seasonal autoregressive integrated moving average method, TBATS, an advanced innovation state-space modelling framework by combining Box-Cox transformations, Fourier series with time-varying coefficients and autoregressive moving average (ARMA) error correction.

Table S4. The Box-Ljung Q and LM tests for the best SARIMA (0,1,0)(0,1,1)₁₂ model and TBATS (0.048, {0,0}, 0.902, {<12,5>}) model built based on the date from January 2004 to December 2016

Figure S7. Statistical checking for the residual series of the TBATS model developed on the basis of the data from January 2004 to December 2018. (A) Time series plot showing the residual series. (B) Autocorrelation function (ACF) plot. (C) Partial autocorrelation function (PACF) plot. The correlogram shows that although there are some residual ACF and PACF for the forecast errors that exceed the significance bounds, this is the best model used to estimate the upcoming 12-month epidemics of scarlet fever.

Figure S10. The extracted level, slope, and seasonal components of the best TBATS method developed on the basis of the data from January 2004 to December 2016.

Similarly, following the SARIMA modelling steps, the optimal SARIMA models on different datasets were identified (Supplementary Tables S2–S5, and Supplementary Figures S5, S11–S12, available in www.besjournal.com). Subsequently, the best SARIMA and TBATS models could be used to perform multistep ahead predictions (Figure 1 and Supplementary Figures S13–S14, available in www.besjournal.com). Table 1 lists the measurement metrics, which indicate the forecasting reliability levels on different time windows under the preferred SARIMA and TBATS models. The optimal TBATS models provided a smaller MAD, MAPE, RMSE, RMSPE, and MER compared with the optimal SARIMA models, with a performance improvement of almost 50% in the forecasting abilities for estimating both short- and long-term epidemiological trends, albeit the predictive potential showed a slight reduction with the increase in prediction time windows. We further compared the forecasting abilities of both methods for 48- and 72-step ahead predictions, and the comparative results are listed in Supplementary Tables S1, S6–S7, and Supplementary Figures S15–S16 (available in www.besjournal.com), which show a similar finding, but the predictive results deviated from the epidemic trajectories. In addition, we used the SF incidence data in Liaoning, Heilongjiang, and Shandong provinces, and Inner Mongolia (which are the hardest hit areas by SF in China in the last decade ^{[2, 3]}) to assess the predictive quality of these two methods. Likewise, the TBATS method produced lower error rates in all the datasets (Supplementary Table S8, available in www.besjournal.com). Our recent study indicated that the Error-Trend-Seasonal (ETS) model also has a powerful potential in estimating the long-term epidemic behaviors of diseases ^[4]. As a result, we further developed the ETS model based on the SF morbidity to predict the epidemiological trends, and the results also showed similar findings (the computed MAPE values were 16.101% vs. 38.511%, 21.142% vs. 28.273%, and 23.984% vs. 26.735% in the 12-, 24-, and 36-step ahead forecasts, respectively; Supplementary Table S9, available in www.besjournal.com). These findings further substantiated the utility of the TBATS model. The TBATS model was introduced by adding the trigonometric representation of seasonal components based on the Fourier series into the traditional BATS model, which enabled handling of all complex time series, as well as linear and non-linear information ^[5], thus indicating the suitability and adequacy of this model. Considering the attractive advantages of the TBATS model, this model can be recommended as a flexible and useful long-term predictive tool in assessing the epidemic patterns of SF in other countries or other contagious diseases; however, further work is required for validation. Moreover, with the rapid advances in the forecasting domain of time series, many hybrid prediction models (e.g., SARIMA-BPNN, SARIMA-GRNN, and SARIMA-LSTM) have also been reported to show an attractive advantage in estimating the long-term epidemic trajectories of diseases. Therefore, what is now needed are studies involving comparisons of the predictive reliable level between the TBATS model and the above-mentioned models.

Figure 1. Comparative results of the forecasts based on the SARIMA and TBATS models. (A) The comparison between the 24-step ahead forecasts of the SARIMA model and the observed values. (B) The predicted upcoming 36-month values from January 2020 to December 2022 using the SARIMA model. (C) The comparison between the 24-step ahead forecasts of the TBATS model and the observed values. (D) The predicted upcoming 36-month values from January 2020 to December 2022 using the TBATS model.

Models	Testing horizons
Models	MAD	MAPE	RMSE	MER	RMSPE
24-step ahead predictions
SARIMA	0.295	64.346	0.382	0.607	0.551
TBATS	0.114	21.142	0.160	0.235	0.061
Reduced percentage (%)
SARIMA vs. TBATS	61.356	67.143	58.115	61.285	88.929
12-step ahead predictions
SARIMA	0.212	42.951	0.257	0.429	0.467
TBATS	0.087	16.101	0.113	0.177	0.193
Reduced percentage (%)
SARIMA vs. TBATS	58.962	62.513	56.031	58.741	58.672
36-step ahead predictions
SARIMA	0.258	55.527	0.396	0.546	0.758
TBATS	0.133	23.984	0.174	0.282	0.271
Reduced percentage (%)
SARIMA vs. TBATS	48.450	56.807	56.061	48.352	64.248
Note. SARIMA, seasonal autoregressive integrated moving average method; TBATS, an advanced innovation state-space modelling framework by combining Box-Cox transformations, Fourier series with time-varying coefficients and autoregressive moving average (ARMA) error correction; MAD, mean absolute deviation; MAPE, mean absolute percentage error; RMSE, root mean square error; MER, mean error rate; RMSPE, root mean square percentage error.

Table 1. The comparisons of the predicted results between the SARIMA model and the TBATS model on different testing datasets

Variables	Estimate	S.E.	t	P	Stationary R²	R²	AIC	BIC	Ljung-Box Q ₍₁₈₎
Variables	Estimate	S.E.	t	P	Stationary R²	R²	AIC	BIC	Statistics	P
SARIMA (0,1,0)(0,1,1)₁₂method developed on the basis of the data from January 2004 to December 2018
SMA1	0.782	0.072	10.873	< 0.001	0.300	0.944	−6.159	−6.146	15.980	0.525
SARIMA (0,1,0)(0,1,1)₁₂ method developed on the basis of the data from January 2004 to December 2017
SMA1	0.795	0.075	10.600	< 0.001	0.292	0.935	−6.166	−6.152	13.663	0.691
SARIMA (0,1,0)(0,1,1)₁₂ method developed on the basis of the data from January 2004 to December 2016
SMA1	0.771	0.084	9.230	< 0.001	0.288	0.922	−6.162	−6.127	14.064	0.663
Note. SARIMA seasonal autoregressive integrated moving average method, S.E. standard error, AIC Akaike's Information Criterion, BIC Schwarz's Bayesian Information Criterion, SMA1 seasonal moving average method at lag 1.

Table S5. The obtained best SARIMA models with their key parameters on the basis of different datasets

Figure S13. Comparative results of the forecasts based on the SARIMA and TBATS models. (A) The comparison between the 12-step ahead forecasts of the SARIMA model and the observed values. (B) The comparison between the 12-step ahead forecasts of the TBATS model and the observed values.

Figure S14. Comparative results of the forecasts based on the SARIMA and TBATS models. (A) The comparison between the 36-step ahead forecasts of the SARIMA model and the observed values. (B) The comparison between the 36-step ahead forecasts of the TBATS model and the observed values.

Figure S15. Comparative results of the forecasts based on the SARIMA and TBATS models. (A) The comparison between the 48-step ahead forecasts of the SARIMA model and the observed values. (B) The comparison between the 48-step ahead forecasts of the TBATS model and the observed values.

Figure S16. Comparative results of the forecasts based on the SARIMA and TBATS models. (A) The comparison between the 72-step ahead forecasts of the SARIMA model and the observed values. (B) The comparison between the 72-step ahead forecasts of the TBATS model and the observed values.

Sites	Models	Testing horizons
Sites	Models	MAD	MAPE	RMSE	MER
Liaoning province	12-step-ahead forecast
	SARIMA (0,1,0)(0,1,1)₁₂	0.681	8.983	0.858	0.697
	TBATS (0.073, {4,0}, 0.888, {<12,4>})	0.358	5.830	0.432	0.367
	Percentage reductions (%)
	TBATS vs. SARIMA	47.430	35.100	49.650	47.346
	24-step-ahead forecast
	SARIMA (0,1,0)(0,1,1)₁₂	0.330	42.453	0.509	0.326
	TBATS(0.121, {0,0}, 0.868, {<12,4>})	0.262	25.144	0.372	0.259
	Percentage reductions (%)
	TBATS vs. SARIMA	20.606	40.772	26.916	20.552
	36-step-ahead forecast
	SARIMA (0,1,0)(0,1,1)₁₂	2.303	275.181	3.207	2.393
	TBATS (0.052, {0,0}, 0.875, {<12,5>})	0.224	26.448	0.305	0.233
	Percentage reductions (%)
	TBATS vs. SARIMA	90.274	90.389	90.490	90.263
Heilongjiang province	12-step-ahead forecast
	SARIMA (0,1,0)(0,1,1)₁₂	0.227	51.808	0.262	0.383
	TBATS (0.151, {0,0}, 0.889, {<12,5>})	0.130	32.657	0.167	0.219
	Percentage reductions (%)
	TBATS vs. SARIMA	42.731	36.965	36.260	42.820
	24-step-ahead forecast
	SARIMA (0,1,0)(0,1,1)₁₂	0.402	74.176	0.562	0.655
	TBATS (0.17, {0,0}, 0.889, {<12,5>})	0.119	25.819	0.162	0.194
	Percentage reductions (%)
	TBATS vs. SARIMA	70.398	65.192	71.174	70.382
	36-step-ahead forecast
	SARIMA (0,1,0)(0,1,1)₁₂	1.448	253.072	2.114	2.357
	TBATS(0.161, {0,0}, 0.894, {<12,4>})	0.155	20.235	0.252	0.446
	Percentage reductions (%)
	TBATS vs. SARIMA	89.296	92.004	88.079	81.078
Shandong province	12-step-ahead forecast
	SARIMA (0,1,1)(0,1,1)₁₂	0.236	39.823	0.282	0.319
	TBATS(0, {0,0}, 0.932, {<12,5>})	0.175	19.390	0.280	0.237
	Percentage reductions (%)
	TBATS vs. SARIMA	25.847	51.310	0.709	25.705
	24-step-ahead forecast
	SARIMA (0,1,1)(0,1,1)₁₂	0.509	76.930	0.670	0.717
	TBATS (0, {0,0}, 0.932, {<12,5>})	0.168	19.222	0.268	0.237
	Percentage reductions (%)
	TBATS vs. SARIMA	66.994	75.014	60.000	66.946
	36-step-ahead forecast
	SARIMA (0,1,1)(0,1,1)₁₂	1.683	246.865	2.376	2.491
	TBATS (0, {0,0}, 0.926, {<12,5>})	0.312	43.277	0.401	0.461
	Percentage reductions (%)
	TBATS vs. SARIMA	81.462	82.469	83.123	81.493
Inner Mongolia	12-step-ahead forecast
	SARIMA (1,0,2)(0,1,1)₁₂	0.401	51.026	0.496	0.397
	TBATS (0.232, {0,0}, -, {<12,4>})	0.130	32.657	0.167	0.219
	Percentage reductions (%)
	TBATS vs. SARIMA	67.581	35.999	66.331	44.836
	24-step-ahead forecast
	SARIMA (1,0,2)(0,1,1)₁₂	0.319	38.984	0.418	0.308
	TBATS (0.232, {0,0}, -, {<12,4>})	0.166	16.508	0.236	0.161
	Percentage reductions (%)
	TBATS vs. SARIMA	47.962	57.654	43.541	47.727
	36-step-ahead forecast
	SARIMA (1,0,2)(0,1,1)₁₂	0.484	56.502	0.573	0.494
	TBATS (0.21, {0,0}, -, {<12,4>})	0.158	17.002	0.224	0.161
	Percentage reductions (%)
	TBATS vs. SARIMA	67.355	69.909	60.908	67.409
Note. SARIMA seasonal autoregressive integrated moving average method, TBATS an advanced innovation state-space modelling framework by combining Box-Cox transformations, Fourier series with time-varying coefficients and autoregressive moving average (ARMA) error correction, MAD mean absolute deviation, MAPE mean absolute percentage error, RMSE root mean square error, MER mean error rate.

Table S8. The comparisons of the predicted results between the SARIMA model and the TBATS model on different testing datasets

Models	Testing power
Models	MAD	MAPE	RMSE	MER	RMSPE
12-data ahead forecasting
SARIMA	144.734	25.049	161.671	0.203	0.296
TBATS	91.799	14.772	123.653	0.129	0.193
ETS	148.635	25.835	177.092	0.209	0.325
Decreased percentage (%)
TBATS vs. SARIMA	36.574	41.028	23.516	36.453	34.797
TBATS vs. ETS	38.239	42.822	30.176	38.278	40.615
24-data ahead forecasting
SARIMA	357.734	53.753	406.278	0.460	0.625
TBATS	260.619	38.407	301.441	0.335	0.469
ETS	279.014	44.604	307.731	0.359	0.536
Decreased percentage (%)
TBATS vs. SARIMA	27.147	28.549	25.804	27.174	24.960
TBATS vs. ETS	6.593	13.893	2.044	6.685	12.500
36-data ahead forecasting
SARIMA	290.782	42.156	365.570	0.334	0.560
TBATS	204.151	28.685	254.418	0.235	0.387
ETS	297.359	44.118	347.689	0.342	0.564
Decreased percentage (%)
TBATS vs. SARIMA	29.792	31.955	30.405	29.641	30.893
TBATS vs. ETS	31.345	34.981	26.826	31.287	31.383
60-data ahead forecasting
SARIMA	224.931	28.874	310.043	0.258	0.424
TBATS	183.780	21.558	235.665	0.211	0.250
ETS	146.437	16.449	212.028	0.168	0.223
Decreased percentage (%)
TBATS vs. SARIMA	18.295	25.338	23.990	18.217	41.038
TBATS vs. ETS	-20.319	-23.699	-10.030	-20.379	-10.800
84-data ahead forecasting
SARIMA	931.622	134.915	1016.813	1.040	1.679
TBATS	167.432	20.081	204.172	0.187	0.240
ETS	183.777	21.010	227.113	0.205	0.250
Decreased percentage (%)
TBATS vs. SARIMA	82.028	85.116	79.920	82.019	85.706
TBATS vs. ETS	8.894	4.422	10.101	8.780	4.000
108-data ahead forecasting
SARIMA	1483.868	202.729	1588.876	1.560	2.500
TBATS	208.775	22.849	287.536	0.220	0.310
ETS	268.726	30.177	354.045	0.283	0.386
Decreased percentage (%)
TBATS vs. SARIMA	85.930	88.729	81.903	85.897	87.600
TBATS vs. ETS	22.309	24.283	18.785	22.261	19.689
Note. SARIMA seasonal autoregressive integrated moving average method, TBATS an advanced innovation state-space modeling framework by combining Box-Cox transformations, Fourier series with time-varying coefficients and autoregressive moving average error correction, ETS Error-Trend-Seasonal framework, MAD mean absolute deviation, MAPE mean absolute percentage error, RMSE root mean square error, MER mean error rate, RMSPE root mean square percentage error.

Table S9. Comparisons of the out-of-sample forecasting powers between TBATS models and SARIMA models, along with ETS models

This study had some limitations. First, SF is a mild illness and has rarely led to death since the 20^th century ^[3]. Therefore, infected individuals with mild clinical manifestations sometimes fail to seek medical aid, resulting in under-reporting and under-diagnosis. Second, to ensure that the model obtained a satisfactory forecasting result, it is important to note that this model should be updated with new incidence data. Third, to investigate whether our TBATS model was adequate for estimating the SF epidemics in other study regions or other infectious diseases, much work is still needed. Finally, integrating the factors influencing the SF epidemics may improve the predictive power. Nevertheless, we are not able to perform such an analysis due to the unavailability of a multivariate TBATS method and SF-related factors.

In summary, SF had dual seasonal behaviors, peaking in May–June and November–December, with a recurrence in 2011 in China; since then it started to be increasing in the SF incidence. The TBATS method was advantageous in analyzing the long-term epidemiological seasonality and trends of SF, which can be considered a useful and flexible alternative to aid stakeholders to develop practical solutions to stop the ongoing spread of SF in China. In addition, we re-established the preferred TBATS (0.023, {0,0}, 0.895, {<12,5>}) specification based on the 16 years of data to predict the SF incidence into 2022, although the SF incidence was predicted to reach a plateau in the next 3 years [Supplementary Tables S1 and S10 (available in www.besjournal.com), and Figure 1], the SF incidence remained at a high level, suggesting that additional or comprehensive interventions must be developed to manage this evolving scenario.

Time	Forecasts	Lower 95% CI	Upper 95% CI	Time	Forecasts	Lower 95% CI	Upper 95% CI
Jan-20	0.479	0.330	0.695	Jul-21	0.426	0.147	1.204
Feb-20	0.224	0.143	0.349	Aug-21	0.220	0.073	0.647
Mar-20	0.380	0.229	0.627	Sep-21	0.250	0.081	0.751
Apr-20	0.627	0.360	1.085	Oct-21	0.412	0.132	1.249
May-20	0.843	0.463	1.521	Nov-21	0.616	0.195	1.890
Jun-20	0.930	0.489	1.750	Dec-21	0.816	0.254	2.543
Jul-20	0.443	0.221	0.879	Jan-22	0.434	0.130	1.400
Aug-20	0.228	0.108	0.474	Feb-22	0.204	0.059	0.686
Sep-20	0.259	0.119	0.557	Mar-22	0.351	0.100	1.186
Oct-20	0.424	0.190	0.935	Apr-22	0.584	0.166	1.988
Nov-20	0.632	0.275	1.429	May-22	0.791	0.222	2.722
Dec-20	0.835	0.354	1.938	Jun-22	0.878	0.242	3.073
Jan-21	0.443	0.180	1.069	Jul-22	0.421	0.111	1.533
Feb-21	0.208	0.081	0.524	Aug-22	0.218	0.055	0.823
Mar-21	0.357	0.137	0.914	Sep-22	0.248	0.062	0.953
Apr-21	0.593	0.223	1.543	Oct-22	0.409	0.102	1.574
May-21	0.801	0.296	2.126	Nov-22	0.612	0.151	2.369
Jun-21	0.889	0.320	2.412	Dec-22	0.811	0.199	3.172
Note. TBATS an advanced innovation state-space modelling framework by combining Box-Cox transformations, Fourier series with time-varying coefficients and autoregressive moving average (ARMA) error correction, CI, confidence interval.

Table S10. Projection into the future 36 months from January 2020 to December 2022 using the best TBATS (0.023, {0,0}, 0.895, {<12,5>}) model

Data Availability　All the data supporting the findings of the work are contained within the Supplementary Material (Supplementary Table S11, available in www.besjournal.com).

Figure S8. The extracted level, slope, and seasonal components of the best TBATS method developed on the basis of the data from January 2004 to December 2018.

Figure S9. Statistical checking for the residual series of the TBATS model developed on the basis of the data from January 2004 to December 2016. (A) Time series plot showing the residual series. (B) Autocorrelation function (ACF) plot. (C) Partial autocorrelation function (PACF) plot. The correlogram shows that although there are some residual ACF and PACF for the forecast errors that exceed the significance bounds, this is the best model used to estimate the upcoming 36-month epidemics of scarlet fever.

Figure S11. Statistical checking for the residual series of the SSARIMA model developed on the basis of the data from January 2004 to December 2018. (A) Time series plot showing the residual series. (B) Autocorrelation function (ACF) plot. (C) Partial autocorrelation function (PACF) plot. The correlogram shows that the ACF and PACF for the forecast errors do not exceed the significance bounds except for lags 11, 23, and 36, indicating that there is little evidence of non-zero autocorrelations at lags 1–36.

Figure S12. Statistical checking for the residual series of the SARIMA model developed on the basis of the data from January 2004 to December 2016. (A) Time series plot showing the residual series. (B) Autocorrelation function (ACF) plot. (C) Partial autocorrelation function (PACF) plot. The correlogram shows that the ACF and PACF for the forecast errors do not exceed the significance bounds except for lags 11 and 36, indicating that there is little evidence of non-zero autocorrelations at lags 1–36.

Continued
Parameters	48-step ahead forecasting	72-step ahead forecasting	Future 36-step ahead forecasting
Parameters	TBATS (0.037, {0,0}, 0.901, {<12,5>})	TBATS (0.029, {0,0}, 0.87, {<12,4>})	TBATS (0.023, {0,0}, 0.895, {<12,5>})
Seed states
1	−2.3577	−2.3794	−2.3993
2	0.1062	0.1064	0.1117
3	−0.0314	−0.0546	−0.0067
4	−0.0815	−0.1004	−0.0690
5	−0.0103	−0.0196	−0.0068
6	0.0220	0.0160	0.0300
7	0.0268	0.1411	0.0336
8	0.1425	−0.5940	0.1460
9	−0.5902	−0.0762	−0.6235
10	−0.0647	−0.1476	−0.0634
11	−0.1437	−	−0.1522
12	−0.0212	−	−0.0232
Sigma	0.1868	0.1947	0.1871
AIC	−197.7402	−203.9513	−160.8382
Note. TBATS, an advanced innovation state-space modelling framework by combining Box-Cox transformations, Fourier series with time-varying coefficients and autoregressive moving average (ARMA) error correction, AR autoregressive method, AIC Akaike's Information Criterion.

Variables	Estimate	S.E.	t	P	Stationary R²	R²	AIC	BIC	Ljung-Box Q (18)
Variables	Estimate	S.E.	t	P	Stationary R²	R²	AIC	BIC	Statistics	P
SARIMA (0,1,0)(0,1,1)₁₂ method developed on the basis of the data from January 2004 to December 2015
SMA1	0.790	0.094	8.394	< 0.001	0.277	0.915	−6.086	−6.072	12.564	0.765
SARIMA (0,1,0)(0,1,1)₁₂ method developed on the basis of the data from January 2004 to December 2013
SMA1	0.813	0.135	6.025	< 0.001	0.276	0.883	−6.055	−6.032	12.510	0.768
Note. SARIMA seasonal autoregressive integrated moving average method, S.E. standard error, AIC Akaike's Information Criterion, BIC Schwarz's Bayesian Information Criterion, SMA1 seasonal moving average method at lag 1.

Table S6. The obtained best SARIMA methods with their key parameters on the basis of different datasets

Models	Testing horizons
Models	MAD	MAPE	RMSE	MER	RMSPE
48-step ahead predictions
SARIMA	1.186	252.348	1.850	2.670	3.440
TBATS	0.098	18.307	0.139	0.220	0.222
Reduced percentage (%)
SARIMA vs. TBATS	91.737	92.745	92.486	91.760	93.547
72-step ahead predictions
SARIMA	0.800	171.381	1.511	1.901	2.810
TBATS	0.207	46.042	0.248	0.491	0.477
Reduced percentage (%)
SARIMA vs. TBATS	74.125	73.135	83.587	74.171	83.025
Note. SARIMA seasonal autoregressive integrated moving average method, TBATS an advanced innovation state-space modelling framework by combining Box-Cox transformations, Fourier series with time-varying coefficients and autoregressive moving average (ARMA) error correction, MAD mean absolute deviation, MAPE mean absolute percentage error, RMSE root mean square error, MER mean error rate, RMSPE root mean square percentage error.

Table S7. The comparisons of the predicted results between the SARIMA model and the TBATS model on different testing datasets

Date	Incidence	Date	Incidence	Date	Incidence	Date	Incidence	Date	Incidence	Date	Incidence
01-2004	0.0298	09-2006	0.101	05-2009	0.2623	01-2012	0.3294	09-2014	0.1772	05-2017	0.8192
02-2004	0.0452	10-2006	0.1647	06-2009	0.2782	02-2012	0.1675	10-2014	0.3011	06-2017	0.8006
03-2004	0.0945	11-2006	0.2809	07-2009	0.1508	03-2012	0.2466	11-2014	0.5149	07-2017	0.3811
04-2004	0.1639	12-2006	0.3291	08-2009	0.0821	04-2012	0.3281	12-2014	0.6298	08-2017	0.1711
05-2004	0.1809	01-2007	0.231	09-2009	0.0946	05-2012	0.5325	01-2015	0.499	09-2017	0.2162
06-2004	0.1896	02-2007	0.095	10-2009	0.0988	06-2012	0.5018	02-2015	0.2079	10-2017	0.3078
07-2004	0.113	03-2007	0.1356	11-2009	0.0798	07-2012	0.2523	03-2015	0.2756	11-2017	0.5717
08-2004	0.0619	04-2007	0.2474	12-2009	0.0961	08-2012	0.1047	04-2015	0.4342	12-2017	0.7641
09-2004	0.0742	05-2007	0.3435	01-2010	0.069	09-2012	0.1323	05-2015	0.6642	01-2018	0.5421
10-2004	0.103	06-2007	0.39	02-2010	0.0353	10-2012	0.2071	06-2015	0.7269	02-2018	0.1547
11-2004	0.2033	07-2007	0.1992	03-2010	0.0636	11-2012	0.3264	07-2015	0.3765	03-2018	0.2705
12-2004	0.2299	08-2007	0.0932	04-2010	0.1008	12-2012	0.3741	08-2015	0.1568	04-2018	0.4862
01-2005	0.1686	09-2007	0.1122	05-2010	0.1692	01-2013	0.2437	09-2015	0.2076	05-2018	0.7702
02-2005	0.0582	10-2007	0.1695	06-2010	0.1788	02-2013	0.0762	10-2015	0.3096	06-2018	0.768
03-2005	0.1182	11-2007	0.2543	07-2010	0.1138	03-2013	0.1536	11-2015	0.5007	07-2018	0.3859
04-2005	0.2182	12-2007	0.3153	08-2010	0.0714	04-2013	0.2258	12-2015	0.6459	08-2018	0.1602
05-2005	0.2504	01-2008	0.193	09-2010	0.0872	05-2013	0.3381	01-2016	0.4449	09-2018	0.1967
06-2005	0.2707	02-2008	0.0724	10-2010	0.1149	06-2013	0.3106	02-2016	0.15	10-2018	0.3738
07-2005	0.1344	03-2008	0.1655	11-2010	0.2399	07-2013	0.1756	03-2016	0.2843	11-2018	0.7114
08-2005	0.0801	04-2008	0.2286	12-2010	0.3053	08-2013	0.0858	04-2016	0.3584	12-2018	0.9025
09-2005	0.0891	05-2008	0.367	01-2011	0.2333	09-2013	0.1077	05-2016	0.5771	01-2019	0.6313
10-2005	0.1296	06-2008	0.2816	02-2011	0.0742	10-2013	0.158	06-2016	0.5923	02-2019	0.1851
11-2005	0.2129	07-2008	0.1343	03-2011	0.2074	11-2013	0.2689	07-2016	0.3078	03-2019	0.3657
12-2005	0.2361	08-2008	0.0688	04-2011	0.3728	12-2013	0.3929	08-2016	0.1396	04-2019	0.4974
01-2006	0.1239	09-2008	0.087	05-2011	0.6908	01-2014	0.2568	09-2016	0.1898	05-2019	0.649
02-2006	0.0645	10-2008	0.129	06-2011	0.7253	02-2014	0.0899	10-2016	0.2734	06-2019	0.718
03-2006	0.1345	11-2008	0.1831	07-2011	0.3839	03-2014	0.2113	11-2016	0.4409	07-2019	0.4188
04-2006	0.2117	12-2008	0.2363	08-2011	0.1614	04-2014	0.3171	12-2016	0.5504	08-2019	0.1711
05-2006	0.2406	01-2009	0.1079	09-2011	0.2088	05-2014	0.52	01-2017	0.3333	09-2019	0.2481
06-2006	0.2701	02-2009	0.0625	10-2011	0.369	06-2014	0.5525	02-2017	0.168	10-2019	0.3839
07-2006	0.1454	03-2009	0.1393	11-2011	0.63	07-2014	0.2832	03-2017	0.3394	11-2019	0.7296
08-2006	0.0768	04-2009	0.2292	12-2011	0.7196	08-2014	0.1336	04-2017	0.4895	12-2019	0.9323

Table 11. The SF incidence time series data from January 2004 to December 2019

Reference (10)

Supplements:

21432Supplementary Materials.pdf

[1]	Chen M, Cai J, Davies MR, et al. Increase of emm1 isolates among group A Streptococcus strains causing scarlet fever in Shanghai, China. Int J Infect Dis, 2020; 98, 305−14.
[2]	Li WT, Feng RH, Li T, et al. Spatial-temporal analysis and visualization of scarlet fever in mainland China from 2004 to 2017. Geospat Health, 2020; 15, 831.
[3]	Liu YH, Chan TC, Yap LW, et al. Resurgence of scarlet fever in China: a 13-year population-based surveillance study. Lancet Infect Dis, 2018; 18, 903−12.
[4]	Wang YB, Xu CJ, Yao SQ, et al. Estimating the prevalence and mortality of coronavirus disease 2019 (COVID-19) in the USA, the UK, Russia, and India. Infect Drug Resist, 2020; 13, 3335−50.
[5]	De Livera AM, Hyndman RJ, Snyder RD. Forecasting time series with complex seasonal patterns using exponential smoothing. J Am Stat Assoc, 2011; 106, 1513−27.
[6]	Armitage P, Berry G, Matthews JNS. Statistical methods in medical research. 4th ed. Blackwell Publishing. 2002, 127.
[7]	Lee CF, Cowling BJ, Lau EHY. Epidemiology of reemerging scarlet fever, Hong Kong, 2005-2015. Emerg Infect Dis, 2017; 23, 1707−10.
[8]	Lamagni T, Guy R, Chand M, et al. Resurgence of scarlet fever in England, 2014-16: a population-based surveillance study. Lancet Infect Dis, 2018; 18, 180−7.
[9]	Liu YH, Ding H, Chang ST, et al. Exposure to air pollution and scarlet fever resurgence in China: a six-year surveillance study. Nat Commun, 2020; 11, 4229.
[10]	Hyndman R, Athanasopoulos G, Bergmeir C, et al. Forecast: forecasting functions for time series and linear models. https://cran.r-project.org/web/packages/forecast/. [2022-05-10].

Now- and Fore-casting the Secular Epidemiological Trends and Seasonality of the Comeback of Scarlet Fever in China: A 16-year Time Series Analysis

doi: 10.3967/bes2022.076

References

Supplements

Proportional views

通讯作者: 陈斌, bchen63@163.com

Article Metrics

Proportional views

Related