Chang’an District and Hu County are HFRS-prevalent areas in the Guanzhong Plain of Shaanxi Province [27]. Hu County was upgraded to Huyi District in 2017, but we refer to it as Hu County because our data were collected from 2014 to 2016. Both areas lie in the Weihe Basin in central Shaanxi Province and border the Weihe River to the north (Figure 1). The terrain is high in the south and low in the north. Chang’an District and Hu County cover 1,580 km² and 1,282 km², respectively. The two areas adjoin each other along an east-west axis, and both have a continental monsoon climate with four distinct seasons. The annual average temperature and annual precipitation are 15.5 °C and 600 mm in Chang’an District and 13.5 °C and 627 mm in Hu County. The two areas have similar natural environments but considerably different socioeconomic conditions, with those of Chang’an District far superior to those of Hu County [28]. Compared with Hu County, Chang’an District has a higher rate of urbanization, a more developed economy, a higher proportion of non-agricultural population, and a higher population density. In 2015, the permanent population and Gross Domestic Product (GDP) were 1,118,300 people and 51.388 billion yuan in Chang’an District and 570,300 people and 15.541 billion yuan in Hu County [28].
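The socioeconomic contrast described above can be made concrete with a little arithmetic on the quoted 2015 figures. The following is a minimal sketch, not part of the original analysis: the dictionary layout and the function name `derived_indicators` are illustrative assumptions, and the density is an area-wide average over each district's total area (including the mountainous south), not the 1 km gridded density used in the models.

```python
# Figures quoted in the study-area description (2015 values).
areas = {
    "Chang'an District": {"population": 1_118_300, "area_km2": 1_580, "gdp_yuan": 51.388e9},
    "Hu County": {"population": 570_300, "area_km2": 1_282, "gdp_yuan": 15.541e9},
}


def derived_indicators(stats):
    """Return area-averaged population density and GDP per capita (hypothetical helper)."""
    return {
        "density_per_km2": stats["population"] / stats["area_km2"],
        "gdp_per_capita_yuan": stats["gdp_yuan"] / stats["population"],
    }


indicators = {name: derived_indicators(stats) for name, stats in areas.items()}

for name, values in indicators.items():
    print(
        f"{name}: {values['density_per_km2']:.0f} persons/km2, "
        f"{values['gdp_per_capita_yuan']:.0f} yuan per capita"
    )
```

Under these assumptions, Chang’an District comes out at roughly 708 persons/km² and about 46,000 yuan per capita, against roughly 445 persons/km² and about 27,000 yuan per capita for Hu County, which is consistent with the qualitative comparison in the text.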
HFRS Data
Reported HFRS cases and incidence data for Shaanxi Province from 2014 to 2016 were obtained from the Chinese Center for Disease Control and Prevention. All reported cases were de-identified: information bearing on patient privacy was strictly encrypted, and each case record retained only information such as gender, age, and the detailed address of the current residence. Geocoding the residential addresses of the HFRS cases yielded HFRS case-point data. Overall, 440 and 165 HFRS cases were reported in Chang’an District and Hu County from 2014 to 2016, respectively, with average annual crude incidence rates of 15.055/10⁵ and 9.648/10⁵, respectively.
Natural Environmental and Socio-economic Data
Based on previous studies of the factors influencing the HFRS epidemic and on data availability, we collected data on environmental variables, including temperature, precipitation, normalized difference vegetation index (NDVI), digital elevation model (DEM), topography, land use, GDP, and population density (Table 1), to build the MaxEnt model and analyze their influence on the HFRS epidemic. These variables effectively represent the climate, natural surface environment, and socioeconomic characteristics within the study area. Figure 2 maps the spatial distribution of the environmental variables in Chang’an District and Hu County in 2015. All environmental data were obtained from the Resource and Environmental Data Cloud Platform of the Resource and Environmental Science Data Center of the Chinese Academy of Sciences (http://www.resdc.cn/). The spatial resolution of all variables was 1 km.
| Factors | Data | Variable | Time scale | Data type |
| --- | --- | --- | --- | --- |
| Meteorological factors | Temperature | average annual temperature | 2014–2016 | Continuous |
| | Precipitation | average annual precipitation | 2014–2016 | Continuous |
| Landscape factors | NDVI | annual NDVI | 2014–2016 | Continuous |
| | DEM | DEM | Permanent | Continuous |
| | Topography | plain, platform, hill, mountain | Permanent | Categorical |
| Socio-economic factors | Land use type | cultivated land, woodland, grassland, water area, construction land, unused land | 2015 | Categorical |
| | GDP | 1 km grid GDP data | 2015 | Continuous |
| | Population density | 1 km grid population density data | 2015 | Continuous |

Note. NDVI, normalized difference vegetation index; DEM, digital elevation model; GDP, Gross Domestic Product.
Table 1. Variables used to construct MaxEnt model in this study
Maximum Entropy Model (MaxEnt Model)
The MaxEnt model is a general-purpose machine learning technique based on the principle of maximum entropy; it uses known case points and a set of predictor variables to estimate disease distributions [29-30]. In this study, we assumed that the study area $ X $ was composed of a finite number of grid cells, that $ \pi $ was the distribution of HFRS cases in the study area, and that $ \pi(x) $ was the value assigned by $ \pi $ to each cell $ x $, with the values summing to 1. Given the predictor variables $ f_j\ (j = 1, 2, \cdots, n) $, the estimate $ \widehat{\pi} $ of $ \pi $ is constrained so that the expectation of each variable matches its empirical average over the sample points (Equations 1 and 2); among the distributions satisfying these constraints, the one with the largest entropy (Equation 3) is taken as the optimal HFRS distribution:

$$ \widetilde{\pi}\left[f_j\right] = \frac{1}{m}\sum_{i=1}^{m} f_j\left(x_i\right) $$ (1)

$$ \widehat{\pi}\left[f_j\right] = \widetilde{\pi}\left[f_j\right] $$ (2)

$$ H\left(\widehat{\pi}\right) = -\sum_{x \in X} \widehat{\pi}\left(x\right) \ln \widehat{\pi}\left(x\right) $$ (3)

where $ H\left(\widehat{\pi}\right) $ is the entropy of the estimated distribution $ \widehat{\pi} $, $ x_i $ is the $ i $-th sample unit in the study area $ X $, $ m $ is the number of sample units, $ f_j $ is the $ j $-th environmental variable, and $ \widetilde{\pi}\left[f_j\right] $ is the empirical average of $ f_j $ over the sample points, that is, its expectation under the empirical distribution $ \widetilde{\pi} $.

The maximum entropy distribution takes the form of a Gibbs probability distribution (Equation 4); among Gibbs distributions, it is the one that maximizes the likelihood of the sample points or, equivalently, minimizes the log loss (Equation 5). Because the variable means used by MaxEnt are empirical estimates rather than true values, the model can overfit the training data. To reduce overfitting, the constraints on the environmental variables are relaxed appropriately (Equation 6). At the same time, to enable MaxEnt to select the important environmental variables effectively, the maximum entropy distribution is represented by the Gibbs distribution that minimizes the log loss while penalizing excessive weights $ \lambda_j $ on the environmental variables (Equation 7):

$$ q_\lambda\left(x\right) = \frac{e^{\lambda \cdot f\left(x\right)}}{Z_\lambda} $$ (4)

$$ \widetilde{\pi}\left[-\ln\left(q_\lambda\right)\right] $$ (5)

where $ \lambda $ is the vector of weights of all environmental variables, $ f $ is the vector of all environmental variables, and $ Z_\lambda $ is a normalizing constant that ensures that $ q_\lambda $ sums to 1.

$$ \left|\widehat{\pi}\left[f_j\right] - \widetilde{\pi}\left[f_j\right]\right| \le \beta_j $$ (6)

$$ \widetilde{\pi}\left[-\ln\left(q_\lambda\right)\right] + \sum_j \beta_j \left|\lambda_j\right| $$ (7)

Here, $ \beta_j $ is a constant, $ \widetilde{\pi}\left[-\ln\left(q_\lambda\right)\right] $ is the loss function, and $ \sum_j \beta_j \left|\lambda_j\right| $ is the penalty on the weights of the environmental variables.

MaxEnt models were constructed in MaxEnt 3.3.3k by importing the HFRS case distribution and the environmental data into the “Samples” and “Environmental layers” fields of the software, respectively; 75% of the case distribution data were selected as the training sample, and the remaining 25% were used as the test sample. The model was run 10 times independently, with the training and test samples regenerated by bootstrapping for each run. The maximum number of background points was set to 10,000, and the option to “add samples to background” was checked. The output format was set to “logistic,” so that each grid value of the output ranged from 0 to 1 and represented the spread risk of the HFRS epidemic. The model output was then divided using two empirical thresholds generated by the model, that is, the “maximum training sensitivity plus specificity logistic threshold” and the “balance training omission, predicted area, and threshold value logistic threshold”.
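To make Equations 3, 4, and 7 more tangible, the sketch below evaluates the Gibbs distribution, its entropy, and the regularized log loss on a small synthetic grid. It is an illustration under stated assumptions, not the MaxEnt 3.3.3k implementation: the feature matrix, the sample indices, the weights, and the value of `beta` are all made up, and the function names are hypothetical.

```python
import numpy as np


def gibbs_distribution(features, lam):
    """q_lambda(x) = exp(lam . f(x)) / Z_lambda over all grid cells (Equation 4)."""
    scores = features @ lam
    scores = scores - scores.max()  # numerical stabilisation; cancels in the ratio
    weights = np.exp(scores)
    return weights / weights.sum()


def entropy(p):
    """Shannon entropy -sum p ln p (Equation 3), skipping zero-probability cells."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def regularized_log_loss(features, sample_idx, lam, beta):
    """Empirical log loss over the sample cells plus the L1 penalty (Equation 7)."""
    q = gibbs_distribution(features, lam)
    log_loss = -np.mean(np.log(q[sample_idx]))
    return float(log_loss + np.sum(beta * np.abs(lam)))


# Synthetic inputs (assumptions for illustration only).
rng = np.random.default_rng(0)
features = rng.random((50, 3))            # 50 grid cells, 3 environmental variables
sample_idx = np.array([1, 4, 4, 10, 23])  # cells containing hypothetical case points
beta = np.full(3, 0.05)                   # hypothetical regularization constants

q_uniform = gibbs_distribution(features, np.zeros(3))
q_tilted = gibbs_distribution(features, np.array([1.0, -2.0, 0.5]))
loss_uniform = regularized_log_loss(features, sample_idx, np.zeros(3), beta)
```

With all weights set to zero, the Gibbs distribution is uniform, its entropy is the maximum possible value ln(50), and the loss reduces to ln(50) because the penalty vanishes. Any non-zero weight vector tilts the distribution toward cells with favourable variable values and lowers the entropy, which is the trade-off that the fitting procedure balances against agreement with the sample points.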
Model Construction and Evaluation
To compare the similarities and differences of factors influencing HFRS epidemics in regions with different levels of economic development, MaxEnt models were built from the HFRS case data from 2014 to 2016 and from the meteorological, landscape, and socioeconomic data retained after correlation analysis, and the goodness of fit of each model was assessed using the area under the curve (AUC).
A high degree of correlation between variables may affect the output of the MaxEnt model [31-32], and collinearity was present among the environmental variables considered for model construction. This study therefore first calculated the correlation coefficients between all environmental variables (Figure 3). Because GDP and population density were very highly correlated, the optimal model was selected based on the model fitting results, and the final set of environmental variables used to analyze the influencing factors of the HFRS epidemic comprised temperature, precipitation, NDVI, DEM, topography, land use type, and population density.
Figure 3. Correlation coefficient between environmental variables in Chang'an District and Hu County. NDVI, normalized difference vegetation index; DEM, digital elevation model; GDP, Gross Domestic Product.
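The correlation screening described above can be sketched as follows. This is a hedged illustration on synthetic data, not the authors' procedure: the cutoff of 0.8, the function name `highly_correlated_pairs`, and the simulated variables are assumptions (the source does not state a numerical threshold), and Pearson correlation is used for simplicity.

```python
import numpy as np


def highly_correlated_pairs(data, names, threshold=0.8):
    """List variable pairs whose absolute Pearson correlation meets the threshold.

    data: 2-D array with one row per grid cell and one column per variable.
    threshold: hypothetical cutoff chosen for illustration.
    """
    corr = np.corrcoef(data, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Synthetic grid cells: GDP is simulated to track population density closely.
rng = np.random.default_rng(1)
population = rng.random(200)
gdp = 3.0 * population + 0.05 * rng.standard_normal(200)
ndvi = rng.random(200)

names = ["Population density", "GDP", "NDVI"]
data = np.column_stack([population, gdp, ndvi])
pairs = highly_correlated_pairs(data, names)
```

In this simulated example only the GDP and population density pair is flagged, so one of the two would be dropped before model fitting. Which member of a flagged pair to keep is a separate decision; the study made it by comparing model fitting results.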
Figure 4 shows a flowchart of MaxEnt model construction. To analyze the risk factors influencing the HFRS epidemic in Chang’an District and Hu County, a MaxEnt model was first constructed for Chang’an District and Hu County as a whole (Model 1) to identify the dominant influencing factors across the entire area. Because more than 95% of the cases in Chang’an District and Hu County were located in their northern plains, the two northern plains were then taken as a whole to construct a second MaxEnt model (Model 2), which was used to analyze the main factors, other than topography, that influence the spatial differentiation of HFRS in the northern plains. Finally, to compare the similarities and differences of factors influencing the HFRS epidemic in regions with different levels of economic development, separate MaxEnt models were constructed for the northern plains of Chang’an District and of Hu County (Models 3 and 4, respectively).
To evaluate the models, we divided the HFRS case-distribution data into two parts: 75% of the HFRS case data from 2014 to 2016 served as the training sample, and the remaining 25% served as the test sample. During modeling, the training and test samples were combined with 10,000 background points to draw the receiver operating characteristic (ROC) curve and calculate the AUC. Each predicted value was used in turn as a candidate decision threshold, and the corresponding sensitivity and specificity were calculated. The false-positive rate (1 − specificity) was plotted on the abscissa, and the true-positive rate (sensitivity) was plotted on the ordinate. AUC measures the prediction accuracy of the model and ranges from 0 to 1; the larger the value, the stronger the discriminative power of the model [33]. The diagnostic value is generally considered low when AUC is between 0.5 and 0.7, medium when AUC is between 0.7 and 0.9, and high when AUC is greater than 0.9 [34-36].
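The ROC construction described above can be sketched in a few lines. This is a minimal illustration, assuming that case (presence) points are treated as positives and background points as negatives; the scores are invented, and the function names are hypothetical rather than part of the MaxEnt software.

```python
import numpy as np


def roc_curve_points(presence_scores, background_scores):
    """Sweep every predicted value as a threshold and return (FPR, TPR) arrays.

    FPR is 1 - specificity (abscissa); TPR is sensitivity (ordinate).
    """
    presence_scores = np.asarray(presence_scores, dtype=float)
    background_scores = np.asarray(background_scores, dtype=float)
    thresholds = np.unique(np.concatenate([presence_scores, background_scores]))[::-1]
    tpr = [float((presence_scores >= t).mean()) for t in thresholds]
    fpr = [float((background_scores >= t).mean()) for t in thresholds]
    # Prepend the origin so the curve runs from (0, 0) to (1, 1).
    return np.array([0.0] + fpr), np.array([0.0] + tpr)


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[:-1] + tpr[1:]) / 2.0))


# Invented logistic outputs for three case points and four background points.
presence = [0.9, 0.8, 0.4]
background = [0.7, 0.3, 0.2, 0.1]
fpr, tpr = roc_curve_points(presence, background)
auc = auc_trapezoid(fpr, tpr)
```

In this toy example, 11 of the 12 presence-background pairs are ranked correctly, so the AUC is 11/12. An AUC of 0.5 corresponds to a model that ranks case points no better than background points.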
The annual HFRS incidences of Chang’an District were 6.305/10⁵, 22.048/10⁵, and 16.813/10⁵ from 2014 to 2016, while those of Hu County were 5.451/10⁵, 11.571/10⁵, and 11.923/10⁵ for the same period. The annual HFRS incidences in both areas were far higher than those of Shaanxi Province as a whole, which were 2.761/10⁵, 3.706/10⁵, and 2.460/10⁵, respectively. According to the spatial distribution of HFRS cases, more than 95% of HFRS cases were in northern areas with altitudes of less than 800 m (Figure 5). In areas with densely distributed HFRS cases, the topography was dominated by plains, while a few cases were scattered along the valleys in the southern region. The two areas differed in that HFRS cases in Chang’an District were distributed closer to the southern mountainous area than those in Hu County. Overall, the HFRS cases in this area were densely distributed in the northern low-altitude plain.
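As a quick consistency check, the annual incidences above can be averaged and compared with the average annual crude incidence rates given in the HFRS Data section (15.055 and 9.648 per 10⁵). The sketch below uses only figures quoted in the text; the variable names are illustrative, and a simple unweighted mean over the three years is assumed.

```python
# Annual HFRS incidence per 100,000 population, 2014-2016, as quoted in the text.
annual_incidence = {
    "Chang'an District": [6.305, 22.048, 16.813],
    "Hu County": [5.451, 11.571, 11.923],
    "Shaanxi Province": [2.761, 3.706, 2.460],
}

# Unweighted three-year mean for each area.
mean_incidence = {
    area: round(sum(values) / len(values), 3)
    for area, values in annual_incidence.items()
}

# Ratio of each study area's mean incidence to the provincial mean.
ratio_to_province = {
    area: mean_incidence[area] / mean_incidence["Shaanxi Province"]
    for area in ("Chang'an District", "Hu County")
}
```

The unweighted means reproduce the reported average rates for both study areas, and they correspond to roughly five times (Chang’an District) and three times (Hu County) the provincial mean over the same period.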
Table 2 shows the training and test AUCs of the four MaxEnt models. For Chang’an District and Hu County as a whole (Model 1), the test AUC of the MaxEnt model reached 0.78. For the northern plains, where more than 95% of the cases were distributed (Model 2), the test AUC was 0.61. For the models built to analyze the effects of economic variables on the HFRS epidemic in areas at different stages of economic development, the test AUCs for the plains of Chang’an District (Model 3) and Hu County (Model 4) were 0.56 and 0.61, respectively. For the entire area (Model 1), the model fit was satisfactory, which indicated that the model built for this area was efficient in explaining the influencing factors. With topography and DEM excluded, the AUCs of the other models (Models 2, 3, and 4) decreased; however, given the satisfactory performance of the whole-area model, these models were still considered efficient in explaining the other key influencing factors in the plain areas.
| Model | Area | Training AUC | Test AUC |
| --- | --- | --- | --- |
| Model 1 | Chang’an District and Hu County as a whole | 0.80 | 0.78 |
| Model 2 | The northern plains of Chang’an District and Hu County as a whole | 0.65 | 0.61 |
| Model 3 | The northern plains of Chang’an District | 0.65 | 0.56 |
| Model 4 | The northern plains of Hu County | 0.69 | 0.61 |

Table 2. MaxEnt model results in different areas
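For orientation, the test AUCs in Table 2 can be placed in the diagnostic bands quoted in the model evaluation section (low for 0.5 to 0.7, medium for 0.7 to 0.9, high above 0.9). The sketch below is illustrative only: the function name `auc_band` is hypothetical, and the treatment of the band boundaries and of values below 0.5 is an assumption, because the source does not specify them.

```python
# Test AUCs copied from Table 2.
test_auc = {"Model 1": 0.78, "Model 2": 0.61, "Model 3": 0.56, "Model 4": 0.61}


def auc_band(auc):
    """Map an AUC to the diagnostic band cited in the text.

    Boundary handling (0.7 counted as medium, 0.9 counted as medium) and the
    label for values below 0.5 are assumptions made for this sketch.
    """
    if auc > 0.9:
        return "high"
    if auc >= 0.7:
        return "medium"
    if auc >= 0.5:
        return "low"
    return "below 0.5"


bands = {model: auc_band(value) for model, value in test_auc.items()}
```

Under these assumptions, Model 1 falls in the medium band and Models 2 to 4 fall in the low band, which is worth keeping in mind when reading the factor contributions for the plains-only models.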
The MaxEnt model results (Table 3) showed that topography type was the environmental variable with the highest contribution to the MaxEnt model for the whole area (Model 1), reaching 84.1%, which made it the key variable influencing the spatial distribution of HFRS cases. Precipitation and population density contributed a further 8.9% and 5.4%, respectively. With the impact of topography excluded, the spatial distribution of the HFRS epidemic in the northern plains of the whole area (Model 2) was closely related to precipitation and population density, with contribution rates of 60.7% and 28.0%, respectively. In the northern plains of Chang’an District alone (Model 3), precipitation and NDVI jointly affected the spatial distribution of the HFRS epidemic, with contribution rates of 66.4% and 25.2%, respectively. In the northern plains of Hu County (Model 4), land-use type, temperature, precipitation, and population density had the highest contribution rates, at 32.3%, 31.2%, 16.9%, and 16.8%, respectively. These results showed that topography was the dominant factor influencing the HFRS epidemic in Chang’an District and Hu County and that precipitation and population density jointly affected the HFRS epidemic in the northern plains. Furthermore, in the northern plain of Chang’an District, the HFRS epidemic was primarily influenced by natural environmental factors, whereas in the northern plain of Hu County, it was primarily influenced by socioeconomic and meteorological factors.
| Variable | Model 1, % (rank) | Model 2, % (rank) | Model 3, % (rank) | Model 4, % (rank) |
| --- | --- | --- | --- | --- |
| Topography | 84.1 (1) | — | — | — |
| Precipitation | 8.9 (2) | 60.7 (1) | 66.4 (1) | 16.9 (3) |
| Population density | 5.4 (3) | 28.0 (2) | 3.1 (4) | 16.8 (4) |
| NDVI | 0.7 (4) | 6.8 (3) | 25.2 (2) | 2.8 (5) |
| DEM | 0.4 (5) | — | — | — |
| Temperature | 0.4 (6) | 1.4 (5) | 4.2 (3) | 31.2 (2) |
| Land use type | 0.1 (7) | 3.1 (4) | 1.1 (5) | 32.3 (1) |

Note. Models 1, 2, 3, and 4 were built for Chang’an District and Hu County as a whole, the plains of Chang’an District and Hu County, the plains of Chang’an District, and the plains of Hu County, respectively. NDVI, normalized difference vegetation index; DEM, digital elevation model.
Table 3. Contribution rates (%) and ranking of environmental variables in the MaxEnt models
Topography was the dominant factor in this area. According to the HFRS risk values of the different landform types (Figure 6A), plains, platforms, and hills were the main high-risk landforms for the HFRS epidemic, whereas mountains had the lowest risk of HFRS spread.
Figure 6. Response curves of influencing factors in the four models. (A) Model 1, (B−C) Model 2, (D−E) Model 3, and (F−I) Model 4 were constructed for Chang’an District and Hu County as a whole, the plains of Chang’an District and Hu County, the plains of Chang’an District, and the plains of Hu County, respectively. Only the influencing factors whose contribution rate in a given model was higher than 10% are shown. For continuous variables, the response curve of HFRS epidemic spread risk to the variable is drawn; for categorical variables, the risk value of each category for the spread of the HFRS epidemic is drawn.
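The 10% selection rule stated in the Figure 6 caption can be checked against the contribution rates in Table 3. The sketch below copies those rates and lists, for each model, the variables that exceed the cutoff; the dictionary layout and the function name `plotted_variables` are illustrative assumptions.

```python
# Contribution rates (%) copied from Table 3; variables not used in a model are omitted.
contributions = {
    "Model 1": {"Topography": 84.1, "Precipitation": 8.9, "Population density": 5.4,
                "NDVI": 0.7, "DEM": 0.4, "Temperature": 0.4, "Land use type": 0.1},
    "Model 2": {"Precipitation": 60.7, "Population density": 28.0, "NDVI": 6.8,
                "Temperature": 1.4, "Land use type": 3.1},
    "Model 3": {"Precipitation": 66.4, "Population density": 3.1, "NDVI": 25.2,
                "Temperature": 4.2, "Land use type": 1.1},
    "Model 4": {"Precipitation": 16.9, "Population density": 16.8, "NDVI": 2.8,
                "Temperature": 31.2, "Land use type": 32.3},
}


def plotted_variables(model_contributions, cutoff=10.0):
    """Return variables whose contribution exceeds the cutoff, largest first."""
    ranked = sorted(model_contributions.items(), key=lambda item: item[1], reverse=True)
    return [name for name, rate in ranked if rate > cutoff]


panels = {model: plotted_variables(rates) for model, rates in contributions.items()}
total_panels = sum(len(variables) for variables in panels.values())
column_sums = {model: sum(rates.values()) for model, rates in contributions.items()}
```

The rule selects one variable for Model 1, two each for Models 2 and 3, and four for Model 4, giving nine response panels in total, which matches panels A to I of the figure. The contribution rates in each model also sum to 100%, as expected for percentage contributions.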
After the effect of topography was removed, precipitation and population density jointly affected the spatial differentiation of the HFRS epidemic in the northern plains. The response curves of HFRS epidemic risk to precipitation in Models 2 and 3 (Figures 6B and 6D) showed that the risk first increased rapidly with precipitation and then more slowly. The HFRS transmission risk reached its highest value when precipitation was 678 and 676 mm, respectively, and then decreased as precipitation increased further. The response curve for Model 4 differed slightly (Figure 6H): the risk of HFRS transmission in Hu County remained high when precipitation was lower than 677 mm. On the plains of Chang’an District and Hu County as a whole (Figure 6C), the risk of HFRS transmission increased with population density, peaked when the population density reached 1,670 persons/km², and then decreased as population density increased further.
In addition to precipitation, the HFRS epidemic in the plain of Chang’an District was closely related to NDVI. As shown in Figure 6E, the spread risk of the epidemic was higher in low-NDVI areas than in high-NDVI areas, with risk peaks at NDVI values below 0.41 and at approximately 0.87. Apart from precipitation, the HFRS epidemic in the plain of Hu County was also related to temperature, land use, and population density. Among land-use types (Figure 6F), construction land and cultivated land were the main risk areas for the spread of HFRS, followed by water areas, woodland, and grassland. The transmission risk increased with temperature (Figure 6G), was highest at 15.51 °C, and then declined. The transmission risk also rose with population density (Figure 6I) until it reached its maximum at 1,123 persons/km², after which it remained relatively flat.
Spatial Heterogeneity and Influencing Factors of HFRS Epidemics in Rural and Urban Areas: A Study in Guanzhong Plain of Shaanxi Province, China
doi: 10.3967/bes2022.130
- Received Date: 2022-06-20
- Accepted Date: 2022-09-16
Key words:
- Hemorrhagic fever with renal syndrome (HFRS)
- Spatial heterogeneity
- Influencing factors
- Economic development stages
- Fine scale
- Maximum Entropy model
Citation: ZHU Ling Li, LI Yan Ping, LU Liang, LI Shu Juan, REN Hong Yan. Spatial Heterogeneity and Influencing Factors of HFRS Epidemics in Rural and Urban Areas: A Study in Guanzhong Plain of Shaanxi Province, China[J]. Biomedical and Environmental Sciences, 2022, 35(11): 1012-1024. doi: 10.3967/bes2022.130