Participant Characteristics
Supplementary Table S1 (available at www.besjournal.com) summarizes the patient characteristics in the training and validation sets. The training set included 2,124 males (46.08%) and 2,485 females (53.92%), and the validation set included 943 males (47.06%) and 1,061 females (52.94%). Apart from race/ethnicity (P = 0.009), none of the differences between the training and validation sets was statistically significant (P > 0.05), which suggests that the subjects in the two datasets had similar characteristics.
Table S1. Characteristics of the participants in training set and validation set

| Characteristic | Training set (n = 4,609) | Validation set (n = 2,004) | P-value |
|---|---|---|---|
| Sex, n (%) | | | 0.455 |
| Males | 2,124 (46.08) | 943 (47.06) | |
| Females | 2,485 (53.92) | 1,061 (52.94) | |
| Age (years), n (%) | | | 0.721 |
| 20– | 1,809 (39.25) | 790 (39.42) | |
| 45– | 1,149 (24.93) | 494 (24.65) | |
| 60– | 1,134 (24.60) | 470 (23.45) | |
| ≥ 75 | 517 (11.22) | 250 (12.48) | |
| Race/ethnicity, n (%) | | | 0.009 |
| Mexican American | 751 (16.29) | 289 (14.42) | |
| Other Hispanic | 535 (11.61) | 203 (10.13) | |
| Non-Hispanic white | 2,069 (44.89) | 931 (46.46) | |
| Non-Hispanic black | 817 (17.73) | 383 (19.11) | |
| Other/multiracial | 437 (9.48) | 198 (9.88) | |
| BMI (kg/m2), n (%) | | | 0.166 |
| < 25 | 1,251 (27.18) | 573 (28.64) | |
| 25–30 | 1,531 (33.27) | 674 (33.68) | |
| ≥ 30 | 1,820 (39.55) | 754 (37.68) | |
| Educational level, n (%) | | | 0.182 |
| < High school | 1,203 (26.15) | 501 (25.00) | |
| High school | 1,056 (22.95) | 441 (22.01) | |
| > High school | 2,342 (50.90) | 1,062 (52.99) | |
| Annual household income (CNY), n (%) | | | 0.081 |
| < 20,000 | 933 (22.21) | 408 (21.19) | |
| 20,000– | 1,614 (36.11) | 669 (34.76) | |
| 45,000– | 850 (19.02) | 340 (17.66) | |
| ≥ 75,000 | 1,013 (22.66) | 508 (26.39) | |
| Smoking status, n (%) | | | 0.446 |
| Yes | 1,906 (41.37) | 801 (39.97) | |
| No | 2,701 (58.63) | 1,203 (60.03) | |
| Vigorous recreational activity, n (%) | | | 0.350 |
| Yes | 895 (19.42) | 406 (20.26) | |
| No | 3,714 (80.58) | 1,598 (79.74) | |
| Hypertension, n (%) | | | 0.455 |
| Yes | 2,198 (47.69) | 989 (49.35) | |
| No | 2,411 (52.31) | 1,015 (50.65) | |
| Diabetes, n (%) | | | 0.362 |
| Yes | 981 (21.28) | 434 (21.66) | |
| No | 3,628 (78.72) | 1,570 (78.34) | |
| Cholesterol, mg/dL | 192.63 ± 41.61 | 193.59 ± 40.98 | 0.386 |
| Uric acid, mg/dL | 5.45 ± 1.40 | 5.51 ± 1.44 | 0.112 |
| High-density lipoprotein, mg/dL | 52.26 ± 14.45 | 52.82 ± 15.35 | 0.155 |
| Low-density lipoprotein, mg/dL | 114.87 ± 35.60 | 115.39 ± 35.02 | 0.583 |
| Average energy intake, kcal/day | 1,898.36 ± 703.84 | 1,925.17 ± 696.28 | 0.153 |
| Total dietary retinol intake, RAEs, μg/1,000 kcal/day | 335.31 ± 288.18 | 330.68 ± 269.25 | 0.540 |
| Animal-derived dietary retinol intake, RAEs, μg/1,000 kcal/day | 127.36 ± 168.10 | 126.82 ± 183.63 | 0.907 |
| Plant-derived dietary retinol intake, RAEs, μg/1,000 kcal/day | 189.62 ± 231.38 | 186.58 ± 190.60 | 0.605 |

Note. BMI: body mass index, RAEs: retinol activity equivalents.
The individuals with NAFLD in the training set included 901 males (52.88%) and 803 females (47.12%) (Table 1). Sex, age, race, BMI, education level, annual household income, smoking status, vigorous recreational activity, hypertension, diabetes, uric acid (UA), high-density lipoprotein (HDL), animal-derived dietary retinol intake, and plant-derived dietary retinol intake all differed significantly between participants with NAFLD and healthy controls in the training set (P < 0.05). Participants with NAFLD tended to be older, were more likely to be Mexican American, and had lower educational levels, annual household incomes, vigorous recreational activity levels, HDL levels, and plant-derived dietary retinol intake than the control group. In addition, participants with NAFLD were more likely to be obese, to smoke, and to have hypertension and diabetes, and they had higher serum UA levels and animal-derived dietary retinol intake than those without NAFLD (P < 0.05).
Table 1. Participant characteristics in training data set by NAFLD status

| Characteristics | NAFLD: No (n = 2,905) | NAFLD: Yes (n = 1,704) | P-value |
|---|---|---|---|
| Sex, n (%) | | | < 0.001 |
| Males | 1,223 (42.10) | 901 (52.88) | |
| Females | 1,682 (57.90) | 803 (47.12) | |
| Age (years), n (%) | | | < 0.001 |
| 20– | 1,317 (45.34) | 492 (28.87) | |
| 45– | 681 (23.44) | 468 (27.46) | |
| 60– | 606 (20.86) | 528 (30.99) | |
| ≥ 75 | 301 (10.36) | 216 (12.68) | |
| Race/ethnicity, n (%) | | | < 0.001 |
| Mexican American | 344 (11.84) | 407 (23.88) | |
| Other Hispanic | 328 (11.29) | 207 (12.15) | |
| Non-Hispanic white | 1,271 (43.76) | 798 (46.84) | |
| Non-Hispanic black | 636 (21.89) | 181 (10.62) | |
| Other/multiracial | 326 (11.22) | 111 (6.51) | |
| BMI (kg/m2), n (%) | | | < 0.001 |
| < 25 | 1,172 (40.41) | 79 (4.64) | |
| 25–30 | 1,087 (37.48) | 444 (26.09) | |
| ≥ 30 | 641 (22.11) | 1,179 (69.27) | |
| Educational level, n (%) | | | < 0.001 |
| < High school | 624 (21.51) | 579 (34.06) | |
| High school | 665 (22.92) | 391 (23.00) | |
| > High school | 1,612 (55.57) | 730 (42.94) | |
| Annual household income (CNY), n (%) | | | < 0.001 |
| < 20,000 | 539 (19.40) | 394 (24.16) | |
| 20,000– | 951 (34.22) | 663 (40.65) | |
| 45,000– | 558 (20.08) | 292 (17.90) | |
| ≥ 75,000 | 731 (26.30) | 282 (17.29) | |
| Smoking status, n (%) | | | < 0.001 |
| Yes | 1,088 (37.48) | 818 (48.00) | |
| No | 1,815 (62.52) | 886 (52.00) | |
| Vigorous recreational activity, n (%) | | | < 0.001 |
| Yes | 709 (24.41) | 186 (10.92) | |
| No | 2,196 (75.59) | 1,518 (89.08) | |
| Hypertension, n (%) | | | < 0.001 |
| Yes | 1,133 (39.00) | 1,065 (62.50) | |
| No | 1,772 (61.00) | 639 (37.50) | |
| Diabetes, n (%) | | | < 0.001 |
| Yes | 343 (11.81) | 638 (37.44) | |
| No | 2,562 (88.19) | 1,066 (62.56) | |
| Cholesterol, mg/dL | 191.81 ± 41.45 | 194.03 ± 41.86 | 0.080 |
| Uric acid, mg/dL | 5.13 ± 1.29 | 5.99 ± 1.42 | < 0.001 |
| High-density lipoprotein, mg/dL | 56.00 ± 14.51 | 45.87 ± 11.91 | < 0.001 |
| Low-density lipoprotein, mg/dL | 114.76 ± 35.28 | 115.08 ± 36.17 | 0.766 |
| Average energy intake, kcal per day | 1,892.93 ± 696.32 | 1,907.61 ± 716.59 | 0.494 |
| Total dietary retinol intake, RAEs, μg/1,000 kcal per day | 339.77 ± 292.55 | 327.69 ± 280.49 | 0.170 |
| Animal-derived dietary retinol intake, RAEs, μg/1,000 kcal per day | 122.52 ± 144.50 | 135.60 ± 201.85 | 0.011 |
| Plant-derived dietary retinol intake, RAEs, μg/1,000 kcal per day | 200.37 ± 253.68 | 171.29 ± 185.98 | < 0.001 |

Note. Continuous variables are represented by mean ± standard deviation (SD). NAFLD: non-alcoholic fatty liver disease, BMI: body mass index, RAEs: retinol activity equivalents.
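The P-values for the continuous variables in Table 1 can be approximated from the published summary statistics alone. The sketch below is illustrative only: the text does not state which test was used, so a pooled-variance two-sample t-test is assumed, the full group sizes (2,905 and 1,704) are assumed to apply to every variable, and a normal approximation stands in for the t distribution. The function name is hypothetical.

```python
import math

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample t statistic with a two-sided P-value.

    The P-value uses a normal approximation, which is reasonable here
    because the degrees of freedom are in the thousands.
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean2 - mean1) / se
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# Summary statistics copied from Table 1 (no NAFLD vs. NAFLD).
t_chol, p_chol = pooled_t_test(191.81, 41.45, 2905, 194.03, 41.86, 1704)
t_ua, p_ua = pooled_t_test(5.13, 1.29, 2905, 5.99, 1.42, 1704)

print(f"Cholesterol: t = {t_chol:.2f}, P = {p_chol:.3f}")
print(f"Uric acid:   t = {t_ua:.2f}, P = {p_ua:.3g}")
```

Under these assumptions the cholesterol comparison gives a P-value of roughly 0.08, in line with the table, while the uric acid difference is far below 0.001.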
Predictors of NAFLD Risk
The predictors of NAFLD risk identified in the logistic regression analysis of the training set are shown in Table 2. In univariate logistic regression models, race (OR = 0.634, 95% CI: 0.588–0.684), vigorous recreational activity (OR = 0.679, 95% CI: 0.540–0.853), and HDL (OR = 0.955, 95% CI: 0.948–0.961) were inversely associated with NAFLD risk. In contrast, the following predictors were significantly and positively associated with NAFLD risk: age (OR = 1.251, 95% CI: 1.139–1.374), BMI (OR = 4.386, 95% CI: 3.870–4.971), smoking (OR = 1.353, 95% CI: 1.149–1.594), hypertension (OR = 1.546, 95% CI: 1.289–1.855), diabetes (OR = 2.423, 95% CI: 1.985–2.957), and UA (OR = 1.291, 95% CI: 1.212–1.375).
Table 2. Analysis of risk factors for NAFLD using univariate logistic regression model

| Variables | B | SE | Wald χ2 | P | OR (95% CI) |
|---|---|---|---|---|---|
| Age | 0.224 | 0.048 | 22.004 | < 0.001 | 1.251 (1.139–1.374) |
| BMI | 1.478 | 0.064 | 535.961 | < 0.001 | 4.386 (3.870–4.971) |
| Race/ethnicity | −0.455 | 0.038 | 140.054 | < 0.001 | 0.634 (0.588–0.684) |
| Smoking | 0.302 | 0.084 | 13.085 | < 0.001 | 1.353 (1.149–1.594) |
| Vigorous recreational activity | −0.387 | 0.117 | 11.018 | 0.001 | 0.679 (0.540–0.853) |
| Hypertension | 0.436 | 0.093 | 21.982 | < 0.001 | 1.546 (1.289–1.855) |
| Diabetes | 0.885 | 0.102 | 75.776 | < 0.001 | 2.423 (1.985–2.957) |
| HDL | −0.046 | 0.004 | 172.778 | < 0.001 | 0.955 (0.948–0.961) |
| Uric acid | 0.256 | 0.032 | 63.325 | < 0.001 | 1.291 (1.212–1.375) |

Note. NAFLD: non-alcoholic fatty liver disease, SE: standard error, OR: odds ratio, CI: confidence interval, BMI: body mass index, HDL: high-density lipoprotein.
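The columns of Table 2 are linked by the standard logistic regression identities: OR = exp(B), the 95% CI is exp(B ± 1.96 × SE), and the Wald statistic is (B/SE)². The following minimal sketch applies them to the age row; the function name is illustrative. Because B and SE are published to three decimals, recomputed values (the Wald statistic in particular) can differ slightly from the table.

```python
import math

def odds_ratio_summary(b, se, z=1.96):
    """Return (OR, lower 95% limit, upper 95% limit, Wald chi-square)
    for a logistic regression coefficient b with standard error se."""
    odds_ratio = math.exp(b)
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    wald = (b / se) ** 2
    return odds_ratio, lower, upper, wald

# Age row of Table 2: B = 0.224, SE = 0.048.
age_or, age_lower, age_upper, age_wald = odds_ratio_summary(0.224, 0.048)
print(f"OR = {age_or:.3f} (95% CI: {age_lower:.3f}-{age_upper:.3f}), Wald = {age_wald:.2f}")
```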
The weighted ORs (95% CIs) for NAFLD as a dichotomous outcome according to quartiles of total, animal-derived, and plant-derived dietary retinol intake are shown in Table 3. The three intake variables are presented as categorical variables (quartiles) because there was evidence of nonlinearity in some situations. Nevertheless, P-values for trend are also reported; they were computed from models in which the median intake of each quartile was entered as a continuous exposure. After adjustment for confounding factors such as sex, age, race, education level, smoking status, recreational activities, income level, hypertension, diabetes, BMI, LDL, UA, and TC (model 2), the highest quartile of plant-derived dietary retinol intake was inversely associated with NAFLD risk compared with the lowest quartile (OR = 0.75, 95% CI: 0.57–0.99).
Table 3. Weighted ORs and 95% CIs for NAFLD according to dietary retinol intake quartile (μg/1,000 kcal per day) using the univariate logistic regression model

| Variables | Crude OR (95% CI) | P-trend | Model 1 OR (95% CI) | P-trend | Model 2 OR (95% CI) | P-trend |
|---|---|---|---|---|---|---|
| Total dietary retinol intake (RAEs, μg/1,000 kcal per day) | | 0.005 | | 0.001 | | 0.176 |
| < 190.54 | 1.00 (ref.) | | 1.00 (ref.) | | 1.00 (ref.) | |
| 190.54–284.02 | 1.11 (0.89–1.38) | | 1.02 (0.81–1.28) | | 1.10 (0.80–1.51) | |
| 284.02–422.22 | 1.12 (0.94–1.34) | | 1.01 (0.82–1.25) | | 1.03 (0.79–1.34) | |
| ≥ 422.22 | 0.79 (0.66–0.95)* | | 0.70 (0.57–0.87)** | | 0.86 (0.65–1.15) | |
| Animal-derived dietary retinol intake (RAEs, μg/1,000 kcal per day) | | 0.339 | | 0.853 | | 0.559 |
| < 59.37 | 1.00 (ref.) | | 1.00 (ref.) | | 1.00 (ref.) | |
| 59.37–105.57 | 1.04 (0.83–1.29) | | 0.97 (0.78–1.19) | | 1.03 (0.80–1.33) | |
| 105.57–165.14 | 1.33 (1.07–1.67)* | | 1.25 (1.00–1.56)* | | 1.21 (0.90–1.61) | |
| ≥ 165.14 | 1.07 (0.85–1.36) | | 0.98 (0.76–1.26) | | 0.90 (0.63–1.26) | |
| Plant-derived dietary retinol intake (RAEs, μg/1,000 kcal per day) | | < 0.001 | | < 0.001 | | 0.042 |
| < 70.22 | 1.00 (ref.) | | 1.00 (ref.) | | 1.00 (ref.) | |
| 70.22–135.01 | 0.89 (0.73–1.09) | | 0.89 (0.72–1.10) | | 0.95 (0.72–1.27) | |
| 135.01–253.74 | 0.84 (0.67–1.05) | | 0.81 (0.64–1.04) | | 0.91 (0.67–1.24) | |
| ≥ 253.74 | 0.64 (0.50–0.82)** | | 0.60 (0.46–0.79)** | | 0.75 (0.57–0.99)* | |

Note. Model 1 was adjusted for sex and age. Model 2 was adjusted for sex, age, race, education level, smoking status, physical activity, income level, hypertension, diabetes, BMI, LDL, UA, and TC levels. The lowest dietary retinol intake quartile was used as the reference group. The results are survey-weighted. *P < 0.05, **P < 0.01. Tests for trend were based on variables containing the median value for each quartile. NAFLD: non-alcoholic fatty liver disease, OR: odds ratio, CI: confidence interval, RAEs: retinol activity equivalents.
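The quartile coding and the trend variable described in the note to Table 3 can be sketched as follows. The cutoffs are the plant-derived intake boundaries from Table 3; the intake values are hypothetical, and the function names are illustrative. The regression itself and the survey weighting are not reproduced here.

```python
from bisect import bisect_right
from statistics import median

# Plant-derived intake quartile boundaries from Table 3 (RAEs, ug/1,000 kcal per day).
PLANT_CUTOFFS = [70.22, 135.01, 253.74]

def quartile_of(intake, cutoffs):
    """0-based quartile index; a value equal to a cutoff falls in the higher quartile."""
    return bisect_right(cutoffs, intake)

def trend_scores(intakes, cutoffs):
    """Replace each intake with the median intake of its quartile.

    Entering this score as a continuous predictor yields a test for trend.
    """
    groups = {}
    for value in intakes:
        groups.setdefault(quartile_of(value, cutoffs), []).append(value)
    medians = {q: median(values) for q, values in groups.items()}
    return [medians[quartile_of(value, cutoffs)] for value in intakes]

# Hypothetical intakes, for illustration only.
intakes = [20.0, 50.0, 65.0, 80.0, 120.0, 150.0, 200.0, 260.0, 300.0, 500.0]
quartiles = [quartile_of(value, PLANT_CUTOFFS) for value in intakes]
scores = trend_scores(intakes, PLANT_CUTOFFS)
print(quartiles)
print(scores)
```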
Prediction Models
An ANN model was established based on the NAFLD risk predictors obtained from the logistic regression analysis. The input variables for the ANN model were age, race, BMI, smoking status, recreational activities, hypertension, diabetes, HDL, UA, and plant-derived dietary retinol intake. The output variable was a binary variable indicating whether an individual had NAFLD. The BP neural network consisted of three layers (Figure 2): 10 neurons in the input layer, seven neurons in the hidden layer, and one neuron in the output layer, which gave the predicted probability of having NAFLD. The network parameters were selected based on previous studies[36,37]. The training parameters (e.g., learning rate and momentum) were left at their default values, and the Levenberg-Marquardt algorithm was used as the training function. The neural network was trained for over 100 epochs. Removing 20% of the input units and 50% of the hidden units is usually optimal, because this simple method can prevent neural networks from overfitting[60]. To ensure that the output was not heavily skewed toward the dominant class, each data point was weighted according to its outcome ratio.
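To make the 10-7-1 architecture and the outcome weighting concrete, the sketch below builds an untrained network of that shape and computes one possible set of class weights. It is a minimal illustration and not the authors' implementation: the sigmoid activations, the random initial weights, and the inverse-frequency weighting formula are all assumptions, and no Levenberg-Marquardt training is performed.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 10, 7, 1  # layer sizes stated in the text

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    """Propagate one input vector through all layers (sigmoid assumed)."""
    activation = x
    for weights, biases in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation[0]

def class_weights(n_negative, n_positive):
    """Inverse-frequency weights (assumed scheme) so both classes carry equal total weight."""
    total = n_negative + n_positive
    return total / (2 * n_negative), total / (2 * n_positive)

rng = random.Random(0)
layers = [init_layer(N_INPUT, N_HIDDEN, rng), init_layer(N_HIDDEN, N_OUTPUT, rng)]
n_parameters = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
probability = forward([0.0] * N_INPUT, layers)

# Training-set class sizes from Table 1: 2,905 without NAFLD, 1,704 with NAFLD.
w_negative, w_positive = class_weights(2905, 1704)
print(n_parameters, round(probability, 3), round(w_negative, 3), round(w_positive, 3))
```

With these layer sizes the network has 85 trainable parameters (77 in the hidden layer and 8 in the output layer).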
Discriminatory Ability of Models
Figure 3 shows the receiver operating characteristic (ROC) curves of the ANN model for the training and validation sets. The area under the ROC curve (AUC) was 0.874 for the training set and 0.883 for the validation set, indicating that the trained ANN model discriminated well between individuals with and without NAFLD. The cutoff values of the predicted probability of NAFLD were 0.388 in the training set and 0.427 in the validation set; in the training set, for example, an individual was classified as having NAFLD when the predicted probability was greater than 0.388.
Figure 3. The receiver operating characteristic (ROC) curves obtained from the artificial neural network (ANN) model in training and validation sets. AUC: area under the curve.
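A probability cutoff such as those reported above is commonly chosen as the point on the ROC curve that maximizes the Youden index (sensitivity + specificity − 1). The text does not state how the cutoffs were derived, so the sketch below shows that common approach as an assumption, using hypothetical predicted probabilities and labels and treating a probability at or above the cutoff as positive.

```python
def youden_optimal_cutoff(probabilities, labels):
    """Return (cutoff, sensitivity, specificity) maximizing Se + Sp - 1.

    A case is classified as positive when its probability is >= cutoff.
    Ties are resolved in favor of the lowest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = (None, 0.0, 0.0)
    best_j = -1.0
    for cutoff in sorted(set(probabilities)):
        tp = sum(1 for p, y in zip(probabilities, labels) if y == 1 and p >= cutoff)
        tn = sum(1 for p, y in zip(probabilities, labels) if y == 0 and p < cutoff)
        se, sp = tp / n_pos, tn / n_neg
        if se + sp - 1 > best_j:
            best_j = se + sp - 1
            best = (cutoff, se, sp)
    return best

# Hypothetical predictions, for illustration only (1 = NAFLD, 0 = no NAFLD).
probs = [0.10, 0.20, 0.35, 0.40, 0.45, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, se, sp = youden_optimal_cutoff(probs, labels)
print(cutoff, se, sp)
```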
Table 4 shows that the accuracy of the ANN was 0.807 in the training set and 0.800 in the validation set. The sensitivity (Se), specificity (Sp), and Youden index of the ANN were 0.804, 0.785, and 0.589 in the training set and 0.793, 0.829, and 0.622 in the validation set, respectively. The AUC of the ANN model was 0.874 for the training set and 0.883 for the validation set. For logistic regression, the accuracy, Se, Sp, Youden index, and AUC were 0.798, 0.697, 0.856, 0.553, and 0.871, respectively. Thus, the ANN outperformed the logistic regression model in predicting the risk of NAFLD in terms of accuracy, sensitivity, Youden index, and AUC, although its specificity was lower.
Table 4. The performance of artificial neural network (ANN) and logistic model

| Indicator | ANN, training set (n = 4,609) | ANN, validation set (n = 2,004) | Logistic regression |
|---|---|---|---|
| Accuracy | 0.807 | 0.800 | 0.798 |
| Sensitivity | 0.804 | 0.793 | 0.697 |
| Specificity | 0.785 | 0.829 | 0.856 |
| Youden index | 0.589 | 0.622 | 0.553 |
| AUC (95% CI) | 0.874 (0.864–0.884) | 0.883 (0.868–0.898) | 0.871 (0.861–0.881) |

Note. ANN: artificial neural network, AUC: area under the curve.
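The Youden index in Table 4 follows directly from the sensitivity and specificity in the same column (Youden index = Se + Sp − 1). The short check below recomputes it from the tabulated values; the dictionary keys are illustrative labels.

```python
def youden_index(sensitivity, specificity):
    """Youden index: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1

# (sensitivity, specificity) pairs copied from Table 4.
table4 = {
    "ANN, training set": (0.804, 0.785),
    "ANN, validation set": (0.793, 0.829),
    "Logistic regression": (0.697, 0.856),
}

youden = {name: round(youden_index(se, sp), 3) for name, (se, sp) in table4.items()}
for name, value in youden.items():
    print(f"{name}: {value:.3f}")
```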