LI Qiang; SHEN Qun Lun; XU Chao Nan; LI Ming Liang; MA Zhi Min; ZHANG Shan; CHEN Shuo; ZHANG Min Ying; ZHANG Jing Bo; LIU Feng; YANG Xing Hua

doi:10.3967/bes2020.116

A Self-test Prediction Model to Determine the Probability Risk of Non-alcoholic Fatty Liver Disease

Beijing Physical Examination Center, Beijing 100077, China

Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China

Medical Engineering Department, Peking University Third Hospital, Beijing 100191, China

School of Public Health, Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China

Cente Of Quality Control and Improvement for Physical Check Up, Beijing Municipality, Beijing 100077, China

School of Medicine, Nankai University, Tianjin 300071, China

Funds: This study was supported by the National Key R&D Program of China [2016YFC0900600/2016YFC0900604]; The Capital Health Development Special Fund, Health Management Cohort Study Based on Beijing Health Checkup Data [2018-1-2241]; Database source: China Cohort Consortium, the Beijing Health Management Cohort Study [CCC2018050804]

More Information

Author Bio:
LI Qiang, male, born in 1981, majoring in health management and promotion

SHEN Qun Lun, male, born in 1996, master, majoring in boinformatics/machine learning

Corresponding author: LIU Feng, Tel/Fax: 86-10-8391-1779, E-mail: liufeng@bjtjzx.com; YANG Xing Hua, E-mail: xinghuay@sina.com

*This study was approved by the Beijing Physical Examination Center Institutional Review Board.,&The authors contributed equally to this paper.

Variables

Normal (n = 13,459)

NAFLD (n = 1,883)

P value^*

Age (years)

39.41 ± 13.09

43.66 ± 13.21

< 0.0001

WBC (× 10⁹/L)

6.17 ± 1.46

6.75 ± 1.55

< 0.0001

TG (mmol/L)

1.09 ± 0.69

1.71 ± 1.28

< 0.0001

ALT (U/L)

17.14 ± 13.48

25.60 ± 32.15

< 0.0001

AST (U/L)

18.60 ± 7.96

21.40 ± 16.39

< 0.0001

HCT (%)

41.90 ± 3.85

44.34 ± 3.57

< 0.0001

RBC (× 10⁹/L)

4.75 ± 0.44

5.02 ± 0.43

< 0.0001

Cr (μmoI/L)

60.33 ± 13.20

67.88 ± 14.62

< 0.0001

BUN (mmol/L)

4.68 ± 1.23

5.09 ± 1.24

< 0.0001

UA (umol/L)

282.30 ± 81.85

353.40 ± 87.11

< 0.0001

MCV (fL)

88.38 ± 4.49

88.50 ± 4.43

0.2886

MCH (pg)

29.97 ± 1.95

30.34 ± 1.85

< 0.0001

MCHC (g/L)

339.00 ± 11.22

342.70 ± 11.13

< 0.0001

Hb (g/L)

142.20 ± 15.24

152.10 ± 13.87

< 0.0001

FPG (mmol/L)

5.23 ± 0.81

5.58 ± 1.22

< 0.0001

PLT (× 10⁹/L)

228.50 ± 51.43

227.10 ± 51.08

0.2771

TC (mmol/L)

4.63 ± 0.86

4.84 ± 0.96

< 0.0001

LDL-C (mmol/L)

2.45 ± 0.67

2.67 ± 0.73

< 0.0001

HDL-C (mmol/L)

1.42 ± 0.29

1.22 ± 0.24

< 0.0001

RDW

12.92 ± 0.90

12.84 ± 0.71

0.0002

Lymph (%)

34.95 ± 7.50

34.18 ± 7.44

< 0.0001

SBP (mmHg)

112.80 ± 14.20

121.50 ± 14.11

< 0.0001

DBP (mmHg)

75.12 ± 9.27

81.38 ± 9.49

< 0.0001

BMI (kg/m²)

22.36 ± 2.76

25.68 ± 2.63

< 0.0001

PCT (%)

0.23 ± 0.05

0.4969

WHR

0.81 ± 0.06

0.87 ± 0.05

< 0.0001

VAI

1.24 ± 1.02

2.07 ± 1.90

< 0.0001

Neut (%)

57.27 ± 7.94

57.76 ± 7.81

0.0122

Non-HDL-C

3.21 ± 0.81

3.62 ± 0.90

< 0.0010

Male (%)

5,503 (78.63)

1,496 (21.37)

< 0.0001

Overweight (%)

2,281 (67.37)

1,105 (32.60)

< 0.0001

Ms (%)

270 (66.34)

137 (33.66)

< 0.0001

Diabetes (%)

271 (71.50)

108 (28.50)

< 0.0001

Impaired FPG (%)

412 (72.03)

160 (27.97)

< 0.0001

　　Note. WBC, White blood cell count; ALT, alanine aminotransferase; AST, aspartate aminotransferase; HCT, Hematocrit; RBC, Red blood cell count; Cr, Creatinine; BUN, Blood urea nitrogen; MCV, Mean red blood cell volume; MCH, Mean erythrocyte hemoglobin content; MCHC, Mean erythrocyte hemoglobin concentration; Hb, Hemoglobin; FPG, fasting plasma glucose; PLT, Platelet count; RDW, RBC distribution width; Lymph, Percentage of lymphocytes; SBP, systolic blood pressure; DBP, diastolic blood pressure; PCT, Platelet hematocrit; WHR, Waist-to-hip ratio; Neut, Neutrophilic granulocyte. Data are expressed as the mean ± standard deviation, n (%). ^*P < 0.05 was considered statistically significant.

Variables

Wald χ² test

P value

OR (95% CI)

Constant term

−15.7878

469.4885

< 0.0001

Age

0.0109

17.6484

< 0.0001

1.011 (1.006, 1.016)

WBC

0.0406

4.1928

0.0406

1.041 (1.002, 1.083)

0.4990

25.2471

< 0.0001

1.647 (1.356, 2.001)

ALT

0.0197

41.1115

< 0.0001

1.020 (1.014, 1.026)

AST

−0.0175

8.8296

0.0030

0.983 (0.971, 0.997)

RBC

0.1757

4.5543

0.0328

1.192 (1.014, 1.401)

−0.00852

10.6270

0.0011

0.992 (0.986, 0.997)

0.00347

68.5765

< 0.0001

1.003 (1.003, 1.004)

FBG

0.0717

7.1301

0.0076

1.074 (1.019, 1.132)

HDL-C

−1.1027

54.9475

< 0.0001

0.332 (0.248, 0.444)

DBP

0.0123

14.8173

0.0001

1.012 (1.006, 1.019)

BMI

0.1995

133.4437

< 0.0001

1.221 (1.180, 1.263)

PCT

1.8978

9.1519

0.0025

6.671 (1.951, 22.815)

WHR

3.9170

18.4845

< 0.0001

50.250 (8.426, 299.669)

0.0294

12.6515

0.0004

1.030 (1.013, 1.047)

VAI

−0.2286

11.8039

0.0006

0.796 (0.698, 0.906)

Non-HDL-C

0.1409

14.5095

0.0001

1.151 (1.071, 1.238)

Model

Sensitivity

Specificity

AUC

Logistic regression

0.222

0.977

0.600

Linear discriminant analysis

0.251

0.971

0.611

Random forest

0.807

0.747

0.777

K-nearest neighbors

0.210

0.948

0.579

SVM

0.752

0.756

0.754

LightGBM

0.860

0.720

0.790

A Self-test Prediction Model to Determine the Probability Risk of Non-alcoholic Fatty Liver Disease

基金项目: This study was supported by the National Key R&D Program of China [2016YFC0900600/2016YFC0900604]; The Capital Health Development Special Fund, Health Management Cohort Study Based on Beijing Health Checkup Data [2018-1-2241]; Database source: China Cohort Consortium, the Beijing Health Management Cohort Study [CCC2018050804]

作者简介:
LI Qiang, male, born in 1981, majoring in health management and promotion

SHEN Qun Lun, male, born in 1996, master, majoring in boinformatics/machine learning

通讯作者: LIU Feng, Tel/Fax: 86-10-8391-1779, E-mail: liufeng@bjtjzx.com; YANG Xing Hua, E-mail: xinghuay@sina.com

收稿日期: 2020-05-26

录用日期: 2020-09-09

网络出版日期: 2020-12-22

刊出日期: 2020-11-30

注释:

A Self-test Prediction Model to Determine the Probability Risk of Non-alcoholic Fatty Liver Disease

1. Beijing Physical Examination Center, Beijing 100077, China

2. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China

3. Medical Engineering Department, Peking University Third Hospital, Beijing 100191, China

4. School of Public Health, Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China

5. Cente Of Quality Control and Improvement for Physical Check Up, Beijing Municipality, Beijing 100077, China

6. School of Medicine, Nankai University, Tianjin 300071, China

Author Bio:
LI Qiang, male, born in 1981, majoring in health management and promotion

SHEN Qun Lun, male, born in 1996, master, majoring in boinformatics/machine learning

Corresponding author: LIU Feng, Tel/Fax: 86-10-8391-1779, E-mail: liufeng@bjtjzx.com; YANG Xing Hua, E-mail: xinghuay@sina.com

*This study was approved by the Beijing Physical Examination Center Institutional Review Board.,&The authors contributed equally to this paper.

Received Date: 2020-05-26

Accepted Date: 2020-09-09

Available Online: 2020-12-22

Publish Date: 2020-11-30

*This study was approved by the Beijing Physical Examination Center Institutional Review Board.,&The authors contributed equally to this paper.

全文HTML

Non-alcoholic fatty liver disease (NAFLD) is the most common liver disease in Asia. Several studies have shown that NAFLD is associated with obesity, insulin resistance, type 2 diabetes, hypertension, hyperlipidemia, and metabolic syndrome (MS)^[1]. Several studies concluded that indicators such as body mass index (BMI), waist circumference (WC), uric acid (UA), and visceral adiposity index (VAI) are closely related to the occurrence of non-alcoholic fatty liver^[2]. NAFLD is usually diagnosed by ultrasonography in a hospital. However, the pre-diagnosis of NAFLD remains unclear, and less attention is paid in predicting the onset of NAFLD^[3].

NAFLD is also an ongoing process. If no measures are taken, it can develop into irreversible diseases such as cirrhosis, hepatic carcinoma. In addition, some studies have confirmed that non-alcoholic fatty liver can be alleviated by adapting diet control and regular exercise, suggesting that it is significant to determine individuals who may have NAFLD and guide them to reduce their risk of NAFLD^[4]. Thus, we plan to develop an efficient prediction model using an electronic health record database to enable individuals to self-test their risk probability of developing NAFLD 1 year in advance.

Data of 73,720 individuals who underwent health checkups at Beijing Physical Examination Center from January 1, 2015 to December 31, 2016 were collected. Finally, 15,342 participants were recruited in this study (Supplementary Figure S1, available in www.besjournal.com).

Figure S1. A schematic diagram of the study participants

The research data were collected in a unified manner and included a consultation, physical examination, laboratory testing, and ultrasound (US) examination. The consultation included the assessment of patients’ personal medical history and alcohol consumption history. The physical examination included measuring the patients’ blood pressure, height, weight, and WC. The laboratory tests included measurements of patients’ total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), triglyceride (TG), and low-density lipoprotein cholesterol (LDL-C) levels and several conventional test indicators.

We combined the US diagnosis of fatty liver and the alcohol intake consultation to diagnose NAFLD. The Chinese criteria for the US diagnosis of fatty liver were used in this study. People were identified following the previously published diagnostic criteria^[5]: (1) slight diffuse increase and bright homogeneous echo pattern in the liver parenchyma with normal visualization of the diaphragm and portal and hepatic vein borders and a normal hepatorenal echogenicity contrast, (2) diffuse increase in the bright echoes in the liver parenchyma with slightly impaired visualization of the peripheral portal and hepatic vein borders, and (3) marked increase in the bright echoes at a shallow depth with deep attenuation, impaired visualization of the diaphragm, and marked vascular blurring. The thresholds for alcohol consumption were < 140 g/week for men and < 70 g/week for women.

The diagnostic criterion for overweight was according to the World Health Organization’s standard of BMI higher than 25 kg/m². The diagnostic criteria for diabetes and prediabetes were based on venous samples and laboratory methods. We used the harmonized definition of the MS established in 2009, with at least 3 or more of the following criteria required for diagnosis.

The BMI and VAI are usually calculated to measure the degree of obesity, and they have been proven to be associated with NAFLD. Thus, we added the other two parameters in our analysis.

VAI = WC (cm)/(39.68 + 1.88 × BMI) × (TG/1.03) × (1.31/HDL-C) for men; VAI = WC (cm)/(36.58 + 1.89 × BMI) × (TG/0.81) × (1.52/HDL−C) for women

The demographic and clinical biochemical indexes were described in all participants using the mean ± standard deviation for continuous variables and percentages for categorical variables. Subsequently, we used the logistic regression, linear discriminant analysis, random forest, K-nearest neighbors, support vector machine (SVM), and lightGBM to construct the predictive model.

LightGBM is a tree-based ensemble learning algorithm, which is one of the most efficient gradient boosting decision tree (GBDT) algorithms. LightGBM is an effective GBDT algorithm with good accuracy and scalability, fast training speed, and low memory usage. Here, viewing our problem as a classification task, we used cross-entropy loss as a loss function described below:

$$ loss=-\sum\limits_{i=i}^{n}{y}_{i}log\left({p}_{i}\right)+\lambda (1-{y}_{i})log\left({p}_{i}\right) $$

Here, the outcome variable $ {p}_{i} $ is the probability of developing NAFLD within a year of sample i, while $ {y}_{i} $ is the ground truth label of sample I, which is known in advance, and $ \lambda$ is a hyperparameter that balances the loss between the negative and positive samples.

To achieve stable and reliable evaluation, we used 10-fold cross-validation as our training and testing process. Specifically, we uniformly split all our data into 10 parts. Nine of them were used as a training set, and the remaining was used as a testing set. We repeated this process 10 times and used the averaged evaluation index as the result.

The sensitivity, specificity, and the area under the receiver operating curve (AUC) were used to compare the ability of predicting NAFLD. All statistical analyses were performed using the Statistical Analysis System version 9.4 and Python version 3.6.2. The two-sided statistical significance level was set at α = 0.05.

In 73,720 participants, the prevalence of NAFLD in 2015 was 42.97%. A total of 15,342 participants who did not have NAFLD in 2015 were selected and followed up to observe their indicator change in 2016. The incidence of NAFLD in this population was 12.27%. Among them, 8,343 (54.38%) participants were female. Table 1 shows the baseline characteristics of the participants. The differences between the NAFLD group and the non-NAFLD group were statistically significant.

Table 1. Baseline characteristics of participants grouped by non-alcoholic fatty liver disease

Variables	Normal (n = 13,459)	NAFLD (n = 1,883)	P value^*
Age (years)	39.41 ± 13.09	43.66 ± 13.21	< 0.0001
WBC (× 10⁹/L)	6.17 ± 1.46	6.75 ± 1.55	< 0.0001
TG (mmol/L)	1.09 ± 0.69	1.71 ± 1.28	< 0.0001
ALT (U/L)	17.14 ± 13.48	25.60 ± 32.15	< 0.0001
AST (U/L)	18.60 ± 7.96	21.40 ± 16.39	< 0.0001
HCT (%)	41.90 ± 3.85	44.34 ± 3.57	< 0.0001
RBC (× 10⁹/L)	4.75 ± 0.44	5.02 ± 0.43	< 0.0001
Cr (μmoI/L)	60.33 ± 13.20	67.88 ± 14.62	< 0.0001
BUN (mmol/L)	4.68 ± 1.23	5.09 ± 1.24	< 0.0001
UA (umol/L)	282.30 ± 81.85	353.40 ± 87.11	< 0.0001
MCV (fL)	88.38 ± 4.49	88.50 ± 4.43	0.2886
MCH (pg)	29.97 ± 1.95	30.34 ± 1.85	< 0.0001
MCHC (g/L)	339.00 ± 11.22	342.70 ± 11.13	< 0.0001
Hb (g/L)	142.20 ± 15.24	152.10 ± 13.87	< 0.0001
FPG (mmol/L)	5.23 ± 0.81	5.58 ± 1.22	< 0.0001
PLT (× 10⁹/L)	228.50 ± 51.43	227.10 ± 51.08	0.2771
TC (mmol/L)	4.63 ± 0.86	4.84 ± 0.96	< 0.0001
LDL-C (mmol/L)	2.45 ± 0.67	2.67 ± 0.73	< 0.0001
HDL-C (mmol/L)	1.42 ± 0.29	1.22 ± 0.24	< 0.0001
RDW	12.92 ± 0.90	12.84 ± 0.71	0.0002
Lymph (%)	34.95 ± 7.50	34.18 ± 7.44	< 0.0001
SBP (mmHg)	112.80 ± 14.20	121.50 ± 14.11	< 0.0001
DBP (mmHg)	75.12 ± 9.27	81.38 ± 9.49	< 0.0001
BMI (kg/m²)	22.36 ± 2.76	25.68 ± 2.63	< 0.0001
PCT (%)	0.23 ± 0.05	0.23 ± 0.05	0.4969
WHR	0.81 ± 0.06	0.87 ± 0.05	< 0.0001
VAI	1.24 ± 1.02	2.07 ± 1.90	< 0.0001
Neut (%)	57.27 ± 7.94	57.76 ± 7.81	0.0122
Non-HDL-C	3.21 ± 0.81	3.62 ± 0.90	< 0.0010
Male (%)	5,503 (78.63)	1,496 (21.37)	< 0.0001
Overweight (%)	2,281 (67.37)	1,105 (32.60)	< 0.0001
Ms (%)	270 (66.34)	137 (33.66)	< 0.0001
Diabetes (%)	271 (71.50)	108 (28.50)	< 0.0001
Impaired FPG (%)	412 (72.03)	160 (27.97)	< 0.0001
*Note.* WBC, White blood cell count; ALT, alanine aminotransferase; AST, aspartate aminotransferase; HCT, Hematocrit; RBC, Red blood cell count; Cr, Creatinine; BUN, Blood urea nitrogen; MCV, Mean red blood cell volume; MCH, Mean erythrocyte hemoglobin content; MCHC, Mean erythrocyte hemoglobin concentration; Hb, Hemoglobin; FPG, fasting plasma glucose; PLT, Platelet count; RDW, RBC distribution width; Lymph, Percentage of lymphocytes; SBP, systolic blood pressure; DBP, diastolic blood pressure; PCT, Platelet hematocrit; WHR, Waist-to-hip ratio; Neut, Neutrophilic granulocyte. Data are expressed as the mean ± standard deviation, n (%). ^*P < 0.05 was considered statistically significant.

The results of the multiple logistic regression analysis showed that age, WBC, TG, ALT, RBC count, UA, FPG, DBP, BMI, PCT, WHR, WC, and non-HDL-C were predictive factors of NAFLD, whereas AST, Cr, HDL-C, and VAI were protective factors (Table 2).

Table 2. Multivariate logistic regression model of non-alcoholic fatty liver disease

Variables	β	Wald χ² test	P value	OR (95% CI)
Constant term	−15.7878	469.4885	< 0.0001
Age	0.0109	17.6484	< 0.0001	1.011 (1.006, 1.016)
WBC	0.0406	4.1928	0.0406	1.041 (1.002, 1.083)
TG	0.4990	25.2471	< 0.0001	1.647 (1.356, 2.001)
ALT	0.0197	41.1115	< 0.0001	1.020 (1.014, 1.026)
AST	−0.0175	8.8296	0.0030	0.983 (0.971, 0.997)
RBC	0.1757	4.5543	0.0328	1.192 (1.014, 1.401)
Cr	−0.00852	10.6270	0.0011	0.992 (0.986, 0.997)
UA	0.00347	68.5765	< 0.0001	1.003 (1.003, 1.004)
FBG	0.0717	7.1301	0.0076	1.074 (1.019, 1.132)
HDL-C	−1.1027	54.9475	< 0.0001	0.332 (0.248, 0.444)
DBP	0.0123	14.8173	0.0001	1.012 (1.006, 1.019)
BMI	0.1995	133.4437	< 0.0001	1.221 (1.180, 1.263)
PCT	1.8978	9.1519	0.0025	6.671 (1.951, 22.815)
WHR	3.9170	18.4845	< 0.0001	50.250 (8.426, 299.669)
WC	0.0294	12.6515	0.0004	1.030 (1.013, 1.047)
VAI	−0.2286	11.8039	0.0006	0.796 (0.698, 0.906)
Non-HDL-C	0.1409	14.5095	0.0001	1.151 (1.071, 1.238)

For lightGBM, we first constructed a model with all the variables (10-fold cross-validation) to determine the importance of the variables according to the interpretability of the model. Subsequently, we selected 20 most important variables including age, WBC, TG, ALT, AST, RBC, Cr, UA, MCV, FPG, PLT, LDLC, HDL-C, weight, BMI, WHR, WC, VAI, ALT_AST, and non-HDL-C to construct a new model (10-fold cross-validation). The best hyperparameters were chosen based on the grid search. The importance of the variables selected by lightGBM is BMI (72.3), UA (59.8), TG (50.9), AGE (49.2), WC (41.4), ALT/AST (38.0), FPG (37.5), Cr (37.3), VAI (35.6) and HDL-C (32.5), respectively. These variables are ranked by the important score, numbers in brackets, which represents the times of the model-chosen variables as split node. The bigger the means, the more important the variable.

The overall comparison of the six models is shown in Table 3. The logistic regression model and linear discriminant analysis model were used to predict the NAFLD-free participants with the highest specificity of approximately 0.97, but its sensitivity was lower than 0.26. The predictive powers of SVM, random forest, and lightGBM were close to each other. They all had similar sensitivity and specificity, which was higher than 0.70. However, the lightGBM model could acutely predict the NAFLD participants. It had higher sensitivity and acceptable decreased specificity. Thus, the AUC (0.790) was better than the other models.

Table 3. Comparison of different prediction models

Model	Sensitivity	Specificity	AUC
Logistic regression	0.222	0.977	0.600
Linear discriminant analysis	0.251	0.971	0.611
Random forest	0.807	0.747	0.777
K-nearest neighbors	0.210	0.948	0.579
SVM	0.752	0.756	0.754
LightGBM	0.860	0.720	0.790

A previous study has reported that the cumulative incidence of NAFLD in 5 years in Hong Kong was 13.5%^[6], and a cohort study found that the 2-year incidence of NAFLD was 13.27%^[7]. In our study, 1,883 (12.27%) participants developed NAFLD after the 1-year follow-up. The incidence in our study population was higher than that in the other studies.

In this paper, we described the baseline physical indicator differences between healthy individuals and individuals diagnosed with NAFLD. Subsequently, we developed an efficient model to establish the pre-diagnosis of NAFLD. The results of the predictive performance in Table 4 showed that the lightGBM model could be used to screen undiagnosed NAFLD individuals considering its high sensitivity (0.860) and specificity (0.720). For the lightGBM model, the AUC value was 0.790 after a 10-fold cross-validation. A previous study used a classification tree to predict the NAFLD, which was 0.780 and 0.750 in the training and testing dataset, respectively^[8]. Although the two models were appropriate for different populations, our model comprised more variables and had a higher AUC value. The variables in our model were easy to determine.

We calculated the importance of each variable by lightGBM. The top 10 variables were BMI, UA, TG, AGE, WC, ALT/AST, FPG, Cr, VAI, and HDL-C. The importance score of BMI was 72.3, accounting for the biggest portion of the lightGBM model. The result is similar to those of previous studies, reporting that BMI was the most useful predictive factor of NAFLD onset^[9]. We analyzed the association between UA level and the development of NAFLD and found that an elevated UA level was a risk factor for NAFLD. As it is known, both elevated TG levels and abdominal obesity play an important role in the pathogenesis of NAFLD. The lightGBM model comprised variables TG, WC, and VAI, which is consistent with the findings of previous studies^[10]. In the lightGBM model, the importance score of Cr was 37.3, which is similar to that of FPG. The Cr in the blood is one of the indicators of renal function, which was closely related to the body’s total muscle mass and was not susceptible to diet. Moreover, serum Cr and age can be used to calculate the glomerular filtration rate to diagnose chronic kidney disease. A previous study has shown that NAFLD was associated with chronic kidney disease ^[11]; hence, the serum Cr may initiate NAFLD by affecting kidney function.

The lightGBM model has good predictive ability and could directly estimate the risk of NAFLD in patients undergoing physical examination. We use this method to develop the model for the following considerations. First, lightGBM is an ensemble model, which combines several weak classifiers to perform prediction. Hence, it is significantly better than simple classifiers such as logistic regression or decision tree. Second, lightGBM is one of the gradient boosting algorithms combing boosting and gradient descent, which learns residual while training. Thus, each classifier reduces the previous residual, making each classifier different. Due to the diversity of the classifiers, lightGBM can reduce the risk of overfitting. Third, the model has several parameters; therefore, we can easily tune these parameters to apply bagging, early-stopping, or boosting policy to achieve a better result, while logistic regression does not have this advantage. Moreover, by exploiting the nonlinear structure of data, tree-based model lightGBM can select variables with nonlinear dependence; however, linear models such as logistic regression do not possess such property.

In the future, we will focus on the application of the model and develop an online prediction system for healthy individuals to use.

The authors declare that they have no competing interests.

This study was approved by the Beijing Physical Examination Center Institutional Review Board.

参考文献 (11)

[1]	Xinghao Yi, Haoxue Zhu, Mengyu He, Shan Gao, Ming Li. Decoding Links Between Gut Microbiota and Metabolic-associated Fatty Liver Disease: Meta-Analysis and Mediation Study Uncover Species-Specific Taxa and a Novel Bile Acid Mediator . Biomedical and Environmental Sciences, 2025, 38(): 1-12. doi: 10.3967/bes2025.162
[2]	Xiongfeng Pan, Cailian Wei, Jiayou Luo, Junxia Yan, Xiang Xiao, Jie Wang, Yan Zhong, Miyang Luo. Exploration of New Susceptible Genes associated with Non-Alcoholic Fatty Liver Disease among Children with Obesity Using Whole Exome Sequencing . Biomedical and Environmental Sciences, 2025, 38(6): 727-739. doi: 10.3967/bes2025.045
[3]	Weihua Cao, Yaqin Zhang, Ziyu Zhang, Xinxin Li, Wen Deng, Shiyu Wang, Xin Wei, Linmei Yao, Zixuan Gao, Shuojie Wang, Lu Zhang, Yao Lu, Ruyu Liu, Shuling Wu, Yuanjiao Gao, Hongxiao Hao, Yao Xie, Minghui Li. Lipid-related Insulin Resistance Indices for the Diagnosis of Non-alcoholic Fatty Liver Disease in Chinese Adults . Biomedical and Environmental Sciences, 2025, 38(): 1-12. doi: 10.3967/bes2025.161
[4]	Lei Li, Lin Luo, Junnai Wang, Ying Hong, Jianfang Geng. Multivariate Analysis of Female Stress Urinary Incontinence and Establishment of a Prediction Model . Biomedical and Environmental Sciences, 2024, 37(8): 931-935. doi: 10.3967/bes2024.114
[5]	Yamei Li, Xiang Xiao, Jie Wang, Yixu Liu, Xiongfeng Pan, Haibin Yu, Jiayou Luo, Miyang Luo. Genetic Variations and Nonalcoholic Fatty Liver Disease: Field Synopsis, Systematic Meta-Analysis, and Epidemiological Evidence . Biomedical and Environmental Sciences, 2024, 37(7): 762-773. doi: 10.3967/bes2024.079
[6]	ZHAO Hui, QIU Xia, LI Hua Zi, CUI Jia Jia, SUN Yong Ye. Association between Serum Uric Acid to HDL-Cholesterol Ratio and Nonalcoholic Fatty Liver Disease Risk among Chinese Adults . Biomedical and Environmental Sciences, 2023, 36(1): 1-9. doi: 10.3967/bes2022.111
[7]	LIU Can, ZHOU Shi Hui, SU Hong, YANG Wen Qin, LU Jiao. An Artificial Neural Network Model Combined with Dietary Retinol Intake from Different Sources to Predict the Risk of Nonalcoholic Fatty Liver Disease . Biomedical and Environmental Sciences, 2023, 36(12): 1123-1135. doi: 10.3967/bes2023.120
[8]	LIU Li, SHAO Yu Han, FENG En Qiang, SHAO Zhu Gang, XING Dong Ming. Risk of Developing Non-Alcoholic Fatty Liver Disease Over Time in a Cohort of the Elderly in Qingdao, China . Biomedical and Environmental Sciences, 2023, 36(8): 760-767. doi: 10.3967/bes2023.101
[9]	JI Tian Jiao, CHENG Qiang, ZHANG Yong, ZENG Han Ri, WANG Jian Xing, YANG Guan Yu, XU Wen Bo, LIU Hong Tu. A Novel Early Warning Model for Hand, Foot and Mouth Disease Prediction Based on a Graph Convolutional Network . Biomedical and Environmental Sciences, 2022, 35(6): 494-503. doi: 10.3967/bes2022.065
[10]	WEI Ran, QI Hong Yan, LIN Lin, ZHU Yuan Yue, ZHANG Yi, ZHANG Jie, WU Xue Yan, HU Chun Yan, WANG Shuang Yuan, LIN Hong, XU Yu, XU Min, BI Yu Fang, WANG Wei Qing, LU Jie Li, NING Guang, CHEN Yu Hong. Association of Early-Life Famine Exposure with Metabolic Dysfunction-Associated Fatty Liver Disease and Fibrosis in Adulthood . Biomedical and Environmental Sciences, 2022, 35(6): 558-562. doi: 10.3967/bes2022.075
[11]	YE Cheng Yin, LI Jin Mei, WU Jing Hua, LI Zheng, XIAO Juan, YIN Xiao Yu, WANG Da Hui, XU Guo Zhang, YANG Lei. A Risk Prediction Model for Ischemic Stroke in Southern Chinese Population: Impact of Multiple Genetic Variants and Clinical/Lifestyle Factors . Biomedical and Environmental Sciences, 2021, 34(8): 641-645. doi: 10.3967/bes2021.089
[12]	ZHU Wen, DENG Chan Juan, XUAN Li Ping, DAI Hua Jie, ZHAO Zhi Yun, WANG Tian Ge, LI Mian, LU Jie Li, XU Yu, CHEN Yu Hong, WANG Wei Qing, BI Yu Fang, XU Min. Peripheral Artery Disease and Risk of Fibrosis Deterioration in Nonalcoholic Fatty Liver Disease: A Prospective Investigation . Biomedical and Environmental Sciences, 2020, 33(4): 217-226. doi: 10.3967/bes2020.031
[13]	SUN Xiao Ya, HE Zhi Qiang, WANG Li Qun, WANG Zhi Zhong, BAI Fei Hu, YOU Yan Jie. Interaction Effects of Gene Polymorphisms and Obesity on Nonalcoholic Fatty Liver Disease . Biomedical and Environmental Sciences, 2019, 32(10): 793-796. doi: 10.3967/bes2019.100
[14]	ZHAO Zhong Yao, LIU Di, CAO Wei Jie, SUN Ming, SONG Man Shu, WANG Wei, WANG You Xin. Association between IgG N-glycans and Nonalcoholic Fatty Liver Disease in Han Chinese . Biomedical and Environmental Sciences, 2018, 31(6): 454-458. doi: 10.3967/bes2018.059
[15]	ZHANG Mei, JIANG Yong, WANG LiMin, LI YiChong, HUANG ZhengJing, LI JianHong, ZHOU MaiGeng, ZHAO WenHua. Prediction of 10-year Atherosclerotic Cardiovascular Disease Risk among Adults Aged 40-79 Years in China: a Nationally Representative Survey . Biomedical and Environmental Sciences, 2017, 30(4): 244-254. doi: 10.3967/bes2017.034
[16]	BAI Juan, ZHU Ying, DONG Ying. Bitter Melon Powder Protects against Obesity-associated Fatty Liver Disease by Improving Colonic Microenvironment in Rats with High-fat Diet-induced Obesity . Biomedical and Environmental Sciences, 2017, 30(8): 611-615. doi: 10.3967/bes2017.081
[17]	GUO Xiang, YANG Xing Fen, YANG Ying, HANS Raabe, CAI Jing Heng, XUE Jin Yu, TAN Xiao Hua, XIE Xiao Ping, XIONG Xi Kun, HUANG Jun Ming. Prediction of Ocular Irritancy of 26 Chemicals and 26 Cosmetic Products with Isolated Rabbit Eye (IRE) Test . Biomedical and Environmental Sciences, 2012, 25(3): 359-366. doi: 10.3967/0895-3988.2012.03.015
[18]	SHI Lei, LIU Zhong Wu, LI Yun, GONG Cai, ZHANG Huan, SONG Li Juan, HUANG Cheng Yu, LI Ming. The Prevalence of Nonalcoholic Fatty Liver Disease and its Association with Lifestyle/dietary Habits among University Faculty and Staff in Chengdu . Biomedical and Environmental Sciences, 2012, 25(4): 383-391. doi: 10.3967/0895-3988.2012.04.002
[19]	MENG LingHui, LUO Na, MI Jie. Impacts of Types and Degree of Obesity on Non-Alcoholic Fatty Liver Disease and Related Dyslipidemia in Chinese School-Age Children? . Biomedical and Environmental Sciences, 2011, 24(1): 22-30. doi: 10.3967/0895-3988.2011.01.003
[20]	HE-SHAN GE, YAN-FENG HONG. A Prediction Model of MF Radiation in Environmental Assessment . Biomedical and Environmental Sciences, 2006, 19(5): 405-408.

doi: 10.3967/bes2020.116

A Self-test Prediction Model to Determine the Probability Risk of Non-alcoholic Fatty Liver Disease

Author Bio:
LI Qiang, male, born in 1981, majoring in health management and promotion

SHEN Qun Lun, male, born in 1996, master, majoring in boinformatics/machine learning

Corresponding author: LIU Feng, Tel/Fax: 86-10-8391-1779, E-mail: liufeng@bjtjzx.com; YANG Xing Hua, E-mail: xinghuay@sina.com

计量

A Self-test Prediction Model to Determine the Probability Risk of Non-alcoholic Fatty Liver Disease

doi: 10.3967/bes2020.116

作者简介:
LI Qiang, male, born in 1981, majoring in health management and promotion

SHEN Qun Lun, male, born in 1996, master, majoring in boinformatics/machine learning

通讯作者: LIU Feng, Tel/Fax: 86-10-8391-1779, E-mail: liufeng@bjtjzx.com; YANG Xing Hua, E-mail: xinghuay@sina.com

English Abstract

A Self-test Prediction Model to Determine the Probability Risk of Non-alcoholic Fatty Liver Disease

Author Bio:
LI Qiang, male, born in 1981, majoring in health management and promotion

SHEN Qun Lun, male, born in 1996, master, majoring in boinformatics/machine learning

Corresponding author: LIU Feng, Tel/Fax: 86-10-8391-1779, E-mail: liufeng@bjtjzx.com; YANG Xing Hua, E-mail: xinghuay@sina.com

全文HTML

目录

[1]	Brar G, Tsukamoto H. Alcoholic and non-alcoholic steatohepatitis: global perspective and emerging science. J Gastroenterol, 2019; 54, 218−25. doi: 10.1007/s00535-018-01542-w
[2]	Xu CN, Ma ZM, Wang YF, et al. Visceral adiposity index as a predictor of NAFLD: a prospective study with 4-year follow-up. Liver Int, 2018; 38, 2294−300. doi: 10.1111/liv.13941
[3]	Ito T, Ishigami M, Ishizu Y, et al. Utility and limitations of noninvasive fibrosis markers for predicting prognosis in biopsy-proven Japanese non-alcoholic fatty liver disease patients. J Gastroenterol Hepatol, 2019; 34, 207−14. doi: 10.1111/jgh.14448
[4]	Dam-Larsen S, Franzmann M, Andersen IB, et al. Long term prognosis of fatty liver: risk of chronic liver disease and death. Gut, 2004; 53, 750−5. doi: 10.1136/gut.2003.019984
[5]	Fan JG. Guidelines for management of nonalcoholic fatty liver disease: an updated and revised edition. Chin J Hepatol, 2010; 18, 163−6.
[6]	Wong VWS, Wong GLH, Yeung DKW, et al. Incidence of non-alcoholic fatty liver disease in Hong Kong: a population study with paired proton-magnetic resonance spectroscopy. J Hepatol, 2015; 62, 182−9. doi: 10.1016/j.jhep.2014.08.041
[7]	Ma ZM, Xu CN, Kang XP, et al. Changing trajectories of serum uric acid and risk of non-alcoholic fatty liver disease: a prospective cohort study. J Trans Med, 2020; 18, 133. doi: 10.1186/s12967-020-02296-x
[8]	Birjandi M, Ayatollahi SMT, Pourahmad S, et al. Prediction and diagnosis of non-alcoholic fatty liver disease (NAFLD) and identification of its associated factors using the classification tree method. Iran Red Crescent Med J, 2016; 18, e32858.
[9]	Zhou YJ, Zheng JN, Liu WY, et al. The NAFL risk score: a simple scoring model to predict 4-y risk for non-alcoholic fatty liver. Clin Chim Acta, 2017; 468, 17−24. doi: 10.1016/j.cca.2017.01.021
[10]	Motamed N, Sohrabi M, Ajdarkosh H, et al. Fatty liver index vs. waist circumference for predicting non-alcoholic fatty liver disease. World J Gastroenterol, 2016; 22, 3023−30. doi: 10.3748/wjg.v22.i10.3023
[11]	Mantovani A, Zaza G, Byrne CD, et al. Nonalcoholic fatty liver disease increases risk of incident chronic kidney disease: a systematic review and meta-analysis. Metabolism, 2018; 79, 64−76. doi: 10.1016/j.metabol.2017.11.003

doi: 10.3967/bes2020.116

A Self-test Prediction Model to Determine the Probability Risk of Non-alcoholic Fatty Liver Disease

Author Bio: LI Qiang, male, born in 1981, majoring in health management and promotion SHEN Qun Lun, male, born in 1996, master, majoring in boinformatics/machine learning

Corresponding author: LIU Feng, Tel/Fax: 86-10-8391-1779, E-mail: liufeng@bjtjzx.com; YANG Xing Hua, E-mail: xinghuay@sina.com

计量

出版历程

A Self-test Prediction Model to Determine the Probability Risk of Non-alcoholic Fatty Liver Disease

doi: 10.3967/bes2020.116

作者简介: LI Qiang, male, born in 1981, majoring in health management and promotion SHEN Qun Lun, male, born in 1996, master, majoring in boinformatics/machine learning

通讯作者: LIU Feng, Tel/Fax: 86-10-8391-1779, E-mail: liufeng@bjtjzx.com; YANG Xing Hua, E-mail: xinghuay@sina.com

English Abstract

A Self-test Prediction Model to Determine the Probability Risk of Non-alcoholic Fatty Liver Disease

Author Bio: LI Qiang, male, born in 1981, majoring in health management and promotion SHEN Qun Lun, male, born in 1996, master, majoring in boinformatics/machine learning

Corresponding author: LIU Feng, Tel/Fax: 86-10-8391-1779, E-mail: liufeng@bjtjzx.com; YANG Xing Hua, E-mail: xinghuay@sina.com

全文HTML

目录

Author Bio:
LI Qiang, male, born in 1981, majoring in health management and promotion

SHEN Qun Lun, male, born in 1996, master, majoring in boinformatics/machine learning

作者简介:
LI Qiang, male, born in 1981, majoring in health management and promotion

SHEN Qun Lun, male, born in 1996, master, majoring in boinformatics/machine learning

Author Bio:
LI Qiang, male, born in 1981, majoring in health management and promotion

SHEN Qun Lun, male, born in 1996, master, majoring in boinformatics/machine learning