Predicting the Risk of Arterial Stiffness in Coal Miners Based on Different Machine Learning Models

CHEN Qian Wei HUANG Xue Zan DING Yu ZHU Feng Ren WANG Jia ZOU Yuan Jie DU Yuan Zhen ZHANG Ya Jun HUI Zi Wen ZHU Feng Lin MU Min

CHEN Qian Wei, HUANG Xue Zan, DING Yu, ZHU Feng Ren, WANG Jia, ZOU Yuan Jie, DU Yuan Zhen, ZHANG Ya Jun, HUI Zi Wen, ZHU Feng Lin, MU Min. Predicting the Risk of Arterial Stiffness in Coal Miners Based on Different Machine Learning Models[J]. Biomedical and Environmental Sciences, 2024, 37(1): 108-111. doi: 10.3967/bes2024.009
Citation: CHEN Qian Wei, HUANG Xue Zan, DING Yu, ZHU Feng Ren, WANG Jia, ZOU Yuan Jie, DU Yuan Zhen, ZHANG Ya Jun, HUI Zi Wen, ZHU Feng Lin, MU Min. Predicting the Risk of Arterial Stiffness in Coal Miners Based on Different Machine Learning Models[J]. Biomedical and Environmental Sciences, 2024, 37(1): 108-111. doi: 10.3967/bes2024.009

doi: 10.3967/bes2024.009

Predicting the Risk of Arterial Stiffness in Coal Miners Based on Different Machine Learning Models

Funds: The Project was supported by the Open Research Grant of the Joint National-Local Engineering Research Centre for Safe and Precise Coal Mining (EC2021008) and Collaborative Innovation Project of Colleges and Universities of Anhui Province (GXXT-2022-065).
More Information
    Author Bio:

    CHEN Qian Wei, male, born in 1995, Master, Teaching Assistant, majoring in occupational health

    HUANG Xue Zan, male, born in 1997, Master Candidate, majoring in occupational epidemiology

    Corresponding author: ZHU Feng Lin, E-mail: zhufenglin428@163.com, Tel: 19155445669; MU Min, E-mail: candymu@126.com, Tel: 13655618753.
  • &These authors contributed equally to this work.
  • &These authors contributed equally to this work.
    注释:
  • Figure  1.  LASSO regression screening of machine learning model predictors: (A) The process of selecting the most suitable λ in the LASSO model. (B) LASSO coefficient curve of the variable.

    Figure  2.  AUCs of the five machine learning models: (A) training and (B) test dataset ROC curves.

    S1.  Feature importance scores.

    S1.   Baseline clinical characteristics of the two groups of coal miners

    Variables Non-arterial stiffness (n = 651) Arterial stiffness (n = 792) t-value (F/Z) P-value
    Age, year, M (P25, P75) 33 (29, 36) 36 (32, 44) −10.784 < 0.001
    Work age, year, M (P25, P75) 11.5 (7, 14) 13 (10, 21) −8.963 < 0.001
    BMI, kg/m2, M (P25, P75) 23.88 (21.67, 25.95) 24.16 (22.04, 26.57) −2.561 0.01
    Pulse, time/min, M (P25, P75) 72 (67, 80) 75 (68, 82) −3.352 0.001
    SBP, mmHg, M (P25, P75) 117 (109, 124) 124 (115, 133) −10.738 < 0.001
    DBP, mmHg, M (P25, P75) 84 (76, 89) 84 (76, 89) −11.07 < 0.001
    HbA1c, %, M (P25, P75) 4.58 (4.30, 4.99) 4.58 (4.28, 5.00) −0.121 0.903
    AMS, u/L, M (P25, P75) 51.00 (39.00, 66.00) 50.04 (39.00, 65.00) −1.26 0.208
    HCY, μmol/L, M (P25, P75) 6.32 (4.13, 8.25) 5.57 (3.62, 7.88) −2.284 0.022
    ηb 1, mPa.S, M (P25, P75) 23.24 (22.10, 24.31) 23.50 (22.31, 24.70) −2.959 0.003
    ηb5, mPa.S, M (P25, P75) 10.41 (10.12, 10.75) 10.46 (10.16, 10.83) −2.026 0.043
    ηb30, mPa.S, M (P25, P75) 5.79 (5.44, 6.12) 5.80 (5.48, 6.20) −1.757 0.079
    ηb200, mPa.S, M (P25, P75) 4.53 (4.25, 4.90) 4.60 (4.28, 501) −2.689 0.007
    PV, mPa.S, M (P25, P75) 1.43 (1.36, 1.52) 1.43 (1.36, 1.53) −0.65 0.516
    ESR, mm/h, M (P25, P75) 6.00 (5.00, 8.00) 6.00 (5.00, 8.00) −0.288 0.773
    HCT, L/L, M (P25, P75) 0.48 (0.46, 0.50) 0.49 (0.46, 0.51) −0.736 0.462
    HS, M (P25, P75) 3.26 (2.98, 3.50) 3.25 (2.96, 3.50) −1.204 0.229
    LS, M (P25, P75) 15.20 (13.40, 16.20) 14.72 (13.40, 16.20) −0.212 0.832
    ESR-K, M (P25, P75) 62.40 (56.10, 65.10) 62.30 (54.70, 65.10) −0.39 0.696
    AI, M (P25, P75) 5.24 (4.93, 5.62) 5.28 (5.01, 5.70) −2.088 0.037
    IR, M (P25, P75) 3.60 (3.40, 5.30) 3.55 (3.40, 5.30) −0.913 0.361
    TK, M (P25, P75) 0.85 (0.75, 1.02) 0.85 (0.73, 1.02) −0.311 0.311
    FPG, mmol/L, M (P25, P75) 5.13 (4.83, 5.47) 5.25 (4.92, 5.63) −4.657 < 0.001
    CO2CP, M (P25, P75) 24.00 (23.00, 25.10) 24.00 (23.00, 25.00) −2.001 0.045
    TCHO, mmol/L, M (P25, P75) 4.61 (4.04, 5.20) 4.80 (4.21, 5.32) −3.982 < 0.001
    TG, mmol/L, M (P25, P75) 1.14 (0.77, 1.67) 1.32 (0.88, 2.04) −4.405 < 0.001
    HDL, mmol/L, M (P25, P75) 1.54 (1.45, 1.58) 1.54 (1.43, 1.58) −0.366 0.714
    LDL, mmol/L, M (P25, P75) 2.59 (2.07, 3.11) 2.75 (2.21, 3.20) −3.173 0.002
    ApoA1, g/L, M (P25, P75) 1.25 (1.20, 1.40) 1.25 (1.20, 1.40) −1.241 0.215
    ApoB1, g/L, M (P25, P75) 1.00 (0.98, 1.02) 1.02 (0.99, 1.02) −2.402 0.016
    K, mmol/L, M (P25, P75) 4.30 (4.20, 4.80) 4.30 (4.20, 4.69) −0.378 0.706
    Na, mmol/L, M (P25, P75) 139.70 (139.20, 142.30) 139.50 (139.25, 142.15) −1.375 0.169
    Cl, mmol/L, M (P25, P75) 100.54 (98.40, 102.50) 100.25 (98.30, 102.40) −3.473 < 0.001
    Ca, mmol/L, M (P25, P75) 1.25 (1.24, 1.26) 1.25 (1.24, 1.26) −0.491 0.624
    Ca-free, mmol/L, M (P25, P75) 2.41 (2.30, 2.55) 2.45 (2.30, 2.54) −0.255 0.799
    CK, U/L, M (P25, P75) 137.00 (116.00, 162.00) 134.00 (112.00, 161.05) −1.216 0.224
    CKISO, U/L, M (P25, P75) 16.00 (12.40, 21.00) 16.00 (12.30, 21.00) −0.637 0.524
    α-HBDH, U/L, M (P25, P75) 159.00 (132.00, 185.00) 157.00 (132.00, 184.20) −0.811 0.417
    AST, U/L, M (P25, P75) 29.00 (24.00, 35.00) 31.00 (26.00, 36.00) −3.711 < 0.001
    LDH, U/L, M (P25, P75) 163.00 (144.60, 180.00) 163.60 (145.45, 184.10) −0.791 0.429
    APTT, s, M (P25, P75) 28.00 (25.00, 29.45) 27.90 (26.00, 29.60) −0.905 0.366
    Fibrinogen, g/L, M (P25, P75) 3.25 (3.14, 3.51) 3.25 (3.12, 3.52) −0.187 0.852
    TT, s, M (P25, P75) 14.15 (12.80, 15.05) 13.80 (12.70, 14.70) −1.158 0.247
    FVC/per, %, mean ± SD 80.76 ± 11.35 79.98 ± 11.65 1.265 0.206
    FEV1/per, %, mean ± SD 80.52 ± 10.39 81.18 ± 10.84 −1.17 0.242
    FEV1/FVC, %, mean ± SD 93.30 ± 6.27 93.01 ± 6.26 868 0.385
    coal dust, mg/m3-years, M (P25, P75) 18.99 (8.61, 29.05) 25.90 (14.54, 40.21) −5.982 < 0.001
    CO, mg/m3-years, M (P25, P75) 17.96 (10.27, 28.56) 25.09 (14.53, 40.14) −6.132 < 0.001
    CO2, mg/m3-years, M (P25, P75) 11775.32 (6559.46, 16705.80) 14698.84 (9028.04, 22981.75) −6.173 < 0.001
    NO, mg/m3-years, M (P25, P75) 0.12 (0.06, 0.18) 0.16 (0.10, 0.29) −6.565 < 0.001
    NO2, mg/m3-years, M (P25, P75) 0.22 (0.11, 0.38) 0.29 (0.17, 0.59) −5.579 < 0.001
    PAH, mg/m3-years, M (P25, P75) 0.67 (0.33, 1.08) 0.90 (0.50, 1.54) −5.516 < 0.001
      Note. HbA1c, glycosylated hemoglobin; AMS, serum amylase; HCY, homocysteine; ηb1, whole blood viscosity 1; ηb5, whole blood viscosity 5; ηb30, whole blood viscosity 30; ηb200, whole blood viscosity 200; PV, plasma specific viscosity; ESR, erythrocyte sedimentation rate; HCT, hematocrit; HS, whole blood high shear rate; LS, whole blood low shear rate; ESR-K, equation K value of erythrocyte sedimentation rate; AI, red blood cell aggregation index; IR, the index of rigidity of erythrocyte; TK, thymidine kinase; FPG, fasting plasma glucose; CO2CP, carbon dioxide-combining power; TCHO, total cholesterol; TG, triglycerides; HDL, high density lipoprotein; LDL, low density lipoprotein; ApoA1, apolipoprotein A1; ApoB1, apolipoprotein B1; Ca-free, free calcium; CK, creatine kinase; CKISO, creatine kinase isoenzyme; α-HBDH, alpha-hydroxybutyric dehydrogenase; AST, aspartate amino transferase; LDH, lactic dehydrogenase; APTT, activated partial thromboplastin time; TT, thrombin time; FVC/per, predicted percentage of forced vital capacity; FEV1/per, percentage of predicted forced expiratory volume in the first second; FEV1/FVC, ratio of forced expiratory volume in the first second to forced vital capacity; PAH, polycyclic aromatic hydrocarbons.
    下载: 导出CSV

    Table  1.   Efficacy results for the five ML models

    Evaluation indicators Training dataset (70%) Test dataset (30%)
    RF XGboost LR BP CART RF XGboost LR BP CART
    Accuracy (%) 80.4 74.4 66.1 69.1 68.8 83.6 69.4 66.7 67.3 70.3
    Sensitivity (%) 77.4 71.4 60.7 64.6 68.0 80.2 66.5 63.6 63.6 71.0
    Specificity (%) 82.9 76.8 70.6 72.8 69.5 86.3 71.7 69.2 70.2 69.9
    Positive predictive value (%) 79.1 72.1 63.3 66.6 65.1 82.2 65.0 62.0 62.8 65.0
    Negative predictive value (%) 81.4 76.3 68.2 71.1 72.2 84.7 73.0 70.6 71.0 75.3
    F1 value 0.783 0.718 0.620 0.656 0.665 0.812 0.657 0.628 0.632 0.678
    AUC (95% CI) 0.888 (0.874, 0.902) 0.828 (0.813, 0.842) 0.723 (0.701, 0.744) 0.761 (0.741, 0.781) 0.714 (0.692, 0.736 0.893 (0.872, 0.914 0.770 (0.740, 0.801 0.736 (0.703, 0.768) 0.733 (0.600, 0.765) 0.723 (0.690, 0.756)
      Note. RF, Random Forest; XG-Boos, Extreme Gradient Augmentation; LR, Logistic Classification; BP, Back Propagation; CART, Classification and Regression Tree
    下载: 导出CSV
  • [1] Dong J, Peng LC, Yang XH, et al. XGBoost-based intelligence yield prediction and reaction factors analysis of amination reaction. J Comput Chem, 2022; 43, 289−302. doi:  10.1002/jcc.26791
    [2] Ambale-Venkatesh B, Yang XY, Wu CO, et al. Cardiovascular event prediction by machine learning: the multi-ethnic study of atherosclerosis. Circ Res, 2017; 121, 1092−101. doi:  10.1161/CIRCRESAHA.117.311312
    [3] Tomiyama H, Matsumoto C, Shiina K, et al. Brachial-ankle PWV: current status and future directions as a useful marker in the management of cardiovascular disease and/or cardiovascular risk factors. J Atheroscler Thromb, 2016; 23, 128−46. doi:  10.5551/jat.32979
    [4] Safaei M, Sundararajan EA, Driss M, et al. A systematic literature review on obesity: understanding the causes & consequences of obesity and reviewing various machine learning approaches used to predict obesity. Comput Biol Med, 2021; 136, 104754. doi:  10.1016/j.compbiomed.2021.104754
    [5] Zheng Y, Liang LR, Qin TB, et al. Cross-section analysis of coal workers' pneumoconiosis and higher brachial-ankle pulse wave velocity within Kailuan study. BMC Public Health, 2017; 17, 148. doi:  10.1186/s12889-017-4048-7
    [6] Oliveira AC, Cunha PMGM, de Oliveria Vitorino PV, et al. Vascular aging and arterial stiffness. Arq Bras Cardiol, 2022; 119, 604−15. doi:  10.36660/abc.20210708
    [7] Böhm M, Reil JC, Deedwania P, et al. Resting heart rate: risk indicator and emerging risk factor in cardiovascular disease. Am J Med, 2015; 128, 219−28. doi:  10.1016/j.amjmed.2014.09.016
    [8] Zhang J, Song FQ, Zheng GH, et al. Relationship between carbon dioxide combining power and the short-term prognosis in acute ischemic stroke patients after thrombolysis. Chin Crit Care Med, 2022; 34, 529−32. (In Chinese
    [9] Lowe GDO, Harris K, Koenig W, et al. Plasma viscosity, immunoglobulins and risk of cardiovascular disease and mortality: new data and meta-analyses. J Clin Pathol, 2023.
    [10] Zhan CJ, Zheng YF, Zhang HJ, et al. Random-forest-bagging broad learning system with applications for COVID-19 pandemic. IEEE Internet Things J, 2021; 8, 15906−18. doi:  10.1109/JIOT.2021.3066575
  • [1] Li Nani, Qiu Xiaoting, Xue Jingsong, Yi Limu, Chen Mulan, Huang Zhijian.  Predicting the Prognosis and Immunotherapeutic Response of Triple-Negative Breast Cancer by Constructing a Prognostic Model Based on CD8+ T Cell-Related Immune Genes . Biomedical and Environmental Sciences, 2024, 37(6): 581-593. doi: 10.3967/bes2024.065
    [2] Dongdong Zhang, Weiling Chen, Minqi Gu, Xi Li, Yuying Wu, Xueru Fu, Ping Tang, Fulan Hu, Jing Li, Xizhuo Sun, Dongsheng Hu, Ming Zhang.  Sex Disparities in the Association of Blood Pressure Parameters and Arterial Sclerosis Risk . Biomedical and Environmental Sciences, 2024, 37(7): 795-799. doi: 10.3967/bes2024.093
    [3] LIU Bing, XIE Jia Yi.  The Value of Prealbumin and its Combination with NT-proBNP for Predicting in-hospital Mortality in Patients with Heart Failure: Real-World Research Based on Propensity Score Matching . Biomedical and Environmental Sciences, 2023, 36(11): 1090-1094. doi: 10.3967/bes2023.139
    [4] ZHANG Xiao Li, YU Sheng Nan, QU Ruo Di, ZHAO Qiu Yi, PAN Wei Zhe, CHEN Xu Shen, ZHANG Qian, LIU Yan, LI Jia, GAO Yi, LYU Yi, YAN Xiao Yan, LI Ben, REN Xue Feng, QIU Yu Lan.  Mechanism of Learning and Memory Impairment in Rats Exposed to Arsenic and/or Fluoride Based on Microbiome and Metabolome . Biomedical and Environmental Sciences, 2023, 36(3): 253-268. doi: 10.3967/bes2023.028
    [5] WANG Xiao Ping, LI Ze Yan, ZHANG Meng, LIU Hong Yong.  Machine-learning-assisted Investigation into the Relationship between the Built Environment, Behavior, and Physical Health of the Elderly in China . Biomedical and Environmental Sciences, 2023, 36(10): 987-990. doi: 10.3967/bes2023.125
    [6] LIU Shuai, LIU Fang Chao, LI Jian Xin, HUANG Ke Yong, YANG Xue Li, CHEN Ji Chun, CAO Jie, CHEN Shu Feng, HUANG Jian Feng, SHEN Chong, LU Xiang Feng, GU Dong Feng.  Association between Fruit and Vegetable Intake and Arterial Stiffness: The China-PAR Project . Biomedical and Environmental Sciences, 2023, 36(12): 1113-1122. doi: 10.3967/bes2023.143
    [7] ZHENG Zhi Chang, YUAN Wei, WANG Nian, JIANG Bo, MA Chun Peng, AI Hui, WANG Xiao, NIE Shao Ping.  Exploring the Feasibility of Machine Learning to Predict Risk Stratification Within 3 Months in Chest Pain Patients with Suspected NSTE-ACS . Biomedical and Environmental Sciences, 2023, 36(7): 625-634. doi: 10.3967/bes2023.089
    [8] DING Zhong Ao, ZHANG Li Ying, LI Rui Ying, NIU Miao Miao, ZHAO Bo, DONG Xiao Kang, LIU Xiao Tian, HOU Jian, MAO Zhen Xing, WANG Chong Jian.  Contribution of Ambient Air Pollution on Risk Assessment of Type 2 Diabetes Mellitus via Explainable Machine Learning . Biomedical and Environmental Sciences, 2023, 36(6): 557-560. doi: 10.3967/bes2023.069
    [9] Abolfazl Zendehdel, Saeidreza Jamalimoghadamsiahkal, Maedeh Arshadi, Forough Godarzi, Shokouh SHahrousvand, Hamidreza Hekmat, Ehsan Sekhavatimoghadam, Seyedeh Zahra Badrkhahan, Mina Riahi, Isa Akbarzadeh, Mohammad Bidkhori.  Survival Analysis of COVID-19 Patients Based on Different Levels of D-dimer and Coagulation Factors . Biomedical and Environmental Sciences, 2022, 35(10): 957-961. doi: 10.3967/bes2022.122
    [10] WU Jie Wen, JIAO Xiao Kang, DU Xin Hui, JIAO Zeng Tao, LIANG Zuo Ru, PANG Ming Fan, JI Han Ran, CHENG Zhi Da, CAI Kang Ning, QI Xiao Peng.  Assessment of the Benefits of Targeted Interventions for Pandemic Control in China Based on Machine Learning Method and Web Service for COVID-19 Policy Simulation . Biomedical and Environmental Sciences, 2022, 35(5): 412-418. doi: 10.3967/bes2022.057
    [11] LI Ya Mei, ZOU Zhi Yong, MA Ying Hua, LUO Jia You, JING Jin, ZHANG Xin, LUO Chun Yan, WANG Hong, ZHAO Hai Ping, PAN De Hong, LUO Mi Yang.  Predicting Metabolic Syndrome Using Anthropometric Indices among Chinese Adolescents with Different Nutritional Status: A Multicenter Cross-sectional Study . Biomedical and Environmental Sciences, 2021, 34(9): 673-682. doi: 10.3967/bes2021.095
    [12] ZHANG Wen Ping, ZHANG Yi Fan, ZHANG Ying Ying, HAN Zhi Chao, GAO Yuan Yuan, GUO Jian Yong, SHI Xiu Jing, HU Xiao Qin, MU Li Na, ZHOU Yun, LEI Li Jian.  Peripheral Blood Mitochondrial DNA Copy Number and Hypertension Combined with Albuminuria in Chinese Coal Miners . Biomedical and Environmental Sciences, 2021, 34(7): 567-571. doi: 10.3967/bes2021.078
    [13] XU Xiao Gang, WANG Huan, WANG San Ying, ZHANG Jing, SUN Chuan, WEI Ming Xiang, YAN Jing, MAO Gen Xiang.  Predictive Analysis of Susceptibility of Different Species to SARS-CoV-2 based on ACE2 Receptors . Biomedical and Environmental Sciences, 2020, 33(11): 877-881. doi: 10.3967/bes2020.120
    [14] LI Jing, LI Bao Yu, WEI Zi Jian, ZHAO Yu Zhu, LI Tan Shi.  Application Research on Gated Recurrent Unit Deep Learning Prediction and Graded Early Warning of Emergency Department Visits Based on Meteorological Environmental Data . Biomedical and Environmental Sciences, 2020, 33(10): 817-820. doi: 10.3967/bes2020.111
    [15] JIN Yan, FAN Jing Guang, PANG Jing, WEN Ke, ZHANG Pei Ying, WANG Huan Qiang, LI Tao.  Risk of Active Pulmonary Tuberculosis among Patients with Coal Workers' Pneumoconiosis: A Case-control Study in China . Biomedical and Environmental Sciences, 2018, 31(6): 448-453. doi: 10.3967/bes2018.058
    [16] Zlatko Zimet, Marjan Bilban, Teja Fabjan, Kristina Suhadolc, Borut Poljšak, Joško Osredkar.  Lead Exposure and Oxidative Stress in Coal Miners . Biomedical and Environmental Sciences, 2017, 30(11): 841-845. doi: 10.3967/bes2017.113
    [17] ZHOU Ping Ping, LIU Zhao Ping, ZHANG Lei, LIU Ai Dong, SONG Yan, YONG Ling, LI Ning.  Methodology and Application for Health Risk Classification of Chemicals in Foods Based on Risk Matrix . Biomedical and Environmental Sciences, 2014, 27(11): 912-916. doi: 10.3967/bes2014.129
    [18] ZHOU Hai Cheng, LAI Ya Xin, SHAN Zhong Yan, JIA Wei Ping, YANG Wen Ying, LU Ju Ming, WENG Jian Ping, JI Li Nong, LIU Jie, TIAN Hao Ming, JI Qiu He, ZHU Da Long, CHEN Li, GUO Xiao Hui, ZHAO Zhi Gang, Li Qiang, ZHOU Zhi Guang, GE Jia Pu, SHAN Guang Liang.  Effectiveness of Different Waist Circumference Cut-off Values in Predicting Metabolic Syndrome Prevalence and Risk Factors in Adults in China . Biomedical and Environmental Sciences, 2014, 27(5): 325-334. doi: 10.3967/bes2014.057
    [19] SHEN Li, ZHANG Yan Ge, LIU Min, WU Liu Xin, QIANG Dong Chang, SUN Xue Lei, LIU Lin.  Increased Arterial Stiffness in Subjects with Pre-diabetes among Middle Aged Population in Beijing, China . Biomedical and Environmental Sciences, 2013, 26(9): 717-725. doi: 10.3967/0895-3988.2013.09.002
    [20] Bin Yao, HAN-RONG WU.  Risk Factors of Learning Disabilities in Chinese Children in Wuhan . Biomedical and Environmental Sciences, 2003, 16(4): 392-397.
  • 23217+Supplementary Materials.pdf
  • 加载中
图(3) / 表ll (2)
计量
  • 文章访问数:  163
  • HTML全文浏览量:  70
  • PDF下载量:  11
  • 被引次数: 0
出版历程
  • 收稿日期:  2023-07-10
  • 录用日期:  2023-10-20
  • 网络出版日期:  2024-02-20
  • 刊出日期:  2024-01-20

Predicting the Risk of Arterial Stiffness in Coal Miners Based on Different Machine Learning Models

doi: 10.3967/bes2024.009
    基金项目:  The Project was supported by the Open Research Grant of the Joint National-Local Engineering Research Centre for Safe and Precise Coal Mining (EC2021008) and Collaborative Innovation Project of Colleges and Universities of Anhui Province (GXXT-2022-065).
    作者简介:

    CHEN Qian Wei, male, born in 1995, Master, Teaching Assistant, majoring in occupational health

    HUANG Xue Zan, male, born in 1997, Master Candidate, majoring in occupational epidemiology

    通讯作者: ZHU Feng Lin, E-mail: zhufenglin428@163.com, Tel: 19155445669; MU Min, E-mail: candymu@126.com, Tel: 13655618753.
注释:

English Abstract

CHEN Qian Wei, HUANG Xue Zan, DING Yu, ZHU Feng Ren, WANG Jia, ZOU Yuan Jie, DU Yuan Zhen, ZHANG Ya Jun, HUI Zi Wen, ZHU Feng Lin, MU Min. Predicting the Risk of Arterial Stiffness in Coal Miners Based on Different Machine Learning Models[J]. Biomedical and Environmental Sciences, 2024, 37(1): 108-111. doi: 10.3967/bes2024.009
Citation: CHEN Qian Wei, HUANG Xue Zan, DING Yu, ZHU Feng Ren, WANG Jia, ZOU Yuan Jie, DU Yuan Zhen, ZHANG Ya Jun, HUI Zi Wen, ZHU Feng Lin, MU Min. Predicting the Risk of Arterial Stiffness in Coal Miners Based on Different Machine Learning Models[J]. Biomedical and Environmental Sciences, 2024, 37(1): 108-111. doi: 10.3967/bes2024.009
  • Coal is one of the world’s main energy resources, accounting for approximately 68% of China’s current total power generation. However, several studies have demonstrated that dust, exhaust fumes, and other harmful factors in coal mines increase the risk of cardiovascular disease (CVD) among miners[1]. Arterial stiffness (AS) is an independent risk factor of CVD, and epidemiological studies have shown that AS plays a vital role in assessing the risk of CVD[2]. Currently, Pulse Wave Velocity (PWV) serves as the gold standard for assessing AS, and it is widely utilized in CVD screening for diagnosis[3]. Machine learning is an artificial intelligence technique that is widely used in disease diagnosis and prediction because it offers quick and accurate identification of risk factors and condition likelihoods[4]. Studies have shown that AS is associated with traditional CVD-related factors, such as blood pressure and lipids, as well as with coal dust and other harmful factors in coal mines[5]. Therefore, this study aimed to use these potential predictors to predict AS risk in coal miners using machine learning.

    This study collected data from 1,535 coal miners employed by a major coal mining company in Shaanxi Province, China. After excluding individuals who did not meet the criteria or whose relevant information was incomplete, data on 1,443 coal miners were collected for inclusion in our study. The investigators used a unified standard questionnaire to collect respondents’ information. Data on height, weight, body mass index (BMI), blood pressure, and blood lipids were collected using standard conventional methods. PWV was measured using the Vascular Profiler BP-203RPEIII system (Omron, Japan).

    R software version 4.2.2 was used for statistical analysis of the data and machine learning classification modeling. Count data were expressed as frequencies and percentages (%), and the χ2 test was used for comparison between groups; measurement data conforming to a normal distribution were expressed as (x ± s) and compared by T-test, while data not conforming to a normal distribution were expressed as M (P25, P75) and compared by the rank sum test. P < 0.05 was considered a statistically significant difference.

    To ensure the quality of the data, before data analysis, the data were pre-processed by deleting duplicates and outliers, and the Synthetic Minority Oversampling Technique (SMOTE) method was used to balance the data. We included factors with significant differences or thought to be associated with AS in a LASSO regression analysis. The variables that were finally included in the prediction model were determined according to the optimal λ value obtained. The dataset was randomly divided into training (70%) and test (30%) datasets. Five different ML models were used to analyze the data: Random Forest (RF), Extreme Gradient Augmentation (XG-Boost), Logistic Classification (LR), Back Propagation (BP), and Classification and Regression Tree (CART). The predictive performances of the five ML models were evaluated by comparing their accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 value, and area under the subject operating characteristic curve (AUC) on the dataset.

    A total of 1,443 eligible coal miners were included in this study. There were 651 cases (45.11%) in the non-AS group and 792 cases (54.89%) in the AS group. There were significant differences in age, BMI, pulse, systolic blood pressure (SBP), diastolic blood pressure (DBP), and other factors between the two groups (Supplementary Table S1, available in www.besjournal.com), all at P < 0.05.

    Table S1.  Baseline clinical characteristics of the two groups of coal miners

    Variables Non-arterial stiffness (n = 651) Arterial stiffness (n = 792) t-value (F/Z) P-value
    Age, year, M (P25, P75) 33 (29, 36) 36 (32, 44) −10.784 < 0.001
    Work age, year, M (P25, P75) 11.5 (7, 14) 13 (10, 21) −8.963 < 0.001
    BMI, kg/m2, M (P25, P75) 23.88 (21.67, 25.95) 24.16 (22.04, 26.57) −2.561 0.01
    Pulse, time/min, M (P25, P75) 72 (67, 80) 75 (68, 82) −3.352 0.001
    SBP, mmHg, M (P25, P75) 117 (109, 124) 124 (115, 133) −10.738 < 0.001
    DBP, mmHg, M (P25, P75) 84 (76, 89) 84 (76, 89) −11.07 < 0.001
    HbA1c, %, M (P25, P75) 4.58 (4.30, 4.99) 4.58 (4.28, 5.00) −0.121 0.903
    AMS, u/L, M (P25, P75) 51.00 (39.00, 66.00) 50.04 (39.00, 65.00) −1.26 0.208
    HCY, μmol/L, M (P25, P75) 6.32 (4.13, 8.25) 5.57 (3.62, 7.88) −2.284 0.022
    ηb 1, mPa.S, M (P25, P75) 23.24 (22.10, 24.31) 23.50 (22.31, 24.70) −2.959 0.003
    ηb5, mPa.S, M (P25, P75) 10.41 (10.12, 10.75) 10.46 (10.16, 10.83) −2.026 0.043
    ηb30, mPa.S, M (P25, P75) 5.79 (5.44, 6.12) 5.80 (5.48, 6.20) −1.757 0.079
    ηb200, mPa.S, M (P25, P75) 4.53 (4.25, 4.90) 4.60 (4.28, 501) −2.689 0.007
    PV, mPa.S, M (P25, P75) 1.43 (1.36, 1.52) 1.43 (1.36, 1.53) −0.65 0.516
    ESR, mm/h, M (P25, P75) 6.00 (5.00, 8.00) 6.00 (5.00, 8.00) −0.288 0.773
    HCT, L/L, M (P25, P75) 0.48 (0.46, 0.50) 0.49 (0.46, 0.51) −0.736 0.462
    HS, M (P25, P75) 3.26 (2.98, 3.50) 3.25 (2.96, 3.50) −1.204 0.229
    LS, M (P25, P75) 15.20 (13.40, 16.20) 14.72 (13.40, 16.20) −0.212 0.832
    ESR-K, M (P25, P75) 62.40 (56.10, 65.10) 62.30 (54.70, 65.10) −0.39 0.696
    AI, M (P25, P75) 5.24 (4.93, 5.62) 5.28 (5.01, 5.70) −2.088 0.037
    IR, M (P25, P75) 3.60 (3.40, 5.30) 3.55 (3.40, 5.30) −0.913 0.361
    TK, M (P25, P75) 0.85 (0.75, 1.02) 0.85 (0.73, 1.02) −0.311 0.311
    FPG, mmol/L, M (P25, P75) 5.13 (4.83, 5.47) 5.25 (4.92, 5.63) −4.657 < 0.001
    CO2CP, M (P25, P75) 24.00 (23.00, 25.10) 24.00 (23.00, 25.00) −2.001 0.045
    TCHO, mmol/L, M (P25, P75) 4.61 (4.04, 5.20) 4.80 (4.21, 5.32) −3.982 < 0.001
    TG, mmol/L, M (P25, P75) 1.14 (0.77, 1.67) 1.32 (0.88, 2.04) −4.405 < 0.001
    HDL, mmol/L, M (P25, P75) 1.54 (1.45, 1.58) 1.54 (1.43, 1.58) −0.366 0.714
    LDL, mmol/L, M (P25, P75) 2.59 (2.07, 3.11) 2.75 (2.21, 3.20) −3.173 0.002
    ApoA1, g/L, M (P25, P75) 1.25 (1.20, 1.40) 1.25 (1.20, 1.40) −1.241 0.215
    ApoB1, g/L, M (P25, P75) 1.00 (0.98, 1.02) 1.02 (0.99, 1.02) −2.402 0.016
    K, mmol/L, M (P25, P75) 4.30 (4.20, 4.80) 4.30 (4.20, 4.69) −0.378 0.706
    Na, mmol/L, M (P25, P75) 139.70 (139.20, 142.30) 139.50 (139.25, 142.15) −1.375 0.169
    Cl, mmol/L, M (P25, P75) 100.54 (98.40, 102.50) 100.25 (98.30, 102.40) −3.473 < 0.001
    Ca, mmol/L, M (P25, P75) 1.25 (1.24, 1.26) 1.25 (1.24, 1.26) −0.491 0.624
    Ca-free, mmol/L, M (P25, P75) 2.41 (2.30, 2.55) 2.45 (2.30, 2.54) −0.255 0.799
    CK, U/L, M (P25, P75) 137.00 (116.00, 162.00) 134.00 (112.00, 161.05) −1.216 0.224
    CKISO, U/L, M (P25, P75) 16.00 (12.40, 21.00) 16.00 (12.30, 21.00) −0.637 0.524
    α-HBDH, U/L, M (P25, P75) 159.00 (132.00, 185.00) 157.00 (132.00, 184.20) −0.811 0.417
    AST, U/L, M (P25, P75) 29.00 (24.00, 35.00) 31.00 (26.00, 36.00) −3.711 < 0.001
    LDH, U/L, M (P25, P75) 163.00 (144.60, 180.00) 163.60 (145.45, 184.10) −0.791 0.429
    APTT, s, M (P25, P75) 28.00 (25.00, 29.45) 27.90 (26.00, 29.60) −0.905 0.366
    Fibrinogen, g/L, M (P25, P75) 3.25 (3.14, 3.51) 3.25 (3.12, 3.52) −0.187 0.852
    TT, s, M (P25, P75) 14.15 (12.80, 15.05) 13.80 (12.70, 14.70) −1.158 0.247
    FVC/per, %, mean ± SD 80.76 ± 11.35 79.98 ± 11.65 1.265 0.206
    FEV1/per, %, mean ± SD 80.52 ± 10.39 81.18 ± 10.84 −1.17 0.242
    FEV1/FVC, %, mean ± SD 93.30 ± 6.27 93.01 ± 6.26 868 0.385
    coal dust, mg/m3-years, M (P25, P75) 18.99 (8.61, 29.05) 25.90 (14.54, 40.21) −5.982 < 0.001
    CO, mg/m3-years, M (P25, P75) 17.96 (10.27, 28.56) 25.09 (14.53, 40.14) −6.132 < 0.001
    CO2, mg/m3-years, M (P25, P75) 11775.32 (6559.46, 16705.80) 14698.84 (9028.04, 22981.75) −6.173 < 0.001
    NO, mg/m3-years, M (P25, P75) 0.12 (0.06, 0.18) 0.16 (0.10, 0.29) −6.565 < 0.001
    NO2, mg/m3-years, M (P25, P75) 0.22 (0.11, 0.38) 0.29 (0.17, 0.59) −5.579 < 0.001
    PAH, mg/m3-years, M (P25, P75) 0.67 (0.33, 1.08) 0.90 (0.50, 1.54) −5.516 < 0.001
      Note. HbA1c, glycosylated hemoglobin; AMS, serum amylase; HCY, homocysteine; ηb1, whole blood viscosity 1; ηb5, whole blood viscosity 5; ηb30, whole blood viscosity 30; ηb200, whole blood viscosity 200; PV, plasma specific viscosity; ESR, erythrocyte sedimentation rate; HCT, hematocrit; HS, whole blood high shear rate; LS, whole blood low shear rate; ESR-K, equation K value of erythrocyte sedimentation rate; AI, red blood cell aggregation index; IR, the index of rigidity of erythrocyte; TK, thymidine kinase; FPG, fasting plasma glucose; CO2CP, carbon dioxide-combining power; TCHO, total cholesterol; TG, triglycerides; HDL, high density lipoprotein; LDL, low density lipoprotein; ApoA1, apolipoprotein A1; ApoB1, apolipoprotein B1; Ca-free, free calcium; CK, creatine kinase; CKISO, creatine kinase isoenzyme; α-HBDH, alpha-hydroxybutyric dehydrogenase; AST, aspartate amino transferase; LDH, lactic dehydrogenase; APTT, activated partial thromboplastin time; TT, thrombin time; FVC/per, predicted percentage of forced vital capacity; FEV1/per, percentage of predicted forced expiratory volume in the first second; FEV1/FVC, ratio of forced expiratory volume in the first second to forced vital capacity; PAH, polycyclic aromatic hydrocarbons.

    To identify suitable predictive variables, we employed Lasso regression with cross-validated noose fitting binomial deviance plots (Figure 1A) and noose fitting coefficient locus plots (Figure 1B). The optimal λ value, corresponding to the lowest point on the loss function, was determined from Figure 1A. The variables intersecting with the optimal λ value in Figure 1B were ultimately included as model variables. Consequently, the model’s predictive variables corresponding to the optimal λ value were determined to be age, pulse, SBP, whole blood high shear rate (HS), carbon dioxide-combining power (CO2CP), Cl, and TG.

    Figure 1.  LASSO regression screening of machine learning model predictors: (A) The process of selecting the most suitable λ in the LASSO model. (B) LASSO coefficient curve of the variable.

    Numerous studies have established an association between these factors and CVD occurrence. The academic community has reached a consensus regarding the close relationship between age, hyperlipidemia, and CVD incidence, which is potentially attributed to vascular aging and AS[6]. The link between hypertension and CVD is well-documented. For example, Webb’s study demonstrated a relationship between blood pressure and AS, where higher DBP and SBP corresponded to a greater likelihood of AS occurrence[5]. Pulse has also been connected to CVD, with some researchers proposing its use as a predictor of such conditions[7]. Studies have also found associations between CO2CP and the occurrence and prognosis of CVD[8]. Moreover, HS serves as an important indicator of blood viscosity, and numerous studies have revealed that higher blood viscosity is associated with more severe AS and an increased likelihood of CVD[9]. Unfortunately, although we collected data on the exposure of coal miners to occupational hazards and tried to include them in our study, we excluded all variables of occupational hazards when we used LASSO regression to screen predictor variables; therefore, the variables of the final predictive model were all composed of physiological indicators. We speculate that there may be some mediating factors between occupational hazards and AS or CVD. We will attempt such an analysis in future studies to improve our research.

    The five machine learning models were compared using various indicators to assess their predictive performance for the occurrence of AS in coal miners (Table 1). The results demonstrated that the RF model achieved the highest accuracy (83.6%), sensitivity (80.2%), specificity (86.3%), positive predictive value (82.2%), negative predictive value (84.7%), F1 value (0.812), and AUC (0.893) on both the training and test datasets.

    Table 1.  Efficacy results for the five ML models

    Evaluation indicators Training dataset (70%) Test dataset (30%)
    RF XGboost LR BP CART RF XGboost LR BP CART
    Accuracy (%) 80.4 74.4 66.1 69.1 68.8 83.6 69.4 66.7 67.3 70.3
    Sensitivity (%) 77.4 71.4 60.7 64.6 68.0 80.2 66.5 63.6 63.6 71.0
    Specificity (%) 82.9 76.8 70.6 72.8 69.5 86.3 71.7 69.2 70.2 69.9
    Positive predictive value (%) 79.1 72.1 63.3 66.6 65.1 82.2 65.0 62.0 62.8 65.0
    Negative predictive value (%) 81.4 76.3 68.2 71.1 72.2 84.7 73.0 70.6 71.0 75.3
    F1 value 0.783 0.718 0.620 0.656 0.665 0.812 0.657 0.628 0.632 0.678
    AUC (95% CI) 0.888 (0.874, 0.902) 0.828 (0.813, 0.842) 0.723 (0.701, 0.744) 0.761 (0.741, 0.781) 0.714 (0.692, 0.736 0.893 (0.872, 0.914 0.770 (0.740, 0.801 0.736 (0.703, 0.768) 0.733 (0.600, 0.765) 0.723 (0.690, 0.756)
      Note. RF, Random Forest; XG-Boos, Extreme Gradient Augmentation; LR, Logistic Classification; BP, Back Propagation; CART, Classification and Regression Tree

    An AUC closer to 1 indicates a better predictive performance of the machine learning model. The AUCs of the five machine learning models on the dataset are shown in Figure 2. On the training dataset, the AUC of the RF model was significantly higher than those of the other models (Figure 2A). On the test dataset, the AUC value of the RF model was also the highest (0.893), which proves that the RF model has good prediction performance for AS (Figure 2B).

    Figure 2.  AUCs of the five machine learning models: (A) training and (B) test dataset ROC curves.

    The RF model is a classifier model based on decision trees that has been widely applied in the medical field. The bagging method it employs significantly enhances the accuracy of the model predictions. In our study, the RF model outperformed other machine learning models on the training dataset, exhibiting higher evaluation scores and superior performance on the test set. Therefore, based on our evaluation index analysis, we considered the RF model to be the most suitable machine learning model for predicting the risk of AS among coal miners.

    The RF model has demonstrated exceptional performance in predicting various diseases or symptoms, likely owing to the following advantages: 1) ability to generate highly accurate classifiers, 2) capability to evaluate variable importance and build models accordingly, and 3) potential to estimate missing data and balance errors within a dataset[10].

    In our study, we used simple physical examination data, such as age, pulse, blood pressure, and HS, to accurately predict the risk of AS among coal miners. Consequently, we were able to predict the risk of CVD among coal miners by asking a few questions and collecting a small number of blood samples, without relying on professional PWV detection equipment. To facilitate better use of our model, we provide the variable importance scores of the RF model in Supplementary Figure S1 (available in www.besjournal.com). For a decision tree model, the variable importance score can help readers better understand the value of these variables in predicting AS outcomes.

    Figure S1.  Feature importance scores.

    Nevertheless, our study had certain limitations. First, the outcome of our study was AS, which predicts CVD but does not directly indicate it. Second, the data used were obtained exclusively from a large coal mine in Shaanxi Province, which may affect the generalizability of our findings. Third, although we selected five machine learning models, there are numerous other widely used models, including Gaussian Parsimonious Bayesian Classification (GNB), Neural Network Classification (MLP), and Complementary Parsimonious Bayesian Classification (CNB), which could be included in future studies.

    This study represents the first attempt to employ various machine learning models to predict the risk of AS in coal miners. Among the five machine learning models examined, the RF model demonstrated the best predictive performance for AS in this population. Clinicians and public health practitioners can effectively utilize the RF model to assess the early-stage risk of AS among coal miners, enabling the implementation of appropriate preventive interventions. If readers are interested in our model, they can contact us via email, and we will provide the model and code.

    There are no potential conflicts of interest to disclose.

    MU Ming and DING Yu conceived of and designed this study. ZHU Feng Ren contributed to the writing of the manuscript. DING Yu, WANG Jia, ZOU Yuan Jie, and DU Yuan Zhen contributed to the data retrieval and manuscript review. ZHU Feng Lin, ZHANG Ya Jun, and HUI Zi Wen contributed to data collection and collation. All authors made significant contributions to the research process of this manuscript and have read and approved the submitted manuscript.

参考文献 (10)
补充材料:
23217+Supplementary Materials.pdf

目录

    /

    返回文章
    返回