绿地覆盖与成人居民血脂异常的关联:结合可解释机器学习SHAP方法

Association between green space coverage and dyslipidemia in adults: Combined with interpretable machine learning SHAP methods

    Abstract:
    Background Research on the association between dyslipidemia and green space coverage remains limited. Existing approaches rely mostly on traditional models with a fixed functional form, which cannot fully capture the complex nonlinear relationships and interactions in large datasets with many features.
    Objective To systematically evaluate the association between green space coverage and the prevalence of dyslipidemia in adult residents using interpretable machine learning methods, and to explore the potential impact of green environments on cardiovascular health.
    Methods Drawing on the National Early Screening and Comprehensive Intervention Project for High-Risk Groups of Cardiovascular Diseases, this study included a general population (aged ≥18 years) who took part in the initial screening for high cardiovascular risk in Xianning City, Hubei Province, between December 2015 and December 2018. Green space coverage was assessed with the normalized difference vegetation index (NDVI). Logistic regression models and Shapley additive explanations (SHAP) analysis based on a light gradient boosting machine (LightGBM) model were used to evaluate the association between green space coverage and the prevalence of dyslipidemia. Mediation models were built with the R package “mediation” to assess the mediating effects of air pollutants such as PM2.5 and of body mass index; the bootstrap method was used to test significance and to calculate the proportion mediated. Stratified analyses by sociodemographic characteristics and pollutant exposure levels evaluated the association between green space coverage and dyslipidemia prevalence across subgroups. Restricted cubic spline functions were used to describe the dose-response relationship.
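    As an illustration of the exposure metric, NDVI is computed per pixel as (NIR - Red) / (NIR + Red), and a buffer-level exposure is the mean over pixels within a given radius of the residence. The sketch below is a minimal example under stated assumptions: the reflectance grids, the 1000 m pixel size, the residence location, and the function names are all hypothetical, and it does not reproduce the study's remote-sensing workflow.

```python
import math

def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); 0.0 if both bands are zero."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def mean_ndvi_in_buffer(nir_grid, red_grid, home_xy, radius_m, pixel_m):
    """Mean NDVI over pixels whose centres lie within radius_m of home_xy.

    Coordinates are in metres; pixel (row, col) has its centre at
    ((col + 0.5) * pixel_m, (row + 0.5) * pixel_m).
    """
    values = []
    for row, (nir_row, red_row) in enumerate(zip(nir_grid, red_grid)):
        for col, (nir, red) in enumerate(zip(nir_row, red_row)):
            cx = (col + 0.5) * pixel_m
            cy = (row + 0.5) * pixel_m
            if math.hypot(cx - home_xy[0], cy - home_xy[1]) <= radius_m:
                values.append(ndvi(nir, red))
    if not values:
        raise ValueError("no pixel centres fall inside the buffer")
    return sum(values) / len(values)

# Hypothetical 2 x 2 reflectance grids (toy values, not study data).
nir_grid = [[0.5, 0.4], [0.6, 0.3]]
red_grid = [[0.1, 0.2], [0.2, 0.3]]

# Residence at the grid centre, 1500 m buffer as in the abstract.
exposure = mean_ndvi_in_buffer(nir_grid, red_grid, (1000.0, 1000.0), 1500.0, 1000.0)
print(round(exposure, 3))
```

    In practice the buffer mean would be taken from a satellite NDVI raster with a GIS tool; the loop above only makes the definition concrete.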
    Results Among the 10680 participants, 851 had dyslipidemia, a prevalence of 7.97%. In the logistic regression models, each 0.1-unit increase in NDVI (mean NDVI within a 1500 m buffer) was significantly associated with 15.8% lower odds of dyslipidemia (OR=0.842, 95%CI: 0.773, 0.918). The negative association between NDVI and dyslipidemia was statistically significant among participants with high nitrogen dioxide (NO₂) exposure (OR=0.774, 95%CI: 0.689, 0.869), married participants (OR=0.837, 95%CI: 0.763, 0.917), non-obese participants (OR=0.811, 95%CI: 0.738, 0.890), those not taking regular cardiovascular medications (OR=0.781, 95%CI: 0.694, 0.876), and those without a relevant disease history (OR=0.836, 95%CI: 0.736, 0.948), but not in the corresponding comparison groups. The dose-response curves and SHAP feature interaction dependence plots showed a nonlinear association between green space coverage and dyslipidemia, with the negative association strengthening as green space coverage increased. In addition, the SHAP analysis showed that NDVI had the highest feature importance among all environmental factors, including air pollutants.
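    The 15.8% figure and the 7.97% prevalence can be checked from the reported numbers: the percent change in odds is (OR - 1) × 100, and the prevalence is 851/10680. The sketch below does that arithmetic and also back-calculates the log-odds coefficient and an approximate standard error from the interval. It assumes a Wald-type 95% confidence interval, which the abstract does not state, and the function name is illustrative.

```python
import math

def summarize_odds_ratio(odds_ratio, ci_low, ci_high, z=1.959964):
    """Recover log-odds scale quantities from a reported OR and 95% CI.

    Assumes the interval is exp(beta +/- z * se) (a Wald interval);
    this is an assumption, not something the abstract confirms.
    """
    beta = math.log(odds_ratio)                               # log-odds per 0.1 NDVI
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)     # approximate SE
    percent_change = (odds_ratio - 1.0) * 100.0               # change in odds, %
    return {"beta": beta, "se": se, "percent_change_in_odds": percent_change}

overall = summarize_odds_ratio(0.842, 0.773, 0.918)
prevalence_pct = 851 / 10680 * 100

print(round(overall["percent_change_in_odds"], 1))  # -15.8
print(round(prevalence_pct, 2))                     # 7.97
```

    Because the outcome is fairly uncommon here (about 8%), the odds ratio is close to the prevalence ratio, but the two are not identical.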
    Conclusion A nonlinear association exists between green space coverage and dyslipidemia. The association is statistically significant among adults with high NO₂ exposure, married individuals, non-obese individuals, those not taking regular cardiovascular medications, and those without a relevant disease history, but not in the corresponding comparison groups. Furthermore, green space coverage appears to be a more important factor for dyslipidemia than air pollutants.
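    The importance comparison rests on Shapley values, which attribute a prediction to each feature by averaging that feature's marginal contribution over all feature orderings. The sketch below computes exact Shapley values by brute force for a made-up three-feature risk score. The model, its coefficients, the feature names, and the baseline are hypothetical stand-ins for a fitted LightGBM model; SHAP analyses of tree models use faster dedicated algorithms rather than this enumeration.

```python
from itertools import permutations

FEATURES = ("ndvi", "pm25", "bmi")

def toy_risk(x):
    """Hypothetical risk score with an NDVI-by-PM2.5 interaction (not a fitted model)."""
    return (-2.0 * x["ndvi"] + 0.5 * x["pm25"] + 1.0 * x["bmi"]
            - 0.5 * x["ndvi"] * x["pm25"])

def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    A feature that is 'absent' takes its baseline value; features are
    switched to their observed values one at a time along each ordering.
    """
    totals = {f: 0.0 for f in FEATURES}
    orderings = list(permutations(FEATURES))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for feature in order:
            current[feature] = x[feature]
            updated = model(current)
            totals[feature] += updated - previous   # marginal contribution
            previous = updated
    return {f: total / len(orderings) for f, total in totals.items()}

baseline = {"ndvi": 0.0, "pm25": 0.0, "bmi": 0.0}   # hypothetical reference point
person = {"ndvi": 1.0, "pm25": 1.0, "bmi": 1.0}     # hypothetical standardized values

phi = shapley_values(toy_risk, person, baseline)
print(phi)

# Efficiency property: contributions sum to f(x) - f(baseline).
print(sum(phi.values()), toy_risk(person) - toy_risk(baseline))
```

    Averaging the absolute Shapley values across all individuals gives a global importance ranking of the kind reported for NDVI. Such a ranking describes what the model relies on and does not by itself establish a causal effect.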

     
