Abstract:
Background Research on the association between dyslipidemia and green space coverage remains limited, and existing methods rely too heavily on traditional fixed models to fully reveal the complex, nonlinear relationships and interactions in large datasets with numerous features.
Objective To systematically evaluate the association between green space coverage and the prevalence of dyslipidemia in adult residents using interpretable machine learning methods, and to explore the potential impact of green environments on cardiovascular health.
Methods Based on the National Early Screening and Comprehensive Intervention Project for High-Risk Groups of Cardiovascular Diseases, this study included adults (aged ≥18 years) from the general population who participated in the initial screening for high-risk cardiovascular diseases in Xianning City, Hubei Province, from December 2015 to December 2018. The normalized difference vegetation index (NDVI) was used to assess green space coverage. Logistic regression models and Shapley additive explanations (SHAP) analysis based on a light gradient boosting machine (LightGBM) model were employed to evaluate the association between green space coverage and the prevalence of dyslipidemia. The R package “mediation” was used to construct mediation models assessing the potential mediating effects of body mass index and air pollutants such as PM2.5. The bootstrap method was applied to test significance and to calculate the proportion of the effect that was mediated. Stratified analyses by sociodemographic characteristics and pollutant exposure levels were conducted to evaluate the association between green space coverage and dyslipidemia prevalence across subgroups. Restricted cubic spline functions were used to describe dose-response relationships.
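As an illustration of the exposure measure, NDVI is conventionally defined from near-infrared (NIR) and red reflectance as (NIR − Red) / (NIR + Red), and a buffer-level exposure is the mean over the pixels inside the buffer. The sketch below is a minimal example under stated assumptions, not the study's processing pipeline: the function names and the reflectance values are hypothetical, and the abstract does not specify the satellite product or how pixels were assigned to the 1500 m buffer.

```python
from statistics import mean


def ndvi(nir: float, red: float) -> float:
    """Standard NDVI definition: (NIR - Red) / (NIR + Red), bounded in [-1, 1]."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / total


def buffer_mean_ndvi(pixels):
    """Mean NDVI over (NIR, Red) reflectance pairs assumed to lie inside one buffer."""
    return mean(ndvi(nir, red) for nir, red in pixels)


# Hypothetical reflectance pairs for three pixels around one residence.
sample_pixels = [(0.50, 0.10), (0.40, 0.20), (0.30, 0.30)]
exposure = buffer_mean_ndvi(sample_pixels)
print(round(exposure, 3))
```

Higher values indicate denser vegetation; a pixel with equal NIR and red reflectance contributes an NDVI of zero.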
Results Among the 10680 participants, 851 were diagnosed with dyslipidemia, yielding a prevalence of 7.97%. The logistic regression models indicated that a 0.1-unit increase in NDVI (the mean NDVI within a 1500 m buffer zone) was significantly associated with 15.8% lower odds of dyslipidemia (OR=0.842, 95%CI: 0.773, 0.918). The negative association between NDVI and dyslipidemia was statistically significant among participants with high nitrogen dioxide (NO₂) exposure (OR=0.774, 95%CI: 0.689, 0.869), married participants (OR=0.837, 95%CI: 0.763, 0.917), non-obese participants (OR=0.811, 95%CI: 0.738, 0.890), those not taking regular cardiovascular medications (OR=0.781, 95%CI: 0.694, 0.876), and those without a relevant disease history (OR=0.836, 95%CI: 0.736, 0.948). The association was not statistically significant in the corresponding comparison groups. The dose-response curves and SHAP feature interaction dependence plots revealed a nonlinear association between green space coverage and dyslipidemia, with the negative association strengthening as green space coverage increased. Additionally, the SHAP analysis showed that NDVI had the highest feature importance among environmental factors, including multiple air pollutants.
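The headline figures above can be checked with simple arithmetic: an odds ratio maps to a percent change in odds as (OR − 1) × 100, and the prevalence is cases divided by participants. The sketch below is only a worked check of the reported numbers; the helper names are hypothetical, and the rescaling helper assumes NDVI enters the logistic model linearly on the log-odds scale, which the abstract does not state for every model.

```python
import math


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in odds implied by an odds ratio (negative means lower odds)."""
    return (odds_ratio - 1.0) * 100.0


def rescale_odds_ratio(odds_ratio: float, from_step: float, to_step: float) -> float:
    """OR for a `to_step` increase, given the OR for a `from_step` increase.

    Assumes a linear effect on the log-odds scale (an assumption of this sketch).
    """
    return math.exp(math.log(odds_ratio) * to_step / from_step)


# Values reported in the Results paragraph.
prevalence_pct = 851 / 10680 * 100            # about 7.97
point = percent_change_in_odds(0.842)         # about -15.8
ci_low = percent_change_in_odds(0.773)        # about -22.7
ci_high = percent_change_in_odds(0.918)       # about -8.2

print(round(prevalence_pct, 2), round(point, 1), round(ci_low, 1), round(ci_high, 1))
```

Because the outcome is relatively uncommon here (about 8%), the odds ratio is close to, but not identical with, a ratio of prevalences.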
Conclusion Green space coverage is nonlinearly and negatively associated with dyslipidemia. The association is statistically significant in adults with high NO₂ exposure, married individuals, non-obese individuals, those not taking regular cardiovascular medications, and those without a relevant disease history, but not in their counterparts. Furthermore, in the SHAP analysis, green space coverage was a more important environmental factor for dyslipidemia than air pollutants.