曹凯鑫, 汤猛猛, 葛建鸿, 李泽康, 王晓芸, 李国星, 魏雪涛. 大气污染物PM2.5缺失数据插值方法的比较研究:基于北京市数据[J]. 环境与职业医学, 2020, 37(4): 299-305. DOI: 10.13213/j.cnki.jeom.2020.19740
CAO Kai-xin, TANG Meng-meng, GE Jian-hong, LI Ze-kang, WANG Xiao-yun, LI Guoxing, WEI Xue-tao. Comparison of methods to interpolate missing PM2.5 values: Based on air surveillance data of Beijing[J]. Journal of Environmental and Occupational Medicine, 2020, 37(4): 299-305. DOI: 10.13213/j.cnki.jeom.2020.19740

大气污染物PM2.5缺失数据插值方法的比较研究:基于北京市数据

Comparison of methods to interpolate missing PM2.5 values: Based on air surveillance data of Beijing

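  • Abstract: Background

    Air pollutant data from ground monitoring stations are increasingly used for individual exposure assessment in environmental epidemiology. Because missing values in real-time records such as air monitoring data cannot be recovered afterwards, studies that rely on historical data must fill them in, and the prediction error introduced by the chosen imputation method affects how researchers judge their results.

    Objective

    To comprehensively compare how well six interpolation methods fill missing PM2.5 data, to assess the average prediction error of each method, and to provide clues about the size of the measurement error in exposure assessment.

    Methods

    Using 2016 PM2.5 data from 35 monitoring stations in Beijing, three representative evaluation stations (Dongsi, Miyun, and Fangshan) were selected, and six interpolation methods (daily mean, nearest monitoring station, multiple linear regression, multiple imputation, inverse distance weighting, and Kriging) were compared on four statistics: median absolute error, median relative error, mean squared error, and root mean squared error.

    Results

    At the Dongsi station, multiple linear regression performed best of the six methods, followed by inverse distance weighting, and the daily mean performed worst; their root mean squared errors were 6.67, 8.19, and 52.19, respectively. The median absolute error was 19.00 for the daily mean and within 4 for all other methods. At the Miyun station, multiple imputation performed best, followed by Kriging, and the daily mean performed worst; their root mean squared errors were 8.34, 11.76, and 42.53, respectively. The median absolute error was 16.00 for the daily mean and within 5 for all other methods. At the Fangshan station, Kriging performed best, followed by multiple imputation, and the daily mean performed worst; their root mean squared errors were 18.74, 22.73, and 50.93, respectively. The median absolute error was 27.50 for the daily mean and within 10 for all other methods. Across the three stations combined, Kriging performed best, followed by multiple imputation, and the daily mean performed worst; their root mean squared errors were 13.65, 14.77, and 48.74, respectively. The median absolute error was 19.00 for the daily mean and within 5 for all other methods.

    Conclusion

    Of the six interpolation methods, Kriging and multiple imputation perform comparatively well and the daily mean performs worst; Kriging is more stable than inverse distance weighting. Apart from the daily mean, the average prediction error of each method is within 5. Factors such as monitoring station density and terrain have a considerable influence on interpolation performance.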

     

    Abstract: Background

    Air pollutant data from ground monitoring sites are increasingly used for individual exposure assessment in environmental epidemiology. Because missing monitoring values cannot be remeasured, research based on historical data has to fill them in, and the prediction error introduced by the chosen interpolation method affects the final interpretation.

    Objective

    This study compares how well six interpolation methods fill missing PM2.5 values, estimates the average prediction error of each method, and provides insights into the size of the measurement error in exposure assessment for PM2.5-associated studies.

    Methods

    Based on the PM2.5 data observed at 35 monitoring sites in Beijing, six interpolation methods (time-average, the nearest monitoring site, multiple linear regression, multiple imputation, inverse distance weighting, and Kriging) were compared at three representative monitoring sites (Dongsi, Miyun, and Fangshan) using four statistics: median absolute error (MAE), median relative error, mean squared error, and root mean squared error (RMSE).

    Results

    At the Dongsi monitoring site, multiple linear regression performed best of the six interpolation methods, followed by inverse distance weighting, and time-average performed worst; the RMSEs of these three methods were 6.67, 8.19, and 52.19, respectively. The MAE was 19.00 for time-average and smaller than 4 for all other methods. At the Miyun monitoring site, multiple imputation performed best, followed by Kriging, and time-average performed worst; the RMSEs of these three methods were 8.34, 11.76, and 42.53, respectively. The MAE was 16.00 for time-average and smaller than 5 for all other methods. At the Fangshan monitoring site, Kriging performed best, followed by multiple imputation, and time-average performed worst; the RMSEs of these three methods were 18.74, 22.73, and 50.93, respectively. The MAE was 27.50 for time-average and smaller than 10 for all other methods. Taking the three monitoring sites together, Kriging performed best, followed by multiple imputation, and time-average performed worst; the RMSEs were 13.65, 14.77, and 48.74, respectively. The MAE was 19.00 for time-average and smaller than 5 for all other methods.

    Conclusion

    Among the six interpolation methods, Kriging and multiple imputation perform best, while time-average performs worst. Kriging shows more stable performance than inverse distance weighting. Except for time-average, the average prediction error of each method is within 5. Factors such as monitoring site density and topography may influence interpolation performance.
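
To make the comparison above concrete, here is a minimal sketch of one of the six methods (inverse distance weighting) together with the four evaluation statistics named in the Methods. It is not the authors' code. The function names, the planar station coordinates in kilometres, the distance power of 2, and the definition of relative error as |predicted - observed| / observed are all assumptions made for illustration, and the readings in the usage lines are invented.

```python
from math import hypot, sqrt
from statistics import median


def idw_estimate(target_xy, neighbours, power=2.0):
    """Inverse-distance-weighted estimate at target_xy.

    neighbours: list of ((x, y), value) pairs, i.e. readings from other
    stations at the same time step. Coordinates are assumed to be planar (km).
    """
    if not neighbours:
        raise ValueError("at least one neighbouring reading is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in neighbours:
        distance = hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            return value  # co-located station: use its reading directly
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def error_summary(observed, predicted):
    """Median absolute error, median relative error, MSE and RMSE.

    Relative error is assumed to be |predicted - observed| / observed;
    pairs with an observed value of zero are skipped for that statistic.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    abs_err = [abs(p - o) for o, p in zip(observed, predicted)]
    rel_err = [abs(p - o) / o for o, p in zip(observed, predicted) if o != 0]
    mse = sum(e * e for e in abs_err) / len(abs_err)
    return {
        "median_abs_error": median(abs_err),
        "median_rel_error": median(rel_err) if rel_err else None,
        "mse": mse,
        "rmse": sqrt(mse),
    }


# Hypothetical usage: hide known readings at one station, re-estimate them
# from its neighbours, then score the estimates against the hidden values.
neighbours_by_hour = [
    [((1.0, 0.0), 10.0), ((-1.0, 0.0), 30.0)],
    [((1.0, 0.0), 10.0), ((2.0, 0.0), 40.0)],
]
hidden_truth = [22.0, 15.0]
estimates = [idw_estimate((0.0, 0.0), hour) for hour in neighbours_by_hour]
summary = error_summary(hidden_truth, estimates)
print(estimates, summary)
```

Scoring each method on values that were deliberately hidden, as in the usage lines, is what allows the statistics to be compared across methods; the other five methods would plug into `error_summary` in the same way.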

     
