Sponsor: Institute of Atmospheric Environment, China Meteorological Administration, Shenyang
ISSN: 1673-503X
CN: 21-1531/P

Journal of Meteorology and Environment, 2022, Vol. 38, Issue 4: 47-56. doi: 10.3969/j.issn.1673-503X.2022.04.006

• Article •

A study of prediction models for the first-frost date in Liaoning Province based on machine learning methods

Tao WANG (1,2), Yi-shu WANG (1), Chun-yu ZHAO (1), Xiao-tao WANG (1), Mei-ou QIN (1), Yu-min SHEN (1,*), Yi-ling HOU (1), Jian-yun ZHAO (1)

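  1. Shenyang Regional Climate Centre, Shenyang, Liaoning 110166
  2. Institute of Atmospheric Environment, China Meteorological Administration, Shenyang, Liaoning 110166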
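  • Received: 2021-04-25  Online: 2022-08-28  Published: 2022-09-22
  • Corresponding author: Yu-min SHEN, E-mail: nick_bsb@126.com
  • About the first author: Tao WANG, male, born in 1985, engineer, working mainly on climate prediction and artificial intelligence techniques. E-mail: nick_bsb@126.com
  • Funding:
    Climate Change Special Project of the China Meteorological Administration (CCSF202013); Innovation and Development Special Project of the China Meteorological Administration (CXFZ2021J047); Natural Science Foundation Guidance Program of the Liaoning Provincial Department of Science and Technology (2019-ZD-0860); Open Fund of the Key Opening Laboratory for Northeast China Cold Vortex Research (2022SYIAZKFMS09); Agricultural Research and Industrialization Guidance Program (2019JH8/10200023); Science and Technology Project of the Liaoning Meteorological Bureau (BA202005)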

Prediction model of first-frost date in Liaoning province using machine learning methods

Tao WANG (1,2), Yi-shu WANG (1), Chun-yu ZHAO (1), Xiao-tao WANG (1), Mei-ou QIN (1), Yu-min SHEN (1,*), Yi-ling HOU (1), Jian-yun ZHAO (1)

  1. Shenyang Regional Climate Centre, Shenyang 110166, China
  2. Institute of Atmospheric Environment, China Meteorological Administration, Shenyang 110166, China
  • Received: 2021-04-25  Online: 2022-08-28  Published: 2022-09-22
  • Contact: Yu-min SHEN, E-mail: nick_bsb@126.com

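Abstract:

Based on antecedent ERA5 monthly reanalysis data, three machine learning algorithms (Lasso regression, random forest, and neural network) were applied to predict and evaluate the first-frost date in Liaoning Province. The Lasso regression algorithm extracted the set of meteorological features that carry important signals for first-frost date prediction; the prediction models were built through cross-validation and hyperparameter tuning; and prediction skill was assessed quantitatively and qualitatively with the root mean square error (RMSE) and the anomaly same-sign rate. The results show that modelling with the feature set obtained after feature selection improved the generalization ability, interpretability, and stability of the models. The Lasso regression model predicted best when initialized in April (RMSE of 6-8 d), the neural network model performed best when initialized in May (RMSE of 6-9 d), and the random forest model performed best when initialized in March (RMSE of 8-9 d). The anomaly same-sign rate was 50%-70% at most stations across Liaoning; it peaked for forecasts initialized in May for the Lasso regression and neural network models (about 68%) and for forecasts initialized in March for the random forest algorithm (about 62%). Feature selection and sensitivity experiments found that the low vegetation cover fraction is a key predictor of the first-frost date: the higher the vegetation coverage, the better the surface retains moisture, so frost forms readily when the temperature falls and the first-frost date tends to come earlier. Prediction skill dropped markedly once the low vegetation cover fraction was removed, which also indicates that it is a key antecedent factor for model building.

Keywords: ERA5, machine learning, Lasso regression, random forest, neural network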
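
The two evaluation measures named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the exact definitions used in the paper, so the function names, the treatment of zero anomalies, the climatological reference, and the day-of-year values below are all hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root mean square error, in the same unit as the inputs (days here)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def anomaly_same_sign_rate(predicted, observed, climatology):
    """Fraction of years whose predicted and observed anomalies share a sign.

    Assumption: a year counts as a hit when both anomalies are positive,
    both are negative, or both are exactly zero.
    """
    def sign(x):
        return (x > 0) - (x < 0)

    hits = sum(
        sign(p - climatology) == sign(o - climatology)
        for p, o in zip(predicted, observed)
    )
    return hits / len(observed)


# Hypothetical first-frost dates (day of year) for one station, five years.
observed = [274, 268, 281, 272, 280]
predicted = [271, 270, 277, 276, 283]
# Assumption: the climatological mean is taken over the same five years.
climatology = sum(observed) / len(observed)

print(round(rmse(predicted, observed), 2))  # 3.29 (days)
print(anomaly_same_sign_rate(predicted, observed, climatology))  # 0.8
```

RMSE measures how many days the forecast misses by on average, while the same-sign rate only asks whether the forecast puts the first frost on the correct side of normal (earlier or later), which is why the abstract treats them as quantitative and qualitative measures respectively.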

Abstract:

Based on antecedent ERA5 monthly reanalysis data, the first-frost date in Liaoning Province was predicted and evaluated using three machine learning algorithms (Lasso regression, random forest, and neural network). The Lasso regression algorithm was applied to identify the set of meteorological features that carry important signals for first-frost date prediction, and the prediction models were established through cross-validation and hyperparameter tuning. Prediction performance was then evaluated quantitatively and qualitatively using the root mean square error (RMSE) and the anomaly same-sign rate. The results show that building the models on the feature set obtained after feature selection improves their generalization ability, interpretability, and robustness. The Lasso regression model performs best for predictions initialized in April (RMSE of 6-8 d), the neural network model for predictions initialized in May (RMSE of 6-9 d), and the random forest model for predictions initialized in March (RMSE of 8-9 d). The anomaly same-sign rate ranges from 50% to 70% at most stations in Liaoning Province; it peaks at about 68% for the Lasso regression and neural network models with predictions initialized in May, and at about 62% for the random forest model with predictions initialized in March. Feature selection and sensitivity experiments indicate that the low vegetation cover fraction is a key predictor: higher vegetation coverage favors the retention of surface moisture, so frost forms more readily when the temperature falls and the first-frost date tends to come earlier. Model performance drops markedly when the low vegetation cover fraction is excluded, which confirms that it is a key antecedent factor. In short, the machine learning algorithms show high skill in both the quantitative and qualitative prediction of the first-frost date.

Key words: ERA5, Machine learning, Lasso Regression, Random Forest, Neural Network
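
The feature-selection role that the abstract assigns to Lasso regression comes from its L1 penalty, which drives the coefficients of weak predictors to exactly zero. The following is a minimal pure-Python sketch of that mechanism using coordinate descent. It is not the authors' implementation: the data are synthetic, the predictor names are hypothetical, and the penalty `alpha` is fixed by hand, whereas the paper tunes hyperparameters by cross-validation (in practice a library routine such as scikit-learn's `LassoCV` would usually be used).

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=50):
    """Minimise (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1.

    Assumes y is centred and every column of X is centred with non-zero
    variance, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n)
    return w


# Synthetic, standardized predictors (hypothetical names, not from the paper).
feature_names = ["predictor_a", "predictor_b", "predictor_c"]
X = [
    [1.0, 1.0, 1.0], [-1.0, 1.0, 1.0], [1.0, -1.0, 1.0], [-1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0], [-1.0, 1.0, -1.0], [1.0, -1.0, -1.0], [-1.0, -1.0, -1.0],
]
# Target built as 3.0 * a + 0.1 * b - 1.5 * c, so predictor_b is nearly irrelevant.
y = [1.6, -4.4, 1.4, -4.6, 4.6, -1.4, 4.4, -1.6]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]

print(weights)  # approximately [2.5, 0.0, -1.0]
print([feature_names[j] for j in selected])  # ['predictor_a', 'predictor_c']
```

The weak predictor is dropped entirely rather than merely down-weighted, which is what lets a Lasso fit double as a feature-selection step before the other models are trained on the retained features.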

CLC number: