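Sponsor: Institute of Atmospheric Environment, China Meteorological Administration, Shenyang
International serial number: ISSN 1673-503X
Domestic serial number: CN 21-1531/P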

Journal of Meteorology and Environment, 2022, Vol. 38, Issue (4): 47-56. doi: 10.3969/j.issn.1673-503X.2022.04.006

Prediction model of first-frost date in Liaoning province using machine learning methods

Tao WANG 1,2, Yi-shu WANG 1, Chun-yu ZHAO 1, Xiao-tao WANG 1, Mei-ou QIN 1, Yu-min SHEN 1,*, Yi-ling HOU 1, Jian-yun ZHAO 1

  1. Shenyang Regional Climate Centre, Shenyang 110166, China
  2. Institute of Atmospheric Environment, China Meteorological Administration, Shenyang 110166, China
  • Received: 2021-04-25  Online: 2022-08-28  Published: 2022-09-22
  • Contact: Yu-min SHEN  E-mail: nick_bsb@126.com

Abstract:

Based on ERA5 monthly reanalysis data, the first-frost date in Liaoning province was predicted and evaluated using three machine learning algorithms (Lasso Regression, Random Forest, and Neural Network). Lasso Regression was used to select the sets of meteorological features that are most indicative of the first-frost date, and the prediction models were established after cross-validation and hyperparameter tuning. Prediction performance was then evaluated quantitatively and qualitatively using the root mean square error (RMSE) and the same-sign rate of anomalies, that is, the proportion of predictions whose anomaly has the same sign as the observed anomaly. The results showed that feature selection improves the generalization ability, interpretability, and robustness of the model. The Lasso Regression model performs best when prediction starts from April (RMSE of 6-8 d), the Neural Network model when prediction starts from May (RMSE of 6-9 d), and the Random Forest model when prediction starts from March (RMSE of 8-9 d). The same-sign rate of anomalies ranges from 50% to 70% at most stations in Liaoning province; the Lasso Regression and Neural Network models reach their maximum rate (about 68%) when prediction starts from May, and the Random Forest model reaches its maximum rate (about 62%) when prediction starts from March. Feature selection and sensitivity experiments indicated that the low vegetation coverage scale is the key predictor. High vegetation coverage favors the maintenance of surface water content, so frost is more likely to occur as temperatures fall, leading to an earlier first-frost date. The model performs poorly when the low vegetation coverage scale factor, the key one among the antecedent factors, is excluded. In short, machine learning algorithms show high skill in both the quantitative and the qualitative prediction of the first-frost date.
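
The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes that first-frost dates are expressed as day of year and that anomalies are taken relative to a single climatological mean; the function names, the sample values, and the treatment of zero anomalies are illustrative assumptions and are not taken from the paper.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predicted and observed dates (in days)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def same_sign_rate(predicted, observed, climatology):
    """Fraction of years whose predicted and observed anomalies share a sign.

    Assumption: an anomaly is (date - climatology), and a year counts as a
    hit when both anomalies are positive, both negative, or both zero.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for p, o in zip(predicted, observed):
        p_anom = p - climatology
        o_anom = o - climatology
        if p_anom * o_anom > 0 or (p_anom == 0 and o_anom == 0):
            hits += 1
    return hits / len(predicted)


# Hypothetical first-frost dates (day of year) for four years at one station.
observed = [275, 280, 270, 285]
predicted = [278, 276, 272, 290]
climatology = 277  # hypothetical long-term mean first-frost date

print(f"RMSE: {rmse(predicted, observed):.2f} d")
print(f"Same-sign rate: {same_sign_rate(predicted, observed, climatology):.0%}")
```

RMSE measures how many days the prediction misses by, whereas the same-sign rate only checks whether the model places the first frost on the correct side of normal (earlier or later), which is why the two measures are reported together.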

Key words: ERA5, Machine learning, Lasso Regression, Random Forest, Neural Network
