Water
Splitting and Length of Years for Improving Tree-Based Models to Predict Reference Crop Evapotranspiration in the Humid Regions of China
Fucang Zhang [1], Fulai Yan [1], Xiaoqiang Liu [1], Wenqiang Bai [1], Lifeng Wu [2], Guomin Huang [2]
[1] Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas of the Ministry of Education, Northwest A&F University, Yangling, Xianyang 712100, China
[2] School of Hydraulic and Ecological Engineering, Nanchang Institute of Technology, Nanchang 330099, China
Keywords: data splitting; length of years; random forest; extreme gradient boosting; reference crop evapotranspiration
DOI: 10.3390/w13233478
Source: DOAJ
【 Abstract 】
Accurate estimation of reference crop evapotranspiration (ET0) underpins efficient water resource management and the optimal design of irrigation scheduling, but the traditional FAO-56 Penman–Monteith method requires a complete set of meteorological input variables, a drawback that needs to be overcome. This study evaluates how five data splitting strategies and three time lengths of input datasets affect the prediction of ET0. Random forest (RF) and extreme gradient boosting (XGB) models, coupled with K-fold cross-validation, were applied to accomplish this objective. The results showed that the accuracy of RF (R² = 0.862, RMSE = 0.528, MAE = 0.383, NSE = 0.854) was overall better than that of XGB (R² = 0.867, RMSE = 0.517, MAE = 0.377, NSE = 0.860) across the different input parameter combinations. Both the RF and XGB models with Tmax, Tmin, and Rs as inputs estimated daily ET0 more accurately than the corresponding models with other input combinations. Among the data splitting strategies, S5 (a 9:1 proportion) performed best. Compared with the 30-year dataset, the 50-year dataset reduced estimation accuracy under limited input data, while the 10-year dataset improved accuracy in southern China. Nevertheless, the 10-year data performed worst among the three time spans in the independent test. Therefore, to improve the daily ET0 prediction performance of tree-based models in the humid regions of China, the random forest model with 30-year datasets and the 9:1 data splitting strategy is recommended.
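The evaluation described in the abstract rests on a 9:1 data split and four accuracy metrics (R², RMSE, MAE, NSE). The following is a minimal sketch of those two pieces, not the authors' implementation. It assumes the 9:1 split is taken in record order (the abstract does not say whether S5 is ordered or random), and that R² is the squared Pearson correlation, which would be consistent with the abstract reporting different values for R² and NSE. The function names `split_9_to_1` and `evaluate` and the toy values are hypothetical.

```python
import math


def split_9_to_1(records):
    """Split records 9:1 into training and test parts.

    Assumption: the split follows record order (first 90% for training,
    last 10% for testing); the abstract does not specify this.
    """
    cut = len(records) * 9 // 10
    return records[:cut], records[cut:]


def evaluate(observed, predicted):
    """Return R2, RMSE, MAE and NSE for paired observed/predicted values.

    Assumption: R2 is the squared Pearson correlation coefficient;
    NSE is the Nash-Sutcliffe efficiency coefficient.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    pairs = list(zip(observed, predicted))

    sq_err = sum((o - p) ** 2 for o, p in pairs)
    abs_err = sum(abs(o - p) for o, p in pairs)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in pairs)

    return {
        "R2": cov ** 2 / (var_o * var_p),
        "RMSE": math.sqrt(sq_err / n),
        "MAE": abs_err / n,
        "NSE": 1.0 - sq_err / var_o,
    }


if __name__ == "__main__":
    # Made-up daily ET0 values, for illustration only.
    reference = [2.1, 3.4, 4.0, 3.2, 2.8, 4.5, 5.1, 3.9, 2.5, 3.0]
    estimate = [2.3, 3.1, 4.2, 3.0, 2.9, 4.1, 5.0, 4.2, 2.4, 3.3]
    train, test = split_9_to_1(reference)
    print(len(train), len(test))
    print(evaluate(reference, estimate))
```

With these definitions a model that is perfectly correlated with the reference but consistently biased scores R² = 1 while NSE falls below 1, which is why the two metrics are reported side by side.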
【 License 】
Unknown