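Abstract: |
Dissolved oxygen (DO), a key indicator of a water body's self-purification capacity and of water environment quality, is an important parameter for assessing the health of Poyang Lake. Two efficient machine learning algorithms, random forest (RF) and improved support vector regression (PSO-SVR), were applied to DO prediction in Poyang Lake. Temporally, water quality data from 1988 to 2023 were used; spatially, eight key monitoring stations on Poyang Lake and the five rivers flowing into it were selected: Tangyin, East Branch of Xinjiang River, Poyang, Main Branch of Ganjiang River, Fuhekou, Xiuhekou, Kangshan and Hukou. A Mann-Kendall trend test on DO at the eight stations showed an overall increase at four of them (Fuhekou, Xiuhekou, Kangshan and Hukou), with DO at Kangshan and Hukou rising significantly in the later period. The response relationship between DO and the other water quality factors was examined using the random forest importance index (IMI). At all eight stations T had a high importance index for DO, followed by the permanganate index; the mean IMI ranking of the factors was T > permanganate index > TN > NH₄⁺-N > TP > pH, with values of 2.54, 0.81, 0.65, 0.63, 0.43 and 0.37, respectively. RF and PSO-SVR predictions were then compared on monthly mean water quality data from 1988 to 2023. Across the eight stations, the overall mean error was 0.32 for the RF model and 0.54 for the PSO-SVR model. In the confusion-matrix-based performance evaluation, the mean accuracy η was 0.67 for RF and 0.52 for PSO-SVR. Overall performance on the training set was RF (R² = 0.953; RMSE = 0.397 mg·L⁻¹) > PSO-SVR (R² = 0.822; RMSE = 0.764 mg·L⁻¹). Overall performance on the prediction set was RF (R² = 0.836; RMSE = 0.660 mg·L⁻¹) > PSO-SVR (R² = 0.815; RMSE = 0.686 mg·L⁻¹). Both models showed excellent predictive performance, with RF performing better. Efficient machine learning algorithms were introduced to predict DO in Poyang Lake accurately, with a view to revealing the lake's water quality patterns and the intrinsic links among water quality factors, and to providing scientific decision support for environmental monitoring and management. |
Key words: Poyang Lake, dissolved oxygen, prediction, random forest, support vector regression, confusion matrix |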
DOI: |
Classification number: |
Funding: National Natural Science Foundation of China |
|
Characterization and prediction of dissolved oxygen fluctuation in Poyang Lake based on machine learning
LI Xiao-ying1,2, WANG Hua1,2*, WU Xiao-mao3, WU Yi1,2, XU Hao-sen1,2 |
lixiaoying, wanghua
|
Hohai University
|
Abstract: |
Dissolved oxygen (DO), a key indicator of a water body's self-purification capacity and of water environment quality, is an important parameter for assessing the health of Poyang Lake. Two efficient machine learning algorithms, random forest (RF) and improved support vector regression (PSO-SVR), were applied to the monitoring and prediction of DO in Poyang Lake. Temporally, water quality data from 1988 to 2023 were used; spatially, eight key monitoring stations on Poyang Lake and the five rivers entering the lake were selected: Tangyin, East Branch of Xinjiang River, Poyang, Main Branch of Ganjiang River, Fuhekou, Xiuhekou, Kangshan and Hukou. First, the Mann-Kendall trend test was applied to DO at the eight stations. DO increased overall at four of them (Fuhekou, Xiuhekou, Kangshan and Hukou), and Kangshan and Hukou showed a significant increasing trend in the later period. Second, the response relationship between DO and the other water quality factors was examined using the random forest importance index (IMI). The importance index of T for DO was high at all eight stations, followed by the CODMn index; the mean IMI of the factors ranked T > CODMn index > TN > NH₄⁺-N > TP > pH, with values of 2.54, 0.81, 0.65, 0.63, 0.43 and 0.37, respectively. Third, RF and PSO-SVR predictions were compared on monthly mean water quality data from 1988 to 2023. Across the eight stations, the overall mean error was 0.32 for the RF model and 0.54 for the PSO-SVR model. In the confusion-matrix-based performance evaluation, the mean accuracy η was 0.67 for RF and 0.52 for PSO-SVR. Overall performance on the training set was RF (R² = 0.953; RMSE = 0.397 mg·L⁻¹) > PSO-SVR (R² = 0.822; RMSE = 0.764 mg·L⁻¹).
Overall performance on the prediction set was RF (R² = 0.836; RMSE = 0.660 mg·L⁻¹) > PSO-SVR (R² = 0.815; RMSE = 0.686 mg·L⁻¹). The R² values of the RF model are tightly clustered on both the training and prediction sets, indicating better stability and generalization ability; its RMSE values are also tightly clustered, though slightly higher on the prediction set. The R² and RMSE values of the PSO-SVR model are more dispersed on both sets, indicating that the model's performance varies considerably across monitoring sections and may need to be adjusted for different data characteristics. On the whole, the RF model shows the best prediction ability on all monitoring sections, with the highest R² and the lowest RMSE, and performs and generalizes well on both the training and prediction sets. The PSO-SVR model also performs well on most monitoring sections, but its prediction performance is slightly inferior to that of the RF model, and its structure or parameters may need to be optimized to improve prediction accuracy and stability. Both models showed excellent predictive performance, with RF performing better. Efficient machine learning algorithms were introduced to predict DO in Poyang Lake accurately, with a view to revealing the lake's water quality patterns and the intrinsic links among water quality factors, and to providing scientific decision support for environmental monitoring and management. |
Key words: Poyang Lake, dissolved oxygen, prediction, random forest, support vector regression, confusion matrix |
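
The abstract relies on two families of statistics: the Mann-Kendall trend test for the DO series, and R² and RMSE for comparing model predictions with observations. The following is a minimal sketch of their textbook definitions in plain Python, not the authors' implementation. The function names and the sample values are hypothetical, and the Mann-Kendall variance here assumes no tied values, which real monitoring data may violate.

```python
import math


def mann_kendall(values):
    """Return (S, Z) for the Mann-Kendall trend test, assuming no tied values."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    # Variance of S under the null hypothesis of no trend (no tie correction).
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination; assumes observed values are not all equal."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up DO values in mg/L for illustration only, not data from the study.
    do_observed = [7.8, 8.1, 8.0, 8.4, 8.6, 8.9]
    do_predicted = [7.9, 8.0, 8.2, 8.3, 8.7, 8.8]
    s_stat, z_stat = mann_kendall(do_observed)
    print("Mann-Kendall S:", s_stat, "Z:", round(z_stat, 3))
    print("RMSE:", round(rmse(do_observed, do_predicted), 3))
    print("R2:", round(r_squared(do_observed, do_predicted), 3))
```

A positive Z with |Z| > 1.96 would indicate a significant increasing trend at the 5% level in a two-sided test; the significance level used in the study is not stated in the abstract.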