Abstract: Dissolved oxygen (DO) is a key indicator of the self-purification capacity of water bodies and of water environment quality, and an important parameter for assessing the health of Lake Poyang. In this study, two machine learning algorithms, random forest (RF) and support vector regression improved by particle swarm optimization (PSO-SVR), were applied to the monitoring and prediction of DO in Lake Poyang. Water quality data from 1988 to 2023 were used, covering eight key monitoring stations on Lake Poyang and five rivers entering the lake: Tangyin, east branch of Xinjiang River, Poyang, main branch of Ganjiang River, Fuhekou, Xiuhekou, Kangshan and Hukou. Firstly, the Mann-Kendall trend test was performed on the DO series of the eight monitoring stations. The stations with an overall increasing DO trend were Fuhekou, Xiuhekou, Kangshan and Hukou, of which Kangshan and Hukou showed a significant increasing trend in the later stage. Secondly, the response of DO to other water quality factors was explored using the random forest importance index (IMI). Water temperature (T) had the highest importance index for DO at all eight monitoring stations, followed by month; the average IMI of the water quality factors ranked T > CODMn > TN > NH3-N > TP > pH, with values of 2.54, 0.81, 0.65, 0.63, 0.43 and 0.37, respectively. Thirdly, the RF and PSO-SVR predictions were compared with the monthly average water quality data from 1988 to 2023. Across the eight monitoring stations, the overall mean errors were 0.32 for the RF model and 0.54 for the PSO-SVR model. The mean accuracies η from the confusion-matrix-based performance evaluation were 0.67 for RF and 0.52 for PSO-SVR. On the training set, RF (R = 0.953; RMSE = 0.397 mg/L) outperformed PSO-SVR (R = 0.822; RMSE = 0.764 mg/L).
On the prediction set, RF (R = 0.836; RMSE = 0.660 mg/L) also outperformed PSO-SVR (R = 0.815; RMSE = 0.686 mg/L). Both models showed excellent predictive performance, with RF having the better predictive ability. The R values of the RF model were more concentrated across stations in both the training and prediction sets, indicating better stability and generalization ability. Its RMSE values were also more concentrated in both sets, although slightly higher in the prediction set. The R and RMSE values of the PSO-SVR model were more dispersed in both sets, indicating that its performance varied considerably between monitoring sections and that it may need to be adjusted for different data characteristics. Overall, the RF model showed the best prediction ability on all monitoring sections, with the highest R value and the lowest RMSE value, and performed well on both the training and prediction sets. The PSO-SVR model also performed well on most monitoring sections, but its prediction performance was slightly inferior to that of the RF model, and its structure or parameters may need to be optimized to improve prediction accuracy and stability. By introducing efficient machine learning algorithms to accurately predict DO in Lake Poyang, this study aims to reveal the water quality pattern of Lake Poyang and the intrinsic connections among the water quality factors, and to provide scientific decision support for environmental monitoring and management.
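
For readers unfamiliar with the Mann-Kendall trend test applied to the station DO series, the following is a minimal Python sketch of the standard two-sided test. It is an illustration under stated assumptions, not the implementation used in this study: it omits the tie correction and any seasonal adjustment, the function name `mann_kendall` is hypothetical, and the DO values are made-up numbers rather than Lake Poyang measurements.

```python
import math


def mann_kendall(series):
    """Return (S, Z, p) for the two-sided Mann-Kendall trend test.

    Simplifying assumption: no correction for tied values.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S sums the sign of every later-minus-earlier difference (all i < j).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the null hypothesis of no trend (no ties).
    var_s = n * (n - 1) * (2 * n + 5) / 18.0

    # Continuity-corrected standard normal test statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical annual mean DO values (mg/L), for illustration only.
do_values = [7.8, 7.9, 8.1, 8.0, 8.3, 8.4, 8.6, 8.5, 8.8, 9.0]
s, z, p = mann_kendall(do_values)
trend = "increasing" if z > 0 else "decreasing" if z < 0 else "none"
print(f"S={s}, Z={z:.2f}, p={p:.4f}, trend={trend}")
```

A positive Z with a small p-value (for example, below 0.05) suggests a significant increasing trend, and a negative Z a decreasing one. Monthly series with many repeated values would normally call for the tie-corrected variance or a seasonal variant of the test.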