Comparative Evaluation of Machine Learning Models for Multi-Lead-Time Water-Level Forecasting in the Gandak River Basin
Abstract
Accurate short- and medium-range water-level forecasting is vital for effective flood management and reservoir operation in monsoon-driven basins. This study evaluates three machine-learning models, Random Forest (RF), Support Vector Regression (SVR), and K-Nearest Neighbors (KNN), for one-, five-, and ten-day-ahead water-level prediction at the Triveni gauging station in the Gandak River Basin, India. Daily water-level and discharge data from the Central Water Commission (CWC) and rainfall data from three India Meteorological Department (IMD) stations (Balmiki, Bagaha, and Ramnagar) for the period 2003–2023 were used as inputs. Model performance was assessed using the coefficient of determination (R²), root-mean-square error (RMSE), and mean absolute error (MAE). All three models performed well for one-day-ahead forecasting, with RF achieving the highest accuracy (R² = 0.941, RMSE = 0.296 m) and SVR yielding the lowest mean absolute error (MAE = 0.145 m). Performance declined gradually as forecast lead time increased; however, both RF and SVR maintained R² values above 0.90 even at ten days, indicating strong temporal persistence. KNN performed satisfactorily at short horizons but showed greater dispersion at longer lead times. Overall, RF offered the best balance between accuracy and stability, while SVR gave the lowest average error. These findings highlight the potential of ensemble and kernel-based models for improving real-time flood forecasting and water-level management in data-scarce river basins.
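For readers unfamiliar with the evaluation setup, the following is a minimal sketch of how lead-time targets and the three metrics named in the abstract (R², RMSE, MAE) are commonly computed. It is not the authors' code: the function names (make_lead_pairs, r2, rmse, mae) are hypothetical, the sample values are illustrative rather than study data, and R² is assumed here to be the usual 1 - SS_res/SS_tot form, which the abstract does not specify.

```python
import math


def make_lead_pairs(levels, lead):
    """Pair each day's level with the level observed `lead` days later."""
    if lead < 1 or lead >= len(levels):
        raise ValueError("lead must be between 1 and len(levels) - 1")
    return list(zip(levels[:-lead], levels[lead:]))


def r2(obs, pred):
    """Coefficient of determination, assumed as 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs, pred):
    """Root-mean-square error, in the units of the observations (metres here)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error, in the units of the observations."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


# Illustrative values only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
scores = (r2(observed, predicted), rmse(observed, predicted), mae(observed, predicted))
```

Because RMSE squares the residuals before averaging, it penalizes a few large misses more heavily than MAE does. That is one reason a model can rank first on RMSE while another ranks first on MAE, as reported here for RF and SVR at the one-day lead.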