Original scientific paper

https://doi.org/10.13044/j.sdewes.d13.0634

Machine Learning-Based Water Quality Prediction

Ali Al-Ataby ; AURAK, Ras Al Khaimah, United Arab Emirates
Beza Getu ; AURAK, Ras Al Khaimah, United Arab Emirates
Hussain Attia ; American University of Ras Al Khaimah, Ras Al Khaimah, United Arab Emirates


Full text: English, PDF, 3,443 KB

pp. 1-20


Abstract

Water is an indispensable resource for all forms of life and plays a particularly critical role in supporting human health, agriculture, and industrial development. With water scarcity predicted worldwide, a tool that can analyse and predict water potability accurately and in real time is needed. This study used machine learning models to predict water potability from quality features such as potential of hydrogen (pH), hardness, solids content, chloramines, sulfate, and conductivity. Potability is determined from the concentrations of these features in the water. Four machine learning algorithms, namely Random Forest (RF), Logistic Regression (LR), Extreme Gradient Boosting (XGBoost), and Deep Learning Neural Networks, were trained on a water quality dataset and used to analyse water potability. Initial experiments showed moderate performance, with Random Forest (F1-score = 0.47, area under the receiver operating characteristic curve (AUC) = 0.68) and XGBoost (F1-score = 0.49, AUC = 0.66) outperforming the other two models. After class imbalance was addressed and additional features were introduced through feature engineering, the performance of all four models improved significantly, with Random Forest achieving an F1-score of 0.85 and an AUC of 0.90, and XGBoost achieving an F1-score of 0.86 and an AUC of 0.91. The results indicate that Random Forest and XGBoost consistently outperformed the Logistic Regression model and the Deep Learning model in terms of predictive accuracy and robustness. These results demonstrate the critical importance of feature engineering and hyperparameter optimization in enhancing model effectiveness.
A real-time water potability prediction application was developed to classify water as either “safe to drink” or “unsafe to drink”. Its functionality was successfully validated, and its output is displayed on a user-friendly graphical user interface (GUI).
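
The abstract reports results as F1-scores and AUC values and attributes much of the improvement to handling class imbalance. As a rough illustration only, the sketch below shows how these two metrics can be computed for a binary potability label, together with random oversampling as one possible way to balance the classes. The abstract does not say which balancing technique the authors used, so the oversampling step is an assumption, and the function names (`f1_score`, `roc_auc`, `oversample_minority`) are hypothetical helpers written for this example rather than the authors' code.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives, so F1 is reported as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def oversample_minority(rows, labels, seed=0):
    """Assumed balancing step: duplicate random minority-class rows until classes match."""
    rng = random.Random(seed)
    groups = {0: [], 1: []}
    for row, label in zip(rows, labels):
        groups[label].append(row)
    minority = 0 if len(groups[0]) < len(groups[1]) else 1
    majority = 1 - minority
    if not groups[minority]:
        raise ValueError("minority class has no samples to duplicate")
    deficit = len(groups[majority]) - len(groups[minority])
    extra = [rng.choice(groups[minority]) for _ in range(deficit)]
    return list(rows) + extra, list(labels) + [minority] * deficit
```

Here label 1 stands for potable water and label 0 for non-potable water. Oversampling of this kind would normally be applied to the training split only, so that the F1-score and AUC are still measured on data with the original class distribution.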

Keywords

Water; Potability; Machine Learning; Random Forest; XGBoost; Deep Learning; Feature Engineering; AUC.

Hrčak ID:

346098

URI

https://hrcak.srce.hr/346098

Publication date:

25 May 2026
