Original scientific paper
https://doi.org/10.7307/ptt.v37i6.1016
Prediction and Investigation of the Injury Severity of Drivers Involved in Speeding-Related Crashes Using Machine Learning Models
Neero Gumsar SORUM
; Department of Civil Engineering, North Eastern Regional Institute of Science and Technology, Nirjuli, India
*
Martina Gumsar SORUM
; Department of Civil Engineering, North Eastern Regional Institute of Science and Technology, Nirjuli, India
* Corresponding author.
Abstract
Speeding is the major cause of road traffic crashes and deaths in India. Other driver faults include driving under the influence, using mobile phones while driving and driving on the wrong side of the road. This study therefore attempts to predict and investigate driver injury severity (DIS) in speeding-related crashes. A total of 793 police-reported single-vehicle and two-vehicle crash records from Imphal City, India, collected between 2011 and 2020, were analysed and modelled. For DIS prediction, eleven supervised machine learning (ML) models were implemented using 5-fold and 10-fold cross-validation (FCV) and trained at train ratio (TR) values of 0.5, 0.6, 0.7 and 0.8 in each FCV. The top ML model for DIS prediction was selected based on the best combination of recall, accuracy, F1 score, area under the curve (AUC) and precision. Feature importance analysis (FIA) was conducted to determine the most impactful factors in DIS prediction. In 5-FCV, the gradient boosting tree (GBT), stochastic gradient descent, decision tree and lasso-LARS models were the top-performing ML models at TR = 0.5, 0.6, 0.7 and 0.8, respectively. In 10-FCV, the best-performing ML models were LightGBM (TR = 0.5 and 0.7), GBT (TR = 0.6) and lasso-LARS (TR = 0.8). The FIA results indicated that vehicle type (two-wheeler), nature of crash (head-on collision) and time of crash (12 PM–6 PM and 6 AM–12 PM) were the most impactful variables for DIS prediction in speeding-related crashes in Imphal. These ML models can be employed in hilly areas for the accurate prediction of DIS. The study results can help transportation planners design road safety measures and strategies to lessen DIS in speeding-related crashes.
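The evaluation protocol described in the abstract (several train ratios, each combined with 5-fold and 10-fold cross-validation and scored on accuracy, precision, recall and F1) can be illustrated with a minimal sketch. This is not the authors' implementation, which used Dataiku: the data are synthetic, the two binary features and their effect sizes are invented for illustration, and a simple majority-vote lookup classifier stands in for the eleven ML models. AUC and feature importance are omitted to keep the sketch short, and all function names are hypothetical.

```python
import random
from collections import Counter

TRAIN_RATIOS = (0.5, 0.6, 0.7, 0.8)  # TR values named in the abstract
FOLD_COUNTS = (5, 10)                # 5-FCV and 10-FCV


def make_synthetic_crashes(n=793, seed=0):
    """Hypothetical stand-in data: ((two_wheeler, head_on), severe)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        two_wheeler, head_on = rng.randint(0, 1), rng.randint(0, 1)
        # Assumed toy relationship, NOT an estimate from the study.
        p_severe = 0.15 + 0.35 * two_wheeler + 0.30 * head_on
        rows.append(((two_wheeler, head_on), int(rng.random() < p_severe)))
    return rows


def fit_lookup(rows):
    """Toy classifier: predict the majority label per feature combination."""
    votes = {}
    for x, y in rows:
        votes.setdefault(x, Counter())[y] += 1
    default = Counter(y for _, y in rows).most_common(1)[0][0]
    table = {x: c.most_common(1)[0][0] for x, c in votes.items()}
    return lambda x: table.get(x, default)


def scores(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = severe)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    for i in range(k):
        valid = list(range(bounds[i], bounds[i + 1]))
        train = list(range(bounds[i])) + list(range(bounds[i + 1], n))
        yield train, valid


def evaluate(rows, train_ratio, k, seed=0):
    """Split by train ratio, run k-fold CV on the training part, then test."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    train, test = rows[:cut], rows[cut:]
    cv_f1 = []
    for tr_idx, va_idx in k_fold_indices(len(train), k):
        model = fit_lookup([train[i] for i in tr_idx])
        fold = scores([train[i][1] for i in va_idx],
                      [model(train[i][0]) for i in va_idx])
        cv_f1.append(fold["f1"])
    model = fit_lookup(train)  # refit on the full training portion
    test_scores = scores([y for _, y in test], [model(x) for x, _ in test])
    return sum(cv_f1) / k, test_scores


data = make_synthetic_crashes()
results = {(tr, k): evaluate(data, tr, k)
           for tr in TRAIN_RATIOS for k in FOLD_COUNTS}
for (tr, k), (cv_f1, test) in sorted(results.items()):
    print(f"TR={tr} {k}-FCV  mean CV F1={cv_f1:.3f}  test F1={test['f1']:.3f}")
```

Comparing the rows of this grid across several candidate models, rather than a single toy classifier, is how a "best model per TR and FCV" ranking of the kind reported in the abstract would be assembled.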
Keywords
speeding; driver injury severity; machine learning; Dataiku; gradient boosting tree; lasso-LARS
Hrčak ID:
337235
Publication date:
27.10.2025.