
Original scientific paper

https://doi.org/10.31803/tg-20251118071140

Machine Learning-Based xG forecasting in the Top 5 European Soccer Leagues: A Comparative Analysis

Davronbek Malikov ; Department of AI Convergence Engineering, Gyeongsang National University, 501, Jinju-daero, Jinju-si, 52828 Gyeongsangnam-do, Republic of Korea
Jaeho Kim ; Department of AI Convergence Engineering, Gyeongsang National University, 501, Jinju-daero, Jinju-si, 52828 Gyeongsangnam-do, Republic of Korea *

* Corresponding author.


Full text: English pdf 367 Kb

pp. 286-296

downloads: 236



Abstract

In modern soccer, also referred to as football, analytics has become integral. It serves a role akin to that of an assistant coach, contributing significantly to the analysis of team and individual player performance during both games and training sessions. Despite the abundance of advanced technologies, there is still a need for concrete measurements in soccer analytics. A prime example is Expected Goals (xG), a widely embraced metric that goes beyond mere scorelines to offer in-depth insights into player and team dynamics, and that has proven its value in recent years. Identifying the key features in a dataset for predicting xG is a critical aspect of machine learning research applied to soccer analytics. However, prior investigations have either overlooked certain crucial features or failed to recognize their significance. This study proposes a novel approach that incorporates features related to coaches, coaching tenure (in years), and tactics, which have the potential to improve predictive accuracy and provide actionable insights for teams and analysts. Using a dataset of 2,917 observations covering the top five European leagues, we applied regression-based machine learning models, employing a preprocessing pipeline and k-fold cross-validation to ensure robust evaluation. The findings reveal that xG values for the English Premier League (EPL) surpass those of the other four leagues studied, with an average xG of approximately 1.93. This indicates that, on average, teams in the EPL tend to have a higher expected goal count per match than teams in the other top European leagues. This difference may be attributed to various factors, such as the higher average quality of teams and players, tactical nuances, and the overall competitiveness of the EPL compared with the other leagues under scrutiny. Incorporating the new coach, coaching tenure, and tactics features improved model performance across mean squared error (MSE), explained variance score (EVS), and R², thereby demonstrating the efficacy of our approach. The Lasso and Ridge Regression models achieved improved predictive accuracy, with EVS and R² reaching up to 96%, while the Decision Tree model showed a nearly 6% reduction in MSE.
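
The evaluation setup mentioned in the abstract (k-fold cross-validation scored with MSE, EVS, and R²) can be illustrated with a minimal sketch. It is not the authors' pipeline or dataset: the data are synthetic, the `shots` feature and all function names are hypothetical, and a one-feature ridge regression with a closed-form solution stands in for the models named in the abstract so that the example needs only the Python standard library.

```python
import random
from statistics import mean, pvariance


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def evs(y_true, y_pred):
    """Explained variance score: 1 - Var(residuals) / Var(y_true)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / pvariance(y_true)


def fit_ridge_1d(x, y, alpha):
    """One-feature ridge regression with an unpenalised intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope towards zero
    return slope, my - slope * mx


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(x, y, k=5, alpha=1.0):
    """Average MSE, EVS and R² over k held-out folds."""
    scores = []
    for test_idx in k_fold_indices(len(x), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        slope, intercept = fit_ridge_1d(
            [x[i] for i in train_idx], [y[i] for i in train_idx], alpha
        )
        y_true = [y[i] for i in test_idx]
        y_pred = [slope * x[i] + intercept for i in test_idx]
        scores.append(
            {"mse": mse(y_true, y_pred),
             "evs": evs(y_true, y_pred),
             "r2": r2(y_true, y_pred)}
        )
    return {name: mean(s[name] for s in scores) for name in ("mse", "evs", "r2")}


# Synthetic stand-in data (an assumption, not the paper's 2,917 observations):
# xG grows roughly linearly with a hypothetical "shots" feature plus noise.
rng = random.Random(42)
shots = [rng.uniform(5, 25) for _ in range(200)]
xg = [0.1 * s + rng.gauss(0, 0.15) for s in shots]

results = cross_validate(shots, xg, k=5, alpha=1.0)
print(results)
```

Each metric is computed only on the held-out fold, so the averaged values estimate performance on unseen matches rather than fit to the training data. EVS and R² coincide when the residuals have zero mean; EVS is otherwise the larger of the two, because it ignores any constant bias in the predictions.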

Keywords

Expected Goals (xG); football; regression machine learning; Soccer; Top 5 league xG

Hrčak ID:

346381

URI

https://hrcak.srce.hr/346381

Publication date:

15.6.2026.

Visits: 338 *