Anuj Razdan
- Assistant Professor, Department of Computer Science & Engineering, Echelon Institute of Technology, Faridabad, Haryana, India
Abstract
In recent years, the increasing availability of soccer data has greatly enhanced the accuracy and depth of player performance evaluation. Soccer, being one of the most popular sports worldwide, attracts millions of fans due to its simple rules, minimal equipment requirements, and high entertainment value. However, analyzing an entire match manually can be time-consuming, leading to a growing demand for automated methods that can summarize and interpret game data efficiently. This study focuses on extracting, processing, and analyzing soccer data from a CSV file to evaluate player performance and compute expected goals (xG). The approach employs a Gradient Boosting Classifier to assess predictive performance, with the F1 score used as the primary evaluation metric. Additionally, the analysis includes player-specific performance metrics and team-based insights, offering a comprehensive understanding of individual contributions and match outcomes. The integration of machine learning and statistical modeling in this work demonstrates how data-driven techniques can provide valuable insights into soccer analytics, supporting coaches, analysts, and fans in making informed decisions.
Keywords: Soccer data, player, semantics analysis, machine learning, data preprocessing
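As a rough illustration of the two quantities named in the abstract, the following Python sketch sums per-shot goal probabilities into expected goals (xG) per player and computes an F1 score against the actual outcomes. It assumes the classifier (a Gradient Boosting Classifier in the article) has already produced a probability for each shot. The column names `player`, `goal_probability`, and `is_goal`, the sample rows, and the 0.5 decision threshold are hypothetical and are not taken from the article's dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical shot-level data. Column names and values are illustrative
# assumptions, not the article's dataset.
SHOTS_CSV = """player,goal_probability,is_goal
A,0.50,1
A,0.25,0
B,0.75,1
B,0.125,0
B,0.625,0
"""


def load_shots(text):
    """Parse CSV text into a list of shots with typed fields."""
    shots = []
    for row in csv.DictReader(io.StringIO(text)):
        shots.append({
            "player": row["player"],
            "p": float(row["goal_probability"]),
            "goal": int(row["is_goal"]),
        })
    return shots


def expected_goals_by_player(shots):
    """xG per player: the sum of that player's per-shot goal probabilities."""
    totals = defaultdict(float)
    for shot in shots:
        totals[shot["player"]] += shot["p"]
    return dict(totals)


def f1_score(shots, threshold=0.5):
    """F1 = 2*TP / (2*TP + FP + FN), predicting a goal when p >= threshold."""
    tp = fp = fn = 0
    for shot in shots:
        predicted = shot["p"] >= threshold
        actual = shot["goal"] == 1
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual and not predicted:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


shots = load_shots(SHOTS_CSV)
xg = expected_goals_by_player(shots)
f1 = f1_score(shots)
print(xg, f1)
```

In practice the probabilities would come from the fitted model rather than from a column in the file, and the threshold would be chosen on held-out data.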
[This article belongs to International Journal of Data Structure Studies]
Anuj Razdan. Semantics Analysis of Expected Goals in Soccer Data Using Machine Learning. International Journal of Data Structure Studies. 2025; 03(02):31-47. Available from: https://journals.stmjournals.com/ijdss/article=2025/view=232729

International Journal of Data Structure Studies
| Field | Value |
| --- | --- |
| Volume | 03 |
| Issue | 02 |
| Received (DD/MM/YYYY) | 08/07/2025 |
| Accepted (DD/MM/YYYY) | 08/10/2025 |
| Published (DD/MM/YYYY) | 24/10/2025 |
| Publication Time | 108 Days |