Timestamp Extraction and Log Classification Using Supervised Machine Learning: A Comparative Study

Year: 2025 | Volume: 12 | Issue: 03 | Pages: 26-38
By Sayantika Laskar, Harsh Shukla

  1. Student, Department of Computer Science and Engineering, Vellore Institute of Technology, Bhopal University, Bhopal, Madhya Pradesh, India
  2. Student, Department of Computer Science and Engineering, Vellore Institute of Technology, Bhopal University, Bhopal, Madhya Pradesh, India

Abstract

In modern software systems, logs are vital for monitoring application behavior, diagnosing issues, and analyzing performance. Timestamps are especially important for sequencing events, identifying anomalies, and understanding system failures. However, detecting timestamps in logs is challenging due to inconsistent formatting across systems and the presence of timestamp-like strings in non-timestamp fields. Traditional rule-based methods often fail in such cases. This study proposes a supervised machine learning approach to accurately classify timestamp tokens in structured log data. The methodology involves preprocessing log entries through tokenization and normalization, extracting relevant features, and generating a balanced dataset with timestamp and non-timestamp samples. Several classifiers, including Random Forest, K-Nearest Neighbors, Logistic Regression, XGBoost, and Support Vector Machines (SVM), were trained and evaluated. Performance was measured using accuracy, precision, recall, F1-score, and precision-recall curves. SVM achieved the best results across metrics and formats. The findings confirm that machine learning provides a scalable and reliable solution for automated timestamp detection in log analysis.
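The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the features used, so the feature names below (`digit_ratio`, `colons`, and so on), the sample log line, and all function names are hypothetical. The sketch tokenizes a log line, normalizes each token to a character-shape pattern, derives simple numeric features, and computes precision, recall, and F1 for binary timestamp labels.

```python
import re
from typing import Dict, List, Sequence, Tuple

STRIP_CHARS = "[]()\"'"  # assumption: wrappers commonly found around log fields


def tokenize(line: str) -> List[str]:
    """Split a log line on whitespace and strip surrounding brackets and quotes."""
    tokens = (tok.strip(STRIP_CHARS) for tok in line.split())
    return [tok for tok in tokens if tok]


def normalize(token: str) -> str:
    """Reduce a token to its shape: letters become 'a', digits become 'd'."""
    letters_masked = re.sub(r"[A-Za-z]", "a", token)
    return re.sub(r"\d", "d", letters_masked)


def extract_features(token: str) -> Dict[str, float]:
    """Hypothetical hand-crafted features for one non-empty token."""
    return {
        "length": float(len(token)),
        "digit_ratio": sum(c.isdigit() for c in token) / len(token),
        "colons": float(token.count(":")),
        "dashes": float(token.count("-")),
        "slashes": float(token.count("/")),
    }


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Binary metrics where label 1 means 'timestamp' and 0 means 'non-timestamp'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented sample line; "00:03" is a duration that merely looks like a time.
    sample = "2025-09-15T10:32:01Z [INFO] worker-7 finished job 1042 in 00:03"
    for token in tokenize(sample):
        print(token, normalize(token), extract_features(token))
```

In the sample line, the trailing `00:03` is a duration rather than a timestamp, which is the kind of timestamp-like string that a fixed pattern would match and that a trained classifier is meant to separate. The feature dictionaries could be passed to any of the classifiers listed in the abstract.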

Keywords: Timestamp detection, system logs, supervised learning, log analysis, machine learning, SVM, feature engineering, log parsing, anomaly detection, observability

[This article belongs to Journal of Software Engineering Tools & Technology Trends]

How to cite this article:
Sayantika Laskar, Harsh Shukla. Timestamp Extraction and Log Classification Using Supervised Machine Learning: A Comparative Study. Journal of Software Engineering Tools & Technology Trends. 2025; 12(03):26-38.
How to cite this URL:
Sayantika Laskar, Harsh Shukla. Timestamp Extraction and Log Classification Using Supervised Machine Learning: A Comparative Study. Journal of Software Engineering Tools & Technology Trends. 2025; 12(03):26-38. Available from: https://journals.stmjournals.com/josettt/article=2025/view=227104



Regular Issue | Subscription | Original Research
Volume: 12
Issue: 03
Received: 20/06/2025
Accepted: 23/07/2025
Published: 15/09/2025
Publication Time: 87 Days

