Timestamp Extraction and Log Classification Using Supervised Machine Learning: A Comparative Study

[{“box”:0,”content”:”
Open Access

Notice

This is an unedited manuscript accepted for publication and provided as an Article in Press for early access at the author’s request. The article will undergo copyediting, typesetting, and galley proof review before final publication. Please be aware that errors may be identified during production that could affect the content. All legal disclaimers of the journal apply.

Year: 2025 (15/09/2025 at 12:21 PM) | Volume: 12 | Issue: 03 | Pages: 26-38


By Sayantika Laskar, Harsh Shukla

Affiliation (both authors): Student, Department of Computer Science and Engineering, Vellore Institute of Technology, Bhopal University, Bhopal, Madhya Pradesh, India

Abstract

In modern software systems, logs are vital for monitoring application behavior, diagnosing issues, and analyzing performance. Timestamps are especially important for sequencing events, identifying anomalies, and understanding system failures. However, detecting timestamps in logs is challenging due to inconsistent formatting across systems and the presence of timestamp-like strings in non-timestamp fields. Traditional rule-based methods often fail in such cases. This study proposes a supervised machine learning approach to accurately classify timestamp tokens in structured log data. The methodology involves preprocessing log entries through tokenization and normalization, extracting relevant features, and generating a balanced dataset with timestamp and non-timestamp samples. Several classifiers, including Random Forest, K-Nearest Neighbors, Logistic Regression, XGBoost, and Support Vector Machines (SVM), were trained and evaluated. Performance was measured using accuracy, precision, recall, F1-score, and precision-recall curves. SVM achieved the best results across metrics and formats. The findings confirm that machine learning provides a scalable and reliable solution for automated timestamp detection in log analysis.
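As a rough illustration of the preprocessing and evaluation steps summarized above, the following Python sketch tokenizes one log line, normalizes each token, derives a few shape features, and scores a simple rule-based baseline with precision, recall, and F1. The feature names, the sample log line, its labels, and the `rule_baseline` thresholds are all hypothetical choices made for this example; they are not the feature set, data, or models used in the study. In a supervised setup, feature vectors like these would instead be passed to a trained classifier such as an SVM.

```python
def tokenize(line: str) -> list[str]:
    """Split a log line into whitespace-delimited tokens (a simplifying assumption)."""
    return line.split()


def normalize(token: str) -> str:
    """Strip brackets, quotes, and commas that commonly wrap log fields."""
    return token.strip("[]()\"',")


def token_features(token: str) -> dict[str, float]:
    """Hypothetical shape features; the study's actual feature set is not given here."""
    length = len(token)
    digits = sum(ch.isdigit() for ch in token)
    return {
        "length": float(length),
        "digit_ratio": digits / length if length else 0.0,
        "colon_count": float(token.count(":")),
        "dash_slash_count": float(token.count("-") + token.count("/")),
    }


def rule_baseline(features: dict[str, float]) -> int:
    """Illustrative rule: mostly digits plus at least two date/time separators."""
    separators = features["colon_count"] + features["dash_slash_count"]
    return int(features["digit_ratio"] >= 0.6 and separators >= 2)


def precision_recall_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for the positive (timestamp) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up log line: the bracketed batch id resembles a date but is not a timestamp.
line = "2025-09-15 12:21:07 INFO batch [10-20-30] done"
labels = [1, 1, 0, 0, 0, 0]  # hand-assigned: 1 = timestamp token, 0 = other

tokens = [normalize(tok) for tok in tokenize(line)]
predictions = [rule_baseline(token_features(tok)) for tok in tokens]
precision, recall, f1 = precision_recall_f1(labels, predictions)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the batch identifier `10-20-30` is flagged by the rule, so precision drops to 2/3 while recall stays at 1.0. It is the kind of timestamp-like string in a non-timestamp field that motivates learning the decision from labeled examples rather than fixing it by hand.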


Keywords: Timestamp detection, system logs, supervised learning, log analysis, machine learning, SVM, feature engineering, log parsing, anomaly detection, observability

This article belongs to Journal of Software Engineering Tools & Technology Trends (josettt)

How to cite this article:
Sayantika Laskar, Harsh Shukla. Timestamp Extraction and Log Classification Using Supervised Machine Learning: A Comparative Study. Journal of Software Engineering Tools & Technology Trends. 15/09/2025; 12(03):26-38.

How to cite this URL:
Sayantika Laskar, Harsh Shukla. Timestamp Extraction and Log Classification Using Supervised Machine Learning: A Comparative Study. Journal of Software Engineering Tools & Technology Trends. 15/09/2025; 12(03):26-38. Available from: https://journals.stmjournals.com/josettt/article=15/09/2025/view=0




Regular Issue | Subscription | Original Research


Volume: 12
Issue: 03
Received: 20/06/2025
Accepted: 23/07/2025
Published: 15/09/2025
Publication Time: 87 Days


”}]