This is an unedited manuscript accepted for publication and provided as an Article in Press for early access at the author’s request. The article will undergo copyediting, typesetting, and galley proof review before final publication. Please be aware that errors may be identified during production that could affect the content. All legal disclaimers of the journal apply.

Sridevi Ravada, Harika Kasukurthi, Deekshitha Govindu, Sowmya Vara, Akshaya Enumukkala
- Assistant Professor, Department of Information Technology, Gayatri Vidya Parishad College of Engineering for Women, Visakhapatnam, Andhra Pradesh, India
- Student, Department of Information Technology, Gayatri Vidya Parishad College of Engineering for Women, Visakhapatnam, Andhra Pradesh, India
- Student, Department of Information Technology, Gayatri Vidya Parishad College of Engineering for Women, Visakhapatnam, Andhra Pradesh, India
- Student, Department of Information Technology, Gayatri Vidya Parishad College of Engineering for Women, Visakhapatnam, Andhra Pradesh, India
- Student, Department of Information Technology, Gayatri Vidya Parishad College of Engineering for Women, Visakhapatnam, Andhra Pradesh, India
Abstract
Phishing attacks delivered through SMS, email, and URLs have become a significant cybersecurity threat to individuals and organizations alike. These attacks typically rely on fraudulent websites or deceptive emails and SMS messages that trick users into disclosing sensitive information such as passwords, credit card numbers, or personal details. In response, we develop a system that detects phishing URLs and phishing SMS/email messages using machine learning techniques. For phishing website URL detection, we extract features from HTML content using Beautiful Soup and then apply supervised learning algorithms (Decision Tree, Random Forest, XGBoost, and Multinomial Naïve Bayes) to classify URLs as legitimate or phishing. Random Forest produced promising results, distinguishing legitimate from phishing URLs with high accuracy. For email/SMS detection, we apply natural language preprocessing and TF-IDF vectorization and then train supervised classifiers: Multinomial Naïve Bayes, support vector classifier (SVC), Random Forest, Decision Tree, AdaBoost, and XGBoost. Multinomial Naïve Bayes produced promising results, distinguishing spam from non-spam messages with high accuracy.
Keywords: Phishing attacks, Multinomial Naïve Bayes, TF-IDF Vectorization, Natural language preprocessing, Random Forest, Beautiful Soup.
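The email/SMS approach summarized in the abstract (text preprocessing, TF-IDF weighting, then a Multinomial Naïve Bayes classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, the class name `TfidfNaiveBayes` and the four training messages are hypothetical, the tokenizer is a simple stand-in for fuller natural language preprocessing, and the smoothed IDF form ln((1 + n) / (1 + df)) + 1 is one common variant chosen here as an assumption.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (simplified preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfNaiveBayes:
    """Hypothetical illustration: Multinomial Naive Bayes over TF-IDF weights."""

    def fit(self, texts, labels, alpha=1.0):
        docs = [Counter(tokenize(t)) for t in texts]
        n_docs = len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for doc in docs for term in doc)
        # Smoothed inverse document frequency (assumed variant).
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
        # Accumulate the TF-IDF weight of every term within each class.
        weights = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            for term, tf in doc.items():
                weights[label][term] += tf * self.idf[term]
        class_counts = Counter(labels)
        vocab_size = len(self.idf)
        self.log_prior = {c: math.log(k / n_docs) for c, k in class_counts.items()}
        # Laplace-smoothed log likelihood of each term given the class.
        self.log_likelihood = {}
        for c in class_counts:
            total = sum(weights[c].values()) + alpha * vocab_size
            self.log_likelihood[c] = {
                t: math.log((weights[c][t] + alpha) / total) for t in self.idf
            }
        return self

    def predict(self, text):
        doc = Counter(tokenize(text))
        scores = {}
        for c, prior in self.log_prior.items():
            score = prior
            for term, tf in doc.items():
                if term in self.idf:  # terms unseen in training are ignored
                    score += tf * self.idf[term] * self.log_likelihood[c][term]
            scores[c] = score
        return max(scores, key=scores.get)


# Toy messages invented purely for illustration; not from the paper's dataset.
train_texts = [
    "Win a free prize now, click the link to claim",
    "Urgent: your account is suspended, verify your password now",
    "Are we still meeting for lunch tomorrow",
    "Please review the attached project report before the meeting",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = TfidfNaiveBayes().fit(train_texts, train_labels)
print(model.predict("claim your free prize"))   # spam
print(model.predict("lunch meeting tomorrow"))  # ham
```

With so few training messages the predictions only show the mechanics. A realistic evaluation would need a labeled corpus, a held-out test split, and a comparison against the other classifiers the abstract lists.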
[This article belongs to Journal of Computer Technology & Applications (jocta)]
Sridevi Ravada, Harika Kasukurthi, Deekshitha Govindu, Sowmya Vara, Akshaya Enumukkala. Detection of Phishing Website URLs and EMAIL/SMS Using Random Forest and Multinomial Naive Bayes. Journal of Computer Technology & Applications. 2024; 16(01):-. Available from: https://journals.stmjournals.com/jocta/article=2024/view=0

Journal of Computer Technology & Applications

| Field | Value |
| --- | --- |
| Volume | 16 |
| Issue | 01 |
| Received | 25/10/2024 |
| Accepted | 19/12/2024 |
| Published | 31/12/2024 |
