Natural Language Processing-Based Plagiarism Detector

Year: 2024 | Volume: 02 | Issue: 01 | Page: 27-35
By

G. Harsha Vardhani,

Sri. G. Kalyan Chakravarthi,

Praveen Kumar j,

M. Nishita,

P. Satish,

  1. Student, Gayatri Vidya Parishad College for Degree and PG courses, Vishakhapatnam, Andhra Pradesh, India
  2. Assistant Professor, Gayatri Vidya Parishad College for Degree and PG courses, Vishakhapatnam, Andhra Pradesh, India
  3. Assistant Professor, Gayatri Vidya Parishad College for Degree and PG courses, Vishakhapatnam, Andhra Pradesh, India
  4. Assistant Professor, Gayatri Vidya Parishad College for Degree and PG courses, Vishakhapatnam, Andhra Pradesh, India
  5. Assistant Professor, Gayatri Vidya Parishad College for Degree and PG courses, Vishakhapatnam, Andhra Pradesh, India

Abstract

Plagiarism is the act of presenting another person's work as one's own without giving credit. It is a serious concern in many fields, including education, literature, and business. This project detects text and image plagiarism through a web application comprising four modules. Text plagiarism detection accepts either plain text or a PDF as input; for a PDF, the text is first extracted and then checked for plagiarism. Image plagiarism detection takes an image as input, and an extension handles images that contain text: the text is extracted from the input, compared for similarity against the documents in the dataset, and the output is generated. The text plagiarism percentage is computed using Jaccard similarity, and the image plagiarism percentage is computed using cosine similarity. When the input is an image containing text, image plagiarism detection uses Levenshtein distance. The project also includes additional features: citation status detection, AI content detection, and plagiarism type detection.

Keywords: Plagiarism detection, web application, Jaccard similarity, cosine similarity, Levenshtein distance
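
The three similarity measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the whitespace tokenizer, the use of plain numeric feature vectors for cosine similarity, the normalization of Levenshtein distance to a similarity score, and all function names are assumptions made for illustration only.

```python
from math import sqrt


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def jaccard_similarity(a, b):
    """Size of the token-set intersection divided by the size of the union."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(u, v):
    """Dot product of two equal-length vectors divided by the product of their norms.

    The vectors stand in for image features; how the paper derives them is not stated.
    """
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def levenshtein_distance(s, t):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(t) + 1))
    for i, char_s in enumerate(s, start=1):
        current = [i]
        for j, char_t in enumerate(t, start=1):
            cost = 0 if char_s == char_t else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(s, t):
    """Assumed normalization: 1 minus the distance divided by the longer length."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1 - levenshtein_distance(s, t) / longest
```

Multiplying any of these scores by 100 gives a percentage of the kind the abstract describes; whether the paper applies thresholds or further weighting is not stated here.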

[This article belongs to International Journal of Electrical and Communication Engineering Technology (IJECET)]

How to cite this article: G. Harsha Vardhani, Sri. G. Kalyan Chakravarthi, Praveen Kumar j, M. Nishita, P. Satish. Natural Language Processing-Based Plagiarism Detector. International Journal of Electrical and Communication Engineering Technology. 2024; 02(01):27-35.
How to cite this URL: G. Harsha Vardhani, Sri. G. Kalyan Chakravarthi, Praveen Kumar j, M. Nishita, P. Satish. Natural Language Processing-Based Plagiarism Detector. International Journal of Electrical and Communication Engineering Technology. 2024; 02(01):27-35. Available from: https://journals.stmjournals.com/ijecet/article=2024/view=156560




Regular Issue | Subscription | Review Article
Received May 21, 2024
Accepted June 11, 2024
Published July 17, 2024