Enhancing Profanity Detection in Dravidian Languages: Leveraging Language Models for Optimization and Improvement

Year: 2024 | Volume: 11 | Issue: 02 | Page: –
By

Jyoti Jayesh Chavhan

  1. Assistant Professor, Department of Computer Science and Engineering, SIES Graduate School of Technology, Nerul, Navi Mumbai, Maharashtra, India

Abstract

Detecting and documenting instances of abusive behavior can significantly improve the quality of virtual environments. Given the vast amount of content published daily on social media, it is impractical for human annotators to identify potentially harmful content manually. Recent algorithmic efforts, particularly on platforms such as Twitter, have advanced abuse detection. For Dravidian texts, however, there remains a need to model context better and to build robust language models for classification.
In this study, we trained the XLM-RoBERTa language model with several optimizers and activation functions and report state-of-the-art results. The proposed technique improves overall accuracy by 19% across multiple domains, a gain we attribute to the careful integration of these optimizers and activation functions into existing language models.
The model reached an accuracy of 74.13% for Kannada, 96.25% for Malayalam, and 79.72% for Tamil. These results indicate that combining advanced language models with tailored optimizers and activation functions improves the detection of abusive content in Dravidian texts and sets a new benchmark for the task.
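The idea of holding a classifier fixed while swapping the optimizer can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the optimizers used, so plain gradient descent and Adam are assumed here, and a tiny logistic-regression classifier on synthetic 2-D points stands in for XLM-RoBERTa and the Dravidian corpora. All function names and the data are hypothetical.

```python
import math
import random


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in for text features: 2-D points, label 1 = 'offensive'."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label else -1.0
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return data


def sigmoid(z):
    # Split on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gradient(params, data):
    """Mean gradient of the logistic loss with respect to [w1, w2, b]."""
    g = [0.0, 0.0, 0.0]
    for x, y in data:
        err = sigmoid(params[0] * x[0] + params[1] * x[1] + params[2]) - y
        g[0] += err * x[0]
        g[1] += err * x[1]
        g[2] += err
    return [v / len(data) for v in g]


def sgd_step(params, grad, state, lr=0.1):
    """Plain (full-batch) gradient descent update; state is unused."""
    return [p - lr * g for p, g in zip(params, grad)]


def adam_step(params, grad, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adam update with bias-corrected first and second moment estimates."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grad)):
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g
        m_hat = state["m"][i] / (1 - b1 ** t)
        v_hat = state["v"][i] / (1 - b2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


def train(step_fn, train_data, epochs=100):
    """Train the same model from the same start, varying only the optimizer."""
    params = [0.0, 0.0, 0.0]
    state = {"t": 0, "m": [0.0] * 3, "v": [0.0] * 3}
    for _ in range(epochs):
        params = step_fn(params, gradient(params, train_data), state)
    return params


def accuracy(params, data):
    correct = 0
    for x, y in data:
        pred = 1 if params[0] * x[0] + params[1] * x[1] + params[2] > 0 else 0
        correct += pred == y
    return correct / len(data)


data = make_toy_data()
train_data, test_data = data[:150], data[150:]
optimizers = {"sgd": sgd_step, "adam": adam_step}
results = {name: accuracy(train(fn, train_data), test_data)
           for name, fn in optimizers.items()}
for name, acc in results.items():
    print(f"{name}: test accuracy = {acc:.2%}")
```

On a problem this small both optimizers are expected to land close together; the sketch only shows the shape of the comparison (same model, same data, same initialization, different update rule), not the size of any gain reported in the article.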

Keywords: Detection of Hate Speech, Dravidian Code-Mixed Data, Language Models, Deep Learning, Natural Language Processing.

[This article belongs to Recent Trends in Programming Languages (rtpl)]

How to cite this article: Jyoti Jayesh Chavhan. Enhancing Profanity Detection in Dravidian Languages: Leveraging Language Models for Optimization and Improvement. Recent Trends in Programming languages. 2024; 11(02):-.
How to cite this URL: Jyoti Jayesh Chavhan. Enhancing Profanity Detection in Dravidian Languages: Leveraging Language Models for Optimization and Improvement. Recent Trends in Programming languages. 2024; 11(02):-. Available from: https://journals.stmjournals.com/rtpl/article=2024/view=161657



Regular Issue | Subscription | Review Article
Received June 27, 2024
Accepted July 29, 2024
Published August 7, 2024