Jyoti Jayesh Chavhan,
- Assistant Professor, Department of Computer Science and Engineering, South Indian Educational Society Graduate School of Technology, Nerul, Navi Mumbai, Maharashtra, India
Abstract
Detecting and documenting abusive behaviour can significantly improve the quality of virtual environments. Given the volume of content published daily on social media, it is impractical for human annotators to identify potentially harmful content manually. Recent algorithmic work, especially on platforms such as Twitter, has advanced abuse detection. For Dravidian texts, however, there remains a need to capture context better and to build robust language models for classification. In this study, we trained a classifier based on the XLM-RoBERTa language model using various optimizers and activation functions, achieving state-of-the-art results. We attribute these results to the careful integration of the optimizers and activation functions into the pre-trained language model. The proposed technique improves overall accuracy by 19% across multiple domains. The model reached accuracies of 74.13% for Kannada, 96.25% for Malayalam, and 79.72% for Tamil. These results show that combining advanced language models with tailored optimizers and activation functions yields substantial gains in detecting abusive content in these languages and sets a new benchmark for abuse detection in Dravidian texts.
Keywords: Detection of hate speech, Dravidian code-mixed data, language models, deep learning, natural language processing
[This article belongs to Recent Trends in Programming languages]
Jyoti Jayesh Chavhan. Enhancing Profanity Detection in Dravidian Languages: Leveraging Language Models for Optimization and Improvement. Recent Trends in Programming languages. 2024; 11(02):17-23. Available from: https://journals.stmjournals.com/rtpl/article=2024/view=161657

Recent Trends in Programming languages
| Volume | 11 |
| Issue | 02 |
| Received | 27/06/2024 |
| Accepted | 29/07/2024 |
| Published | 07/08/2024 |