Comparative Study of BERT Variants for Sentiment Analysis with Error Analysis

Notice

This is an unedited manuscript accepted for publication and provided as an Article in Press for early access at the author’s request. The article will undergo copyediting, typesetting, and galley proof review before final publication. Please be aware that errors may be identified during production that could affect the content. All legal disclaimers of the journal apply.

Year: 2026 | Volume: 04 | Issue: 01 | Page:
    By

  • Deepshikha Prajapati,
  • Shruti Prajapati,
  • Thara Chakkingal

  1. Research Scholar, Department of MCA, Thakur Institute of Management Studies, Career Development & Research (TIMSCDR) Mumbai, Maharashtra, India
  2. Research Scholar, Department of MCA, Thakur Institute of Management Studies, Career Development & Research (TIMSCDR) Mumbai, Maharashtra, India
  3. Assistant Professor, Department of MCA, Thakur Institute of Management Studies, Career Development & Research (TIMSCDR) Mumbai, Maharashtra, India

Abstract

Social media use is growing rapidly in India, and with it the use of Hinglish, an informal blend of Hindi and English that people commonly use in everyday conversation, especially on platforms such as Twitter, Facebook, and WhatsApp. Hinglish mixes vocabulary from both languages, uses inconsistent spelling and sentence structure, and often ignores standard grammatical rules. These properties make it a significant challenge for Natural Language Processing (NLP), particularly for sentiment analysis, where a system must accurately interpret people’s emotions and opinions, and they make the text hard for conventional language models to handle. Although models such as BERT perform well at understanding text, they require substantial computational resources, which makes them a poor fit for settings that need fast responses on limited hardware. We therefore compare three smaller models, DistilBERT, MuRIL, and XLM-RoBERTa, on the Kaggle Hinglish Sentiment Dataset. By examining both how well these models perform and where they make mistakes, the study clarifies how they handle the complexities of Hinglish. The results indicate that these models can process Hinglish while maintaining good classification performance at low computational cost.

The study contributes to research on low-resource languages and supports the development of sentiment analysis systems for Hinglish that can be deployed in real-world applications.
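The comparison and error analysis described in the abstract can be illustrated with a minimal sketch. It assumes each model's predictions are already available as lists of labels; the three-class label set, the model names, the example sentences, and the predictions are all made up for illustration and are not taken from the study. The sketch computes accuracy and macro-averaged F1 per model and lists the misclassified examples for manual inspection.

```python
from collections import Counter

# Assumed label set; the source does not state which classes the dataset uses.
LABELS = ("negative", "neutral", "positive")


def confusion_counts(gold, pred):
    """Count (gold_label, predicted_label) pairs."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return Counter(zip(gold, pred))


def accuracy(gold, pred):
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    counts = confusion_counts(gold, pred)
    scores = []
    for label in labels:
        tp = counts[(label, label)]
        fp = sum(counts[(g, label)] for g in labels if g != label)
        fn = sum(counts[(label, p)] for p in labels if p != label)
        denom = 2 * tp + fp + fn
        # A class with no gold and no predicted examples scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def misclassified(texts, gold, pred):
    """Return (text, gold, predicted) triples for every wrong prediction."""
    return [(t, g, p) for t, g, p in zip(texts, gold, pred) if g != p]


# Toy, made-up Hinglish examples and predictions, for illustration only.
texts = [
    "yeh movie bahut achhi thi",
    "service ekdum bakwas hai",
    "kal office jana hai",
    "kya mast phone hai yaar",
]
gold = ["positive", "negative", "neutral", "positive"]
predictions = {
    "model_a": ["positive", "negative", "neutral", "neutral"],  # hypothetical
    "model_b": ["positive", "neutral", "neutral", "positive"],  # hypothetical
}

for name, pred in predictions.items():
    print(name, round(accuracy(gold, pred), 2), round(macro_f1(gold, pred), 2))
    for text, g, p in misclassified(texts, gold, pred):
        print(f"  error: {text!r} gold={g} predicted={p}")
```

Macro-averaging is used in this sketch because it weights every class equally, so a model that neglects a rare class is penalised even when its overall accuracy looks acceptable. Whether the study reports this metric is not stated in the abstract.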

Keywords: Hinglish, code-mixed text, sentiment analysis, NLP, transformer models

How to cite this article:
Deepshikha Prajapati, Shruti Prajapati, Thara Chakkingal. Comparative Study of BERT Variants for Sentiment Analysis with Error Analysis. International Journal of Computer Science Languages. 2026; 04(01):-.
How to cite this URL:
Deepshikha Prajapati, Shruti Prajapati, Thara Chakkingal. Comparative Study of BERT Variants for Sentiment Analysis with Error Analysis. International Journal of Computer Science Languages. 2026; 04(01):-. Available from: https://journals.stmjournals.com/ijcsl/article=2026/view=247731



Ahead of Print Subscription Original Research
Volume: 04
Issue: 01
Received: 10/03/2026
Accepted: 11/04/2026
Published: 10/05/2026
Publication Time: 61 Days

