Diabetes Risk Prediction from Survey Data Using Machine Learning Algorithms

Notice

This is an unedited manuscript accepted for publication and provided as an Article in Press for early access at the author’s request. The article will undergo copyediting, typesetting, and galley proof review before final publication. Please be aware that errors may be identified during production that could affect the content. All legal disclaimers of the journal apply.

Year: 2026 | Volume: 04 | Issue: 02 | Page:
By Aayush Shukla, Alok Singh, Padma Mishra, Melissa Fernandes

  1. Research Scholar, MCA, Thakur Institute of Management Studies, Career Development & Research (TIMSCDR), Mumbai, Maharashtra, India
  2. Research Scholar, MCA, Thakur Institute of Management Studies, Career Development & Research (TIMSCDR), Mumbai, Maharashtra, India
  3. Associate Professor, MCA Department, Thakur Institute of Management Studies, Career Development & Research (TIMSCDR), Mumbai, Maharashtra, India
  4. Associate Professor, MCA Department, Thakur Institute of Management Studies, Career Development & Research (TIMSCDR), Mumbai, Maharashtra, India

Abstract

Diabetes mellitus is one of the most significant global health challenges, affecting millions of people worldwide and leading to severe complications if left undiagnosed or poorly managed. Early detection and risk assessment are crucial for preventing the progression of this chronic condition. This research presents a machine learning approach for predicting diabetes risk from survey-based health parameters. The study implements and compares four classification algorithms, Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Random Forest, on patient health indicators: glucose level, blood pressure, body mass index, insulin level, skin thickness, age, number of pregnancies, and diabetes pedigree function. The dataset comprises health survey responses from individuals with varying diabetes risk profiles. After systematic feature analysis and model optimization, Random Forest, an ensemble method, achieves the highest predictive accuracy at 94.2%, followed by SVM at 91.8%, Logistic Regression at 89.3%, and KNN at 87.6%. The implemented web-based prediction system provides real-time risk assessment, enabling healthcare professionals and individuals to make informed decisions about diabetes prevention and management. The results indicate that machine learning-driven analysis of survey data can serve as an effective screening tool for diabetes risk, potentially reducing healthcare costs and improving patient outcomes through early intervention.

Keywords: Diabetes prediction, machine learning, survey data analysis, logistic regression, k-nearest neighbors, support vector machine, random forest, healthcare analytics, risk assessment, preventive medicine.
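
The four-way comparison described in the abstract could be set up along the following lines. This is only a minimal sketch, not the authors' implementation: the use of scikit-learn, the 80/20 stratified split, the hyperparameters, and the column names in FEATURES are all assumptions, and the data is synthetic, so the printed accuracies will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical column names for the eight indicators listed in the abstract.
FEATURES = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
            "insulin", "bmi", "diabetes_pedigree", "age"]


def compare_classifiers(X, y, seed=0):
    """Fit four classifiers on a stratified hold-out split; return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    # Distance- and margin-based models are scaled first; the forest does not need it.
    models = {
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)),
        "KNN": make_pipeline(
            StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "Random Forest": RandomForestClassifier(
            n_estimators=200, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # mean accuracy on the test split
    return scores


# Synthetic stand-in for the survey data (assumption: binary diabetic / non-diabetic label).
X, y = make_classification(n_samples=500, n_features=len(FEATURES),
                           n_informative=5, n_redundant=1, random_state=0)
scores = compare_classifiers(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

A single hold-out split keeps the sketch short; for a screening tool, cross-validated accuracy together with sensitivity and specificity would give a more reliable picture than accuracy on one split.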

How to cite this article:
Aayush Shukla, Alok Singh, Padma Mishra, Melissa Fernandes. Diabetes Risk Prediction from Survey Data Using Machine Learning Algorithms. International Journal of Bioinformatics and Computational Biology. 2026; 04(02):-.
How to cite this URL:
Aayush Shukla, Alok Singh, Padma Mishra, Melissa Fernandes. Diabetes Risk Prediction from Survey Data Using Machine Learning Algorithms. International Journal of Bioinformatics and Computational Biology. 2026; 04(02):-. Available from: https://journals.stmjournals.com/ijbcb/article=2026/view=246085



Ahead of Print Subscription Review Article
Volume 04, Issue 02
Received: 30/03/2026
Accepted: 10/04/2026
Published: 20/04/2026
Publication time: 21 days

