This is an unedited manuscript accepted for publication and provided as an Article in Press for early access at the author’s request. The article will undergo copyediting, typesetting, and galley proof review before final publication. Please be aware that errors may be identified during production that could affect the content. All legal disclaimers of the journal apply.
Satyam Shinde,
Mr. Nikhilesh Mankar,
Amrut Nikam,
Dr. Swati Shirke,
Dr. Rahul Sonkambale,
- Research scholar, CSE, Pimpri Chinchwad University, India
- Professor, CSE, Pimpri Chinchwad University, India
- Research scholar, CSE, Pimpri Chinchwad University, India
- Professor, CSE-AIML, Pimpri Chinchwad University, India
- Professor, CSE-AIML, Pimpri Chinchwad University, India
Abstract
Cardiovascular disease remains one of the leading causes of death, so early risk prediction is an important task in preventive medicine. Although recent studies report encouraging results when applying machine learning to heart disease prediction, most approaches prioritize accuracy-driven optimization over clinical safety and the control of false negatives. In medical decision support systems, false negatives are more harmful than false positives, because a missed high-risk patient may not receive timely care. This paper presents a lightweight, interpretable machine learning approach for early heart disease risk prediction from structured clinical data. Logistic Regression, Random Forest, XGBoost, and stacking ensemble classifiers are compared using clinically meaningful evaluation metrics: accuracy, precision, recall, F1-score, and ROC-AUC. The experimental results indicate that ensemble classifiers outperform individual models, and the baseline (non-optimized) StackingClassifier performs best (recall: 0.8807, F1-score: 0.8930, ROC-AUC: 0.9147). Cost-sensitive, threshold-optimized stacking further increases recall to 0.9266. To improve transparency and clinical trust, SHAP and LIME are combined to provide global and local explanations. The resulting explanations identify ST depression, maximum heart rate achieved, chest pain type, cholesterol, and exercise-induced angina as key risk factors. The proposed approach shows that simple, interpretable ensemble models can provide accurate heart disease risk predictions.
Keywords: Heart Disease Prediction, Explainable Artificial Intelligence, Ensemble Learning, Stacking Classifier, Cost-Sensitive Learning, Threshold Optimization, SHAP, LIME.
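The cost-sensitive threshold optimization summarized in the abstract can be illustrated with a minimal, pure-Python sketch. It assumes the trained classifier (for example, the stacking ensemble) outputs a probability of heart disease for each validation patient, and it selects the decision threshold that minimizes a weighted error cost in which a false negative is assumed to cost five times as much as a false positive. The cost ratio, the search over distinct predicted scores, the toy data, and all function names are illustrative assumptions, not the authors' settings or code; the sketch also shows how precision, recall, F1-score, and ROC-AUC are computed from predictions.

```python
"""Minimal sketch of cost-sensitive decision-threshold selection.

Assumptions (not taken from the paper): the classifier outputs a probability
of heart disease per patient, a false negative is assumed to cost 5x a false
positive, and candidate thresholds are the distinct predicted scores.
"""
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when predicting 1 for scores >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_prob):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC-AUC as the chance a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, y_prob) if y == 1]
    neg = [s for y, s in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_threshold(y_true: Sequence[int], y_prob: Sequence[float],
                     cost_fn: float = 5.0,  # assumed cost of a missed patient
                     cost_fp: float = 1.0   # assumed cost of a false alarm
                     ) -> Tuple[float, float]:
    """Return (threshold, cost) minimizing cost_fn * FN + cost_fp * FP.

    Candidates are scanned from low to high, so ties keep the lowest
    threshold, i.e. the one with the highest recall.
    """
    best_threshold, best_cost = 0.5, float("inf")
    for threshold in sorted(set(y_prob)):
        _, fp, _, fn = confusion_counts(y_true, y_prob, threshold)
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost


if __name__ == "__main__":
    # Synthetic validation labels and predicted probabilities (not study data).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_prob = [0.90, 0.45, 0.35, 0.40, 0.20, 0.60, 0.80, 0.10]

    chosen, _ = select_threshold(y_true, y_prob)
    for t in (0.5, chosen):
        m = classification_metrics(*confusion_counts(y_true, y_prob, t))
        print(f"threshold={t:.2f} recall={m['recall']:.2f} "
              f"precision={m['precision']:.2f} f1={m['f1']:.2f}")
    print(f"ROC-AUC={roc_auc(y_true, y_prob):.4f}")
```

On the synthetic data, the default threshold of 0.5 gives a recall of 0.50, while the cost-minimizing threshold of 0.35 raises recall to 1.00 at the same precision (0.67). ROC-AUC (0.8125 here) is unaffected, because moving the threshold alone does not change how the scores rank patients. In practice the threshold would be tuned on a validation split rather than on the test set.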
Satyam Shinde, Mr. Nikhilesh Mankar, Amrut Nikam, Dr. Swati Shirke, Dr. Rahul Sonkambale. A Lightweight Cost-Sensitive Explainable Ensemble Framework for Early Heart Disease Risk Prediction. International Journal of Bioinformatics and Computational Biology. 2026; 04(02):-. Available from: https://journals.stmjournals.com/ijbcb/article=2026/view=246467
| Detail | Value |
|---|---|
| Volume | 04 |
| Issue | 02 |
| Received | 11/05/2026 |
| Accepted | 06/06/2026 |
| Published | 06/06/2026 |
| Publication Time | 26 days |