This is an unedited manuscript accepted for publication and provided as an Article in Press for early access at the author’s request. The article will undergo copyediting, typesetting, and galley proof review before final publication. Please be aware that errors may be identified during production that could affect the content. All legal disclaimers of the journal apply.

Tufail Ahmad
- Student, Department of Biotechnology, Amity University Gurgaon, Haryana, India
Abstract
Genetic differences are key to understanding why some individuals are more prone to certain diseases than others. Advances in genomic research, combined with statistical modeling, have substantially improved the prediction of disease risk from genetic factors. This review explores statistical models for predicting genetic variability and their role in assessing disease susceptibility. We discuss traditional methods such as linear regression and genome-wide association studies (GWAS), as well as newer techniques such as polygenic risk scores (PRS) and machine learning algorithms (e.g., random forests, support vector machines, and neural networks). While these models have enhanced our ability to predict diseases such as coronary artery disease, diabetes, and cancer, challenges remain, particularly in integrating environmental factors and in applying genetic data across diverse populations. Polygenic risk scores, which aggregate the effects of numerous genetic variants, have shown promise but often lose accuracy when applied to populations other than those in which they were developed, highlighting the need for large, diverse datasets. Moreover, the integration of multi-omic data (genomic, transcriptomic, proteomic, and epigenomic) is increasingly seen as a way to improve prediction models by capturing complex gene-environment interactions. Despite this potential, issues such as model overfitting, data quality, and ethical concerns regarding genetic data privacy need to be addressed. The review concludes that combining genetic data with lifestyle, environmental, and multi-omic factors could lead to more accurate and personalized predictions of disease susceptibility.
Keywords: Genetic Variability, Disease Susceptibility, Statistical Models, Polygenic Risk Scores (PRS), Genome-Wide Association Studies (GWAS)
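As an illustration of how a polygenic risk score aggregates the effects of many variants, the following is a minimal sketch, assuming the common additive form in which the score is the sum of per-variant effect sizes multiplied by the number of effect alleles carried. The function name `polygenic_risk_score` and all weights and genotypes are hypothetical values chosen for illustration; they are not taken from the article or from any real study.

```python
def polygenic_risk_score(dosages, weights):
    """Return the weighted sum of effect-allele dosages.

    dosages: copies of the effect allele at each variant (0, 1, or 2).
    weights: per-variant effect sizes, e.g. log odds ratios from a GWAS.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical effect sizes for three variants (illustrative only).
weights = [0.5, -0.25, 1.0]
# Hypothetical genotype of one individual at the same three variants.
dosages = [2, 1, 0]

score = polygenic_risk_score(dosages, weights)
print(score)  # 0.75
```

In practice such scores are usually built from thousands of variants and standardized against a reference population. Because the weights are estimated in one population, they may transfer poorly to another, which is the limitation noted in the abstract.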
[This article belongs to Research & Reviews : Journal of Computational Biology (rrjocb)]
Tufail Ahmad. Statistical Models for Predicting Genetic Variability and Disease Susceptibility. Research & Reviews : Journal of Computational Biology. 2025; 14(01):-. Available from: https://journals.stmjournals.com/rrjocb/article=2025/view=0
