Tufail Ahmad
Student, Department of Biotechnology, Amity University Gurgaon, Haryana, India
Abstract
Genetic differences are key to understanding why some individuals are more prone to certain diseases than others. Recent advances in genomic research, combined with statistical modeling techniques, have substantially improved our ability to predict disease risk from genetic factors. This review explores the application of statistical models for predicting genetic variability and their role in disease susceptibility. We discuss traditional methods, such as linear regression and genome-wide association studies (GWAS), as well as newer techniques, such as polygenic risk scores (PRS) and machine learning algorithms (e.g., random forests, support vector machines, and neural networks). Although these models have improved the prediction of diseases such as coronary artery disease, diabetes, and cancer, challenges remain, particularly in integrating environmental factors and genetic data from diverse populations. Polygenic risk scores, which aggregate the effects of numerous genetic variants, have shown promise but often have limited accuracy when applied to populations different from the one in which they were developed, highlighting the need for large, diverse datasets. Moreover, integrating multi-omic data (genomic, transcriptomic, proteomic, and epigenomic) is increasingly seen as a way to improve prediction models by capturing complex gene-environment interactions. Despite this potential, issues such as model overfitting, data quality, and ethical concerns regarding genetic data privacy need to be addressed. The review considers the future of statistical models in disease-risk prediction and suggests that combining genetic data with lifestyle, environmental, and multi-omic factors could lead to more accurate and personalized predictions of disease susceptibility.
Keywords: Genetic variability, disease susceptibility, statistical models, polygenic risk scores (PRS), genome-wide association studies (GWAS)
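The abstract describes polygenic risk scores as aggregating the effects of many variants, with per-variant effects usually estimated by a GWAS-style regression scan. The following is a minimal, self-contained sketch of that idea under loudly stated assumptions, not the method used in the review: a simulated quantitative trait, a purely additive model, independent variants (no linkage disequilibrium), and unthresholded marginal least-squares slopes used directly as PRS weights. All function names (`simulate_genotypes`, `marginal_effects`, `polygenic_risk_score`, and so on) are hypothetical, and the sample sizes and effect sizes are arbitrary illustration values.

```python
import random
import statistics


def simulate_genotypes(n_people, allele_freqs, rng):
    """Draw 0/1/2 risk-allele dosages per person, assuming Hardy-Weinberg
    equilibrium and independent variants (no linkage disequilibrium)."""
    return [[sum(rng.random() < p for _ in range(2)) for p in allele_freqs]
            for _ in range(n_people)]


def simulate_trait(genotypes, true_betas, noise_sd, rng):
    """Purely additive model: trait_i = sum_j beta_j * g_ij + Gaussian noise."""
    return [sum(b * g for b, g in zip(true_betas, row)) + rng.gauss(0.0, noise_sd)
            for row in genotypes]


def marginal_effects(genotypes, trait):
    """GWAS-style scan: for each variant j, fit trait ~ g_j on its own and
    keep the ordinary least-squares slope as that variant's weight."""
    mean_y = statistics.fmean(trait)
    weights = []
    for j in range(len(genotypes[0])):
        g = [row[j] for row in genotypes]
        mean_g = statistics.fmean(g)
        sxx = sum((x - mean_g) ** 2 for x in g)
        sxy = sum((x - mean_g) * (y - mean_y) for x, y in zip(g, trait))
        weights.append(sxy / sxx if sxx > 0 else 0.0)  # monomorphic variant -> 0
    return weights


def polygenic_risk_score(dosages, weights):
    """PRS_i = sum_j w_j * g_ij, a weighted sum of risk-allele dosages."""
    return sum(w * g for w, g in zip(weights, dosages))


def pearson_r(xs, ys):
    """Pearson correlation, used here as a simple measure of PRS accuracy."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
n_snps = 50
freqs = [rng.uniform(0.1, 0.5) for _ in range(n_snps)]
true_betas = [0.3] * 10 + [0.0] * (n_snps - 10)  # 10 causal variants, 40 null

# Discovery sample: estimate one weight per variant with the marginal scan.
discovery = simulate_genotypes(2000, freqs, rng)
weights = marginal_effects(discovery, simulate_trait(discovery, true_betas, 1.0, rng))

# Independent target sample: score each person and check how well the PRS tracks the trait.
target = simulate_genotypes(500, freqs, rng)
y_target = simulate_trait(target, true_betas, 1.0, rng)
scores = [polygenic_risk_score(row, weights) for row in target]
r = pearson_r(scores, y_target)
print(f"PRS-trait correlation in the target sample: r = {r:.2f}")
```

Estimating weights in one sample and evaluating in an independent one mirrors the overfitting concern raised in the abstract: scoring the discovery sample itself would overstate accuracy. Because this toy simulation has no linkage disequilibrium and uses the same allele frequencies in both samples, it does not reproduce the loss of accuracy across populations that the abstract describes.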
[This article belongs to Research and Reviews: Journal of Computational Biology]
Tufail Ahmad. Statistical Models for Predicting Genetic Variability and Disease Susceptibility. Research and Reviews: Journal of Computational Biology. 2025; 14(01):30-34. Available from: https://journals.stmjournals.com/rrjocb/article=2025/view=194702