Data Integration and Visualization in Bioinformatics: Techniques and Challenges

Notice

This is an unedited manuscript accepted for publication and provided as an Article in Press for early access at the author’s request. The article will undergo copyediting, typesetting, and galley proof review before final publication. Please be aware that errors may be identified during production that could affect the content. All legal disclaimers of the journal apply.

Year: 2024 | Volume: 13 | Issue: 03 | Page: –
By

Mansi Srivastava

  1. Student, Department of Biotechnology, Amity University Gurgaon, Haryana, India

Abstract

Data integration and visualization are essential to bioinformatics because they enable thorough analysis and interpretation of complex biological datasets. Experimental platforms such as genomic sequencing, proteomics, transcriptomics, and metabolomics generate vast amounts of data. The heterogeneity, scale, and complexity of these datasets present significant challenges for integration, analysis, and visualization. Data integration techniques combine disparate datasets from multiple sources to create a unified view of biological systems. These methods include concatenation, matrix factorization, and multivariate statistics, along with more advanced techniques based on machine learning and deep learning. Integration is essential for revealing hidden relationships between different types of biological data and for generating holistic insights into the underlying biology. Visualization tools transform raw data into intuitive visual representations, such as heatmaps, scatter plots, and network diagrams, which make patterns, trends, and outliers easier to identify. Techniques such as principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and network-based visualization are increasingly used to represent multidimensional data. Despite these advances, challenges persist. Data heterogeneity, scalability, noise, and the curse of dimensionality complicate both tasks, and integrating multi-omics data and visualizing high-dimensional datasets remain major difficulties.
This article discusses the techniques used in data integration and visualization, highlights the challenges faced, and outlines future directions in bioinformatics for overcoming these hurdles.
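
As a rough illustration of two techniques named in the abstract, the sketch below combines concatenation-based integration with a PCA projection for plotting. It is a minimal example under stated assumptions, not the method used in the article: the data are synthetic, the two blocks ("transcriptome" and "proteome") are hypothetical, and the function names `zscore`, `integrate_by_concatenation`, and `pca_project` are invented here for illustration.

```python
import numpy as np

def zscore(block):
    """Standardize each feature (column) to zero mean and unit variance."""
    mean = block.mean(axis=0)
    std = block.std(axis=0)
    std[std == 0] = 1.0  # leave constant features at zero instead of dividing by 0
    return (block - mean) / std

def integrate_by_concatenation(blocks):
    """Scale each omics block separately, then join the features column-wise."""
    if len({b.shape[0] for b in blocks}) != 1:
        raise ValueError("all blocks must describe the same samples in the same order")
    return np.hstack([zscore(b) for b in blocks])

def pca_project(matrix, n_components=2):
    """Project samples onto the leading principal components using an SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per component
    return scores, explained[:n_components]

# Synthetic data (assumption): 20 samples in two groups, measured on two platforms.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 10)
transcriptome = rng.normal(size=(20, 50)) + group[:, None] * 2.0
proteome = rng.normal(size=(20, 30)) + group[:, None] * 1.5

combined = integrate_by_concatenation([transcriptome, proteome])
scores, explained = pca_project(combined)
print(combined.shape, scores.shape)
```

The two columns of `scores` can be passed to any scatter-plot routine to inspect how samples group. Scaling each block before concatenation is one simple way to keep the platform with the most features or the largest numeric range from dominating the projection; it does not address noise or missing samples, which the abstract lists among the open challenges.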

Keywords: Data integration, bioinformatics data, biological data, protein structures, genomic sequences

[This article belongs to Research & Reviews : Journal of Computational Biology (rrjocb)]

How to cite this article:
Mansi Srivastava. Data Integration and Visualization in Bioinformatics: Techniques and Challenges. Research & Reviews : Journal of Computational Biology. 2024; 13(03):-.
How to cite this URL:
Mansi Srivastava. Data Integration and Visualization in Bioinformatics: Techniques and Challenges. Research & Reviews : Journal of Computational Biology. 2024; 13(03):-. Available from: https://journals.stmjournals.com/rrjocb/article=2024/view=0




Regular Issue | Subscription | Review Article
Received: 27/11/2024
Accepted: 02/12/2024
Published: 19/12/2024