Enzyme Stability Prediction using BERT and CNN-A Deep Learning Approach for Enhanced Biocatalysis

Year: 2024
By

Nikki Rani

Mehak Khurana

  1. Student, Department of Computer Science and Engineering, The NorthCap University, Gurugram, Haryana, India
  2. Associate Professor, Department of Computer Science and Engineering, The NorthCap University, Gurugram, Haryana, India

Abstract

The stability of industrial enzymes is a key determinant of their efficacy in biotechnological applications. This study aims to develop a predictive model for industrial enzyme stability by combining statistical analysis, deep learning algorithms (BERT and CNN), and molecular dynamics simulations to identify the parameters that affect stability. The dataset covers a wide range of enzymes together with information about their stability. The model examines the effects of environmental factors, including temperature, pH, and salt concentration, to pinpoint the factors that most influence enzyme stability. The BERT model achieved a mean squared error (MSE) of 0.007 and a cross-validation score of 0.4108, so it should be used with caution because of possible overfitting. However, freezing specific transformer layers and adding mutant embeddings improved its performance. The CNN model produced predictions of ddG (the change in free energy upon mutation) for the test dataset based on three different operation types. The molecular dynamics simulations reveal the molecular-level interactions that influence enzyme stability. By integrating these analytical approaches, the study aims to create a robust predictive model that supports the design and optimisation of stable and effective industrial enzymes for biotechnological applications. The results offer practical tools for improving enzyme stability and effectiveness, which benefits a variety of industrial processes.
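The comparison the abstract draws between a low MSE and a weaker cross-validation score can be illustrated with a minimal sketch. It assumes that both figures are mean squared errors, which the abstract does not state, and it uses a toy least-squares line in place of the BERT or CNN model. The function names `mse`, `fit_line`, and `k_fold_mse` and the synthetic data are hypothetical and chosen only for illustration.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b (toy stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = mean_y - a * mean_x
    return a, b


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds of shuffled indices."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a * xs[i] + b for i in held_out]
        scores.append(mse([ys[i] for i in held_out], preds))
    return sum(scores) / k


# Synthetic data (an assumption, not the study's dataset): a noisy line.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.3) for x in xs]

a, b = fit_line(xs, ys)
train_mse = mse(ys, [a * x + b for x in xs])
cv_mse = k_fold_mse(xs, ys, k=5)

# A held-out error far above the training error is the usual overfitting signal.
print(f"training MSE: {train_mse:.4f}, 5-fold CV MSE: {cv_mse:.4f}")
```

For a well-specified model such as this toy line, the two numbers stay close. A large gap between them would be the kind of signal that motivates the caution expressed in the abstract.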

Keywords: Biotechnology, CNN, BERT, Free energy change (ddG), Enzymes

How to cite this article: Nikki Rani, Mehak Khurana. Enzyme Stability Prediction using BERT and CNN-A Deep Learning Approach for Enhanced Biocatalysis. Research & Reviews : A Journal of Life Sciences. 2024; ():-.
How to cite this URL: Nikki Rani, Mehak Khurana. Enzyme Stability Prediction using BERT and CNN-A Deep Learning Approach for Enhanced Biocatalysis. Research & Reviews : A Journal of Life Sciences. 2024; ():-. Available from: https://journals.stmjournals.com/rrjols/article=2024/view=146543






Ahead of Print Subscription Original Research
Received April 18, 2024
Accepted May 2, 2024
Published May 18, 2024