Protein Folding With AlphaFold2: Chapter One

MoleculeAI
3 min read · Jul 15, 2023

In the last blog, we introduced the protein folding problem and the three-dimensional structure of proteins at the different levels of their organization. We also discussed the importance of computationally efficient and accurate models. The first revolutionary results in this direction came from DeepMind at the 14th Critical Assessment of Techniques for Protein Structure Prediction (CASP14) competition in 2020. The team developed AlphaFold2, a deep-learning-based protein structure prediction system, which achieved a median score of 92.4 GDT (Global Distance Test) across all targets in CASP14 [1]. GDT ranges from 0 to 100 and measures the percentage of residues in the predicted model that lie within a set distance of their positions in the experimental structure.
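To give a feel for what a GDT score measures, here is a minimal Python sketch of the GDT_TS variant, which averages the fraction of residues within 1, 2, 4 and 8 Å of the reference. It assumes the two structures are already superimposed and uses one C-alpha coordinate per residue; the official CASP evaluation also searches over many superpositions, which this sketch skips. The function name `gdt_ts` and the toy coordinates are ours, purely for illustration.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two already-superimposed C-alpha traces.

    Assumption: no superposition search is done here, unlike the real
    CASP evaluation, so this only illustrates the scoring idea.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("coordinate lists must be non-empty and equal length")
    # Distance between each predicted residue and its reference position.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues that fall within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy four-residue example (made-up coordinates, in angstroms).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]

score = gdt_ts(predicted, reference)
print(score)  # 56.25
```

The four residues are off by 0.5, 1.5, 3.0 and 9.0 Å, so 25%, 50%, 75% and 75% of them fall within the four cutoffs, which averages to 56.25. A perfect prediction scores 100.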

The development of AlphaFold2 [2] built on DeepMind’s experience and insights from the original AlphaFold (2018), as well as other AI systems the company had developed. It uses deep neural networks to predict the 3D structures of proteins from their amino acid sequences. The system combines novel machine learning methods with physical constraints and evolutionary information to predict the most likely fold of a protein. It has revolutionized the field of protein structure prediction and could have a major impact on drug discovery and development. The Nature paper and its supplementary information describe the model in detail; here we give an overview by breaking the model into its components and connecting the machine learning techniques with the physics and biology of protein structures.

Overview of the Architecture

Figure Source: Diagram of AlphaFold 2 as published in the official Nature paper in July 2021 and reused in the blog from Oxford Informatics Group
  1. Sequence preprocessing module: This module takes the amino acid sequence and builds multiple sequence alignments (MSAs), along with structural templates where these are available. The MSAs are generated using profile-based and hidden Markov model-based searches, which identify homologous sequences in databases such as UniRef90 (derived from UniProt) and Uniclust30. The MSAs carry the evolutionary signal: pairs of positions that mutate together across homologs tend to be close in the folded protein. AlphaFold2 does not calculate these evolutionary couplings, or a predicted secondary structure, as a separate step; the MSA is passed to the network, which learns such correlations directly.
  2. Evoformer module: This module is a stack of neural network blocks that uses attention mechanisms to repeatedly update two representations: an MSA representation and a pair representation that describes the relationship between every pair of residues. Information flows between the two at every block, so evidence from the MSA sharpens the pairwise picture and vice versa. The Evoformer does not itself output 3D coordinates. It is nevertheless the heart of the AlphaFold2 architecture and is largely responsible for the high accuracy of the predictions.
  3. Structure generation module: This module converts the Evoformer output into 3D atomic coordinates using a geometry-aware attention mechanism (invariant point attention) and refines the structure iteratively. The final predicted structure is accompanied by a per-residue confidence score (pLDDT) that estimates the accuracy of the prediction.
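To make the idea of correlated mutations in an MSA concrete, the sketch below measures mutual information between pairs of alignment columns in a toy, hand-made MSA. This is a classical way to expose co-variation and is not what AlphaFold2 does internally: there the network learns such signals from the MSA itself. The helper name `mutual_information` and the four toy sequences are hypothetical.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        # Compare the observed pair frequency with what independence predicts.
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Toy alignment: column 1 (K/R) always changes together with column 3 (E/D),
# while column 2 (A/G) varies independently and column 0 is fully conserved.
toy_msa = ["MKAE", "MKGE", "MRAD", "MRGD"]

coupled = mutual_information(toy_msa, 1, 3)
independent = mutual_information(toy_msa, 1, 2)
print(coupled, independent)  # 1.0 0.0
```

Columns 1 and 3 share one full bit of information, the kind of signal that hints at two residues being in contact, while the independent pair scores zero. Real alignments are far noisier, and practical coupling methods add corrections that this sketch leaves out.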

Overall, the AlphaFold2 architecture is a sophisticated neural network that combines several techniques to produce highly accurate protein structure predictions. The sequence preprocessing module generates the input for the Evoformer module, which refines the MSA and pair representations. The structure generation module then turns these representations into 3D coordinates and refines them iteratively to give the final prediction. In subsequent posts we will discuss each part in depth.
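The data flow between the three modules can be sketched as a short Python skeleton. Every function here is a hypothetical placeholder that returns dummy values, not DeepMind's code or API; the only point is the order of the calls and the loop in which the previous prediction is fed back into the network (called recycling in the paper, which reports three recycling iterations).

```python
def preprocess(sequence):
    # Placeholder: a real pipeline would search sequence databases here.
    return {"sequence": sequence, "msa": [sequence]}


def evoformer(features, recycled=None):
    # Placeholder: only records the shapes of the two representations.
    n = len(features["sequence"])
    return {
        "msa_repr_shape": (len(features["msa"]), n),
        "pair_repr_shape": (n, n),
        "used_recycling": recycled is not None,
    }


def structure_module(representations):
    # Placeholder: lays residues on a straight line 3.8 angstroms apart
    # and reports a dummy per-residue confidence of zero.
    n = representations["pair_repr_shape"][0]
    return {
        "coords": [(3.8 * k, 0.0, 0.0) for k in range(n)],
        "confidence": [0.0] * n,
    }


def predict(sequence, num_recycles=3):
    features = preprocess(sequence)
    prediction = None
    # One initial pass plus num_recycles passes that see the last prediction.
    for _ in range(num_recycles + 1):
        representations = evoformer(features, recycled=prediction)
        prediction = structure_module(representations)
    return prediction


result = predict("MKTAYIAK")
print(len(result["coords"]))  # 8
```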

At Molecule AI, we’re creating cutting-edge approaches that harness the power of deep learning in the realm of protein design. To learn more, feel free to contact us at info@moleculeai.com.

References:

[1] https://predictioncenter.org/casp14/index.cgi

[2] Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S., Ballard, A., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., Back, T., Petersen, S., Reiman, D., Clancy, E., Zielinski, M., Steinegger, M., Pacholska, M., Berghammer, T., Bodenstein, S., Silver, D., Vinyals, O., Senior, A. W., Kavukcuoglu, K., Kohli, P., & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589.
