AlphaFold

Figure: AlphaFold's predicted structure for RNA polymerase T1044 at each layer of the network.

AlphaFold 1 (2018) placed first in the overall rankings of the 13th Critical Assessment of Structure Prediction (CASP) in December 2018. It was particularly successful at predicting the most accurate structures for targets rated as most difficult by the competition organizers, where no existing template structures were available from proteins with partially similar sequences.

AlphaFold 2 (2020) repeated this placement in the CASP14 competition in November 2020. It achieved a level of accuracy much higher than any other entry. For approximately two-thirds of the proteins, it scored above 90 on CASP's global distance test (GDT), which measures the similarity between a computationally predicted structure and the experimentally determined structure, where 100 represents a complete match. The inclusion of metagenomic data has improved the quality of the prediction of multiple sequence alignments. One of the biggest sources of the training data was the custom-built Big Fantastic Database of 65,983,866 protein families, represented as multiple sequence alignments and Hidden Markov models, covering 2,204,359,010 protein sequences from reference databases, metagenomes, and metatranscriptomes.
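The GDT score described above can be illustrated with a minimal sketch. It assumes that the predicted and experimental CA coordinates are already superimposed and uses the 1, 2, 4 and 8 Å cutoffs of the GDT_TS variant. The real CASP evaluation searches over many superpositions for each cutoff, so this simplified version only gives a lower bound. The function name and the coordinates are hypothetical.

```python
import math

def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two already-superimposed lists of CA coordinates.

    For each distance cutoff (in angstroms), compute the fraction of residues
    whose predicted CA atom lies within that cutoff of the experimental one,
    then average the fractions and scale to 0-100.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    distances = [math.dist(p, q) for p, q in zip(predicted, experimental)]
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)

# Hypothetical four-residue example: per-residue errors of 0.5, 1.5, 3.0 and 9.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

print(gdt_ts(predicted, experimental))     # 56.25
print(gdt_ts(experimental, experimental))  # 100.0, a complete match
```

In this toy case one residue in four falls within 1 Å, two within 2 Å, and three within both 4 Å and 8 Å, giving (25 + 50 + 75 + 75) / 4 = 56.25.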

AlphaFold 2's results at CASP14 were described as "astounding".

The technical achievement was widely recognized. On 15 July 2021, the AlphaFold 2 paper was published in Nature as an advance access publication alongside open source software and a searchable database of species proteomes. As of November 2025, the paper had been cited nearly 43,000 times.

AlphaFold 3 was announced on 8 May 2024. It can predict the structure of complexes created by proteins with DNA, RNA, various ligands, and ions.

Demis Hassabis and John Jumper shared one half of the 2024 Nobel Prize in Chemistry, awarded "for protein structure prediction," while the other half went to David Baker "for computational protein design." Hassabis and Jumper had previously won the Breakthrough Prize in Life Sciences and the Albert Lasker Award for Basic Medical Research in 2023 for their leadership of the AlphaFold project.

Experimental methods have identified the structures of about 170,000 proteins over the last 60 years, while there are over 200 million known proteins across all life forms.

AlphaFold 2 (2020)

Figure: Architectural details of AlphaFold 2.

AlphaFold 1 used a number of separately trained modules to produce a guide potential, which was then combined with a physics-based energy potential. AlphaFold 2 replaced this with a system of interconnected sub-networks, forming a single, differentiable, end-to-end model based on pattern recognition. This model was trained in an integrated manner. After the neural network's prediction converges, a final refinement step applies local physical constraints using energy minimization based on the AMBER force field. This step only slightly adjusts the predicted structure.

A key part of the 2020 system is a pair of modules, believed to be based on a transformer design, which progressively refine a vector of information for each relationship (or "edge" in graph-theory terminology) between one amino acid residue of the protein and another (these relationships are represented by the array shown in green), and between each amino acid position and each of the different sequences in the input sequence alignment (these relationships are represented by the array shown in red). The structure prediction module is itself then iterated. In an example presented by DeepMind, the structure prediction module achieved a correct topology for the target protein on its first iteration, scored as having a GDT_TS of 78, but with a large number (90%) of stereochemical violations, i.e. unphysical bond angles or lengths. With subsequent iterations the number of stereochemical violations fell. By the third iteration the GDT_TS of the prediction was approaching 90, and by the eighth iteration the number of stereochemical violations was approaching zero.

The training data was originally restricted to single peptide chains. However, the October 2021 update, named AlphaFold-Multimer, included protein complexes in its training data. DeepMind stated this update succeeded about 70% of the time at accurately predicting protein-protein interactions.

AlphaFold 3 (2024)

Announced on 8 May 2024, AlphaFold 3 was co-developed by Google DeepMind and Isomorphic Labs, both subsidiaries of Alphabet. AlphaFold 3 is not limited to single-chain proteins, as it can also predict the structures of protein complexes with DNA, RNA, post-translational modifications and selected ligands and ions. The Pairformer module's initial predictions are refined by a diffusion model. This model begins with a cloud of atoms and iteratively refines their positions, guided by the Pairformer's output, to generate a 3D representation of the molecular structure. As of November 2025, the AlphaFold 3 research paper has been directly cited more than 9,000 times.

Competitions

Figure: Results achieved for protein prediction by the best reconstructions in the CASP 2018 competition (small circles) and CASP 2020 competition (large circles), compared with results achieved in previous years. The crimson trend-line shows how a handful of models including AlphaFold 1 achieved a significant step-change in 2018 over the rate of progress that had previously been achieved, particularly in respect of the protein sequences considered the most difficult to predict. (Qualitative improvement had been made in earlier years, but it is only as changes bring structures within 8 Å of their experimental positions that they start to affect the CASP GDT_TS measure.) The orange trend-line shows that by 2020 online prediction servers had been able to learn from and match this performance, while the best other groups (green curve) had on average been able to make some improvements on it. However, the black trend curve shows the degree to which AlphaFold 2 had surpassed this again in 2020, across the board. The detailed spread of data points indicates the degree of consistency or variation achieved by AlphaFold. Outliers represent the handful of sequences for which it did not make such a successful prediction.

CASP13

In December 2018, DeepMind's AlphaFold placed first in the overall rankings of the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP), achieving a median score of 58.9 on CASP's global distance test (GDT), ahead of 52.5 and 52.4 by the two next best-placed teams, who were also using deep learning to estimate contact distances. Overall, across all targets, AlphaFold 1 achieved a GDT score of 68.5.

In January 2020, implementations and illustrative code of AlphaFold 1 were released open-source on GitHub.

At CASP14, AlphaFold 2 made the best prediction for 88 out of the 97 targets. Its accuracy was reported to be comparable to that of experimental techniques like X-ray crystallography, with a median RMS deviation in its predictions of 2.1 Å for a set of overlapped CA atoms.
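The RMS deviation quoted above is computed over pairs of CA atoms after the two structures have been overlapped. A minimal sketch follows, assuming that the superposition has already been done and that the coordinates are in ångströms; the function name and the example coordinates are hypothetical and are not taken from any CASP target.

```python
import math

def ca_rmsd(predicted, experimental):
    """Root-mean-square deviation between paired, already-overlapped CA atoms."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical four-residue example with per-atom errors of 1, 2, 2 and 3 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (3.8, 2.0, 0.0), (7.6, 2.0, 0.0), (11.4, 3.0, 0.0)]

print(round(ca_rmsd(predicted, experimental), 2))  # 2.12
```

Because the deviations are squared before averaging, a few badly placed atoms raise the RMS deviation more than they lower a threshold-based score such as GDT.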

Reception

AlphaFold 2 scoring more than 90 in CASP's global distance test (GDT) is considered a significant achievement in computational biology. News pieces appeared in the science press, such as Nature, and the story was covered by national newspapers. A frequent theme was that the ability to predict protein structures from the constituent amino acid sequence is expected to benefit the life sciences, including by accelerating drug discovery and enabling better understanding of diseases. Some have noted that even a perfect answer to the protein prediction problem would still leave open questions about the protein folding problem (and thus protein dynamics), that is, understanding in detail how the folding process actually occurs in nature and how proteins sometimes misfold.

In 2023, Demis Hassabis and John Jumper won the Breakthrough Prize in Life Sciences as well as the Albert Lasker Award for Basic Medical Research for their management of the AlphaFold project. Hassabis and Jumper went on to win the Nobel Prize in Chemistry in 2024 for their work on "protein structure prediction", sharing the prize with David Baker of the University of Washington.

Source code

Open access to the source code of several AlphaFold versions (excluding AlphaFold 3) was provided by DeepMind in 2022 after requests from the scientific community. The source code and weights of AlphaFold 3 were made available for non-commercial use to the scientific community upon request in November 2024. It became publicly available in February 2025, still retaining the non-commercial restriction.

Clones and derivatives

A number of AlphaFold clones have also been published, mostly with permissive license terms. Clones for AlphaFold 3 include ByteDance's Protenix (Apache 2.0 License), AlQuraishi Laboratory's OpenFold-3 (MIT license), and Boltz-1/2 (MIT license).

There are also clones for older versions, though they became less relevant with the open-source release of the AlphaFold 1 and 2 source code. Still relevant are models, both open- and closed-source, that include modifications to the AlphaFold architecture. For AlphaFold 2, a notable example is ESMFold from Meta, which replaces the multiple sequence alignment with the latent space of a protein language model.

Open-source tools that complement AlphaFold have also been made. One well-cited example is ColabFold, which uses MMseqs2 to speed up the sequence search, allowing the AlphaFold pipelines to run quickly on Google Colab.

Database of protein models generated by AlphaFold

AlphaFold Protein Structure Database (scope: protein structure prediction; organisms: all UniProt proteomes; center: EMBL-EBI).

... but for humans they are available in the whole batch file. AlphaFold's initial goal (as of early 2022) was to expand the database to cover most of the UniRef90 set, which contains over 100 million proteins. As of May 15, 2022, the database contained 992,316 predictions.

In July 2021, UniProt-KB and InterPro were updated to show AlphaFold predictions when available.

On July 28, 2022, the team uploaded to the database the structures of around 200 million proteins from 1 million species, covering nearly every known protein on the planet. The number as of 2024 is 214 million, with 26 million being duplicates (exact sequence matches) of another protein in the database. The predicted structures can differ significantly between duplicates.
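The duplicate count above refers to entries whose amino acid sequence exactly matches that of another entry. Subtracting the 26 million duplicates from the 214 million entries leaves 188 million unique sequences, which matches the number of unique structures analysed by TED. A minimal sketch of such a count, with hypothetical entry identifiers and toy sequences, could look like this:

```python
from collections import Counter

def count_exact_duplicates(entries):
    """Count entries whose sequence exactly matches an earlier entry.

    `entries` maps a (hypothetical) entry identifier to its amino acid sequence.
    Returns a tuple (unique_sequences, duplicate_entries).
    """
    counts = Counter(entries.values())
    unique_sequences = len(counts)
    duplicate_entries = len(entries) - unique_sequences
    return unique_sequences, duplicate_entries

# Toy data: three entries share one sequence, so two of them count as duplicates.
entries = {
    "entry_a": "MKTAYIAKQR",
    "entry_b": "MKTAYIAKQR",
    "entry_c": "GAVLIMFWP",
    "entry_d": "MKTAYIAKQR",
}

print(count_exact_duplicates(entries))  # (2, 2)
```

Since the predicted structures can differ between duplicates, deduplicating by sequence alone does not guarantee that the retained model is representative of the discarded ones.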

As of 2025, the AFDB uses AlphaFold 2 for its predictions. All structures produced remain monomeric, but multimeric structures produced by other databases are linked on the page through the 3D-Beacons API. Foldseek, which provides fast and accurate structure searches, is also integrated. Information from AlphaMissense (a tool that uses AlphaFold to predict the outcome of missense mutations) is also integrated.

Derived databases

AlphaFill adds cofactors to AlphaFold models where appropriate. This is achieved by searching the Protein Data Bank for similar structures and transplanting cofactors to analogous positions. It is also linked to by UniProt.

TmAlphaFold docks AlphaFold models to biological membranes, similar to what OPM does for PDB structures.

AFTM uses AlphaFold models to identify transmembrane regions in human proteins, similar to what PDBTM does for PDB structures.

The AFDB is not updated when UniProt sequences change. AlphaSync keeps the AFDB in sync with UniProt entry changes, generating updated structures, residue-level features and contacts. It tries to use an AFDB entry for the exact updated sequence when available and runs AlphaFold 2 independently otherwise. It also fills the gaps in the AFDB for large (> 2,700 aa) proteins and for proteins with special FASTA characters such as B, Z, U or X.
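The selection logic described above can be sketched as follows. This is an illustration only and not AlphaSync's actual code: the function names, constants and return strings are hypothetical, and the set of special characters contains only the four examples named in the text, which may not be exhaustive.

```python
# Hypothetical constants based on the limits quoted in the text.
MAX_AFDB_LENGTH = 2700                   # proteins longer than this are missing from the AFDB
SPECIAL_CHARACTERS = frozenset("BZUX")   # examples from the text; possibly not exhaustive

def afdb_gap_reasons(sequence):
    """Return the reasons a sequence would fall into the AFDB gaps named above."""
    reasons = []
    if len(sequence) > MAX_AFDB_LENGTH:
        reasons.append("too long")
    if SPECIAL_CHARACTERS & set(sequence.upper()):
        reasons.append("special characters")
    return reasons

def choose_structure_source(sequence, afdb_sequences):
    """Reuse an exact-match AFDB entry when one exists, otherwise predict anew."""
    if sequence in afdb_sequences:
        return "reuse AFDB entry"
    return "run AlphaFold 2"

# Toy set of sequences assumed to be present in the AFDB.
afdb_sequences = {"MKTAYIAKQR", "GAVLIMFWP"}

print(choose_structure_source("MKTAYIAKQR", afdb_sequences))   # reuse AFDB entry
print(choose_structure_source("MKTAYIAKQRU", afdb_sequences))  # run AlphaFold 2
print(afdb_gap_reasons("MKTAYIAKQRU"))                         # ['special characters']
print(afdb_gap_reasons("A" * 2701))                            # ['too long']
```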

The Encyclopedia of Domains (TED) applies the domain-recognition method from the CATH database to 188 million unique structures from the AFDB, identifying nearly 365 million domains, which is 100 million more than what sequence-based methods could identify.

Performance, validations and limitations

AlphaFold has shown certain limitations.

AlphaFold 1, 2, and AlphaFold DB

* AlphaFold DB provides models of individual protein chains (monomers), rather than their biologically relevant complexes.
* Many protein regions are predicted with a low confidence score, including intrinsically disordered protein regions.
* AlphaFold 2 was validated for predicting the effects of point mutations on structure and free energy, with partial success.
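Low-confidence regions can be located directly in a model file, because AlphaFold writes its per-residue confidence score (pLDDT, on a 0 to 100 scale) into the B-factor column of the PDB-format output. The sketch below reads that column for CA atoms and lists residues below a threshold. The function names are hypothetical, the threshold of 70 is an adjustable assumption based on the commonly used confidence bands, and the example records are synthetic rather than taken from a real model.

```python
def low_confidence_residues(pdb_lines, threshold=70.0):
    """Return (residue number, pLDDT) pairs for CA atoms scoring below threshold.

    Uses the fixed-width PDB layout: atom name in columns 13-16, residue
    number in columns 23-26 and B-factor (here pLDDT) in columns 61-66.
    """
    flagged = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_seq = int(line[22:26])
            plddt = float(line[60:66])
            if plddt < threshold:
                flagged.append((res_seq, plddt))
    return flagged

def example_ca_line(res_seq, plddt):
    """Build a synthetic fixed-width ATOM record for a CA atom (dummy coordinates)."""
    return (
        "ATOM  " + f"{res_seq:>5}" + " " + " CA " + " " + "GLY" + " " + "A"
        + f"{res_seq:>4}" + " " * 4
        + f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}" + f"{1.0:6.2f}" + f"{plddt:6.2f}"
    )

# Three synthetic residues: one confident, two below the threshold.
lines = [example_ca_line(1, 95.1), example_ca_line(2, 62.4), example_ca_line(3, 38.0)]

print(low_confidence_residues(lines))  # [(2, 62.4), (3, 38.0)]
```

Long runs of flagged residues often correspond to the intrinsically disordered regions mentioned above, although a low score alone does not prove disorder.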

AlphaFold 3

* Across several benchmarks, AlphaFold 3 has demonstrated, on average, superior performance to conventional search-based docking algorithms in predicting small-molecule–protein binding modes.
* AlphaFold 3 can predict structures of protein complexes with only a very limited set of selected cofactors and co- and post-translational modifications. Between 50% and 70% of the structures of the human proteome are incomplete without covalently-attached glycans.
* Other work has found that AlphaFold is insensitive to adversarial decoys generated by altering the physicochemical properties of binding pockets, suggesting potential reliance on training-set memorization rather than genuine chemical awareness.

General

* In the algorithm, the residues are moved freely, without any restraints. Therefore, during modeling the integrity of the chain is not maintained. As a result, AlphaFold may produce topologically wrong results, like structures with an arbitrary number of knots. (The study uses AlphaFold 2.3.2.)
* The model relies, to some extent, on co-evolutionary information from similar proteins. Therefore, it may not perform as well on synthetic proteins or proteins with very low homology to those in the training database. Benchmarks support this limitation: when applied to naturally evolved de novo proteins, AlphaFold 2 often yields low-confidence and predictor-dependent models, and protein language model–based (alignment-free) structure predictors can perform better for orphan proteins than AlphaFold 2. More broadly, comparative analyses show that structure/disorder predictors (including AlphaFold 2 and ESMFold) behave differently on de novo and random-sequence proteins than on conserved proteins, and that confidence metrics can show different relationships with predicted disorder in these sequence classes.
* The model's ability to predict multiple native conformations of proteins is limited.
* Proteins are inherently dynamic, and accessing multiple native conformations is often crucial for understanding their function. However, the model has limited capability to represent these alternative conformational states, particularly those that coexist or interconvert in biological environments.

Applications

AlphaFold has been used to predict the structures of proteins of SARS-CoV-2, the causative agent of COVID-19. The structures of these proteins were still awaiting experimental determination in early 2020. The team acknowledged that although these protein structures might not be the subject of ongoing therapeutic research efforts, they would add to the community's understanding of the SARS-CoV-2 virus.

Published works

* Andrew W. Senior et al. (December 2019), ["Protein structure prediction using multiple deep neural networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13)"](https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.25834), Proteins: Structure, Function, Bioinformatics 87(12) 1141–1148
* Andrew W. Senior et al. (15 January 2020), ["Improved protein structure prediction using potentials from deep learning"](https://www.nature.com/articles/s41586-019-1923-7), Nature 577 706–710
* John Jumper et al. (December 2020), "High Accuracy Protein Structure Prediction Using Deep Learning", in Fourteenth Critical Assessment of Techniques for Protein Structure Prediction (Abstract Book), pp. 22–24
* John Jumper et al. (December 2020), "[AlphaFold 2](https://predictioncenter.org/casp14/doc/presentations/2020_12_01_TS_predictor_AlphaFold2.pdf)". Presentation given at CASP 14.
* Abramson, J., Adler, J., Dunger, J. et al. (May 2024), "[Accurate structure prediction of biomolecular interactions with AlphaFold 3](https://www.nature.com/articles/s41586-024-07487-w)", Nature 630, 493–500

See also

* Folding@home
* IBM Blue Gene
* Foldit
* Rosetta@home
* Human Proteome Folding Project
* AlphaZero
* AlphaGo
* AlphaGeometry
* Predicted Aligned Error

Further reading

* Carlos Outeiral, [CASP14: what Google DeepMind's AlphaFold 2 really achieved, and what it means for protein folding, biology and bioinformatics](https://www.blopig.com/blog/2020/12/casp14-what-google-deepminds-alphafold-2-really-achieved-and-what-it-means-for-protein-folding-biology-and-bioinformatics/), Oxford Protein Informatics Group, 3 December 2020
* Mohammed AlQuraishi, [AlphaFold2 @ CASP14: "It feels like one's child has left home."](https://moalquraishi.wordpress.com/2020/12/08/alphafold2-casp14-it-feels-like-ones-child-has-left-home/) (blog), 8 December 2020
* Mohammed AlQuraishi, [The AlphaFold2 Method Paper: A Fount of Good Ideas](https://moalquraishi.wordpress.com/2021/07/25/the-alphafold2-method-paper-a-fount-of-good-ideas/) (blog), 25 July 2021

External links

* [AlphaFold-3 web server](https://golgi.sandbox.google.com/about)
* [Open access to protein structure predictions for the human proteome and 20 other key organisms](https://alphafold.ebi.ac.uk/) at European Bioinformatics Institute (AlphaFold Protein Structure Database)
* [CASP 14](https://predictioncenter.org/casp14/index.cgi) website
* [AlphaFold: The making of a scientific breakthrough](https://www.youtube.com/watch?v=gg7WjuFs8F4), DeepMind, via YouTube.
* [ColabFold](https://github.com/sokrypton/ColabFold), [version](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) for homooligomeric prediction and complexes