Researchers and enthusiasts across the internet have greeted the news with excitement, with some proclaiming that AlphaFold has solved the “protein folding problem.” But what does that mean, exactly? And how do we stand to benefit from it?
To start answering these questions, we need to take a closer look at the proteins themselves. As your biology teacher might have said, proteins are the building blocks of life, responsible for countless functions inside and outside the human body. Each one starts as a series of amino acids strung together into a chain, but it doesn’t take long — sometimes just milliseconds — before things start to get complicated. Some parts of the amino acid chain twist into helixes. Others fold back onto themselves as “sheets”. Before long, these helixes and sheets coalesce and contort into a protein’s final structure, and that’s what gives a protein the ability to perform specific tasks, like ferrying oxygen through your body or strengthening the structure of your bones.
In other words, shape is everything, and researchers have spent decades trying to find a way to determine a protein’s final, folded structure based solely on the sequence of amino acids that makes up its backbone. That’s where CASP comes in: since 1994, the program has served as a focal point of sorts for teams around the world working to crack the protein folding problem with computational ingenuity. The rules are fairly simple: Every other year, organizers select a series of target proteins from a bevy of submissions; the structures of these targets have been determined experimentally but haven’t been published yet. Researchers then get a few months to tune their systems and make their predictions, which are judged by experts in the field for about a month after submissions close.
While CASP has been running for 26 years, it’s only in the past few that the scientific community has been able to bring quantum leaps in compute power and machine learning to bear on the challenge. In DeepMind’s case, that involved training AlphaFold 2’s prediction model on about 170,000 known protein structures, along with a vast number of protein sequences whose 3D structures haven’t yet been determined. This training data, the team admits, is fairly similar to what it used in 2018, when the original AlphaFold system achieved top marks during CASP 13. (At the time, organizers hailed DeepMind’s “unprecedented progress in the ability of computational methods to predict protein structure.”)
That said, the team made some notable changes to its machine learning approach. It hasn’t published a full paper yet, but the CASP 14 abstract book highlights some of the modifications. Beyond that, DeepMind also relied on about 128 of Google’s cloud-based TPUv3 cores, which ultimately gave AlphaFold 2 the ability to accurately determine a protein’s structure within just days, if not sooner. The New York Times notes that, in some cases, predictions can be generated in a matter of hours.