November 30, 2023
Will AI crack the code for antigens?
The scientific implications of artificial intelligence loom large. One tangible application is designing better vaccines.
The rise of artificial intelligence (AI) in the public eye over the last several months has been astounding, with its potential role in our future lives drawing both intense awe and scrutiny. The scientific implications of AI alone are incredibly wide-ranging, but one tangible use researchers are capitalizing on is the role of machine learning in predicting protein structures and potentially guiding the design of finely tuned vaccine antigens.
This field of research is already in motion. In 2021 DeepMind, Google’s AI venture, launched a database platform built on a system called AlphaFold that draws upon publicly available sequence data from the Protein Data Bank, a vast library of molecular structures that dates back to 1971. AlphaFold uses machine learning algorithms, a pattern-recognizing subset of artificial intelligence, to plumb the depths of this library and predict the three-dimensional structure of proteins based on the one-dimensional sequence of their amino acids. The platform now includes models of nearly every human protein. In the space of only a few years, more than a million researchers have drawn on AlphaFold as a resource.
Another avenue of research is modifying amino acid sequences with the intent of synthesizing more stable protein structures that would make suitable vaccine antigens. This past summer at a lab in Leipzig the pharmacist and immunotherapeutic drug researcher Clara Schoeder and her team began deploying machine learning tools in a new effort to do just that. She and her colleagues are trying to design antigens for a set of pathogens with pandemic potential.
“Ideally, we identify a protein design where we can say these are more stable and behave better, and so they would likely make good vaccine candidates. The next step is to figure out whether they actually are good vaccine candidates. Their immunogenicity, for example. That’s a very important point,” she says.
Schoeder and her colleagues are part of a new network of partners with the goal of building a data library of potential vaccine antigens against a specific set of pathogens. These pathogens are identified by the Coalition for Epidemic Preparedness Innovations (CEPI) as those most likely to be the cause of the next severe epidemic or pandemic. They include the viruses Lassa, Middle East respiratory syndrome coronavirus, Ebola, Rift Valley Fever, Chikungunya, Nipah, and the next novel pathogen, which is referred to as Disease X. CEPI is the sponsor of this AI-driven partnership, which launched in July backed by at least US$7 million in funding to start.
Structural or so-called rational vaccine design is an energetic field, especially for pathogens such as HIV or influenza, which have proven to be difficult vaccine targets in many ways. Artificial intelligence tools would seem to be a natural fit for designing better vaccine antigens against highly variable pathogens such as these.
In the last decade or so it’s become much more practical to employ machine learning. This is due to more powerful and cheaper computing, the ability to gather and manipulate massive pools of data, and the development of new programming ideas. There is a vibrant open-source community for scientific uses of AI and machine learning. There is also a large community concerned with potential ethical issues and with how to regulate the development and deployment of AI. And there are vying commercial interests.
Meta, the company behind Facebook, built an AI system called Evolutionary Scale Modeling (ESM) to create an atlas that predicts the folded structures of proteins from metagenomic sequences. The model was trained as a language model on datasets made up of the letter sequences that signify individual amino acids, calculating how those letters tend to pair with one another in an astronomical number of ways. The AI atlas mapped more than 600 million metagenomic protein structures. However, Meta promptly shut down the development team behind ESM, the Financial Times reports, in favor of finding more commercially viable projects.
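To make the idea of letters "pairing" concrete, here is a minimal Python sketch. It is not how ESM works internally; it is a far simpler stand-in that only counts which amino acid letter immediately follows another. The sequences and the function names (`adjacent_pair_counts`, `next_letter_probabilities`) are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical toy sequences in one-letter amino acid code (not real proteins).
TOY_SEQUENCES = ["MKTAYIAKQR", "MKTLLVAAKQ", "MKVAYLAKQR"]


def adjacent_pair_counts(sequences):
    """Count how often each letter is immediately followed by each other letter."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # zip(seq, seq[1:]) walks over every pair of neighboring letters.
        for left, right in zip(seq, seq[1:]):
            counts[left][right] += 1
    return counts


def next_letter_probabilities(counts, letter):
    """Turn the raw counts for one letter into relative frequencies."""
    total = sum(counts[letter].values())
    if total == 0:
        return {}  # the letter never appeared with a successor
    return {nxt: n / total for nxt, n in counts[letter].items()}


if __name__ == "__main__":
    counts = adjacent_pair_counts(TOY_SEQUENCES)
    # In these toy sequences, K is followed by Q half of the time.
    print(next_letter_probabilities(counts, "K"))
```

A system at the scale the article describes would learn from vastly more sequences and from relationships between letters that sit far apart, not just neighbors, which is part of why the number of possible pairings becomes astronomical.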
Even so, pharmaceutical companies and biotechs, for decades now deeply immersed in bioinformatics, are already incorporating AI in their efforts by developing new partnerships, adding data scientists to prominent roles, or making developers part of the organization. BioNTech, the German biotech which alongside Pfizer produced one of the two mRNA-based COVID-19 vaccines, worked with London-based AI developer InstaDeep to analyze global sequencing data to flag risky new variants of SARS-CoV-2. This year they bought the company.
The Massachusetts Institute of Technology runs a software collaboration steeped in AI — the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium — that now includes several multinational pharmaceutical companies.
A prominent structural biologist, Peter Kwong, sees great potential in AI’s ability to hone vaccine development, starting with its ability to aid researchers in identifying more stable protein structures that could serve as vaccine antigens, and to do so much more quickly. But he also sees AI’s limitations. Kwong was the longtime leader of the Structural Biology section at the Vaccine Research Center at the National Institute of Allergy and Infectious Diseases, and as of December will be co-director of the Aaron Diamond AIDS Research Center at Columbia University.
“We now use AlphaFold routinely to assist in cryo-electron microscopy determination of antibody-antigen complexes,” Kwong says. “But it is more difficult to apply these technologies to optimize vaccine responses. It’s unclear how to obtain an equivalent to the Protein Data Bank for vaccine responses.”
Calculating how a protein folds is an immense task. Accurately predicting how the immune system will respond to a protein is a far larger one. And right now, Kwong says, there are not enough viral proteins in the Protein Data Bank to train an AI to the same level as on standard, non-viral proteins. Difficult proteins like those of HIV remain out of reach.
“But I’m optimistic that machine learning can be trained to extract critical parameters about immune response, like the degree of neutralization elicited by a specific strain of a pathogen against different strains.” Artificial intelligence doesn’t have to improve understanding of vaccinology to “work,” Kwong says; it just needs to correctly predict an important value in a bigger puzzle.
Kwong and his colleagues have already published research employing machine learning to predict resistance of HIV-1 isolates to broadly neutralizing antibodies. They’ve also used machine learning algorithms to predict protein solubility. Some tasks are sped up drastically. If AI correctly predicts a given protein structure through AlphaFold, Kwong says, it bypasses what is a laborious, sometimes years-long, effort of structure determination.
The set of research initiatives applying machine learning to HIV neutralizing antibodies is growing. At least one tool, SLAPNAP, is showing promise in comparing and ranking the neutralization potency of combination regimens of broadly neutralizing antibodies that are currently in development for HIV prevention.
Building machine learning tools for antigen design has been a long time coming. In 2007 Andreas Holm Mattsson was finishing bioinformatics studies at the Technical University of Denmark when he began assembling datasets from old PDF files in patents and in scientific literature on infectious diseases. “I could see if I make the best data set in the whole world, I will have the fundamentals for creating the best predictor in the whole world,” he says.
Mattsson went on to build machine learning models, including the one with which he launched a company: Evaxion, which in September inked a deal to work with Afrigen Biologics to develop a prophylactic mRNA vaccine against multidrug-resistant bacterial gonorrhea. It will use antigens pinpointed by Evaxion’s AI models. The platform created ranked lists of proteins from the bacterial proteome that, in this case, show protection against the bacterium. The top-ranking proteins are selected for structural modeling and antigen design. Preclinical studies are ongoing.
If AI can help scientists identify and weed out proteins that aren’t likely to work as vaccine antigens, it would be a big help. “We hope that the computer can predict these changes that are necessary and to save us a lot of time and effort. That is the major advantage that we see. These proteins are huge, and they are hard to produce a lot of times and it’s expensive,” says Schoeder.
But Schoeder says it is also important to look beyond the hype surrounding AI. “There are new technologies coming out every other week. Sometimes it’s oversold. You really need to find out what is working well and what is not.”
Michael Dumiak, based in Berlin, reports on global science, public health, and technology.
- Machine versus virus: Deploying artificial intelligence against future pandemics
- On Jens Meiler and Rosetta, the computational modeling software suite for protein structures
- Protein Design in the Age of Artificial Intelligence, at RosettaCo