Can an AI Predict the Language of Viral Mutation?

Viruses lead a rather repetitive existence. They enter a cell, hijack its machinery to turn it into a viral copy machine, and those copies head on to other cells armed with instructions to do the same. So it goes, over and over again.

Source: Can an AI Predict the Language of Viral Mutation?

But somewhat often, amidst this repeated copy-pasting, things get mixed up. Mutations arise in the copies. Sometimes, a mutation means an amino acid doesn’t get made and a vital protein doesn’t fold—so into the dustbin of evolutionary history that viral version goes. Sometimes the mutation does nothing at all, because different sequences that encode the same proteins make up for the error. But every once in a while, mutations go perfectly right. The changes don’t affect the virus’s ability to exist; instead, they produce a helpful change, like making the virus unrecognizable to a person’s immune defenses. When that allows the virus to evade antibodies generated from past infections or from a vaccine, that mutant variant of the virus is said to have “escaped.”

Scientists are always on the lookout for signs of potential escape. That’s true for SARS-CoV-2, as new strains emerge and scientists investigate what genetic changes could mean for a long-lasting vaccine. (So far, things are looking okay.) It’s also what confounds researchers studying influenza and HIV, which routinely evade our immune defenses. So in an effort to see what’s possibly to come, researchers create hypothetical mutants in the lab and see if they can evade antibodies taken from recent patients or vaccine recipients. But the genetic code offers too many possibilities to test every evolutionary branch the virus might take over time. It’s a matter of keeping up.

Last winter, Brian Hie, a computational biologist at MIT and a fan of the lyric poetry of John Donne, was thinking about this problem when he alighted upon an analogy: What if we thought of viral sequences the way we think of written language? Every viral sequence has a sort of grammar, he reasoned: a set of rules it needs to follow in order to be that particular virus. When mutations violate that grammar, the virus reaches an evolutionary dead end. In virology terms, it lacks “fitness.” And, as with language, the sequence could be said to have a kind of semantics from the immune system’s perspective. There are some sequences the immune system can interpret (and thus stop the virus with antibodies and other defenses) and some that it can’t. So a viral escape could be seen as a change that preserves the sequence’s grammar but changes its meaning.

The analogy had a simple, almost too simple, elegance. But to Hie, it was also practical. In recent years, AI systems have gotten very good at modeling principles of grammar and semantics in human language. Researchers build them by training a system on data sets of billions of words, arranged in sentences and paragraphs, from which the system derives patterns. In this way, without being told any specific rules, the system learns where the commas should go and how to structure a clause. It can also be said to intuit the meaning of certain sequences (words and phrases) based on the many contexts in which they appear throughout the data set. It’s patterns, all the way down. That’s how the most advanced language models, like OpenAI’s GPT-3, can learn to produce perfectly grammatical prose that manages to stay reasonably on topic.

One advantage of this idea is that it’s generalizable. To a machine learning model, a sequence is a sequence, whether it’s arranged in sonnets or amino acids. According to Jeremy Howard, an AI researcher at the University of San Francisco and a language model expert, applying such models to biological sequences can be fruitful. With enough data from, say, genetic sequences of viruses known to be infectious, the model will implicitly learn something about how infectious viruses are structured. “That model will have a lot of sophisticated and complex knowledge,” he says.
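
To make the "a sequence is a sequence" point concrete, here is a minimal sketch in Python. It is not the researchers' method: it swaps the neural language models described above for a far simpler bigram counter, and the function names and the short peptide-like strings are made up for illustration (they are not real viral proteins). The same code learns which symbols tend to follow which, whether the symbols are letters of English or single-letter amino acid codes, and then scores how "grammatical" a new sequence looks.

```python
import math
from collections import Counter


def train_bigram_model(sequences):
    """Count adjacent-symbol pairs across the training sequences."""
    pair_counts = Counter()      # how often symbol b follows symbol a
    context_counts = Counter()   # how often symbol a is followed by anything
    alphabet = set()
    for seq in sequences:
        alphabet.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts, alphabet


def avg_log_prob(seq, model):
    """Average log-probability per transition, with add-one smoothing."""
    pair_counts, context_counts, alphabet = model
    vocab = len(alphabet) + 1  # one extra slot shared by unseen symbols
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        p = (pair_counts[(prev, nxt)] + 1) / (context_counts[prev] + vocab)
        total += math.log(p)
    return total / max(len(seq) - 1, 1)


# The model is never told any rules; it only sees example sequences.
english = train_bigram_model(["the cat sat on the mat",
                              "the dog sat on the log"])
print(avg_log_prob("the cat sat", english))   # familiar patterns score higher
print(avg_log_prob("tcs hte aat", english))   # scrambled letters score lower

# Hypothetical peptide-like strings (illustrative only, not real viral data).
peptides = train_bigram_model(["MKTAYIAKQR", "MKTAYLAKQR", "MKSAYIAKQR"])
print(avg_log_prob("MKTAYIAKQR", peptides))   # follows the learned "grammar"
print(avg_log_prob("RQKAIYATKM", peptides))   # same residues, reversed order
```

A score like this only captures the grammar half of the analogy. The semantics half, whether the immune system would still recognize the sequence, needs a richer model than pair counts can provide.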