Imagine a world where we could predict the behavior of life just by analyzing a sequence of letters. This isn't science fiction or fantasy; it is a goal that scientists have been working toward for years. These sequences, made up of four nucleotides (A, T, C, and G), contain the fundamental instructions for all life on Earth, from the smallest microbe to the largest mammal. Decoding these sequences has the potential to unlock complex biological processes, transforming fields like personalized medicine and environmental sustainability.
However, despite this immense potential, decoding even the simplest microbial genomes is a highly complex task. These genomes consist of millions of DNA base pairs that regulate the interactions between DNA, RNA, and proteins, the three key elements of the central dogma of molecular biology. This complexity exists on multiple levels, from individual molecules to entire genomes, creating a vast body of genetic information that has evolved over billions of years.
Traditional computational tools have struggled to handle the complexity of biological sequences. But with the rise of generative AI, it is now possible to scale to datasets of trillions of tokens and learn complex relationships across long sequences of tokens. Building on this advance, researchers at the Arc Institute, Stanford University, and NVIDIA have been working on an AI system that can understand biological sequences the way large language models understand human text. They have now developed a model that captures both the central dogma's multimodal nature and the complexities of evolution. This innovation may lead to predicting and designing new biological sequences, from individual molecules to entire genomes. In this article, we'll explore how this technology works, its potential applications, the challenges it faces, and the future of genomic modeling.
Evo 1: A Pioneering Model in Genomic Modeling
This research gained attention in late 2024 when NVIDIA and its collaborators introduced Evo 1, a groundbreaking model for analyzing and generating biological sequences across DNA, RNA, and proteins. Trained on 2.7 million prokaryotic and phage genomes, totaling 300 billion nucleotide tokens, the model focused on integrating the central dogma of molecular biology, modeling the flow of genetic information from DNA to RNA to proteins. Its StripedHyena architecture, a hybrid model using convolutional filters and gates, efficiently handled long contexts of up to 131,072 tokens. This design allowed Evo 1 to link small sequence changes to broader system-wide and organism-level effects, bridging the gap between molecular biology and evolutionary genomics.
Evo 1 was a first step in the computational modeling of biological evolution. It successfully predicted molecular interactions and genetic variations by analyzing evolutionary patterns in genetic sequences. However, as scientists tried to apply it to more complex eukaryotic genomes, the model's limitations became clear. Evo 1 struggled with single-nucleotide resolution over long DNA sequences and was computationally expensive for larger genomes. These challenges created the need for a more advanced model capable of integrating biological data across multiple scales.
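To make the idea of nucleotide tokens and a fixed context length concrete, here is a minimal Python sketch. It assumes one token per nucleotide and simple non-overlapping windows; the vocabulary, the `tokenize` and `split_into_windows` helpers, and the example sequence are all hypothetical and are not taken from Evo 1's actual tokenizer or data pipeline.

```python
from __future__ import annotations

# Illustrative sketch only: the vocabulary and helper names are assumptions,
# not the tokenizer that Evo 1 actually uses.
CONTEXT_LENGTH = 131_072  # context size quoted for Evo 1 in the text

# Hypothetical single-nucleotide vocabulary: one token ID per base.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}


def tokenize(sequence: str) -> list[int]:
    """Map each nucleotide to one token ID (unknown symbols become N)."""
    return [VOCAB.get(base, VOCAB["N"]) for base in sequence.upper()]


def split_into_windows(tokens: list[int], size: int = CONTEXT_LENGTH) -> list[list[int]]:
    """Cut a long token list into consecutive, non-overlapping windows."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


if __name__ == "__main__":
    genome = "ACGT" * 100_000  # made-up 400,000-base sequence
    windows = split_into_windows(tokenize(genome))
    print(len(windows), "windows; last window has", len(windows[-1]), "tokens")
```

Because every base is its own token, a single-nucleotide change alters exactly one token, which is what lets a model of this kind reason at single-nucleotide resolution.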
Evo 2: A Foundational Model for Genomic Modeling
Building on the lessons learned from Evo 1, researchers launched Evo 2 in February 2025, advancing the field of biological sequence modeling. Trained on a staggering 9.3 trillion DNA base pairs, the model has learned to understand and predict the functional consequences of genetic variation across all domains of life, including bacteria, archaea, plants, fungi, and animals. With over 40 billion parameters, Evo 2 can handle an unprecedented sequence length of up to 1 million base pairs, something that previous models, including Evo 1, could not manage.
What sets Evo 2 apart from its predecessors is its ability to model not only DNA sequences but also the interactions between DNA, RNA, and proteins, the whole central dogma of molecular biology. This enables Evo 2 to accurately predict the impact of genetic mutations, from the smallest nucleotide changes to larger structural variations, in ways that were previously unimaginable.
A key feature of Evo 2 is its strong zero-shot prediction capability, which enables it to predict the functional effects of mutations without requiring task-specific fine-tuning. For example, it accurately classifies clinically significant BRCA1 variants, an important factor in breast cancer research, by analyzing DNA sequences alone.
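As a reminder of what the central dogma means in practice, the following Python sketch transcribes a DNA coding strand into mRNA and translates it into protein using the standard genetic code. It is a textbook illustration rather than anything Evo 2 does internally, and the short sequences are invented for the example. It shows how a single-nucleotide change in DNA can propagate to a different amino acid in the protein.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position
# ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, one letter per amino acid, stopping at a stop codon.

    Assumes the sequence contains only A, C, G, and U.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    reference = "ATGGAAAAATAA"  # made-up gene: ATG GAA AAA TAA
    variant = "ATGGTAAAATAA"    # same gene with one A -> T substitution
    print(translate(transcribe(reference)))  # MEK
    print(translate(transcribe(variant)))    # MVK
```

Here the substitution turns the codon GAA (glutamate, E) into GTA (valine, V), so one changed letter in the DNA yields a different protein.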
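One common way to do zero-shot variant scoring with a sequence language model is to compare the likelihood the model assigns to the mutated sequence with the likelihood it assigns to the reference. The Python sketch below illustrates that idea under loud assumptions: `ToyMarkovModel` is a tiny stand-in trained on an invented repeating sequence, not Evo 2, and whether Evo 2's BRCA1 evaluation follows exactly this recipe is not stated in this article.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"


class ToyMarkovModel:
    """Stand-in for a genomic language model: an order-k Markov chain."""

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(Counter)

    def fit(self, sequences):
        """Count which base follows each length-k context."""
        for seq in sequences:
            for i in range(self.order, len(seq)):
                self.counts[seq[i - self.order:i]][seq[i]] += 1
        return self

    def log_likelihood(self, seq):
        """Sum of log-probabilities of each base given its context."""
        total = 0.0
        for i in range(self.order, len(seq)):
            context = self.counts.get(seq[i - self.order:i], {})
            seen = context.get(seq[i], 0)
            # Add-one smoothing so unseen bases keep a non-zero probability.
            prob = (seen + 1) / (sum(context.values()) + len(ALPHABET))
            total += math.log(prob)
        return total


def apply_snv(reference, position, alt):
    """Return the reference with one base replaced (0-based position)."""
    return reference[:position] + alt + reference[position + 1:]


def delta_score(model, reference, position, alt):
    """Log-likelihood of the variant minus log-likelihood of the reference."""
    variant = apply_snv(reference, position, alt)
    return model.log_likelihood(variant) - model.log_likelihood(reference)


if __name__ == "__main__":
    model = ToyMarkovModel(order=2).fit(["ATGGCC" * 20])  # invented training data
    reference = "ATGGCC" * 3
    print(delta_score(model, reference, 7, "A"))  # pattern-breaking change
    print(delta_score(model, reference, 7, "T"))  # same base as the reference
```

A strongly negative score means the model finds the variant much less plausible than the reference. No labeled examples of benign or pathogenic variants are needed to compute it, which is what makes the approach zero-shot.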
Potential Applications in Biomolecular Sciences
Evo 2's capabilities open new frontiers in genomics, molecular biology, and biotechnology. Some of the most promising applications include:
- Healthcare and Drug Discovery: Evo 2 can predict which gene variants are related to specific diseases, aiding in the development of targeted therapies. For example, in tests with variants of the breast cancer-associated gene BRCA1, Evo 2 achieved over 90% accuracy in predicting which mutations are benign versus potentially pathogenic. Such insights could speed up the development of new medicines and personalized treatments.
- Synthetic Biology and Genetic Engineering: Evo 2's ability to generate entire genomes opens new avenues for designing synthetic organisms with desired traits. Researchers can use Evo 2 to engineer genes with specific functions, advancing the development of biofuels, environmentally friendly chemicals, and novel therapeutics.
- Agricultural Biotechnology: Evo 2 can be used to design genetically modified crops with improved traits such as drought resistance or pest resilience, contributing to global food security and agricultural sustainability.
- Environmental Science: Evo 2 can be applied to design biofuels or engineer proteins that break down environmental pollutants like oil or plastic, contributing to sustainability efforts.
Challenges and Future Directions
Despite its impressive capabilities, Evo 2 faces challenges. One key hurdle is the computational cost of training and running the model. With a context window of 1 million base pairs and 40 billion parameters, Evo 2 requires significant computational resources to operate effectively. This makes it difficult for smaller research teams to fully use its potential without access to high-performance computing infrastructure.
Moreover, while Evo 2 excels at predicting the effects of genetic mutations, there is still much to learn about how to use it to design novel biological systems from scratch. Generating realistic biological sequences is only the first step; the real challenge lies in understanding how to use this power to create functional, sustainable biological systems.
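A rough back-of-the-envelope calculation shows why the parameter count alone is demanding. The Python sketch below estimates the memory needed just to hold 40 billion weights at a few common numeric precisions. The precisions are assumptions chosen for illustration; this article does not say which format Evo 2 uses, and the estimate ignores activations and other runtime overhead, which grow with context length.

```python
PARAMETERS = 40_000_000_000  # parameter count quoted in the text

# Assumed storage formats (bytes per parameter); not confirmed for Evo 2.
BYTES_PER_PARAMETER = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(n_params, bytes_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAMETER.items():
        print(f"{name}: {weight_memory_gb(PARAMETERS, size):.0f} GB")
    # float32: 160 GB
    # bfloat16: 80 GB
    # int8: 40 GB
```

Even at 16-bit precision the weights alone come to about 80 GB, which points toward high-memory or multi-GPU hardware before any long-context computation is accounted for.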
Accessibility and Democratization of AI in Genomics
One of the most exciting aspects of Evo 2 is its open-source availability. To democratize access to advanced genomic modeling tools, NVIDIA has made the model parameters, training code, and datasets publicly available. This open-access approach allows researchers around the world to explore and expand upon Evo 2's capabilities, accelerating innovation across the scientific community.
The Bottom Line
Evo 2 is a major advance in genomic modeling, using AI to decode the complex genetic language of life. Its ability to model DNA sequences and their interactions with RNA and proteins opens up new possibilities in healthcare, drug discovery, synthetic biology, and environmental science. Evo 2 can predict the effects of genetic mutations and design new biological sequences, offering transformative potential for personalized medicine and sustainable solutions. However, its computational demands present challenges, especially for smaller research teams. By making Evo 2 open-source, NVIDIA is enabling researchers worldwide to explore and expand its capabilities, driving innovation in genomics and biotechnology. As the technology continues to evolve, it holds the potential to reshape the future of biological sciences and environmental sustainability.