How to Predict Biomolecular Structures Using the OpenFold3 NIM


For decades, one of biology’s deepest mysteries was how a string of amino acids folds itself into the intricate architecture of life. Researchers built painstaking simulations and statistical models, inching toward a solution but never crossing the threshold of prediction at scale.

Then, deep learning changed everything. By learning the language of evolution directly from sequence data, AI began to uncover the hidden rules of molecular form, transforming structure prediction from an art into an engineering discipline.

Today, that transformation reaches a new milestone. OpenFold3 brings production-ready protein AI into the NVIDIA ecosystem, uniting open science with enterprise-grade performance. Developed by the OpenFold Consortium and accelerated by NVIDIA, OpenFold3 extends structure prediction beyond single proteins to model multi-chain complexes, nucleic acids, and small-molecule ligands: the entire grammar of biological interaction.

With NVIDIA cuEquivariance for symmetry-aware GPU acceleration, compatibility with MMseqs2-GPU for rapid sequence search, and NVIDIA FLARE for federated training, OpenFold3 delivers unprecedented speed, scale, and privacy-preserving collaboration for biopharma and biotech teams worldwide. 

OpenFold3 is now available, and as an NVIDIA NIM it comes with additional acceleration. This post walks you through how to use the OpenFold3 NIM for your structure prediction work.

Prerequisites

Structure prediction with the OpenFold3 NIM

With OpenFold3 NIM, structure prediction can move from prototype to production in just a few steps, as detailed below.

Step 1: Access the model

OpenFold3 NIM is accessible through build.nvidia.com. You can deploy the container locally, on a cluster, or as a managed NIM service.

docker pull nvcr.io/nim/openfold/openfold3:latest

export LOCAL_NIM_CACHE=~/.cache/nim
# Set this to your NGC API key before starting the container
export NGC_API_KEY=

docker run --rm --name openfold3 \
    --runtime=nvidia \
    --gpus 'device=0' \
    -p 8000:8000 \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE":/opt/nim/.cache \
    --shm-size=16g \
    nvcr.io/nim/openfold/openfold3:latest
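
The container can take a while to download and load model weights, so it helps to wait for the service before sending a job. The following is a minimal sketch of a readiness poll, not part of the original walkthrough. The path `/v1/health/ready` is an assumption based on the convention used by other NIM containers, and `http_probe` and `wait_until_ready` are hypothetical helper names; check the OpenFold3 NIM documentation for the exact health route.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(url, probe=http_probe, attempts=60, delay=5.0,
                     sleep=time.sleep):
    """Poll `url` until `probe` succeeds or `attempts` run out.

    `probe` and `sleep` are injectable so the loop can be exercised
    without a running server.
    """
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

With the container from Step 1 running, a call such as `wait_until_ready("http://localhost:8000/v1/health/ready")` would block until the probe succeeds or the attempts are exhausted, assuming that route exists in your release.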

Step 2: Submit a structure prediction job

Once deployed, you can interact with the API using standard REST calls or Python clients:

#!/usr/bin/env python3

import requests
from pathlib import Path

# Define output file and inference endpoint
output_file = "output.json"
url = "http://localhost:8000/biology/openfold/openfold3/predict"

# Define protein sequence
protein_sequence = "MGREEPLNHVEAERQRREKLNQRFYALRAVVPNVSKMDKASLLGDAIAYINELKSKVVKTESEKLQIKNQLEEVKLELAGRLEHHHHHH"

# Define MSA alignment in CSV format: a "key,sequence" header row, then one row for the query
msa_alignment_csv = "key,sequence\n-1,MGREEPLNHVEAERQRREKLNQRFYALRAVVPNVSKMDKASLLGDAIAYINELKSKVVKTESEKLQIKNQLEEVKLELAGRLEHHHHHH"

# Define DNA sequences (complementary pair)
dna_sequence_b = "AGGAACACGTGACCC"
dna_sequence_c = "TGGGTCACGTGTTCC"

# Construct request data
data = {
    "request_id": "5GNJ",
    "inputs": [
        {
            "input_id": "5GNJ",
            "molecules": [
                {
                    "type": "protein",
                    "id": "A",
                    "sequence": protein_sequence,
                    "msa": {
                        "main_db": {
                            "csv": {
                                "alignment": msa_alignment_csv,
                                "format": "csv",
                            }
                        }
                    }
                },
                {
                    "type": "dna",
                    "id": "B",
                    "sequence": dna_sequence_b
                },
                {
                    "type": "dna",
                    "id": "C",
                    "sequence": dna_sequence_c
                }
            ],
            "output_format": "pdb"
        }
    ]
}

# Send the request and stop early if the server returns an HTTP error
r = requests.post(url=url, json=data)
r.raise_for_status()

# Save the JSON output
print(r, "Saving to output.json:\n", r.text[:200], "...")
Path(output_file).write_text(r.text)
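
The request in the script above follows a fixed shape, so it can be convenient to build it from a function when submitting several complexes. The sketch below only mirrors the fields shown in that script. `build_request` and `query_only_msa_csv` are hypothetical helper names, the chain-ID assignment (A for the protein, B onward for DNA) is an illustrative choice, and the query-only MSA simply reuses the `key,sequence` header and `-1` key from the example without claiming anything about other key values.

```python
def query_only_msa_csv(sequence: str) -> str:
    """Build the single-row MSA CSV used in the example: header plus query."""
    return f"key,sequence\n-1,{sequence}"


def build_request(request_id, protein_sequence, dna_sequences,
                  output_format="pdb"):
    """Assemble a request dict with one protein chain and any DNA chains."""
    molecules = [
        {
            "type": "protein",
            "id": "A",
            "sequence": protein_sequence,
            "msa": {
                "main_db": {
                    "csv": {
                        "alignment": query_only_msa_csv(protein_sequence),
                        "format": "csv",
                    }
                }
            },
        }
    ]
    for offset, dna in enumerate(dna_sequences):
        # Reject anything outside the four-letter DNA alphabet early.
        if not dna or set(dna) - set("ACGT"):
            raise ValueError(f"not a plain DNA sequence: {dna!r}")
        molecules.append(
            {"type": "dna", "id": chr(ord("B") + offset), "sequence": dna}
        )
    return {
        "request_id": request_id,
        "inputs": [
            {
                "input_id": request_id,
                "molecules": molecules,
                "output_format": output_format,
            }
        ],
    }
```

Calling `build_request("5GNJ", protein_sequence, [dna_sequence_b, dna_sequence_c])` reproduces the `data` dictionary from the script, which can then be passed to `requests.post(url, json=...)` as before.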

Predictions include 3D coordinates (PDB/mmCIF) and confidence metrics such as pLDDT, pTM, and ipTM, all delivered in seconds on NVIDIA H100 Tensor Core GPUs.
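
As a rough way to inspect per-chain confidence, the sketch below averages the B-factor column of a PDB string by chain. It rests on two assumptions that this post does not confirm: that you have already pulled the PDB text out of the JSON response (the response field names are not shown here), and that the returned PDB stores per-atom pLDDT in the B-factor column, as is common for AlphaFold-family outputs. `mean_bfactor_by_chain` is a hypothetical helper name.

```python
from collections import defaultdict


def mean_bfactor_by_chain(pdb_text: str) -> dict:
    """Average the B-factor column (PDB columns 61-66) for each chain ID."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in pdb_text.splitlines():
        # Only coordinate records carry a B-factor field.
        if not line.startswith(("ATOM  ", "HETATM")) or len(line) < 66:
            continue
        try:
            value = float(line[60:66])
        except ValueError:
            continue  # Skip malformed records rather than guessing.
        chain_id = line[21]  # PDB column 22
        totals[chain_id] += value
        counts[chain_id] += 1
    return {chain: totals[chain] / counts[chain] for chain in totals}
```

If the assumption about the B-factor column holds, a low average on the DNA chains relative to the protein chain would be a hint to look more closely at the interface before trusting the model.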

A new open standard for protein structure prediction

The OpenFold Consortium, an industry-led coalition including Bayer, Bristol Myers Squibb, Johnson & Johnson, Novo Nordisk, Outpace Bio, and others, has been instrumental in advancing open, reproducible modeling systems.

OpenFold3 represents the consortium’s most important milestone yet. The model extends structure prediction to multimers, protein–DNA/RNA complexes, and ligand-bound assemblies, achieving accuracy that meets or exceeds leading open-source models. 

Notably, OpenFold3 reaches parity with AlphaFold3 performance on protein–nucleic acid benchmarks, an area where earlier models have traditionally lagged. It is also classified as a Class 1 open-source system under the Linux Foundation open model definitions, ensuring full transparency and reproducibility.

Open science meets enterprise reliability

OpenFold3 is optimized for the NVIDIA accelerated AI computing stack, including:

  • cuEquivariance: Physics-aware acceleration for 3D symmetry operations.
  • MMseqs2-GPU: Compatibility with this GPU-native tool for multiple sequence alignment.
  • NVIDIA FLARE: Compatibility with federated learning for cross-institutional fine-tuning without data sharing.

Together, these integrations make OpenFold3 NIM both developer-accessible and enterprise-deployable, a drop-in service for on-prem, hybrid, and cloud environments. NVIDIA TensorRT enables up to 1.8x faster inference for large multimers and nucleic acid complexes.

OpenFold3 has been validated in secure federated workflows by Apheris and SandboxAQ, demonstrating its ability to scale across global pharma R&D environments. Federated pipelines enable partners to fine-tune on proprietary data, such as antibody–antigen complexes or RNA–ligand assemblies, without moving datasets across institutional boundaries.

And because OpenFold3 is a Class 1 open system according to the Linux Foundation open model definitions, the software and consortium benefit from a rapidly growing ecosystem of contributors and benchmarks. This supports continuous improvement and long-term reliability.

With NVIDIA FLARE integration, organizations can train OpenFold3 collaboratively across multiple sites, such as pharma partners, research consortia, and hospitals, without sharing sensitive data.

This approach supports regulatory compliance (GDPR and HIPAA, for instance) while unlocking improvements to models from diverse datasets that might otherwise remain siloed.
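
To make the idea of training without sharing data concrete, here is a toy sketch of federated averaging, one common aggregation scheme in federated learning. It is not the NVIDIA FLARE API and this post does not state which aggregation OpenFold3 workflows use; `federated_average` and the parameter layout are hypothetical, chosen only to show that sites exchange model updates and example counts, never the underlying datasets.

```python
def federated_average(site_updates):
    """Combine per-site parameters, weighted by each site's example count.

    `site_updates` is a list of (num_examples, params) pairs, where
    `params` maps a parameter name to a list of floats. Only these
    numbers leave a site; the training data stays local.
    """
    total = sum(num_examples for num_examples, _ in site_updates)
    if total <= 0:
        raise ValueError("at least one site must contribute examples")

    reference = site_updates[0][1]
    averaged = {}
    for name, values in reference.items():
        averaged[name] = [
            sum(n * params[name][i] for n, params in site_updates) / total
            for i in range(len(values))
        ]
    return averaged
```

For example, a site with 1 example reporting `{"w": [0.0, 2.0]}` and a site with 3 examples reporting `{"w": [4.0, 2.0]}` would average to `{"w": [3.0, 2.0]}`, so the larger site has proportionally more influence on the shared model.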

Building the future of open protein AI

OpenFold3 is more than a model. It’s a foundation for the next decade of protein AI. It reflects the convergence of more than 40 institutions in the OpenFold Consortium, open source science, accelerated computing, and federated collaboration, ensuring that the tools used by global researchers also meet enterprise reliability and security standards.

Acknowledgments

Special thanks to the OpenFold Consortium and partners, including SandboxAQ and Apheris, for their collaboration in advancing open, accelerated AI for molecular science.


