Designing Protein Binders Using the Generative Model Proteina-Complexa



Developing new protein-based therapies and catalysts involves the difficult task of designing protein binders: proteins that bind to a target protein or small molecule. The search space of possible amino acid sequences and resulting 3D protein structures for a designed binder is vast, and achieving strong, specific binding requires careful optimization of the interactions between the binder and the target.

To address these challenges, NVIDIA has released Proteina-Complexa, a generative model that designs de novo protein binders and enzymes.

In this post, we detail the key technologies behind Proteina-Complexa, explore primary use cases, and highlight the extensive experimental validation of generated protein binders. We also provide a step-by-step guide for using the command-line interface to generate your own binders.

Key technologies in Proteina-Complexa

Proteina-Complexa performance relies on three distinct technical components: the base generative model, the training datasets, and the integration of inference-time compute scaling.

Built on top of the La-Proteina model, Proteina-Complexa uses a partially latent flow-matching framework to generate both the fully atomistic binder structure (protein backbone and side chains) and the corresponding amino acid sequence, an approach called co-design. In this approach, backbone alpha carbon atoms are explicitly modeled in 3D Cartesian space, while all other atoms (side-chain and non-alpha-carbon backbone atoms) and the amino acid sequence are compressed into a learned latent space through an autoencoder. This balances atomic fidelity with computational tractability.

Historically, computational workflows have approached binder design as a fragmented process, often relying on separate models for generating the backbone and the sequence. While these modular methods can yield strong results, co-design enables reasoning at an atomistic level. By generating the amino acid sequence and the fully atomistic structure (backbone and side chains) concurrently, Proteina-Complexa ensures that the chemical identities and 3D geometry are tightly coupled. This integrated generation allows for the design of precise, high-affinity interfaces that are inherently optimized for folding and synthesis.
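To make the partially latent idea concrete, here is a toy sketch in plain Python. It is not the Proteina-Complexa implementation: the sizes, the `toy_target` point, and the hand-written velocity function are all assumptions chosen for illustration, and a real model would predict the velocity with a trained network conditioned on the target. The sketch only shows the general shape of flow-matching sampling, in which a state made of explicit alpha carbon coordinates plus per-residue latent vectors is moved from noise toward a sample by integrating a velocity field from t=0 to t=1.

```python
import random


def euler_flow(x0, velocity, n_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Stand-in for a trained network: exact velocity of a straight path to `target`."""
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


def split_state(flat, n_residues, latent_dim):
    """Split a flat state into explicit C-alpha coordinates and per-residue latents."""
    n_coord = 3 * n_residues
    ca = [flat[3 * i:3 * i + 3] for i in range(n_residues)]
    latents = [flat[n_coord + latent_dim * i:n_coord + latent_dim * (i + 1)]
               for i in range(n_residues)]
    return ca, latents


# Toy sizes (assumptions); the real latent dimension is not stated in this post.
N_RESIDUES, LATENT_DIM = 4, 2
rng = random.Random(0)
size = N_RESIDUES * (3 + LATENT_DIM)
noise = [rng.gauss(0.0, 1.0) for _ in range(size)]   # start from Gaussian noise
toy_target = [float(i) for i in range(size)]         # made-up "data" point

final = euler_flow(noise, make_toy_velocity(toy_target), n_steps=20)
ca_coords, latents = split_state(final, N_RESIDUES, LATENT_DIM)
```

In this toy setup the coordinates and the latents are integrated jointly, which mirrors the co-design idea at a very small scale: one sampling trajectory yields both the explicit backbone positions and the compressed representation that a decoder would turn into side chains and sequence.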

Training a generative model for protein binder design requires a considerable amount of structural data on binders and their targets. Proteina-Complexa was trained on over 1 million curated, high-quality experimental and predicted structures from the Protein Data Bank (PDB), AlphaFold Protein Structure Database, PLINDER, and the recently published Teddymer dataset.

The Proteina-Complexa model also introduces a new approach for designing binders, unifying a generative model that leverages knowledge about protein binder structures with inference-time compute scaling to iteratively optimize designs during inference. During binder generation, “reasoning” search algorithms (for example, beam search and best-of-N) evaluate and refine candidates at intermediate steps, investing additional compute on difficult targets while retaining the efficiency that comes from the model's learned knowledge of protein structure.

This new unified approach increases the computational efficiency of the model and the quality of generated binders, measured by in silico success metrics and experimentally validated binding to the target.
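The two search strategies named above can be sketched generically. The following is a minimal illustration, not the search code shipped with Proteina-Complexa: `toy_generate`, `toy_extend`, and `toy_reward` are hypothetical stand-ins for a stepwise generator and a scoring function, and the parameter names are assumptions. Best-of-N scores only finished candidates, while beam search prunes partial candidates at every step.

```python
import random


def best_of_n(generate, reward, n, rng):
    """Draw n complete candidates and keep the one with the highest reward."""
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=reward)


def beam_search(initial, extend, reward, n_steps, beam_width, branch, rng):
    """Expand each kept candidate `branch` times per step, then keep the top few."""
    beam = [initial]
    for _ in range(n_steps):
        expanded = [extend(state, rng) for state in beam for _ in range(branch)]
        expanded.sort(key=reward, reverse=True)
        beam = expanded[:beam_width]
    return beam[0]


# Hypothetical toy problem: a "design" is a tuple of numbers, built one step at a time.
def toy_extend(state, rng):
    return state + (rng.uniform(-1.0, 1.0),)


def toy_generate(rng, n_steps=5):
    state = ()
    for _ in range(n_steps):
        state = toy_extend(state, rng)
    return state


def toy_reward(state):
    return sum(state)


single = toy_generate(random.Random(0))
best = best_of_n(toy_generate, toy_reward, n=8, rng=random.Random(0))
beamed = beam_search((), toy_extend, toy_reward, n_steps=5,
                     beam_width=2, branch=4, rng=random.Random(0))
```

Raising `n`, `beam_width`, or `branch` spends more compute per design, which is the sense in which harder targets can be given a larger search budget.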

Use cases for Proteina-Complexa

Proteina-Complexa use cases include protein binders for protein targets and small molecule targets, as well as enzyme design.

Protein binders for protein targets

You can use Proteina-Complexa to design de novo protein binders against disease-relevant targets across indications including oncology, immunology, and neurology. Proteina-Complexa generates binders with full atomic detail (protein backbone, side chains, and amino acid sequence), enabling direct handoff to experimental testing without intermediate modeling steps.

This use case has been experimentally validated with collaborators from Manifold Bio, Novo Nordisk, Viva Biotech, and Duke University.

Figure 2 shows the following binders generated by Proteina-Complexa:

  • Difficult TNF-alpha three-chain protein target (a), surface representation with generated binder in purple
  • Claudin-1 protein target (b) in gray surface representation; zoom-in shows red interface hydrogen bonds between target and binder
  • Small molecule target (c) in gray with generated binder in purple/gold

Protein binders for small molecule targets

You can use Proteina-Complexa to design proteins that bind to specific small molecules. Applications include targeted drug delivery, biosensors, and prodrug activation.

This use case has been experimentally validated in collaboration with the University of Cambridge.

Enzyme design

Given a selected enzyme active site (the 3D arrangement of amino acid residues responsible for catalyzing a chemical reaction), you can use Proteina-Complexa to generate structurally diverse proteins that incorporate the active site structure. This capability enables de novo enzyme design for industrial biocatalysis, environmental remediation, and synthetic biology applications.

Experimental validation

The NVIDIA team validated the de novo proteins generated by Proteina-Complexa in extensive wet lab experiments in collaboration with multiple external partners. Overall, millions of initial in silico candidates were generated by Proteina-Complexa. After filtering, around 1 million binder candidates were experimentally tested against 133 distinct protein targets, ranging from well-established benchmark targets to therapeutically relevant targets without previously reported binders.

Large-scale experiments leveraging state-of-the-art multiplexed phage screening technology were run to measure binding hit rates of all candidates against all targets, representing one of the largest binder design benchmarks to date.

Moreover, using surface plasmon resonance and western blotting, quantitative binding kinetics were measured for selected targets of interest. The generated proteins expressed well, demonstrating high folding stability, and Proteina-Complexa was able to produce binders against most targets, including binders with nanomolar and picomolar affinities. For example, Proteina-Complexa generated strong binders against the Activin Receptor Type-2A, a promising therapeutic target in disorders characterized by muscle wasting, for which no similar mini-binders have been reported in the literature.

Beyond protein targets, the team pushed the boundaries of Proteina-Complexa by designing proteins that bind to sugar molecules on the surface of red blood cells. Designing proteins that stick to sugars is a significant challenge because carbohydrates are small, highly polar, and covered in a dense layer of water that typically prevents a protein from forming a stable attachment.

While existing AI tools primarily succeed on hydrophobic (water-repelling) surfaces, our system generated 24 candidates for this difficult sugar-binding task. In laboratory assays, 4 of these designs showed strong agglutination signals, clumping red blood cells together more efficiently than the natural proteins, called lectins, currently used in laboratories.

Additional bio-layer interferometry unambiguously confirmed the direct binding of a lead candidate to the carbohydrate target. By successfully binding to these highly polar targets, Proteina-Complexa has demonstrated that it can tackle complex medical targets that were previously considered nearly impossible to design for.

To learn more, see Latent Generative Search unlocks de novo Design of Untapped Biomolecular Interactions at Scale.

How to generate your own protein binders using Proteina-Complexa

The following examples use the Proteina-Complexa command-line interface.

Prerequisites

  • Familiarity with Python, YAML configuration files, and basic protein structure concepts
  • Access to at least one NVIDIA A100, H100, or newer GPU

Installation and setup

Step 1: Download the code

# Clone the repository
git clone https://github.com/NVIDIA-Digital-Bio/Proteina-Complexa
cd Proteina-Complexa

Step 2: Set up the environment

Using the UV package manager:

# Create a virtual environment and install packages
./env/build_uv_env.sh
source .venv/bin/activate

# Create the environment configuration file (.env) 
complexa init

Edit the environment configuration file (.env) and set the appropriate environment variable paths:

LOCAL_CODE_PATH=/path/to/Proteina-Complexa/ 
LOCAL_DATA_PATH=/path/to/Proteina-Complexa/assets 
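Before loading the configuration, it can help to confirm that both variables are actually set. The snippet below is a small sketch, not part of the Proteina-Complexa tooling: `parse_env` and `missing_keys` are hypothetical helpers, the parser handles only plain `KEY=VALUE` lines (no quoting or `export` prefixes), and a real `.env` file may define more variables than the two shown here.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or set to an empty string."""
    return [key for key in required if not values.get(key)]


REQUIRED = ["LOCAL_CODE_PATH", "LOCAL_DATA_PATH"]

# Example content with one path deliberately left empty.
example = """
# example .env content
LOCAL_CODE_PATH=/path/to/Proteina-Complexa/
LOCAL_DATA_PATH=
"""

parsed = parse_env(example)
problems = missing_keys(parsed, REQUIRED)
print(problems)
```

To check a real file, pass `open(".env").read()` to `parse_env` instead of the inline example.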

Load the environment configuration:

# Create the shell setup script
complexa init uv

# Load the environment variables into the current session
source env.sh

Step 3: Download model checkpoints

# Download Proteina-Complexa model checkpoints
complexa download --complexa-all

# Download community model checkpoints
complexa download --all

Step 4: Validate your setup

complexa validate design \
    configs/search_binder_local_pipeline.yaml

How to design a binder for a protein target

This example designs binders for PD-L1, a validated therapeutic target.

Step 1: Add the target protein, target information, and binder length

Note that this step is not required for the PD-L1 example because the target protein has already been added.

complexa target add pdl1 \
      --target-path /path/to/your/pdl1.pdb \
      --target-input A1-150 \
      --hotspot-residues A45 A67 A89 \
      --binder-length 60 120
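The `--target-input` and `--hotspot-residues` values appear to follow a chain-letter-plus-residue-number convention. Assuming that reading (the post does not spell out the format), a quick check that every hotspot falls inside the selected range might look like the sketch below. The helper names are hypothetical and are not part of the complexa CLI.

```python
import re

_RANGE = re.compile(r"([A-Za-z])(\d+)-(\d+)")
_RESIDUE = re.compile(r"([A-Za-z])(\d+)")


def parse_range(spec):
    """Parse a spec such as 'A1-150' into (chain, first_residue, last_residue)."""
    match = _RANGE.fullmatch(spec)
    if match is None:
        raise ValueError(f"unrecognized range: {spec!r}")
    chain, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"range start exceeds end: {spec!r}")
    return chain, start, end


def parse_residue(spec):
    """Parse a spec such as 'A45' into (chain, residue_number)."""
    match = _RESIDUE.fullmatch(spec)
    if match is None:
        raise ValueError(f"unrecognized residue: {spec!r}")
    return match.group(1), int(match.group(2))


def hotspots_outside(target_input, hotspots):
    """Return the hotspot specs that are not covered by the target range."""
    chain, start, end = parse_range(target_input)
    outside = []
    for spec in hotspots:
        h_chain, h_number = parse_residue(spec)
        if h_chain != chain or not (start <= h_number <= end):
            outside.append(spec)
    return outside


print(hotspots_outside("A1-150", ["A45", "A67", "A89"]))
```

An empty list means every hotspot lies within the cropped target, under the assumed format.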

Step 2: Confirm that the target was added successfully

complexa target list
complexa target show 02_PDL1

Step 3: Run the complete design pipeline: generate -> filter -> evaluate -> analyze

complexa design configs/search_binder_local_pipeline.yaml \
   ++run_name=pdl1_design \
   ++generation.task_name=02_PDL1

Step 4: Monitor the pipeline progress

The complexa design command runs all 4 pipeline stages sequentially. The ++key=value syntax uses Hydra to override YAML configuration parameters on the command line.
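In Hydra, a `++key=value` override sets the key if it already exists in the configuration and adds it otherwise, with dots selecting nested sections. The sketch below mimics that behavior on a plain dictionary to show what the two overrides do. It is a simplified illustration rather than Hydra's implementation: values stay as strings, and the starting configuration (including the `nsamples` key) is made up.

```python
def apply_override(config, override):
    """Apply one '++dotted.key=value' override to a nested dict, in place."""
    if not override.startswith("++"):
        raise ValueError(f"expected a '++key=value' override: {override!r}")
    key, sep, value = override[2:].partition("=")
    if not sep or not key:
        raise ValueError(f"malformed override: {override!r}")
    *parents, leaf = key.split(".")
    node = config
    for name in parents:
        # Create intermediate sections on demand, as '++' allows new keys.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Hypothetical starting configuration, standing in for the loaded YAML file.
config = {
    "run_name": "default",
    "generation": {"task_name": "none", "nsamples": 4},
}

for item in ["++run_name=pdl1_design", "++generation.task_name=02_PDL1"]:
    apply_override(config, item)

print(config)
```

Only the named keys change; every other setting from the YAML file is left as it was.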

In this case, the pipeline generates candidate binders using Proteina-Complexa, filters them by AlphaFold2 reward scores, evaluates the top candidates by redesigning sequences with ProteinMPNN and refolding with structure prediction, and outputs a summary CSV with all metrics.

You can also run each stage individually:

complexa generate configs/search_binder_local_pipeline.yaml  # Generate binder structures
complexa filter configs/search_binder_local_pipeline.yaml    # Filter by reward scores
complexa evaluate configs/search_binder_local_pipeline.yaml  # Evaluate with refolding
complexa analyze configs/search_binder_local_pipeline.yaml   # Aggregate results
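Once the analyze stage has written its summary CSV, a natural next step is to rank designs by a metric of interest. The sketch below shows one way to do that with the standard library. The column names `design_id` and `toy_score` and the three rows are invented for illustration; check the header of your own summary file for the actual metric names.

```python
import csv
import io


def top_candidates(csv_text, metric, k, higher_is_better=True):
    """Return the k rows with the best value in the given metric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: float(row[metric]), reverse=higher_is_better)
    return rows[:k]


# Made-up rows standing in for a pipeline summary file.
toy_csv = """design_id,toy_score
design_a,0.42
design_b,0.91
design_c,0.67
"""

best_rows = top_candidates(toy_csv, "toy_score", k=2)
for row in best_rows:
    print(row["design_id"], row["toy_score"])
```

For metrics where lower is better (for example, an error or distance measure), pass `higher_is_better=False`.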

How to design a binder for a small molecule target

The ligand binder workflow uses the same four-stage pipeline with a different configuration file that points to the ligand-target model checkpoint. This example designs binders for S-adenosylmethionine (SAM), a small molecule bound to CntL, an aminobutyrate transferase (PDB entry 7C7M).

Step 1: Add the small molecule target

Note that this step is not required for the SAM example because the target ligand has already been added.

complexa target add sam \
    --target-path /path/to/your/7C7M.pdb \
    --ligand SAM \
    --binder-length 100 \
    --dict configs/targets/ligand_targets_dict.yaml

Step 2: Confirm that the target was added successfully

# List all ligand targets in ligand_targets_dict.yaml
complexa target list --dict configs/targets/ligand_targets_dict.yaml

# Show details for the ligand in 7C7M
complexa target show 42_7C7M_LIGAND --dict configs/targets/ligand_targets_dict.yaml

Step 3: Run the ligand binder design pipeline

complexa design configs/search_ligand_binder_local_pipeline.yaml \
   ++run_name=sam_design \
   ++generation.task_name=42_7C7M_LIGAND

The pipeline stages (generate, filter, evaluate, analyze) are identical to the protein target workflow. The only differences are the configuration file (which selects the ligand-target checkpoint) and the target specification format.

Note the following:

  • Proteina-Complexa is designed to run locally on a single-GPU or multi-GPU machine, as well as on a cluster of multiple machines.
  • Both Docker and UV-based virtual environments are supported.

Get started with protein binder design

Proteina-Complexa is a step forward in computational protein binder design, combining co-design of fully atomistic structures and sequences with inference-time compute to generate high-quality binders for protein and small molecule targets, while also enabling the precise scaffolding of enzyme active sites.

By releasing the source code, trained model checkpoints, datasets, and research papers detailing the innovations, we aim to offer a customizable foundation for researchers and developers building the next generation of protein-based therapeutics, catalysts, and biosensors.

Ready to get started?

  • Run inference: Generate high-quality, fully atomistic binders for your targets.
  • Train and fine-tune the model: Adapt the Proteina-Complexa model to your use cases.

Try these resources: 

We invite you to join our collaborators from Manifold Bio, Novo Nordisk, Viva Biotech, Duke University, the University of Cambridge, LMU Munich, and the University of Bonn in exploring the capabilities of Proteina-Complexa to generate protein binders, and more.

Acknowledgments

We would like to acknowledge the following people for their support and contributions to this post: Micha Livne, Tomas Geffner, Zhonglin Cao, Guoqing Zhou, Kushal Shah, Quiara Neam, Xi Chen, Tianjing Zhang, Pia Hardy, Alejandra Rico, Emine Kucukbenli, and Arash Vahdat.


