Predicting where proteins are located inside a cell, a task known as subcellular localization, is critical in biology and drug discovery. The location of a protein is tightly linked to its function. Knowing whether a protein resides in the nucleus, cytoplasm, or cell membrane can unlock new insights into cellular processes and potential therapeutic targets.
This post explains how researchers can collaboratively train AI models to predict protein properties such as subcellular location—without moving sensitive data across institutions—using NVIDIA FLARE and the NVIDIA BioNeMo Framework.
How to fine-tune a model for subcellular localization
A new NVIDIA FLARE tutorial demonstrates how to fine-tune an ESM-2nv model to classify proteins by their subcellular localization. The ESM-2nv model learns from embeddings of protein sequences, leveraging datasets introduced in Light Attention Predicts Protein Location from the Language of Life.
We focus on subcellular localization prediction, with data formatted as FASTA files following the biotrainer standard. Each entry includes the sequence, the training/validation split, and the location class (one of 10, for example Nucleus or Cell_membrane).


A data sample in this FASTA format looks like this:
>Sequence1 TARGET=Cell_membrane SET=train VALIDATION=False
MMKTLSSGNCTLNVPAKNSYRMVVLGASRVGKSSIVSRFLNGRFEDQYTPTIEDFHRKVYNIHGDMYQLDILDTSGNHPFPAMRRLSILT
GDVFILVFSLDSRESFDEVKRLQKQILEVKSCLKNKTKEAAELPMVICGNKNDHSELCRQVPAMEAELLVSGDENCAYFEVSAKKNTNVNE
MFYVLFSMAKLPHEMSPALHHKISVQYGDAFHPRPFCMRRTKVAGAYGMVSPFARRPSVNSDLKYIKAKVLREGQARERDKCSIQ
Where:
- TARGET = subcellular location class
- SET = training versus test data
- VALIDATION = marks validation sequences
The dataset spans 10 location classes, making it an excellent real-world classification challenge.
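Headers in this format are straightforward to parse into per-sequence annotations. A minimal sketch (the function name `parse_biotrainer_header` is ours for illustration, not part of the biotrainer library):

```python
def parse_biotrainer_header(header: str) -> dict:
    """Parse a biotrainer-style FASTA header such as
    '>Sequence1 TARGET=Cell_membrane SET=train VALIDATION=False'
    into a dict of its KEY=VALUE annotations."""
    parts = header.lstrip(">").split()
    record = {"id": parts[0]}
    for token in parts[1:]:
        key, _, value = token.partition("=")
        record[key.lower()] = value
    return record

header = ">Sequence1 TARGET=Cell_membrane SET=train VALIDATION=False"
print(parse_biotrainer_header(header))
# {'id': 'Sequence1', 'target': 'Cell_membrane', 'set': 'train', 'validation': 'False'}
```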
How to use federated learning with BioNeMo protein language models
Running this example is refreshingly simple. With BioNeMo Framework v2.5 in Docker, you can spin up a JupyterLab environment directly and run the Federated Protein Property Prediction with BioNeMo tutorial notebook in your browser.
On top of the BioNeMo Framework, NVIDIA FLARE provides the federated training. Instead of pooling datasets from multiple sites, each participant trains locally and contributes only model updates. With FedAvg, those updates are aggregated centrally to form a shared global model—privacy preserved, collaboration enabled.
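At its core, FedAvg is a sample-weighted average of each client's parameters. FLARE handles the real orchestration; the sketch below only illustrates the aggregation math for a single round, using toy NumPy tensors:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate per-client parameter dicts into a global model,
    weighting each client by its number of local training samples."""
    total = sum(client_sizes)
    return {
        name: sum(w[name] * (n / total)
                  for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }

# Two toy clients, each holding one parameter tensor
clients = [{"layer.weight": np.array([1.0, 2.0])},
           {"layer.weight": np.array([3.0, 4.0])}]
global_model = fedavg(clients, client_sizes=[100, 300])
print(global_model["layer.weight"])  # [2.5 3.5]
```

The client with 300 samples contributes three times the weight of the client with 100, which is exactly why FedAvg copes with unevenly sized sites.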
Training and visualization
For this demonstration, the team fine-tuned the 650-million-parameter ESM-2nv model, pretrained in BioNeMo. This larger model offers a strong balance between predictive accuracy and computational efficiency, making it well suited for federated training scenarios.
Key steps in the workflow include:
- Data splitting: Heterogeneous sampling mimics the variability expected across real-world institutions, so the federated setup more closely reflects practical deployment conditions.
- Federated averaging (FedAvg): Local client updates are aggregated into a shared global model, enabling collaboration without exposing raw protein sequence data.
- Visualization with TensorBoard: Researchers can monitor both local and federated training runs in real time. Continuous server-side metrics provide insight into how the global model evolves with each communication round.
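Heterogeneous splits of this kind are commonly generated by drawing per-class site proportions from a Dirichlet distribution, where a lower alpha produces more label skew. We don't know the tutorial's exact splitting code, but the idea can be sketched as follows, with alpha = 1.0 matching the experiment in the results:

```python
import numpy as np

def dirichlet_split(labels, num_sites, alpha, seed=0):
    """Partition sample indices across sites, with the share of each
    class per site drawn from a Dirichlet(alpha) distribution."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    site_indices = [[] for _ in range(num_sites)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.where(labels == cls)[0])
        props = rng.dirichlet([alpha] * num_sites)  # class share per site
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for site, chunk in enumerate(np.split(idx, cuts)):
            site_indices[site].extend(chunk.tolist())
    return site_indices

labels = [i % 10 for i in range(6916)]  # toy labels, 10 location classes
splits = dirichlet_split(labels, num_sites=3, alpha=1.0)
print([len(s) for s in splits])  # three uneven site datasets
```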


Results
The team compared local training at each site against federated training (FedAvg) under heterogeneous data conditions (alpha = 1.0).
| Client | # Samples | Local accuracy (%) | FedAvg accuracy (%) |
|--------|-----------|--------------------|---------------------|
| Site-1 | 1,844 | 78.2 | 81.8 |
| Site-2 | 2,921 | 78.9 | 81.3 |
| Site-3 | 2,151 | 79.2 | 82.1 |
| Average | — | 78.8 | 81.7 |
These results highlight how federated learning leverages knowledge across institutions to build a stronger model than any single site could achieve alone.


Advantages of using BioNeMo and FLARE for protein prediction
The benefits of using BioNeMo and FLARE extend beyond predicting where proteins localize in a cell. This approach helps the community build AI for science together. With BioNeMo plus FLARE:
- Federated learning strengthens protein property prediction: Pool collective intelligence without sharing raw data.
- Collaboration benefits everyone: Each site contributes to a stronger model while keeping sensitive data local.
- BioNeMo Framework accelerates discovery: Access state-of-the-art tools for biological sequence analysis.
Get started with federated protein prediction
Federated protein property prediction with NVIDIA BioNeMo and NVIDIA FLARE is part of a powerful new paradigm. Combining the language of life (protein sequences) with federated AI workflows can accelerate discoveries in drug development, healthcare, and biotech—all while respecting data privacy.
The future of life sciences AI isn't siloed—it's collaborative. And with FLARE and BioNeMo, that future is already here. Visit the NVIDIA/NVFlare GitHub repo to get started with Federated Protein Property Prediction with BioNeMo and to see more advanced examples.
