NVIDIA Open Sources Audio2Face Animation Model



By leveraging large language and speech models, generative AI is creating intelligent 3D avatars that can engage users in natural conversation, from video games to customer support. To make these characters truly lifelike, they need human-like expressions. NVIDIA Audio2Face accelerates the creation of realistic digital characters by providing real-time facial animation and lip-sync driven by generative AI.

Today, NVIDIA is open sourcing our Audio2Face technology to speed up adoption of AI-powered avatars in games and 3D applications.

Video 1. Demo of the NVIDIA Audio2Face 3.0 diffusion model in action

Audio2Face uses AI to generate realistic facial animations from audio input. It works by analyzing acoustic features like phonemes and intonation to create a stream of animation data, which is then mapped to a character’s facial poses. This data can be rendered offline for pre-scripted content or streamed in real time for dynamic, AI-driven characters, providing accurate lip-sync and emotional expressions.
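
To make that data flow concrete, here is a minimal Python sketch of the pipeline shape: audio frames in, per-frame facial pose weights out. The acoustic analysis and blendshape mapping below are deliberately simplistic stand-ins (a frame-energy heuristic driving a single hypothetical “jawOpen” weight), not the actual Audio2Face models or SDK API.

```python
# Conceptual sketch only: the real Audio2Face models analyze phonemes and
# intonation with deep networks. All names here are hypothetical.
import numpy as np

SAMPLE_RATE = 16_000      # audio sample rate (Hz)
FPS = 30                  # animation frames per second
NUM_BLENDSHAPES = 52      # e.g., an ARKit-style facial pose set

def audio_to_animation(audio: np.ndarray) -> np.ndarray:
    """Map mono audio to a (num_frames, NUM_BLENDSHAPES) stream of weights."""
    samples_per_frame = SAMPLE_RATE // FPS
    num_frames = len(audio) // samples_per_frame
    weights = np.zeros((num_frames, NUM_BLENDSHAPES), dtype=np.float32)
    for i in range(num_frames):
        frame = audio[i * samples_per_frame : (i + 1) * samples_per_frame]
        # Stand-in for the real acoustic analysis (phonemes, intonation):
        # here, frame energy drives a hypothetical "jawOpen" weight.
        energy = float(np.sqrt(np.mean(frame ** 2)))
        weights[i, 0] = min(1.0, energy * 10.0)
    return weights

# Offline use computes the whole clip at once; a real-time integration would
# run the same mapping per audio chunk and stream each frame to the renderer.
if __name__ == "__main__":
    audio = np.random.randn(SAMPLE_RATE * 2).astype(np.float32) * 0.1
    anim = audio_to_animation(audio)
    print(anim.shape)  # (60, 52) for 2 seconds at 30 FPS
```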

Figure 1. Speech audio and emotional triggers generate facial animations and lip-sync.

NVIDIA is open sourcing the Audio2Face models and SDK so every game and 3D application developer can build and deploy high-fidelity characters with state-of-the-art animations. We’re also open sourcing the Audio2Face training framework, so anyone can fine-tune and customize our existing models for specific use cases.

See the tables below for the full list of open source tools, and learn more at NVIDIA Developer.

  • Audio2Face SDK: Libraries and documentation for authoring and runtime facial animations on-device or in the cloud
  • Autodesk Maya plugin: Reference plugin (v2.0) with local execution that enables users to send audio inputs and receive facial animation for characters in Maya
  • Unreal Engine 5 plugin: UE5 plugin (v2.5) for UE 5.5 and 5.6 that enables users to send audio inputs and receive facial animation for characters in Unreal Engine 5
  • Audio2Face Training Framework: Framework (v1.0) to create Audio2Face models with your own data

Table 1. Audio2Face SDK and plugins
  • Audio2Face Training Sample Data: Example data to get started with the training framework
  • Audio2Face Models: Regression (v2.2) and diffusion (v3.0) models to generate lip-sync
  • Audio2Emotion Models: Production (v2.2) and experimental (v3.0) models to infer emotional state from audio

Table 2. Audio2Face models and training data

Open sourcing technology allows developers, students, and researchers to learn from and build upon state-of-the-art code. This creates a feedback loop where the community can add new features and optimize the technology for diverse use cases. We’re excited to make high-quality facial animation more accessible and can’t wait to see what the community creates with it. Join our NVIDIA Audio2Face developer community on Discord and share your latest work.

The industry-leading Audio2Face model is deployed widely across the gaming, media and entertainment, and customer support industries. Numerous ISVs and game developers, including Convai, Codemasters, GSC Game World, Inworld AI, NetEase, Reallusion, Perfect World Games, Streamlabs, and UneeQ Digital Humans, have integrated Audio2Face into their applications.

Video 2. NVIDIA Audio2Face technology in F1 25

Reallusion, which offers a platform for creators to build 3D characters, integrated Audio2Face into its suite of tools. “Audio2Face uses AI to create expressive, multilingual facial animation from audio,” said Elvis Huang, head of innovation at Reallusion, Inc. “Its seamless integration with Reallusion’s iClone, Character Creator, and iClone AI Assistant, plus advanced editing tools like face-key editing, face puppeteering, and AccuLip, make it easier than ever to produce high-quality character animation.”

Survios, developers of Alien: Rogue Incursion Evolved Edition, sped up their animation process, making it possible to deliver high-quality character experiences sooner. “By integrating Audio2Face into Evolved Edition, we streamlined the pipeline for lip-syncing and facial capture while ensuring a more immersive and authentic character experience for our players,” said Eugene Elkin, game director and lead engineer at Survios.

The Farm 51, creators of the Chernobylite game series, integrated Audio2Face into their latest game. “The integration of NVIDIA Audio2Face technology in Chernobylite 2: Exclusion Zone has been a game-changer for us,” said Wojciech Pazdur, creative director at The Farm 51. “It has allowed us to generate highly detailed facial animations directly from audio, saving countless hours of animation work. Ideas that were unimaginable in the original Chernobylite are now possible, which brings a whole new level of realism and immersion to the characters, making their performances feel more authentic than ever.”

Below are the other announcements for game developers released this month.

Latest updates to RTX Kit 

RTX Kit is our suite of neural rendering technologies to ray trace games with AI, render scenes with massive geometry, and create game characters with photorealistic visuals.

RTX Neural Texture Compression SDK dramatically reduces memory usage of high-quality textures without sacrificing quality and has received a number of improvements including:

  • Library optimizations for very large texture sets and improved performance with Cooperative Vectors on DX12
  • Expanded feature set for the rendering sample, improved performance and DLSS support
  • Command-Line Tool improvements when compressing and decompressing very large texture sets
  • New Intel Sponza scene, great for benchmarking

RTX Global Illumination SDK provides ray-traced indirect lighting solutions and has also received improvements:

  • Addition of VSync option to the pathtracer sample
  • Addition of cache visualization with material demodulation toggle
  • Spatially Hashed Radiance Cache (SHaRC) algorithm removes the compaction option, introduces optional material demodulation, an additional debug pass, and documentation updates

NVIDIA vGPU scales up the game development environment 

NVIDIA virtual GPU (vGPU) technology enables GPU sharing among multiple users in a virtualized environment, allowing scalable GPU resources to support game developers across the entire organization. Activision overhauled its global integration, delivery, and deployment pipeline with NVIDIA vGPU, replacing 100 legacy servers with just six RTX GPU-powered units. The results: 

  • 82% reduction in footprint
  • 72% drop in power usage
  • Over 250,000 tasks run every day across 3,000 developers and 500+ systems 

Video 3. Activision created a worldwide testing and deployment platform with NVIDIA vGPU 

By consolidating infrastructure and enabling dynamic GPU allocation, Activision built a scalable, automated testing platform that supports everything from multiplayer validation to visual regression and performance testing, accelerating iteration speed and raising code quality across the board.

Explore the Activision story to see how centralized GPU scheduling is redefining AAA development pipelines.

Graphics development and performance tuning sessions from SIGGRAPH 2025

NVIDIA hosted a variety of training sessions and technical presentations. Of particular interest to game developers were hands-on labs showcasing the latest advancements in the Nsight suite of graphics developer tools. Recordings of those sessions are now available to stream on NVIDIA On-Demand. 

Nsight Graphics in Action: Develop and Debug Modern Ray-Tracing Applications focuses on inspection and debugging of frames to identify and diagnose common rendering bugs and performance blockers, including use of the new Graphics Capture tool that provides expanded and modernized workflows. 

Nsight Graphics in Action: Optimize Shaders in Modern Ray-Tracing Applications is a deep dive into the GPU Trace Profiler, which lets you drill down into individual lines of shader code to find runtime execution bottlenecks. 

Optimize VRAM Management With NVIDIA Nsight Systems shows how to get a holistic view of application performance and resource utilization across both the CPU and GPU using traces that can be minutes long. Special emphasis is given to the new Graphics Hotspot Analysis tool, which converts raw timeline data into a web-based interface with easy-to-read summaries of concurrency analysis, frame stutters, and more.

Download Nsight Graphics and Nsight Systems to start optimizing your own games and graphics applications. 

What’s Next

If you weren’t able to catch our “Level up with NVIDIA” webinar episode this morning on RTX Mega Geometry in Unreal Engine 5.6, be sure to catch it on-demand here.

See our full list of game developer resources here and follow us to stay up to date with the latest NVIDIA game development news.


