In the last few months, many new real-time speech models have been released, and entire companies have been founded around both open and closed source models. To name a few milestones:
- OpenAI and Google released their live multimodal APIs for ChatGPT and Gemini. OpenAI even went so far as to release a 1-800-ChatGPT phone number!
- Kyutai released Moshi, a completely open-source audio-to-audio LLM. Alibaba released Qwen2-Audio and Fixie.ai released Ultravox – two open-source LLMs that natively understand audio.
- ElevenLabs raised $180m in their Series C.
Despite this explosion on the model and funding side, it’s still difficult to build real-time AI applications that stream audio and video, especially in Python.
- ML engineers may not have experience with the technologies needed to build real-time applications, such as WebRTC.
- Even code assistant tools like Cursor and Copilot struggle to write Python code that supports real-time audio/video applications. I know from experience!
That’s why we’re excited to announce FastRTC, the real-time communication library for Python. The library is designed to make it super easy to build real-time audio and video AI applications entirely in Python!
In this blog post, we’ll walk through the basics of FastRTC by building real-time audio applications. By the end, you’ll understand the core features of FastRTC:
- 🗣️ Automatic Voice Detection and Turn Taking built-in, so you only need to worry about the logic for responding to the user.
- 💻 Automatic UI – Built-in WebRTC-enabled Gradio UI for testing (or deploying to production!).
- 📞 Call via Phone – Use fastphone() to get a FREE phone number to call into your audio stream (HF Token required. Increased limits for PRO accounts).
- ⚡️ WebRTC and Websocket support.
- 💪 Customizable – You can mount the stream to any FastAPI app so you can serve a custom UI or deploy beyond Gradio.
- 🧰 Lots of utilities for text-to-speech, speech-to-text, and stop word detection to get you started.
Let’s dive in.
Getting Started
We’ll start by building the “hello world” of real-time audio: echoing back what the user says. In FastRTC, this is as simple as:
```python
from fastrtc import Stream, ReplyOnPause
import numpy as np

def echo(audio: tuple[int, np.ndarray]) -> tuple[int, np.ndarray]:
    yield audio

stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch()
```
Let’s break it down:
- The ReplyOnPause class will handle the voice detection and turn taking for you. You only need to worry about the logic for responding to the user. Any generator that returns a tuple of audio (represented as (sample_rate, audio_data)) will work.
- The Stream class will build a Gradio UI for you to quickly test out your stream. Once you have finished prototyping, you can deploy your Stream as a production-ready FastAPI app in a single line of code, stream.mount(app), where app is a FastAPI app (see the sketch after this list).
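Here is a minimal sketch of that deployment path (the FastAPI app object and the uvicorn command are illustrative additions, not part of FastRTC itself):

```python
from fastapi import FastAPI
from fastrtc import Stream, ReplyOnPause

def echo(audio):
    # Echo the caller's audio straight back once they pause
    yield audio

stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")

app = FastAPI()
stream.mount(app)  # adds the stream's real-time endpoints to your FastAPI app

# Serve it like any other FastAPI app, e.g.: uvicorn main:app
```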
Here it is in action:
Leveling-Up: LLM Voice Chat
The next level is to use an LLM to respond to the user. FastRTC comes with built-in speech-to-text and text-to-speech capabilities, so working with LLMs is really easy. Let’s change our echo function accordingly:
```python
import os

from fastrtc import (ReplyOnPause, Stream, get_stt_model, get_tts_model)
from openai import OpenAI

sambanova_client = OpenAI(
    api_key=os.getenv("SAMBANOVA_API_KEY"), base_url="https://api.sambanova.ai/v1"
)
stt_model = get_stt_model()
tts_model = get_tts_model()

def echo(audio):
    prompt = stt_model.stt(audio)
    response = sambanova_client.chat.completions.create(
        model="Meta-Llama-3.2-3B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,
    )
    prompt = response.choices[0].message.content
    for audio_chunk in tts_model.stream_tts_sync(prompt):
        yield audio_chunk

stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch()
```
We’re using the SambaNova API because it’s fast. get_stt_model() will fetch Moonshine Base and get_tts_model() will fetch Kokoro from the Hub, both of which have been further optimized for on-device CPU inference. But you can use any LLM/text-to-speech/speech-to-text API, or even a speech-to-speech model. Bring the tools you love – FastRTC just handles the real-time communication layer.
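As a rough illustration of that flexibility, here is a sketch that reuses the stt_model and tts_model from above and swaps the LLM for any other OpenAI-compatible endpoint (the OPENAI_API_KEY variable and the gpt-4o-mini model name are placeholders, not something FastRTC requires):

```python
import os

from openai import OpenAI

# Placeholder alternative provider: only the client construction changes.
llm_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def echo(audio):
    prompt = stt_model.stt(audio)  # built-in speech-to-text (Moonshine Base)
    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,
    )
    reply = response.choices[0].message.content
    for chunk in tts_model.stream_tts_sync(reply):  # built-in text-to-speech (Kokoro)
        yield chunk
```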
Bonus: Call via Phone
If instead of stream.ui.launch(), you call stream.fastphone(), you’ll get a free phone number to call into your stream. Note, a Hugging Face token is required. Increased limits for PRO accounts.
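A minimal sketch, reusing the stream defined above:

```python
# Instead of launching the Gradio UI, request a temporary phone number
# that connects callers to this stream (a Hugging Face token is required).
stream.fastphone()
```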
You’ll see something like this in your terminal:
```
INFO:  Your FastPhone is now live! Call +1 877-713-4471 and use code 530574 to connect to your stream.
INFO:  You have 30:00 minutes remaining in your quota (Resetting on 2025-03-23)
```
You can then call the number and it will connect you to your stream!
Next Steps
- Read the docs to learn more about the basics of FastRTC.
- The best way to start building is by checking out the cookbook. Learn how to integrate with popular LLM providers (including OpenAI and Gemini’s real-time APIs), integrate your stream with a FastAPI app and do a custom deployment, return additional data from your handler, do video processing, and more!
- ⭐️ Star the repo and file bug reports and issues!
- Follow the FastRTC Org on Hugging Face for updates and check out deployed examples!
Thanks for trying out FastRTC!
