dottxt and Hugging Face are excited to announce that we have been collaborating on outlines-core, a Rust port of outlines’s core algorithms for structured generation. On top of getting reliable output from LLMs with outlines, this Rust port offers several further benefits to users of outlines:
- Speed: Users can expect to see a 2x improvement in index compilation.
- Separation of Concerns: It’s now easier to include structured generation in other libraries; outlines-core is very lightweight.
- Portability: Having the core algorithms in Rust allows bindings for languages other than Python.
These improvements should not only benefit existing outlines users, but also dramatically increase the ways users can incorporate structured generation into their LLM workflows. outlines-core is now public, integrated in outlines, and version 0.1.0 of the Python bindings is out. You can find the repo here.
A quick primer on structured generation 🧑‍🎓
How it works
Structured generation means that your LLM’s output is guaranteed to follow a desired format. This could be JSON, a Pydantic model, a regular expression or a context-free grammar. The key is that structured generation forbids the ‘wrong’ tokens from being generated.
Let’s take an extremely simple example: the LLM should generate a boolean, “true” or “false”, and nothing more. For the sake of illustration, let’s say that LLMs generate characters instead of tokens. The first character is " with certainty, so we can just skip the forward pass. For the second, we don’t need to sample from all possible characters: the LLM should only choose between t and f.
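Restricting the sampling step can be sketched in a few lines: mask the logits of every disallowed character before applying softmax. This is a toy, character-level illustration (real implementations work on the model’s token vocabulary), with a made-up vocabulary:

```python
import numpy as np

# Toy vocabulary of single characters (real models use tokens).
vocab = ['"', 't', 'f', 'r', 'u', 'e', 'a', 'l', 's', 'x']

def mask_logits(logits, allowed):
    """Set the logits of disallowed characters to -inf so that
    softmax assigns them exactly zero probability."""
    masked = np.full_like(logits, -np.inf)
    for ch in allowed:
        idx = vocab.index(ch)
        masked[idx] = logits[idx]
    return masked

logits = np.random.randn(len(vocab))
# After the opening quote, only 't' or 'f' may follow.
masked = mask_logits(logits, {"t", "f"})
probs = np.exp(masked - masked.max())
probs /= probs.sum()
```

After masking, all probability mass sits on t and f, so whatever sampling strategy is used can only produce a valid continuation.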

After that, regardless of the path we take, there is only one valid next character. If the LLM chose t as the first character, then it has to follow with r, u and e. Similarly, if it chose f, it follows with a, l, s, e. And regardless of the path, it will select the closing " as the final character. There is of course more under the hood; for more in-depth coverage we recommend this dottxt blog and the associated paper on arXiv.
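The paths above form a small state machine. A hypothetical character-level index for it (the state numbering is made up for illustration; the real index maps states to allowed tokens) could look like this:

```python
# Character-level index for the pattern "(true|false)" in quotes:
# each state maps every allowed next character to the next state.
index = {
    0: {'"': 1},
    1: {'t': 2, 'f': 5},            # the only real branching point
    2: {'r': 3}, 3: {'u': 4}, 4: {'e': 9},
    5: {'a': 6}, 6: {'l': 7}, 7: {'s': 8}, 8: {'e': 9},
    9: {'"': 10},                   # closing quote; 10 is accepting
}

def allowed(state):
    """Characters the LLM is permitted to generate in this state."""
    return set(index.get(state, {}))

def accepts(text):
    """Walk the state machine; True iff text matches the pattern."""
    state = 0
    for ch in text:
        if ch not in index.get(state, {}):
            return False
        state = index[state][ch]
    return state == 10
```

At generation time, the engine only needs a dictionary lookup per step to know which tokens to allow, which is why compiling this index once up front is the expensive part.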
Why it’s important
It may not be immediately obvious how powerful structured generation can be. The first use case many think of is “nice, now my LLM can return valid JSON, so I can treat it as an API and serialize/deserialize JSON reliably”. But that’s just scratching the surface. When you think about it, structure is everywhere, even in places where you least expect it, like the GSM8K benchmark.
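GSM8K is a good example of hidden structure: its reference solutions end with a final line of the form `#### <number>`. A simplified sketch of a pattern that captures this format (the exact regex used for guided generation would be more involved):

```python
import re

# GSM8K solutions consist of free-form reasoning followed by a final
# line "#### <answer>". A simplified regex capturing that structure:
GSM8K_ANSWER = re.compile(r"(?s).*####\s*(-?\d[\d,]*)\s*$")

sample = (
    "Natalia sold 48/2 = 24 clips in May.\n"
    "In total she sold 48 + 24 = 72 clips.\n"
    "#### 72"
)
match = GSM8K_ANSWER.match(sample)
```

Constraining generation to such a pattern guarantees the final answer can always be extracted, instead of hoping the model remembers the format.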
These are just a few examples of what structured generation enables:
And, perhaps more surprisingly, it reduces the sensitivity of evaluations to the specific prompt being used and the number of shots. On top of the tricks that structure gives you, it is also more performant! The dottxt blog has many good articles with performance benchmarks.
Why rewrite in Rust? 🦀
Speed
Probably the first thing that comes to mind when you hear “rewrite in Rust” is performance. And yes, that’s the case for outlines-core as well. Several key parts are yet to be moved over to Rust, and despite that, we already see an average 2x improvement in compilation speed.
Before the Rust port, Outlines used Numba to speed up the building of the index. While Numba is fast (its runtime performance is comparable to Rust’s), the JIT-compilation of the Numba functions added latency during the first run, which was a source of frustration for many users. Using Rust means we can compile the index-building functions ahead of time, adding no latency during the first run. While this matters little in a production context (since the first run can be done as part of deployment), it can make a huge difference during the experimentation phase!
Safety and Reliability
One of the biggest reasons for rewriting Outlines in Rust is the emphasis on safety and reliability that Rust brings to the table. Rust’s strong static typing, combined with its ownership model, eliminates entire classes of bugs, such as null pointer dereferences and data races in concurrent code. This leads to more robust and secure software.
In the context of Outlines, safety is crucial. Structured generation often involves complex data structures and manipulations, especially when dealing with high-performance inference engines. By leveraging Rust’s safety guarantees, we reduce the risk of runtime errors and undefined behavior that can arise from memory mismanagement.
Moreover, Rust’s compile-time checks encourage developers to write cleaner and more maintainable code. This improves the current codebase and makes future development more efficient. New contributors can onboard more quickly, and the code is easier to audit and verify for correctness.
Separation of concerns
Outlines was designed to do more than provide the core algorithms for structured generation. Among other things, it includes integrations with other libraries like transformers, which means the library carries many dependencies. Separating the core algorithms from the Outlines library means that other libraries wishing to include structured generation can do so by importing a very lightweight library. We can thus imagine, in the near future, libraries such as transformers and llama-cpp-python integrating structured generation directly. This also allows the dottxt team to focus on the core algorithms.
Portability
Most LLM training is done in Python, but inference is slightly different. It happens on many different devices, on specialized servers, and is written in a variety of programming languages. This means that portability also matters for structured generation. By having the core functionality of outlines written in Rust, we can now create bindings to other languages.
For instance, this port makes the integration into text-generation-inference much smoother. TGI’s server logic is written in Rust, and we want to avoid calling Python code as much as we possibly can. It also means libraries like mistral.rs or models implemented with candle can benefit from Outlines’s performance and capabilities.
In the longer term we plan to explore bindings to JS/TS, allowing outlines to be used in transformers-js, or potentially Swift bindings, making outlines natively usable on Apple devices. For now, though, the focus is going to be on the Python bindings, and on making outlines-core’s feature set complete by expanding support for the JSON Schema specification.
Contribute
Do you like working with structured generation, parsers, and making LLMs output only valid JSON? Star the library, tweet about it, join in and contribute! Share your work on Twitter, and with dottxt’s and Hugging Face’s communities.
