How to Get JSON Output from LLMs: A Practical Guide


A tutorial on enforcing JSON output with Llama.cpp and the Gemini API


Large Language Models (LLMs) are great at generating text, but getting structured output like JSON normally requires clever prompting and hoping the model complies. Thankfully, JSON mode is becoming more common in LLM frameworks and services, letting you define the precise output schema you want.

This post dives into constrained generation using JSON mode. We'll use a complex, nested, and realistic JSON schema example to guide LLM frameworks/APIs like Llama.cpp or the Gemini API to generate structured data, specifically tourist location information. This builds on a previous post about constrained generation using Guidance, but focuses on the more widely adopted JSON mode.

While more limited than Guidance, JSON mode’s broader support makes it more accessible, especially with cloud-based LLM providers.

During a personal project, I discovered that while JSON mode was straightforward with Llama.cpp, getting it to work with the Gemini API required some extra steps. This post shares those solutions to show you how to use JSON mode effectively.

Our example schema represents a TouristLocation. It is a non-trivial structure with nested objects, lists, enums, and various data types like strings and numbers.

Here’s a simplified version:

{
  "name": "string",
  "location_long_lat": ["number", "number"],
  "climate_type": {"type": "string", "enum": ["tropical", "desert", "temperate", "continental", "polar"]},
  "activity_types": ["string"],
  "attraction_list": [
    {
      "name": "string",
      "description": "string"
    }
  ],
  "tags": ["string"],
  "description": "string",
  "most_notably_known_for": "string",
  "location_type": {"type": "string", "enum": ["city", "country", "establishment", "landmark", "national park", "island", "region", "continent"]},
  "parents": ["string"]
}

You can write this kind of schema by hand, or you can generate it using the Pydantic library. Here is how to do it on a simplified example:

from typing import List
from pydantic import BaseModel, Field

class TouristLocation(BaseModel):
    """Model for a tourist location"""

    high_season_months: List[int] = Field(
        [], description="List of months (1-12) when the location is most visited"
    )

    tags: List[str] = Field(
        ...,
        description="List of tags describing the location (e.g. accessible, sustainable, sunny, low cost, pricey)",
        min_length=1,
    )
    description: str = Field(..., description="Text description of the location")

# Example usage and schema output
location = TouristLocation(
    high_season_months=[6, 7, 8],
    tags=["beach", "sunny", "family-friendly"],
    description="A wonderful beach with white sand and clear blue water.",
)

schema = location.model_json_schema()
print(schema)

This code defines a simplified version of the TouristLocation data class using Pydantic. It has three fields:

  • high_season_months: A list of integers representing the months of the year (1-12) when the location is most visited. Defaults to an empty list.
  • tags: A list of strings describing the location with tags like “accessible”, “sustainable”, etc. This field is required (...) and must have at least one element (min_length=1).
  • description: A string field containing a text description of the location. This field is also required.

The code then creates an instance of the TouristLocation class and uses model_json_schema() to get the JSON Schema representation of the model. This schema defines the structure and types of the data expected for this class.

model_json_schema() returns:

{'description': 'Model for a tourist location',
 'properties': {'description': {'description': 'Text description of the location',
                                'title': 'Description',
                                'type': 'string'},
                'high_season_months': {'default': [],
                                       'description': 'List of months (1-12) when the location is most visited',
                                       'items': {'type': 'integer'},
                                       'title': 'High Season Months',
                                       'type': 'array'},
                'tags': {'description': 'List of tags describing the location (e.g. accessible, sustainable, sunny, low cost, pricey)',
                         'items': {'type': 'string'},
                         'minItems': 1,
                         'title': 'Tags',
                         'type': 'array'}},
 'required': ['tags', 'description'],
 'title': 'TouristLocation',
 'type': 'object'}
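The printed schema above is flat because the simplified model has no nested objects or enums. With the full schema's climate_type enum and nested attraction objects, Pydantic emits $defs entries referenced via $ref, which is exactly what causes trouble with Gemini later. Here is a minimal sketch (the model and field names below are illustrative, not the post's full model):

```python
from enum import Enum
from typing import List
from pydantic import BaseModel

class ClimateType(str, Enum):
    tropical = "tropical"
    desert = "desert"
    temperate = "temperate"
    continental = "continental"
    polar = "polar"

class Attraction(BaseModel):
    name: str
    description: str

class TouristLocationSketch(BaseModel):
    name: str
    climate_type: ClimateType
    attraction_list: List[Attraction] = []

sketch_schema = TouristLocationSketch.model_json_schema()
# Nested models and enums land under "$defs" and are pointed to via "$ref"
print(sorted(sketch_schema["$defs"].keys()))
```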

Now that we have our schema, let's see how we can enforce it: first in Llama.cpp with its Python wrapper, and second using the Gemini API.

Llama.cpp is a C++ library for running Llama models locally. It is beginner-friendly and has an active community. We will be using it through its Python wrapper.

Here’s how to generate TouristLocation data with it:

import time
from llama_cpp import Llama

# Model init:
checkpoint = "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF"

model = Llama.from_pretrained(
    repo_id=checkpoint,
    n_gpu_layers=-1,
    filename="*Q4_K_M.gguf",
    verbose=False,
    n_ctx=12_000,
)

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant that outputs in JSON."
        f"Follow this schema {TouristLocation.model_json_schema()}",
    },
    {"role": "user", "content": "Generate information about Hawaii, US."},
    {"role": "assistant", "content": f"{location.model_dump_json()}"},
    {"role": "user", "content": "Generate information about Casablanca"},
]

response_format = {
    "type": "json_object",
    "schema": TouristLocation.model_json_schema(),
}

start = time.time()

outputs = model.create_chat_completion(
    messages=messages, max_tokens=1200, response_format=response_format
)

print(outputs["choices"][0]["message"]["content"])

print(f"Time: {time.time() - start}")

The code first imports the necessary libraries and initializes the LLM. It then defines a list of messages for a conversation with the model: a system message instructing the model to output JSON following a specific schema, user requests for information about Hawaii and Casablanca, and a one-shot assistant response using the specified schema.

Llama.cpp uses context-free grammars under the hood to constrain the structure and generate valid JSON output for a new city.
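For illustration, Llama.cpp expresses these constraints in its GBNF grammar format. A hand-written grammar for a tiny subset of our schema (just a quoted name field) might look like the sketch below; in practice, passing response_format generates the full grammar from the schema for you.

```
root   ::= "{" ws "\"name\"" ws ":" ws string ws "}"
string ::= "\"" [a-zA-Z0-9 ]* "\""
ws     ::= [ \t\n]*
```

At each decoding step, tokens that would violate the grammar are masked out, so the model can only ever produce strings that parse against the schema.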

In the output, we get the following generated string:

{'activity_types': ['shopping', 'food and wine', 'cultural'],
 'attraction_list': [{'description': 'One of the largest mosques in the world and a symbol of Moroccan architecture',
                      'name': 'Hassan II Mosque'},
                     {'description': 'A historic walled city with narrow streets and traditional shops',
                      'name': 'Old Medina'},
                     {'description': 'A historic square with a beautiful fountain and surrounding buildings',
                      'name': 'Mohammed V Square'},
                     {'description': 'A beautiful Catholic cathedral built in the early 20th century',
                      'name': 'Casablanca Cathedral'},
                     {'description': 'A scenic waterfront promenade with beautiful views of the city and the sea',
                      'name': 'Corniche'}],
 'climate_type': 'temperate',
 'description': 'A large and bustling city with a rich history and culture',
 'location_type': 'city',
 'most_notably_known_for': 'Its historic architecture and cultural significance',
 'name': 'Casablanca',
 'parents': ['Morocco', 'Africa'],
 'tags': ['city', 'cultural', 'historical', 'expensive']}

This can then be parsed into an instance of our Pydantic class.
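That parsing step can be sketched as follows, using a trimmed-down stand-in for the full model and a hypothetical raw_json string (the full model and real output are not reproduced here):

```python
from typing import List
from pydantic import BaseModel, Field, ValidationError

class Attraction(BaseModel):
    name: str
    description: str

class TouristLocationLite(BaseModel):
    """Trimmed-down stand-in for the full TouristLocation model"""
    name: str
    tags: List[str] = Field(..., min_length=1)
    attraction_list: List[Attraction] = []

# Hypothetical raw model output; JSON mode constrains the shape,
# but validating still guards against truncated generations
raw_json = (
    '{"name": "Casablanca", "tags": ["city", "cultural"], '
    '"attraction_list": [{"name": "Hassan II Mosque", '
    '"description": "One of the largest mosques in the world"}]}'
)

try:
    casablanca = TouristLocationLite.model_validate_json(raw_json)
    print(casablanca.attraction_list[0].name)
except ValidationError as err:
    print(err)
```

model_validate_json both parses the string and enforces the field constraints in one call, so downstream code can rely on typed attributes instead of raw dict lookups.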

Gemini API, Google’s managed LLM service, claims only limited JSON mode support for Gemini Flash 1.5 in its documentation. However, it can be made to work with a few adjustments.

Here are the general steps to get it to work:

schema = TouristLocation.model_json_schema()
schema = replace_value_in_dict(schema.copy(), schema.copy())
del schema["$defs"]
delete_keys_recursive(schema, key_to_delete="title")
delete_keys_recursive(schema, key_to_delete="location_long_lat")
delete_keys_recursive(schema, key_to_delete="default")
delete_keys_recursive(schema, key_to_delete="minItems")

print(schema)

import os

import google.generativeai as genai
from google.generativeai.types import ContentDict

messages = [
    ContentDict(
        role="user",
        parts=[
            "You are a helpful assistant that outputs in JSON."
            f"Follow this schema {TouristLocation.model_json_schema()}"
        ],
    ),
    ContentDict(role="user", parts=["Generate information about Hawaii, US."]),
    ContentDict(role="model", parts=[f"{location.model_dump_json()}"]),
    ContentDict(role="user", parts=["Generate information about Casablanca"]),
]

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    # Set `response_mime_type` to output JSON and pass the
    # cleaned-up schema object to the `response_schema` field
    generation_config={
        "response_mime_type": "application/json",
        "response_schema": schema,
    },
)

response = model.generate_content(messages)
print(response.text)

Here’s how to overcome Gemini’s limitations:

  1. Replace $ref with Full Definitions: Gemini stumbles on schema references ($ref), which appear whenever you have a nested object definition. Replace them with the complete definition from your schema.
def replace_value_in_dict(item, original_schema):
    # Source: https://github.com/pydantic/pydantic/issues/889
    if isinstance(item, list):
        return [replace_value_in_dict(i, original_schema) for i in item]
    elif isinstance(item, dict):
        if list(item.keys()) == ["$ref"]:
            definitions = item["$ref"][2:].split("/")
            res = original_schema.copy()
            for definition in definitions:
                res = res[definition]
            return res
        else:
            return {
                key: replace_value_in_dict(i, original_schema)
                for key, i in item.items()
            }
    else:
        return item
  2. Remove Unsupported Keys: Gemini doesn’t yet handle keys like “title”, “AnyOf”, or “minItems”. Remove these from your schema. The result is a less readable and less restrictive schema, but we have no other choice if we insist on using Gemini.
def delete_keys_recursive(d, key_to_delete):
    if isinstance(d, dict):
        # Delete the key if it exists
        if key_to_delete in d:
            del d[key_to_delete]
        # Recursively process all values in the dictionary
        for k, v in d.items():
            delete_keys_recursive(v, key_to_delete)
    elif isinstance(d, list):
        # Recursively process all items in the list
        for item in d:
            delete_keys_recursive(item, key_to_delete)
  3. One-Shot or Few-Shot Prompting for Enums: Gemini sometimes struggles with enums, outputting all possible values instead of a single choice, separated by “|” in one string, which is invalid according to our schema. Use one-shot prompting, providing a correctly formatted example, to guide it toward the desired behavior.
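To make the $ref replacement and key removal concrete, here is a self-contained sketch using compact re-implementations of the two helpers above, applied to a toy schema (the toy field names are made up for illustration):

```python
def replace_refs(item, original_schema):
    # Inline "$ref" pointers with their full definition
    # (mirrors replace_value_in_dict above)
    if isinstance(item, list):
        return [replace_refs(i, original_schema) for i in item]
    if isinstance(item, dict):
        if list(item.keys()) == ["$ref"]:
            res = original_schema
            for part in item["$ref"][2:].split("/"):
                res = res[part]
            return res
        return {k: replace_refs(v, original_schema) for k, v in item.items()}
    return item

def delete_keys(d, key):
    # Recursively drop an unsupported key
    # (mirrors delete_keys_recursive above)
    if isinstance(d, dict):
        d.pop(key, None)
        for v in d.values():
            delete_keys(v, key)
    elif isinstance(d, list):
        for item in d:
            delete_keys(item, key)

# Toy schema with one "$ref" and Gemini-unsupported "title" keys
toy = {
    "$defs": {"Attraction": {"title": "Attraction", "type": "object"}},
    "title": "TouristLocation",
    "type": "object",
    "properties": {"attraction": {"$ref": "#/$defs/Attraction"}},
}

toy = replace_refs(toy, toy)
del toy["$defs"]
delete_keys(toy, "title")
print(toy)
# {'type': 'object', 'properties': {'attraction': {'type': 'object'}}}
```

After the cleanup, the nested definition is inlined where the $ref used to be and no unsupported keys remain, which is the shape Gemini accepts.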

By applying these transformations and providing clear examples, you can reliably generate structured JSON output with the Gemini API.

JSON mode lets you get structured data directly from your LLMs, making them more useful for practical applications. While frameworks like Llama.cpp offer straightforward implementations, you may encounter issues with cloud services like the Gemini API.

Hopefully, this post gave you a better practical understanding of how JSON mode works and how you can use it even with the Gemini API, which only has partial support so far.

Now that I was able to get Gemini to work with JSON mode (mostly), I can complete the implementation of my LLM workflow, where having data structured in a specific way is necessary.

You can find the main code of this post here: https://gist.github.com/CVxTz/8eace07d9bd2c5123a89bf790b5cc39e
