What a boring Jinja snippet tells us about the new Qwen-3 model.
The brand-new Qwen-3 model from the Qwen team ships with a noticeably more sophisticated chat template than its predecessors, Qwen-2.5 and QwQ. By looking at the differences in the Jinja template, we can find interesting insights into the new model.
Chat Templates
What’s a Chat Template?
A chat template defines how conversations between users and models are structured and formatted. The template acts as a translator, converting a human-readable conversation:
[
{ role: "user", content: "Hi there!" },
{ role: "assistant", content: "Hi there, how can I help you today?" },
{ role: "user", content: "I'm looking for a new pair of shoes." },
]
into a model-friendly format:
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Hi there, how can I help you today?<|im_end|>
<|im_start|>user
I'm looking for a new pair of shoes.<|im_end|>
<|im_start|>assistant
<think>

</think>
You can easily view the chat template for a given model on the Hugging Face model page.
Chat Template for Qwen/Qwen3-235B-A22B
Let’s dive into the Qwen-3 chat template and see what we can learn!
1. Reasoning doesn’t need to be forced
and you can make it optional via a simple prefill…
Qwen-3 is unique in its ability to toggle reasoning via the enable_thinking flag. When set to false, the template inserts an empty <think></think> pair, telling the model to skip step-by-step thoughts. Earlier models baked the <think> tag into every generation, forcing chain-of-thought whether you wanted it or not.
{%- if enable_thinking is defined and enable_thinking is false %}
    {{- '<think>\n\n</think>\n\n' }}
{%- endif %}
QwQ, for example, forces reasoning in every conversation:
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}
If enable_thinking is true, the model is able to decide whether to think or not.
You can test out the template with the following code:
import { Template } from "@huggingface/jinja";
import { downloadFile } from "@huggingface/hub";

const HF_TOKEN = process.env.HF_TOKEN;

// Download the tokenizer config, which contains the chat template
const file = await downloadFile({
  repo: "Qwen/Qwen3-235B-A22B",
  path: "tokenizer_config.json",
  accessToken: HF_TOKEN,
});
const config = await file!.json();

const messages = [{ role: "user", content: "Hi there!" }];

const template = new Template(config.chat_template);
const result = template.render({
  messages,
  add_generation_prompt: true,
  enable_thinking: false,
  bos_token: config.bos_token,
  eos_token: config.eos_token,
});
2. Context Management Should Be Dynamic
Qwen-3 uses a rolling checkpoint system, intelligently preserving or pruning reasoning blocks to maintain relevant context. Older models discarded reasoning prematurely to save tokens.
Qwen-3 introduces a “rolling checkpoint” by traversing the message list in reverse to find the latest user turn that wasn’t a tool call. For any assistant replies after that index it keeps the full <think> blocks; everything earlier is stripped out.
Why this matters:
- Keeps the active plan visible during a multi-step tool call.
- Supports nested tool workflows without losing context.
- Saves tokens by pruning thoughts the model no longer needs.
- Prevents “stale” reasoning from bleeding into new tasks.
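The pruning behavior described above can be sketched in TypeScript. This is a minimal illustration of the idea rather than the actual Jinja logic: pruneThinking and the Message type are hypothetical names, and the real template additionally accounts for tool-call turns when locating the checkpoint.

```typescript
type Message = { role: string; content: string };

// Walk the history in reverse to find the latest user turn (the "checkpoint"),
// then strip <think>...</think> blocks from assistant replies before it.
function pruneThinking(messages: Message[]): Message[] {
  let lastUserIdx = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      lastUserIdx = i;
      break;
    }
  }
  return messages.map((m, i) => {
    if (m.role === "assistant" && i < lastUserIdx) {
      // Older turn: drop the reasoning block, keep the visible answer
      return { ...m, content: m.content.replace(/<think>[\s\S]*?<\/think>\s*/g, "") };
    }
    // Turns at or after the checkpoint keep their full <think> blocks
    return m;
  });
}
```

Running this over a two-turn conversation removes the reasoning from the first assistant reply while the latest one stays intact.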
Example
Here’s an example of chain-of-thought preservation through tool calls with Qwen-3 and QwQ.

Try @huggingface/jinja for testing out the chat templates
3. Tool Arguments Need Higher Serialization
Before, every tool_call.arguments field was piped through | tojson, even when it was already a JSON-encoded string, risking double-escaping. Qwen-3 checks the type first and only serializes when needed.
{%- if tool_call.arguments is string %}
    {{- tool_call.arguments }}
{%- else %}
    {{- tool_call.arguments | tojson }}
{%- endif %}
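In TypeScript terms, the difference boils down to a type check before serializing. Here is a minimal sketch; serializeArguments is an illustrative helper, not part of the actual template:

```typescript
// Mirror the template's behavior: pass JSON-encoded strings through untouched,
// serialize everything else exactly once (the equivalent of Jinja's | tojson).
function serializeArguments(args: unknown): string {
  if (typeof args === "string") {
    return args; // already JSON-encoded: avoid double-escaping
  }
  return JSON.stringify(args);
}
```

With the old behavior, a pre-encoded string such as '{"size":42}' would be stringified a second time, so the model would see a double-escaped "\"{\\\"size\\\":42}\"" instead of the intended JSON object.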
4. There’s No Need for a Default System Prompt
Like many models, the Qwen-2.5 series has a default system prompt:
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
This is pretty common, as it helps models respond to user questions like “Who are you?”
Qwen-3 and QwQ ship without this default system prompt. Despite this, the model can still accurately identify its creator if you ask it.
Conclusion
Qwen-3 shows us that through the chat_template we can provide greater flexibility, smarter context handling, and improved tool interaction. These improvements not only enhance capabilities, but also make agentic workflows more reliable and efficient.

