
Open Responses is a new, open inference standard. Initiated by OpenAI, built by the open source AI community, and backed by the Hugging Face ecosystem, Open Responses builds on the Responses API and is designed for the future of agents. In this blog post, we’ll take a look at how Open Responses works and why the open source community should adopt it.

The era of the chatbot is over, and agents now dominate inference workloads. Developers are shifting toward autonomous systems that reason, plan, and act over long time horizons. Despite this shift, much of the ecosystem still uses the Chat Completions format, which was designed for turn-based conversations and falls short for agentic use cases. The Responses format was designed to address these limitations, but it is closed and not as widely adopted. The Chat Completions format remains the de facto standard despite the alternatives.

This mismatch between agentic workflow requirements and entrenched interfaces motivates the need for an open inference standard. Over the coming months, we will collaborate with the community and inference providers to implement and adapt Open Responses as a shared format, practically capable of replacing Chat Completions.

Open Responses builds on the direction OpenAI set with its Responses API, launched in March 2025, which superseded the existing Completions and Assistants APIs with a consistent way to:

  • Generate Text, Images, and JSON structured outputs
  • Create Video content through a separate task-based endpoint
  • Run agentic loops on the provider side, executing tool calls autonomously and returning the results.



What’s Open Responses?

Open Responses extends and open-sources the Responses API, making it more accessible for builders and routing providers to interoperate and collaborate on shared interests.

Some of the key points are:

  • Stateless by default, supporting encrypted reasoning for providers that require it.
  • Standardized model configuration parameters.
  • Streaming is modeled as a series of semantic events, not raw text or object deltas.
  • Extensible via configurable parameters specific to certain model providers.



What do we need to know to build with Open Responses?

We’ll briefly explore the core changes that affect most community members. If you’d like to dive deeper into the specification, take a look at the Open Responses documentation.



Client Requests to Open Responses

Client requests to Open Responses are similar to the current Responses API. Below is a request to the Open Responses API using curl. We’re calling a proxy endpoint that routes to Inference Providers using the Open Responses API schema.

 curl https://evalstate-openresponses.hf.space/v1/responses \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer $HF_TOKEN" \
+  -H "OpenResponses-Version: latest" \
   -N \
   -d '{
         "model": "moonshotai/Kimi-K2-Thinking:nebius",
         "input": "explain the idea of life"
       }'
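
If you prefer a typed client, here is a minimal sketch of the same request from Python. It assumes the proxy endpoint accepts OpenAI-SDK-style clients pointed at its /v1 base URL; any HTTP client that POSTs the same JSON works equally well.

 import os
 from openai import OpenAI

 # Point an OpenAI-compatible client at the Open Responses proxy endpoint (an assumption).
 client = OpenAI(
     base_url="https://evalstate-openresponses.hf.space/v1",
     api_key=os.environ["HF_TOKEN"],
 )

 response = client.responses.create(
     model="moonshotai/Kimi-K2-Thinking:nebius",
     input="explain the idea of life",
 )

 # output_text is the SDK's convenience accessor for the final text output.
 print(response.output_text)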



Changes for Inference Clients and Providers

Clients that already support the Responses API can migrate to Open Responses with relatively little effort. The most important changes are:

  • Migrating reasoning streams to use extensible “reasoning” chunks rather than “reasoning_text”.
  • Implementing richer state changes and payloads – for example, a hosted Code Interpreter can send a specific interpreting state to improve agent/user observability.

For Model Providers, implementing the changes for Open Responses should be straightforward if they already adhere to the Responses API specification. For Routers, there is now an opportunity to standardize on a consistent endpoint and support configuration options for customization where needed.

Over time, as Providers continue to innovate, certain features will become standardized in the base specification.

In summary, migrating to Open Responses will make the inference experience more consistent and improve quality as undocumented extensions, interpretations, and workarounds of the legacy Completions API are normalized in Open Responses.

You can stream reasoning chunks with the request below.

{
  "model": "moonshotai/Kimi-K2-Thinking:together",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": "explain photosynthesis."
    }
  ],
  "stream": true
}

Here’s the difference between Open Responses and the Responses API for reasoning deltas:

{
  "delta": " heres what i'm pondering",
  "sequence_number": 12,
+ "type": "response.reasoning.delta",
- "type": "response.reasoning_text.delta",
  "item_id": "msg_cbfb8a361f26c0ed0cb133b3c2387279b3d54149a262f3a7",
  "output_index": 0,
  "obfuscation": "0HG8OhAdaLQBg",
  "content_index": 0
}
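
To make the migration concrete, here is a minimal sketch of consuming that stream, reusing the client configured in the earlier Python example. The event types mirror the deltas shown above; treat the exact names and attributes as assumptions and check the Open Responses documentation for the authoritative list.

 # `client` is configured as in the earlier Python example.
 stream = client.responses.create(
     model="moonshotai/Kimi-K2-Thinking:together",
     input=[{"type": "message", "role": "user", "content": "explain photosynthesis."}],
     stream=True,
 )

 for event in stream:
     if event.type == "response.reasoning.delta":
         # Open Responses reasoning chunk (replaces "response.reasoning_text.delta")
         print(event.delta, end="", flush=True)
     elif event.type == "response.output_text.delta":
         # ordinary assistant text
         print(event.delta, end="", flush=True)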



Open Responses for Routing

Open Responses distinguishes between “Model Providers” – those that provide inference, and “Routers” – intermediaries who orchestrate between multiple providers.

Clients can now specify a Provider together with provider-specific API options when making requests, allowing intermediary Routers to orchestrate requests between upstream providers.



Tools

Open Responses natively supports two categories of tools: internal and external. Externally hosted tools are implemented outside the model provider’s system – for example, client-side functions to be executed, or MCP servers. Internally hosted tools live inside the model provider’s system – for example, OpenAI’s file search or Google Drive integration. For internal tools, the model calls, executes, and retrieves results entirely within the provider’s infrastructure, requiring no developer intervention.
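
As a rough sketch, a tools array mixing the two categories might look like the following. The field names follow Responses-API-style tool definitions and are assumptions here; the function name, MCP server URL, and vector store id are hypothetical.

 # Illustrative tool definitions; field names follow the Responses API style (an assumption).
 tools = [
     {   # external: a client-side function that the developer executes
         "type": "function",
         "name": "get_quarterly_sales",            # hypothetical function
         "description": "Return sales figures for a given quarter",
         "parameters": {
             "type": "object",
             "properties": {"quarter": {"type": "string"}},
             "required": ["quarter"],
         },
     },
     {   # external: an MCP server outside the provider's system
         "type": "mcp",
         "server_label": "crm",
         "server_url": "https://example.com/mcp",  # hypothetical server
     },
     {   # internal: a provider-hosted tool, executed without developer intervention
         "type": "file_search",
         "vector_store_ids": ["vs_example"],       # hypothetical store id
     },
 ]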



Sub Agent Loops

Open Responses formalizes the agentic loop: a repeating cycle of reasoning, tool invocation, and response generation that allows models to autonomously complete multi-step tasks.

[Process diagram of the agentic loop – image source: openresponses.org]

The loop operates as follows:

  1. The API receives a user request and samples from the model
  2. If the model emits a tool call, the API executes it (internally or externally)
  3. Tool results are fed back to the model for continued reasoning
  4. The loop repeats until the model signals completion

For internally hosted tools, the provider manages the complete loop: executing tools, returning results to the model, and streaming output. This means that multi-step workflows like “search documents, summarize findings, then draft an email” can be handled in a single request.

Clients control loop behavior via max_tool_calls to cap iterations and tool_choice to constrain which tools are invocable:


{
  "model": "zai-org/GLM-4.7",
  "input": "Find Q3 sales data and email a summary to the team",
  "tools": [...],
  "max_tool_calls": 5,
  "tool_choice": "auto"
}

The response contains all intermediate items: tool calls, results, and reasoning.
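
As a minimal sketch, inspecting those intermediate items from a Python client might look like this, assuming the response object exposes an output list of typed items as in the Responses API; the item type names are illustrative.

 # `response` is the result of the sub-agent loop request above; type names are illustrative.
 for item in response.output:
     if item.type == "reasoning":
         print("reasoning step")
     elif item.type == "function_call":
         print(f"tool call: {item.name}({item.arguments})")
     elif item.type == "message":
         print("final message:", item.content)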



Next Steps

Open Responses extends and improves the Responses API, providing richer and more detailed content definitions, compatibility, and deployment options. It also provides a standard way to execute sub-agent loops during primary inference calls, opening up powerful capabilities for AI applications. We look forward to working with the Open Responses team and the community at large on future development of the specification.

![acceptance test](https://huggingface.co/huggingface/documentation-images/resolve/main/openresponses/image2.png)

You can try Open Responses with Hugging Face Inference Providers today. We have an early access version available on Hugging Face Spaces – try it with your client and the Open Responses Compliance tool today!


