Building the Hugging Face MCP Server



TL;DR: The official Hugging Face MCP Server offers unique customization options for AI Assistants accessing the Hub, together with access to thousands of AI applications through one simple URL. We used MCP’s “Streamable HTTP” transport for deployment, and examine in detail the trade-offs that Server Developers face.

We have learned a lot about building a useful MCP server over the last month – we’ll describe our journey here.



Introduction

The Model Context Protocol (MCP) is fulfilling its promise of being the standard for connecting AI Assistants to the outside world.

At Hugging Face, providing access to the Hub via MCP is an obvious choice, and this article shares our experience developing the hf.co/mcp MCP Server.



Design Choices

The community uses the Hub for research, development, content creation and more. We wanted to let people customize the server for their own needs, as well as easily access the thousands of AI applications available on Spaces. This meant making the MCP Server dynamic, adjusting users’ tools on the fly.

The Hugging Face MCP Settings Page, where users can configure their tools.

We also wanted to make access easy by avoiding complicated downloads and configuration, so making the server remotely accessible via a simple URL was a must.



Remote Servers

When building a remote MCP Server, the first decision is how clients will connect to it. MCP offers several transport options, with different trade-offs. TL;DR: our open source code supports all variants, but for production we chose to go with the most recent one. This section goes through the various options in detail.

Since its launch in November 2024, MCP has undergone rapid evolution, with 3 protocol revisions in 9 months. This has seen the replacement of the SSE Transport with Streamable HTTP, as well as the introduction and rework of authorization.

These rapid changes mean that support for different MCP Features and revisions varies across Client applications, adding extra challenges to our design choices.

Here’s a brief summary of the Transport Options offered by the Model Context Protocol and associated SDKs:

Transport         Notes
STDIO             Typically used when the MCP Server is running on the same computer as the Client. Able to access local resources such as files if needed.
HTTP with SSE     Used for remote connections over HTTP. Deprecated in the 2025-03-26 revision of MCP but still in use.
Streamable HTTP   A more flexible remote HTTP transport that offers more deployment options than the outgoing SSE version.

Both STDIO and HTTP with SSE are fully bi-directional by default – meaning that Client and Server maintain an open connection and can send messages to each other at any time.

SSE refers to “Server-Sent Events” – a way for HTTP Servers to maintain an open connection and send events in response to a request.
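As a concrete illustration (a minimal sketch, not taken from the Hugging Face server code), this is how a single SSE event is framed on the wire – an MCP server using SSE writes frames like this to a response held open with Content-Type: text/event-stream:

```typescript
// Frame one Server-Sent Event. A blank line terminates each event;
// multi-line payloads get one "data:" field per line.
function formatSseEvent(data: string, event?: string, id?: string): string {
  let frame = "";
  if (id !== undefined) frame += `id: ${id}\n`;
  if (event !== undefined) frame += `event: ${event}\n`;
  for (const line of data.split("\n")) frame += `data: ${line}\n`;
  return frame + "\n";
}

// Example: an MCP JSON-RPC notification framed as an SSE event.
const frame = formatSseEvent(
  JSON.stringify({ jsonrpc: "2.0", method: "notifications/tools/list_changed" }),
  "message",
  "1",
);
```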



Understanding Streamable HTTP

MCP Server Developers face a number of choices when setting up the Streamable HTTP transport.

There are 3 main communication patterns to choose from:

  • Direct Response – Simple Request/Response (like standard REST APIs). This is ideal for simple, stateless operations like basic searches.
  • Request Scoped Streams – Temporary SSE Streams associated with a single Request. This is useful for sending Progress Updates if the Tool Call takes a long time – such as Video Generation. Additionally, the Server may need to request information from the user with an Elicitation, or conduct a Sampling request.
  • Server Push Streams – Long-lived SSE connection supporting server-initiated messages. This allows Resource, Tool and Prompt List change notifications or ad-hoc Sampling and Elicitations. These connections need extra management, such as keep-alive and resumption mechanics on re-connection.
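A minimal sketch of the simplest of the three patterns, Direct Response: one JSON-RPC request body in, one JSON body out, no stream and no session. The handler and result shapes are illustrative, not our production code:

```typescript
// Hypothetical stateless request handler: each POST is dispatched on the
// JSON-RPC method and answered immediately with a single JSON response.
type JsonRpcRequest = { jsonrpc: "2.0"; id: number; method: string; params?: unknown };
type JsonRpcResponse = {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
};

function handleDirectResponse(req: JsonRpcRequest): JsonRpcResponse {
  switch (req.method) {
    case "tools/call":
      // A simple stateless operation (e.g. a search) is computed here
      // and its result returned directly in the HTTP response body.
      return { jsonrpc: "2.0", id: req.id, result: { content: [{ type: "text", text: "ok" }] } };
    default:
      return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
  }
}

const ok = handleDirectResponse({ jsonrpc: "2.0", id: 1, method: "tools/call" });
const err = handleDirectResponse({ jsonrpc: "2.0", id: 2, method: "resources/subscribe" });
```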

When using Request Scoped Streams with the official SDKs, use the sendNotification() and sendRequest() methods provided in the RequestHandlerExtra parameter (TypeScript) or set the related_request_id (Python) to send messages to the correct stream.
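For illustration, here is the shape of a progress notification tied to an in-flight request. The client supplies a progressToken in the request’s _meta, and the server echoes it back in notifications/progress messages on the request-scoped stream (the SDK routing above takes care of delivering it there); the helper itself is a sketch, not SDK code:

```typescript
// Build a notifications/progress message associated with a request via
// the progressToken the client supplied in that request's _meta field.
function progressNotification(
  progressToken: string | number,
  progress: number,
  total?: number,
) {
  return {
    jsonrpc: "2.0" as const,
    method: "notifications/progress",
    params: { progressToken, progress, ...(total !== undefined ? { total } : {}) },
  };
}

// e.g. reporting step 2 of 10 of a hypothetical long-running video job
const note = progressNotification("video-gen-1", 2, 10);
```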

An additional factor to consider is whether or not the MCP Server itself needs to maintain state for each connection. This is determined by the Server when the Client sends its Initialize request:

                Stateless                                                      Stateful
Session IDs     Not needed                                                     Server responds with an mcp-session-id
What it means   Each request is independent                                    Server maintains client context
Scaling         Easy horizontal scaling: any instance can handle any request   Needs session affinity or shared state mechanisms
Resumption      Not needed                                                     May replay messages for broken connections
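The session-id handshake in the Stateful column can be sketched as follows (names are illustrative; real servers use crypto-random ids and the SDKs implement this for you):

```typescript
// Stateful Streamable HTTP: mint an mcp-session-id on initialize, then
// require the client to echo it on every later request. A stateless
// server skips all of this and treats every request as independent.
const sessions = new Map<string, { createdAt: number }>();

function handleSessionHeader(
  method: string,
  sessionId: string | undefined,
): { status: number; sessionId?: string } {
  if (method === "initialize") {
    const id = `sess-${sessions.size + 1}`; // illustrative; use a crypto-random id
    sessions.set(id, { createdAt: Date.now() });
    return { status: 200, sessionId: id };
  }
  if (!sessionId || !sessions.has(sessionId)) {
    return { status: 404 }; // unknown session: the client must re-initialize
  }
  return { status: 200, sessionId };
}

const init = handleSessionHeader("initialize", undefined);
const follow = handleSessionHeader("tools/list", init.sessionId);
const stale = handleSessionHeader("tools/list", "sess-unknown");
```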

The table below summarizes the MCP Features and their supported communication patterns:

MCP Feature                   Server Push                    Request Scoped                        Direct Response
Tools, Prompts, Resources     Y                              Y                                     Y
Sampling/Elicitation          Server-initiated at any time   Tied to a Client-initiated request    N
Resource Subscriptions        Y                              N                                     N
Tool/Prompt List Changes      Y                              N                                     N
Tool Progress Notifications   Y                              Y                                     N

With Request Scoped streams, Sampling and Elicitation requests need a Stateful connection so that the mcp-session-id can be used for response association.

The Hugging Face MCP Server is Open Source – and supports STDIO, SSE and Streamable HTTP deployment in both Direct Response and Server Push modes. You can configure keep-alive and last-activity timeouts when using Server Push Streams. There’s also a built-in observability dashboard that you can use to understand how different Clients manage connections and handle Tool List change notifications.

The following picture shows our MCP Server connection dashboard running in “Server Push” Streamable HTTP mode:

The Hugging Face MCP Server Connection Dashboard.



Production Deployment

For production, we decided to launch our MCP Server with Streamable HTTP in a Stateless, Direct Response configuration, for the following reasons:

Stateless: For anonymous users we supply a standard set of Tools for using the Hub, together with an Image Generator. For authenticated users, our state comprises their chosen tools and selected Gradio applications. We also make sure that users’ ZeroGPU quota is correctly applied to their account. This is managed using the supplied HF_TOKEN or OAuth credentials, which we look up on each request. None of our existing tools require us to maintain any other state between requests.

You can use OAuth login by adding ?login to the MCP Server URL – e.g. https://huggingface.co/mcp?login. We may make this the default once the claude.ai remote integration supports the latest OAuth spec.
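The stateless credential lookup described above can be sketched like this. The helper name and tool lists are hypothetical, not the actual Hugging Face server code – the point is that every request carries the user’s token, so no server-side session is needed:

```typescript
// Per-request authentication in a stateless setup: derive the tool set
// from the Authorization header of each incoming request.
function toolsForRequest(authorization: string | undefined): string[] {
  const anonymousTools = ["hub_search", "image_generator"]; // illustrative names
  const token = authorization?.startsWith("Bearer ")
    ? authorization.slice("Bearer ".length)
    : undefined;
  if (!token) return anonymousTools; // anonymous users get the standard set
  // A real server would look up this user's chosen tools and Gradio apps
  // here, using the supplied HF_TOKEN / OAuth credential.
  return [...anonymousTools, "user_configured_gradio_app"];
}

const anon = toolsForRequest(undefined);
const authed = toolsForRequest("Bearer hf_example_token");
```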

Direct Response: This provides the lowest deployment resource overhead – and we don’t currently have any Tools that require Sampling or Elicitation during execution.

Future Support: At launch, the “HTTP with SSE” transport was still the remote default in many MCP Clients. However, we didn’t want to invest heavily in managing it due to its imminent deprecation. Fortunately, popular clients had already started making the switch (VSCode and Cursor), and within a week of launch claude.ai also added support. If you need to connect with SSE, feel free to deploy a copy of our Server on a Free CPU Hugging Face Space.



Tool List Change Notifications

In the future, we would like to support real-time Tool List Changed notifications when users update their settings on the Hub. However, this raises a couple of practical issues:

First, users tend to configure their favourite MCP Servers in their Client and leave them enabled. This means that the Client stays connected while the application is open. Sending notifications would mean maintaining as many open connections as there are currently active Clients – regardless of active usage – on the chance that a user updates their tool configuration.

Second, most MCP Servers and Clients disconnect after a period of inactivity, resuming when necessary. This inevitably means that immediate push notifications will be missed – because the notification channel may have been closed. In practice, it is much simpler for the Client to refresh the connection and Tool List as needed.

Unless you have reasonable control over the Client/Server pair, using Server Push Streams adds a lot of complexity to a public deployment, when lower-resource solutions for refreshing the Tool List exist.



URL User Experience

Just before launch, @julien-c submitted a PR to include friendly instructions for users visiting hf.co/mcp. This hugely improves the User Experience – the default response is otherwise an unfriendly piece of JSON.

Initially, we found this generated an enormous amount of traffic. After a bit of investigation, we found that when returning a web page rather than an HTTP 405 error, VSCode would poll the endpoint multiple times per second!

The fix suggested by @coyotte508 was to properly detect browsers and only return the page in that circumstance. Thanks also to the VSCode team, who rapidly fixed the polling behaviour.

Although not specifically stated, returning a page in this way does seem acceptable within the MCP Specification.
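A sketch of that browser-detection approach (illustrative, not the exact fix that landed): serve the friendly HTML page only for browser navigations, which send an Accept header preferring text/html, while MCP clients POST JSON or request event streams:

```typescript
// Decide whether to serve the human-friendly HTML page instead of the
// MCP JSON/405 response, based on method and the Accept header.
function shouldServeHtml(method: string, acceptHeader: string | undefined): boolean {
  if (method !== "GET") return false; // MCP traffic is POSTed JSON-RPC
  return acceptHeader !== undefined && acceptHeader.includes("text/html");
}

const browser = shouldServeHtml("GET", "text/html,application/xhtml+xml");
const mcpPost = shouldServeHtml("POST", "application/json");
const sseGet = shouldServeHtml("GET", "text/event-stream");
```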



MCP Client Behaviour

The MCP Protocol sends several requests during initialization. A typical connection sequence is: initialize, notifications/initialized, tools/list and then prompts/list.
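That startup sequence, written out as abbreviated JSON-RPC messages (the protocol version and clientInfo values are examples):

```typescript
// The typical client startup traffic: one initialize request, one
// initialized notification (no id, expects no response), then list calls.
const sequence = [
  {
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2025-06-18",
      capabilities: {},
      clientInfo: { name: "example-client", version: "1.0" },
    },
  },
  { method: "notifications/initialized" },
  { id: 2, method: "tools/list" },
  { id: 3, method: "prompts/list" },
].map((m) => ({ jsonrpc: "2.0", ...m }));
```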

Given that MCP Clients connect and reconnect while open, and that users make periodic calls, we find there is a ratio of around 100 MCP control messages for every Tool Call.

Some clients also send requests that don’t make sense for our Stateless, Direct Response configuration – for example Pings, Cancellations, or attempts to list Resources (which isn’t a capability we currently advertise).

The first week of July 2025 saw an astonishing 164 different Clients accessing our Server. Interestingly, one of the most popular tools is mcp-remote. Roughly half of all Clients use it as a bridge to connect to our remote server.



Conclusion

MCP is rapidly evolving, and we’re excited about what has already been achieved across Chat Applications, IDEs, Agents and MCP Servers over the past few months.

We can already see how powerful integrating the Hugging Face Hub has been, and support for Gradio Spaces now makes it possible for LLMs to be easily extended with the latest Machine Learning applications.

Here are some great examples of things people have been doing with our MCP Server so far:

We hope that this post has provided insight into the choices that need to be made when building Remote MCP Servers, and we encourage you to try some of the examples in your favourite MCP Client.

Take a look at our Open Source MCP Server, try some of the different transport options with your Client, or open an Issue or Pull Request to make improvements or suggest new functionality.

Let us know your thoughts, feedback and questions in this discussion thread.


