a MCP-powered agent in ~70 lines of code



NEW: tiny-agents now supports the AGENTS.md standard. 🥳

Inspired by Tiny Agents in JS, we ported the concept to Python 🐍 and extended the huggingface_hub client SDK to act as an MCP Client so it can pull tools from MCP servers and pass them to the LLM during inference.

MCP (Model Context Protocol) is an open protocol that standardizes how Large Language Models (LLMs) interact with external tools and APIs. Essentially, it removes the need to write custom integrations for each tool, making it easier to plug new capabilities into your LLMs.

In this blog post, we'll show you how to get started with a tiny Agent in Python connected to MCP servers to unlock powerful tool capabilities. You'll see just how easy it is to spin up your own Agent and start building!

Spoiler: an Agent is essentially a while loop built right on top of an MCP Client!
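
In pseudocode, that loop looks roughly like this (every name here is hypothetical; it is only meant to convey the idea):

messages = [system_prompt]
while True:
    response = llm.chat(messages, tools=mcp_tools)   # the LLM may request tool calls
    messages += run_requested_tools(response)        # the Agent runs them via MCP and appends the results
    if response_is_final(response):                  # no more tool calls: the task is done
        break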



Run the Demo

This section walks you through how to use existing Tiny Agents. We'll cover the setup and the commands to get an agent running.

First, you need to install the latest version of huggingface_hub with the mcp extra to get all the necessary components.

pip install "huggingface_hub[mcp]>=0.32.0"
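
You can quickly confirm you have a recent enough version from Python:

import huggingface_hub
print(huggingface_hub.__version__)  # should be 0.32.0 or newer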

Now, let’s run an agent using the CLI!

The best part is that you can load agents directly from the Hugging Face Hub tiny-agents Dataset, or specify a path to your own local agent configuration!

> tiny-agents run --help
                                                                                                                                                                                     
 Usage: tiny-agents run [OPTIONS] [PATH] COMMAND [ARGS]...                                                                                                                           
                                                                                                                                                                                     
 Run the Agent in the CLI                                                                                                                                                            
                                                                                                                                                                                     
                                                                                                                                                                                     
╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│   path      [PATH]  Path to a local folder containing an agent.json file or a built-in agent stored in the 'tiny-agents/tiny-agents' Hugging Face dataset                         │
│                     (https://huggingface.co/datasets/tiny-agents/tiny-agents)                                                                                                     │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                                                                                                                       │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

If you don't provide a path to a specific agent configuration, our Tiny Agent will connect by default to the following two MCP servers (a rough config sketch follows the list):

  • the “canonical” file system server, which gets access to your Desktop,
  • and the Playwright MCP server, which knows how to use a sandboxed Chromium browser for you.
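
Roughly, those defaults correspond to a servers configuration like the following sketch (the file system server package and the Desktop path are illustrative placeholders; check the defaults shipped with huggingface_hub for the exact values):

"servers": [
    {
        "type": "stdio",
        "command": "npx",
        "args": ["@modelcontextprotocol/server-filesystem", "~/Desktop"]
    },
    {
        "type": "stdio",
        "command": "npx",
        "args": ["@playwright/mcp@latest"]
    }
]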

The following example shows a web-browsing agent configured to use the Qwen/Qwen2.5-72B-Instruct model via the Nebius inference provider, and it comes equipped with a playwright MCP server, which lets it use a web browser! The agent config is loaded by specifying its path in the tiny-agents/tiny-agents Hugging Face dataset.
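
You would launch it with a command along these lines (the dataset entry name below is a placeholder; browse the tiny-agents/tiny-agents dataset for the actual agent configurations, or point to a local folder containing an agent.json):

> tiny-agents run "username/agent-name"
> tiny-agents run ./path/to/local/agent-folder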

When you run the agent, you'll see it load and list the tools it has discovered from its connected MCP servers. Then, it's ready for your prompts!

Prompt used in this demo:

do a web search for HF inference providers on Brave Search and open the first result and then give me the list of the inference providers supported on Hugging Face

You can also use Gradio Spaces as MCP servers! The following example uses the Qwen/Qwen2.5-72B-Instruct model via the Nebius inference provider, and connects to a FLUX.1 [schnell] image generation HF Space as an MCP server. The agent is loaded from its configuration in the tiny-agents/tiny-agents dataset on the Hugging Face Hub.

Prompt used in this demo:

Generate a 1024×1024 image of a tiny astronaut hatching from an egg on the surface of the moon.

Now that you've seen how to run existing Tiny Agents, the following sections dive deeper into how they work and how to build your own.



Agent Configuration

Each agent's behavior (its default model, inference provider, which MCP servers to connect to, and its initial system prompt) is defined by an agent.json file. You can also provide a custom PROMPT.md in the same directory for a more detailed system prompt. Here is an example:

agent.json

The model and provider fields specify the LLM and inference provider used by the agent.
The servers array defines the MCP servers the agent will connect to.
In this example, a "stdio" MCP server is configured. This type of server runs as a local process. The Agent starts it using the specified command and args, and then communicates with it via stdin/stdout to discover and execute the available tools.

{
    "model": "Qwen/Qwen2.5-72B-Instruct",
    "provider": "nebius",
    "servers": [
        {
            "type": "stdio",
            "command": "npx",
            "args": ["@playwright/mcp@latest"]
        }
    ]
}

PROMPT.md

You are an agent - please keep going until the user's query is completely resolved [...]

You can find more details about Hugging Face Inference Providers here.



LLMs Can Use Tools

Modern LLMs are built for function calling (or tool use), which enables users to easily build applications tailored to specific use cases and real-world tasks.

A function is defined by its schema, which tells the LLM what it does and what input arguments it expects. The LLM decides when to use a tool; the Agent then orchestrates running the tool and feeding the result back.

tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current temperature for a given location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City and country e.g. Paris, France"
                        }
                    },
                    "required": ["location"],
                },
            }
        }
]

InferenceClient implements the same tool calling interface as the OpenAI Chat Completions API, which is the established standard for inference providers and the community.
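
As a minimal sketch (the model and provider are just examples), passing the tools schema above to InferenceClient looks like this; if the model decides to call the function, the call shows up in the returned message:

from huggingface_hub import InferenceClient

client = InferenceClient(model="Qwen/Qwen2.5-72B-Instruct", provider="nebius")

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,          # the schema defined above
    tool_choice="auto",   # let the model decide whether to call the tool
)

# If the model requested the tool, its name and JSON arguments are here
print(response.choices[0].message.tool_calls)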



Building our Python MCP Client

The MCPClient is the heart of our tool-use functionality. It's now part of huggingface_hub and uses the AsyncInferenceClient to communicate with LLMs.

The complete MCPClient code is here if you want to follow along using the actual code 🤓

Key responsibilities of the MCPClient:

  • Manage async connections to one or more MCP servers.
  • Discover tools from these servers.
  • Format these tools for the LLM.
  • Execute tool calls via the proper MCP server.

Here's a glimpse of how it connects to an MCP server (the add_mcp_server method):



class MCPClient:
    ...
    async def add_mcp_server(self, type: ServerType, **params: Any):
        # 'type' is the server type ("stdio", "sse", or "http");
        # 'params' are the server-specific connection parameters

        # Open the transport and get the read/write streams for this server
        read, write = await self.exit_stack.enter_async_context(...)

        # Create and initialize an MCP client session over those streams
        session = await self.exit_stack.enter_async_context(
            ClientSession(read_stream=read, write_stream=write, ...)
        )
        await session.initialize()

        # Discover the tools exposed by this server
        response = await session.list_tools()
        for tool in response.tools:
            # Remember which session owns each tool
            self.sessions[tool.name] = session

            # Register the tool, formatted as an OpenAI-style function schema for the LLM
            self.available_tools.append({
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.inputSchema,
                },
            })

It supports stdio servers for local tools (like accessing your file system), and http servers for remote tools! It is also compatible with sse, which is the former standard for remote tools.
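
For illustration, here is a minimal sketch of using MCPClient directly (the remote server URL is a placeholder, and connection cleanup is omitted for brevity):

import asyncio
from huggingface_hub import MCPClient

async def main() -> None:
    client = MCPClient(model="Qwen/Qwen2.5-72B-Instruct", provider="nebius")

    # Local tools over stdio: the client spawns the process and talks to it via stdin/stdout
    await client.add_mcp_server(type="stdio", command="npx", args=["@playwright/mcp@latest"])

    # Remote tools over HTTP (placeholder URL, not a real server)
    await client.add_mcp_server(type="http", url="https://example.com/mcp")

    # All discovered tools, already formatted for the LLM
    print([tool["function"]["name"] for tool in client.available_tools])

asyncio.run(main())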



Using the Tools: Streaming and Processing

The MCPClient's process_single_turn_with_tools method is where the LLM interaction happens. It sends the conversation history and the available tools to the LLM via AsyncInferenceClient.chat.completions.create(..., stream=True).



1. Preparing tools and calling the LLM

First, the method determines all the tools the LLM should be aware of for the current turn – this includes tools from MCP servers and any special "exit loop" tools for agent control. Then, it makes a streaming call to the LLM:




    
    # Combine MCP tools with any special "exit loop" tools used for agent control
    tools = self.available_tools
    if exit_loop_tools is not None:
        tools = [*exit_loop_tools, *self.available_tools]

    # Streaming call to the LLM with the conversation so far and the available tools
    response = await self.client.chat.completions.create(
        messages=messages,
        tools=tools,
        tool_choice="auto",  # the model decides when to call a tool
        stream=True,  # stream the response chunk by chunk
    )

As chunks arrive from the LLM, the method iterates through them. Each chunk is immediately yielded, and at the same time we reconstruct the complete text response and any tool calls from the streamed deltas.




async for chunk in response:
    # Forward the raw chunk to the caller immediately (real-time streaming)
    yield chunk
    # ... accumulate the text deltas and tool-call deltas from the chunk here ...
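
As a simplified illustration of that accumulation step (a hypothetical helper, not the library's exact code), the streamed tool-call deltas can be merged by index while each chunk is forwarded to the caller:

from typing import Any, AsyncGenerator, AsyncIterable, Dict

async def stream_and_collect(
    response: AsyncIterable[Any],
    final_tool_calls: Dict[int, Any],
) -> AsyncGenerator[Any, None]:
    # Yield each chunk as-is while rebuilding complete tool calls into final_tool_calls
    async for chunk in response:
        yield chunk  # forward the raw chunk to the caller immediately
        delta = chunk.choices[0].delta
        for call in delta.tool_calls or []:
            if call.index not in final_tool_calls:
                # The first fragment of a call carries its index, id and function name
                final_tool_calls[call.index] = call
            elif call.function and call.function.arguments:
                # Later fragments append pieces of the JSON-encoded arguments string
                existing = final_tool_calls[call.index].function.arguments or ""
                final_tool_calls[call.index].function.arguments = existing + call.function.arguments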



2. Executing tools

Once the stream is complete, if the LLM requested any tool calls (now fully reconstructed in final_tool_calls), the method processes each one:



for tool_call in final_tool_calls.values():
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments or "{}")

    # Prepare the "tool" message that will carry the result back to the LLM
    tool_message = {"role": "tool", "tool_call_id": tool_call.id, "content": "", "name": function_name}

    # Special case: an "exit loop" tool ends the agent's control loop immediately
    if exit_loop_tools and function_name in [t.function.name for t in exit_loop_tools]:
        messages.append(ChatCompletionInputMessage.parse_obj_as_instance(tool_message))
        yield ChatCompletionInputMessage.parse_obj_as_instance(tool_message)
        return

    # Otherwise, find the MCP session that owns this tool and execute it
    session = self.sessions.get(function_name)
    if session is not None:
        result = await session.call_tool(function_name, function_args)
        tool_message["content"] = format_result(result)  # format the tool output for the LLM
    else:
        tool_message["content"] = f"Error: No session found for tool: {function_name}"

    # Append the tool result to the conversation history and yield it
    ...

It first checks whether the called tool is one that exits the loop (an exit_loop_tool). If not, it finds the correct MCP session responsible for that tool and calls session.call_tool(). The result (or an error message) is then formatted, added to the conversation history, and yielded so the Agent is aware of the tool's output.



Our Tiny Python Agent: It's (Almost) Just a Loop!

With the MCPClient handling all the heavy lifting of tool interactions, our Agent class becomes wonderfully simple. It inherits from MCPClient and adds the conversation management logic.

The Agent class is tiny and focuses on the conversational loop; the code can be found here.



1. Initializing the Agent

When an Agent is created, it takes an agent config (model, provider, which MCP servers to use, system prompt) and initializes the conversation history with the system prompt. The load_tools() method then iterates through the server configurations (defined in agent.json) and calls add_mcp_server (from the parent MCPClient) for each one, populating the agent's toolbox.



class Agent(MCPClient):
    def __init__(
        self,
        *,
        model: str,
        servers: Iterable[Dict],  # MCP server configurations, as in agent.json
        provider: Optional[PROVIDER_OR_POLICY_T] = None,
        api_key: Optional[str] = None,
        prompt: Optional[str] = None,  # the system prompt (e.g. from PROMPT.md)
    ):
        # Initialize the underlying MCPClient (LLM, provider, credentials)
        super().__init__(model=model, provider=provider, api_key=api_key)
        # Keep the server configs around; they are loaded lazily in load_tools()
        self._servers_cfg = list(servers)
        # Start the conversation history with the system prompt
        self.messages: List[Union[Dict, ChatCompletionInputMessage]] = [
            {"role": "system", "content": prompt or DEFAULT_SYSTEM_PROMPT}
        ]

    async def load_tools(self) -> None:
        # Connect to each configured MCP server and register its tools
        for cfg in self._servers_cfg:
            await self.add_mcp_server(**cfg)



2. The agent’s core: the Loop

The Agent.run() method is an asynchronous generator that processes a single user input. It manages the conversation turns and decides when the agent's current task is complete.



async def run(self, user_input: str, *, abort_event: Optional[asyncio.Event] = None, ...) -> AsyncGenerator[...]:
    ...
    while True:
        ...

        # Delegate one step of reasoning (LLM call plus any tool executions)
        # to the MCPClient, streaming every item back to the caller
        async for item in self.process_single_turn_with_tools(
            self.messages,
            ...
        ):
            yield item

        ...  # inspect `last`, the final message produced in this turn

        # Exit if the last tool call was an "exit loop" tool
        if last.get("role") == "tool" and last.get("name") in {t.function.name for t in EXIT_LOOP_TOOLS}:
            return

        # Exit if the turn limit is reached, or if the LLM answered with plain text
        # when we expected it to call a tool (i.e. the answer looks final)
        if last.get("role") != "tool" and num_turns > MAX_NUM_TURNS:
            return
        if last.get("role") != "tool" and next_turn_should_call_tools:
            return

        next_turn_should_call_tools = (last.get("role") != "tool")

Inside the run() loop:

  • It first adds the user prompt to the conversation.
  • Then it calls MCPClient.process_single_turn_with_tools(...) to get the LLM's response and handle any tool executions for one step of reasoning.
  • Each item is immediately yielded, enabling real-time streaming to the caller.
  • After each step, it checks the exit conditions: whether a special "exit loop" tool was used, whether a maximum turn limit was hit, or whether the LLM gave a plain text response that looks final for the current request. A minimal end-to-end usage sketch follows this list.
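
Putting it all together, a minimal usage sketch looks like this (assuming Agent is importable from huggingface_hub like MCPClient is; the server config mirrors the agent.json example above, and the prompt is just an illustration):

import asyncio
from huggingface_hub import Agent

async def main() -> None:
    agent = Agent(
        model="Qwen/Qwen2.5-72B-Instruct",
        provider="nebius",
        servers=[{"type": "stdio", "command": "npx", "args": ["@playwright/mcp@latest"]}],
    )
    await agent.load_tools()  # connect to the MCP servers and discover their tools

    async for item in agent.run("Open huggingface.co and tell me the page title"):
        print(item)  # streamed chunks and tool messages, as they arrive

asyncio.run(main())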



Next Steps

There are plenty of cool ways to explore and expand upon the MCP Client and the Tiny Agent 🔥
Here are some ideas to get you started:

  • Benchmark how different LLM models and inference providers impact agentic performance: tool calling performance can differ because each provider may optimize it differently. You can find the list of supported providers here.
  • Run tiny agents with local LLM inference servers, such as llama.cpp or LM Studio.
  • ... and of course, contribute! Share your own tiny agents and open PRs in the tiny-agents/tiny-agents dataset on the Hugging Face Hub.

Pull requests and contributions are welcome! Again, everything here is open source! 💎❤️


