Simple agents that write actions in code.




Today we’re launching smolagents, a very simple library that unlocks agentic capabilities for language models. Here’s a glimpse:

from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())

agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")
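To try this yourself, first install the package from PyPI:

pip install smolagents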




🤔 What are agents?

Any efficient system using AI will need to give LLMs some kind of access to the real world: for instance the possibility to call a search tool to get external information, or to act on certain programs in order to solve a task. In other words, LLMs should have agency. Agentic programs are the gateway to the outside world for LLMs.

AI Agents are programs where LLM outputs control the workflow.

Any system leveraging LLMs will integrate the LLM outputs into code. The influence of the LLM’s output on the code workflow is the level of agency that LLMs have in the system.

Note that with this definition, “agent” is not a discrete, 0-or-1 definition: instead, “agency” evolves on a continuous spectrum, as you give more or less power to the LLM in your workflow.

The table below illustrates how agency varies across systems:

| Agency Level | Description | How it’s called | Example Pattern |
| --- | --- | --- | --- |
| ☆☆☆ | LLM output has no impact on program flow | Simple processor | process_llm_output(llm_response) |
| ★☆☆ | LLM output determines basic control flow | Router | if llm_decision(): path_a() else: path_b() |
| ★★☆ | LLM output determines function execution | Tool call | run_function(llm_chosen_tool, llm_chosen_args) |
| ★★★ | LLM output controls iteration and program continuation | Multi-step Agent | while llm_should_continue(): execute_next_step() |
| ★★★ | One agentic workflow can start another agentic workflow | Multi-Agent | if llm_trigger(): execute_agent() |
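To make one of these levels concrete, here is a minimal runnable sketch of the ★☆☆ router pattern, with the LLM call stubbed out (llm_decision and both handlers are illustrative stand-ins, not part of any library):

def llm_decision(query: str) -> bool:
    # Stub standing in for an LLM call that classifies the query.
    return "refund" in query.lower()

def handle_refund() -> str:
    return "Routing to the refund workflow."

def handle_general() -> str:
    return "Routing to the general support workflow."

def route(query: str) -> str:
    # The LLM output picks the branch, but both branches remain fixed code.
    return handle_refund() if llm_decision(query) else handle_general()

print(route("I want a refund for my booking"))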

The multi-step agent has this code structure:

memory = [user_defined_task]
while llm_should_continue(memory):
    action = llm_get_next_action(memory)
    observations = execute_action(action)
    memory += [action, observations]

So this system runs in a loop, executing a new action at each step (the action can involve calling some pre-determined tools that are just functions), until its observations make it apparent that a satisfactory state has been reached to solve the given task. Here’s an example of how a multi-step agent could work through a simple arithmetic question.
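Below is a runnable toy version of that loop for an arithmetic task, with every LLM call stubbed out (the stubs are illustrative sketches, not the library’s internals):

memory = ["Compute 2 + 3 + 4"]

def llm_should_continue(memory: list) -> bool:
    # Stub: a real LLM would inspect the memory; here we stop once a result is recorded.
    return not any("result" in str(m) for m in memory)

def llm_get_next_action(memory: list) -> str:
    # Stub: a real LLM would write this action based on the memory.
    return "add(2, 3, 4)"

def execute_action(action: str) -> str:
    # Stub tool execution for the single action used in this toy.
    return "result: 9" if action == "add(2, 3, 4)" else "unknown action"

while llm_should_continue(memory):
    action = llm_get_next_action(memory)
    observations = execute_action(action)
    memory += [action, observations]

print(memory[-1])  # "result: 9"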



✅ When to use agents / ⛔ when to avoid them

Agents are useful when you need an LLM to determine the workflow of an app. But they’re often overkill. The question is: do I really need flexibility in the workflow to efficiently solve the task at hand?
Let’s take an example: say you’re making an app that handles customer requests on a surfing trip website.

You could know in advance that the requests will fall into one of two buckets (based on user choice), and you have a predefined workflow for each of these 2 cases.

  1. Want some knowledge on the trips? ⇒ give them access to a search bar to search your knowledge base
  2. Wants to talk to sales? ⇒ let them type in a contact form.

If that deterministic workflow fits all queries, by all means just code everything! This gives you a 100% reliable system with no risk of error introduced by letting unpredictable LLMs meddle in your workflow. For the sake of simplicity and robustness, it’s advised to regularize towards not using any agentic behaviour.

But what if the workflow can’t be determined that well in advance?

For instance, a user wants to ask: "I can come on Monday, but I forgot my passport so risk being delayed to Wednesday, is it possible to take me and my stuff to surf on Tuesday morning, with a cancellation insurance?" This question hinges on many factors, and probably none of the predetermined criteria above will suffice for this request.

If the pre-determined workflow falls short too often, that means you need more flexibility.

That’s where an agentic setup helps.

In the above example, you could just make a multi-step agent that has access to a weather API for weather forecasts, a Google Maps API to compute travel distances, an employee availability dashboard, and a RAG system over your knowledge base.

Until recently, computer programs were restricted to pre-determined workflows, trying to handle complexity by piling up if/else switches. They focused on extremely narrow tasks, like “compute the sum of these numbers” or “find the shortest path in this graph”. But actually, most real-life tasks, like our trip example above, don’t fit into pre-determined workflows. Agentic systems open up the vast world of real-world tasks to programs!



Code agents

In a multi-step agent, at each step, the LLM can write an action, in the form of some calls to external tools. A common format (used by Anthropic, OpenAI, and many others) for writing these actions is generally different shades of “writing actions as a JSON of tool names and arguments to use, which you then parse to know which tool to execute and with which arguments”.

Multiple research papers have shown that having LLMs write their tool-calling actions in code works much better.

The reason for this is simply that we crafted our programming languages specifically to be the best possible way to express actions performed by a computer. If JSON snippets were a better expression, JSON would be the top programming language and programming would be hell on earth.

The figure below, taken from Executable Code Actions Elicit Better LLM Agents, illustrates some advantages of writing actions in code:

Writing actions in code rather than JSON-like snippets provides better (see the sketch after this list):

  • Composability: could you nest JSON actions within each other, or define a set of JSON actions to re-use later, the same way you could just define a Python function?
  • Object management: how do you store the output of an action like generate_image in JSON?
  • Generality: code is built to express simply anything you can have a computer do.
  • Representation in LLM training data: plenty of quality code actions are already included in LLMs’ training data, which means they’re already trained for this!
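For instance, here is a minimal sketch of the composability point, with a stubbed get_travel_duration tool (illustrative, not the real tool defined later in this post):

def get_travel_duration(start: str, destination: str) -> int:
    # Stub tool returning a travel time in minutes.
    return 15

# JSON-style action: one flat tool call that the framework must parse and
# execute; chaining its result into another call needs a new LLM round-trip.
json_action = {
    "tool": "get_travel_duration",
    "arguments": {"start": "Eiffel Tower", "destination": "Notre-Dame"},
}

# Code action: calls compose naturally in a single step, and intermediate
# results are ordinary variables.
total = (
    get_travel_duration("Eiffel Tower", "Notre-Dame")
    + get_travel_duration("Notre-Dame", "Louvre Museum")
)
print(f"Total travel time: {total} minutes")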



Introducing smolagents: making agents simple 🥳

We built smolagents with these objectives:

✨ Simplicity: the logic for agents fits in ~1,000 lines of code (see this file). We kept abstractions to their minimal shape above raw code!

🧑‍💻 First-class support for Code Agents, i.e. agents that write their actions in code (as opposed to “agents being used to write code”). To make it secure, we support executing in sandboxed environments via E2B (a sketch of this follows the list below).

🤗 Hub integrations: you can share and load tools to/from the Hub, and more is to come!

🌐 Support for any LLM: it supports models hosted on the Hub loaded in their transformers version or through our inference API, but also supports models from OpenAI, Anthropic, and many others via our LiteLLM integration.
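As a sketch of the sandboxed execution mentioned above, it can be enabled when constructing the agent; the use_e2b_executor flag below matches the launch version of the library (check the docs for your installed version), and an E2B_API_KEY environment variable is assumed to be set:

from smolagents import CodeAgent, HfApiModel

# Assumption: E2B_API_KEY is set; the flag name may differ in later versions.
agent = CodeAgent(tools=[], model=HfApiModel(), use_e2b_executor=True)
agent.run("What is the 10th Fibonacci number?")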

smolagents is the successor to transformers.agents, and will be replacing it as transformers.agents gets deprecated in the future.



Building an agent

To build an agent, you need at least two elements:

  • tools: a list of tools the agent has access to
  • model: an LLM that will be the engine of your agent.

For the model, you can use any LLM: either open models using our HfApiModel class, which leverages Hugging Face’s free inference API (as shown in the leopard example above), or LiteLLMModel, which leverages litellm to let you pick from a list of 100+ different cloud LLMs.
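For instance, either of these gives you a working model engine (the Anthropic model id below is just an illustration; any litellm-supported id works, given the matching API key in your environment):

from smolagents import HfApiModel, LiteLLMModel

# Free Hugging Face inference API; uses a default model if no id is given.
model = HfApiModel()

# Or any of 100+ cloud LLMs via litellm; this model id is illustrative.
model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest")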

For the tool, you can just make a function with type hints on inputs and outputs, and a docstring giving descriptions for the inputs, then use the @tool decorator to make it a tool.

Here’s how you can make a custom tool that gets travel times from Google Maps, and how to use it in a travel planner agent:

from typing import Optional
from smolagents import CodeAgent, HfApiModel, tool

@tool
def get_travel_duration(start_location: str, destination_location: str, transportation_mode: Optional[str] = None) -> str:
    """Gets the travel time between two places.

    Args:
        start_location: the place from which you begin your ride
        destination_location: the place of arrival
        transportation_mode: The transportation mode, in 'driving', 'walking', 'bicycling', or 'transit'. Defaults to 'driving'.
    """
    import os   
    import googlemaps
    from datetime import datetime

    gmaps = googlemaps.Client(os.getenv("GMAPS_API_KEY"))

    if transportation_mode is None:
        transportation_mode = "driving"
    try:
        directions_result = gmaps.directions(
            start_location,
            destination_location,
            mode=transportation_mode,
            departure_time=datetime(2025, 6, 6, 11, 0), 
        )
        if len(directions_result) == 0:
            return "No way found between these places with the required transportation mode."
        return directions_result[0]["legs"][0]["duration"]["text"]
    except Exception as e:
        print(e)
        return str(e)  # return the error message as a string so the agent can react to it

agent = CodeAgent(tools=[get_travel_duration], model=HfApiModel(), additional_authorized_imports=["datetime"])

agent.run("Can you give me a nice one-day trip around Paris with a few locations and the times? Could be in the city or outside, but should fit in one day. I'm travelling only with a rented bicycle.")

After a few steps of gathering travel times and running calculations, the agent returns this final proposition:

One-day Paris bike trip itinerary:
1. Start at Eiffel Tower at 9:00 AM.
2. Sightseeing at Eiffel Tower until 10:30 AM.
3. Travel to Notre-Dame Cathedral at 10:46 AM.
4. Sightseeing at Notre-Dame Cathedral until 12:16 PM.
5. Travel to Montmartre at 12:41 PM.
6. Sightseeing at Montmartre until 2:11 PM.
7. Travel to Jardin du Luxembourg at 2:33 PM.
8. Sightseeing at Jardin du Luxembourg until 4:03 PM.
9. Travel to Louvre Museum at 4:12 PM.
10. Sightseeing at Louvre Museum until 5:42 PM.
11. Lunch break until 6:12 PM.
12. Planned end time: 6:12 PM.

After building a tool, sharing it to the Hub is as simple as:

get_travel_duration.push_to_hub("{your_username}/get-travel-duration-tool")

You can see the result under this space.
You can check the logic for the tool under the file tool.py in the space. As you can see, the tool was actually exported to a class inheriting from the Tool class, which is the underlying structure for all our tools.
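Loading a shared tool back from the Hub is the mirror operation, via load_tool; a sketch, assuming the signature of the installed version (trust_remote_code=True acknowledges that the tool’s code will run locally):

from smolagents import load_tool

get_travel_duration = load_tool(
    "{your_username}/get-travel-duration-tool", trust_remote_code=True
)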



How strong are open models for agentic workflows?

We’ve created CodeAgent instances with some leading models, and compared them on this benchmark that gathers questions from a few different benchmarks to propose a varied blend of challenges.

Find the benchmark here for more detail on the agentic setup used, and see a comparison of code agents versus tool-calling agents (spoilers: code works better).

[Figure: benchmark of different models on agentic workflows]

This comparison shows that open-source models can now take on the best closed models!



Next steps 🚀

smolagents is available today: head over to the smolagents repository and documentation to get started!


