Tool Use, Unified



Matthew Carrigan


There’s now a unified tool use API across several popular families of models. This API means the same code is portable – few or no model-specific changes are needed to use tools in chats with Mistral, Cohere, NousResearch or Llama models. In addition, Transformers now includes helper functionality to make tool calling even easier, along with complete documentation and examples for the entire tool use process. Support for many more models will be added in the near future.



Introduction

Tool use is a curious feature – everyone thinks it’s great, but most people haven’t tried it themselves. Conceptually, it’s very straightforward: you give some tools (callable functions) to your LLM, and it can decide to call them to help it respond to user queries. Maybe you give it a calculator so it doesn’t have to rely on its internal, unreliable arithmetic abilities. Maybe you let it search the web or view your calendar, or you give it (read-only!) access to a company database so it can pull up information or search technical documentation.

Tool use overcomes some of the core limitations of LLMs. Many LLMs are fluent and loquacious but often imprecise with calculations and facts, and hazy on the specific details of more niche topics. They don’t know anything that happened after their training cutoff date. They’re generalists; they come into the conversation with no idea of you or your workplace beyond what you give them in the system message. Tools provide them with access to structured, specific, relevant, and up-to-date information that can go a long way toward making them genuinely helpful partners rather than just a fascinating novelty.

The problems arise, however, when you actually try to implement tool use. Documentation is often sparse, inconsistent, and even contradictory – and this is true for closed-source APIs as well as open-access models! Although tool use is simple in theory, it frequently becomes a nightmare in practice: How do you pass tools to the model? How do you make sure the tool prompts match the formats it was trained with? When the model calls a tool, how do you incorporate that into the chat? If you’ve tried to implement tool use before, you’ve probably found that these questions are surprisingly tricky and that the documentation wasn’t always complete and helpful.

Worse, different models can have wildly different implementations of tool use. Even at the most basic level of defining the available tools, some providers expect JSON schemas, while others expect Python function headers. Even among the ones that expect JSON schemas, small details often differ and create big API incompatibilities. This creates a lot of friction and generally just deepens user confusion. So, what can we do about all of this?



Chat Templating

Devoted fans of the Hugging Face Cinematic Universe will remember that the open-source community faced a similar challenge in the past with chat models. Chat models use control tokens like <|start_of_user_turn|> or <|end_of_message|> to let the model know what’s happening in the chat, but different models were trained with totally different control tokens, which meant that users needed to write specific formatting code for each model they wanted to use. This was a huge headache at the time.

Our solution to this was chat templates – essentially, models would come with a tiny Jinja template, which would render chats with the correct format and control tokens for each model. Chat templates meant that users could write chats in a universal, model-agnostic format, trusting the Jinja templates to handle any model-specific formatting required.
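
As a quick refresher, here’s a minimal sketch of a chat template in use; the exact rendered string depends entirely on which model’s tokenizer you load:

from transformers import AutoTokenizer

# Every chat model's tokenizer ships with its own Jinja chat template
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")

chat = [
    {"role": "user", "content": "Hi, how are you?"}
]

# The same model-agnostic chat is rendered into model-specific text,
# complete with that model's control tokens
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))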

The obvious approach to supporting tool use, then, was to extend chat templates to support tools as well. And that’s exactly what we did, but tools created many new challenges for the templating system. Let’s go through those challenges and how we solved them. In the process, hopefully, you’ll gain a deeper understanding of how the system works and how you can make it work for you.



Passing tools to a chat template

Our first criterion when designing the tool use API was that it should be intuitive to define tools and pass them to the chat template. We found that most users wrote their tool functions first and then figured out how to generate tool definitions from them and pass those to the model. This led to an obvious approach: What if users could simply pass functions directly to the chat template and let it generate tool definitions for them?

The problem here, though, is that “passing functions” is a very language-specific thing to do, and a lot of people access chat models through JavaScript or Rust instead of Python. So, we found a compromise that we think offers the best of both worlds: Chat templates expect tools to be defined as JSON schemas, but if you pass Python functions to the template instead, they will be automatically converted to JSON schemas for you. This results in a nice, clean API:

def get_current_temperature(location: str):
    """
    Gets the temperature at a given location.

    Args:
        location: The location to get the temperature for
    """
    return 22.0  # hard-coded placeholder value for this example

tools = [get_current_temperature]    

chat = [
    {"role": "user", "content": "Hey, what's the weather like in Paris right now?"}
]

tool_prompt = tokenizer.apply_chat_template(
    chat, 
    tools=tools,
    add_generation_prompt=True,
    return_tensors="pt"
)

Internally, the get_current_temperature function will be expanded into a complete JSON schema. If you want to see the generated schema, you can use the get_json_schema function:

>>> from transformers.utils import get_json_schema

>>> get_json_schema(get_current_temperature)
{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The situation to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}

If you prefer manual control, or you’re coding in a language other than Python, you can pass JSON schemas like these directly to the template. However, if you’re working in Python, you can avoid handling JSON schemas directly. All you need to do is define your tool functions with clear names, accurate type hints, and complete docstrings, including argument docstrings, since all of these will be used to generate the JSON schema that will be read by the template. Much of this is good Python practice anyway, and if you follow it, you’ll find that no extra work is required – your functions are already usable as tools!
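
For instance, here’s a minimal sketch of defining the same tool as a raw JSON schema and passing it straight to the template, with no Python function involved; the schema is the one shown above:

# The same tool, defined as a raw JSON schema rather than a Python function
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": ["location"]
        }
    }
}

# Schema dicts go in the tools list exactly like functions do
tool_prompt = tokenizer.apply_chat_template(
    chat,
    tools=[weather_tool],
    add_generation_prompt=True,
    return_tensors="pt"
)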

Remember: accurate JSON schemas, whether generated from docstrings and type hints or specified manually, are crucial for the model to understand how to use your tools. The model will never see the code inside your functions, but it will see the JSON schemas. The cleaner and more accurate they are, the better!



Adding tool calls to the chat

One detail that is often overlooked by users (and model documentation 😬) is that when a model calls a tool, this actually requires two messages to be added to the chat history. The first message is the assistant calling the tool, and the second is the tool response, the output of the called function.

Both tool calls and tool responses are essential – remember that the model only knows what’s in the chat history, and it will not be able to make sense of a tool response if it can’t also see the call it made and the arguments it passed to get that response. “22” by itself is not very informative, but it’s very helpful if you know that the message preceding it was get_current_temperature("Paris, France").

This is one of the areas that can be extremely divergent between different providers, but the standard we settled on is that tool calls are a field of assistant messages, like so:

message = {
    "role": "assistant",
    "tool_calls": [
        {
            "type": "function",
            "function": {
                "name": "get_current_temperature",
                "arguments": {
                    "location": "Paris, France"
                }
            }
        }
    ]
}
chat.append(message)



Adding tool responses to the chat

Tool responses are much simpler, especially when tools only return a single string or number; a sketch for handling structured results follows the snippet.

message = {
    "role": "tool", 
    "name": "get_current_temperature", 
    "content": "22.0"
}
chat.append(message)
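
If a tool returns something more structured than a single value, one option is to serialize it to a string first, since message content is plain text from the model’s point of view. A minimal sketch, using a hypothetical richer return value:

import json

weather_report = {"temperature": 22.0, "unit": "celsius"}  # hypothetical structured result
message = {
    "role": "tool",
    "name": "get_current_temperature",
    "content": json.dumps(weather_report)
}
chat.append(message)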



Tool use in action

Let’s take the code we have so far and build a complete example of tool-calling. If you want to use tools in your own projects, we recommend playing around with the code here – try running it yourself, adding or removing tools, swapping models, and tweaking details to get a feel for the system. That familiarity will make things much easier when the time comes to implement tool use in your own software! To make that easier, this example is available as a notebook as well.

First, let’s set up our model. We’ll use Hermes-2-Pro-Llama-3-8B because it’s small, capable, ungated, and it supports tool calling. You may get better results on complex tasks if you use a bigger model, though!

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "NousResearch/Hermes-2-Pro-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, device_map="auto")

Next, we’ll arrange our tool and the chat we wish to make use of. Let’s use the get_current_temperature example from above:

def get_current_temperature(location: str):
    """
    Gets the temperature at a given location.

    Args:
        location: The location to get the temperature for, in the format "city, country"
    """
    return 22.0  

tools = [get_current_temperature]    

chat = [
    {"role": "user", "content": "Hey, what's the weather like in Paris right now?"}
]

tool_prompt = tokenizer.apply_chat_template(
    chat, 
    tools=tools, 
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
)
tool_prompt = tool_prompt.to(model.device)

Now we’re ready to generate the model’s response to the user query, given the tools it has access to:

out = model.generate(**tool_prompt, max_new_tokens=128)
generated_text = out[0, tool_prompt['input_ids'].shape[1]:]

print(tokenizer.decode(generated_text))

and we get:


{"arguments": {"location": "Paris, France"}, "name": "get_current_temperature"}
<|im_end|>

The model has requested a tool! Note how it correctly inferred that it should pass the argument “Paris, France” rather than just “Paris”, because that’s the format recommended by the function’s docstring.

The model doesn’t actually have programmatic access to the tools, though – like all language models, it just generates text. It’s up to you as the programmer to take the model’s request and call the function. First, though, let’s add the model’s tool request to the chat.

Note that this step can require a little bit of manual processing – although you should always add the request to the chat in the format below, the text of the tool call request, such as the <tool_call> tags, may differ between models. Usually, it’s quite intuitive, but keep in mind that you may need a little bit of model-specific json.loads() or re.search() when trying this in your own code!
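
Here’s a minimal parsing sketch for the Hermes-style output shown earlier; the <tool_call> wrapper and the regex are specific to this model family, so treat them as assumptions to adapt for other models:

import json
import re

decoded = tokenizer.decode(generated_text)

# Hermes-2-Pro wraps its calls in <tool_call>...</tool_call> tags;
# other models use different markers
match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", decoded, re.DOTALL)
tool_call = json.loads(match.group(1))

print(tool_call["name"])       # "get_current_temperature"
print(tool_call["arguments"])  # {"location": "Paris, France"}

The parsed name and arguments are exactly what goes into the assistant message below: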

message = {
    "role": "assistant", 
    "tool_calls": [
        {
            "type": "function", 
            "function": {
                "name": "get_current_temperature", 
                "arguments": {"location": "Paris, France"}
            }
        }
    ]
}
chat.append(message)

Now, we actually call the tool in our Python code and add its response to the chat (the response is hard-coded here; see the dispatch sketch after this snippet):

message = {
    "role": "tool", 
    "name": "get_current_temperature", 
    "content": "22.0"
}
chat.append(message)
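
In the snippet above, the response is hard-coded; in real code you would dispatch to the actual Python function. A minimal sketch, assuming the tool_call dict from the parsing sketch above and the tools list defined earlier:

# Look up the requested function by name and call it with the parsed arguments
available_tools = {fn.__name__: fn for fn in tools}
result = available_tools[tool_call["name"]](**tool_call["arguments"])

# str(result) is what goes in the "content" field of the tool message above
print(str(result))  # "22.0"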

And finally, just as we did before, we format the updated chat and pass it to the model, so that it can use the tool response in conversation:

tool_prompt = tokenizer.apply_chat_template(
    chat, 
    tools=tools, 
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
)
tool_prompt = tool_prompt.to(model.device)

out = model.generate(**tool_prompt, max_new_tokens=128)
generated_text = out[0, tool_prompt['input_ids'].shape[1]:]

print(tokenizer.decode(generated_text))

And we get the final response to the user, built using information from the intermediate tool-calling step:

The current temperature in Paris is 22.0 degrees Celsius. Enjoy your day!<|im_end|>



The regrettable disunity of response formats

While reading this example, you may have noticed that even though chat templates can hide model-specific differences when converting from chats and tool definitions to formatted text, the same isn’t true in reverse. When the model emits a tool call, it will do so in its own format, so you’ll have to parse it out manually for now before adding it to the chat in the universal format. Thankfully, most of the formats are pretty intuitive, so this should only be a couple of lines of json.loads() or, at worst, a simple re.search() to create the tool call dict you need.

Still, this is the biggest part of the process that remains “un-unified.” We have some ideas on how to fix it, but they’re not quite ready for prime time yet. “Let us cook,” as the kids say.



Conclusion

Despite the minor caveat above, we think this is a big improvement over the previous situation, where tool use was scattered, confusing, and poorly documented. We hope this makes it a lot easier for open-source developers to include tool use in their projects, augmenting powerful LLMs with a range of tools that add amazing new capabilities. From smaller models like Hermes-2-Pro-8B to giant state-of-the-art behemoths like Mistral-Large, Command-R-Plus or Llama-3.1-405B, many of the LLMs on the cutting edge now support tool use. We think tools will be an integral part of the next wave of LLM products, and we hope these changes make it easier for you to use them in your own projects. Good luck!


