In Part 1 of this tutorial series, we introduced AI Agents, autonomous programs that perform tasks, make decisions, and communicate with others.
In Part 2 of this tutorial series, we learned how to make the Agent retry until the task is completed, using Iterations and Chains.
A single Agent can usually operate effectively with a single tool, but it can be less effective when using many tools at once. One way to tackle complicated tasks is through a “divide-and-conquer” approach: create a specialized Agent for each task and have them work together as a Multi-Agent System (MAS).
In a MAS, multiple Agents collaborate to achieve common goals, often tackling challenges that are too difficult for a single Agent to handle alone. There are two main ways they can interact:
- Sequential flow – The Agents do their work in a specific order, one after the other. For example, Agent 1 finishes its task, and then Agent 2 uses the result to do its own task. This is useful when tasks depend on one another and must be done step by step.
- Hierarchical flow – Usually, one higher-level Agent manages the whole process and gives instructions to lower-level Agents that deal with specific tasks. This is useful when the final output requires some back-and-forth.
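As a rough sketch of the two flows, with plain Python functions standing in for LLM-backed Agents (the `agent_*` names here are made up purely for illustration):

```python
# Toy stand-ins for LLM-backed Agents
def agent_describe(image: str) -> str:
    return f"description of {image}"

def agent_locate(description: str) -> str:
    return f"location guessed from '{description}'"

# Sequential flow: each Agent consumes the previous Agent's output
def sequential(image: str) -> str:
    return agent_locate(agent_describe(image))

# Hierarchical flow: a higher-level Agent decides which worker runs next
def hierarchical(image: str) -> str:
    plan = ["describe", "locate"]   # decided dynamically by the lead Agent
    res = image
    for step in plan:
        res = agent_describe(res) if step == "describe" else agent_locate(res)
    return res
```

On this toy input the two flows produce the same result; the difference is that in the hierarchical case the ordering is decided at runtime by the manager rather than hardcoded.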
In this tutorial, I'm going to show how to build different types of Multi-Agent Systems from scratch, from simple to more advanced. I will present some useful Python code that can be easily applied in other similar cases (just copy, paste, run) and walk through every line of it with comments, so that you can replicate this example (link to the full code at the end of the article).
Setup
Please refer to Part 1 for the setup of Ollama and the main LLM.
import ollama
llm = "qwen2.5"
In this example, I'll ask the model to process images, therefore I'm also going to need a Vision LLM. It is a specialized version of a Large Language Model that, by integrating NLP with computer vision, is designed to understand visual inputs, such as images and videos, in addition to text.
Microsoft's LLaVA is a good choice, as it can also run without a GPU (ollama pull llava).
After the download is completed, you can move on to Python and start writing code. Let's load an image so that we can try out the Vision LLM.
from matplotlib import image as pltimg, pyplot as plt
image_file = "draghi.jpeg"
plt.imshow(pltimg.imread(image_file))
plt.show()
In order to test the Vision LLM, you can just pass the image as an input:
import ollama
ollama.generate(model="llava",
prompt="describe the image",
images=[image_file])["response"]
Sequential Multi-Agent System
I shall build two Agents that will work in a sequential flow, one after the other, where the second takes the output of the first as an input, just like a Chain.
- The first Agent must process an image provided by the user and return a verbal description of what it sees.
- The second Agent will search the web and try to understand where and when the picture was taken, based on the description provided by the first Agent.
The two Agents shall use one Tool each. The first Agent will have the Vision LLM as a Tool. Please keep in mind that with Ollama, in order to use a Tool, the function must be described in a dictionary.
def process_image(path: str) -> str:
    return ollama.generate(model="llava", prompt="describe the image", images=[path])["response"]
tool_process_image = {'type':'function', 'function':{
    'name': 'process_image',
    'description': 'Load an image from a given path and describe what you see',
    'parameters': {'type': 'object',
                   'required': ['path'],
                   'properties': {
                       'path': {'type':'str', 'description':'the path of the image'},
}}}}
The second Agent needs a web-searching Tool. In the previous articles of this tutorial series, I showed how to leverage a dedicated package for searching the web. So, this time, we can use a new Tool: Wikipedia (pip install wikipedia==1.4.0). You can use the original library directly or import the LangChain wrapper.
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
def search_wikipedia(query:str) -> str:
    return WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper()).run(query)
tool_search_wikipedia = {'type':'function', 'function':{
    'name': 'search_wikipedia',
    'description': 'Search on Wikipedia by passing some keywords',
    'parameters': {'type': 'object',
                   'required': ['query'],
                   'properties': {
                       'query': {'type':'str', 'description':'The input must be short keywords, not a long text'},
}}}}
## test
search_wikipedia(query="draghi")
First, you must write a prompt to describe the task of each Agent (the more detailed, the better), which will be the first message in the chat history with the LLM.
prompt = '''
You are a photographer that analyzes and describes images in detail.
'''
messages_1 = [{"role":"system", "content":prompt}]
One important decision to make when building a MAS is whether the Agents should share the chat history or not. How the chat history is managed depends on the design and objectives of the system:
- Shared chat history – Agents have access to a common conversation log, allowing them to see what other Agents have said or done in previous interactions. This can enhance collaboration and the understanding of the overall context.
- Separate chat history – Agents only have access to their own interactions, focusing solely on their own communication. This design is typically used when independent decision-making is important.
I recommend keeping the chats separate unless it's necessary to do otherwise. LLMs have a limited context window, so it's better to keep the history as light as possible.
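Concretely, "separate history" just means keeping one message list per Agent and copying across only what the other side needs. A minimal sketch (the role names mirror our two Agents; the description string is a placeholder for a real model answer):

```python
# Separate histories: each Agent keeps its own message list,
# so each context window stays small.
history_photographer = [{"role":"system", "content":"You are a photographer."}]
history_detective    = [{"role":"system", "content":"You are a detective."}]

# Only the first Agent sees the user's image request...
history_photographer.append({"role":"user", "content":"describe photo.jpg"})

# ...and only the final description crosses over to the second Agent.
description = "a man giving a speech"   # placeholder for the model's answer
history_detective.append({"role":"user", "content":"-Picture: " + description})
```

A shared history would instead append every message to one single list that both Agents receive on each call.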
prompt = '''
You are a detective. You read the image description provided by the photographer, and you search Wikipedia to understand when and where the picture was taken.
'''
messages_2 = [{"role":"system", "content":prompt}]
For convenience, I shall use the use_tool function defined in the previous articles to process the model's response.
def use_tool(agent_res:dict, dic_tools:dict) -> dict:
    ## use tool
    if "tool_calls" in agent_res["message"].keys():
        for tool in agent_res["message"]["tool_calls"]:
            t_name, t_inputs = tool["function"]["name"], tool["function"]["arguments"]
            if f := dic_tools.get(t_name):
                ### calling tool
                print('🔧 >', f"\x1b[1;31m{t_name} -> Inputs: {t_inputs}\x1b[0m")
                ### tool output
                t_output = f(**tool["function"]["arguments"])
                print(t_output)
                ### final res
                res = t_output
            else:
                print('🤬 >', f"\x1b[1;31m{t_name} -> NotFound\x1b[0m")
    ## don't use tool
    if agent_res['message']['content'] != '':
        res = agent_res["message"]["content"]
        t_name, t_inputs = '', ''
    return {'res':res, 'tool_used':t_name, 'inputs_used':t_inputs}
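To see what use_tool expects as input, here is a mock of the response shape Ollama returns when the model decides to call a tool (a message dict carrying a tool_calls list), run through the same dispatch logic. The lambda is a dummy stand-in for the real Vision Tool:

```python
# Mock of the structure returned by ollama.chat() on a tool call:
# message -> tool_calls -> function -> name / arguments
agent_res = {"message": {
    "content": "",
    "tool_calls": [{"function": {"name": "process_image",
                                 "arguments": {"path": "draghi.jpeg"}}}]
}}

# Dummy Tool registry (the lambda stands in for the real Vision LLM call)
dic_tools = {"process_image": lambda path: f"(description of {path})"}

# The same dispatch use_tool performs: look the name up in the registry
# and unpack the arguments dict into the Python function.
for tool in agent_res["message"]["tool_calls"]:
    f = dic_tools.get(tool["function"]["name"])
    out = f(**tool["function"]["arguments"])
```

This is why the Tool schemas above matter: the names and parameter keys the model emits must match the Python functions exactly, or the lookup and the `f(**arguments)` call fail.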
As we already did in the previous tutorials, the interaction with the Agents can be started with a while loop. The user is asked to provide an image that the first Agent will process.
dic_tools = {'process_image':process_image,
'search_wikipedia':search_wikipedia}
while True:
    ## user input
    try:
        q = input('📷 > give me the image to analyze:')
    except EOFError:
        break
    if q == "quit":
        break
    if q.strip() == "":
        continue
    messages_1.append( {"role":"user", "content":q} )
    plt.imshow(pltimg.imread(q))
    plt.show()
    ## Agent 1
    agent_res = ollama.chat(model=llm,
                            tools=[tool_process_image],
                            messages=messages_1)
    dic_res = use_tool(agent_res, dic_tools)
    res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
    print("👽📷 >", f"\x1b[1;30m{res}\x1b[0m")
    messages_1.append( {"role":"assistant", "content":res} )

The first Agent used the Vision LLM Tool and recognized text in the image. Now, the description can be passed on to the second Agent, which shall extract some keywords to search Wikipedia.
## Agent 2
messages_2.append( {"role":"system", "content":"-Picture: "+res} )
agent_res = ollama.chat(model=llm,
tools=[tool_search_wikipedia],
messages=messages_2)
dic_res = use_tool(agent_res, dic_tools)
res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
The second Agent used the Tool and extracted information from the web, based on the description provided by the first Agent. Now, it can process everything and give a final answer.
if tool_used == "search_wikipedia":
    messages_2.append( {"role":"system", "content":"-Wikipedia: "+res} )
    agent_res = ollama.chat(model=llm, tools=[], messages=messages_2)
    dic_res = use_tool(agent_res, dic_tools)
    res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
else:
    messages_2.append( {"role":"assistant", "content":res} )
print("👽📖 >", f"\x1b[1;30m{res}\x1b[0m")
This is literally perfect! Let’s move on to the next example.
Hierarchical Multi-Agent System
Imagine having a squad of Agents that operates with a hierarchical flow, just like a human team, with distinct roles to ensure smooth collaboration and efficient problem-solving. At the top, a manager oversees the overall strategy, talking to the customer (the user), making high-level decisions, and guiding the team toward the goal. Meanwhile, other team members handle operative tasks. Just like humans, Agents can work together and delegate tasks appropriately.
I shall build a tech team of 3 Agents with the objective of querying a SQL database per user’s request. They must work in a hierarchical flow:
- The Lead Agent talks to the user and understands the request. Then, it decides which team member is the most appropriate for the task.
- The Junior Agent has the job of exploring the db and building SQL queries.
- The Senior Agent shall review the SQL code, correct it if necessary, and execute it.
LLMs know how to code by being exposed to a large corpus of both code and natural language text, where they learn patterns, syntax, and semantics of programming languages. The model learns the relationships between different parts of the code by predicting the next token in a sequence. In short, LLMs can generate SQL code but can’t execute it, Agents can.
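The split is easy to see with the standard library: the SQL below could come straight out of an LLM as plain text, but it takes a real driver call to run it. A toy sketch on an in-memory table with made-up rows:

```python
import sqlite3

# As far as the LLM is concerned, a query like this is just text it generated
generated_sql = "SELECT COUNT(*) FROM titanic WHERE Survived = 1"

# Executing it requires a live database connection -- the Agent's job
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE titanic (Survived INTEGER)")
con.executemany("INSERT INTO titanic VALUES (?)", [(1,), (0,), (1,)])
n_survivors = con.execute(generated_sql).fetchone()[0]   # -> 2
```

The Tools we define next give the Agents exactly this execution capability, against the real database.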
First of all, I am going to create a database and connect to it, then I shall prepare a series of Tools to execute SQL code.
## Read dataset
import pandas as pd
dtf = pd.read_csv('http://bit.ly/kaggletrain')
dtf.head(3)
## Create db
import sqlite3
dtf.to_sql(index=False, name="titanic",
con=sqlite3.connect("database.db"),
if_exists="replace")
## Connect db
from langchain_community.utilities.sql_database import SQLDatabase
db = SQLDatabase.from_uri("sqlite:///database.db")
Let’s start with the Junior Agent. LLMs don’t need Tools to generate SQL code, but the Agent doesn’t know the table names and structure. Therefore, we need to provide Tools to investigate the database.
from langchain_community.tools.sql_database.tool import ListSQLDatabaseTool
def get_tables() -> str:
    return ListSQLDatabaseTool(db=db).invoke("")
tool_get_tables = {'type':'function', 'function':{
'name': 'get_tables',
'description': 'Returns the name of the tables in the database.',
'parameters': {'type': 'object',
'required': [],
'properties': {}
}}}
## test
get_tables()
That will show the tables available in the db, while this one will print the columns of a table.
from langchain_community.tools.sql_database.tool import InfoSQLDatabaseTool
def get_schema(tables: str) -> str:
    tool = InfoSQLDatabaseTool(db=db)
    return tool.invoke(tables)
tool_get_schema = {'type':'function', 'function':{
    'name': 'get_schema',
    'description': 'Returns the name of the columns in the table.',
    'parameters': {'type': 'object',
                   'required': ['tables'],
                   'properties': {'tables': {'type':'str', 'description':'table name. Example Input: table1, table2, table3'}}
}}}
## test
get_schema(tables='titanic')
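Under the hood, both LangChain tools read SQLite's own catalog. Roughly equivalent raw queries, sketched here on a throwaway in-memory db so you can see what the Agents actually receive:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE titanic (PassengerId INTEGER, Survived INTEGER)")

# Roughly what ListSQLDatabaseTool returns: the table names
tables = [r[0] for r in
          con.execute("SELECT name FROM sqlite_master WHERE type='table'")]

# Roughly what InfoSQLDatabaseTool returns: the column names of a table
# (PRAGMA table_info rows are: cid, name, type, notnull, default, pk)
columns = [r[1] for r in con.execute("PRAGMA table_info(titanic)")]
```

This is also why the Junior Agent needs these Tools at all: the catalog lives in the database, not in the model's weights.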
Since this Agent must use more than one Tool, and Tools might fail, I'll write a solid prompt, following the structure of the previous article.
prompt_junior = '''
[GOAL] You are a data engineer who builds efficient SQL queries to get data from the database.
[RETURN] You must return a final SQL query based on the user's instructions.
[WARNINGS] Use your tools only once.
[CONTEXT] In order to generate the correct SQL query, you need to know the name of the table and its schema.
First ALWAYS use the tool 'get_tables' to find the name of the table.
Then, you MUST use the tool 'get_schema' to get the columns in the table.
Finally, based on the information you got, generate an SQL query to answer the user's question.
'''
Moving on to the Senior Agent: code checking doesn't require any particular trick, you can just use the LLM.
def sql_check(sql: str) -> str:
    p = f'''Double check if the SQL query is correct: {sql}. You MUST return just the SQL code without comments'''
    res = ollama.generate(model=llm, prompt=p)["response"]
    # strip markdown fences and newlines from the model's answer
    return res.replace("```sql", "").replace("```", "").replace("\n", " ").strip()
tool_sql_check = {'type':'function', 'function':{
    'name': 'sql_check',
    'description': 'Before executing a query, always review the SQL code and correct it if necessary',
    'parameters': {'type': 'object',
                   'required': ['sql'],
                   'properties': {'sql': {'type':'str', 'description':'SQL code'}}
}}}
## test
sql_check(sql='SELECT * FROM titanic TOP 3')
Executing code on the database is a different story: LLMs can't do that alone.
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool
def sql_exec(sql: str) -> str:
    return QuerySQLDataBaseTool(db=db).invoke(sql)
tool_sql_exec = {'type':'function', 'function':{
'name': 'sql_exec',
'description': 'Execute a SQL query',
'parameters': {'type': 'object',
'required': ['sql'],
'properties': {'sql': {'type':'str', 'description':'SQL code'}}
}}}
## test
sql_exec(sql='SELECT * FROM titanic LIMIT 3')
And naturally, a good prompt.
prompt_senior = '''
[GOAL] You are a senior data engineer who reviews and executes the SQL queries written by others.
[RETURN] You must return data from the database.
[WARNINGS] Use your tools only once.
[CONTEXT] ALWAYS check the SQL code before executing it on the database.
First ALWAYS use the tool 'sql_check' to review the query. The output of this tool is the correct SQL query.
You MUST use ONLY the correct SQL query when you use the tool 'sql_exec'.
'''
Finally, we can create the Lead Agent. It has the most crucial job: invoking the other Agents and telling them what to do. There are many ways to achieve that, but I find creating a simple Tool the most reliable one.
def invoke_agent(agent:str, instructions:str) -> str:
    return agent+" - "+instructions if agent in ['junior','senior'] else f"Agent '{agent}' Not Found"
tool_invoke_agent = {'type':'function', 'function':{
    'name': 'invoke_agent',
    'description': 'Invoke another Agent to work for you.',
    'parameters': {'type': 'object',
                   'required': ['agent', 'instructions'],
                   'properties': {
                       'agent': {'type':'str', 'description':'the Agent name, one of "junior" or "senior".'},
                       'instructions': {'type':'str', 'description':'detailed instructions for the Agent.'}
                   }
}}}
## test
invoke_agent(agent="intern", instructions="build a query")
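When the Lead later calls this Tool, the returned "agent - instructions" string gets parsed back into the agent name and its instructions with a simple split, roughly like this (the instruction string is made up for illustration):

```python
# The invoke_agent Tool returns "<agent> - <instructions>";
# the main loop parses it back by splitting on the dash.
res = "junior - build a query that counts the passengers"

parts = res.split("-")
agent_invoked = parts[0].strip() if len(parts) > 1 else ''
instructions = parts[1].strip() if len(parts) > 1 else ''
```

Note that this convention is brittle if the instructions themselves contain a dash (only the text up to the second dash survives), which is an acceptable trade-off for a tutorial but worth a more robust delimiter in production.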
Describe in the prompt what kind of behavior you're expecting. Try to be as detailed as possible, because hierarchical Multi-Agent Systems can get very confusing.
prompt_lead = '''
[GOAL] You are a tech lead.
You have a team with one junior data engineer called 'junior', and one senior data engineer called 'senior'.
[RETURN] You must return data from the database based on the user's requests.
[WARNINGS] You are the only one that talks to the user and gets the requests from the user.
The 'junior' data engineer only builds queries.
The 'senior' data engineer checks the queries and executes them.
[CONTEXT] First ALWAYS ask the user what they want.
Then, you MUST use the tool 'invoke_agent' to pass the instructions to the 'junior' for building the query.
Finally, you MUST use the tool 'invoke_agent' to pass the instructions to the 'senior' for retrieving the data from the database.
'''
I shall keep the chat histories separate so that each Agent will know only a specific part of the whole process.
dic_tools = {'get_tables':get_tables,
             'get_schema':get_schema,
             'sql_exec':sql_exec,
             'sql_check':sql_check,
             'invoke_agent':invoke_agent}
messages_junior = [{"role":"system", "content":prompt_junior}]
messages_senior = [{"role":"system", "content":prompt_senior}]
messages_lead = [{"role":"system", "content":prompt_lead}]
Everything is ready to start the workflow. After the user begins the chat, the first to respond is the Lead Agent, which is the only one that directly interacts with the human.
while True:
    ## user input
    q = input('🙂 >')
    if q == "quit":
        break
    messages_lead.append( {"role":"user", "content":q} )
    ## Lead Agent
    agent_res = ollama.chat(model=llm, messages=messages_lead, tools=[tool_invoke_agent])
    dic_res = use_tool(agent_res, dic_tools)
    res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
    agent_invoked = res.split("-")[0].strip() if len(res.split("-")) > 1 else ''
    instructions = res.split("-")[1].strip() if len(res.split("-")) > 1 else ''
    ###-->CODE TO INVOKE OTHER AGENTS HERE<--###
    ## Lead Agent final response
    print("👩💼 >", f"\x1b[1;30m{res}\x1b[0m")
    messages_lead.append( {"role":"assistant", "content":res} )
The Lead Agent decided to invoke the Junior Agent, giving it some instructions based on the interaction with the user. Now the Junior Agent shall start working on the query.
## Invoke Junior Agent
if agent_invoked == "junior":
    print("😎 >", f"\x1b[1;32mReceived instructions: {instructions}\x1b[0m")
    messages_junior.append( {"role":"user", "content":instructions} )
    ### use the tools
    available_tools = {"get_tables":tool_get_tables, "get_schema":tool_get_schema}
    context = ''
    while available_tools:
        agent_res = ollama.chat(model=llm, messages=messages_junior,
                                tools=[v for v in available_tools.values()])
        dic_res = use_tool(agent_res, dic_tools)
        res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
        if tool_used:
            available_tools.pop(tool_used)
        context = context + f"\nTool used: {tool_used}. Output: {res}" #->add tool usage context
    messages_junior.append( {"role":"user", "content":context} )
    ### response
    agent_res = ollama.chat(model=llm, messages=messages_junior)
    dic_res = use_tool(agent_res, dic_tools)
    res = dic_res["res"]
    print("😎 >", f"\x1b[1;32m{res}\x1b[0m")
    messages_junior.append( {"role":"assistant", "content":res} )
The Junior Agent activated all its Tools to explore the database and collected the necessary information to generate some SQL code. Now, it must report back to the Lead.
## update Lead Agent
context = "Junior already wrote this query: "+res+ "\nNow invoke the Senior to review and execute the code."
print("👩💼 >", f"\x1b[1;30m{context}\x1b[0m")
messages_lead.append( {"role":"user", "content":context} )
agent_res = ollama.chat(model=llm, messages=messages_lead, tools=[tool_invoke_agent])
dic_res = use_tool(agent_res, dic_tools)
res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
agent_invoked = res.split("-")[0].strip() if len(res.split("-")) > 1 else ''
instructions = res.split("-")[1].strip() if len(res.split("-")) > 1 else ''
The Lead Agent received the output from the Junior and asked the Senior Agent to review and execute the SQL query.
## Invoke Senior Agent
if agent_invoked == "senior":
    print("🧓 >", f"\x1b[1;34mReceived instructions: {instructions}\x1b[0m")
    messages_senior.append( {"role":"user", "content":instructions} )
    ### use the tools
    available_tools = {"sql_check":tool_sql_check, "sql_exec":tool_sql_exec}
    context = ''
    while available_tools:
        agent_res = ollama.chat(model=llm, messages=messages_senior,
                                tools=[v for v in available_tools.values()])
        dic_res = use_tool(agent_res, dic_tools)
        res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
        if tool_used:
            available_tools.pop(tool_used)
        context = context + f"\nTool used: {tool_used}. Output: {res}" #->add tool usage context
    messages_senior.append( {"role":"user", "content":context} )
    ### response
    print("🧓 >", f"\x1b[1;34m{res}\x1b[0m")
    messages_senior.append( {"role":"assistant", "content":res} )
The Senior Agent executed the query on the db and got an answer. Finally, it can report back to the Lead, which will give the final answer to the user.
### update Lead Agent
context = "Senior agent returned this output: "+res
print("👩💼 >", f"\x1b[1;30m{context}\x1b[0m")
messages_lead.append( {"role":"user", "content":context} )
Conclusion
This article has covered the basic steps of creating Multi-Agent Systems from scratch using only Ollama. With these building blocks in place, you are already equipped to start developing your own MAS for different use cases.
Stay tuned for Part 4, where we will dive deeper into more advanced examples.
Full code for this article: GitHub
I hope you enjoyed it! Feel free to contact me for questions and feedback, or just to share your interesting projects.
👉 Let’s Connect 👈
All images, unless otherwise noted, are by the author.