Choosing Between LLM Agent Frameworks


The tradeoffs between building bespoke code-based agents and using the major agent frameworks.

Image by author

Thanks to John Gilhuly for his contributions to this piece.

Agents are having a moment. With multiple new frameworks and fresh investment in the space, modern AI agents are overcoming shaky origins to rapidly supplant RAG as an implementation priority. So will 2024 finally be the year that autonomous AI systems take over writing our emails, booking flights, talking to our data, or seemingly any other task?

Perhaps, but much work remains to get to that point. Any developer building an agent must not only choose foundations (which model, use case, and architecture to use) but also which framework to leverage. Do you go with the long-standing LangGraph, or the newer entrant LlamaIndex Workflows? Or do you go the traditional route and code the whole thing yourself?

This post aims to make that choice a bit easier. Over the past few weeks, I built the same agent in each major framework to examine some of the strengths and weaknesses of each at a technical level. The full code for each agent is available in this repo.

Background on the Agent Used for Testing

The agent used for testing includes function calling, multiple tools or skills, connections to outside resources, and shared state or memory.

The agent has the following capabilities:

  1. Answering questions from a knowledge base
  2. Talking to data: answering questions on telemetry data of an LLM application
  3. Analyzing data: analyzing higher-level trends and patterns in retrieved telemetry data

To accomplish these, the agent has three starting skills: RAG with product documentation, SQL generation on a trace database, and data analysis. A simple gradio-powered interface is used for the agent UI, with the agent itself structured as a chatbot.
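As a point of reference, here is a minimal sketch of how a gradio chatbot could wrap an agent like this. The chat_fn wrapper and the tuple-style history are assumptions for illustration (the real UI code lives in the linked repo), and router is the code-based agent's entry point shown in the next section.

import gradio as gr

# Hypothetical wrapper for illustration: rebuild an OpenAI-style message list
# from gradio's (user, assistant) history tuples and hand it to the agent's router.
def chat_fn(user_message, history):
    messages = []
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_message})
    return router(messages)

demo = gr.ChatInterface(fn=chat_fn, title="Observability Agent")
demo.launch()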

The first option you have when developing an agent is to skip the frameworks entirely and build the agent fully yourself. When embarking on this project, this was the approach I started with.

Image created by author

Pure Code Architecture

The code-based agent below is made up of an OpenAI-powered router that uses function calling to select the correct skill to use. After that skill completes, it returns to the router to either call another skill or respond to the user.

The agent keeps a running list of messages and responses that is passed in full to the router on each call to preserve context through cycles.

def router(messages):
    if not any(
        isinstance(message, dict) and message.get("role") == "system" for message in messages
    ):
        system_prompt = {"role": "system", "content": SYSTEM_PROMPT}
        messages.append(system_prompt)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=skill_map.get_combined_function_description_for_openai(),
    )

    messages.append(response.choices[0].message)
    tool_calls = response.choices[0].message.tool_calls
    if tool_calls:
        handle_tool_calls(tool_calls, messages)
        return router(messages)
    else:
        return response.choices[0].message.content
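The handle_tool_calls helper isn't reproduced above. A minimal sketch of what it might look like, assuming the SkillMap interface described next and the standard OpenAI tool-call response shape:

import json

# Hypothetical helper: run each requested tool and append the result as a "tool"
# message so the router sees it on the next recursive call.
def handle_tool_calls(tool_calls, messages):
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)
        function_callable = skill_map.get_function_callable_by_name(function_name)
        result = function_callable(**arguments)
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result),
            }
        )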

The skills themselves are defined in their own classes (e.g. GenerateSQLQuery), which are collectively held in a SkillMap. The router itself only interacts with the SkillMap, which it uses to load skill names, descriptions, and callable functions. This approach means that adding a new skill to the agent is as simple as writing that skill as its own class, then adding it to the list of skills in the SkillMap. The idea here is to make it easy to add new skills without disturbing the router code.

from typing import Callable

class SkillMap:
    def __init__(self):
        skills = [AnalyzeData(), GenerateSQLQuery()]

        self.skill_map = {}
        for skill in skills:
            self.skill_map[skill.get_function_name()] = (
                skill.get_function_dict(),
                skill.get_function_callable(),
            )

    def get_function_callable_by_name(self, skill_name) -> Callable:
        return self.skill_map[skill_name][1]

    def get_combined_function_description_for_openai(self):
        combined_dict = []
        for _, (function_dict, _) in self.skill_map.items():
            combined_dict.append(function_dict)
        return combined_dict

    def get_function_list(self):
        return list(self.skill_map.keys())

    def get_list_of_function_callables(self):
        return [skill[1] for skill in self.skill_map.values()]

    def get_function_description_by_name(self, skill_name):
        return str(self.skill_map[skill_name][0]["function"])
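The individual skill classes aren't shown in this post. A hypothetical sketch of one, matching the interface the SkillMap calls above (get_function_name, get_function_dict, get_function_callable); the schema and body here are illustrative only:

class GenerateSQLQuery:
    def get_function_name(self):
        return "generate_and_run_sql_query"

    def get_function_dict(self):
        # OpenAI-style function description, consumed by the router's tools parameter
        return {
            "type": "function",
            "function": {
                "name": "generate_and_run_sql_query",
                "description": "Generates and runs an SQL query based on the user's prompt.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }

    def get_function_callable(self):
        return self.generate_and_run_sql_query

    def generate_and_run_sql_query(self, query):
        # Illustrative body: generate SQL with an LLM and run it against the trace database
        ...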

Overall, this approach is fairly straightforward to implement, but it comes with a few challenges.

Challenges with Pure Code Agents

The first difficulty lies in structuring the router system prompt. Often, the router in the example above insisted on generating SQL itself instead of delegating that to the correct skill. If you've ever tried to get an LLM not to do something, you know how frustrating that experience can be; finding a working prompt took many rounds of debugging. Accounting for the different output formats from each step was also tricky. Since I opted not to use structured outputs, I had to be ready for multiple different formats from each of the LLM calls in my router and skills.
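The prompt that finally worked isn't reproduced here, but the shape of the fix was to spell out the delegation rules explicitly. A purely illustrative example (not the project's actual prompt):

# Hypothetical router system prompt, not the actual prompt used in the project
SYSTEM_PROMPT = """You are the router for an observability agent.
Never write SQL yourself. For any question about telemetry data, call the
generate_and_run_sql_query tool; for trends or patterns, call the data analysis
tool on the retrieved data. Only answer directly when no tool applies, and keep
answers grounded in tool results."""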

Advantages of a Pure Code Agent

A code-based approach provides a good baseline and starting point, offering a great way to learn how agents work without relying on canned agent tutorials from the prevailing frameworks. Although convincing the LLM to behave can be difficult, the code structure itself is simple enough to use and might make sense for certain use cases (more in the analysis section below).

LangGraph

LangGraph is one of the longest-standing agent frameworks, first released in January 2024. The framework is built to address the acyclic nature of existing pipelines and chains by adopting a Pregel graph structure instead. LangGraph makes it easier to define loops in your agent by adding the concepts of nodes, edges, and conditional edges to traverse a graph. LangGraph is built on top of LangChain and uses the objects and types from that framework.

Image created by author

LangGraph Architecture

The LangGraph agent looks similar to the code-based agent on paper, but the code behind it is drastically different. LangGraph still technically uses a "router," in that it calls OpenAI with functions and uses the response to proceed to a new step. However, the way the program moves between skills is controlled completely differently.

# Import paths assume recent langgraph and langchain-openai releases
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver

tools = [generate_and_run_sql_query, data_analyzer]
model = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools(tools)

def create_agent_graph():
    workflow = StateGraph(MessagesState)

    tool_node = ToolNode(tools)
    workflow.add_node("agent", call_model)
    workflow.add_node("tools", tool_node)

    workflow.add_edge(START, "agent")
    workflow.add_conditional_edges(
        "agent",
        should_continue,
    )
    workflow.add_edge("tools", "agent")

    checkpointer = MemorySaver()
    app = workflow.compile(checkpointer=checkpointer)
    return app

The graph defined here has a node for the initial OpenAI call, called "agent" above, and one for the tool handling step, called "tools." LangGraph has a built-in object called ToolNode that takes a list of callable tools and triggers them based on a ChatMessage response, before returning to the "agent" node again.

def should_continue(state: MessagesState):
    messages = state["messages"]
    last_message = messages[-1]
    if last_message.tool_calls:
        return "tools"
    return END

def call_model(state: MessagesState):
    messages = state["messages"]
    response = model.invoke(messages)
    return {"messages": [response]}

After each call of the "agent" node (put another way, the router in the code-based agent), the should_continue edge decides whether to return the response to the user or pass it on to the ToolNode to handle tool calls.

Throughout each node, the "state" stores the list of messages and responses from OpenAI, similar to the code-based agent's approach.
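For completeness, here is a minimal sketch of invoking the compiled graph. The thread_id value is arbitrary; it is simply the key the MemorySaver checkpointer uses to persist the message state between calls.

from langchain_core.messages import HumanMessage

app = create_agent_graph()

# Each turn is appended to the checkpointed state for this thread_id,
# so the conversation history carries across invocations.
final_state = app.invoke(
    {"messages": [HumanMessage(content="How many traces errored out yesterday?")]},
    config={"configurable": {"thread_id": "demo-thread"}},
)
print(final_state["messages"][-1].content)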

Challenges with LangGraph

Most of the difficulties with LangGraph in this example stem from the need to use LangChain objects for things to flow nicely.

Challenge #1: Function Call Validation

To use the ToolNode object, I had to refactor most of my existing Skill code. The ToolNode takes a list of callable functions, which originally made me think I could use my existing functions; however, things broke down because of my function parameters.

The skills were defined as classes with a callable member function, meaning they had "self" as their first parameter. GPT-4o was smart enough not to include the "self" parameter in the generated function call, but LangGraph read this as a validation error due to a missing parameter.

This took hours to figure out, because the error message instead marked the third parameter in the function ("args" on the data analysis skill) as the missing parameter:

pydantic.v1.error_wrappers.ValidationError: 1 validation error for data_analysis_toolSchema
args field required (type=value_error.missing)

It's worth mentioning that the error message originated from Pydantic, not from LangGraph.

I ultimately bit the bullet and redefined my skills as basic methods with LangChain's @tool decorator, and was able to get things working.

@tool
def generate_and_run_sql_query(query: str):
    """Generates and runs an SQL query based on the prompt.

    Args:
        query (str): A string containing the original user prompt.

    Returns:
        str: The result of the SQL query.
    """

Challenge #2: Debugging

As mentioned, debugging inside a framework is difficult. This primarily comes down to confusing error messages and abstracted concepts that make it harder to view variables.

The abstracted concepts primarily show up when trying to debug the messages being sent around the agent. LangGraph stores these messages in state["messages"]. Some nodes within the graph pull from these messages automatically, which can make it difficult to understand the value of the messages when they are accessed by the node.
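One low-tech workaround (not from the original project, just an illustration) is to temporarily dump the state from inside a node you control, since call_model receives the full MessagesState:

def call_model(state: MessagesState):
    messages = state["messages"]
    # Temporary debugging aid: show exactly what this node sees before calling the model
    for m in messages:
        print(type(m).__name__, getattr(m, "tool_calls", None), repr(m.content)[:120])
    response = model.invoke(messages)
    return {"messages": [response]}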

A sequential view of the agent's actions (image by author)

LangGraph Advantages

One of the main advantages of LangGraph is that it is easy to work with. The graph structure code is clean and accessible. Especially if you have complex node logic, having a single view of the graph makes it easier to understand how the agent is connected together. LangGraph also makes it straightforward to convert an existing application built in LangChain.

Takeaway

If you use everything in the framework, LangGraph works cleanly; if you step outside of it, prepare for some debugging headaches.

LlamaIndex Workflows

Workflows is a newer entrant into the agent framework space, premiering earlier this summer. Like LangGraph, it aims to make looping agents easier to build. Workflows also has a particular focus on running asynchronously.

Some elements of Workflows seem to be a direct response to LangGraph, specifically its use of events instead of edges and conditional edges. Workflows use steps (analogous to nodes in LangGraph) to house logic, and emitted and received events to move between steps.

Image created by author

The structure above looks similar to the LangGraph structure, save for one addition. I added a setup step to the Workflow to prepare the agent context; more on this below. Despite the similar structure, very different code powers it.

Workflows Architecture

The code below defines the Workflow structure. Similar to LangGraph, this is where I prepared the state and attached the skills to the LLM object.

class AgentFlow(Workflow):
    def __init__(self, llm, timeout=300):
        super().__init__(timeout=timeout)
        self.llm = llm
        self.memory = ChatMemoryBuffer(token_limit=1000).from_defaults(llm=llm)
        self.tools = []
        for func in skill_map.get_function_list():
            self.tools.append(
                FunctionTool(
                    skill_map.get_function_callable_by_name(func),
                    metadata=ToolMetadata(
                        name=func, description=skill_map.get_function_description_by_name(func)
                    ),
                )
            )

    @step
    async def prepare_agent(self, ev: StartEvent) -> RouterInputEvent:
        user_input = ev.input
        user_msg = ChatMessage(role="user", content=user_input)
        self.memory.put(user_msg)

        chat_history = self.memory.get()
        return RouterInputEvent(input=chat_history)

This is also where I define an extra step, "prepare_agent". This step creates a ChatMessage from the user input and adds it to the workflow memory. Splitting this out as a separate step means that we do not return to it as the agent loops through steps, which avoids repeatedly adding the user message to the memory.

In the LangGraph case, I accomplished the same thing with a run_agent method that lived outside the graph. This difference is mostly stylistic; however, in my opinion it is cleaner to house this logic with the Workflow and graph, as we've done here.
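A minimal sketch of kicking off the workflow, assuming a LlamaIndex OpenAI LLM object; the keyword passed to run() surfaces on the StartEvent, which is why prepare_agent reads ev.input:

import asyncio
from llama_index.llms.openai import OpenAI

async def main():
    agent = AgentFlow(llm=OpenAI(model="gpt-4o"), timeout=300)
    # run() kwargs are attached to the StartEvent, so prepare_agent can read ev.input
    result = await agent.run(input="How many traces errored out yesterday?")
    print(result)

asyncio.run(main())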

With the Workflow set up, I then defined the routing code:

@step
async def router(self, ev: RouterInputEvent) -> ToolCallEvent | StopEvent:
    messages = ev.input

    if not any(
        isinstance(message, dict) and message.get("role") == "system" for message in messages
    ):
        system_prompt = ChatMessage(role="system", content=SYSTEM_PROMPT)
        messages.insert(0, system_prompt)

    with using_prompt_template(template=SYSTEM_PROMPT, version="v0.1"):
        response = await self.llm.achat_with_tools(
            model="gpt-4o",
            messages=messages,
            tools=self.tools,
        )

    self.memory.put(response.message)

    tool_calls = self.llm.get_tool_calls_from_response(response, error_on_no_tool_call=False)
    if tool_calls:
        return ToolCallEvent(tool_calls=tool_calls)
    else:
        return StopEvent(result=response.message.content)

And the tool call handling code:

@step
async def tool_call_handler(self, ev: ToolCallEvent) -> RouterInputEvent:
    tool_calls = ev.tool_calls

    for tool_call in tool_calls:
        function_name = tool_call.tool_name
        arguments = tool_call.tool_kwargs
        if "input" in arguments:
            arguments["prompt"] = arguments.pop("input")

        try:
            function_callable = skill_map.get_function_callable_by_name(function_name)
            function_result = function_callable(arguments)
        except KeyError:
            function_result = "Error: Unknown function call"

        message = ChatMessage(
            role="tool",
            content=function_result,
            additional_kwargs={"tool_call_id": tool_call.tool_id},
        )

        self.memory.put(message)

    return RouterInputEvent(input=self.memory.get())

Each of these looks more similar to the code-based agent than to the LangGraph agent. This is mainly because Workflows keeps the conditional routing logic in the steps as opposed to in conditional edges (the tool_calls check at the end of the router step above was a conditional edge in LangGraph, whereas now it is just part of the routing step), and because LangGraph has a ToolNode object that does nearly everything in the tool_call_handler method automatically.

Moving past the routing step, one thing I was very happy to see is that I could use my SkillMap and existing skills from my code-based agent with Workflows. These required no changes to work with Workflows, which made my life much easier.

Challenges with Workflows

Challenge #1: Sync vs Async

While asynchronous execution is preferable for a live agent, debugging a synchronous agent is much easier. Workflows is designed to work asynchronously, and trying to force synchronous execution was very difficult.

I initially thought I would just be able to remove the "async" method designations and switch from "achat_with_tools" to "chat_with_tools". However, because the underlying methods within the Workflow class were also marked as asynchronous, it was necessary to redefine those in order to run synchronously. I ended up sticking with the asynchronous approach, but this didn't end up making debugging any harder.
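If you do need to call the async workflow from synchronous code (a gradio callback, for example), a thin asyncio wrapper is usually enough. A minimal sketch, assuming no event loop is already running in the calling thread:

import asyncio

def run_agent_sync(agent, user_input):
    # Bridge synchronous code to the async workflow without rewriting the steps
    async def _run():
        return await agent.run(input=user_input)
    return asyncio.run(_run())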

A sequential view of the agent's actions (image by author)

Challenge #2: Pydantic Validation Errors

In a repeat of the woes with LangGraph, similar problems emerged around confusing Pydantic validation errors on skills. Fortunately, these were easier to address this time since Workflows was able to handle member functions just fine. I ultimately just ended up having to be more prescriptive in creating LlamaIndex FunctionTool objects for my skills:

for func in skill_map.get_function_list():
    self.tools.append(
        FunctionTool(
            skill_map.get_function_callable_by_name(func),
            metadata=ToolMetadata(
                name=func, description=skill_map.get_function_description_by_name(func)
            ),
        )
    )

Excerpt from AgentFlow.__init__ that builds FunctionTools

Advantages of Workflows

I had a much easier time building the Workflows agent than I did the LangGraph agent, mainly because Workflows still required me to write the routing logic and tool handling code myself instead of providing built-in functions. This also meant that my Workflows agent looked extremely similar to my code-based agent.

The biggest difference came in the use of events. I used two custom events to move between steps in my agent:

class ToolCallEvent(Event):
    tool_calls: list[ToolSelection]

class RouterInputEvent(Event):
    input: list[ChatMessage]

The emitter-receiver, event-based architecture took the place of directly calling some of the methods in my agent, like the tool call handler.

If you have more complex systems with multiple steps that trigger asynchronously and may emit multiple events, this architecture becomes very helpful for managing that cleanly, as the sketch below illustrates.
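To illustrate (this goes beyond the agent built here), Workflows lets one step fan out several events while another gathers them with collect_events. A hypothetical sketch, assuming a recent llama-index-core:

from llama_index.core.workflow import Context, Event, StartEvent, StopEvent, Workflow, step

class QueryEvent(Event):
    query: str

class ResultEvent(Event):
    result: str

class FanOutFlow(Workflow):
    @step
    async def dispatch(self, ctx: Context, ev: StartEvent) -> QueryEvent:
        # Fan out: emit one event per sub-query; each one triggers run_query
        ctx.send_event(QueryEvent(query="latency by day"))
        ctx.send_event(QueryEvent(query="error rate by day"))

    @step
    async def run_query(self, ctx: Context, ev: QueryEvent) -> ResultEvent:
        return ResultEvent(result=f"results for {ev.query}")  # placeholder work

    @step
    async def gather(self, ctx: Context, ev: ResultEvent) -> StopEvent:
        # Fan in: wait until both ResultEvents have arrived before stopping
        results = ctx.collect_events(ev, [ResultEvent] * 2)
        if results is None:
            return None
        return StopEvent(result=[r.result for r in results])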

Other advantages of Workflows include the fact that it is very lightweight and doesn't force much structure on you (aside from the use of certain LlamaIndex objects), and that its event-based architecture provides a helpful alternative to direct function calling, especially for complex, asynchronous applications.

Looking across the three approaches, each one has its advantages.

The no-framework approach is the simplest to implement. Because any abstractions are defined by the developer (e.g. the SkillMap object in the example above), keeping the various types and objects straight is easy. The readability and accessibility of the code comes down entirely to the individual developer, however, and it's easy to see how increasingly complex agents could get messy without some enforced structure.

LangGraph provides quite a bit of structure, which makes the agent very clearly defined. If a broader team is collaborating on an agent, this structure would provide a helpful way of enforcing an architecture. LangGraph also might provide a good starting point with agents for those not as familiar with the structure. There is a tradeoff, however: since LangGraph does quite a bit for you, it can lead to headaches if you don't fully buy into the framework; the code may be very clean, but you may pay for it with more debugging.

Workflows falls somewhere in the middle. The event-based architecture can be extremely helpful for some projects, and the fact that less is required in terms of using LlamaIndex types provides greater flexibility for those who aren't fully using the framework across their application.

Image created by author

Ultimately, the core question may come down to "are you already using LlamaIndex or LangChain to orchestrate your application?" LangGraph and Workflows are both so entwined with their respective underlying frameworks that the extra advantages of each agent-specific framework might not cause you to switch on merit alone.

The pure code approach will likely always be an attractive option. If you have the rigor to document and enforce any abstractions you create, then ensuring nothing in an external framework slows you down is easy.

Of course, "it depends" is never a satisfying answer. These three questions should help you decide which framework to use in your next agent project.

Are you already using LlamaIndex or LangChain for significant pieces of your project?

If yes, explore that option first.

Are you familiar with common agent structures, or do you want something telling you how you should structure your agent?

If you fall into the latter group, try Workflows. If you really fall into the latter group, try LangGraph.

Has your agent been built before?

One of the advantages of the frameworks is that there are many tutorials and examples built with each. There are far fewer examples of pure code agents to build from.

Image created by author

Picking an agent framework is just one choice among many that will impact outcomes in production for generative AI systems. As always, it pays to have robust guardrails and LLM tracing in place, and to be agile as new agent frameworks, research, and models upend established techniques.
