It can be overwhelming to start studying LLMs with all the content spread over the web and new things coming up every day. I've read some guides from Google, OpenAI, and Anthropic and noticed how each focuses on different aspects of agents and LLMs. So, I decided to consolidate these concepts here and add other ideas that I feel are essential if you're starting to study this field.
This post covers key concepts with code examples to make things concrete. I've prepared a Google Colab notebook with all of the examples so you can run the code while reading the article. To use it, you'll need an API key — check section 5 of my previous article if you don't know how to get one.
While this guide gives you the essentials, I recommend reading the full articles from these companies to deepen your understanding.
I hope this helps you build a solid foundation as you begin your journey with LLMs!
In this mind map, you can check a summary of this article's content.
What’s an agent?
“Agent” can be defined in several ways. Each company whose guide I've read defines agents a little differently. Let's examine these definitions and compare them:
“
“
The three definitions emphasize different aspects of an agent. Nevertheless, they all agree that agents:
- Operate autonomously to perform tasks
- Make decisions about what to do next
- Use tools to achieve goals
An agent consists of three primary components:
- Model
- Instructions/Orchestration
- Tools

First, I'll define each component in a simple phrase so you can get an overview. Then, in the following sections, we'll dive deeper into each one.
- Model: a language model that generates the output.
- Instructions/Orchestration: explicit guidelines defining how the agent behaves.
- Tools: allow the agent to interact with external data and services.
Model
Model refers to the language model (LM). In simple terms, it predicts the next word or sequence of words based on the words it has already seen.
If you want to understand how these models work under the hood, here's a video from 3Blue1Brown that explains it.
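To make this concrete, here is a minimal sketch (not from the original guides) of asking a model to continue a piece of text using the google-genai SDK, the same client used later in this article; the prompt and model name are just illustrative.
# Minimal sketch: a language model predicting the continuation of a text.
# Assumes the google-genai SDK is installed and GEMINI_API_KEY is set (see section 5 of my previous article).
import os
from google import genai
client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Complete the sentence: A language model predicts the next",
)
print(response.text)  # the model returns the most likely continuation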
Agents vs models
Agents and models are not the same. The model is a component of an agent, and it is used by it. While models are limited to predicting a response based on their training data, agents extend this functionality by acting independently to achieve specific goals.
Here's a summary of the main differences between models and agents from Google's paper.

Large Language Models
The other L in LLM stands for “Large”, which mainly refers to the number of parameters the model has. These models can have hundreds of billions or even trillions of parameters. They are trained on huge amounts of data and need heavy compute power to be trained.
Examples of LLMs are GPT-4o, Gemini 2.0 Flash, Gemini 2.5 Pro, and Claude 3.7 Sonnet.
Small Language Models
We also have Small Language Models (SLMs). They are used for simpler tasks where you need less data and fewer parameters; they are lighter to run and easier to control.
SLMs have fewer parameters (typically under 10 billion), dramatically reducing computational costs and energy usage. They focus on specific tasks and are trained on smaller datasets. This maintains a balance between performance and resource efficiency.
Examples of SLMs are Llama 3.1 8B (Meta), Gemma 2 9B (Google), and Mistral 7B (Mistral AI).
Open Source vs Closed Source
These models can be open source or closed. Being open source means that the code — and sometimes the model weights and training data, too — is publicly available for anyone to use freely, understand how it works internally, and adjust for specific tasks.
A closed model means that the code isn't publicly available. Only the company that developed it can control its use, and users can only access it through APIs or paid services. Sometimes they have a free tier, as Gemini does.
Here, you can check some open source models on Hugging Face.

Those with * in the size column mean this information isn't publicly available, but there are rumors of hundreds of billions or even trillions of parameters.
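If you want to experiment with an open source model locally, here is a minimal sketch using the Hugging Face transformers library; the model name is only an example (any open SLM from the table would do), and you need enough memory and to accept the model's license on Hugging Face.
# Minimal sketch: running an open source SLM locally with transformers.
# Assumes `pip install transformers torch` and enough RAM/VRAM for the chosen model.
from transformers import pipeline
generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")
output = generator("Explain in one sentence what a language model is.", max_new_tokens=60)
print(output[0]["generated_text"])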
Instructions/Orchestration
Instructions are explicit guidelines and guardrails defining how the agent behaves. In its most fundamental form, an agent would consist of just “Instructions” for this component, as defined in OpenAI's guide. However, the agent could have more than just “Instructions” to handle more complex scenarios. In Google's paper, this component is called “Orchestration” instead, and it involves three layers:
- Instructions
- Memory
- Model-based Reasoning/Planning
Orchestration follows a cyclical pattern. The agent gathers information, processes it internally, and then uses those insights to determine its next move.

Instructions
The instructions could be the model's goals, profile, roles, rules, and any information you think is important to shape its behavior.
Here is an example:
system_prompt = """
You are a friendly programming tutor.
Always explain concepts in a simple and clear way, using examples when possible.
If the user asks something unrelated to programming, politely bring the conversation back to programming topics.
"""
In this example, we defined the role of the LLM, the expected behavior, how we wanted the output — simple and with examples when possible — and set limits on what it's allowed to talk about.
Model-based Reasoning/Planning
Some reasoning techniques, such as ReAct and Chain-of-Thought, give the orchestration layer a structured way to take in information, perform internal reasoning, and produce informed decisions.
Chain-of-Thought (CoT) is a prompt engineering technique that enables reasoning capabilities through intermediate steps. It's a way of prompting a language model to generate a step-by-step explanation or reasoning process before arriving at a final answer. This method helps the model break down the problem and not skip any intermediate tasks, avoiding reasoning failures.
Prompting example:
system_prompt = f"""
You are the assistant for a tiny candle shop.
Step 1: Check whether the user mentions either of our candles:
• Forest Breeze (woodsy scent, 40 h burn, $18)
• Vanilla Glow (warm vanilla, 35 h burn, $16)
Step 2: List any assumptions the user makes
(e.g. "Vanilla Glow lasts 50 h" or "Forest Breeze is unscented").
Step 3: If an assumption is wrong, correct it politely.
Then answer the question in a friendly tone.
Mention only the two candles above - we don't sell anything else.
Use exactly this output format:
Step 1:
Step 2:
Step 3:
Response to user:
"""
Here is an example of the model output for the user query: “Hi! I'd like to buy the Vanilla Glow. Is it $10?”. You can see the model following our guidelines from each step to build the final answer.

ReAct is another prompt engineering technique that combines reasoning and acting. It provides a thought-process structure for language models to reason and take action on a user query. The agent continues in a loop until it accomplishes the task. This technique overcomes weaknesses of reasoning-only methods like CoT, such as hallucination, because it reasons over external information obtained through actions.
Prompting example:
system_prompt = """You are an agent that can call two tools:
1. CurrencyAPI:
• input: {base_currency (3-letter code), quote_currency (3-letter code)}
• returns: exchange rate (float)
2. Calculator:
• input: {arithmetic_expression}
• returns: result (float)
Follow **strictly** this response format:
Thought:
Action: <tool_name>[<tool_input>]
Observation:
… (repeat Thought/Action/Observation as needed)
Answer:
Never output anything else. If no tool is needed, skip straight to Answer.
"""
Here, I haven't implemented the functions (the model is hallucinating the exchange rate), so this is just an example of the reasoning trace:

These techniques are good to use when you need transparency and control over what answer the agent is giving or what action it is taking, and why. They help you debug your system, and if you analyze the traces, they can provide signals for improving prompts.
If you want to read more, these techniques were proposed by Google researchers in the papers Chain-of-Thought Prompting Elicits Reasoning in Large Language Models and ReAct: Synergizing Reasoning and Acting in Language Models.
Memory
LLMs don't have memory built in. This “memory” is content you pass inside your prompt to give the model context. We can refer to two kinds of memory: short-term and long-term.
- Short-term memory refers to the immediate context the model has access to during an interaction. This could be the latest message, the last few messages, or a summary of previous messages. The amount can vary based on the model's context limitations — once you hit that limit, you could drop older messages to make space for new ones.
- Long-term memory involves storing important information beyond the model's context window for future use. To work around the context limit, you can summarize past conversations or extract key information and save it externally, typically in a vector database. When needed, the relevant information is retrieved using Retrieval-Augmented Generation (RAG) techniques to refresh the model's understanding. We'll talk about RAG in a later section.
Here is a simple example of managing short-term memory manually. You can check the Google Colab notebook for this code execution and a more detailed explanation.
# System prompt
system_prompt = """
You are the assistant for a tiny candle shop.
Step 1: Check whether the user mentions either of our candles:
• Forest Breeze (woodsy scent, 40 h burn, $18)
• Vanilla Glow (warm vanilla, 35 h burn, $16)
Step 2: List any assumptions the user makes
(e.g. "Vanilla Glow lasts 50 h" or "Forest Breeze is unscented").
Step 3: If an assumption is wrong, correct it politely.
Then answer the question in a friendly tone.
Mention only the two candles above - we don't sell anything else.
Use exactly this output format:
Step 1:
Step 2:
Step 3:
Response to user:
"""
# Start a chat_history
chat_history = []
# First message (client is the genai.Client configured with your API key; see the Colab notebook)
user_input = "I would like to buy 1 Forest Breeze. Can I pay $10?"
full_content = f"System instructions: {system_prompt}\n\n Chat History: {chat_history} \n\n User message: {user_input}"
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=full_content
)
# Append to chat history
chat_history.append({"role": "user", "content": user_input})
chat_history.append({"role": "assistant", "content": response.text})
# Second message
user_input = "What did I say I wanted to buy?"
full_content = f"System instructions: {system_prompt}\n\n Chat History: {chat_history} \n\n User message: {user_input}"
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=full_content
)
# Append to chat history
chat_history.append({"role": "user", "content": user_input})
chat_history.append({"role": "assistant", "content": response.text})
print(response.text)
We pass to the model the variable full_content, composed of system_prompt (containing instructions and reasoning guidelines), the memory (chat_history), and the new user_input.

In summary, you can combine instructions, reasoning guidelines, and memory in your prompt to improve results. All of this combined forms one of an agent's components: Orchestration.
Tools
Models are really good at processing information; however, they are limited by what they have learned from their training data. With access to tools, models can interact with external systems and access knowledge beyond their training data.

Functions and Function Calling
Functions are self-contained modules of code that accomplish a specific task. They are reusable pieces of code that you can use over and over again.
When implementing function calling, you connect a model with functions. You provide a set of predefined functions, and the model determines when to use each function and which arguments are required based on the function's specification.
The model doesn't execute the function itself. It indicates which function should be called and the parameters (inputs) to use, based on the user query, and you have to write the code that executes this function later. However, if we build an agent, we can program its workflow to execute the function and answer based on the result, or we can use LangChain, which abstracts this code away so that you only pass the functions to the pre-built agent. Keep in mind that an agent is a composition of (model + instructions + tools).
In this way, you extend your agent's capabilities to use external tools, such as calculators, and take actions, such as interacting with external systems using APIs.
Here, I'll first show you an LLM and a basic function call so you can understand what is happening. It's great to use LangChain because it simplifies your code, but you should understand what is happening underneath the abstraction. At the end of the post, we'll build an agent using LangChain.
The process of making a function call:
- Define the function and a function declaration, which describes the function’s name, parameters, and purpose to the model.
- Call the LLM with function declarations. In addition, you can pass multiple functions and define whether the model can choose any function you specified, is forced to call exactly one specific function, or can't use them at all.
- Execute Function Code.
- Answer the user.
# Imports and setup (see the Colab notebook for the full configuration)
import os
from typing import List
from google import genai
from google.genai import types
# Shopping list
shopping_list: List[str] = []
# Functions
def add_shopping_items(items: List[str]):
    """Add multiple items to the shopping list."""
    for item in items:
        shopping_list.append(item)
    return {"status": "ok", "added": items}
def list_shopping_items():
    """Return all items currently in the shopping list."""
    return {"shopping_list": shopping_list}
# Function declarations
add_shopping_items_declaration = {
    "name": "add_shopping_items",
    "description": "Add one or more items to the shopping list",
    "parameters": {
        "type": "object",
        "properties": {
            "items": {
                "type": "array",
                "items": {"type": "string"},
                "description": "A list of shopping items to add"
            }
        },
        "required": ["items"]
    }
}
list_shopping_items_declaration = {
    "name": "list_shopping_items",
    "description": "List all current items in the shopping list",
    "parameters": {
        "type": "object",
        "properties": {},
        "required": []
    }
}
# Configure Gemini
client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
tools = types.Tool(function_declarations=[
    add_shopping_items_declaration,
    list_shopping_items_declaration
])
config = types.GenerateContentConfig(tools=[tools])
# User input
user_input = (
    "Hey there! I'm planning to bake a chocolate cake later today, "
    "but I noticed I'm out of flour and chocolate chips. "
    "Could you please add those items to my shopping list?"
)
# Send the user input to Gemini
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=user_input,
    config=config,
)
print("Model Output Function Call")
print(response.candidates[0].content.parts[0].function_call)
print("\n")
# Execute the function chosen by the model
tool_call = response.candidates[0].content.parts[0].function_call
if tool_call.name == "add_shopping_items":
    result = add_shopping_items(**tool_call.args)
    print(f"Function execution result: {result}")
elif tool_call.name == "list_shopping_items":
    result = list_shopping_items()
    print(f"Function execution result: {result}")
else:
    print(response.candidates[0].content.parts[0].text)
In this code, we're creating two functions: add_shopping_items and list_shopping_items. We defined the functions and the function declarations, configured Gemini, and created a user input. The model had two functions available but, as you can see, it chose add_shopping_items and got args={'items': ['flour', 'chocolate chips']}, which was exactly what we were expecting. Finally, we executed the function based on the model output, and those items were added to the shopping_list.

External data
Sometimes, your model doesn't have the right information to answer properly or perform a task. Access to external data allows us to provide additional data to the model, beyond its foundational training data, eliminating the need to train or fine-tune the model on this extra data.
Examples of this data:
- Website content
- Structured data in formats like CSV, spreadsheets, etc.
- Unstructured data in formats like HTML, PDF, TXT, Word docs, etc.
One of the most common uses of a data store is the implementation of RAG.
Retrieval Augmented Generation (RAG)
Retrieval Augmented Generation (RAG) means:
- Retrieval -> When the user asks the LLM a question, the RAG system searches an external source to retrieve relevant information for the query.
- Augmented -> The relevant information is incorporated into the prompt.
- Generation -> The LLM then generates a response based on both the original prompt and the additional context retrieved.
Here, I'll show you the steps of a typical RAG setup. We have two pipelines, one for storing and the other for retrieving.

First, we have to load the documents, split them into smaller chunks of text, embed each chunk, and store the embeddings in a vector database.
Important:
- Breaking down large documents into smaller chunks is important because it makes retrieval more focused, and LLMs also have context window limits.
- Embeddings create numerical representations for pieces of text. The embedding vector tries to capture the meaning, so texts with similar content have similar vectors.
The second pipeline retrieves the relevant information based on a user query. First, embed the user query and retrieve relevant chunks from the vector store using some calculation, such as basic semantic similarity or maximum marginal relevance (MMR), between the embedded chunks and the embedded user query. Afterward, you can combine the most relevant chunks before passing them into the final LLM prompt. Finally, add this combination of chunks to the LLM instructions, and it can generate an answer based on this new context and the original prompt.
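Here is a minimal sketch of these two pipelines. The embed function below is a toy stand-in (a simple letter-count vector) just to keep the example self-contained; in a real RAG system you would call an embedding model and store the vectors in a vector database instead of a Python list.
# Minimal RAG sketch: store embedded chunks, then retrieve by cosine similarity.
import numpy as np
def embed(text: str) -> np.ndarray:
    # Toy stand-in for a real embedding model: counts of each letter.
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    return vec
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
# Storing pipeline: split documents into chunks and embed each one
chunks = [
    "Forest Breeze is a woodsy candle with a 40 h burn time and costs $18.",
    "Vanilla Glow has a warm vanilla scent, burns for 35 h, and costs $16.",
]
vector_store = [(chunk, embed(chunk)) for chunk in chunks]
# Retrieval pipeline: embed the query and rank chunks by similarity
query = "How long does Forest Breeze burn?"
query_vec = embed(query)
ranked = sorted(vector_store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
top_chunks = [chunk for chunk, _ in ranked[:1]]
# Augmentation: add the retrieved chunks to the prompt before generation
context = "\n".join(top_chunks)
augmented_prompt = f"Context:\n{context}\n\nQuestion: {query}"
print(augmented_prompt)
From here, you would pass augmented_prompt to the model with generate_content, as in the earlier examples.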
In summary, with tools you can give your agent more knowledge and the ability to take action.
Enhancing model performance
Now that we have seen each component of an agent, let's talk about how we can enhance the model's performance.
There are some strategies for enhancing model performance:
- In-context learning
- Retrieval-based in-context learning
- Fine-tuning-based learning

In-context learning
In-context learning means you “teach” the model how to perform a task by giving examples directly in the prompt, without changing the model's underlying weights.
This method provides a generalized approach with a prompt, tools, and few-shot examples at inference time, allowing the model to learn “on the fly” how and when to use those tools for a specific task.
There are several kinds of in-context learning:

We already saw examples of zero-shot, CoT, and ReAct in the previous sections, so here is an example of one-shot learning:
user_query = "Carlos to set up the server by Tuesday, Maria will finalize the design specs by Thursday, and let's schedule the demo for the following Monday."
system_prompt = f"""You are a helpful assistant that reads a block of meeting transcript and extracts clear action items.
For each item, list the person responsible, the task, and its due date or timeframe in bullet-point form.
Example 1
Transcript:
'John will draft the budget by Friday. Sarah volunteers to review the marketing deck next week. We need to send invites for the kickoff.'
Actions:
- John: Draft budget (due Friday)
- Sarah: Review marketing deck (next week)
- Team: Send kickoff invites
Now you
Transcript: {user_query}
Actions:
"""
# Send the prompt to Gemini
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=system_prompt,
)
print(response.text)
Here is the output based on your query and the example:

Retrieval-based in-context learning
Retrieval-based in-context learning means the model retrieves external context (like documents) and adds this relevant content to the prompt at inference time to improve its response.
RAG is important because it reduces hallucinations and enables LLMs to answer questions about specific domains or private data (like a company's internal documents) without needing to be retrained.
If you missed it, go back to the previous section, where I explained RAG in detail.
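As a small sketch of what this looks like at inference time, assuming the retrieved chunks come from a pipeline like the one in the RAG section (here they are hard-coded so the example stands alone) and that client is the Gemini client configured earlier:
# Minimal sketch: injecting retrieved content into the prompt at inference time.
retrieved_chunks = ["Vanilla Glow has a warm vanilla scent, burns for 35 h, and costs $16."]
user_query = "How much does Vanilla Glow cost?"
context = "\n".join(retrieved_chunks)
augmented_prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {user_query}"
)
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=augmented_prompt,
)
print(response.text)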
Fine-tuning-based learning
Fine-tuning-based learning means you train the model further on a specific dataset to “internalize” new behaviors or knowledge. The model's weights are updated to reflect this training. This method helps the model understand when and how to apply certain tools before receiving user queries.
There are some common techniques for fine-tuning. Here are a few examples so you can search and study them further.

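As an illustration of one of these techniques, here is a minimal LoRA setup sketch using the Hugging Face peft library; the base model and hyperparameters are just examples, and the actual training loop (with your task-specific dataset) is omitted.
# Minimal LoRA fine-tuning setup sketch (illustrative only).
# Assumes `pip install transformers peft` and enough memory for the chosen base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
base_model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # example open model
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)
# LoRA trains small adapter matrices instead of updating all the weights,
# which makes fine-tuning much cheaper than full fine-tuning.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters are trainable
# From here, you would train on your task-specific dataset,
# for example with transformers' Trainer or trl's SFTTrainer.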
Analogy to compare the three strategies
Imagine you're training a tour guide to receive a group of people in Iceland.
- In-Context Learning: you give the tour guide a few handwritten notes with some examples like “If someone asks about the Blue Lagoon, say this. If they ask about local food, say that”. The guide doesn't know the city deeply, but can follow your examples as long as the tourists stay within those topics.
- Retrieval-Based Learning: you equip the guide with a phone + map + access to Google search. The guide doesn't need to memorize everything but knows how to look up information on the spot when asked.
- Fine-Tuning: you give the guide months of immersive training in the city. The knowledge is already in their head when they start giving tours.

Where does LangChain come in?
LangChain is a framework designed to simplify the development of applications powered by large language models (LLMs).
Within the LangChain ecosystem, we have:
- LangChain: The main framework for working with LLMs. It lets you switch between providers or combine components when building applications without altering the underlying code. For example, you could switch between Gemini and GPT models easily. It also makes the code simpler. In the next section, I'll compare the code we built in the function calling section with how we could do the same with LangChain.
- LangGraph: For building, deploying, and managing agent workflows.
- LangSmith: For debugging, testing, and monitoring your LLM applications.
While these abstractions simplify development, understanding their underlying mechanics by checking the documentation is crucial — the convenience these frameworks provide comes with hidden implementation details that can impact performance, debugging, and customization if not properly understood.
Beyond LangChain, you might also consider OpenAI's Agents SDK or Google's Agent Development Kit (ADK), which offer different approaches to building agent systems.
Let's build an agent using LangChain
Here, differently from the code in the “Function Calling” section, we don't need to create function declarations manually. Using the @tool decorator above our functions, LangChain automatically converts them into structured descriptions that are passed to the model behind the scenes.
ChatPromptTemplate organizes information in your prompt, creating consistency in how information is presented to the model. It combines system instructions + the user's query + the agent's working memory. This way, the LLM always gets information in a format it can easily work with.
The MessagesPlaceholder component reserves a slot in the prompt template, and agent_scratchpad is the agent's working memory. It contains the history of the agent's thoughts, tool calls, and the results of those calls. This allows the model to see its previous reasoning steps and tool outputs, enabling it to build on past actions and make informed decisions.
Another key difference is that we don't need to implement the logic with conditional statements to execute the functions. The create_openai_tools_agent function creates an agent that can reason about which tools to use and when. In addition, the AgentExecutor orchestrates the process, managing the conversation between the user, agent, and tools. The agent determines which tool to use through its reasoning process, and the executor takes care of executing the function and handling the result.
# Imports (assumed setup; see the Colab notebook for the full configuration)
from typing import List
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
# Shopping list
shopping_list = []
# Functions
@tool
def add_shopping_items(items: List[str]):
    """Add multiple items to the shopping list."""
    for item in items:
        shopping_list.append(item)
    return {"status": "ok", "added": items}
@tool
def list_shopping_items():
    """Return all items currently in the shopping list."""
    return {"shopping_list": shopping_list}
# Configuration
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0
)
tools = [add_shopping_items, list_shopping_items]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that helps manage shopping lists. "
               "Use the available tools to add items to the shopping list "
               "or list the current items when requested by the user."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])
# Create the agent
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# User input
user_input = (
    "Hey there! I'm planning to bake a chocolate cake later today, "
    "but I noticed I'm out of flour and chocolate chips. "
    "Could you please add those items to my shopping list?"
)
# Send the user input to the agent
response = agent_executor.invoke({"input": user_input})
When we use verbose=True, we can see the reasoning and actions while the code is being executed.

And the final result:

When should you build an agent?
Remember that we discussed agent definitions in the first section and saw that agents operate autonomously to perform tasks. It's cool to create agents, even more so because of the hype. However, building an agent isn't always the most efficient solution, and a deterministic solution may suffice.
A deterministic solution means that the system follows clear, predefined rules without interpretation. This approach is better when the task is well-defined, stable, and benefits from clarity. In addition, it is easier to test and debug, and it is good when you need to know exactly what is happening given an input — no “black box”. Anthropic's guide shows many different LLM workflows where LLMs and tools are orchestrated through predefined code paths.
The best-practices guides for building agents from OpenAI and Anthropic recommend first finding the simplest solution possible and only increasing the complexity if needed.
When you're evaluating whether you should build an agent, consider the following:
- Complex decisions: when dealing with processes that require nuanced judgment, handling exceptions, or making decisions that depend heavily on context — such as determining whether a customer is eligible for a refund.
- Difficult-to-maintain rules: when you have workflows built on complicated sets of rules that are difficult to update or maintain without the risk of making mistakes, and they are constantly changing.
- Dependence on unstructured data: when you have tasks that require understanding written or spoken language, getting insights from documents — PDFs, emails, images, audio, HTML pages… — or chatting with users naturally.
Conclusion
We saw that agents are systems designed to perform tasks on a human's behalf independently. These agents are composed of instructions, the model, and tools to access external data and take actions. There are several ways to enhance our model: improving the prompt with examples, using RAG to give more context, or fine-tuning it. When building an agent or LLM workflow, LangChain can help simplify the code, but you should understand what the abstractions are doing. Always remember that simplicity is the best way to build agentic systems, and only follow a more complex approach if needed.
Next Steps
If you are new to this content, I recommend that you digest all of this first, read it a few times, and also read the full articles I recommended so you have a solid foundation. Then, try to start building something, like a simple application, to start practicing and bridging this theoretical content with practice. Starting to build is the best way to learn these concepts.
As I mentioned before, I have a simple step-by-step guide for creating a chat in Streamlit and deploying it. There is also a video on YouTube explaining this guide in Portuguese. It's a good place to start if you haven't built anything before.
I hope you enjoyed this tutorial.
You can find all the code for this project on my GitHub or Google Colab.
Resources
Building Effective Agents – Anthropic
Agents – Google
A Practical Guide to Building Agents – OpenAI
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models – Google Research
ReAct: Synergizing Reasoning and Acting in Language Models – Google Research
Small Language Models: A Guide With Examples – DataCamp