Context engineering has received serious attention with the rise of LLMs capable of handling complex tasks. Initially, most discussions on the topic revolved around prompt engineering: tuning a single prompt for optimal performance on a single task. However, as LLMs have grown more capable, prompt engineering has evolved into context engineering: optimizing all the data you feed into your LLM for maximum performance on complex tasks.
In this article, I’ll dive deeper into agentic context engineering, which is about optimizing the context specifically for agents. This differs from traditional context engineering in that agents typically perform sequences of tasks over an extended period. Since agentic context engineering is a big topic, I’ll cover the topics listed below in this article and write a follow-up covering more.
- Specific context engineering suggestions
- Shortening/summarizing the context
- Tool usage
Why care about agentic context engineering
Before diving into the specifics of context engineering, I’ll first cover why agentic context engineering matters. I’ll do this in two parts:
- Why we use agents
- Why agents need context engineering
Why we use agents
To start with, we use agents because they’re more capable of performing tasks than static LLM calls. Agents can receive a request from a user, for instance a bug report asking them to fix a bug in an application.
This would not be feasible in a single LLM call, since you need to understand the bug better (possibly by asking the person who reported it), find where in the code the bug occurs, and possibly fetch some of the error messages. That is where agents come in.
An agent can look at the bug, call a tool to ask the user a follow-up question, for instance: Where in the application does this bug occur? The agent can then find that location in the codebase, run the code itself to read error logs, and implement the fix. This all requires a series of LLM calls and tool calls before the problem is solved.
Why agents need context engineering
So we now know why we need agents, but why do agents need context engineering? The main reason is that LLMs generally perform better when their context contains more relevant information and less noise (irrelevant information). Moreover, an agent’s context quickly adds up as it performs a series of tool calls, for instance fetching the error logs when a bug occurs. This creates context bloat, which is when the context of an LLM contains a lot of irrelevant information. We need to remove this noisy information from the LLM’s context, and also ensure all relevant information is present in it.
Specific context engineering suggestions
Agentic context engineering builds on top of traditional context engineering. I’ll therefore start with a few general suggestions to improve your context:
- Few-shot learning
- Structured prompts
- Step-by-step reasoning
These are three commonly used context engineering techniques that often improve LLM performance.
Few-shot learning
Few-shot learning is a commonly used approach where you include examples of a similar task before feeding the agent the task it is to perform. This helps the model understand the task better, which often increases performance.
Below you can see two prompt examples. The first shows a zero-shot prompt, where we directly ask the LLM the question. Considering this is an easy task, the LLM will likely get the correct answer; however, few-shot learning will have a greater effect on harder tasks. In the second prompt, you can see that I provide a few examples of how to do the math, where the examples are also wrapped in XML tags. This not only helps the model understand what task it is performing, but it also helps ensure a consistent answer format, since the model will often respond in the same format as the few-shot examples.
# zero-shot
prompt = "What is 123+150?"

# few-shot
prompt = """
<examples>
"What is 10+20?" -> "30"
"What is 120+70?" -> "190"
</examples>
What is 123+150?
"""
Structured prompts
Having structured prompts is also an incredibly important part of context engineering. In the code examples above, you can see me using XML tags such as <examples>. You can also use Markdown formatting to reinforce the structure of your prompts. I often find that writing a general outline of my prompt first, then feeding it to an LLM for optimization and proper structuring, is a great way of designing good prompts.
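To illustrate the Markdown option, here is a minimal sketch of a Markdown-structured prompt; the section names are my own and purely illustrative:

# structured prompt using Markdown headers
prompt = """
# Task
Answer the math question at the end of this prompt.

# Rules
- Respond with only the final number.

# Question
What is 123+150?
"""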
You can use dedicated tools like Anthropic’s prompt optimizer, but you can also simply feed your unstructured prompt into ChatGPT and ask it to improve it. Moreover, you’ll get even better prompts if you describe scenarios where your current prompt is struggling.
For instance, if you have a math agent that is doing very well at addition, subtraction, and division, but struggling with multiplication, you should add that information to your prompt optimizer.
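As a sketch, such a request to the prompt optimizer could look like the following; the wording and the <prompt> tag are only illustrative:

# asking an LLM to improve a prompt, including known failure scenarios
optimizer_prompt = """
Improve the prompt below so it is clearly structured and unambiguous.
Note: the agent using this prompt does well at addition, subtraction,
and division, but struggles with multiplication.

<prompt>
You are a math agent. Solve the user's math question.
</prompt>
"""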
Step-by-step reasoning
Step-by-step reasoning is another powerful context engineering approach. You prompt the LLM to think step by step about how to solve the problem before it attempts to solve it. For even better context engineering, you can combine all three approaches covered in this section, as seen in the example below:
# few-shot + structured + step-by-step reasoning
prompt = """
<examples>
"What is 10+20?" -> "To answer the user request, I have to add up the two numbers. I do this by first adding the last digits of each number: 0+0=0. I then add up the tens digits and get 1+2=3. The answer is: 30"
"What is 120+70?" -> "To answer the user request, I have to add up the digits from back to front. I start with: 0+0=0. Then I do 2+7=9, and finally I do 1+0=1. The answer is: 190"
</examples>
What is 123+150?
"""
This can help the model understand the examples even better, which often increases model performance further.
Shortening the context
When your agent has operated for a few steps, for instance asking for user input, fetching some information, and so on, you may experience the LLM context filling up. Before you reach the context limit and lose all tokens over that limit, you should shorten the context.
Summarization is a great way of shortening the context; however, summarization can sometimes cut out important pieces of it. The first half of your context may not contain any useful information, while the second half includes several paragraphs that are required. This is part of why agentic context engineering is difficult.
To perform context shortening, you’ll typically use another LLM, which I’ll call the Shortening LLM. This LLM receives the context and returns a shortened version of it. The simplest version of the Shortening LLM simply summarizes the context and returns it. However, you can employ the following techniques to improve the shortening:
- Determine if whole parts of the context can be cut out (specific documents, previous tool calls, etc.)
- Use a prompt-tuned Shortening LLM, optimized to analyze the task at hand and all the available information, and return only the information relevant to solving the task
Determine if whole parts can be cut out
The first thing you should do when attempting to shorten the context is to find areas of the context that can be cut out completely.
For instance, the LLM might have previously fetched a document to solve an earlier task, and you already have the result of that task. This means the document is no longer relevant and should be removed from the context. The same can happen if the LLM has fetched other information, for instance via keyword search, and has itself already summarized the output of the search. In that case, you should remove the old keyword search output from the context.
Simply removing such whole parts of the context can get you far. However, you should be mindful that removing context that might be relevant for later tasks can be detrimental to the agent’s performance.
Thus, as Anthropic points out in their article on context engineering, you should first optimize for recall, ensuring the Shortening LLM never removes context that will be relevant in the future. Once you achieve almost perfect recall, you can start focusing on precision, removing more and more of the context that is no longer relevant to the task at hand.

Prompt-tuned shortening LLM
I also recommend creating a prompt-tuned Shortening LLM. To do this, you first have to create a test set of contexts and the desired shortened context, given the task at hand. These examples should preferably come from real user interactions with your agent.
You can then prompt-optimize (or even fine-tune) the Shortening LLM for the task of summarizing the agent’s context, keeping the important parts while removing the parts that are no longer relevant.
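Below is a minimal sketch of what such a Shortening LLM call could look like, assuming an OpenAI-style chat client; the model name and prompt wording are my own placeholders, not a prescribed setup:

from openai import OpenAI

client = OpenAI()

def shorten_context(task: str, context: str) -> str:
    """Ask a Shortening LLM to keep only the parts of the context relevant to the task."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder: any capable chat model works here
        messages=[
            {
                "role": "system",
                "content": (
                    "You shorten agent contexts. Keep every piece of information that "
                    "could still be relevant to the task, remove everything else, and "
                    "never invent new information."
                ),
            },
            {"role": "user", "content": f"Task:\n{task}\n\nContext to shorten:\n{context}"},
        ],
    )
    return response.choices[0].message.content

You can then evaluate this function against your test set of contexts and desired shortened contexts, checking recall first and precision second.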
Tools
One of the main things separating agents from one-off LLM calls is their use of tools. We typically provide agents with a set of tools they can use to extend their ability to solve a task. Examples of such tools are:
- Perform a keyword search on a document corpus
- Fetch details about a user given their email
- A calculator to add numbers together
These tools simplify the problem the agent has to solve. The agent can perform a keyword search to fetch additional (often required) information, or it can use a calculator to add numbers together, which is far more consistent than adding numbers using next-token prediction.
Here are some techniques to keep in mind to ensure proper tool usage when providing tools in the agent’s context:
- Well-described tools (can a human understand them?)
- Create specific tools
- Avoid bloating
- Only show relevant tools
- Informative error handling
Well-described agentic tools
The first, and probably most important, point is to have well-described tools in your system. The tools you define must have type annotations for all input parameters and a return type. They should also have a descriptive function name and a description in the docstring. Below you can see an example of a poor tool definition vs a good tool definition:
# poor tool definition
def calculator(a, b):
    return a + b

# good tool definition
def add_numbers(a: float, b: float) -> float:
    """A function to add two numbers together. Should be used anytime you have to add two numbers.

    Takes in parameters:
        a: float
        b: float

    Returns:
        float
    """
    return a + b
The second function in the code above is far easier for the agent to understand. Properly describing tools will make the agent much better at understanding when to use the tool, and when other approaches are better.
The go-to benchmark for a well-described tool is:
Can a human who has never seen the tools before understand them, just from looking at the functions and their definitions?
Specific tools
You should also try to keep your tools as specific as possible. When you define vague tools, it is difficult for the LLM to know when to use the tool, and hard to ensure the LLM uses it properly.
For instance, instead of defining a generic tool for the agent to fetch information from a database, you should provide specific tools that extract specific information.
Bad tool:
- Fetch information from the database
  - Input:
    - Columns to retrieve
    - Database index to find info by
Better tools:
- Fetch info about all users from the database (no input parameters)
- Get a sorted list of documents by date belonging to a given customer ID
- Get an aggregate list of all users and the actions they’ve taken in the last 24 hours
You can then define more specific tools as you see the need for them. This makes it easier for the agent to fetch relevant information into its context.
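To make this concrete, here is a minimal sketch of the difference in code; the function names, parameters, and docstrings are my own illustrations, not a prescribed API:

# vague tool: hard for the LLM to know when and how to use it
def fetch_from_database(columns: list[str], index: str) -> str:
    """Fetch information from the database."""
    ...

# specific tools: each one does a single, clearly described thing
def get_all_users() -> str:
    """Fetch info about all users from the database. Takes no input parameters."""
    ...

def get_customer_documents_sorted_by_date(customer_id: str) -> str:
    """Get a list of documents belonging to the given customer ID, sorted by date."""
    ...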
Avoid bloating
You should also avoid bloating at all costs. There are two main approaches to achieving this with functions:
- Functions should return structured outputs, and optionally, only return a subset of results
- Avoid irrelevant tools
For the first point, I’ll again use the example of a keyword search. When performing a keyword search, for instance against Elasticsearch on AWS, you’ll receive back a lot of information, sometimes not that structured.
# bad function return
def keyword_search(search_term: str) -> str:
    # perform keyword search
    # results: [{"id": ..., "content": ..., "createdAt": ..., ...}, {...}, {...}]
    return str(results)

# good function return
def _organize_keyword_output(results: list[dict], max_results: int) -> str:
    output_string = ""
    num_results = len(results)
    for i, res in enumerate(results[:max_results]):  # return at most max_results results
        output_string += f"Document number {i}/{num_results}. ID: {res['id']}, content: {res['content']}, created at: {res['createdAt']}\n"
    return output_string

def keyword_search(search_term: str, max_results: int) -> str:
    # perform keyword search
    # results: [{"id": ..., "content": ..., "createdAt": ..., ...}, {...}, {...}]
    organized_results = _organize_keyword_output(results, max_results)
    return organized_results
In the bad example, we simply stringify the raw list of dicts returned from the keyword search. The better approach is to have a separate helper function that organizes the results into a structured string.
You should also make sure the model can request only a subset of results, as shown with the max_results parameter. This helps the model a lot, especially with functions like keyword search, which can potentially return hundreds of results, immediately filling up the LLM’s context.
My second point was about avoiding irrelevant tools. You’ll probably encounter situations where you have a lot of tools, many of which are only relevant for the agent at specific steps. If you know a tool is not relevant for the agent at a given time, you should keep the tool out of the context.
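A minimal sketch of this idea is shown below, assuming the agent moves through named stages; the stage names and tool names are purely illustrative:

def tools_for_stage(stage: str) -> list[str]:
    """Return only the names of the tools relevant at the agent's current stage."""
    # illustrative mapping from agent stage to the tools exposed in its context
    tools_by_stage = {
        "gather_information": ["keyword_search"],
        "compute_answer": ["add_numbers"],
        "report_result": ["inform_user"],
    }
    return tools_by_stage.get(stage, [])

Only the tool definitions returned for the current stage are then included in the agent’s context for that step, keeping the rest out of the way.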
Informative error handling
Informative error handling is critical when providing agents with tools. You need to help the agent understand what it is doing wrong. Often, the raw error messages provided by Python are bloated and not that easy to understand.
Below is an example of error handling in tools, where the agent is told what the error was and how to deal with it. For instance, when encountering rate limit errors, we explicitly tell the agent to sleep before trying again. This simplifies the problem a lot for the agent, since it doesn’t have to reason on its own that it needs to sleep.
import requests

def keyword_search(search_term: str) -> str:
    try:
        # perform keyword search
        results = ...
        return results
    except requests.exceptions.ConnectionError as e:
        return f"Connection error occurred: {e}. The network might be down; inform the user of the issue with the inform_user function."
    except requests.exceptions.HTTPError as e:
        # requests has no dedicated rate limit exception, so check the status code
        if e.response is not None and e.response.status_code == 429:
            return f"Rate limit error: {e}. You should run time.sleep(10) before retrying."
        return f"HTTP error occurred: {e}. This often happens due to access issues. Make sure you validate your access before using this function."
    except Exception as e:
        return f"An unexpected error occurred: {e}"
You should have such error handling for all functions, keeping the following points in mind:
- Error messages should describe what happened
- If you know the fix (or potential fixes) for a particular error, tell the LLM how to act when the error occurs (for instance: on a rate limit error, tell the model to run time.sleep())
Agentic context engineering going forward
In this article, I’ve covered three main topics: specific context engineering suggestions, shortening the agent’s context, and how to provide your agents with tools. These are all foundational topics you need to understand to build AI agents. There are also further topics you should learn more about, such as the trade-off between pre-computed information and inference-time information retrieval. I’ll cover these in a future article. Agentic context engineering will continue to be a highly relevant topic, and understanding how to handle an agent’s context is, and will be, fundamental to future AI agent development.