Hands-On with Agents SDK: Safeguarding Input and Output with Guardrails


Continuing our exploration of features in the OpenAI Agents SDK framework, there's one capability that deserves a closer look: input and output guardrails.

In previous articles, we built our first agent with an API-calling tool and then expanded it into a multi-agent system. In real-world scenarios, though, building these systems is complex, and without the proper safeguards, things can quickly go astray. That's where guardrails come in: they help ensure safety, focus, and efficiency.

Here’s why guardrails matter:

  • Prevent misuse
  • Save resources
  • Ensure safety and compliance
  • Maintain focus and quality

Without proper guardrails, unexpected use cases can pop up. For instance, you may have heard of people using AI-powered customer support bots (designed for product support) to write code instead. It sounds funny, but for the company, it can become a costly and irrelevant distraction.

To see why guardrails are necessary, let's revisit our last project. I ran the agents_as_tools script and asked it to generate code for calling a weather API. Since no guardrails were in place, the app returned the answer without hesitation, proving that, by default, it will attempt to do almost anything asked of it.

We definitely don't want this happening in a production app. Imagine the costs of unintended usage, not to mention the larger risks it can bring, such as information leaks, system prompt exposure, and other serious vulnerabilities.

Hopefully, this makes the case clear for why guardrails are worth exploring. Next, let's dive into how to start using the guardrail feature in the OpenAI Agents SDK.

A Quick Intro to Guardrails

In the OpenAI Agents SDK, there are two kinds of guardrails: input guardrails and output guardrails [1]. Input guardrails run on the user's initial input, while output guardrails run on the agent's final response.

A guardrail can be an LLM-powered agent, useful for tasks that require reasoning, or a rule-based/programmatic function, such as a keyword filter that detects specific phrases. If the guardrail finds a violation, it triggers a tripwire and raises an exception. This mechanism prevents the main agent from processing unsafe or irrelevant queries, ensuring both safety and efficiency.

Some practical uses for input guardrails include:

  • Identifying when a user asks an off-topic question [2]
  • Detecting unsafe input attempts, including jailbreaks and prompt injections [3]
  • Moderating content to flag inappropriate input, such as harassment, violence, or hate speech [3]
  • Handling specific-case validation. For instance, in our weather app, we could enforce that questions only reference cities in Indonesia (see the sketch after this list).
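
As a small taste of that last case, here is a minimal rule-based sketch. The city list and helper function are hypothetical, purely for illustration; we'll build the real guardrail machinery later in this article.

# Hypothetical allow-list check for the weather app (illustration only)
INDONESIAN_CITIES = {"jakarta", "surabaya", "bandung", "medan", "semarang"}

def mentions_allowed_city(question: str) -> bool:
    """Return True if the question references at least one allowed city."""
    text = question.lower()
    return any(city in text for city in INDONESIAN_CITIES)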

Alternatively, output guardrails may be used to:

  • Prevent unsafe or inappropriate responses
  • Stop the agent from leaking personally identifiable information (PII) [3]
  • Ensure compliance and brand safety, such as blocking outputs that could harm brand integrity

In this article, we'll explore several types of guardrails, including both LLM-based and rule-based approaches, and how they can be applied to different kinds of validation.

Prerequisites

  • Create a requirements.txt file (the script below also imports requests and dotenv, so include those packages):
openai-agents
streamlit
requests
python-dotenv
  • Create a virtual environment named venv. Run the following commands in your terminal:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
  • Create a .env file to store your OpenAI API key:
OPENAI_API_KEY=your_openai_key_here

For the guardrail implementation, we'll use the script from the previous article where we built the agents-as-tools multi-agent system. For a detailed walkthrough, please refer back to that article. The complete implementation script can be found here: app06_agents_as_tools.py.

Now let's create a new file named app08_guardrails.py.

Input Guardrail

We'll start by adding input guardrails to our weather app. In this section, we'll build two types:

  • An off-topic guardrail, which uses an LLM to determine whether the user input is unrelated to the app's purpose.
  • An injection detection guardrail, which uses a simple rule to catch jailbreak and prompt injection attempts.

Import Libraries

First, let's import the necessary packages from the Agents SDK and other libraries. We'll also set up the environment to load the OpenAI API key from the .env file. From the Agents SDK, besides the basic building blocks (Agent, Runner, and function_tool), we'll also import the functions specifically used for implementing input and output guardrails.

from agents import (
    Agent, 
    Runner, 
    function_tool, 
    GuardrailFunctionOutput, 
    input_guardrail, 
    InputGuardrailTripwireTriggered,
    output_guardrail,
    OutputGuardrailTripwireTriggered
)
import asyncio
import requests
import streamlit as st
from pydantic import BaseModel, Field
from dotenv import load_dotenv

load_dotenv()

Define Output Model

For any LLM-based guardrail, we need to define an output model. Typically, we use a Pydantic model class to specify the structure of the data. At the simplest level, we need a boolean field (True/False) to indicate whether the guardrail should trigger, together with a text field that explains the reasoning.

In our case, we want the guardrail to determine whether the query is still within the scope of the app's purpose (weather and air quality). To do this, we'll define a model named TopicClassificationOutput as shown below:

# Define output model for the guardrail agent to classify whether the input is off-topic
class TopicClassificationOutput(BaseModel):
    is_off_topic: bool = Field(
        description="True if the input is off-topic (not related to weather/air quality and not a greeting), False otherwise"
    )
    reasoning: str = Field(
        description="Brief explanation of why the input was classified as on-topic or off-topic"
    )

The boolean field is_off_topic will be set to True if the input is outside the app's scope. The reasoning field stores a short explanation of why the model made its classification.
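
For instance, for an off-topic question like a cooking recipe, the parsed structured output would look something like this (illustrative values of my own, not actual model output):

# Illustrative parsed output (hypothetical values)
TopicClassificationOutput(
    is_off_topic=True,
    reasoning="The question asks for a cooking recipe, which is unrelated to weather or air quality."
)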

Create Guardrail Agent

We need to define an agent with clear and complete instructions to determine whether a user's query is on-topic or off-topic. This can be adjusted depending on your app's purpose; the instructions don't have to be the same for every use case.

For our Weather and Air Quality assistant, here’s the guardrail agent with instructions for classifying a user’s query.

# Create the guardrail agent to determine whether the input is off-topic
topic_classification_agent = Agent(
    name="Topic Classification Agent",
    instructions=(
        "You are a topic classifier for a weather and air quality application. "
        "Your task is to determine if a user's query is on-topic. "
        "Allowed topics include: "
        "1. Weather-related: current weather, weather forecast, temperature, precipitation, wind, humidity, etc. "
        "2. Air quality-related: air pollution, AQI, PM2.5, ozone, air conditions, etc. "
        "3. Location-based inquiries about weather or air conditions "
        "4. Polite greetings and conversational starters (e.g., 'hello', 'hi', 'good morning') "
        "5. Questions that combine greetings with weather/air quality topics "
        "Mark as OFF-TOPIC only if the query is clearly unrelated to weather/air quality AND not a simple greeting. "
        "Examples of off-topic: math problems, cooking recipes, sports scores, technical support, jokes (unless weather-related). "
        "Examples of on-topic: 'Hello, what is the weather?', 'Hi there', 'Good morning, how's the air quality?', 'What is the temperature?' "
        "The final output MUST be a JSON object conforming to the TopicClassificationOutput model."
    ),
    output_type=TopicClassificationOutput,
    model="gpt-4o-mini"  # Use a fast and cost-effective model
)

In the instructions, besides listing the obvious topics, we also allow some flexibility for simple conversational starters like "hello," "hi," or other greetings. To make the classification clearer, we included examples of both on-topic and off-topic queries.

Another benefit of input guardrails is cost optimization. To take advantage of this, we should use a faster and cheaper model than the main agent. This way, the main (and more expensive) agent is only used when absolutely necessary.

In this example, the guardrail agent uses gpt-4o-mini while the main agent runs on gpt-4o.

Create an Input Guardrail Function

Next, let’s wrap the agent in an async function decorated with @input_guardrail. The output of this function will include two fields defined earlier: is_off_topic and reasoning.

The function returns a structured GuardrailFunctionOutput object containing output_info (set from the reasoning field) and tripwire_triggered.

The tripwire_triggered value determines whether the input should be blocked. If is_off_topic is True, the tripwire triggers, blocking the input. Otherwise, the value is False and the main agent continues processing.

# Create the input guardrail function
@input_guardrail
async def off_topic_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    """
    Classifies user input to ensure it is on-topic for a weather and air quality app.
    """

    result = await Runner.run(topic_classification_agent, input, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output.reasoning,
        tripwire_triggered=result.final_output.is_off_topic
    )
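
Before attaching the guardrail to an agent, it can be handy to run the classifier on its own and eyeball its judgments. Here is a quick sanity check, with arbitrary sample inputs of my own (not part of the original project):

# Standalone sanity check for the topic classifier
async def check_classifier():
    samples = ["What's the weather in Bali?", "Write me a sorting algorithm"]
    for question in samples:
        result = await Runner.run(topic_classification_agent, question)
        print(question, "->", result.final_output.is_off_topic, "|", result.final_output.reasoning)

# asyncio.run(check_classifier())  # uncomment to run the check once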

Create a Rule-based Input Guardrail Function

Alongside the LLM-based off-topic guardrail, we'll create a simple rule-based guardrail. This one doesn't require an LLM and instead relies on programmatic pattern matching.

Depending on your app's purpose, rule-based guardrails can be very effective at blocking harmful inputs, especially when dangerous patterns are predictable.

In this example, we define a list of keywords often used in jailbreak or prompt injection attempts. The list includes: "ignore previous instructions", "you are now a", "forget everything above", "developer mode", "override safety", "disregard guidelines".

If the user input contains any of these keywords, the guardrail triggers automatically. Since no LLM is involved, we can handle the validation directly inside the input guardrail function injection_detection_guardrail:

# Rule-based input guardrail to detect jailbreaking and prompt injection attempts
@input_guardrail
async def injection_detection_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    """
    Detects potential jailbreaking or prompt injection attempts in user input.
    """

    # Simple keyword-based detection
    injection_patterns = [
        "ignore previous instructions",
        "you are now a",
        "forget everything above",
        "developer mode",
        "override safety",
        "disregard guidelines"
    ]

    if any(keyword in input.lower() for keyword in injection_patterns):
        return GuardrailFunctionOutput(
            output_info="Potential jailbreaking or prompt injection detected.",
            tripwire_triggered=True
        )

    return GuardrailFunctionOutput(
        output_info="No jailbreaking or prompt injection detected.",
        tripwire_triggered=False
    )

This guardrail simply checks the input against the keyword list. If a match is found, tripwire_triggered is set to True. Otherwise, it stays False.
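
Two caveats worth noting. First, the substring check assumes input arrives as a plain string; the SDK can also pass a list of input items, in which case you would need to extract the text first. Second, exact substring matching is easy to evade with extra whitespace. A regex-based variant (a sketch using the same keyword list) tolerates that kind of simple obfuscation:

import re

# Regex patterns tolerate variable whitespace, e.g. "ignore   previous  instructions"
INJECTION_REGEXES = [
    re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+a", re.IGNORECASE),
    re.compile(r"forget\s+everything\s+above", re.IGNORECASE),
    re.compile(r"developer\s+mode", re.IGNORECASE),
]

def looks_like_injection(text: str) -> bool:
    """Return True if any known injection pattern matches the text."""
    return any(rx.search(text) for rx in INJECTION_REGEXES)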

Define Specialized Agents for Weather and Air Quality

Now let's proceed by defining the weather and air quality specialist agents with their function tools. This part is explained in detail in my previous article, so I'll skip the explanation here.

# Define function tools and specialized agents for weather and air quality
@function_tool
def get_current_weather(latitude: float, longitude: float) -> dict:
    """Fetch current weather data for the given latitude and longitude."""
    
    url = "https://api.open-meteo.com/v1/forecast"
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current": "temperature_2m,relative_humidity_2m,dew_point_2m,apparent_temperature,precipitation,weathercode,windspeed_10m,winddirection_10m",
        "timezone": "auto"
    }
    response = requests.get(url, params=params)
    return response.json()

weather_specialist_agent = Agent(
    name="Weather Specialist Agent",
    instructions="""
    You are a weather specialist agent.
    Your task is to analyze current weather data, including temperature, humidity, wind speed and direction, precipitation, and weather codes.

    For each query, provide:
    1. A clear, concise summary of the current weather conditions in plain language.
    2. Practical, actionable suggestions or precautions for outdoor activities, travel, health, or clothing, tailored to the weather data.
    3. If severe weather is detected (e.g., heavy rain, thunderstorms, extreme heat), clearly highlight recommended safety measures.

    Structure your response in two sections:
    Weather Summary:
    - Summarize the weather conditions in simple terms.

    Suggestions:
    - List relevant advice or precautions based on the weather.
    """,
    tools=[get_current_weather],
    tool_use_behavior="run_llm_again"
)

@function_tool
def get_current_air_quality(latitude: float, longitude: float) -> dict:
    """Fetch current air quality data for the given latitude and longitude."""

    url = "https://air-quality-api.open-meteo.com/v1/air-quality"
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current": "european_aqi,us_aqi,pm10,pm2_5,carbon_monoxide,nitrogen_dioxide,sulphur_dioxide,ozone",
        "timezone": "auto"
    }
    response = requests.get(url, params=params)
    return response.json()

air_quality_specialist_agent = Agent(
    name="Air Quality Specialist Agent",
    instructions="""
    You are an air quality specialist agent.
    Your role is to interpret current air quality data and communicate it clearly to users.

    For each query, provide:
    1. A concise summary of the air quality conditions in plain language, including key pollutants and their levels.
    2. Practical, actionable advice or precautions for outdoor activities, travel, and health, tailored to the air quality data.
    3. If poor or hazardous air quality is detected (e.g., high pollution, allergens), clearly highlight recommended safety measures.

    Structure your response in two sections:
    Air Quality Summary:
    - Summarize the air quality conditions in simple terms.

    Suggestions:
    - List relevant advice or precautions based on the air quality.
    """,
    tools=[get_current_air_quality],
    tool_use_behavior="run_llm_again"
)
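
One hardening tweak worth considering for both tools (my addition, not part of the original script): give the HTTP calls a timeout and raise on error statuses, so a flaky API fails fast instead of stalling the agent run. A small helper sketch:

import requests

def fetch_json(url: str, params: dict, timeout: float = 10.0) -> dict:
    """GET a JSON endpoint with a timeout and explicit HTTP error handling."""
    response = requests.get(url, params=params, timeout=timeout)  # fail fast if the API stalls
    response.raise_for_status()  # surface 4xx/5xx instead of feeding error bodies to the agent
    return response.json()

Both function tools could then call fetch_json(url, params) in place of the bare requests.get.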

Define the Orchestrator Agent with Input Guardrails

Almost the same as in the previous part, the orchestrator agent here has the same properties as the one we discussed in my previous article: in the agents-as-tools pattern, the orchestrator agent manages the task of each specialized agent instead of handing off the task to a single agent as in the handoff pattern.

The only difference here is that we add a new property to the agent: input_guardrails. In this property, we pass the list of input guardrail functions we defined earlier: off_topic_guardrail and injection_detection_guardrail.

# Define the main orchestrator agent with guardrails
orchestrator_agent = Agent(
    name="Orchestrator Agent",
    instructions="""
    You are an orchestrator agent.
    Your task is to manage the interaction between the Weather Specialist Agent and the Air Quality Specialist Agent.
    You will receive a query from the user and will decide which agent to invoke based on the content of the query.
    If both weather and air quality information is requested, you will invoke both agents and combine their responses into one clear answer.
    """,
    tools=[
        weather_specialist_agent.as_tool(
            tool_name="get_weather_update",
            tool_description="Get current weather information and suggestion including temperature, humidity, wind speed and direction, precipitation, and weather codes."
        ),
        air_quality_specialist_agent.as_tool(
            tool_name="get_air_quality_update",
            tool_description="Get current air quality information and suggestion including pollutants and their levels."
        )
    ],
    tool_use_behavior="run_llm_again",
    input_guardrails=[injection_detection_guardrail, off_topic_guardrail],
)


# Define the run_agent function
async def run_agent(user_input: str):
    result = await Runner.run(orchestrator_agent, user_input)
    return result.final_output
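
Before wiring this into Streamlit, you can sanity-check the whole guarded pipeline from the command line. A quick throwaway test (the sample question is my own):

# Quick command-line test of the guarded agent (temporary, for debugging)
if __name__ == "__main__":
    print(asyncio.run(run_agent("What's the weather in Jakarta right now?")))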

One thing I noticed while experimenting with guardrails is that when we list the guardrail functions in the agent property, the list order is used as the execution sequence. This means we can configure the evaluation order from the standpoint of cost and impact.

In our case, I want to cut the process immediately if the query violates the prompt injection guardrail, both because of its impact and because this validation requires no LLM. If the query is already identified as one that can't proceed, there's no need to evaluate it with an LLM call (which costs money) in the off-topic guardrail.

Create Main Function with Exception Handler

Here is where the input guardrail takes action. In this part, where we define the main function of the Streamlit user interface, we'll add exception handling specifically for when the input guardrail tripwire has been triggered.

# Define the main function of the Streamlit app
def main():
    st.title("Weather and Air Quality Assistant")
    user_input = st.text_input("Enter your question about weather or air quality:")

    if st.button("Get Update"):
        with st.spinner("Thinking..."):
            if user_input:
                try:
                    agent_response = asyncio.run(run_agent(user_input))
                    st.write(agent_response)
                except InputGuardrailTripwireTriggered as e:
                    st.write("I can only help with weather and air quality related questions. Please try asking something else!")
                    st.error("Info: {}".format(e.guardrail_result.output.output_info))
                except Exception as e:
                    st.error(e)
            else:
                st.write("Please enter a question about the weather or air quality.")

if __name__ == "__main__":
    main()

As we can see in the code above, when InputGuardrailTripwireTriggered is raised, the app shows a user-friendly message telling the user that it can only help with weather and air quality related questions.

To make the message more helpful, we also add information about which input guardrail blocked the user's query. If the exception was raised by off_topic_guardrail, the app shows the reasoning from the classification agent. If it came from injection_detection_guardrail, the app shows the hard-coded message "Potential jailbreaking or prompt injection detected."

Run and Check

To test how the input guardrail works, let's start by running the Streamlit app.

streamlit run app08_guardrails.py

First, let's try asking a question that aligns with the app's intended purpose.

Agent’s response where the query is aligned with weather and air quality.

As expected, the app returns an answer because the query is related to weather or air quality.

Using Traces, we can see what's happening under the hood.

Screenshot of the Traces dashboard showing the sequence of input guardrails and the main agent run.

As discussed earlier, the input guardrails run before the main agent. Since we set the guardrail list in order, the injection_detection_guardrail runs first, followed by the off_topic_guardrail. Once the input passes these two guardrails, the main agent can proceed.

However, if we change the query to something completely unrelated to weather or air quality, like the history of Jakarta, the response looks like this:

If the query is not aligned, the input guardrail blocks the input before the main agent takes action.

Here, the off_topic_guardrail triggers the tripwire, cuts the process short, and returns a message along with some extra information about why it happened.

Screenshot of the Traces dashboard showing how the input guardrail blocked the process.

From the Traces dashboard for that history query, we can see the orchestrator agent throws an error because the guardrail tripwire was triggered.

Since the process was cut before the input reached the main agent, we never even called the main agent's model, saving some bucks on a query the app isn't supposed to handle anyway.

Output Guardrail

If the input guardrail ensures that the user's query is safe and relevant, the output guardrail ensures that the agent's response itself meets our desired standards. This is equally important because even with strong input filtering, the agent can still produce outputs that are unintended, harmful, or simply not aligned with our requirements.

For instance, in our app we want to ensure that the agent always responds in a professional tone. Since LLMs often mirror the tone of the user's query, they might reply in a casual, sarcastic, or unprofessional tone, which is outside the scope of the input guardrails we already implemented.

To handle this, we add an output guardrail that checks whether a response is professional. If it's not, the guardrail will trigger and stop the unprofessional response from reaching the user.

Prepare the Output Guardrail Function

Just like the off_topic_guardrail, we create a new professionalism_guardrail. It uses a Pydantic model for the output, a dedicated agent to classify professionalism, and an async function decorated with @output_guardrail to implement the check.

# Define output model for Output Guardrail Agent
class ResponseCheckerOutput(BaseModel):
    is_not_professional: bool = Field(
        description="True if the output is not professional, False otherwise"
    )
    reasoning: str = Field(
        description="Brief explanation of why the output was classified as professional or unprofessional"
    )

# Create Output Guardrail Agent
response_checker_agent = Agent(
    name="Response Checker Agent",
    instructions="""
    You are a response checker agent.
    Your task is to evaluate the professionalism of the output generated by other agents.

    For each response, provide:
    1. A classification of the response as professional or unprofessional.
    2. A brief explanation of the reasoning behind the classification.

    Structure your response in two sections:
    Professionalism Classification:
    - State whether the response is professional or unprofessional.

    Reasoning:
    - Provide a brief explanation of the classification.
    """,
    output_type=ResponseCheckerOutput,
    model="gpt-4o-mini"
)

# Define output guardrail function
@output_guardrail
async def professionalism_guardrail(ctx, agent, output) -> GuardrailFunctionOutput:
    result = await Runner.run(response_checker_agent, output, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output.reasoning,
        tripwire_triggered=result.final_output.is_not_professional
    )

Output Guardrail Implementation

Now we add this new guardrail to the orchestrator agent by listing it under output_guardrails. This ensures every response is checked before being shown to the user.

# Add professionalism guardrail to the orchestrator agent
orchestrator_agent = Agent(
    name="Orchestrator Agent",
    instructions="...same as before...",
    tools=[...],
    input_guardrails=[injection_detection_guardrail, off_topic_guardrail],
    output_guardrails=[professionalism_guardrail],
)

Finally, we extend the main function to handle OutputGuardrailTripwireTriggered exceptions. If triggered, the app will block the unprofessional response and return a friendly fallback message instead.

# Handle output guardrail in the main function
except OutputGuardrailTripwireTriggered as e:
    st.write("The response didn't meet our quality standards. Please try again.")
    st.error("Info: {}".format(e.guardrail_result.output.output_info))

Run and Check

Now, let's take a look at how the output guardrail works. Start by running the app as before:

streamlit run app08_guardrails.py

To test this, we can try to force the agent to reply in an unprofessional way while still asking about weather or air quality. For example, by asking:

Output guardrail blocked the agent response that violates the quality standard.

This question passes the input guardrails because it is still on-topic and not an attempt at prompt injection. As a result, the main agent processes the input and calls the correct function.

However, the final output generated by the main agent, since it followed the user's hyperbolic request, doesn't align with the brand's communication standard. As shown in the screenshot above, the professionalism guardrail tripwire is triggered and the app returns the fallback message instead.

Conclusion

Throughout this article, we explored how guardrails in the OpenAI Agents SDK help us maintain control over both input and output. The input guardrails we built protect the app from harmful or unintended user input that could cost us as developers, while the output guardrail ensures responses remain consistent with the brand standard.

By combining these mechanisms, we can significantly reduce the risks of unintended usage, information leaks, or outputs that fail to align with the intended communication style. This is especially crucial when deploying agentic applications into production environments, where safety, reliability, and trust matter most.

Guardrails are not a silver bullet, but they are an essential layer of defense. As we continue building more advanced multi-agent systems, adopting guardrails early on will help ensure we create applications that are not only powerful but also safe, responsible, and cost-conscious.

Previous Articles in This Series

References

[1] OpenAI. (2025). Guardrails. OpenAI Agents SDK documentation. Retrieved August 30, 2025, from https://openai.github.io/openai-agents-python/guardrails/

[2] OpenAI. (2025). How to use guardrails. OpenAI Cookbook. Retrieved August 30, 2025, from https://cookbook.openai.com/examples/how_to_use_guardrails

[3] OpenAI. (2025). A practical guide to building agents. Retrieved August 30, 2025, from https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf


You can find the complete source code used in this article in the following repository: agentic-ai-weather | GitHub Repository. Feel free to explore, clone, or fork the project to follow along or build your own version.

If you'd like to see the app in action, I've also deployed it here: Weather Assistant Streamlit

Lastly, let’s connect on LinkedIn!
