Building LLM Apps That Can See, Think, and Integrate: Using o3 with Multimodal Input and Structured Output


When building LLM-powered applications, the usual “text in, text out” paradigm will only take you so far.

Real applications that deliver actual value need to be able to look at visuals, reason through complex problems, and produce results that downstream systems can actually use.

In this post, we’ll build such a stack by bringing together three powerful capabilities: multimodal input, reasoning, and structured output.

To illustrate this, we’ll walk through a hands-on example: building a time-series anomaly detection system for e-commerce order data using OpenAI’s o3 model. Specifically, we’ll show how to pair o3’s reasoning capability with image input and emit validated JSON, so that downstream systems can easily consume the results.

By the end, our app will:

  • See: analyze charts of e-commerce order volume time series
  • Think: identify unusual patterns
  • Integrate: output a structured anomaly report

You’ll leave with working code you can reuse for use cases that go beyond just anomaly detection.

Let’s dive in.



1. Case Study

In this post, we aim to build an anomaly detection solution for identifying abnormal patterns in e-commerce order time series data.

For this case study, we generated three sets of daily order data. The datasets represent three different profiles of daily orders over roughly one month. To make seasonality obvious, we’ve shaded the weekends. The x-axis shows the date.

Figure 1. Dataset 1, with the shaded regions being the weekends. (Image by author)
Figure 2. Dataset 2, with the shaded regions being the weekends. (Image by author)
Figure 3. Dataset 3, with the shaded regions being the weekends. (Image by author)

Each figure contains one specific type of anomaly (a spike, a level shift, or a seasonal outlier, respectively). We’ll later use those figures to test our anomaly detection solution and see if it can accurately recover those anomalies.
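
For readers who want to reproduce a similar setup, here is a minimal sketch of how the first profile (a single spike) could be synthesized. The date range, baseline level, weekend dip, noise scale, and spike location are illustrative assumptions, not the exact values behind Figures 1-3:

import numpy as np
import pandas as pd

def make_spike_dataset(seed=42):
    """Synthesize ~1 month of daily orders with a weekend dip and one injected spike."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2025-08-04", periods=35, freq="D")

    base = 128.0                                             # weekday baseline
    weekend_dip = np.where(dates.weekday >= 5, -12.0, 0.0)   # Sat/Sun run lower
    noise = rng.normal(0, 3, size=len(dates))

    orders = base + weekend_dip + noise
    orders[15] += 40                                         # single-day spike

    return pd.DataFrame({"date": dates, "orders": orders.round().astype(int)})

df_spike_anomaly = make_spike_dataset()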

2. Our Solution

2.1 Overview

Unlike traditional machine learning approaches that require tedious feature engineering and model training, our approach is far simpler. It works in the following steps:

  1. We prepare the figure for visualizing the e-commerce order time series data.
  2. We prompt the reasoning model o3 to take a closer look at the time series image we feed it and determine whether an unusual pattern exists.
  3. The o3 model will then output its findings in a pre-defined JSON format.

And that’s it. Easy.

Of course, to deliver this solution, we need to enable the o3 model to take image input and emit structured output. We will see how to do that shortly.

2.2 Setting up the reasoning model

As mentioned before, we’ll use the o3 model, OpenAI’s flagship reasoning model that can tackle complex multi-step problems with state-of-the-art performance. Specifically, we’ll use the Azure OpenAI endpoint to call the model.

Make sure you have put the endpoint, API key, and deployment name in a .env file; we can then proceed to setting up the LLM client:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from openai import AzureOpenAI
from dotenv import load_dotenv
import os

load_dotenv()

# Setup LLM client
endpoint = os.getenv("api_base")
api_key = os.getenv("o3_API_KEY")
api_version = "2025-04-01-preview"
model_name = "o3"
deployment = os.getenv("deployment_name")

LLM_client = AzureOpenAI(
    api_key=api_key,  
    api_version=api_version,
    azure_endpoint=endpoint
)
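
For reference, the .env file only needs to define the three variables read above. A placeholder example (dummy values, not real credentials):

# .env
api_base="https://<your-resource>.openai.azure.com/"
o3_API_KEY="<your-azure-openai-api-key>"
deployment_name="<your-o3-deployment-name>"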

We use the following instruction as the system message for the o3 model (tuned by GPT-5):

instruction = """

[Role]
You are a meticulous data analyst.

[Task]
You will be given a line chart image of daily e-commerce orders. 
Your task is to identify prominent anomalies in the data.

[Rules]
The anomaly kind can be spike, drop, level_shift, or seasonal_outlier.
A level_shift is a sustained baseline change (≥ 5 consecutive days), not a single point.
A seasonal_outlier happens when a weekend/weekday behaves unlike its peers in the same category. 
For instance, weekend orders are usually lower than weekday orders.
Read dates/values from the axes; if you can't read them exactly, snap to the nearest tick and note the uncertainty in your explanation.
The weekends are shaded in the figure.
"""

In the above instruction, we clearly defined the role of the LLM, the task it should complete, and the rules it should follow.

To limit the complexity of our case study, we intentionally specified only four anomaly types for the LLM to identify. We also provided clear definitions of those anomaly types to remove ambiguity.

Finally, we injected a bit of domain knowledge about e-commerce patterns, i.e., weekend orders are expected to be lower than weekday orders. Incorporating domain know-how is generally considered good practice for guiding the model’s analytical process.

Now that we have our model set up, let’s discuss how to prepare the image for the o3 model to consume.

2.3 Image preparation

To enable o3’s multimodal capabilities, we need to supply figures in a specific format, i.e., either as publicly accessible web URLs or as base64-encoded data URLs. Since our figures are generated locally, we’ll use the second approach.

We can use the following function to handle this conversion automatically:

import io
import base64

def fig_to_data_url(fig, fmt="png"):
    """
    Converts a Matplotlib figure to a base64 data URL without saving to disk.

    Args:
    -----
    fig (matplotlib.figure.Figure): The figure to convert.
    fmt (str): The format of the image ("png", "jpeg", etc.)

    Returns:
    --------
    str: The data URL representing the figure.
    """

    buf = io.BytesIO()
    fig.savefig(buf, format=fmt, bbox_inches="tight")
    buf.seek(0)
    
    base64_encoded_data = base64.b64encode(buf.read()).decode("utf-8")
    mime_type = f"image/{fmt.lower()}"
    
    return f"data:{mime_type};base64,{base64_encoded_data}"

Essentially, our function first saves the Matplotlib figure to an in-memory buffer. It then encodes the binary image data as base64 text and wraps it in the required data URL format.
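
As a quick sanity check, you can call the function on a throwaway figure and confirm that the returned string starts with the expected prefix (the plotted values here are arbitrary):

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [10, 12, 11])
url = fig_to_data_url(fig)
print(url[:30])  # -> data:image/png;base64,...
plt.close(fig)   # discard the throwaway figure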

Assuming we have access to the synthetic daily order data, we can use the following function to generate the plot and convert it into a proper data URL in one go:

def create_fig(df):
    """
    Create a Matplotlib figure and convert it to a base64 data URL.
    Weekends (Sat–Sun) are shaded.

    Args:
    -----
    df: dataframe containing one profile of the daily order time series.
        The dataframe has "date" and "orders" columns.

    Returns:
    --------
    image_url: The data URL representing the figure.
    """

    df = df.copy()
    df['date'] = pd.to_datetime(df['date'])

    fig, ax = plt.subplots(figsize=(8, 4.5))
    ax.plot(df["date"], df["orders"], linewidth=2)
    ax.set_xlabel('Date', fontsize=14)
    ax.set_ylabel('Daily Orders', fontsize=14)

    # Weekend shading
    start = df["date"].min().normalize()
    end   = df["date"].max().normalize()
    cur = start
    while cur <= end:
        if cur.weekday() == 5:  # Saturday 00:00
            span_start = cur                                      # Sat 00:00
            span_end   = cur + pd.Timedelta(days=2)               # Mon 00:00 (covers Sat + Sun)
            ax.axvspan(span_start, span_end, alpha=0.12, zorder=0)
            cur += pd.Timedelta(days=2)                           # skip Sunday
        else:
            cur += pd.Timedelta(days=1)

    # Title
    title = f'Daily Orders: {df["date"].min():%b %d, %Y} - {df["date"].max():%b %d, %Y}'
    ax.set_title(title, fontsize=16)

    # Format x-axis dates
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d')) 
    ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))

    plt.tight_layout()

    # Obtain url
    image_url = fig_to_data_url(fig)

    return image_url

Figures 1-3 are generated by the above plotting routine.

2.4 Structured output

In this section, let’s discuss how to make sure the o3 model outputs a consistent JSON format instead of free-form text. This is what’s referred to as “structured output,” and it’s one of the key enablers for integrating LLMs into existing automated workflows.

To achieve that, we start by defining the schema that governs the expected output structure. We’ll be using a Pydantic model:

from pydantic import BaseModel, Field
from typing import Literal
from datetime import date

AnomalyKind = Literal["spike", "drop", "level_shift", "seasonal_outlier"]

class DateWindow(BaseModel):
    start: date = Field(description="Earliest plausible date the anomaly begins (ISO YYYY-MM-DD)")
    end: date = Field(description="Latest plausible date the anomaly ends, inclusive (ISO YYYY-MM-DD)")

class AnomalyReport(BaseModel):
    when: DateWindow = Field(
        description=(
            "Minimal window that incorporates the anomaly. "
            "For single-point anomalies, use the interval that covers reading uncertainty, if the tick labels are unclear"
        )
    )
    y: int = Field(description="Approx value on the anomaly’s most representative day (peak/lowest), rounded")
    kind: AnomalyKind = Field(description="The form of the anomaly")
    why: str = Field(description="One-sentence reason for why this window is unusual")
    date_confidence: Literal["low","medium","high"] = Field(
        default="medium", description="Confidence that the window localization is correct"
    )

Our Pydantic schema tries to capture both the quantitative and qualitative aspects of the detected anomalies. For each field, we specify its data type (e.g., int for numerical values, Literal for a fixed set of choices, etc.).

Also, we use the Field function to supply detailed descriptions for each key. Those descriptions are especially important, as they effectively serve as inline instructions for o3 so that it understands the semantic meaning of each component.
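
If you are curious about exactly what schema the model will be constrained to, you can print the JSON Schema that Pydantic derives from AnomalyReport; a quick sanity check before wiring it into the API call:

import json

# Inspect the JSON Schema generated from our Pydantic model (Pydantic v2)
print(json.dumps(AnomalyReport.model_json_schema(), indent=2))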

Now that we’ve covered multimodal input and structured output, it’s time to put them together in a single LLM call.

2.5 o3 model invocation

To interact with o3 using multimodal input and structured output, we use the LLM_client.beta.chat.completions.parse() API. Some of the key arguments include:

  • model: the deployment name;
  • messages: the message object sent to the o3 model;
  • max_completion_tokens: the maximum number of tokens the model can generate. Note that reasoning models like o3 also generate reasoning tokens internally to “think through” the problem, and this limit covers those hidden reasoning tokens in addition to the visible output tokens, so it should be set generously;
  • response_format: the Pydantic model that defines the expected JSON schema structure;
  • reasoning_effort: a control knob that dictates how much computational effort o3 should use for reasoning. The available options include low, medium, and high.

We can define a helper function to interact with the o3 model:

def anomaly_detection(instruction, fig_path, 
                      response_format, prompt=None, 
                      deployment="o3", reasoning_effort="high"):

    # Compose messages
    messages=[
            { "role": "system", "content": instruction},
            { "role": "user", "content": [  
                { 
                    "type": "image_url",
                    "image_url": {
                        "url": fig_path,
                        "detail": "high"
                    }
                },
            ]} 
    ]

    # Add prompt whether it is given
    if prompt is not None:
        messages[1]["content"].append({"type": "text", "text": prompt})

    # Invoke LLM API
    response = LLM_client.beta.chat.completions.parse(
        model=deployment,
        messages=messages,
        max_completion_tokens=4000,
        reasoning_effort=reasoning_effort,
        response_format=response_format
    )

    return response.choices[0].message.parsed.model_dump()

Note that the messages object accepts both text and image content. Since we’ll primarily use figures to prompt the model, the text prompt is optional.

We set "detail": "high" to enable high-resolution image processing. For our current case study, this is important because we need o3 to read fine details like axis tick labels, data point values, and subtle visual patterns. However, keep in mind that high-detail processing incurs more tokens and higher API costs.

Finally, by using .parsed.model_dump(), we turn the parsed JSON output into a regular Python dictionary.
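
One practical note: the parse call can fail, e.g., due to a transient API error or output that doesn’t validate against the schema. In an automated workflow, you may therefore want a defensive wrapper around the helper we just defined. A minimal sketch, with an arbitrary retry count:

def safe_anomaly_detection(*args, max_retries=2, **kwargs):
    """Retry wrapper around anomaly_detection for transient API or validation errors."""
    for attempt in range(max_retries + 1):
        try:
            return anomaly_detection(*args, **kwargs)
        except Exception as exc:
            if attempt == max_retries:
                raise
            print(f"Attempt {attempt + 1} failed ({exc}); retrying...")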

That’s it for the implementation. Let’s see some results next.


3. Results

In this section, we’ll feed the previously generated figures into the o3 model and ask it to identify potential anomalies.

3.1 Spike anomaly

# df_spike_anomaly is the dataframe of the first set of synthetic data (Figure 1)
spike_anomaly_url = create_fig(df_spike_anomaly)

# Anomaly detection
result = anomaly_detection(instruction,
                          spike_anomaly_url,
                          response_format=AnomalyReport,
                          reasoning_effort="medium")
print(result)

In the call above, spike_anomaly_url is the data URL for Figure 1. The output is shown below:

{
  'when': {'start': datetime.date(2025, 8, 19), 'end': datetime.date(2025, 8, 21)}, 
  'y': 166, 
  'kind': 'spike', 
  'why': 'Single day orders jump to ~166, far above adjoining days that sit near 120–130.', 
  'date_confidence': 'medium'
}

We see that the o3 model faithfully returned the output exactly in the format we designed. Now, we can grab this result and generate a visualization programmatically:

# Create image
fig, ax = plt.subplots(figsize=(8, 4.5))
df_spike_anomaly['date'] = pd.to_datetime(df_spike_anomaly['date'])
ax.plot(df_spike_anomaly["date"], df_spike_anomaly["orders"], linewidth=2)
ax.set_xlabel('Date', fontsize=14)
ax.set_ylabel('Daily Orders', fontsize=14)

# Format x-axis dates
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))  
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1)) 

# Add anomaly overlay
start_date = pd.to_datetime(result['when']['start'])
end_date = pd.to_datetime(result['when']['end'])

# Add shaded region
ax.axvspan(start_date, end_date, alpha=0.3, color='red', label=f"Anomaly ({result['kind']})")

# Add text annotation
mid_date = start_date + (end_date - start_date) / 2  # Middle of anomaly window
ax.annotate(
    result['why'], 
    xy=(mid_date, result['y']), 
    xytext=(10, 20),  # Offset from the point
    textcoords='offset points',
    bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.7),
    arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0.1'),
    fontsize=10,
    wrap=True
)

# Add legend
ax.legend()

plt.xticks(rotation=0)
plt.tight_layout()

The generated visualization looks like this:

Figure 4. The anomaly detection results for Figure 1. (Image by author)

We can see that the o3 model correctly identified the spike anomaly present in this first set of synthetic data.

Not bad, especially considering that we didn’t do any conventional model training; we simply prompted an LLM.

3.2 Level shift anomaly

# df_level_shift_anomaly is the dataframe of the 2nd set of synthetic data (Figure 2)
level_shift_anomaly_url = create_fig(df_level_shift_anomaly)

# Anomaly detection
result = anomaly_detection(instruction,
                          level_shift_anomaly_url,
                          response_format=AnomalyReport,
                          reasoning_effort="medium")
print(result)

The output is shown below:

{
  'when': {'start': datetime.date(2025, 8, 26), 'end': datetime.date(2025, 9, 2)}, 
  'y': 150, 
  'kind': 'level_shift', 
  'why': 'Orders suddenly jump from the 120-135 range to ~150 on Aug 26 and remain elevated for all subsequent days, indicating a sustained baseline change.', 
  'date_confidence': 'high'
}

Again, we see that the model accurately identified that a “level_shift” anomaly is present in the plot:

Figure 5. The anomaly detection results for Figure 2. (Image by author)

3.3 Seasonality anomaly

# df_seasonality_anomaly is the dataframe of the third set of synthetic data (Figure 3)
seasonality_anomaly_url = create_fig(df_seasonality_anomaly)

# Anomaly detection
result = anomaly_detection(instruction,
                          seasonality_anomaly_url,
                          response_format=AnomalyReport,
                          reasoning_effort="medium")
print(result)

The output is shown below:

{
  'when': {'start': datetime.date(2025, 8, 23), 'end': datetime.date(2025, 8, 24)}, 
  'y': 132, 
  'kind': 'seasonal_outlier', 
  'why': 'Weekend of Aug 23-24 shows order volumes (~130+) on par with surrounding weekdays, whereas other weekends consistently drop to ~115, making it an out-of-season spike.', 
  'date_confidence': 'high'
}

This is a trickier case. Nevertheless, the o3 model managed to handle it properly, with accurate localization and a clear explanation. Pretty impressive:

Figure 6. The anomaly detection results for Figure 3. (Image by author)

4. Summary

Congratulations! We’ve successfully built an anomaly detection solution for time-series data that worked entirely through visualization and prompting.

By feeding daily order plots into the o3 reasoning model and constraining its output to a JSON schema, the LLM managed to identify three different anomaly types with accurate localization. All of this was achieved without training any ML model. Impressive!

If we take a step back, we can see that the solution we built illustrates the broader pattern of combining three capabilities:

  • See: multimodal input to let the model consume figures directly.
  • Think: step-by-step reasoning capability to tackle complex problems.
  • Integrate: structured output that downstream systems can easily consume (e.g., for generating visualizations).

The combination of multimodal input + reasoning + structured output creates a versatile foundation for useful LLM applications.

You now have the building blocks ready. What will you build next?
