Talk to My Agent


Over the past several months, I've had the chance to immerse myself in the task of adapting APIs and backend systems for consumption by LLMs, specifically agents using the MCP protocol. Initially, I expected the experience to be no different from other similar development projects I've done up to now. I was fascinated to find, however, that these autonomous clients are a new type of creature. Consequently, evolving APIs to yield the most value from agent interaction required more than simply making them accessible.

This post is a result of my experimentation and field testing; hopefully it can be useful to other practitioners.

The power and curse of autonomy

Image generated by the author (Midjourney)

We developers are used to third-party tools and automation processes interacting with our application APIs. Our interfaces have therefore evolved around best practices that best support these use cases: transactional, versioned, contract-driven APIs, designed to enforce forward/backward compatibility and built for efficiency. These are all important concerns that become secondary in priority, and often simply irrelevant, when considering the autonomous user.

With agents as clients, there is no need to worry about backward/forward compatibility, as each session is stateless and unique. The model will learn how to use tools each time it discovers them, arriving at the right combination of API calls to achieve its objective. As enthusiastic as this agent may be, however, it will also give up after a few failed attempts unless given proper incentive and guidelines.

More importantly, without such clues it may reach the API call but still fail to fulfill its objectives. Unlike scripted automations or experienced developers, it only has the API documentation and responses to go on in planning out how to meet its goals. The dynamic nature of its behavior is both a blessing and a curse, as these two sources are also the sum of information it can draw upon to be effective.

Conversation-Driven APIs

I first realized that the agent would require a different kind of design while troubleshooting some cases in which the agent was not able to get to the desired results. I provided MCP tool access to an API that provides usage information for any code function based on tracing data. Sometimes it seemed the agent was simply not using it correctly. Looking more closely at the interaction, it seemed that the model was correctly calling the tool and, for various reasons, received an empty array as a response. This behavior would be 100% correct for any similar operation in our API.

The agent, however, had trouble comprehending why this was happening. After trying a few simple variations, it gave up and decided to move on to other avenues of exploration. To me, that interaction spelled out a missed opportunity. Nobody was at fault; transactionally, the behavior was correct. All of the relevant tests would pass, but in measuring the effectiveness of using this API, we discovered the 'success rate' was ridiculously low.

The solution turned out to be a simple one: instead of returning an empty response, I decided to provide a more detailed set of instructions and ideas:

var emptyResult = new NoDataFoundResponse()
{
    Message = @"There was no information found based on the criteria sent.
        This might mean that the code is not called, or that it is not manually instrumented 
        using OTEL annotations.",
    SuggestedNextSteps = @"Suggested steps: 
    1. Search for endpoints (http, consumers, jobs etc.) that use this function. 
       Endpoints are usually automatically instrumented with OTEL spans by the 
       libraries using them.
    2. Try calling this tool using the method and class of the endpoint 
       itself, or use the GetTraceForEndpoint tool with the endpoint route. 
    3. Suggest manual instrumentation for the specific method, depending on the language used in the project
       and the current type of instrumentation used (annotations, code etc.)"
};

Instead of just returning the results to the agent, I was attempting to do something agents will often attempt as well: keep the conversation going. My perception of API responses, therefore, changed. When being consumed by LLMs, beyond serving functional purposes, they are, in essence, a reverse prompt. An ended interaction is a dead end; any data we return to the agent gives it a chance to pull on another thread in its investigative process.
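For illustration, here is a minimal sketch of how such a tool handler could be wired so that an empty lookup returns the guidance object instead of an empty array. The class, method, and delegate names are hypothetical, the message strings are truncated, and only NoDataFoundResponse and GetTraceForEndpoint are taken from the snippet above:

using System;
using System.Collections.Generic;

// Hypothetical MCP-style tool handler: an empty result becomes a reverse prompt,
// not a dead end.
public class CodeUsageTool
{
    // Placeholder for whatever service actually queries the tracing backend.
    private readonly Func<string, string, IReadOnlyList<object>> _findUsages;

    public CodeUsageTool(Func<string, string, IReadOnlyList<object>> findUsages) =>
        _findUsages = findUsages;

    public object GetFunctionUsage(string className, string methodName)
    {
        var usages = _findUsages(className, methodName);

        // Keep the conversation going: return the guidance object so the agent
        // has another thread to pull on.
        if (usages.Count == 0)
            return new NoDataFoundResponse
            {
                Message = "There was no information found based on the criteria sent. ...",
                SuggestedNextSteps = "Suggested steps: ..." // full text as in the snippet above
            };

        // The happy path stays exactly the same.
        return usages;
    }
}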

HATEOAS, the 'choose your own adventure' APIs

Image generated by the author (Midjourney)

Thinking about the philosophy of this approach, I realized that there was something vaguely familiar about it. A long time ago, when I was taking my first steps crafting modern REST APIs, I was introduced to the concept of hypermedia APIs and HATEOAS: Hypermedia As The Engine Of Application State. The concept was outlined by Fielding in his seminal 2008 blog post, REST APIs must be hypertext-driven. One sentence in that post completely blew my mind at the time:

"Application state transitions must be driven by client selection of server-provided choices that are present in the received representations"

In other words, the server can teach the client what to do next instead of simply sending back the requested data. The canonical example is a simple GET request for a specific resource, whereby the response provides information on actions the client can take next on that resource. A self-documenting API where the client is not required to know anything about it ahead of time except a single entry point, from which a branch of choices emerges. Here is a good example from the Wikipedia page:

HTTP/1.1 200 OK

{
    "account": {
        "account_number": 12345,
        "balance": {
            "currency": "usd",
            "value": 100.00
        },
        "links": {
            "deposits": "/accounts/12345/deposits",
            "withdrawals": "/accounts/12345/withdrawals",
            "transfers": "/accounts/12345/transfers",
            "close-requests": "/accounts/12345/close-requests"
        }
    }
}

At the time, I was fascinated by this idea, which reminded me of what is commonly known as 'choose your own adventure' books or 'gamebooks'. This genre of books, an inseparable part of my childhood, didn't simply relay a story (or provide an API response, by this metaphor), but also gave the reader a key to the next set of options available to them. Hypermedia REST APIs were self-documenting and offered users a way to understand the state of the application and the operations available for each entity or process resource without having to read through extensive documentation.

Hypermedia on steroids

One way to look at hypermedia APIs is that they provide more context to the user as part of the response. Context, to agents, is everything, and it doesn't have to stop with available options or operations. An API is an interaction point through which context can be relayed, provided in a form that encourages further interaction. Let's take a look at another example!

Another tool I was working on allows the model to retrieve runtime issues found in the deployment environment, again based on observability data. The specific result I was testing the prompt response for was an anomaly found in the performance of a specific endpoint. It appears that at times responses were EXTREMELY slow, ~70X slower than the median. Providing that piece of information to the LLM was helpful, but ultimately didn't accomplish more than simple repeats of the data provided.

For reference, here is the response provided, as well as the agent output:

{
      "Name": "Performance Anomaly",
      "Category": "Performance",
      "P50":
      {
          "Value": 12.33,
          "Unit": "ms",
          "Raw": 12331700.0
      },
      "P95":
      {
          "Value": 909.62,
          "Unit": "ms",
          "Raw": 909625000.0
      },
      "SlowerByPercentage": 7376.314701136097,

      "SpanInfo":
      {
          ....
      },
      
      #more data  
      ....

        
}
Image by author

There's nothing functionally wrong with the API response or the way the data was mediated to the user by the agent. The only problem is that there is a lot of context and many ideas missing from it that could leverage the agent's ability to take that conversation forward. In other words, this is a standard API request/response interaction, but agents, through reasoning, are capable of so much more. Let's see what happens if we modify our API to inject additional state and suggestions to try to carry the conversation forward:

{

  "_recommendation": 
      "This asset's P95 (slowest 5%) duration is disproportionately slow 
       compared to the median, to an extreme degree.
       Here are some suggested investigative next steps to get to the 
       root cause or correct the problem: 
       1. The issue includes example traces for both the P95 and the median 
          duration; get both traces and compare them to discover which asset 
          or assets are the ones that are abnormally slow at times
       2. Check the performance graphs for this asset's P95 and see if there 
          has been a change recently; if so, check for pull requests 
          merged around that time which may be relevant to this area 
       3. Check for further clues in the slow traces, for example maybe 
          ALL spans of the same type are slow in that time period, indicating
          a systemic issue",

    "Name": "Performance Anomaly",
    "Category": "Performance",
    "P50":
    {
        ...
    },
      #more data

All we've done is give the AI model a little more to go on. Instead of simply returning the result, we can feed the model with ideas on how to use the information provided to it. Sure enough, these suggestions are immediately put to use. This time, the agent continues to investigate the issue by calling other tools to examine the behavior, compare the traces, and understand the issue lineage:

Image by author

With the new information in place, the agent is happy to continue the exploration, examine the timeline, and synthesize the results from the various tools until it comes up with new data that was never part of the original response scope:

Image by author

Wait… Shouldn’t all APIs be designed like that?

Absolutely! I definitely believe that this approach can benefit users, automation developers, and everybody else, even if they use brains for reasoning rather than LLM models. In essence, a conversation-driven API can expand the context beyond the realm of data and into the realm of possibilities, opening up more branches of exploration for agents and users alike and improving the effectiveness of APIs in solving the underlying use case.
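To make that concrete, here is a minimal sketch, assuming System.Text.Json, of a generic envelope that pairs any payload with a recommendation. The type and property names are my own for illustration, and the payload sits under a data field rather than being flattened as in the JSON above:

using System.Text.Json;
using System.Text.Json.Serialization;

// Minimal sketch: a conversation-driven envelope that pairs any result payload
// with a recommendation the consumer (human or agent) can act on next.
public class ConversationalResponse<T>
{
    [JsonPropertyName("_recommendation")]
    public string Recommendation { get; set; } = "";

    [JsonPropertyName("data")]
    public T Data { get; set; } = default!;
}

public static class ConversationalApi
{
    public static string Reply<T>(T payload, string recommendation) =>
        JsonSerializer.Serialize(new ConversationalResponse<T>
        {
            Recommendation = recommendation,
            Data = payload
        });
}

A tool handler would then return ConversationalApi.Reply(anomaly, "...") instead of serializing the raw anomaly object, keeping the functional result intact while always leaving the agent another thread to pull on.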

There is definitely more room for evolution. For example, the hints and ideas provided to the client by the API in our example were static; what if they were AI-generated as well? There are many different A2A models out there, but at some point, it could just be a backend system and a client brainstorming about what the data means and what could be done to understand it better. As for the user? Ignore him, talk to his agent.
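To sketch where that could go, and purely as a speculative illustration with hypothetical names, the static hint could be replaced by a generator that is free to call another model behind the scenes:

using System.Threading.Tasks;

// Speculative sketch: the recommendation attached to a response does not have to be
// a hard-coded string; it can come from anything, including another LLM.
public interface IRecommendationGenerator
{
    Task<string> GenerateAsync(string toolName, object resultPayload);
}

// Today: a canned hint per tool. Tomorrow: an implementation that prompts a model
// with the payload and asks what the client should explore next.
public class StaticRecommendationGenerator : IRecommendationGenerator
{
    public Task<string> GenerateAsync(string toolName, object resultPayload) =>
        Task.FromResult("Compare the P95 and median example traces to find the slow spans.");
}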
