There has been much discussion about AI Agents: self-contained units capable of performing tasks autonomously, driven by specific instructions and contextual understanding. Indeed, the topic has become almost as widely discussed as LLMs themselves. In this article, I consider AI Agents and, more specifically, the concept of Multi-Agents-as-a-Service from the perspective of the lead engineers, architects, and site reliability engineers (SREs) who must deal with AI agents in production systems going forward.
Context: What Problems Can AI Agents Solve?
AI agents are adept at tasks that benefit from human-friendly interactions:
- E-Commerce: agents powered by technologies like LLM-based RAG or Text-to-SQL respond to user inquiries with accurate answers grounded in company policies, allowing for a more tailored shopping experience and customer journey that could revolutionize e-commerce
- Customer Service: this is another ideal application. Many of us have experienced long waits to speak with a representative for simple queries like order status updates. Some startups, Decagon for instance, are making strides in addressing these inefficiencies through AI agents.
- Personalized Product and Content Creation: a prime example of this is Wix. For low-code or no-code website building, Wix developed a chatbot that, through interactive Q&A sessions, creates an initial website for customers based on their description and requirements.
Overall, LLM-based agents work well at mimicking natural human dialogue and simple business workflows, often producing results that are both effective and impressively satisfying.
An Engineer’s View: AI Agents & Enterprise Production Environments
Considering the benefits mentioned above, have you ever wondered how AI agents would function inside enterprise production environments? What architecture patterns and infrastructure components best support them? What do we do when things inevitably go wrong and the agents hallucinate, crash or (arguably even worse) reason and plan incorrectly while performing a critical task?
As senior engineers, we need to consider the above carefully. Moreover, we must ask an even more important question: how do we define what a successful deployment of a multi-agent platform looks like in the first place?
To answer this question, let's borrow an idea from another software engineering discipline: Service Level Objectives (SLOs) from Reliability Engineering. SLOs are a critical component in measuring the performance and reliability of services. Simply put, an SLO defines the acceptable ratio of “successful” measurements to “all” measurements and its impact on user journeys. These objectives help us determine the required and expected levels of service from our agents and the broader workflows they support.
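To make the ratio concrete, here is a minimal Python sketch of how an availability SLO and the error budget it implies might be computed over a measurement window; the 99.5% target and the request counts are purely illustrative.

```python
# Illustrative arithmetic only: an SLO as the ratio of "successful" to "all"
# measurements, and the error budget that a 99.5% target implies. All numbers
# below are placeholders.
slo_target = 0.995             # e.g. "99.5% of agent requests succeed"
total_requests = 120_000       # all requests observed in the SLO window
successful_requests = 119_500  # requests that counted as "successful"

availability = successful_requests / total_requests        # ~99.58%
error_budget = (1 - slo_target) * total_requests            # 600 allowed failures
budget_consumed = (total_requests - successful_requests) / error_budget

print(f"availability={availability:.4%}, error budget used={budget_consumed:.0%}")
```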
So, how are SLOs relevant to our AI Agent discussion?
Using a simplified view, let's consider two important objectives for the agents, “Availability” and “Accuracy”, and identify some more granular SLOs that contribute to them:
- Availability: this refers to the percentage of requests that receive a successful response (think HTTP 200 status code) from the agents or the platform. Historically, the uptime and ping success of the underlying servers (i.e. temporal measures) were key correlated indicators of availability. But with the rise of microservices, notional uptime has become less relevant. Modern systems instead focus on the number of successful versus unsuccessful responses to user requests as a more accurate proxy for availability. Other related metrics include Latency and Throughput.
- Accuracy: this, on the other hand, is less about how quickly and consistently the agents return responses to clients, and more about how accurately, from a business perspective, they are able to perform their tasks and return data without a human in the loop to verify their work. Traditional systems also track similar SLOs such as data correctness and quality.
Measuring the two objectives above usually happens through the submission of internal application metrics at runtime, either at set time intervals (e.g. every 10 minutes) or in response to events (user requests, upstream calls, etc.). Synthetic probing, for instance, can be used to mimic user requests, trigger relevant events and monitor the numbers. The key idea to explore here is this: traditional systems are deterministic to a large extent and, therefore, generally more straightforward to instrument, probe and evaluate. In our beautiful yet non-deterministic world of GenAI agents, on the other hand, this is not necessarily the case.
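As a rough illustration, the sketch below shows what such a synthetic probe might look like in Python: it posts a canned query to a hypothetical agent endpoint and treats a fast HTTP 200 as a successful measurement. The URL, payload shape and thresholds are assumptions made for the example, not a real API.

```python
# Minimal synthetic probe: send a canned query to a (hypothetical) agent
# endpoint and record whether the response counts as "available".
import time
import requests

AGENT_URL = "https://agents.example.internal/v1/chat"  # hypothetical endpoint
PROBE_QUERY = {"message": "What is the status of order 12345?"}

def probe_once() -> bool:
    """Return True if the agent answered successfully within the latency budget."""
    start = time.monotonic()
    try:
        resp = requests.post(AGENT_URL, json=PROBE_QUERY, timeout=5)
        latency = time.monotonic() - start
        ok = resp.status_code == 200 and latency < 2.0
    except requests.RequestException:
        ok = False
    # In a real setup this result would be emitted as a metric and alerted on,
    # not printed.
    print(f"synthetic probe success={ok}")
    return ok

if __name__ == "__main__":
    while True:
        probe_once()
        time.sleep(600)  # e.g. every 10 minutes, matching the interval above
```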
Note: the focus of this post is mostly on the former of our two objectives, availability. This includes determining acceptance criteria that establish baseline cloud/environmental stability to help agents respond to user queries. For a deeper dive into accuracy (i.e. defining sensible task scope for the agents, optimizing the performance of few-shot methods and evaluation frameworks), this blog post acts as a wonderful primer.
Now, back to the things engineers must get right to ensure infrastructure readiness when deploying agents. In order to achieve our target SLOs and provide a reliable and secure platform, senior engineers consistently think about the following elements:
- Scalability: when the number of requests increases (sometimes suddenly), can the system handle it efficiently?
- Cost-Effectiveness: LLM usage is expensive, so how do we monitor and control the cost? (A back-of-the-envelope cost sketch follows this list.)
- High Availability: how do we keep the system always available and responsive to customers? Can agents self-heal and recover from errors and crashes?
- Security: how do we ensure data is secure at rest and in transit, and perform security audits, vulnerability assessments, etc.?
- Compliance & Regulation: a major topic for AI; what are the relevant data privacy regulations and other industry-specific standards to which we must adhere?
- Observability: how do we gain real-time visibility into AI agents' activities, health, and resource utilization levels in order to identify and resolve problems before they impact the user experience?
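On the cost-effectiveness point, a back-of-the-envelope sketch like the one below can make the question tangible by estimating spend from per-request token counts. The per-token prices are placeholders, since actual rates vary by model and provider.

```python
# Rough cost estimation from token counts; the prices below are placeholders.
from dataclasses import dataclass

PRICE_PER_1K_INPUT_TOKENS = 0.01   # placeholder USD rate
PRICE_PER_1K_OUTPUT_TOKENS = 0.03  # placeholder USD rate

@dataclass
class AgentCall:
    input_tokens: int
    output_tokens: int

def estimated_cost(calls: list[AgentCall]) -> float:
    """Estimated spend in USD for a batch of agent calls."""
    return sum(
        call.input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
        + call.output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
        for call in calls
    )

# Example: 10,000 calls averaging 1,500 input and 400 output tokens each.
calls = [AgentCall(1500, 400)] * 10_000
print(f"estimated spend: ${estimated_cost(calls):,.2f}")  # -> $270.00
```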
Sound familiar? These are similar to the challenges that modern web applications, the microservices pattern and cloud infrastructure aim to address.
So, now what? We propose an AI Agent development and maintenance framework that adheres to best practices developed over time across a range of engineering and software disciplines.
Multi-Agent-as-a-Service (MAaaS)
This time, let us borrow some of the best practices of cloud-based applications to redefine how agents are designed in production systems:
- Clear Bounded Context: Each agent must have a well-defined, small scope of responsibility with clear functionality boundaries. This modular approach ensures that agents are more accurate, easier to manage, and able to scale independently.
- RESTful and Asynchronous Inter-Service Communication: Use RESTful APIs for communication between users and agents, and leverage message brokers for asynchronous communication between agents. This decouples agents, improving scalability and fault tolerance (a minimal sketch of this pattern follows the list).
- Isolated Data Storage per Agent: Each agent must have its own data storage to ensure data encapsulation and reduce dependencies. Utilize distributed data storage solutions where necessary to support scalability.
- Containerization and Orchestration: Use containers (e.g. Docker) to package and deploy agents consistently across different environments, simplifying deployment and scaling. Employ container orchestration platforms like Kubernetes to manage the deployment, scaling, and operational lifecycle of agent services.
- Testing and CI/CD: Implement automated testing (unit, integration, contract, and end-to-end tests) to ensure reliable change management for agents. Use CI tools to automatically build and test agents whenever code changes are committed. Establish CD pipelines to deploy changes to production seamlessly, reducing downtime and enabling rapid iteration cycles.
- Observability: Implement robust observability instrumentation such as metrics, tracing and logging for the agents and their supporting infrastructure to build a real-time view of the platform's reliability (tracing could be of particular interest here if a given user request passes through multiple agents). Calculate and track SLOs and error budgets for the agents and the aggregate request flow. Use synthetic probing and effective alerting on warnings and failures to ensure that agent health issues are detected before they widely impact end users (an instrumentation sketch also follows the list).
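For the asynchronous inter-service communication point above, here is a minimal Python sketch of the pattern. An asyncio.Queue stands in for a real message broker (RabbitMQ, Kafka, SQS, and so on), and the agent names and message shapes are illustrative.

```python
# Two decoupled agents exchanging messages through queues; neither knows who
# produces its input or consumes its output. In production the queues would be
# a managed message broker rather than in-process asyncio.Queue objects.
import asyncio

async def order_status_agent(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Consumes user requests and publishes results."""
    while True:
        request = await inbox.get()
        # ... call an LLM / query a database here ...
        await outbox.put({"request_id": request["id"], "answer": "Order shipped."})
        inbox.task_done()

async def notification_agent(outbox: asyncio.Queue) -> None:
    """Independently consumes results produced by other agents."""
    while True:
        result = await outbox.get()
        print(f"notify user: {result}")
        outbox.task_done()

async def main() -> None:
    inbox, outbox = asyncio.Queue(), asyncio.Queue()
    workers = [
        asyncio.create_task(order_status_agent(inbox, outbox)),
        asyncio.create_task(notification_agent(outbox)),
    ]
    await inbox.put({"id": "req-1", "question": "Where is my order?"})
    await inbox.join()   # wait until the request has been consumed...
    await outbox.join()  # ...and its result delivered
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)

asyncio.run(main())
```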
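And for the observability point, a minimal instrumentation sketch using the prometheus_client library: it counts agent requests by outcome and observes end-to-end latency, which is enough to derive availability and latency SLOs from the exported metrics. The metric names, labels and stubbed handler are illustrative.

```python
# Count agent requests by outcome and record latency so that availability and
# latency SLOs can be derived from the scraped metrics. The handler is a stub.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

AGENT_REQUESTS = Counter(
    "agent_requests_total", "Agent requests by outcome", ["agent", "outcome"]
)
AGENT_LATENCY = Histogram(
    "agent_request_latency_seconds", "End-to-end agent request latency", ["agent"]
)

def handle_request(agent_name: str, query: str) -> str:
    start = time.monotonic()
    try:
        answer = f"stubbed answer to: {query}"  # stand-in for real agent logic
        AGENT_REQUESTS.labels(agent=agent_name, outcome="success").inc()
        return answer
    except Exception:
        AGENT_REQUESTS.labels(agent=agent_name, outcome="error").inc()
        raise
    finally:
        AGENT_LATENCY.labels(agent=agent_name).observe(time.monotonic() - start)

if __name__ == "__main__":
    start_http_server(8000)  # metrics exposed at :8000/metrics for scraping
    while True:
        handle_request("order-status-agent", "Where is my order?")
        time.sleep(random.uniform(0.5, 2.0))
```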
By applying these principles, we can create a robust framework for AI agents, transforming the concept into “Multi-Agent-as-a-Service” (MAaaS). This approach leverages the best practices of cloud-based applications to redefine how agents are designed, deployed, and managed.