
Engineering teams are generating more code with AI agents than ever before. But they're hitting a wall when that code reaches production.
The issue isn't necessarily the AI-generated code itself. It's that traditional monitoring tools generally struggle to provide the granular, function-level data AI agents need to understand how code actually behaves in complex production environments. Without that context, agents can't detect issues or generate fixes that account for production reality.
It's a challenge that startup Hud is looking to help solve with the launch of its runtime code sensor on Wednesday. The company's eponymous sensor runs alongside production code, automatically tracking how every function behaves and giving developers a heads-up on what's actually happening in deployment.
"Every software team constructing at scale faces the identical fundamental challenge: constructing high-quality products that work well in the true world," Roee Adler, CEO and founding father of Hud, told VentureBeat in an exclusive interview. "In the brand new era of AI-accelerated development, not knowing how code behaves in production becomes an excellent larger a part of that challenge."
What software developers are struggling with
The pain points that developers are facing are fairly consistent across engineering organizations. Moshik Eilon, group tech lead at Monday.com, oversees 130 engineers and describes a familiar frustration with traditional monitoring tools.
"Once you get an alert, you often find yourself checking an endpoint that has an error rate or high latency, and you would like to drill right down to see the downstream dependencies," Eilon told VentureBeat. "Lots of times it's the actual application, after which it's a black box. You simply get 80% downstream latency on the appliance."
The next step typically involves manual detective work across multiple tools. Check the logs. Correlate timestamps. Try to reconstruct what the application was doing. For novel issues deep in a large codebase, teams often lack the exact data they need.
Daniel Marashlian, CTO and co-founder at Drata, saw his engineers spending hours on what he called an "investigation tax." "They were mapping a generic alert to a specific code owner, then digging through logs to reconstruct the state of the application," Marashlian told VentureBeat. "We wanted to eliminate that so our team could focus entirely on the fix rather than the discovery."
Drata's architecture compounds the challenge. The company integrates with numerous external services to deliver automated compliance, which makes investigations complicated when issues arise. Engineers trace behavior across a very large codebase spanning risk, compliance, integrations, and reporting modules.
Marashlian identified three specific problems that drove Drata toward investing in runtime sensors. The first issue was the cost of context switching.
"Our data was scattered, so our engineers needed to act as human bridges between disconnected tools," he said.
The second issue, he noted, is alert fatigue. "When you have a complex distributed system, general alert channels become a constant stream of background noise, what our team describes as a 'ding, ding, ding' effect that eventually gets ignored," Marashlian said.
The third key driver was a need to integrate with the company's AI strategy.
"An AI agent can write code, but it surely cannot fix a production bug if it may well't see the runtime variables or the basis cause," Marashlian said.
Why traditional APMs can't solve the issue easily
Enterprises have long relied on a category of tools and services known as application performance monitoring (APM).
With the current pace of agentic AI development and modern development workflows, both Monday.com and Drata simply weren't able to get the visibility they needed from existing APM tools.
"If I’d wish to get this information from Datadog or from CoreLogix, I’d just must ingest tons of logs or tons of spans, and I’d pay a whole lot of money," Eilon said.Â
Eilon noted that Monday.com used very low sampling rates due to cost constraints. That meant they often missed the precise data needed to debug issues.
Traditional application performance monitoring tools also require developers to predict what they'll need to instrument, which is a problem because sometimes a developer just doesn't know what they don't know.
"Traditional observability requires you to anticipate what you'll have to debug," Marashlian said. "But when a novel issue surfaces, especially deep inside a big, complex codebase, you're often missing the precise data you wish."
Drata evaluated several solutions in the AI site reliability engineering and automated incident response categories and didn't find what it needed.
 "Most tools we evaluated were excellent at managing the incident process, routing tickets, summarizing Slack threads, or correlating graphs," he said. "But they often stopped wanting the code itself. They may tell us 'Service A is down,' but they couldn't tell us why specifically."
Another common capability in some tools, including error monitors like Sentry, is the ability to capture exceptions. The challenge, according to Adler, is that being made aware of exceptions is useful, but it doesn't connect them to business impact or provide the execution context AI agents need to propose fixes.
How runtime sensors work differently
Runtime sensors push intelligence to the edge where code executes. Hud's sensor runs as an SDK that integrates with a single line of code. It sees every function execution but only sends lightweight aggregate data unless something goes wrong.
When errors or slowdowns occur, the sensor automatically gathers deep forensic data, including HTTP parameters, database queries and responses, and full execution context. The system establishes performance baselines within a day and can alert on both dramatic slowdowns and outliers that percentile-based monitoring misses.
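Conceptually, that "aggregate by default, capture deeply on anomaly" pattern looks something like the minimal sketch below. It assumes nothing about Hud's actual SDK; the wrapper, thresholds, and logging are invented to illustrate the idea, not the product's API.

```typescript
// Minimal sketch of the aggregate-by-default, capture-on-anomaly pattern a
// runtime sensor follows. All names are illustrative; this is not Hud's SDK.

type FunctionStats = { calls: number; errors: number; totalMs: number };

const stats = new Map<string, FunctionStats>();

// Wrap an async function: record lightweight aggregates on every call, and
// only capture deep context (arguments, error, timing) when the call fails
// or runs far slower than its running average.
function instrument<T extends unknown[], R>(
  name: string,
  fn: (...args: T) => Promise<R>
): (...args: T) => Promise<R> {
  return async (...args: T): Promise<R> => {
    const s = stats.get(name) ?? { calls: 0, errors: 0, totalMs: 0 };
    const start = Date.now();
    try {
      const result = await fn(...args);
      const elapsed = Date.now() - start;
      const baseline = s.calls > 0 ? s.totalMs / s.calls : elapsed;
      if (elapsed > baseline * 3) {
        // Outlier: this is where a real sensor would capture full context.
        console.warn("[sensor] slow call", { name, elapsed, baseline, args });
      }
      s.calls += 1;
      s.totalMs += elapsed;
      stats.set(name, s);
      return result;
    } catch (err) {
      s.calls += 1;
      s.errors += 1;
      s.totalMs += Date.now() - start;
      stats.set(name, s);
      // Error: capture arguments and error details, not just a log line.
      console.error("[sensor] error", { name, args, err });
      throw err;
    }
  };
}

// Usage (hypothetical): const getInvoice = instrument("getInvoice", fetchInvoiceFromDb);
```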
"Now we just get all of this information for all the functions no matter what level they’re, even for underlying packages," Eilon said. "Sometimes you may have a difficulty that may be very deep, and we still see it pretty fast."
The platform delivers data through four channels:
- Web application for centralized monitoring and analysis
- IDE extensions for VS Code, JetBrains and Cursor that surface production metrics directly where code is written
- MCP server that feeds structured data to AI coding agents
- Alerting system that identifies issues without manual configuration
The MCP server integration is critical for AI-assisted development. Monday.com engineers now query production behavior directly inside Cursor.
"I can just ask Cursor an issue: Hey, why is that this endpoint slow?" Eilon said. "When it uses the Hud MCP, I get all the granular metrics, and this function is 30% slower since this deployment. Then I also can find the basis cause."
This changes the incident response workflow. Instead of starting in Datadog and drilling down through layers, engineers start by asking an AI agent to diagnose the issue. The agent has immediate access to function-level production data.
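The payoff of the MCP channel is that the agent receives structured, function-level records it can reason over rather than raw logs. The sketch below is a hypothetical illustration of what such a record might contain; the field names and the `isRegression` helper are invented for this example and are not Hud's actual MCP schema.

```typescript
// Hypothetical shape of the structured data an MCP server might return to a
// coding agent asking about a slow endpoint. Field names are illustrative,
// not Hud's actual schema.
interface FunctionMetrics {
  functionName: string;      // fully qualified function identifier
  file: string;              // where the function lives in the codebase
  p95LatencyMs: number;      // current p95 latency
  baselineP95Ms: number;     // learned baseline before the latest deploy
  errorRate: number;         // fraction of calls that raised errors
  deployId: string;          // deployment the regression correlates with
  sampleContext?: {          // deep capture from an anomalous call
    httpParams: Record<string, string>;
    dbQueries: string[];
  };
}

// An agent can reason over records like this directly: "this function is 30%
// slower since this deployment, and here is the query it ran" maps straight
// back to a specific function it can propose a fix for.
function isRegression(m: FunctionMetrics): boolean {
  return m.p95LatencyMs > m.baselineP95Ms * 1.3; // e.g. 30% slower than baseline
}
```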
From voodoo incidents to minutes-long fixes
The shift from theoretical capability to practical impact becomes clear in how engineering teams actually use runtime sensors. What used to take hours or days of detective work now resolves in minutes.
"I'm used to having these voodoo incidents where there’s a CPU spike and also you don't know where it got here from," Eilon said. "A number of years ago, I had such an incident and I had to construct my very own tool that takes the CPU profile and the memory dump. Now I just have all the function data and I've seen engineers just solve it so fast."
At Drata, the quantified impact is dramatic. The company built an internal /triage command that support engineers run inside their AI assistants to instantly identify root causes. Manual triage work dropped from roughly three hours per day to under 10 minutes. Mean time to resolution improved by roughly 70%.
The team also generates a daily "Heads Up" report of quick-win errors. Because the root cause is already captured, developers can fix these issues in minutes. Support engineers now perform forensic diagnosis that previously required a senior developer. Ticket throughput increased without expanding the L2 team.
Where this technology fits
Runtime sensors occupy a distinct space from traditional APMs, which excel at service-level monitoring but struggle with granular, cost-effective function-level data. They also differ from error monitors that capture exceptions without business context.
The technical requirements for supporting AI coding agents differ from human-facing observability. Agents need structured, function-level data they can reason over. They can't parse and correlate raw logs the way humans do. Traditional observability also assumes you can predict what you'll need to debug and instrument accordingly. That approach breaks down with AI-generated code, where engineers may not deeply understand every function.
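To make that distinction concrete, here is a small illustrative contrast (both the log format and the record shape are invented for this example): answering "which function regressed?" from free-form logs means guessing at a regex and hoping the wording holds, while the same question over structured records is a direct filter.

```typescript
// Illustrative contrast between free-form logs and structured records.
// Both formats are invented for this example.

const rawLogs = [
  "2024-11-05T10:12:03Z WARN billing getInvoiceTotals took 812ms (usually ~600ms)",
  "2024-11-05T10:12:04Z INFO checkout createOrder took 95ms",
];

// From logs: pattern-match prose and parse numbers out of it. Any change to
// the log wording silently breaks this.
const regressedFromLogs = rawLogs
  .map((line) => line.match(/(\w+) took (\d+)ms \(usually ~(\d+)ms\)/))
  .filter((m): m is RegExpMatchArray => m !== null && Number(m[2]) > Number(m[3]) * 1.2)
  .map((m) => m[1]);

// From structured function-level records: the same question is a direct query
// an agent can make reliably.
interface FunctionRecord { name: string; p95Ms: number; baselineMs: number }
const records: FunctionRecord[] = [
  { name: "getInvoiceTotals", p95Ms: 812, baselineMs: 600 },
  { name: "createOrder", p95Ms: 95, baselineMs: 90 },
];
const regressedFromRecords = records.filter((r) => r.p95Ms > r.baselineMs * 1.2);
```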
"I feel we're entering a brand new age of AI-generated code and this puzzle, this jigsaw puzzle of a brand new stack emerging," Adler said. "I just don't think that the cloud computing observability stack goes to suit neatly into how the longer term looks like."
What this means for enterprises
For organizations already using AI coding assistants like GitHub Copilot or Cursor, runtime intelligence provides a security layer for production deployments. The technology enables what Monday.com calls "agentic investigation" rather than manual tool-hopping.
The broader implication relates to trust. "We're getting far more AI-generated code, and engineers start not knowing all the code," Eilon said.
Runtime sensors bridge that knowledge gap by providing production context directly in the IDE where code is written.
For enterprises seeking to scale AI code generation beyond pilots, runtime intelligence addresses a fundamental problem. AI agents generate code based on assumptions about system behavior. Production environments are complex and surprising. Function-level behavioral data captured automatically from production gives agents the context they need to generate reliable code at scale.
Organizations should evaluate whether their existing observability stack can cost-effectively provide the granularity AI agents require. If achieving function-level visibility requires dramatically increasing ingestion costs or manual instrumentation, runtime sensors may offer a more sustainable architecture for AI-accelerated development workflows already emerging across the industry.
