If you run n8n workflows in production, you know the stress of hearing that a process failed and having to dig through logs to find the root cause.
User: Samir, your automation doesn’t work anymore, I didn’t receive my notification!
Step one is to open your n8n interface and review the last executions to identify the problems.
After just a few minutes, you end up jumping between executions, comparing timestamps and reading JSON errors to figure out where things broke.

What if an agent could tell you why your workflow failed at 3 AM, without you having to dig through the logs?
It is possible!
As an experiment, I connected the n8n API, which provides access to my instance's execution logs, to an MCP server used by Claude.

The result is an AI assistant that can monitor workflows, analyse failures, and explain what went wrong in natural language.

In this article, I'll walk you through the step-by-step process of building this system.
The first section shows a real example from my own n8n instance, where several workflows failed during the night.

We’ll use this case to see how the agent identifies issues and explains their root causes.
Then, I’ll detail how I connected my n8n instance’s API to the MCP server using a webhook to enable Claude Desktop to fetch execution data for natural-language debugging.

The webhook exposes three functions:
- Get Active Workflows: returns the list of all active workflows
- Get Last Executions: returns information about the last n executions
- Get Executions Details (Status = Error): details of failed executions, formatted to support root cause analysis
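Concretely, each function is selected by the `action` field of a JSON body POSTed to the same webhook URL (the field name and the `limit` / `workflow_id` parameters are taken from the client code shown later in this article). A minimal sketch of the three payloads:

```python
# Each webhook function is selected by the "action" field of a JSON POST body.
# "limit" and "workflow_id" are the extra parameters used by the client code
# later in this article.
def build_payload(action: str, **params) -> dict:
    """Build the JSON body expected by the n8n monitoring webhook."""
    return {"action": action, **params}

payloads = [
    build_payload("get_active_workflows"),
    build_payload("get_workflow_executions", limit=25),
    build_payload("get_error_executions", workflow_id="7uvA2XQPMB5l4kI5"),
]
```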
You'll find the complete tutorial, together with the n8n workflow template and the MCP server source code, linked in this article.
Demonstration: Using AI to Analyse Failed n8n Executions
Let us look together at one of my n8n instances, which runs several workflows that fetch event information from different cities around the globe.
These workflows help business and networking communities discover interesting events to attend and learn from.

To test the solution, I'll start by asking the agent to list the active workflows.
Step 1: How many workflows are active?

Based on the query alone, Claude understood that it needed to interact with the n8n-monitor tool, which was built using an MCP server.

From there, it automatically selected the corresponding function, Get Active Workflows, to retrieve the list of active automations from my n8n instance.

This is where you begin to sense the power of the model.
It automatically categorised the workflows based on their names:
- 8 workflows that connect to APIs to fetch events and process them
- 3 other workflows that are works in progress, including the one used to fetch the logs

This marks the start of the analysis; all these insights will be used in the root cause analysis.
Step 2: Analyse the last n executions
At this stage, we can ask Claude to retrieve the most recent executions for analysis.

Thanks to the context provided in the docstrings, which I'll explain in the next section, Claude understood that it needed to call get_workflow_executions.
It receives a summary of the executions, with the percentage of failures and the number of workflows impacted by these failures.
{
"summary": {
"totalExecutions": 25,
"successfulExecutions": 22,
"failedExecutions": 3,
"failureRate": "12.00%",
"successRate": "88.00%",
"totalWorkflowsExecuted": 7,
"workflowsWithFailures": 1
},
"executionModes": {
"webhook": 7,
"trigger": 18
},
"timing": {
"averageExecutionTime": "15.75 seconds",
"maxExecutionTime": "107.18 seconds",
"minExecutionTime": "0.08 seconds",
"timeRange": {
"from": "2025-10-24T06:14:23.127Z",
"to": "2025-10-24T11:11:49.890Z"
}
},
[...]
This is the first thing it will share with you; it provides a clear overview of the situation.

In the second part of the outputs, you will find a detailed breakdown of the failures for each impacted workflow.
"failureAnalysis": {
"workflowsImpactedByFailures": [
"7uvA2XQPMB5l4kI5"
],
"failedExecutionsByWorkflow": {
"7uvA2XQPMB5l4kI5": {
"workflowId": "7uvA2XQPMB5l4kI5",
"failures": [
{
"id": "13691",
"startedAt": "2025-10-24T11:00:15.072Z",
"stoppedAt": "2025-10-24T11:00:15.508Z",
"mode": "trigger"
},
{
"id": "13683",
"startedAt": "2025-10-24T09:00:57.274Z",
"stoppedAt": "2025-10-24T09:00:57.979Z",
"mode": "trigger"
},
{
"id": "13677",
"startedAt": "2025-10-24T07:00:57.167Z",
"stoppedAt": "2025-10-24T07:00:57.685Z",
"mode": "trigger"
}
],
"failureCount": 3
}
},
"recentFailures": [
{
"id": "13691",
"workflowId": "7uvA2XQPMB5l4kI5",
"startedAt": "2025-10-24T11:00:15.072Z",
"mode": "trigger"
},
{
"id": "13683",
"workflowId": "7uvA2XQPMB5l4kI5",
"startedAt": "2025-10-24T09:00:57.274Z",
"mode": "trigger"
},
{
"id": "13677",
"workflowId": "7uvA2XQPMB5l4kI5",
"startedAt": "2025-10-24T07:00:57.167Z",
"mode": "trigger"
}
]
},
As a user, you now have visibility into the impacted workflows, along with details of the failure occurrences.

For this specific case, the workflow "Bangkok Meetup" is triggered every hour.
What we can see is that it had issues 3 times (out of 5 runs) during the last five hours.
Note: We can ignore the last sentence, as the agent doesn't yet have access to the
The last section of the outputs includes an analysis of the overall performance of the workflows.
"workflowPerformance": {
"allWorkflowMetrics": {
"CGvCrnUyGHgB7fi8": {
"workflowId": "CGvCrnUyGHgB7fi8",
"totalExecutions": 7,
"successfulExecutions": 7,
"failedExecutions": 0,
"successRate": "100.00%",
"failureRate": "0.00%",
"lastExecution": "2025-10-24T11:11:49.890Z",
"executionModes": {
"webhook": 7
}
},
[... other workflows ...]
},
"topProblematicWorkflows": [
{
"workflowId": "7uvA2XQPMB5l4kI5",
"totalExecutions": 5,
"successfulExecutions": 2,
"failedExecutions": 3,
"successRate": "40.00%",
"failureRate": "60.00%",
"lastExecution": "2025-10-24T11:00:15.072Z",
"executionModes": {
"trigger": 5
}
},
{
"workflowId": "CGvCrnUyGHgB7fi8",
"totalExecutions": 7,
"successfulExecutions": 7,
"failedExecutions": 0,
"successRate": "100.00%",
"failureRate": "0.00%",
"lastExecution": "2025-10-24T11:11:49.890Z",
"executionModes": {
"webhook": 7
}
},
[... other workflows ...]
]
}
This detailed breakdown can help you prioritise maintenance if you have multiple workflows failing.
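As a sketch, the topProblematicWorkflows ranking above can be reproduced by sorting the per-workflow metrics by failure rate; this is a minimal illustration using the fields from the JSON output, not the actual code node inside the workflow.

```python
def rank_problematic_workflows(metrics: dict) -> list:
    """Sort per-workflow metrics by failure rate (descending), then by volume."""
    def failure_rate(m: dict) -> float:
        # "failureRate" is a percentage string such as "60.00%"
        return float(m["failureRate"].rstrip("%"))
    return sorted(
        metrics.values(),
        key=lambda m: (-failure_rate(m), -m["totalExecutions"]),
    )

# Two workflows taken from the output above
metrics = {
    "7uvA2XQPMB5l4kI5": {"workflowId": "7uvA2XQPMB5l4kI5",
                         "totalExecutions": 5, "failureRate": "60.00%"},
    "CGvCrnUyGHgB7fi8": {"workflowId": "CGvCrnUyGHgB7fi8",
                         "totalExecutions": 7, "failureRate": "0.00%"},
}
ranking = rank_problematic_workflows(metrics)
```

The Bangkok Meetup workflow, with its 60% failure rate, comes out on top of the ranking.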

In this specific example, I have only a single failing workflow: Ⓜ️ Bangkok Meetup.
What if I want to know when the issues began?
Don't worry, I've added a section with the details of the executions, hour by hour.
"timeSeriesData": {
"2025-10-24T11:00": {
"total": 5,
"success": 4,
"error": 1
},
"2025-10-24T10:00": {
"total": 6,
"success": 6,
"error": 0
},
"2025-10-24T09:00": {
"total": 3,
"success": 2,
"error": 1
},
"2025-10-24T08:00": {
"total": 3,
"success": 3,
"error": 0
},
"2025-10-24T07:00": {
"total": 3,
"success": 2,
"error": 1
},
"2025-10-24T06:00": {
"total": 5,
"success": 5,
"error": 0
}
}
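This hourly bucketing is straightforward to reproduce: truncate each execution's start timestamp to the hour and count successes and errors. The snippet below is only a sketch of what the webhook's code node might do (the simplified `status` field is an assumption), not its actual implementation.

```python
from collections import defaultdict

def bucket_by_hour(executions: list) -> dict:
    """Group executions into hourly total/success/error counts,
    keyed like the timeSeriesData output above."""
    buckets = defaultdict(lambda: {"total": 0, "success": 0, "error": 0})
    for ex in executions:
        key = ex["startedAt"][:13] + ":00"  # e.g. "2025-10-24T11:00"
        buckets[key]["total"] += 1
        buckets[key]["success" if ex["status"] == "success" else "error"] += 1
    return dict(buckets)

sample = [
    {"startedAt": "2025-10-24T11:00:15.072Z", "status": "error"},
    {"startedAt": "2025-10-24T11:11:49.890Z", "status": "success"},
    {"startedAt": "2025-10-24T09:00:57.274Z", "status": "error"},
]
series = bucket_by_hour(sample)
```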
You only have to let Claude create a nice visual like the one below.

Let me remind you that I didn't give Claude any suggestions on how to present the results; this is all its own initiative!
Impressive, no?
Step 3: Root Cause Analysis
Now that we know which workflows have issues, we should search for the root cause(s).

Claude will typically call the Get Error Executions function to retrieve the details of the failed executions.
For your information, the failure of this workflow is due to an error in the JSON Tech node, which processes the output of the API call.
- Meetup Tech sends an HTTP request to the Meetup API
- The response is processed by the Result Tech node
- JSON Tech is supposed to transform this output into a structured JSON

Here is what happens when everything goes well.

However, the API call sometimes fails, and the JavaScript node receives an error because the input is not in the expected format.
Let us see if Claude can locate the root cause.
Here is the output of the Get Error Executions function.
{
"workflow_id": "7uvA2XQPMB5l4kI5",
"workflow_name": "Ⓜ️ Bangkok Meetup",
"error_count": 5,
"errors": [
{
"id": "13691",
"workflow_name": "Ⓜ️ Bangkok Meetup",
"status": "error",
"mode": "trigger",
"started_at": "2025-10-24T11:00:15.072Z",
"stopped_at": "2025-10-24T11:00:15.508Z",
"duration_seconds": 0.436,
"finished": false,
"retry_of": null,
"retry_success_id": null,
"error": {
"message": "A 'json' property isn't an object [item 0]",
"description": "In the returned data, every key named 'json' must point to an object.",
"http_code": null,
"level": "error",
"timestamp": null
},
"failed_node": {
"name": "JSON Tech",
"type": "n8n-nodes-base.code",
"id": "dc46a767-55c8-48a1-a078-3d401ea6f43e",
"position": [
-768,
-1232
]
},
"trigger": {}
},
[... 4 other errors ...]
],
"summary": {
"total_errors": 5,
"error_patterns": {
"A 'json' property isn't an object [item 0]": {
"count": 5,
"executions": [
"13691",
"13683",
"13677",
"13660",
"13654"
]
}
},
"failed_nodes": {
"JSON Tech": 5
},
"time_range": {
"oldest": "2025-10-24T05:00:57.105Z",
"newest": "2025-10-24T11:00:15.072Z"
}
}
}
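The error_patterns block in this summary is essentially a group-by on the error message. A minimal sketch, using the field names from the JSON above (again, an illustration rather than the webhook's actual code):

```python
from collections import defaultdict

def group_error_patterns(errors: list) -> dict:
    """Group failed executions by error message, like the summary above."""
    patterns = defaultdict(lambda: {"count": 0, "executions": []})
    for err in errors:
        msg = err["error"]["message"]
        patterns[msg]["count"] += 1
        patterns[msg]["executions"].append(err["id"])
    return dict(patterns)

# Two of the failures from the output above
errors = [
    {"id": "13691", "error": {"message": "A 'json' property isn't an object [item 0]"}},
    {"id": "13683", "error": {"message": "A 'json' property isn't an object [item 0]"}},
]
patterns = group_error_patterns(errors)
```

Every failure collapsing into a single pattern is already a strong hint that one node fails for one recurring reason.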
Claude now has access to the details of the executions, with the error message and the impacted nodes.

In the response above, you can see that Claude summarised the outputs of multiple executions in a single analysis.
We now know that:
- Errors occurred every hour except at 08:00 am
- Each time, the same node, "JSON Tech", is impacted
- The error occurs shortly after the workflow is triggered
This descriptive analysis is complemented by the beginning of a diagnosis.

This assertion is not incorrect, as evidenced by the error message in the n8n UI.

However, due to the limited context, Claude starts to offer recommendations to fix the workflow that are not correct.

In addition to the code correction, it provides an action plan.

As I know that the issue is not (only) in the code node, I wanted to guide Claude in the root cause analysis.

It eventually challenged its initial resolution proposal and began to share assumptions about the root cause(s).

This gets closer to the actual root cause, providing enough insight for us to start exploring the workflow.

The revised fix is now better, as it considers the possibility that the issue comes from the node's input data.
For me, this is the best I could expect from Claude, considering the limited information it has at hand.
Conclusion: Value Proposition of This Tool
This easy experiment demonstrates how an AI agent powered by Claude can extend beyond basic monitoring to deliver real operational value.
Instead of manually checking executions and logs, you can converse with your automation system, ask what failed and why, and receive context-aware explanations within seconds.
This may not replace you entirely, but it can speed up the root cause analysis process.
In the next section, I'll briefly introduce how I set up the MCP server to connect Claude Desktop to my instance.
Building a Local MCP Server to Connect Claude Desktop to a FastAPI Microservice
To equip Claude with the three functions available in the webhook (Get Active Workflows, Get Workflow Executions and Get Error Executions), I implemented an MCP server.

In this section, I'll briefly introduce the implementation, focusing only on Get Active Workflows and Get Workflow Executions, to show how I explain the usage of these tools to Claude.
For a comprehensive and detailed introduction to the solution, including instructions on how to deploy it on your machine, I invite you to watch this tutorial on my YouTube channel.
There, you will also find the MCP server source code and the n8n workflow of the webhook.
Create a Class to Query the Workflow
Before examining how to set up the three different tools, let me introduce the utility class, which defines all the functions needed to interact with the webhook.
You will find it in the Python file: ./utils/n8n_monitor_sync.py
import logging
import os
import traceback
from datetime import datetime, timedelta
from typing import Any, Dict, Optional

import requests

logger = logging.getLogger(__name__)


class N8nMonitor:
    """Handler for n8n monitoring operations - synchronous version"""

    def __init__(self):
        self.webhook_url = os.getenv("N8N_WEBHOOK_URL", "")
        self.timeout = 30
Essentially, we retrieve the webhook URL from an environment variable and set a request timeout of 30 seconds.
The first function, get_active_workflows, queries the webhook, passing the parameter "action": "get_active_workflows".
    def get_active_workflows(self) -> Dict[str, Any]:
        """Fetch all active workflows from n8n"""
        if not self.webhook_url:
            logger.error("Environment variable N8N_WEBHOOK_URL not configured")
            return {"error": "N8N_WEBHOOK_URL environment variable not set"}
        try:
            logger.info("Fetching active workflows from n8n")
            response = requests.post(
                self.webhook_url,
                json={"action": "get_active_workflows"},
                timeout=self.timeout
            )
            response.raise_for_status()
            data = response.json()
            logger.debug(f"Response type: {type(data)}")

            # List of all workflows
            workflows = []
            if isinstance(data, list):
                workflows = [item for item in data if isinstance(item, dict)]
                if not workflows and data:
                    logger.error(f"Expected list of dictionaries, got list of {type(data[0]).__name__}")
                    return {"error": "Webhook returned invalid data format"}
            elif isinstance(data, dict):
                if "data" in data:
                    workflows = data["data"]
                else:
                    logger.error(f"Unexpected dict response with keys: {list(data.keys())}\n{traceback.format_exc()}")
                    return {"error": "Unexpected response format"}
            else:
                logger.error(f"Unexpected response type: {type(data)}\n{traceback.format_exc()}")
                return {"error": f"Unexpected response type: {type(data).__name__}"}

            logger.info(f"Successfully fetched {len(workflows)} active workflows")
            return {
                "total_active": len(workflows),
                "workflows": [
                    {
                        "id": wf.get("id", "unknown"),
                        "name": wf.get("name", "Unnamed"),
                        "created": wf.get("createdAt", ""),
                        "updated": wf.get("updatedAt", ""),
                        "archived": wf.get("isArchived", "false") == "true"
                    }
                    for wf in workflows
                ],
                "summary": {
                    "total": len(workflows),
                    "names": [wf.get("name", "Unnamed") for wf in workflows]
                }
            }
        except requests.exceptions.RequestException as e:
            logger.error(f"Error fetching workflows: {e}\n{traceback.format_exc()}")
            return {"error": f"Failed to fetch workflows: {str(e)}\n{traceback.format_exc()}"}
        except Exception as e:
            logger.error(f"Unexpected error fetching workflows: {e}\n{traceback.format_exc()}")
            return {"error": f"Unexpected error: {str(e)}\n{traceback.format_exc()}"}
I added many checks, as the API sometimes fails to return the expected data format.
This makes the solution more robust, providing Claude with all the information needed to understand why a query failed.
Now that the first function is covered, we can focus on retrieving the last n executions with get_workflow_executions.
    def get_workflow_executions(
        self,
        limit: int = 50,
        includes_kpis: bool = False,
    ) -> Dict[str, Any]:
        """Fetch workflow executions of the last 'limit' executions, with or without KPIs"""
        if not self.webhook_url:
            logger.error("Environment variable N8N_WEBHOOK_URL not set")
            return {"error": "N8N_WEBHOOK_URL environment variable not set"}
        try:
            logger.info(f"Fetching the last {limit} executions")
            payload = {
                "action": "get_workflow_executions",
                "limit": limit
            }
            response = requests.post(
                self.webhook_url,
                json=payload,
                timeout=self.timeout
            )
            response.raise_for_status()
            data = response.json()
            if isinstance(data, list) and len(data) > 0:
                data = data[0]
            logger.info("Successfully fetched execution data")

            if includes_kpis and isinstance(data, dict):
                logger.info("Including KPIs in the execution data")
                if "summary" in data:
                    summary = data["summary"]
                    failure_rate = float(summary.get("failureRate", "0").rstrip("%"))
                    data["insights"] = {
                        "health_status": "🟢 Healthy" if failure_rate < 10 else
                                         "🟡 Warning" if failure_rate < 25 else
                                         "🔴 Critical",
                        "message": f"{summary.get('totalExecutions', 0)} executions with {summary.get('failureRate', '0%')} failure rate"
                    }
            return data
        except requests.exceptions.RequestException as e:
            logger.error(f"HTTP error fetching executions: {e}\n{traceback.format_exc()}")
            return {"error": f"Failed to fetch executions: {str(e)}"}
        except Exception as e:
            logger.error(f"Unexpected error fetching executions: {e}\n{traceback.format_exc()}")
            return {"error": f"Unexpected error: {str(e)}"}
The only parameter here is the number n of executions you want to retrieve: "limit": n.
The outputs include a summary with a health status generated by the code node Processing Audit.

The function get_workflow_executions only retrieves and formats the outputs before sending them to the agent.
Now that we have defined our core functions, we can create the tools to equip Claude via the MCP server.
Set Up an MCP Server with Tools
Now it is time to create our MCP server with tools and resources to equip (and teach) Claude.
from mcp.server.fastmcp import FastMCP
import logging
from typing import Optional, Dict, Any
from utils.n8n_monitor_sync import N8nMonitor

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("n8n_monitor.log"),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

mcp = FastMCP("n8n-monitor")
monitor = N8nMonitor()
It is a basic implementation using FastMCP, importing n8n_monitor_sync.py with the functions defined in the previous section.
# Resource for the agent (Samir: update it every time you add a tool)
@mcp.resource("n8n://help")
def get_help() -> str:
    """Get help documentation for the n8n monitoring tools"""
    return """
📊 N8N MONITORING TOOLS
=======================
WORKFLOW MONITORING:
• get_active_workflows()
  List all active workflows with names and IDs

EXECUTION TRACKING:
• get_workflow_executions(limit=50, include_kpis=True)
  Get execution logs with detailed KPIs
  - limit: Number of recent executions to retrieve (1-100)
  - include_kpis: Calculate performance metrics

ERROR DEBUGGING:
• get_error_executions(workflow_id)
  Retrieve detailed error information for a specific workflow
  - Returns last 5 errors with comprehensive debugging data
  - Shows error messages, failed nodes, trigger data
  - Identifies error patterns and problematic nodes
  - Includes HTTP codes, error levels, and timing info

HEALTH REPORTING:
• get_workflow_health_report(limit=50)
  Generate a comprehensive health analysis based on recent executions
  - Identifies problematic workflows
  - Shows success/failure rates
  - Provides execution timing metrics

KEY METRICS PROVIDED:
• Total executions
• Success/failure rates
• Execution times (avg, min, max)
• Workflows with failures
• Execution modes (manual, trigger, integrated)
• Error patterns and frequencies
• Failed node identification

HEALTH STATUS INDICATORS:
• 🟢 Healthy: <10% failure rate
• 🟡 Warning: 10-25% failure rate
• 🔴 Critical: >25% failure rate

USAGE EXAMPLES:
- "Show me all active workflows"
- "What workflows have been failing?"
- "Generate a health report for my n8n instance"
- "Show execution metrics for the last 48 hours"
- "Debug errors in workflow CGvCrnUyGHgB7fi8"
- "What's causing failures in my data processing workflow?"

DEBUGGING WORKFLOW:
1. Use get_workflow_executions() to identify problematic workflows
2. Use get_error_executions() for detailed error analysis
3. Check error patterns to identify recurring issues
4. Review failed node details and trigger data
5. Use workflow_id and execution_id for targeted fixes
"""
As the tool is complex to apprehend, we include a prompt, in the form of an MCP resource, to summarise the objective and features of the n8n workflow connected via the webhook.
Now we can define the first tool to get all the active workflows.
@mcp.tool()
def get_active_workflows() -> Dict[str, Any]:
    """
    Get all active workflows in the n8n instance.

    Returns:
        Dictionary with list of active workflows and their details
    """
    try:
        logger.info("Fetching active workflows")
        result = monitor.get_active_workflows()
        if "error" in result:
            logger.error(f"Failed to get workflows: {result['error']}")
        else:
            logger.info(f"Found {result.get('total_active', 0)} active workflows")
        return result
    except Exception as e:
        logger.error(f"Unexpected error: {str(e)}")
        return {"error": str(e)}
The docstring, used to explain to the MCP server how to use the tool, is relatively brief, as there are no input parameters for get_active_workflows().
Let us do the same for the second tool, which retrieves the last n executions.
@mcp.tool()
def get_workflow_executions(
    limit: int = 50,
    include_kpis: bool = True
) -> Dict[str, Any]:
    """
    Get workflow execution logs and KPIs for the last N executions.

    Args:
        limit: Number of executions to retrieve (default: 50)
        include_kpis: Include calculated KPIs (default: true)

    Returns:
        Dictionary with execution data and KPIs
    """
    try:
        logger.info(f"Fetching the last {limit} executions")
        result = monitor.get_workflow_executions(
            limit=limit,
            includes_kpis=include_kpis
        )
        if "error" in result:
            logger.error(f"Failed to get executions: {result['error']}")
        else:
            if "summary" in result:
                summary = result["summary"]
                logger.info(f"Executions: {summary.get('totalExecutions', 0)}, "
                            f"Failure rate: {summary.get('failureRate', 'N/A')}")
        return result
    except Exception as e:
        logger.error(f"Unexpected error: {str(e)}")
        return {"error": str(e)}
Unlike the previous tool, we need to specify the input parameters with their default values.
We have now equipped Claude with these two tools, which can be used as in the example presented in the previous section.
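To make Claude Desktop aware of the server, you register it in claude_desktop_config.json. The command, paths and URL below are placeholders for my setup, not the exact values; adapt them to your machine (the video tutorial covers the exact steps).

```json
{
  "mcpServers": {
    "n8n-monitor": {
      "command": "python",
      "args": ["/path/to/n8n-monitor/server.py"],
      "env": {
        "N8N_WEBHOOK_URL": "https://your-n8n-instance/webhook/monitor"
      }
    }
  }
}
```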
What's next? Deploy it on your machine!
As I wanted to keep this article short, I will only introduce these two tools.
For the rest of the functionality, I invite you to watch the complete tutorial on my YouTube channel.
It includes step-by-step explanations of how to deploy this on your machine, with a detailed review of the source code shared on my GitHub (MCP server) and n8n profile (workflow).
Conclusion
This is only the beginning!
We can consider this version 1.0 of what can become a great agent to manage your n8n workflows.
What do I mean by this?
There is significant potential for improving this solution, especially for the root cause analysis, by:
- Providing more context to the agent using the sticky notes inside the workflows
- Showing what good inputs and outputs look like, with evaluation nodes to help Claude perform gap analyses
- Exploiting the other endpoints of the n8n API for more accurate analyses
However, as a full-time startup founder and CEO, I don't think I can develop such a comprehensive tool on my own.
Therefore, I wanted to share it with the Towards Data Science and n8n communities as an open-source solution available on my GitHub profile.
Need inspiration to start automating with n8n?
On this blog, I have published multiple articles sharing examples of workflow automations we have implemented for small, medium and large operations.

The focus has mainly been on logistics and supply chain operations, with real case studies:
I also have a complete playlist on my YouTube channel, Supply Science, with more than 15 tutorials.

You can follow these tutorials to deploy the workflows I share on my n8n creator profile (linked in the descriptions), which cover:
- Process Automation for Logistics and Supply Chain
- AI-Powered Workflows for Content Creation
- Productivity and Language Learning
Feel free to share your questions in the comment sections of the videos.
Other examples of MCP Server Implementation
This is not my first implementation of an MCP server.
In another experiment, I connected Claude Desktop to a Supply Chain Network Optimisation tool.

In that example, the n8n workflow is replaced by a FastAPI microservice hosting a linear programming algorithm.

The objective is to determine the optimal set of factories to produce and deliver products to market at the lowest cost and with the smallest environmental footprint.

In this type of exercise, Claude does a great job of synthesising and presenting results.
For more information, have a look at this Towards Data Science article.
About Me
Let's connect on LinkedIn and Twitter. I am a Supply Chain Engineer who uses data analytics to improve logistics operations and reduce costs.
For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting.
If you are interested in Data Analytics and Supply Chain, take a look at my website.
