Claude Skills and Subagents: Escaping the Prompt Engineering Hamster Wheel

If you’ve been building with LLMs for a while, you’ve probably lived through this loop again and again: you spend time crafting an important prompt that produces excellent results, then a couple of days later you want the same behavior again, so you start prompting from scratch. After a few repetitions you notice the inefficiency, so you store the prompt template somewhere to retrieve later, but even then you need to find your prompt, paste it in, and tweak it for this particular conversation. It’s tedious.

This is what I call the prompt engineering hamster wheel. And it’s a fundamentally broken workflow.

Claude Skills are Anthropic’s answer to this “reusable prompt” problem. Beyond just saving you from repetitive prompting, they introduce a fundamentally different approach to context management, token economics, and the architecture of AI-powered development workflows.

In this post, I’ll unpack what skills and subagents actually are, how they differ from traditional MCP, and where the skill / MCP / subagent mix is heading.


What are skills?

At their core, skills are reusable instruction sets that AI agents, like Claude, can automatically access when they’re relevant to a conversation. You write a skill.md file with some metadata and a body of instructions, drop it into a .claude/skills/ directory, and Claude takes it from there.
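For concreteness, a minimal on-disk layout might look like this (the skill and file names are illustrative, following the skill.md convention used throughout this post):

```
.claude/
└── skills/
    └── change-report/
        ├── skill.md          # metadata + body of instructions
        └── pr-template.md    # referenced file, loaded on demand
```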

What they look like

In its simplest form, a skill is a markdown file with a name, description, and body of instructions, like this:

```markdown
---
name: <skill name>
description: <when the agent should use this skill>
---

<body of instructions>
```

Their strengths

The biggest strength of skills lies in auto-invocation. When starting a new conversation, the agent only reads each skill’s name and description, to save on tokens. When it determines a skill is relevant, it loads the body. If the body references additional files or folders, the agent reads those too, but only when it decides they’re needed. In essence, skills are lazy-loaded context. The agent doesn’t consume the full instruction set upfront. It progressively discloses information to itself, pulling in just what’s needed for the current step.

This progressive disclosure operates across three levels, each with its own context budget:

  1. Metadata (loaded at startup): The skill’s name (max 64 characters) and description (max 1,024 characters). This costs roughly ~100 tokens per skill, negligible overhead even with hundreds of skills registered.
  2. Skill body (loaded on invocation): The full instruction set inside skill.md, up to ~5,000 tokens. This only enters the context window when the agent determines the skill is relevant.
  3. Referenced files (loaded on demand): Additional markdown files, folders, or scripts inside the skill directory. There’s practically no limit here, and the agent reads these on demand, only when the instructions reference them and the current task requires it.
Skills load context progressively across three levels, skill summary (metadata), body (detailed instructions), and referenced files (additional context), each triggered only when needed.
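The three-level loading pattern can be sketched in a few lines of Python. This is a toy model, not a real Anthropic API: the `Skill` and `Agent` classes and their methods are invented purely to illustrate when each level enters the context.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str                # level 1: always in context (~100 tokens)
    description: str         # level 1
    body: str                # level 2: loaded on invocation
    referenced_files: dict = field(default_factory=dict)  # level 3

class Agent:
    def __init__(self, skills):
        self.skills = skills
        self.context = []
        # Level 1: only name + description enter the context at startup.
        for s in skills:
            self.context.append(f"{s.name}: {s.description}")

    def invoke(self, skill_name):
        # Level 2: the body enters context only once deemed relevant.
        skill = next(s for s in self.skills if s.name == skill_name)
        self.context.append(skill.body)
        return skill

    def read_reference(self, skill, filename):
        # Level 3: referenced files load only when instructions need them.
        self.context.append(skill.referenced_files[filename])

agent = Agent([Skill("change-report", "Use when drafting a PR report.",
                     "Follow pr-template.md",
                     {"pr-template.md": "## Summary..."})])
print(len(agent.context))  # 1: metadata only
skill = agent.invoke("change-report")
agent.read_reference(skill, "pr-template.md")
print(len(agent.context))  # 3: metadata + body + referenced file
```

The point of the sketch: context entries accumulate only as the task actually demands them, never upfront.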

Insight: Skills are reusable, lazy-loaded, auto-invoked instruction sets that use progressive disclosure across three levels: metadata, body, and referenced files. This minimizes upfront cost by not dumping everything into the context window (looking at you, MCP 👀).


The problem with token economics

Cost factors

It’s no secret: an agent’s context window space isn’t free, and filling it has compounding costs. Every token in your context window costs you in three ways:

  1. Actual cost: the obvious one is that you’re paying per token, either directly through API usage or indirectly through usage limits.
  2. Latency: you’re also paying with your time, since more input tokens mean slower responses, something that doesn’t scale well with the length of the context window (~attention mechanism).
  3. Quality: finally, there’s a degradation in quality due to long context windows. LLMs demonstrably perform worse when their context is cluttered with irrelevant information.

The costly overhead of MCPs

Let’s put this into perspective with a quick back-of-the-envelope calculation. My go-to MCP picks for programming are:

  • AWS for infrastructure deployment. Three servers (aws-mcp, aws-official, aws-docs) combined yield a cost of around ~8,500 tokens (13 tools).
  • Context7 for documentation. Metadata is around ~750 tokens (2 tools).
  • Figma for bringing design to frontend development. Metadata is around ~500 tokens (2 tools).
  • GitHub for searching code in other repositories. Metadata is around ~2,000 tokens (26 tools).
  • Linear for project management. Metadata is around ~3,250 tokens (33 tools).
  • Serena for code search. Metadata is around ~4,500 tokens (26 tools).
  • Sentry for error tracking. Metadata is around ~12,500 tokens (22 tools).

That’s a total of roughly ~32,000 tokens of tool metadata, loaded into every conversation, whether you’re interacting with the tools or not.

To put a dollar figure on this: Claude Opus 4.6 charges $5 per million input tokens. Those 32K tokens of idle MCP metadata add $0.16 to every message you send. That sounds small, until you realize that even a simple 5-message conversation already adds $0.80 in pure overhead. And most developers don’t send just 5 messages; add some short clarifications and context-gathering questions and you quickly reach tens if not hundreds of messages. Say you send on average 50 messages a day over a 20-day work month: that’s $8/day, ~$160/month* in pure overhead, just for tool descriptions sitting in context. And that’s before you account for the latency and quality impact.
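The arithmetic can be sanity-checked in a few lines (assuming, pessimistically, that every message resends the full context with no prompt caching):

```python
# Rough figures from the text above.
IDLE_MCP_TOKENS = 32_000   # tool metadata from the MCP servers listed
PRICE_PER_MTOK = 5.00      # $/1M input tokens (the Opus pricing cited above)

cost_per_message = IDLE_MCP_TOKENS / 1_000_000 * PRICE_PER_MTOK
print(f"${cost_per_message:.2f} per message")          # $0.16

messages_per_day, workdays_per_month = 50, 20
daily = cost_per_message * messages_per_day
print(f"${daily:.2f} per day")                         # $8.00
print(f"${daily * workdays_per_month:.0f} per month")  # $160
```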

*A small asterisk: most models charge significantly less for cached input tokens (90% discount). An asterisk to this asterisk is that some of them charge extra when enabling caching, and they don’t always enable (API) caching by default (cough cough).

The cost-effective approach of skills

The loading pattern of skills fundamentally changes all three cost factors. At the outset, the agent only sees each skill’s name and a short description, roughly ~100 tokens per skill. At that rate, I could register 300 skills and still consume fewer tokens than my MCP setup does. The full instruction body (~5,000 tokens) only loads when the agent decides it’s relevant, and referenced files only load when the current step needs them.

In practice, a typical conversation might invoke one or two skills while the rest remain invisible to the context window. That’s the key difference: MCP cost scales with the number of tools (across all servers), while skills’ cost scales more closely with actual usage.

MCP loads all metadata upfront. Skills load context only when relevant, a difference that compounds with every message.
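A toy model of how the two overheads scale makes the compounding visible. The token figures are the rough estimates from above, and the skills side is simplified to count each invoked body once:

```python
# Rough per-item token costs from the text (assumptions, not measurements).
MCP_METADATA = 32_000            # all tool metadata, resent with every message
SKILL_METADATA, SKILL_BODY = 100, 5_000

def mcp_overhead(messages: int) -> int:
    # Eager: the full metadata rides along in every single message.
    return MCP_METADATA * messages

def skills_overhead(messages: int, registered: int, invoked: int) -> int:
    # Lazy: only tiny metadata per message, plus bodies actually invoked.
    return SKILL_METADATA * registered * messages + SKILL_BODY * invoked

# A 20-message conversation, 20 registered skills, 2 actually invoked:
print(mcp_overhead(20))            # 640000 tokens
print(skills_overhead(20, 20, 2))  # 50000 tokens
```

Even in this crude model, the MCP overhead scales with the tool catalog while the skills overhead tracks actual usage.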

Insight: MCP is “eager” and loads all tool metadata upfront regardless of whether it’s used. Skills are “lazy” and load context progressively, only when relevant. The difference matters for cost, latency, and output quality.

Wait, isn’t that misleading? Skills and MCP are two completely different things!

If the above reads like skills are the new and better MCP, then allow me to correct that framing. The intent was to zoom in on their loading patterns and the impact they have on token consumption. Functionally, they’re quite different.

MCP (Model Context Protocol) is an open standard that gives any LLM the ability to interact with external applications. Before MCP, connecting M models to N tools required M * N custom integrations. MCP collapses that to M + N: each model implements the protocol once, each tool exposes it once, and they all interoperate. It’s a simple infrastructural change, but it’s genuinely powerful (no wonder it took the world by storm).

Skills, on the other hand, are somewhat “glorified prompts”, and I mean that in the best possible way. They give an agent expertise and direction on how to approach a task, what conventions to follow, when to use which tool, and how to structure its output. They’re reusable instruction sets fetched on demand when relevant, nothing more, nothing less.

Insight: MCP gives an agent capabilities (the “what”). Skills give it expertise (the “how”), and thus they’re complementary.

Here’s an example to make this concrete. Say you connect GitHub’s MCP server to your agent. MCP gives the agent the ability to create pull requests, list issues, and search repositories. But it doesn’t tell the agent, for instance, how your team structures PRs: that you always include a testing section, that you tag by change type, that you reference the Linear ticket in the title. That’s what a skill does. The MCP provides the tools, the skill provides the playbook.
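That playbook half could itself live in a skill. A hypothetical skill.md encoding those conventions might read (the name and rules here are invented for illustration):

```markdown
---
name: team-pr-conventions
description: Use when opening or editing pull requests. Encodes the team's
  PR structure, tagging rules, and title conventions.
---

- Always include a "Testing" section describing how changes were verified.
- Tag the PR by change type: `feature`, `fix`, `refactor`, or `chore`.
- Reference the Linear ticket ID in the PR title, e.g. `[ENG-123] ...`.
```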

So, when earlier I showed that skills load context more efficiently than MCP, the real takeaway isn’t “use skills instead of MCP”; it’s that lazy-loading as a pattern works. Hence, it’s worth asking: why can’t MCP tool access be lazy-loaded too? That’s where subagents come in.


Subagents: the best of both worlds

Subagents are specialized agents with their own isolated context window and their own set of connected tools. Two properties make them powerful:

  • Isolated context: A subagent starts with a clean context window, pre-loaded with its own system prompt and only the tools assigned to it. Everything it reads, processes, and generates stays in its own context; the main agent only sees the outcome.
  • Isolated tools: Each subagent can be equipped with its own set of MCP servers and skills. The main agent doesn’t need to know about (or pay for) tools it never directly uses.

Once a subagent finishes its task, its entire context is discarded. The tool metadata, the intermediate reasoning, the API responses: all gone. Only the result flows back to the main agent. This is actually the crucial point. Not only do we avoid bloating the main agent’s context with unnecessary tool metadata, we also prevent unnecessary reasoning tokens from polluting the context. As an illustrative example, imagine a subagent that researches a library’s API. It might search across multiple documentation sources, read through dozens of pages, and try several queries before finding the right answer. You still pay for the subagent’s own token usage, but all of that intermediate work, the dead ends, the irrelevant pages, the search queries, gets discarded once the subagent finishes. The key benefit is that none of it compounds into the main agent’s context, so every subsequent message in your conversation stays clean and cheap.

This means you can design your setup so that MCP servers are only accessible through specific subagents, never loaded on the main agent at all. Instead of carrying ~32,000 tokens of tool metadata in every message, the main agent carries nearly zero. When it needs to open a pull request, it spins up a GitHub subagent, creates the PR, and returns the link. Just as skills are lazy-loaded instructions, subagents are lazy-loaded capabilities: the main agent knows what specialists it can call on, and only spins one up when a task demands it.
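In Claude Code, such a scoped subagent is declared as a markdown file under .claude/agents/, with its own description and tool allowlist. A rough sketch (the tool identifiers here are illustrative, not guaranteed to match a real GitHub MCP server):

```markdown
---
name: github-pr
description: Handles all GitHub pull request operations. Use when a PR
  needs to be created, updated, or inspected.
tools: mcp__github__create_pull_request, mcp__github__list_pull_requests
---

You are a GitHub specialist. Perform the requested PR operation and
reply to the main agent with only the final PR link or a short error
summary. Do not include intermediate reasoning in your reply.
```

Because the GitHub MCP server is listed only here, its tool metadata never enters the main agent’s context.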

A practical example

Let’s make this tangible. One workflow I use daily is a “feature branch wrap-up” that automates a very tedious part of my development cycle: opening a pull request. Here’s how skills, MCP, and subagents play together.

After the main agent and I finish the coding work, I ask it to wrap up the feature branch. The main agent doesn’t handle this itself; it delegates the entire PR workflow to a dedicated subagent. This subagent is equipped with the GitHub MCP server and a change-report skill that defines how my team structures PRs. Its skill.md looks roughly like this:

```markdown
---
name: change-report
description: Use when generating a change report for a PR.
  Defines the team's PR structure, categorization rules, and formatting
  conventions.
---

1. Ensure there are no staged changes left; otherwise report back to
   the main agent.
2. Run `git diff dev...HEAD --stat` and `git log dev..HEAD --oneline`
   to gather all changes on this feature branch.
3. Analyze the diff and categorize the most important changes by type
   (new features, refactors, bug fixes, or config changes).
4. Generate a structured change report following the template
   in `pr-template.md`.
5. Open the PR via the GitHub MCP, populating the title and body from
   the generated report.
6. Reply with the PR link.
```

The pr-template.md file in the same directory defines my team’s PR structure: sections for summary, changes breakdown, and testing notes. This is level 3 of progressive disclosure: the subagent only reads it when step 4 tells it to.
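A minimal pr-template.md matching the sections described above might look like this (a sketch; the exact headings are illustrative):

```markdown
## Summary
<one-paragraph description of the change>

## Changes
- **New features**: ...
- **Bug fixes**: ...
- **Refactors / config**: ...

## Testing
<how the change was verified>
```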

Here’s what makes this setup work. The skill provides the expertise on how my team reports on changes, the GitHub MCP provides the capability to actually create the PR, and the subagent provides the context boundary in which to perform all of this work. The main agent, meanwhile, only calls the subagent, waits for it to finish, and gets back either a confirmation or a message about what went wrong.

The PR workflow in action: the main agent delegates the entire PR process to a subagent equipped with a change-report skill and GitHub MCP access.

Insight: skills, MCPs, and subagents work in harmony. The skill provides expertise and instruction, the MCP provides the capability, and the subagent provides the context boundary (keeping the main agent’s context clean).


The larger picture

In the early days of LLMs, the race was about better models: fewer hallucinations, sharper reasoning, more creative output. That race hasn’t stopped completely, but the center of gravity has clearly shifted. MCP and Claude Code were genuinely revolutionary. Upgrading Claude Sonnet from 3.5 to 3.7 honestly was not. The incremental model improvements we’re getting today matter far less than the infrastructure we build around them. Skills, subagents, and multi-agent orchestration are all part of this shift: from “how do we make the model smarter” to “how do we get the most value out of what’s already here”.

Insight: the value in AI development has shifted from better models to better infrastructure. Skills, subagents, and multi-agent orchestration aren’t just developer experience improvements; they’re the architecture that makes agentic AI economically and operationally viable at scale.

Where we are today

Skills solve the prompt engineering hamster wheel by turning your best prompts into reusable, auto-invoked instruction sets. Subagents solve the context bloat problem by isolating tool access and intermediate reasoning into dedicated workers. Together, they make it possible to codify your expertise once and have it automatically applied across every future interaction. This is what engineering teams following the state of the practice already do with documentation, style guides, and runbooks. Skills and subagents just make those artifacts machine-readable.

The subagent pattern is also unlocking multi-agent parallelism. Instead of one agent working through tasks sequentially, you can spin up multiple subagents concurrently, have them work independently, and collect their results. Anthropic’s own multi-agent research system already does this: Claude Opus 4.6 orchestrates while Claude Sonnet 4.6 subagents execute in parallel. This naturally leads to heterogeneous model routing, where an expensive frontier model orchestrates and plans, while smaller, cheaper models handle execution. The orchestrator reasons, the workers execute. This can dramatically reduce costs while maintaining output quality.

There’s an important caveat here. Where parallelism works well for independent tasks, it gets much harder for tasks that touch shared state. Say, for instance, you spin up a backend and a frontend subagent in parallel. The backend agent refactors an API endpoint, while the frontend agent, working from a snapshot taken before that change, generates code that calls the old endpoint. Neither agent is wrong in isolation, but together they produce an inconsistent result. This is a classic concurrency problem surfacing in the AI workflows of the near future, and so far it remains an open problem.

Where it’s heading

I expect skill composition to become more sophisticated. Today, skills are relatively flat: a markdown file with optional references. But the architecture naturally supports layered skills that reference other skills, creating something like an inheritance hierarchy of expertise. Think of a base “code review” skill extended by language-specific variants, further extended by team-specific conventions.

Most multi-agent systems today are strictly hierarchical: a main agent delegates to a subagent, the subagent finishes, and control returns. There’s currently not much peer-to-peer collaboration between subagents. Anthropic’s recently launched “agent teams” feature for Opus 4.6 is an early step in this direction, allowing multiple agents to coordinate directly rather than routing everything through an orchestrator. On the protocol side, Google’s A2A (Agent-to-Agent Protocol) could standardize this pattern across providers; where MCP handles agent-to-tool communication, A2A would handle agent-to-agent communication. That said, A2A’s adoption has been slow compared to MCP’s explosive growth. One to watch, not one to bet on yet.

Agents will become the new functions

There’s a broader abstraction emerging here that’s worth stepping back to appreciate. Andrej Karpathy’s famous tweet captured something real about how we interact with LLMs. But skills and subagents take this abstraction one level further: agents are becoming the new functions.

A subagent is a self-contained unit of work: it takes an input (a task description), has its own internal state (context window), uses specific tools (MCP servers), follows specific instructions (skills), and returns an output. It can be called from multiple places, it’s reusable, and it’s composable. That’s a function. The main agent becomes the execution thread: orchestrating, branching, delegating, and synthesizing results from specialized workers.
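The analogy can be made literal in code. This is a sketch, not a real SDK; the function and parameter names are invented to show how a subagent mirrors a function call:

```python
def run_subagent(task: str, tools: list, skills: list) -> str:
    """A subagent behaves like a function call: a fresh local "stack frame"
    (its context), scoped capabilities (tools) and instructions (skills),
    and a single return value; everything else is discarded on exit."""
    context = [*skills, task]          # isolated local state
    for tool in tools:                 # intermediate work stays local
        context.append(f"used {tool}")
    return f"result of: {task}"        # only the result escapes the frame

# The main agent acts as the execution thread, composing such calls:
link = run_subagent("open PR for feature branch",
                    tools=["github-mcp"], skills=["change-report"])
print(link)  # result of: open PR for feature branch
```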

Beyond the analogy, it may well have the same practical implications that functions had for software engineering. Isolation limits the blast radius when an agent fails, rather than corrupting the entire system, and failures can be caught through try-except-like mechanisms. Specialization means each agent can be optimized for its specific task. Composability means you can build increasingly complex workflows from simple, testable parts. And observability follows naturally; since each agent is a discrete unit with clear inputs and outputs, tracing becomes inspecting a call stack rather than staring at a 200K-token context dump.

A subagent maps directly to a function: input, internal state, tools, instructions, and output. The main agent is the execution thread.

Conclusion

Skills look like simple “reusable prompts” on the surface, but they actually represent a thoughtful answer to some of the hardest problems in AI tooling: context management, token efficiency, and the gap between raw capability and domain expertise.

If you haven’t experimented with skills yet, start small. Pick your most-repeated prompting pattern, extract it into a skill.md, and see how it changes your workflow. Once that clicks, take the next step: identify which MCP tools don’t need to live on your main agent, or which subprocesses involve lots of intermediate reasoning that’s irrelevant once the answer is found, and scope them to dedicated subagents instead. You’ll be surprised how much cleaner your setup becomes when each agent only carries what it actually needs.

Key insights from this post

  • Skills are reusable, lazy-loaded, auto-invoked instruction sets that use progressive disclosure across three levels: metadata, body, and referenced files. This minimizes upfront cost by not dumping everything into the context window (looking at you, MCP 👀).
  • MCP is “eager” and loads all tool metadata upfront regardless of whether it’s used. Skills are “lazy” and load context progressively, only when relevant. The difference matters for cost, latency, and output quality.
  • MCP gives an agent capabilities (the “what”). Skills give it expertise (the “how”), and thus they’re complementary.
  • Skills, MCPs, and subagents work in harmony. The skill provides expertise and instruction, the MCP provides the capability, and the subagent provides the context boundary (keeping the main agent’s context clean).
  • The value in AI development has shifted from better models to better infrastructure. Skills, subagents, and multi-agent orchestration aren’t just developer experience improvements; they’re the architecture that makes agentic AI economically and operationally viable at scale.

Final insight: The prompt engineering hamster wheel is optional. It’s time to step off.

