Anthropic says it solved the long-running AI agent problem with a brand new multi-session Claude SDK




Agent memory remains an issue enterprises want solved, as agents forget instructions or conversations the longer they run. 

Anthropic believes it has solved this issue for its Claude Agent SDK, developing a two-part solution that allows an agent to work across multiple context windows.

“The core challenge of long-running agents is that they need to work in discrete sessions, and each new session begins with no memory of what came before,” Anthropic wrote in a blog post. “Because context windows are limited, and since most complex projects can’t be completed within a single window, agents need a way to bridge the gap between coding sessions.”

Anthropic engineers proposed a two-part approach for the Agent SDK: an initializer agent to set up the environment, and a coding agent to make incremental progress in each session and leave artifacts for the next.  

The agent memory problem

Since agents are built on foundation models, they remain constrained by limited, though continually growing, context windows. For long-running agents, this creates a bigger problem, leading the agent to forget instructions and behave erratically while performing a task. Improving agent memory becomes essential for consistent, business-safe performance. 

Several methods have emerged over the past year, all attempting to bridge the gap between context windows and agent memory. LangChain’s LangMem SDK, Memobase and OpenAI’s Swarm are examples of companies offering memory solutions. Research on agentic memory has also exploded recently, with proposed frameworks like Memp and the Nested Learning Paradigm from Google offering new alternatives for enhancing memory. 

Most of the current memory frameworks are open source and can, ideally, adapt to the different large language models (LLMs) powering agents. Anthropic’s approach is specific to its Claude Agent SDK. 

How it works

Anthropic identified that although the Claude Agent SDK had context management capabilities and “should be possible for an agent to continue to do useful work for an arbitrarily long time,” it was not sufficient. The company said in its blog post that a model like Opus 4.5 running the Claude Agent SDK can “fall short of building a production-quality web app if it’s only given a high-level prompt, such as 'build a clone of claude.ai.'” 

The failures manifested in two patterns, Anthropic said. First, the agent tried to do too much, causing the model to run out of context in the middle. The agent then has to guess what happened and can’t pass clear instructions to the next agent. The second failure occurs later on, after some features have already been built. The agent sees progress has been made and simply declares the job done. 

Anthropic researchers broke down the solution: setting up an initial environment to lay the foundation for features, and prompting each agent to make incremental progress toward a goal while still leaving a clean state at the end. 

This is where the two-part solution of Anthropic's agent comes in. The initializer agent sets up the environment, logging what agents have done and which files have been added. The coding agent then prompts the model to make incremental progress and leave structured updates. 
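The pattern described here can be sketched in a few lines. This is a minimal toy illustration, not Anthropic's implementation: the file name, the JSON shape, and the `do_work` callback are all hypothetical stand-ins for whatever artifacts the real SDK agents write between sessions.

```python
import json
from pathlib import Path

PROGRESS_FILE = Path("progress.json")  # hypothetical artifact bridging sessions


def initializer_agent(goal: str) -> None:
    """Set up the environment: record the goal and an empty work log."""
    state = {"goal": goal, "completed": [], "next_steps": [goal]}
    PROGRESS_FILE.write_text(json.dumps(state))


def coding_agent_session(do_work) -> bool:
    """One discrete session: read the artifact, make incremental progress,
    leave a structured update for the next session. Returns True when done."""
    state = json.loads(PROGRESS_FILE.read_text())
    if not state["next_steps"]:
        return True  # nothing left: declare the job finished
    step = state["next_steps"].pop(0)
    follow_ups = do_work(step)  # stand-in for one bounded model call
    state["completed"].append(step)
    state["next_steps"].extend(follow_ups)
    PROGRESS_FILE.write_text(json.dumps(state))  # structured update
    return False


# Toy "model" that breaks the goal into two sub-steps once.
def toy_model(step):
    return ["write tests", "implement feature"] if step == "build app" else []


initializer_agent("build app")
while not coding_agent_session(toy_model):
    pass  # each loop iteration is a fresh session with no in-context memory
print(json.loads(PROGRESS_FILE.read_text())["completed"])
# prints: ['build app', 'write tests', 'implement feature']
```

The key point the sketch captures is that no session depends on the previous session's context window; all shared state lives in the on-disk artifact.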

“Inspiration for these practices came from knowing what effective software engineers do daily,” Anthropic said. 

The researchers said they added testing tools to the coding agent, improving its ability to identify and fix bugs that weren’t obvious from the code alone. 
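One plausible shape for such a testing tool is a harness that runs the project's test suite and condenses the result into something small enough to feed into the next session's prompt. The function below is a hedged sketch under that assumption; the command and the 20-line truncation are illustrative choices, not details from Anthropic's blog.

```python
import subprocess
import sys


def run_tests_and_summarize(cmd: list[str]) -> str:
    """Run a test command and return a short summary suitable for a prompt.
    Only the tail of the output is kept so it fits in a limited context window."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return "All tests passed."
    tail = "\n".join((proc.stdout + proc.stderr).splitlines()[-20:])
    return f"Tests failed (exit {proc.returncode}):\n{tail}"


# Usage with a trivial passing "suite" (the real harness would invoke
# the project's own test runner, e.g. pytest):
print(run_tests_and_summarize([sys.executable, "-c", "raise SystemExit(0)"]))
# prints: All tests passed.
```

Surfacing test failures this way gives the next session concrete evidence of bugs that, as the article notes, weren't obvious from reading the code alone.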

Future research

Anthropic noted that its approach is “one possible set of solutions in a long-running agent harness.” However, this is just the beginning of what could become a wider research area for many in the AI space. 

The company said its experiments to improve long-term memory for agents haven’t shown whether a single general-purpose coding agent or a multi-agent structure works best across contexts. 

Its demo also focused on full-stack web app development, so further experiments should focus on generalizing the results across different tasks.

“It’s likely that some or all of these lessons can be applied to the kinds of long-running agentic tasks required in, for instance, scientific research or financial modeling,” Anthropic said. 


