
Claude Agent SDK Memory Tool in 2026: Persistent Context That Survives /clear

How the Claude Agent SDK memory tool keeps context across sessions — file-based memory patterns, trade-offs vs prompt caching, and a working Python setup.

By SouvenirList

You’ve built a Claude-powered agent that works beautifully — until the user starts a new session and it forgets everything. Their preferences, the project context, the weird edge case you told it about yesterday: gone. You find yourself stuffing an ever-growing “context” string into every system prompt and watching your token bill climb.

The Claude Agent SDK memory tool solves exactly this problem. It gives the model a file-backed scratchpad it can read, write, and search across sessions, turning a stateless API into something that behaves like a long-running collaborator. This guide covers how it works in 2026, where it outperforms prompt-caching tricks, and a minimal working pattern.


TL;DR

  • The memory tool in the Claude Agent SDK is a file-based persistence layer the model can call directly — view, create, str_replace, and other file operations.
  • It’s not magic — you wire up the storage backend (local disk, S3, database). The SDK gives the model the tool interface; you decide what “a file” actually is.
  • It complements, not replaces, prompt caching. Caching makes repeated context cheap; memory makes different context persist across sessions.
  • Best fit: long-running coding agents, customer support bots with per-user history, research assistants, anything where “remember what we decided” is load-bearing.

Deep Dive: How the Memory Tool Actually Works

The Core Idea

The Claude Agent SDK exposes a memory tool via the standard tool-use protocol. From Claude’s perspective, memory is just another tool it can call — same mechanism as a web search or a shell command.

Under the hood, the SDK ships a default tool schema the model can invoke with operations like:

  • view — read the contents of a memory file
  • create — write a new memory file
  • str_replace — edit part of a file in place
  • insert — add content at a specific position
  • delete — remove a file
  • rename — move/rename a file

You register a backend that handles these operations. The backend can be anything — local filesystem, SQLite, Redis, Postgres, or an object store. The model neither knows nor cares.

Why This Is Different From “Just Putting It in the Prompt”

Stuffing past context into the system prompt has three known failure modes:

  1. Token cost grows linearly with history, even when most of it is irrelevant to the current turn
  2. Context windows fill up — the 1M-token window on Opus 4.7 sounds endless until you’re 40 sessions deep with verbose tool outputs
  3. The model can’t selectively forget — you can only trim on your side, which means guessing what it’ll need

The memory tool flips control: the model decides what to remember and when to look it up. It writes a note after resolving a tricky bug, then retrieves only the relevant file on the next session. You pay tokens for the lookup, not for carrying the entire history around.

Memory Tool vs Prompt Caching

This confuses a lot of developers, so let’s be blunt about it:

  • Purpose — memory tool: persistent state across sessions. Prompt caching: cheap repetition of the same context within a 5-min window.
  • Lifetime — memory tool: as long as your backend keeps the file. Prompt caching: ~5 minutes TTL.
  • Cost model — memory tool: tokens read/written on demand. Prompt caching: ~90% discount on cached input tokens.
  • Model control — memory tool: the model decides what to save and read. Prompt caching: you control what gets cached.
  • Good for — memory tool: user profiles, long-running agents, project memory. Prompt caching: RAG context, system prompts, static docs.

They’re complementary. A well-designed agent uses caching for the expensive-to-send system prompt and memory for the dynamic, session-spanning user context. See our Claude API prompt caching guide for the caching side.
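Concretely, one request can carry both. Here is a minimal sketch, in which the model name and prompt text are placeholders; the cache_control marker is the standard prompt-caching annotation, and the memory tool rides alongside it:

```python
def build_request(system_prompt: str, user_msg: str) -> dict:
    """Build one request that combines prompt caching with the memory tool."""
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 4096,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Cache the large, static part of the prompt for reuse
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Memory handles the dynamic, session-spanning state
        "tools": [{"type": "memory_20250818", "name": "memory"}],
        "messages": [{"role": "user", "content": user_msg}],
    }
```

The static system prompt gets the cache discount on repeat sends, while per-user state lives in memory files the model reads only when it needs them.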

A Minimal Python Setup

Here’s the skeleton of a local-filesystem memory backend against the raw Messages API. Claude issues the tool calls; you implement the storage.

from anthropic import Anthropic
from pathlib import Path

MEMORY_ROOT = Path("./agent_memory").resolve()
MEMORY_ROOT.mkdir(exist_ok=True)

def resolve_path(raw: str) -> Path:
    # Claude addresses memory under /memories/...; resolve inside
    # MEMORY_ROOT so the model can't escape the sandbox with "..".
    path = (MEMORY_ROOT / raw.lstrip("/")).resolve()
    if not path.is_relative_to(MEMORY_ROOT):
        raise ValueError(f"Path escapes memory root: {raw}")
    return path

def handle_memory_tool(tool_input: dict) -> str:
    command = tool_input["command"]
    path = resolve_path(tool_input["path"])

    if command == "view":
        return path.read_text() if path.exists() else "File not found"
    if command == "create":
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(tool_input["file_text"])
        return f"Created {path.name}"
    if command == "str_replace":
        text = path.read_text()
        if tool_input["old_str"] not in text:
            return "old_str not found"
        path.write_text(text.replace(tool_input["old_str"], tool_input["new_str"], 1))
        return "Replaced"
    # ... handle insert, delete, rename
    return "Unknown command"

client = Anthropic()
response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    betas=["context-management-2025-06-27"],  # memory tool is gated behind this beta
    tools=[{"type": "memory_20250818", "name": "memory"}],
    messages=[{"role": "user", "content": "Remember I prefer TypeScript over Python."}],
)

On subsequent turns, you feed the tool results back into the conversation, and Claude will proactively read memory when a user says something that might be context-dependent.
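That feedback loop is short enough to show in full. A sketch, assuming a backend like handle_memory_tool above and abstracting the API call behind a send callable (our name, not the SDK's) so the control flow stays easy to test:

```python
def run_agent_loop(send, messages, handle_memory_tool, max_steps=10):
    """Drive the tool-use loop: call the model, execute memory operations,
    feed results back until the model stops requesting tools."""
    for _ in range(max_steps):
        response = send(messages)
        if response["stop_reason"] != "tool_use":
            return response  # model produced a final answer
        # Record the assistant turn, then answer every tool_use block in it
        messages.append({"role": "assistant", "content": response["content"]})
        results = []
        for block in response["content"]:
            if block["type"] == "tool_use":
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block["id"],
                    "content": handle_memory_tool(block["input"]),
                })
        messages.append({"role": "user", "content": results})
    raise RuntimeError("agent did not finish within max_steps")
```

In production, send wraps the real client call and the response is an SDK object rather than a plain dict, but the control flow is the same.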

What to Store (and What Not To)

The real engineering is in the taxonomy of what goes in memory. Dump everything and retrieval becomes noisy; store too little and you’re back to the same problem.

A pattern that works well in production:

  • user/profile.md — stable preferences, role, tech stack
  • user/feedback.md — corrections the user has made (“stop doing X”)
  • project/{id}/decisions.md — key architectural choices
  • project/{id}/open_questions.md — things left unresolved

Avoid storing debuggable/derivable state: current file contents, git log output, current branch. That belongs in tools the model calls fresh each time.


Pros & Cons

Pros:

  • Persistence across sessions without prompt bloat
  • Model decides what’s worth remembering
  • Token-efficient for long-horizon agents
  • Composes with prompt caching
  • Backend is pluggable (disk/S3/DB)

Cons:

  • You own the storage backend — no managed option out of the box
  • Poor memory hygiene creates “noise memories” that degrade retrieval
  • Extra round-trips for each memory read (latency)
  • Not free — reads and writes still cost tokens
  • Multi-user systems need careful key scoping (user-id prefix, etc.)

Who Should Use This

  • Builders of long-running coding agents (à la Claude Code) where remembering user conventions across days/weeks matters
  • Customer support / CRM bots that need per-user memory without re-fetching CRM state every turn
  • Research or writing assistants where “what did we decide about X last week” is a common question
  • Multi-agent systems where a supervisor agent coordinates by reading shared memory

Skip it if your use case is single-turn (classification, transformation, one-shot extraction) — you’ll just add latency and complexity.


FAQ

Can I use the memory tool without the Agent SDK?

Yes — the memory tool is a standard Claude API tool type. You can register it via the raw Messages API by including {"type": "memory_20250818", "name": "memory"} in the tools array. The Agent SDK just gives you ergonomic helpers and a reference filesystem backend.

How big should memory files be?

Aim for each file to be small and focused — a few hundred tokens per file. The model reads them whole, so a 50KB profile defeats the efficiency goal. Split by topic.

Does memory work with streaming?

Yes. A streamed response that ends in a memory call finishes with stop_reason "tool_use"; you execute the backend operation, send the result back as a tool_result block, and the next streamed request picks up from there. See our Claude structured output and streaming guide for streaming patterns.

What about privacy and user data?

Memory is only as secure as your backend. If you’re storing personal data, scope file paths by user ID, encrypt at rest, and add an explicit deletion endpoint. The model has no concept of “this is PII” — that’s your responsibility.
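The simplest scoping pattern is one private subtree per user, with traversal attempts rejected at resolve time. A sketch (the BASE path and the choice of PermissionError are ours):

```python
from pathlib import Path

BASE = Path("./agent_memory").resolve()

def user_memory_path(user_id: str, raw: str) -> Path:
    """Resolve a model-supplied path inside this user's private subtree,
    rejecting traversal attempts like '../other_user/profile.md'."""
    root = (BASE / user_id).resolve()
    path = (root / raw.lstrip("/")).resolve()
    if not path.is_relative_to(root):
        raise PermissionError(f"path escapes user scope: {raw}")
    return path
```

Every backend operation then goes through this resolver, so one user's agent can never read or write another user's memory files.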

Does this replace vector databases for RAG?

No. Memory is for agent-authored state — notes the model decides to keep. RAG is for your corpus — documents you control the ingestion of. They solve different problems and often coexist. For vector-store selection see our pgvector vs Pinecone comparison.


Bottom Line

The Claude Agent SDK memory tool is the cleanest answer to the “my agent forgets everything” problem — but only if you treat memory hygiene as a first-class design concern. Combine it with prompt caching for cost, scope files narrowly for retrieval quality, and the model itself does most of the work.


Tags: Claude Agent SDK, memory tool, AI agents, prompt caching, Claude API
