A step-by-step tutorial showing how to use the d33pmemory API to give your Claude agent persistent memory. Covers installation, ingesting conversations, recalling context, and injecting memory into system prompts.
By default, every conversation with a Claude agent starts from scratch. No memory of past sessions, no knowledge of user preferences, no continuity. For simple chatbots this is fine. For agents that are supposed to know you (your goals, your projects, your preferences), it's a dealbreaker.
d33pmemory solves this with a simple REST API. You ingest conversation turns, and the engine extracts structured memories (facts, preferences, relationships, events). When the agent needs context, you recall relevant memories and inject them into the system prompt. The whole loop takes less than 200ms.
Here's how to wire it up in TypeScript in about 5 minutes.
Sign up at d33pmemory.com, create an agent from the dashboard, and copy your API key. Keep it in an environment variable.
# .env
D33PMEMORY_API_KEY=dm_live_your_key_here

d33pmemory is a plain REST API. You can use fetch, axios, or any HTTP client. No SDK required (though one is coming). Here's a minimal TypeScript wrapper:
// lib/memory.ts
const BASE = "https://api.d33pmemory.com";
const KEY = process.env.D33PMEMORY_API_KEY!;
const headers = {
  "Authorization": `Bearer ${KEY}`,
  "Content-Type": "application/json",
};
export async function ingestConversation(
  userId: string,
  messages: { role: "user" | "assistant"; content: string }[]
) {
  const res = await fetch(`${BASE}/v1/ingest`, {
    method: "POST",
    headers,
    body: JSON.stringify({ userId, messages }),
  });
  if (!res.ok) throw new Error(`Ingest failed: ${res.status}`);
  return res.json();
}
export async function recallMemory(userId: string, query: string) {
  const res = await fetch(`${BASE}/v1/recall`, {
    method: "POST",
    headers,
    body: JSON.stringify({ userId, query, limit: 10 }),
  });
  if (!res.ok) throw new Error(`Recall failed: ${res.status}`);
  return res.json() as Promise<{ memories: { content: string; confidence: number }[] }>;
}

After each conversation turn (or at the end of a session), send the message history to the ingest endpoint. d33pmemory will extract structured facts, preferences, and relationships and store them for later recall.
// After each conversation session ends:
await ingestConversation("user_123", [
  { role: "user", content: "I prefer TypeScript over Python for backend work." },
  { role: "assistant", content: "Got it, I'll use TypeScript in future examples." },
  { role: "user", content: "Also, I'm building a SaaS product called Taskflow." },
  { role: "assistant", content: "Happy to help with Taskflow!" },
]);

The engine runs async extraction using Claude Haiku under the hood. Within a few seconds, the memories are stored and indexed for semantic recall.
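Because extraction is asynchronous, a recall issued immediately after an ingest may come back empty. If you need read-after-write behavior (for example in integration tests), a small polling helper can bridge the gap. This is an illustrative sketch, not part of the d33pmemory API:

```typescript
// Retry a recall-style function until it returns at least one result,
// or give up after a fixed number of attempts. (Illustration only;
// d33pmemory itself exposes no polling endpoint.)
async function pollUntil<T>(
  fn: () => Promise<T[]>,
  { attempts = 5, delayMs = 1000 }: { attempts?: number; delayMs?: number } = {}
): Promise<T[]> {
  for (let i = 0; i < attempts; i++) {
    const results = await fn();
    if (results.length > 0) return results;
    // Wait before the next attempt; tune delayMs to your latency budget.
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  return []; // give up; caller can fall back to "no memories yet"
}
```

Used with the wrapper above, that might look like `await pollUntil(() => recallMemory("user_123", "Taskflow").then(r => r.memories))`.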
Before each agent response, call recall with the user's latest message as the query. This performs a semantic search over all stored memories and returns the most relevant ones ranked by relevance and confidence.
const userMessage = "Can you help me design the auth flow?";
const { memories } = await recallMemory("user_123", userMessage);
const memoryContext = memories
  .filter(m => m.confidence >= 0.7)
  .map(m => m.content)
  .join("\n");

console.log(memoryContext);
// → User prefers TypeScript for backend work.
// → User is building a SaaS product called Taskflow.

Now wire the recalled memories into your Claude system prompt. The d33pmemory context compilation endpoint can format them for you, or you can format manually:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
async function agentResponse(userId: string, userMessage: string) {
  // 1. Recall relevant memories
  const { memories } = await recallMemory(userId, userMessage);
  const memoryBlock = memories
    .filter(m => m.confidence >= 0.6)
    .map(m => `- ${m.content}`)
    .join("\n");

  // 2. Build system prompt with memory context
  const systemPrompt = `You are a helpful assistant with persistent memory about this user.
What you know about this user:
${memoryBlock || "No prior memories yet."}
Use this context naturally in your responses. Don't mention that you have a memory system.`;

  // 3. Call Claude
  const response = await client.messages.create({
    model: "claude-haiku-4-5",
    max_tokens: 1024,
    system: systemPrompt,
    messages: [{ role: "user", content: userMessage }],
  });
  return response.content[0].type === "text" ? response.content[0].text : "";
}

After Claude responds, ingest the full exchange so future sessions benefit from this conversation too:
const assistantReply = await agentResponse("user_123", userMessage);
// Ingest this turn for future memory
await ingestConversation("user_123", [
  { role: "user", content: userMessage },
  { role: "assistant", content: assistantReply },
]);

With this setup, your Claude agent accumulates knowledge across sessions. It knows the user prefers TypeScript. It knows they're building Taskflow. It knows they've asked about auth before. The token cost? d33pmemory compresses 15,000 tokens of conversation history into ~142 tokens of structured context. Your agent stays fast, cheap, and contextually aware.
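If you want to sanity-check savings like that in your own app, a rough character-based estimate goes a long way. The sketch below assumes the common ~4 characters per token heuristic for English text; use a real tokenizer when you need exact counts:

```typescript
// Very rough token estimate: ~4 characters per token is a common heuristic
// for English text. This is an approximation, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Compare a raw transcript against the compact memory context derived from it.
function compressionRatio(rawTranscript: string, memoryContext: string): number {
  return estimateTokens(rawTranscript) / Math.max(1, estimateTokens(memoryContext));
}
```

Logging `compressionRatio(fullHistory, memoryContext)` over a few real sessions tells you how much prompt budget the memory layer is actually saving you.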
From here you can explore confidence thresholds, team/fleet memory (shared context across agents), and the proxy mode that intercepts any OpenAI-compatible LLM call and injects memory automatically, with no code changes required.
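Experimenting with confidence thresholds, for instance, can start as small as pulling the filter-and-format step into one helper with a configurable cutoff. A sketch (the `Memory` shape matches the recall response used in this tutorial; the 0.7 default is an arbitrary starting point, not a recommendation):

```typescript
// Format recalled memories as a bullet list, dropping low-confidence entries.
// Tune minConfidence per use case: strict for factual claims, looser for
// soft preferences.
type Memory = { content: string; confidence: number };

function formatMemories(memories: Memory[], minConfidence = 0.7): string {
  return memories
    .filter(m => m.confidence >= minConfidence)
    .map(m => `- ${m.content}`)
    .join("\n");
}
```

With this in place, both the recall snippet and `agentResponse` above can share one threshold knob instead of hard-coding 0.7 and 0.6 separately.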