
Agent with Persistent Memory

This tutorial shows how to give an agent a long-term memory that persists across sessions. You will:

  1. Create a Memory container and tag it for filtering.
  2. Write memory entries and observe the three deduplication outcomes: created, skipped, and updated.
  3. Upload a Document with structured reference information.
  4. Create an agent that retrieves from both memories and the document via knowledge_config, with write_memory_id enabled so the agent can persist new facts it learns.
  5. Run a generation and observe the model answering accurately from injected context — with no RAG logic in the prompt.
  6. Observe the agent writing a new fact to memory with source: "agent".
  7. Query the knowledge layer directly to see memory entries and document chunks side by side.

By the end you will understand how Memories, Documents, and the Knowledge search layer compose with agents to build stateful, context-aware AI assistants.

Prerequisites

  • SOAT running locally. Follow the Quick Start guide to bring the stack up with Docker Compose.
  • New to SOAT? Read Key Concepts to understand projects, agents, and the IAM model before diving in.
  • CLI installed and configured, or SDK set up. See CLI or SDK.
  • For production hardening (secrets, env vars), see Advanced Configuration.
  • Server is at http://localhost:5047.
  • Ollama running locally with a chat model available.
Point the CLI at your local server:

export SOAT_BASE_URL=http://localhost:5047

Step 1 — Log in as admin

Admin is the built-in superuser role. It bypasses policy evaluation entirely. See Users for full authentication details.

ADMIN_TOKEN=$(soat login-user --username admin --password Admin1234! | jq -r '.token')
export SOAT_TOKEN=$ADMIN_TOKEN

Step 2 — Create a project

Every resource in SOAT lives inside a project. Create one to hold the memory and agent.

PROJECT_ID=$(soat create-project --name "Support Demo" | jq -r '.id')
echo "PROJECT_ID: $PROJECT_ID"

Step 3 — Create an AI provider

Set up an AI provider backed by Ollama. This tutorial uses a local Ollama provider so it can run without external credentials. To connect xAI, OpenAI, Anthropic, or Amazon Bedrock instead, see Connect Third-Party LLMs.

AI_PROVIDER_ID=$(soat create-ai-provider \
--project-id "$PROJECT_ID" \
--name "Local Ollama" \
--provider "ollama" \
--default-model "qwen2.5:0.5b" | jq -r '.id')
echo "AI_PROVIDER_ID: $AI_PROVIDER_ID"

Step 4 — Create a memory

A Memory is a named container that holds a collection of text entries. You can attach tags to a memory for later filtering — useful when an agent should search only a subset of all memories in a project.

MEMORY_ID=$(soat create-memory \
--project-id "$PROJECT_ID" \
--name "Alice Profile" \
--description "Facts about customer Alice gathered during support interactions" \
--tags '["alice","customer"]' | jq -r '.id')
echo "MEMORY_ID: $MEMORY_ID"

Step 5 — Write memory entries

Memory entries are the individual facts stored inside a memory. Every write request goes through a semantic deduplication algorithm that compares the new content against existing entries:

  • created (HTTP 201) — no similar entry exists; the new fact is stored.
  • skipped (HTTP 200) — a near-identical entry already exists (similarity ≥ duplicate_threshold, default 0.95); the new content is discarded.
  • updated (HTTP 200) — an entry is similar but not identical (similarity ≥ update_threshold, default 0.75 and < duplicate_threshold); the existing entry is replaced with the richer version.
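The three outcomes reduce to a simple decision over the similarity score. The sketch below is illustrative only — SOAT's real implementation compares embeddings, while this shows just the threshold logic with the documented defaults:

```python
# Illustrative sketch of the deduplication decision, assuming a
# similarity score in [0, 1] against the closest existing entry.
# Not SOAT's actual implementation; only the threshold logic is shown,
# using the documented default thresholds.

def dedup_action(similarity: float,
                 duplicate_threshold: float = 0.95,
                 update_threshold: float = 0.75) -> str:
    if similarity >= duplicate_threshold:
        return "skipped"   # near-identical entry already exists
    if similarity >= update_threshold:
        return "updated"   # similar but richer; replace the old entry
    return "created"       # genuinely new fact

print(dedup_action(0.98))  # skipped
print(dedup_action(0.85))  # updated
print(dedup_action(0.30))  # created
```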

5a — First entry (action: created)

A genuinely new fact. No similar entry exists, so it is stored.

soat create-memory-entry \
--memory-id "$MEMORY_ID" \
--content "Alice prefers email over phone calls for all support communication"
# → { "action": "created", ... }

5b — Near-duplicate (action: skipped)

The content is almost identical to 5a. The similarity score exceeds duplicate_threshold (0.95), so the write is silently ignored and the existing entry is unchanged.

soat create-memory-entry \
--memory-id "$MEMORY_ID" \
--content "Alice prefers email over phone calls"
# → { "action": "skipped", ... }

5c — Improved version (action: updated)

The content is related but adds new detail (similarity between 0.75 and 0.95). The existing entry is replaced with the richer version, keeping memory clean and up to date.

soat create-memory-entry \
--memory-id "$MEMORY_ID" \
--content "Alice prefers email, especially for billing inquiries; she checks it twice a day"
# → { "action": "updated", ... }

5d — Second distinct fact (action: created)

An unrelated fact is added. No existing entry is similar, so it is stored as a new entry.

soat create-memory-entry \
--memory-id "$MEMORY_ID" \
--content "Alice's fiscal year ends in March; she starts renewal discussions in January"
# → { "action": "created", ... }

Step 6 — List entries to verify

After the four writes, the memory holds exactly two entries — the skipped near-duplicate was discarded and the improved version replaced the original.

soat list-memory-entries --memory-id "$MEMORY_ID" | jq '[.[] | .content]'
# [
# "Alice prefers email, especially for billing inquiries; she checks it twice a day",
# "Alice's fiscal year ends in March; she starts renewal discussions in January"
# ]

Step 7 — Upload a support-policy document

A Document is a text file indexed for semantic search. Here we store Alice's account support policy — structured reference material that the agent should consult alongside the memory entries written in Step 5.

The path field gives the document a logical location inside the project (similar to a file path). We will use /alice/support-policy.txt so we can later filter the entire /alice/ subtree with a single document_paths prefix.

DOC_ID=$(soat create-document \
--project-id "$PROJECT_ID" \
--path "/alice/support-policy.txt" \
--content "Alice Corp Support Policy: All priority-1 incidents must receive an initial response within 2 hours. Priority-2 incidents within 8 hours. Refunds are approved automatically for outages exceeding 4 hours. Alice Corp is entitled to a dedicated support engineer during business hours (9 AM–6 PM EST)." \
| jq -r '.id')
echo "DOC_ID: $DOC_ID"

Step 8 — Create an agent with knowledge_config

The knowledge_config field on an agent tells SOAT which memories and documents to search before every generation. The search query is automatically derived from the last user message — no explicit RAG logic needed in the prompt.

The fields you can set in knowledge_config:

  Field             Description
  memory_ids        Search specific memories by ID
  memory_tags       Search memories whose tags match (supports glob patterns)
  document_paths    Include chunks from documents whose path starts with the given prefix
  document_ids      Include chunks from specific documents by ID
  min_score         Minimum cosine similarity (0–1) for a result to be injected
  limit             Maximum number of results to inject
  write_memory_id   ID of a memory the agent can write to; automatically enables the write_memory tool during generation
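To build intuition for how min_score and limit interact, here is a hypothetical post-filtering step over scored search results. The function and field names are illustrative, not SOAT internals:

```python
# Hypothetical sketch: apply min_score and limit to scored results,
# as knowledge_config does conceptually. Names are illustrative only.

def select_context(results, min_score=0.0, limit=5):
    # Drop results below the similarity floor, then keep the top `limit`.
    kept = [r for r in results if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:limit]

results = [
    {"score": 0.69, "content": "policy chunk"},
    {"score": 0.62, "content": "email preference"},
    {"score": 0.41, "content": "fiscal year"},
]
print(select_context(results, min_score=0.5, limit=5))
# keeps only the two results scoring >= 0.5, highest first
```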

Here we combine the memory from Step 4 with the document uploaded in Step 7 so the agent can draw on both personal customer facts and the structured support policy.

AGENT_ID=$(soat create-agent \
--project-id "$PROJECT_ID" \
--ai-provider-id "$AI_PROVIDER_ID" \
--name "Support Agent" \
--instructions "You are a helpful customer support assistant. Use the provided knowledge context to answer questions accurately and concisely. When you learn new facts about a customer, use the write_memory tool to persist them." \
--knowledge-config '{"memory_ids":["'"$MEMORY_ID"'"],"document_paths":["/alice/"],"limit":5,"write_memory_id":"'"$MEMORY_ID"'"}' \
| jq -r '.id')
echo "AGENT_ID: $AGENT_ID"

Step 9 — Run a generation

Send a user message that requires combining personal customer facts (from memory) with the support policy (from the document). Before calling the model, SOAT searches both sources using the user message as the query and injects all matching results as a system message.
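Conceptually, the injection step gathers the matching chunks into a system message prepended to the conversation. This is a hypothetical sketch of that shape — the exact wording and format SOAT uses internally may differ:

```python
# Hypothetical sketch of context injection: matched chunks become a
# system message prepended to the conversation. SOAT's actual message
# format may differ; this only illustrates the mechanism.

def inject_context(chunks, messages):
    context = "\n".join(f"- {c}" for c in chunks)
    system = {"role": "system",
              "content": "Relevant knowledge:\n" + context}
    return [system] + messages

msgs = inject_context(
    ["Alice prefers email", "P1 response within 2 hours"],
    [{"role": "user", "content": "Alice has a P1 outage..."}],
)
print(msgs[0]["role"])  # system
```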

soat create-agent-generation \
--agent-id "$AGENT_ID" \
--messages '[{"role":"user","content":"Alice has a P1 outage since 3 hours ago. How should we handle it and how do we best reach her?"}]' \
| jq '{status: .status, output: .output.content}'

Expected shape:

{
"status": "completed",
"output": "Since Alice has a P1 outage, an initial response should have been sent within 2 hours per the support policy ... Contact her by email, which she checks twice a day and prefers for all support communication ..."
}

The model combines two distinct knowledge sources:

  • From memory — Alice prefers email; she checks it twice a day.
  • From the document — P1 incidents require a response within 2 hours; outages over 4 hours trigger automatic refunds.

Neither fact appeared in the user message.


Step 10 — Observe the agent writing to memory

When write_memory_id is set on the agent's knowledge_config, SOAT automatically makes a write_memory tool available during generation. If the model decides to call it (for example, because the user reveals new information), the fact is persisted via the same deduplication algorithm used for manual writes.

Send a message that introduces a new fact not yet in memory:

soat create-agent-generation \
--agent-id "$AGENT_ID" \
--messages '[{"role":"user","content":"Just so you know, Alice moved to the West Coast and is now in the PT timezone."}]' \
| jq '{status: .status, output: .output.content}'

After the generation completes, list the memory entries and look for any with source == "agent":

soat list-memory-entries --memory-id "$MEMORY_ID" \
| jq '[.[] | select(.source == "agent") | {content: .content, source: .source}]'

If the model called write_memory, you will see an entry with "source": "agent" containing the timezone fact. The write goes through the same deduplication algorithm — subsequent mentions of Alice's timezone will be deduplicated automatically.


Step 11 — Query the knowledge layer directly

The Knowledge endpoint is the same search layer the agent uses internally. Pass both memory_ids and document_paths to see exactly which chunks — from both sources — would be injected for a given question.

soat search-knowledge \
--project-id "$PROJECT_ID" \
--query "P1 outage response and how to reach Alice" \
--memory-ids '["'"$MEMORY_ID"'"]' \
--document-paths '["/alice/"]' \
| jq '.results[] | {score: .score, source_type: .source_type, content: .content}'

Expected output — note the two different source_type values:

{ "score": 0.69, "source_type": "document", "content": "Alice Corp Support Policy: All priority-1 incidents must receive an initial response within 2 hours ..." }
{ "score": 0.62, "source_type": "memory", "content": "Alice prefers email, especially for billing inquiries; she checks it twice a day" }
{ "score": 0.50, "source_type": "memory", "content": "Alice's fiscal year ends in March; she starts renewal discussions in January" }

Each result shows a score (cosine similarity) so you can tune min_score and limit on knowledge_config with confidence.
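For reference, cosine similarity is the dot product of the query and chunk embeddings divided by the product of their magnitudes. A minimal pure-Python version:

```python
import math

# Cosine similarity between two embedding vectors: dot product divided
# by the product of their magnitudes. Scores near 1.0 mean the vectors
# point in nearly the same direction.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity([1.0, 0.0], [1.0, 0.0]), 2))  # 1.0
print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 2))  # 0.0
```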


What's next

  • Tag-based filtering — create separate memories per customer (e.g. tags: ["bob"]) and set memory_tags: ["alice"] on the agent to ensure each agent only retrieves the right customer's facts.
  • Agent-sourced entries — set source: "agent" when writing entries programmatically from an agent's output to distinguish automated facts from manually curated ones.
  • Document subtrees — use document_paths prefixes like /alice/ to scope retrieval to one customer's documents, keeping context focused and token-efficient.
  • Adjust dedup thresholds — lower update_threshold to be more aggressive about replacing stale facts, or raise duplicate_threshold to allow more near-duplicate entries to coexist.
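The glob patterns supported by memory_tags behave like shell filename matching. A sketch using Python's fnmatch shows the idea — illustrative only, as SOAT's matcher may differ in edge cases:

```python
from fnmatch import fnmatch

# Illustrative glob matching for memory_tags: a memory is selected if
# any of its tags matches any requested pattern. SOAT's exact matching
# semantics may differ; this shows the shell-style glob idea.

def memory_matches(memory_tags, patterns):
    return any(fnmatch(tag, pat)
               for tag in memory_tags
               for pat in patterns)

print(memory_matches(["alice", "customer"], ["alice"]))   # True
print(memory_matches(["bob", "customer"], ["alice"]))     # False
print(memory_matches(["customer-vip"], ["customer-*"]))   # True
```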