mirror of https://github.com/openclaw/openclaw.git
docs: rewrite sessions/memory section -- compaction, memory, and new memory-search page
commit 3584a893e8 (parent 6d9a7224aa)

---
summary: "How OpenClaw compacts long sessions to stay within model context limits"
read_when:
  - You want to understand auto-compaction and /compact
  - You are debugging long sessions hitting context limits
  - You want to tune compaction behavior or use a custom context engine
title: "Compaction"
---

# Compaction

Every model has a **context window** -- the maximum number of tokens it can see
at once. As a conversation grows, it eventually approaches that limit. OpenClaw
**compacts** older history into a summary so the session can continue without
losing important context.

## How compaction works

Compaction is a three-step process:

1. **Summarize** older conversation turns into a compact summary.
2. **Persist** the summary as a `compaction` entry in the session transcript
   (JSONL).
3. **Keep** recent messages after the compaction point intact.

After compaction, future turns see the summary plus all messages after the
compaction point. The on-disk transcript retains the full history -- compaction
only changes what gets loaded into the model context.

## Auto-compaction

Auto-compaction is **on by default**. It triggers in two situations:

1. **Threshold maintenance** -- after a successful turn, when estimated context
   usage exceeds `contextWindow - reserveTokens`.
2. **Overflow recovery** -- the model returns a context-overflow error. OpenClaw
   compacts and retries the request.

When auto-compaction runs you will see:

- `Auto-compaction complete` in verbose mode
- `/status` showing `Compactions: <count>`

### Pre-compaction memory flush

Before compacting, OpenClaw can run a **silent turn** that reminds the model to
write durable notes to disk. This prevents important context from being lost in
the summary. The flush is controlled by `agents.defaults.compaction.memoryFlush`
and runs once per compaction cycle. See [Memory](/concepts/memory) for details.

## Manual compaction

Use `/compact` in any chat to force a compaction pass. You can optionally add
instructions to guide the summary:

```
/compact Focus on decisions and open questions
```

## Configuration

### Compaction model

By default, compaction uses the agent's primary model. You can override this
with a different model for summarization -- useful when your primary model is
small or local and you want a more capable summarizer:

```json5
{
  agents: {
    defaults: {
      compaction: {
        model: "openrouter/anthropic/claude-sonnet-4-6",
      },
    },
  },
}
```

### Reserve tokens and floor

- `reserveTokens` -- headroom reserved for prompts and the next model output
  (Pi runtime default: `16384`).
- `reserveTokensFloor` -- minimum reserve enforced by OpenClaw (default:
  `20000`). Set to `0` to disable.
- `keepRecentTokens` -- how many tokens of recent conversation to preserve
  during compaction (default: `20000`).
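
A combined sketch of these knobs in `openclaw.json` (the values shown are just
the documented defaults, spelled out for illustration):

```json5
{
  agents: {
    defaults: {
      compaction: {
        reserveTokens: 16384, // headroom for prompts + next model output
        reserveTokensFloor: 20000, // minimum reserve enforced by OpenClaw
        keepRecentTokens: 20000, // recent conversation preserved verbatim
      },
    },
  },
}
```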

### Identifier preservation

Compaction summaries preserve opaque identifiers by default
(`identifierPolicy: "strict"`). Override with:

- `"off"` -- no special identifier handling.
- `"custom"` -- provide your own instructions via `identifierInstructions`.

### Memory flush

```json5
{
  agents: {
    defaults: {
      compaction: {
        memoryFlush: {
          enabled: true, // default
          softThresholdTokens: 4000,
          systemPrompt: "Session nearing compaction. Store durable memories now.",
          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store.",
        },
      },
    },
  },
}
```

The flush triggers when context usage crosses
`contextWindow - reserveTokensFloor - softThresholdTokens`. It runs silently
(the user sees nothing) and is skipped when the workspace is read-only.
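
As a worked example (hypothetical numbers; both formulas are the ones documented
on this page), the flush threshold sits below the compaction threshold, so
durable notes get written before the summary happens:

```python
# Worked example of the two documented threshold formulas, using a
# hypothetical 200k-token context window and the default settings.
context_window = 200_000
reserve_tokens = 16_384        # compaction headroom (Pi runtime default)
reserve_tokens_floor = 20_000  # enforced minimum reserve (default)
soft_threshold = 4_000         # memoryFlush.softThresholdTokens (default)

# Auto-compaction maintenance threshold: contextWindow - reserveTokens
compaction_at = context_window - reserve_tokens

# Memory flush threshold: contextWindow - reserveTokensFloor - softThresholdTokens
flush_at = context_window - reserve_tokens_floor - soft_threshold

print(flush_at, compaction_at)  # 176000 183616 -- the flush fires first
```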

## Compaction vs pruning

|                  | Compaction                     | Session pruning                  |
| ---------------- | ------------------------------ | -------------------------------- |
| **What it does** | Summarizes older conversation  | Trims old tool results           |
| **Persisted?**   | Yes (in JSONL transcript)      | No (in-memory only, per request) |
| **Scope**        | Entire conversation history    | Tool result messages only        |
| **Frequency**    | Once when threshold is reached | Every LLM call (when enabled)    |

See [Session Pruning](/concepts/session-pruning) for pruning details.

## OpenAI server-side compaction

OpenClaw also supports OpenAI Responses server-side compaction for compatible
direct OpenAI models. This is separate from local compaction and can run
alongside it:

- **Local compaction** -- OpenClaw summarizes and persists into session JSONL.
- **Server-side compaction** -- OpenAI compacts context on the provider side when
  `store` + `context_management` are enabled.

See [OpenAI provider](/providers/openai) for model params and overrides.

## Custom context engines

Compaction behavior is owned by the active
[context engine](/concepts/context-engine). The built-in engine uses the
summarization described above. Plugin engines (selected via
`plugins.slots.contextEngine`) can implement any strategy -- DAG summaries,
vector retrieval, incremental condensation, etc.

When a plugin engine sets `ownsCompaction: true`, OpenClaw delegates all
compaction decisions to the engine and does not run built-in auto-compaction.

When `ownsCompaction` is `false` or unset, the built-in auto-compaction still
runs, but the engine's `compact()` method handles `/compact` and overflow
recovery. If you are building a non-owning engine, implement `compact()` by
calling `delegateCompactionToRuntime(...)` from `openclaw/plugin-sdk/core`.

## Troubleshooting

**Compaction triggers too often?**

- Check the model's context window -- small models compact more frequently.
- High `reserveTokens` relative to the context window can trigger early
  compaction.
- Large tool outputs accumulate fast. Enable
  [session pruning](/concepts/session-pruning) to reduce tool-result buildup.

**Context feels stale after compaction?**

- Use `/compact Focus on <topic>` to guide the summary.
- Increase `keepRecentTokens` to preserve more recent conversation.
- Enable the [memory flush](/concepts/memory) so durable notes survive
  compaction.

**Need a fresh start?**

- `/new` or `/reset` starts a new session ID without compacting.

For the full internal lifecycle (store schema, transcript structure, Pi runtime
semantics), see
[Session Management Deep Dive](/reference/session-management-compaction).

---
title: "Memory Search"
summary: "How OpenClaw memory search works -- embedding providers, hybrid search, MMR, and temporal decay"
read_when:
  - You want to understand how memory_search retrieves results
  - You want to tune hybrid search, MMR, or temporal decay
  - You want to choose an embedding provider
---

# Memory Search

OpenClaw indexes workspace memory files (`MEMORY.md` and `memory/*.md`) into
chunks (~400 tokens, 80-token overlap) and searches them with `memory_search`.
This page explains how the search pipeline works and how to tune it. For the
file layout and memory basics, see [Memory](/concepts/memory).
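
The chunking step can be pictured as a sliding window. A minimal sketch (the
real tokenizer and exact boundary handling are implementation details of
`memory-core`):

```python
def chunk_tokens(tokens, size=400, overlap=80):
    """Split a token list into overlapping chunks (~400 tokens, 80-token overlap)."""
    step = size - overlap  # each chunk starts 320 tokens after the previous one
    chunks = []
    for start in range(0, max(len(tokens) - overlap, 1), step):
        chunks.append(tokens[start:start + size])
    return chunks

doc = list(range(1000))        # stand-in for a tokenized Markdown file
chunks = chunk_tokens(doc)
print([c[0] for c in chunks])  # chunk start offsets: [0, 320, 640]
```

The 80-token overlap means each chunk repeats the tail of the previous one, so a
sentence straddling a boundary is still retrievable from at least one chunk.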

## Search pipeline

```
Query -> Embedding -> Vector Search ─┐
                                     ├─> Weighted Merge -> Temporal Decay -> MMR -> Top-K
Query -> Tokenize -> BM25 Search ────┘
```

Both retrieval paths run in parallel when hybrid search is enabled. If either
path is unavailable (no embeddings or no FTS5), the other runs alone.

## Embedding providers

The default `memory-core` plugin ships built-in adapters for these providers:

| Provider   | Adapter ID | Auto-selected        | Notes                               |
| ---------- | ---------- | -------------------- | ----------------------------------- |
| Local GGUF | `local`    | Yes (first priority) | node-llama-cpp, ~0.6 GB model       |
| OpenAI     | `openai`   | Yes                  | `text-embedding-3-small` default    |
| Gemini     | `gemini`   | Yes                  | Supports multimodal (images, audio) |
| Voyage     | `voyage`   | Yes                  |                                     |
| Mistral    | `mistral`  | Yes                  |                                     |
| Ollama     | `ollama`   | No (explicit only)   | Local/self-hosted                   |

Auto-selection picks the first provider whose API key can be resolved. Set
`memorySearch.provider` explicitly to override.

Remote embeddings require an API key for the embedding provider. OpenClaw
resolves keys from auth profiles, `models.providers.*.apiKey`, or environment
variables. Codex OAuth covers chat/completions only and does not satisfy
embedding requests.

### Quick start

Enable memory search with OpenAI embeddings:

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "openai",
        model: "text-embedding-3-small",
      },
    },
  },
}
```

Or use local embeddings (no API key needed):

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "local",
      },
    },
  },
}
```

Local mode uses node-llama-cpp and may require `pnpm approve-builds` to build
the native addon.

## Hybrid search (BM25 + vector)

When both FTS5 and embeddings are available, OpenClaw combines two retrieval
signals:

- **Vector similarity** -- semantic matching. Good at paraphrases ("Mac Studio
  gateway host" vs "the machine running the gateway").
- **BM25 keyword relevance** -- exact token matching. Good at IDs, code symbols,
  error strings, and config keys.

### How scores are merged

1. Retrieve a candidate pool from each side (top
   `maxResults x candidateMultiplier`).
2. Convert BM25 rank to a 0-1 score: `textScore = 1 / (1 + max(0, bm25Rank))`.
3. Union candidates by chunk ID and compute:
   `finalScore = vectorWeight x vectorScore + textWeight x textScore`.

Weights are normalized to 1.0, so they behave as percentages. If either path is
unavailable, the other runs alone with no hard failure.
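
The merge math can be sketched directly from the formulas above (the weights
here are the documented defaults; candidate retrieval and the chunk-ID union
are omitted):

```python
def merged_score(vector_score, bm25_rank, vector_weight=0.7, text_weight=0.3):
    """Weighted hybrid score from vector similarity and BM25 rank."""
    total = vector_weight + text_weight           # normalize weights to 1.0
    vw, tw = vector_weight / total, text_weight / total
    text_score = 1.0 / (1.0 + max(0, bm25_rank))  # rank 0 (best) -> 1.0
    return vw * vector_score + tw * text_score

# Top BM25 hit (rank 0) with strong semantic similarity:
print(round(merged_score(0.9, 0), 2))  # 0.7 * 0.9 + 0.3 * 1.0 = 0.93
```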

### CJK support

FTS5 uses configurable trigram tokenization with a short-substring fallback so
Chinese, Japanese, and Korean text is searchable. CJK-heavy text is weighted
correctly during chunk-size estimation, and surrogate-pair characters are
preserved during fine splits.

## Post-processing

After merging scores, two optional stages refine the result list:

### Temporal decay (recency boost)

Daily notes accumulate over months. Without decay, a well-worded note from six
months ago can outrank yesterday's update on the same topic.

Temporal decay applies an exponential multiplier based on age:

```
decayedScore = score x e^(-lambda x ageInDays)
```

With the default half-life of 30 days:

| Age      | Score retained |
| -------- | -------------- |
| Today    | 100%           |
| 7 days   | ~85%           |
| 30 days  | 50%            |
| 90 days  | 12.5%          |
| 180 days | ~1.6%          |
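
The half-life maps to the decay constant via `lambda = ln(2) / halfLifeDays`; a
quick check reproduces the table:

```python
import math

def decayed(score, age_in_days, half_life_days=30):
    """Temporal decay: score * e^(-lambda * ageInDays), lambda = ln(2)/halfLife."""
    lam = math.log(2) / half_life_days
    return score * math.exp(-lam * age_in_days)

for age in (0, 7, 30, 90, 180):
    print(age, round(decayed(1.0, age), 4))  # 30 days -> 0.5, 90 days -> 0.125
```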

**Evergreen files are never decayed** -- `MEMORY.md` and non-dated files in
`memory/` (like `memory/projects.md`) always rank at full score. Dated daily
files use the date from the filename.

**When to enable:** Your agent has months of daily notes and stale information
outranks recent context.

### MMR re-ranking (diversity)

When search returns results, multiple chunks may contain similar or overlapping
content. MMR (Maximal Marginal Relevance) re-ranks results to balance relevance
with diversity.

How it works:

1. Start with the highest-scoring result.
2. Iteratively select the next result that maximizes:
   `lambda x relevance - (1 - lambda) x max_similarity_to_already_selected`.
3. Similarity is measured using Jaccard text similarity on tokenized content.

The `lambda` parameter controls the trade-off:

- `1.0` -- pure relevance (no diversity penalty).
- `0.0` -- maximum diversity (ignores relevance).
- Default: `0.7` (balanced, slight relevance bias).
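
The steps above can be sketched as a greedy selection loop with Jaccard
similarity (illustrative only -- the real implementation operates on indexed
chunk records, not raw word lists):

```python
def jaccard(a, b):
    """Jaccard similarity between two token sequences."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def mmr(candidates, lam=0.7, k=3):
    """Greedy MMR re-ranking over (tokens, relevance) pairs."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda c: lam * c[1]
            - (1 - lam) * max((jaccard(c[0], s[0]) for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected

docs = [
    ("gateway runs on the mac studio".split(), 0.90),
    ("the mac studio runs the gateway".split(), 0.85),  # near-duplicate
    ("postgres backup schedule".split(), 0.60),
]
print([rel for _, rel in mmr(docs, lam=0.7, k=2)])  # [0.9, 0.6] -- duplicate demoted
```

With `lam=1.0` the near-duplicate would rank second instead, since pure
relevance ignores the similarity penalty.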

**When to enable:** `memory_search` returns redundant or near-duplicate
snippets, especially with daily notes that repeat similar information.

## Configuration

Both post-processing features and hybrid search weights are configured under
`memorySearch.query.hybrid`:

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        query: {
          hybrid: {
            enabled: true,
            vectorWeight: 0.7,
            textWeight: 0.3,
            candidateMultiplier: 4,
            mmr: {
              enabled: true, // default: false
              lambda: 0.7,
            },
            temporalDecay: {
              enabled: true, // default: false
              halfLifeDays: 30,
            },
          },
        },
      },
    },
  },
}
```

You can enable either feature independently:

- **MMR only** -- many similar notes but age does not matter.
- **Temporal decay only** -- recency matters but results are already diverse.
- **Both** -- recommended for agents with large, long-running daily note
  histories.

## Session memory search (experimental)

You can optionally index session transcripts and surface them via
`memory_search`. This is gated behind an experimental flag:

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        experimental: { sessionMemory: true },
        sources: ["memory", "sessions"],
      },
    },
  },
}
```

Session indexing is opt-in and runs asynchronously. Results can be slightly stale
until background sync finishes. Session logs live on disk, so treat filesystem
access as the trust boundary.

## Troubleshooting

**`memory_search` returns nothing?**

- Check `openclaw memory status` -- is the index populated?
- Verify an embedding provider is configured and has a valid key.
- Run `openclaw memory index --force` to trigger a full reindex.

**Results are all keyword matches, no semantic results?**

- Embeddings may not be configured. Check `openclaw memory status --deep`.
- If using `local`, ensure node-llama-cpp built successfully.

**CJK text not found?**

- FTS5 trigram tokenization handles CJK. If results are missing, run
  `openclaw memory index --force` to rebuild the FTS index.

## Further reading

- [Memory](/concepts/memory) -- file layout, backends, tools
- [Memory configuration reference](/reference/memory-config) -- all config knobs
  including QMD, batch indexing, embedding cache, sqlite-vec, and multimodal

---
title: "Memory"
summary: "How OpenClaw memory works -- file layout, backends, search, and automatic flush"
read_when:
  - You want the memory file layout and workflow
  - You want to understand memory search and backends
  - You want to tune the automatic pre-compaction memory flush
---

# Memory

OpenClaw memory is **plain Markdown in the agent workspace**. The files are the
source of truth -- the model only "remembers" what gets written to disk.

Memory search tools are provided by the active memory plugin (default:
`memory-core`). Disable memory plugins with `plugins.slots.memory = "none"`.

## File layout

The default workspace uses two memory layers:

| Path                   | Purpose                  | Loaded at session start    |
| ---------------------- | ------------------------ | -------------------------- |
| `memory/YYYY-MM-DD.md` | Daily log (append-only)  | Today + yesterday          |
| `MEMORY.md`            | Curated long-term memory | Yes (main DM session only) |

If both `MEMORY.md` and `memory.md` exist at the workspace root, OpenClaw loads
both (deduplicated by realpath so symlinks are not injected twice). `MEMORY.md`
is only loaded in the main, private session -- never in group contexts.

These files live under the agent workspace (`agents.defaults.workspace`, default
`~/.openclaw/workspace`). See [Agent workspace](/concepts/agent-workspace) for
the full layout.

## When to write memory

- **Decisions, preferences, and durable facts** go to `MEMORY.md`.
- **Day-to-day notes and running context** go to `memory/YYYY-MM-DD.md`.
- If someone says "remember this," **write it down** (do not keep it in RAM).
- If you want something to stick, **ask the bot to write it** into memory.

## Memory tools

OpenClaw exposes two agent-facing tools:

- **`memory_search`** -- semantic recall over indexed snippets. Uses the active
  memory backend's search pipeline (vector similarity, keyword matching, or
  hybrid).
- **`memory_get`** -- targeted read of a specific Markdown file or line range.
  Degrades gracefully when a file does not exist (returns empty text instead of
  an error).

## Memory backends

OpenClaw supports two memory backends that control how `memory_search` indexes
and retrieves content:

### Builtin (default)

The builtin backend uses a per-agent SQLite database with optional extensions:

- **FTS5 full-text search** for keyword matching (BM25 scoring).
- **sqlite-vec** for in-database vector similarity (falls back to in-process
  cosine similarity when unavailable).
- **Hybrid search** combining BM25 + vector scores for best-of-both-worlds
  retrieval.
- **CJK support** via configurable trigram tokenization with short-substring
  fallback.

The builtin backend works out of the box with no extra dependencies. For
embedding vectors, configure an embedding provider (OpenAI, Gemini, Voyage,
Mistral, Ollama, or local GGUF). Without an embedding provider, only keyword
search is available.

Index location: `~/.openclaw/memory/<agentId>.sqlite`

### QMD (experimental)

[QMD](https://github.com/tobi/qmd) is a local-first search sidecar that
combines BM25 + vectors + reranking in a single binary. Set
`memory.backend = "qmd"` to opt in.

Key differences from the builtin backend:

- Runs as a subprocess (Bun + node-llama-cpp), auto-downloads GGUF models.
- Supports advanced post-processing: reranking, query expansion.
- Can index extra directories beyond the workspace (`memory.qmd.paths`).
- Can optionally index session transcripts (`memory.qmd.sessions`).
- Falls back to the builtin backend if QMD is unavailable.

QMD requires a separate install (`bun install -g https://github.com/tobi/qmd`)
and a SQLite build that allows extensions. See the
[Memory configuration reference](/reference/memory-config) for full setup.

## Memory search

When an embedding provider is configured, `memory_search` uses semantic vector
search to find relevant notes even when the wording differs from the query.
Hybrid search (BM25 + vector) is enabled by default when both FTS5 and
embeddings are available.

For details on how search works -- embedding providers, hybrid scoring, MMR
diversity re-ranking, temporal decay, and tuning -- see
[Memory Search](/concepts/memory-search).

### Embedding provider auto-selection

If `memorySearch.provider` is not set, OpenClaw auto-selects the first available
provider in this order:

1. `local` -- if `memorySearch.local.modelPath` is configured and exists.
2. `openai` -- if an OpenAI key can be resolved.
3. `gemini` -- if a Gemini key can be resolved.
4. `voyage` -- if a Voyage key can be resolved.
5. `mistral` -- if a Mistral key can be resolved.

If none can be resolved, memory search stays disabled until configured. Ollama
is supported but not auto-selected (set `memorySearch.provider = "ollama"`
explicitly).
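
The selection order amounts to a first-match scan. A sketch (the environment
variable names here are illustrative assumptions -- the real resolver also
checks auth profiles and `models.providers.*.apiKey`):

```python
import os

def pick_embedding_provider(config, env):
    """First-match auto-selection in the documented priority order (sketch)."""
    if config.get("provider"):  # explicit setting always wins
        return config["provider"]
    local_path = config.get("local", {}).get("modelPath")
    if local_path and os.path.exists(local_path):
        return "local"
    for provider, var in [
        ("openai", "OPENAI_API_KEY"),     # illustrative env-var names
        ("gemini", "GEMINI_API_KEY"),
        ("voyage", "VOYAGE_API_KEY"),
        ("mistral", "MISTRAL_API_KEY"),
    ]:
        if env.get(var):
            return provider
    return None  # memory search stays disabled until configured

print(pick_embedding_provider({}, {"GEMINI_API_KEY": "key"}))  # gemini
```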

## Additional memory paths

Index Markdown files outside the default workspace layout:

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        extraPaths: ["../team-docs", "/srv/shared-notes/overview.md"],
      },
    },
  },
}
```

Paths can be absolute or workspace-relative. Directories are scanned
recursively for `.md` files. Symlinks are ignored.

## Multimodal memory (Gemini)

When using `gemini-embedding-2-preview`, OpenClaw can index image and audio
files from `memorySearch.extraPaths`:

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "gemini",
        model: "gemini-embedding-2-preview",
        extraPaths: ["assets/reference", "voice-notes"],
        multimodal: {
          enabled: true,
          modalities: ["image", "audio"],
        },
      },
    },
  },
}
```

Search queries remain text, but Gemini can compare them against indexed
image/audio embeddings. `memory_get` still reads Markdown only.

See the [Memory configuration reference](/reference/memory-config) for supported
formats and limitations.

## Automatic memory flush

When a session is close to auto-compaction, OpenClaw runs a **silent turn** that
reminds the model to write durable notes before the context is summarized. This
prevents important information from being lost during compaction.

Controlled by `agents.defaults.compaction.memoryFlush`:

```json5
{
  agents: {
    defaults: {
      compaction: {
        reserveTokensFloor: 20000,
        memoryFlush: {
          systemPrompt: "Session nearing compaction. Store durable memories now.",
          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store.",
          enabled: true, // default
          softThresholdTokens: 4000, // how far below compaction threshold to trigger
        },
      },
    },
  },
}
```

Details:

- **Triggers** when context usage crosses
  `contextWindow - reserveTokensFloor - softThresholdTokens`.
- **Runs silently** -- prompts include `NO_REPLY` so nothing is delivered to the
  user.
- **Once per compaction cycle** (tracked in `sessions.json`).
- **Skipped** when the workspace is read-only (`workspaceAccess: "ro"` or
  `"none"`).
- The active memory plugin owns the flush prompt and path policy. The default
  `memory-core` plugin writes to `memory/YYYY-MM-DD.md`.

For the full compaction lifecycle, see [Compaction](/concepts/compaction).

## CLI commands

| Command                          | Description                                |
| -------------------------------- | ------------------------------------------ |
| `openclaw memory status`         | Show memory index status and provider info |
| `openclaw memory search <query>` | Search memory from the command line        |
| `openclaw memory index`          | Force a reindex of memory files            |

Add `--agent <id>` to target a specific agent, `--deep` for extended
diagnostics, or `--json` for machine-readable output.

See [CLI: memory](/cli/memory) for the full command reference.

## Further reading

- [Memory Search](/concepts/memory-search) -- how search works, hybrid search,
  MMR, temporal decay
- [Memory configuration reference](/reference/memory-config) -- all config knobs
  for providers, QMD, hybrid search, batch indexing, and multimodal
- [Compaction](/concepts/compaction) -- how compaction interacts with memory
  flush
- [Session Management Deep Dive](/reference/session-management-compaction) --
  internal session and compaction lifecycle

Docs sidebar config (adds the new `concepts/memory-search` page):

```
"concepts/session-pruning",
"concepts/session-tool",
"concepts/memory",
"concepts/memory-search",
"concepts/compaction"
]
},
```