mirror of https://github.com/openclaw/openclaw.git
docs: rewrite sessions/memory section -- compaction, memory, and new memory-search page
commit 3584a893e8 (parent 6d9a7224aa)

---
summary: "How OpenClaw compacts long sessions to stay within model context limits"
read_when:
  - You want to understand auto-compaction and /compact
  - You are debugging long sessions hitting context limits
  - You want to tune compaction behavior or use a custom context engine
title: "Compaction"
---

# Compaction

Every model has a **context window** -- the maximum number of tokens it can see
at once. As a conversation grows, it eventually approaches that limit. OpenClaw
**compacts** older history into a summary so the session can continue without
losing important context.

## How compaction works

Compaction is a three-step process:

1. **Summarize** older conversation turns into a compact summary.
2. **Persist** the summary as a `compaction` entry in the session transcript
   (JSONL).
3. **Keep** recent messages after the compaction point intact.

After compaction, future turns see the summary plus all messages after the
compaction point. The on-disk transcript retains the full history -- compaction
only changes what gets loaded into the model context.

## Auto-compaction

Auto-compaction is **on by default**. It triggers in two situations:

1. **Threshold maintenance** -- after a successful turn, when estimated context
   usage exceeds `contextWindow - reserveTokens`.
2. **Overflow recovery** -- the model returns a context-overflow error. OpenClaw
   compacts and retries the request.

When auto-compaction runs you will see:

- `Auto-compaction complete` in verbose mode
- `/status` showing `Compactions: <count>`

### Pre-compaction memory flush

Before compacting, OpenClaw can run a **silent turn** that reminds the model to
write durable notes to disk. This prevents important context from being lost in
the summary. The flush is controlled by `agents.defaults.compaction.memoryFlush`
and runs once per compaction cycle. See [Memory](/concepts/memory) for details.

## Manual compaction

Use `/compact` in any chat to force a compaction pass. You can optionally add
instructions to guide the summary:

```
/compact Focus on decisions and open questions
```

## Configuration

### Compaction model

By default, compaction uses the agent's primary model. You can override this
with a different model for summarization -- useful when your primary model is
small or local and you want a more capable summarizer:

```json5
{
  agents: {
    defaults: {
      compaction: {
        model: "openrouter/anthropic/claude-sonnet-4-6",
      },
    },
  },
}
```

### Reserve tokens and floor

- `reserveTokens` -- headroom reserved for prompts and the next model output
  (Pi runtime default: `16384`).
- `reserveTokensFloor` -- minimum reserve enforced by OpenClaw (default:
  `20000`). Set to `0` to disable.
- `keepRecentTokens` -- how many tokens of recent conversation to preserve
  during compaction (default: `20000`).
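
A combined sketch of these knobs in `openclaw.json` (the values shown are just
the documented defaults, spelled out for illustration):

```json5
{
  agents: {
    defaults: {
      compaction: {
        reserveTokens: 16384, // headroom for prompts + next model output
        reserveTokensFloor: 20000, // minimum reserve enforced by OpenClaw
        keepRecentTokens: 20000, // recent conversation preserved verbatim
      },
    },
  },
}
```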

### Identifier preservation

Compaction summaries preserve opaque identifiers by default
(`identifierPolicy: "strict"`). Override with:

- `"off"` -- no special identifier handling.
- `"custom"` -- provide your own instructions via `identifierInstructions`.

### Memory flush

```json5
{
  agents: {
    defaults: {
      compaction: {
        memoryFlush: {
          enabled: true, // default
          softThresholdTokens: 4000,
          systemPrompt: "Session nearing compaction. Store durable memories now.",
          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store.",
        },
      },
    },
  },
}
```

The flush triggers when context usage crosses
`contextWindow - reserveTokensFloor - softThresholdTokens`. It runs silently
(the user sees nothing) and is skipped when the workspace is read-only.
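
As a worked example (hypothetical numbers; both formulas are the ones documented
on this page), the flush threshold sits below the compaction threshold, so
durable notes get written before the summary happens:

```python
# Worked example of the two documented threshold formulas, using a
# hypothetical 200k-token context window and the default settings.
context_window = 200_000
reserve_tokens = 16_384        # compaction headroom (Pi runtime default)
reserve_tokens_floor = 20_000  # enforced minimum reserve (default)
soft_threshold = 4_000         # memoryFlush.softThresholdTokens (default)

# Auto-compaction maintenance threshold: contextWindow - reserveTokens
compaction_at = context_window - reserve_tokens

# Memory flush threshold: contextWindow - reserveTokensFloor - softThresholdTokens
flush_at = context_window - reserve_tokens_floor - soft_threshold

print(flush_at, compaction_at)  # 176000 183616 -- the flush fires first
```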

## Compaction vs pruning

|                  | Compaction                     | Session pruning                  |
| ---------------- | ------------------------------ | -------------------------------- |
| **What it does** | Summarizes older conversation  | Trims old tool results           |
| **Persisted?**   | Yes (in JSONL transcript)      | No (in-memory only, per request) |
| **Scope**        | Entire conversation history    | Tool result messages only        |
| **Frequency**    | Once when threshold is reached | Every LLM call (when enabled)    |

See [Session Pruning](/concepts/session-pruning) for pruning details.

## OpenAI server-side compaction

OpenClaw also supports OpenAI Responses server-side compaction for compatible
direct OpenAI models. This is separate from local compaction and can run
alongside it:

- **Local compaction** -- OpenClaw summarizes and persists into session JSONL.
- **Server-side compaction** -- OpenAI compacts context on the provider side when
  `store` + `context_management` are enabled.

See [OpenAI provider](/providers/openai) for model params and overrides.

## Custom context engines

Compaction behavior is owned by the active
[context engine](/concepts/context-engine). The built-in engine uses the
summarization described above. Plugin engines (selected via
`plugins.slots.contextEngine`) can implement any strategy -- DAG summaries,
vector retrieval, incremental condensation, etc.

When a plugin engine sets `ownsCompaction: true`, OpenClaw delegates all
compaction decisions to the engine and does not run built-in auto-compaction.

When `ownsCompaction` is `false` or unset, the built-in auto-compaction still
runs, but the engine's `compact()` method handles `/compact` and overflow
recovery. If you are building a non-owning engine, implement `compact()` by
calling `delegateCompactionToRuntime(...)` from `openclaw/plugin-sdk/core`.

## Troubleshooting

**Compaction triggers too often?**

- Check the model's context window -- small models compact more frequently.
- High `reserveTokens` relative to the context window can trigger early
  compaction.
- Large tool outputs accumulate fast. Enable
  [session pruning](/concepts/session-pruning) to reduce tool-result buildup.

**Context feels stale after compaction?**

- Use `/compact Focus on <topic>` to guide the summary.
- Increase `keepRecentTokens` to preserve more recent conversation.
- Enable the [memory flush](/concepts/memory) so durable notes survive
  compaction.

**Need a fresh start?**

- `/new` or `/reset` starts a new session ID without compacting.

For the full internal lifecycle (store schema, transcript structure, Pi runtime
semantics), see
[Session Management Deep Dive](/reference/session-management-compaction).

---
title: "Memory Search"
summary: "How OpenClaw memory search works -- embedding providers, hybrid search, MMR, and temporal decay"
read_when:
  - You want to understand how memory_search retrieves results
  - You want to tune hybrid search, MMR, or temporal decay
  - You want to choose an embedding provider
---

# Memory Search

OpenClaw indexes workspace memory files (`MEMORY.md` and `memory/*.md`) into
chunks (~400 tokens, 80-token overlap) and searches them with `memory_search`.
This page explains how the search pipeline works and how to tune it. For the
file layout and memory basics, see [Memory](/concepts/memory).
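
The chunking step can be pictured as a sliding window. A minimal sketch (the
real tokenizer and exact boundary handling are implementation details of
`memory-core`):

```python
def chunk_tokens(tokens, size=400, overlap=80):
    """Split a token list into overlapping chunks (~400 tokens, 80-token overlap)."""
    step = size - overlap  # each chunk starts 320 tokens after the previous one
    chunks = []
    for start in range(0, max(len(tokens) - overlap, 1), step):
        chunks.append(tokens[start:start + size])
    return chunks

doc = list(range(1000))        # stand-in for a tokenized Markdown file
chunks = chunk_tokens(doc)
print([c[0] for c in chunks])  # chunk start offsets: [0, 320, 640]
```

The 80-token overlap means each chunk repeats the tail of the previous one, so a
sentence straddling a boundary is still retrievable from at least one chunk.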

## Search pipeline

```
Query -> Embedding -> Vector Search ─┐
                                     ├─> Weighted Merge -> Temporal Decay -> MMR -> Top-K
Query -> Tokenize -> BM25 Search ────┘
```

Both retrieval paths run in parallel when hybrid search is enabled. If either
path is unavailable (no embeddings or no FTS5), the other runs alone.

## Embedding providers

The default `memory-core` plugin ships built-in adapters for these providers:

| Provider   | Adapter ID | Auto-selected        | Notes                               |
| ---------- | ---------- | -------------------- | ----------------------------------- |
| Local GGUF | `local`    | Yes (first priority) | node-llama-cpp, ~0.6 GB model       |
| OpenAI     | `openai`   | Yes                  | `text-embedding-3-small` default    |
| Gemini     | `gemini`   | Yes                  | Supports multimodal (images, audio) |
| Voyage     | `voyage`   | Yes                  |                                     |
| Mistral    | `mistral`  | Yes                  |                                     |
| Ollama     | `ollama`   | No (explicit only)   | Local/self-hosted                   |

Auto-selection picks the first provider whose API key can be resolved. Set
`memorySearch.provider` explicitly to override.

Remote embeddings require an API key for the embedding provider. OpenClaw
resolves keys from auth profiles, `models.providers.*.apiKey`, or environment
variables. Codex OAuth covers chat/completions only and does not satisfy
embedding requests.

### Quick start

Enable memory search with OpenAI embeddings:

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "openai",
        model: "text-embedding-3-small",
      },
    },
  },
}
```

Or use local embeddings (no API key needed):

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "local",
      },
    },
  },
}
```

Local mode uses node-llama-cpp and may require `pnpm approve-builds` to build
the native addon.

## Hybrid search (BM25 + vector)

When both FTS5 and embeddings are available, OpenClaw combines two retrieval
signals:

- **Vector similarity** -- semantic matching. Good at paraphrases ("Mac Studio
  gateway host" vs "the machine running the gateway").
- **BM25 keyword relevance** -- exact token matching. Good at IDs, code symbols,
  error strings, and config keys.

### How scores are merged

1. Retrieve a candidate pool from each side (top
   `maxResults x candidateMultiplier`).
2. Convert BM25 rank to a 0-1 score: `textScore = 1 / (1 + max(0, bm25Rank))`.
3. Union candidates by chunk ID and compute:
   `finalScore = vectorWeight x vectorScore + textWeight x textScore`.

Weights are normalized to 1.0, so they behave as percentages. If either path is
unavailable, the other runs alone with no hard failure.
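
The merge math can be sketched directly from the formulas above (the weights
here are the documented defaults; candidate retrieval and the chunk-ID union
are omitted):

```python
def merged_score(vector_score, bm25_rank, vector_weight=0.7, text_weight=0.3):
    """Weighted hybrid score from vector similarity and BM25 rank."""
    total = vector_weight + text_weight           # normalize weights to 1.0
    vw, tw = vector_weight / total, text_weight / total
    text_score = 1.0 / (1.0 + max(0, bm25_rank))  # rank 0 (best) -> 1.0
    return vw * vector_score + tw * text_score

# Top BM25 hit (rank 0) with strong semantic similarity:
print(round(merged_score(0.9, 0), 2))  # 0.7 * 0.9 + 0.3 * 1.0 = 0.93
```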

### CJK support

FTS5 uses configurable trigram tokenization with a short-substring fallback so
Chinese, Japanese, and Korean text is searchable. CJK-heavy text is weighted
correctly during chunk-size estimation, and surrogate-pair characters are
preserved during fine splits.

## Post-processing

After merging scores, two optional stages refine the result list:

### Temporal decay (recency boost)

Daily notes accumulate over months. Without decay, a well-worded note from six
months ago can outrank yesterday's update on the same topic.

Temporal decay applies an exponential multiplier based on age:

```
decayedScore = score x e^(-lambda x ageInDays)
```

With the default half-life of 30 days:

| Age      | Score retained |
| -------- | -------------- |
| Today    | 100%           |
| 7 days   | ~85%           |
| 30 days  | 50%            |
| 90 days  | 12.5%          |
| 180 days | ~1.6%          |
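
The half-life maps to the decay constant via `lambda = ln(2) / halfLifeDays`; a
quick check reproduces the table:

```python
import math

def decayed(score, age_in_days, half_life_days=30):
    """Temporal decay: score * e^(-lambda * ageInDays), lambda = ln(2)/halfLife."""
    lam = math.log(2) / half_life_days
    return score * math.exp(-lam * age_in_days)

for age in (0, 7, 30, 90, 180):
    print(age, round(decayed(1.0, age), 4))  # 30 days -> 0.5, 90 days -> 0.125
```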

**Evergreen files are never decayed** -- `MEMORY.md` and non-dated files in
`memory/` (like `memory/projects.md`) always rank at full score. Dated daily
files use the date from the filename.

**When to enable:** Your agent has months of daily notes and stale information
outranks recent context.

### MMR re-ranking (diversity)

When search returns results, multiple chunks may contain similar or overlapping
content. MMR (Maximal Marginal Relevance) re-ranks results to balance relevance
with diversity.

How it works:

1. Start with the highest-scoring result.
2. Iteratively select the next result that maximizes:
   `lambda x relevance - (1 - lambda) x max_similarity_to_already_selected`.
3. Similarity is measured using Jaccard text similarity on tokenized content.

The `lambda` parameter controls the trade-off:

- `1.0` -- pure relevance (no diversity penalty).
- `0.0` -- maximum diversity (ignores relevance).
- Default: `0.7` (balanced, slight relevance bias).
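
The steps above can be sketched as a greedy selection loop with Jaccard
similarity (illustrative only -- the real implementation operates on indexed
chunk records, not raw word lists):

```python
def jaccard(a, b):
    """Jaccard similarity between two token sequences."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def mmr(candidates, lam=0.7, k=3):
    """Greedy MMR re-ranking over (tokens, relevance) pairs."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda c: lam * c[1]
            - (1 - lam) * max((jaccard(c[0], s[0]) for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected

docs = [
    ("gateway runs on the mac studio".split(), 0.90),
    ("the mac studio runs the gateway".split(), 0.85),  # near-duplicate
    ("postgres backup schedule".split(), 0.60),
]
print([rel for _, rel in mmr(docs, lam=0.7, k=2)])  # [0.9, 0.6] -- duplicate demoted
```

With `lam=1.0` the near-duplicate would rank second instead, since pure
relevance ignores the similarity penalty.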

**When to enable:** `memory_search` returns redundant or near-duplicate
snippets, especially with daily notes that repeat similar information.

## Configuration

Both post-processing features and hybrid search weights are configured under
`memorySearch.query.hybrid`:

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        query: {
          hybrid: {
            enabled: true,
            vectorWeight: 0.7,
            textWeight: 0.3,
            candidateMultiplier: 4,
            mmr: {
              enabled: true, // default: false
              lambda: 0.7,
            },
            temporalDecay: {
              enabled: true, // default: false
              halfLifeDays: 30,
            },
          },
        },
      },
    },
  },
}
```

You can enable either feature independently:

- **MMR only** -- many similar notes but age does not matter.
- **Temporal decay only** -- recency matters but results are already diverse.
- **Both** -- recommended for agents with large, long-running daily note
  histories.

## Session memory search (experimental)

You can optionally index session transcripts and surface them via
`memory_search`. This is gated behind an experimental flag:

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        experimental: { sessionMemory: true },
        sources: ["memory", "sessions"],
      },
    },
  },
}
```

Session indexing is opt-in and runs asynchronously. Results can be slightly stale
until background sync finishes. Session logs live on disk, so treat filesystem
access as the trust boundary.

## Troubleshooting

**`memory_search` returns nothing?**

- Check `openclaw memory status` -- is the index populated?
- Verify an embedding provider is configured and has a valid key.
- Run `openclaw memory index --force` to trigger a full reindex.

**Results are all keyword matches, no semantic results?**

- Embeddings may not be configured. Check `openclaw memory status --deep`.
- If using `local`, ensure node-llama-cpp built successfully.

**CJK text not found?**

- FTS5 trigram tokenization handles CJK. If results are missing, run
  `openclaw memory index --force` to rebuild the FTS index.

## Further reading

- [Memory](/concepts/memory) -- file layout, backends, tools
- [Memory configuration reference](/reference/memory-config) -- all config knobs
  including QMD, batch indexing, embedding cache, sqlite-vec, and multimodal

---
title: "Memory"
summary: "How OpenClaw memory works -- file layout, backends, search, and automatic flush"
read_when:
  - You want the memory file layout and workflow
  - You want to understand memory search and backends
  - You want to tune the automatic pre-compaction memory flush
---

# Memory

OpenClaw memory is **plain Markdown in the agent workspace**. The files are the
source of truth -- the model only "remembers" what gets written to disk.

Memory search tools are provided by the active memory plugin (default:
`memory-core`). Disable memory plugins with `plugins.slots.memory = "none"`.

## File layout

The default workspace uses two memory layers:

| Path                   | Purpose                  | Loaded at session start    |
| ---------------------- | ------------------------ | -------------------------- |
| `memory/YYYY-MM-DD.md` | Daily log (append-only)  | Today + yesterday          |
| `MEMORY.md`            | Curated long-term memory | Yes (main DM session only) |

If both `MEMORY.md` and `memory.md` exist at the workspace root, OpenClaw loads
both (deduplicated by realpath so symlinks are not injected twice). `MEMORY.md`
is only loaded in the main, private session -- never in group contexts.

These files live under the agent workspace (`agents.defaults.workspace`, default
`~/.openclaw/workspace`). See [Agent workspace](/concepts/agent-workspace) for
the full layout.

## When to write memory

- **Decisions, preferences, and durable facts** go to `MEMORY.md`.
- **Day-to-day notes and running context** go to `memory/YYYY-MM-DD.md`.
- If someone says "remember this," **write it down** (do not keep it in RAM).
- If you want something to stick, **ask the bot to write it** into memory.

## Memory tools

OpenClaw exposes two agent-facing tools:

- **`memory_search`** -- semantic recall over indexed snippets. Uses the active
  memory backend's search pipeline (vector similarity, keyword matching, or
  hybrid).
- **`memory_get`** -- targeted read of a specific Markdown file or line range.
  Degrades gracefully when a file does not exist (returns empty text instead of
  an error).

## Memory backends

OpenClaw supports two memory backends that control how `memory_search` indexes
and retrieves content:

### Builtin (default)

The builtin backend uses a per-agent SQLite database with optional extensions:

- **FTS5 full-text search** for keyword matching (BM25 scoring).
- **sqlite-vec** for in-database vector similarity (falls back to in-process
  cosine similarity when unavailable).
- **Hybrid search** combining BM25 + vector scores for best-of-both-worlds
  retrieval.
- **CJK support** via configurable trigram tokenization with short-substring
  fallback.

The builtin backend works out of the box with no extra dependencies. For
embedding vectors, configure an embedding provider (OpenAI, Gemini, Voyage,
Mistral, Ollama, or local GGUF). Without an embedding provider, only keyword
search is available.

Index location: `~/.openclaw/memory/<agentId>.sqlite`

### QMD (experimental)

[QMD](https://github.com/tobi/qmd) is a local-first search sidecar that
combines BM25 + vectors + reranking in a single binary. Set
`memory.backend = "qmd"` to opt in.

Key differences from the builtin backend:

- Runs as a subprocess (Bun + node-llama-cpp), auto-downloads GGUF models.
- Supports advanced post-processing: reranking, query expansion.
- Can index extra directories beyond the workspace (`memory.qmd.paths`).
- Can optionally index session transcripts (`memory.qmd.sessions`).
- Falls back to the builtin backend if QMD is unavailable.

QMD requires a separate install (`bun install -g https://github.com/tobi/qmd`)
and a SQLite build that allows extensions. See the
[Memory configuration reference](/reference/memory-config) for full setup.

## Memory search

When an embedding provider is configured, `memory_search` uses semantic vector
search to find relevant notes even when the wording differs from the query.
Hybrid search (BM25 + vector) is enabled by default when both FTS5 and
embeddings are available.

For details on how search works -- embedding providers, hybrid scoring, MMR
diversity re-ranking, temporal decay, and tuning -- see
[Memory Search](/concepts/memory-search).

### Embedding provider auto-selection

If `memorySearch.provider` is not set, OpenClaw auto-selects the first available
provider in this order:

1. `local` -- if `memorySearch.local.modelPath` is configured and exists.
2. `openai` -- if an OpenAI key can be resolved.
3. `gemini` -- if a Gemini key can be resolved.
4. `voyage` -- if a Voyage key can be resolved.
5. `mistral` -- if a Mistral key can be resolved.

If none can be resolved, memory search stays disabled until configured. Ollama
is supported but not auto-selected (set `memorySearch.provider = "ollama"`
explicitly).
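
The selection order amounts to a first-match scan. A sketch (the environment
variable names here are illustrative assumptions -- the real resolver also
checks auth profiles and `models.providers.*.apiKey`):

```python
import os

def pick_embedding_provider(config, env):
    """First-match auto-selection in the documented priority order (sketch)."""
    if config.get("provider"):  # explicit setting always wins
        return config["provider"]
    local_path = config.get("local", {}).get("modelPath")
    if local_path and os.path.exists(local_path):
        return "local"
    for provider, var in [
        ("openai", "OPENAI_API_KEY"),     # illustrative env-var names
        ("gemini", "GEMINI_API_KEY"),
        ("voyage", "VOYAGE_API_KEY"),
        ("mistral", "MISTRAL_API_KEY"),
    ]:
        if env.get(var):
            return provider
    return None  # memory search stays disabled until configured

print(pick_embedding_provider({}, {"GEMINI_API_KEY": "key"}))  # gemini
```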

## Additional memory paths

Index Markdown files outside the default workspace layout:

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        extraPaths: ["../team-docs", "/srv/shared-notes/overview.md"],
      },
    },
  },
}
```

Paths can be absolute or workspace-relative. Directories are scanned
recursively for `.md` files. Symlinks are ignored.

## Multimodal memory (Gemini)

When using `gemini-embedding-2-preview`, OpenClaw can index image and audio
files from `memorySearch.extraPaths`:

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "gemini",
        model: "gemini-embedding-2-preview",
        extraPaths: ["assets/reference", "voice-notes"],
        multimodal: {
          enabled: true,
          modalities: ["image", "audio"],
        },
      },
    },
  },
}
```

Search queries remain text, but Gemini can compare them against indexed
image/audio embeddings. `memory_get` still reads Markdown only.

See the [Memory configuration reference](/reference/memory-config) for supported
formats and limitations.

## Automatic memory flush

When a session is close to auto-compaction, OpenClaw runs a **silent turn** that
reminds the model to write durable notes before the context is summarized. This
prevents important information from being lost during compaction.

Controlled by `agents.defaults.compaction.memoryFlush`:

```json5
{
  agents: {
    defaults: {
      compaction: {
        reserveTokensFloor: 20000,
        memoryFlush: {
          systemPrompt: "Session nearing compaction. Store durable memories now.",
          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store.",
          enabled: true, // default
          softThresholdTokens: 4000, // how far below compaction threshold to trigger
        },
      },
    },
  },
}
```

Details:

- **Triggers** when context usage crosses
  `contextWindow - reserveTokensFloor - softThresholdTokens`.
- **Runs silently** -- prompts include `NO_REPLY` so nothing is delivered to the
  user.
- **Once per compaction cycle** (tracked in `sessions.json`).
- **Skipped** when the workspace is read-only (`workspaceAccess: "ro"` or
  `"none"`).
- The active memory plugin owns the flush prompt and path policy. The default
  `memory-core` plugin writes to `memory/YYYY-MM-DD.md`.

For the full compaction lifecycle, see [Compaction](/concepts/compaction).

## CLI commands

| Command                          | Description                                |
| -------------------------------- | ------------------------------------------ |
| `openclaw memory status`         | Show memory index status and provider info |
| `openclaw memory search <query>` | Search memory from the command line        |
| `openclaw memory index`          | Force a reindex of memory files            |

Add `--agent <id>` to target a specific agent, `--deep` for extended
diagnostics, or `--json` for machine-readable output.

See [CLI: memory](/cli/memory) for the full command reference.

## Further reading

- [Memory Search](/concepts/memory-search) -- how search works, hybrid search,
  MMR, temporal decay
- [Memory configuration reference](/reference/memory-config) -- all config knobs
  for providers, QMD, hybrid search, batch indexing, and multimodal
- [Compaction](/concepts/compaction) -- how compaction interacts with memory
  flush
- [Session Management Deep Dive](/reference/session-management-compaction) --
  internal session and compaction lifecycle

Docs sidebar config (adds the new `concepts/memory-search` page):

```
"concepts/session-pruning",
"concepts/session-tool",
"concepts/memory",
"concepts/memory-search",
"concepts/compaction"
]
},
```