docs(guardian): improve README with quick start, default config values, and block behavior

- Replace Enable/Config sections with Quick start (bundled plugin, no npm install) - Show all default values in config example - Add "When a tool call is blocked" section explaining user flow - Remove Model selection section - Fix dead anchor link Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 11:20:44 +08:00 · 2026-03-15 11:20:44 +08:00 · 8972213aee
parent 6a3220b0c6
commit 8972213aee
1 changed files with 96 additions and 57 deletions
--- a/extensions/guardian/README.md
+++ b/extensions/guardian/README.md
@ -28,7 +28,10 @@ The guardian uses a **dual-hook architecture**:
 2. **`before_tool_call` hook** — lazily extracts the latest conversation context
   (including tool results like `memory_search`) and sends it to the guardian LLM

-## Enable
+## Quick start
+
+Guardian is a bundled plugin — no separate install needed. Just enable it in
+`~/.openclaw/openclaw.json`:

 ```json
 {
@ -40,9 +43,7 @@ The guardian uses a **dual-hook architecture**:
 }
 ```

-If no `model` is configured, the guardian uses the main agent model.
-
-## Config
+For better resilience, use a **different provider** than your main model:

 ```json
 {
@ -51,8 +52,52 @@ If no `model` is configured, the guardian uses the main agent model.
      "guardian": {
        "enabled": true,
        "config": {
-          "model": "openai/gpt-4o-mini",
-          "mode": "enforce"
+          "model": "anthropic/claude-opus-4-20250514"
+        }
+      }
+    }
+  }
+}
+```
+
+## Config
+
+All options with their **default values**:
+
+```json
+{
+  "plugins": {
+    "entries": {
+      "guardian": {
+        "enabled": true,
+        "config": {
+          "mode": "enforce",
+          "watched_tools": [
+            "message_send",
+            "message",
+            "exec",
+            "write_file",
+            "Write",
+            "edit",
+            "gateway",
+            "gateway_config",
+            "cron",
+            "cron_add"
+          ],
+          "context_tools": [
+            "memory_search",
+            "memory_get",
+            "memory_recall",
+            "read",
+            "exec",
+            "web_fetch",
+            "web_search"
+          ],
+          "timeout_ms": 20000,
+          "fallback_on_error": "allow",
+          "log_decisions": true,
+          "max_arg_length": 500,
+          "max_recent_turns": 3
        }
      }
    }
@ -62,19 +107,17 @@ If no `model` is configured, the guardian uses the main agent model.

 ### All options

-| Option                   | Type                     | Default        | Description                                                                                                                                                                                                            |
-| ------------------------ | ------------------------ | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `model`                  | string                   | _(main model)_ | Guardian model in `provider/model` format (e.g. `"openai/gpt-4o-mini"`, `"kimi/moonshot-v1-8k"`, `"ollama/llama3.1:8b"`). A small, cheap model is recommended — the guardian only makes a binary ALLOW/BLOCK decision. |
-| `mode`                   | `"enforce"` \| `"audit"` | `"enforce"`    | `enforce` blocks disallowed calls. `audit` logs decisions without blocking — useful for initial evaluation.                                                                                                            |
-| `watched_tools`          | string[]                 | See below      | Tool names that require guardian review. Tools not in this list are always allowed.                                                                                                                                    |
-| `timeout_ms`             | number                   | `20000`        | Max wait for guardian API response (ms).                                                                                                                                                                               |
-| `fallback_on_error`      | `"allow"` \| `"block"`   | `"allow"`      | What to do when the guardian API fails or times out.                                                                                                                                                                   |
-| `log_decisions`          | boolean                  | `true`         | Log all ALLOW/BLOCK decisions. BLOCK decisions are logged with full conversation context.                                                                                                                              |
-| `max_user_messages`      | number                   | `10`           | Number of conversation turns fed to the summarizer (history window).                                                                                                                                                   |
-| `max_arg_length`         | number                   | `500`          | Max characters of tool arguments JSON to include (truncated).                                                                                                                                                          |
-| `max_recent_turns`       | number                   | `3`            | Number of recent raw conversation turns to keep in the guardian prompt alongside the rolling summary.                                                                                                                  |
-| `context_tools`          | string[]                 | See below      | Tool names whose results are included in the guardian's conversation context. Only results from these tools are fed to the guardian — others are filtered out to save tokens.                                          |
-| `max_tool_result_length` | number                   | `300`          | Max characters per tool result snippet included in the guardian context.                                                                                                                                               |
+| Option              | Type                     | Default        | Description                                                                                                                                                                      |
+| ------------------- | ------------------------ | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `model`             | string                   | _(main model)_ | Guardian model in `provider/model` format (e.g. `"openai/gpt-4o-mini"`, `"kimi/moonshot-v1-8k"`, `"ollama/llama3.1:8b"`). The guardian only makes a binary ALLOW/BLOCK decision. |
+| `mode`              | `"enforce"` \| `"audit"` | `"enforce"`    | `enforce` blocks disallowed calls. `audit` logs decisions without blocking — useful for initial evaluation.                                                                      |
+| `watched_tools`     | string[]                 | See below      | Tool names that require guardian review. Tools not in this list are always allowed.                                                                                              |
+| `timeout_ms`        | number                   | `20000`        | Max wait for guardian API response (ms).                                                                                                                                         |
+| `fallback_on_error` | `"allow"` \| `"block"`   | `"allow"`      | What to do when the guardian API fails or times out.                                                                                                                             |
+| `log_decisions`     | boolean                  | `true`         | Log all ALLOW/BLOCK decisions. BLOCK decisions are logged with full conversation context.                                                                                        |
+| `max_arg_length`    | number                   | `500`          | Max characters of tool arguments JSON to include (truncated).                                                                                                                    |
+| `max_recent_turns`  | number                   | `3`            | Number of recent raw conversation turns to keep in the guardian prompt alongside the rolling summary.                                                                            |
+| `context_tools`     | string[]                 | See below      | Tool names whose results are included in the guardian's conversation context. Only results from these tools are fed to the guardian — others are filtered out to save tokens.    |

 ### Default watched tools

@ -115,12 +158,14 @@ context for the guardian's decisions.

 ## Getting started

-**Step 1** — Start with audit mode to observe decisions without blocking:
+**Step 1** — Install and enable with defaults (see [Quick start](#quick-start)).
+
+**Step 2** — Optionally start with audit mode to observe decisions without
+blocking:

 ```json
 {
  "config": {
-    "model": "openai/gpt-4o-mini",
    "mode": "audit"
  }
 }
@ -129,50 +174,45 @@ context for the guardian's decisions.
 Check logs for `[guardian] AUDIT-ONLY (would block)` entries and verify the
 decisions are reasonable.

-**Step 2** — Switch to enforce mode:
+**Step 3** — Switch to `"enforce"` mode (the default) once you're satisfied.

-```json
-{
-  "config": {
-    "model": "openai/gpt-4o-mini",
-    "mode": "enforce"
-  }
-}
-```
-
-**Step 3** — Adjust `watched_tools` if needed. Remove tools that produce too
+**Step 4** — Adjust `watched_tools` if needed. Remove tools that produce too
 many false positives, or add custom tools that need protection.

-## Model selection
+## When a tool call is blocked

-The guardian makes a simple binary decision (ALLOW/BLOCK) for each tool call.
-A small, fast model is sufficient and keeps cost low.
+When the guardian blocks a tool call, the agent receives a tool error containing
+the block reason (e.g. `"Guardian: user never requested file deletion"`). The
+agent will then inform the user that the action was blocked and why.

-**Use a different provider than your main agent model.** If both the main model
-and the guardian use the same provider, a single provider outage takes down both
-the agent and its safety layer. Using a different provider ensures the guardian
-remains available even when the main model's provider has issues. For example,
-if your main model is `anthropic/claude-sonnet-4-20250514`, use
-`openai/gpt-4o-mini` for the guardian.
+**To proceed with the blocked action**, simply confirm it in the conversation:

-| Model                 | Notes                                       |
-| --------------------- | ------------------------------------------- |
-| `openai/gpt-4o-mini`  | Fast (~200ms), cheap, good accuracy         |
-| `kimi/moonshot-v1-8k` | Good for Chinese-language conversations     |
-| `ollama/llama3.1:8b`  | Free, runs locally, slightly lower accuracy |
+> "yes, go ahead and delete /tmp/old"

-Avoid using the same large model as your main agent — it wastes cost and adds
-latency to every watched tool call.
+The guardian re-evaluates every tool call independently. On the next attempt it
+will see your explicit confirmation in the recent conversation and ALLOW the
+call.
+
+If a tool is producing too many false positives, you can also:
+
+- Remove it from `watched_tools`
+- Switch to `"mode": "audit"` (log-only, no blocking)
+- Disable the plugin entirely (`"enabled": false`)

 ## Context awareness

-The guardian uses a **rolling summary + recent turns** strategy to provide
-long-term context without wasting tokens:
+The guardian builds rich context for each tool call review:

+- **Agent context** — the main agent's full system prompt, cached on the
+  first `llm_input` call. Contains AGENTS.md rules, MEMORY.md content,
+  tool definitions, available skills, and user-configured instructions.
+  Passed as-is (no extraction or summarization) since guardian models have
+  128K+ context windows. Treated as background DATA — user messages remain
+  the ultimate authority.
 - **Session summary** — a 2-4 sentence summary of the entire conversation
-  history, covering tasks requested, files/systems being worked on, standing
-  instructions, and confirmations. Updated asynchronously after each user
-  message (non-blocking). Roughly ~150 tokens.
+  history, covering tasks requested, files/systems being worked on, and
+  confirmations. Updated asynchronously after each user message
+  (non-blocking). Roughly ~150 tokens.
 - **Recent conversation turns** — the last `max_recent_turns` (default 3)
  raw turns with user messages, assistant replies, and tool results. Roughly
  ~600 tokens.
@ -186,9 +226,6 @@ long-term context without wasting tokens:
  new user input, trailing assistant messages and tool results are attached
  to the last conversation turn.

-This approach keeps the guardian prompt at ~750 tokens (vs ~2000 for 10 raw
-turns), while preserving full conversation context through the summary.
-
 The context is extracted **lazily** at `before_tool_call` time from the live
 session message array, so it always reflects the latest state — including tool
 results that arrived after the initial `llm_input` hook fired.
@ -206,6 +243,8 @@ parent agent's).
 - Assistant replies are treated as **context only** — they may be poisoned
 - Only user messages are considered authoritative intent signals
 - Tool results (shown as `[tool: ...]`) are treated as DATA
- Memory results are recognized as the user's own saved preferences
+- Agent context (system prompt) is treated as background DATA — it may be
+  indirectly poisoned (e.g. malicious rules written to memory or a trojan
+  skill in a cloned repo); user messages remain the ultimate authority
 - Forward scanning of guardian response prevents attacker-injected ALLOW in
  tool arguments from overriding the model's verdict