docs(guardian): improve README with quick start, default config values, and block behavior

- Replace Enable/Config sections with Quick start (bundled plugin, no npm install)
- Show all default values in config example
- Add "When a tool call is blocked" section explaining user flow
- Remove Model selection section
- Fix dead anchor link

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ShengtongZhu 2026-03-15 11:20:44 +08:00
parent 6a3220b0c6
commit 8972213aee
1 changed file with 96 additions and 57 deletions


@@ -28,7 +28,10 @@ The guardian uses a **dual-hook architecture**:
2. **`before_tool_call` hook** — lazily extracts the latest conversation context
   (including tool results like `memory_search`) and sends it to the guardian LLM

## Quick start

Guardian is a bundled plugin — no separate install needed. Just enable it in
`~/.openclaw/openclaw.json`:

```json
{
@@ -40,9 +43,7 @@ The guardian uses a **dual-hook architecture**:
}
```

For better resilience, use a **different provider** than your main model:
```json
{
@@ -51,8 +52,52 @@ If no `model` is configured, the guardian uses the main agent model.
      "guardian": {
        "enabled": true,
        "config": {
          "model": "anthropic/claude-opus-4-20250514"
        }
      }
    }
  }
}
```
## Config
All options with their **default values**:
```json
{
  "plugins": {
    "entries": {
      "guardian": {
        "enabled": true,
        "config": {
          "mode": "enforce",
          "watched_tools": [
            "message_send",
            "message",
            "exec",
            "write_file",
            "Write",
            "edit",
            "gateway",
            "gateway_config",
            "cron",
            "cron_add"
          ],
          "context_tools": [
            "memory_search",
            "memory_get",
            "memory_recall",
            "read",
            "exec",
            "web_fetch",
            "web_search"
          ],
          "timeout_ms": 20000,
          "fallback_on_error": "allow",
          "log_decisions": true,
          "max_arg_length": 500,
          "max_recent_turns": 3
        }
      }
    }
  }
}
```
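You only need to specify the options you want to override; unspecified options presumably fall back to the defaults above. For example, a fail-closed variant (a sketch, assuming partial configs merge with defaults):

```json
{
  "plugins": {
    "entries": {
      "guardian": {
        "enabled": true,
        "config": {
          "timeout_ms": 10000,
          "fallback_on_error": "block"
        }
      }
    }
  }
}
```

With `"fallback_on_error": "block"`, a guardian outage stops watched tool calls rather than letting them through, at the cost of availability.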
@@ -62,19 +107,17 @@ If no `model` is configured, the guardian uses the main agent model.

### All options

| Option | Type | Default | Description |
| ------------------- | ------------------------ | -------------- | ----------- |
| `model` | string | _(main model)_ | Guardian model in `provider/model` format (e.g. `"openai/gpt-4o-mini"`, `"kimi/moonshot-v1-8k"`, `"ollama/llama3.1:8b"`). The guardian only makes a binary ALLOW/BLOCK decision. |
| `mode` | `"enforce"` \| `"audit"` | `"enforce"` | `enforce` blocks disallowed calls. `audit` logs decisions without blocking — useful for initial evaluation. |
| `watched_tools` | string[] | See below | Tool names that require guardian review. Tools not in this list are always allowed. |
| `timeout_ms` | number | `20000` | Max wait for guardian API response (ms). |
| `fallback_on_error` | `"allow"` \| `"block"` | `"allow"` | What to do when the guardian API fails or times out. |
| `log_decisions` | boolean | `true` | Log all ALLOW/BLOCK decisions. BLOCK decisions are logged with full conversation context. |
| `max_arg_length` | number | `500` | Max characters of tool arguments JSON to include (truncated). |
| `max_recent_turns` | number | `3` | Number of recent raw conversation turns to keep in the guardian prompt alongside the rolling summary. |
| `context_tools` | string[] | See below | Tool names whose results are included in the guardian's conversation context. Only results from these tools are fed to the guardian — others are filtered out to save tokens. |
### Default watched tools

@@ -115,12 +158,14 @@ context for the guardian's decisions.

## Getting started

**Step 1** — Install and enable with defaults (see [Quick start](#quick-start)).
**Step 2** — Optionally start with audit mode to observe decisions without
blocking:
```json
{
  "config": {
    "mode": "audit"
  }
}
```
@@ -129,50 +174,45 @@ context for the guardian's decisions.

Check logs for `[guardian] AUDIT-ONLY (would block)` entries and verify the
decisions are reasonable.
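An audit-mode log stream might look roughly like this (illustrative only: the exact message wording beyond the documented `[guardian] AUDIT-ONLY (would block)` prefix is an assumption):

```
[guardian] ALLOW exec
[guardian] AUDIT-ONLY (would block) message_send: user never requested outbound messages
```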
**Step 3** — Switch to `"enforce"` mode (the default) once you're satisfied.

**Step 4** — Adjust `watched_tools` if needed. Remove tools that produce too
many false positives, or add custom tools that need protection.
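A trimmed-down watch list with a custom tool added might look like this (a sketch; `deploy_prod` is a hypothetical custom tool name, and overriding `watched_tools` is assumed to replace the default list):

```json
{
  "plugins": {
    "entries": {
      "guardian": {
        "enabled": true,
        "config": {
          "watched_tools": [
            "message_send",
            "exec",
            "write_file",
            "deploy_prod"
          ]
        }
      }
    }
  }
}
```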
## When a tool call is blocked

When the guardian blocks a tool call, the agent receives a tool error containing
the block reason (e.g. `"Guardian: user never requested file deletion"`). The
agent will then inform the user that the action was blocked and why.

**To proceed with the blocked action**, simply confirm it in the conversation:

> "yes, go ahead and delete /tmp/old"
The guardian re-evaluates every tool call independently. On the next attempt it
will see your explicit confirmation in the recent conversation and ALLOW the
call.
If a tool is producing too many false positives, you can also:
- Remove it from `watched_tools`
- Switch to `"mode": "audit"` (log-only, no blocking)
- Disable the plugin entirely (`"enabled": false`)
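Disabling the plugin entirely is a one-line change against the same config layout (a sketch):

```json
{
  "plugins": {
    "entries": {
      "guardian": {
        "enabled": false
      }
    }
  }
}
```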
## Context awareness

The guardian builds rich context for each tool call review:
- **Agent context** — the main agent's full system prompt, cached on the
first `llm_input` call. Contains AGENTS.md rules, MEMORY.md content,
tool definitions, available skills, and user-configured instructions.
Passed as-is (no extraction or summarization) since guardian models have
128K+ context windows. Treated as background DATA — user messages remain
the ultimate authority.
- **Session summary** — a 2-4 sentence summary of the entire conversation
history, covering tasks requested, files/systems being worked on, and
confirmations. Updated asynchronously after each user message
(non-blocking). Roughly ~150 tokens.
- **Recent conversation turns** — the last `max_recent_turns` (default 3)
raw turns with user messages, assistant replies, and tool results. Roughly
~600 tokens.
@@ -186,9 +226,6 @@ long-term context without wasting tokens:
new user input, trailing assistant messages and tool results are attached
to the last conversation turn.
The context is extracted **lazily** at `before_tool_call` time from the live
session message array, so it always reflects the latest state — including tool
results that arrived after the initial `llm_input` hook fired.
@@ -206,6 +243,8 @@ parent agent's).
- Assistant replies are treated as **context only** — they may be poisoned
- Only user messages are considered authoritative intent signals
- Tool results (shown as `[tool: ...]`) are treated as DATA
- Agent context (system prompt) is treated as background DATA — it may be
indirectly poisoned (e.g. malicious rules written to memory or a trojan
skill in a cloned repo); user messages remain the ultimate authority
- Forward scanning of guardian response prevents attacker-injected ALLOW in
tool arguments from overriding the model's verdict