openclaw

Commit Graph

Author	SHA1	Message	Date
ShengtongZhu	f4488a73ff	fix(guardian): stricter ALLOW/BLOCK verdict parsing in guardian response Require a delimiter (colon, space, or end of line) after ALLOW/BLOCK keywords. Previously `startsWith("ALLOW")` would match words like "ALLOWING" or "ALLOWANCE", potentially causing a false ALLOW verdict if the model's response started with such a word. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 12:32:47 +08:00
ShengtongZhu	6a3220b0c6	feat(guardian): enhance context awareness and add conversation summarization - Add rolling conversation summary generation to provide long-term context without token waste - Extract standing instructions and available skills from system prompt for better decision context - Support thinking block extraction for reasoning model responses (e.g. kimi-coding) - Add config options for context tools, recent turns, and tool result length - Implement lazy context extraction with live message array reference - Skip guardian review for system triggers (heartbeat, cron) - Improve error handling for abort race conditions and timeout scenarios - Normalize headers in model-auth to handle secret inputs consistently - Update documentation with comprehensive usage guide and security model	2026-03-15 12:32:47 +08:00
Albert	1c6b5d7b72	refactor(guardian): use pi-ai completeSimple, improve prompt and logging - Replace 3 raw fetch() API call functions (OpenAI, Anthropic, Google) with a single pi-ai completeSimple() call, ensuring consistent HTTP behavior (User-Agent, auth, retry) with the main model - Remove authMode field — pi-ai auto-detects OAuth from API key prefix - Rewrite system prompt for strict single-line output format, add "Do NOT change your mind" and "Do NOT output reasoning" constraints - Move decision guidelines to system prompt, add multi-step workflow awareness (intermediate read steps should be ALLOWed) - Simplify user prompt — remove inline examples and criteria - Use forward scanning in parseGuardianResponse for security (model's verdict appears first, attacker-injected text appears after) - Add prominent BLOCK logging via logger.error with full conversation context dump (████ banner, all turns, tool arguments) - Remove 800-char assistant message truncation limit - Increase default max_user_messages from 3 to 10 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 12:32:34 +08:00
Albert	ba28dbc016	feat(guardian): add LLM-based intent-alignment guardian plugin Guardian intercepts tool calls via before_tool_call hook and sends them to a separate LLM for review — blocks actions the user never requested, defending against prompt injection attacks. Key design decisions: - Conversation turns (user + assistant pairs) give guardian context to understand confirmations like "yes" / "go ahead" - Assistant replies are explicitly marked as untrusted in the prompt to prevent poisoning attacks from propagating - Provider resolution uses SDK (not hardcoded list) with 3-layer fallback: explicit config → models.json → pi-ai built-in database - Lazy resolution pattern for async provider/auth lookup in sync register() Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 12:32:34 +08:00

4 Commits
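
Two of the commits above describe how the guardian's verdict is read: f4488a73ff requires a delimiter (colon, space, or end of line) after ALLOW/BLOCK, and 1c6b5d7b72 scans forward so the model's own verdict is found before any attacker-injected text. The following is a minimal sketch of that idea, not the plugin's actual code. The names `Verdict`, `VERDICT_RE`, and `parseVerdict` are hypothetical, and the choices to match case-sensitively, to accept any whitespace as the delimiter, and to return `null` when nothing matches are assumptions.

```typescript
// Hypothetical sketch; names and behavior are illustrative, not taken from the openclaw source.
type Verdict = "ALLOW" | "BLOCK";

// Accept the keyword only when it is followed by a colon, whitespace, or end of line.
// A plain startsWith("ALLOW") would also accept "ALLOWING" or "ALLOWANCE".
const VERDICT_RE = /^(ALLOW|BLOCK)(?=:|\s|$)/;

function parseVerdict(response: string): Verdict | null {
  // Forward scan: the first line that carries a well-formed verdict wins,
  // so text appended after the verdict cannot override it.
  for (const line of response.split(/\r?\n/)) {
    const match = VERDICT_RE.exec(line.trim());
    if (match) {
      return match[1] as Verdict;
    }
  }
  // Assumption: no verdict found is reported as null and the caller decides what to do.
  return null;
}
```

With this sketch, `parseVerdict("ALLOWING the request")` yields `null`, while `parseVerdict("BLOCK: not requested\nALLOW: injected")` yields `"BLOCK"` because the first verdict line is returned.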