Heartbeat prompts may arrive via historyMessages (as the last user message)
rather than via currentPrompt, depending on the agent loop stage. Check both
sources for system trigger detection so heartbeat tool calls are consistently
skipped regardless of how the prompt is delivered.
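A minimal sketch of the dual-source check described above. The event and message shapes, the marker string, and the function name are assumptions for illustration, not the plugin's actual API:

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

interface LlmInputEvent {
  currentPrompt?: string;
  historyMessages: ChatMessage[];
}

// Assumed marker text; the real heartbeat prompt may differ.
const HEARTBEAT_MARKER = "[heartbeat]";

// Check currentPrompt first, then fall back to the last user message
// in history, since either one may carry the heartbeat prompt
// depending on the agent loop stage.
function isSystemTrigger(event: LlmInputEvent): boolean {
  if (event.currentPrompt?.includes(HEARTBEAT_MARKER)) return true;
  const lastUser = [...event.historyMessages]
    .reverse()
    .find((m) => m.role === "user");
  return lastUser?.content.includes(HEARTBEAT_MARKER) ?? false;
}
```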
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
During a heartbeat cycle, llm_input fires multiple times: first with the
heartbeat prompt (isSystemTrigger=true), then without a prompt as the agent
loop continues after tool results. Previously the flag was unconditionally
rewritten on each llm_input, resetting to false when currentPrompt was
undefined — causing heartbeat tool calls to reach the guardian LLM
unnecessarily.
Now the hook preserves the existing isSystemTrigger value when
currentPrompt is empty or undefined, and only resets it when a real
user message arrives.
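The fix can be sketched as follows; the state object and handler signature are assumptions standing in for the plugin's real llm_input handler:

```typescript
interface GuardState {
  isSystemTrigger: boolean;
}

function onLlmInput(
  state: GuardState,
  currentPrompt: string | undefined,
  detect: (prompt: string) => boolean,
): void {
  // No prompt: the agent loop is continuing after tool results, so
  // keep whatever the last real prompt established.
  if (currentPrompt === undefined || currentPrompt === "") return;
  // A real prompt (heartbeat or user message) rewrites the flag.
  state.isSystemTrigger = detect(currentPrompt);
}
```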
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add rolling conversation summary generation to provide long-term context without token waste
- Extract standing instructions and available skills from system prompt for better decision context
- Support thinking block extraction for reasoning model responses (e.g. kimi-coding)
- Add config options for context tools, recent turns, and tool result length
- Implement lazy context extraction with live message array reference
- Skip guardian review for system triggers (heartbeat, cron)
- Improve error handling for abort race conditions and timeout scenarios
- Normalize headers in model-auth to handle secret inputs consistently
- Update documentation with comprehensive usage guide and security model
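The lazy extraction pattern from the list above can be sketched like this. All names here are illustrative assumptions; the point is that the class holds a live reference to the agent's message array and builds the guardian's view only when a tool call is actually reviewed, so messages appended after construction are still visible:

```typescript
interface Msg {
  role: string;
  content: string;
}

class LazyContext {
  // Keep the array reference itself, not a snapshot copy.
  constructor(private readonly messages: Msg[]) {}

  // Materialize roughly the last N turns (user + assistant pairs)
  // at review time.
  extract(recentTurns: number): Msg[] {
    return this.messages.slice(-recentTurns * 2);
  }
}
```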
When the main model is iterating autonomously (tool call → response →
tool call → ...) without new user input, assistant messages after the
last user message were being discarded. The guardian couldn't see what
the model had been doing, leading to potential misjudgments.
Now trailing assistant messages are appended to the last conversation
turn, giving the guardian full visibility into the model's recent
actions and reasoning during autonomous iteration.
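A sketch of the turn-building fix, with assumed message and turn shapes:

```typescript
interface ChatMsg {
  role: "user" | "assistant";
  content: string;
}

interface Turn {
  user: ChatMsg;
  assistant: ChatMsg[];
}

// Group messages into conversation turns. Assistant messages after
// the last user message attach to the final turn rather than being
// dropped, so autonomous iteration stays visible to the guardian.
function buildTurns(messages: ChatMsg[]): Turn[] {
  const turns: Turn[] = [];
  for (const m of messages) {
    if (m.role === "user") {
      turns.push({ user: m, assistant: [] });
    } else if (turns.length > 0) {
      turns[turns.length - 1].assistant.push(m);
    }
  }
  return turns;
}
```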
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace 3 raw fetch() API call functions (OpenAI, Anthropic, Google)
with a single pi-ai completeSimple() call, so HTTP behavior
(User-Agent, auth, retry) stays consistent with the main model
- Remove authMode field — pi-ai auto-detects OAuth from API key prefix
- Rewrite system prompt for strict single-line output format, add
"Do NOT change your mind" and "Do NOT output reasoning" constraints
- Move decision guidelines to system prompt, add multi-step workflow
awareness (intermediate read steps should be ALLOWed)
- Simplify user prompt — remove inline examples and criteria
- Use forward scanning in parseGuardianResponse for security (model's
verdict appears first, attacker-injected text appears after)
- Add prominent BLOCK logging via logger.error with full conversation
context dump (████ banner, all turns, tool arguments)
- Remove 800-char assistant message truncation limit
- Increase default max_user_messages from 3 to 10
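The forward-scanning parse from the list above can be sketched as below; the exact verdict vocabulary and signature of parseGuardianResponse are assumptions. The security property is that the guardian model emits its verdict first, while attacker text injected via tool output can only appear later, so a trailing "ALLOW" cannot override an earlier "BLOCK":

```typescript
function parseGuardianResponse(text: string): "ALLOW" | "BLOCK" | null {
  // Forward scan: the FIRST verdict token wins.
  const match = text.match(/\b(ALLOW|BLOCK)\b/);
  return match ? (match[1] as "ALLOW" | "BLOCK") : null;
}
```

A backward scan (last occurrence wins) would invert this property and let injected text decide the outcome.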
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Guardian intercepts tool calls via before_tool_call hook and sends them
to a separate LLM for review — blocks actions the user never requested,
defending against prompt injection attacks.
Key design decisions:
- Conversation turns (user + assistant pairs) give the guardian
context to understand confirmations like "yes" / "go ahead"
- Assistant replies are explicitly marked as untrusted in the prompt to
prevent poisoning attacks from propagating
- Provider resolution uses SDK (not hardcoded list) with 3-layer
fallback: explicit config → models.json → pi-ai built-in database
- Lazy resolution pattern for async provider/auth lookup in sync register()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>