openclaw/extensions/guardian
Albert ba28dbc016 feat(guardian): add LLM-based intent-alignment guardian plugin
Guardian intercepts tool calls via before_tool_call hook and sends them
to a separate LLM for review — blocks actions the user never requested,
defending against prompt injection attacks.

Key design decisions:
- Conversation turns (user + assistant pairs) give guardian context to
  understand confirmations like "yes" / "go ahead"
- Assistant replies are explicitly marked as untrusted in the prompt to
  prevent poisoning attacks from propagating
- Provider resolution uses SDK (not hardcoded list) with 3-layer
  fallback: explicit config → models.json → pi-ai built-in database
- Lazy resolution pattern for async provider/auth lookup in sync register()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 12:32:34 +08:00
..
guardian-client.test.ts feat(guardian): add LLM-based intent-alignment guardian plugin 2026-03-15 12:32:34 +08:00
guardian-client.ts feat(guardian): add LLM-based intent-alignment guardian plugin 2026-03-15 12:32:34 +08:00
index.test.ts feat(guardian): add LLM-based intent-alignment guardian plugin 2026-03-15 12:32:34 +08:00
index.ts feat(guardian): add LLM-based intent-alignment guardian plugin 2026-03-15 12:32:34 +08:00
message-cache.test.ts feat(guardian): add LLM-based intent-alignment guardian plugin 2026-03-15 12:32:34 +08:00
message-cache.ts feat(guardian): add LLM-based intent-alignment guardian plugin 2026-03-15 12:32:34 +08:00
openclaw.plugin.json feat(guardian): add LLM-based intent-alignment guardian plugin 2026-03-15 12:32:34 +08:00
package.json feat(guardian): add LLM-based intent-alignment guardian plugin 2026-03-15 12:32:34 +08:00
prompt.test.ts feat(guardian): add LLM-based intent-alignment guardian plugin 2026-03-15 12:32:34 +08:00
prompt.ts feat(guardian): add LLM-based intent-alignment guardian plugin 2026-03-15 12:32:34 +08:00
types.test.ts feat(guardian): add LLM-based intent-alignment guardian plugin 2026-03-15 12:32:34 +08:00
types.ts feat(guardian): add LLM-based intent-alignment guardian plugin 2026-03-15 12:32:34 +08:00