openclaw/docs/concepts
Josh Palmer 7a6c40872d
Agents: add system prompt safety guardrails (#5445)
* 🤖 agents: add system prompt safety guardrails

What:
- add safety guardrails to system prompt
- update system prompt docs
- update prompt tests

Why:
- discourage power-seeking or self-modification behavior
- clarify safety/oversight priority when conflicts arise

Tests:
- pnpm lint (pass)
- pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent)
- pnpm test (not run; build failed)

* 🤖 agents: tighten safety wording for prompt guardrails

What:
- scope safety wording to system prompts/safety/tool policy changes
- document Safety inclusion in minimal prompt mode
- update safety prompt tests

Why:
- avoid blocking normal code changes or PR workflows
- keep prompt mode docs consistent with implementation

Tests:
- pnpm lint (pass)
- pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent)
- pnpm test (not run; build failed)

* 🤖 docs: note safety guardrails are soft

What:
- document system prompt safety guardrails as advisory
- add security note on prompt guardrails vs hard controls

Why:
- clarify threat model and operator expectations
- avoid implying prompt text is an enforcement layer

Tests:
- pnpm lint (pass)
- pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent)
- pnpm test (not run; build failed)
2026-01-31 15:50:15 +01:00
agent-loop.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
agent-workspace.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
agent.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
architecture.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
channel-routing.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
compaction.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
context.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
group-messages.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
groups.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
markdown-formatting.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
memory.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
messages.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
model-failover.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
model-providers.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
models.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
multi-agent.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
oauth.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
presence.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
queue.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
retry.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
session-pruning.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
session-tool.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
session.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
sessions.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
streaming.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
system-prompt.md Agents: add system prompt safety guardrails (#5445) 2026-01-31 15:50:15 +01:00
timezone.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
typebox.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
typing-indicators.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00
usage-tracking.md chore: Run `pnpm format:fix`. 2026-01-31 21:13:13 +09:00