mirror of https://github.com/openclaw/openclaw.git
Self Evolve
self-evolve is a self-learning plugin for openclaw. It learns new skills algorithmically while spending fewer tokens:
- Retrieves episodic memories before answering and prepends them to prompt context.
- Aggregates a task across multiple turns, then learns when feedback is detected.
- Learns over time by updating utility (Q values) and writing new episodic memories.
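The Q-value learning in the last bullet can be sketched as a simple running-utility update. This is a hypothetical illustration, not the plugin's actual internals: the `EpisodicMemory` shape, the update rule, and the learning rate `alpha` are all assumptions.

```python
# Hypothetical sketch: each episodic memory carries a Q value that drifts
# toward the reward observed from user feedback.
from dataclasses import dataclass

@dataclass
class EpisodicMemory:
    summary: str      # condensed description of what was tried
    q: float = 0.0    # running utility estimate

def update_q(memory: EpisodicMemory, reward: float, alpha: float = 0.3) -> float:
    """Move Q toward the scored reward by a fraction alpha (assumed rule)."""
    memory.q += alpha * (reward - memory.q)
    return memory.q

m = EpisodicMemory(summary="search the repo before editing")
update_q(m, reward=1.0)  # positive feedback raises Q
update_q(m, reward=1.0)  # repeated praise keeps raising it, with diminishing steps
```

Under this rule, repeated positive feedback asymptotically pushes Q toward 1, while negative feedback pulls it down, which is what lets retrieval later prefer strategies that worked.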
Quick Start
Recommended: upgrade to openclaw 2026.3.2+ before using this plugin. Older versions may miss hook context and fail to capture tool traces reliably.
- Install plugin
openclaw plugins uninstall self-evolve
openclaw plugins install /path/to/self-evolve
- Set env var
export OPENAI_API_KEY=sk-xxx
- One-shot config
openclaw config set plugins.entries.self-evolve '{"enabled":true,"config":{"embedding":{"provider":"openai","apiKey":"${OPENAI_API_KEY}","model":"text-embedding-3-small","dimensions":512},"reward":{"provider":"openai","apiKey":"${OPENAI_API_KEY}","model":"gpt-4.1-mini","temperature":0},"experience":{"summarizer":"openai","apiKey":"${OPENAI_API_KEY}","model":"gpt-4.1-mini","temperature":0}}}'
- Restart and verify
- Restart gateway.
- Check logs for:
self-evolve: initialized ...
self-evolve: feedback scored ... learn=true
Feedback Tips
- Praise clearly when it works (for positive reinforcement).
- Point out clearly when it fails (to down-rank bad strategies).
- Explicit feedback is better than vague messages like "ok".
How It Works
before_prompt_build
- Manages a pending task state (open/waiting_feedback).
- Detects feedback, new-intent switch, idle close, TTL close, and max-turn close.
- Builds an embedding and retrieves candidate memories.
- If candidates exist, injects <self-evolve-memories>; if not, still keeps the task pending (bootstrap).
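The retrieval-and-inject step above can be sketched as follows. The cosine scoring, the list-of-tuples memory store, and the exact injected text are illustrative assumptions; the real plugin uses the configured embedding provider and its own storage.

```python
# Illustrative sketch: score stored memories against the prompt embedding and
# inject a <self-evolve-memories> block only when the best match is good enough.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def build_injection(query_vec, memories, tau=0.85):
    """memories: list of (embedding, text). Returns the block to prepend, or None."""
    scored = sorted(((cosine(query_vec, vec), text) for vec, text in memories),
                    reverse=True)
    if not scored or scored[0][0] < tau:
        return None  # below the gate: inject nothing, task stays pending
    lines = [text for sim, text in scored if sim >= tau]
    return "<self-evolve-memories>\n" + "\n".join(lines) + "\n</self-evolve-memories>"
```

The `tau` default mirrors the documented `retrieval.tau=0.85` gate: only the prompt's closest memories are ever prepended, keeping token cost low.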
agent_end
- Captures the assistant response and moves the task to waiting_feedback.

Later user messages:
- If feedback is detected, scores a reward and decides whether to learn.
- If the reward, mode, and intent gates pass, updates Q and appends an episodic memory.
- If the message looks like a new request, the current task can be closed and a new one started.
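The gate check described above can be sketched like this. The parameter names mirror the documented config keys (`runtime.minAbsReward` and friends), but how the plugin actually combines them is an assumption based on this README:

```python
# Hedged sketch of the learning gates: a turn is only learned when the reward
# gate passes and the active learnMode allows it.
def should_learn(reward: float, confidence: float, used_tools: bool,
                 learn_mode: str = "balanced",
                 min_abs_reward: float = 0.15,
                 min_reward_confidence: float = 0.55,
                 no_tool_min_abs_reward: float = 0.8,
                 no_tool_min_reward_confidence: float = 0.9) -> bool:
    if abs(reward) < min_abs_reward or confidence < min_reward_confidence:
        return False                # reward gate failed
    if learn_mode == "tools_only":
        return used_tools           # learn only tool turns
    if learn_mode == "balanced" and not used_tools:
        # no-tool turns need much stronger evidence to be worth a memory
        return (abs(reward) >= no_tool_min_abs_reward
                and confidence >= no_tool_min_reward_confidence)
    return True                     # "all" mode, or a balanced tool turn
```

The defaults here are the ones listed under Advanced Settings below; a weak or low-confidence reward never produces a memory in any mode.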
Advanced Settings
Default learning gates:
- runtime.observeTurns=0
- runtime.minAbsReward=0.15
- runtime.minRewardConfidence=0.55
- runtime.minFeedbackChars has been removed.
Default retrieval gate:
- retrieval.tau=0.85 (only inject memories when the best similarity is high enough)
Learning modes (runtime.learnMode):
- balanced (default): prefer tool turns; no-tool turns require high reward/confidence.
- tools_only: learn only when tools were called (lowest token cost).
- all: learn all turns that pass reward gates (highest token cost).
Balanced-mode no-tool thresholds:
- runtime.noToolMinAbsReward=0.8
- runtime.noToolMinRewardConfidence=0.9
Task boundary defaults:
- runtime.newIntentSimilarityThreshold=0.35
- runtime.idleTurnsToClose=2
- runtime.pendingTtlMs=300000 (5 minutes)
- runtime.maxTurnsPerTask=5
Switch mode:
openclaw config set plugins.entries.self-evolve.config.runtime.learnMode '"tools_only"'
openclaw config set plugins.entries.self-evolve.config.runtime.learnMode '"all"'
openclaw config set plugins.entries.self-evolve.config.runtime.learnMode '"balanced"'
Memory retention:
- Default memory.maxEntries=200.
- Over the limit, keep higher-value memories (Q/success/recency/selectedCount), dedupe near-duplicates, and reserve a small fresh quota.
openclaw config set plugins.entries.self-evolve.config.memory.maxEntries 200
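The retention pass can be sketched as a score-and-dedupe pruner. The value weights and the near-duplicate key below are illustrative assumptions; the plugin's actual scoring blends the same signals (Q, success, recency, selectedCount) in its own way:

```python
# Hypothetical sketch: rank memories by a blended value, drop near-duplicates,
# keep the top maxEntries.
def prune_memories(memories: list[dict], max_entries: int = 200) -> list[dict]:
    def value(m: dict) -> float:
        return (m["q"]                        # learned utility
                + (1.0 if m["success"] else 0.0)
                + 0.5 * m["recency"]          # assumed normalized to 0..1
                + 0.1 * m["selected_count"])  # how often retrieval picked it
    seen, kept = set(), []
    for m in sorted(memories, key=value, reverse=True):
        key = m["summary"].strip().lower()    # crude near-duplicate key
        if key in seen:
            continue                          # dedupe: keep the higher-value copy
        seen.add(key)
        kept.append(m)
        if len(kept) == max_entries:
            break
    return kept
```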
References
Citation:
@misc{zhang2026memrlselfevolvingagentsruntime,
title = {MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory},
author = {Shengtao Zhang and Jiaqian Wang and Ruiwen Zhou and Junwei Liao and Yuchen Feng and Weinan Zhang and Ying Wen and Zhiyu Li and Feiyu Xiong and Yutao Qi and Bo Tang and Muning Wen},
year = {2026},
eprint = {2601.03192},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2601.03192},
}
License
MIT