mirror of https://github.com/openclaw/openclaw.git
Self Evolve
self-evolve is a self-learning plugin for openclaw. It learns new skills algorithmically while spending fewer tokens:
- Retrieves episodic memories before answering and prepends them to prompt context.
- Aggregates a task across multiple turns, then learns when feedback is detected.
- Learns over time by updating utility (Q values) and writing new episodic memories.
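The Q-value learning in the last bullet can be sketched as a simple running-utility update. This is a hypothetical illustration, not the plugin's actual internals: the `EpisodicMemory` shape, the update rule, and the learning rate `alpha` are all assumptions.

```python
# Hypothetical sketch: each episodic memory carries a Q value that drifts
# toward the reward observed from user feedback.
from dataclasses import dataclass

@dataclass
class EpisodicMemory:
    summary: str      # condensed description of what was tried
    q: float = 0.0    # running utility estimate

def update_q(memory: EpisodicMemory, reward: float, alpha: float = 0.3) -> float:
    """Move Q toward the scored reward by a fraction alpha (assumed rule)."""
    memory.q += alpha * (reward - memory.q)
    return memory.q

m = EpisodicMemory(summary="search the repo before editing")
update_q(m, reward=1.0)  # positive feedback raises Q
update_q(m, reward=1.0)  # repeated praise keeps raising it, with diminishing steps
```

Under this rule, repeated positive feedback asymptotically pushes Q toward 1, while negative feedback pulls it down, which is what lets retrieval later prefer strategies that worked.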
Quick Start
Recommended: upgrade to openclaw 2026.3.2+ before using this plugin. Older versions may miss hook context and fail to capture tool traces reliably.
- Install plugin
openclaw plugins uninstall self-evolve
openclaw plugins install /path/to/self-evolve
- Set env var
export OPENAI_API_KEY=sk-xxx
- One-shot config
openclaw config set plugins.entries.self-evolve '{"enabled":true,"config":{"embedding":{"provider":"openai","apiKey":"${OPENAI_API_KEY}","model":"text-embedding-3-small","dimensions":512},"reward":{"provider":"openai","apiKey":"${OPENAI_API_KEY}","model":"gpt-4.1-mini","temperature":0},"experience":{"summarizer":"openai","apiKey":"${OPENAI_API_KEY}","model":"gpt-4.1-mini","temperature":0}}}'
- Restart and verify
- Restart gateway.
- Check logs for:
self-evolve: initialized ...
self-evolve: feedback scored ... learn=true
Feedback Tips
- Praise clearly when it works (for positive reinforcement).
- Point out clearly when it fails (to down-rank bad strategies).
- Explicit feedback is better than vague messages like "ok".
How It Works
before_prompt_build
- Manages a pending task state (open/waiting_feedback).
- Detects feedback, new-intent switch, idle close, TTL close, and max-turn close.
- Builds an embedding and retrieves candidate memories.
- If candidates exist, injects <self-evolve-memories>; if not, still keeps the task pending (bootstrap).
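The retrieval-and-inject step above can be sketched as follows. The cosine scoring, the list-of-tuples memory store, and the exact injected text are illustrative assumptions; the real plugin uses the configured embedding provider and its own storage.

```python
# Illustrative sketch: score stored memories against the prompt embedding and
# inject a <self-evolve-memories> block only when the best match is good enough.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def build_injection(query_vec, memories, tau=0.85):
    """memories: list of (embedding, text). Returns the block to prepend, or None."""
    scored = sorted(((cosine(query_vec, vec), text) for vec, text in memories),
                    reverse=True)
    if not scored or scored[0][0] < tau:
        return None  # below the gate: inject nothing, task stays pending
    lines = [text for sim, text in scored if sim >= tau]
    return "<self-evolve-memories>\n" + "\n".join(lines) + "\n</self-evolve-memories>"
```

The `tau` default mirrors the documented `retrieval.tau=0.85` gate: only the prompt's closest memories are ever prepended, keeping token cost low.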
agent_end
- Captures the assistant response and moves the task to waiting_feedback.

Later user messages:
- If feedback is detected, scores a reward and decides whether to learn.
- If the reward, mode, and intent gates pass, updates Q and appends an episodic memory.
- If the message looks like a new request, the current task can be closed and a new one started.
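The gate check described above can be sketched like this. The parameter names mirror the documented config keys (`runtime.minAbsReward` and friends), but how the plugin actually combines them is an assumption based on this README:

```python
# Hedged sketch of the learning gates: a turn is only learned when the reward
# gate passes and the active learnMode allows it.
def should_learn(reward: float, confidence: float, used_tools: bool,
                 learn_mode: str = "balanced",
                 min_abs_reward: float = 0.15,
                 min_reward_confidence: float = 0.55,
                 no_tool_min_abs_reward: float = 0.8,
                 no_tool_min_reward_confidence: float = 0.9) -> bool:
    if abs(reward) < min_abs_reward or confidence < min_reward_confidence:
        return False                # reward gate failed
    if learn_mode == "tools_only":
        return used_tools           # learn only tool turns
    if learn_mode == "balanced" and not used_tools:
        # no-tool turns need much stronger evidence to be worth a memory
        return (abs(reward) >= no_tool_min_abs_reward
                and confidence >= no_tool_min_reward_confidence)
    return True                     # "all" mode, or a balanced tool turn
```

The defaults here are the ones listed under Advanced Settings below; a weak or low-confidence reward never produces a memory in any mode.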
Advanced Settings
Default learning gates:
- runtime.observeTurns=0
- runtime.minAbsReward=0.15
- runtime.minRewardConfidence=0.55
- runtime.minFeedbackChars has been removed.
Default retrieval gate:
- retrieval.tau=0.85 (only inject memories when the best similarity is high enough)
Learning modes (runtime.learnMode):
- balanced (default): prefer tool turns; no-tool turns require high reward/confidence.
- tools_only: learn only when tools were called (lowest token cost).
- all: learn all turns that pass reward gates (highest token cost).
Balanced-mode no-tool thresholds:
- runtime.noToolMinAbsReward=0.8
- runtime.noToolMinRewardConfidence=0.9
Task boundary defaults:
- runtime.newIntentSimilarityThreshold=0.35
- runtime.idleTurnsToClose=2
- runtime.pendingTtlMs=300000 (5 minutes)
- runtime.maxTurnsPerTask=5
Switch mode:
openclaw config set plugins.entries.self-evolve.config.runtime.learnMode '"tools_only"'
openclaw config set plugins.entries.self-evolve.config.runtime.learnMode '"all"'
openclaw config set plugins.entries.self-evolve.config.runtime.learnMode '"balanced"'
Memory retention:
- Default memory.maxEntries=200.
- Over the limit, keep higher-value memories (Q/success/recency/selectedCount), dedupe near-duplicates, and reserve a small fresh quota.
openclaw config set plugins.entries.self-evolve.config.memory.maxEntries 200
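The retention pass can be sketched as a score-and-dedupe pruner. The value weights and the near-duplicate key below are illustrative assumptions; the plugin's actual scoring blends the same signals (Q, success, recency, selectedCount) in its own way:

```python
# Hypothetical sketch: rank memories by a blended value, drop near-duplicates,
# keep the top maxEntries.
def prune_memories(memories: list[dict], max_entries: int = 200) -> list[dict]:
    def value(m: dict) -> float:
        return (m["q"]                        # learned utility
                + (1.0 if m["success"] else 0.0)
                + 0.5 * m["recency"]          # assumed normalized to 0..1
                + 0.1 * m["selected_count"])  # how often retrieval picked it
    seen, kept = set(), []
    for m in sorted(memories, key=value, reverse=True):
        key = m["summary"].strip().lower()    # crude near-duplicate key
        if key in seen:
            continue                          # dedupe: keep the higher-value copy
        seen.add(key)
        kept.append(m)
        if len(kept) == max_entries:
            break
    return kept
```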
References
Citation:
@misc{zhang2026memrlselfevolvingagentsruntime,
title = {MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory},
author = {Shengtao Zhang and Jiaqian Wang and Ruiwen Zhou and Junwei Liao and Yuchen Feng and Weinan Zhang and Ying Wen and Zhiyu Li and Feiyu Xiong and Yutao Qi and Bo Tang and Muning Wen},
year = {2026},
eprint = {2601.03192},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2601.03192},
}
License
MIT