* fix: prune stale session entries, cap entry count, and rotate sessions.json
The sessions.json file grows unbounded over time. Every heartbeat tick (default: 30m)
triggers multiple full rewrites, and session keys from groups, threads, and DMs
accumulate indefinitely with large embedded objects (skillsSnapshot,
systemPromptReport). At >50MB the synchronous JSON parse blocks the event loop,
causing Telegram webhook timeouts and effectively taking the bot down.
Three mitigations, all running inside saveSessionStoreUnlocked() on every write:
1. Prune stale entries: remove entries with updatedAt older than 30 days
(configurable via session.maintenance.pruneDays in openclaw.json)
2. Cap entry count: keep only the 500 most recently updated entries
(configurable via session.maintenance.maxEntries). Entries without updatedAt
are evicted first.
3. File rotation: if the existing sessions.json exceeds 10MB before a write,
rename it to sessions.json.bak.{timestamp} and keep only the 3 most recent
backups (configurable via session.maintenance.rotateBytes).
All three thresholds are configurable under session.maintenance in openclaw.json
with Zod validation. No env vars.
Existing tests updated to use Date.now() instead of epoch-relative timestamps
(1, 2, 3) that would be incorrectly pruned as stale.
27 new tests covering pruning, capping, rotation, and integration scenarios.
* feat: auto-prune expired cron run sessions (#12289)
Add TTL-based reaper for isolated cron run sessions that accumulate
indefinitely in sessions.json.
New config option:
cron.sessionRetention: string | false (default: '24h')
The reaper runs piggy-backed on the cron timer tick, self-throttled
to sweep at most every 5 minutes. It removes session entries matching
the pattern cron:<jobId>:run:<uuid> whose updatedAt + retention < now.
Design follows the Kubernetes ttlSecondsAfterFinished pattern:
- Sessions are persisted normally (observability/debugging)
- A periodic reaper prunes expired entries
- Configurable retention with sensible default
- Set to false to disable pruning entirely
Files changed:
- src/config/types.cron.ts: Add sessionRetention to CronConfig
- src/config/zod-schema.ts: Add Zod validation for sessionRetention
- src/cron/session-reaper.ts: New reaper module (sweepCronRunSessions)
- src/cron/session-reaper.test.ts: 12 tests covering all paths
- src/cron/service/state.ts: Add cronConfig/sessionStorePath to deps
- src/cron/service/timer.ts: Wire reaper into onTimer tick
- src/gateway/server-cron.ts: Pass config and session store path to deps
Closes#12289
* fix: sweep cron session stores per agent
* docs: add changelog for session maintenance (#13083) (thanks @skyfallsin, @Glucksberg)
* fix: add warn-only session maintenance mode
* fix: warn-only maintenance defaults to active session
* fix: deliver maintenance warnings to active session
* docs: add session maintenance examples
* fix: accept duration and size maintenance thresholds
* refactor: share cron run session key check
* fix: format issues and replace defaultRuntime.warn with console.warn
---------
Co-authored-by: Pradeep Elankumaran <pradeepe@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: max <40643627+quotentiroler@users.noreply.github.com>
Co-authored-by: quotentiroler <max.nussbaumer@maxhealth.tech>
Problem:
When users execute `/think off`, they still receive `reasoning_content`
from models configured with `reasoning: true` (e.g., GLM-4.7, GLM-4.6,
Kimi K2.5, MiniMax-M2.1).
Expected: `/think off` should completely disable reasoning content.
Actual: Reasoning content is still returned.
Root Cause:
The directive handlers delete `sessionEntry.thinkingLevel` when user
executes `/think off`. This causes the thinking level to become undefined,
and the system falls back to `resolveThinkingDefault()`, which checks the
model catalog and returns "low" for reasoning-capable models, ignoring the
user's explicit intent.
Why We Must Persist "off" (Design Rationale):
1. **Model-dependent defaults**: Unlike other directives where "off" means
use a global default, `thinkingLevel` has model-dependent defaults:
- Reasoning-capable models (GLM-4.7, etc.) → default "low"
- Other models → default "off"
2. **Existing pattern**: The codebase already follows this pattern for
`elevatedLevel`, which persists "off" explicitly to override defaults
that may be "on". The comment explains:
"Persist 'off' explicitly so `/elevated off` actually overrides defaults."
3. **User intent**: When a user explicitly executes `/think off`, they want
to disable thinking regardless of the model's capabilities. Deleting the
field breaks this intent by falling back to the model's default.
Solution:
Persist "off" value instead of deleting the field in all internal directive handlers:
- `src/auto-reply/reply/directive-handling.impl.ts`: Directive-only messages
- `src/auto-reply/reply/directive-handling.persist.ts`: Inline directives
- `src/commands/agent.ts`: CLI command-line flags
Gateway API Backward Compatibility:
The original implementation incorrectly mapped `null` to "off" in
`sessions-patch.ts` for consistency with internal handlers. This was a
breaking change because:
- Previously, `null` cleared the override (deleted the field)
- API clients lost the ability to "clear to default" via `null`
- This contradicts standard JSON semantics where `null` means "no value"
Restored original null semantics in `src/gateway/sessions-patch.ts`:
- `null` → delete field, fall back to model default (clear override)
- `"off"` → persist explicit override
- Other values → normalize and persist
This ensures backward compatibility for API clients while fixing the `/think off`
issue in internal handlers.
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
- Create shared PNG encoder module (src/media/png-encode.ts)
- Refactor qr-image.ts and live-image-probe.ts to use shared encoder
- Add safeParseJson to utils.ts and plugin-sdk exports
- Update msteams and pairing-store to use centralized safeParseJson
* refactor: consolidate duplicate utility functions
- Add escapeRegExp to src/utils.ts and remove 10 local duplicates
- Rename bash-tools clampNumber to clampWithDefault (different signature)
- Centralize formatError calls to use formatErrorMessage from infra/errors.ts
- Re-export formatErrorMessage from cli/cli-utils.ts to preserve API
* refactor: consolidate remaining escapeRegExp duplicates
* refactor: consolidate sleep, stripAnsi, and clamp duplicates
* Gateway: preserve Pi transcript parentId for injected messages
Thread: unknown
When: 2026-02-08 20:08 CST
Repo: https://github.com/openclaw/openclaw.git
Branch: codex/wip/2026-02-09/compact-post-compaction-parentid-fix
Problem
- Post-compaction turns sometimes lost the compaction summary + kept suffix in the *next* provider request.
- Root cause was session graph corruption: gateway appended "stopReason: injected" transcript lines via raw JSONL writes without `parentId`.
- Pi's `SessionManager.buildSessionContext()` walks the `parentId` chain from the current leaf; missing `parentId` can sever the active branch and hide the compaction entry.
Fix
- Use `SessionManager.appendMessage(...)` for injected assistant transcript writes so `parentId` is set to the current leaf.
- Route `chat.inject` through the same helper to avoid duplicating the broken raw JSONL append logic.
Why This Matters
- The compaction algorithm may be correct, but if the leaf chain is broken right after compaction, the provider payload cannot include the summary/suffix "shape" Pi expects.
Testing
- pnpm test src/agents/pi-embedded-helpers.post-compaction-shape.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.post-context.test.ts
- pnpm build
Notes
- This is provider-shape agnostic: it fixes transcript structure so Anthropic/Gemini/etc all see the same post-compaction context.
Resume
- If post-compaction looks wrong again, inspect the session transcript for entries missing `parentId` immediately after `type: compaction`.
* Gateway: guardrail test for transcript parentId (chat.inject)
* Gateway: guardrail against raw transcript appends (chat.ts)
* Gateway: add local AGENTS.md note to preserve Pi transcript parentId chain
* Changelog: note gateway post-compaction amnesia fix
* Gateway: store injected transcript messages with valid stopReason
* Gateway: use valid stopReason in injected fallback
* fix(paths): structurally resolve home dir to prevent Windows path bugs
Extract resolveRawHomeDir as a private function and gate the public
resolveEffectiveHomeDir through a single path.resolve() exit point.
This makes it structurally impossible for unresolved paths (missing
drive letter on Windows) to escape the function, regardless of how
many return paths exist in the raw lookup logic.
Simplify resolveRequiredHomeDir to only resolve the process.cwd()
fallback, since resolveEffectiveHomeDir now returns resolved values.
Fix shortenMeta in tool-meta.ts: the colon-based split for file:line
patterns (e.g. file.txt:12) conflicts with Windows drive letters
(C:\...) because indexOf(":") matches the drive colon first.
shortenHomeInString already handles file:line patterns correctly via
split/join, so the colon split was both unnecessary and harmful.
Update test assertions across all affected files to use path.resolve()
in expected values and input strings so they match the now-correct
resolved output on both Unix and Windows.
Fixes#12119
* fix(changelog): add paths Windows fix entry (#12125)
---------
Co-authored-by: Sebastian <19554889+sebslight@users.noreply.github.com>
* fix: use .js extension for ESM imports of RoutePeerKind
The imports incorrectly used .ts extension which doesn't resolve
with moduleResolution: NodeNext. Changed to .js and added 'type'
import modifier.
* fix tsconfig
* refactor: unify peer kind to ChatType, rename dm to direct
- Replace RoutePeerKind with ChatType throughout codebase
- Change 'dm' literal values to 'direct' in routing/session keys
- Keep backward compat: normalizeChatType accepts 'dm' -> 'direct'
- Add ChatType export to plugin-sdk, deprecate RoutePeerKind
- Update session key parsing to accept both 'dm' and 'direct' markers
- Update all channel monitors and extensions to use ChatType
BREAKING CHANGE: Session keys now use 'direct' instead of 'dm'.
Existing 'dm' keys still work via backward compat layer.
* fix tests
* test: update session key expectations for dmdirect migration
- Fix test expectations to expect :direct: in generated output
- Add explicit backward compat test for normalizeChatType('dm')
- Keep input test data with :dm: keys to verify backward compat
* fix: accept legacy 'dm' in session key parsing for backward compat
getDmHistoryLimitFromSessionKey now accepts both :dm: and :direct:
to ensure old session keys continue to work correctly.
* test: add explicit backward compat tests for dmdirect migration
- session-key.test.ts: verify both :dm: and :direct: keys are valid
- getDmHistoryLimitFromSessionKey: verify both formats work
* feat: backward compat for resetByType.dm config key
* test: skip unix-path Nix tests on Windows
* initial commit
* feat: implement deriveSessionTotalTokens function and update usage tests
* Added deriveSessionTotalTokens function to calculate total tokens based on usage and context tokens.
* Updated usage tests to include cases for derived session total tokens.
* Refactored session usage calculations in multiple files to utilize the new function for improved accuracy.
* fix: restore overflow truncation fallback + changelog/test hardening (#11551) (thanks @tyler6204)
* fix(cron): comprehensive cron scheduling and delivery fixes
- Fix delivery target resolution for isolated agent cron jobs
- Improve schedule parsing and validation
- Add job retry logic and error handling
- Enhance cron ops with better state management
- Add timer improvements for more reliable cron execution
- Add cron event type to protocol schema
- Support cron events in heartbeat runner (skip empty-heartbeat check,
use dedicated CRON_EVENT_PROMPT for relay)
* fix: remove cron debug test and add changelog/docs notes (#11641) (thanks @tyler6204)
* fix(gateway): use LAN IP for WebSocket/probe URLs when bind=lan (#11329)
When gateway.bind=lan, the HTTP server correctly binds to 0.0.0.0
(all interfaces), but WebSocket connection URLs, probe targets, and
Control UI links were hardcoded to 127.0.0.1. This caused CLI commands
and status probes to show localhost-only URLs even in LAN mode, and
made onboarding display misleading connection info.
- Add pickPrimaryLanIPv4() to gateway/net.ts to detect the machine's
primary LAN IPv4 address (prefers en0/eth0, falls back to any
external interface)
- Update pickProbeHostForBind() to use LAN IP when bind=lan
- Update buildGatewayConnectionDetails() to use LAN IP and report
"local lan <ip>" as the URL source
- Update resolveControlUiLinks() to return LAN-accessible URLs
- Update probe note in status.gather.ts to reflect new behavior
- Add tests for pickPrimaryLanIPv4 and bind=lan URL resolution
Closes#11329
Co-authored-by: Cursor <cursoragent@cursor.com>
* test: move vi.restoreAllMocks to afterEach in pickPrimaryLanIPv4
Per review feedback: avoid calling vi.restoreAllMocks() inside
individual tests as it restores all spies globally and can cause
ordering issues. Use afterEach in the describe block instead.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Changelog: note LAN bind URLs fix (#11448) (thanks @AnonO6)
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* refactor: update cron job wake mode and run mode handling
- Changed default wake mode from 'next-heartbeat' to 'now' in CronJobEditor and related CLI commands.
- Updated cron-tool tests to reflect changes in run mode, introducing 'due' and 'force' options.
- Enhanced cron-tool logic to handle new run modes and ensure compatibility with existing job structures.
- Added new tests for delivery plan consistency and job execution behavior under various conditions.
- Improved normalization functions to handle wake mode and session target casing.
This refactor aims to streamline cron job configurations and enhance the overall user experience with clearer defaults and improved functionality.
* test: enhance cron job functionality and UI
- Added tests to ensure the isolated agent correctly announces the final payload text when delivering messages via Telegram.
- Implemented a new function to pick the last deliverable payload from a list of delivery payloads.
- Enhanced the cron service to maintain legacy "every" jobs while minute cron jobs recompute schedules.
- Updated the cron store migration tests to verify the addition of anchorMs to legacy every schedules.
- Improved the UI for displaying cron job details, including job state and delivery information, with new styles and layout adjustments.
These changes aim to improve the reliability and user experience of the cron job system.
* test: enhance sessions thinking level handling
- Added tests to verify that the correct thinking levels are applied during session spawning.
- Updated the sessions-spawn-tool to include a new parameter for overriding thinking levels.
- Enhanced the UI to support additional thinking levels, including "xhigh" and "full", and improved the handling of current options in dropdowns.
These changes aim to improve the flexibility and accuracy of thinking level configurations in session management.
* feat: enhance session management and cron job functionality
- Introduced passthrough arguments in the test-parallel script to allow for flexible command-line options.
- Updated session handling to hide cron run alias session keys from the sessions list, improving clarity.
- Enhanced the cron service to accurately record job start times and durations, ensuring better tracking of job execution.
- Added tests to verify the correct behavior of the cron service under various conditions, including zero-delay timers.
These changes aim to improve the usability and reliability of session and cron job management.
* feat: implement job running state checks in cron service
- Added functionality to prevent manual job runs if a job is already in progress, enhancing job management.
- Updated the `isJobDue` function to include checks for running jobs, ensuring accurate scheduling.
- Enhanced the `run` function to return a specific reason when a job is already running.
- Introduced a new test case to verify the behavior of forced manual runs during active job execution.
These changes aim to improve the reliability and clarity of cron job execution and management.
* feat: add session ID and key to CronRunLogEntry model
- Introduced `sessionid` and `sessionkey` properties to the `CronRunLogEntry` struct for enhanced tracking of session-related information.
- Updated the initializer and Codable conformance to accommodate the new properties, ensuring proper serialization and deserialization.
These changes aim to improve the granularity of logging and session management within the cron job system.
* fix: improve session display name resolution
- Updated the `resolveSessionDisplayName` function to ensure that both label and displayName are trimmed and default to an empty string if not present.
- Enhanced the logic to prevent returning the key if it matches the label or displayName, improving clarity in session naming.
These changes aim to enhance the accuracy and usability of session display names in the UI.
* perf: skip cron store persist when idle timer tick produces no changes
recomputeNextRuns now returns a boolean indicating whether any job
state was mutated. The idle path in onTimer only persists when the
return value is true, eliminating unnecessary file writes every 60s
for far-future or idle schedules.
* fix: prep for merge - explicit delivery mode migration, docs + changelog (#10776) (thanks @tyler6204)
* security: add skill/plugin code safety scanner module
* security: integrate skill scanner into security audit
* security: add pre-install code safety scan for plugins
* style: fix curly brace lint errors in skill-scanner.ts
* docs: add changelog entry for skill code safety scanner
* security: redact credentials from config.get gateway responses
The config.get gateway method returned the full config snapshot
including channel credentials (Discord tokens, Slack botToken/appToken,
Telegram botToken, Feishu appSecret, etc.), model provider API keys,
and gateway auth tokens in plaintext.
Any WebSocket client—including the unauthenticated Control UI when
dangerouslyDisableDeviceAuth is set—could read every secret.
This adds redactConfigSnapshot() which:
- Deep-walks the config object and masks any field whose key matches
token, password, secret, or apiKey patterns
- Uses the existing redactSensitiveText() to scrub the raw JSON5 source
- Preserves the hash for change detection
- Includes 15 test cases covering all channel types
* security: make gateway config writes return redacted values
* test: disable control UI by default in gateway server tests
* fix: redact credentials in gateway config APIs (#9858) (thanks @abdelsfane)
---------
Co-authored-by: George Pickett <gpickett00@gmail.com>
* Gateway: require explicit auth for url overrides
* Gateway: scope credential blocking to non-local URLs only
Address review feedback: the previous fix blocked credential fallback for
ALL URL overrides, which was overly strict and could break workflows that
use --url to switch between loopback/tailnet without passing credentials.
Now credential fallback is only blocked for non-local URLs (public IPs,
external hostnames). Local addresses (127.0.0.1, localhost, private IPs
like 192.168.x.x, 10.x.x.x, tailnet 100.x.x.x) still get credential
fallback as before.
This maintains the security fix (preventing credential exfiltration to
attacker-controlled URLs) while preserving backward compatibility for
legitimate local URL overrides.
* Security: require explicit credentials for gateway url overrides (#8113) (thanks @victormier)
* Gateway: reuse explicit auth helper for url overrides (#8113) (thanks @victormier)
* Tests: format gateway chat test (#8113) (thanks @victormier)
* Tests: require explicit auth for gateway url overrides (#8113) (thanks @victormier)
---------
Co-authored-by: Victor Mier <victormier@gmail.com>
- Add forceReload option to ensureLoaded to avoid stat I/O in normal
paths while still detecting cross-service writes in the timer path
- Post isolated job summary back to main session (restores the old
isolation.postToMainPrefix behavior via delivery model)
- Update legacy migration tests to check delivery.channel instead of
payload.channel (normalization now moves delivery fields to top-level)
- Remove legacy deliver/channel/to/bestEffortDeliver from payload schema
- Update protocol conformance test for delivery modes
- Regenerate GatewayModels.swift (isolation -> delivery)
- Updated isolated cron jobs to support new delivery modes: `announce` and `none`, improving output management.
- Refactored job configuration to remove legacy fields and streamline delivery settings.
- Enhanced the `CronJobEditor` UI to reflect changes in delivery options, including a new segmented control for delivery mode selection.
- Updated documentation to clarify the new delivery configurations and their implications for job execution.
- Improved tests to validate the new delivery behavior and ensure backward compatibility with legacy settings.
This update provides users with greater flexibility in managing how isolated jobs deliver their outputs, enhancing overall usability and clarity in job configurations.
- Added support for new delivery modes in cron jobs: `announce`, `deliver`, and `none`.
- Updated documentation to reflect changes in delivery options and usage examples.
- Enhanced the cron job schema to include delivery configuration.
- Refactored related CLI commands and UI components to accommodate the new delivery settings.
- Improved handling of legacy delivery fields for backward compatibility.
This update allows users to choose how output from isolated jobs is delivered, enhancing flexibility in job management.
Fixes#7667
Task 1: Fix cron operation timeouts
- Increase default gateway tool timeout from 10s to 30s
- Increase cron-specific tool timeout to 60s
- Increase CLI default timeout from 10s to 30s
- Prevents timeouts when gateway is busy with long-running jobs
Task 2: Add timestamp validation
- New validateScheduleTimestamp() function in validate-timestamp.ts
- Rejects atMs timestamps more than 1 minute in the past
- Rejects atMs timestamps more than 10 years in the future
- Applied to both cron.add and cron.update operations
- Provides helpful error messages with current time and offset
Task 3: Enable file sync for manual edits
- Track file modification time (storeFileMtimeMs) in CronServiceState
- Check file mtime in ensureLoaded() and reload if changed
- Recompute next runs after reload to maintain accuracy
- Update mtime after persist() to prevent reload loop
- Dashboard now picks up manual edits to ~/.openclaw/cron/jobs.json