* fix: make cleanup "keep" persist subagent sessions indefinitely
* feat: expose subagent session metadata in sessions list
* fix: include status and timing in sessions_list tool
* fix: hide injected timestamp prefixes in chat ui
* feat: push session list updates over websocket
* feat: expose child subagent sessions in subagents list
* feat: add admin http endpoint to kill sessions
* Emit session.message websocket events for transcript updates
* Estimate session costs in sessions list
* Add direct session history HTTP and SSE endpoints
* Harden dashboard session events and history APIs
* Add session lifecycle gateway methods
* Add dashboard session API improvements
* Add dashboard session model and parent linkage support
* fix: tighten dashboard session API metadata
* Fix dashboard session cost metadata
* Persist accumulated session cost
* fix: stop followup queue drain cfg crash
* Fix dashboard session create and model metadata
* fix: stop guessing session model costs
* Gateway: cache OpenRouter pricing for configured models
* Gateway: add timeout session status
* Fix subagent spawn test config loading
* Gateway: preserve operator scopes without device identity
* Emit user message transcript events and deduplicate plugin warnings
* feat: emit sessions.changed lifecycle event on subagent spawn
Adds a session-lifecycle-events module (similar to transcript-events)
that emits create events when subagents are spawned. The gateway
server.impl.ts listens for these events and broadcasts sessions.changed
with reason=create to SSE subscribers, so dashboards can pick up new
subagent sessions without polling.
* Gateway: allow persistent dashboard orchestrator sessions
* fix: preserve operator scopes for token-authenticated backend clients
Backend clients (like agent-dashboard) that authenticate with a valid gateway
token but don't present a device identity were getting their scopes stripped.
The scope-clearing logic ran before checking the device identity decision,
so even when evaluateMissingDeviceIdentity returned 'allow' (because
roleCanSkipDeviceIdentity passed for token-authed operators), scopes were
already cleared.
Fix: also check decision.kind before clearing scopes, so token-authenticated
operators keep their requested scopes.
* Gateway: allow operator-token session kills
* Fix stale active subagent status after follow-up runs
* Fix dashboard image attachments in sessions send
* Fix completed session follow-up status updates
* feat: stream session tool events to operator UIs
* Add sessions.steer gateway coverage
* Persist subagent timing in session store
* Fix subagent session transcript event keys
* Fix active subagent session status in gateway
* bump session label max to 512
* Fix gateway send session reactivation
* fix: publish terminal session lifecycle state
* feat: change default session reset to effectively never
- Change DEFAULT_RESET_MODE from "daily" to "idle"
- Change DEFAULT_IDLE_MINUTES from 60 to 0 (0 = disabled/never)
- Allow idleMinutes=0 through normalization (don't clamp to 1)
- Treat idleMinutes=0 as "no idle expiry" in evaluateSessionFreshness
- Default behavior: mode "idle" + idleMinutes 0 = sessions never auto-reset
- Update test assertion for new default mode
* fix: prep session management followups (#50101) (thanks @clay-datacurve)
---------
Co-authored-by: Tyler Yust <TYTYYUST@YAHOO.COM>
* fix: preserve stored provider in resolveSessionModelRef for vendor-prefixed models
When an OpenRouter model with a vendor prefix (e.g. "anthropic/claude-haiku-4.5")
was successfully used and persisted to the session entry, the next call to
resolveSessionModelRef would re-parse the model string through parseModelRef,
which splits on the first slash and incorrectly extracts "anthropic" as the
provider — discarding the stored "openrouter" provider entirely. This caused
subsequent requests to attempt direct Anthropic API calls with an OpenRouter
API key, producing "credit balance too low" billing errors.
The fix trusts the explicitly stored modelProvider on the session entry and
skips parseModelRef re-parsing when a provider is already recorded. parseModelRef
is still used as a fallback when no provider is stored on the entry.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Changelog: add OpenRouter note for #22753
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
* fix(gateway): normalize session key casing to prevent ghost sessions on Linux
On case-sensitive filesystems (Linux), mixed-case session keys like
agent:ops:MySession and agent:ops:mysession resolve to different store
entries, creating ghost duplicates that never converge.
Core changes in session-utils.ts:
- resolveSessionStoreKey: lowercase all session key components
- canonicalizeSpawnedByForAgent: accept cfg, resolve main-alias references
via canonicalizeMainSessionAlias after lowercasing
- loadSessionEntry: return legacyKey only when it differs from canonicalKey
- resolveGatewaySessionStoreTarget: scan store for case-insensitive matches;
add optional scanLegacyKeys param to skip disk reads for read-only callers
- Export findStoreKeysIgnoreCase for use by write-path consumers
- Compare global/unknown sentinels case-insensitively in all canonicalization
functions
sessions-resolve.ts:
- Make resolveSessionKeyFromResolveParams async for inline migration
- Check canonical key first (fast path), then fall back to legacy scan
- Delete ALL legacy case-variant keys in a single updateSessionStore pass
Fixes#12603
* fix(gateway): propagate canonical keys and clean up all case variants on write paths
- agent.ts: use canonicalizeSpawnedByForAgent (with cfg) instead of raw
toLowerCase; use findStoreKeysIgnoreCase to delete all legacy variants
on store write; pass canonicalKey to addChatRun, registerAgentRunContext,
resolveSendPolicy, and agentCommand
- sessions.ts: replace single-key migration with full case-variant cleanup
via findStoreKeysIgnoreCase in patch/reset/delete/compact handlers; add
case-insensitive fallback in preview (store already loaded); make
sessions.resolve handler async; pass scanLegacyKeys: false in preview
- server-node-events.ts: use findStoreKeysIgnoreCase to clean all legacy
variants on voice.transcript and agent.request write paths; pass
canonicalKey to addChatRun and agentCommand
* test(gateway): add session key case-normalization tests
Cover the case-insensitive session key canonicalization logic:
- resolveSessionStoreKey normalizes mixed-case bare and prefixed keys
- resolveSessionStoreKey resolves mixed-case main aliases (MAIN, Main)
- resolveGatewaySessionStoreTarget includes legacy mixed-case store keys
- resolveGatewaySessionStoreTarget collects all case-variant duplicates
- resolveGatewaySessionStoreTarget finds legacy main alias keys with
customized mainKey configuration
All 5 tests fail before the production changes, pass after.
* fix: clean legacy session alias cleanup gaps (openclaw#12846) thanks @mcaxtr
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* fix: prune stale session entries, cap entry count, and rotate sessions.json
The sessions.json file grows unbounded over time. Every heartbeat tick (default: 30m)
triggers multiple full rewrites, and session keys from groups, threads, and DMs
accumulate indefinitely with large embedded objects (skillsSnapshot,
systemPromptReport). At >50MB the synchronous JSON parse blocks the event loop,
causing Telegram webhook timeouts and effectively taking the bot down.
Three mitigations, all running inside saveSessionStoreUnlocked() on every write:
1. Prune stale entries: remove entries with updatedAt older than 30 days
(configurable via session.maintenance.pruneDays in openclaw.json)
2. Cap entry count: keep only the 500 most recently updated entries
(configurable via session.maintenance.maxEntries). Entries without updatedAt
are evicted first.
3. File rotation: if the existing sessions.json exceeds 10MB before a write,
rename it to sessions.json.bak.{timestamp} and keep only the 3 most recent
backups (configurable via session.maintenance.rotateBytes).
All three thresholds are configurable under session.maintenance in openclaw.json
with Zod validation. No env vars.
Existing tests updated to use Date.now() instead of epoch-relative timestamps
(1, 2, 3) that would be incorrectly pruned as stale.
27 new tests covering pruning, capping, rotation, and integration scenarios.
* feat: auto-prune expired cron run sessions (#12289)
Add TTL-based reaper for isolated cron run sessions that accumulate
indefinitely in sessions.json.
New config option:
cron.sessionRetention: string | false (default: '24h')
The reaper runs piggy-backed on the cron timer tick, self-throttled
to sweep at most every 5 minutes. It removes session entries matching
the pattern cron:<jobId>:run:<uuid> whose updatedAt + retention < now.
Design follows the Kubernetes ttlSecondsAfterFinished pattern:
- Sessions are persisted normally (observability/debugging)
- A periodic reaper prunes expired entries
- Configurable retention with sensible default
- Set to false to disable pruning entirely
Files changed:
- src/config/types.cron.ts: Add sessionRetention to CronConfig
- src/config/zod-schema.ts: Add Zod validation for sessionRetention
- src/cron/session-reaper.ts: New reaper module (sweepCronRunSessions)
- src/cron/session-reaper.test.ts: 12 tests covering all paths
- src/cron/service/state.ts: Add cronConfig/sessionStorePath to deps
- src/cron/service/timer.ts: Wire reaper into onTimer tick
- src/gateway/server-cron.ts: Pass config and session store path to deps
Closes#12289
* fix: sweep cron session stores per agent
* docs: add changelog for session maintenance (#13083) (thanks @skyfallsin, @Glucksberg)
* fix: add warn-only session maintenance mode
* fix: warn-only maintenance defaults to active session
* fix: deliver maintenance warnings to active session
* docs: add session maintenance examples
* fix: accept duration and size maintenance thresholds
* refactor: share cron run session key check
* fix: format issues and replace defaultRuntime.warn with console.warn
---------
Co-authored-by: Pradeep Elankumaran <pradeepe@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: max <40643627+quotentiroler@users.noreply.github.com>
Co-authored-by: quotentiroler <max.nussbaumer@maxhealth.tech>
* refactor: update cron job wake mode and run mode handling
- Changed default wake mode from 'next-heartbeat' to 'now' in CronJobEditor and related CLI commands.
- Updated cron-tool tests to reflect changes in run mode, introducing 'due' and 'force' options.
- Enhanced cron-tool logic to handle new run modes and ensure compatibility with existing job structures.
- Added new tests for delivery plan consistency and job execution behavior under various conditions.
- Improved normalization functions to handle wake mode and session target casing.
This refactor aims to streamline cron job configurations and enhance the overall user experience with clearer defaults and improved functionality.
* test: enhance cron job functionality and UI
- Added tests to ensure the isolated agent correctly announces the final payload text when delivering messages via Telegram.
- Implemented a new function to pick the last deliverable payload from a list of delivery payloads.
- Enhanced the cron service to maintain legacy "every" jobs while minute cron jobs recompute schedules.
- Updated the cron store migration tests to verify the addition of anchorMs to legacy every schedules.
- Improved the UI for displaying cron job details, including job state and delivery information, with new styles and layout adjustments.
These changes aim to improve the reliability and user experience of the cron job system.
* test: enhance sessions thinking level handling
- Added tests to verify that the correct thinking levels are applied during session spawning.
- Updated the sessions-spawn-tool to include a new parameter for overriding thinking levels.
- Enhanced the UI to support additional thinking levels, including "xhigh" and "full", and improved the handling of current options in dropdowns.
These changes aim to improve the flexibility and accuracy of thinking level configurations in session management.
* feat: enhance session management and cron job functionality
- Introduced passthrough arguments in the test-parallel script to allow for flexible command-line options.
- Updated session handling to hide cron run alias session keys from the sessions list, improving clarity.
- Enhanced the cron service to accurately record job start times and durations, ensuring better tracking of job execution.
- Added tests to verify the correct behavior of the cron service under various conditions, including zero-delay timers.
These changes aim to improve the usability and reliability of session and cron job management.
* feat: implement job running state checks in cron service
- Added functionality to prevent manual job runs if a job is already in progress, enhancing job management.
- Updated the `isJobDue` function to include checks for running jobs, ensuring accurate scheduling.
- Enhanced the `run` function to return a specific reason when a job is already running.
- Introduced a new test case to verify the behavior of forced manual runs during active job execution.
These changes aim to improve the reliability and clarity of cron job execution and management.
* feat: add session ID and key to CronRunLogEntry model
- Introduced `sessionid` and `sessionkey` properties to the `CronRunLogEntry` struct for enhanced tracking of session-related information.
- Updated the initializer and Codable conformance to accommodate the new properties, ensuring proper serialization and deserialization.
These changes aim to improve the granularity of logging and session management within the cron job system.
* fix: improve session display name resolution
- Updated the `resolveSessionDisplayName` function to ensure that both label and displayName are trimmed and default to an empty string if not present.
- Enhanced the logic to prevent returning the key if it matches the label or displayName, improving clarity in session naming.
These changes aim to enhance the accuracy and usability of session display names in the UI.
* perf: skip cron store persist when idle timer tick produces no changes
recomputeNextRuns now returns a boolean indicating whether any job
state was mutated. The idle path in onTimer only persists when the
return value is true, eliminating unnecessary file writes every 60s
for far-future or idle schedules.
* fix: prep for merge - explicit delivery mode migration, docs + changelog (#10776) (thanks @tyler6204)