* fix: canonicalize session keys at write time to prevent orphaned sessions (#29683)
resolveSessionKey() uses hardcoded DEFAULT_AGENT_ID="main", but all read
paths canonicalize via cfg. When the configured default agent differs
(e.g. "ops" with mainKey "work"), writes produce "agent:main:main" while
reads look up "agent:ops:work", orphaning transcripts on every restart.
Fix all three write-path call sites by wrapping with
canonicalizeMainSessionAlias:
- initSessionState (auto-reply/reply/session.ts)
- runWebHeartbeatOnce (web/auto-reply/heartbeat-runner.ts)
- resolveCronAgentSessionKey (cron/isolated-agent/session-key.ts)
Add startup migration (migrateOrphanedSessionKeys) to rename existing
orphaned keys to canonical form, merging by most-recent updatedAt.
* fix: address review — track agent IDs in migration map, align snapshot key
P1: migrateOrphanedSessionKeys now tracks agentId alongside each store
path in a Map instead of inferring from the filesystem path. This
correctly handles custom session.store templates outside the default
agents/<id>/ layout.
P2: Pass the already-canonicalized sessionKey to getSessionSnapshot so
the heartbeat snapshot reads/restores use the same key as the write path.
* fix: log migration results at all early return points
migrateOrphanedSessionKeys runs before detectLegacyStateMigrations, so
it can canonicalize legacy keys (e.g. "main" → "agent:main:main") before
the legacy detector sees them. This caused the early return path to skip
logging, breaking doctor-state-migrations tests that assert log.info was
called.
Extract logMigrationResults helper and call it at every return point.
* fix: handle shared stores and ~ expansion in migration
P1: When session.store has no {agentId}, all agents resolve to the same
file. Track all agentIds per store path (Map<path, Set<id>>) and run
canonicalization once per agent. Skip cross-agent "agent:main:*"
remapping when "main" is a legitimate configured agent sharing the store,
to avoid merging its data into another agent's namespace.
P2: Use expandHomePrefix (environment-aware ~ resolution) instead of
os.homedir() in resolveStorePathFromTemplate, matching the runtime
resolveStorePath behavior for OPENCLAW_HOME/HOME overrides.
* fix: narrow cross-agent remap to provable orphan aliases only
Only remap agent:main:* keys where the suffix is a main session alias
("main" or the configured mainKey). Other agent:main:* keys — hooks,
subagents, cron sessions, per-sender keys — may be intentional
cross-agent references and must not be silently moved into another
agent's namespace.
* fix: run orphan-key session migration at gateway startup (#29683)
* fix: canonicalize cross-agent legacy main aliases in session keys (#29683)
* fix: guard shared-store migration against cross-agent legacy alias remap (#29683)
* refactor: split session-key migration out of pr 30654
---------
Co-authored-by: Your Name <your_email@example.com>
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* fix(cron): track and log bestEffort delivery failures, mark not delivered on partial failure
* fix(cron): cache successful results on partial failure to preserve replay idempotency
When a best-effort send partially fails, we now still cache the successful delivery results via rememberCompletedDirectCronDelivery. This prevents duplicate sends on same-process replay while still correctly marking the job as not fully delivered.
* fix(cron): preserve partial-failure state on replay (#27069)
* fix(cron): restore test infrastructure and fix formatting
* fix: clarify cron best-effort partial delivery status (#42535) (thanks @MoerAI)
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* fix(telegram): improve error messages for 403 bot not member errors
- Detect 403 'bot is not a member' errors specifically
- Provide actionable guidance for users to fix the issue
- Fixes#48273 where outbound sendMessage fails with 403
Root cause:
When a Telegram bot tries to send a message to a channel/group it's not
a member of, the API returns 403 'bot is not a member of the channel chat'.
The error message was not clear about how to fix this.
Fix:
1. Detect 403 errors in wrapTelegramChatNotFoundError
2. Provide clear error message explaining the issue
3. Suggest adding the bot to the channel/group
* fix(telegram): fix regex precedence for 403 error detection
- Group alternatives correctly: /403.*(bot.*not.*member|bot was blocked)/i
- Require 403 for both alternatives (previously bot.*blocked matched any error)
- Update error message to cover both scenarios
- Fixes Greptile review feedback
* fix(telegram): correct regex alternation precedence for 403 errors
- Fix: /403.*(bot.*not.*member|bot was blocked)/ → /403.*(bot.*not.*member|bot.*blocked)/
- Ensures 403 requirement applies to both alternatives
- Fixes Greptile review comment on PR #48650
* fix(telegram): add 'bot was kicked' to 403 error regex and message
* fix(telegram): preserve membership delivery errors
* fix: improve Telegram 403 membership delivery errors (#53635) (thanks @w-sss)
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* fix: make cleanup "keep" persist subagent sessions indefinitely
* feat: expose subagent session metadata in sessions list
* fix: include status and timing in sessions_list tool
* fix: hide injected timestamp prefixes in chat ui
* feat: push session list updates over websocket
* feat: expose child subagent sessions in subagents list
* feat: add admin http endpoint to kill sessions
* Emit session.message websocket events for transcript updates
* Estimate session costs in sessions list
* Add direct session history HTTP and SSE endpoints
* Harden dashboard session events and history APIs
* Add session lifecycle gateway methods
* Add dashboard session API improvements
* Add dashboard session model and parent linkage support
* fix: tighten dashboard session API metadata
* Fix dashboard session cost metadata
* Persist accumulated session cost
* fix: stop followup queue drain cfg crash
* Fix dashboard session create and model metadata
* fix: stop guessing session model costs
* Gateway: cache OpenRouter pricing for configured models
* Gateway: add timeout session status
* Fix subagent spawn test config loading
* Gateway: preserve operator scopes without device identity
* Emit user message transcript events and deduplicate plugin warnings
* feat: emit sessions.changed lifecycle event on subagent spawn
Adds a session-lifecycle-events module (similar to transcript-events)
that emits create events when subagents are spawned. The gateway
server.impl.ts listens for these events and broadcasts sessions.changed
with reason=create to SSE subscribers, so dashboards can pick up new
subagent sessions without polling.
* Gateway: allow persistent dashboard orchestrator sessions
* fix: preserve operator scopes for token-authenticated backend clients
Backend clients (like agent-dashboard) that authenticate with a valid gateway
token but don't present a device identity were getting their scopes stripped.
The scope-clearing logic ran before checking the device identity decision,
so even when evaluateMissingDeviceIdentity returned 'allow' (because
roleCanSkipDeviceIdentity passed for token-authed operators), scopes were
already cleared.
Fix: also check decision.kind before clearing scopes, so token-authenticated
operators keep their requested scopes.
* Gateway: allow operator-token session kills
* Fix stale active subagent status after follow-up runs
* Fix dashboard image attachments in sessions send
* Fix completed session follow-up status updates
* feat: stream session tool events to operator UIs
* Add sessions.steer gateway coverage
* Persist subagent timing in session store
* Fix subagent session transcript event keys
* Fix active subagent session status in gateway
* bump session label max to 512
* Fix gateway send session reactivation
* fix: publish terminal session lifecycle state
* feat: change default session reset to effectively never
- Change DEFAULT_RESET_MODE from "daily" to "idle"
- Change DEFAULT_IDLE_MINUTES from 60 to 0 (0 = disabled/never)
- Allow idleMinutes=0 through normalization (don't clamp to 1)
- Treat idleMinutes=0 as "no idle expiry" in evaluateSessionFreshness
- Default behavior: mode "idle" + idleMinutes 0 = sessions never auto-reset
- Update test assertion for new default mode
* fix: prep session management followups (#50101) (thanks @clay-datacurve)
---------
Co-authored-by: Tyler Yust <TYTYYUST@YAHOO.COM>
* fix(bluebubbles): auto-create chats for new numbers, persist outbound messages to session transcripts
Two fixes for BlueBubbles message tool behavior:
1. **Attachment sends to new phone numbers**: sendBlueBubblesAttachment now
auto-creates a new DM chat (via /api/v1/chat/new) when no existing chat
is found for a handle target, matching the behavior already present in
sendMessageBlueBubbles for text sends. The existing createNewChatWithMessage
is refactored into a reusable createChatForHandle that returns the chatGuid.
2. **Outbound message session persistence**: Ensures outbound messages sent
via the message tool are reliably tracked in session transcripts:
- ensureOutboundSessionEntry now falls back to directly creating a session
store entry when recordSessionMetaFromInbound returns null, guaranteeing
a sessionId exists for the subsequent mirror append.
- appendAssistantMessageToSessionTranscript now normalizes the session key
(lowercased) when looking up the store, preventing case mismatches
between the store keys and the mirror sessionKey.
Tests added for all changes.
* test(slack): verify outbound session tracking and new target sends for Slack
The shared infrastructure changes from the BlueBubbles fix (session key
normalization in transcript.ts and fallback session entry creation in
outbound-session.ts) already cover Slack. Slack's sendMessageSlack uses
conversations.open to auto-create DM channels for new user targets.
Add tests confirming:
- Slack user DM and channel session route resolution (outbound.test.ts)
- Slack session key normalization for transcript append (sessions.test.ts)
- Slack outbound sendText/sendMedia to new user and channel targets (channel.test.ts)
* fix(cron): skip stale delayed deliveries
* fix: prep PR #50092