openclaw/src
Vishal Doshi e91a5b0216 fix: release stale session locks and add watchdog for hung API calls (#18060)
When a model API call hangs indefinitely (e.g. Anthropic quota exceeded
mid-call), the gateway acquires a session .jsonl.lock but the promise
never resolves, so the try/finally block never reaches release(). Since
the owning PID is the gateway itself, stale detection cannot help —
isPidAlive() always returns true.
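
The reasoning above can be sketched in a few lines. This is a minimal illustration, not the gateway's code: `LockInfo`, `isStaleByPid`, and the injected `isPidAlive` callback are hypothetical names, and the sketch assumes stale detection only asks whether the owning process is still running.

```typescript
// Hypothetical shape of the metadata stored with a session lock.
interface LockInfo {
  pid: number; // process that acquired the lock
}

// PID-based stale check: a lock counts as stale only if its owner has exited.
// `isPidAlive` is injected so the sketch does not depend on any real API.
function isStaleByPid(
  lock: LockInfo,
  isPidAlive: (pid: number) => boolean,
): boolean {
  return !isPidAlive(lock.pid);
}

// Assume the gateway runs as PID 4242 and is the only live process.
const gatewayPid = 4242;
const alive = (pid: number): boolean => pid === gatewayPid;

// A lock left behind by a crashed process is detected as stale...
const crashedOwner = isStaleByPid({ pid: 9999 }, alive);
// ...but a lock the running gateway still holds during a hung call never is.
const hungOwner = isStaleByPid({ pid: gatewayPid }, alive);
```

Because the second case can never become stale under a PID-only check, some time-based mechanism inside the owning process is needed.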

This commit adds four layers of defense:

1. **In-process lock watchdog** (session-write-lock.ts)
   - Track acquiredAt timestamp on each held lock
   - 60-second interval timer checks all held locks
   - Auto-releases any lock held longer than maxHoldMs (default 5 min)
   - Catches the hung-API-call case that try/finally cannot

2. **Gateway startup cleanup** (server-startup.ts)
   - On boot, scan all agent session directories for *.jsonl.lock files
   - Remove locks with dead PIDs or older than staleMs (30 min)
   - Log each cleaned lock for diagnostics

3. **openclaw doctor stale lock detection** (doctor-session-locks.ts)
   - New health check scans for .jsonl.lock files
   - Reports PID status and age of each lock found
   - In --fix mode, removes stale locks automatically

4. **Transcript error entry on API failure** (attempt.ts)
   - When promptError is set, write an error marker to the session
     transcript before releasing the lock
   - Preserves conversation history even on model API failures
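
As an illustration of layer 1, the watchdog idea might look roughly like the sketch below. It is not the code in session-write-lock.ts: `HeldLock`, `heldLocks`, `sweepHeldLocks`, and `startLockWatchdog` are hypothetical names, and only the 60-second interval and the 5-minute default for maxHoldMs come from the description above.

```typescript
// Hypothetical in-memory record for a lock this process currently holds.
interface HeldLock {
  acquiredAt: number; // epoch ms when the lock was taken
  release: () => void; // removes the lock file (assumed safe to call once)
}

const DEFAULT_MAX_HOLD_MS = 5 * 60 * 1000; // 5 min default
const CHECK_INTERVAL_MS = 60 * 1000; // sweep every 60 seconds

// Lock path -> bookkeeping entry, filled in whenever a lock is acquired.
const heldLocks = new Map<string, HeldLock>();

// Release every lock held longer than maxHoldMs and return the released paths.
// `now` is a parameter so the sweep can be exercised without real timers.
function sweepHeldLocks(
  now: number,
  maxHoldMs: number = DEFAULT_MAX_HOLD_MS,
): string[] {
  const released: string[] = [];
  heldLocks.forEach((lock, path) => {
    if (now - lock.acquiredAt > maxHoldMs) {
      lock.release();
      heldLocks.delete(path);
      released.push(path);
    }
  });
  return released;
}

// Start the periodic sweep; the returned function stops it.
function startLockWatchdog(): () => void {
  const timer = setInterval(
    () => sweepHeldLocks(Date.now()),
    CHECK_INTERVAL_MS,
  );
  return () => clearInterval(timer);
}
```

Keeping the sweep separate from the timer means the release logic runs independently of the hung promise, which is the property try/finally alone cannot provide.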

Closes #18060
2026-02-16 23:59:22 +01:00
acp perf(test): fold acp event mapper tests into client suite 2026-02-16 02:45:00 +00:00
agents fix: release stale session locks and add watchdog for hung API calls (#18060) 2026-02-16 23:59:22 +01:00
auto-reply fix(gateway): set explicit chat timeouts for mesh gateway calls 2026-02-16 23:58:23 +01:00
browser test: add tests for extraArgs filtering logic 2026-02-16 23:52:42 +01:00
canvas-host chore: remove accidental a2ui bundle artifacts 2026-02-16 02:45:00 +00:00
channels fix(onboarding): keep wildcard allowFrom helper string-typed 2026-02-16 22:55:59 +00:00
cli CLI: preserve message send components payload 2026-02-16 23:54:08 +01:00
commands fix: release stale session locks and add watchdog for hung API calls (#18060) 2026-02-16 23:59:22 +01:00
compat
config config: align memory hybrid UI metadata with schema labels/help 2026-02-16 23:59:19 +01:00
cron cron: keep usage telemetry in run log types + error paths 2026-02-16 23:58:38 +01:00
daemon fix(daemon): prefer current node and add macOS version manager paths to service PATH 2026-02-16 23:53:41 +01:00
discord Fix Discord auto-thread attempting to thread in Forum/Media channels. Creating threads on messages within Forum/Media channels is often redundant or invalid (as messages are already posts). This prevents API errors and spam. Fix: Check channel type before attempting auto-thread creation. 2026-02-16 23:59:16 +01:00
docs
gateway fix: release stale session locks and add watchdog for hung API calls (#18060) 2026-02-16 23:59:22 +01:00
hooks fix(session-memory): fallback to rotated transcript after /new 2026-02-16 23:49:41 +01:00
imessage channels: migrate core channel account listing to factory 2026-02-16 23:53:19 +01:00
infra fix: include OPENCLAW_SERVICE_VERSION in system presence version detection 2026-02-16 23:56:10 +01:00
line refactor(channels): dedupe transport and gateway test scaffolds 2026-02-16 14:59:31 +00:00
link-understanding
linq feat(linq): add read receipts, typing indicators, and User-Agent header 2026-02-16 23:52:56 +01:00
logging feat: add stuck loop detection and exponential backoff infrastructure for agent polling (#17118) 2026-02-16 15:16:35 -05:00
macos test: tighten relay smoke + slack token validation 2026-02-16 02:45:00 +00:00
markdown test: remove duplicate hr spacing assertion 2026-02-16 06:16:33 +00:00
media fix(media): clean expired files in subdirectories 2026-02-16 23:50:56 +01:00
media-understanding fix: support OAuth for Gemini media understanding 2026-02-16 23:53:21 +01:00
memory Memory: fix MMR tie-break and temporal timestamp dedupe 2026-02-16 23:59:19 +01:00
node-host perf(test): fold node-host runner tests into sanitize env suite 2026-02-16 02:45:00 +00:00
pairing refactor(core): dedupe tool policy and IPv4 matcher logic 2026-02-16 16:14:54 +00:00
plugin-sdk channels: add createAccountListHelpers factory 2026-02-16 23:53:19 +01:00
plugins feat: add before_message_write plugin hook 2026-02-16 23:58:12 +01:00
process fix: add windowsHide: true to spawn in runCommandWithTimeout 2026-02-16 23:49:47 +01:00
providers refactor(agent): dedupe harness and command workflows 2026-02-16 14:59:30 +00:00
routing Fix Discord session routing continuity (enable lastRoute for groups). Previously, 'updateLastRoute' was only enabled for Direct Messages. This meant that group/channel sessions did not update their routing metadata (last channel/to/accountId) in 'session-meta.json'. If the bot restarted or a proactive cron job tried to send a message to a group session using 'sessions_send' without an explicit 'to' field, it would fail because 'lastRoute' was missing or stale. Fix: Enable 'updateLastRoute' for all Discord messages (Group + DM), ensuring the session store always has the latest valid routing target. 2026-02-16 23:59:16 +01:00
scripts
security refactor(core): dedupe tool policy and IPv4 matcher logic 2026-02-16 16:14:54 +00:00
sessions perf(test): fold session key utils into routing session key suite 2026-02-16 02:45:00 +00:00
shared refactor(core): dedupe shared config and runtime helpers 2026-02-16 14:59:30 +00:00
signal channels: migrate core channel account listing to factory 2026-02-16 23:53:19 +01:00
slack fix(slack): extract text and media from forwarded message attachments 2026-02-16 23:55:34 +01:00
telegram Handle Telegram poll vote updates for agent context 2026-02-16 23:54:56 +01:00
terminal test: strengthen ports, tool policy, and note wrapping 2026-02-16 02:45:00 +00:00
test-helpers refactor(test): simplify state dir env helpers 2026-02-16 00:08:00 +00:00
test-utils refactor(core): dedupe shared config and runtime helpers 2026-02-16 14:59:30 +00:00
tts refactor(channels): dedupe transport and gateway test scaffolds 2026-02-16 14:59:31 +00:00
tui refactor(tests): share harnesses for cli and monitor fixtures 2026-02-16 17:06:40 +00:00
types
utils refactor(utils): share chunkItems helper 2026-02-16 01:52:30 +00:00
web fix(whatsapp): allow per-message link preview override. WhatsApp messages default to enabling link previews for URLs. This adds support for overriding this behavior per-message via the parameter (e.g. from tool options), consistent with Telegram. Fix: Updated internal WhatsApp Web API layers to pass option down to Baileys . 2026-02-16 23:57:09 +01:00
whatsapp
wizard Onboarding: fix webchat URL loopback and canonical session 2026-02-16 23:52:00 +01:00
channel-web.ts
docker-setup.test.ts
entry.ts
extensionAPI.ts
globals.ts
index.ts
logger.test.ts perf(test): fold console prefix tests into logger suite 2026-02-16 02:45:00 +00:00
logger.ts
logging.ts
polls.test.ts
polls.ts
runtime.ts refactor(core): extract shared runtime and wizard schemas 2026-02-16 17:06:40 +00:00
utils.test.ts perf(test): drop redundant index entrypoint tests 2026-02-16 02:45:00 +00:00
utils.ts
version.test.ts
version.ts