openclaw

Commit Graph

Author	SHA1	Message	Date
Julia Bush	e94ebfa084	fix: harden gateway SIGTERM shutdown (#51242) (thanks @juliabush) * fix: increase shutdown timeout to avoid SIGTERM hang * fix(telegram): abort polling fetch on shutdown to prevent SIGTERM hang * fix(gateway): enforce hard exit on shutdown timeout for SIGTERM * fix: tighten gateway shutdown watchdog * fix: harden gateway SIGTERM shutdown (#51242) (thanks @juliabush) --------- Co-authored-by: Ayaan Zaidi <hi@obviy.us>	2026-03-23 15:01:42 +05:30
Peter Steinberger	fe5819887b	refactor(gateway): centralize discovery target handling	2026-03-23 00:38:31 -07:00
Peter Steinberger	deecf68b59	fix(gateway): fail closed on unresolved discovery endpoints	2026-03-23 00:27:37 -07:00
Peter Steinberger	75835fc664	test: restore runtime-aware cli mocks	2026-03-22 18:35:37 -07:00
Peter Steinberger	4ee41cc6f3	refactor(cli): separate json payload output from logging	2026-03-22 23:19:17 +00:00
Peter Steinberger	680eff63fb	fix: land SIGUSR1 orphan recovery regressions (#47719) (thanks @joeykrug)	2026-03-15 22:32:36 -07:00
Peter Steinberger	4e055d8df2	refactor: share gateway timeout parsing	2026-03-14 01:41:16 +00:00
Peter Steinberger	158a3b49a7	test: deduplicate cli option collision fixtures	2026-03-10 20:34:54 +00:00
Charles Dusek	54be30ef89	fix(agents): bound compaction retry wait and drain embedded runs on restart (#40324) Merged via squash. Prepared head SHA: `cfd99562d6` Co-authored-by: cgdusek <38732970+cgdusek@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman	2026-03-09 08:27:29 -07:00
Peter Steinberger	3caab9260c	test: narrow gateway loop signal harness	2026-03-09 07:42:15 +00:00
Peter Steinberger	cc0f30f5fb	test: fix windows runtime and restart loop harnesses	2026-03-09 07:22:23 +00:00
merlin	f84adcbe88	fix: release gateway lock on restart failure + reply to Codex reviews - Release gateway lock when in-process restart fails, so daemon restart/stop can still manage the process (Codex P2) - P1 (env mismatch) already addressed: best-effort by design, documented in JSDoc	2026-03-09 05:53:52 +00:00
merlin	c79a0dbdb4	fix: address bot review feedback on #35862 - Remove dead 'return false' in runServiceStart (Greptile) - Include stack trace in run-loop crash guard error log (Greptile) - Only catch startup errors on subsequent restarts, not initial start (Codex P1) - Add JSDoc note about env var false positive edge case (Codex P1)	2026-03-09 05:53:52 +00:00
merlin	6740cdf160	fix(gateway): catch startup failure in run loop to prevent process exit (#35862) When an in-process restart (SIGUSR1) triggers a config-triggered restart and the new config is invalid, params.start() throws and the while loop exits, killing the process. On macOS this loses TCC permissions. Wrap params.start() in try/catch: on failure, set server=null, log the error, and wait for the next SIGUSR1 instead of crashing.	2026-03-09 05:53:52 +00:00
Daniel dos Santos Reis	1d6a2d0165	fix(gateway): exit non-zero on restart shutdown timeout When a config-change restart hits the force-exit timeout, exit with code 1 instead of 0 so launchd/systemd treats it as a failure and triggers a clean process restart. Stop-timeout stays at exit(0) since graceful stops should not cause supervisor recovery. Closes #36822	2026-03-09 05:38:54 +00:00
Vincent Koc	76a028a50a	Gateway CLI: allowlist password-file fixture	2026-03-07 18:28:18 -08:00
Vincent Koc	4062aa5e5d	Gateway: add safer password-file input for gateway run (#39067) * CLI: add gateway password-file option * Docs: document safer gateway password input * Update src/cli/gateway-cli/run.ts Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Tests: clean up gateway password temp dirs * CLI: restore gateway password warning flow * Security: harden secret file reads --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-03-07 18:20:17 -08:00
Peter Steinberger	1b9e4800eb	test: fix gateway register option collision mock	2026-03-08 01:58:33 +00:00
Vincent Koc	2c7fb54956	Config: fail closed invalid config loads (#39071) * Config: fail closed invalid config loads * CLI: keep diagnostics on explicit best-effort config * Tests: cover invalid config best-effort diagnostics * Changelog: note invalid config fail-closed fix * Status: pass best-effort config through status-all gateway RPCs * CLI: pass config through gateway secret RPC * CLI: skip plugin loading from invalid config * Tests: align daemon token drift env precedence	2026-03-07 17:48:13 -08:00
Peter Steinberger	cc7e61612a	fix(gateway): harden service-mode stale process cleanup (#38463, thanks @spirittechie) Co-authored-by: Jesse Paul <drzin69@gmail.com>	2026-03-07 21:36:24 +00:00
Ayaan Zaidi	05c240fad6	fix: restart Windows gateway via Scheduled Task (#38825) (#38825)	2026-03-07 18:00:38 +05:30
Josh Avant	72cf9253fc	Gateway: add SecretRef support for gateway.auth.token with auth-mode guardrails (#35094)	2026-03-05 12:53:56 -06:00
Tak Hoffman	1be39d4250	fix(gateway): synthesize lifecycle robustness for restart and startup probes (#33831) * fix(gateway): correct launchctl command sequence for gateway restart (closes #20030) * fix(restart): expand HOME and escape label in launchctl plist path * fix(restart): poll port free after SIGKILL to prevent EADDRINUSE restart loop When cleanStaleGatewayProcessesSync() kills a stale gateway process, the kernel may not immediately release the TCP port. Previously the function returned after a fixed 500ms sleep (300ms SIGTERM + 200ms SIGKILL), allowing triggerOpenClawRestart() to hand off to systemd before the port was actually free. The new systemd process then raced the dying socket for port 18789, hit EADDRINUSE, and exited with status 1, causing systemd to retry indefinitely — the zombie restart loop reported in #33103. Fix: add waitForPortFreeSync() that polls lsof at 50ms intervals for up to 2 seconds after SIGKILL. cleanStaleGatewayProcessesSync() now blocks until the port is confirmed free (or the budget expires with a warning) before returning. The increased SIGTERM/SIGKILL wait budgets (600ms / 400ms) also give slow processes more time to exit cleanly. Fixes #33103 Related: #28134 * fix: add EADDRINUSE retry and TIME_WAIT port-bind checks for gateway startup * fix(ports): treat EADDRNOTAVAIL as non-retryable and fix flaky test * fix(gateway): hot-reload agents.defaults.models allowlist changes The reload plan had a rule for `agents.defaults.model` (singular) but not `agents.defaults.models` (plural — the allowlist array). Because `agents.defaults.models` does not prefix-match `agents.defaults.model.`, it fell through to the catch-all `agents` tail rule (kind=none), so allowlist edits in openclaw.json were silently ignored at runtime. Add a dedicated reload rule so changes to the models allowlist trigger a heartbeat restart, which re-reads the config and serves the updated list to clients. Fixes #33600 Co-authored-by: HCL <chenglunhu@gmail.com> Signed-off-by: HCL <chenglunhu@gmail.com> * test(restart): 100% branch coverage — audit round 2 Audit findings fixed: - remove dead guard: terminateStaleProcessesSync pids.length===0 check was unreachable (only caller cleanStaleGatewayProcessesSync already guards) - expose __testing.callSleepSyncRaw so sleepSync's real Atomics.wait path can be unit-tested directly without going through the override - fix broken sleepSync Atomics.wait test: previous test set override=null but cleanStaleGatewayProcessesSync returned before calling sleepSync — replaced with direct callSleepSyncRaw calls that actually exercise L36/L42-47 - fix pid collision: two tests used process.pid+304 (EPERM + dead-at-SIGTERM); EPERM test changed to process.pid+305 - fix misindented tests: 'deduplicates pids' and 'lsof status 1 container edge case' were outside their intended describe blocks; moved to correct scopes (findGatewayPidsOnPortSync and pollPortOnce respectively) - add missing branch tests: - status 1 + non-empty stdout with zero openclaw pids → free:true (L145) - mid-loop non-openclaw cmd in &&-chain (L67) - consecutive p-lines without c-line between them (L67) - invalid PID in p-line (p0 / pNaN) — ternary false branch (L67) - unknown lsof output line (else-if false branch L69) Coverage: 100% stmts / 100% branch / 100% funcs / 100% lines (36 tests) * test(restart): fix stale-pid test typing for tsgo * fix(gateway): address lifecycle review findings * test(update): make restart-helper path assertions windows-safe --------- Signed-off-by: HCL <chenglunhu@gmail.com> Co-authored-by: Glucksberg <markuscontasul@gmail.com> Co-authored-by: Efe Büken <efe@arven.digital> Co-authored-by: Riccardo Marino <rmarino@apple.com> Co-authored-by: HCL <chenglunhu@gmail.com>	2026-03-03 21:31:12 -06:00
Peter Steinberger	2287d1ec13	test: micro-optimize slow suites and CLI command setup	2026-03-02 23:00:49 +00:00
Peter Steinberger	d92fc85555	refactor(cli): dedupe gateway run mode parsing	2026-02-26 19:50:49 +01:00
Peter Steinberger	a909019078	fix: align gateway run auth modes (#27469) (thanks @s1korrrr)	2026-02-26 18:20:27 +00:00
Rafal	1087033abd	fix(cli): list all supported auth modes in gateway run --auth help Made-with: Cursor	2026-02-26 18:20:27 +00:00
Peter Steinberger	c397a02c9a	fix(queue): harden drain/abort/timeout race handling - reject new lane enqueues once gateway drain begins - always reset lane draining state and isolate onWait callback failures - persist per-session abort cutoff and skip stale queued messages - avoid false 600s agentTurn timeout in isolated cron jobs Fixes #27407 Fixes #27332 Fixes #27427 Co-authored-by: Kevin Shenghui <shenghuikevin@github.com> Co-authored-by: zjmy <zhangjunmengyang@gmail.com> Co-authored-by: suko <miha.sukic@gmail.com>	2026-02-26 13:43:39 +01:00
Peter Steinberger	2f8c68ae4d	refactor(test): dedupe run-loop signal harness setup	2026-02-22 21:19:09 +00:00
Peter Steinberger	e6383a2c13	fix(gateway): probe port liveness for stale lock recovery Co-authored-by: Operative-001 <261882263+Operative-001@users.noreply.github.com>	2026-02-22 22:11:51 +01:00
Peter Steinberger	296b19e413	test: dedupe gateway browser discord and channel coverage	2026-02-22 17:11:54 +00:00
Peter Steinberger	ee7a43b895	test: replace slow gateway SIGTERM integration coverage	2026-02-22 17:06:35 +00:00
Peter Steinberger	edaa5ef7a5	refactor(gateway): simplify restart flow and expand lock tests	2026-02-22 10:44:47 +01:00
Peter Steinberger	dd07c06d00	fix: tighten gateway restart loop handling (#23416) (thanks @jeffwnli)	2026-02-22 10:38:32 +01:00
jeffr	9c30243c8f	fix: release gateway lock before spawning restart child Move lock.release() before restartGatewayProcessWithFreshPid() so the spawned child can immediately acquire the lock without racing against a zombie parent. This eliminates the root cause of the restart loop where the child times out waiting for a lock held by its now-dead parent. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 10:38:32 +01:00
jeffr	01bd83d644	fix: release gateway lock before process.exit in run-loop process.exit() called from inside an async IIFE bypasses the outer try/finally block that releases the gateway lock. This leaves a stale lock file pointing to a zombie PID, preventing the spawned child or systemctl restart from acquiring the lock. Release the lock explicitly before calling exit in both the restart-spawned and stop code paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 10:38:32 +01:00
Peter Steinberger	a1cb700a05	test: dedupe and optimize test suites	2026-02-19 15:19:38 +00:00
Peter Steinberger	b4dbe03298	refactor: unify restart gating and update availability sync	2026-02-19 10:00:41 +01:00
Gustavo Madeira Santana	c5698caca3	Security: default gateway auth bootstrap and explicit mode none (#20686) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: `be1b73182c` Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras	2026-02-19 02:35:50 -05:00
Peter Steinberger	8f866d51c4	test(cli): dedupe runtime capture fixtures across command specs	2026-02-18 13:34:03 +00:00
Gustavo Madeira Santana	40a6661597	test(cli): fix option-collision mock typings	2026-02-17 21:32:04 -05:00
Gustavo Madeira Santana	5a31da8eec	chore: format imports in gateway and session tools	2026-02-17 21:10:38 -05:00
Gustavo Madeira Santana	985ec71c55	CLI: resolve parent/subcommand option collisions (#18725) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: `b7e51cf909` Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras	2026-02-17 20:57:09 -05:00
Peter Steinberger	b8b43175c5	style: align formatting with oxfmt 0.33	2026-02-18 01:34:35 +00:00
Peter Steinberger	31f9be126c	style: run oxfmt and fix gate failures	2026-02-18 01:29:02 +00:00
cpojer	048e29ea35	chore: Fix types in tests 45/N.	2026-02-17 15:50:07 +09:00
cpojer	f2f17bafbc	chore: Fix types in tests 30/N.	2026-02-17 14:32:57 +09:00
cpojer	d0cb8c19b2	chore: wtf.	2026-02-17 13:36:48 +09:00
Sebastian	ed11e93cf2	chore(format)	2026-02-16 23:20:16 -05:00
Gustavo Madeira Santana	7b172d61cd	Revert "fix: respect OPENCLAW_HOME for isolated gateway instances" This reverts commit `34b18ea9db`.	2026-02-16 20:36:01 -05:00

95 Commits