Commit Graph

167 Commits

Author SHA1 Message Date
Peter Steinberger ccf16cd889
fix(gateway): clear trusted-proxy control ui scopes 2026-03-17 10:07:53 -07:00
Peter Steinberger a69f6190ab
fix(gateway): pin plugin webhook route registry (#47902) 2026-03-15 21:53:05 -07:00
Andrew Demczuk 26e0a3ee9a
fix(gateway): skip Control UI pairing when auth.mode=none (closes #42931) (#47148)
When auth is completely disabled (mode=none), requiring device pairing
for Control UI operator sessions adds friction without security value
since any client can already connect without credentials.

Add authMode parameter to shouldSkipControlUiPairing so the bypass
fires only for Control UI + operator role + auth.mode=none. This avoids
the #43478 regression where a top-level OR disabled pairing for ALL
websocket clients.
2026-03-15 13:03:39 +01:00
rstar327 ba6064cc22
feat(gateway): make health monitor stale threshold and max restarts configurable (openclaw#42107)
Verified:
- pnpm exec vitest --run src/config/config-misc.test.ts -t "gateway.channelHealthCheckMinutes"
- pnpm exec vitest --run src/gateway/server-channels.test.ts -t "health monitor"
- pnpm exec vitest --run src/gateway/channel-health-monitor.test.ts src/gateway/server/readiness.test.ts
- pnpm exec vitest --run extensions/feishu/src/outbound.test.ts
- pnpm exec tsc --noEmit

Co-authored-by: rstar327 <114364448+rstar327@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-14 21:21:56 -05:00
Andrew Demczuk 92fc8065e9
fix(gateway): remove re-introduced auth.mode=none pairing bypass
The revert of #43478 (commit 39b4185d0b) was silently undone by
3704293e6f which was based on a branch that included the original
change. This removes the auth.mode=none skipPairing condition again.

The blanket skip was too broad - it disabled pairing for ALL websocket
clients, not just Control UI behind reverse proxies.
2026-03-15 00:46:24 +01:00
George Zhang 3704293e6f
browser: drop headless/remote MCP attach modes, simplify existing-session to autoConnect-only (#46628) 2026-03-14 15:54:22 -07:00
George Zhang b1d8737017
browser: drop chrome-relay auto-creation, simplify to user profile only (#46596)
Merged via squash.

Prepared head SHA: 74becc8f7d
Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com>
Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com>
Reviewed-by: @odysseus0
2026-03-14 15:40:02 -07:00
Vincent Koc 39b4185d0b revert: 9bffa3422c 2026-03-14 15:09:22 -07:00
Andrew Demczuk 678ea77dcf
style(gateway): fix oxfmt formatting and remove unused test helper 2026-03-14 21:46:53 +01:00
Sally O'Malley 8db6fcca77
fix(gateway/cli): relax local backend self-pairing and harden launchd restarts (#46290)
Signed-off-by: sallyom <somalley@redhat.com>
2026-03-14 14:27:52 -04:00
Andrew Demczuk 9bffa3422c
fix(gateway): skip device pairing when auth.mode=none
Fixes #42931

When gateway.auth.mode is set to "none", authentication succeeds with
method "none" but sharedAuthOk remains false because the auth-context
only recognises token/password/trusted-proxy methods. This causes all
pairing-skip conditions to fail, so Control UI browser connections get
closed with code 1008 "pairing required" despite auth being disabled.

Short-circuit the skipPairing check: if the operator explicitly
disabled authentication, device pairing (which is itself an auth
mechanism) must also be bypassed.

Fixes #42931
2026-03-14 19:17:39 +01:00
Sally O'Malley e5fe818a74
fix(gateway/ui): restore control-ui auth bypass and classify connect failures (#45512)
Merged via squash.

Prepared head SHA: 42b5595ede
Co-authored-by: sallyom <11166065+sallyom@users.noreply.github.com>
Co-authored-by: BunsDev <68980965+BunsDev@users.noreply.github.com>
Reviewed-by: @BunsDev
2026-03-13 20:13:35 -05:00
Peter Steinberger 4523260dda test: share gateway route auth helpers 2026-03-14 00:35:07 +00:00
Peter Steinberger 4674fbf923 refactor: share handshake auth helper builders 2026-03-13 21:40:53 +00:00
Peter Steinberger 2f58647033 refactor: share plugin route auth test harness 2026-03-13 18:38:12 +00:00
Peter Steinberger db9c755045 refactor: share readiness test harness 2026-03-13 18:38:11 +00:00
Peter Steinberger 7dc447f79f fix(gateway): strip unbound scopes for shared-auth connects 2026-03-13 02:51:55 +00:00
Vincent Koc 8661c271e9 Gateway: preserve trusted-proxy browser scopes 2026-03-12 21:00:43 -04:00
Peter Steinberger 01e4845f6d
refactor: extract websocket handshake auth helpers 2026-03-12 22:46:28 +00:00
Peter Steinberger bf89947a8e
fix: switch pairing setup codes to bootstrap tokens 2026-03-12 22:23:07 +00:00
Peter Steinberger 445ff0242e
refactor(gateway): cache hook proxy config in runtime state 2026-03-12 21:43:36 +00:00
Peter Steinberger 4da617e178
fix(gateway): honor trusted proxy hook auth rate limits 2026-03-12 21:35:57 +00:00
Vincent Koc 5e389d5e7c
Gateway/ws: clear unbound scopes for shared-token auth (#44306)
* Gateway/ws: clear unbound shared-auth scopes

* Gateway/auth: cover shared-token scope stripping

* Changelog: add shared-token scope stripping entry

* Gateway/ws: preserve allowed control-ui scopes

* Gateway/auth: assert control-ui admin scopes survive allowed device-less auth

* Gateway/auth: cover shared-password scope stripping
2026-03-12 14:52:24 -04:00
Vincent Koc eff0d5a947
Hardening: tighten preauth WebSocket handshake limits (#44089)
* Gateway: tighten preauth handshake limits

* Changelog: note WebSocket preauth hardening

* Gateway: count preauth frame bytes accurately

* Gateway: cap WebSocket payloads before auth
2026-03-12 10:55:41 -04:00
Robin Waslander ebed3bbde1
fix(gateway): enforce browser origin check regardless of proxy headers
In trusted-proxy mode, enforceOriginCheckForAnyClient was set to false
whenever proxy headers were present. This allowed browser-originated
WebSocket connections from untrusted origins to bypass origin validation
entirely, as the check only ran for control-ui and webchat client types.

An attacker serving a page from an untrusted origin could connect through
a trusted reverse proxy, inherit proxy-injected identity, and obtain
operator.admin access via the sharedAuthOk / roleCanSkipDeviceIdentity
path without any origin restriction.

Remove the hasProxyHeaders exemption so origin validation runs for all
browser-originated connections regardless of how the request arrived.

Fixes GHSA-5wcw-8jjv-m286
2026-03-12 01:16:52 +01:00
Robin Waslander a1520d70ff
fix(gateway): propagate real gateway client into plugin subagent runtime
Plugin subagent dispatch used a hardcoded synthetic client carrying
operator.admin, operator.approvals, and operator.pairing for all
runtime.subagent.* calls. Plugin HTTP routes with auth:"plugin" require
no gateway auth by design, so an unauthenticated external request could
drive admin-only gateway methods (sessions.delete, agent.run) through
the subagent runtime.

Propagate the real gateway client into the plugin runtime request scope
when one is available. Plugin HTTP routes now run inside a scoped
runtime client: auth:"plugin" routes receive a non-admin synthetic
operator.write client; gateway-authenticated routes retain admin-capable
scopes. The security boundary is enforced at the HTTP handler level.

Fixes GHSA-xw77-45gv-p728
2026-03-11 14:17:01 +01:00
Josh Avant a76e810193
fix(gateway): harden token fallback/reconnect behavior and docs (#42507)
* fix(gateway): harden token fallback and auth reconnect handling

* docs(gateway): clarify auth retry and token-drift recovery

* fix(gateway): tighten auth reconnect gating across clients

* fix: harden gateway token retry (#42507) (thanks @joshavant)
2026-03-10 17:05:57 -05:00
Mariano d4e59a3666
Cron: enforce cron-owned delivery contract (#40998)
Merged via squash.

Prepared head SHA: 5877389e33
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky
2026-03-09 20:12:37 +01:00
Tak Hoffman d9e8e8ac15
fix: resolve live config paths in status and gateway metadata (#39952)
* fix: resolve live config paths in status and gateway metadata

* fix: resolve remaining runtime config path references

* test: cover gateway config.set config path response
2026-03-08 09:59:32 -05:00
Peter Steinberger ac86deccee fix(gateway): harden plugin HTTP route auth 2026-03-07 19:55:06 +00:00
Peter Steinberger 4113a0f39e refactor(gateway): dedupe readiness healthy snapshot fixtures 2026-03-07 17:58:31 +00:00
ql-wade a5c07fa115
fix(gateway): skip stale-socket restarts for Telegram polling (openclaw#38405)
Verified:
- pnpm build
- pnpm check
- pnpm test:macmini

Co-authored-by: ql-wade <262266039+ql-wade@users.noreply.github.com>
2026-03-07 00:20:34 -06:00
Vincent Koc ab5fcfcc01
feat(gateway): add channel-backed readiness probes (#38285)
* Changelog: add channel-backed readiness probe entry

* Gateway: add channel-backed readiness probes

* Docs: describe readiness probe behavior

* Gateway: add readiness probe regression tests

* Changelog: dedupe gateway probe entries

* Docs: fix readiness startup grace description

* Changelog: remove stale readiness entry

* Gateway: cover readiness hardening

* Gateway: harden readiness probes
2026-03-06 15:15:23 -05:00
Tak Hoffman 1be39d4250
fix(gateway): synthesize lifecycle robustness for restart and startup probes (#33831)
* fix(gateway): correct launchctl command sequence for gateway restart (closes #20030)

* fix(restart): expand HOME and escape label in launchctl plist path

* fix(restart): poll port free after SIGKILL to prevent EADDRINUSE restart loop

When cleanStaleGatewayProcessesSync() kills a stale gateway process,
the kernel may not immediately release the TCP port. Previously the
function returned after a fixed 500ms sleep (300ms SIGTERM + 200ms
SIGKILL), allowing triggerOpenClawRestart() to hand off to systemd
before the port was actually free. The new systemd process then raced
the dying socket for port 18789, hit EADDRINUSE, and exited with
status 1, causing systemd to retry indefinitely — the zombie restart
loop reported in #33103.

Fix: add waitForPortFreeSync() that polls lsof at 50ms intervals for
up to 2 seconds after SIGKILL. cleanStaleGatewayProcessesSync() now
blocks until the port is confirmed free (or the budget expires with a
warning) before returning. The increased SIGTERM/SIGKILL wait budgets
(600ms / 400ms) also give slow processes more time to exit cleanly.

Fixes #33103
Related: #28134

* fix: add EADDRINUSE retry and TIME_WAIT port-bind checks for gateway startup

* fix(ports): treat EADDRNOTAVAIL as non-retryable and fix flaky test

* fix(gateway): hot-reload agents.defaults.models allowlist changes

The reload plan had a rule for `agents.defaults.model` (singular) but
not `agents.defaults.models` (plural — the allowlist array).  Because
`agents.defaults.models` does not prefix-match `agents.defaults.model.`,
it fell through to the catch-all `agents` tail rule (kind=none), so
allowlist edits in openclaw.json were silently ignored at runtime.

Add a dedicated reload rule so changes to the models allowlist trigger
a heartbeat restart, which re-reads the config and serves the updated
list to clients.

Fixes #33600

Co-authored-by: HCL <chenglunhu@gmail.com>
Signed-off-by: HCL <chenglunhu@gmail.com>

* test(restart): 100% branch coverage — audit round 2

Audit findings fixed:
- remove dead guard: terminateStaleProcessesSync pids.length===0 check was
  unreachable (only caller cleanStaleGatewayProcessesSync already guards)
- expose __testing.callSleepSyncRaw so sleepSync's real Atomics.wait path
  can be unit-tested directly without going through the override
- fix broken sleepSync Atomics.wait test: previous test set override=null
  but cleanStaleGatewayProcessesSync returned before calling sleepSync —
  replaced with direct callSleepSyncRaw calls that actually exercise L36/L42-47
- fix pid collision: two tests used process.pid+304 (EPERM + dead-at-SIGTERM);
  EPERM test changed to process.pid+305
- fix misindented tests: 'deduplicates pids' and 'lsof status 1 container
  edge case' were outside their intended describe blocks; moved to correct
  scopes (findGatewayPidsOnPortSync and pollPortOnce respectively)
- add missing branch tests:
  - status 1 + non-empty stdout with zero openclaw pids → free:true (L145)
  - mid-loop non-openclaw cmd in &&-chain (L67)
  - consecutive p-lines without c-line between them (L67)
  - invalid PID in p-line (p0 / pNaN) — ternary false branch (L67)
  - unknown lsof output line (else-if false branch L69)

Coverage: 100% stmts / 100% branch / 100% funcs / 100% lines (36 tests)

* test(restart): fix stale-pid test typing for tsgo

* fix(gateway): address lifecycle review findings

* test(update): make restart-helper path assertions windows-safe

---------

Signed-off-by: HCL <chenglunhu@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: Efe Büken <efe@arven.digital>
Co-authored-by: Riccardo Marino <rmarino@apple.com>
Co-authored-by: HCL <chenglunhu@gmail.com>
2026-03-03 21:31:12 -06:00
Liu Xiaopai ae29842158
Gateway: fix stale self version in status output (#32655)
Merged via squash.

Prepared head SHA: b9675d1f90
Co-authored-by: liuxiaopai-ai <73659136+liuxiaopai-ai@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-03-03 02:41:52 -05:00
Peter Steinberger 8768487aee refactor(shared): dedupe protocol schema typing and session/media helpers 2026-03-02 19:57:33 +00:00
Peter Steinberger 7a7eee920a refactor(gateway): harden plugin http route contracts 2026-03-02 16:48:00 +00:00
Peter Steinberger 33e76db12a refactor(gateway): scope ws origin fallback metrics to runtime 2026-03-02 16:47:00 +00:00
Peter Steinberger d5ae4b8337 fix(gateway): require local client for loopback origin fallback 2026-03-02 16:37:45 +00:00
Peter Steinberger 2fd8264ab0 refactor(gateway): hard-break plugin wildcard http handlers 2026-03-02 16:24:06 +00:00
Peter Steinberger 93b0724025 fix(gateway): fail closed plugin auth path canonicalization 2026-03-02 15:55:32 +00:00
Peter Steinberger d3e0c0b29c test(gateway): dedupe gateway and infra test scaffolds 2026-03-02 07:13:10 +00:00
Sid e1e715c53d
fix(gateway): skip device pairing for local backend self-connections (#30801)
* fix(gateway): skip device pairing for local backend self-connections

When gateway.tls is enabled, sessions_spawn (and other internal
callGateway operations) creates a new WebSocket to the gateway.
The gateway treated this self-connection like any external client
and enforced device pairing, rejecting it with "pairing required"
(close code 1008). This made sub-agent spawning impossible when
TLS was enabled in Docker with bind: "lan".

Skip pairing for connections that are gateway-client self-connections
from localhost with valid shared auth (token/password). These are
internal backend calls (e.g. sessions_spawn, subagent-announce) that
already have valid credentials and connect from the same host.

Closes #30740

* gateway: tighten backend self-pair bypass guard

* tests: cover backend self-pairing local-vs-remote auth path

* changelog: add gateway tls pairing fix credit

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-03-01 21:46:33 -08:00
Peter Steinberger 2d31126e6a refactor(shared): extract reused path and normalization helpers 2026-03-02 05:20:19 +00:00
Peter Steinberger cef5fae0a2 refactor(gateway): dedupe origin seeding and plugin route auth matching 2026-03-02 00:42:22 +00:00
Peter Steinberger 53d10f8688 fix(gateway): land access/auth/config migration cluster
Land #28960 by @Glucksberg (Tailscale origin auto-allowlist).
Land #29394 by @synchronic1 (allowedOrigins upgrade migration).
Land #29198 by @Mariana-Codebase (plugin HTTP auth guard + route precedence).
Land #30910 by @liuxiaopai-ai (tailscale bind/config.patch guard).

Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: synchronic1 <synchronic1@users.noreply.github.com>
Co-authored-by: Mariana Sinisterra <mariana.data@outlook.com>
Co-authored-by: liuxiaopai-ai <73659136+liuxiaopai-ai@users.noreply.github.com>
2026-03-02 00:10:51 +00:00
Ayaan Zaidi 54eaf17327 feat(gateway): add node canvas capability refresh flow 2026-02-27 12:16:36 +05:30
Vincent Koc cb9374a2a1
Gateway: improve device-auth v2 migration diagnostics (#28305)
* Gateway: add device-auth detail code resolver

* Gateway: emit specific device-auth detail codes

* Gateway tests: cover nonce and signature detail codes

* Docs: add gateway device-auth migration diagnostics

* Docs: add device-auth v2 troubleshooting signatures
2026-02-26 21:05:43 -08:00
Peter Steinberger 081b1aa1ed refactor(gateway): unify v3 auth payload builders and vectors 2026-02-26 15:08:50 +01:00
Kevin Shenghui 9c142993b8 fix: preserve operator scopes for shared auth connections
When connecting via shared gateway token (no device identity),
the operator scopes were being cleared, causing API operations
to fail with 'missing scope' errors.

This fix preserves scopes when sharedAuthOk is true, allowing
headless/API operator clients to retain their requested scopes.

Fixes #27494

(cherry picked from commit c71c8948bd)
2026-02-26 13:40:58 +00:00