Commit Graph

172 Commits

Author SHA1 Message Date
Val Alexander 3e2b3bd2c5
Fix Control UI operator.read scope handling (#53110)
Preserve Control UI scopes through the device-auth bypass path, normalize implied operator device-auth scopes, ignore cached under-scoped operator tokens, and degrade read-backed main pages gracefully when a connection truly lacks operator.read.

Co-authored-by: Val Alexander <68980965+BunsDev@users.noreply.github.com>
2026-03-23 14:57:21 -05:00
Vincent Koc d5dc6b6573 fix(gateway): require auth for canvas routes 2026-03-23 09:31:40 -07:00
Peter Steinberger ea579ef858
fix(gateway): preserve async hook ingress provenance 2026-03-22 22:21:49 -07:00
Peter Steinberger 6d34d62795
test: harden no-isolate gateway auth and pairing 2026-03-22 15:15:50 -07:00
clay-datacurve 7b61ca1b06
Session management improvements and dashboard API (#50101)
* fix: make cleanup "keep" persist subagent sessions indefinitely

* feat: expose subagent session metadata in sessions list

* fix: include status and timing in sessions_list tool

* fix: hide injected timestamp prefixes in chat ui

* feat: push session list updates over websocket

* feat: expose child subagent sessions in subagents list

* feat: add admin http endpoint to kill sessions

* Emit session.message websocket events for transcript updates

* Estimate session costs in sessions list

* Add direct session history HTTP and SSE endpoints

* Harden dashboard session events and history APIs

* Add session lifecycle gateway methods

* Add dashboard session API improvements

* Add dashboard session model and parent linkage support

* fix: tighten dashboard session API metadata

* Fix dashboard session cost metadata

* Persist accumulated session cost

* fix: stop followup queue drain cfg crash

* Fix dashboard session create and model metadata

* fix: stop guessing session model costs

* Gateway: cache OpenRouter pricing for configured models

* Gateway: add timeout session status

* Fix subagent spawn test config loading

* Gateway: preserve operator scopes without device identity

* Emit user message transcript events and deduplicate plugin warnings

* feat: emit sessions.changed lifecycle event on subagent spawn

Adds a session-lifecycle-events module (similar to transcript-events)
that emits create events when subagents are spawned. The gateway
server.impl.ts listens for these events and broadcasts sessions.changed
with reason=create to SSE subscribers, so dashboards can pick up new
subagent sessions without polling.

* Gateway: allow persistent dashboard orchestrator sessions

* fix: preserve operator scopes for token-authenticated backend clients

Backend clients (like agent-dashboard) that authenticate with a valid gateway
token but don't present a device identity were getting their scopes stripped.
The scope-clearing logic ran before checking the device identity decision,
so even when evaluateMissingDeviceIdentity returned 'allow' (because
roleCanSkipDeviceIdentity passed for token-authed operators), scopes were
already cleared.

Fix: also check decision.kind before clearing scopes, so token-authenticated
operators keep their requested scopes.

* Gateway: allow operator-token session kills

* Fix stale active subagent status after follow-up runs

* Fix dashboard image attachments in sessions send

* Fix completed session follow-up status updates

* feat: stream session tool events to operator UIs

* Add sessions.steer gateway coverage

* Persist subagent timing in session store

* Fix subagent session transcript event keys

* Fix active subagent session status in gateway

* bump session label max to 512

* Fix gateway send session reactivation

* fix: publish terminal session lifecycle state

* feat: change default session reset to effectively never

- Change DEFAULT_RESET_MODE from "daily" to "idle"
- Change DEFAULT_IDLE_MINUTES from 60 to 0 (0 = disabled/never)
- Allow idleMinutes=0 through normalization (don't clamp to 1)
- Treat idleMinutes=0 as "no idle expiry" in evaluateSessionFreshness
- Default behavior: mode "idle" + idleMinutes 0 = sessions never auto-reset
- Update test assertion for new default mode

* fix: prep session management followups (#50101) (thanks @clay-datacurve)

---------

Co-authored-by: Tyler Yust <TYTYYUST@YAHOO.COM>
2026-03-19 12:12:30 +09:00
Peter Steinberger ccf16cd889
fix(gateway): clear trusted-proxy control ui scopes 2026-03-17 10:07:53 -07:00
Peter Steinberger a69f6190ab
fix(gateway): pin plugin webhook route registry (#47902) 2026-03-15 21:53:05 -07:00
Andrew Demczuk 26e0a3ee9a
fix(gateway): skip Control UI pairing when auth.mode=none (closes #42931) (#47148)
When auth is completely disabled (mode=none), requiring device pairing
for Control UI operator sessions adds friction without security value
since any client can already connect without credentials.

Add authMode parameter to shouldSkipControlUiPairing so the bypass
fires only for Control UI + operator role + auth.mode=none. This avoids
the #43478 regression where a top-level OR disabled pairing for ALL
websocket clients.
2026-03-15 13:03:39 +01:00
rstar327 ba6064cc22
feat(gateway): make health monitor stale threshold and max restarts configurable (openclaw#42107)
Verified:
- pnpm exec vitest --run src/config/config-misc.test.ts -t "gateway.channelHealthCheckMinutes"
- pnpm exec vitest --run src/gateway/server-channels.test.ts -t "health monitor"
- pnpm exec vitest --run src/gateway/channel-health-monitor.test.ts src/gateway/server/readiness.test.ts
- pnpm exec vitest --run extensions/feishu/src/outbound.test.ts
- pnpm exec tsc --noEmit

Co-authored-by: rstar327 <114364448+rstar327@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-14 21:21:56 -05:00
Andrew Demczuk 92fc8065e9
fix(gateway): remove re-introduced auth.mode=none pairing bypass
The revert of #43478 (commit 39b4185d0b) was silently undone by
3704293e6f which was based on a branch that included the original
change. This removes the auth.mode=none skipPairing condition again.

The blanket skip was too broad - it disabled pairing for ALL websocket
clients, not just Control UI behind reverse proxies.
2026-03-15 00:46:24 +01:00
George Zhang 3704293e6f
browser: drop headless/remote MCP attach modes, simplify existing-session to autoConnect-only (#46628) 2026-03-14 15:54:22 -07:00
George Zhang b1d8737017
browser: drop chrome-relay auto-creation, simplify to user profile only (#46596)
Merged via squash.

Prepared head SHA: 74becc8f7d
Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com>
Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com>
Reviewed-by: @odysseus0
2026-03-14 15:40:02 -07:00
Vincent Koc 39b4185d0b revert: 9bffa3422c 2026-03-14 15:09:22 -07:00
Andrew Demczuk 678ea77dcf
style(gateway): fix oxfmt formatting and remove unused test helper 2026-03-14 21:46:53 +01:00
Sally O'Malley 8db6fcca77
fix(gateway/cli): relax local backend self-pairing and harden launchd restarts (#46290)
Signed-off-by: sallyom <somalley@redhat.com>
2026-03-14 14:27:52 -04:00
Andrew Demczuk 9bffa3422c
fix(gateway): skip device pairing when auth.mode=none
Fixes #42931

When gateway.auth.mode is set to "none", authentication succeeds with
method "none" but sharedAuthOk remains false because the auth-context
only recognises token/password/trusted-proxy methods. This causes all
pairing-skip conditions to fail, so Control UI browser connections get
closed with code 1008 "pairing required" despite auth being disabled.

Short-circuit the skipPairing check: if the operator explicitly
disabled authentication, device pairing (which is itself an auth
mechanism) must also be bypassed.

Fixes #42931
2026-03-14 19:17:39 +01:00
Sally O'Malley e5fe818a74
fix(gateway/ui): restore control-ui auth bypass and classify connect failures (#45512)
Merged via squash.

Prepared head SHA: 42b5595ede
Co-authored-by: sallyom <11166065+sallyom@users.noreply.github.com>
Co-authored-by: BunsDev <68980965+BunsDev@users.noreply.github.com>
Reviewed-by: @BunsDev
2026-03-13 20:13:35 -05:00
Peter Steinberger 4523260dda test: share gateway route auth helpers 2026-03-14 00:35:07 +00:00
Peter Steinberger 4674fbf923 refactor: share handshake auth helper builders 2026-03-13 21:40:53 +00:00
Peter Steinberger 2f58647033 refactor: share plugin route auth test harness 2026-03-13 18:38:12 +00:00
Peter Steinberger db9c755045 refactor: share readiness test harness 2026-03-13 18:38:11 +00:00
Peter Steinberger 7dc447f79f fix(gateway): strip unbound scopes for shared-auth connects 2026-03-13 02:51:55 +00:00
Vincent Koc 8661c271e9 Gateway: preserve trusted-proxy browser scopes 2026-03-12 21:00:43 -04:00
Peter Steinberger 01e4845f6d
refactor: extract websocket handshake auth helpers 2026-03-12 22:46:28 +00:00
Peter Steinberger bf89947a8e
fix: switch pairing setup codes to bootstrap tokens 2026-03-12 22:23:07 +00:00
Peter Steinberger 445ff0242e
refactor(gateway): cache hook proxy config in runtime state 2026-03-12 21:43:36 +00:00
Peter Steinberger 4da617e178
fix(gateway): honor trusted proxy hook auth rate limits 2026-03-12 21:35:57 +00:00
Vincent Koc 5e389d5e7c
Gateway/ws: clear unbound scopes for shared-token auth (#44306)
* Gateway/ws: clear unbound shared-auth scopes

* Gateway/auth: cover shared-token scope stripping

* Changelog: add shared-token scope stripping entry

* Gateway/ws: preserve allowed control-ui scopes

* Gateway/auth: assert control-ui admin scopes survive allowed device-less auth

* Gateway/auth: cover shared-password scope stripping
2026-03-12 14:52:24 -04:00
Vincent Koc eff0d5a947
Hardening: tighten preauth WebSocket handshake limits (#44089)
* Gateway: tighten preauth handshake limits

* Changelog: note WebSocket preauth hardening

* Gateway: count preauth frame bytes accurately

* Gateway: cap WebSocket payloads before auth
2026-03-12 10:55:41 -04:00
Robin Waslander ebed3bbde1
fix(gateway): enforce browser origin check regardless of proxy headers
In trusted-proxy mode, enforceOriginCheckForAnyClient was set to false
whenever proxy headers were present. This allowed browser-originated
WebSocket connections from untrusted origins to bypass origin validation
entirely, as the check only ran for control-ui and webchat client types.

An attacker serving a page from an untrusted origin could connect through
a trusted reverse proxy, inherit proxy-injected identity, and obtain
operator.admin access via the sharedAuthOk / roleCanSkipDeviceIdentity
path without any origin restriction.

Remove the hasProxyHeaders exemption so origin validation runs for all
browser-originated connections regardless of how the request arrived.

Fixes GHSA-5wcw-8jjv-m286
2026-03-12 01:16:52 +01:00
Robin Waslander a1520d70ff
fix(gateway): propagate real gateway client into plugin subagent runtime
Plugin subagent dispatch used a hardcoded synthetic client carrying
operator.admin, operator.approvals, and operator.pairing for all
runtime.subagent.* calls. Plugin HTTP routes with auth:"plugin" require
no gateway auth by design, so an unauthenticated external request could
drive admin-only gateway methods (sessions.delete, agent.run) through
the subagent runtime.

Propagate the real gateway client into the plugin runtime request scope
when one is available. Plugin HTTP routes now run inside a scoped
runtime client: auth:"plugin" routes receive a non-admin synthetic
operator.write client; gateway-authenticated routes retain admin-capable
scopes. The security boundary is enforced at the HTTP handler level.

Fixes GHSA-xw77-45gv-p728
2026-03-11 14:17:01 +01:00
Josh Avant a76e810193
fix(gateway): harden token fallback/reconnect behavior and docs (#42507)
* fix(gateway): harden token fallback and auth reconnect handling

* docs(gateway): clarify auth retry and token-drift recovery

* fix(gateway): tighten auth reconnect gating across clients

* fix: harden gateway token retry (#42507) (thanks @joshavant)
2026-03-10 17:05:57 -05:00
Mariano d4e59a3666
Cron: enforce cron-owned delivery contract (#40998)
Merged via squash.

Prepared head SHA: 5877389e33
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky
2026-03-09 20:12:37 +01:00
Tak Hoffman d9e8e8ac15
fix: resolve live config paths in status and gateway metadata (#39952)
* fix: resolve live config paths in status and gateway metadata

* fix: resolve remaining runtime config path references

* test: cover gateway config.set config path response
2026-03-08 09:59:32 -05:00
Peter Steinberger ac86deccee fix(gateway): harden plugin HTTP route auth 2026-03-07 19:55:06 +00:00
Peter Steinberger 4113a0f39e refactor(gateway): dedupe readiness healthy snapshot fixtures 2026-03-07 17:58:31 +00:00
ql-wade a5c07fa115
fix(gateway): skip stale-socket restarts for Telegram polling (openclaw#38405)
Verified:
- pnpm build
- pnpm check
- pnpm test:macmini

Co-authored-by: ql-wade <262266039+ql-wade@users.noreply.github.com>
2026-03-07 00:20:34 -06:00
Vincent Koc ab5fcfcc01
feat(gateway): add channel-backed readiness probes (#38285)
* Changelog: add channel-backed readiness probe entry

* Gateway: add channel-backed readiness probes

* Docs: describe readiness probe behavior

* Gateway: add readiness probe regression tests

* Changelog: dedupe gateway probe entries

* Docs: fix readiness startup grace description

* Changelog: remove stale readiness entry

* Gateway: cover readiness hardening

* Gateway: harden readiness probes
2026-03-06 15:15:23 -05:00
Tak Hoffman 1be39d4250
fix(gateway): synthesize lifecycle robustness for restart and startup probes (#33831)
* fix(gateway): correct launchctl command sequence for gateway restart (closes #20030)

* fix(restart): expand HOME and escape label in launchctl plist path

* fix(restart): poll port free after SIGKILL to prevent EADDRINUSE restart loop

When cleanStaleGatewayProcessesSync() kills a stale gateway process,
the kernel may not immediately release the TCP port. Previously the
function returned after a fixed 500ms sleep (300ms SIGTERM + 200ms
SIGKILL), allowing triggerOpenClawRestart() to hand off to systemd
before the port was actually free. The new systemd process then raced
the dying socket for port 18789, hit EADDRINUSE, and exited with
status 1, causing systemd to retry indefinitely — the zombie restart
loop reported in #33103.

Fix: add waitForPortFreeSync() that polls lsof at 50ms intervals for
up to 2 seconds after SIGKILL. cleanStaleGatewayProcessesSync() now
blocks until the port is confirmed free (or the budget expires with a
warning) before returning. The increased SIGTERM/SIGKILL wait budgets
(600ms / 400ms) also give slow processes more time to exit cleanly.

Fixes #33103
Related: #28134

* fix: add EADDRINUSE retry and TIME_WAIT port-bind checks for gateway startup

* fix(ports): treat EADDRNOTAVAIL as non-retryable and fix flaky test

* fix(gateway): hot-reload agents.defaults.models allowlist changes

The reload plan had a rule for `agents.defaults.model` (singular) but
not `agents.defaults.models` (plural — the allowlist array).  Because
`agents.defaults.models` does not prefix-match `agents.defaults.model.`,
it fell through to the catch-all `agents` tail rule (kind=none), so
allowlist edits in openclaw.json were silently ignored at runtime.

Add a dedicated reload rule so changes to the models allowlist trigger
a heartbeat restart, which re-reads the config and serves the updated
list to clients.

Fixes #33600

Co-authored-by: HCL <chenglunhu@gmail.com>
Signed-off-by: HCL <chenglunhu@gmail.com>

* test(restart): 100% branch coverage — audit round 2

Audit findings fixed:
- remove dead guard: terminateStaleProcessesSync pids.length===0 check was
  unreachable (only caller cleanStaleGatewayProcessesSync already guards)
- expose __testing.callSleepSyncRaw so sleepSync's real Atomics.wait path
  can be unit-tested directly without going through the override
- fix broken sleepSync Atomics.wait test: previous test set override=null
  but cleanStaleGatewayProcessesSync returned before calling sleepSync —
  replaced with direct callSleepSyncRaw calls that actually exercise L36/L42-47
- fix pid collision: two tests used process.pid+304 (EPERM + dead-at-SIGTERM);
  EPERM test changed to process.pid+305
- fix misindented tests: 'deduplicates pids' and 'lsof status 1 container
  edge case' were outside their intended describe blocks; moved to correct
  scopes (findGatewayPidsOnPortSync and pollPortOnce respectively)
- add missing branch tests:
  - status 1 + non-empty stdout with zero openclaw pids → free:true (L145)
  - mid-loop non-openclaw cmd in &&-chain (L67)
  - consecutive p-lines without c-line between them (L67)
  - invalid PID in p-line (p0 / pNaN) — ternary false branch (L67)
  - unknown lsof output line (else-if false branch L69)

Coverage: 100% stmts / 100% branch / 100% funcs / 100% lines (36 tests)

* test(restart): fix stale-pid test typing for tsgo

* fix(gateway): address lifecycle review findings

* test(update): make restart-helper path assertions windows-safe

---------

Signed-off-by: HCL <chenglunhu@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: Efe Büken <efe@arven.digital>
Co-authored-by: Riccardo Marino <rmarino@apple.com>
Co-authored-by: HCL <chenglunhu@gmail.com>
2026-03-03 21:31:12 -06:00
Liu Xiaopai ae29842158
Gateway: fix stale self version in status output (#32655)
Merged via squash.

Prepared head SHA: b9675d1f90
Co-authored-by: liuxiaopai-ai <73659136+liuxiaopai-ai@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-03-03 02:41:52 -05:00
Peter Steinberger 8768487aee refactor(shared): dedupe protocol schema typing and session/media helpers 2026-03-02 19:57:33 +00:00
Peter Steinberger 7a7eee920a refactor(gateway): harden plugin http route contracts 2026-03-02 16:48:00 +00:00
Peter Steinberger 33e76db12a refactor(gateway): scope ws origin fallback metrics to runtime 2026-03-02 16:47:00 +00:00
Peter Steinberger d5ae4b8337 fix(gateway): require local client for loopback origin fallback 2026-03-02 16:37:45 +00:00
Peter Steinberger 2fd8264ab0 refactor(gateway): hard-break plugin wildcard http handlers 2026-03-02 16:24:06 +00:00
Peter Steinberger 93b0724025 fix(gateway): fail closed plugin auth path canonicalization 2026-03-02 15:55:32 +00:00
Peter Steinberger d3e0c0b29c test(gateway): dedupe gateway and infra test scaffolds 2026-03-02 07:13:10 +00:00
Sid e1e715c53d
fix(gateway): skip device pairing for local backend self-connections (#30801)
* fix(gateway): skip device pairing for local backend self-connections

When gateway.tls is enabled, sessions_spawn (and other internal
callGateway operations) creates a new WebSocket to the gateway.
The gateway treated this self-connection like any external client
and enforced device pairing, rejecting it with "pairing required"
(close code 1008). This made sub-agent spawning impossible when
TLS was enabled in Docker with bind: "lan".

Skip pairing for connections that are gateway-client self-connections
from localhost with valid shared auth (token/password). These are
internal backend calls (e.g. sessions_spawn, subagent-announce) that
already have valid credentials and connect from the same host.

Closes #30740

* gateway: tighten backend self-pair bypass guard

* tests: cover backend self-pairing local-vs-remote auth path

* changelog: add gateway tls pairing fix credit

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-03-01 21:46:33 -08:00
Peter Steinberger 2d31126e6a refactor(shared): extract reused path and normalization helpers 2026-03-02 05:20:19 +00:00
Peter Steinberger cef5fae0a2 refactor(gateway): dedupe origin seeding and plugin route auth matching 2026-03-02 00:42:22 +00:00