openclaw

Commit Graph

Author	SHA1	Message	Date
Aleksandrs Tihenko	c0026274d9	fix(auth): distinguish revoked API keys from transient auth errors (#25754) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: `8f9c07a200` Co-authored-by: rrenamed <87486610+rrenamed@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras	2026-02-25 19:47:16 -05:00
taw0002	3c57bf4c85	fix: treat HTTP 502/503/504 as failover-eligible (timeout reason) (#21017) * fix: treat HTTP 502/503/504 as failover-eligible (timeout reason) When a model API returns 502 Bad Gateway, 503 Service Unavailable, or 504 Gateway Timeout, the error object carries the status code directly. resolveFailoverReasonFromError() only checked 402/429/401/403/408/400, so 5xx server errors fell through to message-based classification which requires the status code to appear at the start of the error message. Many API SDKs (Google, Anthropic) set err.status = 503 without prefixing the message with '503', so the message classifier never matched and failover never triggered — the run retried the same broken model. Add 502/503/504 to the status-code branch, returning 'timeout' (matching the existing behavior of isTransientHttpError in the message classifier). Fixes #20999 * Changelog: add failover 502/503/504 note with credits * Failover: classify HTTP 504 as transient in message parser * Changelog: credit taw0002 and vincentkoc for failover fix --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-02-23 03:01:57 -05:00
mudrii	7ecfc1d93c	fix(auth): bidirectional mode/type compat + sync OAuth to all agents (#12692) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: `2dee8e1174` Co-authored-by: mudrii <220262+mudrii@users.noreply.github.com> Co-authored-by: obviyus <22031114+obviyus@users.noreply.github.com> Reviewed-by: @obviyus	2026-02-20 16:01:09 +05:30
Protocol Zero	2af3415fac	fix: treat HTTP 503 as failover-eligible for LLM provider errors (#21086) * fix: treat HTTP 503 as failover-eligible for LLM provider errors When LLM SDKs wrap 503 responses, the leading "503" prefix is lost (e.g. Google Gemini returns "high demand" / "UNAVAILABLE" without a numeric prefix). The existing isTransientHttpError only matches messages starting with "503 ...", so these wrapped errors silently skip failover — no profile rotation, no model fallback. This patch closes that gap: - resolveFailoverReasonFromError: map HTTP status 503 → rate_limit (covers structured error objects with a status field) - ERROR_PATTERNS.overloaded: add /\b503\b/, "service unavailable", "high demand" (covers message-only classification when the leading status prefix is absent) Existing isTransientHttpError behavior is unchanged; these additions are complementary and only fire for errors that previously fell through unclassified. * fix: address review feedback — drop /\b503\b/ pattern, add test coverage - Remove `/\b503\b/` from ERROR_PATTERNS.overloaded to resolve the semantic inconsistency noted by reviewers: `isTransientHttpError` already handles messages prefixed with "503" (→ "timeout"), so a redundant overloaded pattern would classify the same class of errors differently depending on message formatting. - Keep "service unavailable" and "high demand" patterns — these are the real gap-fillers for SDK-rewritten messages that lack a numeric prefix. - Add test case for JSON-wrapped 503 error body containing "overloaded" to strengthen coverage. * fix: unify 503 classification — status 503 → timeout (consistent with isTransientHttpError) resolveFailoverReasonFromError previously mapped status 503 → "rate_limit", while the string-based isTransientHttpError mapped "503 ..." → "timeout". Align both paths: structured {status: 503} now also returns "timeout", matching the existing transient-error convention. Both reasons are failover-eligible, so runtime behavior is unchanged. --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-02-19 12:45:09 -08:00
Sebastian	fbda9a93fd	fix(failover): align abort timeout detection and regressions	2026-02-16 21:00:27 -05:00
Daniel Sauer	12ce358da5	fix(failover): recognize 'abort' stop reason as timeout for model fallback When streaming providers (GLM, OpenRouter, etc.) return 'stop reason: abort' due to stream interruption, OpenClaw's failover mechanism did not recognize this as a timeout condition. This prevented fallback models from being triggered, leaving users with failed requests instead of graceful failover. Changes: - Add abort patterns to ERROR_PATTERNS.timeout in pi-embedded-helpers/errors.ts - Extend TIMEOUT_HINT_RE regex to include abort patterns in failover-error.ts Fixes #18453 Co-authored-by: James <james@openclaw.ai>	2026-02-16 23:49:51 +01:00
Oren	71b4be8799	fix: handle 400 status in failover to enable model fallback (#1879)	2026-02-08 23:12:06 -08:00
cpojer	5ceff756e1	chore: Enable "curly" rule to avoid single-statement if confusion/errors.	2026-01-31 16:19:20 +09:00
Luke	be1cdc9370	fix(agents): treat provider request-aborted as timeout for fallback (#1576) * fix(agents): treat request-aborted as timeout for fallback * test(e2e): add provider timeout fallback	2026-01-24 11:27:24 +00:00
Peter Steinberger	ec27c813cc	fix(fallback): handle timeout aborts Co-authored-by: Mykyta Bozhenko <21245729+cheeeee@users.noreply.github.com>	2026-01-18 07:52:44 +00:00
Peter Steinberger	c379191f80	chore: migrate to oxlint and oxfmt Co-authored-by: Christoph Nakazawa <christoph.pojer@gmail.com>	2026-01-14 15:02:19 +00:00
Peter Steinberger	53ec8e36cb	refactor: centralize failover error parsing	2026-01-10 01:26:06 +01:00
Peter Steinberger	402c35b91c	refactor(agents): centralize failover normalization	2026-01-09 22:15:06 +01:00
Peter Steinberger	c27b1441f7	fix(auth): billing backoff + cooldown UX	2026-01-09 22:00:14 +01:00

14 Commits
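
The failover commits above describe a two-stage classification: check a structured status code first, then fall back to message patterns when an SDK drops the numeric prefix. The sketch below illustrates that idea only; it is not the OpenClaw source. The names `classifyFailover`, `ErrorLike`, `TRANSIENT_STATUSES`, `ABORT_HINT_RE`, and `OVERLOADED_HINTS` are hypothetical, the "overloaded" reason label is an assumption, and the handling of 402/429/401/403/408/400 is left out because the log does not say how those statuses map to reasons.

```typescript
// Hypothetical sketch: names and reason labels are assumptions,
// not the real OpenClaw implementation.
type FailoverReason = "timeout" | "overloaded";

interface ErrorLike {
  status?: number;
  message?: string;
}

// Gateway-style statuses that the commits above classify as "timeout".
const TRANSIENT_STATUSES = new Set<number>([502, 503, 504]);

// Abort wording treated as a timeout (assumed patterns, based on the
// 'stop reason: abort' and request-aborted commits).
const ABORT_HINT_RE = /\b(?:request aborted|stop reason:\s*abort)\b/;

// Fragments that cover SDK-rewritten messages without a numeric prefix.
const OVERLOADED_HINTS = ["service unavailable", "high demand"];

function classifyFailover(err: ErrorLike): FailoverReason | null {
  // Stage 1: a structured status field wins over any message text.
  if (err.status !== undefined && TRANSIENT_STATUSES.has(err.status)) {
    return "timeout";
  }

  const message = (err.message ?? "").trim().toLowerCase();

  // Stage 2a: message that starts with a transient status code,
  // e.g. "503 Service Unavailable".
  const prefix = /^(\d{3})\b/.exec(message);
  if (prefix && TRANSIENT_STATUSES.has(Number(prefix[1]))) {
    return "timeout";
  }

  // Stage 2b: stream or request aborts count as timeouts.
  if (ABORT_HINT_RE.test(message)) {
    return "timeout";
  }

  // Stage 2c: prefix-less overload wording.
  if (OVERLOADED_HINTS.some((hint) => message.includes(hint))) {
    return "overloaded";
  }

  // Not failover-eligible under this sketch.
  return null;
}
```

Under these assumptions, `classifyFailover({ status: 503, message: "high demand" })` returns "timeout" because the status branch runs first, while the same message without a status field returns "overloaded". Checking the status before the message reflects the consistency concern raised in the 503 review feedback: the same error should not be classified differently depending on how its message is formatted.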