mirror of https://github.com/openclaw/openclaw.git

docs: refresh failover and compaction refs

parent bbb73d3171
commit 73584b1d33
@@ -25,7 +25,9 @@ model sees on the next turn.

 Auto-compaction is on by default. It runs when the session nears the context
 limit, or when the model returns a context-overflow error (in which case
-OpenClaw compacts and retries).
+OpenClaw compacts and retries). Typical overflow signatures include
+`request_too_large`, `context length exceeded`, `input exceeds the maximum
+number of tokens`, and `input is too long for the model`.

 <Info>
 Before compacting, OpenClaw automatically reminds the agent to save important
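The overflow-signature matching described in this hunk can be sketched as a simple substring check. This is an illustrative sketch only, not OpenClaw's actual implementation; the function name `isOverflowError` is invented, and the signature list is taken directly from the doc text above.

```typescript
// Overflow signatures listed in the docs; real provider messages vary,
// so matching is case-insensitive substring containment (assumption).
const OVERFLOW_SIGNATURES = [
  "request_too_large",
  "context length exceeded",
  "input exceeds the maximum number of tokens",
  "input is too long for the model",
];

/** Returns true when an error message looks like a context overflow. */
function isOverflowError(message: string): boolean {
  const lower = message.toLowerCase();
  return OVERFLOW_SIGNATURES.some((sig) => lower.includes(sig));
}
```

A matcher like this is what lets the agent decide to compact and retry rather than surface the error.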
@@ -120,6 +120,10 @@ If you have both an OAuth profile and an API key profile for the same provider,

 When a profile fails due to auth/rate‑limit errors (or a timeout that looks
 like rate limiting), OpenClaw marks it in cooldown and moves to the next profile.
+That rate-limit bucket is broader than plain `429`: it also includes provider
+messages such as `Too many concurrent requests`, `ThrottlingException`,
+`throttled`, `resource exhausted`, and periodic usage-window limits such as
+`weekly/monthly limit reached`.
 Format/invalid‑request errors (for example Cloud Code Assist tool call ID
 validation failures) are treated as failover‑worthy and use the same cooldowns.
 OpenAI-compatible stop-reason errors such as `Unhandled stop reason: error`,
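The broader-than-`429` rate-limit bucket this hunk describes could be classified like so. A minimal sketch under the assumption that classification is substring-based; `isRateLimitError` and the exact signature strings are illustrative, drawn from the examples in the doc text.

```typescript
// Rate-limit signatures from the docs: plain 429 plus provider-specific
// throttling messages and periodic usage-window limits.
const RATE_LIMIT_SIGNATURES = [
  "429",
  "rate limit",
  "rate_limit",
  "too many concurrent requests",
  "throttlingexception",
  "throttled",
  "resource exhausted",
  "weekly limit reached",
  "monthly limit reached",
];

/** Returns true when an error message should count as a rate limit. */
function isRateLimitError(message: string): boolean {
  const lower = message.toLowerCase();
  return RATE_LIMIT_SIGNATURES.some((sig) => lower.includes(sig));
}
```

Anything matched here would put the profile in cooldown and advance failover to the next profile.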
@@ -223,6 +227,8 @@ Model fallback does not continue on:

 - explicit aborts that are not timeout/failover-shaped
 - context overflow errors that should stay inside compaction/retry logic
+  (for example `request_too_large`, `INVALID_ARGUMENT: input exceeds the maximum
+  number of tokens`, or `The input is too long for the model`)
 - a final unknown error when there are no candidates left

 ### Cooldown skip vs probe behavior
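The stop conditions above amount to a routing rule: overflow stays on the compaction/retry path, rate limits advance failover, and everything else surfaces. A self-contained sketch, with invented names (`routeError`, `ErrorRoute`) and signatures copied from the doc text:

```typescript
type ErrorRoute = "compact-and-retry" | "failover" | "surface";

// Illustrative routing rule, not OpenClaw's real dispatcher: context
// overflow is handled by compaction, rate limits by profile failover,
// and anything else is surfaced to the user.
function routeError(message: string): ErrorRoute {
  const m = message.toLowerCase();
  const overflow = [
    "request_too_large",
    "input exceeds the maximum number of tokens",
    "input is too long for the model",
  ];
  const rateLimit = [
    "429",
    "too many concurrent requests",
    "throttlingexception",
    "resource exhausted",
  ];
  if (overflow.some((s) => m.includes(s))) return "compact-and-retry";
  if (rateLimit.some((s) => m.includes(s))) return "failover";
  return "surface";
}
```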
@@ -154,7 +154,10 @@ surface.

 - `<PROVIDER>_API_KEY_*` (numbered list, e.g. `<PROVIDER>_API_KEY_1`)
 - For Google providers, `GOOGLE_API_KEY` is also included as fallback.
 - Key selection order preserves priority and deduplicates values.
-- Requests are retried with the next key only on rate-limit responses (for example `429`, `rate_limit`, `quota`, `resource exhausted`).
+- Requests are retried with the next key only on rate-limit responses (for
+  example `429`, `rate_limit`, `quota`, `resource exhausted`, `Too many
+  concurrent requests`, `ThrottlingException`, or periodic usage-limit
+  messages).
 - Non-rate-limit failures fail immediately; no key rotation is attempted.
 - When all candidate keys fail, the final error is returned from the last attempt.
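The rotate-only-on-rate-limit behavior in this hunk can be sketched as a loop over candidate keys. This is a simplified synchronous sketch (real requests would be async); `requestWithKeys` and its parameters are hypothetical names, not OpenClaw's API.

```typescript
// Sketch: try keys in priority order; rotate only on rate-limit errors,
// fail immediately on anything else, and surface the last error when
// every candidate key has been exhausted.
function requestWithKeys<T>(
  keys: string[],                          // already prioritized + deduplicated
  send: (key: string) => T,                // performs one request attempt
  isRateLimit: (err: unknown) => boolean,  // classifier from the docs' bucket
): T {
  let lastError: unknown = new Error("no API keys available");
  for (const key of keys) {
    try {
      return send(key);
    } catch (err) {
      lastError = err;
      if (!isRateLimit(err)) throw err;    // non-rate-limit: no rotation
      // rate-limited: fall through and try the next key
    }
  }
  throw lastError;                          // all keys failed: last attempt's error
}
```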
@@ -2362,6 +2362,16 @@ for usage/billing and raise limits as needed.

 Cooldowns apply to failing profiles (exponential backoff), so OpenClaw can keep responding even when a provider is rate-limited or temporarily failing.

+The rate-limit bucket includes more than plain `429` responses. OpenClaw
+also treats messages like `Too many concurrent requests`,
+`ThrottlingException`, `resource exhausted`, and periodic usage-window
+limits (`weekly/monthly limit reached`) as failover-worthy rate limits.
+
+Context-overflow errors are different: signatures such as
+`request_too_large`, `input exceeds the maximum number of tokens`, or
+`input is too long for the model` stay on the compaction/retry path instead
+of advancing model fallback.
+
 </Accordion>

 <Accordion title='What does "No credentials found for profile anthropic:default" mean?'>
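The exponential-backoff cooldown mentioned above might look like the following. The doc only says "exponential backoff", so the base delay, cap, and function name here are invented purely for illustration.

```typescript
// Hypothetical cooldown schedule: doubles per consecutive failure,
// capped so a flaky profile is re-probed within a bounded window.
// baseMs and capMs are assumptions, not OpenClaw's real values.
function cooldownMs(
  consecutiveFailures: number,
  baseMs = 1_000,
  capMs = 300_000,
): number {
  const backoff = baseMs * 2 ** Math.max(0, consecutiveFailures - 1);
  return Math.min(backoff, capMs);
}
```

The cap matters: without it, a profile hit by a long outage could back off so far it never gets probed again.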
@@ -216,7 +216,10 @@ Compaction is **persistent** (unlike session pruning). See [/concepts/session-pr

 In the embedded Pi agent, auto-compaction triggers in two cases:

-1. **Overflow recovery**: the model returns a context overflow error → compact → retry.
+1. **Overflow recovery**: the model returns a context overflow error
+   (`request_too_large`, `context length exceeded`, `input exceeds the maximum
+   number of tokens`, `input is too long for the model`, and similar
+   provider-shaped variants) → compact → retry.
 2. **Threshold maintenance**: after a successful turn, when:

    `contextTokens > contextWindow - reserveTokens`
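The threshold-maintenance check in case 2 is a direct inequality and can be transcribed as code. Only the function name `needsCompaction` is invented; the condition is exactly the formula from the doc.

```typescript
// Threshold maintenance: compact when the live context would eat into
// the reserved headroom (contextTokens > contextWindow - reserveTokens).
function needsCompaction(
  contextTokens: number,
  contextWindow: number,
  reserveTokens: number,
): boolean {
  return contextTokens > contextWindow - reserveTokens;
}
```

For example, with a 200k-token window and 10k reserved, compaction triggers once the context exceeds 190k tokens.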