From 42778ccd46bf727c07a236d9b36bacdb6ce54b74 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Sat, 4 Apr 2026 12:21:30 +0100
Subject: [PATCH] docs: refresh provider stream family refs

---
 docs/plugins/architecture.md         | 20 +++++++++++++-------
 docs/plugins/sdk-overview.md         |  2 +-
 docs/plugins/sdk-provider-plugins.md | 21 ++++++++++++++++++---
 3 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/docs/plugins/architecture.md b/docs/plugins/architecture.md
index e26ff569e28..ceea1b6ae98 100644
--- a/docs/plugins/architecture.md
+++ b/docs/plugins/architecture.md
@@ -741,12 +741,14 @@ api.registerProvider({
   synthetic Codex catalog rows, and ChatGPT usage endpoint integration.
 - Google AI Studio and Gemini CLI OAuth use `resolveDynamicModel`,
   `buildReplayPolicy`, `sanitizeReplayHistory`,
-  `resolveReasoningOutputMode`, and `isModernModelRef` because the
+  `resolveReasoningOutputMode`, `wrapStreamFn`, and `isModernModelRef` because the
   `google-gemini` replay family owns Gemini 3.1 forward-compat fallback,
   native Gemini replay validation, bootstrap replay sanitation, tagged
-  reasoning-output mode, and modern-model matching; Gemini CLI OAuth also uses
-  `formatApiKey`, `resolveUsageAuth`, and `fetchUsageSnapshot` for token
-  formatting, token parsing, and quota endpoint wiring.
+  reasoning-output mode, and modern-model matching, while the
+  `google-thinking` stream family owns Gemini thinking payload normalization;
+  Gemini CLI OAuth also uses `formatApiKey`, `resolveUsageAuth`, and
+  `fetchUsageSnapshot` for token formatting, token parsing, and quota endpoint
+  wiring.
 - Anthropic Vertex uses `buildReplayPolicy` through the `anthropic-by-model`
   replay family so Claude-specific replay cleanup stays scoped to Claude ids
   instead of every `anthropic-messages` transport.
@@ -764,9 +766,12 @@ api.registerProvider({
   `hybrid-anthropic-openai` replay family because one provider owns both
   Anthropic-message and OpenAI-compatible semantics; it keeps Claude-only
   thinking-block dropping on the Anthropic side while overriding reasoning
-  output mode back to native.
+  output mode back to native, and the `minimax-fast-mode` stream family owns
+  fast-mode model rewrites on the shared stream path.
 - Moonshot uses `catalog` plus `wrapStreamFn` because it still uses the shared
-  OpenAI transport but needs provider-owned thinking payload normalization.
+  OpenAI transport but needs provider-owned thinking payload normalization; the
+  `moonshot-thinking` stream family maps config plus `/think` state onto its
+  native binary thinking payload.
 - Kilocode uses `catalog`, `capabilities`, `wrapStreamFn`, and
   `isCacheTtlEligible` because it needs provider-owned request headers,
   reasoning payload normalization, Gemini transcript hints, and Anthropic
@@ -775,7 +780,8 @@
   `isCacheTtlEligible`, `isBinaryThinking`, `isModernModelRef`,
   `resolveUsageAuth`, and `fetchUsageSnapshot` because it owns GLM-5 fallback,
   `tool_stream` defaults, binary thinking UX, modern-model matching, and both
-  usage auth + quota fetching.
+  usage auth + quota fetching; the `tool-stream-default-on` stream family keeps
+  the default-on `tool_stream` wrapper out of per-provider handwritten glue.
 - Mistral, OpenCode Zen, and OpenCode Go use `capabilities` only to keep
   transcript/tooling quirks out of core.
 - Catalog-only bundled providers such as `byteplus`, `cloudflare-ai-gateway`,
diff --git a/docs/plugins/sdk-overview.md b/docs/plugins/sdk-overview.md
index a9ced1155a5..11972b47820 100644
--- a/docs/plugins/sdk-overview.md
+++ b/docs/plugins/sdk-overview.md
@@ -83,7 +83,7 @@ subpaths is in `scripts/lib/plugin-sdk-entrypoints.json`.
 | `plugin-sdk/provider-catalog-shared` | `findCatalogTemplate`, `buildSingleProviderApiKeyCatalog`, `supportsNativeStreamingUsageCompat`, `applyProviderNativeStreamingUsageCompat` |
 | `plugin-sdk/provider-tools` | `buildProviderToolCompatFamilyHooks`, Gemini schema helpers |
 | `plugin-sdk/provider-usage` | `fetchClaudeUsage` and similar |
-| `plugin-sdk/provider-stream` | Stream wrapper types + provider stream wrappers |
+| `plugin-sdk/provider-stream` | `buildProviderStreamFamilyHooks`, stream wrapper types, provider stream wrappers |
 | `plugin-sdk/provider-onboard` | Onboarding config patch helpers |
 | `plugin-sdk/global-singleton` | Process-local singleton/map/cache helpers |
 
diff --git a/docs/plugins/sdk-provider-plugins.md b/docs/plugins/sdk-provider-plugins.md
index e7fa9188262..ec5c47fbb29 100644
--- a/docs/plugins/sdk-provider-plugins.md
+++ b/docs/plugins/sdk-provider-plugins.md
@@ -255,13 +255,12 @@ API key auth, and dynamic model resolution.
 
 ```typescript
 import { buildProviderReplayFamilyHooks } from "openclaw/plugin-sdk/provider-model-shared";
+import { buildProviderStreamFamilyHooks } from "openclaw/plugin-sdk/provider-stream";
 import { buildProviderToolCompatFamilyHooks } from "openclaw/plugin-sdk/provider-tools";
-import { createGoogleThinkingPayloadWrapper } from "openclaw/plugin-sdk/provider-stream";
 
 const GOOGLE_FAMILY_HOOKS = {
   ...buildProviderReplayFamilyHooks({ family: "google-gemini" }),
-  wrapStreamFn: (ctx) =>
-    createGoogleThinkingPayloadWrapper(ctx.streamFn, ctx.thinkingLevel),
+  ...buildProviderStreamFamilyHooks("google-thinking"),
   ...buildProviderToolCompatFamilyHooks("gemini"),
 };
 
@@ -290,6 +289,22 @@
 - `minimax`: `hybrid-anthropic-openai`
 - `moonshot`, `ollama`, `xai`, and `zai`: `openai-compatible`
 
+Available stream families today:
+
+| Family | What it wires in |
+| --- | --- |
+| `google-thinking` | Gemini thinking payload normalization on the shared stream path |
+| `moonshot-thinking` | Moonshot binary native-thinking payload mapping from config + `/think` level |
+| `minimax-fast-mode` | MiniMax fast-mode model rewrite on the shared stream path |
+| `tool-stream-default-on` | Default-on `tool_stream` wrapper for providers like Z.AI that want tool streaming unless explicitly disabled |
+
+Real bundled examples:
+
+- `google` and `google-gemini-cli`: `google-thinking`
+- `moonshot`: `moonshot-thinking`
+- `minimax` and `minimax-portal`: `minimax-fast-mode`
+- `zai`: `tool-stream-default-on`
+
 For providers that need a token exchange before each inference call: