docs(providers): add generation setup pages

This commit is contained in:
Peter Steinberger 2026-04-05 23:20:52 +01:00
parent 1a3eb38aaf
commit f30c087fdf
No known key found for this signature in database
11 changed files with 406 additions and 16 deletions

docs/providers/alibaba.md Normal file
View File

@ -0,0 +1,73 @@
---
title: "Alibaba Model Studio"
summary: "Alibaba Model Studio Wan video generation in OpenClaw"
read_when:
- You want to use Alibaba Wan video generation in OpenClaw
- You need Model Studio or DashScope API key setup for video generation
---
# Alibaba Model Studio
OpenClaw ships a bundled `alibaba` video-generation provider for Wan models on
Alibaba Model Studio / DashScope.
- Provider: `alibaba`
- Preferred auth: `MODELSTUDIO_API_KEY`
- Also accepted: `DASHSCOPE_API_KEY`, `QWEN_API_KEY`
- API: DashScope / Model Studio async video generation
## Quick start
1. Set an API key:
```bash
openclaw onboard --auth-choice qwen-standard-api-key
```
2. Set a default video model:
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "alibaba/wan2.6-t2v",
      },
    },
  },
}
```
## Built-in Wan models
The bundled `alibaba` provider currently registers:
- `alibaba/wan2.6-t2v`
- `alibaba/wan2.6-i2v`
- `alibaba/wan2.6-r2v`
- `alibaba/wan2.6-r2v-flash`
- `alibaba/wan2.7-r2v`
## Current limits
- Up to **1** output video per request
- Up to **1** input image
- Up to **4** input videos
- Up to **10 seconds** duration
- Supports `size`, `aspectRatio`, `resolution`, `audio`, and `watermark`
- Reference image/video mode currently requires **remote http(s) URLs**
## Relationship to Qwen
The bundled `qwen` provider also uses Alibaba-hosted DashScope endpoints for
Wan video generation. Use:
- `qwen/...` when you want the canonical Qwen provider surface
- `alibaba/...` when you want the direct vendor-owned Wan video surface
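Since both prefixes resolve to the same hosted Wan models, they can also be chained for failover via the object form of `videoGenerationModel`. A sketch, assuming the `fallbacks` list documented for image generation behaves the same way for video defaults:
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        // Vendor-owned surface first, canonical Qwen ref as backup.
        primary: "alibaba/wan2.6-t2v",
        fallbacks: ["qwen/wan2.6-t2v"],
      },
    },
  },
}
```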
## Related
- [Video Generation](/tools/video-generation)
- [Qwen](/providers/qwen)
- [Qwen / Model Studio](/providers/qwen_modelstudio)
- [Configuration Reference](/gateway/configuration-reference#agent-defaults)

docs/providers/fal.md Normal file
View File

@ -0,0 +1,90 @@
---
title: "fal"
summary: "fal image and video generation setup in OpenClaw"
read_when:
- You want to use fal image generation in OpenClaw
- You need the FAL_KEY auth flow
- You want fal defaults for image_generate or video_generate
---
# fal
OpenClaw ships a bundled `fal` provider for hosted image and video generation.
- Provider: `fal`
- Auth: `FAL_KEY`
- API: fal model endpoints
## Quick start
1. Set the API key:
```bash
openclaw onboard --auth-choice fal-api-key
```
2. Set a default image model:
```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "fal/fal-ai/flux/dev",
      },
    },
  },
}
```
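As on the other provider pages, the key must also be visible to the Gateway process when it runs as a daemon (launchd/systemd). A sketch of a `~/.openclaw/.env` entry, with a placeholder value rather than a real key:
```bash
# ~/.openclaw/.env
FAL_KEY=your-fal-key-here
```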
## Image generation
The bundled `fal` image-generation provider defaults to
`fal/fal-ai/flux/dev`.
- Generate: up to 4 images per request
- Edit mode: enabled, 1 reference image
- Supports `size`, `aspectRatio`, and `resolution`
- Current edit caveat: the fal image edit endpoint does **not** support
`aspectRatio` overrides
To use fal as the default image provider:
```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "fal/fal-ai/flux/dev",
      },
    },
  },
}
```
## Video generation
The bundled `fal` video-generation provider defaults to
`fal/fal-ai/minimax/video-01-live`.
- Modes: text-to-video and single-image reference flows
To use fal as the default video provider:
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "fal/fal-ai/minimax/video-01-live",
      },
    },
  },
}
```
## Related
- [Image Generation](/tools/image-generation)
- [Video Generation](/tools/video-generation)
- [Configuration Reference](/gateway/configuration-reference#agent-defaults)

View File

@ -100,6 +100,50 @@ The bundled `google` image-generation provider defaults to
Image generation, media understanding, and Gemini Grounding all stay on the
`google` provider id.
To use Google as the default image provider:
```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "google/gemini-3.1-flash-image-preview",
      },
    },
  },
}
```
See [Image Generation](/tools/image-generation) for the shared tool
parameters, provider selection, and failover behavior.
## Video generation
The bundled `google` plugin also registers video generation through the shared
`video_generate` tool.
- Default video model: `google/veo-3.1-fast-generate-preview`
- Modes: text-to-video, image-to-video, and single-video reference flows
- Supports `aspectRatio`, `resolution`, and `audio`
- Current duration clamp: **4 to 8 seconds**
To use Google as the default video provider:
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "google/veo-3.1-fast-generate-preview",
      },
    },
  },
}
```
See [Video Generation](/tools/video-generation) for the shared tool
parameters, provider selection, and failover behavior.
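The object form also accepts ordered fallbacks, so Veo can be tried first with another bundled video provider as backup. A sketch; the fallback model here is illustrative, as documented on its own provider page:
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "google/veo-3.1-fast-generate-preview",
        fallbacks: ["openai/sora-2"],
      },
    },
  },
}
```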
## Environment note
If the Gateway runs as a daemon (launchd/systemd), make sure `GEMINI_API_KEY`

View File

@ -63,6 +63,35 @@ The built-in bundled MiniMax text catalog itself stays text-only metadata until
that explicit provider config exists. Image understanding is exposed separately
through the plugin-owned `MiniMax-VL-01` media provider.
See [Image Generation](/tools/image-generation) for the shared tool
parameters, provider selection, and failover behavior.
## Video generation
The bundled `minimax` plugin also registers video generation through the shared
`video_generate` tool.
- Default video model: `minimax/MiniMax-Hailuo-2.3`
- Modes: text-to-video and single-image reference flows
- Supports `aspectRatio` and `resolution`
To use MiniMax as the default video provider:
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "minimax/MiniMax-Hailuo-2.3",
      },
    },
  },
}
```
See [Video Generation](/tools/video-generation) for the shared tool
parameters, provider selection, and failover behavior.
## Image understanding
The MiniMax plugin registers image understanding separately from the text

View File

@ -108,6 +108,63 @@ OpenClaw does **not** expose `openai/gpt-5.3-codex-spark` on the direct OpenAI
API path. `pi-ai` still ships a built-in row for that model, but live OpenAI API
requests currently reject it. Spark is treated as Codex-only in OpenClaw.
## Image generation
The bundled `openai` plugin also registers image generation through the shared
`image_generate` tool.
- Default image model: `openai/gpt-image-1`
- Generate: up to 4 images per request
- Edit mode: enabled, up to 5 reference images
- Supports `size`
- Current OpenAI-specific caveat: OpenClaw does not forward `aspectRatio` or
`resolution` overrides to the OpenAI Images API today
To use OpenAI as the default image provider:
```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openai/gpt-image-1",
      },
    },
  },
}
```
See [Image Generation](/tools/image-generation) for the shared tool
parameters, provider selection, and failover behavior.
## Video generation
The bundled `openai` plugin also registers video generation through the shared
`video_generate` tool.
- Default video model: `openai/sora-2`
- Modes: text-to-video, image-to-video, and single-video reference/edit flows
- Current limits: 1 image or 1 video reference input
- Current OpenAI-specific caveat: OpenClaw does not forward `aspectRatio` or
`resolution` overrides to the native OpenAI video API today
To use OpenAI as the default video provider:
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "openai/sora-2",
      },
    },
  },
}
```
See [Video Generation](/tools/video-generation) for the shared tool
parameters, provider selection, and failover behavior.
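Both keys live under `agents.defaults`, so the image and video defaults above can be combined in a single config block:
```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openai/gpt-image-1",
      },
      videoGenerationModel: {
        primary: "openai/sora-2",
      },
    },
  },
}
```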
## Option B: OpenAI Code (Codex) subscription
**Best for:** using ChatGPT/Codex subscription access instead of an API key.

View File

@ -1,10 +1,9 @@
---
summary: "Use Qwen Cloud via OpenClaw's bundled qwen provider"
read_when:
- You want to use Qwen with OpenClaw
- You previously used Qwen OAuth
title: "Qwen"
---
# Qwen
@ -127,5 +126,8 @@ Current bundled Qwen video-generation limits:
file paths are rejected up front because the DashScope video endpoint does not
accept uploaded local buffers for those references.
See [Video Generation](/tools/video-generation) for the shared tool
parameters, provider selection, and failover behavior.
See [Qwen / Model Studio](/providers/qwen_modelstudio) for endpoint-level detail
and compatibility notes.

View File

@ -1,11 +1,10 @@
---
title: "Qwen / Model Studio"
summary: "Endpoint detail for the bundled qwen provider and its legacy modelstudio compatibility surface"
read_when:
- You want endpoint-level detail for Qwen Cloud / Alibaba DashScope
- You need the env var compatibility story for the qwen provider
- You want to use the Standard (pay-as-you-go) or Coding Plan endpoint
---
# Qwen / Model Studio (Alibaba Cloud)
@ -135,3 +134,34 @@ endpoint/key pair.
If the Gateway runs as a daemon (launchd/systemd), make sure
`QWEN_API_KEY` is available to that process (for example, in
`~/.openclaw/.env` or via `env.shellEnv`).
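For example, a minimal `~/.openclaw/.env` entry, with a placeholder value rather than a real key:
```bash
# ~/.openclaw/.env
QWEN_API_KEY=your-dashscope-key-here
```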
## Wan video generation
The Standard DashScope surface also backs the bundled Wan video-generation
providers.
You can address the same Wan family through either prefix:
- canonical Qwen refs:
- `qwen/wan2.6-t2v`
- `qwen/wan2.6-i2v`
- `qwen/wan2.6-r2v`
- `qwen/wan2.6-r2v-flash`
- `qwen/wan2.7-r2v`
- direct Alibaba refs:
- `alibaba/wan2.6-t2v`
- `alibaba/wan2.6-i2v`
- `alibaba/wan2.6-r2v`
- `alibaba/wan2.6-r2v-flash`
- `alibaba/wan2.7-r2v`
All Wan reference modes currently require **remote http(s) URLs** for image or
video references. Local file paths are rejected before upload because the
DashScope video endpoint does not accept local-buffer reference assets for
those modes.
## Related
- [Qwen](/providers/qwen)
- [Alibaba Model Studio](/providers/alibaba)
- [Video Generation](/tools/video-generation)

View File

@ -68,3 +68,29 @@ OpenClaw currently ships this bundled Together catalog:
| `together/moonshotai/Kimi-K2-Instruct-0905` | Kimi K2-Instruct 0905 | text | 262,144 | Secondary Kimi text model |
The onboarding preset sets `together/moonshotai/Kimi-K2.5` as the default model.
## Video generation
The bundled `together` plugin also registers video generation through the
shared `video_generate` tool.
- Default video model: `together/Wan-AI/Wan2.2-T2V-A14B`
- Modes: text-to-video and single-image reference flows
- Supports `aspectRatio` and `resolution`
To use Together as the default video provider:
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "together/Wan-AI/Wan2.2-T2V-A14B",
      },
    },
  },
}
```
See [Video Generation](/tools/video-generation) for the shared tool
parameters, provider selection, and failover behavior.

View File

@ -75,6 +75,34 @@ The bundled `grok` web-search provider uses `XAI_API_KEY` too:
openclaw config set tools.web.search.provider grok
```
## Video generation
The bundled `xai` plugin also registers video generation through the shared
`video_generate` tool.
- Default video model: `xai/grok-imagine-video`
- Modes: text-to-video, image-to-video, and remote video edit/extend flows
- Supports `aspectRatio` and `resolution`
- Current limit: local video buffers are not accepted; use remote `http(s)`
URLs for video-reference/edit inputs
To use xAI as the default video provider:
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "xai/grok-imagine-video",
      },
    },
  },
}
```
See [Video Generation](/tools/video-generation) for the shared tool
parameters, provider selection, and failover behavior.
## Known limits
- Auth is API-key only today. There is no xAI OAuth/device-code flow in OpenClaw yet.

View File

@ -24,7 +24,9 @@ The tool only appears when at least one image generation provider is available.
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openai/gpt-image-1",
      },
    },
  },
}
@ -74,10 +76,6 @@ Not all providers support all parameters. The tool passes what each provider sup
{
  agents: {
    defaults: {
      // Object form: primary + ordered fallbacks
      imageGenerationModel: {
        primary: "openai/gpt-image-1",
        fallbacks: ["google/gemini-3.1-flash-image-preview", "fal/fal-ai/flux/dev"],
@ -135,5 +133,9 @@ MiniMax image generation is available through both bundled MiniMax auth paths:
## Related
- [Tools Overview](/tools) — all available agent tools
- [fal](/providers/fal) — fal image and video provider setup
- [Google (Gemini)](/providers/google) — Gemini image provider setup
- [MiniMax](/providers/minimax) — MiniMax image provider setup
- [OpenAI](/providers/openai) — OpenAI Images provider setup
- [Configuration Reference](/gateway/configuration-reference#agent-defaults) — `imageGenerationModel` config
- [Models](/concepts/models) — model configuration and failover

View File

@ -24,7 +24,9 @@ The tool only appears when at least one video-generation provider is available.
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "qwen/wan2.6-t2v",
      },
    },
  },
}
@ -121,6 +123,13 @@ The bundled Qwen provider supports text-to-video plus image/video reference mode
## Related
- [Tools Overview](/tools) — all available agent tools
- [Alibaba Model Studio](/providers/alibaba) — direct Wan provider setup
- [Google (Gemini)](/providers/google) — Veo provider setup
- [MiniMax](/providers/minimax) — Hailuo provider setup
- [OpenAI](/providers/openai) — Sora provider setup
- [Qwen](/providers/qwen) — Qwen-specific setup and limits
- [Qwen / Model Studio](/providers/qwen_modelstudio) — endpoint-level DashScope detail
- [Together AI](/providers/together) — Together Wan provider setup
- [xAI](/providers/xai) — Grok video provider setup
- [Configuration Reference](/gateway/configuration-reference#agent-defaults) — `videoGenerationModel` config
- [Models](/concepts/models) — model configuration and failover