diff --git a/docs/providers/alibaba.md b/docs/providers/alibaba.md new file mode 100644 index 00000000000..3325f71fafb --- /dev/null +++ b/docs/providers/alibaba.md @@ -0,0 +1,73 @@ +--- +title: "Alibaba Model Studio" +summary: "Alibaba Model Studio Wan video generation in OpenClaw" +read_when: + - You want to use Alibaba Wan video generation in OpenClaw + - You need Model Studio or DashScope API key setup for video generation +--- + +# Alibaba Model Studio + +OpenClaw ships a bundled `alibaba` video-generation provider for Wan models on +Alibaba Model Studio / DashScope. + +- Provider: `alibaba` +- Preferred auth: `MODELSTUDIO_API_KEY` +- Also accepted: `DASHSCOPE_API_KEY`, `QWEN_API_KEY` +- API: DashScope / Model Studio async video generation + +## Quick start + +1. Set an API key: + +```bash +openclaw onboard --auth-choice qwen-standard-api-key +``` + +2. Set a default video model: + +```json5 +{ + agents: { + defaults: { + videoGenerationModel: { + primary: "alibaba/wan2.6-t2v", + }, + }, + }, +} +``` + +## Built-in Wan models + +The bundled `alibaba` provider currently registers: + +- `alibaba/wan2.6-t2v` +- `alibaba/wan2.6-i2v` +- `alibaba/wan2.6-r2v` +- `alibaba/wan2.6-r2v-flash` +- `alibaba/wan2.7-r2v` + +## Current limits + +- Up to **1** output video per request +- Up to **1** input image +- Up to **4** input videos +- Up to **10 seconds** duration +- Supports `size`, `aspectRatio`, `resolution`, `audio`, and `watermark` +- Reference image/video mode currently requires **remote http(s) URLs** + +## Relationship to Qwen + +The bundled `qwen` provider also uses Alibaba-hosted DashScope endpoints for +Wan video generation. 
Use: + +- `qwen/...` when you want the canonical Qwen provider surface +- `alibaba/...` when you want the direct vendor-owned Wan video surface + +## Related + +- [Video Generation](/tools/video-generation) +- [Qwen](/providers/qwen) +- [Qwen / Model Studio](/providers/qwen_modelstudio) +- [Configuration Reference](/gateway/configuration-reference#agent-defaults) diff --git a/docs/providers/fal.md b/docs/providers/fal.md new file mode 100644 index 00000000000..31229b30cb0 --- /dev/null +++ b/docs/providers/fal.md @@ -0,0 +1,90 @@ +--- +title: "fal" +summary: "fal image and video generation setup in OpenClaw" +read_when: + - You want to use fal image generation in OpenClaw + - You need the FAL_KEY auth flow + - You want fal defaults for image_generate or video_generate +--- + +# fal + +OpenClaw ships a bundled `fal` provider for hosted image and video generation. + +- Provider: `fal` +- Auth: `FAL_KEY` +- API: fal model endpoints + +## Quick start + +1. Set the API key: + +```bash +openclaw onboard --auth-choice fal-api-key +``` + +2. Set a default image model: + +```json5 +{ + agents: { + defaults: { + imageGenerationModel: { + primary: "fal/fal-ai/flux/dev", + }, + }, + }, +} +``` + +## Image generation + +The bundled `fal` image-generation provider defaults to +`fal/fal-ai/flux/dev`. + +- Generate: up to 4 images per request +- Edit mode: enabled, 1 reference image +- Supports `size`, `aspectRatio`, and `resolution` +- Current edit caveat: the fal image edit endpoint does **not** support + `aspectRatio` overrides + +To use fal as the default image provider: + +```json5 +{ + agents: { + defaults: { + imageGenerationModel: { + primary: "fal/fal-ai/flux/dev", + }, + }, + }, +} +``` + +## Video generation + +The bundled `fal` video-generation provider defaults to +`fal/fal-ai/minimax/video-01-live`. 
+ +- Modes: text-to-video and single-image reference flows + +To use fal as the default video provider: + +```json5 +{ + agents: { + defaults: { + videoGenerationModel: { + primary: "fal/fal-ai/minimax/video-01-live", + }, + }, + }, +} +``` + +## Related + +- [Image Generation](/tools/image-generation) +- [Video Generation](/tools/video-generation) +- [Configuration Reference](/gateway/configuration-reference#agent-defaults) diff --git a/docs/providers/google.md b/docs/providers/google.md index 7109b55c36d..52f0aa703c5 100644 --- a/docs/providers/google.md +++ b/docs/providers/google.md @@ -100,6 +100,50 @@ The bundled `google` image-generation provider defaults to Image generation, media understanding, and Gemini Grounding all stay on the `google` provider id. +To use Google as the default image provider: + +```json5 +{ + agents: { + defaults: { + imageGenerationModel: { + primary: "google/gemini-3.1-flash-image-preview", + }, + }, + }, +} +``` + +See [Image Generation](/tools/image-generation) for the shared tool +parameters, provider selection, and failover behavior. + +## Video generation + +The bundled `google` plugin also registers video generation through the shared +`video_generate` tool. + +- Default video model: `google/veo-3.1-fast-generate-preview` +- Modes: text-to-video, image-to-video, and single-video reference flows +- Supports `aspectRatio`, `resolution`, and `audio` +- Current duration clamp: **4 to 8 seconds** + +To use Google as the default video provider: + +```json5 +{ + agents: { + defaults: { + videoGenerationModel: { + primary: "google/veo-3.1-fast-generate-preview", + }, + }, + }, +} +``` + +See [Video Generation](/tools/video-generation) for the shared tool +parameters, provider selection, and failover behavior. 
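The image and video defaults above can also be set together in a single `agents.defaults` block. A minimal sketch using the two bundled Google models from this section (any other bundled `provider/model` ref works the same way):

```json5
{
  agents: {
    defaults: {
      // Bundled Google defaults for both generation tools
      imageGenerationModel: {
        primary: "google/gemini-3.1-flash-image-preview",
      },
      videoGenerationModel: {
        primary: "google/veo-3.1-fast-generate-preview",
      },
    },
  },
}
```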
+ ## Environment note If the Gateway runs as a daemon (launchd/systemd), make sure `GEMINI_API_KEY` diff --git a/docs/providers/minimax.md b/docs/providers/minimax.md index 073490aa308..d4c7130b233 100644 --- a/docs/providers/minimax.md +++ b/docs/providers/minimax.md @@ -63,6 +63,35 @@ The built-in bundled MiniMax text catalog itself stays text-only metadata until that explicit provider config exists. Image understanding is exposed separately through the plugin-owned `MiniMax-VL-01` media provider. +See [Image Generation](/tools/image-generation) for the shared tool +parameters, provider selection, and failover behavior. + +## Video generation + +The bundled `minimax` plugin also registers video generation through the shared +`video_generate` tool. + +- Default video model: `minimax/MiniMax-Hailuo-2.3` +- Modes: text-to-video and single-image reference flows +- Supports `aspectRatio` and `resolution` + +To use MiniMax as the default video provider: + +```json5 +{ + agents: { + defaults: { + videoGenerationModel: { + primary: "minimax/MiniMax-Hailuo-2.3", + }, + }, + }, +} +``` + +See [Video Generation](/tools/video-generation) for the shared tool +parameters, provider selection, and failover behavior. + ## Image understanding The MiniMax plugin registers image understanding separately from the text diff --git a/docs/providers/openai.md b/docs/providers/openai.md index 2223854a46f..bc83801e9b3 100644 --- a/docs/providers/openai.md +++ b/docs/providers/openai.md @@ -108,6 +108,63 @@ OpenClaw does **not** expose `openai/gpt-5.3-codex-spark` on the direct OpenAI API path. `pi-ai` still ships a built-in row for that model, but live OpenAI API requests currently reject it. Spark is treated as Codex-only in OpenClaw. +## Image generation + +The bundled `openai` plugin also registers image generation through the shared +`image_generate` tool. 
+ +- Default image model: `openai/gpt-image-1` +- Generate: up to 4 images per request +- Edit mode: enabled, up to 5 reference images +- Supports `size` +- Current OpenAI-specific caveat: OpenClaw does not forward `aspectRatio` or + `resolution` overrides to the OpenAI Images API today + +To use OpenAI as the default image provider: + +```json5 +{ + agents: { + defaults: { + imageGenerationModel: { + primary: "openai/gpt-image-1", + }, + }, + }, +} +``` + +See [Image Generation](/tools/image-generation) for the shared tool +parameters, provider selection, and failover behavior. + +## Video generation + +The bundled `openai` plugin also registers video generation through the shared +`video_generate` tool. + +- Default video model: `openai/sora-2` +- Modes: text-to-video, image-to-video, and single-video reference/edit flows +- Current limits: 1 image or 1 video reference input +- Current OpenAI-specific caveat: OpenClaw does not forward `aspectRatio` or + `resolution` overrides to the native OpenAI video API today + +To use OpenAI as the default video provider: + +```json5 +{ + agents: { + defaults: { + videoGenerationModel: { + primary: "openai/sora-2", + }, + }, + }, +} +``` + +See [Video Generation](/tools/video-generation) for the shared tool +parameters, provider selection, and failover behavior. + ## Option B: OpenAI Code (Codex) subscription **Best for:** using ChatGPT/Codex subscription access instead of an API key. 
diff --git a/docs/providers/qwen.md b/docs/providers/qwen.md index 3581a53ca07..e252c62bd00 100644 --- a/docs/providers/qwen.md +++ b/docs/providers/qwen.md @@ -1,10 +1,9 @@ +--- summary: "Use Qwen Cloud via OpenClaw's bundled qwen provider" read_when: - -- You want to use Qwen with OpenClaw -- You previously used Qwen OAuth - title: "Qwen" - + - You want to use Qwen with OpenClaw + - You previously used Qwen OAuth +title: "Qwen" --- # Qwen @@ -127,5 +126,8 @@ Current bundled Qwen video-generation limits: file paths are rejected up front because the DashScope video endpoint does not accept uploaded local buffers for those references. +See [Video Generation](/tools/video-generation) for the shared tool +parameters, provider selection, and failover behavior. + See [Qwen / Model Studio](/providers/qwen_modelstudio) for endpoint-level detail and compatibility notes. diff --git a/docs/providers/qwen_modelstudio.md b/docs/providers/qwen_modelstudio.md index e287936d410..5071e667918 100644 --- a/docs/providers/qwen_modelstudio.md +++ b/docs/providers/qwen_modelstudio.md @@ -1,11 +1,10 @@ +--- title: "Qwen / Model Studio" summary: "Endpoint detail for the bundled qwen provider and its legacy modelstudio compatibility surface" read_when: - -- You want endpoint-level detail for Qwen Cloud / Alibaba DashScope -- You need the env var compatibility story for the qwen provider -- You want to use the Standard (pay-as-you-go) or Coding Plan endpoint - + - You want endpoint-level detail for Qwen Cloud / Alibaba DashScope + - You need the env var compatibility story for the qwen provider + - You want to use the Standard (pay-as-you-go) or Coding Plan endpoint --- # Qwen / Model Studio (Alibaba Cloud) @@ -135,3 +134,34 @@ endpoint/key pair. If the Gateway runs as a daemon (launchd/systemd), make sure `QWEN_API_KEY` is available to that process (for example, in `~/.openclaw/.env` or via `env.shellEnv`). 
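If the daemon does not inherit your shell environment, the `~/.openclaw/.env` route mentioned above is usually the simplest fix. A sketch in standard `KEY=value` dotenv form; the value is a placeholder, so substitute a real DashScope key:

```bash
# ~/.openclaw/.env -- read by the Gateway process; the key below is a placeholder
QWEN_API_KEY=sk-your-dashscope-key
```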
+ +## Wan video generation + +The Standard DashScope surface also backs the bundled Wan video-generation +providers. + +You can address the same Wan family through either prefix: + +- canonical Qwen refs: + - `qwen/wan2.6-t2v` + - `qwen/wan2.6-i2v` + - `qwen/wan2.6-r2v` + - `qwen/wan2.6-r2v-flash` + - `qwen/wan2.7-r2v` +- direct Alibaba refs: + - `alibaba/wan2.6-t2v` + - `alibaba/wan2.6-i2v` + - `alibaba/wan2.6-r2v` + - `alibaba/wan2.6-r2v-flash` + - `alibaba/wan2.7-r2v` + +All Wan reference modes currently require **remote http(s) URLs** for image or +video references. Local file paths are rejected before upload because the +DashScope video endpoint does not accept local-buffer reference assets for +those modes. + +## Related + +- [Qwen](/providers/qwen) +- [Alibaba Model Studio](/providers/alibaba) +- [Video Generation](/tools/video-generation) diff --git a/docs/providers/together.md b/docs/providers/together.md index edc9676318b..42898f4e08a 100644 --- a/docs/providers/together.md +++ b/docs/providers/together.md @@ -68,3 +68,29 @@ OpenClaw currently ships this bundled Together catalog: | `together/moonshotai/Kimi-K2-Instruct-0905` | Kimi K2-Instruct 0905 | text | 262,144 | Secondary Kimi text model | The onboarding preset sets `together/moonshotai/Kimi-K2.5` as the default model. + +## Video generation + +The bundled `together` plugin also registers video generation through the +shared `video_generate` tool. + +- Default video model: `together/Wan-AI/Wan2.2-T2V-A14B` +- Modes: text-to-video and single-image reference flows +- Supports `aspectRatio` and `resolution` + +To use Together as the default video provider: + +```json5 +{ + agents: { + defaults: { + videoGenerationModel: { + primary: "together/Wan-AI/Wan2.2-T2V-A14B", + }, + }, + }, +} +``` + +See [Video Generation](/tools/video-generation) for the shared tool +parameters, provider selection, and failover behavior. 
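Failover can be layered on top of this default. A sketch that chains the bundled MiniMax video model behind Together, assuming `videoGenerationModel` accepts the same `primary` plus `fallbacks` object form documented for `imageGenerationModel`:

```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "together/Wan-AI/Wan2.2-T2V-A14B",
        // Assumed: ordered fallbacks, tried in turn when the primary fails
        fallbacks: ["minimax/MiniMax-Hailuo-2.3"],
      },
    },
  },
}
```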
diff --git a/docs/providers/xai.md b/docs/providers/xai.md index bd97e6dbb7d..e495ad74446 100644 --- a/docs/providers/xai.md +++ b/docs/providers/xai.md @@ -75,6 +75,34 @@ The bundled `grok` web-search provider uses `XAI_API_KEY` too: openclaw config set tools.web.search.provider grok ``` +## Video generation + +The bundled `xai` plugin also registers video generation through the shared +`video_generate` tool. + +- Default video model: `xai/grok-imagine-video` +- Modes: text-to-video, image-to-video, and remote video edit/extend flows +- Supports `aspectRatio` and `resolution` +- Current limit: local video buffers are not accepted; use remote `http(s)` + URLs for video-reference/edit inputs + +To use xAI as the default video provider: + +```json5 +{ + agents: { + defaults: { + videoGenerationModel: { + primary: "xai/grok-imagine-video", + }, + }, + }, +} +``` + +See [Video Generation](/tools/video-generation) for the shared tool +parameters, provider selection, and failover behavior. + ## Known limits - Auth is API-key only today. There is no xAI OAuth/device-code flow in OpenClaw yet. diff --git a/docs/tools/image-generation.md b/docs/tools/image-generation.md index 303d62f4ac4..2205fed4732 100644 --- a/docs/tools/image-generation.md +++ b/docs/tools/image-generation.md @@ -24,7 +24,9 @@ The tool only appears when at least one image generation provider is available. { agents: { defaults: { - imageGenerationModel: "openai/gpt-image-1", + imageGenerationModel: { + primary: "openai/gpt-image-1", + }, }, }, } @@ -74,10 +76,6 @@ Not all providers support all parameters. 
The tool passes what each provider supports. { agents: { defaults: { - // String form: primary model only - imageGenerationModel: "google/gemini-3.1-flash-image-preview", - - // Object form: primary + ordered fallbacks imageGenerationModel: { primary: "openai/gpt-image-1", fallbacks: ["google/gemini-3.1-flash-image-preview", "fal/fal-ai/flux/dev"], @@ -135,5 +133,9 @@ MiniMax image generation is available through both bundled MiniMax auth paths: ## Related - [Tools Overview](/tools) — all available agent tools +- [fal](/providers/fal) — fal image and video provider setup +- [Google (Gemini)](/providers/google) — Gemini image provider setup +- [MiniMax](/providers/minimax) — MiniMax image provider setup +- [OpenAI](/providers/openai) — OpenAI Images provider setup - [Configuration Reference](/gateway/configuration-reference#agent-defaults) — `imageGenerationModel` config - [Models](/concepts/models) — model configuration and failover diff --git a/docs/tools/video-generation.md b/docs/tools/video-generation.md index 1594a3e388e..082b92787d9 100644 --- a/docs/tools/video-generation.md +++ b/docs/tools/video-generation.md @@ -24,7 +24,9 @@ The tool only appears when at least one video-generation provider is available. 
{ agents: { defaults: { - videoGenerationModel: "qwen/wan2.6-t2v", + videoGenerationModel: { + primary: "qwen/wan2.6-t2v", + }, }, }, } @@ -121,6 +123,13 @@ The bundled Qwen provider supports text-to-video plus image/video reference mode ## Related - [Tools Overview](/tools) — all available agent tools +- [Alibaba Model Studio](/providers/alibaba) — direct Wan provider setup +- [Google (Gemini)](/providers/google) — Veo provider setup +- [MiniMax](/providers/minimax) — Hailuo provider setup +- [OpenAI](/providers/openai) — Sora provider setup - [Qwen](/providers/qwen) — Qwen-specific setup and limits +- [Qwen / Model Studio](/providers/qwen_modelstudio) — endpoint-level DashScope detail +- [Together AI](/providers/together) — Together Wan provider setup +- [xAI](/providers/xai) — Grok video provider setup - [Configuration Reference](/gateway/configuration-reference#agent-defaults) — `videoGenerationModel` config - [Models](/concepts/models) — model configuration and failover