docs: document music generation async flow

2026-04-06 01:46:25 +01:00 · 2026-04-06 01:46:25 +01:00 · f6dbcf4cda
parent 3027f0dde5
commit f6dbcf4cda
13 changed files with 253 additions and 46 deletions
--- a/docs/concepts/models.md
+++ b/docs/concepts/models.md
@@ -30,6 +30,7 @@ Related:
  falls back to `agents.defaults.imageModel`, then the resolved session/default
  model.
 - `agents.defaults.imageGenerationModel` is used by the shared image-generation capability. If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
+- `agents.defaults.musicGenerationModel` is used by the shared music-generation capability. If omitted, `music_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered music-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
 - `agents.defaults.videoGenerationModel` is used by the shared video-generation capability. If omitted, `video_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered video-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
 - Per-agent defaults can override `agents.defaults.model` via `agents.list[].model` plus bindings (see [/concepts/multi-agent](/concepts/multi-agent)).

@@ -253,5 +254,6 @@ This applies whenever OpenClaw regenerates `models.json`, including command-driv
 - [Model Providers](/concepts/model-providers) — provider routing and auth
 - [Model Failover](/concepts/model-failover) — fallback chains
 - [Image Generation](/tools/image-generation) — image model configuration
+- [Music Generation](/tools/music-generation) — music model configuration
 - [Video Generation](/tools/video-generation) — video model configuration
 - [Configuration Reference](/gateway/configuration-reference#agent-defaults) — model config keys
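
The provider inference order described in the `musicGenerationModel` bullet above can be sketched as follows. This is a minimal illustration, not code from this commit: `ProviderInfo`, `hasAuth`, and `inferProviderOrder` are hypothetical names, and "provider-id order" is assumed here to mean a plain ascending string sort.

```typescript
// Hypothetical sketch; these names are illustrative and not OpenClaw's API.
interface ProviderInfo {
  id: string;
  hasAuth: boolean; // assumption: true when an API key or auth profile is available
}

// Returns the provider ids to try, in order, when no model is configured:
// the current default provider first, then the remaining providers sorted by id.
// Providers without auth are skipped (assumed meaning of "auth-backed").
function inferProviderOrder(
  defaultProviderId: string | undefined,
  registered: ProviderInfo[],
): string[] {
  const authed = registered.filter((p) => p.hasAuth);
  const head = authed.filter((p) => p.id === defaultProviderId);
  const rest = authed
    .filter((p) => p.id !== defaultProviderId)
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return [...head, ...rest].map((p) => p.id);
}

const registeredProviders: ProviderInfo[] = [
  { id: "minimax", hasAuth: true },
  { id: "google", hasAuth: true },
  { id: "acme", hasAuth: false }, // hypothetical provider with no key configured
];

console.log(inferProviderOrder("minimax", registeredProviders));
console.log(inferProviderOrder(undefined, registeredProviders));
```

With `minimax` as the default provider the sketch tries `minimax` before `google`; with no default it falls back to id order alone.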
--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -1026,6 +1026,11 @@ Time format in system prompt. Default: `auto` (OS preference).
  - Typical values: `google/gemini-3.1-flash-image-preview` for native Gemini image generation, `fal/fal-ai/flux/dev` for fal, or `openai/gpt-image-1` for OpenAI Images.
  - If you select a provider/model directly, configure the matching provider auth/API key too (for example `GEMINI_API_KEY` or `GOOGLE_API_KEY` for `google/*`, `OPENAI_API_KEY` for `openai/*`, `FAL_KEY` for `fal/*`).
  - If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order.
+- `musicGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
+  - Used by the shared music-generation capability and the built-in `music_generate` tool.
+  - Typical values: `google/lyria-3-clip-preview`, `google/lyria-3-pro-preview`, or `minimax/music-2.5+`.
+  - If omitted, `music_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered music-generation providers in provider-id order.
+  - If you select a provider/model directly, configure the matching provider auth/API key too.
 - `videoGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
  - Used by the shared video-generation capability and the built-in `video_generate` tool.
  - Typical values: `qwen/wan2.6-t2v`, `qwen/wan2.6-i2v`, `qwen/wan2.6-r2v`, `qwen/wan2.6-r2v-flash`, or `qwen/wan2.7-r2v`.
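
The two accepted shapes of `musicGenerationModel` (a `"provider/model"` string or a `{ primary, fallbacks }` object) can be normalized into one ordered candidate list. The sketch below is illustrative only: `ModelSetting`, `toCandidateList`, and `splitModelRef` are hypothetical helpers, and splitting at the first slash is an assumption based on refs such as `fal/fal-ai/flux/dev`, where the model part itself contains slashes.

```typescript
// Hypothetical sketch; not the parser shipped in plugin-sdk/music-generation-core.
type ModelSetting = string | { primary: string; fallbacks?: string[] };

// Flattens either config shape into an ordered list: primary first, then fallbacks.
function toCandidateList(setting: ModelSetting | undefined): string[] {
  if (setting === undefined) return [];
  if (typeof setting === "string") return [setting];
  return [setting.primary, ...(setting.fallbacks ?? [])];
}

// Splits "provider/model" at the first slash so the model id may contain slashes.
function splitModelRef(ref: string): { provider: string; model: string } {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`expected "provider/model", got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

const candidates = toCandidateList({
  primary: "google/lyria-3-clip-preview",
  fallbacks: ["minimax/music-2.5+"],
});

for (const ref of candidates) {
  const { provider, model } = splitModelRef(ref);
  console.log(`${provider} -> ${model}`);
}
```

Treating the string form as a one-element list keeps the failover logic identical for both config shapes.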
--- a/docs/plugins/architecture.md
+++ b/docs/plugins/architecture.md
@@ -35,6 +35,7 @@ native OpenClaw plugin registers against one or more capability types:
 | Realtime voice         | `api.registerRealtimeVoiceProvider(...)`         | `openai`                             |
 | Media understanding    | `api.registerMediaUnderstandingProvider(...)`    | `openai`, `google`                   |
 | Image generation       | `api.registerImageGenerationProvider(...)`       | `openai`, `google`, `fal`, `minimax` |
+| Music generation       | `api.registerMusicGenerationProvider(...)`       | `google`, `minimax`                  |
 | Video generation       | `api.registerVideoGenerationProvider(...)`       | `qwen`                               |
 | Web fetch              | `api.registerWebFetchProvider(...)`              | `firecrawl`                          |
 | Web search             | `api.registerWebSearchProvider(...)`             | `google`                             |
--- a/docs/plugins/building-plugins.md
+++ b/docs/plugins/building-plugins.md
@@ -157,6 +157,7 @@ A single plugin can register any number of capabilities via the `api` object:
 | Realtime voice         | `api.registerRealtimeVoiceProvider(...)`         | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
 | Media understanding    | `api.registerMediaUnderstandingProvider(...)`    | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
 | Image generation       | `api.registerImageGenerationProvider(...)`       | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
+| Music generation       | `api.registerMusicGenerationProvider(...)`       | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
 | Video generation       | `api.registerVideoGenerationProvider(...)`       | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
 | Web fetch              | `api.registerWebFetchProvider(...)`              | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
 | Web search             | `api.registerWebSearchProvider(...)`             | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
--- a/docs/plugins/manifest.md
+++ b/docs/plugins/manifest.md
@@ -128,26 +128,26 @@ Those belong in your plugin code and `package.json`.

 ## Top-level field reference

-| Field                               | Required | Type                             | What it means                                                                                                                                                                              |
-| ----------------------------------- | -------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `id`                                | Yes      | `string`                         | Canonical plugin id. This is the id used in `plugins.entries.<id>`.                                                                                                                        |
-| `configSchema`                      | Yes      | `object`                         | Inline JSON Schema for this plugin's config.                                                                                                                                               |
-| `enabledByDefault`                  | No       | `true`                           | Marks a bundled plugin as enabled by default. Omit it, or set any non-`true` value, to leave the plugin disabled by default.                                                               |
-| `legacyPluginIds`                   | No       | `string[]`                       | Legacy ids that normalize to this canonical plugin id.                                                                                                                                     |
-| `autoEnableWhenConfiguredProviders` | No       | `string[]`                       | Provider ids that should auto-enable this plugin when auth, config, or model refs mention them.                                                                                            |
-| `kind`                              | No       | `"memory"` \| `"context-engine"` | Declares an exclusive plugin kind used by `plugins.slots.*`.                                                                                                                               |
-| `channels`                          | No       | `string[]`                       | Channel ids owned by this plugin. Used for discovery and config validation.                                                                                                                |
-| `providers`                         | No       | `string[]`                       | Provider ids owned by this plugin.                                                                                                                                                         |
-| `modelSupport`                      | No       | `object`                         | Manifest-owned shorthand model-family metadata used to auto-load the plugin before runtime.                                                                                                |
-| `providerAuthEnvVars`               | No       | `Record<string, string[]>`       | Cheap provider-auth env metadata that OpenClaw can inspect without loading plugin code.                                                                                                    |
-| `providerAuthChoices`               | No       | `object[]`                       | Cheap auth-choice metadata for onboarding pickers, preferred-provider resolution, and simple CLI flag wiring.                                                                              |
-| `contracts`                         | No       | `object`                         | Static bundled capability snapshot for speech, realtime transcription, realtime voice, media-understanding, image-generation, video-generation, web-fetch, web search, and tool ownership. |
-| `channelConfigs`                    | No       | `Record<string, object>`         | Manifest-owned channel config metadata merged into discovery and validation surfaces before runtime loads.                                                                                 |
-| `skills`                            | No       | `string[]`                       | Skill directories to load, relative to the plugin root.                                                                                                                                    |
-| `name`                              | No       | `string`                         | Human-readable plugin name.                                                                                                                                                                |
-| `description`                       | No       | `string`                         | Short summary shown in plugin surfaces.                                                                                                                                                    |
-| `version`                           | No       | `string`                         | Informational plugin version.                                                                                                                                                              |
-| `uiHints`                           | No       | `Record<string, object>`         | UI labels, placeholders, and sensitivity hints for config fields.                                                                                                                          |
+| Field                               | Required | Type                             | What it means                                                                                                                                                                                                |
+| ----------------------------------- | -------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `id`                                | Yes      | `string`                         | Canonical plugin id. This is the id used in `plugins.entries.<id>`.                                                                                                                                          |
+| `configSchema`                      | Yes      | `object`                         | Inline JSON Schema for this plugin's config.                                                                                                                                                                 |
+| `enabledByDefault`                  | No       | `true`                           | Marks a bundled plugin as enabled by default. Omit it, or set any non-`true` value, to leave the plugin disabled by default.                                                                                 |
+| `legacyPluginIds`                   | No       | `string[]`                       | Legacy ids that normalize to this canonical plugin id.                                                                                                                                                       |
+| `autoEnableWhenConfiguredProviders` | No       | `string[]`                       | Provider ids that should auto-enable this plugin when auth, config, or model refs mention them.                                                                                                              |
+| `kind`                              | No       | `"memory"` \| `"context-engine"` | Declares an exclusive plugin kind used by `plugins.slots.*`.                                                                                                                                                 |
+| `channels`                          | No       | `string[]`                       | Channel ids owned by this plugin. Used for discovery and config validation.                                                                                                                                  |
+| `providers`                         | No       | `string[]`                       | Provider ids owned by this plugin.                                                                                                                                                                           |
+| `modelSupport`                      | No       | `object`                         | Manifest-owned shorthand model-family metadata used to auto-load the plugin before runtime.                                                                                                                  |
+| `providerAuthEnvVars`               | No       | `Record<string, string[]>`       | Cheap provider-auth env metadata that OpenClaw can inspect without loading plugin code.                                                                                                                      |
+| `providerAuthChoices`               | No       | `object[]`                       | Cheap auth-choice metadata for onboarding pickers, preferred-provider resolution, and simple CLI flag wiring.                                                                                                |
+| `contracts`                         | No       | `object`                         | Static bundled capability snapshot for speech, realtime transcription, realtime voice, media-understanding, image-generation, music-generation, video-generation, web-fetch, web search, and tool ownership. |
+| `channelConfigs`                    | No       | `Record<string, object>`         | Manifest-owned channel config metadata merged into discovery and validation surfaces before runtime loads.                                                                                                   |
+| `skills`                            | No       | `string[]`                       | Skill directories to load, relative to the plugin root.                                                                                                                                                      |
+| `name`                              | No       | `string`                         | Human-readable plugin name.                                                                                                                                                                                  |
+| `description`                       | No       | `string`                         | Short summary shown in plugin surfaces.                                                                                                                                                                      |
+| `version`                           | No       | `string`                         | Informational plugin version.                                                                                                                                                                                |
+| `uiHints`                           | No       | `Record<string, object>`         | UI labels, placeholders, and sensitivity hints for config fields.                                                                                                                                            |

 ## providerAuthChoices reference

--- a/docs/plugins/sdk-migration.md
+++ b/docs/plugins/sdk-migration.md
@@ -263,6 +263,8 @@ Current bundled provider examples:
  | `plugin-sdk/realtime-transcription` | Realtime transcription helpers | Provider types and registry helpers |
  | `plugin-sdk/realtime-voice` | Realtime voice helpers | Provider types and registry helpers |
  | `plugin-sdk/image-generation-core` | Shared image-generation core | Image-generation types, failover, auth, and registry helpers |
+  | `plugin-sdk/music-generation` | Music-generation helpers | Music-generation provider/request/result types |
+  | `plugin-sdk/music-generation-core` | Shared music-generation core | Music-generation types, failover helpers, provider lookup, and model-ref parsing |
  | `plugin-sdk/video-generation` | Video-generation helpers | Video-generation provider/request/result types |
  | `plugin-sdk/video-generation-core` | Shared video-generation core | Video-generation types, failover helpers, provider lookup, and model-ref parsing |
  | `plugin-sdk/interactive-runtime` | Interactive reply helpers | Interactive reply payload normalization/reduction |
--- a/docs/plugins/sdk-overview.md
+++ b/docs/plugins/sdk-overview.md
@@ -232,6 +232,8 @@ explicitly promotes one as public.
    | `plugin-sdk/realtime-voice` | Realtime voice provider types and registry helpers |
    | `plugin-sdk/image-generation` | Image generation provider types |
    | `plugin-sdk/image-generation-core` | Shared image-generation types, failover, auth, and registry helpers |
+    | `plugin-sdk/music-generation` | Music generation provider/request/result types |
+    | `plugin-sdk/music-generation-core` | Shared music-generation types, failover helpers, provider lookup, and model-ref parsing |
    | `plugin-sdk/video-generation` | Video generation provider/request/result types |
    | `plugin-sdk/video-generation-core` | Shared video-generation types, failover helpers, provider lookup, and model-ref parsing |
    | `plugin-sdk/webhook-targets` | Webhook target registry and route-install helpers |
@@ -288,6 +290,7 @@ methods:
 | `api.registerRealtimeVoiceProvider(...)`         | Duplex realtime voice sessions   |
 | `api.registerMediaUnderstandingProvider(...)`    | Image/audio/video analysis       |
 | `api.registerImageGenerationProvider(...)`       | Image generation                 |
+| `api.registerMusicGenerationProvider(...)`       | Music generation                 |
 | `api.registerVideoGenerationProvider(...)`       | Video generation                 |
 | `api.registerWebFetchProvider(...)`              | Web fetch / scrape provider      |
 | `api.registerWebSearchProvider(...)`             | Web search                       |
--- a/docs/providers/google.md
+++ b/docs/providers/google.md
@@ -51,6 +51,7 @@ openclaw onboard --non-interactive \
 | ---------------------- | ----------------- |
 | Chat completions       | Yes               |
 | Image generation       | Yes               |
+| Music generation       | Yes               |
 | Image understanding    | Yes               |
 | Audio transcription    | Yes               |
 | Video understanding    | Yes               |
@@ -144,6 +145,35 @@ To use Google as the default video provider:
 See [Video Generation](/tools/video-generation) for the shared tool
 parameters, provider selection, and failover behavior.

+## Music generation
+
+The bundled `google` plugin also registers music generation through the shared
+`music_generate` tool.
+
+- Default music model: `google/lyria-3-clip-preview`
+- Also supports `google/lyria-3-pro-preview`
+- Prompt controls: `lyrics` and `instrumental`
+- Output format: `mp3` by default, plus `wav` on `google/lyria-3-pro-preview`
+- Reference inputs: up to 10 images
+- Session-backed runs detach through the shared task/status flow, including `action: "status"`
+
+To use Google as the default music provider:
+
+```json5
+{
+  agents: {
+    defaults: {
+      musicGenerationModel: {
+        primary: "google/lyria-3-clip-preview",
+      },
+    },
+  },
+}
+```
+
+See [Music Generation](/tools/music-generation) for the shared tool
+parameters, provider selection, and failover behavior.
+
 ## Environment note

 If the Gateway runs as a daemon (launchd/systemd), make sure `GEMINI_API_KEY`
--- a/docs/providers/index.md
+++ b/docs/providers/index.md
@@ -72,7 +72,7 @@ Looking for chat channel docs (WhatsApp/Telegram/Discord/Slack/Mattermost (plugi

 - [Additional bundled variants](/providers/models#additional-bundled-provider-variants) - Anthropic Vertex, Copilot Proxy, and Gemini CLI OAuth
 - [Image Generation](/tools/image-generation) - Shared `image_generate` tool, provider selection, and failover
-- [Music Generation](/tools/music-generation) - Plugin-provided `music_generate` tool surfaces
+- [Music Generation](/tools/music-generation) - Shared `music_generate` tool, provider selection, and failover
 - [Video Generation](/tools/video-generation) - Shared `video_generate` tool, provider selection, and failover

 ## Transcription providers
--- a/docs/providers/minimax.md
+++ b/docs/providers/minimax.md
@@ -14,6 +14,7 @@ MiniMax also provides:

 - bundled speech synthesis via T2A v2
 - bundled image understanding via `MiniMax-VL-01`
+- bundled music generation via `music-2.5+`
 - bundled `web_search` through the MiniMax Coding Plan search API

 Provider split:
@@ -66,6 +67,34 @@ through the plugin-owned `MiniMax-VL-01` media provider.
 See [Image Generation](/tools/image-generation) for the shared tool
 parameters, provider selection, and failover behavior.

+## Music generation
+
+The bundled `minimax` plugin also registers music generation through the shared
+`music_generate` tool.
+
+- Default music model: `minimax/music-2.5+`
+- Also supports `minimax/music-2.5` and `minimax/music-2.0`
+- Prompt controls: `lyrics`, `instrumental`, `durationSeconds`
+- Output format: `mp3`
+- Session-backed runs detach through the shared task/status flow, including `action: "status"`
+
+To use MiniMax as the default music provider:
+
+```json5
+{
+  agents: {
+    defaults: {
+      musicGenerationModel: {
+        primary: "minimax/music-2.5+",
+      },
+    },
+  },
+}
+```
+
+See [Music Generation](/tools/music-generation) for the shared tool
+parameters, provider selection, and failover behavior.
+
 ## Video generation

 The bundled `minimax` plugin also registers video generation through the shared
--- a/docs/tools/index.md
+++ b/docs/tools/index.md
@@ -66,6 +66,7 @@ These tools ship with OpenClaw and are available without installing any plugins:
 | `nodes`                                    | Discover and target paired devices                                    |                                             |
 | `cron` / `gateway`                         | Manage scheduled jobs; inspect, patch, restart, or update the gateway |                                             |
 | `image` / `image_generate`                 | Analyze or generate images                                            | [Image Generation](/tools/image-generation) |
+| `music_generate`                           | Generate music tracks                                                 | [Music Generation](/tools/music-generation) |
 | `video_generate`                           | Generate videos                                                       | [Video Generation](/tools/video-generation) |
 | `tts`                                      | One-shot text-to-speech conversion                                    | [TTS](/tools/tts)                           |
 | `sessions_*` / `subagents` / `agents_list` | Session management, status, and sub-agent orchestration               | [Sub-agents](/tools/subagents)              |
@@ -73,6 +74,8 @@ These tools ship with OpenClaw and are available without installing any plugins:

 For image work, use `image` for analysis and `image_generate` for generation or editing. If you target `openai/*`, `google/*`, `fal/*`, or another non-default image provider, configure that provider's auth/API key first.

+For music work, use `music_generate`. If you target `google/*`, `minimax/*`, or another non-default music provider, configure that provider's auth/API key first.
+
 For video work, use `video_generate`. If you target `qwen/*` or another non-default video provider, configure that provider's auth/API key first.

 For workflow-driven audio generation, use `music_generate` when a plugin such as
@@ -128,12 +131,12 @@ config. Deny always wins over allow.
 `tools.profile` sets a base allowlist before `allow`/`deny` is applied.
 Per-agent override: `agents.list[].tools.profile`.

-| Profile     | What it includes                                                                                                                |
-| ----------- | ------------------------------------------------------------------------------------------------------------------------------- |
-| `full`      | No restriction (same as unset)                                                                                                  |
-| `coding`    | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `video_generate` |
-| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status`                                       |
-| `minimal`   | `session_status` only                                                                                                           |
+| Profile     | What it includes                                                                                                                                  |
+| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `full`      | No restriction (same as unset)                                                                                                                    |
+| `coding`    | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `music_generate`, `video_generate` |
+| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status`                                                         |
+| `minimal`   | `session_status` only                                                                                                                             |

 ### Tool groups

@@ -151,7 +154,7 @@ Use `group:*` shorthands in allow/deny lists:
 | `group:messaging`  | message                                                                                                   |
 | `group:nodes`      | nodes                                                                                                     |
 | `group:agents`     | agents_list                                                                                               |
-| `group:media`      | image, image_generate, video_generate, tts                                                                |
+| `group:media`      | image, image_generate, music_generate, video_generate, tts                                                |
 | `group:openclaw`   | All built-in OpenClaw tools (excludes plugin tools)                                                       |

 `sessions_history` returns a bounded, safety-filtered recall view. It strips
--- a/docs/tools/music-generation.md
+++ b/docs/tools/music-generation.md
@@ -1,23 +1,68 @@
 ---
-summary: "Generate music or audio with plugin-provided tools such as ComfyUI workflows"
+summary: "Generate music with shared providers or plugin-provided workflows"
 read_when:
  - Generating music or audio via the agent
-  - Configuring plugin-provided music generation tools
+  - Configuring music generation providers and models
  - Understanding the music_generate tool parameters
 title: "Music Generation"
 ---

 # Music Generation

-The `music_generate` tool lets the agent create audio files when a plugin
-registers music generation support.
+The `music_generate` tool lets the agent create music or audio through either:

-The bundled `comfy` plugin currently provides `music_generate` using a
-workflow-configured ComfyUI graph.
+- the shared music-generation capability with configured providers such as Google and MiniMax
+- plugin-provided tool surfaces such as a workflow-configured ComfyUI graph
+
+For shared provider-backed agent sessions, OpenClaw starts music generation as a
+background task, tracks it in the task ledger, then wakes the agent again when
+the track is ready so the agent can post the finished audio back into the
+original channel.
+
+<Note>
+The built-in shared tool only appears when at least one music-generation provider is available. If you don't see `music_generate` in your agent's tools, configure `agents.defaults.musicGenerationModel` or set up a provider API key.
+</Note>
+
+<Note>
+Plugin-provided `music_generate` implementations can expose different parameters or runtime behavior. The async task/status flow below applies to the built-in shared provider-backed path.
+</Note>

 ## Quick start

-1. Configure `models.providers.comfy.music` with a workflow JSON and prompt/output nodes.
+### Shared provider-backed generation
+
+1. Set an API key for at least one provider, for example `GEMINI_API_KEY` or
+   `MINIMAX_API_KEY`.
+2. Optionally set your preferred model:
+
+```json5
+{
+  agents: {
+    defaults: {
+      musicGenerationModel: {
+        primary: "google/lyria-3-clip-preview",
+      },
+    },
+  },
+}
+```
+
+3. Ask the agent: _"Generate an upbeat synthpop track about a night drive
+   through a neon city."_
+
+The agent calls `music_generate` automatically. No tool allow-listing needed.
+
+For direct synchronous contexts without a session-backed agent run, the built-in
+tool still falls back to inline generation and returns the final media path in
+the tool result.
+
+### Workflow-driven plugin generation
+
+The bundled `comfy` plugin can also provide `music_generate` using a
+workflow-configured ComfyUI graph.
+
+1. Configure `models.providers.comfy.music` with a workflow JSON and
+   prompt/output nodes.
 2. If you use Comfy Cloud, set `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY`.
 3. Ask the agent for music or call the tool directly.

@ -27,22 +72,102 @@ Example:
 /tool music_generate prompt="Warm ambient synth loop with soft tape texture"
 ```

-## Tool parameters
+## Shared bundled provider support

-| Parameter  | Type   | Description                                         |
-| ---------- | ------ | --------------------------------------------------- |
-| `prompt`   | string | Music or audio generation prompt                    |
-| `action`   | string | `"generate"` (default) or `"list"`                  |
-| `model`    | string | Provider/model override. Currently `comfy/workflow` |
-| `filename` | string | Output filename hint for the saved audio file       |
+| Provider | Default model          | Reference inputs | Supported controls                                        | API key                            |
+| -------- | ---------------------- | ---------------- | --------------------------------------------------------- | ---------------------------------- |
+| Google   | `lyria-3-clip-preview` | Up to 10 images  | `lyrics`, `instrumental`, `format`                        | `GEMINI_API_KEY`, `GOOGLE_API_KEY` |
+| MiniMax  | `music-2.5+`           | None             | `lyrics`, `instrumental`, `durationSeconds`, `format=mp3` | `MINIMAX_API_KEY`                  |

-## Current provider support
+## Plugin-provided support

 | Provider | Model      | Notes                           |
 | -------- | ---------- | ------------------------------- |
 | ComfyUI  | `workflow` | Workflow-defined music or audio |

-## Live test
+Use `action: "list"` to inspect available shared providers and models at
+runtime:
+
+```text
+/tool music_generate action=list
+```
+
+## Built-in tool parameters
+
+| Parameter         | Type     | Description                                                                                       |
+| ----------------- | -------- | ------------------------------------------------------------------------------------------------- |
+| `prompt`          | string   | Music generation prompt (required for `action: "generate"`)                                       |
+| `action`          | string   | `"generate"` (default), `"status"` for the current session task, or `"list"` to inspect providers |
+| `model`           | string   | Provider/model override, e.g. `google/lyria-3-pro-preview` or `comfy/workflow`                   |
+| `lyrics`          | string   | Optional lyrics when the provider supports explicit lyric input                                   |
+| `instrumental`    | boolean  | Request instrumental-only output when the provider supports it                                    |
+| `image`           | string   | Single reference image path or URL                                                                |
+| `images`          | string[] | Multiple reference images (up to 10)                                                              |
+| `durationSeconds` | number   | Target duration in seconds when the provider supports duration hints                              |
+| `format`          | string   | Output format hint (`mp3` or `wav`) when the provider supports it                                 |
+| `filename`        | string   | Output filename hint                                                                              |
+
+Not all providers or plugins support all parameters. The shared built-in tool
+validates provider capability limits before it submits the request.
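+
+As a rough illustration of combining several parameters, an instrumental
+request with a duration hint might look like the following. The `key=value`
+form mirrors the other `/tool` examples on this page; how boolean and numeric
+values are written here is an assumption, not confirmed syntax. The example
+pins MiniMax because the provider table lists `durationSeconds` only for that
+provider.
+
+```text
+/tool music_generate prompt="Slow lo-fi piano loop" model=minimax/music-2.5+ instrumental=true durationSeconds=30 format=mp3
+```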
+
+## Async behavior for the shared provider-backed path
+
+- Session-backed agent runs: `music_generate` creates a background task, returns a started/task response immediately, and posts the finished track later in a follow-up agent message.
+- Duplicate prevention: while that background task is still `queued` or `running`, later `music_generate` calls in the same session return task status instead of starting another generation.
+- Status lookup: use `action: "status"` to inspect the active session-backed music task without starting a new one.
+- Task tracking: use `openclaw tasks list` or `openclaw tasks show <taskId>` to inspect queued, running, and terminal status for the generation.
+- Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
+- Prompt hint: later user/manual turns in the same session get a small runtime hint when a music task is already in flight so the model does not blindly call `music_generate` again.
+- No-session fallback: direct/local contexts without a real agent session still run inline and return the final audio result in the same turn.
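+
+A status check from chat might look like this, assuming `action=status` takes
+the same `key=value` form as the `action=list` example earlier on this page:
+
+```text
+/tool music_generate action=status
+```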
+
+## Configuration
+
+### Model selection
+
+```json5
+{
+  agents: {
+    defaults: {
+      musicGenerationModel: {
+        primary: "google/lyria-3-clip-preview",
+        fallbacks: ["minimax/music-2.5+"],
+      },
+    },
+  },
+}
+```
+
+### Provider selection order
+
+When generating music, OpenClaw tries providers in this order:
+
+1. `model` parameter from the tool call, if the agent specifies one
+2. `musicGenerationModel.primary` from config
+3. `musicGenerationModel.fallbacks` in order
+4. Auto-detection using auth-backed provider defaults only:
+   - current default provider first
+   - remaining registered music-generation providers in provider-id order
+
+If a provider fails, the next candidate is tried automatically. If all fail, the
+error includes details from each attempt.
+
+## Provider notes
+
+- Google uses Lyria 3 batch generation. The current bundled flow supports
+  prompt, optional lyrics text, and optional reference images.
+- MiniMax uses the batch `music_generation` endpoint. The current bundled flow
+  supports prompt, optional lyrics, instrumental mode, duration steering, and
+  mp3 output.
+- ComfyUI support is workflow-driven and depends on the configured graph plus
+  node mapping for prompt/output fields.
+
+## Live tests
+
+Opt-in live coverage for the shared bundled providers:
+
+```bash
+OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
+```

 Opt-in live coverage for the bundled ComfyUI music path:

@ -50,10 +175,15 @@ Opt-in live coverage for the bundled ComfyUI music path:
 OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts
 ```

-The live file also covers comfy image and video workflows when those sections
-are configured.
+The Comfy live file also covers Comfy image and video workflows when those
+sections are configured.

 ## Related

+- [Background Tasks](/automation/tasks) - task tracking for detached `music_generate` runs
+- [Configuration Reference](/gateway/configuration-reference#agent-defaults) - `musicGenerationModel` config
 - [ComfyUI](/providers/comfy)
+- [Google (Gemini)](/providers/google)
+- [MiniMax](/providers/minimax)
+- [Models](/concepts/models) - model configuration and failover
 - [Tools Overview](/tools)
--- a/docs/tools/plugin.md
+++ b/docs/tools/plugin.md
@ -319,6 +319,7 @@ Common registration methods:
 | `registerRealtimeVoiceProvider`         | Duplex realtime voice       |
 | `registerMediaUnderstandingProvider`    | Image/audio analysis        |
 | `registerImageGenerationProvider`       | Image generation            |
+| `registerMusicGenerationProvider`       | Music generation            |
 | `registerVideoGenerationProvider`       | Video generation            |
 | `registerWebFetchProvider`              | Web fetch / scrape provider |
 | `registerWebSearchProvider`             | Web search                  |