diff --git a/docs/concepts/models.md b/docs/concepts/models.md
index a7173c0ce15..18bcec18d6d 100644
--- a/docs/concepts/models.md
+++ b/docs/concepts/models.md
@@ -30,6 +30,7 @@ Related:
 falls back to `agents.defaults.imageModel`, then the resolved session/default model.
 - `agents.defaults.imageGenerationModel` is used by the shared image-generation capability. If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
+- `agents.defaults.musicGenerationModel` is used by the shared music-generation capability. If omitted, `music_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered music-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
 - `agents.defaults.videoGenerationModel` is used by the shared video-generation capability. If omitted, `video_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered video-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
 - Per-agent defaults can override `agents.defaults.model` via `agents.list[].model` plus bindings (see [/concepts/multi-agent](/concepts/multi-agent)).
@@ -253,5 +254,6 @@ This applies whenever OpenClaw regenerates `models.json`, including command-driv
 - [Model Providers](/concepts/model-providers) — provider routing and auth
 - [Model Failover](/concepts/model-failover) — fallback chains
 - [Image Generation](/tools/image-generation) — image model configuration
+- [Music Generation](/tools/music-generation) — music model configuration
 - [Video Generation](/tools/video-generation) — video model configuration
 - [Configuration Reference](/gateway/configuration-reference#agent-defaults) — model config keys
diff --git a/docs/gateway/configuration-reference.md b/docs/gateway/configuration-reference.md
index 9fe57359b85..40fc863492c 100644
--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -1026,6 +1026,11 @@ Time format in system prompt. Default: `auto` (OS preference).
   - Typical values: `google/gemini-3.1-flash-image-preview` for native Gemini image generation, `fal/fal-ai/flux/dev` for fal, or `openai/gpt-image-1` for OpenAI Images.
   - If you select a provider/model directly, configure the matching provider auth/API key too (for example `GEMINI_API_KEY` or `GOOGLE_API_KEY` for `google/*`, `OPENAI_API_KEY` for `openai/*`, `FAL_KEY` for `fal/*`).
   - If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order.
+- `musicGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
+  - Used by the shared music-generation capability and the built-in `music_generate` tool.
+  - Typical values: `google/lyria-3-clip-preview`, `google/lyria-3-pro-preview`, or `minimax/music-2.5+`.
+  - If omitted, `music_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered music-generation providers in provider-id order.
+  - If you select a provider/model directly, configure the matching provider auth/API key too.
 - `videoGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
   - Used by the shared video-generation capability and the built-in `video_generate` tool.
   - Typical values: `qwen/wan2.6-t2v`, `qwen/wan2.6-i2v`, `qwen/wan2.6-r2v`, `qwen/wan2.6-r2v-flash`, or `qwen/wan2.7-r2v`.
diff --git a/docs/plugins/architecture.md b/docs/plugins/architecture.md
index 6429caa7864..cdf042d0163 100644
--- a/docs/plugins/architecture.md
+++ b/docs/plugins/architecture.md
@@ -35,6 +35,7 @@ native OpenClaw plugin registers against one or more capability types:
 | Realtime voice | `api.registerRealtimeVoiceProvider(...)` | `openai` |
 | Media understanding | `api.registerMediaUnderstandingProvider(...)` | `openai`, `google` |
 | Image generation | `api.registerImageGenerationProvider(...)` | `openai`, `google`, `fal`, `minimax` |
+| Music generation | `api.registerMusicGenerationProvider(...)` | `google`, `minimax` |
 | Video generation | `api.registerVideoGenerationProvider(...)` | `qwen` |
 | Web fetch | `api.registerWebFetchProvider(...)` | `firecrawl` |
 | Web search | `api.registerWebSearchProvider(...)` | `google` |
diff --git a/docs/plugins/building-plugins.md b/docs/plugins/building-plugins.md
index 25062c5afc3..1b71b807690 100644
--- a/docs/plugins/building-plugins.md
+++ b/docs/plugins/building-plugins.md
@@ -157,6 +157,7 @@ A single plugin can register any number of capabilities via the `api` object:
 | Realtime voice | `api.registerRealtimeVoiceProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
 | Media understanding | `api.registerMediaUnderstandingProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
 | Image generation | `api.registerImageGenerationProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
+| Music generation | `api.registerMusicGenerationProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
 | Video generation | `api.registerVideoGenerationProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
 | Web fetch | `api.registerWebFetchProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
 | Web search | `api.registerWebSearchProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
diff --git a/docs/plugins/manifest.md b/docs/plugins/manifest.md
index 993d7bcca69..256117d2285 100644
--- a/docs/plugins/manifest.md
+++ b/docs/plugins/manifest.md
@@ -128,26 +128,26 @@ Those belong in your plugin code and `package.json`.
 ## Top-level field reference
-| Field | Required | Type | What it means |
-| ----------------------------------- | -------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `id` | Yes | `string` | Canonical plugin id. This is the id used in `plugins.entries.`. |
-| `configSchema` | Yes | `object` | Inline JSON Schema for this plugin's config. |
-| `enabledByDefault` | No | `true` | Marks a bundled plugin as enabled by default. Omit it, or set any non-`true` value, to leave the plugin disabled by default. |
-| `legacyPluginIds` | No | `string[]` | Legacy ids that normalize to this canonical plugin id. |
-| `autoEnableWhenConfiguredProviders` | No | `string[]` | Provider ids that should auto-enable this plugin when auth, config, or model refs mention them. |
-| `kind` | No | `"memory"` \| `"context-engine"` | Declares an exclusive plugin kind used by `plugins.slots.*`. |
-| `channels` | No | `string[]` | Channel ids owned by this plugin. Used for discovery and config validation. |
-| `providers` | No | `string[]` | Provider ids owned by this plugin. |
-| `modelSupport` | No | `object` | Manifest-owned shorthand model-family metadata used to auto-load the plugin before runtime. |
-| `providerAuthEnvVars` | No | `Record` | Cheap provider-auth env metadata that OpenClaw can inspect without loading plugin code. |
-| `providerAuthChoices` | No | `object[]` | Cheap auth-choice metadata for onboarding pickers, preferred-provider resolution, and simple CLI flag wiring. |
-| `contracts` | No | `object` | Static bundled capability snapshot for speech, realtime transcription, realtime voice, media-understanding, image-generation, video-generation, web-fetch, web search, and tool ownership. |
-| `channelConfigs` | No | `Record` | Manifest-owned channel config metadata merged into discovery and validation surfaces before runtime loads. |
-| `skills` | No | `string[]` | Skill directories to load, relative to the plugin root. |
-| `name` | No | `string` | Human-readable plugin name. |
-| `description` | No | `string` | Short summary shown in plugin surfaces. |
-| `version` | No | `string` | Informational plugin version. |
-| `uiHints` | No | `Record` | UI labels, placeholders, and sensitivity hints for config fields. |
+| Field | Required | Type | What it means |
+| ----------------------------------- | -------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `id` | Yes | `string` | Canonical plugin id. This is the id used in `plugins.entries.`. |
+| `configSchema` | Yes | `object` | Inline JSON Schema for this plugin's config. |
+| `enabledByDefault` | No | `true` | Marks a bundled plugin as enabled by default. Omit it, or set any non-`true` value, to leave the plugin disabled by default. |
+| `legacyPluginIds` | No | `string[]` | Legacy ids that normalize to this canonical plugin id. |
+| `autoEnableWhenConfiguredProviders` | No | `string[]` | Provider ids that should auto-enable this plugin when auth, config, or model refs mention them. |
+| `kind` | No | `"memory"` \| `"context-engine"` | Declares an exclusive plugin kind used by `plugins.slots.*`. |
+| `channels` | No | `string[]` | Channel ids owned by this plugin. Used for discovery and config validation. |
+| `providers` | No | `string[]` | Provider ids owned by this plugin. |
+| `modelSupport` | No | `object` | Manifest-owned shorthand model-family metadata used to auto-load the plugin before runtime. |
+| `providerAuthEnvVars` | No | `Record` | Cheap provider-auth env metadata that OpenClaw can inspect without loading plugin code. |
+| `providerAuthChoices` | No | `object[]` | Cheap auth-choice metadata for onboarding pickers, preferred-provider resolution, and simple CLI flag wiring. |
+| `contracts` | No | `object` | Static bundled capability snapshot for speech, realtime transcription, realtime voice, media-understanding, image-generation, music-generation, video-generation, web-fetch, web search, and tool ownership. |
+| `channelConfigs` | No | `Record` | Manifest-owned channel config metadata merged into discovery and validation surfaces before runtime loads. |
+| `skills` | No | `string[]` | Skill directories to load, relative to the plugin root. |
+| `name` | No | `string` | Human-readable plugin name. |
+| `description` | No | `string` | Short summary shown in plugin surfaces. |
+| `version` | No | `string` | Informational plugin version. |
+| `uiHints` | No | `Record` | UI labels, placeholders, and sensitivity hints for config fields. |
 ## providerAuthChoices reference
diff --git a/docs/plugins/sdk-migration.md b/docs/plugins/sdk-migration.md
index 1da85108d35..e94a0fb031a 100644
--- a/docs/plugins/sdk-migration.md
+++ b/docs/plugins/sdk-migration.md
@@ -263,6 +263,8 @@ Current bundled provider examples:
   | `plugin-sdk/realtime-transcription` | Realtime transcription helpers | Provider types and registry helpers |
   | `plugin-sdk/realtime-voice` | Realtime voice helpers | Provider types and registry helpers |
   | `plugin-sdk/image-generation-core` | Shared image-generation core | Image-generation types, failover, auth, and registry helpers |
+  | `plugin-sdk/music-generation` | Music-generation helpers | Music-generation provider/request/result types |
+  | `plugin-sdk/music-generation-core` | Shared music-generation core | Music-generation types, failover helpers, provider lookup, and model-ref parsing |
   | `plugin-sdk/video-generation` | Video-generation helpers | Video-generation provider/request/result types |
   | `plugin-sdk/video-generation-core` | Shared video-generation core | Video-generation types, failover helpers, provider lookup, and model-ref parsing |
   | `plugin-sdk/interactive-runtime` | Interactive reply helpers | Interactive reply payload normalization/reduction |
diff --git a/docs/plugins/sdk-overview.md b/docs/plugins/sdk-overview.md
index 3253e9a5122..5a2cd9d7544 100644
--- a/docs/plugins/sdk-overview.md
+++ b/docs/plugins/sdk-overview.md
@@ -232,6 +232,8 @@ explicitly promotes one as public.
   | `plugin-sdk/realtime-voice` | Realtime voice provider types and registry helpers |
   | `plugin-sdk/image-generation` | Image generation provider types |
   | `plugin-sdk/image-generation-core` | Shared image-generation types, failover, auth, and registry helpers |
+  | `plugin-sdk/music-generation` | Music generation provider/request/result types |
+  | `plugin-sdk/music-generation-core` | Shared music-generation types, failover helpers, provider lookup, and model-ref parsing |
   | `plugin-sdk/video-generation` | Video generation provider/request/result types |
   | `plugin-sdk/video-generation-core` | Shared video-generation types, failover helpers, provider lookup, and model-ref parsing |
   | `plugin-sdk/webhook-targets` | Webhook target registry and route-install helpers |
@@ -288,6 +290,7 @@ methods:
 | `api.registerRealtimeVoiceProvider(...)` | Duplex realtime voice sessions |
 | `api.registerMediaUnderstandingProvider(...)` | Image/audio/video analysis |
 | `api.registerImageGenerationProvider(...)` | Image generation |
+| `api.registerMusicGenerationProvider(...)` | Music generation |
 | `api.registerVideoGenerationProvider(...)` | Video generation |
 | `api.registerWebFetchProvider(...)` | Web fetch / scrape provider |
 | `api.registerWebSearchProvider(...)` | Web search |
diff --git a/docs/providers/google.md b/docs/providers/google.md
index 52f0aa703c5..e8a98334937 100644
--- a/docs/providers/google.md
+++ b/docs/providers/google.md
@@ -51,6 +51,7 @@ openclaw onboard --non-interactive \
 | ---------------------- | ----------------- |
 | Chat completions | Yes |
 | Image generation | Yes |
+| Music generation | Yes |
 | Image understanding | Yes |
 | Audio transcription | Yes |
 | Video understanding | Yes |
@@ -144,6 +145,35 @@ To use Google as the default video provider:
 See [Video Generation](/tools/video-generation) for the shared tool parameters, provider selection, and failover behavior.
+## Music generation
+
+The bundled `google` plugin also registers music generation through the shared
+`music_generate` tool.
+
+- Default music model: `google/lyria-3-clip-preview`
+- Also supports `google/lyria-3-pro-preview`
+- Prompt controls: `lyrics` and `instrumental`
+- Output format: `mp3` by default, plus `wav` on `google/lyria-3-pro-preview`
+- Reference inputs: up to 10 images
+- Session-backed runs detach through the shared task/status flow, including `action: "status"`
+
+To use Google as the default music provider:
+
+```json5
+{
+  agents: {
+    defaults: {
+      musicGenerationModel: {
+        primary: "google/lyria-3-clip-preview",
+      },
+    },
+  },
+}
+```
+
+See [Music Generation](/tools/music-generation) for the shared tool
+parameters, provider selection, and failover behavior.
+
 ## Environment note
 If the Gateway runs as a daemon (launchd/systemd), make sure `GEMINI_API_KEY`
diff --git a/docs/providers/index.md b/docs/providers/index.md
index c6c3d8d558b..dbd25d14d32 100644
--- a/docs/providers/index.md
+++ b/docs/providers/index.md
@@ -72,7 +72,7 @@ Looking for chat channel docs (WhatsApp/Telegram/Discord/Slack/Mattermost (plugi
 - [Additional bundled variants](/providers/models#additional-bundled-provider-variants) - Anthropic Vertex, Copilot Proxy, and Gemini CLI OAuth
 - [Image Generation](/tools/image-generation) - Shared `image_generate` tool, provider selection, and failover
-- [Music Generation](/tools/music-generation) - Plugin-provided `music_generate` tool surfaces
+- [Music Generation](/tools/music-generation) - Shared `music_generate` tool, provider selection, and failover
 - [Video Generation](/tools/video-generation) - Shared `video_generate` tool, provider selection, and failover
 ## Transcription providers
diff --git a/docs/providers/minimax.md b/docs/providers/minimax.md
index d4c7130b233..8772bb14740 100644
--- a/docs/providers/minimax.md
+++ b/docs/providers/minimax.md
@@ -14,6 +14,7 @@ MiniMax also provides:
 - bundled speech synthesis via T2A v2
 - bundled image understanding via `MiniMax-VL-01`
+- bundled music generation via `music-2.5+`
 - bundled `web_search` through the MiniMax Coding Plan search API
 Provider split:
@@ -66,6 +67,34 @@ through the plugin-owned `MiniMax-VL-01` media provider.
 See [Image Generation](/tools/image-generation) for the shared tool parameters, provider selection, and failover behavior.
+## Music generation
+
+The bundled `minimax` plugin also registers music generation through the shared
+`music_generate` tool.
+
+- Default music model: `minimax/music-2.5+`
+- Also supports `minimax/music-2.5` and `minimax/music-2.0`
+- Prompt controls: `lyrics`, `instrumental`, `durationSeconds`
+- Output format: `mp3`
+- Session-backed runs detach through the shared task/status flow, including `action: "status"`
+
+To use MiniMax as the default music provider:
+
+```json5
+{
+  agents: {
+    defaults: {
+      musicGenerationModel: {
+        primary: "minimax/music-2.5+",
+      },
+    },
+  },
+}
+```
+
+See [Music Generation](/tools/music-generation) for the shared tool
+parameters, provider selection, and failover behavior.
+
 ## Video generation
 The bundled `minimax` plugin also registers video generation through the shared
diff --git a/docs/tools/index.md b/docs/tools/index.md
index 4463306ac46..27c6601d63f 100644
--- a/docs/tools/index.md
+++ b/docs/tools/index.md
@@ -66,6 +66,7 @@ These tools ship with OpenClaw and are available without installing any plugins:
 | `nodes` | Discover and target paired devices | |
 | `cron` / `gateway` | Manage scheduled jobs; inspect, patch, restart, or update the gateway | |
 | `image` / `image_generate` | Analyze or generate images | [Image Generation](/tools/image-generation) |
+| `music_generate` | Generate music tracks | [Music Generation](/tools/music-generation) |
 | `video_generate` | Generate videos | [Video Generation](/tools/video-generation) |
 | `tts` | One-shot text-to-speech conversion | [TTS](/tools/tts) |
 | `sessions_*` / `subagents` / `agents_list` | Session management, status, and sub-agent orchestration | [Sub-agents](/tools/subagents) |
@@ -73,6 +74,8 @@ These tools ship with OpenClaw and are available without installing any plugins:
 For image work, use `image` for analysis and `image_generate` for generation or editing. If you target `openai/*`, `google/*`, `fal/*`, or another non-default image provider, configure that provider's auth/API key first.
+For music work, use `music_generate`. If you target `google/*`, `minimax/*`, or another non-default music provider, configure that provider's auth/API key first.
+
 For video work, use `video_generate`. If you target `qwen/*` or another non-default video provider, configure that provider's auth/API key first.
 For workflow-driven audio generation, use `music_generate` when a plugin such as
@@ -128,12 +131,12 @@ config. Deny always wins over allow.
 `tools.profile` sets a base allowlist before `allow`/`deny` is applied.
 Per-agent override: `agents.list[].tools.profile`.
-| Profile | What it includes |
-| ----------- | ------------------------------------------------------------------------------------------------------------------------------- |
-| `full` | No restriction (same as unset) |
-| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `video_generate` |
-| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
-| `minimal` | `session_status` only |
+| Profile | What it includes |
+| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `full` | No restriction (same as unset) |
+| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `music_generate`, `video_generate` |
+| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
+| `minimal` | `session_status` only |
 ### Tool groups
@@ -151,7 +154,7 @@ Use `group:*` shorthands in allow/deny lists:
 | `group:messaging` | message |
 | `group:nodes` | nodes |
 | `group:agents` | agents_list |
-| `group:media` | image, image_generate, video_generate, tts |
+| `group:media` | image, image_generate, music_generate, video_generate, tts |
 | `group:openclaw` | All built-in OpenClaw tools (excludes plugin tools) |
 `sessions_history` returns a bounded, safety-filtered recall view. It strips
diff --git a/docs/tools/music-generation.md b/docs/tools/music-generation.md
index b4df88c0958..3b60afa0649 100644
--- a/docs/tools/music-generation.md
+++ b/docs/tools/music-generation.md
@@ -1,23 +1,68 @@
 ---
-summary: "Generate music or audio with plugin-provided tools such as ComfyUI workflows"
+summary: "Generate music with shared providers or plugin-provided workflows"
 read_when:
   - Generating music or audio via the agent
-  - Configuring plugin-provided music generation tools
+  - Configuring music generation providers and models
   - Understanding the music_generate tool parameters
 title: "Music Generation"
 ---
 # Music Generation
-The `music_generate` tool lets the agent create audio files when a plugin
-registers music generation support.
+The `music_generate` tool lets the agent create music or audio through either:
-The bundled `comfy` plugin currently provides `music_generate` using a
-workflow-configured ComfyUI graph.
+- the shared music-generation capability with configured providers such as Google and MiniMax
+- plugin-provided tool surfaces such as a workflow-configured ComfyUI graph
+
+For shared provider-backed agent sessions, OpenClaw starts music generation as a
+background task, tracks it in the task ledger, then wakes the agent again when
+the track is ready so the agent can post the finished audio back into the
+original channel.
+
+
+The built-in shared tool only appears when at least one music-generation provider is available. If you don't see `music_generate` in your agent's tools, configure `agents.defaults.musicGenerationModel` or set up a provider API key.
+
+
+Plugin-provided `music_generate` implementations can expose different parameters or runtime behavior. The async task/status flow below applies to the built-in shared provider-backed path.
+
 ## Quick start
-1. Configure `models.providers.comfy.music` with a workflow JSON and prompt/output nodes.
+### Shared provider-backed generation
+
+1. Set an API key for at least one provider, for example `GEMINI_API_KEY` or
+   `MINIMAX_API_KEY`.
+2. Optionally set your preferred model:
+
+```json5
+{
+  agents: {
+    defaults: {
+      musicGenerationModel: {
+        primary: "google/lyria-3-clip-preview",
+      },
+    },
+  },
+}
+```
+
+3. Ask the agent: _"Generate an upbeat synthpop track about a night drive
+   through a neon city."_
+
+The agent calls `music_generate` automatically. No tool allow-listing needed.
+
+For direct synchronous contexts without a session-backed agent run, the built-in
+tool still falls back to inline generation and returns the final media path in
+the tool result.
+
+### Workflow-driven plugin generation
+
+The bundled `comfy` plugin can also provide `music_generate` using a
+workflow-configured ComfyUI graph.
+
+1. Configure `models.providers.comfy.music` with a workflow JSON and
+   prompt/output nodes.
 2. If you use Comfy Cloud, set `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY`.
 3. Ask the agent for music or call the tool directly.
@@ -27,22 +72,102 @@ Example:
 /tool music_generate prompt="Warm ambient synth loop with soft tape texture"
 ```
-## Tool parameters
+## Shared bundled provider support
-| Parameter | Type | Description |
-| ---------- | ------ | --------------------------------------------------- |
-| `prompt` | string | Music or audio generation prompt |
-| `action` | string | `"generate"` (default) or `"list"` |
-| `model` | string | Provider/model override. Currently `comfy/workflow` |
-| `filename` | string | Output filename hint for the saved audio file |
+| Provider | Default model | Reference inputs | Supported controls | API key |
+| -------- | ---------------------- | ---------------- | --------------------------------------------------------- | ---------------------------------- |
+| Google | `lyria-3-clip-preview` | Up to 10 images | `lyrics`, `instrumental`, `format` | `GEMINI_API_KEY`, `GOOGLE_API_KEY` |
+| MiniMax | `music-2.5+` | None | `lyrics`, `instrumental`, `durationSeconds`, `format=mp3` | `MINIMAX_API_KEY` |
-## Current provider support
+## Plugin-provided support
 | Provider | Model | Notes |
 | -------- | ---------- | ------------------------------- |
 | ComfyUI | `workflow` | Workflow-defined music or audio |
-## Live test
+Use `action: "list"` to inspect available shared providers and models at
+runtime:
+
+```text
+/tool music_generate action=list
+```
+
+## Built-in tool parameters
+
+| Parameter | Type | Description |
+| ----------------- | -------- | ------------------------------------------------------------------------------------------------- |
+| `prompt` | string | Music generation prompt (required for `action: "generate"`) |
+| `action` | string | `"generate"` (default), `"status"` for the current session task, or `"list"` to inspect providers |
+| `model` | string | Provider/model override, e.g. `google/lyria-3-pro-preview` or `comfy/workflow` |
+| `lyrics` | string | Optional lyrics when the provider supports explicit lyric input |
+| `instrumental` | boolean | Request instrumental-only output when the provider supports it |
+| `image` | string | Single reference image path or URL |
+| `images` | string[] | Multiple reference images (up to 10) |
+| `durationSeconds` | number | Target duration in seconds when the provider supports duration hints |
+| `format` | string | Output format hint (`mp3` or `wav`) when the provider supports it |
+| `filename` | string | Output filename hint |
+
+Not all providers or plugins support all parameters. The shared built-in tool
+validates provider capability limits before it submits the request.
+
+## Async behavior for the shared provider-backed path
+
+- Session-backed agent runs: `music_generate` creates a background task, returns a started/task response immediately, and posts the finished track later in a follow-up agent message.
+- Duplicate prevention: while that background task is still `queued` or `running`, later `music_generate` calls in the same session return task status instead of starting another generation.
+- Status lookup: use `action: "status"` to inspect the active session-backed music task without starting a new one.
+- Task tracking: use `openclaw tasks list` or `openclaw tasks show ` to inspect queued, running, and terminal status for the generation.
+- Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
+- Prompt hint: later user/manual turns in the same session get a small runtime hint when a music task is already in flight so the model does not blindly call `music_generate` again.
+- No-session fallback: direct/local contexts without a real agent session still run inline and return the final audio result in the same turn.
+
+## Configuration
+
+### Model selection
+
+```json5
+{
+  agents: {
+    defaults: {
+      musicGenerationModel: {
+        primary: "google/lyria-3-clip-preview",
+        fallbacks: ["minimax/music-2.5+"],
+      },
+    },
+  },
+}
+```
+
+### Provider selection order
+
+When generating music, OpenClaw tries providers in this order:
+
+1. `model` parameter from the tool call, if the agent specifies one
+2. `musicGenerationModel.primary` from config
+3. `musicGenerationModel.fallbacks` in order
+4. Auto-detection using auth-backed provider defaults only:
+   - current default provider first
+   - remaining registered music-generation providers in provider-id order
+
+If a provider fails, the next candidate is tried automatically. If all fail, the
+error includes details from each attempt.
+
+## Provider notes
+
+- Google uses Lyria 3 batch generation. The current bundled flow supports
+  prompt, optional lyrics text, and optional reference images.
+- MiniMax uses the batch `music_generation` endpoint. The current bundled flow
+  supports prompt, optional lyrics, instrumental mode, duration steering, and
+  mp3 output.
+- ComfyUI support is workflow-driven and depends on the configured graph plus
+  node mapping for prompt/output fields.
+
+## Live tests
+
+Opt-in live coverage for the shared bundled providers:
+
+```bash
+OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
+```
 Opt-in live coverage for the bundled ComfyUI music path:
@@ -50,10 +175,15 @@ Opt-in live coverage for the bundled ComfyUI music path:
 OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts
 ```
-The live file also covers comfy image and video workflows when those sections
-are configured.
+The Comfy live file also covers comfy image and video workflows when those
+sections are configured.
 ## Related
+- [Background Tasks](/automation/tasks) - task tracking for detached `music_generate` runs
+- [Configuration Reference](/gateway/configuration-reference#agent-defaults) - `musicGenerationModel` config
 - [ComfyUI](/providers/comfy)
+- [Google (Gemini)](/providers/google)
+- [MiniMax](/providers/minimax)
+- [Models](/concepts/models) - model configuration and failover
 - [Tools Overview](/tools)
diff --git a/docs/tools/plugin.md b/docs/tools/plugin.md
index 979a3600056..d47cb8505f5 100644
--- a/docs/tools/plugin.md
+++ b/docs/tools/plugin.md
@@ -319,6 +319,7 @@ Common registration methods:
 | `registerRealtimeVoiceProvider` | Duplex realtime voice |
 | `registerMediaUnderstandingProvider` | Image/audio analysis |
 | `registerImageGenerationProvider` | Image generation |
+| `registerMusicGenerationProvider` | Music generation |
 | `registerVideoGenerationProvider` | Video generation |
 | `registerWebFetchProvider` | Web fetch / scrape provider |
 | `registerWebSearchProvider` | Web search |
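
Reviewer note: the "Provider selection order" section added to `docs/tools/music-generation.md` in this patch describes a four-step candidate order. The sketch below is one way to read that order, not the actual OpenClaw implementation. Every name in it (`MusicModelConfig`, `RegisteredProvider`, `ResolveInput`, `resolveMusicCandidates`) is hypothetical, and de-duplicating repeated model refs is an assumption the docs do not state.

```typescript
// Hypothetical shapes for illustration only; these are not plugin SDK types.
type MusicModelConfig = string | { primary?: string; fallbacks?: string[] };

interface RegisteredProvider {
  id: string; // provider id, e.g. "google"
  defaultModel: string; // provider default, e.g. "lyria-3-clip-preview"
  hasAuth: boolean; // true when an API key or auth profile is available
}

interface ResolveInput {
  toolModel?: string; // `model` parameter from the tool call
  config?: MusicModelConfig; // agents.defaults.musicGenerationModel
  defaultProviderId?: string; // current default provider
  providers: RegisteredProvider[]; // registered music-generation providers
}

// Builds the ordered list of "provider/model" refs to try, following the
// documented order. Duplicates are skipped (an assumption of this sketch).
function resolveMusicCandidates(input: ResolveInput): string[] {
  const ordered: string[] = [];
  const push = (ref?: string): void => {
    if (ref && !ordered.includes(ref)) ordered.push(ref);
  };

  // 1. Explicit `model` parameter from the tool call.
  push(input.toolModel);

  // 2 and 3. Configured primary, then fallbacks in order.
  if (typeof input.config === "string") {
    push(input.config);
  } else if (input.config) {
    push(input.config.primary);
    for (const ref of input.config.fallbacks ?? []) push(ref);
  }

  // 4. Auto-detection over auth-backed providers only:
  //    current default provider first, then the rest in provider-id order.
  const authBacked = input.providers.filter((p) => p.hasAuth);
  const first = authBacked.filter((p) => p.id === input.defaultProviderId);
  const rest = authBacked
    .filter((p) => p.id !== input.defaultProviderId)
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  for (const p of [...first, ...rest]) push(`${p.id}/${p.defaultModel}`);

  return ordered;
}

// Example: no explicit config, MiniMax is the current default provider.
const exampleCandidates = resolveMusicCandidates({
  defaultProviderId: "minimax",
  providers: [
    { id: "google", defaultModel: "lyria-3-clip-preview", hasAuth: true },
    { id: "minimax", defaultModel: "music-2.5+", hasAuth: true },
  ],
});
console.log(exampleCandidates);
```

Failover itself (trying the next candidate when a provider fails and collecting per-attempt errors) is left out; the sketch covers only how the candidate list could be ordered.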