mirror of https://github.com/openclaw/openclaw.git
docs: document music generation async flow
parent 3027f0dde5
commit f6dbcf4cda
|
|
@ -30,6 +30,7 @@ Related:
|
|||
falls back to `agents.defaults.imageModel`, then the resolved session/default
|
||||
model.
|
||||
- `agents.defaults.imageGenerationModel` is used by the shared image-generation capability. If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
|
||||
- `agents.defaults.musicGenerationModel` is used by the shared music-generation capability. If omitted, `music_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered music-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
|
||||
- `agents.defaults.videoGenerationModel` is used by the shared video-generation capability. If omitted, `video_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered video-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
|
||||
- Per-agent defaults can override `agents.defaults.model` via `agents.list[].model` plus bindings (see [/concepts/multi-agent](/concepts/multi-agent)).
|
||||
|
||||
|
|
@ -253,5 +254,6 @@ This applies whenever OpenClaw regenerates `models.json`, including command-driv
|
|||
- [Model Providers](/concepts/model-providers) — provider routing and auth
|
||||
- [Model Failover](/concepts/model-failover) — fallback chains
|
||||
- [Image Generation](/tools/image-generation) — image model configuration
|
||||
- [Music Generation](/tools/music-generation) — music model configuration
|
||||
- [Video Generation](/tools/video-generation) — video model configuration
|
||||
- [Configuration Reference](/gateway/configuration-reference#agent-defaults) — model config keys
|
||||
|
|
|
|||
|
|
@ -1026,6 +1026,11 @@ Time format in system prompt. Default: `auto` (OS preference).
|
|||
- Typical values: `google/gemini-3.1-flash-image-preview` for native Gemini image generation, `fal/fal-ai/flux/dev` for fal, or `openai/gpt-image-1` for OpenAI Images.
|
||||
- If you select a provider/model directly, configure the matching provider auth/API key too (for example `GEMINI_API_KEY` or `GOOGLE_API_KEY` for `google/*`, `OPENAI_API_KEY` for `openai/*`, `FAL_KEY` for `fal/*`).
|
||||
- If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order.
|
||||
- `musicGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
|
||||
- Used by the shared music-generation capability and the built-in `music_generate` tool.
|
||||
- Typical values: `google/lyria-3-clip-preview`, `google/lyria-3-pro-preview`, or `minimax/music-2.5+`.
|
||||
- If omitted, `music_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered music-generation providers in provider-id order.
|
||||
- If you select a provider/model directly, configure the matching provider auth/API key too.
|
||||
- `videoGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
|
||||
- Used by the shared video-generation capability and the built-in `video_generate` tool.
|
||||
- Typical values: `qwen/wan2.6-t2v`, `qwen/wan2.6-i2v`, `qwen/wan2.6-r2v`, `qwen/wan2.6-r2v-flash`, or `qwen/wan2.7-r2v`.
|
||||
|
|
|
|||
|
|
@ -35,6 +35,7 @@ native OpenClaw plugin registers against one or more capability types:
|
|||
| Realtime voice | `api.registerRealtimeVoiceProvider(...)` | `openai` |
|
||||
| Media understanding | `api.registerMediaUnderstandingProvider(...)` | `openai`, `google` |
|
||||
| Image generation | `api.registerImageGenerationProvider(...)` | `openai`, `google`, `fal`, `minimax` |
|
||||
| Music generation | `api.registerMusicGenerationProvider(...)` | `google`, `minimax` |
|
||||
| Video generation | `api.registerVideoGenerationProvider(...)` | `qwen` |
|
||||
| Web fetch | `api.registerWebFetchProvider(...)` | `firecrawl` |
|
||||
| Web search | `api.registerWebSearchProvider(...)` | `google` |
|
||||
|
|
|
|||
|
|
@ -157,6 +157,7 @@ A single plugin can register any number of capabilities via the `api` object:
|
|||
| Realtime voice | `api.registerRealtimeVoiceProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
|
||||
| Media understanding | `api.registerMediaUnderstandingProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
|
||||
| Image generation | `api.registerImageGenerationProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
|
||||
| Music generation | `api.registerMusicGenerationProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
|
||||
| Video generation | `api.registerVideoGenerationProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
|
||||
| Web fetch | `api.registerWebFetchProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
|
||||
| Web search | `api.registerWebSearchProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
|
||||
|
|
|
|||
|
|
@ -128,26 +128,26 @@ Those belong in your plugin code and `package.json`.
|
|||
|
||||
## Top-level field reference
|
||||
|
||||
| Field | Required | Type | What it means |
|
||||
| ----------------------------------- | -------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
||||
| `id` | Yes | `string` | Canonical plugin id. This is the id used in `plugins.entries.<id>`. |
|
||||
| `configSchema` | Yes | `object` | Inline JSON Schema for this plugin's config. |
|
||||
| `enabledByDefault` | No | `true` | Marks a bundled plugin as enabled by default. Omit it, or set any non-`true` value, to leave the plugin disabled by default. |
|
||||
| `legacyPluginIds` | No | `string[]` | Legacy ids that normalize to this canonical plugin id. |
|
||||
| `autoEnableWhenConfiguredProviders` | No | `string[]` | Provider ids that should auto-enable this plugin when auth, config, or model refs mention them. |
|
||||
| `kind` | No | `"memory"` \| `"context-engine"` | Declares an exclusive plugin kind used by `plugins.slots.*`. |
|
||||
| `channels` | No | `string[]` | Channel ids owned by this plugin. Used for discovery and config validation. |
|
||||
| `providers` | No | `string[]` | Provider ids owned by this plugin. |
|
||||
| `modelSupport` | No | `object` | Manifest-owned shorthand model-family metadata used to auto-load the plugin before runtime. |
|
||||
| `providerAuthEnvVars` | No | `Record<string, string[]>` | Cheap provider-auth env metadata that OpenClaw can inspect without loading plugin code. |
|
||||
| `providerAuthChoices` | No | `object[]` | Cheap auth-choice metadata for onboarding pickers, preferred-provider resolution, and simple CLI flag wiring. |
|
||||
| `contracts` | No | `object` | Static bundled capability snapshot for speech, realtime transcription, realtime voice, media-understanding, image-generation, video-generation, web-fetch, web search, and tool ownership. |
|
||||
| `channelConfigs` | No | `Record<string, object>` | Manifest-owned channel config metadata merged into discovery and validation surfaces before runtime loads. |
|
||||
| `skills` | No | `string[]` | Skill directories to load, relative to the plugin root. |
|
||||
| `name` | No | `string` | Human-readable plugin name. |
|
||||
| `description` | No | `string` | Short summary shown in plugin surfaces. |
|
||||
| `version` | No | `string` | Informational plugin version. |
|
||||
| `uiHints` | No | `Record<string, object>` | UI labels, placeholders, and sensitivity hints for config fields. |
|
||||
| Field | Required | Type | What it means |
|
||||
| ----------------------------------- | -------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
||||
| `id` | Yes | `string` | Canonical plugin id. This is the id used in `plugins.entries.<id>`. |
|
||||
| `configSchema` | Yes | `object` | Inline JSON Schema for this plugin's config. |
|
||||
| `enabledByDefault` | No | `true` | Marks a bundled plugin as enabled by default. Omit it, or set any non-`true` value, to leave the plugin disabled by default. |
|
||||
| `legacyPluginIds` | No | `string[]` | Legacy ids that normalize to this canonical plugin id. |
|
||||
| `autoEnableWhenConfiguredProviders` | No | `string[]` | Provider ids that should auto-enable this plugin when auth, config, or model refs mention them. |
|
||||
| `kind` | No | `"memory"` \| `"context-engine"` | Declares an exclusive plugin kind used by `plugins.slots.*`. |
|
||||
| `channels` | No | `string[]` | Channel ids owned by this plugin. Used for discovery and config validation. |
|
||||
| `providers` | No | `string[]` | Provider ids owned by this plugin. |
|
||||
| `modelSupport` | No | `object` | Manifest-owned shorthand model-family metadata used to auto-load the plugin before runtime. |
|
||||
| `providerAuthEnvVars` | No | `Record<string, string[]>` | Cheap provider-auth env metadata that OpenClaw can inspect without loading plugin code. |
|
||||
| `providerAuthChoices` | No | `object[]` | Cheap auth-choice metadata for onboarding pickers, preferred-provider resolution, and simple CLI flag wiring. |
|
||||
| `contracts` | No | `object` | Static bundled capability snapshot for speech, realtime transcription, realtime voice, media-understanding, image-generation, music-generation, video-generation, web-fetch, web search, and tool ownership. |
|
||||
| `channelConfigs` | No | `Record<string, object>` | Manifest-owned channel config metadata merged into discovery and validation surfaces before runtime loads. |
|
||||
| `skills` | No | `string[]` | Skill directories to load, relative to the plugin root. |
|
||||
| `name` | No | `string` | Human-readable plugin name. |
|
||||
| `description` | No | `string` | Short summary shown in plugin surfaces. |
|
||||
| `version` | No | `string` | Informational plugin version. |
|
||||
| `uiHints` | No | `Record<string, object>` | UI labels, placeholders, and sensitivity hints for config fields. |
|
||||
|
||||
## providerAuthChoices reference
|
||||
|
||||
|
|
|
|||
|
|
@ -263,6 +263,8 @@ Current bundled provider examples:
|
|||
| `plugin-sdk/realtime-transcription` | Realtime transcription helpers | Provider types and registry helpers |
|
||||
| `plugin-sdk/realtime-voice` | Realtime voice helpers | Provider types and registry helpers |
|
||||
| `plugin-sdk/image-generation-core` | Shared image-generation core | Image-generation types, failover, auth, and registry helpers |
|
||||
| `plugin-sdk/music-generation` | Music-generation helpers | Music-generation provider/request/result types |
|
||||
| `plugin-sdk/music-generation-core` | Shared music-generation core | Music-generation types, failover helpers, provider lookup, and model-ref parsing |
|
||||
| `plugin-sdk/video-generation` | Video-generation helpers | Video-generation provider/request/result types |
|
||||
| `plugin-sdk/video-generation-core` | Shared video-generation core | Video-generation types, failover helpers, provider lookup, and model-ref parsing |
|
||||
| `plugin-sdk/interactive-runtime` | Interactive reply helpers | Interactive reply payload normalization/reduction |
|
||||
|
|
|
|||
|
|
@ -232,6 +232,8 @@ explicitly promotes one as public.
|
|||
| `plugin-sdk/realtime-voice` | Realtime voice provider types and registry helpers |
|
||||
| `plugin-sdk/image-generation` | Image generation provider types |
|
||||
| `plugin-sdk/image-generation-core` | Shared image-generation types, failover, auth, and registry helpers |
|
||||
| `plugin-sdk/music-generation` | Music generation provider/request/result types |
|
||||
| `plugin-sdk/music-generation-core` | Shared music-generation types, failover helpers, provider lookup, and model-ref parsing |
|
||||
| `plugin-sdk/video-generation` | Video generation provider/request/result types |
|
||||
| `plugin-sdk/video-generation-core` | Shared video-generation types, failover helpers, provider lookup, and model-ref parsing |
|
||||
| `plugin-sdk/webhook-targets` | Webhook target registry and route-install helpers |
|
||||
|
|
@ -288,6 +290,7 @@ methods:
|
|||
| `api.registerRealtimeVoiceProvider(...)` | Duplex realtime voice sessions |
|
||||
| `api.registerMediaUnderstandingProvider(...)` | Image/audio/video analysis |
|
||||
| `api.registerImageGenerationProvider(...)` | Image generation |
|
||||
| `api.registerMusicGenerationProvider(...)` | Music generation |
|
||||
| `api.registerVideoGenerationProvider(...)` | Video generation |
|
||||
| `api.registerWebFetchProvider(...)` | Web fetch / scrape provider |
|
||||
| `api.registerWebSearchProvider(...)` | Web search |
|
||||
|
|
|
|||
|
|
@ -51,6 +51,7 @@ openclaw onboard --non-interactive \
|
|||
| ---------------------- | ----------------- |
|
||||
| Chat completions | Yes |
|
||||
| Image generation | Yes |
|
||||
| Music generation | Yes |
|
||||
| Image understanding | Yes |
|
||||
| Audio transcription | Yes |
|
||||
| Video understanding | Yes |
|
||||
|
|
@ -144,6 +145,35 @@ To use Google as the default video provider:
|
|||
See [Video Generation](/tools/video-generation) for the shared tool
|
||||
parameters, provider selection, and failover behavior.
|
||||
|
||||
## Music generation
|
||||
|
||||
The bundled `google` plugin also registers music generation through the shared
|
||||
`music_generate` tool.
|
||||
|
||||
- Default music model: `google/lyria-3-clip-preview`
|
||||
- Also supports `google/lyria-3-pro-preview`
|
||||
- Prompt controls: `lyrics` and `instrumental`
|
||||
- Output format: `mp3` by default, plus `wav` on `google/lyria-3-pro-preview`
|
||||
- Reference inputs: up to 10 images
|
||||
- Session-backed runs detach through the shared task/status flow, including `action: "status"`
|
||||
|
||||
To use Google as the default music provider:
|
||||
|
||||
```json5
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
      },
    },
  },
}
```
|
||||
|
||||
See [Music Generation](/tools/music-generation) for the shared tool
|
||||
parameters, provider selection, and failover behavior.
|
||||
|
||||
## Environment note
|
||||
|
||||
If the Gateway runs as a daemon (launchd/systemd), make sure `GEMINI_API_KEY`
|
||||
|
|
|
|||
|
|
@ -72,7 +72,7 @@ Looking for chat channel docs (WhatsApp/Telegram/Discord/Slack/Mattermost (plugi
|
|||
|
||||
- [Additional bundled variants](/providers/models#additional-bundled-provider-variants) - Anthropic Vertex, Copilot Proxy, and Gemini CLI OAuth
|
||||
- [Image Generation](/tools/image-generation) - Shared `image_generate` tool, provider selection, and failover
|
||||
- [Music Generation](/tools/music-generation) - Plugin-provided `music_generate` tool surfaces
|
||||
- [Music Generation](/tools/music-generation) - Shared `music_generate` tool, provider selection, and failover
|
||||
- [Video Generation](/tools/video-generation) - Shared `video_generate` tool, provider selection, and failover
|
||||
|
||||
## Transcription providers
|
||||
|
|
|
|||
|
|
@ -14,6 +14,7 @@ MiniMax also provides:
|
|||
|
||||
- bundled speech synthesis via T2A v2
|
||||
- bundled image understanding via `MiniMax-VL-01`
|
||||
- bundled music generation via `music-2.5+`
|
||||
- bundled `web_search` through the MiniMax Coding Plan search API
|
||||
|
||||
Provider split:
|
||||
|
|
@ -66,6 +67,34 @@ through the plugin-owned `MiniMax-VL-01` media provider.
|
|||
See [Image Generation](/tools/image-generation) for the shared tool
|
||||
parameters, provider selection, and failover behavior.
|
||||
|
||||
## Music generation
|
||||
|
||||
The bundled `minimax` plugin also registers music generation through the shared
|
||||
`music_generate` tool.
|
||||
|
||||
- Default music model: `minimax/music-2.5+`
|
||||
- Also supports `minimax/music-2.5` and `minimax/music-2.0`
|
||||
- Prompt controls: `lyrics`, `instrumental`, `durationSeconds`
|
||||
- Output format: `mp3`
|
||||
- Session-backed runs detach through the shared task/status flow, including `action: "status"`
|
||||
|
||||
To use MiniMax as the default music provider:
|
||||
|
||||
```json5
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "minimax/music-2.5+",
      },
    },
  },
}
```
|
||||
|
||||
See [Music Generation](/tools/music-generation) for the shared tool
|
||||
parameters, provider selection, and failover behavior.
|
||||
|
||||
## Video generation
|
||||
|
||||
The bundled `minimax` plugin also registers video generation through the shared
|
||||
|
|
|
|||
|
|
@ -66,6 +66,7 @@ These tools ship with OpenClaw and are available without installing any plugins:
|
|||
| `nodes` | Discover and target paired devices | |
|
||||
| `cron` / `gateway` | Manage scheduled jobs; inspect, patch, restart, or update the gateway | |
|
||||
| `image` / `image_generate` | Analyze or generate images | [Image Generation](/tools/image-generation) |
|
||||
| `music_generate` | Generate music tracks | [Music Generation](/tools/music-generation) |
|
||||
| `video_generate` | Generate videos | [Video Generation](/tools/video-generation) |
|
||||
| `tts` | One-shot text-to-speech conversion | [TTS](/tools/tts) |
|
||||
| `sessions_*` / `subagents` / `agents_list` | Session management, status, and sub-agent orchestration | [Sub-agents](/tools/subagents) |
|
||||
|
|
@ -73,6 +74,8 @@ These tools ship with OpenClaw and are available without installing any plugins:
|
|||
|
||||
For image work, use `image` for analysis and `image_generate` for generation or editing. If you target `openai/*`, `google/*`, `fal/*`, or another non-default image provider, configure that provider's auth/API key first.
|
||||
|
||||
For music work, use `music_generate`. If you target `google/*`, `minimax/*`, or another non-default music provider, configure that provider's auth/API key first.
|
||||
|
||||
For video work, use `video_generate`. If you target `qwen/*` or another non-default video provider, configure that provider's auth/API key first.
|
||||
|
||||
For workflow-driven audio generation, use `music_generate` when a plugin such as
|
||||
|
|
@ -128,12 +131,12 @@ config. Deny always wins over allow.
|
|||
`tools.profile` sets a base allowlist before `allow`/`deny` is applied.
|
||||
Per-agent override: `agents.list[].tools.profile`.
|
||||
|
||||
| Profile | What it includes |
|
||||
| ----------- | ------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `full` | No restriction (same as unset) |
|
||||
| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `video_generate` |
|
||||
| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
|
||||
| `minimal` | `session_status` only |
|
||||
| Profile | What it includes |
|
||||
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `full` | No restriction (same as unset) |
|
||||
| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `music_generate`, `video_generate` |
|
||||
| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
|
||||
| `minimal` | `session_status` only |
|
||||
|
||||
### Tool groups
|
||||
|
||||
|
|
@ -151,7 +154,7 @@ Use `group:*` shorthands in allow/deny lists:
|
|||
| `group:messaging` | message |
|
||||
| `group:nodes` | nodes |
|
||||
| `group:agents` | agents_list |
|
||||
| `group:media` | image, image_generate, video_generate, tts |
|
||||
| `group:media` | image, image_generate, music_generate, video_generate, tts |
|
||||
| `group:openclaw` | All built-in OpenClaw tools (excludes plugin tools) |
|
||||
|
||||
`sessions_history` returns a bounded, safety-filtered recall view. It strips
|
||||
|
|
|
|||
|
|
@ -1,23 +1,68 @@
|
|||
---
|
||||
summary: "Generate music or audio with plugin-provided tools such as ComfyUI workflows"
|
||||
summary: "Generate music with shared providers or plugin-provided workflows"
|
||||
read_when:
|
||||
- Generating music or audio via the agent
|
||||
- Configuring plugin-provided music generation tools
|
||||
- Configuring music generation providers and models
|
||||
- Understanding the music_generate tool parameters
|
||||
title: "Music Generation"
|
||||
---
|
||||
|
||||
# Music Generation
|
||||
|
||||
The `music_generate` tool lets the agent create audio files when a plugin
|
||||
registers music generation support.
|
||||
The `music_generate` tool lets the agent create music or audio through either:
|
||||
|
||||
The bundled `comfy` plugin currently provides `music_generate` using a
|
||||
workflow-configured ComfyUI graph.
|
||||
- the shared music-generation capability with configured providers such as Google and MiniMax
|
||||
- plugin-provided tool surfaces such as a workflow-configured ComfyUI graph
|
||||
|
||||
For shared provider-backed agent sessions, OpenClaw starts music generation as a
|
||||
background task, tracks it in the task ledger, then wakes the agent again when
|
||||
the track is ready so the agent can post the finished audio back into the
|
||||
original channel.
|
||||
|
||||
<Note>
|
||||
The built-in shared tool only appears when at least one music-generation provider is available. If you don't see `music_generate` in your agent's tools, configure `agents.defaults.musicGenerationModel` or set up a provider API key.
|
||||
</Note>
|
||||
|
||||
<Note>
|
||||
Plugin-provided `music_generate` implementations can expose different parameters or runtime behavior. The async task/status flow below applies to the built-in shared provider-backed path.
|
||||
</Note>
|
||||
|
||||
## Quick start
|
||||
|
||||
1. Configure `models.providers.comfy.music` with a workflow JSON and prompt/output nodes.
|
||||
### Shared provider-backed generation
|
||||
|
||||
1. Set an API key for at least one provider, for example `GEMINI_API_KEY` or
|
||||
`MINIMAX_API_KEY`.
|
||||
2. Optionally set your preferred model:
|
||||
|
||||
   ```json5
   {
     agents: {
       defaults: {
         musicGenerationModel: {
           primary: "google/lyria-3-clip-preview",
         },
       },
     },
   }
   ```
|
||||
|
||||
3. Ask the agent: _"Generate an upbeat synthpop track about a night drive
|
||||
through a neon city."_
|
||||
|
||||
The agent calls `music_generate` automatically. No tool allow-listing needed.
|
||||
|
||||
For direct synchronous contexts without a session-backed agent run, the built-in
|
||||
tool still falls back to inline generation and returns the final media path in
|
||||
the tool result.
|
||||
|
||||
### Workflow-driven plugin generation
|
||||
|
||||
The bundled `comfy` plugin can also provide `music_generate` using a
|
||||
workflow-configured ComfyUI graph.
|
||||
|
||||
1. Configure `models.providers.comfy.music` with a workflow JSON and
|
||||
prompt/output nodes.
|
||||
2. If you use Comfy Cloud, set `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY`.
|
||||
3. Ask the agent for music or call the tool directly.
|
||||
|
||||
|
|
@ -27,22 +72,102 @@ Example:
|
|||
/tool music_generate prompt="Warm ambient synth loop with soft tape texture"
|
||||
```
|
||||
|
||||
## Tool parameters
|
||||
## Shared bundled provider support
|
||||
|
||||
| Parameter | Type | Description |
|
||||
| ---------- | ------ | --------------------------------------------------- |
|
||||
| `prompt` | string | Music or audio generation prompt |
|
||||
| `action` | string | `"generate"` (default) or `"list"` |
|
||||
| `model` | string | Provider/model override. Currently `comfy/workflow` |
|
||||
| `filename` | string | Output filename hint for the saved audio file |
|
||||
| Provider | Default model          | Reference inputs | Supported controls                                        | API key                            |
| -------- | ---------------------- | ---------------- | --------------------------------------------------------- | ---------------------------------- |
| Google   | `lyria-3-clip-preview` | Up to 10 images  | `lyrics`, `instrumental`, `format`                        | `GEMINI_API_KEY`, `GOOGLE_API_KEY` |
| MiniMax  | `music-2.5+`           | None             | `lyrics`, `instrumental`, `durationSeconds`, `format=mp3` | `MINIMAX_API_KEY`                  |
|
||||
|
||||
## Current provider support
|
||||
## Plugin-provided support
|
||||
|
||||
| Provider | Model | Notes |
|
||||
| -------- | ---------- | ------------------------------- |
|
||||
| ComfyUI | `workflow` | Workflow-defined music or audio |
|
||||
|
||||
## Live test
|
||||
Use `action: "list"` to inspect available shared providers and models at
|
||||
runtime:
|
||||
|
||||
```text
/tool music_generate action=list
```
|
||||
|
||||
## Built-in tool parameters
|
||||
|
||||
| Parameter         | Type     | Description                                                                                       |
| ----------------- | -------- | ------------------------------------------------------------------------------------------------- |
| `prompt`          | string   | Music generation prompt (required for `action: "generate"`)                                       |
| `action`          | string   | `"generate"` (default), `"status"` for the current session task, or `"list"` to inspect providers |
| `model`           | string   | Provider/model override, e.g. `google/lyria-3-pro-preview` or `comfy/workflow`                    |
| `lyrics`          | string   | Optional lyrics when the provider supports explicit lyric input                                   |
| `instrumental`    | boolean  | Request instrumental-only output when the provider supports it                                    |
| `image`           | string   | Single reference image path or URL                                                                |
| `images`          | string[] | Multiple reference images (up to 10)                                                              |
| `durationSeconds` | number   | Target duration in seconds when the provider supports duration hints                              |
| `format`          | string   | Output format hint (`mp3` or `wav`) when the provider supports it                                 |
| `filename`        | string   | Output filename hint                                                                              |
|
||||
|
||||
Not all providers or plugins support all parameters. The shared built-in tool
|
||||
validates provider capability limits before it submits the request.
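
As a rough illustration of what a pre-submit capability check could look like, here is a minimal TypeScript sketch. The `MusicCapabilities` shape, the `CAPABILITIES` map, and `validateRequest` are hypothetical names invented for this example; only the limits themselves (image counts, duration support, output formats) are taken from the provider details documented in this commit. The real validation in OpenClaw may be structured differently.

```typescript
// Hypothetical capability record; not the actual OpenClaw type.
interface MusicCapabilities {
  maxImages: number; // reference images accepted by the provider
  supportsDuration: boolean; // whether `durationSeconds` is honored
  formats: readonly string[]; // allowed `format` values
}

// Limits copied from the provider docs in this commit (assumed complete for the sketch).
const CAPABILITIES = new Map<string, MusicCapabilities>([
  ["google/lyria-3-clip-preview", { maxImages: 10, supportsDuration: false, formats: ["mp3"] }],
  ["google/lyria-3-pro-preview", { maxImages: 10, supportsDuration: false, formats: ["mp3", "wav"] }],
  ["minimax/music-2.5+", { maxImages: 0, supportsDuration: true, formats: ["mp3"] }],
]);

// Hypothetical request shape mirroring the built-in tool parameters.
interface MusicRequest {
  model: string;
  prompt: string;
  images?: string[];
  durationSeconds?: number;
  format?: string;
}

// Returns a list of human-readable problems; an empty list means the request may be submitted.
function validateRequest(req: MusicRequest): string[] {
  const caps = CAPABILITIES.get(req.model);
  if (!caps) return [`unknown model: ${req.model}`];

  const problems: string[] = [];
  const imageCount = req.images?.length ?? 0;
  if (imageCount > caps.maxImages) {
    problems.push(`${req.model} accepts at most ${caps.maxImages} reference images, got ${imageCount}`);
  }
  if (req.durationSeconds !== undefined && !caps.supportsDuration) {
    problems.push(`${req.model} does not support durationSeconds`);
  }
  if (req.format !== undefined && !caps.formats.includes(req.format)) {
    problems.push(`${req.model} does not support format ${req.format}`);
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a caller report every unsupported parameter in a single tool response.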
|
||||
|
||||
## Async behavior for the shared provider-backed path
|
||||
|
||||
- Session-backed agent runs: `music_generate` creates a background task, returns a started/task response immediately, and posts the finished track later in a follow-up agent message.
|
||||
- Duplicate prevention: while that background task is still `queued` or `running`, later `music_generate` calls in the same session return task status instead of starting another generation.
|
||||
- Status lookup: use `action: "status"` to inspect the active session-backed music task without starting a new one.
|
||||
- Task tracking: use `openclaw tasks list` or `openclaw tasks show <taskId>` to inspect queued, running, and terminal status for the generation.
|
||||
- Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
|
||||
- Prompt hint: later user/manual turns in the same session get a small runtime hint when a music task is already in flight so the model does not blindly call `music_generate` again.
|
||||
- No-session fallback: direct/local contexts without a real agent session still run inline and return the final audio result in the same turn.
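
The duplicate-prevention and status-lookup rules above can be sketched as a small per-session ledger. This is an illustrative TypeScript model only: `MusicTaskLedger`, `ToolResult`, and the terminal state names `succeeded` and `failed` are assumptions made for the example, not OpenClaw's task ledger API. Only the `queued` and `running` states and the "return status instead of starting another generation" behavior come from the text.

```typescript
// Hypothetical task states; `queued` and `running` are named in the docs, the rest are assumed.
type TaskState = "queued" | "running" | "succeeded" | "failed";

interface MusicTask {
  taskId: string;
  state: TaskState;
  prompt: string;
}

type ToolResult =
  | { kind: "started"; taskId: string }
  | { kind: "status"; taskId: string; state: TaskState }
  | { kind: "idle" };

const isActive = (state: TaskState): boolean => state === "queued" || state === "running";

class MusicTaskLedger {
  // Latest music task per session id.
  private readonly tasks = new Map<string, MusicTask>();
  private nextId = 1;

  handle(sessionId: string, action: "generate" | "status", prompt = ""): ToolResult {
    const current = this.tasks.get(sessionId);

    // While a task is in flight, every call reports status instead of starting a new run.
    if (current && isActive(current.state)) {
      return { kind: "status", taskId: current.taskId, state: current.state };
    }
    // A status lookup never starts a generation.
    if (action === "status") {
      return current
        ? { kind: "status", taskId: current.taskId, state: current.state }
        : { kind: "idle" };
    }
    const task: MusicTask = { taskId: `task-${this.nextId++}`, state: "queued", prompt };
    this.tasks.set(sessionId, task);
    return { kind: "started", taskId: task.taskId };
  }

  // Marks the session's task as terminal so a later generate call can start a new one.
  finish(sessionId: string, state: "succeeded" | "failed"): void {
    const current = this.tasks.get(sessionId);
    if (current) current.state = state;
  }
}
```

In this model a second `generate` call in the same session returns the existing task's status until `finish` is called, which mirrors the documented behavior of not starting a parallel generation.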
|
||||
|
||||
## Configuration
|
||||
|
||||
### Model selection
|
||||
|
||||
```json5
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
        fallbacks: ["minimax/music-2.5+"],
      },
    },
  },
}
```
|
||||
|
||||
### Provider selection order
|
||||
|
||||
When generating music, OpenClaw tries providers in this order:
|
||||
|
||||
1. `model` parameter from the tool call, if the agent specifies one
2. `musicGenerationModel.primary` from config
3. `musicGenerationModel.fallbacks` in order
4. Auto-detection using auth-backed provider defaults only:
   - current default provider first
   - remaining registered music-generation providers in provider-id order
|
||||
|
||||
If a provider fails, the next candidate is tried automatically. If all fail, the
|
||||
error includes details from each attempt.
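
The selection order and failover behavior described above could be modeled roughly as follows. This is a sketch under stated assumptions: `buildCandidates`, `generateWithFailover`, `SelectionInput`, and the `authBackedDefaults` map are hypothetical names for illustration, and "provider-id order" is assumed here to mean a plain lexicographic sort. The actual OpenClaw resolution logic may differ.

```typescript
// Config accepts either "provider/model" or { primary, fallbacks }, as documented.
type MusicModelConfig = string | { primary?: string; fallbacks?: string[] };

// Hypothetical input shape for this sketch.
interface SelectionInput {
  toolModel?: string; // `model` parameter from the tool call
  config?: MusicModelConfig; // agents.defaults.musicGenerationModel
  defaultProvider?: string; // current default provider id
  authBackedDefaults: Record<string, string>; // provider id -> default model, auth-backed only
}

// Builds the ordered, de-duplicated list of "provider/model" refs to try.
function buildCandidates(input: SelectionInput): string[] {
  const candidates: string[] = [];
  const push = (ref?: string): void => {
    if (ref && !candidates.includes(ref)) candidates.push(ref);
  };

  push(input.toolModel); // 1. tool-call override
  if (typeof input.config === "string") {
    push(input.config); // 2. primary given as a plain string
  } else if (input.config) {
    push(input.config.primary); // 2. primary
    for (const ref of input.config.fallbacks ?? []) push(ref); // 3. fallbacks in order
  }

  // 4. auto-detection: default provider first, then the rest in id order (assumed sort)
  const ids = Object.keys(input.authBackedDefaults).sort();
  const def = input.defaultProvider;
  const ordered = def !== undefined && ids.includes(def)
    ? [def, ...ids.filter((id) => id !== def)]
    : ids;
  for (const id of ordered) push(`${id}/${input.authBackedDefaults[id]}`);
  return candidates;
}

// Tries each candidate in turn and collects per-attempt errors for the final message.
async function generateWithFailover<T>(
  candidates: string[],
  attempt: (modelRef: string) => Promise<T>,
): Promise<T> {
  const errors: string[] = [];
  for (const ref of candidates) {
    try {
      return await attempt(ref);
    } catch (err) {
      errors.push(`${ref}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All music providers failed:\n${errors.join("\n")}`);
}
```

With the configuration shown under "Model selection" and no tool-call override, this sketch would try `google/lyria-3-clip-preview` first and `minimax/music-2.5+` second.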
|
||||
|
||||
## Provider notes
|
||||
|
||||
- Google uses Lyria 3 batch generation. The current bundled flow supports
|
||||
prompt, optional lyrics text, and optional reference images.
|
||||
- MiniMax uses the batch `music_generation` endpoint. The current bundled flow
|
||||
supports prompt, optional lyrics, instrumental mode, duration steering, and
|
||||
mp3 output.
|
||||
- ComfyUI support is workflow-driven and depends on the configured graph plus
|
||||
node mapping for prompt/output fields.
|
||||
|
||||
## Live tests
|
||||
|
||||
Opt-in live coverage for the shared bundled providers:
|
||||
|
||||
```bash
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
```
|
||||
|
||||
Opt-in live coverage for the bundled ComfyUI music path:
|
||||
|
||||
|
|
@ -50,10 +175,15 @@ Opt-in live coverage for the bundled ComfyUI music path:
|
|||
OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts
|
||||
```
|
||||
|
||||
The live file also covers comfy image and video workflows when those sections
|
||||
are configured.
|
||||
The Comfy live file also covers comfy image and video workflows when those
|
||||
sections are configured.
|
||||
|
||||
## Related
|
||||
|
||||
- [Background Tasks](/automation/tasks) - task tracking for detached `music_generate` runs
|
||||
- [Configuration Reference](/gateway/configuration-reference#agent-defaults) - `musicGenerationModel` config
|
||||
- [ComfyUI](/providers/comfy)
|
||||
- [Google (Gemini)](/providers/google)
|
||||
- [MiniMax](/providers/minimax)
|
||||
- [Models](/concepts/models) - model configuration and failover
|
||||
- [Tools Overview](/tools)
|
||||
|
|
|
|||
|
|
@ -319,6 +319,7 @@ Common registration methods:
|
|||
| `registerRealtimeVoiceProvider` | Duplex realtime voice |
|
||||
| `registerMediaUnderstandingProvider` | Image/audio analysis |
|
||||
| `registerImageGenerationProvider` | Image generation |
|
||||
| `registerMusicGenerationProvider` | Music generation |
|
||||
| `registerVideoGenerationProvider` | Video generation |
|
||||
| `registerWebFetchProvider` | Web fetch / scrape provider |
|
||||
| `registerWebSearchProvider` | Web search |
|
||||
|
|
|
|||