diff --git a/docs/logging.md b/docs/logging.md
index e54e16d9e12..1f34115fb90 100644
--- a/docs/logging.md
+++ b/docs/logging.md
@@ -9,13 +9,13 @@ title: "Logging Overview"
 # Logging
 
-OpenClaw logs in two places:
+OpenClaw has two main log surfaces:
 
 - **File logs** (JSON lines) written by the Gateway.
-- **Console output** shown in terminals and the Control UI.
+- **Console output** shown in terminals and the Gateway Debug UI.
 
-This page explains where logs live, how to read them, and how to configure log
-levels and formats.
+The Control UI **Logs** tab tails the gateway file log. This page explains where
+logs live, how to read them, and how to configure log levels and formats.
 
 ## Where logs live
 
@@ -45,6 +45,12 @@ Use the CLI to tail the gateway log file via RPC:
 openclaw logs --follow
 ```
 
+Useful current options:
+
+- `--local-time`: render timestamps in your local timezone
+- `--url <url>` / `--token <token>` / `--timeout <ms>`: standard Gateway RPC flags
+- `--expect-final`: agent-backed RPC final-response wait flag (accepted here via the shared client layer)
+
 Output modes:
 
 - **TTY sessions**: pretty, colorized, structured log lines.
@@ -53,6 +59,10 @@ Output modes:
 - `--plain`: force plain text in TTY sessions.
 - `--no-color`: disable ANSI colors.
 
+When you pass an explicit `--url`, the CLI does not auto-apply config or
+environment credentials; include `--token` yourself if the target Gateway
+requires auth.
+
 In JSON mode, the CLI emits `type`-tagged objects:
 
 - `meta`: stream metadata (file, cursor, size)
@@ -60,6 +70,10 @@ In JSON mode, the CLI emits `type`-tagged objects:
 - `notice`: truncation / rotation hints
 - `raw`: unparsed log line
 
+If the local loopback Gateway asks for pairing, `openclaw logs` falls back to
+the configured local log file automatically. Explicit `--url` targets do not
+use this fallback.
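The `type`-tagged objects above arrive as one JSON document per line, so they are easy to post-process with standard tools. A minimal sketch using `jq` (the sample payload fields are illustrative, not the exact schema; in practice the input would come from piping `openclaw logs --follow` in a non-TTY session):

```shell
# Keep only parsed log records and print their message field.
# The echoed sample line stands in for `openclaw logs --follow` output.
echo '{"type":"log","level":"info","msg":"gateway started"}' |
  jq -r 'select(.type == "log") | .msg'
```

The `select` drops `meta`, `notice`, and `raw` objects, which is usually what you want when scripting against the stream.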
+
 If the Gateway is unreachable, the CLI prints a short hint to run:
 
 ```bash
@@ -96,6 +110,23 @@ Console logs are **TTY-aware** and formatted for readability:
 
 Console formatting is controlled by `logging.consoleStyle`.
 
+### Gateway WebSocket logs
+
+`openclaw gateway` also has WebSocket protocol logging for RPC traffic:
+
+- normal mode: only interesting results (errors, parse errors, slow calls)
+- `--verbose`: all request/response traffic
+- `--ws-log auto|compact|full`: pick the verbose rendering style
+- `--compact`: alias for `--ws-log compact`
+
+Examples:
+
+```bash
+openclaw gateway
+openclaw gateway --verbose --ws-log compact
+openclaw gateway --verbose --ws-log full
+```
+
 ## Configuring logging
 
 All logging configuration lives under `logging` in `~/.openclaw/openclaw.json`.
@@ -120,7 +151,8 @@ All logging configuration lives under `logging` in `~/.openclaw/openclaw.json`.
 You can override both via the **`OPENCLAW_LOG_LEVEL`** environment variable (e.g. `OPENCLAW_LOG_LEVEL=debug`).
 The env var takes precedence over the config file, so you can raise verbosity for a single run without editing `openclaw.json`.
 You can also pass the global CLI option **`--log-level <level>`** (for example, `openclaw --log-level debug gateway run`), which overrides the environment variable for that command.
 
-`--verbose` only affects console output; it does not change file log levels.
+`--verbose` only affects console output and WS log verbosity; it does not change
+file log levels.
 
 ### Console styles
 
diff --git a/docs/tts.md b/docs/tts.md
index 5fea8967ce2..e7f9a51ac04 100644
--- a/docs/tts.md
+++ b/docs/tts.md
@@ -9,13 +9,14 @@ title: "Text-to-Speech (legacy path)"
 # Text-to-speech (TTS)
 
-OpenClaw can convert outbound replies into audio using ElevenLabs, Microsoft, or OpenAI.
+OpenClaw can convert outbound replies into audio using ElevenLabs, Microsoft, MiniMax, or OpenAI.
 It works anywhere OpenClaw can send audio.
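As a starting point, TTS is enabled under `messages.tts` in the Gateway config. A minimal sketch (the keys mirror the fuller provider examples later on this page; the provider id is just one of the supported values):

```json5
{
  messages: {
    tts: {
      auto: "always",     // speak final replies automatically
      provider: "openai", // or "elevenlabs", "microsoft", "minimax"
    },
  },
}
```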
 ## Supported services
 
 - **ElevenLabs** (primary or fallback provider)
 - **Microsoft** (primary or fallback provider; current bundled implementation uses `node-edge-tts`)
+- **MiniMax** (primary or fallback provider; uses the T2A v2 API)
 - **OpenAI** (primary or fallback provider; also used for summaries)
 
 ### Microsoft speech notes
 
@@ -33,9 +34,10 @@ or ElevenLabs.
 
 ## Optional keys
 
-If you want OpenAI or ElevenLabs:
+If you want OpenAI, ElevenLabs, or MiniMax:
 
 - `ELEVENLABS_API_KEY` (or `XI_API_KEY`)
+- `MINIMAX_API_KEY`
 - `OPENAI_API_KEY`
 
 Microsoft speech does **not** require an API key.
@@ -50,6 +52,7 @@ so that provider must also be authenticated if you enable summaries.
 
 - [OpenAI Audio API reference](https://platform.openai.com/docs/api-reference/audio)
 - [ElevenLabs Text to Speech](https://elevenlabs.io/docs/api-reference/text-to-speech)
 - [ElevenLabs Authentication](https://elevenlabs.io/docs/api-reference/authentication)
+- [MiniMax T2A v2 API](https://platform.minimaxi.com/document/T2A%20V2)
 - [node-edge-tts](https://github.com/SchneeHertz/node-edge-tts)
 - [Microsoft Speech output formats](https://learn.microsoft.com/azure/ai-services/speech-service/rest-text-to-speech#audio-outputs)
 
@@ -143,6 +146,30 @@ Full schema is in [Gateway configuration](/gateway/configuration).
 }
 ```
 
+### MiniMax primary
+
+```json5
+{
+  messages: {
+    tts: {
+      auto: "always",
+      provider: "minimax",
+      providers: {
+        minimax: {
+          apiKey: "minimax_api_key",
+          baseUrl: "https://api.minimax.io",
+          model: "speech-2.8-hd",
+          voiceId: "English_expressive_narrator",
+          speed: 1.0,
+          vol: 1.0,
+          pitch: 0,
+        },
+      },
+    },
+  },
+}
+```
+
 ### Disable Microsoft speech
 
 ```json5
 {
@@ -211,7 +238,7 @@ Then run:
 
 - `tagged` only sends audio when the reply includes `[[tts]]` tags.
 - `enabled`: legacy toggle (doctor migrates this to `auto`).
 - `mode`: `"final"` (default) or `"all"` (includes tool/block replies).
-- `provider`: speech provider id such as `"elevenlabs"`, `"microsoft"`, or `"openai"` (fallback is automatic).
+- `provider`: speech provider id such as `"elevenlabs"`, `"microsoft"`, `"minimax"`, or `"openai"` (fallback is automatic).
 - If `provider` is **unset**, OpenClaw uses the first configured speech provider in registry auto-select order.
 - Legacy `provider: "edge"` still works and is normalized to `microsoft`.
 - `summaryModel`: optional cheap model for auto-summary; defaults to `agents.defaults.model.primary`.
@@ -223,7 +250,7 @@ Then run:
 
 - `maxTextLength`: hard cap for TTS input (chars). `/tts audio` fails if exceeded.
 - `timeoutMs`: request timeout (ms).
 - `prefsPath`: override the local prefs JSON path (provider/limit/summary).
-- `apiKey` values fall back to env vars (`ELEVENLABS_API_KEY`/`XI_API_KEY`, `OPENAI_API_KEY`).
+- `apiKey` values fall back to env vars (`ELEVENLABS_API_KEY`/`XI_API_KEY`, `MINIMAX_API_KEY`, `OPENAI_API_KEY`).
 - `providers.elevenlabs.baseUrl`: override ElevenLabs API base URL.
 - `providers.openai.baseUrl`: override the OpenAI TTS endpoint.
   - Resolution order: `messages.tts.providers.openai.baseUrl` -> `OPENAI_TTS_BASE_URL` -> `https://api.openai.com/v1`
@@ -235,6 +262,12 @@ Then run:
 
 - `providers.elevenlabs.applyTextNormalization`: `auto|on|off`
 - `providers.elevenlabs.languageCode`: 2-letter ISO 639-1 (e.g. `en`, `de`)
 - `providers.elevenlabs.seed`: integer `0..4294967295` (best-effort determinism)
+- `providers.minimax.baseUrl`: override MiniMax API base URL (default `https://api.minimax.io`, env: `MINIMAX_API_HOST`).
+- `providers.minimax.model`: TTS model (default `speech-2.8-hd`, env: `MINIMAX_TTS_MODEL`).
+- `providers.minimax.voiceId`: voice identifier (default `English_expressive_narrator`, env: `MINIMAX_TTS_VOICE_ID`).
+- `providers.minimax.speed`: playback speed `0.5..2.0` (default 1.0).
+- `providers.minimax.vol`: volume `(0, 10]` (default 1.0; must be greater than 0).
+- `providers.minimax.pitch`: pitch shift `-12..12` (default 0).
 - `providers.microsoft.enabled`: allow Microsoft speech usage (default `true`; no API key).
 - `providers.microsoft.voice`: Microsoft neural voice name (e.g. `en-US-MichelleNeural`).
 - `providers.microsoft.lang`: language code (e.g. `en-US`).
@@ -269,10 +302,12 @@ Here you go.
 
 Available directive keys (when enabled):
 
-- `provider` (registered speech provider id, for example `openai`, `elevenlabs`, or `microsoft`; requires `allowProvider: true`)
-- `voice` (OpenAI voice) or `voiceId` (ElevenLabs)
-- `model` (OpenAI TTS model or ElevenLabs model id)
+- `provider` (registered speech provider id, for example `openai`, `elevenlabs`, `minimax`, or `microsoft`; requires `allowProvider: true`)
+- `voice` (OpenAI voice) or `voiceId` (ElevenLabs / MiniMax)
+- `model` (OpenAI TTS model, ElevenLabs model id, or MiniMax model)
 - `stability`, `similarityBoost`, `style`, `speed`, `useSpeakerBoost`
+- `vol` / `volume` (MiniMax volume, 0-10)
+- `pitch` (MiniMax pitch, -12 to 12)
 - `applyTextNormalization` (`auto|on|off`)
 - `languageCode` (ISO 639-1)
 - `seed`
@@ -328,6 +363,7 @@ These override `messages.tts.*` for that host.
 
   - 48kHz / 64kbps is a good voice message tradeoff.
 - **Other channels**: MP3 (`mp3_44100_128` from ElevenLabs, `mp3` from OpenAI).
   - 44.1kHz / 128kbps is the default balance for speech clarity.
+- **MiniMax**: MP3 (`speech-2.8-hd` model, 32kHz sample rate). Voice-note format not natively supported; use OpenAI or ElevenLabs for guaranteed Opus voice messages.
 - **Microsoft**: uses `microsoft.outputFormat` (default `audio-24khz-48kbitrate-mono-mp3`).
   - The bundled transport accepts an `outputFormat`, but not all formats are available from the service.
   - Output format values follow Microsoft Speech output formats (including Ogg/WebM Opus).
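The MiniMax provider options documented above can also be supplied through their environment-variable equivalents (names taken from the option list; the API key value is a placeholder):

```shell
# Environment fallbacks for the providers.minimax.* settings.
export MINIMAX_API_KEY="minimax_api_key"
export MINIMAX_API_HOST="https://api.minimax.io"
export MINIMAX_TTS_MODEL="speech-2.8-hd"
export MINIMAX_TTS_VOICE_ID="English_expressive_narrator"
```

The option list describes these as fallbacks, so explicit values in `messages.tts.providers.minimax` should win when both are set.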