diff --git a/docs/gateway/doctor.md b/docs/gateway/doctor.md index 370f179065d..baf31dfda06 100644 --- a/docs/gateway/doctor.md +++ b/docs/gateway/doctor.md @@ -61,18 +61,22 @@ cat ~/.openclaw/openclaw.json - Optional pre-flight update for git installs (interactive only). - UI protocol freshness check (rebuilds Control UI when the protocol schema is newer). - Health check + restart prompt. -- Skills status summary (eligible/missing/blocked). +- Skills status summary (eligible/missing/blocked) and plugin status. - Config normalization for legacy values. - Browser migration checks for legacy Chrome extension configs and Chrome MCP readiness. - OpenCode provider override warnings (`models.providers.opencode` / `models.providers.opencode-go`). +- OAuth TLS prerequisites check for OpenAI Codex OAuth profiles. - Legacy on-disk state migration (sessions/agent dir/WhatsApp auth). +- Legacy plugin manifest contract key migration (`speechProviders`, `mediaUnderstandingProviders`, `imageGenerationProviders` → `contracts`). - Legacy cron store migration (`jobId`, `schedule.cron`, top-level delivery/payload fields, payload `provider`, simple `notify: true` webhook fallback jobs). +- Session lock file inspection and stale lock cleanup. - State integrity and permissions checks (sessions, transcripts, state dir). - Config file permission checks (chmod 600) when running locally. - Model auth health: checks OAuth expiry, can refresh expiring tokens, and reports auth-profile cooldown/disabled states. - Extra workspace dir detection (`~/openclaw`). - Sandbox image repair when sandboxing is enabled. - Legacy service migration and extra gateway detection. +- Matrix channel legacy state migration (in `--fix` / `--repair` mode). - Gateway runtime checks (service installed but not running; cached launchd label). - Channel status warnings (probed from the running gateway). - Supervisor config audit (launchd/systemd/schtasks) with optional repair. @@ -81,6 +85,9 @@ cat ~/.openclaw/openclaw.json - Security warnings for open DM policies. - Gateway auth checks for local token mode (offers token generation when no token source exists; does not overwrite token SecretRef configs). - systemd linger check on Linux. +- Workspace bootstrap file size check (truncation/near-limit warnings for context files). +- Shell completion status check and auto-install/upgrade. +- Memory search embedding provider readiness check (local model, remote API key, or QMD binary). - Source install checks (pnpm workspace mismatch, missing UI assets, missing tsx binary). - Writes updated config + wizard metadata. @@ -177,6 +184,16 @@ still requires: This check does **not** apply to Docker, sandbox, remote-browser, or other headless flows. Those continue to use raw CDP. +### 2d) OAuth TLS prerequisites + +When an OpenAI Codex OAuth profile is configured, doctor probes the OpenAI +authorization endpoint to verify that the local Node/OpenSSL TLS stack can +validate the certificate chain. If the probe fails with a certificate error (for +example `UNABLE_TO_GET_ISSUER_CERT_LOCALLY`, expired cert, or self-signed cert), +doctor prints platform-specific fix guidance. On macOS with a Homebrew Node, the +fix is usually `brew postinstall ca-certificates`. With `--deep`, the probe runs +even if the gateway is healthy. + ### 3) Legacy state migrations (disk layout) Doctor can migrate older on-disk layouts into the current structure: @@ -195,6 +212,14 @@ the legacy sessions + agent dir on startup so history/auth/models land in the per-agent path without a manual doctor run. WhatsApp auth is intentionally only migrated via `openclaw doctor`. +### 3a) Legacy plugin manifest migrations + +Doctor scans all installed plugin manifests for deprecated top-level capability keys +(`speechProviders`, `mediaUnderstandingProviders`, `imageGenerationProviders`). +When found, it offers to move them into the `contracts` object and rewrite the manifest +file in-place. This migration is idempotent; if the `contracts` key already has the +same values, the legacy key is removed without duplicating the data. + ### 3b) Legacy cron store migrations Doctor also checks the cron job store (`~/.openclaw/cron/jobs.json` by default, @@ -214,6 +239,15 @@ Doctor only auto-migrates `notify: true` jobs when it can do so without changing behavior. If a job combines legacy notify fallback with an existing non-webhook delivery mode, doctor warns and leaves that job for manual review. +### 3c) Session lock cleanup + +Doctor scans every agent session directory for stale write-lock files — files left +behind when a session exited abnormally. For each lock file found it reports: +the path, PID, whether the PID is still alive, lock age, and whether it is +considered stale (dead PID or older than 30 minutes). In `--fix` / `--repair` +mode it removes stale lock files automatically; otherwise it prints a note and +instructs you to rerun with `--fix`. + ### 4) State integrity checks (session persistence, routing, and safety) The state directory is the operational brainstem. If it vanishes, you lose @@ -277,6 +311,15 @@ port. It can also scan for extra gateway-like services and print cleanup hints. Profile-named OpenClaw gateway services are considered first-class and are not flagged as "extra." +### 8b) Startup Matrix migration + +When a Matrix channel account has a pending or actionable legacy state migration, +doctor (in `--fix` / `--repair` mode) creates a pre-migration snapshot and then +runs the best-effort migration steps: legacy Matrix state migration and legacy +encrypted-state preparation. Both steps are non-fatal; errors are logged and +startup continues. In read-only mode (`openclaw doctor` without `--fix`) this check +is skipped entirely. + ### 9) Security warnings Doctor emits warnings when a provider is open to DMs without an allowlist, or @@ -287,10 +330,44 @@ when a policy is configured in a dangerous way. If running as a systemd user service, doctor ensures lingering is enabled so the gateway stays alive after logout. -### 11) Skills status +### 11) Workspace status (skills, plugins, and legacy dirs) -Doctor prints a quick summary of eligible/missing/blocked skills for the current -workspace. +Doctor prints a summary of the workspace state for the default agent: + +- **Skills status**: counts eligible, missing-requirements, and allowlist-blocked skills. +- **Legacy workspace dirs**: warns when `~/openclaw` or other legacy workspace directories + exist alongside the current workspace. +- **Plugin status**: counts loaded/disabled/errored plugins; lists plugin IDs for any + errors; reports bundle plugin capabilities. +- **Plugin compatibility warnings**: flags plugins that have compatibility issues with + the current runtime. +- **Plugin diagnostics**: surfaces any load-time warnings or errors emitted by the + plugin registry. + +### 11b) Bootstrap file size + +Doctor checks whether workspace bootstrap files (for example `AGENTS.md`, +`CLAUDE.md`, or other injected context files) are near or over the configured +character budget. It reports per-file raw vs. injected character counts, truncation +percentage, truncation cause (`max/file` or `max/total`), and total injected +characters as a fraction of the total budget. When files are truncated or near +the limit, doctor prints tips for tuning `agents.defaults.bootstrapMaxChars` +and `agents.defaults.bootstrapTotalMaxChars`. + +### 11c) Shell completion + +Doctor checks whether tab completion is installed for the current shell +(zsh, bash, fish, or PowerShell): + +- If the shell profile uses a slow dynamic completion pattern + (`source <(openclaw completion ...)`), doctor upgrades it to the faster + cached file variant. +- If completion is configured in the profile but the cache file is missing, + doctor regenerates the cache automatically. +- If no completion is configured at all, doctor prompts to install it + (interactive mode only; skipped with `--non-interactive`). + +Run `openclaw completion --write-state` to regenerate the cache manually. ### 12) Gateway auth checks (local token) @@ -313,6 +390,26 @@ Some repair flows need to inspect configured credentials without weakening runti Doctor runs a health check and offers to restart the gateway when it looks unhealthy. +### 13b) Memory search readiness + +Doctor checks whether the configured memory search embedding provider is ready +for the default agent. The behavior depends on the configured backend and provider: + +- **QMD backend**: probes whether the `qmd` binary is available and startable. + If not, prints fix guidance including the npm package and a manual binary path option. +- **Explicit local provider**: checks for a local model file or a recognized + remote/downloadable model URL. If missing, suggests switching to a remote provider. +- **Explicit remote provider** (`openai`, `voyage`, etc.): verifies an API key is + present in the environment or auth store. Prints actionable fix hints if missing. +- **Auto provider**: checks local model availability first, then tries each remote + provider in auto-selection order. + +When a gateway probe result is available (gateway was healthy at the time of the +check), doctor cross-references its result with the CLI-visible config and notes +any discrepancy. + +Use `openclaw memory status --deep` to verify embedding readiness at runtime. + ### 14) Channel status warnings If the gateway is healthy, doctor runs a channel status probe and reports