From 89414ed8575c4d99a2638bd5fbed80526fa20efd Mon Sep 17 00:00:00 2001
From: Gustavo Madeira Santana
Date: Sun, 15 Mar 2026 11:27:12 +0000
Subject: [PATCH] Docs: track extension host migration internally

---
 ...capability-catalog-and-arbitration-spec.md |  525 ++++++
 ...claw-extension-contribution-schema-spec.md |  867 +++++++++
 ...law-extension-host-implementation-guide.md |  500 +++++
 ...ension-host-lifecycle-and-security-spec.md |  635 +++++++
 .../openclaw-kernel-event-pipeline-spec.md    |  781 ++++++++
 ...w-kernel-extension-host-transition-plan.md | 1663 +++++++++++++++++
 6 files changed, 4971 insertions(+)
 create mode 100644 docs/.internal/extension-host-migration/openclaw-capability-catalog-and-arbitration-spec.md
 create mode 100644 docs/.internal/extension-host-migration/openclaw-extension-contribution-schema-spec.md
 create mode 100644 docs/.internal/extension-host-migration/openclaw-extension-host-implementation-guide.md
 create mode 100644 docs/.internal/extension-host-migration/openclaw-extension-host-lifecycle-and-security-spec.md
 create mode 100644 docs/.internal/extension-host-migration/openclaw-kernel-event-pipeline-spec.md
 create mode 100644 docs/.internal/extension-host-migration/openclaw-kernel-extension-host-transition-plan.md

diff --git a/docs/.internal/extension-host-migration/openclaw-capability-catalog-and-arbitration-spec.md b/docs/.internal/extension-host-migration/openclaw-capability-catalog-and-arbitration-spec.md
new file mode 100644
index 00000000000..659e80e6fb0
--- /dev/null
+++ b/docs/.internal/extension-host-migration/openclaw-capability-catalog-and-arbitration-spec.md
@@ -0,0 +1,525 @@
+Temporary internal migration note: remove this document once the extension-host migration is complete.
+
+# OpenClaw Capability Catalog And Arbitration Spec
+
+Date: 2026-03-15
+
+## Purpose
+
+This document defines how the system compiles agent-visible, operator-visible, and runtime-internal catalogs from active contributions and how it resolves conflicting or parallel providers.
+
+The kernel should expose canonical actions, not raw plugin identities.
+
+Host-managed install, onboarding, and lightweight channel catalogs remain separate from the kernel capability catalog.
+
+## TODOs
+
+- [ ] Implement kernel-owned internal and agent-visible catalogs.
+- [ ] Implement host-owned operator catalogs and static setup catalogs.
+- [ ] Implement canonical action registration and review workflow in code.
+- [ ] Implement arbitration and conflict handling for at least one multi-provider family.
+- [ ] Migrate the existing tool, provider, setup, and slot-selection surfaces so they no longer act as parallel catalog or arbitration systems.
+- [ ] Record pilot parity for `thread-ownership` first and `telegram` second before broader catalog publication.
+- [ ] Track which current `main` actions have been mapped into canonical action ids.
+
+## Implementation Status
+
+Current status against this spec:
+
+- canonical catalogs and arbitration have not started
+- only the earliest host-managed static metadata work has landed
+
+What has been implemented:
+
+- an initial Phase 0 cutover inventory now exists in `src/extension-host/cutover-inventory.md`
+- channel catalog package metadata parsing now routes through host-owned schema helpers
+- host-owned resolved-extension records now carry the static metadata needed for install, onboarding, and lightweight operator UX
+- config doc baseline generation now uses the same host-owned resolved-extension metadata path
+- plugin SDK alias resolution now routes through `src/extension-host/loader-compat.ts`
+- loader provenance, duplicate-order, and warning policy now route through `src/extension-host/loader-policy.ts`
+- loader module-export resolution, config validation, and memory-slot load decisions now route through `src/extension-host/loader-runtime.ts`
+- loader record-state transitions now route through `src/extension-host/loader-state.ts`
+- channel, provider, gateway-method, tool, CLI, service, command, context-engine, and hook registration normalization now has a host-owned helper boundary for future catalog migration
+
+How it has been implemented:
+
+- by moving package metadata parsing behind `src/extension-host/schema.ts`
+- by keeping the existing catalog behavior intact while shifting metadata ownership into normalized host-owned records
+- by reusing the resolved-extension registry for static operator/documentation surfaces instead of creating separate metadata caches
+- by beginning runtime registration migration with host-owned normalization helpers before attempting full canonical catalog publication
+- by beginning loader-path migration with host-owned compatibility, policy, runtime, and record-state helpers before attempting canonical catalog publication
+
+What remains pending:
+
+- canonical capability ids
+- runtime-derived kernel catalogs
+- host-owned operator catalogs beyond the existing lightweight static paths
+- arbitration modes and selection logic
+- tool/provider/slot migration into one canonical catalog and arbitration model
+
+## Goals
+
+- agents see a stable, context-aware catalog of what they can do
+- multiple active providers for the same functional area are supported
+- collisions are detected and resolved deterministically
+- operator commands and runtime backends stay separate from agent tools
+- the catalog covers the broader current action surface, not only send and reply
+- slot-backed providers such as context engines are selected explicitly
+- setup and install metadata stay in host-managed catalogs instead of leaking into runtime catalogs
+
+## Migration Framing
+
+This spec replaces existing partial catalog and arbitration behavior already present on `main`.
+
+It is not a standalone greenfield system.
+
+Current behavior already exists in at least these places:
+
+- agent-visible plugin tool grouping in `src/gateway/server-methods/tools-catalog.ts:71`
+- provider auth and setup selection in `src/commands/auth-choice.apply.plugin-provider.ts:106`
+- slot selection in `src/plugins/slots.ts:39`
+- channel picker and onboarding metadata in `src/channels/plugins/catalog.ts:26`
+
+Implementation rule:
+
+- Phase 5 and Phase 6 are only complete when those legacy paths have been absorbed into the canonical or host-owned catalog model rather than left as a second source of truth
+
+## Catalog Types
+
+The system should maintain separate catalogs for:
+
+- agent-visible capabilities
+- operator-visible capabilities
+- runtime-internal providers
+
+These catalogs may draw from the same contributions but have different visibility and arbitration rules.
+
+Ownership split:
+
+- the kernel publishes runtime-derived internal and agent-visible catalogs
+- the extension host publishes operator-visible catalogs, including host-only surfaces and any runtime-derived entries the operator surface needs
+
+## Host-Managed Setup And Install Catalogs
+
+Current `main` also has host-managed metadata that is not a kernel capability catalog:
+
+- install metadata from `src/plugins/install.ts:48`
+- channel picker and onboarding metadata from `src/channels/plugins/catalog.ts:26`
+- lightweight shared channel behavior from `src/channels/dock.ts:228`
+
+The extension host should keep publishing these static catalogs for setup and operator UX.
+
+They should not be folded into the agent capability catalog.
+
+This host-managed layer should also publish:
+
+- local operator CLI commands from `surface.cli`
+- setup and onboarding flows from `surface.setup`
+- static channel picker metadata and lightweight dock-derived operator hints without activating heavy runtimes
+
+Sequencing rule:
+
+- these host-managed static catalogs should migrate before broad runtime catalog publication because they depend on static metadata, not heavy activation
+
+## Canonical Capability Model
+
+Each catalog entry should contain:
+
+- `capabilityId`
+- `kind`
+- `canonicalAction`
+- `displayName`
+- `description`
+- `providerKey`
+- `scope`
+- `availability`
+- `requiresSelection`
+- `inputSchema`
+- `outputSchema`
+- `policy`
+- `telemetryTags`
+
+### `capabilityId`
+
+Stable runtime id for the contribution-backed capability.
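The entry fields listed under Canonical Capability Model could be typed roughly as follows. This is a minimal sketch, not the shipped type: `CatalogEntry`, `JsonSchema`, `groupByCanonicalAction`, `sampleEntry`, the shapes chosen for `scope`, `availability`, and `policy`, and the `kind` value used for a messaging entry are all assumptions made for illustration. Only the field names, the `message.send` action id, and the example provider keys come from this spec.

```typescript
// Hypothetical typing of one catalog entry. Only the field names come from
// the spec; every concrete type below is an assumption for illustration.
type JsonSchema = Record<string, unknown>;

interface CatalogEntry {
  capabilityId: string; // stable runtime id
  kind: string; // contribution family (assumed to be a plain string here)
  canonicalAction: string; // e.g. "message.send"
  displayName: string;
  description: string;
  providerKey: string; // e.g. "messaging:slack:work"
  scope: Record<string, string>;
  availability: "available" | "unavailable";
  requiresSelection: boolean;
  inputSchema: JsonSchema;
  outputSchema: JsonSchema;
  policy: Record<string, unknown>;
  telemetryTags: string[];
}

// Group entries by canonical action so several providers can sit behind one
// action family instead of surfacing as unrelated tools.
function groupByCanonicalAction(entries: CatalogEntry[]): Map<string, CatalogEntry[]> {
  const groups = new Map<string, CatalogEntry[]>();
  for (const entry of entries) {
    const bucket = groups.get(entry.canonicalAction) ?? [];
    bucket.push(entry);
    groups.set(entry.canonicalAction, bucket);
  }
  return groups;
}

// Builds an illustrative `message.send` entry; all values are placeholders.
function sampleEntry(capabilityId: string, providerKey: string): CatalogEntry {
  return {
    capabilityId,
    kind: "adapter.runtime",
    canonicalAction: "message.send",
    displayName: "Send message",
    description: "Send a message to a conversation",
    providerKey,
    scope: {},
    availability: "available",
    requiresSelection: false,
    inputSchema: {},
    outputSchema: {},
    policy: {},
    telemetryTags: [],
  };
}

const grouped = groupByCanonicalAction([
  sampleEntry("example.slack/send", "messaging:slack:work"),
  sampleEntry("example.telegram/send", "messaging:telegram:personal"),
]);
```

Grouping by `canonicalAction` rather than by provider mirrors the rule that the kernel exposes canonical actions, not raw plugin identities.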
+
+### `canonicalAction`
+
+A stable action family such as:
+
+- `message.send`
+- `message.reply`
+- `directory.lookup`
+- `provider.authenticate`
+- `provider.configure`
+- `memory.search`
+- `memory.store`
+- `message.broadcast`
+- `message.poll`
+- `message.react`
+- `message.edit`
+- `message.delete`
+- `message.pin`
+- `message.thread.manage`
+- `voice.call.start`
+- `diff.render`
+
+The agent planner reasons over canonical actions first.
+
+Governance decision:
+
+- canonical action ids are open, namespaced strings
+- core action families should still live in one source-of-truth registry in code
+- if a new capability fits an existing family, reuse it
+- if semantics are new, add a reviewed canonical action id to that registry
+- contributions may not define new arbitration modes or planner semantics outside the core catalog and arbitration schema
+
+### `providerKey`
+
+Identifies the concrete provider instance behind the action.
+
+Examples:
+
+- `messaging:slack:work`
+- `messaging:telegram:personal`
+- `memory:lancedb:default`
+- `runtime-backend:acp:acpx`
+
+## Visibility Rules
+
+### Agent-visible
+
+Used for agent planning and tool calling.
+
+Includes:
+
+- agent tools
+- channel messaging actions such as send, reply, broadcast, poll, react, edit, delete, pin, and thread actions when available in context
+- memory actions when policy allows them
+- voice or telephony actions
+- selected interaction or workflow actions
+
+### Operator-visible
+
+Used for admin, control, setup, CLI, and diagnostic surfaces.
+
+Includes:
+
+- control commands
+- setup flows
+- provider integration and auth flows
+- status surfaces
+- CLI commands
+
+Important distinction:
+
+- `capability.control-command` is for chat or native commands that bypass the model
+- `surface.cli` and `surface.setup` are host-managed local operator surfaces and are not kernel runtime capabilities
+
+Operator-visible control-command surfaces should preserve current command metadata such as:
+
+- whether the command accepts arguments
+- provider-specific native command names when a provider supports native slash or menu registration
+
+### Runtime-internal
+
+Not shown to agents or operators as catalog actions.
+
+Includes:
+
+- runtime backends
+- context engines
+- pure event observers
+- route augmenters
+
+## Conflict Classes
+
+The host must resolve different conflict types differently.
+
+### 1. Runtime id conflict
+
+Fatal during validation.
+
+### 2. Canonical action overlap
+
+Multiple providers implement the same action family.
+
+This is expected for messaging, auth, or directory.
+
+### 3. Planner-visible name collision
+
+Two agent-visible capabilities want the same public name.
+
+This must be resolved before catalog publication.
+
+### 4. Singleton slot conflict
+
+Two contributions claim a slot that is intentionally exclusive.
+
+Examples:
+
+- default memory backend
+- default context engine
+
+### 5. Route surface conflict
+
+Two contributions require the same target or routing ownership semantics.
+
+### 6. Backend selector conflict
+
+Two runtime backends claim the same selector with incompatible exclusivity.
+
+## Arbitration Modes
+
+### `exclusive`
+
+Exactly one active provider may exist for the slot.
+
+Examples:
+
+- one default context engine
+- one default memory store, unless the operator opts into parallel memory providers
+
+### `ranked`
+
+Many providers may exist, but one default is chosen by rank.
+
+Examples:
+
+- multiple auth methods for one provider
+- multiple backends for the same subsystem
+
+### `parallel`
+
+Many providers may remain simultaneously available.
+
+Examples:
+
+- Slack, Discord, and Telegram messaging providers for the same agent
+- multiple directory sources
+
+### `composed`
+
+Many providers contribute to a single pipeline.
+
+Examples:
+
+- context augmentation
+- prompt guidance
+- telemetry enrichment
+
+## Agent Catalog Compilation
+
+The kernel compiles the agent-visible catalog from:
+
+- active contributions
+- current workspace
+- current agent
+- active session bindings
+- route and account context
+- current adapter action support
+- policy restrictions
+- contribution visibility rules
+
+Catalog compilation is context-sensitive.
+
+The same agent may see different capability sets in:
+
+- Slack thread context
+- Telegram DM context
+- voice call context
+- local CLI session
+
+First-cut migration targets:
+
+- plugin tools currently exposed by plugin grouping
+- messaging actions for the first channel pilot
+- route-affecting behaviors that influence whether an action is available at all
+
+## Capability Selection Rules
+
+When the agent or runtime needs one provider for a canonical action, selection should use this order:
+
+1. explicit target or provider selector
+2. explicit session binding
+3. current conversation or thread route binding
+4. current adapter or account capability support
+5. policy-forced default
+6. ranked default provider
+7. deterministic fallback by extension id and contribution id
+
+This is especially important for `message.send` and `message.reply`.
+
+## Messaging Example
+
+One agent may have:
+
+- Discord adapter on work account
+- Slack adapter on work account
+- Telegram adapter on personal account
+
+The agent should not see three unrelated tools named “send message”.
+
+Instead it should see canonical action families, with provider resolution handled by:
+
+- current conversation route
+- current session binding
+- explicit target selector when needed
+
+Examples:
+
+- `message.send`
+- `message.reply`
+- `message.broadcast`
+- `message.poll`
+- `message.react`
+
+If disambiguation is required, the planner or runtime can use structured selectors such as:
+
+- target channel kind
+- account id
+- conversation ref
+
+## Agent Naming Rules
+
+Agent-visible names must be stable and minimally ambiguous.
+
+Rules:
+
+- canonical names belong to action families
+- provider labels are attached only when needed for disambiguation
+- aliases do not create additional planner-visible tools unless explicitly requested
+- the host rejects duplicate planner-visible names when the runtime cannot disambiguate them
+
+This avoids exposing raw extension names unless necessary.
+
+## Operator Command Separation
+
+Control commands are not agent tools.
+
+Examples today:
+
+- `src/plugins/commands.ts:1`
+- `extensions/phone-control/index.ts:330`
+
+They belong only in operator catalogs and control surfaces.
+
+## Provider Integration Selection
+
+Provider integration flows should be modeled as operator-visible capabilities, not agent-visible tools.
+
+Selection rules:
+
+- provider id first
+- method id second
+- rank or policy third
+
+Multiple auth methods for one provider may coexist.
+
+The selected provider integration may also contribute:
+
+- discovery order
+- onboarding metadata
+- token refresh behavior
+- model-selected hooks
+
+## Memory Arbitration
+
+Memory needs both backend arbitration and agent action arbitration.
+
+### Backend arbitration
+
+Usually `exclusive` or `ranked`.
+
+### Agent action arbitration
+
+May still expose:
+
+- `memory.search`
+- `memory.store`
+
+If parallel memory providers are enabled, the planner should either target the default store or use explicit selectors.
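The "default store or explicit selector" rule for parallel memory providers could be sketched as below. This is an illustration under stated assumptions, not the planned implementation: `MemoryProvider`, `MemorySelectionError`, `resolveMemoryProvider`, the numeric `rank` field, the "higher rank wins" convention, and the `memory:example:secondary` key are hypothetical. The spec's deterministic fallback uses extension id and contribution id; this sketch substitutes `providerKey` ordering to stay self-contained.

```typescript
// Hypothetical shapes: the spec names the arbitration modes, not these types.
interface MemoryProvider {
  providerKey: string; // e.g. "memory:lancedb:default"
  rank: number; // assumption: a higher rank wins the default slot
}

class MemorySelectionError extends Error {}

function resolveMemoryProvider(
  providers: MemoryProvider[],
  explicitProviderKey?: string,
): MemoryProvider {
  // An explicit selector always wins; an unknown selector is an error rather
  // than a silent fallback to some other store.
  if (explicitProviderKey !== undefined) {
    const match = providers.find((p) => p.providerKey === explicitProviderKey);
    if (!match) {
      throw new MemorySelectionError(`unknown memory provider: ${explicitProviderKey}`);
    }
    return match;
  }
  // Otherwise pick the ranked default, breaking ties by key so that the
  // result does not depend on registration order.
  const sorted = [...providers].sort((a, b) => {
    if (a.rank !== b.rank) return b.rank - a.rank;
    if (a.providerKey < b.providerKey) return -1;
    return a.providerKey > b.providerKey ? 1 : 0;
  });
  const first = sorted[0];
  if (!first) throw new MemorySelectionError("no memory provider is active");
  return first;
}

const memoryProviders: MemoryProvider[] = [
  { providerKey: "memory:example:secondary", rank: 5 },
  { providerKey: "memory:lancedb:default", rank: 10 },
];
const defaultStore = resolveMemoryProvider(memoryProviders);
const explicitStore = resolveMemoryProvider(memoryProviders, "memory:example:secondary");
```

Throwing on an unknown selector keeps a planner typo from quietly writing to the wrong store.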
+
+## Context Engine Arbitration
+
+Context engines are runtime-internal providers selected through an explicit exclusive slot.
+
+Selection rules:
+
+- explicit configured engine id wins
+- otherwise use the slot default
+- if the selected engine is unavailable, fail with a typed configuration error rather than silently picking an arbitrary fallback
+
+## Runtime Backend Arbitration
+
+Runtime backends such as ACP are runtime-internal providers.
+
+Selection rules:
+
+- explicit backend id wins
+- otherwise use healthy highest-ranked backend
+- if a subsystem declares an exclusive slot, the host enforces it before kernel startup
+
+This is why `capability.runtime-backend` must be a first-class family.
+
+## Catalog Publication
+
+The kernel should publish:
+
+- a full internal catalog
+- a filtered agent catalog
+
+The extension host should publish:
+
+- a filtered operator catalog
+
+Publication should occur after:
+
+- dependency resolution
+- policy approval
+- contribution activation
+- route and account context binding
+
+Host-managed install and onboarding descriptors may move into host ownership earlier because they come from static metadata, not runtime activation.
+
+Full catalog publication, consolidation, and legacy-path replacement still belong to the catalog-migration phase.
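The publication split described in this section (kernel: full internal catalog plus filtered agent catalog; host: filtered operator catalog) could look roughly like the sketch below. It is a simplification under stated assumptions: `PublishedEntry`, `publishKernelCatalogs`, and `publishOperatorCatalog` are hypothetical names, the sample capability ids are invented, and the visibility values follow the contribution schema's `visibility` field (agents, operators, both, neither). Context-sensitive filtering by workspace, session, route, and policy is deliberately left out.

```typescript
// Assumed visibility values, taken from the contribution schema's
// `visibility` field.
type Visibility = "agents" | "operators" | "both" | "neither";

// Hypothetical minimal entry; a real catalog entry carries many more fields.
interface PublishedEntry {
  capabilityId: string;
  visibility: Visibility;
}

// Kernel side: the full internal catalog plus the filtered agent catalog.
function publishKernelCatalogs(entries: PublishedEntry[]): {
  internal: PublishedEntry[];
  agent: PublishedEntry[];
} {
  return {
    internal: [...entries],
    agent: entries.filter((e) => e.visibility === "agents" || e.visibility === "both"),
  };
}

// Host side: the filtered operator catalog.
function publishOperatorCatalog(entries: PublishedEntry[]): PublishedEntry[] {
  return entries.filter((e) => e.visibility === "operators" || e.visibility === "both");
}

// Invented sample ids, one per visibility class.
const activeEntries: PublishedEntry[] = [
  { capabilityId: "example.slack/message.send", visibility: "agents" },
  { capabilityId: "example.phone/control-command", visibility: "operators" },
  { capabilityId: "example.voice/call.start", visibility: "both" },
  { capabilityId: "example.acpx/runtime-backend", visibility: "neither" },
];

const kernelCatalogs = publishKernelCatalogs(activeEntries);
const operatorCatalog = publishOperatorCatalog(activeEntries);
```

Runtime-internal entries such as the backend stay in the internal catalog only, which matches the rule that they are not shown to agents or operators as catalog actions.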
+
+Performance requirement:
+
+- publishing host-managed setup and install catalogs must not require activating heavy adapter runtimes
+- publishing operator-visible static catalogs must preserve current dock-style cheap-path behavior, including prompt hints and shared formatting helpers where those are consumed without runtime activation
+
+## Telemetry And Auditing
+
+Capability selection must emit structured events for:
+
+- conflict detection
+- provider selection
+- fallback selection
+- planner-visible disambiguation
+- veto or cancellation caused by route augmenters
+- slot selection for context engines or other exclusive runtime providers
+
+## Migration Mapping From Today
+
+- channel capabilities from `extensions/discord/src/channel.ts:74`, `extensions/slack/src/channel.ts:107`, and `extensions/telegram/src/channel.ts:120` collapse into canonical messaging action families
+- diffs becomes an agent-visible tool family plus a host-managed route surface from `extensions/diffs/index.ts:27`
+- provider integration from `extensions/google-gemini-cli-auth/index.ts:24` becomes operator-visible setup and auth capabilities
+- voice-call from `extensions/voice-call/index.ts:230` becomes a mix of agent-visible actions, runtime providers, and operator surfaces
+- ACP backend registration from `extensions/acpx/src/service.ts:55` becomes runtime-internal backend arbitration
+- context-engine registration becomes runtime-internal slot arbitration from `src/context-engine/registry.ts:60`
+- native command registration remains an operator or transport surface concern rather than an agent-visible catalog concern
+
+## Immediate Implementation Work
+
+1. Add canonical action ids and provider keys to resolved contributions.
+2. Implement host-side conflict detection for planner-visible names and singleton slots.
+3. Implement kernel-side context-aware catalog compilation.
+4. Add host-managed static catalogs for install and onboarding metadata alongside the runtime catalogs.
+5. Migrate the existing plugin tool grouping path onto canonical agent catalog entries.
+6. Migrate the existing provider auth and setup selection path onto host-owned setup catalogs and canonical provider metadata.
+7. Add provider selection logic for the broader messaging action family before migrating all channels.
+8. Add runtime-backend and context-engine arbitration using the same rank and slot model where appropriate.
+9. Ensure lightweight setup catalogs can be built from static descriptors alone.
+10. Add a reviewed core registry for canonical action families and document how new ids are introduced.
+11. Record catalog and arbitration parity for `thread-ownership` first and `telegram` second before broader rollout.
diff --git a/docs/.internal/extension-host-migration/openclaw-extension-contribution-schema-spec.md b/docs/.internal/extension-host-migration/openclaw-extension-contribution-schema-spec.md
new file mode 100644
index 00000000000..7cc9fcdcd80
--- /dev/null
+++ b/docs/.internal/extension-host-migration/openclaw-extension-contribution-schema-spec.md
@@ -0,0 +1,867 @@
+Temporary internal migration note: remove this document once the extension-host migration is complete.
+
+# OpenClaw Extension Contribution Schema Spec
+
+Date: 2026-03-15
+
+## Purpose
+
+This document defines the concrete schema the extension host uses to convert extension packages into resolved runtime contributions for the kernel.
+
+The kernel must never parse plugin manifests or interpret package layout directly. It only receives validated contribution objects.
+
+## TODOs
+
+- [ ] Finalize TypeScript source-of-truth types for `ResolvedExtension`, `ResolvedContribution`, and `ContributionPolicy`.
+- [ ] Implement manifest and contribution validators from this schema.
+- [ ] Lock the static distribution metadata shape, including full channel catalog parity fields.
+- [ ] Lock the package metadata and static distribution parsing contract used by install, onboarding, status, and lightweight UX flows.
+- [ ] Lock the `surface.config`, `surface.setup`, and `capability.control-command` descriptor shapes.
+- [ ] Preserve minimal SDK compatibility loading while this schema replaces legacy runtime assumptions.
+- [ ] Record pilot parity and schema adjustments for `thread-ownership` first and `telegram` second.
+- [ ] Record any schema changes discovered during the first pilot migration.
+
+## Implementation Status
+
+Current status against this spec:
+
+- the initial source-of-truth types have landed in code, but they are not final
+- static normalization work has started
+- validators and explicit compatibility translation work have not landed
+
+What has been implemented:
+
+- an initial Phase 0 cutover inventory now exists in `src/extension-host/cutover-inventory.md`
+- `ResolvedExtension`, `ResolvedContribution`, and `ContributionPolicy` now exist in `src/extension-host/schema.ts`
+- a legacy-to-normalized adapter now builds `ResolvedExtension` records from current plugin manifests and package metadata
+- package metadata parsing for discovery, install, and channel catalog paths now routes through host-owned schema helpers
+- manifest-registry records now carry a normalized `resolvedExtension`
+- a host-owned resolved-extension registry now exists for static consumers
+- config doc baseline generation now reads bundled extension metadata through the resolved-extension registry
+- the first runtime registration normalization helpers now exist in `src/extension-host/runtime-registrations.ts` for channel, provider, HTTP-route, gateway-method, tool, CLI, service, command, context-engine, and hook writes
+- plugin SDK alias resolution now routes through `src/extension-host/loader-compat.ts`
+- loader provenance, duplicate-order, and warning policy now route through `src/extension-host/loader-policy.ts`
+- loader module-export resolution, config validation, and memory-slot load decisions now route through `src/extension-host/loader-runtime.ts`
+- loader record-state transitions now route through `src/extension-host/loader-state.ts`
+
+How it has been implemented:
+
+- by wrapping current manifest and package metadata rather than replacing the plugin loader outright
+- by introducing a compatibility-oriented `resolveLegacyExtensionDescriptor(...)` path first
+- by moving static metadata consumers onto the normalized model before attempting runtime contribution migration
+- by keeping legacy manifest records available only as compatibility projections while new readers move to the normalized shape
+- by starting runtime contribution migration with normalization helpers that preserve the legacy plugin API surface
+- by making the first loader compatibility and runtime decisions explicit host-owned helpers before introducing a versioned compatibility layer
+
+What remains pending:
+
+- final schema shape
+- manifest and contribution validators
+- explicit `surface.setup` and `capability.control-command` descriptor work
+- minimal SDK compatibility loading as an intentional, versioned compatibility layer rather than the current host-owned helper layering around the old loader path
+
+## Design Goals
+
+- one schema for bundled and external extensions
+- one contribution model for channels, auth, memory, tools, ACP, voice, diffs, and future extension types
+- explicit ids, scopes, dependencies, permissions, and arbitration metadata
+- lightweight static descriptors for install, onboarding, and shared UX paths
+- truthful permission semantics that do not imply sandboxing where none exists
+- preserve prompt-mutation policy and adapter UX descriptors that exist outside simple send and receive
+- enough structure for the host to detect conflicts before activation
+
+## Sequencing Constraints
+
+This schema must be introduced without breaking current extension loading.
+
+Therefore:
+
+- the first implementation cut must preserve current `openclaw/plugin-sdk/*` imports through compatibility loading
+- static distribution metadata must be modeled as first-class schema, not deferred until after runtime contribution migration
+- package-level metadata and manifest-level metadata must converge into one normalized `ResolvedExtension` model
+- the first pilots should be `thread-ownership` first and `telegram` second, because they validate different schema surfaces with limited extra migration noise
+
+## Runtime Boundary
+
+The package or bundle unit is an extension.
+
+The runtime unit is a contribution.
+
+One extension may emit many contributions. The extension host is responsible for:
+
+- loading the extension manifest
+- validating all contribution descriptors
+- rejecting or isolating invalid contributions
+- constructing resolved contribution objects for the kernel
+- preserving static host-owned descriptors used by install, onboarding, and status UX
+
+## Manifest Shape
+
+Recommended manifest shape:
+
+```json
+{
+  "id": "openclaw.discord",
+  "name": "Discord",
+  "version": "2026.3.0",
+  "apiVersion": "1.0",
+  "entry": "./index.ts",
+  "description": "Discord transport and interaction support",
+  "bundled": true,
+  "permissionMode": "advisory",
+  "tags": ["channel", "messaging"],
+  "dependencies": {
+    "requires": [],
+    "optional": []
+  },
+  "permissions": [
+    "runtime.adapter",
+    "network.outbound",
+    "credentials.read",
+    "credentials.write",
+    "http.route.gateway"
+  ],
+  "config": {
+    "schema": {},
+    "uiHints": {}
+  },
+  "distribution": {
+    "install": {
+      "entries": ["./dist/index.js"],
+      "npmSpec": "@openclaw/discord",
+      "defaultChoice": "npm"
+    },
+    "catalog": {
+      "channels": [
+        {
+          "id": "discord",
+          "label": "Discord",
+          "docsPath": "/channels/discord"
+        }
+      ]
+    }
+  },
+  "contributions": [
+    {
+      "id": "discord.adapter",
+      "kind": "adapter.runtime",
+      "title": "Discord messaging adapter"
+    }
+  ]
+}
+```
+
+## Required Top-Level Fields
+
+- `id`
+  Stable extension package id. Never reused for a different extension.
+- `name`
+  Human-facing name for operator surfaces.
+- `version`
+  Extension package version.
+- `apiVersion`
+  Extension-host contract version the package was built against.
+- `entry`
+  Entry module the host activates.
+- `distribution`
+  Static metadata for install, onboarding, config, status, and lightweight operator UX. The block may be empty, but the field family is part of the source-of-truth shape.
+- `contributions`
+  List of contribution descriptors emitted by the extension.
+
+## Recommended Top-Level Fields
+
+- `description`
+- `bundled`
+- `permissionMode`
+- `tags`
+- `dependencies`
+- `permissions`
+- `config.schema`
+- `config.uiHints`
+- `distribution`
+- `docs`
+- `homepage`
+- `support`
+
+`bundled` is host metadata only. The kernel must never receive or depend on it.
+
+Implementation rule:
+
+- `distribution`, config metadata, and package metadata must be parseable without activating the extension entry module
+
+## Permission Semantics
+
+`permissions` describe requested host-managed powers and operator risk.
+
+They do not automatically imply a hard runtime sandbox.
+
+Recommended top-level field:
+
+- `permissionMode`
+  - `advisory`
+  - `host-enforced`
+  - `sandbox-enforced`
+
+For the first extension-host cut, the default is `advisory` because extensions still run as trusted in-process code.
+
+## Contribution Descriptor
+
+Every contribution descriptor must contain:
+
+- `id`
+  Stable within the extension. The host resolves the runtime id as `<extension-id>/<contribution-id>`.
+- `kind`
+  Contribution family.
+- `title`
+  Human-facing label.
+
+Recommended common fields:
+
+- `description`
+- `aliases`
+- `tags`
+- `enabledByDefault`
+- `scope`
+- `arbitration`
+- `dependsOn`
+- `permissions`
+- `visibility`
+- `capabilities`
+- `selectors`
+- `priority`
+- `policy`
+
+## Common Contribution Fields
+
+### `scope`
+
+Describes where the contribution is valid.
+
+Supported scope fields:
+
+- `global`
+- `workspace`
+- `agent`
+- `account`
+- `channel`
+- `conversation`
+- `provider`
+
+Examples:
+
+- a Slack adapter contribution is typically scoped by `account` and `channel`
+- a memory backend is usually `workspace` or `agent`
+- a provider integration contribution is scoped by `provider`
+
+### `arbitration`
+
+Declares how the host and kernel should treat overlapping providers.
+
+Supported modes:
+
+- `exclusive`
+- `ranked`
+- `parallel`
+- `composed`
+
+Supported attributes:
+
+- `mode`
+- `defaultRank`
+- `singletonSlot`
+- `selectionKey`
+- `composeOrder`
+
+### `visibility`
+
+Declares whether the contribution is visible to:
+
+- agents
+- operators
+- both
+- neither
+
+This matters because many contributions are runtime-only and must never appear in the agent tool catalog.
+
+### `policy`
+
+Declares host-managed policy gates that are more specific than broad permissions.
+
+Examples:
+
+- prompt mutation allowed, constrained, or denied
+- fail-open versus fail-closed routing behavior
+- whether a contribution may run on sync transcript hot paths
+
+Decision for the first foundation cut:
+
+```ts
+type ContributionPolicy = {
+  promptMutation?: "none" | "append-only" | "replace-allowed";
+  routeEffect?: "observe-only" | "augment" | "veto" | "resolve";
+  failureMode?: "fail-open" | "fail-closed";
+  executionMode?: "parallel" | "sequential" | "sync-sequential";
+};
+```
+
+These fields should be typed, not left as arbitrary metadata.
+
+First-cut rule:
+
+- keep `policy` limited to parity-driving behaviors unless the pilot migrations prove a broader typed model is necessary
+
+### `dependsOn`
+
+Contribution-level dependencies.
+
+Supported dependency types:
+
+- `requires`
+- `optional`
+- `conflicts`
+- `supersedes`
+
+Dependencies reference contribution ids, not package names, because runtime behavior is contribution-driven.
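A host-side check over `dependsOn` might look like the sketch below. It is illustrative only: `ContributionDependencies`, `ContributionNode`, `checkDependencies`, and the sample runtime ids are hypothetical, and the sketch only enforces `requires` and `conflicts`. How `optional` and `supersedes` are treated is not specified here, so they are carried in the type but not evaluated.

```typescript
// Hypothetical shapes built from the four dependency types named in the spec.
interface ContributionDependencies {
  requires?: string[];
  optional?: string[];
  conflicts?: string[];
  supersedes?: string[];
}

interface ContributionNode {
  runtimeId: string; // assumed to be the resolved contribution runtime id
  dependsOn?: ContributionDependencies;
}

// Returns human-readable problems; an empty list means the active set is
// consistent with respect to `requires` and `conflicts`.
function checkDependencies(active: ContributionNode[]): string[] {
  const activeIds = new Set(active.map((c) => c.runtimeId));
  const problems: string[] = [];
  for (const contribution of active) {
    for (const required of contribution.dependsOn?.requires ?? []) {
      if (!activeIds.has(required)) {
        problems.push(`${contribution.runtimeId} requires missing ${required}`);
      }
    }
    for (const conflicting of contribution.dependsOn?.conflicts ?? []) {
      if (activeIds.has(conflicting)) {
        problems.push(`${contribution.runtimeId} conflicts with ${conflicting}`);
      }
    }
  }
  return problems;
}

// Invented runtime ids used only to exercise the check.
const dependencyProblems = checkDependencies([
  { runtimeId: "example.a/tool", dependsOn: { requires: ["example.b/backend"] } },
  { runtimeId: "example.c/memory", dependsOn: { conflicts: ["example.d/memory"] } },
  { runtimeId: "example.d/memory" },
]);
```

Keying the check on contribution runtime ids rather than package names follows the rule stated above that dependencies reference contribution ids.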
+
+## Contribution Families
+
+### Kernel-facing families
+
+- `adapter.runtime`
+- `capability.agent-tool`
+- `capability.control-command`
+- `capability.provider-integration`
+- `capability.memory`
+- `capability.context-engine`
+- `capability.context-augmenter`
+- `capability.event-handler`
+- `capability.route-augmenter`
+- `capability.interaction`
+- `capability.rpc`
+- `capability.runtime-backend`
+
+### Host-managed families
+
+- `service.background`
+- `surface.cli`
+- `surface.config`
+- `surface.status`
+- `surface.setup`
+- `surface.http-route`
+
+## Family Contracts
+
+### `adapter.runtime`
+
+Used for messaging transports and any ingress or egress runtime.
+
+Required runtime contract:
+
+- `startAccount(accountRef)`
+- `stopAccount(accountRef)`
+- `decodeIngress(rawEvent)`
+- `send(outboundEnvelope)`
+- `health()`
+
+Optional runtime contract:
+
+- `handleAction(actionRef, payload)`
+- `edit(outboundEnvelope)`
+- `delete(targetRef)`
+- `react(targetRef, reaction)`
+- `poll(targetRef, pollPayload)`
+- `fetchThread(threadRef)`
+- `fetchMessage(messageRef)`
+- `resolveDirectory(query)`
+- `openConversation(target)`
+
+Required descriptor metadata:
+
+- supported conversation kinds
+- identity scheme
+- account binding model
+- supported message action set
+- supported interaction classes
+- whether the adapter supports edits, deletes, reactions, polls, threads, buttons, cards, modals, moderation, or admin actions
+- lightweight dock metadata for shared code paths that must not load the heavy runtime
+- optional shared UX descriptors for typing, delivery feedback, reply context, history hints, and streaming behavior
+- optional reload descriptors for config-driven hot-restart or no-op behavior
+- optional gateway feature descriptors for method advertisement or transport-owned control surfaces
+
+Important distinction:
+
+- callable gateway or runtime methods belong in `capability.rpc`
+- adapter-level gateway feature descriptors are metadata only
+- those descriptors may advertise compatibility features, native transport affordances, or transport-owned control surfaces during migration, but they do not define a second callable RPC surface
+
+The dock metadata is host-only. It is the normalized replacement for the current lightweight channel dock behavior in `src/channels/dock.ts:228`.
+
+The lightweight dock contract should be specific enough to preserve current host-shared behavior from `main`, including:
+
+- command gating hints
+- allow-from formatting and default-target helpers
+- threading defaults and reply-context helpers
+- elevated allow-from fallback behavior
+- agent prompt hints such as `messageToolHints`
+
+### `capability.agent-tool`
+
+Represents an agent-visible action.
+
+Required descriptor metadata:
+
+- canonical action id
+- planner-visible name
+- input schema
+- output schema or result contract
+- visibility rules
+- targeting requirements
+
+### `capability.control-command`
+
+Represents operator-facing commands that bypass the agent.
+
+Examples today:
+
+- `extensions/phone-control/index.ts:330`
+- current plugin command registrations in `src/plugins/commands.ts:1`
+
+Required descriptor metadata:
+
+- command name
+- description
+- auth requirement
+- surface availability
+- whether the command accepts arguments
+- optional provider-specific native command names for native slash or menu surfaces
+
+Behavior rule:
+
+- if a command does not accept arguments and arguments are supplied, the host should treat that invocation as a non-match and allow normal built-in or agent handling to continue
+
+This preserves current behavior in `src/plugins/commands.ts:163`.
+
+These are not agent tools.
+
+### `capability.provider-integration`
+
+Represents provider discovery, setup, auth, and post-selection lifecycle for model providers.
+ +Required descriptor metadata: + +- provider id +- auth method ids +- auth kinds +- discovery order +- wizard or onboarding metadata +- credential outputs +- optional config patch outputs +- optional refresh contract +- optional model-selected lifecycle hooks + +This family exists because today's provider plugin contract includes more than auth, as shown in `src/plugins/types.ts:158`. + +### `capability.memory` + +Represents a memory store or memory query runtime. + +Required descriptor metadata: + +- store kind +- supported query modes +- write policy +- default arbitration mode + +### `capability.context-engine` + +Represents a context-engine factory selected through an exclusive slot. + +Required descriptor metadata: + +- engine id +- singleton slot id +- factory contract +- default rank +- config selector key + +### `capability.context-augmenter` + +Represents a contribution that enriches prompt, tool, or session context without taking routing ownership. + +Examples today: + +- `extensions/diffs/index.ts:38` +- auto-recall style prompt/context contributions in `extensions/memory-lancedb/index.ts:548` + +Recommended policy metadata: + +- `promptMutation` + - `none` + - `append-only` + - `replace-allowed` + +This preserves behavior currently gated by `plugins.entries..hooks.allowPromptInjection`. + +### `capability.event-handler` + +Represents observers or side-effect handlers on canonical kernel events. + +This family cannot mutate routing or veto delivery unless it is explicitly declared as `capability.route-augmenter`. + +Required descriptor metadata: + +- target event families +- handler class +- execution mode +- optional bridge source when the contribution originates from legacy hook or event systems + +### `capability.route-augmenter` + +Represents runtime handlers that can influence routing, binding, or egress decisions. 
+ +Examples today: + +- send veto behavior in `extensions/thread-ownership/index.ts:63` +- subagent delivery target selection in `extensions/discord/src/subagent-hooks.ts:103` + +Required descriptor metadata: + +- allowed decision classes +- target event families +- fail-open or fail-closed behavior +- whether the handler must run on a sync hot path + +### `capability.interaction` + +Represents canonical interaction handlers such as slash commands, buttons, form submissions, or modal actions. + +### `capability.rpc` + +Represents internal callable methods that are not agent tools. + +Examples today: + +- voice-call gateway methods in `extensions/voice-call/index.ts:230` + +This family is the only place callable gateway-style methods should live. + +If an adapter or transport wants to advertise that such methods exist, it may do so through metadata only, but the callable contract itself still belongs to `capability.rpc`. + +### `capability.runtime-backend` + +Represents a backend runtime provider used by another subsystem rather than directly by the agent. + +ACP is the reference example: + +- `extensions/acpx/src/service.ts:55` +- `src/acp/runtime/registry.ts:4` + +Required descriptor metadata: + +- backend class id +- selector key +- health probe contract +- default selection rank +- exclusivity or parallelism policy + +This family exists because not all runtime providers are user-facing adapters. + +### `service.background` + +Represents long-running extension-managed processes owned by the host. + +Examples today: + +- `extensions/acpx/index.ts:10` +- `extensions/voice-call/index.ts:510` +- `extensions/diagnostics-otel/index.ts:10` + +Required descriptor metadata: + +- state scope +- desired state subdirectory +- startup ordering +- optional health contract + +### `surface.http-route` + +Represents host-managed HTTP or webhook surfaces. 
+ +Examples today: + +- `extensions/diffs/index.ts:28` +- current plugin route registration in `src/plugins/http-registry.ts:12` + +Required descriptor metadata: + +- path +- auth mode +- match mode +- route owner id +- route class +- lifecycle mode (`static` or `dynamic`) +- scope metadata for account- or workspace-scoped routes + +### `surface.config`, `surface.status`, `surface.setup`, `surface.cli` + +These are operator surfaces, not kernel runtime behavior. + +They must remain host-managed. + +#### `surface.config` + +Represents extension-provided config schema and config UI metadata consumed by host config APIs and operator UIs. + +Required descriptor metadata: + +- config schema +- UI hints +- sensitivity metadata for secret-bearing fields +- redaction and restoration compatibility requirements for round-tripping edited config + +Important rule: + +- `uiHints.sensitive` is not cosmetic metadata only +- the host must preserve current redaction and restore semantics used by config read and write flows, as in `src/gateway/server-methods/config.ts:151` and `src/config/redact-snapshot.ts:349` + +#### `surface.cli` + +Represents local operator CLI commands and subcommands registered under host-owned command trees. + +Supported use cases: + +- standalone diagnostic or admin commands +- install or update helpers +- provider-specific local operator commands +- entrypoints into interactive setup flows + +Required descriptor metadata: + +- command path or command id +- short description +- invocation mode (`standalone`, `subcommand`, or `flow-entry`) +- whether the command is interactive, non-interactive, or both + +#### `surface.setup` + +Represents host-managed setup and onboarding flows owned by an extension. 
+ +Supported use cases: + +- interactive onboarding wizards +- non-interactive setup for automation or CI +- provider auth and configuration flows +- channel onboarding and account setup + +Required descriptor metadata: + +- flow id +- target surface (`cli`, `status`, `setup-ui`, or similar host surface) +- supported modes (`interactive`, `non-interactive`, or both) +- typed outputs such as config patches, credential results, install requests, status notes, or follow-up actions +- optional status phase for setup discovery and quickstart ranking +- optional reconfigure or already-configured flow +- optional disable flow +- optional DM policy prompts or policy patch outputs +- optional account-recording callback outputs for host-owned persistence + +Ownership rule: + +- extensions may own the flow logic +- the host owns prompting, persistence, credential writes, and command-tree integration + +The setup contract should be able to represent the current onboarding adapter phases in `src/channels/plugins/onboarding-types.ts:59`, including: + +- `getStatus` +- `configure` +- `configureInteractive` +- `configureWhenConfigured` +- `disable` + +Recommended status metadata: + +- whether the target is configured +- status lines +- optional selection hint +- optional quickstart score + +## Static Distribution Metadata + +Current `main` still relies on package metadata and lightweight descriptors outside the runtime contribution graph. + +Examples: + +- install entries in `src/plugins/install.ts:48` +- channel catalog metadata in `src/channels/plugins/catalog.ts:26` +- onboarding/status fallbacks in `src/commands/onboard-channels.ts:117` +- lightweight docks in `src/channels/dock.ts:228` + +The host should therefore parse a separate static metadata block. 
+ +Recommended shape: + +```ts +type DistributionMetadata = { + install?: { + entries?: string[]; + npmSpec?: string; + localPath?: string; + defaultChoice?: "npm" | "local"; + }; + catalog?: { + channels?: Array<{ + id: string; + label: string; + selectionLabel?: string; + detailLabel?: string; + docsPath?: string; + docsLabel?: string; + blurb?: string; + order?: number; + aliases?: string[]; + preferOver?: string[]; + systemImage?: string; + selectionDocsPrefix?: string; + selectionDocsOmitLabel?: boolean; + selectionExtras?: string[]; + showConfigured?: boolean; + quickstartAllowFrom?: boolean; + forceAccountBinding?: boolean; + preferSessionLookupForAnnounceTarget?: boolean; + }>; + }; + docks?: Array<{ + adapterId: string; + capabilities: string[]; + metadata: Record; + }>; +}; +``` + +These descriptors are host-only and may be read before runtime activation. + +The catalog shape should preserve current host-visible channel metadata from `src/plugins/manifest.ts:121` and `src/channels/plugins/catalog.ts:117`, rather than collapsing it into a smaller generic shape. + +Performance requirement: + +- the host must be able to parse static distribution metadata without instantiating the heavy runtime entry module + +## Resolved Extension And Contribution Objects + +The host should normalize each package into one `ResolvedExtension` object, then derive static and runtime registries from it. 
+ +Recommended shape: + +```ts +type ResolvedExtension = { + id: string; + version: string; + apiVersion: string; + source: { + origin: "bundled" | "global" | "workspace" | "config"; + path: string; + provenance?: string; + }; + static: { + install?: DistributionMetadata["install"]; + catalog?: DistributionMetadata["catalog"]; + docks?: DistributionMetadata["docks"]; + docs?: Record; + setup?: Array>; + config?: { + schema?: Record; + uiHints?: Record; + }; + }; + runtime: { + contributions: ResolvedContribution[]; + services: Array>; + routes: Array>; + policies: Array>; + stateOwnership: Record; + }; +}; +``` + +After validation, the host produces resolved contribution objects with normalized ids and runtime metadata. + +Recommended shape: + +```ts +type ResolvedContribution = { + runtimeId: string; + extensionId: string; + contributionId: string; + kind: string; + title: string; + description?: string; + arbitration: ArbitrationDescriptor; + scope: ScopeDescriptor; + permissions: string[]; + dependencies: ResolvedDependencyGraph; + visibility: VisibilityDescriptor; + permissionMode: "advisory" | "host-enforced" | "sandbox-enforced"; + runtime: unknown; + metadata: Record; +}; +``` + +The kernel only receives resolved contribution objects. + +## Naming Rules + +- extension ids are globally unique +- contribution ids are unique within an extension +- runtime ids are globally unique +- agent-visible names are not assumed unique and must be checked by the host +- aliases are advisory only; they never override canonical ids + +Canonical action ids are open, namespaced strings, but core action families should be maintained in one reviewed source-of-truth registry. + +Plugins may introduce new actions only by: + +- reusing an existing canonical family +- or adding a newly reviewed canonical action id through the host or kernel registry update process + +Plugins must not define new arbitration semantics outside the core schema. 
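The naming rules above could be checked with a small host-side pass over resolved contributions. This is a minimal sketch, not the host's actual validator: `NamedContribution`, `NamingIssue`, `collectDuplicates`, `findNamingIssues`, and the `agentVisibleName` field are hypothetical names introduced for illustration. It covers only the contribution-level rules (contribution ids unique within an extension, runtime ids globally unique, agent-visible names checked for collisions) and reports collisions rather than resolving them.

```typescript
// Hypothetical, trimmed-down view of a resolved contribution. Only the
// fields needed for the naming checks are kept; `agentVisibleName` is an
// assumed field, not part of the spec's `ResolvedContribution` shape.
type NamedContribution = {
  runtimeId: string;
  extensionId: string;
  contributionId: string;
  agentVisibleName?: string;
};

type NamingIssue = {
  rule: "contribution-id" | "runtime-id" | "agent-visible-name";
  key: string;
  runtimeIds: string[];
};

// Groups contributions by a derived key and reports every key that is
// shared by more than one contribution.
function collectDuplicates(
  rule: NamingIssue["rule"],
  contributions: NamedContribution[],
  keyOf: (c: NamedContribution) => string | undefined,
): NamingIssue[] {
  const seen = new Map<string, string[]>();
  for (const c of contributions) {
    const key = keyOf(c);
    if (key === undefined) continue; // nothing to check for this rule
    const ids = seen.get(key) ?? [];
    ids.push(c.runtimeId);
    seen.set(key, ids);
  }
  const issues: NamingIssue[] = [];
  seen.forEach((runtimeIds, key) => {
    if (runtimeIds.length > 1) issues.push({ rule, key, runtimeIds });
  });
  return issues;
}

// Applies the three contribution-level naming rules and returns all
// collisions; deciding what to do with them is left to the caller.
function findNamingIssues(contributions: NamedContribution[]): NamingIssue[] {
  return [
    ...collectDuplicates("contribution-id", contributions, (c) => `${c.extensionId}/${c.contributionId}`),
    ...collectDuplicates("runtime-id", contributions, (c) => c.runtimeId),
    ...collectDuplicates("agent-visible-name", contributions, (c) => c.agentVisibleName),
  ];
}
```

Two extensions may reuse the same contribution id without conflict because the key is scoped by extension id, while a shared agent-visible name is surfaced as an issue for the host to handle.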
+ +## Migration Mapping From Today + +- `registerChannel(...)` becomes one or more `adapter.runtime` contributions plus host surfaces +- `registerProvider(...)` becomes `capability.provider-integration` +- `registerTool(...)` becomes `capability.agent-tool` +- `registerCommand(...)` becomes `capability.control-command` +- `on(...)` becomes either `capability.event-handler`, `capability.context-augmenter`, or `capability.route-augmenter` +- `registerGatewayMethod(...)` becomes `capability.rpc` +- ACP backend registration becomes `capability.runtime-backend` +- `registerContextEngine(...)` becomes `capability.context-engine` +- `registerService(...)` becomes `service.background` +- `registerHttpRoute(...)` becomes `surface.http-route` +- package install and channel metadata become host-owned static distribution descriptors +- `configSchema` and `uiHints` become `surface.config` + +Legacy runtime compatibility namespaces should also map intentionally into the new SDK instead of being carried forward wholesale. + +Recommended module mapping: + +- legacy `channelRuntime.text` -> SDK text and formatting helpers +- legacy `channelRuntime.reply` -> SDK reply dispatch and envelope helpers +- legacy `channelRuntime.routing` -> SDK route resolution helpers +- legacy `channelRuntime.pairing` -> SDK pairing helpers +- legacy `channelRuntime.media` -> SDK media helpers +- legacy `channelRuntime.activity` and `channelRuntime.session` -> SDK session and activity helpers +- legacy `channelRuntime.mentions`, `groups`, and `commands` -> SDK shared channel-policy helpers +- legacy `channelRuntime.debounce` -> SDK inbound debounce helpers +- provider-specific runtime namespaces should become provider-scoped compatibility shims only, not long-term core SDK modules + +## Immediate Implementation Work + +1. Add a new manifest parser in the extension host rather than extending `src/plugins/manifest.ts:11`. +2. 
Define TypeScript source-of-truth types for `ResolvedExtension`, `ResolvedContribution`, and `ContributionPolicy`. +3. Create validators for top-level manifest fields and per-family descriptors. +4. Add static distribution and package metadata parsers for install, onboarding, config, status, and dock descriptors. +5. Preserve minimal SDK compatibility loading while the new schema is introduced. +6. Build a compatibility translator from current plugin registrations into contribution descriptors. +7. Keep the legacy manifest as an input format only during migration. +8. Record parity gaps discovered while migrating `thread-ownership` first. +9. Record parity gaps discovered while migrating `telegram` second. diff --git a/docs/.internal/extension-host-migration/openclaw-extension-host-implementation-guide.md b/docs/.internal/extension-host-migration/openclaw-extension-host-implementation-guide.md new file mode 100644 index 00000000000..9f8c4290877 --- /dev/null +++ b/docs/.internal/extension-host-migration/openclaw-extension-host-implementation-guide.md @@ -0,0 +1,500 @@ +Temporary internal migration note: remove this document once the extension-host migration is complete. + +# OpenClaw Extension Host Implementation Guide + +Date: 2026-03-15 + +## Purpose + +This is the main execution guide for implementing the extension-host and kernel transition. + +Use it as the top-level implementation document. + +## How We Fix It + +Fix this as a staged architectural migration, not a broad refactor. + +1. Lock the boundary first by writing the cutover inventory and adding anti-corruption interfaces so no new plugin-specific behavior leaks into the kernel. +2. Introduce source-of-truth extension schema types and the `ResolvedExtension` model while preserving current `openclaw/plugin-sdk/*` loading through minimal compatibility support. +3. 
Move discovery, policy, provenance, static metadata, and registration ownership into the extension host, including hooks, channels, providers, tools, routes, CLI, setup, services, and slot-backed providers. +4. Prove the path with pilot migrations: `thread-ownership` first for non-channel hook behavior, then `telegram` for channel compatibility. +5. After pilot parity is established, move runtime behavior onto canonical event stages and replace the fragmented tool, provider, and slot-selection paths with one catalog and arbitration model. +6. Remove the legacy plugin runtime as the default path only after the host path has parity and the duplicate legacy systems are gone or explicitly downgraded to compatibility-only shims. + +The other docs remain the source of truth for their domains: + +- `openclaw-extension-contribution-schema-spec.md` +- `openclaw-extension-host-lifecycle-and-security-spec.md` +- `openclaw-kernel-event-pipeline-spec.md` +- `openclaw-capability-catalog-and-arbitration-spec.md` +- `openclaw-kernel-extension-host-transition-plan.md` + +## TODOs + +- [ ] Confirm the implementation order and owners for each phase. +- [x] Create the initial code skeleton for kernel and extension-host boundaries. +- [x] Write the initial boundary cutover inventory for every current plugin-owned surface. +- [ ] Keep the boundary cutover inventory updated as surfaces move. +- [ ] Track PRs, migrations, and follow-up gaps by phase. +- [ ] Keep the linked spec TODO sections in sync with implementation progress. +- [ ] Define the detailed pilot migration matrix and parity checks before Phase 3 starts. +- [ ] Mark this guide complete only when the legacy plugin path is no longer the primary runtime path. + +## Implementation Status + +Current status against this guide: + +- Phase 0 has started but is not complete. +- Phase 1 has started but is not complete. +- Phase 2 has started in a narrow, compatibility-preserving form but is not complete. 
+- Phases 3 through 7 have not started in a meaningful way yet. + +What has been implemented so far: + +- a new `src/extension-host/*` boundary now exists in code +- active runtime registry ownership moved into `src/extension-host/active-registry.ts` +- `src/plugins/runtime.ts` now acts as a compatibility facade over the host-owned active registry +- registry activation now routes through `src/extension-host/activation.ts` +- initial source-of-truth types landed in `src/extension-host/schema.ts`, including `ResolvedExtension`, `ResolvedContribution`, and `ContributionPolicy` +- static manifest and package metadata are now normalized through host-owned helpers rather than being interpreted only inside plugin-era modules +- `src/plugins/manifest-registry.ts` now carries a normalized `resolvedExtension` alongside the legacy flat manifest record +- `src/extension-host/resolved-registry.ts` now exposes a host-owned resolved-extension registry view +- an initial Phase 0 inventory now exists in `src/extension-host/cutover-inventory.md` +- plugin SDK alias resolution now routes through `src/extension-host/loader-compat.ts` +- loader provenance, duplicate-order, and warning policy now route through `src/extension-host/loader-policy.ts` +- loader module-export resolution, config validation, and memory-slot load decisions now route through `src/extension-host/loader-runtime.ts` +- loader record-state transitions now route through `src/extension-host/loader-state.ts` +- runtime registration normalization has started in `src/extension-host/runtime-registrations.ts` for channel, provider, HTTP-route, gateway-method, tool, CLI, service, command, context-engine, and hook registrations +- several static and lookup consumers now read through the host boundary or resolved-extension model: + - channel registry and dock lookups + - message-channel normalization + - plugin HTTP route registry default lookup + - discovery and install package metadata parsing + - channel catalog package 
metadata parsing + - plugin skill discovery + - plugin auto-enable + - config doc baseline generation + - config validation indexing + +How it has been done: + +- by extracting narrow host-owned modules first and making existing plugin modules delegate to them +- by preserving current behavior and import surfaces wherever possible instead of attempting a broad rewrite +- by introducing normalized static records before touching heavy runtime activation paths +- by converting one static consumer at a time so each call site can move without forcing a loader rewrite +- by extracting low-risk runtime registration helpers next and letting `src/plugins/registry.ts` delegate to them as a compatibility facade +- by keeping duplicate enforcement in legacy subsystems only where that logic has not moved yet, such as plugin commands +- by starting loader and lifecycle migration with compatibility helpers for activation and SDK alias resolution before changing discovery or policy behavior +- by moving loader-owned policy helpers next, while keeping module loading and enablement flow behavior unchanged +- by moving loader runtime decisions behind host-owned helpers while preserving lazy loading, config validation behavior, and memory-slot policy behavior +- by moving loader record-state transitions into host-owned helpers before introducing a full lifecycle state machine +- by moving central readers first, so later lifecycle and compatibility work can land on one boundary instead of many ad hoc call sites +- by adding focused tests for each extracted seam before widening the boundary further + +What is still missing for these phases: + +- keeping the cutover inventory current as more surfaces move +- the lifecycle state machine, remaining loader orchestration, policy gate, and broad host-owned registries described for Phase 2 +- minimal SDK compatibility work beyond preserving current behavior indirectly through existing loading +- any pilot migration, event pipeline, canonical 
catalog, or arbitration implementation + +## Implementation Order + +Implement phases in this order: + +1. Phase 0: boundary inventory and anti-corruption layer +2. Phase 1: contribution schema, package metadata, and minimal SDK compatibility +3. Phase 2: extension host lifecycle and registries +4. Phase 3: broader legacy compatibility bridges +5. Phase 4: canonical event pipeline +6. Phase 5: capability catalog migration +7. Phase 6: arbitration migration +8. Phase 7: broader migration and legacy removal + +This order matters because each layer depends on the previous one: + +- catalogs depend on normalized contributions +- normalized contributions depend on host discovery and validation +- existing extensions must keep loading while the schema and SDK boundary changes +- migrated hooks depend on the canonical event pipeline +- install, onboarding, and status flows depend on static metadata before runtime activation +- catalogs and arbitration already exist in partial forms, so their phases are migrations, not greenfield work +- safe removal of legacy paths depends on compatibility coverage and parity checks + +## Implementation Guardrails + +Do not implement every abstraction in the docs in the first cut. 
+ +Treat some parts of the design as ceilings rather than immediate scope: + +- event taxonomy should start with three execution modes only: + - parallel observers + - sequential merge or decision handlers + - sync transcript hot paths +- permission modes should implement `advisory` and `host-enforced` first +- `sandbox-enforced` should remain a future contract until real isolation exists +- catalog publication should start small: + - kernel internal catalog + - kernel agent catalog + - host operator and static registries +- adapter metadata should stay minimal and parity-driven +- setup flow typing should start with a small result set: + - config patch + - credential result + - status note + - follow-up action +- canonical action governance should start as one source file plus tests, not a larger process framework +- arbitration should start with: + - exclusive slot + - ranked provider + - parallel provider + +The first implementation goal is parity for pilot migrations, not maximum generality. + +If a design choice is not required to migrate one channel extension and one non-channel extension safely, defer it. + +## Current Runtime Surfaces That Must Be Accounted For + +The current plugin system already owns more than runtime activation. + +Before implementation starts, write and maintain a cutover inventory for these surfaces: + +- manifest loading and static metadata +- package-level install and onboarding metadata +- discovery, provenance, and origin precedence +- config schema and UI hint loading +- typed hooks and legacy hook bridges +- channels and channel lookup +- providers and provider auth/setup flows +- tools and agent-visible tool catalogs +- HTTP routes and gateway methods +- CLI registrars and plugin commands +- services and context-engine registrations +- slot selection and other existing arbitration paths +- status, reload, install, update, and diagnostics surfaces + +Do not treat Phase 5 and Phase 6 as new systems built in isolation. 
+ +They must absorb and replace the existing partial catalog and arbitration behaviors rather than creating a second source of truth. + +## Phase Guide + +### Phase 0: Lock the boundary + +Goal: + +- define the kernel versus extension-host boundary in code and imports +- inventory every current plugin-owned surface that crosses that boundary + +Deliverables: + +- boundary cutover inventory +- anti-corruption interfaces for host-owned registration surfaces +- initial feature flags for host-path versus legacy-path execution +- directory and import boundaries for kernel and extension-host code + +Primary docs: + +- `openclaw-kernel-extension-host-transition-plan.md` +- `openclaw-extension-contribution-schema-spec.md` + +Exit criteria: + +- kernel code does not take new dependencies on legacy plugin shapes +- extension-host directory structure exists +- compatibility-only surfaces are identified +- each current plugin-owned surface is tagged as kernel-owned, host-owned, or compatibility-only +- no new direct writes to global registries are introduced without going through the new boundary + +Current implementation status: + +- partially implemented +- the code boundary exists in `src/extension-host/*` +- central active-registry ownership now routes through the host boundary +- several central runtime readers now consume the host-owned boundary instead of reading directly from `src/plugins/runtime.ts` +- the initial cutover inventory now exists in `src/extension-host/cutover-inventory.md` and is being updated as surfaces move, but the phase is still incomplete because loader orchestration, lifecycle ownership, and later compatibility phases have not moved yet + +### Phase 1: Define the schema + +Goal: + +- implement the source-of-truth manifest and contribution types +- preserve existing extension loading while the schema and SDK boundary changes + +Primary doc: + +- `openclaw-extension-contribution-schema-spec.md` + +Deliverables: + +- manifest parser +- package 
metadata parser +- contribution validators +- `ResolvedExtension` +- `ResolvedContribution` +- typed `ContributionPolicy` +- static metadata parser +- new versioned SDK contract surface +- minimal SDK compatibility loading surface +- normalized install and onboarding metadata model + +Exit criteria: + +- extensions can be normalized into static and runtime sections without activating heavy runtime code +- existing extension SDK imports still resolve through the compatibility loading path + +Current implementation status: + +- partially implemented +- `ResolvedExtension`, `ResolvedContribution`, and `ContributionPolicy` landed as initial code types +- legacy manifest and package metadata now converge into a normalized `resolvedExtension` record carried by the manifest registry +- discovery, install, and catalog metadata parsing now go through host-owned schema helpers +- partial explicit compatibility now exists through host-owned loader-compat and loader-runtime helpers, but full manifest or contribution validators and a versioned SDK compatibility layer are not implemented yet + +### Phase 2: Build the extension host + +Goal: + +- implement discovery, validation, policy, registries, and lifecycle ownership + +Primary doc: + +- `openclaw-extension-host-lifecycle-and-security-spec.md` + +Deliverables: + +- discovery pipeline +- activation state machine +- policy evaluator +- host-owned registries +- host-owned adapters for hooks, channels, providers, tools, HTTP routes, gateway methods, CLI, services, commands, and context engines +- per-extension state ownership +- provenance and origin handling +- config redaction-aware schema loading +- reload and route ownership handling + +Exit criteria: + +- the host can load bundled and external extensions into normalized registries +- the host can populate normalized registries without direct kernel writes except through explicit compatibility adapters + +Current implementation status: + +- partially implemented in a 
compatibility-preserving form +- the host owns the active registry state +- the host exposes a resolved-extension registry view for static consumers +- plugin skills, plugin auto-enable, and config validation indexing now consume host-owned resolved-extension data +- activation, loader policy, loader runtime decisions, and loader record-state helpers now route through `src/extension-host/*` +- lifecycle state ownership, activation states, policy evaluation, and broad host-owned registries are still not implemented + +### Phase 3: Build compatibility bridges + +Goal: + +- keep current extensions working through the host without leaking legacy contracts into the kernel + +Primary docs: + +- `openclaw-kernel-extension-host-transition-plan.md` +- `openclaw-extension-contribution-schema-spec.md` + +Deliverables: + +- `ChannelPlugin` compatibility translators +- plugin SDK compatibility loading +- runtime-channel namespace translation into the new SDK modules +- legacy setup and CLI translation +- legacy config schema and UI hint translation +- pilot migration matrix with explicit parity labels + +Exit criteria: + +- `thread-ownership` runs through the host path as the first non-channel pilot +- `telegram` runs through the host path as the first channel pilot +- both pilots have explicit parity results for discovery, config, activation, diagnostics, and runtime behavior + +### Phase 4: Implement the canonical event pipeline + +Goal: + +- move runtime hook behavior onto explicit canonical events + +Primary doc: + +- `openclaw-kernel-event-pipeline-spec.md` + +Deliverables: + +- event type definitions +- stage runner +- sync transcript-write stages +- bridges from legacy hook buses +- mapping table from existing typed and legacy hooks to canonical stages + +Exit criteria: + +- migrated extensions can use canonical events without relying directly on old plugin hook execution +- pilot hook behaviors have parity coverage against the pre-host path + +### Phase 5: Implement 
catalogs + +Goal: + +- compile runtime-derived agent and internal catalogs, plus host-owned operator catalogs +- replace existing plugin-identity-driven catalog surfaces with canonical family-based catalogs + +Primary doc: + +- `openclaw-capability-catalog-and-arbitration-spec.md` + +Deliverables: + +- kernel internal catalog +- kernel agent catalog +- host operator catalog +- static setup and install catalogs +- canonical action registry +- migration plan for existing tool, provider, and setup catalog surfaces + +Exit criteria: + +- agent-visible tools are compiled from canonical action families instead of plugin identity +- setup and install catalogs no longer depend on duplicated legacy metadata paths + +### Phase 6: Implement arbitration + +Goal: + +- resolve overlap, ranking, selection, and slot conflicts deterministically +- absorb the existing slot and provider selection behavior into canonical arbitration + +Primary doc: + +- `openclaw-capability-catalog-and-arbitration-spec.md` + +Deliverables: + +- conflict detection +- provider selection +- slot arbitration +- planner-visible name collision handling +- migration plan for existing slot and name-collision behaviors + +Exit criteria: + +- at least one multi-provider family works through canonical arbitration +- legacy slot and provider-selection paths no longer act as separate arbitration systems + +### Phase 7: Migrate and remove legacy paths + +Goal: + +- finish migration and shrink compatibility-only surfaces + +Primary docs: + +- `openclaw-kernel-extension-host-transition-plan.md` +- all other docs as parity references + +Deliverables: + +- channel migrations +- non-channel extension migrations +- parity tests +- deprecation markers +- removal plan for obsolete compatibility shims + +Exit criteria: + +- legacy plugin runtime is no longer the default execution path + +## Pilot Matrix + +Initial pilot set: + +- non-channel pilot: `thread-ownership` +- channel pilot: `telegram` + +Why these pilots: + +- 
`thread-ownership` exercises typed hook loading without introducing CLI, HTTP route, or service migration at the same time +- `telegram` exercises the `ChannelPlugin` compatibility path with a minimal top-level plugin registration surface + +Second-wave compatibility candidates after the pilots are stable: + +- `line` for channel plus command registration +- `device-pair` for command, service, and setup flow coverage + +Each pilot must record parity for: + +- discovery and precedence +- manifest and static metadata loading +- config schema and UI hints +- enabled and disabled state handling +- activation and reload behavior +- diagnostics and status output +- runtime behavior on the migrated path +- compatibility-only gaps that still remain + +## Recommended First Implementation Slice + +If you want the lowest-risk start, do this first: + +1. write the boundary cutover inventory +2. add source-of-truth types +3. add the static metadata and package metadata parsers +4. add `ResolvedExtension` +5. add minimal SDK compatibility loading +6. add host discovery and validation +7. bring `thread-ownership` through the host path first +8. bring `telegram` through the host path second + +Status of this slice: + +- steps 2 through 6 are underway +- step 1 is still missing as a formal artifact +- steps 7 and 8 have not started + +Concrete landings from this slice: + +- the host boundary exists +- source-of-truth schema types exist +- package metadata parsing now routes through the host schema layer +- `ResolvedExtension` exists in code and is attached to manifest-registry records +- host-owned active-registry and resolved-registry views exist +- early static consumers have moved onto the new host-owned data path + +Do not start with catalogs or arbitration first. 
+ +Also avoid these first-cut traps: + +- do not build a broad event scheduling framework before the canonical stages exist +- do not turn permission descriptors into fake sandbox guarantees +- do not build a large operator catalog publication layer before the host registries are real +- do not over-type setup flows before the pilot migrations prove the minimum result model is insufficient + +## Tracking Rules + +When implementation begins: + +- update this guide first with phase status +- update the matching spec TODOs when a domain changes +- record where the implementation intentionally diverged from the spec +- record which behaviors are full parity, partial parity, or compatibility-only +- update the pilot parity matrix whenever a migrated surface changes + +## Suggested Status Format + +Use this format in each doc when work starts: + +- `not started` +- `in progress` +- `implemented` +- `verified` +- `deferred` + +For example: + +- `ResolvedExtension` registry: `implemented` +- setup fallback removal: `deferred` +- sync transcript-write parity tests: `in progress` diff --git a/docs/.internal/extension-host-migration/openclaw-extension-host-lifecycle-and-security-spec.md b/docs/.internal/extension-host-migration/openclaw-extension-host-lifecycle-and-security-spec.md new file mode 100644 index 00000000000..885f06d804a --- /dev/null +++ b/docs/.internal/extension-host-migration/openclaw-extension-host-lifecycle-and-security-spec.md @@ -0,0 +1,635 @@ +Temporary internal migration note: remove this document once the extension-host migration is complete. + +# OpenClaw Extension Host Lifecycle And Security Spec + +Date: 2026-03-15 + +## Purpose + +This document defines how the extension host discovers, validates, activates, isolates, and stops extensions while applying operator policy, permission metadata, persistence boundaries, and contribution dependencies. + +The kernel does not participate in these concerns directly. 
+ +## TODOs + +- [x] Write the initial boundary cutover inventory for every current plugin-owned surface. +- [ ] Keep the boundary cutover inventory updated as surfaces move. +- [ ] Implement the extension lifecycle state machine and document the concrete runtime states in code. +- [ ] Implement advisory versus enforced permission handling exactly as specified here. +- [ ] Implement host-owned registries for config, setup, CLI, routes, services, slots, and backends. +- [ ] Implement per-extension state ownership and migration from current shared plugin state. +- [ ] Record pilot parity for `thread-ownership` first and `telegram` second before broad legacy rollout. +- [ ] Track which hardening, reload, and provenance rules have reached parity with `main`. + +## Implementation Status + +Current status against this spec: + +- registry ownership and the first compatibility-preserving loader slices have landed +- lifecycle orchestration, policy gates, and activation-state management have not landed + +What has been implemented: + +- an initial Phase 0 cutover inventory now exists in `src/extension-host/cutover-inventory.md` +- active registry ownership now lives in the extension host boundary rather than only in plugin-era runtime state +- central lookup surfaces now consume the host-owned active registry +- registry activation now routes through `src/extension-host/activation.ts` +- a host-owned resolved-extension registry exists for static consumers +- static config-baseline generation now reads bundled extension metadata through the host-owned resolved-extension registry +- channel, provider, HTTP-route, gateway-method, tool, CLI, service, command, context-engine, and hook registration normalization now delegates through `src/extension-host/runtime-registrations.ts` +- loader provenance, duplicate-order, and warning policy now route through `src/extension-host/loader-policy.ts` +- loader module-export resolution, config validation, and memory-slot load decisions now 
route through `src/extension-host/loader-runtime.ts` +- loader record-state transitions now route through `src/extension-host/loader-state.ts` + +How it has been implemented: + +- by extracting `src/extension-host/active-registry.ts` and making `src/plugins/runtime.ts` delegate to it +- by leaving lifecycle behavior unchanged for now and only moving ownership of the shared registry boundary +- by moving low-risk readers first, such as channel lookup, dock lookup, message-channel lookup, and default HTTP route registry access +- by extending that same host-owned boundary into static consumers instead of introducing separate one-off metadata loaders +- by starting runtime-registry migration with low-risk validation and normalization helpers while leaving lifecycle ordering and activation behavior unchanged +- by leaving start/stop ordering and duplicate-enforcement behavior in legacy subsystems where those subsystems are still the real owner +- by treating hook execution and hook registration as separate migration concerns so event-pipeline work does not get conflated with record normalization +- by starting loader/lifecycle migration with activation and SDK alias compatibility helpers while leaving discovery and policy flow unchanged +- by moving provenance and duplicate-order policy next, so lifecycle migration can land on host-owned policy helpers instead of loader-local utilities +- by moving loader runtime decisions next while preserving the current lazy-load, config-validation, and memory-slot behavior +- by moving record-state transitions next while leaving the lifecycle state machine itself unimplemented + +What is still pending from this spec: + +- the lifecycle state machine +- activation pipeline ownership +- host-owned registries for setup, CLI, routes, services, slots, and backends +- permission-mode enforcement +- per-extension state ownership and migration +- provenance, reload, and hardening parity tracking + +## Goals + +- deterministic activation 
and shutdown +- explicit failure states +- no hidden privilege escalation +- stable persistence ownership rules +- truthful security semantics for the current trusted in-process model +- safe support for bundled and external extensions under the same model +- preserve existing hardening and prompt-mutation policy behavior during the migration + +## Implementation Sequencing Constraints + +This spec is not a greenfield host design. + +The host must absorb existing behavior that already lives in: + +- plugin discovery and manifest loading +- config schema and UI hint handling +- route and gateway registration +- channels and channel lookup +- providers and provider auth or setup flows +- tools, commands, and CLI registration +- services, backends, and slot-backed providers +- reload, diagnostics, install, update, and status behavior + +Therefore: + +- Phase 0 must produce a cutover inventory for those surfaces before registry ownership changes begin +- Phase 1 must preserve current SDK loading through minimal compatibility support +- Phase 2 registry work must be broad enough to cover all currently registered surfaces, not only a narrow runtime subset +- Phase 3 must prove parity through `thread-ownership` first and `telegram` second before broader rollout + +## Trust Model Reality + +Current `main` treats installed and enabled extensions as trusted code running in-process: + +- trusted plugin concept in `SECURITY.md:108` +- in-process loading in `src/plugins/loader.ts:621` + +That means the initial extension host has two separate jobs: + +- enforce operator policy for activation, route exposure, host-owned registries, and auditing +- accurately communicate that this is not yet a hard sandbox against arbitrary extension code + +Recommended enforcement levels: + +- `advisory` + Host policy, audit, and compatibility guidance only. This is the current default. 
Permission mismatch alone should not block activation in this mode, though the host may warn and withhold optional host-published surfaces. +- `host-enforced` + Host-owned capabilities and registries are gated, but extension code still runs in-process. +- `sandbox-enforced` + A future mode with real process, VM, or IPC isolation where permissions become a true security boundary. + +## Lifecycle States + +Every extension instance moves through these states: + +1. `discovered` +2. `manifest-loaded` +3. `validated` +4. `dependency-resolved` +5. `policy-approved` +6. `instantiated` +7. `registered` +8. `starting` +9. `ready` +10. `degraded` +11. `stopping` +12. `stopped` +13. `failed` + +The host owns the state machine. + +## Activation Pipeline + +### 1. Discovery + +The host scans: + +- bundled extension inventory +- configured external extension paths or packages +- disabled extension state + +Discovery is metadata-only. No extension code executes in this phase. + +### 2. Manifest Load + +The host loads and validates manifest syntax. + +Failures here prevent instantiation. + +This phase must cover both: + +- runtime contribution descriptors +- package-level static metadata used for install, onboarding, status, and lightweight operator UX + +### 3. Schema Validation + +The host validates: + +- top-level extension manifest +- contribution descriptors +- config schema +- config UI hints and sensitivity metadata +- permission declarations +- dependency declarations +- policy declarations such as prompt-mutation behavior + +### 4. 
Dependency Resolution + +The host resolves: + +- extension API compatibility +- SDK compatibility mode and deprecation requirements +- required contribution dependencies +- optional dependencies +- conflict declarations +- singleton slot collisions + +Compatibility decision: + +- the host should support only a short compatibility window, ideally one or two older SDK contract versions at a time +- extensions outside that window must fail validation with a clear remediation path + +Sequencing rule: + +- minimal compatibility loading must exist before broader schema or registry changes depend on the new manifest model + +### 5. Policy Gate + +The host computes the requested permission set and compares it against operator policy. + +In `host-enforced` or `sandbox-enforced` mode, extensions that are not allowed to receive all required permissions do not activate or do not register the gated contributions. + +In `advisory` mode, this gate records warnings, informs operator-visible policy state, and may withhold optional host-published surfaces, but permission mismatch alone does not fail activation. + +The policy gate does not sandbox arbitrary filesystem, network, or child-process access from trusted in-process extension code. + +### 6. Instantiation + +The host loads the extension entrypoint and asks it to emit contribution descriptors and runtime factories. + +Unless the host is running in a future isolated mode, instantiation still executes trusted extension code inside the OpenClaw process. + +### 7. Registration + +The host resolves runtime ids, arbitration metadata, and activation order, then registers contributions into host-owned registries. 
+ +This includes host-managed operator registries for: + +- CLI commands +- setup and onboarding flows +- config and status surfaces +- dynamic HTTP routes +- config reload descriptors and gateway feature advertisement where those surfaces remain host-managed during migration + +Callable gateway or runtime methods are separate from this advertisement layer and should continue to register through the runtime contribution model as `capability.rpc`. + +The registration boundary should cover the full current surface area as one migration set: + +- hooks and event handlers +- channels and lightweight channel descriptors +- providers and provider-setup surfaces +- tools and control commands +- CLI, setup, config, and status surfaces +- HTTP routes and gateway methods +- services, runtime backends, and slot-backed providers + +Do not migrate only a subset and leave the rest writing into the legacy registry model indefinitely. + +### 8. Start + +The host starts host-managed services, assigns per-extension state and route ownership, and activates kernel-facing contributions. + +### 9. Ready + +The extension is active and visible to kernel or operator surfaces as appropriate. + +## Failure Modes + +Supported failure classes: + +- `manifest-invalid` +- `api-version-unsupported` +- `dependency-missing` +- `dependency-conflict` +- `policy-denied` +- `instantiation-failed` +- `registration-conflict` +- `startup-failed` +- `runtime-degraded` + +The host must record failure class, extension id, contribution ids, and operator-visible remediation. + +## Dependency Rules + +Dependencies must be explicit and machine-checkable. + +### Extension-level dependencies + +Used when one extension package requires another package to be present. + +### Contribution-level dependencies + +Used when a specific runtime contract depends on another contribution. 
+ +Examples: + +- a route augmenter may require a specific adapter family +- an auth helper may require a provider contribution +- a diagnostics extension may optionally bind to a runtime backend if present + +### Conflict rules + +Extensions may declare: + +- `conflicts` +- `supersedes` +- `replaces` + +The host resolves these before activation. + +## Discovery And Load Hardening + +The extension host must preserve current path-safety, provenance, and duplicate-resolution protections. + +At minimum, preserve parity with: + +- path and boundary checks during load in `src/plugins/loader.ts:744` +- manifest precedence and duplicate-origin handling in `src/plugins/manifest-registry.ts:15` +- provenance warnings during activation in `src/plugins/loader.ts:500` + +Security hardening from the current loader is part of the host contract, not an optional implementation detail. + +Parity requirement: + +- the pilot migrations must show that these hardening rules still apply on the host path, not only on the legacy path + +## Policy And Permission Model + +Permissions are granted to extension instances by the host as policy metadata and host capability grants. + +The kernel must never infer privilege from contribution kind alone. 
+ +The host must track: + +- requested permissions +- enforcement level (`advisory`, `host-enforced`, or `sandbox-enforced`) +- host-managed policy gates such as prompt mutation and sync hot-path eligibility + +### Recommended permission set + +- `runtime.adapter` +- `runtime.route-augment` +- `runtime.veto-send` +- `runtime.backend-register` +- `agent.tool.expose` +- `control.command.expose` +- `interaction.handle` +- `rpc.expose` +- `service.background` +- `http.route.gateway` +- `http.route.plugin` +- `config.read` +- `config.write` +- `state.read` +- `state.write` +- `credentials.read` +- `credentials.write` +- `network.outbound` +- `process.spawn` +- `filesystem.workspace.read` +- `filesystem.workspace.write` + +Permissions should be independently reviewable and denyable. + +In `advisory` mode they also function as: + +- operator review prompts +- activation policy inputs +- audit and telemetry tags +- documentation of why an extension needs sensitive host-owned surfaces + +### Fine-grained policy gates + +Some behavior should remain under dedicated policy gates instead of being flattened into generic permissions. + +Examples: + +- prompt mutation or prompt injection behavior +- sync transcript-write participation +- fail-open versus fail-closed route augmentation + +This preserves the intent of current controls such as `plugins.entries..hooks.allowPromptInjection`. + +### High-risk permissions + +These should require explicit operator approval or a strong default policy: + +- `runtime.veto-send` +- `runtime.route-augment` +- `runtime.backend-register` +- `credentials.write` +- `process.spawn` +- `http.route.plugin` +- `filesystem.workspace.write` + +High-risk permissions should still matter in `advisory` mode because they drive operator trust decisions even before real isolation exists. + +## Persistence Ownership + +Persistence must be partitioned by owner and intent. + +### Config + +Operator-managed configuration belongs to the host. 
+ +Extensions may contribute: + +- config schema +- config UI hints and sensitivity metadata +- defaults +- migration hints +- setup flow outputs such as config patches produced through host-owned setup primitives + +Extensions must not arbitrarily mutate unrelated config keys. + +The host must also preserve current config redaction semantics: + +- config UI hints such as `sensitive` affect host behavior, not only UI decoration +- config read, redact, restore, and validate flows must preserve round-trippable secret handling comparable to `src/gateway/server-methods/config.ts:151` and `src/config/redact-snapshot.ts:349` + +### State + +Each extension gets a host-assigned state directory. + +This is where background services and caches persist local state. + +This is a required migration change from the current shared plugin service state shape in `src/plugins/services.ts:18`. + +The host must also define a migration strategy for existing state: + +- detect old shared plugin state layouts +- migrate or alias data into per-extension directories +- keep rollback behavior explicit + +### Credentials + +Credential persistence is host-owned. + +Provider integration extensions may return credential payloads, but they must not choose final storage shape or bypass the credential store. + +This is required because auth flows like `extensions/google-gemini-cli-auth/index.ts:24` interact with credentials and config together. + +This rule also applies when those flows are invoked through extension-owned CLI or setup flows. + +### Session and transcript state + +Kernel-owned. + +Extensions may observe or augment session state through declared runtime contracts, but they do not own transcript persistence. + +### Backend-owned state + +Runtime backends such as ACP may require separate service state, but ownership still flows through the host-assigned state boundary. 
+ +### Distribution and onboarding metadata + +Install metadata, channel catalog metadata, docs links, and quickstart hints are host-owned static metadata. + +They are not kernel persistence and they are not extension-private state. + +That static metadata should preserve current channel catalog fields from `src/plugins/manifest.ts:121`, including aliases, docs labels, precedence hints, binding hints, picker extras, and announce-target hints. + +## HTTP And Webhook Ownership + +The host owns all HTTP route registration and conflict resolution. + +This is required because routes can conflict across extensions today, as seen in `src/plugins/http-registry.ts:12`. + +### Route classes + +- ingress transport routes +- authenticated plugin routes +- public callback routes +- diagnostic or admin routes +- dynamic account-scoped routes + +### Required route metadata + +- path +- auth mode +- match mode +- owner contribution id +- whether the route is externally reachable +- whether the route is safe to expose when the extension is disabled +- lifecycle mode (`static` or `dynamic`) +- scope metadata such as account, workspace, or provider binding + +### Conflict rules + +- exact path collisions require explicit resolution +- prefix collisions require overlap analysis +- auth mismatches are fatal +- one extension may not replace another extension's route without explicit policy + +Dynamic route registration must also return an unregister handle so route ownership can be cleaned up during reload, account removal, or degraded shutdown. + +## Runtime Backend Contract + +Some extension contributions provide runtime backends consumed by subsystems rather than directly by the agent. 
+ +ACP is the reference case today: + +- backend type in `src/acp/runtime/registry.ts:4` +- registration in `extensions/acpx/src/service.ts:55` + +### Required backend descriptor + +- backend class id +- backend instance id +- selector key +- health probe +- capability list +- selection rank +- arbitration mode + +### Required backend lifecycle + +- register +- unregister +- probe +- health +- degrade +- recover + +### Backend selection rules + +- explicit requested backend id wins +- if none requested, pick the healthiest backend with the best rank +- if multiple healthy backends tie, use deterministic ordering by extension id then contribution id +- if all backends are unhealthy, expose a typed unavailability error + +### Singleton vs parallel + +Not every backend is singleton. + +ACP may remain effectively singleton at first, but the contract should support future parallel backends with explicit selectors. + +## Slot-Backed Provider Contract + +Not every exclusive runtime provider is a generic backend. + +Current `main` already has slot-backed provider selection in: + +- `src/plugins/slots.ts:12` +- `src/context-engine/registry.ts:60` + +The host must model explicit slot-backed providers for cases such as: + +- context engines +- default memory providers +- future execution or planning engines + +Required slot rules: + +- each slot has a stable slot id +- each slot has a host-defined default +- explicit config selection wins +- only one active provider may own an exclusive slot +- migration preserves existing config semantics such as `plugins.slots.memory` and `plugins.slots.contextEngine` + +Migration rule: + +- slot-backed providers must move into host-owned registries before broader catalog and arbitration migration claims are considered complete + +## Isolation Rules + +The host must isolate extension failures from the kernel as much as possible. 
+ +Minimum requirements: + +- one extension failing startup does not block unrelated extensions +- one contribution registration failure does not corrupt host state +- background-service failures transition the extension to `degraded` or `failed` without leaving stale registrations behind +- stop hooks are best-effort and time-bounded + +In the current trusted in-process mode, "isolation" here means lifecycle and registry isolation, not a security sandbox. + +## Reload And Upgrade Rules + +Hot reload is optional. Deterministic restart behavior is required. + +On reload or upgrade: + +1. stop host-managed services +2. unregister contributions +3. clear host-owned route, command, backend, and slot registrations +4. clear dynamic account-scoped routes and stale runtime handles +5. instantiate the new version +6. reactivate only after validation and policy checks succeed + +If the host continues to support config-driven hot reload during migration, it must also preserve: + +- channel-owned reload prefix behavior equivalent to current `configPrefixes` and `noopPrefixes` +- gateway feature advertisement cleanup and re-registration +- setup-flow and native-command registrations that depend on account-scoped runtime state + +This advertisement handling does not replace callable RPC registration. If a migrated extension exposes callable gateway-style methods, those should still be re-registered through `capability.rpc`. + +During migration, keep the current built-in onboarding fallback in place until host-owned setup surfaces cover bundled channels with parity. 
+ +Pilot rule: + +- the fallback stays in place until `telegram` parity has been recorded for setup-adjacent host behavior, even if runtime messaging parity lands earlier + +## Operator Policy + +The host should support policy controls for: + +- allowed extension ids +- denied permissions +- default permission grants for bundled extensions +- allowed extension origins and provenance requirements +- origin precedence and duplicate resolution +- workspace extensions disabled by default unless explicitly allowed +- bundled channel auto-enable rules tied to channel config +- route exposure policy +- network egress policy +- backend selection policy +- whether external extensions are permitted at all +- SDK compatibility level and deprecation mode +- prompt-mutation policy defaults +- whether interactive extension-owned CLI and setup flows are allowed +- whether extension-owned native command registration is allowed on specific providers +- whether config-driven hot reload descriptors are honored or downgraded to restart-only behavior + +## Observability + +The host must emit structured telemetry for: + +- activation timings +- policy denials +- contribution conflicts +- route conflicts +- backend registration and health +- service start and stop +- extension degradation and recovery +- provenance warnings and origin overrides +- state migration outcomes +- compatibility-mode activation and deprecated SDK usage +- setup flow phase transitions and fallback-path usage +- config redaction or restore validation failures +- reload descriptor application and gateway feature re-registration + +## Immediate Implementation Work + +1. Write the boundary cutover inventory for every current plugin-owned surface. +2. Introduce an extension-host lifecycle state machine. +3. Move route registration policy out of plugin internals into host-owned registries. +4. Add a policy evaluator that understands advisory versus enforced permission modes. +5. 
Add host-owned credential and per-extension state boundaries for extension services. +6. Generalize backend registration into a host-managed `capability.runtime-backend` registry. +7. Add slot-backed provider management for context engines and other exclusive runtime providers. +8. Preserve provenance, origin precedence, and current workspace and bundled enablement rules in host policy. +9. Preserve prompt-mutation policy gates and add explicit state migration handling. +10. Add explicit host registries and typed contracts for extension-owned hooks, channels, providers, tools, commands, CLI, setup flows, config surfaces, and status surfaces. +11. Preserve config redaction-aware schema behavior and current reload or gateway feature contracts during migration. +12. Record lifecycle parity for `thread-ownership` first and `telegram` second before broadening the compatibility bridges. diff --git a/docs/.internal/extension-host-migration/openclaw-kernel-event-pipeline-spec.md b/docs/.internal/extension-host-migration/openclaw-kernel-event-pipeline-spec.md new file mode 100644 index 00000000000..e85ab7f9197 --- /dev/null +++ b/docs/.internal/extension-host-migration/openclaw-kernel-event-pipeline-spec.md @@ -0,0 +1,781 @@ +Temporary internal migration note: remove this document once the extension-host migration is complete. + +# OpenClaw Kernel Event Pipeline Spec + +Date: 2026-03-15 + +## Purpose + +This document defines the canonical kernel event model, execution stages, handler classes, ordering, mutation rules, and veto semantics. + +The goal is to replace today's mixed plugin hook behavior with one explicit runtime pipeline and a small set of execution modes that match current `main` behavior. + +## TODOs + +- [ ] Implement canonical event types and stage ordering in code. +- [ ] Bridge current plugin hooks, internal hooks, and agent event streams into the pipeline. +- [ ] Implement sync transcript-write stages with parity for current hot paths. 
+- [ ] Record the legacy-to-canonical mapping table used by the first pilot migrations. +- [ ] Record parity for `thread-ownership` first and `telegram` second before broader event migration. +- [ ] Document which legacy hook sources are still bridged and which have been retired. +- [ ] Add parity tests for veto, resolver, and sync-stage behavior. + +## Implementation Status + +Current status against this spec: + +- no canonical event pipeline work has landed yet +- only the prerequisites from earlier phases are underway + +Relevant prerequisite work that has landed: + +- an initial Phase 0 cutover inventory now exists in `src/extension-host/cutover-inventory.md` +- the extension-host boundary now owns active registry state +- registry activation now routes through `src/extension-host/activation.ts` +- initial normalized extension schema types now exist +- static consumers can now read host-owned resolved-extension data +- config doc baseline generation now uses the same host-owned resolved-extension data path +- channel, provider, HTTP-route, gateway-method, tool, CLI, service, command, context-engine, and hook registration normalization now has a host-owned helper boundary +- loader provenance and duplicate-order policy now have a host-owned helper boundary +- loader module-export resolution, config validation, and memory-slot load decisions now have a host-owned helper boundary +- loader record-state transitions now have a host-owned helper boundary + +Why this matters for this spec: + +- event work should land on top of a host-owned boundary and normalized contribution model rather than on top of more plugin-era runtime seams +- the current implementation has deliberately not started bridge or stage work before those earlier boundaries were in place, including the first loader-runtime and record-state seams + +## Design Goals + +- every inbound and outbound path goes through one canonical pipeline +- handler behavior is declared, not inferred +- 
routing-affecting handlers are distinct from passive observers +- ordering and merge rules are deterministic +- extension failures are isolated and visible +- sync transcript-write paths remain explicit rather than being hidden inside generic async stages +- current plugin hooks, internal hooks, and agent event streams can be bridged into one model incrementally +- the migration path for legacy event buses is explicit rather than accidental + +## Sequencing Constraints + +This pipeline is a migration target, not a prerequisite for every other host change. + +Therefore: + +- minimal SDK compatibility and host registry ownership should land before broad hook migration +- the first event migration should prove parity for a small non-channel hook case and a channel case +- do not require every event family to be implemented before pilot migrations can bridge the current hook set +- do not leave legacy hook buses as undocumented permanent peers to the canonical pipeline + +## Canonical Event Families + +The kernel should emit typed event families instead of raw plugin hook names. 
+ +Recommended families: + +- `runtime.started` +- `runtime.stopping` +- `gateway.starting` +- `gateway.started` +- `gateway.stopping` +- `command.received` +- `command.completed` +- `account.started` +- `account.stopped` +- `ingress.received` +- `ingress.normalized` +- `routing.resolving` +- `routing.resolved` +- `session.starting` +- `session.started` +- `session.resetting` +- `agent.starting` +- `agent.model.resolving` +- `agent.prompt.building` +- `agent.llm.input` +- `agent.llm.output` +- `agent.tool.calling` +- `agent.tool.called` +- `transcript.tool-result.persisting` +- `transcript.message.writing` +- `compaction.before` +- `compaction.after` +- `agent.completed` +- `egress.preparing` +- `egress.sending` +- `egress.sent` +- `egress.cancelled` +- `egress.failed` +- `interaction.received` +- `subagent.spawning` +- `subagent.spawned` +- `subagent.delivery.resolving` +- `subagent.delivery.resolved` +- `subagent.completed` + +These families intentionally cover the behavior currently spread across `src/plugins/hooks.ts:1`, `src/hooks/internal-hooks.ts:13`, `src/infra/agent-events.ts:3`, and channel monitors. + +## Canonical Event Envelope + +Every event should carry: + +- `eventId` +- `family` +- `occurredAt` +- `workspaceId` +- `agentId` +- `sessionId` +- `accountRef` +- `conversationRef` +- `threadRef` +- `messageRef` +- `sourceContributionId` +- `correlationId` +- `payload` +- `metadata` +- `providerMetadata` +- `hotPath` + +The event envelope is immutable. Mutation happens through stage outputs, not by mutating the event object in place. + +## Handler Classes + +Each handler contribution must declare exactly one class: + +- `observer` +- `augmenter` +- `mutator` +- `veto` +- `resolver` + +### `observer` + +Side effects only. No runtime decision output. + +### `augmenter` + +May attach additional context for downstream stages. 
+ +Examples: + +- prompt context injection +- memory recall summaries +- diagnostics enrichment + +### `mutator` + +May modify a typed working object for the current pipeline stage. + +Examples: + +- prompt build additions +- model override +- tool call decoration + +### `veto` + +May cancel a downstream action with a typed reason. + +Examples today: + +- send cancellation in `extensions/thread-ownership/index.ts:63` + +### `resolver` + +May produce a selected target or route decision. + +Examples today: + +- subagent delivery target selection in `extensions/discord/src/subagent-hooks.ts:103` + +Only `veto` and `resolver` handlers may influence routing or delivery decisions. + +## Execution Modes + +The semantic handler class is not enough by itself. + +Each stage must also declare one of three execution modes: + +- `parallel` + For read-only observers and low-risk side effects. +- `sequential` + For merge, mutation, veto, and resolver stages. +- `sync-sequential` + For transcript and persistence hot paths where async handlers are not allowed. + +This mirrors current `main` behavior in `src/plugins/hooks.ts:199`, `src/plugins/hooks.ts:226`, `src/plugins/hooks.ts:465`, and `src/plugins/hooks.ts:528`. + +## Deterministic Ordering + +Within a stage, handlers run in this order: + +1. explicit priority descending +2. extension id ascending +3. contribution id ascending + +Priority is optional. Ties must resolve deterministically. 
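
The three ordering keys above can be sketched as a plain comparator. This is a minimal illustration, not the kernel's implementation: the `HandlerRef` shape, the helper names, and the choice to treat a missing priority as `0` are all assumptions. Only the key order (priority descending, extension id ascending, contribution id ascending) comes from this spec.

```typescript
// Hypothetical handler descriptor: only the three ordering keys come from the spec.
interface HandlerRef {
  extensionId: string;
  contributionId: string;
  priority?: number; // optional per the spec; defaulting to 0 is an assumption
}

// Plain code-unit comparison keeps the order independent of the host locale.
function compareIds(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

function compareHandlers(a: HandlerRef, b: HandlerRef): number {
  // 1. explicit priority, descending
  const byPriority = (b.priority ?? 0) - (a.priority ?? 0);
  if (byPriority !== 0) return byPriority;
  // 2. extension id, ascending
  const byExtension = compareIds(a.extensionId, b.extensionId);
  if (byExtension !== 0) return byExtension;
  // 3. contribution id, ascending
  return compareIds(a.contributionId, b.contributionId);
}

// Returns a sorted copy so the registered handler list is never mutated.
function orderHandlers<T extends HandlerRef>(handlers: readonly T[]): T[] {
  return handlers.slice().sort(compareHandlers);
}
```

Because the comparator only returns `0` when both ids match, two distinct contributions never tie, so the result does not depend on sort stability.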
+ +## Stage Execution Model + +Every pipeline stage declares: + +- which handler classes are allowed +- execution mode +- whether handlers run in parallel or sequentially +- how outputs are merged +- whether errors fail open or fail closed + +## Gateway And Command Pipeline + +### Stage: `gateway.starting`, `gateway.started`, `gateway.stopping` + +Allowed handler classes: + +- `observer` + +Execution mode: + +- `parallel` + +Purpose: + +- lifecycle telemetry +- startup and shutdown side effects + +### Stage: `command.received`, `command.completed` + +Allowed handler classes: + +- `observer` +- `augmenter` + +Execution mode: + +- `sequential` + +Purpose: + +- command audit +- command lifecycle integration +- operator-visible side effects +- preserve source-surface metadata for chat commands, native commands, and host CLI invocations when those flows are bridged into canonical command events + +Bridge requirement: + +- the current internal hook bus in `src/hooks/internal-hooks.ts:13` +- and the current agent event stream in `src/infra/agent-events.ts:3` + +must be mapped deliberately into canonical families during migration. + +Acceptable end states are: + +- they become compatibility sources that emit canonical events +- or they are fully retired after parity is reached + +An undocumented permanent fourth event system is not acceptable. + +## Ingress Pipeline + +### Stage 1: `ingress.received` + +Input: + +- raw adapter payload + +Allowed handler classes: + +- `observer` + +Execution mode: + +- `parallel` + +Purpose: + +- telemetry +- raw audit +- diagnostics + +### Stage 2: `ingress.normalized` + +Input: + +- normalized inbound envelope from `adapter.runtime.decodeIngress` + +Allowed handler classes: + +- `observer` +- `augmenter` +- `mutator` + +Execution mode: + +- `sequential` + +Purpose: + +- add normalized metadata +- enrich source/account context +- attach pre-routing annotations + +This stage must not choose a route. 
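
A stage declaration and its handler-class check might look like the sketch below. The stage names, allowed classes, and execution modes are transcribed from the gateway, command, and ingress sections above. The `StageSpec` type, the `STAGES` table, and `validateHandler` are hypothetical names used only for illustration.

```typescript
type HandlerClass = "observer" | "augmenter" | "mutator" | "veto" | "resolver";
type ExecutionMode = "parallel" | "sequential" | "sync-sequential";

// Hypothetical declaration shape; merge rules and error policy are omitted for brevity.
interface StageSpec {
  allowed: readonly HandlerClass[];
  mode: ExecutionMode;
}

// A subset of the stages described above.
const STAGES: Record<string, StageSpec> = {
  "gateway.starting": { allowed: ["observer"], mode: "parallel" },
  "command.received": { allowed: ["observer", "augmenter"], mode: "sequential" },
  "ingress.received": { allowed: ["observer"], mode: "parallel" },
  "ingress.normalized": {
    allowed: ["observer", "augmenter", "mutator"],
    mode: "sequential",
  },
};

// Registration-time check: returns an error message, or null when the handler is valid.
function validateHandler(stage: string, handlerClass: HandlerClass): string | null {
  const spec = STAGES[stage];
  if (!spec) return "unknown stage: " + stage;
  if (spec.allowed.indexOf(handlerClass) === -1) {
    return handlerClass + " handlers are not allowed in " + stage;
  }
  return null;
}
```

Under this sketch, a `resolver` registered against `ingress.normalized` is rejected at registration time. That is one way to enforce the rule that the stage must not choose a route.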
+ +### Stage 3: `routing.resolving` + +Allowed handler classes: + +- `augmenter` +- `resolver` +- `veto` + +Execution mode: + +- `sequential` + +Purpose: + +- route lookup +- ownership checks +- subagent delivery target resolution +- policy application before route finalization + +Merge rules: + +- `resolver` outputs produce candidate route decisions +- highest-precedence valid decision wins +- `veto` may cancel route selection + +### Stage 4: `routing.resolved` + +Allowed handler classes: + +- `observer` +- `augmenter` + +Execution mode: + +- `sequential` + +Purpose: + +- emit resolved route metadata +- enrich downstream session context + +### Stage 5: `session.starting` + +Allowed handler classes: + +- `observer` +- `augmenter` +- `mutator` + +Execution mode: + +- `sequential` + +Purpose: + +- bind session context +- attach memory lookup keys +- prepare session-scoped metadata + +### Stage 6: `session.started` + +Allowed handler classes: + +- `observer` + +Execution mode: + +- `parallel` + +Purpose: + +- fire lifecycle observers + +### Stage 7: `agent.starting` + +Allowed handler classes: + +- `observer` +- `augmenter` + +Execution mode: + +- `sequential` + +Purpose: + +- last pre-run annotations + +## Prompt And Model Pipeline + +### Stage: `agent.model.resolving` + +Allowed handler classes: + +- `mutator` + +Execution mode: + +- `sequential` + +Merge rules: + +- first defined model override wins +- first defined provider override wins + +This mirrors current precedence in `src/plugins/hooks.ts:117`. 
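
The "first defined override wins" rule for `agent.model.resolving` can be illustrated with a small merge function. This is a sketch under assumptions: the `ModelResolutionPatch` shape and the function name are hypothetical, and it reads the two merge rules as independent, so the model and provider overrides may come from different handlers.

```typescript
// Hypothetical patch shape; the spec only states that model and provider overrides exist.
interface ModelResolutionPatch {
  model?: string;
  provider?: string;
}

// `patches` must already be in deterministic handler order.
// Each field keeps the first defined value; later overrides are ignored.
function mergeModelOverrides(
  patches: readonly ModelResolutionPatch[],
): ModelResolutionPatch {
  const result: ModelResolutionPatch = {};
  for (const patch of patches) {
    if (result.model === undefined && patch.model !== undefined) {
      result.model = patch.model;
    }
    if (result.provider === undefined && patch.provider !== undefined) {
      result.provider = patch.provider;
    }
  }
  return result;
}
```

If the intended semantics are that a model and provider override must come from the same handler, the two checks would need to be combined. The spec text here does not settle that point.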
+ +### Stage: `agent.prompt.building` + +Allowed handler classes: + +- `augmenter` +- `mutator` + +Execution mode: + +- `sequential` + +Merge rules: + +- static system guidance composes in declared order +- ephemeral prompt additions compose in declared order +- direct system prompt replacement is allowed only for explicitly trusted mutators + +This replaces the ambiguous overlap between `before_prompt_build` and legacy `before_agent_start` in `src/plugins/types.ts:422`. + +### Stage: `agent.llm.input` + +Allowed handler classes: + +- `observer` +- `augmenter` + +Execution mode: + +- `sequential` + +Purpose: + +- provider-call audit +- input usage and prompt metadata capture + +### Stage: `agent.llm.output` + +Allowed handler classes: + +- `observer` +- `augmenter` + +Execution mode: + +- `sequential` + +Purpose: + +- provider response audit +- usage capture +- output enrichment + +## Tool Pipeline + +### Stage: `agent.tool.calling` + +Allowed handler classes: + +- `observer` +- `augmenter` +- `mutator` +- `veto` + +Execution mode: + +- `sequential` + +Purpose: + +- tool policy checks +- argument normalization +- tool-call audit + +### Stage: `agent.tool.called` + +Allowed handler classes: + +- `observer` +- `augmenter` + +Execution mode: + +- `sequential` + +Purpose: + +- result indexing +- memory capture +- diagnostics + +### Stage: `agent.completed` + +Allowed handler classes: + +- `observer` +- `augmenter` + +Execution mode: + +- `sequential` + +Purpose: + +- end-of-run capture +- automatic memory storage +- metrics + +## Persistence Pipeline + +### Stage: `transcript.tool-result.persisting` + +Allowed handler classes: + +- `mutator` + +Execution mode: + +- `sync-sequential` + +Purpose: + +- mutate the tool-result message that will be appended to transcripts + +Rules: + +- async handlers are invalid +- handlers run in deterministic priority order +- each handler sees the previous handler's output + +This is the explicit replacement for today's sync-only 
`tool_result_persist` hook in `src/plugins/hooks.ts:465`. + +### Stage: `transcript.message.writing` + +Allowed handler classes: + +- `mutator` +- `veto` + +Execution mode: + +- `sync-sequential` + +Purpose: + +- final transcript message mutation +- transcript write suppression when explicitly requested + +Rules: + +- async handlers are invalid +- successful veto decisions are terminal +- mutation happens before the final write + +This is the explicit replacement for today's sync-only `before_message_write` hook in `src/plugins/hooks.ts:528`. + +## Compaction And Reset Pipeline + +Canonical stages: + +- `compaction.before` +- `compaction.after` +- `session.resetting` + +## Egress Pipeline + +### Stage 1: `egress.preparing` + +Input: + +- normalized outbound envelope + +Allowed handler classes: + +- `observer` +- `augmenter` +- `mutator` +- `veto` +- `resolver` + +Execution mode: + +- `sequential` + +Purpose: + +- choose provider or account when not explicit +- attach send metadata +- enforce ownership or safety policy + +This stage replaces today’s mixed send hooks and route checks. + +### Stage 2: `egress.sending` + +Allowed handler classes: + +- `observer` + +Execution mode: + +- `parallel` + +Purpose: + +- telemetry and audit before transport send + +### Stage 3: `egress.sent`, `egress.cancelled`, `egress.failed` + +Allowed handler classes: + +- `observer` +- `augmenter` + +Execution mode: + +- `sequential` + +Purpose: + +- post-send side effects +- delivery-state indexing + +## Interaction Pipeline + +Interaction events should not be routed through message hooks. + +Canonical stages: + +- `interaction.received` +- `interaction.resolved` +- `interaction.completed` + +These handle slash commands, button presses, modal submissions, and similar surfaces. 
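
The `sync-sequential` rules for `transcript.message.writing` in the persistence pipeline above can be sketched as a small runner. Everything named here (`TranscriptMessage`, `WriteDecision`, `runMessageWriting`) is hypothetical. Two behaviors are assumptions rather than spec text: handlers that throw are skipped, and a handler that returns a thenable is counted and its result ignored rather than awaited.

```typescript
interface TranscriptMessage {
  role: string;
  text: string;
}

// A handler either allows the write (optionally replacing the message) or cancels it.
type WriteDecision =
  | { kind: "allow"; message?: TranscriptMessage }
  | { kind: "cancel"; reason: string };

type WriteHandler = (message: TranscriptMessage) => WriteDecision;

interface WriteOutcome {
  write: boolean;
  message: TranscriptMessage;
  cancelReason?: string;
  rejectedAsync: number; // handlers whose result was ignored because it was thenable
}

function isThenable(value: unknown): boolean {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { then?: unknown }).then === "function"
  );
}

// `handlers` must already be in deterministic order.
function runMessageWriting(
  message: TranscriptMessage,
  handlers: readonly WriteHandler[],
): WriteOutcome {
  let current = message;
  let rejectedAsync = 0;
  for (const handler of handlers) {
    let raw: unknown;
    try {
      raw = handler(current); // each handler sees the previous handler's output
    } catch (_err) {
      continue; // assumption: a throwing handler is skipped
    }
    if (isThenable(raw)) {
      rejectedAsync += 1; // async handlers are invalid in this stage
      continue;
    }
    const decision = raw as WriteDecision;
    if (decision.kind === "cancel") {
      // a successful veto is terminal: no later handler runs
      return { write: false, message: current, cancelReason: decision.reason, rejectedAsync };
    }
    if (decision.message !== undefined) current = decision.message;
  }
  return { write: true, message: current, rejectedAsync };
}
```

The caller performs the actual transcript write only when `write` is `true`, using the returned `message`, so all mutation happens before the final write.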
+ +## Subagent Pipeline + +The current hook set already proves this needs explicit treatment: + +- `subagent_spawning` +- `subagent_delivery_target` +- `subagent_spawned` +- `subagent_ended` + +The canonical form should be: + +- `subagent.spawning` +- `subagent.spawned` +- `subagent.delivery.resolving` +- `subagent.delivery.resolved` +- `subagent.completed` + +Resolver semantics: + +- multiple candidates may be proposed +- explicit target beats inferred target +- otherwise highest-ranked valid candidate wins + +## Merge Rules + +### Observer + +No merge output. + +### Augmenter + +Produces additive metadata only. + +Conflicts merge by: + +- key append for list-like fields +- last-writer-wins only for fields explicitly marked replaceable + +### Mutator + +Produces typed patch objects. + +Rules: + +- patch schema is stage-specific +- patches apply in deterministic order +- later patches see earlier outputs + +### Veto + +Produces: + +- `allow` +- `cancel` + +Rules: + +- one `cancel` is terminal if the stage is fail-closed +- fail-open stages may ignore veto errors but not successful veto decisions + +### Resolver + +Produces candidate selections. + +Rules: + +- explicit target selectors win +- otherwise rank, policy, and deterministic tie-breakers apply + +## Error Handling + +Per-stage error policy must be explicit. 
+ +Recommended defaults: + +- telemetry and observer stages fail open +- routing and send veto stages fail open unless the contribution declares `failClosed` +- credential or auth mutation stages fail closed +- backend selection stages fail closed when no valid provider remains +- sync transcript stages fail open on handler failure but must still reject accidental async handlers + +## Legacy Hook Mapping + +Current hook names map approximately like this: + +- `before_model_resolve` -> `agent.model.resolving` +- `before_prompt_build` -> `agent.prompt.building` +- `before_agent_start` -> split between `agent.model.resolving` and `agent.prompt.building` +- `llm_input` -> `agent.llm.input` +- `llm_output` -> `agent.llm.output` +- `message_received` -> `ingress.normalized` +- `message_sending` -> `egress.preparing` +- `message_sent` -> `egress.sent` +- `before_tool_call` -> `agent.tool.calling` +- `after_tool_call` -> `agent.tool.called` +- `tool_result_persist` -> `transcript.tool-result.persisting` +- `before_message_write` -> `transcript.message.writing` +- `before_compaction` -> `compaction.before` +- `after_compaction` -> `compaction.after` +- `before_reset` -> `session.resetting` +- `gateway_start` -> `gateway.started` +- `gateway_stop` -> `gateway.stopping` +- `subagent_delivery_target` -> `subagent.delivery.resolving` + +First pilot focus: + +- `thread-ownership` should validate `message_received` and `message_sending` migration into canonical ingress and egress stages +- `telegram` should validate that channel-path runtime behavior can participate in canonical events without reintroducing plugin-shaped kernel seams + +## Immediate Implementation Work + +1. Add canonical event and stage types to the kernel. +2. Build a stage runner with explicit handler-class validation. +3. Add typed patch and veto result contracts per stage, including sync-sequential stages. +4. 
Bridge legacy plugin hooks, internal hooks, and agent events into canonical stages in the extension host only. +5. Record the exact legacy-to-canonical mapping used by `thread-ownership`. +6. Record the exact legacy-to-canonical mapping used by `telegram`. +7. Refactor one channel and one non-channel extension through the new pipeline before broader migration. +8. Decide and document the retirement plan for any legacy event bus that remains after parity is achieved. diff --git a/docs/.internal/extension-host-migration/openclaw-kernel-extension-host-transition-plan.md b/docs/.internal/extension-host-migration/openclaw-kernel-extension-host-transition-plan.md new file mode 100644 index 00000000000..0de1a271ae7 --- /dev/null +++ b/docs/.internal/extension-host-migration/openclaw-kernel-extension-host-transition-plan.md @@ -0,0 +1,1663 @@ +Temporary internal migration note: remove this document once the extension-host migration is complete. + +# OpenClaw Kernel + Extension Host Transition Plan + +Date: 2026-03-15 + +## Purpose + +This document defines a stricter transition plan for OpenClaw: + +- the kernel must contain no plugin-specific code +- bundled extensions must be treated the same as externally installed extensions +- agents must see a clean, canonical catalog of what they can do +- conflicts and parallel providers must be handled explicitly, including multiple active messaging channels for the same agent +- the plan must preserve current functionality such as onboarding metadata, slot-backed providers, and transcript-write hooks + +This is a stricter target than the earlier universal adapter plan. The earlier plan still kept plugin-shaped compatibility concerns too close to core. This version moves those concerns into an extension host layer outside the kernel. + +## TODOs + +- [ ] Confirm the implementation phase order still matches current repo priorities and staffing. 
+- [x] Write the initial boundary cutover inventory for every current plugin-owned surface. +- [ ] Keep the boundary cutover inventory updated as surfaces move. +- [ ] Track which phase has started, is in progress, and is complete. +- [ ] Link each completed phase to the concrete PRs or commits that implemented it. +- [ ] Mark which legacy compatibility shims still exist and which have been removed. +- [ ] Define the detailed pilot migration matrix and parity gates before broader compatibility rollout. +- [ ] Record any intentional scope cuts from the original transition sequence. + +## Implementation Status + +Current status against this transition plan: + +- Phase 0 has started but is not complete. +- Phase 1 has started but is not complete. +- Phase 2 has started in a compatibility-preserving host-boundary form but is not complete. +- Phase 3 onward remains unimplemented. + +What has landed: + +- a new `src/extension-host/*` boundary now exists and owns active registry state +- the legacy plugin runtime now delegates active-registry ownership to the extension host +- registry activation now routes through `src/extension-host/activation.ts` +- initial normalized extension types now exist in code, including `ResolvedExtension`, `ResolvedContribution`, and `ContributionPolicy` +- plugin manifest records now carry a normalized `resolvedExtension` +- a host-owned resolved-extension registry view now exists for static consumers +- an initial Phase 0 cutover inventory now exists in `src/extension-host/cutover-inventory.md` +- plugin SDK alias resolution now routes through `src/extension-host/loader-compat.ts` +- loader provenance, duplicate-order, and warning policy now route through `src/extension-host/loader-policy.ts` +- loader module-export resolution, config validation, and memory-slot load decisions now route through `src/extension-host/loader-runtime.ts` +- loader record-state transitions now route through `src/extension-host/loader-state.ts` +- runtime 
registration normalization has started in `src/extension-host/runtime-registrations.ts` for channel, provider, HTTP-route, gateway-method, tool, CLI, service, command, context-engine, and hook registrations +- several existing consumers now read host-owned normalized data instead of plugin-era manifest or runtime state directly: + - channel and dock lookup surfaces + - message-channel normalization + - plugin HTTP route registry default lookup + - package metadata parsing in discovery and install flows + - channel catalog package metadata parsing + - plugin skill discovery + - plugin auto-enable + - config doc baseline generation + - config validation indexing + +How it was done: + +- by extracting a host-owned active-registry module first +- by turning `src/plugins/runtime.ts` into a compatibility facade rather than breaking existing callers +- by introducing normalized static schema types before changing heavy runtime activation paths +- by letting the legacy manifest registry project into a host-owned resolved-extension shape so existing call sites could migrate incrementally +- by migrating static consumers one by one onto resolved-extension data instead of forcing a single cutover +- by moving the first low-risk runtime writes behind host-owned helpers while keeping `src/plugins/registry.ts` as the compatibility call surface +- by leaving duplicate enforcement in legacy subsystems only where that behavior has not been migrated yet, such as plugin commands +- by moving the first loader-owned compatibility pieces behind host-owned helpers before changing discovery, enablement, or policy flow +- by moving the next loader-owned policy helpers behind host-owned modules while preserving the current load/skip/error behavior +- by moving loader runtime decisions next, while preserving lazy loading, config validation behavior, and memory-slot policy behavior +- by moving loader record-state transitions into host-owned helpers before introducing a full lifecycle state 
machine +- by moving static and lookup-heavy consumers first, where the ownership boundary matters but runtime risk is lower + +What has not landed: + +- keeping the cutover inventory current as more surfaces move +- the lifecycle state machine and remaining loader orchestration +- host-owned registration surfaces beyond the first channel, provider, HTTP-route, gateway-method, tool, CLI, service, command, context-engine, and hook helper slices +- SDK compatibility translation work +- canonical event stages +- canonical capability catalogs +- arbitration migration +- pilot migrations for `thread-ownership` or `telegram` + +## Non-Negotiable End State + +The kernel must not know about: + +- plugins +- plugin manifests +- plugin ids +- plugin installation sources +- bundled versus external origin +- channel-specific runtime namespaces +- legacy `ChannelPlugin` compatibility shapes + +The kernel may only know about: + +- contributions +- capabilities +- adapters +- canonical events +- routing +- policy +- sessions +- agent-visible tools and actions + +The distinction matters: + +- "plugin" is a packaging and lifecycle concern +- "contribution" is a runtime contract + +The extension host handles plugins. The kernel handles contributions. 
+ +## Executive Summary + +Target architecture: + +- `kernel`: pure runtime engine, no plugin-specific concepts +- `extension-host`: discovers extensions, resolves manifests, enforces enablement and conflicts, loads compatibility shims, and emits resolved contributions into the kernel +- `extensions`: all optional functionality, including bundled channels and bundled non-channel features + +Key outcome: + +- built-in and external extensions use the same contribution model +- the kernel sees only resolved runtime contributions +- agent-visible capabilities are compiled centrally from active contributions +- conflicting or overlapping providers are handled by explicit arbitration policies + +How we fix it: + +- lock the boundary first so no new plugin-shaped behavior spreads into the kernel +- add source-of-truth schema and minimal SDK compatibility before broad host migration +- move lifecycle, static metadata, and registry ownership into the extension host +- prove the model with `thread-ownership` first and `telegram` second +- replace fragmented event, catalog, and arbitration behavior with canonical systems +- remove the legacy runtime only after parity is proven and duplicate systems are gone + +Security note: + +- the first host cut is still operating inside OpenClaw's current trusted in-process extension model +- permission descriptors can gate activation, host-owned registries, and operator policy, but they are not a hard sandbox until a real isolation boundary exists + +## Companion Specs + +This plan is the architecture document. 
The concrete implementation contracts now live in companion specs: + +- `openclaw-extension-contribution-schema-spec.md` +- `openclaw-extension-host-lifecycle-and-security-spec.md` +- `openclaw-kernel-event-pipeline-spec.md` +- `openclaw-capability-catalog-and-arbitration-spec.md` + +Together, these close the remaining implementation gaps: + +- exact contribution and manifest schema +- activation, dependency, permission, and persistence rules +- event-stage ordering, merge, veto, and resolver semantics +- capability naming, disambiguation, arbitration, and agent visibility rules + +## Implementation Order + +Implement in this order: + +1. Phase 0: boundary inventory and anti-corruption layer +2. Phase 1: schema, package metadata, and minimal SDK compatibility +3. Phase 2: extension host lifecycle and registries +4. Phase 3: broader legacy compatibility bridges +5. Phase 4: canonical event pipeline +6. Phase 5: catalog migration +7. Phase 6: arbitration migration +8. Phase 7: broader migration and legacy removal + +Why this order: + +- schema and static metadata must exist before cheap-path install, onboarding, and status flows can move +- minimal SDK compatibility must exist before broader schema or host work can safely load current extensions +- host registries must exist before compatibility shims have somewhere correct to land +- event migration depends on the host and compatibility bridges +- catalogs and arbitration are migrations of existing behavior, not greenfield systems +- legacy removal is only safe after pilot parity and compatibility coverage are proven + +## Design Principles + +1. Plugin-agnostic kernel + +The kernel must compile and function without any code that is semantically specific to plugins. + +2. Contributions over registrations + +The runtime contract is a graph of contributed capabilities, not a set of plugin-specific registration methods. + +3. 
Bundled equals external + +Bundled extensions must pass through the same host pipeline as external ones. + +4. Capabilities are first-class + +Agent-facing behavior is built from a canonical capability catalog, not from extension identities. + +5. Conflicts are policy, not accidents + +Name collisions, slot collisions, and overlapping providers must be resolved explicitly in the host. + +6. Parallel providers are normal + +Messaging, directory, and other capabilities may have multiple active providers at the same time. + +7. Compatibility belongs outside the kernel + +Legacy `ChannelPlugin` support and existing plugin API bridges must live in the extension host or shim packages, never in the kernel. + +8. Security language must match the implementation + +Permission descriptors are useful, but they must not be described as a security boundary while extensions still run as trusted in-process code. + +9. Static descriptors stay separate from heavy runtime code + +Install metadata, onboarding labels, docs links, and lightweight channel behavior must remain cheap to load and must not require activating a full adapter runtime. + +10. Performance regressions are architectural bugs + +The host must preserve current lazy-loading, caching, and lightweight shared-path behavior rather than rebuilding the same metadata on every startup path. + +11. Replace existing behavior, do not duplicate it + +Catalog, arbitration, setup, and status migration must absorb the existing partial systems rather than layering a second source of truth beside them. + +## Architecture Overview + +## 1. Kernel + +The kernel is the runtime engine. It owns: + +- canonical message and event types +- adapter runtime contracts +- routing +- session storage semantics +- policy evaluation +- capability catalog compilation +- ingress and egress pipelines +- agent tool visibility and arbitration +- telemetry over canonical events + +The kernel does not load extensions directly. + +## 2. 
Extension Host + +The extension host owns: + +- extension discovery +- bundled extension inventory +- external extension inventory +- manifest parsing +- distribution and install metadata +- lightweight adapter and channel descriptors for onboarding and status UX +- enablement and disablement state +- dependency and version checks +- conflict resolution +- provenance and trust-policy evaluation +- contribution graph assembly +- host-owned route, command, backend, credential, and state registries +- legacy compatibility wrappers +- SDK compatibility and deprecation shims +- error isolation during extension activation + +The host converts extension packages into kernel contributions. + +## 3. Extensions + +Extensions contain actual optional functionality: + +- channels +- provider auth helpers +- memory backends +- agent tools +- action handlers +- directory providers +- monitoring or lifecycle add-ons + +Bundled extensions live in the distribution but are still optional. They are not special from the kernel’s perspective. + +## Current Problems This Plan Solves + +Today, OpenClaw already has a universal registry in `src/plugins/registry.ts:129`, but the runtime still mixes plugin-specific and channel-specific assumptions into core paths. + +Examples: + +- `src/channels/plugins/types.plugin.ts:49` defines a plugin-shaped runtime contract for channels +- `src/plugins/runtime/types-channel.ts:16` exposes a large channel-specific runtime namespace +- many channel monitors manually stitch together route resolution, session recording, context normalization, and reply dispatch, for example `extensions/matrix/src/matrix/monitor/handler.ts:646` + +This creates four problems: + +1. The runtime is not truly uniform. +2. New extensions need deep knowledge of internals. +3. Cross-cutting behavior is duplicated or inconsistently ordered. +4. The kernel is forced to retain channel- and plugin-shaped seams. 
+ +It also leaves important migration constraints that the new design must preserve: + +- prompt-mutation policy controls such as `plugins.entries..hooks.allowPromptInjection` +- path-safety and provenance checks during discovery and load +- lazy startup behavior that avoids loading heavy runtimes on cheap code paths +- existing state layouts that cannot be discarded without migration + +## Target Runtime Model + +## 1. Contributions + +A contribution is the only runtime unit the kernel accepts. + +Suggested categories: + +- `adapter.runtime` +- `capability.messaging` +- `capability.directory` +- `capability.memory` +- `capability.provider-integration` +- `capability.context-engine` +- `capability.agent-tool` +- `capability.interaction` +- `capability.status` +- `capability.setup` +- `capability.policy-augment` +- `capability.event-handler` +- `capability.runtime-backend` + +The kernel consumes contributions through typed contracts. + +## Current Runtime Surfaces That Must Be Cut Over + +Before implementation starts, write and maintain a cutover inventory for every current plugin-owned surface: + +- manifest loading and static metadata +- package-level install and onboarding metadata +- discovery, provenance, duplicate resolution, and origin precedence +- config schema and UI hint loading +- typed hooks and legacy hook bridges +- channels and channel lookup +- providers and provider auth or setup flows +- tools and agent-visible tool catalogs +- HTTP routes and gateway methods +- CLI registrars and plugin commands +- services and context-engine registrations +- slot selection and existing arbitration paths +- status, reload, install, update, and diagnostics surfaces + +Each surface must be tagged as: + +- kernel-owned +- host-owned +- compatibility-only + +No new direct writes to global plugin registries should be added outside the new host boundary once Phase 0 begins. + +## 1a. Extension taxonomy + +The new model must support more than channels. 
"Extension" is the package or bundle unit. One extension may emit one or many contributions. + +Representative extension classes in OpenClaw today: + +- channel or transport extensions +- provider integration extensions +- agent-tool extensions +- memory extensions +- telephony or voice extensions +- background service extensions +- CLI or operator-surface extensions +- status, setup, or config extensions +- context augmentation extensions + +Examples from the current repo: + +- provider auth in `extensions/google-gemini-cli-auth/index.ts:24` +- agent tool plus route plus context augmentation in `extensions/diffs/index.ts:27` +- telephony, gateway methods, tools, CLI, and services in `extensions/voice-call/index.ts:230` +- memory tools, lifecycle handlers, CLI, and service in `extensions/memory-lancedb/index.ts:314` + +The host must allow an extension to emit a mixed contribution set. Channels are only one case. + +## 1b. Contribution families + +The host should normalize extension outputs into a standard set of contribution families. + +Kernel-facing runtime families: + +- `adapter.runtime` +- `capability.agent-tool` +- `capability.control-command` +- `capability.provider-integration` +- `capability.memory` +- `capability.context-engine` +- `capability.context-augmenter` +- `capability.event-handler` +- `capability.route-augmenter` +- `capability.interaction` +- `capability.rpc` +- `capability.runtime-backend` + +Host-managed families: + +- `service.background` +- `surface.cli` +- `surface.config` +- `surface.status` +- `surface.setup` +- `surface.http-route` + +This split is important: + +- the kernel should own runtime behavior +- the host should own discovery, activation, admin surfaces, and compatibility + +## 1c. Mapping from current APIs + +Current extension APIs can be translated into contribution families by the extension host. 
+ +Suggested mapping: + +- `registerChannel(...)` -> `adapter.runtime` plus lightweight dock metadata and optional `surface.config`, `surface.status`, `surface.setup` +- `registerProvider(...)` -> `capability.provider-integration` plus optional setup and auth surfaces +- `registerTool(...)` -> `capability.agent-tool` +- `registerCommand(...)` -> `capability.control-command` +- `on(...)` returning context or side effects -> `capability.context-augmenter` or `capability.event-handler` +- `on(...)` returning route, session-binding, or send-veto decisions -> `capability.route-augmenter` +- `registerGatewayMethod(...)` -> `capability.rpc` +- backend registration used by core subsystems -> `capability.runtime-backend` +- `registerContextEngine(...)` -> `capability.context-engine` +- `registerService(...)` -> `service.background` +- `registerCli(...)` -> `surface.cli` +- `registerHttpRoute(...)` -> `surface.http-route` +- config schema or UI hints -> `surface.config` +- package metadata used for install, onboarding, or channel catalogs -> host-owned static descriptors + +Concrete examples: + +- `extensions/google-gemini-cli-auth/index.ts:25` becomes `capability.provider-integration` +- `extensions/diffs/index.ts:27` becomes `capability.agent-tool` +- `extensions/diffs/index.ts:28` becomes a host-managed route or interaction surface +- `extensions/diffs/index.ts:38` becomes `capability.context-augmenter` +- `extensions/voice-call/index.ts:230` becomes `capability.rpc` and telephony runtime contributions +- `extensions/voice-call/index.ts:377` becomes `capability.agent-tool` +- `extensions/voice-call/index.ts:510` becomes `service.background` +- `extensions/acpx/src/service.ts:55` becomes `capability.runtime-backend` +- `extensions/memory-lancedb/index.ts:314` becomes `capability.agent-tool` +- `extensions/memory-lancedb/index.ts:548` becomes `capability.context-augmenter` +- `extensions/memory-lancedb/index.ts:664` becomes `service.background` +- 
`extensions/phone-control/index.ts:330` becomes `capability.control-command` +- `extensions/thread-ownership/index.ts:63` becomes `capability.route-augmenter` + +## 1d. Lightweight descriptors and distribution metadata + +The host also needs a static descriptor layer that is not the same thing as runtime contributions. + +Current `main` still depends on package metadata and lightweight channel docks for: + +- install and update eligibility +- onboarding and channel picker labels +- docs links and quickstart hints +- status and config surfaces that must stay cheap to load + +Examples on `main`: + +- package metadata in `src/plugins/manifest.ts:121` +- install validation in `src/plugins/install.ts:48` +- channel catalog assembly in `src/channels/plugins/catalog.ts:26` +- lightweight docks in `src/channels/dock.ts:228` + +The revised plan should add host-owned static descriptors for: + +- distribution and install metadata +- onboarding and channel catalog metadata +- lightweight adapter or channel dock metadata +- docs and quickstart hints +- config schema and UI hints used by host config APIs + +These are consumed by the extension host and operator UX only. + +They are not kernel contributions and they must not require loading a heavy adapter runtime. + +Performance requirement: + +- the host should preserve manifest caching, lazy runtime activation, and lightweight dock loading behavior comparable to `src/plugins/manifest-registry.ts:47`, `src/plugins/loader.ts:550`, and `src/channels/dock.ts:228` + +Parity requirement: + +- the static metadata model must preserve the host-visible channel fields used today in `src/plugins/manifest.ts:121` and `src/channels/plugins/catalog.ts:117`, including docs labels, aliases, precedence hints, binding hints, picker extras, and announce-target hints + +## 1e. Event handler classes + +The current plan needs a stronger distinction inside event-driven contributions. Not all handlers are equal. 
+ +The kernel should support these handler classes explicitly: + +- `observer` + Read-only. May emit telemetry or side effects, but cannot affect control flow. +- `augmenter` + Adds context or metadata for later stages. +- `mutator` + May transform payloads. +- `veto` + May block an action such as send or route. +- `resolver` + May authoritatively propose or override a routing or delivery decision. + +This matters because current extensions already rely on these distinctions: + +- `extensions/diffs/index.ts:38` is an augmenter +- `extensions/thread-ownership/index.ts:87` is a veto on send +- `extensions/discord/src/subagent-hooks.ts:103` is a delivery-target resolver + +The plan must define: + +- ordering rules +- merge rules +- veto precedence +- whether multiple resolvers compose or compete + +Without this, the contribution model is too vague to safely replace today’s typed hook behavior. + +The semantic taxonomy is useful, but the runner should stay close to how `main` actually executes hooks. + +The first cut only needs three execution modes: + +- parallel observers +- sequential merge or decision handlers +- sync sequential hot-path handlers for transcript persistence and message writes + +Do not overbuild a more abstract scheduling system until the current hook classes have been migrated. + +Implementation guardrail: + +- keep the richer handler taxonomy as documentation of intent +- do not require separate engine machinery for every handler class in the first implementation + +## 1f. Kernel event families need expansion + +The current event list should be expanded beyond message and session events. 
+ +Additional families needed for parity with the repo: + +- `gateway.started` +- `gateway.stopping` +- `agent.before-start` +- `agent.after-end` +- `agent.llm-input` +- `agent.llm-output` +- `tool.before-call` +- `tool.after-call` +- `transcript.tool-result-persisting` +- `transcript.message-writing` +- `compaction.before` +- `compaction.after` +- `session.resetting` +- `delivery.before-send` +- `delivery.after-send` +- `subagent.spawning` +- `subagent.delivery-target` +- `subagent.ended` +- `command.received` +- `command.completed` + +These are required to cover: + +- memory recall and auto-capture in `extensions/memory-lancedb/index.ts:548` +- send veto or mutation in `extensions/thread-ownership/index.ts:87` +- subagent thread binding in `extensions/discord/src/subagent-hooks.ts:41` +- transcript-write mutations in `src/plugins/hooks.ts:465` +- gateway, command, and agent event streams currently split across `src/hooks/internal-hooks.ts:13` and `src/infra/agent-events.ts:3` + +## 1g. Metadata strategy + +The plan must define how canonical events expose provider-specific metadata without pushing provider branches into the kernel. + +Recommended rule: + +- canonical fields for common semantics +- typed opaque metadata bag for provider-specific details +- metadata access helpers supplied by the contributing adapter or host layer +- lightweight static descriptors for operator-facing install and onboarding metadata + +This is necessary because current extensions depend on provider-specific metadata such as: + +- Slack thread and channel ids in `extensions/thread-ownership/index.ts:67` +- Discord thread-binding details in `extensions/discord/src/subagent-hooks.ts:67` + +## 1h. Stateful and slot-backed runtime providers + +The plan needs explicit categories both for extensions that provide runtime backends consumed by the rest of the system and for exclusive slot-backed providers selected by config. 
+ +ACP is the clearest example: + +- `extensions/acpx/src/service.ts:55` registers a backend consumed elsewhere + +Context engines are the clearest slot-backed example on `main`: + +- slot definitions in `src/plugins/slots.ts:12` +- runtime resolution in `src/context-engine/registry.ts:60` + +These are not ordinary services. They are subsystem providers. The kernel needs a typed way to consume them without knowing about plugins. + +Suggested shape: + +- `capability.runtime-backend` + keyed by backend kind, for example `acp-runtime`, `memory-store`, `queue-owner`, or future execution backends +- `capability.context-engine` + keyed by engine id, with explicit exclusive-slot selection and host-managed defaulting + +Arbitration: + +- usually `exclusive` or `ranked` +- context engines are explicitly `exclusive` +- memory needs both backend arbitration and agent-action arbitration + +Migration requirement: + +- preserve current slot defaults and config semantics during the transition, including `plugins.slots.memory` and `plugins.slots.contextEngine` + +## 1i. HTTP and webhook surfaces + +The current plan did not explicitly separate HTTP route ownership and webhook handling. + +We need both: + +- host-managed HTTP route contributions for extension-owned pages or APIs +- adapter-owned ingress endpoints for transport webhooks + +Examples: + +- `extensions/diffs/index.ts:28` exposes a plugin-owned route +- `extensions/voice-call/index.ts:230` depends on RPC-like gateway methods + +The host must handle: + +- path conflict detection +- auth policy at the route level +- route lifecycle tied to extension activation +- dynamic account-scoped route registration and teardown, not only startup-time static routes + +This is required because `main` already supports runtime route registration and unregister handles in `src/plugins/http-registry.ts:12`. + +## 1j. 
Operator commands versus agent tools + +The plan must distinguish between: + +- agent tools usable by the model +- operator or user commands that bypass the model +- CLI commands for local operators +- CLI onboarding and setup flows for local operators + +Current examples: + +- `extensions/llm-task/index.ts:5` is an agent tool +- `extensions/phone-control/index.ts:330` is a control command +- `extensions/memory-lancedb/index.ts:500` is CLI + +These should not be collapsed into one generic concept. + +Recommended split: + +- `capability.control-command` + chat or native commands that bypass the model on messaging surfaces +- `surface.cli` + local operator CLI commands and subcommands +- `surface.setup` + interactive or non-interactive onboarding and setup flows invoked by host-owned surfaces + +This preserves the distinction that already exists on `main` between plugin CLI registrars, onboarding adapters, and chat control commands. + +Parity rule: + +- `capability.control-command` must preserve `acceptsArgs` matching behavior from `src/plugins/commands.ts:163` +- it must also preserve provider-specific native command names used by native command menus in `src/plugins/commands.ts:320` + +## 1k. Provider integration and auth ownership + +The plan needs to explicitly say where provider discovery, credential persistence, auth UX, and post-selection lifecycle hooks live. + +Provider integration contributions need host-injected capabilities for: + +- prompting +- browser or URL opening +- callback handling +- credential/profile persistence +- config patch application +- discovery order participation +- onboarding and wizard metadata +- token refresh or credential renewal +- model-selected lifecycle hooks + +Example: + +- `extensions/google-gemini-cli-auth/index.ts:25` +- provider plugin contracts in `src/plugins/types.ts:158` + +The kernel should not own interactive auth UX or credential store write policy. 
+ +CLI implication: + +- provider setup, onboarding, and auth flows may be extension-owned in behavior +- but they must execute through host-owned CLI and setup primitives +- the host remains responsible for prompting, credential persistence, config writes, and policy checks + +## 1l. Permission model must match the trust model + +Current `main` treats installed extensions as trusted in-process code: + +- trusted plugin concept in `SECURITY.md:108` +- in-process loading in `src/plugins/loader.ts:621` + +That means the first extension-host cut must not present permission grants as a hard security boundary. + +In the initial host model: + +- permissions gate activation, route exposure, host-managed registries, and operator audit +- permissions do not sandbox arbitrary Node imports, filesystem access, network access, or child processes +- operator UI and docs must describe this as trusted in-process mode + +If OpenClaw later adds a real isolation boundary, keep the same descriptors but add an isolated execution mode where permissions become enforceable. + +Implementation guardrail: + +- phase 1 should implement `advisory` and `host-enforced` +- `sandbox-enforced` should remain a forward-compatible contract until a real isolation boundary exists + +## 1m. Prompt mutation policy parity + +The current runtime has a real policy knob for prompt mutation: + +- `plugins.entries..hooks.allowPromptInjection` in `src/plugins/config-state.ts:14` +- enforcement in `src/plugins/registry.ts:547` + +The new host and kernel split must preserve that behavior explicitly. 
+ +Do not collapse it into a generic permission list and lose the existing distinction between: + +- prompt and model guidance that is allowed +- prompt mutation that is blocked or constrained by operator policy + +Recommended treatment: + +- keep prompt-mutation policy as a dedicated host-managed contribution policy +- apply it when translating legacy hooks and when compiling new event-handler or context-augmenter contributions + +## 1n. SDK compatibility and deprecation plan + +The migration also needs an explicit SDK story. + +Current `main` still depends heavily on: + +- compatibility alias loading in `src/plugin-sdk/root-alias.cjs` +- large runtime compatibility namespaces in `src/plugins/runtime/types-channel.ts:16` + +Decision: + +- introduce one new versioned extension-host SDK as the only target for new extension work +- treat existing `openclaw/plugin-sdk/*` subpaths as compatibility-only +- support at most one or two older SDK contract versions at a time through compatibility shims +- do not add new features to legacy subpaths; only bugfixes and migration bridges are allowed there + +The transition plan should therefore include: + +- a versioned extension-host SDK contract +- compatibility shims for current plugin SDK subpaths +- a deprecation timeline for channel-specific runtime namespaces +- contract tests that prove old extensions still load through the host during migration +- an explicit namespace-by-namespace migration map from `src/plugins/runtime/runtime-channel.ts:119` into the new SDK modules + +Version rule: + +- extensions declare `apiVersion` +- the host validates that version against the supported SDK compatibility window +- legacy compatibility windows should be short and explicit + +## 1o. 
Resolved extension model + +Decision: + +- the extension host should use one `ResolvedExtension` object as the canonical internal data model +- that object must separate cheap static metadata from runtime-activated state + +Suggested shape: + +```ts +type ResolvedExtension = { + id: string; + version: string; + apiVersion: string; + source: { + origin: "bundled" | "global" | "workspace" | "config"; + path: string; + provenance?: string; + }; + static: { + install?: unknown; + catalog?: unknown; + docks?: unknown; + docs?: unknown; + setup?: unknown; + config?: unknown; + }; + runtime: { + contributions: unknown[]; + services: unknown[]; + routes: unknown[]; + policies: unknown[]; + stateOwnership: unknown; + }; +}; +``` + +Registries are then built from that object: + +- static registry for install, onboarding, and lightweight UX paths +- runtime registry for activated contributions and services + +This keeps lifecycle and provenance coherent while preserving cheap shared-path access. + +The static section should also be able to carry host-consumed config schema and UI hints so config APIs can preserve redaction-aware schema behavior without activating runtime code. + +Implementation guardrail: + +- start with one `ResolvedExtension` model and two registries +- do not build extra registry layers unless a migration step proves they are needed + +## 2. 
Capability descriptors + +Every contribution must describe: + +- stable contribution key +- capability kind +- public names and aliases +- scope +- exclusivity model +- precedence hints +- selection rules +- dependencies on other contributions +- agent visibility metadata + +Example shape: + +```ts +type ContributionDescriptor = { + key: string; + kind: string; + names?: string[]; + aliases?: string[]; + scope?: "global" | "agent" | "session" | "channel" | "account"; + arbitration: "exclusive" | "ranked" | "parallel" | "composed"; + priority?: number; + dependsOn?: string[]; + agentVisible?: boolean; + description?: string; + policy?: { + promptMutation?: "none" | "append-only" | "replace-allowed"; + routeEffect?: "observe-only" | "augment" | "veto" | "resolve"; + failureMode?: "fail-open" | "fail-closed"; + executionMode?: "parallel" | "sequential" | "sync-sequential"; + }; +}; +``` + +Decision: + +- these policy fields should be typed in the first foundation cut +- do not use an unstructured policy blob for the behaviors that affect safety and runtime semantics + +## 3. Adapters + +In this model, a channel is not a special plugin subtype. It is an adapter contribution plus related optional descriptors. + +An adapter runtime contribution should include only transport behavior: + +- normalize ingress events +- send and manage outbound messages +- optional fetch APIs +- optional interaction surfaces +- optional account lifecycle hooks + +It may also expose typed behavioral descriptors for shared channel UX concerns such as: + +- typing or presence behavior +- status reactions or delivery feedback +- thread defaults and reply context +- streaming and draft delivery behavior +- history or context hints needed by shared pipelines +- reload hints for config-driven hot restart or no-op handling +- gateway feature descriptors for method advertisement when needed during migration + +It must not own routing, session semantics, pairing, or agent dispatch. 
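The transport-only boundary described above might look roughly like the following. This is a sketch under stated assumptions: `AdapterRuntime`, `CanonicalIngressEvent`, `OutboundMessage`, `SendResult`, and `toyAdapter` are hypothetical names, and `send` is synchronous here only to keep the example minimal (a real transport would almost certainly be asynchronous).

```ts
// Hypothetical sketch: all type and member names are illustrative
// assumptions, not the actual OpenClaw adapter contract.
type CanonicalIngressEvent = {
  adapterId: string;
  accountId: string;
  text: string;
  // Provider-specific details stay in an opaque bag (see 1g).
  metadata: Record<string, unknown>;
};

type OutboundMessage = { target: string; text: string };
type SendResult = { ok: boolean };

interface AdapterRuntime<Raw> {
  readonly adapterId: string;
  // Transport only: translate a provider payload into a canonical event.
  normalizeIngress(accountId: string, raw: Raw): CanonicalIngressEvent;
  // Transport only: deliver a message that the kernel has already routed.
  // Synchronous for brevity; assume a real adapter would be async.
  send(accountId: string, message: OutboundMessage): SendResult;
  // Optional account lifecycle hooks.
  startAccount?(accountId: string): void;
  stopAccount?(accountId: string): void;
  // Deliberately absent: routing, session semantics, pairing, agent dispatch.
}

// A toy in-memory adapter used only to exercise the interface.
type ToyPayload = { body: string; threadId?: string };

const toyAdapter: AdapterRuntime<ToyPayload> = {
  adapterId: "toy",
  normalizeIngress(accountId: string, raw: ToyPayload): CanonicalIngressEvent {
    const metadata: Record<string, unknown> = {};
    if (raw.threadId !== undefined) {
      metadata["threadId"] = raw.threadId;
    }
    return { adapterId: "toy", accountId, text: raw.body, metadata };
  },
  send(_accountId: string, message: OutboundMessage): SendResult {
    // Pretend delivery succeeds for any non-empty message.
    return { ok: message.text.length > 0 };
  },
};
```

The interface has no routing or session members at all, so an adapter written against it cannot take over those concerns by accident.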
+ +Clarification: + +- adapter-level gateway descriptors are advertisement or compatibility metadata only +- callable gateway-style methods still map to `capability.rpc` +- this keeps `registerGatewayMethod(...)` on one migration path instead of splitting callable behavior across adapter and RPC surfaces + +The host-side dock or adapter descriptor must still be rich enough to preserve current cheap-path behavior from `src/channels/dock.ts:56`, including: + +- command gating hints +- allow-from formatting and default-target helpers +- threading defaults +- elevated fallbacks +- agent prompt hints such as `messageToolHints` + +Implementation guardrail: + +- preserve current cheap-path behavior +- do not design a broad adapter metadata platform beyond the fields needed for parity + +## 4. Canonical events + +Everything in the kernel flows through canonical events. + +Suggested event families: + +- `gateway.started` +- `gateway.stopping` +- `command.received` +- `command.completed` +- `ingress.received` +- `ingress.normalized` +- `routing.resolving` +- `routing.resolved` +- `session.starting` +- `session.started` +- `session.resetting` +- `agent.model.resolving` +- `agent.prompt.building` +- `agent.llm.input` +- `agent.llm.output` +- `agent.tool.calling` +- `agent.tool.called` +- `transcript.tool-result.persisting` +- `transcript.message.writing` +- `agent.completed` +- `egress.preparing` +- `egress.sent` +- `egress.failed` +- `interaction.received` +- `account.started` +- `account.stopped` +- `subagent.spawning` +- `subagent.delivery.resolving` +- `subagent.completed` + +All cross-cutting logic should attach to these events. + +Migration note: + +- the current internal hook bus in `src/hooks/internal-hooks.ts:13` +- and the agent event stream in `src/infra/agent-events.ts:3` + +must either be explicitly bridged into this event model or explicitly retired after parity is reached. Do not leave them as undocumented parallel systems. + +## 5. 
Agent-visible capability catalog + +Agents must not see plugins. They must see what they can do in the current context. + +The kernel should compile a catalog from active contributions plus runtime context: + +- current adapter +- current adapter action support +- current account +- current route +- current session +- policy +- permission checks +- arbitration outcome + +Canonical action governance: + +- canonical action ids remain open, namespaced strings such as `message.send` or `interaction.modal.open` +- the kernel should keep one source-of-truth registry for reviewed core action families +- if a new feature fits an existing semantic family, reuse that action id +- if the semantics are new, add a reviewed canonical action id to the core registry +- plugins must not invent new kernel-level arbitration semantics on their own + +Examples of catalog entries: + +- `message.send` +- `message.reply` +- `message.broadcast` +- `message.poll` +- `directory.lookup` +- `message.react` +- `message.edit` +- `message.delete` +- `message.pin` +- `memory.store` +- `memory.search` +- `interaction.modal.open` + +The catalog may include provider hints only when needed for disambiguation. + +Example: + +- `message.send` with selectors `target`, `provider`, `account` +- `message.reply` without any provider selector if the route already determines it + +## Conflict and Parallelism Model + +This is a first-class requirement. + +There are two separate conflict domains. + +## 1. Host-level conflicts + +These are resolved before contributions reach the kernel. + +Examples: + +- two extensions claim the same exclusive slot +- two extensions claim the same public command name +- two extensions claim the same contribution key + +Host policy options: + +- reject activation +- require explicit operator selection +- rank by configured priority +- rename or alias one contribution if allowed + +## 2. 
Kernel-level capability arbitration + +These are runtime selection questions, not activation errors. + +Examples: + +- multiple active messaging providers +- multiple directory providers +- multiple memory providers +- multiple prompt or policy augmenters + +The kernel must support four arbitration modes. + +### Exclusive + +Only one provider may be active. + +Use for: + +- default session store backend +- default memory backend if multiple are not supported + +### Ranked + +Multiple providers may exist, but one becomes the default unless explicitly overridden. + +Use for: + +- tool implementations with sensible fallback ordering + +### Parallel + +Multiple providers are equally valid and may coexist simultaneously. + +Use for: + +- messaging channels +- directory providers +- channel-specific action providers + +### Composed + +Multiple providers contribute to a shared pipeline. + +Use for: + +- prompt augmentation +- event enrichment +- delivery observers + +## Messaging is Parallel by Design + +This is critical. + +OpenClaw must support one agent receiving and sending through multiple channels and accounts simultaneously. Therefore: + +- messaging cannot be modeled as an exclusive capability +- messaging providers must be scoped by adapter id and account id +- routing and target resolution determine which provider is used for a given action + +Suggested provider identity: + +- `messaging:slack:work` +- `messaging:telegram:default` +- `messaging:whatsapp:personal` + +Selection order for outbound: + +1. explicit target or explicit provider +2. current conversation route +3. session last-route +4. configured default binding +5. 
operator-defined fallback + +Inbound selection is simpler: + +- the ingress event arrives via one adapter provider +- routing resolves which agent and session receive it +- the reply path inherits that provider unless explicitly overridden + +## Naming and Agent Visibility Rules + +The host should separate human-facing extension names from kernel-facing capability names. + +Extension metadata can say: + +- package id +- extension id +- display name + +But the kernel should compile normalized capability names: + +- `message.send` +- `message.reply` +- `directory.lookup` +- `memory.search` + +Conflicts in agent-visible names should be handled by the host and capability compiler. + +Rules: + +1. Prefer one canonical capability name for semantically equivalent actions. +2. Preserve provider identity as metadata, not as the primary tool name. +3. If two providers must both be visible, expose one canonical action with a provider selector instead of two arbitrarily named tools. +4. Only expose separate tools when the semantics are genuinely different. + +Example: + +Bad: + +- `send_slack_message` +- `send_telegram_message` +- `send_discord_message` + +Better: + +- `message.send` + with optional `provider` and `account` + +Best in-context: + +- `message.reply` +- `message.send` + +Where route-derived provider selection means the agent usually does not need to name the concrete messaging provider explicitly. 
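The naming rules above can be sketched as a small catalog compiler that collapses provider-specific actions into one canonical entry and exposes a provider selector only when it is needed for disambiguation. Everything here is an assumption made for illustration: `ProviderAction`, `CatalogEntry`, `compileAgentCatalog`, the three selector states, and the `memory:default` provider id are hypothetical and do not come from the existing codebase.

```ts
// Hypothetical sketch: type and function names are illustrative assumptions.
type ProviderAction = {
  action: string; // canonical action id, e.g. "message.send"
  providerId: string; // provider identity, e.g. "messaging:slack:work"
};

type CatalogEntry = {
  action: string;
  // Provider identity is kept as metadata, not as the tool name.
  providers: string[];
  providerSelector: "none" | "optional" | "required";
};

function compileAgentCatalog(
  active: ProviderAction[],
  routeProviderId?: string,
): CatalogEntry[] {
  const order: string[] = [];
  const providersByAction: Record<string, string[]> = Object.create(null);

  // Rule 1: group semantically equivalent actions under one canonical name.
  for (const { action, providerId } of active) {
    let providers = providersByAction[action];
    if (providers === undefined) {
      providers = [];
      providersByAction[action] = providers;
      order.push(action);
    }
    if (providers.indexOf(providerId) === -1) {
      providers.push(providerId);
    }
  }

  return order.map((action): CatalogEntry => {
    const providers = providersByAction[action] || [];
    const routeDecides =
      routeProviderId !== undefined && providers.indexOf(routeProviderId) !== -1;
    // Rule 3: one canonical action with a selector instead of N named tools.
    // The selector is dropped when only one provider exists, and becomes
    // optional when the current route already determines the provider.
    const providerSelector: CatalogEntry["providerSelector"] =
      providers.length <= 1 ? "none" : routeDecides ? "optional" : "required";
    return { action, providers, providerSelector };
  });
}

// Usage: two messaging providers plus one (hypothetical) memory provider.
const exampleActive: ProviderAction[] = [
  { action: "message.send", providerId: "messaging:slack:work" },
  { action: "message.send", providerId: "messaging:telegram:default" },
  { action: "memory.search", providerId: "memory:default" },
];
const catalogWithoutRoute = compileAgentCatalog(exampleActive);
const catalogWithRoute = compileAgentCatalog(exampleActive, "messaging:slack:work");
```

In this sketch the agent sees a single `message.send` entry in both cases. Only the selector requirement changes with context, in line with the "Best in-context" example.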
+ +## Proposed Module Layout + +### Kernel + +Suggested new top-level structure: + +- `src/kernel/events/` +- `src/kernel/types/` +- `src/kernel/ingress/` +- `src/kernel/egress/` +- `src/kernel/routing/` +- `src/kernel/sessions/` +- `src/kernel/policy/` +- `src/kernel/catalog/` +- `src/kernel/runtime/` + +### Extension Host + +- `src/extension-host/discovery/` +- `src/extension-host/manifests/` +- `src/extension-host/enablement/` +- `src/extension-host/conflicts/` +- `src/extension-host/static/` +- `src/extension-host/install/` +- `src/extension-host/policy/` +- `src/extension-host/contributions/` +- `src/extension-host/compat/` +- `src/extension-host/activation/` + +### Extensions + +- `extensions/*` + +The current plugin system should be gradually absorbed into `extension-host`, not `kernel`. + +## Transition Strategy + +This must be compatibility-first, but compatibility must live outside the kernel. + +## Phase 0: Boundary Inventory And Anti-Corruption Layer + +Objective: + +Make the boundary explicit before implementation begins and prevent further spread of legacy plugin assumptions. 
+ +Tasks: + +- write an ADR defining `kernel` versus `extension-host` +- define a rule that kernel code may not import from `src/plugins`, `src/plugin-sdk`, or `extensions/*` +- write the boundary cutover inventory for every current plugin-owned surface +- define which surfaces are kernel-owned, host-owned, or compatibility-only +- add anti-corruption interfaces so new work cannot write directly into global plugin registries +- document the trusted in-process security model so permission descriptors are not misrepresented +- define the compatibility and deprecation strategy for the existing plugin SDK surface +- add feature flags for host-path versus legacy-path execution where needed for staged rollout + +Exit criteria: + +- the boundary is explicit and testable +- every current plugin-owned surface is tagged with its target owner +- no new direct kernel dependencies on legacy plugin shapes are introduced + +Current implementation status: + +- partially implemented +- the anti-corruption boundary now exists in code through `src/extension-host/active-registry.ts` +- several central readers now go through that boundary +- the initial cutover inventory now exists in `src/extension-host/cutover-inventory.md` and is being updated as surfaces move, but the phase is still incomplete because loader orchestration, lifecycle ownership, and later compatibility phases have not moved yet + +## Phase 1: Schema, Static Metadata, And Minimal SDK Compatibility + +Objective: + +Create the host data model and preserve extension loading while the boundary changes. 
+ +Tasks: + +- define the `ResolvedExtension`, `ResolvedContribution`, and `ContributionPolicy` types +- define canonical contribution descriptors, slot-backed provider types, and catalog-facing metadata types +- add source-of-truth manifest parsing for runtime contributions +- add package metadata parsing for install, onboarding, and lightweight operator UX +- define the new versioned SDK contract and supported compatibility window +- add minimal SDK compatibility loading so current `openclaw/plugin-sdk/*` imports still resolve while the host work lands +- define the static versus runtime sections of `ResolvedExtension` +- preserve config schema and UI hint parsing without activating heavy runtimes + +Deliverables: + +- source-of-truth schema types +- static metadata parser +- package metadata parser +- compatibility-loading surface for the current SDK imports + +Exit criteria: + +- extensions can be normalized into static and runtime sections without activating heavy runtime code +- existing extensions still load through the compatibility loading path + +Current implementation status: + +- partially implemented +- a normalized static model exists in code through `ResolvedExtension` +- package metadata and manifest metadata now converge into host-owned normalized records +- discovery and install metadata parsing now go through host schema helpers +- partial explicit compatibility now exists through host-owned loader-compat and loader-runtime helpers, but a versioned minimal SDK compatibility layer still does not exist + +## Phase 2: Extension Host Lifecycle And Registries + +Objective: + +Move lifecycle and ownership concerns into the host. 
+ +Tasks: + +- create extension discovery and manifest loaders +- move enablement logic into host +- move bundled extension inventory into host +- move install and onboarding metadata into host-owned static descriptors +- move contribution assembly into host +- move provenance, origin precedence, and slot policy into host +- implement contribution graph validation +- implement host-owned registries for hooks, channels, providers, tools, HTTP routes, gateway methods, CLI, services, commands, config, setup, status, backends, and slot-backed providers +- implement per-extension state and route registries +- preserve path-safety, provenance, duplicate-origin hardening, startup laziness, and manifest caching behavior +- keep current built-in onboarding fallbacks in place during early migration +- preserve current setup adapter phases such as status, configure, reconfigure, disable, and DM policy handling + +Deliverables: + +- `src/extension-host/*` +- host-owned static registry +- host-owned runtime registry + +Exit criteria: + +- the host can discover bundled and external extensions, preserve static metadata, and populate normalized registries +- registries are populated through host-owned interfaces rather than direct legacy global writes + +Current implementation status: + +- partially implemented in a compatibility-preserving form +- the host now owns active registry state +- the host now exposes resolved static registries for static consumers +- activation, loader policy, loader runtime decisions, and loader record-state helpers now route through `src/extension-host/*` +- broader lifecycle ownership, registration surfaces, policy gates, and activation-state management are still pending + +## Phase 3: Broader Legacy Compatibility Bridges + +Objective: + +Keep current extensions working through the host without leaking legacy contracts into the kernel. 
+ +Tasks: + +- implement `ChannelPlugin` compatibility shims in `src/extension-host/compat/` +- adapt current plugin registrations into contribution descriptors +- translate current config schema and UI hint registration into `surface.config` +- translate existing plugin CLI registrars and onboarding adapters into `surface.cli` and `surface.setup` +- adapt existing gateway and status surfaces into host-level descriptors +- adapt current package metadata and channel docks into host-owned static descriptors +- preserve prompt-mutation policy enforcement when translating legacy hooks +- preserve config redaction semantics driven by `config.uiHints.sensitive` +- preserve hot reload, no-op config prefix behavior, and gateway feature advertisement where those behaviors still exist +- add compatibility translation for current runtime-channel helper namespaces into the new SDK modules +- maintain a parity matrix for each pilot migration + +Important rule: + +No legacy `ChannelPlugin` type or shim code may appear under `src/kernel/`. + +Exit criteria: + +- `thread-ownership` runs through the host path as the first non-channel pilot +- `telegram` runs through the host path as the first channel pilot +- both pilots have explicit parity results for discovery, config, activation, diagnostics, and runtime behavior + +## Phase 4: Canonical Event Pipeline + +Objective: + +Move runtime behavior onto explicit canonical events and stage rules. 
+ +Tasks: + +- define canonical event and stage types +- define sync transcript-write stages explicitly +- bridge current plugin hooks, internal hooks, and agent event streams into canonical stages in the extension host only +- map legacy typed hooks into canonical stage semantics +- keep permission descriptors host-owned and policy-oriented until real isolation exists +- move compatibility facades into extension-host shims rather than adding new kernel leakage + +Exit criteria: + +- pilot extensions use canonical event stages with parity to current behavior +- any remaining legacy event buses are explicitly documented as compatibility-only + +## Phase 5: Catalog Migration + +Objective: + +Replace plugin-identity-driven catalog behavior with canonical family-based catalogs. + +Tasks: + +- compile active contributions into kernel internal and kernel agent catalogs +- publish host-owned operator and static setup catalogs +- migrate existing tool, provider, setup, and onboarding catalog surfaces onto canonical or host-owned catalog paths +- resolve naming conflicts +- collapse equivalent provider-specific actions into canonical agent tools where appropriate +- add explicit provider selection only when needed +- preserve dedicated prompt-mutation policy filtering during catalog compilation where relevant + +Implementation guardrail: + +- start with one kernel internal catalog, one kernel agent catalog, and host-owned operator or static registries +- do not build a larger publication framework until the registries are stable + +Exit criteria: + +- agent-visible tool inventory is generated from contribution metadata and kernel context +- setup and install catalogs no longer depend on duplicated legacy metadata paths + +## Phase 6: Arbitration Migration + +Objective: + +Absorb the existing conflict-resolution behavior into explicit arbitration. 
+ +Tasks: + +- implement host-level activation conflict resolution +- implement kernel-level runtime provider selection +- migrate existing slot selection and provider selection logic onto canonical arbitration +- add explicit selection APIs for provider-scoped actions +- ensure session route and last-route semantics interact correctly with parallel messaging providers +- cover messaging parallelism, directory overlap, memory backend exclusivity, context-engine slot exclusivity, composed prompt or policy augmenters, and dynamic route conflicts + +Exit criteria: + +- at least one multi-provider family works through canonical arbitration +- legacy slot and provider-selection paths no longer operate as separate arbitration systems + +## Phase 7: Broader Migration And Legacy Removal + +Objective: + +Finish the cutover and remove compatibility-only surfaces in a controlled order. + +Tasks: + +- migrate remaining channels and non-channel extensions in batches +- remove legacy plugin registry entry points once no longer needed +- deprecate `runtime.channel` +- deprecate per-channel SDK subpaths where a neutral replacement exists +- retain only thin compatibility packages until extension migration is complete +- remove the built-in onboarding fallback only after host-owned setup surfaces reach parity for bundled channels + +Suggested second-wave compatibility candidates after the initial pilots: + +- `line` for channel plus command registration +- `device-pair` for command, service, and setup-flow coverage + +Exit criteria: + +- built-in channels behave like ordinary extensions through the host +- the legacy plugin runtime is no longer the default execution path +- kernel no longer imports old plugin infrastructure + +## Pilot Matrix + +Initial pilots: + +- non-channel pilot: `thread-ownership` +- channel pilot: `telegram` + +Why this order: + +- `thread-ownership` exercises typed hook behavior with limited surface area +- `telegram` exercises the `ChannelPlugin` 
compatibility path with a minimal top-level registration surface + +Each pilot must record parity for: + +- discovery and precedence +- manifest and static metadata loading +- config schema and UI hints +- enabled and disabled state handling +- activation and reload behavior +- diagnostics and status output +- runtime behavior on the migrated path +- compatibility-only gaps that still remain + +## Concrete Refactoring Targets + +The following current areas should be moved or replaced. + +### Move out of kernel + +- plugin registry logic now in `src/plugins/registry.ts:129` +- plugin loader logic now in `src/plugins/loader.ts:37` +- plugin runtime channel namespace in `src/plugins/runtime/types-channel.ts:16` +- direct plugin-specific API types in `src/plugins/types.ts:263` + +### Replace with neutral kernel services + +- route resolution entrypoints currently in `src/routing/resolve-route.ts:614` +- outbound pipeline seed in `src/infra/outbound/deliver.ts:141` +- session recording flow currently called by many channels +- canonical hook or event dispatch ordering +- transcript persistence and message-write stages currently embedded in `src/plugins/hooks.ts:465` + +### Keep only in host compatibility + +- `ChannelPlugin` contract in `src/channels/plugins/types.plugin.ts:49` +- plugin SDK subpath facades like `src/plugin-sdk/telegram.ts:1` + +## Verification Strategy + +## Boundary tests + +- kernel has no imports from `src/plugins`, `src/plugin-sdk`, or `extensions/*` +- host may import kernel, but kernel may not import host + +## Contract tests + +- contribution contract tests +- arbitration contract tests +- capability catalog tests +- adapter runtime contract tests + +## Behavior parity tests + +- route resolution parity +- session transcript parity +- message hook or event ordering parity +- outbound payload parity +- multi-account parity +- install and onboarding catalog parity +- context-engine slot parity +- sync transcript-write parity +- prompt-mutation 
policy parity +- path-safety and provenance parity +- startup-cost parity for lightweight UX paths + +## Parallel provider tests + +- one agent active on Slack and Telegram simultaneously +- reply follows inbound route by default +- explicit cross-channel send works +- session last-route does not break when multiple messaging providers are active + +## Conflict tests + +- duplicate contribution key +- duplicate exclusive slot +- duplicate agent-visible tool alias +- two ranked providers with clear default resolution + +## Operational Plan + +1. Introduce kernel and host boundaries first. +2. Add import guards and the boundary cutover inventory so the boundary cannot regress. +3. Add source-of-truth schema types, static metadata parsing, and minimal SDK compatibility loading. +4. Move plugin lifecycle and registry ownership into the host without behavior changes. +5. Add compatibility shims in the host and record pilot parity as each surface moves. +6. Migrate `thread-ownership` through the host path first. +7. Migrate `telegram` through the host path second. +8. Add canonical event routing for the pilot surfaces. +9. Migrate existing catalog and arbitration paths rather than adding parallel ones. +10. Migrate remaining extensions in batches. +11. Start deprecating old plugin-facing runtime surfaces. + +## Risks + +Risk: + +The contribution model becomes too abstract and hard to use. + +Mitigation: + +Provide good host-side helpers and templates. Keep kernel contracts narrow and transport-focused. + +Risk: + +Agent-visible catalog becomes confusing when many providers are active. + +Mitigation: + +Use canonical actions first, provider selectors second, provider-specific names only as a last resort. + +Risk: + +Parallel messaging providers create routing ambiguity. + +Mitigation: + +Define and test explicit outbound selection order. Route and session metadata must always carry adapter and account identity. 
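The explicit outbound selection order called for in the mitigation above can be sketched as follows. This is a minimal illustration, not code from the OpenClaw tree: `RouteRef`, `OutboundContext`, and `selectOutboundRoute` are hypothetical names, and ranking session last-route after the inbound route is an assumption. The document itself only states that replies follow the inbound route by default and that explicit cross-channel sends must work.

```typescript
// Hypothetical types for illustration only; none of these names exist in the codebase.
interface RouteRef {
  adapterId: string; // e.g. "telegram" or "slack"
  accountId: string; // account on that adapter, so multi-account setups stay unambiguous
}

interface OutboundContext {
  explicitTarget?: RouteRef; // caller requested a specific provider (cross-channel send)
  inboundRoute?: RouteRef; // route the triggering message arrived on
  lastRoute?: RouteRef; // session last-route
  activeRoutes: RouteRef[]; // routes currently active for this agent
}

type Selection =
  | { ok: true; route: RouteRef; reason: "explicit" | "inbound" | "last-route" }
  | { ok: false; error: string };

const sameRoute = (a: RouteRef, b: RouteRef): boolean =>
  a.adapterId === b.adapterId && a.accountId === b.accountId;

function selectOutboundRoute(ctx: OutboundContext): Selection {
  const isActive = (r: RouteRef): boolean =>
    ctx.activeRoutes.some((candidate) => sameRoute(candidate, r));

  // 1. An explicit target wins, and fails loudly instead of falling back silently.
  if (ctx.explicitTarget) {
    return isActive(ctx.explicitTarget)
      ? { ok: true, route: ctx.explicitTarget, reason: "explicit" }
      : { ok: false, error: `target ${ctx.explicitTarget.adapterId} is not active` };
  }

  // 2. By default the reply follows the inbound route.
  if (ctx.inboundRoute && isActive(ctx.inboundRoute)) {
    return { ok: true, route: ctx.inboundRoute, reason: "inbound" };
  }

  // 3. Assumed fallback: the session last-route, if it is still active.
  if (ctx.lastRoute && isActive(ctx.lastRoute)) {
    return { ok: true, route: ctx.lastRoute, reason: "last-route" };
  }

  // 4. Never guess between parallel providers.
  return { ok: false, error: "no unambiguous outbound route" };
}

// Usage: one agent active on Telegram and Slack at the same time.
const telegram: RouteRef = { adapterId: "telegram", accountId: "bot-1" };
const slack: RouteRef = { adapterId: "slack", accountId: "workspace-a" };
const active = [telegram, slack];

const reply = selectOutboundRoute({ inboundRoute: telegram, lastRoute: slack, activeRoutes: active });
const crossSend = selectOutboundRoute({ explicitTarget: slack, inboundRoute: telegram, activeRoutes: active });
```

Returning an error for an inactive explicit target, rather than falling through to the inbound route, keeps a cross-channel send from landing on the wrong provider without the caller noticing.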
+ +Risk: + +Compatibility shims silently leak old plugin assumptions back into the kernel. + +Mitigation: + +Enforce import boundaries with CI and keep all legacy code under the host only. + +Risk: + +The cutover inventory misses one of the current plugin-owned surfaces, so behavior quietly stays on the legacy path. + +Mitigation: + +Treat the boundary cutover inventory as a tracked artifact, update it before changing ownership, and require each pilot to mark which surfaces are full parity, partial parity, or still compatibility-only. + +Risk: + +Bundled extensions are treated as privileged again over time. + +Mitigation: + +Run bundled extensions through the same host activation and contribution pipeline as external extensions. + +Risk: + +Permission descriptors overpromise security that the runtime does not yet provide. + +Mitigation: + +Keep permission language explicitly policy-oriented until OpenClaw ships a real isolation boundary. + +Risk: + +The migration drops current onboarding, install, or lightweight dock behavior while focusing only on runtime contributions. + +Mitigation: + +Treat static host descriptors as a first-class part of the migration, with parity tests for channel catalogs and onboarding flows. + +Risk: + +The host adds enough abstraction to regress startup cost or force heavy adapter loads on shared code paths. + +Mitigation: + +Make lazy activation, manifest caching, and lightweight dock descriptors explicit success criteria and test them. + +Risk: + +The migration breaks existing extensions because the SDK compatibility story is under-specified. + +Mitigation: + +Ship a versioned SDK contract, compatibility shims, and an explicit deprecation timeline before removing old subpaths. + +Risk: + +Catalog and arbitration migration leaves legacy tool, provider, or slot-selection systems running in parallel with the new model. + +Mitigation: + +Treat Phase 5 and Phase 6 as replacement work. 
Track the current tool catalog, provider-selection, and slot-selection paths explicitly and do not declare those phases complete until the duplicate systems are removed or downgraded to documented compatibility-only shims. + +## Suggested First PRs + +PR 1: + +Add `kernel` and `extension-host` directory structure, boundary ADR, import guards, and the boundary cutover inventory. + +PR 2: + +Define `ResolvedExtension`, `ResolvedContribution`, static metadata types, and the minimal SDK compatibility-loading surface. + +PR 3: + +Move existing plugin discovery, manifest parsing, provenance handling, and registry ownership into `extension-host` while preserving behavior. + +PR 4: + +Add host-side compatibility shim for current hook, provider, and `ChannelPlugin` surfaces. + +PR 5: + +Migrate `thread-ownership` through the new host-to-kernel path with explicit parity tracking. + +PR 6: + +Migrate `telegram` through the new host-to-kernel path with explicit parity tracking. + +## Success Criteria + +This transition succeeds when all of the following are true: + +- the kernel contains no plugin-specific concepts +- bundled and external extensions activate through the same host pipeline +- agent-visible capabilities are compiled centrally from active contributions +- duplicate or overlapping providers are resolved through explicit arbitration +- one agent can receive and send across multiple active messaging providers cleanly +- install, onboarding, and lightweight dock metadata still work through host-owned static descriptors +- context-engine and memory slot behavior are preserved through explicit slot-backed contributions +- transcript-write hooks are preserved through explicit canonical stages +- prompt-mutation policy behavior is preserved through explicit host policy +- startup-time lightweight paths do not force heavy runtime activation +- existing extensions have a documented compatibility and deprecation path through the host SDK +- legacy compatibility exists only 
in the extension host and can be deleted later without changing kernel semantics + +## Final Recommendation + +Adopt the stricter model. + +Do not let the universal adapter effort stop at “better plugin architecture.” The correct end state is a plugin-agnostic kernel with an extension host layered on top. That is the cleanest way to support optional bundled extensions, clean agent capability surfacing, deterministic conflict handling, and true parallel providers for messaging and other runtime capabilities.
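The boundary tests listed under Verification Strategy, together with the CI import enforcement named in the Risks section, could start from a check like the sketch below. It is an illustration under stated assumptions, not the project's actual guard: `findBoundaryViolations` and its helpers are hypothetical names, the kernel is assumed to live under `src/kernel/` and the host under `src/extension-host/`, and the regex-based import extraction is a deliberate simplification that a real guard would likely replace with a lint rule or a proper parser.

```typescript
// Roots that kernel code must not import, taken from the boundary tests in this document.
// Treating "src/extension-host/" as the host root is an assumption based on the paths used here.
const FORBIDDEN_FOR_KERNEL = [
  "src/plugins/",
  "src/plugin-sdk/",
  "src/extension-host/",
  "extensions/",
];

// Resolve a relative specifier against the importing file (repo-relative, POSIX separators).
function resolveSpecifier(fromFile: string, specifier: string): string {
  if (!specifier.startsWith(".")) return specifier; // bare package import, left unchanged
  const parts = fromFile.split("/").slice(0, -1); // directory of the importing file
  for (const segment of specifier.split("/")) {
    if (segment === "." || segment === "") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return parts.join("/");
}

// Simplified extraction of static imports, dynamic imports, re-exports, and require calls.
function extractSpecifiers(source: string): string[] {
  const pattern = /\b(?:from|import|require)\s*\(?\s*["']([^"']+)["']/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    found.push(match[1]);
  }
  return found;
}

// Return the resolved paths of every forbidden import in a kernel file.
// Files outside src/kernel/ are not restricted: the host may import the kernel.
function findBoundaryViolations(file: string, source: string): string[] {
  if (!file.startsWith("src/kernel/")) return [];
  return extractSpecifiers(source)
    .map((specifier) => resolveSpecifier(file, specifier))
    .filter((resolved) =>
      FORBIDDEN_FOR_KERNEL.some((root) => `${resolved}/`.startsWith(root)),
    );
}

// Usage: a kernel file that reaches back into the legacy plugin registry.
const kernelSource = [
  'import { registry } from "../../plugins/registry";',
  'import type { Stage } from "../events/types";',
].join("\n");
const violations = findBoundaryViolations("src/kernel/routing/resolve.ts", kernelSource);
```

Keeping the check a pure function over file path and source text makes it easy to run from a unit test and from a CI script that walks the tree.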