feat(agents): detach video generation completion

2026-04-06 00:32:28 +01:00
parent 9fba0c6ac7
commit 3fcff952ba
9 changed files with 712 additions and 207 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -35,6 +35,7 @@ Docs: https://docs.openclaw.ai
 - Agents/tool prompts: remove the duplicate in-band tool inventory from agent system prompts so tool-calling models rely on the structured tool definitions as the single source of truth, improving prompt stability and reducing stale tool guidance.
 - Tools/video generation: add bundled xAI (`grok-imagine-video`) and Alibaba Model Studio Wan video providers, plus live-test/default model wiring for both.
 - Agents/video generation: register `video_generate` runs in the task ledger with task/run ids and lifecycle updates so long-running generations can be tracked more reliably.
+- Agents/video generation: make session-backed `video_generate` runs detach into background tasks, wake the same agent session on completion, and have the agent post the finished video back into the original channel as a follow-up reply.
 - Providers/CLI: remove bundled CLI text-provider backends and the `agents.defaults.cliBackends` surface, while keeping ACP harness sessions and Gemini media understanding on the native bundled providers.
 - Matrix/exec approvals: clarify unavailable-approval replies so Matrix no longer claims chat approvals are unsupported when native exec approvals are merely unconfigured. (#61424) Thanks @gumadeiras.
 - Docs/IRC: replace public IRC hostname examples with `irc.example.com` and recommend private servers for bot coordination while listing common public networks for intentional use.
--- a/docs/automation/tasks.md
+++ b/docs/automation/tasks.md
@@ -77,9 +77,12 @@ openclaw tasks flow cancel <lookup>
 | Subagent orchestration | `subagent`   | Spawning a subagent via `sessions_spawn`               | `done_only`           |
 | Cron jobs (all types)  | `cron`       | Every cron execution (main-session and isolated)       | `silent`              |
 | CLI operations         | `cli`        | `openclaw agent` commands that run through the gateway | `silent`              |
+| Agent media jobs       | `cli`        | Session-backed `video_generate` runs                   | `silent`              |

 Main-session cron tasks use `silent` notify policy by default — they create records for tracking but do not generate notifications. Isolated cron tasks also default to `silent` but are more visible because they run in their own session.

+Session-backed `video_generate` runs also use `silent` notify policy. They still create task records, but completion is handed back to the original agent session as an internal wake so the agent can write the follow-up message and attach the finished video itself.
+
 **What does not create tasks:**

 - Heartbeat turns — main-session; see [Heartbeat](/gateway/heartbeat)
--- a/docs/tools/video-generation.md
+++ b/docs/tools/video-generation.md
@@ -9,14 +9,14 @@ title: "Video Generation"

 # Video Generation

-The `video_generate` tool lets the agent create videos using your configured providers. Generated videos are delivered automatically as media attachments in the agent's reply.
+The `video_generate` tool lets the agent create videos using your configured providers. In agent sessions, OpenClaw starts video generation as a background task, tracks it in the task ledger, then wakes the agent again when the clip is ready so the agent can post the finished video back into the original channel.

 <Note>
 The tool only appears when at least one video-generation provider is available. If you don't see `video_generate` in your agent's tools, configure `agents.defaults.videoGenerationModel` or set up a provider API key.
 </Note>

 <Note>
-OpenClaw now records `video_generate` runs in the task ledger when the agent has a session key, so long-running generations can be tracked with task/run ids even though the tool still waits for completion in the current turn.
+In agent sessions, `video_generate` returns immediately with a task id/run id. The actual provider job continues in the background. When it finishes, OpenClaw wakes the same session with an internal completion event so the agent can send a normal follow-up plus the generated video attachment.
 </Note>

 ## Quick start
@@ -40,6 +40,8 @@ OpenClaw now records `video_generate` runs in the task ledger when the agent has

 The agent calls `video_generate` automatically. No tool allow-listing needed — it's enabled by default when a provider is available.

+For direct synchronous contexts without a session-backed agent run, the tool still falls back to inline generation and returns the final media path in the tool result.
+
 ## Supported providers

 | Provider | Default model                   | Reference inputs   | API key                                                    |
@@ -81,6 +83,13 @@ Use `action: "list"` to inspect available providers and models at runtime:

 Not all providers support all parameters. The tool validates provider capability limits before it submits the request. When a provider or model only supports a discrete set of video lengths, OpenClaw rounds `durationSeconds` to the nearest supported value and reports the normalized duration in the tool result.

+## Async behavior
+
+- Session-backed agent runs: `video_generate` creates a background task, returns a started/task response immediately, and posts the finished video later in a follow-up agent message.
+- Task tracking: use `openclaw tasks list` / `openclaw tasks show <taskId>` to inspect queued, running, and terminal status for the generation.
+- Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
+- No-session fallback: direct/local contexts without a real agent session still run inline and return the final video result in the same turn.
+
 ## Configuration

 ### Model selection
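The async behavior described in this hunk can be sketched as follows. This is a minimal, self-contained illustration rather than the tool's actual code: `startDetachedVideoJob`, `StartedResponse`, `makeTaskId`, `generate`, and `wake` are hypothetical names, and only the scheduler callback shape `(work: () => Promise<void>) => void` and the `MEDIA:` line convention are taken from the commit.

```typescript
// Scheduler shape taken from the commit; everything else here is illustrative.
type BackgroundScheduler = (work: () => Promise<void>) => void;

type StartedResponse = {
  async: true;
  status: "started";
  task: { taskId: string };
};

// Hypothetical helper: returns a "started" response when a session key is
// present, or null so the caller can fall back to inline generation.
function startDetachedVideoJob(params: {
  sessionKey?: string;
  makeTaskId: () => string; // stand-in for the task ledger
  schedule: BackgroundScheduler;
  generate: () => Promise<string[]>; // resolves to saved media paths
  wake: (result: string) => Promise<void>; // stand-in for the completion wake
}): StartedResponse | null {
  const sessionKey = params.sessionKey?.trim();
  if (!sessionKey) {
    return null; // no session: the caller runs inline and returns the final result
  }
  const taskId = params.makeTaskId();
  params.schedule(async () => {
    const paths = await params.generate();
    // MEDIA: lines let the woken agent attach the finished video.
    const lines = [
      `Generated ${paths.length} video${paths.length === 1 ? "" : "s"}.`,
      ...paths.map((path) => `MEDIA:${path}`),
    ];
    await params.wake(lines.join("\n"));
  });
  return { async: true, status: "started", task: { taskId } };
}

// Usage: capture the scheduled work instead of running it, as a test might.
let captured: (() => Promise<void>) | undefined;
const started = startDetachedVideoJob({
  sessionKey: "agent:main:discord:direct:123",
  makeTaskId: () => "task-123",
  schedule: (work) => {
    captured = work;
  },
  generate: async () => ["/tmp/generated-lobster.mp4"],
  wake: async () => {},
});

// A blank session key takes the no-session fallback path.
const inlineFallback = startDetachedVideoJob({
  sessionKey: "   ",
  makeTaskId: () => "unused",
  schedule: () => {},
  generate: async () => [],
  wake: async () => {},
});
```

Injecting the scheduler keeps the detached work observable: a caller can hold on to the captured function and run it deterministically instead of waiting on real background execution.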
@@ -128,6 +137,7 @@ The bundled Qwen provider supports text-to-video plus image/video reference mode
 ## Related

 - [Tools Overview](/tools) — all available agent tools
+- [Background Tasks](/automation/tasks) — task tracking for detached `video_generate` runs
 - [Alibaba Model Studio](/providers/alibaba) — direct Wan provider setup
 - [Google (Gemini)](/providers/google) — Veo provider setup
 - [MiniMax](/providers/minimax) — Hailuo provider setup
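The docs changed above mention that `durationSeconds` is rounded to the nearest supported value when a provider only accepts discrete lengths. A minimal sketch of that rounding might look like the following. `normalizeDurationSeconds` and `describeNormalization` are hypothetical helpers, not the runtime's real functions, and breaking ties toward the shorter clip is an assumption the commit does not confirm. The message format matches the "Duration normalized" line the tool emits.

```typescript
// Hypothetical helper: pick the supported duration closest to the request.
// Ties resolve to the shorter duration (an assumption for this sketch).
function normalizeDurationSeconds(
  requested: number,
  supported: readonly number[],
): number {
  let best = requested; // returned unchanged when no discrete set is given
  let bestDistance = Number.POSITIVE_INFINITY;
  for (const candidate of supported) {
    const distance = Math.abs(candidate - requested);
    if (distance < bestDistance || (distance === bestDistance && candidate < best)) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}

// Report the adjustment only when the value actually changed.
function describeNormalization(requested: number, normalized: number): string | null {
  return requested !== normalized
    ? `Duration normalized: requested ${requested}s; used ${normalized}s.`
    : null;
}

const normalized = normalizeDurationSeconds(7, [4, 8, 12]);
const note = describeNormalization(7, normalized);
```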
--- a/src/agents/internal-events.ts
+++ b/src/agents/internal-events.ts
@@ -8,7 +8,7 @@ export type AgentInternalEventType = "task_completion";

 export type AgentTaskCompletionInternalEvent = {
  type: "task_completion";
-  source: "subagent" | "cron";
+  source: "subagent" | "cron" | "video_generation";
  childSessionKey: string;
  childSessionId?: string;
  announceType: string;
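Widening the `source` union means any code that branches on it needs a case for `"video_generation"`. A small sketch of an exhaustive switch is shown below. `TaskCompletionSource` and `describeCompletionSource` are hypothetical names for illustration, and the labels for the two older sources are assumptions. Only the three literal values and the `"video generation task"` announce type come from the commit.

```typescript
// Mirrors the widened union from this hunk.
type TaskCompletionSource = "subagent" | "cron" | "video_generation";

// Hypothetical helper: map each source to a human-readable label.
function describeCompletionSource(source: TaskCompletionSource): string {
  switch (source) {
    case "subagent":
      return "subagent run";
    case "cron":
      return "cron job";
    case "video_generation":
      return "video generation task";
    default: {
      // Compile-time exhaustiveness check: a new source without a case fails here.
      const unreachable: never = source;
      throw new Error(`Unhandled completion source: ${String(unreachable)}`);
    }
  }
}

const label = describeCompletionSource("video_generation");
```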
--- a/src/agents/tools/video-generate-background.test.ts
+++ b/src/agents/tools/video-generate-background.test.ts
@@ -0,0 +1,119 @@
+import { beforeEach, describe, expect, it, vi } from "vitest";
+import {
+  createVideoGenerationTaskRun,
+  recordVideoGenerationTaskProgress,
+  wakeVideoGenerationTaskCompletion,
+} from "./video-generate-background.js";
+
+const taskExecutorMocks = vi.hoisted(() => ({
+  createRunningTaskRun: vi.fn(),
+  recordTaskRunProgressByRunId: vi.fn(),
+  completeTaskRunByRunId: vi.fn(),
+  failTaskRunByRunId: vi.fn(),
+}));
+
+const announceDeliveryMocks = vi.hoisted(() => ({
+  deliverSubagentAnnouncement: vi.fn(),
+}));
+
+vi.mock("../../tasks/task-executor.js", () => taskExecutorMocks);
+vi.mock("../subagent-announce-delivery.js", () => announceDeliveryMocks);
+
+describe("video generate background helpers", () => {
+  beforeEach(() => {
+    taskExecutorMocks.createRunningTaskRun.mockReset();
+    taskExecutorMocks.recordTaskRunProgressByRunId.mockReset();
+    announceDeliveryMocks.deliverSubagentAnnouncement.mockReset();
+  });
+
+  it("creates a running task with queued progress text", () => {
+    taskExecutorMocks.createRunningTaskRun.mockReturnValue({
+      taskId: "task-123",
+    });
+
+    const handle = createVideoGenerationTaskRun({
+      sessionKey: "agent:main:discord:direct:123",
+      requesterOrigin: {
+        channel: "discord",
+        to: "channel:1",
+      },
+      prompt: "friendly lobster surfing",
+      providerId: "openai",
+    });
+
+    expect(handle).toMatchObject({
+      taskId: "task-123",
+      requesterSessionKey: "agent:main:discord:direct:123",
+      taskLabel: "friendly lobster surfing",
+    });
+    expect(taskExecutorMocks.createRunningTaskRun).toHaveBeenCalledWith(
+      expect.objectContaining({
+        sourceId: "video_generate:openai",
+        progressSummary: "Queued video generation",
+      }),
+    );
+  });
+
+  it("records task progress updates", () => {
+    recordVideoGenerationTaskProgress({
+      handle: {
+        taskId: "task-123",
+        runId: "tool:video_generate:abc",
+        requesterSessionKey: "agent:main:discord:direct:123",
+        taskLabel: "friendly lobster surfing",
+      },
+      progressSummary: "Saving generated video",
+    });
+
+    expect(taskExecutorMocks.recordTaskRunProgressByRunId).toHaveBeenCalledWith(
+      expect.objectContaining({
+        runId: "tool:video_generate:abc",
+        progressSummary: "Saving generated video",
+      }),
+    );
+  });
+
+  it("wakes the session with a video-generation completion event", async () => {
+    announceDeliveryMocks.deliverSubagentAnnouncement.mockResolvedValue({
+      delivered: true,
+      path: "direct",
+    });
+
+    await wakeVideoGenerationTaskCompletion({
+      handle: {
+        taskId: "task-123",
+        runId: "tool:video_generate:abc",
+        requesterSessionKey: "agent:main:discord:direct:123",
+        requesterOrigin: {
+          channel: "discord",
+          to: "channel:1",
+          threadId: "thread-1",
+        },
+        taskLabel: "friendly lobster surfing",
+      },
+      status: "ok",
+      statusLabel: "completed successfully",
+      result: "Generated 1 video.\nMEDIA:/tmp/generated-lobster.mp4",
+    });
+
+    expect(announceDeliveryMocks.deliverSubagentAnnouncement).toHaveBeenCalledWith(
+      expect.objectContaining({
+        requesterSessionKey: "agent:main:discord:direct:123",
+        requesterOrigin: expect.objectContaining({
+          channel: "discord",
+          to: "channel:1",
+        }),
+        expectsCompletionMessage: true,
+        internalEvents: [
+          expect.objectContaining({
+            source: "video_generation",
+            announceType: "video generation task",
+            status: "ok",
+            result: expect.stringContaining("MEDIA:/tmp/generated-lobster.mp4"),
+            replyInstruction: expect.stringContaining("include those exact MEDIA: lines"),
+          }),
+        ],
+      }),
+    );
+  });
+});
--- a/src/agents/tools/video-generate-background.ts
+++ b/src/agents/tools/video-generate-background.ts
@@ -0,0 +1,204 @@
+import crypto from "node:crypto";
+import { createSubsystemLogger } from "../../logging/subsystem.js";
+import {
+  completeTaskRunByRunId,
+  createRunningTaskRun,
+  failTaskRunByRunId,
+  recordTaskRunProgressByRunId,
+} from "../../tasks/task-executor.js";
+import type { DeliveryContext } from "../../utils/delivery-context.js";
+import { INTERNAL_MESSAGE_CHANNEL } from "../../utils/message-channel.js";
+import { formatAgentInternalEventsForPrompt, type AgentInternalEvent } from "../internal-events.js";
+import { deliverSubagentAnnouncement } from "../subagent-announce-delivery.js";
+
+const log = createSubsystemLogger("agents/tools/video-generate-background");
+
+export type VideoGenerationTaskHandle = {
+  taskId: string;
+  runId: string;
+  requesterSessionKey: string;
+  requesterOrigin?: DeliveryContext;
+  taskLabel: string;
+};
+
+export function createVideoGenerationTaskRun(params: {
+  sessionKey?: string;
+  requesterOrigin?: DeliveryContext;
+  prompt: string;
+  providerId?: string;
+}): VideoGenerationTaskHandle | null {
+  const sessionKey = params.sessionKey?.trim();
+  if (!sessionKey) {
+    return null;
+  }
+  const runId = `tool:video_generate:${crypto.randomUUID()}`;
+  try {
+    const task = createRunningTaskRun({
+      runtime: "cli",
+      sourceId: params.providerId ? `video_generate:${params.providerId}` : "video_generate",
+      requesterSessionKey: sessionKey,
+      ownerKey: sessionKey,
+      scopeKind: "session",
+      requesterOrigin: params.requesterOrigin,
+      childSessionKey: sessionKey,
+      runId,
+      label: "Video generation",
+      task: params.prompt,
+      deliveryStatus: "not_applicable",
+      notifyPolicy: "silent",
+      startedAt: Date.now(),
+      lastEventAt: Date.now(),
+      progressSummary: "Queued video generation",
+    });
+    return {
+      taskId: task.taskId,
+      runId,
+      requesterSessionKey: sessionKey,
+      requesterOrigin: params.requesterOrigin,
+      taskLabel: params.prompt,
+    };
+  } catch (error) {
+    log.warn("Failed to create video generation task ledger record", {
+      sessionKey,
+      providerId: params.providerId,
+      error,
+    });
+    return null;
+  }
+}
+
+export function recordVideoGenerationTaskProgress(params: {
+  handle: VideoGenerationTaskHandle | null;
+  progressSummary: string;
+  eventSummary?: string;
+}) {
+  if (!params.handle) {
+    return;
+  }
+  recordTaskRunProgressByRunId({
+    runId: params.handle.runId,
+    runtime: "cli",
+    sessionKey: params.handle.requesterSessionKey,
+    lastEventAt: Date.now(),
+    progressSummary: params.progressSummary,
+    eventSummary: params.eventSummary,
+  });
+}
+
+export function completeVideoGenerationTaskRun(params: {
+  handle: VideoGenerationTaskHandle | null;
+  provider: string;
+  model: string;
+  count: number;
+  paths: string[];
+}) {
+  if (!params.handle) {
+    return;
+  }
+  const endedAt = Date.now();
+  const target = params.count === 1 ? params.paths[0] : `${params.count} files`;
+  completeTaskRunByRunId({
+    runId: params.handle.runId,
+    runtime: "cli",
+    sessionKey: params.handle.requesterSessionKey,
+    endedAt,
+    lastEventAt: endedAt,
+    progressSummary: `Generated ${params.count} video${params.count === 1 ? "" : "s"}`,
+    terminalSummary: `Generated ${params.count} video${params.count === 1 ? "" : "s"} with ${params.provider}/${params.model}${target ? ` -> ${target}` : ""}.`,
+  });
+}
+
+export function failVideoGenerationTaskRun(params: {
+  handle: VideoGenerationTaskHandle | null;
+  error: unknown;
+}) {
+  if (!params.handle) {
+    return;
+  }
+  const endedAt = Date.now();
+  const errorText = params.error instanceof Error ? params.error.message : String(params.error);
+  failTaskRunByRunId({
+    runId: params.handle.runId,
+    runtime: "cli",
+    sessionKey: params.handle.requesterSessionKey,
+    endedAt,
+    lastEventAt: endedAt,
+    error: errorText,
+    progressSummary: "Video generation failed",
+    terminalSummary: errorText,
+  });
+}
+
+function buildVideoGenerationReplyInstruction(status: "ok" | "error"): string {
+  if (status === "ok") {
+    return [
+      "A completed video generation task is ready for user delivery.",
+      "Reply in your normal assistant voice and post the finished video to the original message channel now.",
+      "If the result includes MEDIA: lines, include those exact MEDIA: lines in your reply so OpenClaw attaches the video.",
+      "Keep internal task/session details private and do not copy the internal event text verbatim.",
+    ].join(" ");
+  }
+  return [
+    "A video generation task failed.",
+    "Reply in your normal assistant voice with the failure summary now.",
+    "Keep internal task/session details private and do not copy the internal event text verbatim.",
+  ].join(" ");
+}
+
+export async function wakeVideoGenerationTaskCompletion(params: {
+  handle: VideoGenerationTaskHandle | null;
+  status: "ok" | "error";
+  statusLabel: string;
+  result: string;
+  statsLine?: string;
+}) {
+  if (!params.handle) {
+    return;
+  }
+  const internalEvents: AgentInternalEvent[] = [
+    {
+      type: "task_completion",
+      source: "video_generation",
+      childSessionKey: `video_generate:${params.handle.taskId}`,
+      childSessionId: params.handle.taskId,
+      announceType: "video generation task",
+      taskLabel: params.handle.taskLabel,
+      status: params.status,
+      statusLabel: params.statusLabel,
+      result: params.result,
+      ...(params.statsLine?.trim() ? { statsLine: params.statsLine } : {}),
+      replyInstruction: buildVideoGenerationReplyInstruction(params.status),
+    },
+  ];
+  const triggerMessage =
+    formatAgentInternalEventsForPrompt(internalEvents) ||
+    "A video generation task finished. Process the completion update now.";
+  const announceId = `video-generate:${params.handle.taskId}:${params.status}`;
+  const delivery = await deliverSubagentAnnouncement({
+    requesterSessionKey: params.handle.requesterSessionKey,
+    targetRequesterSessionKey: params.handle.requesterSessionKey,
+    announceId,
+    triggerMessage,
+    steerMessage: triggerMessage,
+    internalEvents,
+    summaryLine: params.handle.taskLabel,
+    requesterSessionOrigin: params.handle.requesterOrigin,
+    requesterOrigin: params.handle.requesterOrigin,
+    completionDirectOrigin: params.handle.requesterOrigin,
+    directOrigin: params.handle.requesterOrigin,
+    sourceSessionKey: `video_generate:${params.handle.taskId}`,
+    sourceChannel: INTERNAL_MESSAGE_CHANNEL,
+    sourceTool: "video_generate",
+    requesterIsSubagent: false,
+    expectsCompletionMessage: true,
+    bestEffortDeliver: true,
+    directIdempotencyKey: announceId,
+  });
+  if (!delivery.delivered && delivery.error) {
+    log.warn("Video generation completion wake failed", {
+      taskId: params.handle.taskId,
+      runId: params.handle.runId,
+      error: delivery.error,
+    });
+  }
+}
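The wake path above depends on two conventions: the result text carries `MEDIA:` lines that the agent is told to repeat verbatim, and the announce id doubles as an idempotency key. The sketch below illustrates both under stated assumptions. `extractMediaLines` and `buildAnnounceId` are hypothetical helpers, not part of the commit. Only the `MEDIA:` prefix and the `video-generate:<taskId>:<status>` key format are taken from the source.

```typescript
// Hypothetical helper: collect the MEDIA: lines from a completion result so
// they can be echoed verbatim in a follow-up reply.
function extractMediaLines(result: string): string[] {
  return result
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("MEDIA:"));
}

// Same key format as the announceId built in the commit: one key per task and
// terminal status, so a repeated wake for the same outcome maps to the same id.
function buildAnnounceId(taskId: string, status: "ok" | "error"): string {
  return `video-generate:${taskId}:${status}`;
}

const mediaLines = extractMediaLines(
  "Generated 1 video.\nMEDIA:/tmp/generated-lobster.mp4",
);
const announceId = buildAnnounceId("task-123", "ok");
```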
--- a/src/agents/tools/video-generate-tool.test.ts
+++ b/src/agents/tools/video-generate-tool.test.ts
@@ -2,12 +2,14 @@ import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
 import type { OpenClawConfig } from "../../config/config.js";
 import * as mediaStore from "../../media/store.js";
 import * as videoGenerationRuntime from "../../video-generation/runtime.js";
+import * as videoGenerateBackground from "./video-generate-background.js";
 import { createVideoGenerateTool } from "./video-generate-tool.js";

 const taskExecutorMocks = vi.hoisted(() => ({
  createRunningTaskRun: vi.fn(),
  completeTaskRunByRunId: vi.fn(),
  failTaskRunByRunId: vi.fn(),
+  recordTaskRunProgressByRunId: vi.fn(),
 }));

 vi.mock("../../tasks/task-executor.js", () => taskExecutorMocks);
@@ -23,6 +25,7 @@ describe("createVideoGenerateTool", () => {
    taskExecutorMocks.createRunningTaskRun.mockReset();
    taskExecutorMocks.completeTaskRunByRunId.mockReset();
    taskExecutorMocks.failTaskRunByRunId.mockReset();
+    taskExecutorMocks.recordTaskRunProgressByRunId.mockReset();
  });

  afterEach(() => {
@@ -49,7 +52,7 @@ describe("createVideoGenerateTool", () => {
    ).not.toBeNull();
  });

-  it("generates videos, saves them, and emits MEDIA paths", async () => {
+  it("generates videos, saves them, and emits MEDIA paths without a session-backed detach", async () => {
    taskExecutorMocks.createRunningTaskRun.mockReturnValue({
      taskId: "task-123",
      runtime: "cli",
@@ -91,11 +94,6 @@ describe("createVideoGenerateTool", () => {
          },
        },
      }),
-      agentSessionKey: "agent:main:discord:direct:123",
-      requesterOrigin: {
-        channel: "discord",
-        to: "channel:1",
-      },
    });
    expect(tool).not.toBeNull();
    if (!tool) {
@@ -111,22 +109,91 @@ describe("createVideoGenerateTool", () => {
      provider: "qwen",
      model: "wan2.6-t2v",
      count: 1,
-      task: {
-        taskId: "task-123",
-      },
      media: {
        mediaUrls: ["/tmp/generated-lobster.mp4"],
      },
      paths: ["/tmp/generated-lobster.mp4"],
      metadata: { taskId: "task-1" },
    });
-    expect(taskExecutorMocks.createRunningTaskRun).toHaveBeenCalledWith(
+    expect(taskExecutorMocks.createRunningTaskRun).not.toHaveBeenCalled();
+    expect(taskExecutorMocks.completeTaskRunByRunId).not.toHaveBeenCalled();
+  });
+
+  it("starts background generation and wakes the session with MEDIA lines", async () => {
+    taskExecutorMocks.createRunningTaskRun.mockReturnValue({
+      taskId: "task-123",
+      runtime: "cli",
+      requesterSessionKey: "agent:main:discord:direct:123",
+      ownerKey: "agent:main:discord:direct:123",
+      scopeKind: "session",
+      task: "friendly lobster surfing",
+      status: "running",
+      deliveryStatus: "not_applicable",
+      notifyPolicy: "silent",
+      createdAt: Date.now(),
+    });
+    const wakeSpy = vi
+      .spyOn(videoGenerateBackground, "wakeVideoGenerationTaskCompletion")
+      .mockResolvedValue(undefined);
+    vi.spyOn(videoGenerationRuntime, "generateVideo").mockResolvedValue({
+      provider: "qwen",
+      model: "wan2.6-t2v",
+      attempts: [],
+      videos: [
+        {
+          buffer: Buffer.from("video-bytes"),
+          mimeType: "video/mp4",
+          fileName: "lobster.mp4",
+        },
+      ],
+      metadata: { taskId: "task-1" },
+    });
+    vi.spyOn(mediaStore, "saveMediaBuffer").mockResolvedValueOnce({
+      path: "/tmp/generated-lobster.mp4",
+      id: "generated-lobster.mp4",
+      size: 11,
+      contentType: "video/mp4",
+    });
+
+    let scheduledWork: (() => Promise<void>) | undefined;
+    const tool = createVideoGenerateTool({
+      config: asConfig({
+        agents: {
+          defaults: {
+            videoGenerationModel: { primary: "qwen/wan2.6-t2v" },
+          },
+        },
+      }),
+      agentSessionKey: "agent:main:discord:direct:123",
+      requesterOrigin: {
+        channel: "discord",
+        to: "channel:1",
+      },
+      scheduleBackgroundWork: (work) => {
+        scheduledWork = work;
+      },
+    });
+    if (!tool) {
+      throw new Error("expected video_generate tool");
+    }
+
+    const result = await tool.execute("call-1", { prompt: "friendly lobster surfing" });
+    const text = (result.content?.[0] as { text: string } | undefined)?.text ?? "";
+
+    expect(text).toContain("Started video generation task task-123 in the background.");
+    expect(result.details).toMatchObject({
+      async: true,
+      status: "started",
+      task: {
+        taskId: "task-123",
+      },
+    });
+    expect(typeof scheduledWork).toBe("function");
+    await scheduledWork?.();
+    expect(taskExecutorMocks.recordTaskRunProgressByRunId).toHaveBeenCalledWith(
      expect.objectContaining({
-        runtime: "cli",
-        requesterSessionKey: "agent:main:discord:direct:123",
-        ownerKey: "agent:main:discord:direct:123",
-        label: "Video generation",
-        task: "friendly lobster surfing",
+        runId: expect.stringMatching(/^tool:video_generate:/),
+        progressSummary: "Generating video",
      }),
    );
    expect(taskExecutorMocks.completeTaskRunByRunId).toHaveBeenCalledWith(
@@ -134,22 +201,18 @@ describe("createVideoGenerateTool", () => {
        runId: expect.stringMatching(/^tool:video_generate:/),
      }),
    );
+    expect(wakeSpy).toHaveBeenCalledWith(
+      expect.objectContaining({
+        handle: expect.objectContaining({
+          taskId: "task-123",
+        }),
+        status: "ok",
+        result: expect.stringContaining("MEDIA:/tmp/generated-lobster.mp4"),
+      }),
+    );
  });

-  it("marks the task failed when provider generation throws", async () => {
-    taskExecutorMocks.createRunningTaskRun.mockReturnValue({
-      taskId: "task-fail",
-      runtime: "cli",
-      requesterSessionKey: "agent:main:discord:direct:123",
-      ownerKey: "agent:main:discord:direct:123",
-      scopeKind: "session",
-      task: "broken lobster",
-      status: "running",
-      deliveryStatus: "not_applicable",
-      notifyPolicy: "silent",
-      createdAt: Date.now(),
-    });
-    taskExecutorMocks.failTaskRunByRunId.mockReturnValue(undefined);
+  it("surfaces provider generation failures inline when there is no detached session", async () => {
    vi.spyOn(videoGenerationRuntime, "generateVideo").mockRejectedValue(new Error("queue boom"));

    const tool = createVideoGenerateTool({
@@ -160,7 +223,6 @@ describe("createVideoGenerateTool", () => {
          },
        },
      }),
-      agentSessionKey: "agent:main:discord:direct:123",
    });
    expect(tool).not.toBeNull();
    if (!tool) {
@@ -170,12 +232,7 @@ describe("createVideoGenerateTool", () => {
    await expect(tool.execute("call-2", { prompt: "broken lobster" })).rejects.toThrow(
      "queue boom",
    );
-    expect(taskExecutorMocks.failTaskRunByRunId).toHaveBeenCalledWith(
-      expect.objectContaining({
-        runId: expect.stringMatching(/^tool:video_generate:/),
-        error: "queue boom",
-      }),
-    );
+    expect(taskExecutorMocks.failTaskRunByRunId).not.toHaveBeenCalled();
  });

  it("shows duration normalization details from runtime metadata", async () => {
--- a/src/agents/tools/video-generate-tool.ts
+++ b/src/agents/tools/video-generate-tool.ts
@@ -1,4 +1,3 @@
-import crypto from "node:crypto";
 import { Type } from "@sinclair/typebox";
 import type { OpenClawConfig } from "../../config/config.js";
 import { loadConfig } from "../../config/config.js";
@@ -7,11 +6,6 @@ import { saveMediaBuffer } from "../../media/store.js";
 import { loadWebMedia } from "../../media/web-media.js";
 import { readSnakeCaseParamRaw } from "../../param-key.js";
 import { getProviderEnvVars } from "../../secrets/provider-env-vars.js";
-import {
-  completeTaskRunByRunId,
-  createRunningTaskRun,
-  failTaskRunByRunId,
-} from "../../tasks/task-executor.js";
 import { resolveUserPath } from "../../utils.js";
 import type { DeliveryContext } from "../../utils/delivery-context.js";
 import { resolveVideoGenerationSupportedDurations } from "../../video-generation/duration-support.js";
@@ -52,6 +46,14 @@ import {
  type SandboxFsBridge,
  type ToolFsPolicy,
 } from "./tool-runtime.helpers.js";
+import {
+  completeVideoGenerationTaskRun,
+  createVideoGenerationTaskRun,
+  failVideoGenerationTaskRun,
+  recordVideoGenerationTaskProgress,
+  type VideoGenerationTaskHandle,
+  wakeVideoGenerationTaskCompletion,
+} from "./video-generate-background.js";

 const log = createSubsystemLogger("agents/tools/video-generate");
 const MAX_INPUT_IMAGES = 5;
@@ -407,91 +409,15 @@ type VideoGenerateSandboxConfig = {
  bridge: SandboxFsBridge;
 };

-type VideoGenerationTaskHandle = {
-  taskId: string;
-  runId: string;
-};
+type VideoGenerateBackgroundScheduler = (work: () => Promise<void>) => void;

-function createVideoGenerationTaskRun(params: {
-  sessionKey?: string;
-  requesterOrigin?: DeliveryContext;
-  prompt: string;
-  providerId?: string;
-}): VideoGenerationTaskHandle | null {
-  const sessionKey = params.sessionKey?.trim();
-  if (!sessionKey) {
-    return null;
-  }
-  const runId = `tool:video_generate:${crypto.randomUUID()}`;
-  try {
-    const task = createRunningTaskRun({
-      runtime: "cli",
-      sourceId: params.providerId ? `video_generate:${params.providerId}` : "video_generate",
-      requesterSessionKey: sessionKey,
-      ownerKey: sessionKey,
-      scopeKind: "session",
-      requesterOrigin: params.requesterOrigin,
-      childSessionKey: sessionKey,
-      runId,
-      label: "Video generation",
-      task: params.prompt,
-      deliveryStatus: "not_applicable",
-      notifyPolicy: "silent",
-      startedAt: Date.now(),
-      lastEventAt: Date.now(),
-      progressSummary: "Generating video",
+function defaultScheduleVideoGenerateBackgroundWork(work: () => Promise<void>) {
+  queueMicrotask(() => {
+    void work().catch((error) => {
+      log.error("Detached video generation job crashed", {
+        error,
+      });
    });
-    return {
-      taskId: task.taskId,
-      runId,
-    };
-  } catch (error) {
-    log.warn("Failed to create video generation task ledger record", {
-      sessionKey,
-      providerId: params.providerId,
-      error,
-    });
-    return null;
-  }
-}
-
-function completeVideoGenerationTaskRun(params: {
-  handle: VideoGenerationTaskHandle | null;
-  provider: string;
-  model: string;
-  count: number;
-  paths: string[];
-}) {
-  if (!params.handle) {
-    return;
-  }
-  const endedAt = Date.now();
-  const target = params.count === 1 ? params.paths[0] : `${params.count} files`;
-  completeTaskRunByRunId({
-    runId: params.handle.runId,
-    runtime: "cli",
-    endedAt,
-    lastEventAt: endedAt,
-    terminalSummary: `Generated ${params.count} video${params.count === 1 ? "" : "s"} with ${params.provider}/${params.model}${target ? ` -> ${target}` : ""}.`,
-  });
-}
-
-function failVideoGenerationTaskRun(params: {
-  handle: VideoGenerationTaskHandle | null;
-  error: unknown;
-}) {
-  if (!params.handle) {
-    return;
-  }
-  const endedAt = Date.now();
-  const errorText = params.error instanceof Error ? params.error.message : String(params.error);
-  failTaskRunByRunId({
-    runId: params.handle.runId,
-    runtime: "cli",
-    endedAt,
-    lastEventAt: endedAt,
-    error: errorText,
-    terminalSummary: errorText,
  });
 }

@@ -610,6 +536,170 @@ async function loadReferenceAssets(params: {
  return loaded;
 }

+type LoadedReferenceAsset = Awaited<ReturnType<typeof loadReferenceAssets>>[number];
+
+type ExecutedVideoGeneration = {
+  provider: string;
+  model: string;
+  savedPaths: string[];
+  contentText: string;
+  details: Record<string, unknown>;
+  wakeResult: string;
+};
+
+async function executeVideoGenerationJob(params: {
+  effectiveCfg: OpenClawConfig;
+  prompt: string;
+  agentDir?: string;
+  model?: string;
+  size?: string;
+  aspectRatio?: string;
+  resolution?: VideoGenerationResolution;
+  durationSeconds?: number;
+  audio?: boolean;
+  watermark?: boolean;
+  filename?: string;
+  loadedReferenceImages: LoadedReferenceAsset[];
+  loadedReferenceVideos: LoadedReferenceAsset[];
+  taskHandle?: VideoGenerationTaskHandle | null;
+}): Promise<ExecutedVideoGeneration> {
+  if (params.taskHandle) {
+    recordVideoGenerationTaskProgress({
+      handle: params.taskHandle,
+      progressSummary: "Generating video",
+    });
+  }
+  const result = await generateVideo({
+    cfg: params.effectiveCfg,
+    prompt: params.prompt,
+    agentDir: params.agentDir,
+    modelOverride: params.model,
+    size: params.size,
+    aspectRatio: params.aspectRatio,
+    resolution: params.resolution,
+    durationSeconds: params.durationSeconds,
+    audio: params.audio,
+    watermark: params.watermark,
+    inputImages: params.loadedReferenceImages.map((entry) => entry.sourceAsset),
+    inputVideos: params.loadedReferenceVideos.map((entry) => entry.sourceAsset),
+  });
+  if (params.taskHandle) {
+    recordVideoGenerationTaskProgress({
+      handle: params.taskHandle,
+      progressSummary: "Saving generated video",
+    });
+  }
+  const savedVideos = await Promise.all(
+    result.videos.map((video) =>
+      saveMediaBuffer(
+        video.buffer,
+        video.mimeType,
+        "tool-video-generation",
+        undefined,
+        params.filename || video.fileName,
+      ),
+    ),
+  );
+  const requestedDurationSeconds =
+    typeof result.metadata?.requestedDurationSeconds === "number" &&
+    Number.isFinite(result.metadata.requestedDurationSeconds)
+      ? result.metadata.requestedDurationSeconds
+      : params.durationSeconds;
+  const normalizedDurationSeconds =
+    typeof result.metadata?.normalizedDurationSeconds === "number" &&
+    Number.isFinite(result.metadata.normalizedDurationSeconds)
+      ? result.metadata.normalizedDurationSeconds
+      : requestedDurationSeconds;
+  const supportedDurationSeconds = Array.isArray(result.metadata?.supportedDurationSeconds)
+    ? result.metadata.supportedDurationSeconds.filter(
+        (entry): entry is number => typeof entry === "number" && Number.isFinite(entry),
+      )
+    : undefined;
+  const lines = [
+    `Generated ${savedVideos.length} video${savedVideos.length === 1 ? "" : "s"} with ${result.provider}/${result.model}.`,
+    typeof requestedDurationSeconds === "number" &&
+    typeof normalizedDurationSeconds === "number" &&
+    requestedDurationSeconds !== normalizedDurationSeconds
+      ? `Duration normalized: requested ${requestedDurationSeconds}s; used ${normalizedDurationSeconds}s.`
+      : null,
+    ...savedVideos.map((video) => `MEDIA:${video.path}`),
+  ].filter((entry): entry is string => Boolean(entry));
+
+  return {
+    provider: result.provider,
+    model: result.model,
+    savedPaths: savedVideos.map((video) => video.path),
+    contentText: lines.join("\n"),
+    wakeResult: lines.join("\n"),
+    details: {
+      provider: result.provider,
+      model: result.model,
+      count: savedVideos.length,
+      media: {
+        mediaUrls: savedVideos.map((video) => video.path),
+      },
+      paths: savedVideos.map((video) => video.path),
+      ...(params.taskHandle
+        ? {
+            task: {
+              taskId: params.taskHandle.taskId,
+              runId: params.taskHandle.runId,
+            },
+          }
+        : {}),
+      ...(params.loadedReferenceImages.length === 1
+        ? {
+            image: params.loadedReferenceImages[0]?.resolvedInput,
+            ...(params.loadedReferenceImages[0]?.rewrittenFrom
+              ? { rewrittenFrom: params.loadedReferenceImages[0].rewrittenFrom }
+              : {}),
+          }
+        : params.loadedReferenceImages.length > 1
+          ? {
+              images: params.loadedReferenceImages.map((entry) => ({
+                image: entry.resolvedInput,
+                ...(entry.rewrittenFrom ? { rewrittenFrom: entry.rewrittenFrom } : {}),
+              })),
+            }
+          : {}),
+      ...(params.loadedReferenceVideos.length === 1
+        ? {
+            video: params.loadedReferenceVideos[0]?.resolvedInput,
+            ...(params.loadedReferenceVideos[0]?.rewrittenFrom
+              ? { videoRewrittenFrom: params.loadedReferenceVideos[0].rewrittenFrom }
+              : {}),
+          }
+        : params.loadedReferenceVideos.length > 1
+          ? {
+              videos: params.loadedReferenceVideos.map((entry) => ({
+                video: entry.resolvedInput,
+                ...(entry.rewrittenFrom ? { rewrittenFrom: entry.rewrittenFrom } : {}),
+              })),
+            }
+          : {}),
+      ...(params.size ? { size: params.size } : {}),
+      ...(params.aspectRatio ? { aspectRatio: params.aspectRatio } : {}),
+      ...(params.resolution ? { resolution: params.resolution } : {}),
+      ...(typeof normalizedDurationSeconds === "number"
+        ? { durationSeconds: normalizedDurationSeconds }
+        : {}),
+      ...(typeof requestedDurationSeconds === "number" &&
+      typeof normalizedDurationSeconds === "number" &&
+      requestedDurationSeconds !== normalizedDurationSeconds
+        ? { requestedDurationSeconds }
+        : {}),
+      ...(supportedDurationSeconds && supportedDurationSeconds.length > 0
+        ? { supportedDurationSeconds }
+        : {}),
+      ...(typeof params.audio === "boolean" ? { audio: params.audio } : {}),
+      ...(typeof params.watermark === "boolean" ? { watermark: params.watermark } : {}),
+      ...(params.filename ? { filename: params.filename } : {}),
+      attempts: result.attempts,
+      metadata: result.metadata,
+    },
+  };
+}
+
 export function createVideoGenerateTool(options?: {
  config?: OpenClawConfig;
  agentDir?: string;
@ -618,6 +708,7 @@ export function createVideoGenerateTool(options?: {
  workspaceDir?: string;
  sandbox?: VideoGenerateSandboxConfig;
  fsPolicy?: ToolFsPolicy;
+  scheduleBackgroundWork?: VideoGenerateBackgroundScheduler;
 }): AnyAgentTool | null {
  const cfg: OpenClawConfig = options?.config ?? loadConfig();
  const videoGenerationModelConfig = resolveVideoGenerationModelConfigForTool({
@ -635,6 +726,8 @@ export function createVideoGenerateTool(options?: {
        workspaceOnly: options.fsPolicy?.workspaceOnly === true,
      }
    : null;
+  const scheduleBackgroundWork =
+    options?.scheduleBackgroundWork ?? defaultScheduleVideoGenerateBackgroundWork;

  return {
    label: "Video Generation",
@ -773,77 +866,75 @@ export function createVideoGenerateTool(options?: {
        prompt,
        providerId: selectedProvider?.id,
      });
+      const shouldDetach = Boolean(taskHandle && options?.agentSessionKey?.trim());

-      try {
-        const result = await generateVideo({
-          cfg: effectiveCfg,
-          prompt,
-          agentDir: options?.agentDir,
-          modelOverride: model,
-          size,
-          aspectRatio,
-          resolution,
-          durationSeconds,
-          audio,
-          watermark,
-          inputImages: loadedReferenceImages.map((entry) => entry.sourceAsset),
-          inputVideos: loadedReferenceVideos.map((entry) => entry.sourceAsset),
+      if (shouldDetach) {
+        scheduleBackgroundWork(async () => {
+          try {
+            const executed = await executeVideoGenerationJob({
+              effectiveCfg,
+              prompt,
+              agentDir: options?.agentDir,
+              model,
+              size,
+              aspectRatio,
+              resolution,
+              durationSeconds,
+              audio,
+              watermark,
+              filename,
+              loadedReferenceImages,
+              loadedReferenceVideos,
+              taskHandle,
+            });
+            completeVideoGenerationTaskRun({
+              handle: taskHandle,
+              provider: executed.provider,
+              model: executed.model,
+              count: executed.savedPaths.length,
+              paths: executed.savedPaths,
+            });
+            try {
+              await wakeVideoGenerationTaskCompletion({
+                handle: taskHandle,
+                status: "ok",
+                statusLabel: "completed successfully",
+                result: executed.wakeResult,
+              });
+            } catch (error) {
+              log.warn("Video generation completion wake failed after successful generation", {
+                taskId: taskHandle?.taskId,
+                runId: taskHandle?.runId,
+                error,
+              });
+            }
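+          } catch (error) {
+            failVideoGenerationTaskRun({
+              handle: taskHandle,
+              error,
+            });
+            try {
+              await wakeVideoGenerationTaskCompletion({
+                handle: taskHandle,
+                status: "error",
+                statusLabel: "failed",
+                result: error instanceof Error ? error.message : String(error),
+              });
+            } catch (wakeError) {
+              log.warn("Video generation completion wake failed after generation error", {
+                taskId: taskHandle?.taskId,
+                runId: taskHandle?.runId,
+                error: wakeError,
+              });
+            }
+          }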
        });
-        const savedVideos = await Promise.all(
-          result.videos.map((video) =>
-            saveMediaBuffer(
-              video.buffer,
-              video.mimeType,
-              "tool-video-generation",
-              undefined,
-              filename || video.fileName,
-            ),
-          ),
-        );
-        completeVideoGenerationTaskRun({
-          handle: taskHandle,
-          provider: result.provider,
-          model: result.model,
-          count: savedVideos.length,
-          paths: savedVideos.map((video) => video.path),
-        });
-        const requestedDurationSeconds =
-          typeof result.metadata?.requestedDurationSeconds === "number" &&
-          Number.isFinite(result.metadata.requestedDurationSeconds)
-            ? result.metadata.requestedDurationSeconds
-            : durationSeconds;
-        const normalizedDurationSeconds =
-          typeof result.metadata?.normalizedDurationSeconds === "number" &&
-          Number.isFinite(result.metadata.normalizedDurationSeconds)
-            ? result.metadata.normalizedDurationSeconds
-            : requestedDurationSeconds;
-        const supportedDurationSeconds = Array.isArray(result.metadata?.supportedDurationSeconds)
-          ? result.metadata.supportedDurationSeconds.filter(
-              (entry): entry is number => typeof entry === "number" && Number.isFinite(entry),
-            )
-          : undefined;
-        const lines = [
-          `Generated ${savedVideos.length} video${savedVideos.length === 1 ? "" : "s"} with ${result.provider}/${result.model}.`,
-          typeof requestedDurationSeconds === "number" &&
-          typeof normalizedDurationSeconds === "number" &&
-          requestedDurationSeconds !== normalizedDurationSeconds
-            ? `Duration normalized: requested ${requestedDurationSeconds}s; used ${normalizedDurationSeconds}s.`
-            : null,
-          ...savedVideos.map((video) => `MEDIA:${video.path}`),
-        ].filter((entry): entry is string => Boolean(entry));

        return {
-          content: [{ type: "text", text: lines.join("\n") }],
-          details: {
-            provider: result.provider,
-            model: result.model,
-            count: savedVideos.length,
-            media: {
-              mediaUrls: savedVideos.map((video) => video.path),
+          content: [
+            {
+              type: "text",
+              text: `Started video generation task ${taskHandle?.taskId ?? "unknown"} in the background. I'll post the finished video here when it's ready.`,
            },
-            paths: savedVideos.map((video) => video.path),
+          ],
+          details: {
+            async: true,
+            status: "started",
            ...(taskHandle
-                ? {
+              ? {
                  task: {
                    taskId: taskHandle.taskId,
                    runId: taskHandle.runId,
@ -880,27 +971,47 @@ export function createVideoGenerateTool(options?: {
                    })),
                  }
                : {}),
+            ...(model ? { model } : {}),
            ...(size ? { size } : {}),
            ...(aspectRatio ? { aspectRatio } : {}),
            ...(resolution ? { resolution } : {}),
-            ...(typeof normalizedDurationSeconds === "number"
-              ? { durationSeconds: normalizedDurationSeconds }
-              : {}),
-            ...(typeof requestedDurationSeconds === "number" &&
-            typeof normalizedDurationSeconds === "number" &&
-            requestedDurationSeconds !== normalizedDurationSeconds
-              ? { requestedDurationSeconds }
-              : {}),
-            ...(supportedDurationSeconds && supportedDurationSeconds.length > 0
-              ? { supportedDurationSeconds }
-              : {}),
+            ...(typeof durationSeconds === "number" ? { durationSeconds } : {}),
            ...(typeof audio === "boolean" ? { audio } : {}),
            ...(typeof watermark === "boolean" ? { watermark } : {}),
            ...(filename ? { filename } : {}),
-            attempts: result.attempts,
-            metadata: result.metadata,
          },
        };
+      }
+
+      try {
+        const executed = await executeVideoGenerationJob({
+          effectiveCfg,
+          prompt,
+          agentDir: options?.agentDir,
+          model,
+          size,
+          aspectRatio,
+          resolution,
+          durationSeconds,
+          audio,
+          watermark,
+          filename,
+          loadedReferenceImages,
+          loadedReferenceVideos,
+          taskHandle,
+        });
+        completeVideoGenerationTaskRun({
+          handle: taskHandle,
+          provider: executed.provider,
+          model: executed.model,
+          count: executed.savedPaths.length,
+          paths: executed.savedPaths,
+        });
+
+        return {
+          content: [{ type: "text", text: executed.contentText }],
+          details: executed.details,
+        };
      } catch (error) {
        failVideoGenerationTaskRun({
          handle: taskHandle,
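
Reviewer note: the control flow of the change to this file can be read as "run inline, or detach and wake later". The sketch below is a simplified, synchronous illustration of that ordering, not the project's code. Every name in it (`runOrDetach`, `Scheduler`, `Ledger`, the event strings) is hypothetical, and the real implementation is async and goes through `executeVideoGenerationJob`, the task ledger helpers, and `wakeVideoGenerationTaskCompletion`. The rethrow on the inline path is also an assumption, since the diff does not show the rest of that `catch` block.

```typescript
// Hypothetical sketch: all names are illustrative, not OpenClaw APIs.
type Scheduler = (work: () => void) => void;

interface Ledger {
  events: string[]; // stand-in for task ledger lifecycle updates
}

type StartResult<T> = { status: "started" } | { status: "done"; value: T };

function runOrDetach<T>(opts: {
  job: () => T;
  detach: boolean;
  schedule: Scheduler;
  ledger: Ledger;
  wake: (status: "ok" | "error", result: string) => void;
}): StartResult<T> {
  const { job, detach, schedule, ledger, wake } = opts;

  if (!detach) {
    // Inline path: the caller waits and receives the value directly.
    try {
      const value = job();
      ledger.events.push("completed");
      return { status: "done", value };
    } catch (error) {
      ledger.events.push("failed");
      throw error; // assumption: the inline path surfaces the error to the caller
    }
  }

  // Detached path: queue the work and return a "started" receipt right away.
  schedule(() => {
    let status: "ok" | "error" = "ok";
    let result = "";
    try {
      result = String(job());
      ledger.events.push("completed");
    } catch (error) {
      status = "error";
      result = error instanceof Error ? error.message : String(error);
      ledger.events.push("failed");
    }
    // A failing wake must not change the recorded outcome of the job itself.
    try {
      wake(status, result);
    } catch {
      ledger.events.push("wake-failed");
    }
  });
  return { status: "started" };
}
```

Passing the scheduler in as a parameter, as `scheduleBackgroundWork` does in the diff, lets a test queue the work and drain it deterministically instead of relying on timers.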
--- a/src/gateway/protocol/schema/agent.ts
+++ b/src/gateway/protocol/schema/agent.ts
@ -4,7 +4,7 @@ import { InputProvenanceSchema, NonEmptyString, SessionLabelString } from "./pri
 export const AgentInternalEventSchema = Type.Object(
  {
    type: Type.Literal("task_completion"),
-    source: Type.String({ enum: ["subagent", "cron"] }),
+    source: Type.String({ enum: ["subagent", "cron", "video_generation"] }),
    childSessionKey: Type.String(),
    childSessionId: Type.Optional(Type.String()),
    announceType: Type.String(),