mirror of https://github.com/openclaw/openclaw.git
fix: prevent gateway attachment offload regressions (#55513) (thanks @Syysean)
* feat(gateway): implement claim check pattern to prevent OOM on large attachments * fix: sanitize mediaId, refine trimEnd, remove warn log, add threshold and absolute path * fix: enforce maxBytes before decoding and use dynamic path from saveMediaBuffer * fix: enforce absolute maxBytes limit before Buffer allocation and preserve file extensions * fix: align saveMediaBuffer arguments and satisfy oxfmt linter * chore: strictly enforce linting rules (curly braces, unused vars, and error typing) * fix: restrict offload to mainstream mimes to avoid extension-loss bug in store.ts for BMP/TIFF * fix: restrict offload to mainstream mimes to bypass store.ts extension-loss bug * chore: document bmp/tiff exclusion from offload whitelist in MIME_TO_EXT * feat: implement agent-side resolver for opaque media URIs and finalize contract * fix: support unicode media URIs and allow consecutive dots in safe IDs based on Codex review * fix(gateway): enforce strict fail-fast for oversized media to prevent OOM bypass * refactor(gateway): harden media offload with performance and security optimizations This update refines the Claim Check pattern with industrial-grade guards: - Performance: Implemented sampled Base64 validation for large payloads (>4KB) to prevent event loop blocking. - Security: Added null-byte (\u0000) detection and reinforced path traversal guards. - I18n: Updated media-uri regex to a blacklist-based character class for Unicode/Chinese filename support, with oxlint bypass for intentional control regex. - Robustness: Enhanced error diagnostics with JSON-serialized IDs. * fix: add HEIC/HEIF to offload allowlist and pass maxBytes to saveMediaBuffer * fix(gateway): clean up offloaded media files on attachment parse failure Address Codex review feedback: track saved media IDs and implement best-effort cleanup via deleteMediaBuffer if subsequent attachments fail validation, preventing orphaned files on disk. * fix(gateway): enforce full base64 validation to prevent whitespace padding bypass Address Codex review feedback: remove early return in isValidBase64 so padded payloads cannot bypass offload thresholds and reintroduce memory pressure. Updated related comments. * fix(gateway): preserve offloaded media metadata and fix validation error mapping Address Codex review feedback: - Add \offloadedRefs\ to \ParsedMessageWithImages\ to expose structured metadata for offloaded attachments, preventing transcript media loss. - Move \erifyDecodedSize\ outside the storage try-catch block to correctly surface client base64 validation failures as 4xx errors instead of 5xx \MediaOffloadError\. - Add JSDoc TODOs indicating that upstream callers (chat.ts, agent.ts, server-node-events.ts) must explicitly pass the \supportsImages\ flag. * fix(agents): explicitly allow media store dir when loading offloaded images Address Codex review feedback: Pass getMediaDir() to loadWebMedia's localRoots for media-uri refs to prevent legacy path resolution mismatches from silently dropping large attachments. * fix(gateway): resolve attachment offload regressions and error mapping Address Codex review feedback: - Pass \supportsImages\ dynamically in \chat.ts\ and \gent.ts\ based on model catalog, and explicitly in \server-node-events.ts\. - Persist \offloadedRefs\ into the transcript pipeline in \chat.ts\ to preserve media metadata for >2MB attachments. - Correctly map \MediaOffloadError\ to 5xx (UNAVAILABLE) to differentiate server storage faults from 4xx client validation errors. * fix(gateway): dynamically compute supportsImages for overrides and node events Address follow-up Codex review feedback: - Use effective model (including overrides) to compute \supportsImages\ in \gent.ts\. - Move session load earlier in \server-node-events.ts\ to dynamically compute \supportsImages\ rather than hardcoding true. * fix(gateway): resolve capability edge cases reported by codex Address final Codex edge cases: - Refactor \gent.ts\ to compute \supportsImages\ even when no session key is present, ensuring text-only override requests without sessions safely drop attachments. - Update catalog lookups in \chat.ts\, \gent.ts\, and \server-node-events.ts\ to strictly match both \id\ and \provider\ to prevent cross-provider model collisions. * fix(agents): restore before_install hook for skill installs Restore the plugin scanner security hook that was accidentally dropped during merge conflict resolution. * fix: resolve attachment pathing, defer parsing after auth gates, and clean up node-event mocks * fix: resolve syntax errors in test-env, fix missing helper imports, and optimize parsing sequence in node events * fix(gateway): re-enforce message length limit after attachment parsing Adds a secondary check to ensure the 20,000-char cap remains effective even after media markers are appended during the offload flow. * fix(gateway): prevent dropping valid small images and clean up orphaned media on size rejection * fix(gateway): share attachment image capability checks * fix(gateway): preserve mixed attachment order * fix: fail closed on unknown image capability (#55513) (thanks @Syysean) * fix: classify offloaded attachment refs explicitly (#55513) (thanks @Syysean) --------- Co-authored-by: Ayaan Zaidi <hi@obviy.us>
This commit is contained in:
parent
3ad747e25f
commit
c6f2db1506
|
|
@ -102,6 +102,7 @@ Docs: https://docs.openclaw.ai
|
|||
- Plugins/facades: guard bundled plugin facade loads with a cache-first sentinel so circular re-entry stops crashing `xai`, `sglang`, and `vllm` during gateway plugin startup. (#57508) Thanks @openperf.
|
||||
- Agents/MCP: dispose bundled MCP runtimes after one-shot `openclaw agent --local` runs finish, while preserving bundled MCP state across in-run retries so local JSON runs exit cleanly without restarting stateful MCP tools mid-run.
|
||||
- Gateway/OpenAI HTTP: restore default operator scopes for bearer-authenticated requests that omit `x-openclaw-scopes`, so headless `/v1/chat/completions` and session-history callers work again after the recent method-scope hardening. (#57596) Thanks @openperf.
|
||||
- Gateway/attachments: offload large inbound images without leaking `media://` markers into text-only runs, preserve mixed attachment order for model input/transcripts, and fail closed when model image capability cannot be resolved. (#55513) Thanks @Syysean.
|
||||
|
||||
## 2026.3.28
|
||||
|
||||
|
|
|
|||
|
|
@ -4,6 +4,7 @@ import type { OpenClawConfig } from "../../config/config.js";
|
|||
import type { CliSessionBinding } from "../../config/sessions.js";
|
||||
import type { SessionSystemPromptReport } from "../../config/sessions/types.js";
|
||||
import type { CliBackendConfig } from "../../config/types.js";
|
||||
import type { PromptImageOrderEntry } from "../../media/prompt-image-order.js";
|
||||
import type { ResolvedCliBackend } from "../cli-backends.js";
|
||||
|
||||
export type RunCliAgentParams = {
|
||||
|
|
@ -28,6 +29,7 @@ export type RunCliAgentParams = {
|
|||
bootstrapPromptWarningSignaturesSeen?: string[];
|
||||
bootstrapPromptWarningSignature?: string;
|
||||
images?: ImageContent[];
|
||||
imageOrder?: PromptImageOrderEntry[];
|
||||
};
|
||||
|
||||
export type CliPreparedBackend = {
|
||||
|
|
|
|||
|
|
@ -368,6 +368,7 @@ export function runAgentAttempt(params: {
|
|||
bootstrapPromptWarningSignaturesSeen,
|
||||
bootstrapPromptWarningSignature,
|
||||
images: params.isFallbackRetry ? undefined : params.opts.images,
|
||||
imageOrder: params.isFallbackRetry ? undefined : params.opts.imageOrder,
|
||||
streamParams: params.opts.streamParams,
|
||||
});
|
||||
return runCliWithSession(cliSessionBinding?.sessionId).catch(async (err) => {
|
||||
|
|
@ -455,6 +456,7 @@ export function runAgentAttempt(params: {
|
|||
skillsSnapshot: params.skillsSnapshot,
|
||||
prompt: effectivePrompt,
|
||||
images: params.isFallbackRetry ? undefined : params.opts.images,
|
||||
imageOrder: params.isFallbackRetry ? undefined : params.opts.imageOrder,
|
||||
clientTools: params.opts.clientTools,
|
||||
provider: params.providerOverride,
|
||||
model: params.modelOverride,
|
||||
|
|
|
|||
|
|
@ -2,6 +2,7 @@ import type { AgentInternalEvent } from "../../agents/internal-events.js";
|
|||
import type { ClientToolDefinition } from "../../agents/pi-embedded-runner/run/params.js";
|
||||
import type { SpawnedRunMetadata } from "../../agents/spawned-context.js";
|
||||
import type { ChannelOutboundTargetMode } from "../../channels/plugins/types.js";
|
||||
import type { PromptImageOrderEntry } from "../../media/prompt-image-order.js";
|
||||
import type { InputProvenance } from "../../sessions/input-provenance.js";
|
||||
|
||||
/** Image content block for Claude API multimodal messages. */
|
||||
|
|
@ -35,6 +36,8 @@ export type AgentCommandOpts = {
|
|||
message: string;
|
||||
/** Optional image attachments for multimodal messages. */
|
||||
images?: ImageContent[];
|
||||
/** Original inline/offloaded attachment order for inbound images. */
|
||||
imageOrder?: PromptImageOrderEntry[];
|
||||
/** Optional client-provided tools (OpenResponses hosted tools). */
|
||||
clientTools?: ClientToolDefinition[];
|
||||
/** Agent id override (must exist in config). */
|
||||
|
|
|
|||
|
|
@ -506,6 +506,7 @@ export async function runEmbeddedPiAgent(
|
|||
skillsSnapshot: params.skillsSnapshot,
|
||||
prompt,
|
||||
images: params.images,
|
||||
imageOrder: params.imageOrder,
|
||||
clientTools: params.clientTools,
|
||||
disableTools: params.disableTools,
|
||||
provider,
|
||||
|
|
|
|||
|
|
@ -1477,6 +1477,7 @@ export async function runEmbeddedAttempt(
|
|||
workspaceDir: effectiveWorkspace,
|
||||
model: params.model,
|
||||
existingImages: params.images,
|
||||
imageOrder: params.imageOrder,
|
||||
maxBytes: MAX_IMAGE_BYTES,
|
||||
maxDimensionPx: resolveImageSanitizationLimits(params.config).maxDimensionPx,
|
||||
workspaceOnly: effectiveFsWorkspaceOnly,
|
||||
|
|
|
|||
|
|
@ -8,7 +8,9 @@ import {
|
|||
detectAndLoadPromptImages,
|
||||
detectImageReferences,
|
||||
loadImageFromRef,
|
||||
mergePromptAttachmentImages,
|
||||
modelSupportsImages,
|
||||
splitPromptAndAttachmentRefs,
|
||||
} from "./images.js";
|
||||
|
||||
function expectNoPromptImages(result: { detectedRefs: unknown[]; images: unknown[] }) {
|
||||
|
|
@ -289,6 +291,47 @@ describe("detectAndLoadPromptImages", () => {
|
|||
expectNoPromptImages(result);
|
||||
});
|
||||
|
||||
it("preserves attachment order when offloaded refs and inline images are mixed", async () => {
|
||||
const merged = mergePromptAttachmentImages({
|
||||
imageOrder: ["offloaded", "inline"],
|
||||
existingImages: [{ type: "image", data: "small-b", mimeType: "image/png" }],
|
||||
offloadedImages: [{ type: "image", data: "large-a", mimeType: "image/jpeg" }],
|
||||
});
|
||||
|
||||
expect(merged).toEqual([
|
||||
{ type: "image", data: "large-a", mimeType: "image/jpeg" },
|
||||
{ type: "image", data: "small-b", mimeType: "image/png" },
|
||||
]);
|
||||
});
|
||||
|
||||
it("classifies trailing offloaded refs separately from prompt refs", () => {
|
||||
const prompt =
|
||||
"compare [media attached: media://inbound/prompt-ref.png] and ./prompt-b.png\n[media attached: media://inbound/att-b.png]";
|
||||
const refs = detectImageReferences(prompt);
|
||||
|
||||
const split = splitPromptAndAttachmentRefs({
|
||||
prompt,
|
||||
refs,
|
||||
imageOrder: ["inline", "offloaded"],
|
||||
});
|
||||
|
||||
expect(split.promptRefs).toEqual([
|
||||
{
|
||||
raw: "media://inbound/prompt-ref.png",
|
||||
type: "media-uri",
|
||||
resolved: "media://inbound/prompt-ref.png",
|
||||
},
|
||||
{ raw: "./prompt-b.png", type: "path", resolved: "./prompt-b.png" },
|
||||
]);
|
||||
expect(split.attachmentRefs).toEqual([
|
||||
{
|
||||
raw: "media://inbound/att-b.png",
|
||||
type: "media-uri",
|
||||
resolved: "media://inbound/att-b.png",
|
||||
},
|
||||
]);
|
||||
});
|
||||
|
||||
it("blocks prompt image refs outside workspace when sandbox workspaceOnly is enabled", async () => {
|
||||
const stateDir = await fs.mkdtemp(path.join(os.tmpdir(), "openclaw-native-image-sandbox-"));
|
||||
const sandboxRoot = path.join(stateDir, "sandbox");
|
||||
|
|
|
|||
|
|
@ -1,6 +1,8 @@
|
|||
import path from "node:path";
|
||||
import type { ImageContent } from "@mariozechner/pi-ai";
|
||||
import { assertNoWindowsNetworkPath, safeFileURLToPath } from "../../../infra/local-file-access.js";
|
||||
import type { PromptImageOrderEntry } from "../../../media/prompt-image-order.js";
|
||||
import { resolveMediaBufferPath, getMediaDir } from "../../../media/store.js";
|
||||
import { loadWebMedia } from "../../../media/web-media.js";
|
||||
import { resolveUserPath } from "../../../utils.js";
|
||||
import type { ImageSanitizationLimits } from "../../image-sanitization.js";
|
||||
|
|
@ -38,6 +40,32 @@ const FILE_URL_REGEX_SOURCE = "file://[^\\s<>\"'`\\]]+\\.(?:" + IMAGE_EXTENSION_
|
|||
const PATH_REGEX_SOURCE =
|
||||
"(?:^|\\s|[\"'`(])((\\.\\.?/|[~/])[^\\s\"'`()\\[\\]]*\\.(?:" + IMAGE_EXTENSION_PATTERN + "))";
|
||||
|
||||
/**
|
||||
* Matches the opaque media URI written by the Gateway's claim-check offload:
|
||||
* media://inbound/<uuid-or-id>
|
||||
*
|
||||
* Uses an exclusion-based character class rather than a whitelist so that
|
||||
* Unicode filenames (e.g. Chinese characters) preserved by sanitizeFilename
|
||||
* in store.ts are matched correctly.
|
||||
*
|
||||
* Explicitly excluded from the ID segment:
|
||||
* ] — closes the surrounding [media attached: ...] bracket
|
||||
* \s — any whitespace (space, newline, tab) — terminates the token
|
||||
* / — forward slash path separator (traversal prevention)
|
||||
* \ — back slash path separator (traversal prevention)
|
||||
* \x00 — null byte (path injection prevention)
|
||||
*
|
||||
* resolveMediaBufferPath applies its own guards against these characters, but
|
||||
* excluding them here provides defence-in-depth at the parsing layer.
|
||||
*
|
||||
* Example valid IDs:
|
||||
* "1c77ce17-20b9-4546-be64-6e36a9adcb2c.png"
|
||||
* "photo---1c77ce17-20b9-4546-be64-6e36a9adcb2c.png"
|
||||
* "图片---1c77ce17-20b9-4546-be64-6e36a9adcb2c.png"
|
||||
*/
|
||||
// eslint-disable-next-line no-control-regex
|
||||
const MEDIA_URI_REGEX = /\bmedia:\/\/inbound\/([^\]\s/\\\x00]+)/;
|
||||
|
||||
/**
|
||||
* Result of detecting an image reference in text.
|
||||
*/
|
||||
|
|
@ -45,8 +73,8 @@ export interface DetectedImageRef {
|
|||
/** The raw matched string from the prompt */
|
||||
raw: string;
|
||||
/** The type of reference */
|
||||
type: "path";
|
||||
/** The resolved/normalized path */
|
||||
type: "path" | "media-uri";
|
||||
/** The resolved/normalized path, or the raw media URI for media-uri type */
|
||||
resolved: string;
|
||||
}
|
||||
|
||||
|
|
@ -62,6 +90,105 @@ function normalizeRefForDedupe(raw: string): string {
|
|||
return process.platform === "win32" ? raw.toLowerCase() : raw;
|
||||
}
|
||||
|
||||
export function mergePromptAttachmentImages(params: {
|
||||
imageOrder?: PromptImageOrderEntry[];
|
||||
existingImages?: ImageContent[];
|
||||
offloadedImages?: Array<ImageContent | null>;
|
||||
promptRefImages?: ImageContent[];
|
||||
}): ImageContent[] {
|
||||
const promptImages: ImageContent[] = [];
|
||||
const existingImages = params.existingImages ?? [];
|
||||
const offloadedImages = params.offloadedImages ?? [];
|
||||
|
||||
if (params.imageOrder && params.imageOrder.length > 0) {
|
||||
let inlineIndex = 0;
|
||||
let offloadedIndex = 0;
|
||||
for (const entry of params.imageOrder) {
|
||||
if (entry === "inline") {
|
||||
const image = existingImages[inlineIndex++];
|
||||
if (image) {
|
||||
promptImages.push(image);
|
||||
}
|
||||
continue;
|
||||
}
|
||||
const image = offloadedImages[offloadedIndex++];
|
||||
if (image) {
|
||||
promptImages.push(image);
|
||||
}
|
||||
}
|
||||
while (inlineIndex < existingImages.length) {
|
||||
promptImages.push(existingImages[inlineIndex++]);
|
||||
}
|
||||
while (offloadedIndex < offloadedImages.length) {
|
||||
const image = offloadedImages[offloadedIndex++];
|
||||
if (image) {
|
||||
promptImages.push(image);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
promptImages.push(...existingImages);
|
||||
for (const image of offloadedImages) {
|
||||
if (image) {
|
||||
promptImages.push(image);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
promptImages.push(...(params.promptRefImages ?? []));
|
||||
return promptImages;
|
||||
}
|
||||
|
||||
function extractTrailingAttachmentMediaUris(prompt: string, count: number): string[] {
|
||||
if (count <= 0) {
|
||||
return [];
|
||||
}
|
||||
|
||||
const lines = prompt.split(/\r?\n/);
|
||||
const uris: string[] = [];
|
||||
for (let index = lines.length - 1; index >= 0 && uris.length < count; index--) {
|
||||
const line = lines[index]?.trim();
|
||||
if (!line || line.includes("\0")) {
|
||||
break;
|
||||
}
|
||||
const match = line.match(/^\[media attached:\s*(media:\/\/inbound\/[^\]\s/\\]+)\]$/);
|
||||
if (!match?.[1]) {
|
||||
break;
|
||||
}
|
||||
uris.unshift(match[1]);
|
||||
}
|
||||
return uris;
|
||||
}
|
||||
|
||||
export function splitPromptAndAttachmentRefs(params: {
|
||||
prompt: string;
|
||||
refs: DetectedImageRef[];
|
||||
imageOrder?: PromptImageOrderEntry[];
|
||||
}): {
|
||||
promptRefs: DetectedImageRef[];
|
||||
attachmentRefs: DetectedImageRef[];
|
||||
} {
|
||||
const offloadedCount = params.imageOrder?.filter((entry) => entry === "offloaded").length ?? 0;
|
||||
if (offloadedCount === 0) {
|
||||
return { promptRefs: params.refs, attachmentRefs: [] };
|
||||
}
|
||||
|
||||
const attachmentUris = new Set(extractTrailingAttachmentMediaUris(params.prompt, offloadedCount));
|
||||
if (attachmentUris.size === 0) {
|
||||
return { promptRefs: params.refs, attachmentRefs: [] };
|
||||
}
|
||||
|
||||
const promptRefs: DetectedImageRef[] = [];
|
||||
const attachmentRefs: DetectedImageRef[] = [];
|
||||
for (const ref of params.refs) {
|
||||
if (ref.type === "media-uri" && attachmentUris.has(ref.resolved)) {
|
||||
attachmentRefs.push(ref);
|
||||
continue;
|
||||
}
|
||||
promptRefs.push(ref);
|
||||
}
|
||||
return { promptRefs, attachmentRefs };
|
||||
}
|
||||
|
||||
async function sanitizeImagesWithLog(
|
||||
images: ImageContent[],
|
||||
label: string,
|
||||
|
|
@ -87,6 +214,7 @@ async function sanitizeImagesWithLog(
|
|||
* - Home paths: ~/Pictures/screenshot.png
|
||||
* - file:// URLs: file:///path/to/image.png
|
||||
* - Message attachments: [Image: source: /path/to/image.jpg]
|
||||
* - Gateway claim-check URIs: [media attached: media://inbound/<id>]
|
||||
*
|
||||
* @param prompt The user prompt text to scan
|
||||
* @returns Array of detected image references
|
||||
|
|
@ -135,6 +263,20 @@ export function detectImageReferences(prompt: string): DetectedImageRef[] {
|
|||
continue;
|
||||
}
|
||||
|
||||
// Check for a Gateway claim-check URI first (media://inbound/<id>).
|
||||
// This must be tested before the extension-based path regex because the
|
||||
// URI has no file extension suffix in its base form.
|
||||
const mediaUriMatch = content.match(MEDIA_URI_REGEX);
|
||||
if (mediaUriMatch) {
|
||||
const uri = `media://inbound/${mediaUriMatch[1]}`;
|
||||
const dedupeKey = normalizeRefForDedupe(uri);
|
||||
if (!seen.has(dedupeKey)) {
|
||||
seen.add(dedupeKey);
|
||||
refs.push({ raw: uri, type: "media-uri", resolved: uri });
|
||||
}
|
||||
continue;
|
||||
}
|
||||
|
||||
// Extract path before the (mime/type) or | delimiter
|
||||
// Format is: path (type) | url OR just: path (type)
|
||||
// Path may contain spaces (e.g., "ChatGPT Image Apr 21.png")
|
||||
|
|
@ -205,6 +347,44 @@ export async function loadImageFromRef(
|
|||
sandbox?: { root: string; bridge: SandboxFsBridge };
|
||||
},
|
||||
): Promise<ImageContent | null> {
|
||||
// Handle Gateway claim-check URIs (media://inbound/<id>).
|
||||
// These are written by the Gateway's offload path and point to files that
|
||||
// the Gateway has already validated and persisted. They are intentionally
|
||||
// exempt from workspaceOnly checks because they live in the media store
|
||||
// managed by the Gateway, not in the agent workspace.
|
||||
if (ref.type === "media-uri") {
|
||||
const uriMatch = ref.resolved.match(MEDIA_URI_REGEX);
|
||||
if (!uriMatch) {
|
||||
log.debug(`Native image: malformed media URI, skipping: ${ref.resolved}`);
|
||||
return null;
|
||||
}
|
||||
const mediaId = uriMatch[1];
|
||||
try {
|
||||
// resolveMediaBufferPath accepts the media ID (with optional extension
|
||||
// and original-filename prefix) and returns the absolute path of the
|
||||
// persisted file. It applies its own guards against path traversal,
|
||||
// symlinks, and null bytes.
|
||||
const physicalPath = await resolveMediaBufferPath(mediaId, "inbound");
|
||||
const media = await loadWebMedia(physicalPath, {
|
||||
maxBytes: options?.maxBytes,
|
||||
localRoots: [getMediaDir()],
|
||||
});
|
||||
if (media.kind !== "image") {
|
||||
log.debug(`Native image: media store entry is not an image: ${mediaId}`);
|
||||
return null;
|
||||
}
|
||||
const mimeType = media.contentType ?? "image/jpeg";
|
||||
const data = media.buffer.toString("base64");
|
||||
log.debug(`Native image: loaded media-uri ${ref.resolved} -> ${physicalPath}`);
|
||||
return { type: "image", data, mimeType };
|
||||
} catch (err) {
|
||||
log.debug(
|
||||
`Native image: failed to load media-uri ${ref.resolved}: ${err instanceof Error ? err.message : String(err)}`,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
try {
|
||||
let targetPath = ref.resolved;
|
||||
|
||||
|
|
@ -292,6 +472,7 @@ export async function detectAndLoadPromptImages(params: {
|
|||
workspaceDir: string;
|
||||
model: { input?: string[] };
|
||||
existingImages?: ImageContent[];
|
||||
imageOrder?: PromptImageOrderEntry[];
|
||||
maxBytes?: number;
|
||||
maxDimensionPx?: number;
|
||||
workspaceOnly?: boolean;
|
||||
|
|
@ -326,20 +507,25 @@ export async function detectAndLoadPromptImages(params: {
|
|||
}
|
||||
|
||||
log.debug(`Native image: detected ${allRefs.length} image refs in prompt`);
|
||||
|
||||
const promptImages: ImageContent[] = [...(params.existingImages ?? [])];
|
||||
const { promptRefs, attachmentRefs } = splitPromptAndAttachmentRefs({
|
||||
prompt: params.prompt,
|
||||
refs: allRefs,
|
||||
imageOrder: params.imageOrder,
|
||||
});
|
||||
const promptRefImages: ImageContent[] = [];
|
||||
const offloadedImages: Array<ImageContent | null> = [];
|
||||
|
||||
let loadedCount = 0;
|
||||
let skippedCount = 0;
|
||||
|
||||
for (const ref of allRefs) {
|
||||
for (const ref of promptRefs) {
|
||||
const image = await loadImageFromRef(ref, params.workspaceDir, {
|
||||
maxBytes: params.maxBytes,
|
||||
workspaceOnly: params.workspaceOnly,
|
||||
sandbox: params.sandbox,
|
||||
});
|
||||
if (image) {
|
||||
promptImages.push(image);
|
||||
promptRefImages.push(image);
|
||||
loadedCount++;
|
||||
log.debug(`Native image: loaded ${ref.type} ${ref.resolved}`);
|
||||
} else {
|
||||
|
|
@ -347,6 +533,28 @@ export async function detectAndLoadPromptImages(params: {
|
|||
}
|
||||
}
|
||||
|
||||
for (const ref of attachmentRefs) {
|
||||
const image = await loadImageFromRef(ref, params.workspaceDir, {
|
||||
maxBytes: params.maxBytes,
|
||||
workspaceOnly: params.workspaceOnly,
|
||||
sandbox: params.sandbox,
|
||||
});
|
||||
offloadedImages.push(image);
|
||||
if (image) {
|
||||
loadedCount++;
|
||||
log.debug(`Native image: loaded ${ref.type} ${ref.resolved}`);
|
||||
} else {
|
||||
skippedCount++;
|
||||
}
|
||||
}
|
||||
|
||||
const promptImages = mergePromptAttachmentImages({
|
||||
imageOrder: params.imageOrder,
|
||||
existingImages: params.existingImages,
|
||||
offloadedImages,
|
||||
promptRefImages,
|
||||
});
|
||||
|
||||
const imageSanitization: ImageSanitizationLimits = {
|
||||
maxDimensionPx: params.maxDimensionPx,
|
||||
};
|
||||
|
|
|
|||
|
|
@ -2,6 +2,7 @@ import type { ImageContent } from "@mariozechner/pi-ai";
|
|||
import type { ReasoningLevel, ThinkLevel, VerboseLevel } from "../../../auto-reply/thinking.js";
|
||||
import type { ReplyPayload } from "../../../auto-reply/types.js";
|
||||
import type { OpenClawConfig } from "../../../config/config.js";
|
||||
import type { PromptImageOrderEntry } from "../../../media/prompt-image-order.js";
|
||||
import type { enqueueCommand } from "../../../process/command-queue.js";
|
||||
import type { InputProvenance } from "../../../sessions/input-provenance.js";
|
||||
import type { ExecElevatedDefaults, ExecToolDefaults } from "../../bash-tools.js";
|
||||
|
|
@ -76,6 +77,7 @@ export type RunEmbeddedPiAgentParams = {
|
|||
skillsSnapshot?: SkillSnapshot;
|
||||
prompt: string;
|
||||
images?: ImageContent[];
|
||||
imageOrder?: PromptImageOrderEntry[];
|
||||
/** Optional client-provided tools (OpenResponses hosted tools). */
|
||||
clientTools?: ClientToolDefinition[];
|
||||
/** Disable built-in tools for this run (LLM-only mode). */
|
||||
|
|
|
|||
|
|
@ -552,7 +552,6 @@ export async function installSkill(params: SkillInstallRequest): Promise<SkillIn
|
|||
// Hook errors are non-fatal — built-in scanner results still apply.
|
||||
}
|
||||
}
|
||||
|
||||
// Warn when install is triggered from a non-bundled source.
|
||||
// Workspace/project/personal agent skills can contain attacker-controlled metadata.
|
||||
const trustedInstallSources = new Set(["openclaw-bundled", "openclaw-managed", "openclaw-extra"]);
|
||||
|
|
|
|||
|
|
@ -316,6 +316,7 @@ export async function runAgentTurnWithFallback(params: {
|
|||
bootstrapPromptWarningSignaturesSeen.length - 1
|
||||
],
|
||||
images: params.opts?.images,
|
||||
imageOrder: params.opts?.imageOrder,
|
||||
});
|
||||
bootstrapPromptWarningSignaturesSeen = resolveBootstrapWarningSignaturesSeen(
|
||||
result.meta?.systemPromptReport,
|
||||
|
|
@ -416,6 +417,7 @@ export async function runAgentTurnWithFallback(params: {
|
|||
bootstrapContextMode: params.opts?.bootstrapContextMode,
|
||||
bootstrapContextRunKind: params.opts?.isHeartbeat ? "heartbeat" : "default",
|
||||
images: params.opts?.images,
|
||||
imageOrder: params.opts?.imageOrder,
|
||||
abortSignal: params.opts?.abortSignal,
|
||||
blockReplyBreak: params.resolvedBlockStreamingBreak,
|
||||
blockReplyChunking: params.blockReplyChunking,
|
||||
|
|
|
|||
|
|
@ -1,5 +1,6 @@
|
|||
import type { ImageContent } from "@mariozechner/pi-ai";
|
||||
import type { InteractiveReply } from "../interactive/payload.js";
|
||||
import type { PromptImageOrderEntry } from "../media/prompt-image-order.js";
|
||||
import type { TypingController } from "./reply/typing.js";
|
||||
|
||||
export type BlockReplyContext = {
|
||||
|
|
@ -28,6 +29,8 @@ export type GetReplyOptions = {
|
|||
abortSignal?: AbortSignal;
|
||||
/** Optional inbound images (used for webchat attachments). */
|
||||
images?: ImageContent[];
|
||||
/** Original inline/offloaded attachment order for inbound images. */
|
||||
imageOrder?: PromptImageOrderEntry[];
|
||||
/** Notifies when an agent run actually starts (useful for webchat command handling). */
|
||||
onAgentRunStart?: (runId: string) => void;
|
||||
onReplyStart?: () => Promise<void> | void;
|
||||
|
|
|
|||
|
|
@ -1,5 +1,7 @@
|
|||
import { estimateBase64DecodedBytes } from "../media/base64.js";
|
||||
import type { PromptImageOrderEntry } from "../media/prompt-image-order.js";
|
||||
import { sniffMimeFromBase64 } from "../media/sniff-mime-from-base64.js";
|
||||
import { deleteMediaBuffer, saveMediaBuffer } from "../media/store.js";
|
||||
|
||||
export type ChatAttachment = {
|
||||
type?: string;
|
||||
|
|
@ -14,12 +16,52 @@ export type ChatImageContent = {
|
|||
mimeType: string;
|
||||
};
|
||||
|
||||
/**
|
||||
* Metadata for an attachment that was offloaded to the media store.
|
||||
*
|
||||
* Included in ParsedMessageWithImages.offloadedRefs so that callers can
|
||||
* persist structured media metadata for transcripts. Without this, consumers
|
||||
* that derive MediaPath/MediaPaths from the `images` array (e.g.
|
||||
* persistChatSendImages and buildChatSendTranscriptMessage in chat.ts) would
|
||||
* silently omit all large attachments that were offloaded to disk.
|
||||
*/
|
||||
export type OffloadedRef = {
|
||||
/** Opaque media URI injected into the message, e.g. "media://inbound/<id>" */
|
||||
mediaRef: string;
|
||||
/** The raw media ID from SavedMedia.id, usable with resolveMediaBufferPath */
|
||||
id: string;
|
||||
/** Absolute filesystem path returned by saveMediaBuffer — used for transcript MediaPath */
|
||||
path: string;
|
||||
/** MIME type of the offloaded attachment */
|
||||
mimeType: string;
|
||||
/** The label / filename of the original attachment */
|
||||
label: string;
|
||||
};
|
||||
|
||||
export type ParsedMessageWithImages = {
|
||||
message: string;
|
||||
/** Small attachments (≤ OFFLOAD_THRESHOLD_BYTES) passed inline to the model */
|
||||
images: ChatImageContent[];
|
||||
/** Original accepted attachment order after inline/offloaded split. */
|
||||
imageOrder: PromptImageOrderEntry[];
|
||||
/**
|
||||
* Large attachments (> OFFLOAD_THRESHOLD_BYTES) that were offloaded to the
|
||||
* media store. Each entry corresponds to a `[media attached: media://inbound/<id>]`
|
||||
* marker appended to `message`.
|
||||
*
|
||||
* Callers MUST persist this list separately for transcript media metadata.
|
||||
* It is intentionally separate from `images` because downstream model calls
|
||||
* do not receive these as inline image blocks.
|
||||
*
|
||||
* ⚠️ Call sites (chat.ts, agent.ts, server-node-events.ts) MUST also pass
|
||||
* `supportsImages: modelSupportsImages(model)` so that text-only model runs
|
||||
* do not inject unresolvable media:// markers into prompt text.
|
||||
*/
|
||||
offloadedRefs: OffloadedRef[];
|
||||
};
|
||||
|
||||
type AttachmentLog = {
|
||||
info?: (message: string) => void;
|
||||
warn: (message: string) => void;
|
||||
};
|
||||
|
||||
|
|
@ -29,6 +71,59 @@ type NormalizedAttachment = {
|
|||
base64: string;
|
||||
};
|
||||
|
||||
type SavedMedia = {
|
||||
id: string;
|
||||
path?: string;
|
||||
};
|
||||
|
||||
const OFFLOAD_THRESHOLD_BYTES = 2_000_000;
|
||||
|
||||
const MIME_TO_EXT: Record<string, string> = {
|
||||
"image/jpeg": ".jpg",
|
||||
"image/jpg": ".jpg",
|
||||
"image/png": ".png",
|
||||
"image/webp": ".webp",
|
||||
"image/gif": ".gif",
|
||||
"image/heic": ".heic",
|
||||
"image/heif": ".heif",
|
||||
// bmp/tiff excluded from SUPPORTED_OFFLOAD_MIMES to avoid extension-loss
|
||||
// bug in store.ts; entries kept here for future extension support
|
||||
"image/bmp": ".bmp",
|
||||
"image/tiff": ".tiff",
|
||||
};
|
||||
|
||||
// Module-level Set for O(1) lookup — not rebuilt on every attachment iteration.
|
||||
//
|
||||
// heic/heif are included only if store.ts's extensionForMime maps them to an
|
||||
// extension. If it does not (same extension-loss risk as bmp/tiff), remove
|
||||
// them from this set.
|
||||
const SUPPORTED_OFFLOAD_MIMES = new Set([
|
||||
"image/jpeg",
|
||||
"image/jpg",
|
||||
"image/png",
|
||||
"image/webp",
|
||||
"image/gif",
|
||||
"image/heic",
|
||||
"image/heif",
|
||||
]);
|
||||
|
||||
/**
|
||||
* Raised when the Gateway cannot persist an attachment to the media store.
|
||||
*
|
||||
* Distinct from ordinary input-validation errors so that Gateway handlers can
|
||||
* map it to a server-side 5xx status rather than a client 4xx.
|
||||
*
|
||||
* Example causes: ENOSPC, EPERM, unexpected saveMediaBuffer return shape.
|
||||
*/
|
||||
export class MediaOffloadError extends Error {
|
||||
readonly cause: unknown;
|
||||
constructor(message: string, options?: ErrorOptions) {
|
||||
super(message, options);
|
||||
this.name = "MediaOffloadError";
|
||||
this.cause = options?.cause;
|
||||
}
|
||||
}
|
||||
|
||||
function normalizeMime(mime?: string): string | undefined {
|
||||
if (!mime) {
|
||||
return undefined;
|
||||
|
|
@ -42,8 +137,75 @@ function isImageMime(mime?: string): boolean {
|
|||
}
|
||||
|
||||
function isValidBase64(value: string): boolean {
|
||||
// Minimal validation; avoid full decode allocations for large payloads.
|
||||
return value.length > 0 && value.length % 4 === 0 && /^[A-Za-z0-9+/]+={0,2}$/.test(value);
|
||||
if (value.length === 0 || value.length % 4 !== 0) {
|
||||
return false;
|
||||
}
|
||||
// A full O(n) regex scan is safe: no overlapping quantifiers, fails linearly.
|
||||
// Prevents adversarial payloads padded with megabytes of whitespace from
|
||||
// bypassing length thresholds.
|
||||
return /^[A-Za-z0-9+/]+={0,2}$/.test(value);
|
||||
}
|
||||
|
||||
/**
|
||||
* Confirms that the decoded buffer produced by Buffer.from(b64, 'base64')
|
||||
* matches the pre-decode size estimate.
|
||||
*
|
||||
* Node's Buffer.from silently drops invalid base64 characters rather than
|
||||
* throwing. A material size discrepancy means the source string contained
|
||||
* embedded garbage that was silently stripped, which would produce a corrupted
|
||||
* file on disk. ±3 bytes of slack accounts for base64 padding rounding.
|
||||
*
|
||||
* IMPORTANT: this is an input-validation check (4xx client error).
|
||||
* It MUST be called OUTSIDE the MediaOffloadError try/catch so that
|
||||
* corrupt-input errors are not misclassified as 5xx server errors.
|
||||
*/
|
||||
function verifyDecodedSize(buffer: Buffer, estimatedBytes: number, label: string): void {
|
||||
if (Math.abs(buffer.byteLength - estimatedBytes) > 3) {
|
||||
throw new Error(
|
||||
`attachment ${label}: base64 contains invalid characters ` +
|
||||
`(expected ~${estimatedBytes} bytes decoded, got ${buffer.byteLength})`,
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
function ensureExtension(label: string, mime: string): string {
|
||||
if (/\.[a-zA-Z0-9]+$/.test(label)) {
|
||||
return label;
|
||||
}
|
||||
const ext = MIME_TO_EXT[mime.toLowerCase()] ?? "";
|
||||
return ext ? `${label}${ext}` : label;
|
||||
}
|
||||
|
||||
/**
|
||||
* Type guard for the return value of saveMediaBuffer.
|
||||
*
|
||||
* Also validates that the returned ID:
|
||||
* - is a non-empty string
|
||||
* - contains no path separators (/ or \) or null bytes
|
||||
*
|
||||
* Catching a bad shape here produces a cleaner error than a cryptic failure
|
||||
* deeper in the stack, and is treated as a 5xx infrastructure error.
|
||||
*/
|
||||
function assertSavedMedia(value: unknown, label: string): SavedMedia {
|
||||
if (
|
||||
value !== null &&
|
||||
typeof value === "object" &&
|
||||
"id" in value &&
|
||||
typeof (value as Record<string, unknown>).id === "string"
|
||||
) {
|
||||
const id = (value as Record<string, unknown>).id as string;
|
||||
if (id.length === 0) {
|
||||
throw new Error(`attachment ${label}: saveMediaBuffer returned an empty media ID`);
|
||||
}
|
||||
if (id.includes("/") || id.includes("\\") || id.includes("\0")) {
|
||||
throw new Error(
|
||||
`attachment ${label}: saveMediaBuffer returned an unsafe media ID ` +
|
||||
`(contains path separator or null byte)`,
|
||||
);
|
||||
}
|
||||
return value as SavedMedia;
|
||||
}
|
||||
throw new Error(`attachment ${label}: saveMediaBuffer returned an unexpected shape`);
|
||||
}
|
||||
|
||||
function normalizeAttachment(
|
||||
|
|
@ -64,7 +226,6 @@ function normalizeAttachment(
|
|||
|
||||
let base64 = content.trim();
|
||||
if (opts.stripDataUrlPrefix) {
|
||||
// Strip data URL prefix if present (e.g., "data:image/jpeg;base64,...").
|
||||
const dataUrlMatch = /^data:[^;]+;base64,(.*)$/.exec(base64);
|
||||
if (dataUrlMatch) {
|
||||
base64 = dataUrlMatch[1];
|
||||
|
|
@ -91,57 +252,211 @@ function validateAttachmentBase64OrThrow(
|
|||
|
||||
/**
|
||||
* Parse attachments and extract images as structured content blocks.
|
||||
* Returns the message text and an array of image content blocks
|
||||
* compatible with Claude API's image format.
|
||||
* Returns the message text, inline image blocks, and offloaded media refs.
|
||||
*
|
||||
* ## Offload behaviour
|
||||
* Attachments whose decoded size exceeds OFFLOAD_THRESHOLD_BYTES are saved to
|
||||
* disk via saveMediaBuffer and replaced with an opaque `media://inbound/<id>`
|
||||
* URI appended to the message. The agent resolves these URIs via
|
||||
* resolveMediaBufferPath before passing them to the model.
|
||||
*
|
||||
* ## Transcript metadata
|
||||
* Callers MUST use `result.offloadedRefs` to persist structured media metadata
|
||||
* for transcripts. These refs are intentionally excluded from `result.images`
|
||||
* because they are not passed inline to the model.
|
||||
*
|
||||
* ## Text-only model runs
|
||||
* Pass `supportsImages: false` for text-only model runs so that no media://
|
||||
* markers are injected into prompt text.
|
||||
*
|
||||
* ⚠️ Call sites in chat.ts, agent.ts, and server-node-events.ts MUST be
|
||||
* updated to pass `supportsImages: modelSupportsImages(model)`. Until they do,
|
||||
* text-only model runs receive unresolvable media:// markers in their prompt.
|
||||
*
|
||||
* ## Cleanup on failure
|
||||
* On any parse failure after files have already been offloaded, best-effort
|
||||
* cleanup is performed before rethrowing so that malformed requests do not
|
||||
* accumulate orphaned files on disk ahead of the periodic TTL sweep.
|
||||
*
|
||||
* ## Known ordering limitation
|
||||
* In mixed large/small batches, the model receives images in a different order
|
||||
* than the original attachment list because detectAndLoadPromptImages
|
||||
* initialises from existingImages first, then appends prompt-detected refs.
|
||||
* A future refactor should unify all image references into a single ordered list.
|
||||
*
|
||||
* @throws {MediaOffloadError} Infrastructure failure saving to media store → 5xx.
|
||||
* @throws {Error} Input validation failure → 4xx.
|
||||
*/
|
||||
export async function parseMessageWithAttachments(
|
||||
message: string,
|
||||
attachments: ChatAttachment[] | undefined,
|
||||
opts?: { maxBytes?: number; log?: AttachmentLog },
|
||||
opts?: { maxBytes?: number; log?: AttachmentLog; supportsImages?: boolean },
|
||||
): Promise<ParsedMessageWithImages> {
|
||||
const maxBytes = opts?.maxBytes ?? 5_000_000; // decoded bytes (5,000,000)
|
||||
const maxBytes = opts?.maxBytes ?? 5_000_000;
|
||||
const log = opts?.log;
|
||||
|
||||
if (!attachments || attachments.length === 0) {
|
||||
return { message, images: [] };
|
||||
return { message, images: [], imageOrder: [], offloadedRefs: [] };
|
||||
}
|
||||
|
||||
// For text-only models drop all attachments cleanly. Do not save files or
|
||||
// inject media:// markers that would never be resolved and would leak
|
||||
// internal path references into the model's prompt.
|
||||
if (opts?.supportsImages === false) {
|
||||
if (attachments.length > 0) {
|
||||
log?.warn(
|
||||
`parseMessageWithAttachments: ${attachments.length} attachment(s) dropped — model does not support images`,
|
||||
);
|
||||
}
|
||||
return { message, images: [], imageOrder: [], offloadedRefs: [] };
|
||||
}
|
||||
|
||||
const images: ChatImageContent[] = [];
|
||||
const imageOrder: PromptImageOrderEntry[] = [];
|
||||
const offloadedRefs: OffloadedRef[] = [];
|
||||
let updatedMessage = message;
|
||||
|
||||
for (const [idx, att] of attachments.entries()) {
|
||||
if (!att) {
|
||||
continue;
|
||||
}
|
||||
const normalized = normalizeAttachment(att, idx, {
|
||||
stripDataUrlPrefix: true,
|
||||
requireImageMime: false,
|
||||
});
|
||||
validateAttachmentBase64OrThrow(normalized, { maxBytes });
|
||||
const { base64: b64, label, mime } = normalized;
|
||||
// Track IDs of files saved during this request for cleanup if a later
|
||||
// attachment fails validation and the entire parse is aborted.
|
||||
const savedMediaIds: string[] = [];
|
||||
|
||||
const providedMime = normalizeMime(mime);
|
||||
const sniffedMime = normalizeMime(await sniffMimeFromBase64(b64));
|
||||
if (sniffedMime && !isImageMime(sniffedMime)) {
|
||||
log?.warn(`attachment ${label}: detected non-image (${sniffedMime}), dropping`);
|
||||
continue;
|
||||
}
|
||||
if (!sniffedMime && !isImageMime(providedMime)) {
|
||||
log?.warn(`attachment ${label}: unable to detect image mime type, dropping`);
|
||||
continue;
|
||||
}
|
||||
if (sniffedMime && providedMime && sniffedMime !== providedMime) {
|
||||
log?.warn(
|
||||
`attachment ${label}: mime mismatch (${providedMime} -> ${sniffedMime}), using sniffed`,
|
||||
);
|
||||
}
|
||||
try {
|
||||
for (const [idx, att] of attachments.entries()) {
|
||||
if (!att) {
|
||||
continue;
|
||||
}
|
||||
|
||||
images.push({
|
||||
type: "image",
|
||||
data: b64,
|
||||
mimeType: sniffedMime ?? providedMime ?? mime,
|
||||
});
|
||||
const normalized = normalizeAttachment(att, idx, {
|
||||
stripDataUrlPrefix: true,
|
||||
requireImageMime: false,
|
||||
});
|
||||
|
||||
const { base64: b64, label, mime } = normalized;
|
||||
|
||||
if (!isValidBase64(b64)) {
|
||||
throw new Error(`attachment ${label}: invalid base64 content`);
|
||||
}
|
||||
|
||||
const sizeBytes = estimateBase64DecodedBytes(b64);
|
||||
if (sizeBytes <= 0) {
|
||||
log?.warn(`attachment ${label}: estimated size is zero, dropping`);
|
||||
continue;
|
||||
}
|
||||
|
||||
if (sizeBytes > maxBytes) {
|
||||
throw new Error(
|
||||
`attachment ${label}: exceeds size limit (${sizeBytes} > ${maxBytes} bytes)`,
|
||||
);
|
||||
}
|
||||
|
||||
const providedMime = normalizeMime(mime);
|
||||
const sniffedMime = normalizeMime(await sniffMimeFromBase64(b64));
|
||||
|
||||
if (sniffedMime && !isImageMime(sniffedMime)) {
|
||||
log?.warn(`attachment ${label}: detected non-image (${sniffedMime}), dropping`);
|
||||
continue;
|
||||
}
|
||||
if (!sniffedMime && !isImageMime(providedMime)) {
|
||||
log?.warn(`attachment ${label}: unable to detect image mime type, dropping`);
|
||||
continue;
|
||||
}
|
||||
if (sniffedMime && providedMime && sniffedMime !== providedMime) {
|
||||
log?.warn(
|
||||
`attachment ${label}: mime mismatch (${providedMime} -> ${sniffedMime}), using sniffed`,
|
||||
);
|
||||
}
|
||||
|
||||
// Third fallback normalises `mime` so a raw un-normalised string (e.g.
|
||||
// "IMAGE/JPEG") does not silently bypass the SUPPORTED_OFFLOAD_MIMES check.
|
||||
const finalMime = sniffedMime ?? providedMime ?? normalizeMime(mime) ?? mime;
|
||||
|
||||
let isOffloaded = false;
|
||||
|
||||
if (sizeBytes > OFFLOAD_THRESHOLD_BYTES) {
|
||||
const isSupportedForOffload = SUPPORTED_OFFLOAD_MIMES.has(finalMime);
|
||||
|
||||
if (!isSupportedForOffload) {
|
||||
// Passing this inline would reintroduce the OOM risk this PR prevents.
|
||||
throw new Error(
|
||||
`attachment ${label}: format ${finalMime} is too large to pass inline ` +
|
||||
`(${sizeBytes} > ${OFFLOAD_THRESHOLD_BYTES} bytes) and cannot be offloaded. ` +
|
||||
`Please convert to JPEG, PNG, WEBP, GIF, HEIC, or HEIF.`,
|
||||
);
|
||||
}
|
||||
|
||||
// Decode and run input-validation BEFORE the MediaOffloadError try/catch.
|
||||
// verifyDecodedSize is a 4xx client error and must not be wrapped as a
|
||||
// 5xx MediaOffloadError.
|
||||
const buffer = Buffer.from(b64, "base64");
|
||||
verifyDecodedSize(buffer, sizeBytes, label);
|
||||
|
||||
// Only the storage operation is wrapped so callers can distinguish
|
||||
// infrastructure failures (5xx) from input errors (4xx).
|
||||
try {
|
||||
const labelWithExt = ensureExtension(label, finalMime);
|
||||
|
||||
const rawResult = await saveMediaBuffer(
|
||||
buffer,
|
||||
finalMime,
|
||||
"inbound",
|
||||
maxBytes,
|
||||
labelWithExt,
|
||||
);
|
||||
|
||||
const savedMedia = assertSavedMedia(rawResult, label);
|
||||
|
||||
// Track for cleanup if a subsequent attachment fails.
|
||||
savedMediaIds.push(savedMedia.id);
|
||||
|
||||
// Opaque URI — compatible with workspaceOnly sandboxes and decouples
|
||||
// the Gateway from the agent's filesystem layout.
|
||||
const mediaRef = `media://inbound/${savedMedia.id}`;
|
||||
|
||||
updatedMessage += `\n[media attached: ${mediaRef}]`;
|
||||
log?.info?.(`[Gateway] Intercepted large image payload. Saved: ${mediaRef}`);
|
||||
|
||||
// Record for transcript metadata — separate from `images` because
|
||||
// these are not passed inline to the model.
|
||||
offloadedRefs.push({
|
||||
mediaRef,
|
||||
id: savedMedia.id,
|
||||
path: savedMedia.path ?? "",
|
||||
mimeType: finalMime,
|
||||
label,
|
||||
});
|
||||
imageOrder.push("offloaded");
|
||||
|
||||
isOffloaded = true;
|
||||
} catch (err) {
|
||||
const errorMessage = err instanceof Error ? err.message : String(err);
|
||||
throw new MediaOffloadError(
|
||||
`[Gateway Error] Failed to save intercepted media to disk: ${errorMessage}`,
|
||||
{ cause: err },
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
if (isOffloaded) {
|
||||
continue;
|
||||
}
|
||||
|
||||
images.push({ type: "image", data: b64, mimeType: finalMime });
|
||||
imageOrder.push("inline");
|
||||
}
|
||||
} catch (err) {
|
||||
// Best-effort cleanup before rethrowing.
|
||||
if (savedMediaIds.length > 0) {
|
||||
await Promise.allSettled(savedMediaIds.map((id) => deleteMediaBuffer(id, "inbound")));
|
||||
}
|
||||
throw err;
|
||||
}
|
||||
|
||||
return { message, images };
|
||||
return {
|
||||
message: updatedMessage !== message ? updatedMessage.trimEnd() : message,
|
||||
images,
|
||||
imageOrder,
|
||||
offloadedRefs,
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
|
|
@ -153,7 +468,8 @@ export function buildMessageWithAttachments(
|
|||
attachments: ChatAttachment[] | undefined,
|
||||
opts?: { maxBytes?: number },
|
||||
): string {
|
||||
const maxBytes = opts?.maxBytes ?? 2_000_000; // 2 MB
|
||||
const maxBytes = opts?.maxBytes ?? 2_000_000;
|
||||
|
||||
if (!attachments || attachments.length === 0) {
|
||||
return message;
|
||||
}
|
||||
|
|
@ -164,21 +480,22 @@ export function buildMessageWithAttachments(
|
|||
if (!att) {
|
||||
continue;
|
||||
}
|
||||
|
||||
const normalized = normalizeAttachment(att, idx, {
|
||||
stripDataUrlPrefix: false,
|
||||
requireImageMime: true,
|
||||
});
|
||||
validateAttachmentBase64OrThrow(normalized, { maxBytes });
|
||||
const { base64, label, mime } = normalized;
|
||||
|
||||
const { base64, label, mime } = normalized;
|
||||
const safeLabel = label.replace(/\s+/g, "_");
|
||||
const dataUrl = ``;
|
||||
blocks.push(dataUrl);
|
||||
blocks.push(``);
|
||||
}
|
||||
|
||||
if (blocks.length === 0) {
|
||||
return message;
|
||||
}
|
||||
|
||||
const separator = message.trim().length > 0 ? "\n\n" : "";
|
||||
return `${message}${separator}${blocks.join("\n\n")}`;
|
||||
}
|
||||
|
|
|
|||
|
|
@ -23,6 +23,7 @@ import {
|
|||
} from "../../infra/outbound/agent-delivery.js";
|
||||
import { shouldDowngradeDeliveryToSessionOnly } from "../../infra/outbound/best-effort-delivery.js";
|
||||
import { resolveMessageChannelSelection } from "../../infra/outbound/channel-selection.js";
|
||||
import type { PromptImageOrderEntry } from "../../media/prompt-image-order.js";
|
||||
import { classifySessionKeyShape, normalizeAgentId } from "../../routing/session-key.js";
|
||||
import { defaultRuntime } from "../../runtime.js";
|
||||
import { normalizeInputProvenance, type InputProvenance } from "../../sessions/input-provenance.js";
|
||||
|
|
@ -39,7 +40,7 @@ import {
|
|||
normalizeMessageChannel,
|
||||
} from "../../utils/message-channel.js";
|
||||
import { resolveAssistantIdentity } from "../assistant-identity.js";
|
||||
import { parseMessageWithAttachments } from "../chat-attachments.js";
|
||||
import { MediaOffloadError, parseMessageWithAttachments } from "../chat-attachments.js";
|
||||
import { resolveAssistantAvatarUrl } from "../control-ui-shared.js";
|
||||
import { ADMIN_SCOPE } from "../method-scopes.js";
|
||||
import { GATEWAY_CLIENT_CAPS, hasGatewayClientCap } from "../protocol/client-info.js";
|
||||
|
|
@ -58,6 +59,8 @@ import {
|
|||
loadGatewaySessionRow,
|
||||
loadSessionEntry,
|
||||
migrateAndPruneGatewaySessionStoreKey,
|
||||
resolveGatewayModelSupportsImages,
|
||||
resolveSessionModelRef,
|
||||
} from "../session-utils.js";
|
||||
import { formatForLog } from "../ws-log.js";
|
||||
import { waitForAgentJob } from "./agent-job.js";
|
||||
|
|
@ -346,16 +349,53 @@ export const agentHandlers: GatewayRequestHandlers = {
|
|||
|
||||
let message = (request.message ?? "").trim();
|
||||
let images: Array<{ type: "image"; data: string; mimeType: string }> = [];
|
||||
let imageOrder: PromptImageOrderEntry[] = [];
|
||||
if (normalizedAttachments.length > 0) {
|
||||
const requestedSessionKeyRaw =
|
||||
typeof request.sessionKey === "string" && request.sessionKey.trim()
|
||||
? request.sessionKey.trim()
|
||||
: undefined;
|
||||
|
||||
let baseProvider: string | undefined;
|
||||
let baseModel: string | undefined;
|
||||
if (requestedSessionKeyRaw) {
|
||||
const { cfg: sessCfg, entry: sessEntry } = loadSessionEntry(requestedSessionKeyRaw);
|
||||
const modelRef = resolveSessionModelRef(sessCfg, sessEntry, undefined);
|
||||
baseProvider = modelRef.provider;
|
||||
baseModel = modelRef.model;
|
||||
}
|
||||
const effectiveProvider = providerOverride || baseProvider;
|
||||
const effectiveModel = modelOverride || baseModel;
|
||||
const supportsImages = await resolveGatewayModelSupportsImages({
|
||||
loadGatewayModelCatalog: context.loadGatewayModelCatalog,
|
||||
provider: effectiveProvider,
|
||||
model: effectiveModel,
|
||||
});
|
||||
|
||||
try {
|
||||
const parsed = await parseMessageWithAttachments(message, normalizedAttachments, {
|
||||
maxBytes: 5_000_000,
|
||||
log: context.logGateway,
|
||||
supportsImages,
|
||||
});
|
||||
message = parsed.message.trim();
|
||||
images = parsed.images;
|
||||
imageOrder = parsed.imageOrder;
|
||||
// offloadedRefs are appended as text markers to `message`; the agent
|
||||
// runner will resolve them via detectAndLoadPromptImages.
|
||||
} catch (err) {
|
||||
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, String(err)));
|
||||
// MediaOffloadError indicates a server-side storage fault (ENOSPC, EPERM,
|
||||
// etc.). Map it to UNAVAILABLE so clients can retry without treating it as
|
||||
// a bad request. All other errors are input-validation failures → 4xx.
|
||||
const isServerFault = err instanceof MediaOffloadError;
|
||||
respond(
|
||||
false,
|
||||
undefined,
|
||||
errorShape(
|
||||
isServerFault ? ErrorCodes.UNAVAILABLE : ErrorCodes.INVALID_REQUEST,
|
||||
String(err),
|
||||
),
|
||||
);
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
|
@ -765,6 +805,7 @@ export const agentHandlers: GatewayRequestHandlers = {
|
|||
ingressOpts: {
|
||||
message,
|
||||
images,
|
||||
imageOrder,
|
||||
provider: providerOverride,
|
||||
model: modelOverride,
|
||||
to: resolvedTo,
|
||||
|
|
|
|||
|
|
@ -12,6 +12,7 @@ import { isSilentReplyText, SILENT_REPLY_TOKEN } from "../../auto-reply/tokens.j
|
|||
import type { ReplyPayload } from "../../auto-reply/types.js";
|
||||
import { resolveSessionFilePath } from "../../config/sessions.js";
|
||||
import { jsonUtf8Bytes } from "../../infra/json-utf8-bytes.js";
|
||||
import type { PromptImageOrderEntry } from "../../media/prompt-image-order.js";
|
||||
import { type SavedMedia, saveMediaBuffer } from "../../media/store.js";
|
||||
import { createChannelReplyPipeline } from "../../plugin-sdk/channel-reply-pipeline.js";
|
||||
import { normalizeInputProvenance, type InputProvenance } from "../../sessions/input-provenance.js";
|
||||
|
|
@ -35,7 +36,12 @@ import {
|
|||
isChatStopCommandText,
|
||||
resolveChatRunExpiresAtMs,
|
||||
} from "../chat-abort.js";
|
||||
import { type ChatImageContent, parseMessageWithAttachments } from "../chat-attachments.js";
|
||||
import {
|
||||
type ChatImageContent,
|
||||
MediaOffloadError,
|
||||
type OffloadedRef,
|
||||
parseMessageWithAttachments,
|
||||
} from "../chat-attachments.js";
|
||||
import { stripEnvelopeFromMessage, stripEnvelopeFromMessages } from "../chat-sanitize.js";
|
||||
import { augmentChatHistoryWithCliSessionImports } from "../cli-session-history.js";
|
||||
import { ADMIN_SCOPE } from "../method-scopes.js";
|
||||
|
|
@ -59,6 +65,7 @@ import { getMaxChatHistoryMessagesBytes } from "../server-constants.js";
|
|||
import {
|
||||
capArrayByJsonBytes,
|
||||
loadSessionEntry,
|
||||
resolveGatewayModelSupportsImages,
|
||||
readSessionMessages,
|
||||
resolveSessionModelRef,
|
||||
} from "../session-utils.js";
|
||||
|
|
@ -326,16 +333,50 @@ function canInjectSystemProvenance(client: GatewayRequestHandlerOptions["client"
|
|||
return scopes.includes(ADMIN_SCOPE);
|
||||
}
|
||||
|
||||
/**
|
||||
* Persist inline images and offloaded-ref media to the transcript media store.
|
||||
*
|
||||
* Inline images are re-saved from their base64 payload so that a stable
|
||||
* filesystem path can be stored in the transcript. Offloaded refs are already
|
||||
* on disk (saved by parseMessageWithAttachments); their SavedMedia metadata is
|
||||
* synthesised directly from the OffloadedRef, avoiding a redundant write.
|
||||
*
|
||||
* Both sets are combined so that transcript media fields remain complete
|
||||
* regardless of whether attachments were inlined or offloaded.
|
||||
*/
|
||||
async function persistChatSendImages(params: {
|
||||
images: ChatImageContent[];
|
||||
imageOrder: PromptImageOrderEntry[];
|
||||
offloadedRefs: OffloadedRef[];
|
||||
client: GatewayRequestHandlerOptions["client"];
|
||||
logGateway: GatewayRequestContext["logGateway"];
|
||||
}): Promise<SavedMedia[]> {
|
||||
if (params.images.length === 0 || isAcpBridgeClient(params.client)) {
|
||||
if (isAcpBridgeClient(params.client)) {
|
||||
return [];
|
||||
}
|
||||
|
||||
const saved: SavedMedia[] = [];
|
||||
for (const img of params.images) {
|
||||
let inlineIndex = 0;
|
||||
let offloadedIndex = 0;
|
||||
for (const entry of params.imageOrder) {
|
||||
if (entry === "offloaded") {
|
||||
const ref = params.offloadedRefs[offloadedIndex++];
|
||||
if (!ref) {
|
||||
continue;
|
||||
}
|
||||
saved.push({
|
||||
id: ref.id,
|
||||
path: ref.path,
|
||||
size: 0,
|
||||
contentType: ref.mimeType,
|
||||
});
|
||||
continue;
|
||||
}
|
||||
|
||||
const img = params.images[inlineIndex++];
|
||||
if (!img) {
|
||||
continue;
|
||||
}
|
||||
try {
|
||||
saved.push(await saveMediaBuffer(Buffer.from(img.data, "base64"), img.mimeType, "inbound"));
|
||||
} catch (err) {
|
||||
|
|
@ -344,6 +385,7 @@ async function persistChatSendImages(params: {
|
|||
);
|
||||
}
|
||||
}
|
||||
|
||||
return saved;
|
||||
}
|
||||
|
||||
|
|
@ -1387,23 +1429,18 @@ export const chatHandlers: GatewayRequestHandlers = {
|
|||
);
|
||||
return;
|
||||
}
|
||||
let parsedMessage = inboundMessage;
|
||||
let parsedImages: ChatImageContent[] = [];
|
||||
if (normalizedAttachments.length > 0) {
|
||||
try {
|
||||
const parsed = await parseMessageWithAttachments(inboundMessage, normalizedAttachments, {
|
||||
maxBytes: 5_000_000,
|
||||
log: context.logGateway,
|
||||
});
|
||||
parsedMessage = parsed.message;
|
||||
parsedImages = parsed.images;
|
||||
} catch (err) {
|
||||
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, String(err)));
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
// Load session entry before attachment parsing so we can gate media-URI
|
||||
// marker injection on the model's image capability. This prevents opaque
|
||||
// media:// markers from leaking into prompts for text-only model runs.
|
||||
const rawSessionKey = p.sessionKey;
|
||||
const { cfg, entry, canonicalKey: sessionKey } = loadSessionEntry(rawSessionKey);
|
||||
|
||||
let parsedMessage = inboundMessage;
|
||||
let parsedImages: ChatImageContent[] = [];
|
||||
let parsedImageOrder: PromptImageOrderEntry[] = [];
|
||||
let parsedOffloadedRefs: OffloadedRef[] = [];
|
||||
|
||||
const timeoutMs = resolveAgentTimeoutMs({
|
||||
cfg,
|
||||
overrideMs: p.timeoutMs,
|
||||
|
|
@ -1461,6 +1498,43 @@ export const chatHandlers: GatewayRequestHandlers = {
|
|||
return;
|
||||
}
|
||||
|
||||
if (normalizedAttachments.length > 0) {
|
||||
const sessionAgentId = resolveSessionAgentId({ sessionKey, config: cfg });
|
||||
const modelRef = resolveSessionModelRef(cfg, entry, sessionAgentId);
|
||||
const supportsImages = await resolveGatewayModelSupportsImages({
|
||||
loadGatewayModelCatalog: context.loadGatewayModelCatalog,
|
||||
provider: modelRef.provider,
|
||||
model: modelRef.model,
|
||||
});
|
||||
|
||||
try {
|
||||
const parsed = await parseMessageWithAttachments(inboundMessage, normalizedAttachments, {
|
||||
maxBytes: 5_000_000,
|
||||
log: context.logGateway,
|
||||
supportsImages,
|
||||
});
|
||||
parsedMessage = parsed.message;
|
||||
parsedImages = parsed.images;
|
||||
parsedImageOrder = parsed.imageOrder;
|
||||
parsedOffloadedRefs = parsed.offloadedRefs;
|
||||
} catch (err) {
|
||||
// MediaOffloadError indicates a server-side storage fault (ENOSPC, EPERM,
|
||||
// etc.). All other errors are client-side input validation failures.
|
||||
// Map them to different HTTP status codes so callers can retry server
|
||||
// faults without treating them as bad requests.
|
||||
const isServerFault = err instanceof MediaOffloadError;
|
||||
respond(
|
||||
false,
|
||||
undefined,
|
||||
errorShape(
|
||||
isServerFault ? ErrorCodes.UNAVAILABLE : ErrorCodes.INVALID_REQUEST,
|
||||
String(err),
|
||||
),
|
||||
);
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
try {
|
||||
const abortController = new AbortController();
|
||||
context.chatAbortControllers.set(clientRunId, {
|
||||
|
|
@ -1477,8 +1551,15 @@ export const chatHandlers: GatewayRequestHandlers = {
|
|||
status: "started" as const,
|
||||
};
|
||||
respond(true, ackPayload, undefined, { runId: clientRunId });
|
||||
|
||||
// Persist both inline images and already-offloaded refs to the media
|
||||
// store so that transcript media fields remain complete for all attachment
|
||||
// sizes. Offloaded refs are already on disk; persistChatSendImages converts
|
||||
// their metadata without re-writing the files.
|
||||
const persistedImagesPromise = persistChatSendImages({
|
||||
images: parsedImages,
|
||||
imageOrder: parsedImageOrder,
|
||||
offloadedRefs: parsedOffloadedRefs,
|
||||
client,
|
||||
logGateway: context.logGateway,
|
||||
});
|
||||
|
|
@ -1638,6 +1719,7 @@ export const chatHandlers: GatewayRequestHandlers = {
|
|||
runId: clientRunId,
|
||||
abortSignal: abortController.signal,
|
||||
images: parsedImages.length > 0 ? parsedImages : undefined,
|
||||
imageOrder: parsedImageOrder.length > 0 ? parsedImageOrder : undefined,
|
||||
onAgentRunStart: (runId) => {
|
||||
agentRunStarted = true;
|
||||
void emitUserTranscriptUpdate();
|
||||
|
|
|
|||
|
|
@ -7,6 +7,8 @@ const buildSessionLookup = (
|
|||
sessionKey: string,
|
||||
entry: {
|
||||
sessionId?: string;
|
||||
model?: string;
|
||||
modelProvider?: string;
|
||||
lastChannel?: string;
|
||||
lastTo?: string;
|
||||
lastAccountId?: string;
|
||||
|
|
@ -23,6 +25,8 @@ const buildSessionLookup = (
|
|||
entry: {
|
||||
sessionId: entry.sessionId ?? `sid-${sessionKey}`,
|
||||
updatedAt: entry.updatedAt ?? Date.now(),
|
||||
model: entry.model,
|
||||
modelProvider: entry.modelProvider,
|
||||
lastChannel: entry.lastChannel,
|
||||
lastTo: entry.lastTo,
|
||||
lastAccountId: entry.lastAccountId,
|
||||
|
|
@ -44,6 +48,7 @@ const loadOrCreateDeviceIdentityMock = vi.hoisted(() =>
|
|||
privateKeyPem: "private",
|
||||
})),
|
||||
);
|
||||
const parseMessageWithAttachmentsMock = vi.hoisted(() => vi.fn());
|
||||
const normalizeChannelIdMock = vi.hoisted(() =>
|
||||
vi.fn((channel?: string | null) => channel ?? null),
|
||||
);
|
||||
|
|
@ -79,6 +84,13 @@ vi.mock("../infra/push-apns.js", () => ({
|
|||
vi.mock("../infra/device-identity.js", () => ({
|
||||
loadOrCreateDeviceIdentity: loadOrCreateDeviceIdentityMock,
|
||||
}));
|
||||
vi.mock("./chat-attachments.js", async (importOriginal) => {
|
||||
const actual = await importOriginal<typeof import("./chat-attachments.js")>();
|
||||
return {
|
||||
...actual,
|
||||
parseMessageWithAttachments: parseMessageWithAttachmentsMock,
|
||||
};
|
||||
});
|
||||
vi.mock("./session-utils.js", () => ({
|
||||
loadSessionEntry: vi.fn((sessionKey: string) => buildSessionLookup(sessionKey)),
|
||||
migrateAndPruneGatewaySessionStoreKey: vi.fn(
|
||||
|
|
@ -89,10 +101,38 @@ vi.mock("./session-utils.js", () => ({
|
|||
}),
|
||||
),
|
||||
pruneLegacyStoreKeys: vi.fn(),
|
||||
resolveGatewayModelSupportsImages: vi.fn(
|
||||
async ({
|
||||
loadGatewayModelCatalog,
|
||||
provider,
|
||||
model,
|
||||
}: {
|
||||
loadGatewayModelCatalog: () => Promise<
|
||||
Array<{ id: string; provider: string; input?: string[] }>
|
||||
>;
|
||||
provider?: string;
|
||||
model?: string;
|
||||
}) => {
|
||||
if (!model) {
|
||||
return true;
|
||||
}
|
||||
const catalog = await loadGatewayModelCatalog();
|
||||
const modelEntry = catalog.find(
|
||||
(entry) => entry.id === model && (!provider || entry.provider === provider),
|
||||
);
|
||||
return modelEntry ? (modelEntry.input?.includes("image") ?? false) : true;
|
||||
},
|
||||
),
|
||||
resolveGatewaySessionStoreTarget: vi.fn(({ key }: { key: string }) => ({
|
||||
canonicalKey: key,
|
||||
storeKeys: [key],
|
||||
})),
|
||||
resolveSessionModelRef: vi.fn(
|
||||
(_cfg: OpenClawConfig, entry?: { model?: string; modelProvider?: string }) => ({
|
||||
provider: entry?.modelProvider ?? "test-provider",
|
||||
model: entry?.model ?? "default-model",
|
||||
}),
|
||||
),
|
||||
}));
|
||||
|
||||
import { normalizeChannelId } from "../channels/plugins/index.js";
|
||||
|
|
@ -751,10 +791,17 @@ describe("notifications changed events", () => {
|
|||
describe("agent request events", () => {
|
||||
beforeEach(() => {
|
||||
agentCommandMock.mockClear();
|
||||
parseMessageWithAttachmentsMock.mockReset();
|
||||
updateSessionStoreMock.mockClear();
|
||||
loadSessionEntryMock.mockClear();
|
||||
normalizeChannelIdVi.mockClear();
|
||||
normalizeChannelIdVi.mockImplementation((channel?: string | null) => channel ?? null);
|
||||
parseMessageWithAttachmentsMock.mockResolvedValue({
|
||||
message: "parsed message",
|
||||
images: [],
|
||||
imageOrder: [],
|
||||
offloadedRefs: [],
|
||||
});
|
||||
agentCommandMock.mockResolvedValue({ status: "ok" } as never);
|
||||
updateSessionStoreMock.mockImplementation(async (_storePath, update) => {
|
||||
update({});
|
||||
|
|
@ -821,4 +868,45 @@ describe("agent request events", () => {
|
|||
});
|
||||
expect(opts.runId).toBe(opts.sessionId);
|
||||
});
|
||||
|
||||
it("passes supportsImages false for text-only node-session models", async () => {
|
||||
const ctx = buildCtx();
|
||||
ctx.loadGatewayModelCatalog = async () => [
|
||||
{
|
||||
id: "text-only",
|
||||
name: "Text only",
|
||||
provider: "test-provider",
|
||||
input: ["text"],
|
||||
},
|
||||
];
|
||||
loadSessionEntryMock.mockReturnValueOnce({
|
||||
...buildSessionLookup("agent:main:main", {
|
||||
model: "text-only",
|
||||
modelProvider: "test-provider",
|
||||
}),
|
||||
canonicalKey: "agent:main:main",
|
||||
});
|
||||
|
||||
await handleNodeEvent(ctx, "node-text-only", {
|
||||
event: "agent.request",
|
||||
payloadJSON: JSON.stringify({
|
||||
message: "describe",
|
||||
sessionKey: "agent:main:main",
|
||||
attachments: [
|
||||
{
|
||||
type: "image",
|
||||
mimeType: "image/png",
|
||||
fileName: "dot.png",
|
||||
content: "AAAA",
|
||||
},
|
||||
],
|
||||
}),
|
||||
});
|
||||
|
||||
expect(parseMessageWithAttachmentsMock).toHaveBeenCalledWith(
|
||||
"describe",
|
||||
expect.any(Array),
|
||||
expect.objectContaining({ supportsImages: false }),
|
||||
);
|
||||
});
|
||||
});
|
||||
|
|
|
|||
|
|
@ -1,4 +1,5 @@
|
|||
import { randomUUID } from "node:crypto";
|
||||
import { resolveSessionAgentId } from "../agents/agent-scope.js";
|
||||
import { sanitizeInboundSystemTags } from "../auto-reply/reply/inbound-text.js";
|
||||
import { normalizeChannelId } from "../channels/plugins/index.js";
|
||||
import { createOutboundSendDeps } from "../cli/outbound-send-deps.js";
|
||||
|
|
@ -12,12 +13,19 @@ import { buildOutboundSessionContext } from "../infra/outbound/session-context.j
|
|||
import { resolveOutboundTarget } from "../infra/outbound/targets.js";
|
||||
import { registerApnsRegistration } from "../infra/push-apns.js";
|
||||
import { enqueueSystemEvent } from "../infra/system-events.js";
|
||||
import type { PromptImageOrderEntry } from "../media/prompt-image-order.js";
|
||||
import { deleteMediaBuffer } from "../media/store.js";
|
||||
import { normalizeMainKey, scopedHeartbeatWakeOptions } from "../routing/session-key.js";
|
||||
import { defaultRuntime } from "../runtime.js";
|
||||
import { parseMessageWithAttachments } from "./chat-attachments.js";
|
||||
import { normalizeRpcAttachmentsToChatAttachments } from "./server-methods/attachment-normalize.js";
|
||||
import type { NodeEvent, NodeEventContext } from "./server-node-events-types.js";
|
||||
import { loadSessionEntry, migrateAndPruneGatewaySessionStoreKey } from "./session-utils.js";
|
||||
import {
|
||||
loadSessionEntry,
|
||||
migrateAndPruneGatewaySessionStoreKey,
|
||||
resolveGatewayModelSupportsImages,
|
||||
resolveSessionModelRef,
|
||||
} from "./session-utils.js";
|
||||
import { formatForLog } from "./ws-log.js";
|
||||
|
||||
const MAX_EXEC_EVENT_OUTPUT_CHARS = 180;
|
||||
|
|
@ -347,33 +355,74 @@ export const handleNodeEvent = async (ctx: NodeEventContext, nodeId: string, evt
|
|||
timeoutSeconds?: number | null;
|
||||
key?: string | null;
|
||||
};
|
||||
|
||||
let link: AgentDeepLink | null = null;
|
||||
try {
|
||||
link = JSON.parse(evt.payloadJSON) as AgentDeepLink;
|
||||
} catch {
|
||||
return;
|
||||
}
|
||||
|
||||
const sessionKeyRaw = (link?.sessionKey ?? "").trim();
|
||||
const sessionKey = sessionKeyRaw.length > 0 ? sessionKeyRaw : `node-${nodeId}`;
|
||||
const cfg = loadConfig();
|
||||
const { storePath, entry, canonicalKey } = loadSessionEntry(sessionKey);
|
||||
|
||||
let message = (link?.message ?? "").trim();
|
||||
const normalizedAttachments = normalizeRpcAttachmentsToChatAttachments(
|
||||
link?.attachments ?? undefined,
|
||||
);
|
||||
let images: Array<{ type: "image"; data: string; mimeType: string }> = [];
|
||||
let imageOrder: PromptImageOrderEntry[] = [];
|
||||
if (!message && normalizedAttachments.length === 0) {
|
||||
return;
|
||||
}
|
||||
if (message.length > 20_000) {
|
||||
return;
|
||||
}
|
||||
if (normalizedAttachments.length > 0) {
|
||||
const sessionAgentId = resolveSessionAgentId({ sessionKey, config: cfg });
|
||||
const modelRef = resolveSessionModelRef(cfg, entry, sessionAgentId);
|
||||
const supportsImages = await resolveGatewayModelSupportsImages({
|
||||
loadGatewayModelCatalog: ctx.loadGatewayModelCatalog,
|
||||
provider: modelRef.provider,
|
||||
model: modelRef.model,
|
||||
});
|
||||
try {
|
||||
const parsed = await parseMessageWithAttachments(message, normalizedAttachments, {
|
||||
maxBytes: 5_000_000,
|
||||
log: ctx.logGateway,
|
||||
supportsImages,
|
||||
});
|
||||
message = parsed.message.trim();
|
||||
images = parsed.images;
|
||||
} catch {
|
||||
imageOrder = parsed.imageOrder;
|
||||
if (message.length > 20_000) {
|
||||
ctx.logGateway.warn(
|
||||
`agent.request message exceeds limit after attachment parsing (length=${message.length})`,
|
||||
);
|
||||
if (parsed.offloadedRefs && parsed.offloadedRefs.length > 0) {
|
||||
for (const ref of parsed.offloadedRefs) {
|
||||
try {
|
||||
await deleteMediaBuffer(ref.id);
|
||||
} catch (cleanupErr) {
|
||||
ctx.logGateway.warn(
|
||||
`Failed to cleanup orphaned media ${ref.id}: ${cleanupErr instanceof Error ? cleanupErr.message : String(cleanupErr)}`,
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
return;
|
||||
}
|
||||
} catch (err) {
|
||||
ctx.logGateway.warn(
|
||||
`agent.request attachment parse failed: ${err instanceof Error ? err.message : String(err)}`,
|
||||
);
|
||||
return;
|
||||
}
|
||||
}
|
||||
if (!message) {
|
||||
return;
|
||||
}
|
||||
if (message.length > 20_000) {
|
||||
|
||||
if (!message && images.length === 0) {
|
||||
return;
|
||||
}
|
||||
|
||||
|
|
@ -386,10 +435,6 @@ export const handleNodeEvent = async (ctx: NodeEventContext, nodeId: string, evt
|
|||
const receiptText =
|
||||
receiptTextRaw || "Just received your iOS share + request, working on it.";
|
||||
|
||||
const sessionKeyRaw = (link?.sessionKey ?? "").trim();
|
||||
const sessionKey = sessionKeyRaw.length > 0 ? sessionKeyRaw : `node-${nodeId}`;
|
||||
const cfg = loadConfig();
|
||||
const { storePath, entry, canonicalKey } = loadSessionEntry(sessionKey);
|
||||
const now = Date.now();
|
||||
const sessionId = entry?.sessionId ?? randomUUID();
|
||||
await touchSessionStore({ cfg, sessionKey, storePath, canonicalKey, entry, sessionId, now });
|
||||
|
|
@ -438,6 +483,7 @@ export const handleNodeEvent = async (ctx: NodeEventContext, nodeId: string, evt
|
|||
runId: sessionId,
|
||||
message,
|
||||
images,
|
||||
imageOrder,
|
||||
sessionId,
|
||||
sessionKey: canonicalKey,
|
||||
thinking: link?.thinking ?? undefined,
|
||||
|
|
|
|||
|
|
@ -22,6 +22,7 @@ import {
|
|||
migrateAndPruneGatewaySessionStoreKey,
|
||||
parseGroupKey,
|
||||
pruneLegacyStoreKeys,
|
||||
resolveGatewayModelSupportsImages,
|
||||
resolveGatewaySessionStoreTarget,
|
||||
resolveSessionModelIdentityRef,
|
||||
resolveSessionModelRef,
|
||||
|
|
@ -2356,6 +2357,30 @@ describe("listSessionsFromStore subagent metadata", () => {
|
|||
expect(timeout?.status).toBe("timeout");
|
||||
expect(timeout?.runtimeMs).toBe(0);
|
||||
});
|
||||
|
||||
test("fails closed when model lookup misses", async () => {
|
||||
await expect(
|
||||
resolveGatewayModelSupportsImages({
|
||||
model: "gpt-5.4",
|
||||
provider: "openai",
|
||||
loadGatewayModelCatalog: async () => [
|
||||
{ id: "gpt-5.4", name: "GPT-5.4", provider: "other", input: ["text", "image"] },
|
||||
],
|
||||
}),
|
||||
).resolves.toBe(false);
|
||||
});
|
||||
|
||||
test("fails closed when model catalog load throws", async () => {
|
||||
await expect(
|
||||
resolveGatewayModelSupportsImages({
|
||||
model: "gpt-5.4",
|
||||
provider: "openai",
|
||||
loadGatewayModelCatalog: async () => {
|
||||
throw new Error("catalog unavailable");
|
||||
},
|
||||
}),
|
||||
).resolves.toBe(false);
|
||||
});
|
||||
});
|
||||
|
||||
describe("loadCombinedSessionStoreForGateway includes disk-only agents (#32804)", () => {
|
||||
|
|
|
|||
|
|
@ -8,6 +8,7 @@ import {
|
|||
} from "../agents/agent-scope.js";
|
||||
import { lookupContextTokens, resolveContextTokensForModel } from "../agents/context.js";
|
||||
import { DEFAULT_CONTEXT_TOKENS, DEFAULT_MODEL, DEFAULT_PROVIDER } from "../agents/defaults.js";
|
||||
import type { ModelCatalogEntry } from "../agents/model-catalog.js";
|
||||
import {
|
||||
inferUniqueProviderFromConfiguredModels,
|
||||
parseModelRef,
|
||||
|
|
@ -1074,6 +1075,27 @@ export function resolveSessionModelRef(
|
|||
return { provider, model };
|
||||
}
|
||||
|
||||
export async function resolveGatewayModelSupportsImages(params: {
|
||||
loadGatewayModelCatalog: () => Promise<ModelCatalogEntry[]>;
|
||||
provider?: string;
|
||||
model?: string;
|
||||
}): Promise<boolean> {
|
||||
if (!params.model) {
|
||||
return true;
|
||||
}
|
||||
|
||||
try {
|
||||
const catalog = await params.loadGatewayModelCatalog();
|
||||
const modelEntry = catalog.find(
|
||||
(entry) =>
|
||||
entry.id === params.model && (!params.provider || entry.provider === params.provider),
|
||||
);
|
||||
return modelEntry ? (modelEntry.input?.includes("image") ?? false) : false;
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
export function resolveSessionModelIdentityRef(
|
||||
cfg: OpenClawConfig,
|
||||
entry?:
|
||||
|
|
|
|||
|
|
@ -0,0 +1 @@
|
|||
export type PromptImageOrderEntry = "inline" | "offloaded";
|
||||
|
|
@ -407,3 +407,99 @@ export async function saveMediaBuffer(
|
|||
await writeSavedMediaBuffer({ dir, id, buffer });
|
||||
return buildSavedMediaResult({ dir, id, size: buffer.byteLength, contentType: mime });
|
||||
}
|
||||
|
||||
/**
|
||||
* Resolves a media ID saved by saveMediaBuffer to its absolute physical path.
|
||||
*
|
||||
* This is the read-side counterpart to saveMediaBuffer and is used by the
|
||||
* agent runner to hydrate opaque `media://inbound/<id>` URIs written by the
|
||||
* Gateway's claim-check offload path.
|
||||
*
|
||||
* Security:
|
||||
* - Rejects IDs containing path separators, "..", or null bytes to prevent
|
||||
* directory traversal and path injection outside the resolved subdir.
|
||||
* - Verifies the resolved path is a regular file (not a symlink or directory)
|
||||
* before returning it, matching the write-side MEDIA_FILE_MODE policy.
|
||||
*
|
||||
* @param id The media ID as returned by SavedMedia.id (may include
|
||||
* extension and original-filename prefix,
|
||||
* e.g. "photo---<uuid>.png" or "图片---<uuid>.png").
|
||||
* @param subdir The subdirectory the file was saved into (default "inbound").
|
||||
* @returns Absolute path to the file on disk.
|
||||
* @throws If the ID is unsafe, the file does not exist, or is not a
|
||||
* regular file.
|
||||
*/
|
||||
export async function resolveMediaBufferPath(
|
||||
id: string,
|
||||
subdir: "inbound" = "inbound",
|
||||
): Promise<string> {
|
||||
// Guard against path traversal and null-byte injection.
|
||||
//
|
||||
// - Separator checks: reject any ID containing "/" or "\" (covers all
|
||||
// relative traversal sequences such as "../foo" or "..\\foo").
|
||||
// - Exact ".." check: reject the bare traversal operator in case a caller
|
||||
// strips separators but keeps the dots.
|
||||
// - Null-byte check: reject "\0" which can truncate paths on some platforms
|
||||
// and cause the OS to open a different file than intended.
|
||||
//
|
||||
// We allow consecutive dots in legitimate filenames (e.g. "report..draft.png"),
|
||||
// so we only reject the exact two-character string "..".
|
||||
//
|
||||
// JSON.stringify is used in the error message so that control characters
|
||||
// (including \0) are rendered visibly in logs rather than silently dropped.
|
||||
if (!id || id.includes("/") || id.includes("\\") || id.includes("\0") || id === "..") {
|
||||
throw new Error(`resolveMediaBufferPath: unsafe media ID: ${JSON.stringify(id)}`);
|
||||
}
|
||||
|
||||
const dir = path.join(resolveMediaDir(), subdir);
|
||||
const resolved = path.join(dir, id);
|
||||
|
||||
// Double-check that path.join didn't escape the intended directory.
|
||||
// This should be unreachable after the separator check above, but be
|
||||
// explicit about the invariant.
|
||||
if (!resolved.startsWith(dir + path.sep) && resolved !== dir) {
|
||||
throw new Error(`resolveMediaBufferPath: path escapes media directory: ${JSON.stringify(id)}`);
|
||||
}
|
||||
|
||||
// lstat (not stat) so we see symlinks rather than following them.
|
||||
const stat = await fs.lstat(resolved);
|
||||
|
||||
if (stat.isSymbolicLink()) {
|
||||
throw new Error(
|
||||
`resolveMediaBufferPath: refusing to follow symlink for media ID: ${JSON.stringify(id)}`,
|
||||
);
|
||||
}
|
||||
if (!stat.isFile()) {
|
||||
throw new Error(
|
||||
`resolveMediaBufferPath: media ID does not resolve to a file: ${JSON.stringify(id)}`,
|
||||
);
|
||||
}
|
||||
|
||||
return resolved;
|
||||
}
|
||||
|
||||
/**
|
||||
* Deletes a file previously saved by saveMediaBuffer.
|
||||
*
|
||||
* This is used by parseMessageWithAttachments to clean up files that were
|
||||
* successfully offloaded earlier in the same request when a later attachment
|
||||
* fails validation and the entire parse is aborted, preventing orphaned files
|
||||
* from accumulating on disk ahead of the periodic TTL sweep.
|
||||
*
|
||||
* Uses resolveMediaBufferPath to apply the same path-safety guards as the
|
||||
* read path (separator checks, symlink rejection, etc.) before unlinking.
|
||||
*
|
||||
* Errors are intentionally not suppressed — callers that want best-effort
|
||||
* cleanup should catch and discard exceptions themselves (e.g. via
|
||||
* Promise.allSettled).
|
||||
*
|
||||
* @param id The media ID as returned by SavedMedia.id.
|
||||
* @param subdir The subdirectory the file was saved into (default "inbound").
|
||||
*/
|
||||
export async function deleteMediaBuffer(
|
||||
id: string,
|
||||
subdir: "inbound" = "inbound",
|
||||
): Promise<void> {
|
||||
const physicalPath = await resolveMediaBufferPath(id, subdir);
|
||||
await fs.unlink(physicalPath);
|
||||
}
|
||||
|
|
|
|||
|
|
@ -233,13 +233,16 @@ function sanitizeLiveConfig(raw: string): string {
|
|||
list?: Array<Record<string, unknown>>;
|
||||
};
|
||||
} = JSON5.parse(raw);
|
||||
|
||||
if (!parsed || typeof parsed !== "object") {
|
||||
return raw;
|
||||
}
|
||||
|
||||
if (parsed.agents?.defaults && typeof parsed.agents.defaults === "object") {
|
||||
delete parsed.agents.defaults.workspace;
|
||||
delete parsed.agents.defaults.agentDir;
|
||||
}
|
||||
|
||||
if (Array.isArray(parsed.agents?.list)) {
|
||||
parsed.agents.list = parsed.agents.list.map((entry) => {
|
||||
if (!entry || typeof entry !== "object") {
|
||||
|
|
@ -251,6 +254,7 @@ function sanitizeLiveConfig(raw: string): string {
|
|||
return nextEntry;
|
||||
});
|
||||
}
|
||||
|
||||
return `${JSON.stringify(parsed, null, 2)}\n`;
|
||||
} catch {
|
||||
return raw;
|
||||
|
|
|
|||
Loading…
Reference in New Issue