feat: add PDF analysis tool with native provider support (#31319)

* feat: add PDF analysis tool with native provider support

New `pdf` tool for analyzing PDF documents with model-powered analysis.

Architecture:
- Native PDF path: sends raw PDF bytes directly to providers that support
  inline document input (Anthropic via DocumentBlockParam, Google Gemini
  via inlineData with application/pdf MIME type)
- Extraction fallback: for providers without native PDF support, extracts
  text via pdfjs-dist and rasterizes pages to images via @napi-rs/canvas,
  then sends through the standard vision/text completion path

Key features:
- Single PDF (`pdf` param) or multiple PDFs (`pdfs` array, up to 10)
- Page range selection (`pages` param, e.g. "1-5", "1,3,7-9")
- Model override (`model` param) and file size limits (`maxBytesMb`)
- Auto-detects provider capability and falls back gracefully
- Same security patterns as image tool (SSRF guards, sandbox support,
  local path roots, workspace-only policy)

Config (agents.defaults):
- pdfModel: primary/fallbacks (defaults to imageModel, then session model)
- pdfMaxBytesMb: max PDF file size (default: 10)
- pdfMaxPages: max pages to process (default: 20)

Model catalog:
- Extended ModelInputType to include "document" alongside "text"/"image"
- Added modelSupportsDocument() capability check

Files:
- src/agents/tools/pdf-tool.ts - main tool factory
- src/agents/tools/pdf-tool.helpers.ts - helpers (page range, config, etc.)
- src/agents/tools/pdf-native-providers.ts - direct API calls for Anthropic/Google
- src/agents/tools/pdf-tool.test.ts - 43 tests covering all paths
- Modified: model-catalog.ts, openclaw-tools.ts, config schema/types/labels/help

* fix: prepare pdf tool for merge (#31319) (thanks @tyler6204)
This commit is contained in:
Tyler Yust 2026-03-01 22:39:12 -08:00 committed by GitHub
parent 31b6e58a1b
commit d0ac1b0195
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
17 changed files with 2008 additions and 100 deletions

View File

@ -9,6 +9,7 @@ Docs: https://docs.openclaw.ai
- CLI/Config validation: add `openclaw config validate` (with `--json`) to validate config files before gateway startup, and include detailed invalid-key paths in startup invalid-config errors. (#31220) thanks @Sid-Qin.
- Sessions/Attachments: add inline file attachment support for `sessions_spawn` (subagent runtime only) with base64/utf8 encoding, transcript content redaction, lifecycle cleanup, and configurable limits via `tools.sessions_spawn.attachments`. (#16761) Thanks @napetrov.
- Agents/Thinking defaults: set `adaptive` as the default thinking level for Anthropic Claude 4.6 models (including Bedrock Claude 4.6 refs) while keeping other reasoning-capable models at `low` unless explicitly configured.
- Tools/PDF analysis: add a first-class `pdf` tool with native Anthropic and Google PDF provider support, extraction fallback for non-native models, configurable defaults (`agents.defaults.pdfModel`, `pdfMaxBytesMb`, `pdfMaxPages`), and docs/tests covering routing, validation, and registration. (#31319) Thanks @tyler6204.
- Gateway/Container probes: add built-in HTTP liveness/readiness endpoints (`/health`, `/healthz`, `/ready`, `/readyz`) for Docker/Kubernetes health checks, with fallback routing so existing handlers on those paths are not shadowed. (#31272) Thanks @vincentkoc.
- Android/Nodes: add `camera.list`, `device.permissions`, `device.health`, and `notifications.actions` (`open`/`dismiss`/`reply`) on Android nodes, plus first-class node-tool actions for the new device/notification commands. (#28260) Thanks @obviyus.
- Discord/Thread bindings: replace fixed TTL lifecycle with inactivity (`idleHours`, default 24h) plus optional hard `maxAgeHours` lifecycle controls, and add `/session idle` + `/session max-age` commands for focused thread-bound sessions. (#27845) Thanks @osolmaz.

View File

@ -835,6 +835,12 @@ Time format in system prompt. Default: `auto` (OS preference).
primary: "openrouter/qwen/qwen-2.5-vl-72b-instruct:free",
fallbacks: ["openrouter/google/gemini-2.0-flash-vision:free"],
},
pdfModel: {
primary: "anthropic/claude-opus-4-6",
fallbacks: ["openai/gpt-5-mini"],
},
pdfMaxBytesMb: 10,
pdfMaxPages: 20,
thinkingDefault: "low",
verboseDefault: "off",
elevatedDefault: "on",
@ -853,6 +859,11 @@ Time format in system prompt. Default: `auto` (OS preference).
- `imageModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
- Used by the `image` tool path as its vision-model config.
- Also used as fallback routing when the selected/default model cannot accept image input.
- `pdfModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
- Used by the `pdf` tool for model routing.
- If omitted, the PDF tool falls back to `imageModel`, then to best-effort provider defaults.
- `pdfMaxBytesMb`: default PDF size limit for the `pdf` tool when `maxBytesMb` is not passed at call time.
- `pdfMaxPages`: default maximum pages considered by extraction fallback mode in the `pdf` tool.
- `model.primary`: format `provider/model` (e.g. `anthropic/claude-opus-4-6`). If you omit the provider, OpenClaw assumes `anthropic` (deprecated).
- `models`: the configured model catalog and allowlist for `/model`. Each entry can include `alias` (shortcut) and `params` (provider-specific, for example `temperature`, `maxTokens`, `cacheRetention`, `context1m`).
- `params` merge precedence (config): `agents.defaults.models["provider/model"].params` is the base, then `agents.list[].params` (matching agent id) overrides by key.

View File

@ -397,6 +397,26 @@ Notes:
- Only available when `agents.defaults.imageModel` is configured (primary or fallbacks), or when an implicit image model can be inferred from your default model + configured auth (best-effort pairing).
- Uses the image model directly (independent of the main chat model).
### `pdf`
Analyze one or more PDF documents.
Core parameters:
- `pdf` (single path or URL)
- `pdfs` (multiple paths or URLs, up to 10)
- `prompt` (optional, defaults to "Analyze this PDF document.")
- `pages` (optional page range like `1-5` or `1,3,7-9`)
- `model` (optional model override)
- `maxBytesMb` (optional size cap)
Notes:
- Native PDF provider mode is supported for Anthropic and Google models.
- Non-native models use PDF extraction fallback, text first, then rasterized page images when needed.
- `pages` filtering is only supported in extraction fallback mode. Native providers return a clear error when `pages` is set.
- Defaults are configurable via `agents.defaults.pdfModel`, `agents.defaults.pdfMaxBytesMb`, and `agents.defaults.pdfMaxPages`.
### `message`
Send messages and channel actions across Discord/Google Chat/Slack/Telegram/WhatsApp/Signal/iMessage/MS Teams.

View File

@ -5,13 +5,15 @@ import { ensureOpenClawModelsJson } from "./models-config.js";
const log = createSubsystemLogger("model-catalog");
export type ModelInputType = "text" | "image" | "document";
export type ModelCatalogEntry = {
id: string;
name: string;
provider: string;
contextWindow?: number;
reasoning?: boolean;
input?: Array<"text" | "image">;
input?: ModelInputType[];
};
type DiscoveredModel = {
@ -20,7 +22,7 @@ type DiscoveredModel = {
provider: string;
contextWindow?: number;
reasoning?: boolean;
input?: Array<"text" | "image">;
input?: ModelInputType[];
};
type PiSdkModule = typeof import("./pi-model-discovery.js");
@ -60,12 +62,12 @@ function applyOpenAICodexSparkFallback(models: ModelCatalogEntry[]): void {
});
}
function normalizeConfiguredModelInput(input: unknown): Array<"text" | "image"> | undefined {
function normalizeConfiguredModelInput(input: unknown): ModelInputType[] | undefined {
if (!Array.isArray(input)) {
return undefined;
}
const normalized = input.filter(
(item): item is "text" | "image" => item === "text" || item === "image",
(item): item is ModelInputType => item === "text" || item === "image" || item === "document",
);
return normalized.length > 0 ? normalized : undefined;
}
@ -248,6 +250,13 @@ export function modelSupportsVision(entry: ModelCatalogEntry | undefined): boole
return entry?.input?.includes("image") ?? false;
}
/**
* Check if a model supports native document/PDF input based on its catalog entry.
*/
export function modelSupportsDocument(entry: ModelCatalogEntry | undefined): boolean {
return entry?.input?.includes("document") ?? false;
}
/**
* Find a model in the catalog by provider and model ID.
*/

View File

@ -0,0 +1,33 @@
import fs from "node:fs/promises";
import os from "node:os";
import path from "node:path";
import { describe, expect, it } from "vitest";
import type { OpenClawConfig } from "../config/config.js";
import "./test-helpers/fast-core-tools.js";
import { createOpenClawTools } from "./openclaw-tools.js";
async function withTempAgentDir<T>(run: (agentDir: string) => Promise<T>): Promise<T> {
const agentDir = await fs.mkdtemp(path.join(os.tmpdir(), "openclaw-tools-pdf-"));
try {
return await run(agentDir);
} finally {
await fs.rm(agentDir, { recursive: true, force: true });
}
}
describe("createOpenClawTools PDF registration", () => {
it("includes pdf tool when pdfModel is configured", async () => {
await withTempAgentDir(async (agentDir) => {
const cfg: OpenClawConfig = {
agents: {
defaults: {
pdfModel: { primary: "openai/gpt-5-mini" },
},
},
};
const tools = createOpenClawTools({ config: cfg, agentDir });
expect(tools.some((tool) => tool.name === "pdf")).toBe(true);
});
});
});

View File

@ -13,6 +13,7 @@ import { createGatewayTool } from "./tools/gateway-tool.js";
import { createImageTool } from "./tools/image-tool.js";
import { createMessageTool } from "./tools/message-tool.js";
import { createNodesTool } from "./tools/nodes-tool.js";
import { createPdfTool } from "./tools/pdf-tool.js";
import { createSessionStatusTool } from "./tools/session-status-tool.js";
import { createSessionsHistoryTool } from "./tools/sessions-history-tool.js";
import { createSessionsListTool } from "./tools/sessions-list-tool.js";
@ -84,6 +85,18 @@ export function createOpenClawTools(options?: {
modelHasVision: options?.modelHasVision,
})
: null;
const pdfTool = options?.agentDir?.trim()
? createPdfTool({
config: options?.config,
agentDir: options.agentDir,
workspaceDir,
sandbox:
options?.sandboxRoot && options?.sandboxFsBridge
? { root: options.sandboxRoot, bridge: options.sandboxFsBridge }
: undefined,
fsPolicy: options?.fsPolicy,
})
: null;
const webSearchTool = createWebSearchTool({
config: options?.config,
sandboxed: options?.sandboxed,
@ -173,6 +186,7 @@ export function createOpenClawTools(options?: {
...(webSearchTool ? [webSearchTool] : []),
...(webFetchTool ? [webFetchTool] : []),
...(imageTool ? [imageTool] : []),
...(pdfTool ? [pdfTool] : []),
];
const pluginTools = resolvePluginTools({

View File

@ -0,0 +1,179 @@
/**
* Direct SDK/HTTP calls for providers that support native PDF document input.
* This bypasses pi-ai's content type system which does not have a "document" type.
*/
import { isRecord } from "../../utils.js";
import { normalizeSecretInput } from "../../utils/normalize-secret-input.js";
type PdfInput = {
base64: string;
filename?: string;
};
// ---------------------------------------------------------------------------
// Anthropic native PDF via Messages API
// ---------------------------------------------------------------------------
type AnthropicDocBlock = {
type: "document";
source: {
type: "base64";
media_type: "application/pdf";
data: string;
};
};
type AnthropicTextBlock = {
type: "text";
text: string;
};
type AnthropicContentBlock = AnthropicDocBlock | AnthropicTextBlock;
type AnthropicResponseContent = Array<{ type: string; text?: string }>;
export async function anthropicAnalyzePdf(params: {
apiKey: string;
modelId: string;
prompt: string;
pdfs: PdfInput[];
maxTokens?: number;
baseUrl?: string;
}): Promise<string> {
const apiKey = normalizeSecretInput(params.apiKey);
if (!apiKey) {
throw new Error("Anthropic PDF: apiKey required");
}
const content: AnthropicContentBlock[] = [];
for (const pdf of params.pdfs) {
content.push({
type: "document",
source: {
type: "base64",
media_type: "application/pdf",
data: pdf.base64,
},
});
}
content.push({ type: "text", text: params.prompt });
const baseUrl = (params.baseUrl ?? "https://api.anthropic.com").replace(/\/+$/, "");
const res = await fetch(`${baseUrl}/v1/messages`, {
method: "POST",
headers: {
"Content-Type": "application/json",
"x-api-key": apiKey,
"anthropic-version": "2023-06-01",
"anthropic-beta": "pdfs-2024-09-25",
},
body: JSON.stringify({
model: params.modelId,
max_tokens: params.maxTokens ?? 4096,
messages: [{ role: "user", content }],
}),
});
if (!res.ok) {
const body = await res.text().catch(() => "");
throw new Error(
`Anthropic PDF request failed (${res.status} ${res.statusText})${body ? `: ${body.slice(0, 400)}` : ""}`,
);
}
const json = (await res.json().catch(() => null)) as unknown;
if (!isRecord(json)) {
throw new Error("Anthropic PDF response was not JSON.");
}
const responseContent = json.content as AnthropicResponseContent | undefined;
if (!Array.isArray(responseContent)) {
throw new Error("Anthropic PDF response missing content array.");
}
const text = responseContent
.filter((block) => block.type === "text" && typeof block.text === "string")
.map((block) => block.text!)
.join("");
if (!text.trim()) {
throw new Error("Anthropic PDF returned no text.");
}
return text.trim();
}
// ---------------------------------------------------------------------------
// Google Gemini native PDF via generateContent API
// ---------------------------------------------------------------------------
type GeminiPart = { inline_data: { mime_type: string; data: string } } | { text: string };
type GeminiCandidate = {
content?: { parts?: Array<{ text?: string }> };
};
export async function geminiAnalyzePdf(params: {
apiKey: string;
modelId: string;
prompt: string;
pdfs: PdfInput[];
baseUrl?: string;
}): Promise<string> {
const apiKey = normalizeSecretInput(params.apiKey);
if (!apiKey) {
throw new Error("Gemini PDF: apiKey required");
}
const parts: GeminiPart[] = [];
for (const pdf of params.pdfs) {
parts.push({
inline_data: {
mime_type: "application/pdf",
data: pdf.base64,
},
});
}
parts.push({ text: params.prompt });
const baseUrl = (params.baseUrl ?? "https://generativelanguage.googleapis.com").replace(
/\/+$/,
"",
);
const url = `${baseUrl}/v1beta/models/${encodeURIComponent(params.modelId)}:generateContent?key=${encodeURIComponent(apiKey)}`;
const res = await fetch(url, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
contents: [{ role: "user", parts }],
}),
});
if (!res.ok) {
const body = await res.text().catch(() => "");
throw new Error(
`Gemini PDF request failed (${res.status} ${res.statusText})${body ? `: ${body.slice(0, 400)}` : ""}`,
);
}
const json = (await res.json().catch(() => null)) as unknown;
if (!isRecord(json)) {
throw new Error("Gemini PDF response was not JSON.");
}
const candidates = json.candidates as GeminiCandidate[] | undefined;
if (!Array.isArray(candidates) || candidates.length === 0) {
throw new Error("Gemini PDF returned no candidates.");
}
const textParts = candidates[0].content?.parts?.filter((p) => typeof p.text === "string") ?? [];
const text = textParts.map((p) => p.text!).join("");
if (!text.trim()) {
throw new Error("Gemini PDF returned no text.");
}
return text.trim();
}

View File

@ -0,0 +1,103 @@
import type { AssistantMessage } from "@mariozechner/pi-ai";
import type { OpenClawConfig } from "../../config/config.js";
import {
resolveAgentModelFallbackValues,
resolveAgentModelPrimaryValue,
} from "../../config/model-input.js";
import { extractAssistantText } from "../pi-embedded-utils.js";
export type PdfModelConfig = { primary?: string; fallbacks?: string[] };
/**
* Providers known to support native PDF document input.
* When the model's provider is in this set, the tool sends raw PDF bytes
* via provider-specific API calls instead of extracting text/images first.
*/
export const NATIVE_PDF_PROVIDERS = new Set(["anthropic", "google"]);
/**
* Check whether a provider supports native PDF document input.
*/
export function providerSupportsNativePdf(provider: string): boolean {
return NATIVE_PDF_PROVIDERS.has(provider.toLowerCase().trim());
}
/**
* Parse a page range string (e.g. "1-5", "3", "1-3,7-9") into an array of 1-based page numbers.
*/
export function parsePageRange(range: string, maxPages: number): number[] {
const pages = new Set<number>();
const parts = range.split(",").map((p) => p.trim());
for (const part of parts) {
if (!part) {
continue;
}
const dashMatch = /^(\d+)\s*-\s*(\d+)$/.exec(part);
if (dashMatch) {
const start = Number(dashMatch[1]);
const end = Number(dashMatch[2]);
if (!Number.isFinite(start) || !Number.isFinite(end) || start < 1 || end < start) {
throw new Error(`Invalid page range: "${part}"`);
}
for (let i = start; i <= Math.min(end, maxPages); i++) {
pages.add(i);
}
} else {
const num = Number(part);
if (!Number.isFinite(num) || num < 1) {
throw new Error(`Invalid page number: "${part}"`);
}
if (num <= maxPages) {
pages.add(num);
}
}
}
return Array.from(pages).toSorted((a, b) => a - b);
}
export function coercePdfAssistantText(params: {
message: AssistantMessage;
provider: string;
model: string;
}): string {
const stop = params.message.stopReason;
const errorMessage = params.message.errorMessage?.trim();
if (stop === "error" || stop === "aborted") {
throw new Error(
errorMessage
? `PDF model failed (${params.provider}/${params.model}): ${errorMessage}`
: `PDF model failed (${params.provider}/${params.model})`,
);
}
if (errorMessage) {
throw new Error(`PDF model failed (${params.provider}/${params.model}): ${errorMessage}`);
}
const text = extractAssistantText(params.message);
if (text.trim()) {
return text.trim();
}
throw new Error(`PDF model returned no text (${params.provider}/${params.model}).`);
}
export function coercePdfModelConfig(cfg?: OpenClawConfig): PdfModelConfig {
const primary = resolveAgentModelPrimaryValue(cfg?.agents?.defaults?.pdfModel);
const fallbacks = resolveAgentModelFallbackValues(cfg?.agents?.defaults?.pdfModel);
return {
...(primary?.trim() ? { primary: primary.trim() } : {}),
...(fallbacks.length > 0 ? { fallbacks } : {}),
};
}
export function resolvePdfToolMaxTokens(
modelMaxTokens: number | undefined,
requestedMaxTokens = 4096,
) {
if (
typeof modelMaxTokens !== "number" ||
!Number.isFinite(modelMaxTokens) ||
modelMaxTokens <= 0
) {
return requestedMaxTokens;
}
return Math.min(requestedMaxTokens, modelMaxTokens);
}

View File

@ -0,0 +1,861 @@
import fs from "node:fs/promises";
import os from "node:os";
import path from "node:path";
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
import type { OpenClawConfig } from "../../config/config.js";
import {
coercePdfAssistantText,
coercePdfModelConfig,
parsePageRange,
providerSupportsNativePdf,
resolvePdfToolMaxTokens,
} from "./pdf-tool.helpers.js";
import { createPdfTool, resolvePdfModelConfigForTool } from "./pdf-tool.js";
vi.mock("@mariozechner/pi-ai", async (importOriginal) => {
const actual = await importOriginal<typeof import("@mariozechner/pi-ai")>();
return {
...actual,
complete: vi.fn(),
};
});
async function withTempAgentDir<T>(run: (agentDir: string) => Promise<T>): Promise<T> {
const agentDir = await fs.mkdtemp(path.join(os.tmpdir(), "openclaw-pdf-"));
try {
return await run(agentDir);
} finally {
await fs.rm(agentDir, { recursive: true, force: true });
}
}
// ---------------------------------------------------------------------------
// parsePageRange tests
// ---------------------------------------------------------------------------
describe("parsePageRange", () => {
it("parses a single page number", () => {
expect(parsePageRange("3", 20)).toEqual([3]);
});
it("parses a page range", () => {
expect(parsePageRange("1-5", 20)).toEqual([1, 2, 3, 4, 5]);
});
it("parses comma-separated pages and ranges", () => {
expect(parsePageRange("1,3,5-7", 20)).toEqual([1, 3, 5, 6, 7]);
});
it("clamps to maxPages", () => {
expect(parsePageRange("1-100", 5)).toEqual([1, 2, 3, 4, 5]);
});
it("deduplicates and sorts", () => {
expect(parsePageRange("5,3,1,3,5", 20)).toEqual([1, 3, 5]);
});
it("throws on invalid page number", () => {
expect(() => parsePageRange("abc", 20)).toThrow("Invalid page number");
});
it("throws on invalid range (start > end)", () => {
expect(() => parsePageRange("5-3", 20)).toThrow("Invalid page range");
});
it("throws on zero page number", () => {
expect(() => parsePageRange("0", 20)).toThrow("Invalid page number");
});
it("throws on negative page number", () => {
expect(() => parsePageRange("-1", 20)).toThrow("Invalid page number");
});
it("handles empty parts gracefully", () => {
expect(parsePageRange("1,,3", 20)).toEqual([1, 3]);
});
});
// ---------------------------------------------------------------------------
// providerSupportsNativePdf tests
// ---------------------------------------------------------------------------
describe("providerSupportsNativePdf", () => {
it("returns true for anthropic", () => {
expect(providerSupportsNativePdf("anthropic")).toBe(true);
});
it("returns true for google", () => {
expect(providerSupportsNativePdf("google")).toBe(true);
});
it("returns false for openai", () => {
expect(providerSupportsNativePdf("openai")).toBe(false);
});
it("returns false for minimax", () => {
expect(providerSupportsNativePdf("minimax")).toBe(false);
});
it("is case-insensitive", () => {
expect(providerSupportsNativePdf("Anthropic")).toBe(true);
expect(providerSupportsNativePdf("GOOGLE")).toBe(true);
});
});
// ---------------------------------------------------------------------------
// PDF model config resolution
// ---------------------------------------------------------------------------
describe("resolvePdfModelConfigForTool", () => {
const priorFetch = global.fetch;
beforeEach(() => {
vi.stubEnv("OPENAI_API_KEY", "");
vi.stubEnv("ANTHROPIC_API_KEY", "");
vi.stubEnv("ANTHROPIC_OAUTH_TOKEN", "");
vi.stubEnv("GOOGLE_API_KEY", "");
vi.stubEnv("COPILOT_GITHUB_TOKEN", "");
vi.stubEnv("GH_TOKEN", "");
vi.stubEnv("GITHUB_TOKEN", "");
});
afterEach(() => {
vi.unstubAllEnvs();
global.fetch = priorFetch;
});
it("returns null without any auth", async () => {
await withTempAgentDir(async (agentDir) => {
const cfg: OpenClawConfig = {
agents: { defaults: { model: { primary: "openai/gpt-5.2" } } },
};
expect(resolvePdfModelConfigForTool({ cfg, agentDir })).toBeNull();
});
});
it("prefers explicit pdfModel config", async () => {
await withTempAgentDir(async (agentDir) => {
const cfg: OpenClawConfig = {
agents: {
defaults: {
model: { primary: "openai/gpt-5.2" },
pdfModel: { primary: "anthropic/claude-opus-4-6" },
},
},
} as OpenClawConfig;
expect(resolvePdfModelConfigForTool({ cfg, agentDir })).toEqual({
primary: "anthropic/claude-opus-4-6",
});
});
});
it("falls back to imageModel config when no pdfModel set", async () => {
await withTempAgentDir(async (agentDir) => {
const cfg: OpenClawConfig = {
agents: {
defaults: {
model: { primary: "openai/gpt-5.2" },
imageModel: { primary: "openai/gpt-5-mini" },
},
},
};
expect(resolvePdfModelConfigForTool({ cfg, agentDir })).toEqual({
primary: "openai/gpt-5-mini",
});
});
});
it("prefers anthropic when available for native PDF support", async () => {
await withTempAgentDir(async (agentDir) => {
vi.stubEnv("ANTHROPIC_API_KEY", "anthropic-test");
vi.stubEnv("OPENAI_API_KEY", "openai-test");
const cfg: OpenClawConfig = {
agents: { defaults: { model: { primary: "openai/gpt-5.2" } } },
};
const config = resolvePdfModelConfigForTool({ cfg, agentDir });
expect(config).not.toBeNull();
// Should prefer anthropic for native PDF
expect(config?.primary).toBe("anthropic/claude-opus-4-6");
});
});
it("uses anthropic primary when provider is anthropic", async () => {
await withTempAgentDir(async (agentDir) => {
vi.stubEnv("ANTHROPIC_API_KEY", "anthropic-test");
const cfg: OpenClawConfig = {
agents: { defaults: { model: { primary: "anthropic/claude-opus-4-6" } } },
};
const config = resolvePdfModelConfigForTool({ cfg, agentDir });
expect(config?.primary).toBe("anthropic/claude-opus-4-6");
});
});
});
// ---------------------------------------------------------------------------
// createPdfTool
// ---------------------------------------------------------------------------
describe("createPdfTool", () => {
const priorFetch = global.fetch;
beforeEach(() => {
vi.stubEnv("OPENAI_API_KEY", "");
vi.stubEnv("ANTHROPIC_API_KEY", "");
vi.stubEnv("ANTHROPIC_OAUTH_TOKEN", "");
vi.stubEnv("GOOGLE_API_KEY", "");
vi.stubEnv("COPILOT_GITHUB_TOKEN", "");
vi.stubEnv("GH_TOKEN", "");
vi.stubEnv("GITHUB_TOKEN", "");
});
afterEach(() => {
vi.restoreAllMocks();
vi.unstubAllEnvs();
global.fetch = priorFetch;
});
it("returns null without agentDir and no explicit config", () => {
expect(createPdfTool()).toBeNull();
});
it("returns null without any auth configured", async () => {
await withTempAgentDir(async (agentDir) => {
const cfg: OpenClawConfig = {
agents: { defaults: { model: { primary: "openai/gpt-5.2" } } },
};
expect(createPdfTool({ config: cfg, agentDir })).toBeNull();
});
});
it("throws when agentDir missing but explicit config present", () => {
const cfg: OpenClawConfig = {
agents: {
defaults: {
pdfModel: { primary: "anthropic/claude-opus-4-6" },
},
},
} as OpenClawConfig;
expect(() => createPdfTool({ config: cfg })).toThrow("requires agentDir");
});
it("creates tool when auth is available", async () => {
await withTempAgentDir(async (agentDir) => {
vi.stubEnv("ANTHROPIC_API_KEY", "anthropic-test");
const cfg: OpenClawConfig = {
agents: { defaults: { model: { primary: "anthropic/claude-opus-4-6" } } },
};
const tool = createPdfTool({ config: cfg, agentDir });
expect(tool).not.toBeNull();
expect(tool?.name).toBe("pdf");
expect(tool?.label).toBe("PDF");
expect(tool?.description).toContain("PDF documents");
});
});
it("rejects when no pdf input provided", async () => {
await withTempAgentDir(async (agentDir) => {
vi.stubEnv("ANTHROPIC_API_KEY", "anthropic-test");
const cfg: OpenClawConfig = {
agents: { defaults: { model: { primary: "anthropic/claude-opus-4-6" } } },
};
const tool = createPdfTool({ config: cfg, agentDir });
expect(tool).not.toBeNull();
await expect(tool!.execute("t1", { prompt: "test" })).rejects.toThrow("pdf required");
});
});
it("rejects too many PDFs", async () => {
await withTempAgentDir(async (agentDir) => {
vi.stubEnv("ANTHROPIC_API_KEY", "anthropic-test");
const cfg: OpenClawConfig = {
agents: { defaults: { model: { primary: "anthropic/claude-opus-4-6" } } },
};
const tool = createPdfTool({ config: cfg, agentDir });
expect(tool).not.toBeNull();
const manyPdfs = Array.from({ length: 15 }, (_, i) => `/tmp/doc${i}.pdf`);
const result = await tool!.execute("t1", { prompt: "test", pdfs: manyPdfs });
expect(result).toMatchObject({
details: { error: "too_many_pdfs" },
});
});
});
it("rejects unsupported scheme references", async () => {
await withTempAgentDir(async (agentDir) => {
vi.stubEnv("ANTHROPIC_API_KEY", "anthropic-test");
const cfg: OpenClawConfig = {
agents: { defaults: { model: { primary: "anthropic/claude-opus-4-6" } } },
};
const tool = createPdfTool({ config: cfg, agentDir });
expect(tool).not.toBeNull();
const result = await tool!.execute("t1", {
prompt: "test",
pdf: "ftp://example.com/doc.pdf",
});
expect(result).toMatchObject({
details: { error: "unsupported_pdf_reference" },
});
});
});
it("deduplicates pdf inputs before loading", async () => {
await withTempAgentDir(async (agentDir) => {
const webMedia = await import("../../web/media.js");
const loadSpy = vi.spyOn(webMedia, "loadWebMediaRaw").mockResolvedValue({
kind: "document",
buffer: Buffer.from("%PDF-1.4 fake"),
contentType: "application/pdf",
fileName: "doc.pdf",
} as never);
const modelDiscovery = await import("../pi-model-discovery.js");
vi.spyOn(modelDiscovery, "discoverAuthStorage").mockReturnValue({
setRuntimeApiKey: vi.fn(),
} as never);
vi.spyOn(modelDiscovery, "discoverModels").mockReturnValue({ find: () => null } as never);
const modelsConfig = await import("../models-config.js");
vi.spyOn(modelsConfig, "ensureOpenClawModelsJson").mockResolvedValue(undefined);
const modelAuth = await import("../model-auth.js");
vi.spyOn(modelAuth, "getApiKeyForModel").mockResolvedValue({ apiKey: "test-key" } as never);
vi.spyOn(modelAuth, "requireApiKey").mockReturnValue("test-key");
const cfg: OpenClawConfig = {
agents: {
defaults: {
pdfModel: { primary: "anthropic/claude-opus-4-6" },
},
},
};
const tool = createPdfTool({ config: cfg, agentDir });
expect(tool).not.toBeNull();
await expect(
tool!.execute("t1", {
prompt: "test",
pdf: "/tmp/nonexistent.pdf",
pdfs: ["/tmp/nonexistent.pdf"],
}),
).rejects.toThrow("Unknown model");
expect(loadSpy).toHaveBeenCalledTimes(1);
});
});
it("uses native PDF path without eager extraction", async () => {
await withTempAgentDir(async (agentDir) => {
const webMedia = await import("../../web/media.js");
vi.spyOn(webMedia, "loadWebMediaRaw").mockResolvedValue({
kind: "document",
buffer: Buffer.from("%PDF-1.4 fake"),
contentType: "application/pdf",
fileName: "doc.pdf",
} as never);
const modelDiscovery = await import("../pi-model-discovery.js");
vi.spyOn(modelDiscovery, "discoverAuthStorage").mockReturnValue({
setRuntimeApiKey: vi.fn(),
} as never);
vi.spyOn(modelDiscovery, "discoverModels").mockReturnValue({
find: () =>
({
provider: "anthropic",
maxTokens: 8192,
input: ["text", "document"],
}) as never,
} as never);
const modelsConfig = await import("../models-config.js");
vi.spyOn(modelsConfig, "ensureOpenClawModelsJson").mockResolvedValue(undefined);
const modelAuth = await import("../model-auth.js");
vi.spyOn(modelAuth, "getApiKeyForModel").mockResolvedValue({ apiKey: "test-key" } as never);
vi.spyOn(modelAuth, "requireApiKey").mockReturnValue("test-key");
const nativeProviders = await import("./pdf-native-providers.js");
vi.spyOn(nativeProviders, "anthropicAnalyzePdf").mockResolvedValue("native summary");
const extractModule = await import("../../media/pdf-extract.js");
const extractSpy = vi.spyOn(extractModule, "extractPdfContent");
const cfg: OpenClawConfig = {
agents: {
defaults: {
pdfModel: { primary: "anthropic/claude-opus-4-6" },
},
},
};
const tool = createPdfTool({ config: cfg, agentDir });
expect(tool).not.toBeNull();
const result = await tool!.execute("t1", {
prompt: "summarize",
pdf: "/tmp/doc.pdf",
});
expect(extractSpy).not.toHaveBeenCalled();
expect(result).toMatchObject({
content: [{ type: "text", text: "native summary" }],
details: { native: true, model: "anthropic/claude-opus-4-6" },
});
});
});
it("rejects pages parameter for native PDF providers", async () => {
await withTempAgentDir(async (agentDir) => {
const webMedia = await import("../../web/media.js");
vi.spyOn(webMedia, "loadWebMediaRaw").mockResolvedValue({
kind: "document",
buffer: Buffer.from("%PDF-1.4 fake"),
contentType: "application/pdf",
fileName: "doc.pdf",
} as never);
const modelDiscovery = await import("../pi-model-discovery.js");
vi.spyOn(modelDiscovery, "discoverAuthStorage").mockReturnValue({
setRuntimeApiKey: vi.fn(),
} as never);
vi.spyOn(modelDiscovery, "discoverModels").mockReturnValue({
find: () =>
({
provider: "anthropic",
maxTokens: 8192,
input: ["text", "document"],
}) as never,
} as never);
const modelsConfig = await import("../models-config.js");
vi.spyOn(modelsConfig, "ensureOpenClawModelsJson").mockResolvedValue(undefined);
const modelAuth = await import("../model-auth.js");
vi.spyOn(modelAuth, "getApiKeyForModel").mockResolvedValue({ apiKey: "test-key" } as never);
vi.spyOn(modelAuth, "requireApiKey").mockReturnValue("test-key");
const cfg: OpenClawConfig = {
agents: {
defaults: {
pdfModel: { primary: "anthropic/claude-opus-4-6" },
},
},
};
const tool = createPdfTool({ config: cfg, agentDir });
expect(tool).not.toBeNull();
await expect(
tool!.execute("t1", {
prompt: "summarize",
pdf: "/tmp/doc.pdf",
pages: "1-2",
}),
).rejects.toThrow("pages is not supported with native PDF providers");
});
});
it("uses extraction fallback for non-native models", async () => {
await withTempAgentDir(async (agentDir) => {
const webMedia = await import("../../web/media.js");
vi.spyOn(webMedia, "loadWebMediaRaw").mockResolvedValue({
kind: "document",
buffer: Buffer.from("%PDF-1.4 fake"),
contentType: "application/pdf",
fileName: "doc.pdf",
} as never);
const modelDiscovery = await import("../pi-model-discovery.js");
vi.spyOn(modelDiscovery, "discoverAuthStorage").mockReturnValue({
setRuntimeApiKey: vi.fn(),
} as never);
vi.spyOn(modelDiscovery, "discoverModels").mockReturnValue({
find: () =>
({
provider: "openai",
maxTokens: 8192,
input: ["text"],
}) as never,
} as never);
const modelsConfig = await import("../models-config.js");
vi.spyOn(modelsConfig, "ensureOpenClawModelsJson").mockResolvedValue(undefined);
const modelAuth = await import("../model-auth.js");
vi.spyOn(modelAuth, "getApiKeyForModel").mockResolvedValue({ apiKey: "test-key" } as never);
vi.spyOn(modelAuth, "requireApiKey").mockReturnValue("test-key");
const extractModule = await import("../../media/pdf-extract.js");
const extractSpy = vi.spyOn(extractModule, "extractPdfContent").mockResolvedValue({
text: "Extracted content",
images: [],
});
const piAi = await import("@mariozechner/pi-ai");
vi.mocked(piAi.complete).mockResolvedValue({
role: "assistant",
stopReason: "stop",
content: [{ type: "text", text: "fallback summary" }],
} as never);
const cfg: OpenClawConfig = {
agents: {
defaults: {
pdfModel: { primary: "openai/gpt-5-mini" },
},
},
};
const tool = createPdfTool({ config: cfg, agentDir });
expect(tool).not.toBeNull();
const result = await tool!.execute("t1", {
prompt: "summarize",
pdf: "/tmp/doc.pdf",
});
expect(extractSpy).toHaveBeenCalledTimes(1);
expect(result).toMatchObject({
content: [{ type: "text", text: "fallback summary" }],
details: { native: false, model: "openai/gpt-5-mini" },
});
});
});
it("tool parameters have correct schema shape", async () => {
await withTempAgentDir(async (agentDir) => {
vi.stubEnv("ANTHROPIC_API_KEY", "anthropic-test");
const cfg: OpenClawConfig = {
agents: { defaults: { model: { primary: "anthropic/claude-opus-4-6" } } },
};
const tool = createPdfTool({ config: cfg, agentDir });
expect(tool).not.toBeNull();
const schema = tool!.parameters;
expect(schema.type).toBe("object");
expect(schema.properties).toBeDefined();
const props = schema.properties as Record<string, { type?: string }>;
expect(props.prompt).toBeDefined();
expect(props.pdf).toBeDefined();
expect(props.pdfs).toBeDefined();
expect(props.pages).toBeDefined();
expect(props.model).toBeDefined();
expect(props.maxBytesMb).toBeDefined();
});
});
});
// ---------------------------------------------------------------------------
// Native provider detection
// ---------------------------------------------------------------------------
describe("native PDF provider API calls", () => {
const priorFetch = global.fetch;
afterEach(() => {
global.fetch = priorFetch;
});
it("anthropicAnalyzePdf sends correct request shape", async () => {
const { anthropicAnalyzePdf } = await import("./pdf-native-providers.js");
const fetch = vi.fn().mockResolvedValue({
ok: true,
json: async () => ({
content: [{ type: "text", text: "Analysis of PDF" }],
}),
});
global.fetch = fetch;
const result = await anthropicAnalyzePdf({
apiKey: "test-key",
modelId: "claude-opus-4-6",
prompt: "Summarize this document",
pdfs: [{ base64: "dGVzdA==", filename: "doc.pdf" }],
maxTokens: 4096,
});
expect(result).toBe("Analysis of PDF");
expect(fetch).toHaveBeenCalledTimes(1);
const [url, opts] = fetch.mock.calls[0];
expect(url).toContain("/v1/messages");
const body = JSON.parse(opts.body);
expect(body.model).toBe("claude-opus-4-6");
expect(body.messages[0].content).toHaveLength(2);
expect(body.messages[0].content[0].type).toBe("document");
expect(body.messages[0].content[0].source.media_type).toBe("application/pdf");
expect(body.messages[0].content[1].type).toBe("text");
});
it("anthropicAnalyzePdf throws on API error", async () => {
const { anthropicAnalyzePdf } = await import("./pdf-native-providers.js");
const fetch = vi.fn().mockResolvedValue({
ok: false,
status: 400,
statusText: "Bad Request",
text: async () => "invalid request",
});
global.fetch = fetch;
await expect(
anthropicAnalyzePdf({
apiKey: "test-key",
modelId: "claude-opus-4-6",
prompt: "test",
pdfs: [{ base64: "dGVzdA==", filename: "doc.pdf" }],
}),
).rejects.toThrow("Anthropic PDF request failed");
});
it("anthropicAnalyzePdf throws when response has no text", async () => {
const { anthropicAnalyzePdf } = await import("./pdf-native-providers.js");
const fetch = vi.fn().mockResolvedValue({
ok: true,
json: async () => ({
content: [{ type: "text", text: " " }],
}),
});
global.fetch = fetch;
await expect(
anthropicAnalyzePdf({
apiKey: "test-key",
modelId: "claude-opus-4-6",
prompt: "test",
pdfs: [{ base64: "dGVzdA==", filename: "doc.pdf" }],
}),
).rejects.toThrow("Anthropic PDF returned no text");
});
it("geminiAnalyzePdf sends correct request shape", async () => {
const { geminiAnalyzePdf } = await import("./pdf-native-providers.js");
const fetch = vi.fn().mockResolvedValue({
ok: true,
json: async () => ({
candidates: [
{
content: { parts: [{ text: "Gemini PDF analysis" }] },
},
],
}),
});
global.fetch = fetch;
const result = await geminiAnalyzePdf({
apiKey: "test-key",
modelId: "gemini-2.5-pro",
prompt: "Summarize this",
pdfs: [{ base64: "dGVzdA==", filename: "doc.pdf" }],
});
expect(result).toBe("Gemini PDF analysis");
expect(fetch).toHaveBeenCalledTimes(1);
const [url, opts] = fetch.mock.calls[0];
expect(url).toContain("generateContent");
expect(url).toContain("gemini-2.5-pro");
const body = JSON.parse(opts.body);
expect(body.contents[0].parts).toHaveLength(2);
expect(body.contents[0].parts[0].inline_data.mime_type).toBe("application/pdf");
expect(body.contents[0].parts[1].text).toBe("Summarize this");
});
it("geminiAnalyzePdf throws on API error", async () => {
const { geminiAnalyzePdf } = await import("./pdf-native-providers.js");
const fetch = vi.fn().mockResolvedValue({
ok: false,
status: 500,
statusText: "Internal Server Error",
text: async () => "server error",
});
global.fetch = fetch;
await expect(
geminiAnalyzePdf({
apiKey: "test-key",
modelId: "gemini-2.5-pro",
prompt: "test",
pdfs: [{ base64: "dGVzdA==", filename: "doc.pdf" }],
}),
).rejects.toThrow("Gemini PDF request failed");
});
it("geminiAnalyzePdf throws when no candidates returned", async () => {
const { geminiAnalyzePdf } = await import("./pdf-native-providers.js");
const fetch = vi.fn().mockResolvedValue({
ok: true,
json: async () => ({ candidates: [] }),
});
global.fetch = fetch;
await expect(
geminiAnalyzePdf({
apiKey: "test-key",
modelId: "gemini-2.5-pro",
prompt: "test",
pdfs: [{ base64: "dGVzdA==", filename: "doc.pdf" }],
}),
).rejects.toThrow("Gemini PDF returned no candidates");
});
it("anthropicAnalyzePdf supports multiple PDFs", async () => {
const { anthropicAnalyzePdf } = await import("./pdf-native-providers.js");
const fetch = vi.fn().mockResolvedValue({
ok: true,
json: async () => ({
content: [{ type: "text", text: "Multi-doc analysis" }],
}),
});
global.fetch = fetch;
await anthropicAnalyzePdf({
apiKey: "test-key",
modelId: "claude-opus-4-6",
prompt: "Compare these documents",
pdfs: [
{ base64: "cGRmMQ==", filename: "doc1.pdf" },
{ base64: "cGRmMg==", filename: "doc2.pdf" },
],
});
const body = JSON.parse(fetch.mock.calls[0][1].body);
// 2 document blocks + 1 text block
expect(body.messages[0].content).toHaveLength(3);
expect(body.messages[0].content[0].type).toBe("document");
expect(body.messages[0].content[1].type).toBe("document");
expect(body.messages[0].content[2].type).toBe("text");
});
it("anthropicAnalyzePdf uses custom base URL", async () => {
const { anthropicAnalyzePdf } = await import("./pdf-native-providers.js");
const fetch = vi.fn().mockResolvedValue({
ok: true,
json: async () => ({
content: [{ type: "text", text: "ok" }],
}),
});
global.fetch = fetch;
await anthropicAnalyzePdf({
apiKey: "test-key",
modelId: "claude-opus-4-6",
prompt: "test",
pdfs: [{ base64: "dGVzdA==", filename: "doc.pdf" }],
baseUrl: "https://custom.example.com",
});
expect(fetch.mock.calls[0][0]).toContain("https://custom.example.com/v1/messages");
});
it("anthropicAnalyzePdf requires apiKey", async () => {
const { anthropicAnalyzePdf } = await import("./pdf-native-providers.js");
await expect(
anthropicAnalyzePdf({
apiKey: "",
modelId: "claude-opus-4-6",
prompt: "test",
pdfs: [{ base64: "dGVzdA==", filename: "doc.pdf" }],
}),
).rejects.toThrow("apiKey required");
});
it("geminiAnalyzePdf requires apiKey", async () => {
const { geminiAnalyzePdf } = await import("./pdf-native-providers.js");
await expect(
geminiAnalyzePdf({
apiKey: "",
modelId: "gemini-2.5-pro",
prompt: "test",
pdfs: [{ base64: "dGVzdA==", filename: "doc.pdf" }],
}),
).rejects.toThrow("apiKey required");
});
});
// ---------------------------------------------------------------------------
// PDF tool helpers
// ---------------------------------------------------------------------------
describe("pdf-tool.helpers", () => {
it("resolvePdfToolMaxTokens respects model limit", () => {
expect(resolvePdfToolMaxTokens(2048, 4096)).toBe(2048);
expect(resolvePdfToolMaxTokens(8192, 4096)).toBe(4096);
expect(resolvePdfToolMaxTokens(undefined, 4096)).toBe(4096);
});
it("coercePdfModelConfig reads primary and fallbacks", () => {
const cfg: OpenClawConfig = {
agents: {
defaults: {
pdfModel: {
primary: "anthropic/claude-opus-4-6",
fallbacks: ["google/gemini-2.5-pro"],
},
},
},
};
expect(coercePdfModelConfig(cfg)).toEqual({
primary: "anthropic/claude-opus-4-6",
fallbacks: ["google/gemini-2.5-pro"],
});
});
it("coercePdfAssistantText returns trimmed text", () => {
const text = coercePdfAssistantText({
provider: "anthropic",
model: "claude-opus-4-6",
message: {
role: "assistant",
stopReason: "stop",
content: [{ type: "text", text: " summary " }],
} as never,
});
expect(text).toBe("summary");
});
it("coercePdfAssistantText throws clear error for failed model output", () => {
expect(() =>
coercePdfAssistantText({
provider: "google",
model: "gemini-2.5-pro",
message: {
role: "assistant",
stopReason: "error",
errorMessage: "bad request",
content: [],
} as never,
}),
).toThrow("PDF model failed (google/gemini-2.5-pro): bad request");
});
});
// ---------------------------------------------------------------------------
// Model catalog document support
// ---------------------------------------------------------------------------
describe("model catalog document support", () => {
it("modelSupportsDocument returns true when input includes document", async () => {
const { modelSupportsDocument } = await import("../model-catalog.js");
expect(
modelSupportsDocument({
id: "test",
name: "test",
provider: "test",
input: ["text", "document"],
}),
).toBe(true);
});
it("modelSupportsDocument returns false when input lacks document", async () => {
const { modelSupportsDocument } = await import("../model-catalog.js");
expect(
modelSupportsDocument({
id: "test",
name: "test",
provider: "test",
input: ["text", "image"],
}),
).toBe(false);
});
it("modelSupportsDocument returns false for undefined entry", async () => {
const { modelSupportsDocument } = await import("../model-catalog.js");
expect(modelSupportsDocument(undefined)).toBe(false);
});
});

View File

@ -0,0 +1,604 @@
import { type Api, type Context, complete, type Model } from "@mariozechner/pi-ai";
import { Type } from "@sinclair/typebox";
import type { OpenClawConfig } from "../../config/config.js";
import { extractPdfContent, type PdfExtractedContent } from "../../media/pdf-extract.js";
import { resolveUserPath } from "../../utils.js";
import { getDefaultLocalRoots, loadWebMediaRaw } from "../../web/media.js";
import { ensureAuthProfileStore, listProfilesForProvider } from "../auth-profiles.js";
import { DEFAULT_MODEL, DEFAULT_PROVIDER } from "../defaults.js";
import { getApiKeyForModel, requireApiKey, resolveEnvApiKey } from "../model-auth.js";
import { runWithImageModelFallback } from "../model-fallback.js";
import { resolveConfiguredModelRef } from "../model-selection.js";
import { ensureOpenClawModelsJson } from "../models-config.js";
import { discoverAuthStorage, discoverModels } from "../pi-model-discovery.js";
import {
createSandboxBridgeReadFile,
resolveSandboxedBridgeMediaPath,
type SandboxedBridgeMediaPathConfig,
} from "../sandbox-media-paths.js";
import type { SandboxFsBridge } from "../sandbox/fs-bridge.js";
import type { ToolFsPolicy } from "../tool-fs-policy.js";
import { normalizeWorkspaceDir } from "../workspace-dir.js";
import type { AnyAgentTool } from "./common.js";
import {
coerceImageModelConfig,
type ImageModelConfig,
resolveProviderVisionModelFromConfig,
} from "./image-tool.helpers.js";
import { anthropicAnalyzePdf, geminiAnalyzePdf } from "./pdf-native-providers.js";
import {
coercePdfAssistantText,
coercePdfModelConfig,
parsePageRange,
providerSupportsNativePdf,
resolvePdfToolMaxTokens,
} from "./pdf-tool.helpers.js";
const DEFAULT_PROMPT = "Analyze this PDF document.";
const DEFAULT_MAX_PDFS = 10;
const DEFAULT_MAX_BYTES_MB = 10;
const DEFAULT_MAX_PAGES = 20;
const ANTHROPIC_PDF_PRIMARY = "anthropic/claude-opus-4-6";
const ANTHROPIC_PDF_FALLBACK = "anthropic/claude-opus-4-5";
const PDF_MIN_TEXT_CHARS = 200;
const PDF_MAX_PIXELS = 4_000_000;
// ---------------------------------------------------------------------------
// Model resolution (mirrors image tool pattern)
// ---------------------------------------------------------------------------
function resolveDefaultModelRef(cfg?: OpenClawConfig): { provider: string; model: string } {
if (cfg) {
const resolved = resolveConfiguredModelRef({
cfg,
defaultProvider: DEFAULT_PROVIDER,
defaultModel: DEFAULT_MODEL,
});
return { provider: resolved.provider, model: resolved.model };
}
return { provider: DEFAULT_PROVIDER, model: DEFAULT_MODEL };
}
function hasAuthForProvider(params: { provider: string; agentDir: string }): boolean {
if (resolveEnvApiKey(params.provider)?.apiKey) {
return true;
}
const store = ensureAuthProfileStore(params.agentDir, { allowKeychainPrompt: false });
return listProfilesForProvider(store, params.provider).length > 0;
}
/**
* Resolve the effective PDF model config.
* Falls back to the image model config, then to provider-specific defaults.
*/
export function resolvePdfModelConfigForTool(params: {
cfg?: OpenClawConfig;
agentDir: string;
}): ImageModelConfig | null {
// Check for explicit PDF model config first
const explicitPdf = coercePdfModelConfig(params.cfg);
if (explicitPdf.primary?.trim() || (explicitPdf.fallbacks?.length ?? 0) > 0) {
return explicitPdf;
}
// Fall back to the image model config
const explicitImage = coerceImageModelConfig(params.cfg);
if (explicitImage.primary?.trim() || (explicitImage.fallbacks?.length ?? 0) > 0) {
return explicitImage;
}
// Auto-detect from available providers
const primary = resolveDefaultModelRef(params.cfg);
const anthropicOk = hasAuthForProvider({ provider: "anthropic", agentDir: params.agentDir });
const googleOk = hasAuthForProvider({ provider: "google", agentDir: params.agentDir });
const openaiOk = hasAuthForProvider({ provider: "openai", agentDir: params.agentDir });
const fallbacks: string[] = [];
const addFallback = (ref: string) => {
const trimmed = ref.trim();
if (trimmed && !fallbacks.includes(trimmed)) {
fallbacks.push(trimmed);
}
};
// Prefer providers with native PDF support
let preferred: string | null = null;
const providerOk = hasAuthForProvider({ provider: primary.provider, agentDir: params.agentDir });
const providerVision = resolveProviderVisionModelFromConfig({
cfg: params.cfg,
provider: primary.provider,
});
if (primary.provider === "anthropic" && anthropicOk) {
preferred = ANTHROPIC_PDF_PRIMARY;
} else if (primary.provider === "google" && googleOk && providerVision) {
preferred = providerVision;
} else if (providerOk && providerVision) {
preferred = providerVision;
} else if (anthropicOk) {
preferred = ANTHROPIC_PDF_PRIMARY;
} else if (googleOk) {
preferred = "google/gemini-2.5-pro";
} else if (openaiOk) {
preferred = "openai/gpt-5-mini";
}
if (preferred?.trim()) {
if (anthropicOk && preferred !== ANTHROPIC_PDF_PRIMARY) {
addFallback(ANTHROPIC_PDF_PRIMARY);
}
if (anthropicOk) {
addFallback(ANTHROPIC_PDF_FALLBACK);
}
if (openaiOk) {
addFallback("openai/gpt-5-mini");
}
const pruned = fallbacks.filter((ref) => ref !== preferred);
return { primary: preferred, ...(pruned.length > 0 ? { fallbacks: pruned } : {}) };
}
return null;
}
// ---------------------------------------------------------------------------
// Build context for extraction fallback path
// ---------------------------------------------------------------------------
function buildPdfExtractionContext(prompt: string, extractions: PdfExtractedContent[]): Context {
const content: Array<
{ type: "text"; text: string } | { type: "image"; data: string; mimeType: string }
> = [];
// Add extracted text and images
for (let i = 0; i < extractions.length; i++) {
const extraction = extractions[i];
if (extraction.text.trim()) {
const label = extractions.length > 1 ? `[PDF ${i + 1} text]\n` : "[PDF text]\n";
content.push({ type: "text", text: label + extraction.text });
}
for (const img of extraction.images) {
content.push({ type: "image", data: img.data, mimeType: img.mimeType });
}
}
// Add the user prompt
content.push({ type: "text", text: prompt });
return {
messages: [{ role: "user", content, timestamp: Date.now() }],
};
}
// ---------------------------------------------------------------------------
// Run PDF prompt with model fallback
// ---------------------------------------------------------------------------
type PdfSandboxConfig = {
root: string;
bridge: SandboxFsBridge;
};
async function runPdfPrompt(params: {
cfg?: OpenClawConfig;
agentDir: string;
pdfModelConfig: ImageModelConfig;
modelOverride?: string;
prompt: string;
pdfBuffers: Array<{ base64: string; filename: string }>;
pageNumbers?: number[];
getExtractions: () => Promise<PdfExtractedContent[]>;
}): Promise<{
text: string;
provider: string;
model: string;
native: boolean;
attempts: Array<{ provider: string; model: string; error: string }>;
}> {
const effectiveCfg: OpenClawConfig | undefined = params.cfg
? {
...params.cfg,
agents: {
...params.cfg.agents,
defaults: {
...params.cfg.agents?.defaults,
imageModel: params.pdfModelConfig,
},
},
}
: undefined;
await ensureOpenClawModelsJson(effectiveCfg, params.agentDir);
const authStorage = discoverAuthStorage(params.agentDir);
const modelRegistry = discoverModels(authStorage, params.agentDir);
let extractionCache: PdfExtractedContent[] | null = null;
const getExtractions = async (): Promise<PdfExtractedContent[]> => {
if (!extractionCache) {
extractionCache = await params.getExtractions();
}
return extractionCache;
};
const result = await runWithImageModelFallback({
cfg: effectiveCfg,
modelOverride: params.modelOverride,
run: async (provider, modelId) => {
const model = modelRegistry.find(provider, modelId) as Model<Api> | null;
if (!model) {
throw new Error(`Unknown model: ${provider}/${modelId}`);
}
const apiKeyInfo = await getApiKeyForModel({
model,
cfg: effectiveCfg,
agentDir: params.agentDir,
});
const apiKey = requireApiKey(apiKeyInfo, model.provider);
authStorage.setRuntimeApiKey(model.provider, apiKey);
if (providerSupportsNativePdf(provider)) {
if (params.pageNumbers && params.pageNumbers.length > 0) {
throw new Error(
`pages is not supported with native PDF providers (${provider}/${modelId}). Remove pages, or use a non-native model for page filtering.`,
);
}
const pdfs = params.pdfBuffers.map((p) => ({
base64: p.base64,
filename: p.filename,
}));
if (provider === "anthropic") {
const text = await anthropicAnalyzePdf({
apiKey,
modelId,
prompt: params.prompt,
pdfs,
maxTokens: resolvePdfToolMaxTokens(model.maxTokens),
baseUrl: model.baseUrl,
});
return { text, provider, model: modelId, native: true };
}
if (provider === "google") {
const text = await geminiAnalyzePdf({
apiKey,
modelId,
prompt: params.prompt,
pdfs,
baseUrl: model.baseUrl,
});
return { text, provider, model: modelId, native: true };
}
}
const extractions = await getExtractions();
const hasImages = extractions.some((e) => e.images.length > 0);
if (hasImages && !model.input?.includes("image")) {
const hasText = extractions.some((e) => e.text.trim().length > 0);
if (!hasText) {
throw new Error(
`Model ${provider}/${modelId} does not support images and PDF has no extractable text.`,
);
}
const textOnlyExtractions: PdfExtractedContent[] = extractions.map((e) => ({
text: e.text,
images: [],
}));
const context = buildPdfExtractionContext(params.prompt, textOnlyExtractions);
const message = await complete(model, context, {
apiKey,
maxTokens: resolvePdfToolMaxTokens(model.maxTokens),
});
const text = coercePdfAssistantText({ message, provider, model: modelId });
return { text, provider, model: modelId, native: false };
}
const context = buildPdfExtractionContext(params.prompt, extractions);
const message = await complete(model, context, {
apiKey,
maxTokens: resolvePdfToolMaxTokens(model.maxTokens),
});
const text = coercePdfAssistantText({ message, provider, model: modelId });
return { text, provider, model: modelId, native: false };
},
});
return {
text: result.result.text,
provider: result.result.provider,
model: result.result.model,
native: result.result.native,
attempts: result.attempts.map((a) => ({
provider: a.provider,
model: a.model,
error: a.error,
})),
};
}
// ---------------------------------------------------------------------------
// PDF tool factory
// ---------------------------------------------------------------------------
export function createPdfTool(options?: {
config?: OpenClawConfig;
agentDir?: string;
workspaceDir?: string;
sandbox?: PdfSandboxConfig;
fsPolicy?: ToolFsPolicy;
}): AnyAgentTool | null {
const agentDir = options?.agentDir?.trim();
if (!agentDir) {
const explicit = coercePdfModelConfig(options?.config);
if (explicit.primary?.trim() || (explicit.fallbacks?.length ?? 0) > 0) {
throw new Error("createPdfTool requires agentDir when enabled");
}
return null;
}
const pdfModelConfig = resolvePdfModelConfigForTool({ cfg: options?.config, agentDir });
if (!pdfModelConfig) {
return null;
}
const maxBytesMbDefault = (
options?.config?.agents?.defaults as Record<string, unknown> | undefined
)?.pdfMaxBytesMb;
const maxPagesDefault = (options?.config?.agents?.defaults as Record<string, unknown> | undefined)
?.pdfMaxPages;
const configuredMaxBytesMb =
typeof maxBytesMbDefault === "number" && Number.isFinite(maxBytesMbDefault)
? maxBytesMbDefault
: DEFAULT_MAX_BYTES_MB;
const configuredMaxPages =
typeof maxPagesDefault === "number" && Number.isFinite(maxPagesDefault)
? Math.floor(maxPagesDefault)
: DEFAULT_MAX_PAGES;
const localRoots = (() => {
const roots = getDefaultLocalRoots();
const workspaceDir = normalizeWorkspaceDir(options?.workspaceDir);
if (!workspaceDir) {
return roots;
}
return Array.from(new Set([...roots, workspaceDir]));
})();
const description =
"Analyze one or more PDF documents with a model. Supports native PDF analysis for Anthropic and Google models, with text/image extraction fallback for other providers. Use pdf for a single path/URL, or pdfs for multiple (up to 10). Provide a prompt describing what to analyze.";
return {
label: "PDF",
name: "pdf",
description,
parameters: Type.Object({
prompt: Type.Optional(Type.String()),
pdf: Type.Optional(Type.String({ description: "Single PDF path or URL." })),
pdfs: Type.Optional(
Type.Array(Type.String(), {
description: "Multiple PDF paths or URLs (up to 10).",
}),
),
pages: Type.Optional(
Type.String({
description: 'Page range to process, e.g. "1-5", "1,3,5-7". Defaults to all pages.',
}),
),
model: Type.Optional(Type.String()),
maxBytesMb: Type.Optional(Type.Number()),
}),
execute: async (_toolCallId, args) => {
const record = args && typeof args === "object" ? (args as Record<string, unknown>) : {};
// MARK: - Normalize pdf + pdfs input
const pdfCandidates: string[] = [];
if (typeof record.pdf === "string") {
pdfCandidates.push(record.pdf);
}
if (Array.isArray(record.pdfs)) {
pdfCandidates.push(...record.pdfs.filter((v): v is string => typeof v === "string"));
}
const seenPdfs = new Set<string>();
const pdfInputs: string[] = [];
for (const candidate of pdfCandidates) {
const trimmed = candidate.trim();
if (!trimmed || seenPdfs.has(trimmed)) {
continue;
}
seenPdfs.add(trimmed);
pdfInputs.push(trimmed);
}
if (pdfInputs.length === 0) {
throw new Error("pdf required: provide a path or URL to a PDF document");
}
// Enforce max PDFs cap
if (pdfInputs.length > DEFAULT_MAX_PDFS) {
return {
content: [
{
type: "text",
text: `Too many PDFs: ${pdfInputs.length} provided, maximum is ${DEFAULT_MAX_PDFS}. Please reduce the number.`,
},
],
details: { error: "too_many_pdfs", count: pdfInputs.length, max: DEFAULT_MAX_PDFS },
};
}
const promptRaw =
typeof record.prompt === "string" && record.prompt.trim()
? record.prompt.trim()
: DEFAULT_PROMPT;
const modelOverride =
typeof record.model === "string" && record.model.trim() ? record.model.trim() : undefined;
const maxBytesMbRaw = typeof record.maxBytesMb === "number" ? record.maxBytesMb : undefined;
const maxBytesMb =
typeof maxBytesMbRaw === "number" && Number.isFinite(maxBytesMbRaw) && maxBytesMbRaw > 0
? maxBytesMbRaw
: configuredMaxBytesMb;
const maxBytes = Math.floor(maxBytesMb * 1024 * 1024);
// Parse page range
const pagesRaw =
typeof record.pages === "string" && record.pages.trim() ? record.pages.trim() : undefined;
const sandboxConfig: SandboxedBridgeMediaPathConfig | null =
options?.sandbox && options.sandbox.root.trim()
? {
root: options.sandbox.root.trim(),
bridge: options.sandbox.bridge,
workspaceOnly: options.fsPolicy?.workspaceOnly === true,
}
: null;
// MARK: - Load each PDF
const loadedPdfs: Array<{
base64: string;
buffer: Buffer;
filename: string;
resolvedPath: string;
rewrittenFrom?: string;
}> = [];
for (const pdfRaw of pdfInputs) {
const trimmed = pdfRaw.trim();
const isHttpUrl = /^https?:\/\//i.test(trimmed);
const isFileUrl = /^file:/i.test(trimmed);
const isDataUrl = /^data:/i.test(trimmed);
const looksLikeWindowsDrive = /^[a-zA-Z]:[\\/]/.test(trimmed);
const hasScheme = /^[a-z][a-z0-9+.-]*:/i.test(trimmed);
if (hasScheme && !looksLikeWindowsDrive && !isFileUrl && !isHttpUrl && !isDataUrl) {
return {
content: [
{
type: "text",
text: `Unsupported PDF reference: ${pdfRaw}. Use a file path, file:// URL, or http(s) URL.`,
},
],
details: { error: "unsupported_pdf_reference", pdf: pdfRaw },
};
}
if (sandboxConfig && isHttpUrl) {
throw new Error("Sandboxed PDF tool does not allow remote URLs.");
}
const resolvedPdf = (() => {
if (sandboxConfig) {
return trimmed;
}
if (trimmed.startsWith("~")) {
return resolveUserPath(trimmed);
}
return trimmed;
})();
const resolvedPathInfo: { resolved: string; rewrittenFrom?: string } = sandboxConfig
? await resolveSandboxedBridgeMediaPath({
sandbox: sandboxConfig,
mediaPath: resolvedPdf,
inboundFallbackDir: "media/inbound",
})
: {
resolved: resolvedPdf.startsWith("file://")
? resolvedPdf.slice("file://".length)
: resolvedPdf,
};
const media = sandboxConfig
? await loadWebMediaRaw(resolvedPathInfo.resolved, {
maxBytes,
sandboxValidated: true,
readFile: createSandboxBridgeReadFile({ sandbox: sandboxConfig }),
})
: await loadWebMediaRaw(resolvedPathInfo.resolved, {
maxBytes,
localRoots,
});
if (media.kind !== "document") {
// Check MIME type more specifically
const ct = (media.contentType ?? "").toLowerCase();
if (!ct.includes("pdf") && !ct.includes("application/pdf")) {
throw new Error(`Expected PDF but got ${media.contentType ?? media.kind}: ${pdfRaw}`);
}
}
const base64 = media.buffer.toString("base64");
const filename =
media.fileName ??
(isHttpUrl
? (new URL(trimmed).pathname.split("/").pop() ?? "document.pdf")
: "document.pdf");
loadedPdfs.push({
base64,
buffer: media.buffer,
filename,
resolvedPath: resolvedPathInfo.resolved,
...(resolvedPathInfo.rewrittenFrom
? { rewrittenFrom: resolvedPathInfo.rewrittenFrom }
: {}),
});
}
const pageNumbers = pagesRaw ? parsePageRange(pagesRaw, configuredMaxPages) : undefined;
const getExtractions = async (): Promise<PdfExtractedContent[]> => {
const extractedAll: PdfExtractedContent[] = [];
for (const pdf of loadedPdfs) {
const extracted = await extractPdfContent({
buffer: pdf.buffer,
maxPages: configuredMaxPages,
maxPixels: PDF_MAX_PIXELS,
minTextChars: PDF_MIN_TEXT_CHARS,
pageNumbers,
});
extractedAll.push(extracted);
}
return extractedAll;
};
const result = await runPdfPrompt({
cfg: options?.config,
agentDir,
pdfModelConfig,
modelOverride,
prompt: promptRaw,
pdfBuffers: loadedPdfs.map((p) => ({ base64: p.base64, filename: p.filename })),
pageNumbers,
getExtractions,
});
const pdfDetails =
loadedPdfs.length === 1
? {
pdf: loadedPdfs[0].resolvedPath,
...(loadedPdfs[0].rewrittenFrom
? { rewrittenFrom: loadedPdfs[0].rewrittenFrom }
: {}),
}
: {
pdfs: loadedPdfs.map((p) => ({
pdf: p.resolvedPath,
...(p.rewrittenFrom ? { rewrittenFrom: p.rewrittenFrom } : {}),
})),
};
return {
content: [{ type: "text", text: result.text }],
details: {
model: `${result.provider}/${result.model}`,
native: result.native,
...pdfDetails,
attempts: result.attempts,
},
};
},
};
}

View File

@ -116,6 +116,40 @@ describe("config schema regressions", () => {
expect(res.ok).toBe(true);
});
it("accepts pdf default model and limits", () => {
const res = validateConfigObject({
agents: {
defaults: {
pdfModel: {
primary: "anthropic/claude-opus-4-6",
fallbacks: ["openai/gpt-5-mini"],
},
pdfMaxBytesMb: 12,
pdfMaxPages: 25,
},
},
});
expect(res.ok).toBe(true);
});
it("rejects non-positive pdf limits", () => {
const res = validateConfigObject({
agents: {
defaults: {
pdfModel: { primary: "openai/gpt-5-mini" },
pdfMaxBytesMb: 0,
pdfMaxPages: 0,
},
},
});
expect(res.ok).toBe(false);
if (!res.ok) {
expect(res.issues.some((issue) => issue.path.includes("agents.defaults.pdfMax"))).toBe(true);
}
});
it("rejects relative iMessage attachment roots", () => {
const res = validateConfigObject({
channels: {

View File

@ -922,6 +922,13 @@ export const FIELD_HELP: Record<string, string> = {
"agents.defaults.imageModel.primary":
"Optional image model (provider/model) used when the primary model lacks image input.",
"agents.defaults.imageModel.fallbacks": "Ordered fallback image models (provider/model).",
"agents.defaults.pdfModel.primary":
"Optional PDF model (provider/model) for the PDF analysis tool. Defaults to imageModel, then session model.",
"agents.defaults.pdfModel.fallbacks": "Ordered fallback PDF models (provider/model).",
"agents.defaults.pdfMaxBytesMb":
"Maximum PDF file size in megabytes for the PDF tool (default: 10).",
"agents.defaults.pdfMaxPages":
"Maximum number of PDF pages to process for the PDF tool (default: 20).",
"agents.defaults.imageMaxDimensionPx":
"Max image side length in pixels when sanitizing transcript/tool-result image payloads (default: 1200).",
"agents.defaults.cliBackends": "Optional CLI backends for text-only fallback (claude-cli, etc.).",

View File

@ -405,6 +405,10 @@ export const FIELD_LABELS: Record<string, string> = {
"agents.defaults.model.fallbacks": "Model Fallbacks",
"agents.defaults.imageModel.primary": "Image Model",
"agents.defaults.imageModel.fallbacks": "Image Model Fallbacks",
"agents.defaults.pdfModel.primary": "PDF Model",
"agents.defaults.pdfModel.fallbacks": "PDF Model Fallbacks",
"agents.defaults.pdfMaxBytesMb": "PDF Max Size (MB)",
"agents.defaults.pdfMaxPages": "PDF Max Pages",
"agents.defaults.imageMaxDimensionPx": "Image Max Dimension (px)",
"agents.defaults.humanDelay.mode": "Human Delay Mode",
"agents.defaults.humanDelay.minMs": "Human Delay Min (ms)",

View File

@ -122,6 +122,12 @@ export type AgentDefaultsConfig = {
model?: AgentModelConfig;
/** Optional image-capable model and fallbacks (provider/model). Accepts string or {primary,fallbacks}. */
imageModel?: AgentModelConfig;
/** Optional PDF-capable model and fallbacks (provider/model). Accepts string or {primary,fallbacks}. */
pdfModel?: AgentModelConfig;
/** Maximum PDF file size in megabytes (default: 10). */
pdfMaxBytesMb?: number;
/** Maximum number of PDF pages to process (default: 20). */
pdfMaxPages?: number;
/** Model catalog with optional aliases (full provider/model keys). */
models?: Record<string, AgentModelEntryConfig>;
/** Agent working directory (preferred). Used as the default cwd for agent runs. */

View File

@ -18,6 +18,9 @@ export const AgentDefaultsSchema = z
.object({
model: AgentModelSchema.optional(),
imageModel: AgentModelSchema.optional(),
pdfModel: AgentModelSchema.optional(),
pdfMaxBytesMb: z.number().positive().optional(),
pdfMaxPages: z.number().int().positive().optional(),
models: z
.record(
z.string(),

View File

@ -2,44 +2,10 @@ import { fetchWithSsrFGuard } from "../infra/net/fetch-guard.js";
import type { SsrFPolicy } from "../infra/net/ssrf.js";
import { logWarn } from "../logger.js";
import { canonicalizeBase64, estimateBase64DecodedBytes } from "./base64.js";
import { extractPdfContent, type PdfExtractedImage } from "./pdf-extract.js";
import { readResponseWithLimit } from "./read-response-with-limit.js";
type CanvasModule = typeof import("@napi-rs/canvas");
type PdfJsModule = typeof import("pdfjs-dist/legacy/build/pdf.mjs");
let canvasModulePromise: Promise<CanvasModule> | null = null;
let pdfJsModulePromise: Promise<PdfJsModule> | null = null;
// Lazy-load optional PDF/image deps so non-PDF paths don't require native installs.
async function loadCanvasModule(): Promise<CanvasModule> {
if (!canvasModulePromise) {
canvasModulePromise = import("@napi-rs/canvas").catch((err) => {
canvasModulePromise = null;
throw new Error(
`Optional dependency @napi-rs/canvas is required for PDF image extraction: ${String(err)}`,
);
});
}
return canvasModulePromise;
}
async function loadPdfJsModule(): Promise<PdfJsModule> {
if (!pdfJsModulePromise) {
pdfJsModulePromise = import("pdfjs-dist/legacy/build/pdf.mjs").catch((err) => {
pdfJsModulePromise = null;
throw new Error(
`Optional dependency pdfjs-dist is required for PDF extraction: ${String(err)}`,
);
});
}
return pdfJsModulePromise;
}
export type InputImageContent = {
type: "image";
data: string;
mimeType: string;
};
export type InputImageContent = PdfExtractedImage;
export type InputFileExtractResult = {
filename: string;
@ -241,65 +207,6 @@ function clampText(text: string, maxChars: number): string {
return text.slice(0, maxChars);
}
async function extractPdfContent(params: {
buffer: Buffer;
limits: InputFileLimits;
}): Promise<{ text: string; images: InputImageContent[] }> {
const { buffer, limits } = params;
const { getDocument } = await loadPdfJsModule();
const pdf = await getDocument({
data: new Uint8Array(buffer),
disableWorker: true,
}).promise;
const maxPages = Math.min(pdf.numPages, limits.pdf.maxPages);
const textParts: string[] = [];
for (let pageNum = 1; pageNum <= maxPages; pageNum += 1) {
const page = await pdf.getPage(pageNum);
const textContent = await page.getTextContent();
const pageText = textContent.items
.map((item) => ("str" in item ? String(item.str) : ""))
.filter(Boolean)
.join(" ");
if (pageText) {
textParts.push(pageText);
}
}
const text = textParts.join("\n\n");
if (text.trim().length >= limits.pdf.minTextChars) {
return { text, images: [] };
}
let canvasModule: CanvasModule;
try {
canvasModule = await loadCanvasModule();
} catch (err) {
logWarn(`media: PDF image extraction skipped; ${String(err)}`);
return { text, images: [] };
}
const { createCanvas } = canvasModule;
const images: InputImageContent[] = [];
for (let pageNum = 1; pageNum <= maxPages; pageNum += 1) {
const page = await pdf.getPage(pageNum);
const viewport = page.getViewport({ scale: 1 });
const maxPixels = limits.pdf.maxPixels;
const pixelBudget = Math.max(1, maxPixels);
const pagePixels = viewport.width * viewport.height;
const scale = Math.min(1, Math.sqrt(pixelBudget / pagePixels));
const scaled = page.getViewport({ scale: Math.max(0.1, scale) });
const canvas = createCanvas(Math.ceil(scaled.width), Math.ceil(scaled.height));
await page.render({
canvas: canvas as unknown as HTMLCanvasElement,
viewport: scaled,
}).promise;
const png = canvas.toBuffer("image/png");
images.push({ type: "image", data: png.toString("base64"), mimeType: "image/png" });
}
return { text, images };
}
export async function extractImageContentFromSource(
source: InputImageSource,
limits: InputImageLimits,
@ -409,7 +316,15 @@ export async function extractFileContentFromSource(params: {
}
if (mimeType === "application/pdf") {
const extracted = await extractPdfContent({ buffer, limits });
const extracted = await extractPdfContent({
buffer,
maxPages: limits.pdf.maxPages,
maxPixels: limits.pdf.maxPixels,
minTextChars: limits.pdf.minTextChars,
onImageExtractionError: (err) => {
logWarn(`media: PDF image extraction skipped, ${String(err)}`);
},
});
const text = extracted.text ? clampText(extracted.text, limits.maxChars) : "";
return {
filename,

104
src/media/pdf-extract.ts Normal file
View File

@ -0,0 +1,104 @@
type CanvasModule = typeof import("@napi-rs/canvas");
type PdfJsModule = typeof import("pdfjs-dist/legacy/build/pdf.mjs");
let canvasModulePromise: Promise<CanvasModule> | null = null;
let pdfJsModulePromise: Promise<PdfJsModule> | null = null;
async function loadCanvasModule(): Promise<CanvasModule> {
if (!canvasModulePromise) {
canvasModulePromise = import("@napi-rs/canvas").catch((err) => {
canvasModulePromise = null;
throw new Error(
`Optional dependency @napi-rs/canvas is required for PDF image extraction: ${String(err)}`,
);
});
}
return canvasModulePromise;
}
async function loadPdfJsModule(): Promise<PdfJsModule> {
if (!pdfJsModulePromise) {
pdfJsModulePromise = import("pdfjs-dist/legacy/build/pdf.mjs").catch((err) => {
pdfJsModulePromise = null;
throw new Error(
`Optional dependency pdfjs-dist is required for PDF extraction: ${String(err)}`,
);
});
}
return pdfJsModulePromise;
}
export type PdfExtractedImage = {
type: "image";
data: string;
mimeType: string;
};
export type PdfExtractedContent = {
text: string;
images: PdfExtractedImage[];
};
export async function extractPdfContent(params: {
buffer: Buffer;
maxPages: number;
maxPixels: number;
minTextChars: number;
pageNumbers?: number[];
onImageExtractionError?: (error: unknown) => void;
}): Promise<PdfExtractedContent> {
const { buffer, maxPages, maxPixels, minTextChars, pageNumbers, onImageExtractionError } = params;
const { getDocument } = await loadPdfJsModule();
const pdf = await getDocument({ data: new Uint8Array(buffer), disableWorker: true }).promise;
const effectivePages: number[] = pageNumbers
? pageNumbers.filter((p) => p >= 1 && p <= pdf.numPages).slice(0, maxPages)
: Array.from({ length: Math.min(pdf.numPages, maxPages) }, (_, i) => i + 1);
const textParts: string[] = [];
for (const pageNum of effectivePages) {
const page = await pdf.getPage(pageNum);
const textContent = await page.getTextContent();
const pageText = textContent.items
.map((item) => ("str" in item ? String(item.str) : ""))
.filter(Boolean)
.join(" ");
if (pageText) {
textParts.push(pageText);
}
}
const text = textParts.join("\n\n");
if (text.trim().length >= minTextChars) {
return { text, images: [] };
}
let canvasModule: CanvasModule;
try {
canvasModule = await loadCanvasModule();
} catch (err) {
onImageExtractionError?.(err);
return { text, images: [] };
}
const { createCanvas } = canvasModule;
const images: PdfExtractedImage[] = [];
const pixelBudget = Math.max(1, maxPixels);
for (const pageNum of effectivePages) {
const page = await pdf.getPage(pageNum);
const viewport = page.getViewport({ scale: 1 });
const pagePixels = viewport.width * viewport.height;
const scale = Math.min(1, Math.sqrt(pixelBudget / Math.max(1, pagePixels)));
const scaled = page.getViewport({ scale: Math.max(0.1, scale) });
const canvas = createCanvas(Math.ceil(scaled.width), Math.ceil(scaled.height));
await page.render({
canvas: canvas as unknown as HTMLCanvasElement,
viewport: scaled,
}).promise;
const png = canvas.toBuffer("image/png");
images.push({ type: "image", data: png.toString("base64"), mimeType: "image/png" });
}
return { text, images };
}