mirror of https://github.com/openclaw/openclaw.git
* feat: add PDF analysis tool with native provider support New `pdf` tool for analyzing PDF documents with model-powered analysis. Architecture: - Native PDF path: sends raw PDF bytes directly to providers that support inline document input (Anthropic via DocumentBlockParam, Google Gemini via inlineData with application/pdf MIME type) - Extraction fallback: for providers without native PDF support, extracts text via pdfjs-dist and rasterizes pages to images via @napi-rs/canvas, then sends through the standard vision/text completion path Key features: - Single PDF (`pdf` param) or multiple PDFs (`pdfs` array, up to 10) - Page range selection (`pages` param, e.g. "1-5", "1,3,7-9") - Model override (`model` param) and file size limits (`maxBytesMb`) - Auto-detects provider capability and falls back gracefully - Same security patterns as image tool (SSRF guards, sandbox support, local path roots, workspace-only policy) Config (agents.defaults): - pdfModel: primary/fallbacks (defaults to imageModel, then session model) - pdfMaxBytesMb: max PDF file size (default: 10) - pdfMaxPages: max pages to process (default: 20) Model catalog: - Extended ModelInputType to include "document" alongside "text"/"image" - Added modelSupportsDocument() capability check Files: - src/agents/tools/pdf-tool.ts - main tool factory - src/agents/tools/pdf-tool.helpers.ts - helpers (page range, config, etc.) - src/agents/tools/pdf-native-providers.ts - direct API calls for Anthropic/Google - src/agents/tools/pdf-tool.test.ts - 43 tests covering all paths - Modified: model-catalog.ts, openclaw-tools.ts, config schema/types/labels/help * fix: prepare pdf tool for merge (#31319) (thanks @tyler6204) |
||
|---|---|---|
| .. | ||
| acp-agents.md | ||
| agent-send.md | ||
| apply-patch.md | ||
| browser-linux-troubleshooting.md | ||
| browser-login.md | ||
| browser.md | ||
| chrome-extension.md | ||
| clawhub.md | ||
| creating-skills.md | ||
| diffs.md | ||
| elevated.md | ||
| exec-approvals.md | ||
| exec.md | ||
| firecrawl.md | ||
| index.md | ||
| llm-task.md | ||
| lobster.md | ||
| loop-detection.md | ||
| multi-agent-sandbox-tools.md | ||
| plugin.md | ||
| reactions.md | ||
| skills-config.md | ||
| skills.md | ||
| slash-commands.md | ||
| subagents.md | ||
| thinking.md | ||
| web.md | ||