---
summary: "Firecrawl search, scrape, and web_fetch fallback"
read_when:
- You want Firecrawl-backed web extraction
- You need a Firecrawl API key
- You want Firecrawl as a web_search provider
- You want anti-bot extraction for web_fetch
title: "Firecrawl"
---
# Firecrawl
OpenClaw can use **Firecrawl** in three ways:
- as the `web_search` provider
- as explicit plugin tools: `firecrawl_search` and `firecrawl_scrape`
- as a fallback extractor for `web_fetch`

Firecrawl is a hosted extraction and search service that supports bot circumvention and caching,
which helps with JS-heavy sites and pages that block plain HTTP fetches.
## Get an API key
1. Create a Firecrawl account and generate an API key.
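2. Store it in config (`tools.web.search.firecrawl.apiKey` for search, `tools.web.fetch.firecrawl.apiKey` for scrape and the `web_fetch` fallback) or set `FIRECRAWL_API_KEY` in the gateway environment.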
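
The two places a key can live (config or the `FIRECRAWL_API_KEY` environment variable) can be pictured as a small lookup. This is only a sketch: `resolveFirecrawlApiKey` is a hypothetical helper, not OpenClaw's actual code, and the assumption that a configured key takes precedence over the environment is not stated on this page.

```typescript
// Hypothetical helper, not part of OpenClaw. The environment is passed in
// explicitly so the sketch does not depend on any particular runtime.
type Env = Record<string, string | undefined>;

// Assumption: a key stored in config wins over FIRECRAWL_API_KEY.
function resolveFirecrawlApiKey(
  configuredKey: string | undefined,
  env: Env,
): string | undefined {
  const fromConfig = configuredKey?.trim();
  if (fromConfig) return fromConfig;

  const fromEnv = env["FIRECRAWL_API_KEY"]?.trim();
  return fromEnv ? fromEnv : undefined;
}

// Example: no key in config, so the environment value is used.
const apiKey = resolveFirecrawlApiKey(undefined, {
  FIRECRAWL_API_KEY: "fc-example-key",
});
```

Blank or whitespace-only values are treated as missing, so an empty config entry does not mask a valid environment variable.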
## Configure Firecrawl search
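```json5
{
  plugins: {
    entries: {
      firecrawl: {
        enabled: true, // turn on the bundled Firecrawl plugin
      },
    },
  },
  tools: {
    web: {
      search: {
        provider: "firecrawl", // route web_search through Firecrawl
        firecrawl: {
          apiKey: "FIRECRAWL_API_KEY_HERE", // or set FIRECRAWL_API_KEY in the gateway environment
          baseUrl: "https://api.firecrawl.dev",
        },
      },
    },
  },
}
```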
Notes:
- Choosing Firecrawl in onboarding or `openclaw configure --section web` enables the bundled Firecrawl plugin automatically.
- `web_search` with Firecrawl supports `query` and `count`.
- For Firecrawl-specific controls like `sources`, `categories`, or result scraping, use `firecrawl_search`.
## Configure Firecrawl scrape + web_fetch fallback
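```json5
{
  plugins: {
    entries: {
      firecrawl: {
        enabled: true, // turn on the bundled Firecrawl plugin
      },
    },
  },
  tools: {
    web: {
      fetch: {
        firecrawl: {
          apiKey: "FIRECRAWL_API_KEY_HERE", // or set FIRECRAWL_API_KEY in the gateway environment
          baseUrl: "https://api.firecrawl.dev",
          onlyMainContent: true,
          maxAgeMs: 172800000, // maximum cache age: 2 days, in milliseconds
          timeoutSeconds: 60,
        },
      },
    },
  },
}
```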
Notes:
- `firecrawl.enabled` defaults to `true` unless explicitly set to `false`.
- Firecrawl fallback attempts run only when an API key is available (`tools.web.fetch.firecrawl.apiKey` or `FIRECRAWL_API_KEY`).
- `maxAgeMs` sets the maximum age, in milliseconds, of a cached result that Firecrawl may return. The default is 172800000 ms (2 days).

`firecrawl_scrape` reuses the same `tools.web.fetch.firecrawl.*` settings and env vars.
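
The `maxAgeMs` value in the config is a plain day-to-millisecond conversion. The sketch below shows that arithmetic and one way to read the setting as a freshness check. `daysToMs` and `isCacheFresh` are hypothetical helpers for illustration, and the inclusive boundary is an assumption; the actual cache decision is made by Firecrawl.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000; // 86,400,000 ms in one day

function daysToMs(days: number): number {
  return days * MS_PER_DAY;
}

// Hypothetical helper: true while a cached result is no older than maxAgeMs.
// Assumption: a result exactly maxAgeMs old still counts as fresh.
function isCacheFresh(
  cachedAtMs: number,
  nowMs: number,
  maxAgeMs: number,
): boolean {
  return nowMs - cachedAtMs <= maxAgeMs;
}

// The value used in the config above: 2 days expressed in milliseconds.
const defaultMaxAgeMs = daysToMs(2); // 172800000
```

A smaller `maxAgeMs` trades cache hits for fresher content, for example `daysToMs(0.5)` for a 12-hour window.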
## Firecrawl plugin tools
### `firecrawl_search`
Use this when you want Firecrawl-specific search controls instead of generic `web_search`.
Core parameters:
- `query`
- `count`
- `sources`
- `categories`
- `scrapeResults`
- `timeoutSeconds`
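
As a rough illustration of how these parameters fit together, the sketch below types a `firecrawl_search` call and narrows it to the two fields that generic `web_search` supports with Firecrawl (`query` and `count`). The parameter names come from the list above; the TypeScript types, the example values for `sources` and `categories`, and the `toWebSearchArgs` helper are assumptions, so check the tool schema for the accepted values.

```typescript
// Parameter names are from the list above; the types are assumptions.
interface FirecrawlSearchArgs {
  query: string;
  count?: number;
  sources?: string[];
  categories?: string[];
  scrapeResults?: boolean;
  timeoutSeconds?: number;
}

// Illustrative values only.
const searchArgs: FirecrawlSearchArgs = {
  query: "firecrawl changelog",
  count: 5,
  sources: ["web"],
  categories: ["github"],
  scrapeResults: true,
  timeoutSeconds: 30,
};

// Hypothetical helper: keep only what web_search with Firecrawl supports.
function toWebSearchArgs(
  args: FirecrawlSearchArgs,
): { query: string; count?: number } {
  return args.count === undefined
    ? { query: args.query }
    : { query: args.query, count: args.count };
}

const genericArgs = toWebSearchArgs(searchArgs);
```

Everything dropped by `toWebSearchArgs` is a Firecrawl-specific control, which is the reason to reach for `firecrawl_search` instead of `web_search`.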
### `firecrawl_scrape`
Use this for JS-heavy or bot-protected pages where plain `web_fetch` extraction falls short.
Core parameters:
- `url`
- `extractMode`
- `maxChars`
- `onlyMainContent`
- `maxAgeMs`
- `proxy`
- `storeInCache`
- `timeoutSeconds`
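
Because `firecrawl_scrape` reuses the `tools.web.fetch.firecrawl.*` settings, a call can be thought of as per-call arguments layered over config defaults. The sketch below assumes that per-call values override the config; that precedence, the TypeScript types, and the `mergeScrapeArgs` helper are illustrative and not taken from OpenClaw's source.

```typescript
// Parameter names are from the list above; the types are assumptions.
// The proxy values are the three modes named on this page.
interface FirecrawlScrapeArgs {
  url: string;
  extractMode?: string;
  maxChars?: number;
  onlyMainContent?: boolean;
  maxAgeMs?: number;
  proxy?: "basic" | "stealth" | "auto";
  storeInCache?: boolean;
  timeoutSeconds?: number;
}

// Defaults mirroring the tools.web.fetch.firecrawl config example.
const configDefaults: Partial<FirecrawlScrapeArgs> = {
  onlyMainContent: true,
  maxAgeMs: 172800000,
  timeoutSeconds: 60,
};

// Hypothetical helper. Assumption: per-call arguments win over config.
function mergeScrapeArgs(
  defaults: Partial<FirecrawlScrapeArgs>,
  args: FirecrawlScrapeArgs,
): FirecrawlScrapeArgs {
  return { ...defaults, ...args };
}

// Example: accept cached results up to one hour old for this call only.
const merged = mergeScrapeArgs(configDefaults, {
  url: "https://example.com/docs",
  maxAgeMs: 60 * 60 * 1000,
});
```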
## Stealth / bot circumvention
Firecrawl exposes a **proxy mode** parameter for bot circumvention (`basic`, `stealth`, or `auto`).
OpenClaw always uses `proxy: "auto"` plus `storeInCache: true` for Firecrawl requests.
If `proxy` is omitted, Firecrawl defaults to `auto`. In `auto` mode, Firecrawl retries with stealth proxies if a basic attempt fails,
which may use more credits than basic-only scraping.
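
The `auto` behavior described above can be modeled as "try basic, then stealth". The sketch below is only a model of that description: the retry happens inside Firecrawl's service, not in OpenClaw, and `scrapeWithProxyMode` and the `Attempt` callback are hypothetical names standing in for a single scrape attempt.

```typescript
type ProxyMode = "basic" | "stealth" | "auto";
type ConcreteProxy = "basic" | "stealth";

// Hypothetical stand-in for one scrape attempt; returns null on failure.
type Attempt = (proxy: ConcreteProxy) => string | null;

interface ScrapeOutcome {
  content: string | null;
  tried: ConcreteProxy[]; // every proxy type that was attempted, in order
}

function scrapeWithProxyMode(mode: ProxyMode, attempt: Attempt): ScrapeOutcome {
  // "auto" starts with a basic proxy and falls back to stealth.
  const order: ConcreteProxy[] =
    mode === "auto" ? ["basic", "stealth"] : [mode];
  const tried: ConcreteProxy[] = [];

  for (const proxy of order) {
    tried.push(proxy);
    const content = attempt(proxy);
    if (content !== null) return { content, tried };
  }
  return { content: null, tried };
}

// Example: a site that blocks basic proxies but serves stealth ones.
const blocksBasic: Attempt = (proxy) =>
  proxy === "stealth" ? "<html>ok</html>" : null;

const outcome = scrapeWithProxyMode("auto", blocksBasic);
```

In this example `outcome.tried` holds both `basic` and `stealth`, which is the case where `auto` may cost more credits than a basic-only scrape.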
## How `web_fetch` uses Firecrawl
`web_fetch` extraction order:
1. Readability (local)
2. Firecrawl (if configured)
3. Basic HTML cleanup (last fallback)

See [Web tools](/tools/web) for the full web tool setup.
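
The `web_fetch` extraction order can be sketched as a chain of extractors tried in sequence, where the first non-empty result wins. This is a simplified model under stated assumptions: the `Extractor` shape and `extractWithFallback` are hypothetical, the Readability and Firecrawl steps are stubs, the tag-stripping regex is a toy stand-in for the real cleanup, and real extractors would be asynchronous.

```typescript
interface Page {
  url: string;
  html: string;
}

// Hypothetical extractor shape: returns text, or null when extraction fails.
interface Extractor {
  name: string;
  enabled: boolean;
  run: (page: Page) => string | null;
}

function extractWithFallback(
  page: Page,
  extractors: Extractor[],
): { by: string; text: string } | null {
  for (const extractor of extractors) {
    if (!extractor.enabled) continue; // e.g. Firecrawl without an API key
    const text = extractor.run(page);
    if (text !== null && text.trim() !== "") {
      return { by: extractor.name, text };
    }
  }
  return null;
}

const hasFirecrawlKey = false; // no API key, so the Firecrawl step is skipped

const chain: Extractor[] = [
  // Stub: pretend local Readability could not extract this page.
  { name: "readability", enabled: true, run: () => null },
  // Stub: stands in for a Firecrawl scrape of the URL.
  { name: "firecrawl", enabled: hasFirecrawlKey, run: (page) => `scraped ${page.url}` },
  // Toy last resort: strip tags and collapse whitespace.
  {
    name: "basic-html-cleanup",
    enabled: true,
    run: (page) => page.html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim(),
  },
];

const result = extractWithFallback(
  { url: "https://example.com", html: "<p>Hello <b>world</b></p>" },
  chain,
);
```

With no Firecrawl key the chain falls through to the basic cleanup step; enabling the Firecrawl entry makes it the second stop instead.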