---
summary: "Firecrawl search, scrape, and web_fetch fallback"
read_when:
  - You want Firecrawl-backed web extraction
  - You need a Firecrawl API key
  - You want Firecrawl as a web_search provider
  - You want anti-bot extraction for web_fetch
title: "Firecrawl"
---

# Firecrawl

OpenClaw can use **Firecrawl** in three ways:

- as the `web_search` provider
- as explicit plugin tools: `firecrawl_search` and `firecrawl_scrape`
- as a fallback extractor for `web_fetch`

It is a hosted extraction/search service that supports bot circumvention and caching, which helps with JS-heavy sites or pages that block plain HTTP fetches.

## Get an API key

1. Create a Firecrawl account and generate an API key.
2. Store it in config or set `FIRECRAWL_API_KEY` in the gateway environment.

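For the environment route, a minimal sketch (the variable name is from this page; the key value is a placeholder):

```shell
# Make the key available to the gateway process (replace with your real key)
export FIRECRAWL_API_KEY="fc-your-key-here"
```
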
## Configure Firecrawl search

```json5
{
  plugins: {
    entries: {
      firecrawl: {
        enabled: true,
      },
    },
  },
  tools: {
    web: {
      search: {
        provider: "firecrawl",
        firecrawl: {
          apiKey: "FIRECRAWL_API_KEY_HERE",
          baseUrl: "https://api.firecrawl.dev",
        },
      },
    },
  },
}
```


Notes:

- Choosing Firecrawl during onboarding or via `openclaw configure --section web` enables the bundled Firecrawl plugin automatically.
- `web_search` with Firecrawl supports only `query` and `count`.
- For Firecrawl-specific controls such as `sources`, `categories`, or result scraping, use `firecrawl_search` instead.

## Configure Firecrawl scrape + web_fetch fallback

```json5
{
  plugins: {
    entries: {
      firecrawl: {
        enabled: true,
      },
    },
  },
  tools: {
    web: {
      fetch: {
        firecrawl: {
          apiKey: "FIRECRAWL_API_KEY_HERE",
          baseUrl: "https://api.firecrawl.dev",
          onlyMainContent: true,
          maxAgeMs: 172800000, // 2 days
          timeoutSeconds: 60,
        },
      },
    },
  },
}
```


Notes:

- `firecrawl.enabled` defaults to `true` unless explicitly set to `false`.
- Firecrawl fallback attempts run only when an API key is available (via `tools.web.fetch.firecrawl.apiKey` or `FIRECRAWL_API_KEY`).
- `maxAgeMs` caps how old cached results may be, in milliseconds. The default is 2 days.

`firecrawl_scrape` reuses the same `tools.web.fetch.firecrawl.*` settings and environment variables.

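For reference, the 2-day default expressed in milliseconds (plain arithmetic, shown in TypeScript since the config is JSON5/JS-flavored):

```typescript
// maxAgeMs is in milliseconds; the documented 2-day default works out to:
const twoDaysMs = 2 * 24 * 60 * 60 * 1000;
console.log(twoDaysMs); // 172800000
```
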
## Firecrawl plugin tools

### `firecrawl_search`

Use this when you want Firecrawl-specific search controls instead of the generic `web_search`.

Core parameters:

- `query`
- `count`
- `sources`
- `categories`
- `scrapeResults`
- `timeoutSeconds`

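As a sketch, a `firecrawl_search` call using these controls might pass arguments like the following (parameter names are from the list above; values are purely illustrative):

```json5
// Illustrative firecrawl_search arguments; values are examples only.
{
  query: "site reliability postmortem templates",
  count: 5,
  sources: ["web"],
  scrapeResults: true,
  timeoutSeconds: 30,
}
```
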
### `firecrawl_scrape`

Use this for JS-heavy or bot-protected pages where plain `web_fetch` is weak.

Core parameters:

- `url`
- `extractMode`
- `maxChars`
- `onlyMainContent`
- `maxAgeMs`
- `proxy`
- `storeInCache`
- `timeoutSeconds`

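A sketch of a `firecrawl_scrape` call (parameter names are from the list above; values are purely illustrative):

```json5
// Illustrative firecrawl_scrape arguments; values are examples only.
{
  url: "https://example.com/docs",
  onlyMainContent: true,
  maxAgeMs: 172800000, // accept cached results up to 2 days old
  proxy: "auto",
  storeInCache: true,
  timeoutSeconds: 60,
}
```
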
## Stealth / bot circumvention

Firecrawl exposes a **proxy mode** parameter for bot circumvention (`basic`, `stealth`, or `auto`).
OpenClaw always sends `proxy: "auto"` plus `storeInCache: true` on Firecrawl requests; if `proxy` is omitted, Firecrawl itself also defaults to `auto`. In `auto` mode, Firecrawl retries with stealth proxies when a basic attempt fails, which may use more credits than basic-only scraping.

## How `web_fetch` uses Firecrawl

`web_fetch` extraction order:

1. Readability (local)
2. Firecrawl (if configured)
3. Basic HTML cleanup (last fallback)

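The chain can be sketched as a first-success-wins loop (names are illustrative, not OpenClaw's actual internals):

```typescript
// Hedged sketch of a web_fetch-style fallback chain.
type Extractor = (html: string) => string | null;

const readability: Extractor = (_html) => null;         // pretend local extraction failed
const firecrawl: Extractor = (_html) => "# Extracted";  // pretend Firecrawl succeeded
const basicCleanup: Extractor = (html) => html;         // last-resort cleanup

function extract(html: string, extractors: Extractor[]): string {
  for (const ex of extractors) {
    const result = ex(html); // first extractor to return content wins
    if (result !== null) return result;
  }
  return html; // nothing matched; return input unchanged
}

const content = extract("<html>…</html>", [readability, firecrawl, basicCleanup]);
console.log(content); // "# Extracted"
```
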

See [Web tools](/tools/web) for the full web tool setup.