Batch Scrape allows you to scrape multiple URLs efficiently with parallel processing. It’s ideal when you have a list of specific URLs to scrape and want to process them all at once.
When to Use Batch Scrape
Use Batch Scrape when you need to:
- Scrape a known list of URLs
- Process hundreds or thousands of pages in parallel
- Extract data from multiple pages with the same structure
- Update data from a list of product pages, articles, or profiles
- Scrape URLs discovered from Map or other sources
Basic Usage
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
# Batch scrape multiple URLs (waits for completion)
job = app.batch_scrape(
    [
        "https://firecrawl.dev",
        "https://docs.firecrawl.dev",
        "https://firecrawl.dev/pricing"
    ],
    formats=["markdown"]
)
for doc in job.data:
    print(doc.metadata.source_url)
    print(doc.markdown[:100])
import Firecrawl from '@mendable/firecrawl-js';
const app = new Firecrawl({ apiKey: 'fc-YOUR_API_KEY' });
// Batch scrape multiple URLs (waits for completion)
const result = await app.batchScrape(
  ['https://firecrawl.dev', 'https://docs.firecrawl.dev', 'https://firecrawl.dev/pricing'],
  { formats: ['markdown'] }
);
result.data.forEach(doc => {
  console.log(doc.metadata.sourceURL);
  console.log(doc.markdown.substring(0, 100));
});
curl -X POST 'https://api.firecrawl.dev/v2/batch/scrape' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "urls": [
      "https://firecrawl.dev",
      "https://docs.firecrawl.dev",
      "https://firecrawl.dev/pricing"
    ],
    "formats": ["markdown"]
  }'
The SDKs automatically wait for batch scraping to complete. For manual control, use the async methods below.
Asynchronous Batch Scrape
For large batches, start the job asynchronously and poll for status:
# Start batch scrape asynchronously
batch_job = app.start_batch_scrape(
    ["https://firecrawl.dev", "https://docs.firecrawl.dev"],
    formats=["markdown", "html"]
)
print(f"Batch job started with ID: {batch_job.id}")
# Check status later
status = app.get_batch_scrape_status(batch_job.id)
print(f"Status: {status.status}")
print(f"Completed: {status.completed}/{status.total}")
// Start batch scrape asynchronously
const start = await app.startBatchScrape(
  ['https://firecrawl.dev', 'https://docs.firecrawl.dev'],
  { formats: ['markdown', 'html'] }
);
console.log(`Batch job started with ID: ${start.id}`);
// Check status later
const status = await app.getBatchScrapeStatus(start.id);
console.log(`Status: ${status.status}`);
console.log(`Completed: ${status.completed}/${status.total}`);
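The "check status later" step is usually repeated until the job finishes. The following is a minimal polling sketch, not part of the SDK: `wait_for_batch` is a hypothetical helper, `get_status` stands in for `app.get_batch_scrape_status`, and the set of terminal status strings is an assumption that should be checked against the values your SDK version actually returns.

```python
import time
from typing import Any, Callable

# Assumed terminal status strings; verify against your SDK version.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_batch(
    get_status: Callable[[str], Any],
    job_id: str,
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll get_status(job_id) until the job reaches a terminal status.

    The returned status object is expected to expose .status, .completed
    and .total, as in the status snippets above.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        print(f"{status.status}: {status.completed}/{status.total}")
        if status.status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Batch job {job_id} did not finish within {timeout}s")
        sleep(poll_interval)
```

With a real client this could be called as `wait_for_batch(app.get_batch_scrape_status, batch_job.id)`. The `sleep` parameter exists only so the loop can be exercised without real delays.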
Batch Scrape with Structured Data
Extract structured data from multiple pages:
from pydantic import BaseModel
from typing import List
class ProductInfo(BaseModel):
    name: str
    price: str
    features: List[str]
result = app.batch_scrape(
    [
        "https://example.com/product1",
        "https://example.com/product2",
        "https://example.com/product3"
    ],
    formats=[{"type": "json", "schema": ProductInfo.model_json_schema()}]
)
for doc in result.data:
    print(f"Product: {doc.json}")
import { z } from 'zod';
const productSchema = z.object({
  name: z.string(),
  price: z.string(),
  features: z.array(z.string()),
});
const result = await app.batchScrape(
  [
    'https://example.com/product1',
    'https://example.com/product2',
    'https://example.com/product3',
  ],
  {
    formats: [{ type: 'json', schema: productSchema }],
  }
);
result.data.forEach(doc => {
  console.log(`Product: ${JSON.stringify(doc.json)}`);
});
Real-Time Updates
Receive updates as pages are scraped:
const start = await app.startBatchScrape(
  ['https://firecrawl.dev', 'https://mendable.ai'],
  { formats: ['markdown', 'html'] }
);
const watch = app.watcher(start.id, { kind: 'batch', pollInterval: 2 });
watch.on('document', (doc) => {
  console.log('DOC', doc);
});
watch.on('error', (err) => {
  console.error('ERR', err);
});
watch.on('done', (state) => {
  console.log('DONE', state.status);
});
await watch.start();
For very large batches, you can manually paginate through results:
from firecrawl.v2.types import PaginationConfig
# Start batch scrape
batch_job = app.start_batch_scrape(
    ["https://firecrawl.dev"],
    formats=["markdown"]
)
# Fetch one page at a time
status = app.get_batch_scrape_status(
    batch_job.id,
    pagination_config=PaginationConfig(auto_paginate=False)
)
# Get next page if available
if status.next:
    page2 = app.get_batch_scrape_status_page(status.next)
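Fetching only the second page rarely suffices for a large batch. As a rough sketch, the same `next` link can be followed in a loop until it is empty. `collect_all_documents` is a hypothetical helper, `fetch_page` stands in for `app.get_batch_scrape_status_page`, and the assumption that each page exposes its documents as `.data` is taken from the result objects shown earlier rather than from a confirmed API contract.

```python
from typing import Any, Callable, List


def collect_all_documents(first_page: Any, fetch_page: Callable[[str], Any]) -> List[Any]:
    """Follow .next links starting at first_page and gather every page's .data."""
    documents: List[Any] = []
    seen_links = set()  # guards against a repeated link causing an endless loop
    page = first_page
    while page is not None:
        documents.extend(page.data or [])
        next_link = getattr(page, "next", None)
        if not next_link or next_link in seen_links:
            break
        seen_links.add(next_link)
        page = fetch_page(next_link)
    return documents
```

With a real client this could be called as `collect_all_documents(status, app.get_batch_scrape_status_page)`. Processing each page as it arrives, instead of accumulating a list, keeps memory use flat for very large jobs.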
Webhooks
Receive notifications when pages are scraped:
curl -X POST 'https://api.firecrawl.dev/v2/batch/scrape' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "urls": ["https://firecrawl.dev", "https://docs.firecrawl.dev"],
    "formats": ["markdown"],
    "webhook": {
      "url": "https://your-webhook.com/endpoint",
      "headers": {
        "Authorization": "Bearer your-token"
      },
      "events": ["page", "completed", "failed"],
      "metadata": {
        "job_id": "custom-identifier"
      }
    }
  }'
Webhook Events
batch_scrape.started - Job has started
batch_scrape.page - A page has been scraped
batch_scrape.completed - All pages scraped successfully
batch_scrape.failed - Job failed
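On the receiving side, an endpoint typically branches on the event name. The sketch below shows one way to do that for the four events listed above. `handle_webhook_event` is a hypothetical function, and the payload field names `type` and `data` are assumptions about the delivered JSON body, so confirm them against a real delivery before relying on them.

```python
from typing import Any, Dict, List


def handle_webhook_event(payload: Dict[str, Any], scraped: List[Any]) -> str:
    """Route one decoded webhook payload by event type.

    Appends any page documents to `scraped` and returns a short label
    describing what was handled.
    """
    event_type = payload.get("type", "")  # assumed field name
    if event_type == "batch_scrape.started":
        return "started"
    if event_type == "batch_scrape.page":
        scraped.extend(payload.get("data") or [])  # assumed field name
        return "page"
    if event_type == "batch_scrape.completed":
        return "completed"
    if event_type == "batch_scrape.failed":
        return "failed"
    return "ignored"  # unknown events are skipped rather than treated as errors
```

The function takes an already-decoded dictionary, so it can sit behind any web framework's request handler.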
Handling Invalid URLs
Ignore invalid URLs instead of failing the entire batch:
curl -X POST 'https://api.firecrawl.dev/v2/batch/scrape' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "urls": [
      "https://firecrawl.dev",
      "invalid-url",
      "https://docs.firecrawl.dev"
    ],
    "formats": ["markdown"],
    "ignoreInvalidURLs": true
  }'
Invalid URLs will be returned in the invalidURLs field of the response.
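It can also help to screen a list locally before submitting it, so obviously malformed entries are logged on your side. This is a small sketch using only the standard library. `split_urls` is a hypothetical helper, and its rule (an `http` or `https` scheme plus a host) is a simplification that may not match the server's own validation.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def split_urls(urls: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split urls into (valid, invalid) using a simple scheme-and-host check."""
    valid: List[str] = []
    invalid: List[str] = []
    for raw in urls:
        candidate = raw.strip()
        try:
            parsed = urlparse(candidate)
            ok = parsed.scheme in ("http", "https") and bool(parsed.netloc)
        except ValueError:  # urlparse rejects some malformed inputs outright
            ok = False
        (valid if ok else invalid).append(candidate)
    return valid, invalid


valid, invalid = split_urls(
    ["https://firecrawl.dev", "invalid-url", "https://docs.firecrawl.dev"]
)
print(valid)
print(invalid)
```

Only the `valid` list would then be passed to the batch call, while `invalid` can be reported or corrected.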
Cancel a Batch Job
cancel_result = app.cancel_batch_scrape(batch_job_id)
print(cancel_result)
await app.cancelBatchScrape(batchJobId);
Use Cases
Scrape URLs from Map
Map the website
# Discover all URLs
map_result = app.map("https://docs.firecrawl.dev")
urls = [link.url for link in map_result.links]
print(f"Found {len(urls)} URLs")
Batch scrape discovered URLs
# Scrape all URLs in parallel
result = app.batch_scrape(urls, formats=["markdown"])
print(f"Scraped {len(result.data)} pages")
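A mapped site often yields more URLs than are worth scraping. One possible refinement between the two steps is to drop duplicates and keep a single section of the site. `select_urls` and the example paths below are hypothetical, for illustration only.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def select_urls(urls: Iterable[str], path_prefix: str = "/") -> List[str]:
    """Remove duplicates (keeping first-seen order) and keep URLs under path_prefix."""
    seen = set()
    selected: List[str] = []
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        path = urlparse(url).path or "/"  # a bare host counts as the root path
        if path.startswith(path_prefix):
            selected.append(url)
    return selected


# Hypothetical discovered URLs
discovered = [
    "https://docs.firecrawl.dev/features/map",
    "https://docs.firecrawl.dev/features/map",
    "https://docs.firecrawl.dev/api-reference/intro",
    "https://docs.firecrawl.dev",
]
print(select_urls(discovered, "/features"))
```

The filtered list can then be passed to `app.batch_scrape` in place of the full `urls` list.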
Update Product Catalog
from pydantic import BaseModel
from typing import Optional
class Product(BaseModel):
    name: str
    price: str
    in_stock: bool
    description: Optional[str] = None
# List of product URLs to update
product_urls = [
    "https://store.example.com/product/1",
    "https://store.example.com/product/2",
    # ... more URLs
]
result = app.batch_scrape(
    product_urls,
    formats=[{"type": "json", "schema": Product.model_json_schema()}]
)
# Process results
for doc in result.data:
    product = doc.json
    # Update your database
    print(f"Updated: {product['name']} - ${product['price']}")
Monitor Competitor Prices
import schedule
import time
def check_prices():
    competitor_urls = [
        "https://competitor1.com/pricing",
        "https://competitor2.com/pricing",
        "https://competitor3.com/pricing"
    ]
    result = app.batch_scrape(
        competitor_urls,
        formats=[{"type": "json", "prompt": "Extract all pricing plans"}]
    )
    for doc in result.data:
        print(f"Competitor: {doc.metadata.source_url}")
        print(f"Pricing: {doc.json}")
# Run daily
schedule.every().day.at("09:00").do(check_prices)
# Keep the process alive so the scheduled job actually runs
while True:
    schedule.run_pending()
    time.sleep(60)
Best Practices
- Use Batch Scrape for lists of known URLs, not for discovering URLs
- Combine with Map to discover URLs first, then batch scrape them
- Use webhooks for very large batches to avoid polling
- Set ignoreInvalidURLs: true when working with uncertain URL lists
- Request only the formats you need to minimize processing time
- Use structured extraction with schemas for consistent data
- Consider rate limits and costs when batch scraping thousands of URLs
- URLs are processed in parallel for maximum speed
- No hard limit on the number of URLs per batch
- Each URL counts as one scrape credit
- Failed URLs can be retried individually
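To retry failed URLs individually, you first need to know which ones are missing from the results. One rough way is to compare the submitted list with the source URLs of the returned documents. `urls_to_retry` is a hypothetical helper. The comparison ignores only a trailing slash, which is a simplification: a page that redirected may report a different source URL and show up here as missing.

```python
from typing import Iterable, List


def urls_to_retry(submitted: Iterable[str], scraped_source_urls: Iterable[str]) -> List[str]:
    """Return submitted URLs with no matching scraped source URL."""
    done = {url.rstrip("/") for url in scraped_source_urls}
    return [url for url in submitted if url.rstrip("/") not in done]


submitted = [
    "https://firecrawl.dev",
    "https://docs.firecrawl.dev",
    "https://firecrawl.dev/pricing",
]
# Stand-in for [doc.metadata.source_url for doc in result.data]
scraped = ["https://firecrawl.dev/", "https://firecrawl.dev/pricing"]
print(urls_to_retry(submitted, scraped))
```

Each URL in the returned list can then be passed to a single scrape call or resubmitted as a smaller batch.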
Next Steps
- Learn about Map to discover URLs for batch scraping
- Use Crawl when you need to scrape an entire site
- Try Scrape for individual URLs with more control