By default, Firecrawl SDKs automatically paginate through all results when checking the status of crawls and batch scrapes. However, you can disable auto-pagination to manually fetch results one page at a time.
Manual pagination is useful when you want to:
- Process results incrementally as they become available
- Reduce memory usage by not loading all results at once
- Implement custom caching or rate limiting logic
- Display progress to users in real time
- Handle very large result sets more efficiently
When a crawl or batch scrape job has more data than can fit in a single response, Firecrawl includes a next field in the status response. This opaque URL can be passed back to the SDK to fetch the next page of results.
The next URL is opaque and should not be parsed or modified. Always pass it directly to the SDK methods.
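The chaining described above can be sketched as a small generator that follows next until it is absent. This is a minimal illustration, not the SDK's implementation: Page, iter_pages, fake_fetch, and the "opaque-N" strings are hypothetical stand-ins so the example runs without network access. With the real SDK, the fetch_page argument would be a method such as app.get_crawl_status_page.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for an SDK status response."""
    data: List[str] = field(default_factory=list)
    next: Optional[str] = None


def iter_pages(first: Page, fetch_page: Callable[[str], Page]) -> Iterator[Page]:
    """Yield `first`, then every page reachable by following `next`."""
    page = first
    while True:
        yield page
        if not page.next:
            return
        # The URL is opaque: pass it through without parsing or editing it.
        page = fetch_page(page.next)


# In-memory fake of the paginated endpoint, keyed by the opaque URL.
FAKE_PAGES: Dict[str, Page] = {
    "opaque-2": Page(data=["c", "d"], next="opaque-3"),
    "opaque-3": Page(data=["e"], next=None),
}


def fake_fetch(url: str) -> Page:
    return FAKE_PAGES[url]


first_page = Page(data=["a", "b"], next="opaque-2")

# Pages are fetched lazily, one at a time, as the loop advances.
collected = [doc for page in iter_pages(first_page, fake_fetch) for doc in page.data]
print(collected)
```

Because the generator only holds one page at a time, the caller decides whether to accumulate results or process and discard each page.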
Disable auto-pagination and fetch results one page at a time:
from firecrawl import Firecrawl
from firecrawl.v2.types import PaginationConfig

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Start the crawl
crawl_job = app.start_crawl("https://firecrawl.dev", limit=100)
print(f"Crawl started: {crawl_job.id}")

# Fetch first page of results (auto_paginate=False)
status = app.get_crawl_status(
    crawl_job.id,
    pagination_config=PaginationConfig(auto_paginate=False)
)
print(f"Status: {status.status}")
print(f"Results in this page: {len(status.data)}")
print(f"Total: {status.total}")
print(f"Completed: {status.completed}")

# Process first page of results
for doc in status.data:
    print(f"- {doc.metadata.get('sourceURL')}")

# Fetch next page if available
if status.next:
    page2 = app.get_crawl_status_page(status.next)
    print(f"\nPage 2 results: {len(page2.data)}")
    for doc in page2.data:
        print(f"- {doc.metadata.get('sourceURL')}")

    # Continue fetching pages
    if page2.next:
        page3 = app.get_crawl_status_page(page2.next)
        # ... and so on
The process is identical for batch scrape operations:
from firecrawl import Firecrawl
from firecrawl.v2.types import PaginationConfig

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Start the batch scrape
batch_job = app.start_batch_scrape([
    "https://firecrawl.dev",
    "https://docs.firecrawl.dev",
    "https://firecrawl.dev/pricing",
    # ... many more URLs
])
print(f"Batch scrape started: {batch_job.id}")

# Fetch first page of results
status = app.get_batch_scrape_status(
    batch_job.id,
    pagination_config=PaginationConfig(auto_paginate=False)
)
print(f"Status: {status.status}")
print(f"Results in this page: {len(status.data)}")

# Process first page
for doc in status.data:
    print(f"- {doc.metadata.get('sourceURL')}")

# Fetch next page if available
if status.next:
    page2 = app.get_batch_scrape_status_page(status.next)
    print(f"\nPage 2 results: {len(page2.data)}")
    for doc in page2.data:
        print(f"- {doc.metadata.get('sourceURL')}")
Here’s a complete example that fetches all pages manually:
import time

from firecrawl import Firecrawl
from firecrawl.v2.types import PaginationConfig

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Start the crawl
crawl_job = app.start_crawl("https://docs.firecrawl.dev", limit=200)
print(f"Crawl started: {crawl_job.id}")

all_results = []
next_url = None
page_num = 1

# Wait for crawl to complete or have results
while True:
    if next_url:
        # Fetch next page using the next URL
        status = app.get_crawl_status_page(next_url)
    else:
        # Fetch first page
        status = app.get_crawl_status(
            crawl_job.id,
            pagination_config=PaginationConfig(auto_paginate=False)
        )

    print(f"\nPage {page_num}: {len(status.data)} results")
    print(f"Overall progress: {status.completed}/{status.total}")

    # Collect results from this page
    all_results.extend(status.data)

    # Check if there are more pages
    if status.next:
        next_url = status.next
        page_num += 1
    elif status.status == "completed":
        print(f"\nCrawl completed! Total results: {len(all_results)}")
        break
    elif status.status == "failed":
        # Stop here; otherwise a failed job would be polled forever
        print(f"\nCrawl failed after {len(all_results)} results")
        break
    else:
        # Still processing, wait and retry
        print(f"Status: {status.status}, waiting...")
        time.sleep(5)
        # Retry from the first page, discarding what was collected so
        # that re-fetched pages are not counted twice
        next_url = None
        all_results = []
        page_num = 1

# Process all results
for doc in all_results:
    print(f"- {doc.metadata.get('title')}: {doc.metadata.get('sourceURL')}")
Response Structure
When using manual pagination, status responses include:
{
  "status": "scraping",
  "total": 100,
  "completed": 45,
  "creditsUsed": 45,
  "data": [
    {
      "markdown": "# Page Title\n\nContent...",
      "metadata": {
        "title": "Page Title",
        "sourceURL": "https://example.com/page1"
      }
    }
    // ... more results
  ],
  "next": "https://api.firecrawl.dev/v2/crawl/123-456-789?page=eyJhbGciOiJI..."
}
Fields:
- status: Current job status (scraping, completed, failed)
- total: Total number of pages to scrape
- completed: Number of pages completed so far
- creditsUsed: Credits consumed
- data: Array of scraped documents in this page
- next: Opaque URL for the next page (only present if more data exists)
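For readers working with the raw JSON rather than an SDK object, the fields above can be mapped onto a small typed structure. The sketch below is an assumption-laden illustration: StatusPage, parse_status_page, and the progress property are hypothetical names, not part of any Firecrawl SDK, and the sample payload is a trimmed copy of the response shown above with next omitted to represent a last page.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class StatusPage:
    """Hypothetical typed view of one status response."""
    status: str
    total: int
    completed: int
    credits_used: int
    data: List[Dict[str, Any]]
    next: Optional[str]  # None when the payload has no "next" field

    @property
    def progress(self) -> float:
        """Fraction of pages completed, between 0.0 and 1.0."""
        return self.completed / self.total if self.total else 0.0


def parse_status_page(raw: str) -> StatusPage:
    body = json.loads(raw)
    return StatusPage(
        status=body["status"],
        total=body["total"],
        completed=body["completed"],
        credits_used=body["creditsUsed"],
        data=body.get("data", []),
        next=body.get("next"),  # only present if more data exists
    )


SAMPLE = """
{
  "status": "scraping",
  "total": 100,
  "completed": 45,
  "creditsUsed": 45,
  "data": [
    {
      "markdown": "# Page Title",
      "metadata": {"title": "Page Title", "sourceURL": "https://example.com/page1"}
    }
  ]
}
"""

page = parse_status_page(SAMPLE)
print(f"{page.status}: {page.progress:.0%} done, more pages: {page.next is not None}")
```

Reading next with a default of None keeps the absent-field case explicit, which is the case that matters when deciding whether to fetch again.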
Best Practices
- Check both next and status: A response may lack a next URL either because the last page has been reached or because the job is still in progress. Check status to decide whether to poll again.
- Store the next URL: If you need to pause processing briefly, save the next URL and resume from that point.
- Handle rate limits: When manually paginating, add delays between requests to avoid rate limiting.
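One way to space out page requests is a minimum-interval throttle called before each fetch. The sketch below is only an illustration: MinIntervalThrottle is a hypothetical helper, and the one-second interval is an arbitrary placeholder, not a documented Firecrawl limit. The clock and sleep functions are injectable so the demo runs instantly against a fake clock.

```python
import time
from typing import Callable, List, Optional


class MinIntervalThrottle:
    """Hypothetical helper: keep successive calls at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# Demo with a fake clock so nothing actually sleeps.
fake_now: List[float] = [0.0]


def fake_clock() -> float:
    return fake_now[0]


def fake_sleep(seconds: float) -> None:
    fake_now[0] += seconds


throttle = MinIntervalThrottle(1.0, clock=fake_clock, sleep=fake_sleep)
delays = [throttle.wait()]   # first call never waits
fake_now[0] = 0.25           # next request comes 0.25 s later
delays.append(throttle.wait())
fake_now[0] = 3.0            # long gap, no wait needed
delays.append(throttle.wait())
print(delays)
```

In a real pagination loop, throttle.wait() would be called with the default clock and sleep immediately before each page fetch.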
The next URL is temporary and may expire. Don’t store it for long-term use. If it expires, restart pagination from the job ID.
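Returning to the first practice above, the combination of next and status can be reduced to a small decision function. This is a sketch under stated assumptions: Action and next_action are hypothetical names, and any status other than completed or failed is treated as still in progress.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    FETCH_NEXT = "fetch_next"   # more data is available right now
    POLL_AGAIN = "poll_again"   # job still running, no further page yet
    DONE = "done"               # job finished and every page was read
    FAILED = "failed"           # job failed, nothing more to fetch


def next_action(status: str, next_url: Optional[str]) -> Action:
    """Decide what a manual pagination loop should do after one response."""
    if next_url:
        # A next URL always means there is another page to read.
        return Action.FETCH_NEXT
    if status == "completed":
        return Action.DONE
    if status == "failed":
        return Action.FAILED
    # Assumption: any other status (for example "scraping") is in progress.
    return Action.POLL_AGAIN


for status, url in [
    ("scraping", "opaque-url"),
    ("scraping", None),
    ("completed", None),
    ("failed", None),
]:
    print(status, url, "->", next_action(status, url).value)
```

Checking next first means pages already produced by a job are drained before its final status is acted on.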
By default, the SDKs automatically fetch all pages for you:
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Auto-pagination enabled by default
crawl_job = app.start_crawl("https://firecrawl.dev", limit=100)

# This will automatically fetch all pages and poll until completion
status = app.get_crawl_status(crawl_job.id)

# All results are in status.data
print(f"Total results: {len(status.data)}")
for doc in status.data:
    print(f"- {doc.metadata.get('sourceURL')}")
For most use cases, auto-pagination is recommended as it simplifies your code. Use manual pagination only when you need fine-grained control over the fetching process.