By default, Firecrawl SDKs automatically paginate through all results when checking the status of crawls and batch scrapes. However, you can disable auto-pagination to manually fetch results one page at a time.

Why Use Manual Pagination?

Manual pagination is useful when you want to:
  • Process results incrementally as they become available
  • Reduce memory usage by not loading all results at once
  • Implement custom caching or rate limiting logic
  • Display progress to users in real time
  • Handle very large result sets more efficiently

How Pagination Works

When a crawl or batch scrape job has more data than fits in a single response, Firecrawl includes a next field in the status response. Its value is an opaque URL that you pass back to the SDK to fetch the next page of results.
Do not parse or modify the next URL. Always pass it to the SDK methods unchanged.

Manual Pagination for Crawls

Disable auto-pagination and fetch results one page at a time:
from firecrawl import Firecrawl
from firecrawl.v2.types import PaginationConfig

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Start the crawl
crawl_job = app.start_crawl("https://firecrawl.dev", limit=100)
print(f"Crawl started: {crawl_job.id}")

# Fetch first page of results (auto_paginate=False)
status = app.get_crawl_status(
    crawl_job.id,
    pagination_config=PaginationConfig(auto_paginate=False)
)

print(f"Status: {status.status}")
print(f"Results in this page: {len(status.data)}")
print(f"Total: {status.total}")
print(f"Completed: {status.completed}")

# Process first page of results
for doc in status.data:
    print(f"- {doc.metadata.get('sourceURL')}")

# Fetch next page if available
if status.next:
    page2 = app.get_crawl_status_page(status.next)
    print(f"\nPage 2 results: {len(page2.data)}")
    for doc in page2.data:
        print(f"- {doc.metadata.get('sourceURL')}")
    
    # Continue fetching pages
    if page2.next:
        page3 = app.get_crawl_status_page(page2.next)
        # ... and so on

Manual Pagination for Batch Scrapes

The process is identical for batch scrape operations:
from firecrawl import Firecrawl
from firecrawl.v2.types import PaginationConfig

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Start the batch scrape
batch_job = app.start_batch_scrape([
    "https://firecrawl.dev",
    "https://docs.firecrawl.dev",
    "https://firecrawl.dev/pricing",
    # ... many more URLs
])
print(f"Batch scrape started: {batch_job.id}")

# Fetch first page of results
status = app.get_batch_scrape_status(
    batch_job.id,
    pagination_config=PaginationConfig(auto_paginate=False)
)

print(f"Status: {status.status}")
print(f"Results in this page: {len(status.data)}")

# Process first page
for doc in status.data:
    print(f"- {doc.metadata.get('sourceURL')}")

# Fetch next page if available
if status.next:
    page2 = app.get_batch_scrape_status_page(status.next)
    print(f"\nPage 2 results: {len(page2.data)}")
    for doc in page2.data:
        print(f"- {doc.metadata.get('sourceURL')}")

Implementing a Pagination Loop

Here’s a complete example that fetches all pages manually:
from firecrawl import Firecrawl
from firecrawl.v2.types import PaginationConfig
import time

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Start the crawl
crawl_job = app.start_crawl("https://docs.firecrawl.dev", limit=200)
print(f"Crawl started: {crawl_job.id}")

all_results = []
next_url = None
page_num = 1

# Page through results, polling again while the crawl is still running
while True:
    if next_url:
        # Fetch next page using the next URL
        status = app.get_crawl_status_page(next_url)
    else:
        # Fetch first page
        status = app.get_crawl_status(
            crawl_job.id,
            pagination_config=PaginationConfig(auto_paginate=False)
        )

    print(f"\nPage {page_num}: {len(status.data)} results")
    print(f"Overall progress: {status.completed}/{status.total}")

    # Collect results from this page
    all_results.extend(status.data)

    # Check if there are more pages
    if status.next:
        next_url = status.next
        page_num += 1
    elif status.status == "completed":
        print(f"\nCrawl completed! Total results: {len(all_results)}")
        break
    elif status.status == "failed":
        # Stop instead of polling a failed job forever
        print(f"\nCrawl failed after {len(all_results)} results")
        break
    else:
        # Still processing: wait, then restart from the first page.
        # Clear collected results so re-fetched pages are not duplicated.
        print(f"Status: {status.status}, waiting...")
        time.sleep(5)
        all_results.clear()
        next_url = None
        page_num = 1

# Process all results
for doc in all_results:
    print(f"- {doc.metadata.get('title')}: {doc.metadata.get('sourceURL')}")
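
When the loop retries from the first page, pages that were already seen are fetched again. Keying results by source URL is one way to keep the collection free of duplicates without discarding earlier work. This is a minimal sketch: `merge_by_source_url` and `_doc` are hypothetical helpers, and the sketch assumes `doc.metadata` supports `.get('sourceURL')` as in the examples on this page.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def merge_by_source_url(seen: Dict[str, Any], docs: Iterable[Any]) -> int:
    """Add docs to `seen`, keyed by metadata['sourceURL']; return how many were new."""
    added = 0
    for doc in docs:
        metadata = getattr(doc, "metadata", None) or {}
        key = metadata.get("sourceURL")
        if key is None:
            # Keep documents that lack a URL under a unique synthetic key.
            key = f"__no_url_{len(seen)}"
        if key not in seen:
            added += 1
        seen[key] = doc  # a later copy replaces an earlier one
    return added


def _doc(url: str) -> SimpleNamespace:
    """Build a stand-in document for illustration only."""
    return SimpleNamespace(metadata={"sourceURL": url})


seen: Dict[str, Any] = {}
first_pass = [_doc("https://example.com/a"), _doc("https://example.com/b")]
second_pass = first_pass + [_doc("https://example.com/c")]

new_first = merge_by_source_url(seen, first_pass)
new_second = merge_by_source_url(seen, second_pass)
print(new_first, new_second, len(seen))
```

In the loop, `merge_by_source_url(seen, status.data)` would take the place of `all_results.extend(status.data)`, and `seen.values()` would hold the final documents.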

Response Structure

When using manual pagination, status responses include:
{
  "status": "scraping",
  "total": 100,
  "completed": 45,
  "creditsUsed": 45,
  "data": [
    {
      "markdown": "# Page Title\n\nContent...",
      "metadata": {
        "title": "Page Title",
        "sourceURL": "https://example.com/page1"
      }
    }
    // ... more results
  ],
  "next": "https://api.firecrawl.dev/v2/crawl/123-456-789?page=eyJhbGciOiJI..."
}
Fields:
  • status: Current job status (scraping, completed, failed)
  • total: Total number of pages to scrape
  • completed: Number of pages completed so far
  • creditsUsed: Credits consumed
  • data: Array of scraped documents in this page
  • next: Opaque URL for the next page (only present if more data exists)
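The fields above are enough to decide what to do after each response. The sketch below works on the raw JSON shape shown above, treated as a plain dict; `next_action` and `progress_percent` are hypothetical helper names, and SDK response objects expose the same information as attributes instead.

```python
from typing import Any, Mapping


def next_action(status: Mapping[str, Any]) -> str:
    """Decide the next step from a status response."""
    if status.get("next"):
        return "fetch_next"  # more data is already available
    state = status.get("status")
    if state == "completed":
        return "done"
    if state == "failed":
        return "failed"
    return "poll_again"  # no next URL yet, job still running


def progress_percent(status: Mapping[str, Any]) -> float:
    """Return completed/total as a percentage, or 0.0 when total is unknown."""
    total = status.get("total") or 0
    if total == 0:
        return 0.0
    return 100.0 * status.get("completed", 0) / total


# Values taken from the sample response above; the URL is a placeholder.
sample = {
    "status": "scraping",
    "total": 100,
    "completed": 45,
    "next": "https://example.invalid/opaque-next-url",
}
print(next_action(sample), progress_percent(sample))
```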

Best Practices

  • Check both next and status: A missing next URL does not always mean you are done, because the job may still be in progress. Check status to decide whether to poll again.
  • Store the next URL for short pauses: If you need to pause processing, save the next URL and resume from it later.
  • Handle rate limits: Add a delay between page requests to avoid being rate limited.
The next URL is temporary and may expire. Don’t store it for long-term use. If it expires, restart pagination from the job ID.
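One way to reconcile saving the next URL with its limited lifetime is to store it with a timestamp and the job ID, then fall back to the job ID when the saved URL is too old. The sketch below is hypothetical throughout: the helper names, the file layout, and especially `MAX_AGE_SECONDS` are assumptions, since the actual expiry window is not stated here.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional

# Assumed value for illustration; the real expiry window is not documented here.
MAX_AGE_SECONDS = 15 * 60


def save_checkpoint(path: Path, job_id: str, next_url: Optional[str],
                    now: Optional[float] = None) -> None:
    """Persist the job ID, the unmodified next URL, and the save time."""
    saved_at = time.time() if now is None else now
    path.write_text(json.dumps({"job_id": job_id, "next": next_url, "saved_at": saved_at}))


def load_checkpoint(path: Path, now: Optional[float] = None) -> Optional[dict]:
    """Load a checkpoint; drop the next URL if it is older than MAX_AGE_SECONDS."""
    if not path.exists():
        return None
    payload = json.loads(path.read_text())
    current = time.time() if now is None else now
    if current - payload["saved_at"] > MAX_AGE_SECONDS:
        payload["next"] = None  # too old: restart pagination from job_id
    return payload


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "crawl_checkpoint.json"
    save_checkpoint(ckpt, "job-123", "https://example.invalid/opaque-next-url", now=1_000.0)
    fresh = load_checkpoint(ckpt, now=1_060.0)  # one minute later
    stale = load_checkpoint(ckpt, now=1_000.0 + MAX_AGE_SECONDS + 1)

print(fresh["next"], stale["next"], stale["job_id"])
```

A resumed run would call the page method when `next` is set and the status method with the job ID otherwise. A request that fails on a saved URL should also be treated as an expiry.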

Auto-Pagination (Default Behavior)

By default, the SDKs automatically fetch all pages for you:
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Auto-pagination enabled by default
crawl_job = app.start_crawl("https://firecrawl.dev", limit=100)

# This will automatically fetch all pages and poll until completion
status = app.get_crawl_status(crawl_job.id)

# All results are in status.data
print(f"Total results: {len(status.data)}")
for doc in status.data:
    print(f"- {doc.metadata.get('sourceURL')}")
For most use cases, auto-pagination is recommended as it simplifies your code. Use manual pagination only when you need fine-grained control over the fetching process.