POST /v1/crawl

Start a crawl job to recursively scrape URLs starting from a base URL.

Authentication

This endpoint requires authentication using a Bearer token. Include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
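
As a minimal sketch, the header above could be assembled in Python as follows. The helper name build_auth_headers and the environment variable FIRECRAWL_API_KEY are assumptions made for this illustration; they are not part of the API.

```python
import os


def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return the headers for an authenticated JSON request."""
    if not api_key:
        raise ValueError("API key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# FIRECRAWL_API_KEY is a hypothetical variable name chosen for this sketch.
headers = build_auth_headers(os.environ.get("FIRECRAWL_API_KEY", "YOUR_API_KEY"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work just as well.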

Request Body

url
string
required
The base URL to start crawling from
excludePaths
array
URL pathname regex patterns that exclude matching URLs from the crawl. For example, if you set "excludePaths": ["blog/.*"] for the base URL firecrawl.dev, any results matching that pattern will be excluded, such as https://www.firecrawl.dev/blog/firecrawl-launch-week-1-recap.
includePaths
array
URL pathname regex patterns that include matching URLs in the crawl. Only the paths that match the specified patterns will be included in the response. For example, if you set "includePaths": ["blog/.*"] for the base URL firecrawl.dev, only results matching that pattern will be included, such as https://www.firecrawl.dev/blog/firecrawl-launch-week-1-recap.
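To make the interaction of includePaths and excludePaths more concrete, here is a rough local approximation of the filtering. It assumes that each pattern is searched for anywhere in the URL pathname and that exclusions take precedence over inclusions; neither detail is stated on this page, so treat this as an illustration rather than the crawler's actual logic. The function name is_path_allowed is hypothetical.

```python
import re
from urllib.parse import urlparse


def is_path_allowed(url, include_paths=(), exclude_paths=()):
    """Approximate the include/exclude filtering on a URL's pathname."""
    path = urlparse(url).path
    # Assumption: an exclude match always wins.
    if any(re.search(pattern, path) for pattern in exclude_paths):
        return False
    # Assumption: when include patterns are given, at least one must match.
    if include_paths:
        return any(re.search(pattern, path) for pattern in include_paths)
    return True


blog_post = "https://www.firecrawl.dev/blog/firecrawl-launch-week-1-recap"
excluded = not is_path_allowed(blog_post, exclude_paths=["blog/.*"])
included = is_path_allowed(blog_post, include_paths=["blog/.*"])
```

Testing patterns locally like this can help catch an overly broad regex before starting a large crawl.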
maxDepth
integer
default:"10"
Maximum depth to crawl relative to the base URL. In practice, this is the maximum number of slashes that the pathname of a scraped URL may contain.
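The slash-counting description can be sketched as below. This only mirrors the wording of the description; how the service treats trailing slashes or a base URL that already has a path is not stated here, so the helper names and behaviour are assumptions.

```python
from urllib.parse import urlparse


def path_depth(url: str) -> int:
    """Count the slashes in the URL's pathname."""
    return urlparse(url).path.count("/")


def within_max_depth(url: str, max_depth: int = 10) -> bool:
    """Return True if the URL's pathname has at most max_depth slashes."""
    return path_depth(url) <= max_depth


shallow = within_max_depth("https://example.com/docs/api", max_depth=2)
deep = within_max_depth("https://example.com/docs/api/v1/crawl", max_depth=2)
```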
maxDiscoveryDepth
integer
Maximum depth to crawl based on discovery order. The root site and sitemapped pages have a discovery depth of 0. For example, if you set it to 1 and also set ignoreSitemap, you will only crawl the entered URL and all URLs that are linked on that page.
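Discovery depth can be pictured as the breadth-first distance from the starting page. The sketch below runs over a toy in-memory link graph (the links dictionary is invented for illustration) and assumes the sitemap is ignored, so only the root starts at depth 0.

```python
from collections import deque


def discovery_depths(links, root, max_discovery_depth):
    """Breadth-first walk that records each page's discovery depth."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        # Pages at the maximum depth are kept, but their links are not followed.
        if depths[page] >= max_discovery_depth:
            continue
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site structure: the root links to /a and /b, and /a links to /a/1.
toy_links = {"/": ["/a", "/b"], "/a": ["/a/1"], "/b": []}
crawled = discovery_depths(toy_links, "/", max_discovery_depth=1)
```

With a maximum of 1, only the root and the pages linked directly from it are reached, which matches the example in the description.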
ignoreSitemap
boolean
default:"false"
Ignore the website sitemap when crawling
ignoreQueryParameters
boolean
default:"false"
Do not re-scrape the same path with different (or no) query parameters
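One way to picture this option is as URL deduplication keyed on everything except the query string. The following is a sketch of that idea, not the service's implementation; the function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def dedupe_key(url: str, ignore_query_parameters: bool = False) -> str:
    """Build the key used to decide whether a URL was already seen."""
    parts = urlsplit(url)
    query = "" if ignore_query_parameters else parts.query
    # The fragment is dropped in both modes (an assumption of this sketch).
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def unique_urls(urls, ignore_query_parameters=False):
    """Keep the first URL seen for each dedupe key, preserving order."""
    seen = set()
    result = []
    for url in urls:
        key = dedupe_key(url, ignore_query_parameters)
        if key not in seen:
            seen.add(key)
            result.append(url)
    return result


candidates = [
    "https://example.com/products?page=1",
    "https://example.com/products?page=2",
    "https://example.com/products",
]
kept = unique_urls(candidates, ignore_query_parameters=True)
```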
limit
integer
default:"10000"
Maximum number of pages to crawl.
Allows the crawler to follow internal links to sibling or parent URLs, not just child paths.
  • false: Only crawls deeper (child) URLs, e.g. /features/feature-1 → /features/feature-1/tips ✅. Won’t follow /pricing or / ❌.
  • true: Crawls any internal links, including siblings and parents, e.g. /features/feature-1 → /pricing, /, etc. ✅
Use true for broader internal coverage beyond nested paths.
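The child-versus-sibling distinction can be approximated with a pathname prefix check. This is only a sketch: the flag name allow_non_child is a placeholder invented here, and the prefix rule is an assumption about how "child path" is defined.

```python
from urllib.parse import urlparse


def is_child_path(base_url: str, link_url: str) -> bool:
    """Return True if link_url sits strictly below base_url on the same host."""
    base = urlparse(base_url)
    link = urlparse(link_url)
    if base.netloc != link.netloc:
        return False
    return link.path.startswith(base.path.rstrip("/") + "/")


def should_follow(base_url: str, link_url: str, allow_non_child: bool) -> bool:
    """Decide whether an internal link would be followed under this sketch."""
    if urlparse(base_url).netloc != urlparse(link_url).netloc:
        return False  # external links are governed by a separate option
    return allow_non_child or is_child_path(base_url, link_url)


base = "https://example.com/features/feature-1"
tips = "https://example.com/features/feature-1/tips"
pricing = "https://example.com/pricing"
```

Under these assumptions, tips is followed in both modes, while pricing is followed only when the flag is true.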
Allows the crawler to follow links to external websites.
delay
number
Delay in seconds between scrapes. This helps respect website rate limits.
webhook
object
A webhook specification object.
scrapeOptions
object
Options for scraping each page. See Scrape Options for full details.
Common options include:
  • formats: Output formats (e.g., ["markdown", "html", "links"])
  • onlyMainContent: Extract only main content (default: true)
  • includeTags: HTML tags to include
  • excludeTags: HTML tags to exclude
  • waitFor: Milliseconds to wait before scraping
  • mobile: Emulate mobile device

Response

success
boolean
Indicates if the crawl job was successfully started
id
string
The unique identifier for the crawl job. Use this ID to check the status and retrieve results.
url
string
The base URL that is being crawled

Example Request

curl -X POST https://api.firecrawl.dev/v1/crawl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "url": "https://example.com",
    "limit": 100,
    "scrapeOptions": {
      "formats": ["markdown", "html"]
    }
  }'
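
The same request can be sketched in Python using only the standard library. The helper build_crawl_request is a name chosen for this example; it constructs the request without sending it, so the network call stays an explicit, separate step.

```python
import json
import urllib.request

API_URL = "https://api.firecrawl.dev/v1/crawl"


def build_crawl_request(api_key, url, limit=100, formats=("markdown", "html")):
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "url": url,
        "limit": limit,
        "scrapeOptions": {"formats": list(formats)},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_crawl_request("YOUR_API_KEY", "https://example.com")

# To actually start the crawl, send the request:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```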

Example Response

{
  "success": true,
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "url": "https://example.com"
}
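
A small sketch of how a client might read this response: it checks success and pulls out the id for later status checks. The CrawlJob class and parse_crawl_response function are illustrative names, and the failure branch assumes an "error" field like the one shown in the error responses.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlJob:
    id: str
    url: str


def parse_crawl_response(raw: str) -> CrawlJob:
    """Turn the JSON response body into a CrawlJob, or raise if it failed."""
    data = json.loads(raw)
    if not data.get("success"):
        # Assumption: failures carry an "error" message in the body.
        raise RuntimeError(f"Crawl job was not started: {data.get('error', 'unknown error')}")
    return CrawlJob(id=data["id"], url=data["url"])


job = parse_crawl_response(
    '{"success": true, "id": "123e4567-e89b-12d3-a456-426614174000", "url": "https://example.com"}'
)
```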

Error Responses

402 Payment Required
{
  "error": "Payment required to access this resource."
}
429 Too Many Requests
{
  "error": "Request rate limit exceeded. Please wait and try again later."
}
500 Server Error
{
  "error": "An unexpected error occurred on the server."
}
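
A client might handle these errors roughly as sketched below. Which status codes are worth retrying is an assumption of this example (429 and 500 are treated as transient, 402 as not), as are the helper names and the exponential backoff schedule; none of this is specified by the API.

```python
import json

# Assumption: rate limits and server errors may succeed on a later attempt.
RETRYABLE_STATUSES = {429, 500}


def classify_error(status: int, body: str):
    """Return (should_retry, message) for an error response."""
    try:
        parsed = json.loads(body)
        message = parsed.get("error", "") if isinstance(parsed, dict) else body
    except json.JSONDecodeError:
        message = body
    return status in RETRYABLE_STATUSES, message


def backoff_delays(max_attempts: int, base_seconds: float = 1.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ..."""
    return [base_seconds * (2 ** attempt) for attempt in range(max_attempts)]


retry, message = classify_error(
    429, '{"error": "Request rate limit exceeded. Please wait and try again later."}'
)
```

A caller would sleep for each delay in turn between attempts and stop as soon as a request succeeds or a non-retryable status is returned.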

Next Steps

After starting a crawl job, use the returned id to: