Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/firecrawl/firecrawl/llms.txt

Use this file to discover all available pages before exploring further.

The Map feature allows you to quickly discover all URLs on a website without scraping the content. It’s the fastest way to get a complete list of pages, making it ideal for planning crawls or understanding website structure.

When to Use Map

Use Map when you need to:
  • Discover all URLs on a website before crawling
  • Understand website structure and navigation
  • Find specific pages using search
  • Build a sitemap or URL inventory
  • Plan a targeted crawl with specific URLs

Basic Usage

from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Map a website
result = app.map('https://firecrawl.dev')

for link in result.links:
    print(f"{link.url} - {link.title}")

Response

{
  "success": true,
  "links": [
    {
      "url": "https://firecrawl.dev",
      "title": "Firecrawl",
      "description": "Turn websites into LLM-ready data"
    },
    {
      "url": "https://firecrawl.dev/pricing",
      "title": "Pricing",
      "description": "Firecrawl pricing plans"
    },
    {
      "url": "https://firecrawl.dev/blog",
      "title": "Blog",
      "description": "Firecrawl blog"
    }
  ]
}
Find specific URLs within a site using search. Results are ordered by relevance:
result = app.map("https://firecrawl.dev", search="pricing")

# URLs are ordered by relevance to "pricing"
for link in result.links:
    print(link.url)
During the Alpha phase, the ‘smart’ search functionality is limited to 1000 results. However, if Map finds more results, there is no limit applied.

Map Options

Limit Results

result = app.map(
    "https://firecrawl.dev",
    limit=100  # Return up to 100 links
)

Include Subdomains

result = app.map(
    "https://firecrawl.dev",
    include_subdomains=True  # Include docs.firecrawl.dev, etc.
)

Ignore Sitemap

result = app.map(
    "https://firecrawl.dev",
    ignore_sitemap=True  # Don't use sitemap.xml
)

Sitemap Only

result = app.map(
    "https://firecrawl.dev",
    sitemap_only=True  # Only return links from sitemap.xml
)

Set Timeout

result = app.map(
    "https://firecrawl.dev",
    timeout=60000  # 60 second timeout in milliseconds
)

Use Cases

Plan a Targeted Crawl

1

Map the website

# First, map the site to discover all URLs
map_result = app.map("https://docs.firecrawl.dev")
print(f"Found {len(map_result.links)} URLs")
2

Filter URLs

# Filter to only API reference pages
api_urls = [
    link.url for link in map_result.links
    if '/api-reference/' in link.url
]
print(f"Found {len(api_urls)} API pages")
3

Crawl specific URLs

# Now crawl only those specific pages
from firecrawl.types import ScrapeOptions

result = app.crawl(
    'https://docs.firecrawl.dev',
    include_paths=['api-reference/.*'],
    scrape_options=ScrapeOptions(formats=['markdown'])
)

Find Specific Content

# Search for documentation about authentication
result = app.map(
    "https://docs.firecrawl.dev",
    search="authentication"
)

# Results are ordered by relevance
for link in result.links[:5]:  # Top 5 most relevant
    print(f"{link.title}: {link.url}")

Build a Sitemap

# Get all URLs for sitemap generation
result = app.map(
    "https://yoursite.com",
    limit=5000,
    include_subdomains=False
)

# Generate XML sitemap
with open('sitemap.xml', 'w') as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for link in result.links:
        f.write(f'  <url><loc>{link.url}</loc></url>\n')
    f.write('</urlset>')

Best Practices

  • Use Map before Crawl to understand the site structure and estimate costs
  • Use the search parameter to find specific pages quickly
  • Set appropriate limit values to avoid overwhelming results
  • Enable includeSubdomains only if you need cross-subdomain mapping
  • Use sitemapOnly for websites with well-maintained sitemaps for faster results
  • Map is much faster and cheaper than crawling - use it for discovery

Limits

  • Default limit: 5,000 links
  • Maximum limit: 30,000 links
  • No timeout by default (configurable)

Next Steps

  • Use discovered URLs with Batch Scrape for parallel scraping
  • Combine with Crawl for comprehensive site extraction
  • Try Search to find content across the web