Firecrawl supports multiple output formats to suit different use cases. You can request one or more formats in a single scrape operation.

Available Formats

Specify the formats you want using the formats parameter:
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

doc = app.scrape(
    url="https://firecrawl.dev",
    formats=["markdown", "html", "screenshot"]
)

Markdown

Clean, LLM-ready markdown format. This is the default format and is ideal for feeding content into AI models.
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

doc = app.scrape("https://firecrawl.dev", formats=["markdown"])
print(doc.markdown)
Response:
{
  "success": true,
  "data": {
    "markdown": "# Firecrawl Docs\n\nTurn websites into LLM-ready data...",
    "metadata": {
      "title": "Quickstart | Firecrawl",
      "description": "Firecrawl allows you to turn entire websites into LLM-ready markdown",
      "sourceURL": "https://docs.firecrawl.dev",
      "statusCode": 200
    }
  }
}
The markdown format automatically removes headers, footers, navigation, and other non-main content when onlyMainContent is true (the default).
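
If you work with the raw JSON response shown above (for example, from an HTTP client rather than the SDK), the markdown sits inside a `success`/`data` envelope. The following is a minimal sketch of unwrapping it. `extract_markdown` is a hypothetical helper name, not part of the Firecrawl SDK, and treating a false or missing `success` flag as a failure is an assumption.

```python
import json
from typing import Any


def extract_markdown(payload: dict[str, Any]) -> str:
    """Return the markdown string from a response shaped like the example above."""
    # Assumption: a failed scrape is signalled by "success" being false or absent.
    if not payload.get("success"):
        raise ValueError("scrape did not succeed")
    data = payload.get("data") or {}
    markdown = data.get("markdown")
    if markdown is None:
        raise KeyError("markdown was not requested or not returned")
    return markdown


# Sample payload trimmed from the response example above.
raw = '{"success": true, "data": {"markdown": "# Firecrawl Docs\\n\\nTurn websites into LLM-ready data..."}}'
print(extract_markdown(json.loads(raw)).splitlines()[0])
```

Raising on a missing field, rather than returning an empty string, makes it obvious when markdown was left out of the formats list.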

HTML

Cleaned HTML version of the page content.
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

doc = app.scrape("https://firecrawl.dev", formats=["html"])
print(doc.html)

Raw HTML

The complete, unmodified HTML of the page including all scripts, styles, and metadata.
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

doc = app.scrape("https://firecrawl.dev", formats=["rawHtml"])
print(doc.raw_html)
Raw HTML includes everything on the page and can be very large. Use this only when you need the complete page source.

Links

Extract all links found on the page.
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

doc = app.scrape("https://firecrawl.dev", formats=["links"])
for link in doc.links:
    print(link)
Response:
{
  "success": true,
  "data": {
    "links": [
      "https://firecrawl.dev/pricing",
      "https://firecrawl.dev/blog",
      "https://docs.firecrawl.dev"
    ]
  }
}
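
The links come back as a flat list of URL strings, so any filtering is up to you. As a sketch, the helper below keeps only links on a given site or its subdomains, using the standard library. `same_site_links` is a hypothetical name, and the GitHub URL in the sample is an invented external link added for illustration.

```python
from urllib.parse import urlparse


def same_site_links(links: list[str], site: str) -> list[str]:
    """Keep links whose host is `site` or a subdomain of it."""
    kept = []
    for link in links:
        host = urlparse(link).hostname or ""
        # Match the bare domain or any subdomain, but not look-alike hosts.
        if host == site or host.endswith("." + site):
            kept.append(link)
    return kept


links = [
    "https://firecrawl.dev/pricing",
    "https://firecrawl.dev/blog",
    "https://docs.firecrawl.dev",
    "https://github.com/firecrawl/firecrawl",  # hypothetical external link
]
print(same_site_links(links, "firecrawl.dev"))
```

Comparing parsed hostnames instead of doing a substring search avoids matching unrelated domains that merely contain the site name.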

Screenshot

Capture a screenshot of the page. Screenshots are returned as base64-encoded images.
from firecrawl import Firecrawl
import base64

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Viewport screenshot
doc = app.scrape("https://firecrawl.dev", formats=["screenshot"])
print(doc.screenshot)  # Base64 encoded image

# Full page screenshot
doc = app.scrape("https://firecrawl.dev", formats=["screenshot@fullPage"])
print(doc.screenshot)
Use screenshot for viewport-sized screenshots or screenshot@fullPage to capture the entire page including content below the fold.
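
To do anything with the image beyond printing it, the base64 string has to be decoded into bytes. The sketch below assumes `doc.screenshot` is a base64 string as described above, optionally carrying a data-URI prefix. `screenshot_bytes` is a hypothetical helper, and the sample value is a stand-in rather than real API output.

```python
import base64


def screenshot_bytes(value: str) -> bytes:
    """Decode a base64 screenshot string into raw image bytes."""
    # Assumption: the string may start with a data-URI prefix such as
    # "data:image/png;base64,". Strip it if present.
    if value.startswith("data:") and "," in value:
        value = value.split(",", 1)[1]
    return base64.b64decode(value)


# Stand-in for doc.screenshot: the 8-byte PNG signature, base64-encoded.
sample = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
image = screenshot_bytes(sample)
print(image[:4])
```

The decoded bytes can then be written to disk, for example with `pathlib.Path("screenshot.png").write_bytes(image)`.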

JSON (Structured Extraction)

Extract structured data from pages using AI with a schema or prompt.

With Schema

Define a precise structure for the data you want to extract:
from firecrawl import Firecrawl
from pydantic import BaseModel

app = Firecrawl(api_key="fc-YOUR_API_KEY")

class CompanyInfo(BaseModel):
    company_mission: str
    is_open_source: bool
    is_in_yc: bool

result = app.scrape(
    'https://firecrawl.dev',
    formats=[{"type": "json", "schema": CompanyInfo.model_json_schema()}]
)

print(result.json)
Response:
{
  "success": true,
  "data": {
    "json": {
      "company_mission": "Turn websites into LLM-ready data",
      "is_open_source": true,
      "is_in_yc": true
    }
  }
}
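
`model_json_schema()` returns a JSON Schema as a plain dictionary, so the same format entry can be sketched without Pydantic. This assumes a hand-written schema is accepted in the same position as the generated one. `COMPANY_INFO_SCHEMA`, `json_format`, and `missing_fields` are hypothetical names introduced for illustration.

```python
from typing import Any

# Hand-written JSON Schema mirroring the CompanyInfo model above.
COMPANY_INFO_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "company_mission": {"type": "string"},
        "is_open_source": {"type": "boolean"},
        "is_in_yc": {"type": "boolean"},
    },
    "required": ["company_mission", "is_open_source", "is_in_yc"],
}


def json_format(schema: dict[str, Any]) -> dict[str, Any]:
    """Build a formats entry shaped like the one passed to scrape() above."""
    return {"type": "json", "schema": schema}


def missing_fields(result: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """List required schema fields that are absent from an extraction result."""
    return [name for name in schema.get("required", []) if name not in result]


# Sample result copied from the response example above.
result = {
    "company_mission": "Turn websites into LLM-ready data",
    "is_open_source": True,
    "is_in_yc": True,
}
print(json_format(COMPANY_INFO_SCHEMA)["type"])
print(missing_fields(result, COMPANY_INFO_SCHEMA))
```

Checking for missing required fields after extraction is a cheap guard, since AI-based extraction may not always fill every field.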

With Prompt (No Schema)

Extract data using natural language without defining a strict schema:
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

result = app.scrape(
    'https://firecrawl.dev',
    formats=[{"type": "json", "prompt": "Extract the company mission"}]
)

print(result.json)

Branding

Extract brand identity information including colors, fonts, and typography.
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

doc = app.scrape("https://firecrawl.dev", formats=["branding"])
print(doc.branding)
Response:
{
  "success": true,
  "data": {
    "branding": {
      "logo": "https://firecrawl.dev/logo.png",
      "colors": {
        "primary": "#FF6B35",
        "secondary": "#004E89"
      },
      "fonts": [
        {"family": "Inter"},
        {"family": "Roboto"}
      ],
      "typography": {
        "headingFont": "Inter",
        "bodyFont": "Roboto"
      }
    }
  }
}
Branding extraction derives information by executing on-page JavaScript to analyze computed styles and detect brand assets.
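
One way to put the branding data to use is to turn the colors into CSS custom properties. The sketch below assumes the branding value is a dictionary shaped like the response above. `css_variables` and the `--color-` prefix are hypothetical choices, not something Firecrawl produces.

```python
from typing import Any


def css_variables(branding: dict[str, Any]) -> str:
    """Render the colors of a branding dict as a CSS :root block."""
    colors = branding.get("colors") or {}
    # Sort by name so the output is stable across runs.
    lines = [f"  --color-{name}: {value};" for name, value in sorted(colors.items())]
    return ":root {\n" + "\n".join(lines) + "\n}"


# Sample data trimmed from the response example above.
branding = {
    "colors": {"primary": "#FF6B35", "secondary": "#004E89"},
    "fonts": [{"family": "Inter"}, {"family": "Roboto"}],
}
print(css_variables(branding))
```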

Change Tracking

Track changes to web pages over time using the changeTracking format.
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

doc = app.scrape(
    url="https://example.com",
    formats=["markdown", "changeTracking"],
    change_tracking_options={
        "modes": ["git-diff"]
    }
)

if doc.change_tracking:
    print(f"Status: {doc.change_tracking['changeStatus']}")
    if doc.change_tracking.get('diff'):
        print(doc.change_tracking['diff'])
Change tracking requires the markdown format to also be specified. The first scrape establishes a baseline for future comparisons.
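
In git-diff mode, a quick summary of how much changed is often more useful than the full diff. The sketch below counts added and removed lines, assuming you have the diff as unified-diff text for a single page. How that text is nested inside the `diff` value is not specified here, and both `diff_stats` and the sample diff are hypothetical.

```python
def diff_stats(diff_text: str) -> tuple[int, int]:
    """Count added and removed lines in unified-diff text for one file."""
    added = removed = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            # Hunk header: content lines follow.
            in_hunk = True
            continue
        if not in_hunk:
            # Skip the "---" / "+++" file headers before the first hunk.
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


# Hypothetical diff text standing in for the tracked diff of a page.
sample = """--- a/page.md
+++ b/page.md
@@ -1,2 +1,2 @@
 # Pricing
-Starter: $10
+Starter: $12
"""
print(diff_stats(sample))
```

Ignoring everything before the first hunk header keeps the file header lines from being miscounted as content changes.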

Combining Multiple Formats

You can request multiple formats in a single scrape to get different views of the same content:
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

doc = app.scrape(
    url="https://firecrawl.dev",
    formats=["markdown", "html", "screenshot", "links", "branding"]
)

print(f"Title: {doc.metadata['title']}")
print(f"Links found: {len(doc.links)}")
print(f"Primary color: {doc.branding['colors']['primary']}")
print(f"Content length: {len(doc.markdown)} chars")
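
The print statements above assume every requested format came back populated. As a sketch of a more defensive version, the helper below falls back to empty values when an attribute is missing or `None`. `summarize` is a hypothetical name, the `SimpleNamespace` object is a stand-in for a real scrape result, and branding is assumed to be dictionary-like as in the example above.

```python
from types import SimpleNamespace
from typing import Any


def summarize(doc: Any) -> dict[str, Any]:
    """Collect summary fields, tolerating formats that were not returned."""
    links = getattr(doc, "links", None) or []
    markdown = getattr(doc, "markdown", None) or ""
    branding = getattr(doc, "branding", None) or {}
    return {
        "link_count": len(links),
        "content_length": len(markdown),
        "primary_color": (branding.get("colors") or {}).get("primary"),
    }


# Stand-in object; a real scrape result is assumed to expose the same attributes.
doc = SimpleNamespace(markdown="# Hello", links=["https://firecrawl.dev/blog"], branding=None)
print(summarize(doc))
```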

Best Practices

Request only what you need: Each format adds to processing time and response size. Only request formats you’ll actually use.
Use JSON extraction for structured data: Instead of parsing markdown or HTML yourself, use JSON format with a schema to extract exactly what you need.
Combine formats strategically: Request markdown + links for content analysis and navigation discovery, or screenshot + branding for visual analysis.
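
The pairings suggested above can be captured as named presets so that calling code asks for a task rather than repeating format lists. This is a minimal sketch. The preset names, `FORMAT_PRESETS`, and `formats_for` are hypothetical, and falling back to markdown only is an assumption based on markdown being the default format.

```python
# Hypothetical presets based on the pairings suggested above.
FORMAT_PRESETS: dict[str, list[str]] = {
    "content_analysis": ["markdown", "links"],
    "visual_analysis": ["screenshot", "branding"],
}


def formats_for(task: str) -> list[str]:
    """Return the formats for a named task, defaulting to markdown only."""
    # Copy the list so callers cannot mutate the shared preset.
    return list(FORMAT_PRESETS.get(task, ["markdown"]))


print(formats_for("content_analysis"))
print(formats_for("unknown_task"))
```

The returned list can be passed as the `formats` argument of a scrape call.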