Firecrawl supports multiple output formats to suit different use cases. You can request one or more formats in a single scrape operation.
Specify the formats you want using the formats parameter:
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
doc = app.scrape(
    url="https://firecrawl.dev",
    formats=["markdown", "html", "screenshot"]
)
Markdown
Clean, LLM-ready markdown format. This is the default format and is ideal for feeding content into AI models.
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
doc = app.scrape("https://firecrawl.dev", formats=["markdown"])
print(doc.markdown)
Response:
{
  "success": true,
  "data": {
    "markdown": "# Firecrawl Docs\n\nTurn websites into LLM-ready data...",
    "metadata": {
      "title": "Quickstart | Firecrawl",
      "description": "Firecrawl allows you to turn entire websites into LLM-ready markdown",
      "sourceURL": "https://docs.firecrawl.dev",
      "statusCode": 200
    }
  }
}
The markdown format automatically strips headers, footers, navigation, and other non-main content when onlyMainContent is true (the default).
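As a rough illustration of the response shape shown above, the following sketch parses a payload like the one in this section using only the standard library and pulls out the title, status code, and markdown length. The `summarize_scrape` helper is hypothetical (it is not part of the Firecrawl SDK), and the payload is abbreviated from the example.

```python
import json

# Sample payload abbreviated from the response shown above.
SAMPLE_RESPONSE = """
{
  "success": true,
  "data": {
    "markdown": "# Firecrawl Docs\\n\\nTurn websites into LLM-ready data...",
    "metadata": {
      "title": "Quickstart | Firecrawl",
      "sourceURL": "https://docs.firecrawl.dev",
      "statusCode": 200
    }
  }
}
"""


def summarize_scrape(raw: str) -> dict:
    """Hypothetical helper: return the title, status, and markdown length."""
    payload = json.loads(raw)
    if not payload.get("success"):
        raise ValueError("scrape did not succeed")
    data = payload["data"]
    metadata = data.get("metadata", {})
    return {
        "title": metadata.get("title"),
        "status": metadata.get("statusCode"),
        "markdown_chars": len(data.get("markdown", "")),
    }


summary = summarize_scrape(SAMPLE_RESPONSE)
print(summary["title"], summary["status"])
```

Checking `success` before reading `data` keeps a failed scrape from surfacing later as a confusing `KeyError`.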
HTML
Cleaned HTML version of the page content.
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
doc = app.scrape("https://firecrawl.dev", formats=["html"])
print(doc.html)
Raw HTML
The complete, unmodified HTML of the page including all scripts, styles, and metadata.
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
doc = app.scrape("https://firecrawl.dev", formats=["rawHtml"])
print(doc.raw_html)
Raw HTML includes everything on the page and can be very large. Use this only when you need the complete page source.
Links
Extract all links found on the page.
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
doc = app.scrape("https://firecrawl.dev", formats=["links"])
for link in doc.links:
    print(link)
Response:
{
  "success": true,
  "data": {
    "links": [
      "https://firecrawl.dev/pricing",
      "https://firecrawl.dev/blog",
      "https://docs.firecrawl.dev"
    ]
  }
}
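Once you have the list of links, a common next step is to separate them by site. The sketch below groups the sample links from the response above by hostname with the standard library; `group_links_by_host` is a hypothetical helper, not something the SDK provides.

```python
from urllib.parse import urlparse

# Links copied from the sample response above.
links = [
    "https://firecrawl.dev/pricing",
    "https://firecrawl.dev/blog",
    "https://docs.firecrawl.dev",
]


def group_links_by_host(urls):
    """Hypothetical helper: bucket URLs by their hostname."""
    grouped = {}
    for url in urls:
        host = urlparse(url).netloc
        grouped.setdefault(host, []).append(url)
    return grouped


by_host = group_links_by_host(links)
for host, urls in by_host.items():
    print(host, len(urls))
```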
Screenshot
Capture a screenshot of the page. Screenshots are returned as base64-encoded images.
from firecrawl import Firecrawl
import base64
app = Firecrawl(api_key="fc-YOUR_API_KEY")
# Viewport screenshot
doc = app.scrape("https://firecrawl.dev", formats=["screenshot"])
print(doc.screenshot) # Base64 encoded image
# Full page screenshot
doc = app.scrape("https://firecrawl.dev", formats=["screenshot@fullPage"])
print(doc.screenshot)
Use screenshot for viewport-sized screenshots or screenshot@fullPage to capture the entire page including content below the fold.
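Because the screenshot arrives as a base64 string, you typically decode it before saving or displaying it. The following is a minimal sketch under the assumption that the value is either bare base64 or a data URI with a `data:image/...;base64,` prefix; `decode_screenshot` is a hypothetical helper, and the payload here is a stand-in rather than a real screenshot.

```python
import base64
import tempfile
from pathlib import Path


def decode_screenshot(encoded: str) -> bytes:
    """Hypothetical helper: decode a base64 screenshot string to raw bytes.

    Assumption: the value is either bare base64 or a data URI; the
    "data:...;base64," prefix is stripped when present.
    """
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    return base64.b64decode(encoded)


# Stand-in payload: the 8-byte PNG signature, not a real screenshot.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
fake_screenshot = "data:image/png;base64," + base64.b64encode(PNG_SIGNATURE).decode("ascii")

image_bytes = decode_screenshot(fake_screenshot)

# Write the decoded bytes to a file in the system temp directory.
out_path = Path(tempfile.gettempdir()) / "screenshot.png"
out_path.write_bytes(image_bytes)
print(f"Wrote {len(image_bytes)} bytes to {out_path}")
```

In real use you would pass `doc.screenshot` to the helper instead of the stand-in string.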
JSON
Extract structured data from pages using AI, guided by either a schema or a prompt.
With Schema
Define a precise structure for the data you want to extract:
from firecrawl import Firecrawl
from pydantic import BaseModel
app = Firecrawl(api_key="fc-YOUR_API_KEY")
class CompanyInfo(BaseModel):
    company_mission: str
    is_open_source: bool
    is_in_yc: bool

result = app.scrape(
    'https://firecrawl.dev',
    formats=[{"type": "json", "schema": CompanyInfo.model_json_schema()}]
)
print(result.json)
Response:
{
  "success": true,
  "data": {
    "json": {
      "company_mission": "Turn websites into LLM-ready data",
      "is_open_source": true,
      "is_in_yc": true
    }
  }
}
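Since the extraction is produced by a model, it can be worth checking the returned object before relying on it. The sketch below is one dependency-free way to do that: it compares the `json` object from the sample response against a hand-written field map that mirrors `CompanyInfo`. The `check_fields` helper and `EXPECTED_FIELDS` mapping are hypothetical names introduced for illustration.

```python
# Expected fields and types, mirroring the CompanyInfo model above.
EXPECTED_FIELDS = {
    "company_mission": str,
    "is_open_source": bool,
    "is_in_yc": bool,
}


def check_fields(extracted: dict, expected: dict) -> list:
    """Hypothetical helper: return a list of problems (empty when valid)."""
    problems = []
    for name, expected_type in expected.items():
        if name not in extracted:
            problems.append(f"missing field: {name}")
        elif not isinstance(extracted[name], expected_type):
            actual = type(extracted[name]).__name__
            problems.append(f"wrong type for {name}: {actual}")
    return problems


# The "json" object from the sample response above.
extracted = {
    "company_mission": "Turn websites into LLM-ready data",
    "is_open_source": True,
    "is_in_yc": True,
}

print(check_fields(extracted, EXPECTED_FIELDS))
print(check_fields({"company_mission": 42}, EXPECTED_FIELDS))
```

If Pydantic is already installed, `CompanyInfo.model_validate(extracted)` performs a more thorough version of the same check.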
With Prompt (No Schema)
Extract data using natural language without defining a strict schema:
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
result = app.scrape(
    'https://firecrawl.dev',
    formats=[{"type": "json", "prompt": "Extract the company mission"}]
)
print(result.json)
Branding
Extract brand identity information including colors, fonts, and typography.
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
doc = app.scrape("https://firecrawl.dev", formats=["branding"])
print(doc.branding)
Response:
{
  "success": true,
  "data": {
    "branding": {
      "logo": "https://firecrawl.dev/logo.png",
      "colors": {
        "primary": "#FF6B35",
        "secondary": "#004E89"
      },
      "fonts": [
        {"family": "Inter"},
        {"family": "Roboto"}
      ],
      "typography": {
        "headingFont": "Inter",
        "bodyFont": "Roboto"
      }
    }
  }
}
Branding extraction derives information by executing on-page JavaScript to analyze computed styles and detect brand assets.
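To show how the branding data might be consumed downstream, the sketch below converts the hex colors from the sample response into RGB tuples and collects the font family names. It assumes colors arrive as `#RRGGBB` strings, as in the example; `hex_to_rgb` is a hypothetical helper.

```python
def hex_to_rgb(color: str) -> tuple:
    """Convert a "#RRGGBB" string into an (r, g, b) tuple of ints."""
    value = color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected #RRGGBB, got {color!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


# The "branding" object from the sample response above (trimmed).
branding = {
    "colors": {"primary": "#FF6B35", "secondary": "#004E89"},
    "fonts": [{"family": "Inter"}, {"family": "Roboto"}],
}

palette = {name: hex_to_rgb(value) for name, value in branding["colors"].items()}
font_families = [font["family"] for font in branding["fonts"]]

print(palette)
print(font_families)
```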
Change Tracking
Track changes to web pages over time using the changeTracking format.
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
doc = app.scrape(
    url="https://example.com",
    formats=["markdown", "changeTracking"],
    change_tracking_options={
        "modes": ["git-diff"]
    }
)
if doc.change_tracking:
    print(f"Status: {doc.change_tracking['changeStatus']}")
    if doc.change_tracking.get('diff'):
        print(doc.change_tracking['diff'])
Change tracking requires the markdown format to also be specified. The first scrape establishes a baseline for future comparisons.
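To give a feel for what a git-style diff between two markdown snapshots looks like, the sketch below uses `difflib` from Python's standard library. It is a local illustration only and makes no claim about how Firecrawl computes changes; the two snapshots are made up, and `markdown_diff` is a hypothetical helper.

```python
import difflib

# Made-up markdown snapshots standing in for two scrapes of the same page.
previous = "# Pricing\n\nStarter: $10/month\nPro: $50/month\n"
current = "# Pricing\n\nStarter: $12/month\nPro: $50/month\n"


def markdown_diff(old: str, new: str) -> str:
    """Return a unified diff between two markdown strings ("" if identical)."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="previous.md",
        tofile="current.md",
    )
    return "".join(lines)


diff_text = markdown_diff(previous, current)
print(diff_text if diff_text else "no changes")
```

The first snapshot plays the role of the baseline mentioned above: without a previous version there is nothing to compare against.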
You can request multiple formats in a single scrape to get different views of the same content:
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
doc = app.scrape(
    url="https://firecrawl.dev",
    formats=["markdown", "html", "screenshot", "links", "branding"]
)
print(f"Title: {doc.metadata['title']}")
print(f"Links found: {len(doc.links)}")
print(f"Primary color: {doc.branding['colors']['primary']}")
print(f"Content length: {len(doc.markdown)} chars")
Best Practices
Request only what you need: Each format adds to processing time and response size. Only request formats you’ll actually use.
Use JSON extraction for structured data: Instead of parsing markdown or HTML yourself, use JSON format with a schema to extract exactly what you need.
Combine formats strategically: Request markdown + links for content analysis and navigation discovery, or screenshot + branding for visual analysis.
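One way to keep format requests minimal is to name the combinations you use and build the `formats` list from them. The sketch below does this for the two pairings suggested above; the preset names, `FORMAT_PRESETS`, and `formats_for` are all hypothetical and exist only for illustration.

```python
# Hypothetical presets based on the combinations suggested above.
FORMAT_PRESETS = {
    "content_analysis": ["markdown", "links"],
    "visual_analysis": ["screenshot", "branding"],
}


def formats_for(*tasks: str) -> list:
    """Merge the presets for the given tasks, keeping order and dropping duplicates."""
    merged = []
    for task in tasks:
        for fmt in FORMAT_PRESETS[task]:
            if fmt not in merged:
                merged.append(fmt)
    return merged


formats = formats_for("content_analysis")
print(formats)
```

The resulting list can be passed directly as the `formats` argument of `app.scrape`.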