Self-hosting Firecrawl gives you full control over your web scraping infrastructure, allowing you to run Firecrawl on your own servers or local environment.
Why Self-Host?
Self-hosting Firecrawl is particularly beneficial for organizations with stringent security policies that require data to remain within controlled environments. Here are some key reasons to consider self-hosting:
- Enhanced Security and Compliance: By self-hosting, you ensure that all data handling and processing complies with internal and external regulations, keeping sensitive information within your secure infrastructure. Note that Firecrawl is a Mendable product and is backed by SOC 2 Type 2 certification, which means the platform adheres to high industry standards for managing data security.
- Customizable Services: Self-hosting allows you to tailor the services, such as the Playwright service, to meet specific needs or handle particular use cases that may not be supported by the standard cloud offering.
- Learning and Community Contribution: By setting up and maintaining your own instance, you gain a deeper understanding of how Firecrawl works, which can also lead to more meaningful contributions to the project.
Cloud vs Self-Hosted
Firecrawl is open source under the AGPL-3.0 license. The cloud version at firecrawl.dev includes additional features:
Cloud Features
The cloud version includes:
- Advanced AI capabilities: Agent endpoint for autonomous data gathering
- Fire-engine: Advanced features for handling IP blocks, robot detection mechanisms, and more
- Managed infrastructure: No maintenance or configuration required
- Automatic scaling: Handle any volume of requests
- Premium support: Direct support from the Firecrawl team
Self-Hosted Capabilities
When self-hosting, you have access to:
- Core scraping features: Scrape, crawl, and map endpoints
- Playwright rendering: JavaScript rendering and dynamic content support
- Custom configurations: Full control over proxy settings, resource limits, and more
- Local deployment: Run entirely within your infrastructure
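The three core endpoints can be exercised with plain HTTP. The following is a minimal sketch using only the Python standard library. It assumes your instance listens at http://localhost:3002 and exposes the v1 paths /v1/scrape, /v1/crawl, and /v1/map, so check both against your deployment and Firecrawl version. The scrape body (a url plus a formats list) is likewise an assumption about the payload shape, and build_request and send are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: a local self-hosted instance on port 3002; adjust for your deployment.
BASE_URL = "http://localhost:3002"

# Assumption: v1 REST paths for the three core self-hosted endpoints.
ENDPOINTS = {
    "scrape": "/v1/scrape",
    "crawl": "/v1/crawl",
    "map": "/v1/map",
}


def build_request(endpoint: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one of the core endpoints."""
    if endpoint not in ENDPOINTS:
        # Cloud-only endpoints such as the agent are not listed here.
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    url = base_url.rstrip("/") + ENDPOINTS[endpoint]
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request to the running instance and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_request("scrape", {"url": "https://example.com", "formats": ["markdown"]})
    print(req.get_method(), req.full_url)
    # Uncomment once your self-hosted instance is running:
    # print(send(req))
```

build_request only constructs the request, so it can be inspected without a running server; call send once the instance is up.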
The repository is still in development, and custom modules are still being integrated into the monorepo. It’s not yet fully ready for production self-hosted deployment, but you can run it locally.
Limitations and Considerations
However, there are some limitations and additional responsibilities to be aware of:
- Manual Configuration Required: If you need to use scraping methods beyond the basic fetch and Playwright options, you will need to configure them manually in the .env file. This requires a deeper understanding of the technologies involved and may take more setup time.
- No Supabase Support: Right now it’s not possible to configure Supabase in self-hosted instances, which means advanced logging and DB authentication features are not available.
- Additional Maintenance: You are responsible for updates, security patches, and infrastructure maintenance.
API Keys and Authentication
When using Firecrawl SDKs with a self-hosted instance, API keys are optional. API keys are only required when connecting to the cloud service (api.firecrawl.dev).
Self-hosted instances typically run with USE_DB_AUTHENTICATION=false, which bypasses authentication. This is suitable for local development or internal deployments behind a firewall.
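Because a key is only needed for api.firecrawl.dev, a client can decide per base URL whether to attach one. The sketch below is hypothetical (auth_headers is not part of any Firecrawl SDK) and assumes keys are sent as an Authorization: Bearer header; verify that against the API reference for your version.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Hostname of the cloud service; only this host requires an API key.
CLOUD_HOST = "api.firecrawl.dev"


def auth_headers(api_url: str, api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers, requiring an API key only for the cloud service."""
    host = urlparse(api_url).hostname or ""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: keys are sent with the Bearer scheme.
        headers["Authorization"] = f"Bearer {api_key}"
    elif host == CLOUD_HOST:
        raise ValueError("an API key is required for api.firecrawl.dev")
    # Self-hosted URLs without a key fall through with no Authorization header.
    return headers


if __name__ == "__main__":
    print(auth_headers("http://localhost:3002"))
    print(auth_headers("https://api.firecrawl.dev", "fc-YOUR-KEY"))
```

Against a self-hosted instance with USE_DB_AUTHENTICATION=false, auth_headers("http://localhost:3002") returns only the Content-Type header, which the server is expected to accept since authentication is bypassed.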