2026-06-24

Firecrawl + AI Agents: Web Scraping at Scale With Clean Markdown Output (2026)

Firecrawl turns any website into LLM-ready markdown or structured JSON. This guide shows how to integrate it with AI agents for scraping, crawling, and data extraction.

2026-06-23

Firecrawl + AI Agents: Web Scraping at Scale With Clean Markdown Output (2026)

Firecrawl turns any website into LLM-ready markdown or structured JSON. This guide shows how to integrate it with AI agents for scraping, crawling, and data extraction.

Firecrawl is the web scraping API built for AI agents. Search, scrape, crawl, and interact with any website — get clean markdown or structured JSON back. This guide covers setup, the Agent endpoint, MCP integration, and real workflows with OpenClaw.

Every AI agent that does real work eventually needs data from the web. Competitor pricing, news articles, product catalogs, research papers, documentation — the agent needs to fetch pages, extract content, and turn it into something it can reason about.

Traditional web scraping tools (BeautifulSoup, Scrapy, Puppeteer) were built for humans writing scripts, not for AI agents. They require CSS selectors, handle JS rendering poorly, break on anti-bot protections, and output raw HTML that eats thousands of tokens before the useful content appears.

Firecrawl was built for the AI agent era. You give it a URL or a search query — it returns clean markdown, structured JSON, or screenshots. It handles proxies, rate limits, JS rendering, and anti-bot protections automatically. P95 latency of 3.4 seconds across millions of pages. 96% web coverage including JS-heavy SPAs.

This guide covers what Firecrawl does, how to integrate it with AI agents (OpenClaw, Claude Code, any MCP client), and real workflows for scraping, crawling, and data extraction.

What Firecrawl Does

Firecrawl is an API with six core endpoints, each solving a specific web data problem:

Endpoint	What It Does	Use Case
Search	Search the web and get full page content from results	"Find me the latest articles about CRISPR"
Scrape	Convert any URL to markdown, HTML, screenshots, or structured JSON	"Get the content of this blog post as clean markdown"
Interact	Scrape a page, then interact with it — click, scroll, type, wait	"Go to Amazon, search for keyboards, click the first result"
Agent	Describe what data you need — the agent searches, navigates, and retrieves it	"Find the pricing plans for Notion"
Crawl	Scrape all URLs on a website with a single request	"Get all pages from docs.firecrawl.dev"
Map	Discover all URLs on a website instantly	"List every URL on firecrawl.dev"
Batch Scrape	Scrape thousands of URLs asynchronously	"Scrape these 500 product pages"

The key differentiator is output format. Firecrawl does not return raw HTML. It returns:

Clean markdown — the page content as well-formatted markdown, with navigation, ads, footers, and boilerplate removed
Structured JSON — define a schema with Pydantic or JSON Schema, and Firecrawl extracts data matching that schema
Screenshots — full-page or viewport screenshots
Links — all links on the page, with anchor text
Raw HTML — if you need it, but you rarely will

This makes Firecrawl uniquely suited for AI agent workflows. Instead of feeding an agent 50KB of HTML and hoping it finds the right content, you feed it 3KB of clean markdown. Fewer tokens, better reasoning, faster responses.

Quick Start

Step 1 — Get an API key

Sign up at firecrawl.dev to get your API key. The free tier includes 500 credits per month — enough for testing and small projects.

Step 2 — Install the SDK

Python:

pip install firecrawl-py

Node.js:

npm install firecrawl

CLI (no code):

npm install -g firecrawl-cli

Step 3 — First scrape

from firecrawl import Firecrawl

          app = Firecrawl(api_key="fc-YOUR_API_KEY")

          result = app.scrape("https://firecrawl.dev")
          print(result.markdown)

Output:

# Firecrawl

          Firecrawl helps AI systems search, scrape, and interact with the web.

          ## Features
          - Search: Find information across the web
          - Scrape: Clean data from any page
          - Interact: Click, navigate, and operate pages
          - Agent: Autonomous data gathering

That is it. No CSS selectors, no HTML parsing, no proxy configuration. One API call, clean markdown.

The Agent Endpoint

Firecrawl's most powerful feature is the /agent endpoint. You describe what data you need in natural language — the agent searches the web, navigates pages, extracts the data, and returns it. No URLs required.

Basic agent call

result = app.agent(prompt="Find the pricing plans for Notion")
          print(result.data.result)

Output:

{
            "result": "Notion offers the following pricing plans:\n\n1. Free - $0/month...\n2. Plus - $10/seat/month...\n3. Business - $18/seat/month...",
            "sources": ["https://www.notion.so/pricing"]
          }

The agent searched the web, found Notion's pricing page, scraped it, extracted the pricing information, and returned it as structured text with the source URL.

Agent with structured output

Define a schema, and the agent returns data matching it:

from pydantic import BaseModel, Field
          from typing import List, Optional

          class Founder(BaseModel):
              name: str = Field(description="Full name of the founder")
              role: Optional[str] = Field(None, description="Role or position")

          class FoundersSchema(BaseModel):
              founders: List[Founder] = Field(description="List of founders")

          result = app.agent(
              prompt="Find the founders of Firecrawl",
              schema=FoundersSchema
          )

          print(result.data)

Output:

{
            "founders": [
              {"name": "Eric Ciarla", "role": "Co-founder"},
              {"name": "Nicolas Camara", "role": "Co-founder"},
              {"name": "Caleb Peffer", "role": "Co-founder"}
            ]
          }

Model selection

Two models power the agent:

Model	Cost	Best For
`spark-1-mini` (default)	60% cheaper	Most tasks
`spark-1-pro`	Standard	Complex research, multi-site comparison, critical accuracy

result = app.agent(
              prompt="Compare enterprise features across Firecrawl, Apify, and ScrapingBee",
              model="spark-1-pro"
          )

Use Pro when the agent needs to explore multiple sites, handle complex navigation, or compare data across sources.

MCP Integration for AI Agents

Firecrawl ships with an MCP (Model Context Protocol) server. This means any MCP-compatible AI agent — Claude Code, OpenClaw, Antigravity, OpenCode — can use Firecrawl as a tool with zero custom code.

Setup

Add to your MCP client config:

{
            "mcpServers": {
              "firecrawl-mcp": {
                "command": "npx",
                "args": ["-y", "firecrawl-mcp"],
                "env": {
                  "FIRECRAWL_API_KEY": "fc-YOUR_API_KEY"
                }
              }
            }
          }

Restart your agent. It now has access to Firecrawl's search, scrape, crawl, and agent endpoints as MCP tools.

One-command skill install

For agents that support skills (Claude Code, OpenClaw):

npx -y firecrawl-cli@latest init --all --browser

This installs the Firecrawl skill, CLI, and browser integration. After restart, your agent can scrape pages, search the web, and interact with sites by calling Firecrawl tools.

Agent onboarding

Firecrawl has a self-onboarding endpoint for AI agents. If you are an agent reading this:

curl -s https://firecrawl.dev/agent-onboarding/SKILL.md

This returns a skill file that walks the agent through signing up a user, getting an API key, and starting to scrape.

Real Workflows With AI Agents

Workflow 1: Competitor monitoring

Every Monday at 8 AM, scrape the pricing pages of these 5 competitors:
          - competitor1.com/pricing
          - competitor2.com/pricing
          - competitor3.com/pricing
          - competitor4.com/pricing
          - competitor5.com/pricing

          Extract the pricing tiers, feature lists, and any promotional offers.
          Compare to last week's data. If anything changed, send me a Telegram message
          with the diff and a link to the changed page.

The agent uses Firecrawl to scrape each pricing page, extracts structured data using a schema, stores it, compares to the previous week's snapshot, and sends a Telegram alert if prices changed.

Workflow 2: Content research

Research the topic "MCP server architecture" for an article I'm writing.

          1. Search for "MCP server architecture" and get the top 10 results
          2. Scrape each result and extract the main content as markdown
          3. Identify key themes, definitions, code examples, and quotes
          4. Compile a research brief with sources cited
          5. Suggest an article outline based on the research

The agent uses /search to find sources, /scrape to get content, and its LLM to synthesize a research brief. Total time: under 2 minutes. Total cost: ~50 Firecrawl credits.

Workflow 3: Lead generation

Find all SaaS companies in the project management space that have a free tier.

          For each company:
          1. Find their website and pricing page
          2. Extract: company name, pricing tiers, free tier limits, key features
          3. Find their CEO/founder name and LinkedIn URL
          4. Check if they have a blog and extract the 3 most recent article titles

          Return as a structured table. Limit to 20 companies.

The agent uses /agent with a Pydantic schema, letting Firecrawl handle the web navigation and data extraction. The structured output goes directly into a CRM or spreadsheet.

Workflow 4: Documentation crawling

Crawl the entire documentation site at docs.example.com.

          1. Use /map to discover all URLs
          2. Use /crawl to scrape every page as markdown
          3. Save each page as a markdown file in ~/knowledge-base/example-docs/
          4. Create an index.md file with all page titles and links
          5. Count total pages scraped and total word count

The agent uses /map to discover URLs and /crawl to scrape them all. The result is a local knowledge base of clean markdown files — ready for RAG, search, or agent reference.

Workflow 5: Interactive form submission

Go to the government grant application portal at grants.gov.

          1. Scrape the homepage to understand the navigation
          2. Search for "AI research" grants
          3. Click the first result
          4. Extract: grant name, deadline, eligibility, award amount, application URL
          5. If there is an "Apply" button, click it and tell me what form fields are required

The agent uses /scrape to get the page, then /interact to click, type, and navigate. It can fill out forms, click buttons, and wait for dynamic content to load — all through the API.

Firecrawl vs Traditional Scraping Tools

	Firecrawl	BeautifulSoup + requests	Scrapy	Puppeteer/Playwright
JS rendering	Yes, handled	No	No (needs Splash)	Yes
Proxy rotation	Yes, automatic	No	No (needs middleware)	No
Anti-bot bypass	Yes	No	No	Partial
Output format	Clean markdown, JSON, screenshots	Raw HTML	Raw HTML	Raw HTML / screenshots
Schema extraction	Yes, Pydantic/JSON Schema	Manual parsing	Manual parsing	Manual parsing
Setup time	5 minutes	30+ minutes	1+ hours	1+ hours
Maintenance	Zero	High (selectors break)	High	High
Agent-native	Yes (MCP, skills, agent endpoint)	No	No	No
Cost	Free tier (500 credits), then pay per use	Free	Free	Free
Scale	Thousands of pages via batch scrape	Limited by your IP	Good with infrastructure	Limited by browser resources

The fundamental difference: traditional tools give you HTML and expect you to parse it. Firecrawl gives you data and expects you to use it.

Pricing

Plan	Credits/month	Price	Best For
Free	500	$0	Testing, small projects
Hobby	2,000	$19/mo	Personal projects, single agents
Standard	5,000	$49/mo	Small teams, regular scraping
Growth	20,000	$149/mo	Production agents, frequent crawling
Scale	100,000	$499/mo	Enterprise, high-volume batch scraping

One credit = one page scraped or one search query. Agent endpoint uses 1-5 credits depending on complexity and model choice (spark-1-mini is 60% cheaper than spark-1-pro).

Self-Hosting

Firecrawl is open source (AGPLv3). You can self-host the entire stack:

git clone https://github.com/firecrawl/firecrawl.git
          cd firecrawl
          docker-compose up -d

This runs the Firecrawl API on localhost:3002. Point your SDK or MCP config at your self-hosted instance instead of firecrawl.dev. You handle your own proxies and infrastructure — no credit costs, but you maintain the stack.

For most teams, the hosted service is cheaper than self-hosting once you factor in proxy costs, infrastructure, and maintenance time.

SDK Reference

Python

from firecrawl import Firecrawl

          app = Firecrawl(api_key="fc-YOUR_API_KEY")

          # Search
          results = app.search("best AI tools 2026", limit=10)

          # Scrape
          doc = app.scrape("https://example.com", formats=["markdown", "json"])

          # Crawl (auto-polls)
          docs = app.crawl("https://docs.example.com", limit=100)

          # Map
          urls = app.map("https://example.com")

          # Batch scrape
          job = app.batch_scrape(["url1", "url2", "url3"], formats=["markdown"])

          # Agent
          result = app.agent(prompt="Find the founders of Stripe", schema=MySchema)

          # Interact
          result = app.scrape("https://amazon.com")
          app.interact(result.metadata.scrape_id, prompt="Search for 'mechanical keyboard'")

Node.js

import { Firecrawl } from 'firecrawl';

          const app = new Firecrawl({ apiKey: 'fc-YOUR_API_KEY' });

          // Search
          const results = await app.search('best AI tools 2026', { limit: 10 });

          // Scrape
          const doc = await app.scrape('https://example.com');

          // Crawl
          const docs = await app.crawl('https://docs.example.com', { limit: 100 });

          // Agent
          const result = await app.agent({ prompt: 'Find the founders of Stripe' });

CLI

# Install
          npm install -g firecrawl-cli

          # Search
          firecrawl search "best AI tools 2026" --limit 10

          # Scrape
          firecrawl scrape https://example.com

          # Crawl
          firecrawl crawl https://docs.example.com --limit 100

          # Interactive
          firecrawl scrape https://amazon.com
          firecrawl interact exec --prompt "Search for 'mechanical keyboard'"

FAQ

Is Firecrawl free?

Yes. The free tier includes 500 credits per month (500 page scrapes or searches). Paid plans start at $19/month for 2,000 credits. You can also self-host the open-source version for unlimited usage — but you handle proxies and infrastructure.

Does Firecrawl handle JavaScript-rendered pages?

Yes. Firecrawl renders JS-heavy SPAs, React apps, Vue apps, and dynamic content. You do not need to configure anything — it is handled automatically.

Can Firecrawl bypass anti-bot protections?

Firecrawl handles proxy rotation, rate limiting, and common anti-bot measures automatically. It covers 96% of the web. For sites with aggressive bot detection (e.g., Cloudflare Challenge), the success rate is lower but still significantly better than raw scraping.

How is Firecrawl different from Apify or ScrapingBee?

Firecrawl is built specifically for AI agents. The output is clean markdown or structured JSON — not raw HTML. It has an MCP server, a skill install command, and an agent endpoint that does autonomous data gathering. Apify and ScrapingBee are general-purpose scraping APIs that output HTML or require you to write extraction logic.

Can I use Firecrawl with OpenClaw?

Yes. Install the Firecrawl skill with npx -y firecrawl-cli@latest init --all --browser, or add the MCP server to your OpenClaw config. Your agent can then search, scrape, crawl, and interact with web pages through Firecrawl tools.

Does Firecrawl support authentication-protected pages?

The /interact endpoint supports clicking, typing, and navigating — so the agent can log in to a site and then scrape authenticated content. However, this requires storing credentials and should be done carefully. For API-accessible authenticated content, use the site's official API instead.

Firecrawl + AI Agents: Web Scraping at Scale With Clean Markdown Output (2026)

Firecrawl + AI Agents: Web Scraping at Scale With Clean Markdown Output (2026)

What Firecrawl Does

Quick Start

Step 1 — Get an API key

Step 2 — Install the SDK

Step 3 — First scrape

The Agent Endpoint

Basic agent call

Agent with structured output

Model selection

MCP Integration for AI Agents

Setup

One-command skill install

Agent onboarding

Real Workflows With AI Agents

Workflow 1: Competitor monitoring

Workflow 2: Content research

Workflow 3: Lead generation

Workflow 4: Documentation crawling

Workflow 5: Interactive form submission

Firecrawl vs Traditional Scraping Tools

Pricing

Self-Hosting

SDK Reference

Python

Node.js

CLI

FAQ

Is Firecrawl free?

Does Firecrawl handle JavaScript-rendered pages?

Can Firecrawl bypass anti-bot protections?

How is Firecrawl different from Apify or ScrapingBee?

Can I use Firecrawl with OpenClaw?

Does Firecrawl support authentication-protected pages?

Related articles