2026-06-24

Firecrawl + AI Agents: Web Scraping at Scale With Clean Markdown Output (2026)

Firecrawl turns any website into LLM-ready markdown or structured JSON. This guide shows how to integrate it with AI agents for scraping, crawling, and data extraction.

2026-06-23

Firecrawl is the web scraping API built for AI agents. Search, scrape, crawl, and interact with any website — get clean markdown or structured JSON back. This guide covers setup, the Agent endpoint, MCP integration, and real workflows with OpenClaw.


Every AI agent that does real work eventually needs data from the web. Competitor pricing, news articles, product catalogs, research papers, documentation — the agent needs to fetch pages, extract content, and turn it into something it can reason about.

Traditional web scraping tools (BeautifulSoup, Scrapy, Puppeteer) were built for humans writing scripts, not for AI agents. They rely on hand-written CSS selectors. BeautifulSoup and Scrapy cannot render JavaScript on their own. All three are easily blocked by anti-bot protections, and they output raw HTML that eats thousands of tokens before the useful content appears.

Firecrawl was built for the AI agent era. You give it a URL or a search query, and it returns clean markdown, structured JSON, or screenshots. It handles proxies, rate limits, JS rendering, and anti-bot protections automatically. Firecrawl reports a P95 latency of 3.4 seconds across millions of pages and 96% web coverage, including JS-heavy SPAs.

This guide covers what Firecrawl does, how to integrate it with AI agents (OpenClaw, Claude Code, any MCP client), and real workflows for scraping, crawling, and data extraction.

What Firecrawl Does

Firecrawl is an API with seven core endpoints, each solving a specific web data problem:

| Endpoint | What It Does | Use Case |
|---|---|---|
| Search | Search the web and get full page content from results | "Find me the latest articles about CRISPR" |
| Scrape | Convert any URL to markdown, HTML, screenshots, or structured JSON | "Get the content of this blog post as clean markdown" |
| Interact | Scrape a page, then interact with it: click, scroll, type, wait | "Go to Amazon, search for keyboards, click the first result" |
| Agent | Describe what data you need; the agent searches, navigates, and retrieves it | "Find the pricing plans for Notion" |
| Crawl | Scrape all URLs on a website with a single request | "Get all pages from docs.firecrawl.dev" |
| Map | Discover all URLs on a website instantly | "List every URL on firecrawl.dev" |
| Batch Scrape | Scrape thousands of URLs asynchronously | "Scrape these 500 product pages" |

The key differentiator is output format. Firecrawl does not return raw HTML. It returns:

  • Clean markdown — the page content as well-formatted markdown, with navigation, ads, footers, and boilerplate removed
  • Structured JSON — define a schema with Pydantic or JSON Schema, and Firecrawl extracts data matching that schema
  • Screenshots — full-page or viewport screenshots
  • Links — all links on the page, with anchor text
  • Raw HTML — if you need it, but you rarely will

This makes Firecrawl well suited to AI agent workflows. Instead of feeding an agent 50KB of HTML and hoping it finds the right content, you feed it around 3KB of clean markdown. The result is fewer tokens, better reasoning, and faster responses.
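To make the token argument concrete, here is a rough sketch comparing an HTML fragment with its markdown equivalent. It assumes the common rule of thumb of about four characters per token for English text. Real tokenizers differ, so treat the numbers as estimates. The `estimate_tokens` helper and the sample strings are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using a characters-per-token heuristic (an assumption)."""
    return math.ceil(len(text) / chars_per_token)


# Hypothetical page fragment: navigation, styling, and footer around one useful sentence.
html_page = (
    "<html><head><style>.nav{display:flex}</style></head><body>"
    '<nav class="nav"><a href="/">Home</a><a href="/blog">Blog</a></nav>'
    '<main><h1 class="title">Pricing</h1>'
    '<p class="lead">The Pro plan costs $20 per month.</p></main>'
    "<footer>&copy; Example Inc.</footer></body></html>"
)

# The same useful content as clean markdown, with the boilerplate removed.
markdown_page = "# Pricing\n\nThe Pro plan costs $20 per month.\n"

html_tokens = estimate_tokens(html_page)
md_tokens = estimate_tokens(markdown_page)
print(f"HTML: ~{html_tokens} tokens, markdown: ~{md_tokens} tokens")
```

On a real page, navigation, scripts, and styles make the gap far larger than in this toy fragment.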

Quick Start

Step 1 — Get an API key

Sign up at firecrawl.dev to get your API key. The free tier includes 500 credits per month — enough for testing and small projects.

Step 2 — Install the SDK

Python:

pip install firecrawl-py
          

Node.js:

npm install firecrawl
          

CLI (no code):

npm install -g firecrawl-cli
          

Step 3 — First scrape

from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

result = app.scrape("https://firecrawl.dev")
print(result.markdown)

Output:

# Firecrawl

Firecrawl helps AI systems search, scrape, and interact with the web.

## Features
- Search: Find information across the web
- Scrape: Clean data from any page
- Interact: Click, navigate, and operate pages
- Agent: Autonomous data gathering

That is it. No CSS selectors, no HTML parsing, no proxy configuration. One API call, clean markdown.

The Agent Endpoint

Firecrawl's most powerful feature is the /agent endpoint. You describe what data you need in natural language — the agent searches the web, navigates pages, extracts the data, and returns it. No URLs required.

Basic agent call

result = app.agent(prompt="Find the pricing plans for Notion")
print(result.data)

Output:

{
  "result": "Notion offers the following pricing plans:\n\n1. Free - $0/month...\n2. Plus - $10/seat/month...\n3. Business - $18/seat/month...",
  "sources": ["https://www.notion.so/pricing"]
}

The agent searched the web, found Notion's pricing page, scraped it, extracted the pricing information, and returned it as structured text with the source URL.

Agent with structured output

Define a schema, and the agent returns data matching it:

from pydantic import BaseModel, Field
from typing import List, Optional

class Founder(BaseModel):
    name: str = Field(description="Full name of the founder")
    role: Optional[str] = Field(None, description="Role or position")

class FoundersSchema(BaseModel):
    founders: List[Founder] = Field(description="List of founders")

result = app.agent(
    prompt="Find the founders of Firecrawl",
    schema=FoundersSchema
)

print(result.data)

Output:

{
  "founders": [
    {"name": "Eric Ciarla", "role": "Co-founder"},
    {"name": "Nicolas Camara", "role": "Co-founder"},
    {"name": "Caleb Peffer", "role": "Co-founder"}
  ]
}
          
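The agent can return duplicates or half-filled entries when it merges several sources. A light post-processing pass before storing the data can catch these. The sketch below uses only the standard library. It assumes the result has already been converted to a plain dict shaped like the output above. `Founder` (a dataclass here, not the Pydantic model) and `parse_founders` are hypothetical helpers, not part of the Firecrawl SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Founder:
    name: str
    role: Optional[str] = None


def parse_founders(payload: dict) -> list[Founder]:
    """Validate and de-duplicate the 'founders' list from an agent result dict."""
    founders = payload.get("founders")
    if not isinstance(founders, list):
        raise ValueError("expected a 'founders' list in the agent result")
    seen: set[str] = set()
    parsed: list[Founder] = []
    for item in founders:
        name = (item.get("name") or "").strip()
        if not name:
            continue  # skip entries the agent could not fill in
        key = name.casefold()
        if key in seen:
            continue  # drop the same person reported by multiple sources
        seen.add(key)
        parsed.append(Founder(name=name, role=item.get("role")))
    return parsed


# Hypothetical payload with a duplicate and an empty entry.
raw = '{"founders": [{"name": "Eric Ciarla", "role": "Co-founder"}, {"name": "eric ciarla"}, {"name": ""}]}'
print(parse_founders(json.loads(raw)))
```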

Model selection

Two models power the agent:

| Model | Cost | Best For |
|---|---|---|
| spark-1-mini (default) | 60% cheaper | Most tasks |
| spark-1-pro | Standard | Complex research, multi-site comparison, critical accuracy |

result = app.agent(
    prompt="Compare enterprise features across Firecrawl, Apify, and ScrapingBee",
    model="spark-1-pro"
)

Use Pro when the agent needs to explore multiple sites, handle complex navigation, or compare data across sources.

MCP Integration for AI Agents

Firecrawl ships with an MCP (Model Context Protocol) server. This means any MCP-compatible AI agent — Claude Code, OpenClaw, Antigravity, OpenCode — can use Firecrawl as a tool with zero custom code.

Setup

Add to your MCP client config:

{
  "mcpServers": {
    "firecrawl-mcp": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "fc-YOUR_API_KEY"
      }
    }
  }
}

Restart your agent. It now has access to Firecrawl's search, scrape, crawl, and agent endpoints as MCP tools.
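If your MCP client config already lists other servers, you may prefer to add the Firecrawl entry programmatically rather than editing the JSON by hand. This sketch merges the entry shown above into an existing config dict. It reads the key from a `FIRECRAWL_API_KEY` environment variable instead of hard-coding it. `add_firecrawl_server` is a hypothetical helper, and the location of the config file varies by client.

```python
import copy
import json
import os
from typing import Optional


def add_firecrawl_server(config: dict, api_key: Optional[str] = None) -> dict:
    """Return a copy of an MCP client config with a firecrawl-mcp server entry added."""
    key = api_key or os.environ.get("FIRECRAWL_API_KEY")
    if not key:
        raise ValueError("set FIRECRAWL_API_KEY or pass api_key explicitly")
    merged = copy.deepcopy(config)  # leave the caller's config untouched
    servers = merged.setdefault("mcpServers", {})
    servers["firecrawl-mcp"] = {
        "command": "npx",
        "args": ["-y", "firecrawl-mcp"],
        "env": {"FIRECRAWL_API_KEY": key},
    }
    return merged


# Hypothetical existing config that already lists another server.
existing = {"mcpServers": {"other-server": {"command": "other-mcp"}}}
updated = add_firecrawl_server(existing, api_key="fc-YOUR_API_KEY")
print(json.dumps(updated, indent=2))
```

Write the result back with `json.dump` to your client's config file. Keep that file out of version control, since it contains the key.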

One-command skill install

For agents that support skills (Claude Code, OpenClaw):

npx -y firecrawl-cli@latest init --all --browser
          

This installs the Firecrawl skill, CLI, and browser integration. After restart, your agent can scrape pages, search the web, and interact with sites by calling Firecrawl tools.

Agent onboarding

Firecrawl has a self-onboarding endpoint for AI agents. If you are an agent reading this:

curl -s https://firecrawl.dev/agent-onboarding/SKILL.md
          

This returns a skill file that walks the agent through signing up a user, getting an API key, and starting to scrape.

Real Workflows With AI Agents

Workflow 1: Competitor monitoring

Every Monday at 8 AM, scrape the pricing pages of these 5 competitors:
- competitor1.com/pricing
- competitor2.com/pricing
- competitor3.com/pricing
- competitor4.com/pricing
- competitor5.com/pricing

Extract the pricing tiers, feature lists, and any promotional offers.
Compare to last week's data. If anything changed, send me a Telegram message
with the diff and a link to the changed page.

The agent uses Firecrawl to scrape each pricing page and extracts structured data using a schema. It then stores the data, compares it to the previous week's snapshot, and sends a Telegram alert if anything changed.
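Once the pricing data is structured, the "compare to last week" step is ordinary data diffing. Here is a minimal sketch. It assumes each weekly snapshot is stored as a dict mapping a competitor URL to its tiers (tier name to monthly price). The snapshot shape and `diff_pricing` are hypothetical, and sending the Telegram message is left out.

```python
from typing import Dict

Snapshot = Dict[str, Dict[str, float]]  # url -> {tier name: monthly price}


def diff_pricing(previous: Snapshot, current: Snapshot) -> list[str]:
    """Describe added, removed, and re-priced tiers between two snapshots."""
    changes: list[str] = []
    for url in sorted(set(previous) | set(current)):
        old = previous.get(url, {})
        new = current.get(url, {})
        for tier in sorted(set(old) | set(new)):
            if tier not in old:
                changes.append(f"{url}: new tier {tier!r} at ${new[tier]:g}")
            elif tier not in new:
                changes.append(f"{url}: tier {tier!r} removed")
            elif old[tier] != new[tier]:
                changes.append(f"{url}: {tier!r} ${old[tier]:g} -> ${new[tier]:g}")
    return changes


last_week = {"competitor1.com/pricing": {"Pro": 20.0, "Team": 50.0}}
this_week = {"competitor1.com/pricing": {"Pro": 25.0, "Enterprise": 99.0}}
for line in diff_pricing(last_week, this_week):
    print(line)
```

If a scrape fails, skip that competitor for the week rather than storing an empty dict. Otherwise every one of its tiers will be reported as removed.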

Workflow 2: Content research

Research the topic "MCP server architecture" for an article I'm writing.

1. Search for "MCP server architecture" and get the top 10 results
2. Scrape each result and extract the main content as markdown
3. Identify key themes, definitions, code examples, and quotes
4. Compile a research brief with sources cited
5. Suggest an article outline based on the research

The agent uses /search to find sources, /scrape to get content, and its LLM to synthesize a research brief. Total time: under 2 minutes. Total cost: ~50 Firecrawl credits.
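Before handing scraped results to the LLM, it helps to de-duplicate sources and give each one a stable citation number, so the brief can cite [1], [2], and so on. Here is a sketch. It assumes each search result has been reduced to a `(url, markdown)` pair. `normalize_url` and `build_source_list` are hypothetical helpers, and the URL normalization is deliberately simple.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Lower-case the host and drop query, fragment, and trailing slash.

    Deliberately simple: keep the query if your sources use it to identify pages.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return f"{parts.scheme}://{parts.netloc.lower()}{path}"


def build_source_list(results: list[tuple[str, str]]) -> tuple[list[str], str]:
    """Return (unique source URLs, LLM context with numbered source headers)."""
    seen: set[str] = set()
    sources: list[str] = []
    sections: list[str] = []
    for url, markdown in results:
        key = normalize_url(url)
        if key in seen:
            continue  # the same page returned twice under different URLs
        seen.add(key)
        sources.append(url)
        sections.append(f"[{len(sources)}] {url}\n{markdown.strip()}")
    return sources, "\n\n".join(sections)


# Hypothetical scraped results; the second is a duplicate of the first.
results = [
    ("https://example.com/mcp-guide", "# MCP guide\n..."),
    ("https://Example.com/mcp-guide/#intro", "# MCP guide\n..."),
    ("https://example.org/servers", "# Servers\n..."),
]
sources, context = build_source_list(results)
print(sources)
```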

Workflow 3: Lead generation

Find all SaaS companies in the project management space that have a free tier.

For each company:
1. Find their website and pricing page
2. Extract: company name, pricing tiers, free tier limits, key features
3. Find their CEO/founder name and LinkedIn URL
4. Check if they have a blog and extract the 3 most recent article titles

Return as a structured table. Limit to 20 companies.

The agent uses /agent with a Pydantic schema, letting Firecrawl handle the web navigation and data extraction. The structured output goes directly into a CRM or spreadsheet.
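For the CRM or spreadsheet step, the simplest route is usually a CSV file, which both can import. Here is a sketch using the standard `csv` module. It assumes the agent returned a list of dicts with hypothetical keys `company`, `free_tier_limits`, `founder`, and `linkedin_url`. Rename the columns to match your own schema.

```python
import csv
import io

FIELDS = ["company", "free_tier_limits", "founder", "linkedin_url"]  # hypothetical schema keys


def leads_to_csv(leads: list[dict], limit: int = 20) -> str:
    """Serialize lead records to CSV text, leaving missing fields blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore", restval="")
    writer.writeheader()
    for lead in leads[:limit]:  # the prompt caps the list at 20 companies
        writer.writerow(lead)
    return buffer.getvalue()


rows = [{"company": "ExampleApp", "free_tier_limits": "3 projects", "founder": "Jane Doe"}]
print(leads_to_csv(rows))
```

`extrasaction="ignore"` drops any extra fields the agent adds, so one unexpected key does not break the export.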

Workflow 4: Documentation crawling

Crawl the entire documentation site at docs.example.com.

1. Use /map to discover all URLs
2. Use /crawl to scrape every page as markdown
3. Save each page as a markdown file in ~/knowledge-base/example-docs/
4. Create an index.md file with all page titles and links
5. Count total pages scraped and total word count

The agent uses /map to discover URLs and /crawl to scrape them all. The result is a local knowledge base of clean markdown files — ready for RAG, search, or agent reference.
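Once the crawl returns markdown, steps 3 to 5 of that prompt are plain file handling. Here is a sketch. It assumes the crawl results have been reduced to `(url, title, markdown)` tuples. The slug scheme, `slugify`, and `save_knowledge_base` are hypothetical. In practice you would pass `"~/knowledge-base/example-docs"` as the output directory; the usage below writes to a temporary one.

```python
import re
import tempfile
from pathlib import Path
from urllib.parse import urlsplit


def slugify(url: str) -> str:
    """Map a URL path to a file name, e.g. /guides/setup -> guides-setup.md."""
    path = urlsplit(url).path.strip("/") or "home"
    slug = re.sub(r"[^A-Za-z0-9]+", "-", path).strip("-").lower() or "page"
    return f"{slug}.md"


def save_knowledge_base(pages, out_dir):
    """Write one markdown file per (url, title, markdown) page plus index.md.

    Returns (page_count, total_words); words are counted as runs of word characters.
    """
    out = Path(out_dir).expanduser()
    out.mkdir(parents=True, exist_ok=True)
    used = {"index.md"}  # reserve the index file name
    index_lines = ["# Index", ""]
    count = total_words = 0
    for url, title, markdown in pages:
        name = slugify(url)
        stem, n = name[:-3], 2
        while name in used:  # avoid overwriting when two URLs share a slug
            name = f"{stem}-{n}.md"
            n += 1
        used.add(name)
        (out / name).write_text(markdown, encoding="utf-8")
        index_lines.append(f"- [{title}]({name}) ({url})")
        total_words += len(re.findall(r"\w+", markdown))
        count += 1
    (out / "index.md").write_text("\n".join(index_lines) + "\n", encoding="utf-8")
    return count, total_words


with tempfile.TemporaryDirectory() as tmp:
    pages = [("https://docs.example.com/guides/setup", "Setup", "# Setup\n\nInstall the CLI.")]
    print(save_knowledge_base(pages, tmp))
```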

Workflow 5: Interactive form submission

Go to the government grant application portal at grants.gov.

1. Scrape the homepage to understand the navigation
2. Search for "AI research" grants
3. Click the first result
4. Extract: grant name, deadline, eligibility, award amount, application URL
5. If there is an "Apply" button, click it and tell me what form fields are required

The agent uses /scrape to get the page, then /interact to click, type, and navigate. It can fill out forms, click buttons, and wait for dynamic content to load — all through the API.

Firecrawl vs Traditional Scraping Tools

| | Firecrawl | BeautifulSoup + requests | Scrapy | Puppeteer/Playwright |
|---|---|---|---|---|
| JS rendering | Yes, handled | No | No (needs Splash) | Yes |
| Proxy rotation | Yes, automatic | No | No (needs middleware) | No |
| Anti-bot bypass | Yes, for most sites | No | No | Partial |
| Output format | Clean markdown, JSON, screenshots | Raw HTML | Raw HTML | Raw HTML / screenshots |
| Schema extraction | Yes, Pydantic/JSON Schema | Manual parsing | Manual parsing | Manual parsing |
| Setup time | 5 minutes | 30+ minutes | 1+ hours | 1+ hours |
| Maintenance | Low (no selectors to maintain) | High (selectors break) | High | High |
| Agent-native | Yes (MCP, skills, agent endpoint) | No | No | No |
| Cost | Free tier (500 credits), then pay per use | Free (open source) | Free (open source) | Free (open source) |
| Scale | Thousands of pages via batch scrape | Limited by your IP | Good with infrastructure | Limited by browser resources |

The fundamental difference: traditional tools give you HTML and expect you to parse it. Firecrawl gives you data and expects you to use it.

Pricing

| Plan | Credits/month | Price | Best For |
|---|---|---|---|
| Free | 500 | $0 | Testing, small projects |
| Hobby | 2,000 | $19/mo | Personal projects, single agents |
| Standard | 5,000 | $49/mo | Small teams, regular scraping |
| Growth | 20,000 | $149/mo | Production agents, frequent crawling |
| Scale | 100,000 | $499/mo | Enterprise, high-volume batch scraping |

One credit = one page scraped or one search query. Agent endpoint uses 1-5 credits depending on complexity and model choice (spark-1-mini is 60% cheaper than spark-1-pro).
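To check which plan fits, you can estimate monthly credits from expected usage. This sketch uses only the rules stated above: one credit per page or search, and 1 to 5 credits per agent call. It takes the worst case of 5 credits per agent call, so runs on spark-1-mini will usually come in lower. The usage numbers, `estimate_monthly_credits`, `smallest_plan`, and the "Custom" label are hypothetical, and actual billing may differ.

```python
# Plan allowances from the table above: (name, credits per month).
PLANS = [("Free", 500), ("Hobby", 2_000), ("Standard", 5_000), ("Growth", 20_000), ("Scale", 100_000)]


def estimate_monthly_credits(pages: int, searches: int, agent_calls: int,
                             credits_per_agent_call: int = 5) -> int:
    """Upper-bound estimate: 1 credit per page or search, plus agent calls."""
    return pages + searches + agent_calls * credits_per_agent_call


def smallest_plan(credits: int) -> str:
    """Return the cheapest listed plan whose allowance covers the estimate."""
    for name, included in PLANS:
        if credits <= included:
            return name
    return "Custom"  # hypothetical label for usage beyond the largest listed plan


credits = estimate_monthly_credits(pages=1_500, searches=200, agent_calls=100)
print(credits, smallest_plan(credits))  # 2200 Standard
```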

Self-Hosting

Firecrawl is open source (AGPLv3). You can self-host the entire stack:

git clone https://github.com/firecrawl/firecrawl.git
cd firecrawl
# Create a .env file first; SELF_HOST.md in the repository lists the required variables
docker compose up -d

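This runs the Firecrawl API on localhost:3002. Point your SDK or MCP config at http://localhost:3002 instead of the hosted API, for example with the Python SDK's api_url parameter or the MCP server's FIRECRAWL_API_URL environment variable. You handle your own proxies and infrastructure: there are no credit costs, but you maintain the stack.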

For many teams, the hosted service ends up cheaper than self-hosting once you factor in proxy costs, infrastructure, and maintenance time.

SDK Reference

Python

from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Search
results = app.search("best AI tools 2026", limit=10)

# Scrape
doc = app.scrape("https://example.com", formats=["markdown", "json"])

# Crawl (auto-polls)
docs = app.crawl("https://docs.example.com", limit=100)

# Map
urls = app.map("https://example.com")

# Batch scrape (replace the placeholders with real URLs)
job = app.batch_scrape(["url1", "url2", "url3"], formats=["markdown"])

# Agent (MySchema: any Pydantic BaseModel, e.g. FoundersSchema from the Agent section)
result = app.agent(prompt="Find the founders of Stripe", schema=MySchema)

# Interact
result = app.scrape("https://amazon.com")
app.interact(result.metadata.scrape_id, prompt="Search for 'mechanical keyboard'")
          
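The crawl call above polls for you. If you start a long job asynchronously and track it yourself, the loop looks roughly like this sketch. `get_status` is a hypothetical callable that you would wire to whichever SDK or REST status call you use. The terminal status strings `"completed"` and `"failed"` are assumptions, so check them against the API reference.

```python
import time
from typing import Callable

TERMINAL_STATES = ("completed", "failed")  # assumed status strings; verify in the API reference


def wait_for_job(
    get_status: Callable[[], str],
    timeout: float = 300.0,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() with exponential backoff until a terminal state or timeout."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off so long jobs are not polled every second


# Usage with a fake status source standing in for a real SDK call.
states = iter(["scraping", "scraping", "completed"])
print(wait_for_job(lambda: next(states), sleep=lambda seconds: None))
```

The `sleep` parameter exists so the loop can be tested without real waiting. In production, leave it at the default.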

Node.js

import { Firecrawl } from 'firecrawl';

const app = new Firecrawl({ apiKey: 'fc-YOUR_API_KEY' });

// Search
const results = await app.search('best AI tools 2026', { limit: 10 });

// Scrape
const doc = await app.scrape('https://example.com');

// Crawl
const docs = await app.crawl('https://docs.example.com', { limit: 100 });

// Agent
const result = await app.agent({ prompt: 'Find the founders of Stripe' });

CLI

# Install
npm install -g firecrawl-cli

# Search
firecrawl search "best AI tools 2026" --limit 10

# Scrape
firecrawl scrape https://example.com

# Crawl
firecrawl crawl https://docs.example.com --limit 100

# Interactive
firecrawl scrape https://amazon.com
firecrawl interact exec --prompt "Search for 'mechanical keyboard'"

FAQ

Is Firecrawl free?

Yes. The free tier includes 500 credits per month (500 page scrapes or searches). Paid plans start at $19/month for 2,000 credits. You can also self-host the open-source version for unlimited usage — but you handle proxies and infrastructure.

Does Firecrawl handle JavaScript-rendered pages?

Yes. Firecrawl renders JS-heavy SPAs, React apps, Vue apps, and dynamic content. You do not need to configure anything — it is handled automatically.

Can Firecrawl bypass anti-bot protections?

Firecrawl handles proxy rotation, rate limiting, and common anti-bot measures automatically, and it reports 96% web coverage. For sites with aggressive bot detection (e.g., Cloudflare challenges), the success rate is lower. It is still typically better than scraping without proxies or a headless browser.

How is Firecrawl different from Apify or ScrapingBee?

Firecrawl is built specifically for AI agents. The output is clean markdown or structured JSON rather than raw HTML. It also ships with an MCP server, a skill install command, and an agent endpoint for autonomous data gathering. Apify and ScrapingBee are more general-purpose scraping platforms. Depending on the actor or endpoint you use, you may get raw HTML back or need to write your own extraction logic.

Can I use Firecrawl with OpenClaw?

Yes. Install the Firecrawl skill with npx -y firecrawl-cli@latest init --all --browser, or add the MCP server to your OpenClaw config. Your agent can then search, scrape, crawl, and interact with web pages through Firecrawl tools.

Does Firecrawl support authentication-protected pages?

The /interact endpoint supports clicking, typing, and navigating — so the agent can log in to a site and then scrape authenticated content. However, this requires storing credentials and should be done carefully. For API-accessible authenticated content, use the site's official API instead.
