Scrape URL
Extract text, HTML, or structured data from web pages
The Scrape URL block retrieves content from one or more web pages and returns it in a format you specify. You provide a single URL or a list of URLs (as a JSON array, comma-separated values, or newline-separated entries), and the block fetches the page content using either the default scraping service or Firecrawl. Optional settings let you extract only the main body content, wait a set number of milliseconds before scraping, replace relative paths with absolute URLs, remove specific HTML tags, and pass custom request headers.
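To illustrate what replacing relative paths with absolute URLs means, here is a minimal Python sketch built on the standard library's urllib.parse.urljoin. It shows the general technique only, not the block's actual implementation. The function name absolutize_links and the regex approach are hypothetical simplifications that only rewrite double-quoted href and src attributes.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src="..." attribute values (double quotes only; a simplification).
_ATTR_RE = re.compile(r'\b(href|src)="([^"]*)"')


def absolutize_links(html: str, page_url: str) -> str:
    """Hypothetical helper: rewrite relative href/src values against the page URL."""

    def _replace(match: re.Match) -> str:
        attr, value = match.group(1), match.group(2)
        # urljoin resolves relative paths and leaves absolute URLs unchanged.
        return f'{attr}="{urljoin(page_url, value)}"'

    return _ATTR_RE.sub(_replace, html)


if __name__ == "__main__":
    snippet = '<a href="/pricing">Pricing</a> <img src="img/logo.png">'
    print(absolutize_links(snippet, "https://example.com/products/index.html"))
```

A root-relative path such as /pricing resolves against the site root, while img/logo.png resolves against the page's directory, which is why the full page URL is needed as the base.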
The block outputs a content field containing the scraped result, whose shape depends on the output format you select: text returns markdown, html returns raw HTML, and json returns structured scraper data. When multiple URLs are provided, the content field returns an array. If you enable the screenshot option, the block also captures a page screenshot and returns its URL in a separate screenshot field. Both the content and screenshot values can be saved to named variables for use later in the workflow.
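Because content holds a single result for one URL but an array for several, a downstream step that must handle both cases can normalize it first. The Python sketch below is a hypothetical helper, not part of the block. It decides based on the number of input URLs rather than on the value's type, so a single json result that happens to be an array is not mistaken for several results.

```python
from typing import Any, List


def content_as_list(content: Any, url_count: int) -> List[Any]:
    """Hypothetical helper: return scraped results as a list, one entry per URL.

    Relies on the documented behavior: one URL yields a single result
    (a markdown string, an HTML string, or JSON data), while several URLs
    yield an array of results.
    """
    if url_count > 1:
        # Several URLs: content is already an array of per-URL results.
        return list(content)
    # One URL: wrap the single result, even if it is itself a JSON array.
    return [content]


if __name__ == "__main__":
    urls = ["https://example.com/a", "https://example.com/b"]
    content = ["# Page A", "# Page B"]  # stand-in for the block's content output
    for url, result in zip(urls, content_as_list(content, len(urls))):
        print(url, "->", result)
```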
This block fits workflows that need to pull live information from the web before processing it with an AI model: summarizing articles, extracting product details from listings, monitoring pages for specific content, or feeding scraped data into a structured analysis step. For social media URLs, an optional auto-enhance setting is designed to improve results.
What you can build
Real-world workflows powered by the Scrape URL block.
Article Summarization Pipeline
Scrape the text content of a news article or blog post, then pass the result to an AI block to generate a summary.
Product Data Extraction
Pull HTML or structured JSON from e-commerce product pages to extract pricing, descriptions, and availability for comparison or cataloging.
Competitor Page Monitoring
Scrape a set of competitor URLs on a schedule and feed the text output into an analysis block to detect content or pricing changes.
Social Media Content Capture
Use the auto-enhance option to scrape social media profile or post URLs and extract readable text for downstream processing.
Screenshot-Based Visual Review
Enable the screenshot option to capture a visual snapshot of a web page and store the screenshot URL for display or audit purposes.
Research Data Collection
Provide a list of URLs from academic or reference sites and retrieve their main content in text format to feed into a knowledge synthesis workflow.
Common questions about Scrape URL
What are the required parameters for this block?
You must supply the URL field, which accepts a single URL, a JSON array of URLs, or a comma- or newline-separated list of URLs. You also select a scraping service (the default service or Firecrawl) and an output format (text, html, or json). All other fields, such as destination variable, screenshot variable, wait time, custom headers, and tag removal, are optional.
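The split between required and optional settings can be expressed as a simple check. This Python sketch is hypothetical throughout: the dictionary keys url, service, and format, and the values default and firecrawl, are illustrative names, not the block's actual configuration schema.

```python
from typing import Any, Dict, List

# Hypothetical option values; the block's real configuration keys may differ.
SERVICES = {"default", "firecrawl"}
FORMATS = {"text", "html", "json"}


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a hypothetical Scrape URL configuration."""
    problems = []
    url = config.get("url")
    # The URL may be one string or a list; an empty value is rejected.
    if not url or (isinstance(url, str) and not url.strip()):
        problems.append("url is required")
    if config.get("service") not in SERVICES:
        problems.append("service must be 'default' or 'firecrawl'")
    if config.get("format") not in FORMATS:
        problems.append("format must be 'text', 'html', or 'json'")
    # Everything else (destination variable, screenshot variable, wait time,
    # custom headers, tag removal) is optional and not checked here.
    return problems


if __name__ == "__main__":
    print(validate_config({"url": "https://example.com", "service": "firecrawl", "format": "text"}))
    print(validate_config({"service": "default"}))
```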
What does the block return?
The block returns a content field containing the scraped result. Its shape depends on the output format: text returns markdown, html returns raw HTML, and json returns structured scraper data. When multiple URLs are scraped, content is returned as an array. If the screenshot option is enabled, the block also returns a screenshot field containing the URL of the captured screenshot.
Can this block scrape multiple URLs at once?
Yes. The URL field accepts multiple URLs provided as a JSON array, a comma-separated list, or a newline-separated list. When multiple URLs are provided, the content output field returns an array of results corresponding to each URL.
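Accepting three input styles in one field implies a parsing step along these lines. The Python sketch below is an assumption about how such input could be interpreted, not the block's own parser. The name parse_url_field is hypothetical. The sketch tries JSON first and otherwise splits on commas and newlines, so a URL that itself contains a comma (for example, in a query string) would be split incorrectly unless it is supplied inside a JSON array.

```python
import json
import re
from typing import List


def parse_url_field(raw: str) -> List[str]:
    """Hypothetical parser for a URL field holding one URL or several.

    Accepts a JSON array of strings, a comma-separated list, or one URL per line.
    """
    text = raw.strip()
    if text.startswith("["):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list):
            return [str(u).strip() for u in parsed if str(u).strip()]
    # Fall back to splitting on commas and/or newlines, dropping empty entries.
    parts = re.split(r"[,\n]", text)
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    print(parse_url_field('["https://a.com", "https://b.com"]'))
    print(parse_url_field("https://a.com\nhttps://b.com"))
```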
What kinds of workflows commonly use this block?
This block is commonly used at the start of workflows that need live web content — for example, feeding scraped article text into an AI summarization step, extracting structured product data for comparison, or collecting reference material for a research or analysis workflow.
What does the onlyMainContent option do?
When onlyMainContent is set to true, the block attempts to extract only the primary body content of the page, filtering out navigation, footers, and other peripheral elements. This option is available regardless of which output format is selected.
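As a rough illustration of main-content filtering, this Python sketch uses the standard library's html.parser to skip text nested inside navigation, header, footer, aside, script, and style elements. It is a simplified tag-based heuristic chosen for illustration; the block's actual extraction method is not documented here and is likely more sophisticated. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List

# Tags treated as peripheral in this simplified heuristic (an assumption).
PERIPHERAL_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collects text while skipping anything nested in peripheral tags."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # how many peripheral elements we are currently inside
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in PERIPHERAL_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in PERIPHERAL_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside every peripheral element.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = "<nav>Home | About</nav><main><h1>Title</h1><p>Body text.</p></main><footer>Copyright</footer>"
    print(main_text(page))
```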
Related capabilities
Extract Text from URL
Download a file from a URL and extract its text content. Supports PDFs, plain text files, and other document formats.
HTTP Request
Make an HTTP request to an external endpoint and return the response.
Search Perplexity
Search the web using the Perplexity API and return structured results.
Download Video
Download a video file.
Extract from URL
Scrape and extract from a site.