Scrape URL

Extract text, HTML, or structured data from web pages

The Scrape URL block retrieves content from one or more web pages and returns it in a format you specify. You provide a URL or a list of URLs — as a JSON array, comma-separated values, or newline-separated entries — and the block fetches the page content using either the default scraping service or Firecrawl. You can control whether to extract only the main body content, wait a set number of milliseconds before scraping, replace relative paths with absolute URLs, remove specific HTML tags, or pass custom request headers.
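
The three accepted URL formats can be illustrated with a short Python sketch. This is not the block's actual implementation, which the page does not describe; `parse_url_field` is a hypothetical helper that shows one plausible way a single text field could be normalized into a list of URLs.

```python
import json


def parse_url_field(raw: str) -> list[str]:
    """Split a URL field value into a list of URLs (hypothetical helper)."""
    text = raw.strip()
    if not text:
        return []

    # Case 1: a JSON array such as '["https://a.example", "https://b.example"]'.
    if text.startswith("["):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None  # Not valid JSON; fall through to the separator logic.
        if isinstance(parsed, list):
            return [str(item).strip() for item in parsed if str(item).strip()]

    # Cases 2 and 3: comma-separated or newline-separated entries.
    # Assumption: URLs do not themselves contain unencoded commas.
    parts = text.replace("\n", ",").split(",")
    return [part.strip() for part in parts if part.strip()]
```

A single URL comes back as a one-element list, so downstream code in this sketch can treat every input the same way. The comma handling is the weak point: a URL with a literal comma in its query string would be split, which is why the JSON array form is the safest choice for unusual URLs.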

The block outputs a content field containing the scraped result, whose shape depends on the output format you select: text returns markdown, html returns raw HTML, and json returns structured scraper data. When multiple URLs are provided, the content field returns an array. If you enable the screenshot option, the block also captures a page screenshot and returns its URL in a separate screenshot field. Both the content and screenshot values can be saved to named variables for use later in the workflow.
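
Because content is a single value for one URL and an array for several, a later step that consumes it in code may want to normalize the shape first. The Python sketch below shows one way to do that. The function names are hypothetical, and the pairing step assumes that results come back in the same order as the input URLs, which this page does not state explicitly.

```python
from typing import Any


def as_result_list(content: Any) -> list[Any]:
    """Return scraped results as a list, whether one URL or many were scraped.

    Assumption: a single-URL result is never itself a list.
    """
    if content is None:
        return []
    if isinstance(content, list):
        return content
    return [content]


def pair_with_urls(urls: list[str], content: Any) -> dict[str, Any]:
    """Map each input URL to its result (assumes results keep input order)."""
    results = as_result_list(content)
    if len(results) != len(urls):
        raise ValueError(
            f"expected {len(urls)} results but received {len(results)}"
        )
    return dict(zip(urls, results))
```

For example, `pair_with_urls(["https://a.example"], "# Title")` yields a one-entry dictionary, and the same call with two URLs and a two-item array yields two entries.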

This block fits into workflows that need to pull live information from the web before processing it with an AI model. Typical examples include summarizing articles, extracting product details from listings, monitoring pages for specific content, and feeding scraped data into a structured analysis step. It can also handle social media URLs: an optional auto-enhance setting is designed to improve results for those sources.

What you can build

Real-world workflows powered by the Scrape URL block.

Article Summarization Pipeline

Scrape the text content of a news article or blog post, then pass the result to an AI block to generate a summary.

Product Data Extraction

Pull HTML or structured JSON from e-commerce product pages to extract pricing, descriptions, and availability for comparison or cataloging.

Competitor Page Monitoring

Scrape a set of competitor URLs on a schedule and feed the text output into an analysis block to detect content or pricing changes.

Social Media Content Capture

Use the auto-enhance option to scrape social media profile or post URLs and extract readable text for downstream processing.

Screenshot-Based Visual Review

Enable the screenshot option to capture a visual snapshot of a web page and store the screenshot URL for display or audit purposes.

Research Data Collection

Provide a list of URLs from academic or reference sites and retrieve their main content in text format to feed into a knowledge synthesis workflow.

Common questions about Scrape URL

What are the required parameters for this block?

The only input you must supply is the URL field, which accepts a single URL, a JSON array of URLs, or a comma- or newline-separated list of URLs. The block also requires two selections: a scraping service (either the default service or Firecrawl) and an output format (text, html, or json). All other fields, such as the destination variable, screenshot variable, wait time, custom headers, and tag removal, are optional.
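
The split between required and optional settings can be sketched as a small validation routine. Everything here is illustrative: the key names `url`, `service`, `format`, and `waitMs` are assumptions made for this example and are not MindStudio's documented field names. Only the allowed values (default or Firecrawl; text, html, or json) come from the answer above.

```python
# Hypothetical key names; only the allowed values come from the documentation.
ALLOWED_SERVICES = {"default", "firecrawl"}
ALLOWED_FORMATS = {"text", "html", "json"}


def validate_scrape_settings(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []

    # Required: at least one URL.
    if not str(settings.get("url", "")).strip():
        problems.append("url is required")

    # Required selections: scraping service and output format.
    if settings.get("service") not in ALLOWED_SERVICES:
        problems.append("service must be 'default' or 'firecrawl'")
    if settings.get("format") not in ALLOWED_FORMATS:
        problems.append("format must be 'text', 'html', or 'json'")

    # Optional: wait time in milliseconds, checked only when present.
    wait_ms = settings.get("waitMs")
    if wait_ms is not None:
        is_int = isinstance(wait_ms, int) and not isinstance(wait_ms, bool)
        if not is_int or wait_ms < 0:
            problems.append("waitMs must be a non-negative integer")

    return problems
```

Optional fields left out of the dictionary produce no errors in this sketch, mirroring the fact that the block runs with only a URL, a service, and a format.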

What does the block return?

The block returns a content field containing the scraped result. Its shape depends on the output format: text returns markdown, html returns raw HTML, and json returns structured scraper data. When multiple URLs are scraped, content is returned as an array. If the screenshot option is enabled, the block also returns a screenshot field containing the URL of the captured screenshot.

Can this block scrape multiple URLs at once?

Yes. The URL field accepts multiple URLs provided as a JSON array, a comma-separated list, or a newline-separated list. When multiple URLs are provided, the content output field returns an array of results corresponding to each URL.

What kinds of workflows commonly use this block?

This block is commonly used at the start of workflows that need live web content — for example, feeding scraped article text into an AI summarization step, extracting structured product data for comparison, or collecting reference material for a research or analysis workflow.

What does the onlyMainContent option do?

When onlyMainContent is set to true, the block attempts to extract only the primary body content of the page, filtering out navigation, footers, and other peripheral elements. This option is available regardless of which output format is selected.
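
To make the idea of main-content filtering concrete, here is a deliberately simplified Python sketch built on the standard library's `html.parser`. It is not how the scraping services implement onlyMainContent, which is not documented here. It only drops text nested inside a fixed, assumed set of peripheral tags, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Assumed set of tags treated as peripheral; real services likely use
# richer heuristics than a fixed tag list.
PERIPHERAL_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect text that is not nested inside a peripheral tag."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside one or more peripheral tags
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in PERIPHERAL_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in PERIPHERAL_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


def extract_main_text(html: str) -> str:
    """Return the page text with navigation, footers, and similar removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Given a page with a `nav` menu, a `main` section, and a `footer`, this sketch keeps only the text from the `main` section. Pages that build navigation from plain `div` elements would defeat it, which is one reason production scrapers rely on more than tag names.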
