
Extract Text from URL


Extract text content from files at any URL

The Extract Text from URL block downloads files from one or more URLs and returns their text content. It is designed for document formats such as PDFs and plain text files, which makes it useful when a workflow needs to read the contents of a file hosted online rather than scrape a web page. The block accepts a single URL or multiple URLs at once and stores the result in a variable you specify.

The block takes a url input, which can be a single URL string, a comma-separated list of URLs, or a JSON array of URLs. An optional destinationVar field lets you name the variable where the extracted text will be stored for use in later steps. When a single URL is provided, the output is a plain string. When multiple URLs are provided, the output is an array of strings, one entry per URL.
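The three accepted shapes of the url input, and the string-versus-array output rule, can be illustrated with a small sketch. This is not MindStudio's implementation: the helper names parse_url_input and shape_output are hypothetical, the comma splitting assumes the URLs themselves contain no commas, and treating a one-element JSON array as a single URL is an assumption the source does not confirm.

```python
import json
from typing import List, Union


def parse_url_input(raw: str) -> List[str]:
    """Normalize a single URL, a comma-separated list, or a JSON array into a list.

    Hypothetical helper for illustration only.
    """
    text = raw.strip()
    if text.startswith("["):
        # JSON array form, e.g. '["https://a", "https://b"]'
        parsed = json.loads(text)
        if not isinstance(parsed, list) or not all(isinstance(u, str) for u in parsed):
            raise ValueError("expected a JSON array of URL strings")
        candidates = parsed
    else:
        # Single URL or comma-separated list (assumes no commas inside a URL)
        candidates = text.split(",")
    # Trim whitespace and drop empty entries
    return [u.strip() for u in candidates if u.strip()]


def shape_output(texts: List[str]) -> Union[str, List[str]]:
    """Mirror the documented output rule: one URL gives a string, several give a list.

    Assumption: the rule depends only on how many URLs were supplied.
    """
    return texts[0] if len(texts) == 1 else texts
```

For example, parse_url_input("https://example.com/a.pdf, https://example.com/b.txt") yields a two-element list, so the corresponding output would be a list of two strings.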

This block fits into workflows that need to process document content — for example, reading a PDF report before summarizing it with an AI step, or pulling text from multiple files to compare or analyze. It works well as an early step in a pipeline where downstream blocks need raw text as their input.

What you can build

Real-world workflows powered by the Extract Text from URL block.

PDF Report Summarization

Extract text from a PDF report hosted at a URL, then pass the content to an AI block to generate a summary.

Contract Review Pipeline

Pull text from contract PDFs hosted at a URL and route the content to an AI block that identifies key clauses or flags issues.

Batch Document Processing

Supply an array of document URLs to extract text from multiple files in a single step, then process each result downstream.

Resume Parsing Workflow

Download candidate resumes in PDF format from a storage URL and extract their text before passing it to a structured data extraction step.

Research Paper Ingestion

Extract text from academic papers or technical documents to feed into a question-answering or knowledge-base workflow.

Plain Text File Intake

Read the contents of plain text files hosted remotely and use the extracted content as input for classification or transformation steps.


Common questions about Extract Text from URL

What are the required parameters for this block?

The only required parameter is url, which accepts a single URL string, a comma-separated list of URLs, or a JSON array of URLs pointing to the files you want to extract text from. The destinationVar field is optional and lets you name the variable that will hold the extracted text.

What does the block return?

The block returns a text output field. When a single URL is provided, text is a string containing the extracted content. When multiple URLs are provided, text is an array of strings, with each entry corresponding to one of the input URLs.
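Because text can be either a string or an array, code that consumes it downstream may find it simpler to normalize the value first. The following is a minimal sketch under stated assumptions: as_text_list and pair_with_urls are hypothetical helper names, and the pairing assumes the output entries come back in the same order as the input URLs, which the source implies but does not state outright.

```python
from typing import Dict, List, Union


def as_text_list(text: Union[str, List[str]]) -> List[str]:
    """Wrap a single string in a list so callers can always iterate."""
    if isinstance(text, str):
        return [text]
    return list(text)


def pair_with_urls(urls: List[str], text: Union[str, List[str]]) -> Dict[str, str]:
    """Map each input URL to its extracted text.

    Assumption: entries in `text` follow the order of `urls`.
    """
    texts = as_text_list(text)
    if len(urls) != len(texts):
        raise ValueError("number of URLs and extracted texts differ")
    return dict(zip(urls, texts))
```

With this in place, a downstream step can loop over pair_with_urls(urls, text).items() regardless of whether one URL or several were supplied.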

What file types does this block support?

The block supports PDFs, plain text files, and similar document files. It is not intended for extracting content from web pages; for that use case, use a separate scraping block.

What kinds of workflows commonly use this block?

This block is typically used as an early step in workflows that need to process document content — such as summarization pipelines, contract review tools, resume parsers, or research ingestion flows — where downstream AI or data blocks require raw text as input.

Is there a file size limit?

Yes. The maximum supported file size is 50 MB per URL.
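If you want to screen URLs before handing them to the block, one option is to check the Content-Length header that the file's server reports. This is only a sketch: the helper names are hypothetical, the limit is assumed to mean 50,000,000 bytes (the source does not say whether MB is decimal or binary), and servers may omit or misreport Content-Length, in which case the check returns None rather than guessing.

```python
import urllib.request
from typing import Optional

# Assumption: "50MB" is read as decimal megabytes, the more conservative choice.
MAX_BYTES = 50 * 1000 * 1000


def within_size_limit(content_length: Optional[str],
                      max_bytes: int = MAX_BYTES) -> Optional[bool]:
    """Return True/False when the size is known, None when it cannot be determined."""
    if content_length is None:
        return None
    try:
        size = int(content_length)
    except ValueError:
        return None
    return size <= max_bytes


def check_url_size(url: str, timeout: float = 10.0) -> Optional[bool]:
    """Send a HEAD request and evaluate the reported Content-Length."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return within_size_limit(response.headers.get("Content-Length"))
```

A result of None means the size is unknown, so the file may still exceed the limit when the block downloads it.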
