Extract Text from URL
Extract text content from files at any URL
The Extract Text from URL block downloads a file from one or more URLs and returns its text content. It is designed for document formats such as PDFs and plain text files, making it useful when a workflow needs to read the contents of a file hosted online rather than scrape a web page. The block accepts a single URL or multiple URLs at once, and stores the result in a variable you specify.
The block takes a url input, which can be a single URL string, a comma-separated list of URLs, or a JSON array of URLs. An optional destinationVar field lets you name the variable where the extracted text will be stored for use in later steps. When a single URL is provided, the output is a plain string. When multiple URLs are provided, the output is an array of strings, one entry per URL.
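The three accepted shapes of the url input can be illustrated with a small sketch. This is not the block's own parser: the function name normalize_url_input is hypothetical, and the splitting rules (a leading "[" means JSON, otherwise split on commas) are assumptions made for illustration only.

```python
import json


def normalize_url_input(url):
    """Return a list of URL strings from a single URL, a comma-separated
    list, or a JSON array (hypothetical helper, not the block's parser)."""
    # Already a list (for example, a parsed JSON array): clean each entry.
    if isinstance(url, list):
        return [str(u).strip() for u in url]

    text = url.strip()

    # Assumption: a string starting with "[" is a JSON array of URLs.
    if text.startswith("["):
        return [str(u).strip() for u in json.loads(text)]

    # Otherwise treat it as one URL or a comma-separated list of URLs.
    # Note: a URL that itself contains a comma would be split here.
    return [part.strip() for part in text.split(",") if part.strip()]


single = normalize_url_input("https://example.com/report.pdf")
several = normalize_url_input(
    "https://example.com/a.pdf, https://example.com/b.txt"
)
from_json = normalize_url_input(
    '["https://example.com/a.pdf", "https://example.com/b.txt"]'
)
```

All three calls produce a list, which makes it easy to see how many URLs the block would receive and therefore whether to expect a single string or an array of strings as output.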
This block fits into workflows that need to process document content — for example, reading a PDF report before summarizing it with an AI step, or pulling text from multiple files to compare or analyze. It works well as an early step in a pipeline where downstream blocks need raw text as their input.
What you can build
Real-world workflows powered by the Extract Text from URL block.
PDF Report Summarization
Extract text from a PDF report hosted at a URL, then pass the content to an AI block to generate a summary.
Contract Review Pipeline
Pull text from uploaded contract PDFs and route the content to an AI block that identifies key clauses or flags issues.
Batch Document Processing
Supply an array of document URLs to extract text from multiple files in a single step, then process each result downstream.
Resume Parsing Workflow
Download candidate resumes in PDF format from a storage URL and extract their text before passing it to a structured data extraction step.
Research Paper Ingestion
Extract text from academic papers or technical documents to feed into a question-answering or knowledge-base workflow.
Plain Text File Intake
Read the contents of plain text files hosted remotely and use the extracted content as input for classification or transformation steps.
Common questions about Extract Text from URL
What are the required parameters for this block?
The only required parameter is url, which accepts a single URL string, a comma-separated list of URLs, or a JSON array of URLs pointing to the files you want to extract text from. The destinationVar field is optional and lets you name the variable that will hold the extracted text.
What does the block return?
The block returns a text output field. When a single URL is provided, text is a string containing the extracted content. When multiple URLs are provided, text is an array of strings, with each entry corresponding to one of the input URLs.
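Because text can be either a string or an array of strings, downstream code may want to treat both cases uniformly. The sketch below shows one way to do that; the helper names as_text_list and pair_with_urls are hypothetical, and it assumes the array entries come back in the same order as the input URLs.

```python
def as_text_list(text):
    """Wrap a single string in a list; copy an array of strings as-is."""
    if isinstance(text, str):
        return [text]
    return list(text)


def pair_with_urls(urls, text):
    """Map each input URL to its extracted text.

    Assumption: results are returned in input order, one entry per URL.
    """
    texts = as_text_list(text)
    if len(urls) != len(texts):
        raise ValueError("expected one text entry per input URL")
    return dict(zip(urls, texts))


# Single URL: the output is a plain string.
one = pair_with_urls(["https://example.com/a.pdf"], "contents of a")

# Multiple URLs: the output is an array of strings.
many = pair_with_urls(
    ["https://example.com/a.pdf", "https://example.com/b.txt"],
    ["contents of a", "contents of b"],
)
```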
What file types does this block support?
The block supports document formats such as PDFs and plain text files. It is not intended for extracting content from web pages; for that use case, use a separate scraping block such as Scrape URL.
What kinds of workflows commonly use this block?
This block is typically used as an early step in workflows that need to process document content — such as summarization pipelines, contract review tools, resume parsers, or research ingestion flows — where downstream AI or data blocks require raw text as input.
Is there a file size limit?
Yes, the maximum supported file size is 50MB per URL.
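If you want to check a file against this limit before handing its URL to the block, a pre-check along the following lines is one option. It is a sketch under assumptions: the names MAX_BYTES, within_size_limit, and remote_size are hypothetical, "50MB" is read conservatively as 50,000,000 bytes, and the size lookup relies on the server answering a HEAD request with a Content-Length header, which not every server does.

```python
import urllib.request

# Assumption: "50MB" read as decimal megabytes (the more conservative reading).
MAX_BYTES = 50 * 1000 * 1000


def within_size_limit(size_bytes, limit=MAX_BYTES):
    """Return True if a known file size fits within the limit."""
    return 0 <= size_bytes <= limit


def remote_size(url, timeout=10):
    """Return the size reported by the server's Content-Length header,
    or None if the header is missing. Performs a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None
```

A caller would fetch the size with remote_size(url), skip the check when it returns None, and otherwise pass the URL to the block only if within_size_limit(size) is true.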
Related capabilities
HTTP Request
Make an HTTP request to an external endpoint and return the response.
Scrape URL
Extract text, HTML, or structured content from one or more web pages.
Search Perplexity
Search the web using the Perplexity API and return structured results.
Download Video
Download a video file.
Extract from URL
Scrape and extract from a site.