How to Use the STORM Research Method in Your AI Agent Workflows
Stanford's STORM method uses 5 expert perspectives to produce 25% more organized research. Learn how to implement it as a Claude Code skill.
What the STORM Research Method Actually Does
Most AI-assisted research follows the same basic pattern: ask a question, get an answer, move on. The problem is that single-perspective research tends to miss angles, generate shallow outlines, and produce content that reads like a summary rather than an analysis.
STORM, which stands for Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking, takes a different approach. Developed by researchers at Stanford, STORM simulates a panel of expert perspectives before any writing begins. The result is research that covers more ground, surfaces more nuanced questions, and organizes information more coherently than standard single-shot prompting.
Implementing the STORM research method in your AI agent workflows with Claude means you can apply this structure to real research tasks — not just one-off queries. This guide walks through what STORM is, why it works, and how to build it as a reusable skill in Claude Code.
Where STORM Comes From
The STORM method was introduced in a 2024 Stanford research paper focused on generating Wikipedia-quality long-form articles using LLMs. The core insight: the problem with AI-generated research isn’t the writing — it’s the pre-writing.
When a human expert researches a topic deeply, they naturally think about it from multiple angles. A product manager researching “remote team coordination” thinks about it differently than an HR leader, a software engineer, or an organizational psychologist. Each perspective surfaces different questions, different gaps, different assumptions.
Standard prompting collapses all of that into a single lens. STORM restores it.
In the original research, STORM-generated articles showed measurable improvements in outline quality and information coverage over baseline LLM generation. The 25% figure comes from the paper’s human evaluation: compared with an outline-driven retrieval-augmented baseline, experienced Wikipedia editors judged more of STORM’s articles to be organized (a 25% absolute increase) and broad in coverage (a 10% increase).
The Core Mechanism: 5 Expert Perspectives
The key structural element in STORM is generating multiple expert personas before any research or writing happens. These personas each bring a distinct viewpoint to the topic, and they each ask different questions about it.
Here’s how the five-perspective model typically works:
Perspective 1: The Domain Expert
This persona knows the topic deeply from a technical or specialist standpoint. They ask questions about accuracy, nuance, and edge cases that generalists would miss. For a topic like “supply chain automation,” this might be a logistics engineer who wants to know about failure modes, latency, and integration complexity.
Perspective 2: The Practitioner
This is someone who uses the thing in practice — not the theorist, but the person doing the work. They ask about workflow friction, real-world adoption, and what actually matters day-to-day versus what sounds good in theory.
Perspective 3: The Skeptic
The skeptic asks hard questions. What are the counterarguments? What doesn’t work? What are the failure cases? This perspective prevents STORM output from being one-sided or promotional, which is a common problem with AI-generated research.
Perspective 4: The Newcomer
This persona represents someone encountering the topic for the first time. Their questions surface foundational definitions, common misconceptions, and the “obvious” things that experts often skip over. Including this perspective ensures completeness.
Perspective 5: The Adjacent Expert
This persona comes from a neighboring field. They draw connections to related concepts, bring analogies from other domains, and ask cross-disciplinary questions. They often surface the most unexpected — and most useful — insights.
How STORM Works, Step by Step
The STORM workflow has four distinct phases. Understanding each one helps you implement it accurately.
Phase 1: Perspective Generation
Before touching the topic, the agent generates the five expert personas. Each persona gets a name, a domain background, and a specific stake in the topic. This step is often skipped by people trying to shortcut the method — but it’s what makes everything else work.
Phase 2: Multi-Perspective Questioning
Each persona generates a set of questions about the topic — typically 5–10 per persona. These questions aren’t answered yet; they’re just enumerated. The goal is to surface the full question space before narrowing in on answers.
This phase often reveals blind spots immediately. You’ll notice that different personas ask questions that would never have appeared in a single-shot prompt.
Phase 3: Research and Retrieval
With a complete question set in hand, the agent retrieves information to answer each question. This can involve web search, document retrieval from a knowledge base, or synthesis from the model’s training data. The multi-question structure means you’re pulling from a much wider information surface than a single query would reach.
Phase 4: Outline Synthesis and Writing
The final phase synthesizes answers into a structured outline, then expands the outline into coherent content. Because the outline emerged from multiple perspectives and a broad question set, it tends to be more comprehensive and better organized than outlines produced by direct prompting.
Implementing STORM as a Claude Code Skill
Claude Code is well-suited to STORM because it supports complex multi-step workflows with clear handoffs between phases. Here’s how to build this as a reusable skill.
Prerequisites
Before building the skill, you need:
- Claude Code set up with access to a capable model (Claude 3.5 Sonnet or Claude Opus recommended for multi-step reasoning)
- A retrieval mechanism (web search API, vector store, or document index)
- A way to persist intermediate outputs between phases (file system, memory, or a data store)
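For the persistence prerequisite, a minimal file-based sketch might look like the following. It assumes Node.js and one JSON file per phase; the helper names `savePhase` and `loadPhase` and the `storm-run-` directory prefix are hypothetical, not part of Claude Code or any STORM library.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write one phase's output to <runDir>/<phase>.json.
function savePhase(runDir, phase, data) {
  fs.mkdirSync(runDir, { recursive: true });
  const file = path.join(runDir, `${phase}.json`);
  fs.writeFileSync(file, JSON.stringify(data, null, 2), 'utf8');
  return file;
}

// Hypothetical helper: read a phase's output back, or null if it has not run yet.
function loadPhase(runDir, phase) {
  const file = path.join(runDir, `${phase}.json`);
  if (!fs.existsSync(file)) return null;
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}

// Example: persist personas so the question phase can resume from disk.
const runDir = fs.mkdtempSync(path.join(os.tmpdir(), 'storm-run-'));
savePhase(runDir, 'personas', [{ name: 'Ada', role: 'Logistics engineer' }]);
const restored = loadPhase(runDir, 'personas');
console.log(restored[0].role);
```

Writing each phase to its own file also makes it easy to rerun a single phase without repeating the ones before it.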
Step 1: Build the Perspective Generator
Create a function that takes a topic as input and outputs five expert personas. Your prompt should specify that each persona includes:
- A professional role and background
- Their specific stake in the topic
- The lens through which they’d evaluate information about it
Example prompt structure:
You are helping design a research process for the topic: [TOPIC]
Generate 5 expert personas who would each have a distinct and valuable perspective on this topic. For each persona, provide:
- Name and professional role
- Their domain background (2-3 sentences)
- The specific questions they'd want answered about this topic
- The assumptions or biases they'd bring to it
Make the personas genuinely distinct — avoid creating five versions of the same perspective.
This step typically takes one LLM call. Store the output as structured data you can iterate over.
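One way to get structured data out of this step is to ask for JSON and validate the reply before moving on. The sketch below assumes the model returns a JSON array; the field names (`name`, `role`, `background`, `stake`) and the functions `buildPersonaPrompt` and `parsePersonas` are illustrative choices, and the actual LLM call is left out.

```javascript
// Hypothetical field names; adjust them to whatever your prompt asks for.
const REQUIRED_FIELDS = ['name', 'role', 'background', 'stake'];

// Build the persona prompt, asking for JSON so the reply can be parsed.
function buildPersonaPrompt(topic, count = 5) {
  return [
    `You are helping design a research process for the topic: ${topic}`,
    `Generate ${count} expert personas with distinct perspectives on this topic.`,
    `Reply with only a JSON array of objects with the keys: ${REQUIRED_FIELDS.join(', ')}.`,
  ].join('\n');
}

// Parse the model's reply and reject anything that is not usable downstream.
function parsePersonas(replyText, count = 5) {
  const personas = JSON.parse(replyText);
  if (!Array.isArray(personas) || personas.length !== count) {
    throw new Error(`Expected an array of ${count} personas`);
  }
  for (const persona of personas) {
    for (const field of REQUIRED_FIELDS) {
      if (typeof persona[field] !== 'string' || persona[field].trim() === '') {
        throw new Error(`Persona is missing field: ${field}`);
      }
    }
  }
  return personas;
}
```

Failing loudly here is deliberate: a malformed persona list is cheaper to regenerate now than to debug after the question phase has already run on it.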
Step 2: Generate Questions Per Persona
For each persona, make a separate LLM call that generates their specific questions. Pass the persona description and the topic. Ask for 5–8 questions per persona, prioritizing questions that:
- This persona would uniquely ask (not obvious to other personas)
- Surface non-obvious angles of the topic
- Could be answered with retrievable information
Deduplicate across personas before moving forward. You’ll typically end up with 20–30 unique questions after deduplication.
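A simple version of the deduplication pass can be sketched as follows. It only catches questions that match after normalizing case and punctuation; catching paraphrases would need embeddings or another LLM call, which this sketch does not attempt. The input shape (an object keyed by persona name) and the function names are assumptions.

```javascript
// Normalize a question so trivial differences do not hide duplicates.
function normalizeQuestion(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// questionsByPersona: { [personaName]: string[] } (a hypothetical shape).
// Returns unique questions, remembering which personas asked each one.
function dedupeQuestions(questionsByPersona) {
  const seen = new Map();
  for (const [persona, questions] of Object.entries(questionsByPersona)) {
    for (const question of questions) {
      const key = normalizeQuestion(question);
      if (key === '') continue;
      if (!seen.has(key)) {
        seen.set(key, { question: question.trim(), askedBy: [persona] });
      } else if (!seen.get(key).askedBy.includes(persona)) {
        seen.get(key).askedBy.push(persona);
      }
    }
  }
  return [...seen.values()];
}
```

Keeping the `askedBy` list is optional, but it lets the synthesis step see which questions several personas converged on.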
Step 3: Implement the Research Loop
For each question, run a retrieval step. The implementation here depends on your setup:
- Web search: Use a search API (like Serper or Brave Search) to retrieve relevant pages, then summarize them against the specific question
- Document retrieval: Query your vector store with the question as the input
- Model knowledge: For questions that don’t require external data, let the model answer from training
Store question-answer pairs as structured objects. Each answer should include the source of information for traceability.
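The loop itself can stay independent of the retrieval backend. In this sketch, `retrieve` is a hypothetical async function you supply (wrapping a search API, a vector store query, or a plain model call) that resolves to an `{ answer, source }` object; the record shape is likewise an assumption.

```javascript
// `retrieve` is a hypothetical async function you supply. It takes a question
// and resolves to { answer, source }.
async function researchAll(questions, retrieve) {
  const pairs = [];
  for (const question of questions) {
    try {
      const { answer, source } = await retrieve(question);
      pairs.push({ question, answer, source, status: 'answered' });
    } catch (err) {
      // Keep failed questions so the synthesis step can report them as gaps.
      const error = err instanceof Error ? err.message : String(err);
      pairs.push({ question, answer: null, source: null, status: 'failed', error });
    }
  }
  return pairs;
}

// Example with a stub retriever standing in for a real search call.
const stubRetrieve = async (question) => {
  if (question.includes('unknown')) throw new Error('no results');
  return { answer: `Stub answer to: ${question}`, source: 'stub-retriever' };
};

const pending = researchAll(
  ['What are the failure modes?', 'An unknown edge case?'],
  stubRetrieve
);
pending.then((pairs) => console.log(pairs.map((p) => p.status)));
```

Recording failures instead of dropping them means an unanswered question still shows up later, rather than silently disappearing from the outline.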
Step 4: Synthesize the Outline
Pass all question-answer pairs to a final synthesis prompt. Ask the model to:
- Group related Q&A pairs into logical themes
- Arrange themes into a coherent outline with H2/H3 structure
- Identify gaps where questions remain unanswered
- Note where perspectives conflict or contradict each other
The conflict identification step is particularly valuable. When the domain expert and the skeptic give different answers to the same underlying question, that tension is often the most interesting thing in the research.
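As a rough sketch, the synthesis input can be assembled locally before the final LLM call. The prompt wording below is illustrative rather than a fixed template, and the pair shape (`question`, `answer`, `source`) and function names are assumptions. Gap detection is done in code, while theme grouping and conflict spotting are left to the model.

```javascript
// A pair counts as answered only if it carries a non-empty answer string.
function isAnswered(pair) {
  return typeof pair.answer === 'string' && pair.answer.trim() !== '';
}

// Build the synthesis prompt, listing unanswered questions explicitly as gaps.
function buildSynthesisPrompt(topic, pairs) {
  const answered = pairs.filter(isAnswered);
  const gaps = pairs.filter((p) => !isAnswered(p)).map((p) => p.question);
  const lines = [
    `Topic: ${topic}`,
    'Group the Q&A pairs below into themes and arrange them as an H2/H3 outline.',
    'Flag any place where two answers conflict instead of smoothing it over.',
    '',
    'Answered questions:',
    ...answered.map(
      (p, i) => `${i + 1}. Q: ${p.question}\n   A: ${p.answer}\n   Source: ${p.source}`
    ),
    '',
    'Unanswered questions (list these as gaps in the outline):',
    ...(gaps.length > 0 ? gaps.map((q) => `- ${q}`) : ['- none']),
  ];
  return lines.join('\n');
}
```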
Step 5: Expand to Full Content
With a synthesized outline in hand, you can either:
- Pass the outline to another LLM call for full content generation
- Return the outline as the skill’s output for a human to review and expand
- Feed it into a separate writing workflow
For most use cases, the outline itself is the most valuable output — it’s what’s hardest to generate well, and it’s what everything else is built on.
Common Mistakes When Implementing STORM
Skipping Persona Diversity
The most common failure is generating personas that are too similar. If all five personas are essentially “subject matter experts,” the question sets will overlap heavily and you won’t get the coverage the method promises. Make sure at least one persona is a skeptic and one is a newcomer.
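One rough way to spot this early is to measure how much the personas' question sets overlap in vocabulary. The sketch below uses Jaccard similarity over words as a cheap proxy; it is a heuristic, not part of the STORM method, and the 0.5 threshold and function names are arbitrary assumptions to tune.

```javascript
// Turn a list of questions into a set of lowercase words longer than 3 letters.
function wordSet(questions) {
  const words = questions.join(' ').toLowerCase().match(/[a-z0-9]+/g) || [];
  return new Set(words.filter((w) => w.length > 3));
}

// Jaccard similarity: shared words divided by all words across both sets.
function jaccard(a, b) {
  const shared = [...a].filter((w) => b.has(w)).length;
  const total = new Set([...a, ...b]).size;
  return total === 0 ? 0 : shared / total;
}

// Return persona pairs whose question vocabularies overlap above the threshold.
// The 0.5 default is an arbitrary starting point, not a researched value.
function findSimilarPersonas(questionsByPersona, threshold = 0.5) {
  const names = Object.keys(questionsByPersona);
  const flagged = [];
  for (let i = 0; i < names.length; i++) {
    for (let j = i + 1; j < names.length; j++) {
      const score = jaccard(
        wordSet(questionsByPersona[names[i]]),
        wordSet(questionsByPersona[names[j]])
      );
      if (score >= threshold) flagged.push({ pair: [names[i], names[j]], score });
    }
  }
  return flagged;
}
```

If a pair is flagged, regenerating one of the two personas with a more specific role is usually cheaper than carrying redundant questions through the research loop.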
Treating It as a Single-Pass Prompt
STORM only works if you maintain the phase structure. Collapsing “generate personas, ask questions, research, and synthesize” into a single mega-prompt defeats the purpose. The value comes from separating question generation from answer retrieval.
Ignoring Conflicts in the Synthesis Step
When different perspectives give contradictory information, that’s a signal — not noise. Don’t smooth it over during synthesis. Surface it explicitly in the outline so it can be addressed in the content.
Using Weak Models for Persona Generation
The persona generation step is where the method can fail most visibly. If the model generates generic, shallow personas, everything downstream suffers. Use a capable model for this step even if you use a lighter model for later phases.
How MindStudio Fits Into This Workflow
If you want to implement STORM without managing the infrastructure yourself, MindStudio’s Agent Skills Plugin is worth looking at. It’s an npm SDK (@mindstudio-ai/agent) that lets Claude Code call 120+ typed capabilities — including web search, workflow execution, and data storage — as simple method calls.
Instead of wiring up your own search API integration, rate limiting, and retry logic, you can call agent.searchGoogle() directly from within your STORM skill. The infrastructure layer is handled for you, which means your Claude Code implementation can focus on the reasoning logic rather than the plumbing.
A practical STORM setup using MindStudio might look like:
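const { MindStudio } = require('@mindstudio-ai/agent');
const agent = new MindStudio();
// Phase 3: research loop using MindStudio's search capability.
// Wrapped in an async function because CommonJS does not allow top-level await.
async function researchQuestions(questions) {
  const findings = [];
  for (const question of questions) {
    const results = await agent.searchGoogle({ query: question });
    // Keep each question paired with its raw results for later summarizing
    findings.push({ question, results });
  }
  return findings;
}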
Beyond the SDK, MindStudio also lets you build the entire STORM workflow visually as an automated agent — useful if you want to run STORM on a schedule, trigger it from an email, or expose it as a webhook endpoint that other tools can call. You can try it free at mindstudio.ai.
If you’re already building automated research workflows or multi-step AI agents, STORM maps cleanly onto MindStudio’s visual builder: each phase becomes a workflow block, and the intermediate outputs pass between them as variables.
Practical Use Cases for STORM Workflows
Competitive Intelligence
Assign your five personas as: a potential customer, a domain analyst, a skeptic evaluating the competitor’s weaknesses, a developer evaluating technical depth, and a marketer evaluating positioning. The resulting research covers competitive analysis from angles that single-prompt approaches consistently miss.
Content Research
Use STORM before writing any long-form content. The outline it generates typically covers objections, foundational questions, and advanced angles — all in one structured document. This is especially useful for content on complex or contested topics.
Due Diligence
Financial, technical, and strategic due diligence all benefit from multi-perspective questioning. Assign personas as: a financial analyst, an operational expert, a legal risk assessor, an industry skeptic, and a sector newcomer. The question set surfaces risks that single-lens review misses.
Knowledge Base Building
If you’re building an internal knowledge base on a new topic, running STORM first produces a structured question set that becomes the foundation for documentation. The questions themselves are often more valuable than the initial answers, because they define the information architecture.
Frequently Asked Questions
What does STORM stand for?
STORM stands for Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. It was developed as part of Stanford research on using LLMs to generate Wikipedia-like long-form articles from scratch.
How is STORM different from standard chain-of-thought prompting?
Chain-of-thought prompting improves a model’s reasoning on a single track of thought. STORM generates multiple parallel reasoning tracks (expert perspectives) before any synthesis happens. It’s less about improving individual reasoning steps and more about widening the question surface before research begins.
Do I need five perspectives, or can I use more or fewer?
The five-perspective structure used in this guide is a practical starting point, not a hard rule. For simple topics, three personas may be sufficient. For highly contested or technically complex topics, seven or eight can add value. The key constraint is diminishing returns: after a certain point, additional personas generate overlapping questions rather than new ones.
Can STORM work with a closed knowledge base instead of web search?
Yes. The retrieval step in STORM is agnostic to the source. You can point it at a vector database, a document store, or a structured knowledge base. For internal research use cases — like analyzing a company’s internal documentation — this often produces better results than web search because the retrieved information is more relevant to the specific context.
How long does a full STORM workflow take to run?
With a well-optimized implementation, a full STORM run (persona generation → question generation → research → synthesis) typically takes 3–8 minutes, depending on the number of questions and the retrieval speed. The research loop is the main variable; if you parallelize question retrieval, you can cut that time significantly.
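Parallelizing the research loop usually means capping how many lookups run at once so you stay within API rate limits. The following is a minimal sketch of a concurrency-limited map; `mapWithConcurrency` and the stub `slowLookup` are hypothetical names, and the right limit depends on your retrieval provider.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results come back in the same order as the input.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function runLane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, runLane));
  return results;
}

// Example: a stub lookup that tracks how many calls overlap.
let inFlight = 0;
let peak = 0;
const slowLookup = async (question) => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 10));
  inFlight--;
  return question.toUpperCase();
};

const run = mapWithConcurrency(['a', 'b', 'c', 'd', 'e'], 2, slowLookup);
run.then((results) => console.log(results, 'peak in flight:', peak));
```

Note that if any lookup rejects, the whole call rejects; wrap the worker in a try/catch if you would rather record failures and continue.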
Is STORM useful for short-form content or just long-form research?
STORM’s primary value is in pre-writing organization, so it scales with content complexity. For a 500-word blog post, the overhead isn’t worth it. For anything 2,000+ words, a topic brief, a technical analysis, or a decision memo, the multi-perspective outline pays off in structure and completeness.
Key Takeaways
- STORM uses five expert personas — domain expert, practitioner, skeptic, newcomer, and adjacent expert — to generate a multi-perspective question set before any research or writing begins.
- The four-phase structure (persona generation → questioning → research → synthesis) must be maintained as separate steps; collapsing it into a single prompt loses most of the value.
- Implementing STORM as a Claude Code skill involves structured LLM calls at each phase, a retrieval mechanism, and a synthesis step that explicitly surfaces conflicts between perspectives.
- The most common failure modes are generating similar-sounding personas, collapsing the phases into a single prompt, ignoring contradictions in the synthesis phase, and using a weak model for persona generation.
- MindStudio’s Agent Skills Plugin can handle the retrieval and infrastructure layer, letting your Claude Code implementation stay focused on reasoning logic.
If you want to start building STORM-based research agents without managing infrastructure from scratch, MindStudio gives you the tools to wire it together quickly — including search, storage, and workflow execution out of the box. The free tier is a reasonable place to start experimenting.
