Is RAG Dead? What AI Coding Agents Actually Use Instead of Vector Databases
Coding agents abandoned RAG for file search, but RAG still wins for large knowledge bases. Here's a nuanced look at when each approach is right.
Why the “RAG Is Dead” Argument Has a Point
Every few months, someone posts that RAG is dead. The context window got too large. The agents got too smart. Vector databases are overkill. And lately, there’s actual evidence behind the claim — at least for one category of AI system.
AI coding agents have largely moved away from traditional RAG (Retrieval-Augmented Generation) workflows. Claude Code doesn’t spin up a vector database before answering questions about your codebase. Aider doesn’t embed your Python files into Pinecone. GitHub Copilot doesn’t run semantic similarity search to figure out what functions are in a file.
But that doesn’t mean RAG is dead. It means it was being used for the wrong job.
Coding agents and document-search systems have fundamentally different retrieval needs. Understanding that distinction tells you more about modern AI architecture than any “X is dead” headline. This article explains what coding agents actually use instead of vector databases, why the switch happened, and where RAG still clearly wins.
What RAG Was Built to Solve
RAG emerged as a practical solution to a real constraint: LLMs have a knowledge cutoff and a finite context window. You couldn’t ask an early GPT model about your internal documentation or company-specific knowledge — that information simply wasn’t in its training data.
The fix was conceptually straightforward:
- Take your documents and split them into chunks
- Convert each chunk into a vector embedding
- Store the vectors in a database (Pinecone, Weaviate, Chroma, pgvector, etc.)
- At query time, embed the user’s question, find the most similar chunks, and inject them into the model’s context
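The four steps above can be sketched end to end. This is a toy illustration, not a production pipeline: the `embed` function below is a bag-of-words stand-in for a real embedding model (which would be an API call or a local model), and the chunk texts are invented for the example.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector. A real pipeline would
    # call a learned embedding model here; this keeps the example runnable.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    # The query-time step: embed the question, rank stored chunks by
    # similarity, and return the top k to inject into the model's context.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Refund policy: refunds are issued within 30 days of purchase.",
    "Our office is open Monday through Friday.",
]
print(retrieve("what is the refund policy", chunks))
```

Even with this crude similarity measure, the refund chunk wins because it shares the query's vocabulary — the same mechanism, scaled up with learned embeddings, is what makes fuzzy document search work.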
For unstructured text — knowledge bases, PDFs, support docs, research papers — this works well. When someone asks “what’s our refund policy?”, semantic similarity search surfaces the right paragraph even if the user’s phrasing doesn’t match the document’s exact words.
The problem came when developers started applying this same pattern to codebases. It was the right mechanism applied to the wrong problem.
Why Vector Search Is a Poor Fit for Code Navigation
Code is not prose. It has rigid, parseable structure that prose doesn’t — and the retrieval questions you face with code are fundamentally different from document search.
When a coding agent navigates a codebase, the questions it’s trying to answer look like this:
- Where is the AuthService class defined?
- What files call processPayment()?
- What does utils/helpers.py import?
- Where does this variable get mutated?
These aren’t semantic questions. They’re structural and exact. The answer to “where is AuthService defined?” isn’t the file that’s most semantically similar to the concept of authentication. It’s the file that literally contains class AuthService.
Vector embeddings are optimized for fuzzy matching — finding things that mean the same thing even when the words differ. For code navigation, you almost never want fuzzy matching. You want precision. A function definition either exists or it doesn’t.
The Index Freshness Problem
There’s a practical issue with vector-based code retrieval as well: code changes constantly.
Every time a developer edits a file, the embeddings representing that file become stale. Keeping a vector index current for an actively developed codebase requires continuous re-embedding — expensive and prone to lag. File system reads, by contrast, are always real-time.
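The staleness check itself is trivial to express, which makes the contrast stark: detecting stale embeddings is easy, but acting on it means re-chunking and re-embedding every flagged file. A minimal sketch (the file names and timestamps are invented for the demo):

```python
import os
import pathlib
import tempfile

def stale_files(root: str, indexed_at: float) -> list[str]:
    # Any file modified after the last embedding pass is stale: its vectors
    # no longer describe its contents, and it must be re-embedded before
    # similarity search over it can be trusted.
    stale = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > indexed_at:
                stale.append(path)
    return sorted(stale)

# Demo: build an "index" at t=2000, with one file edited afterwards.
root = tempfile.mkdtemp()
a = pathlib.Path(root, "auth.py")
a.write_text("class AuthService: ...")
b = pathlib.Path(root, "pay.py")
b.write_text("def process_payment(): ...")
os.utime(a, (1_000, 1_000))  # modified before the index was built
os.utime(b, (3_000, 3_000))  # modified after: its embeddings are stale
print([os.path.basename(p) for p in stale_files(root, indexed_at=2_000)])
```

A file system read has no equivalent of this problem: the read *is* the index.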
What AI Coding Agents Actually Use Instead
Modern coding agents have converged on a set of retrieval strategies that work much better for code than vector databases. Here’s what they actually do.
Exact-Match Search: Grep and Ripgrep
The most fundamental tool is exact text search — the same grep that Unix developers have used for decades.
When Claude Code wants to find where a function is called, it runs a grep across the repository. When it needs all files that import a specific module, it searches for the import statement. This is fast, always accurate, and requires zero indexing overhead.
Modern agents use tools like ripgrep for speed on large codebases, but the principle is the same: exact string matching, not semantic similarity. Claude Code is built around bash-style tool use — read_file, list_directory, search_files, and direct shell execution. No embeddings required.
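The semantics of that search tool fit in a few lines. This is a pure-Python stand-in for what an agent gets by shelling out to grep or ripgrep — the repo contents here are invented for the example:

```python
import re

def search_files(files: dict[str, str], pattern: str) -> list[tuple[str, int, str]]:
    # Minimal grep: regex match per line, returning (path, line number, line).
    # Real agents shell out to ripgrep for speed; the semantics are the same.
    rx = re.compile(pattern)
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((path, lineno, line))
    return hits

repo = {
    "auth/service.py": "class AuthService:\n    def login(self): ...\n",
    "billing/pay.py": "def process_payment():\n    ...\n",
}
print(search_files(repo, r"class AuthService"))
```

Note what's absent: no index to build, no embeddings to refresh, and the answer is exact — the definition either matches or it doesn't.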
File System Traversal
Coding agents rely heavily on understanding the structure of a codebase, not just its content.
Directory listing tools give agents a map of what exists: how files are organized, what naming conventions are in use, which directories contain what. A well-structured codebase communicates a lot through its organization alone — src/auth/login.py tells you something meaningful before you ever read it.
Knowing the file tree is often enough to answer basic navigation questions without reading any file content at all.
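A list_directory-style tool is little more than a recursive walk. A minimal sketch (the directory layout below is created just for the demo):

```python
import pathlib
import tempfile

def tree(root: pathlib.Path, prefix: str = "") -> list[str]:
    # Recursively list a directory, one indented entry per file or folder —
    # roughly the map an agent's directory-listing tool hands the model.
    lines = []
    for entry in sorted(root.iterdir()):
        lines.append(prefix + entry.name)
        if entry.is_dir():
            lines.extend(tree(entry, prefix + "  "))
    return lines

root = pathlib.Path(tempfile.mkdtemp())
(root / "src" / "auth").mkdir(parents=True)
(root / "src" / "auth" / "login.py").write_text("")
(root / "README.md").write_text("")
print("\n".join(tree(root)))
```

The output alone — src/auth/login.py exists — already answers "where does login live?" without opening a single file.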
AST Parsing and Repo Maps
The most sophisticated approach is AST (Abstract Syntax Tree) parsing, where tools like tree-sitter become valuable.
An AST parser doesn’t treat code as text. It understands code as a structured language and can identify:
- Every class definition and its methods
- Every function signature and its parameters
- Every import statement and its source
- Every variable declaration and its scope
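To make the idea concrete, here is a tiny symbol extractor using Python's built-in ast module as a stand-in for tree-sitter (which does the same job across many languages). The source snippet is invented for the example:

```python
import ast

source = """
class AuthService:
    def login(self, username, password): ...

def process_payment(amount): ...
"""

def extract_symbols(code: str) -> list[str]:
    # Walk the syntax tree and record class and function definitions with
    # their parameters — a tiny slice of what a repo map contains.
    symbols = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.ClassDef):
            symbols.append(f"class {node.name}")
        elif isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            symbols.append(f"def {node.name}({args})")
    return symbols

print(extract_symbols(source))
```

The extracted signatures are a fraction of the source's size, yet they tell the model exactly what exists and where — no similarity search required.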
Aider, one of the more technically sophisticated open-source coding agents, uses what it calls a repo map built on tree-sitter. Instead of embedding code into vectors, it parses every file to extract function signatures, class definitions, and symbol names. The result is a compact, structured summary of the entire codebase — enough for the LLM to navigate intelligently without reading every file in full.
Aider’s repo map also uses a PageRank-style algorithm to prioritize the files and symbols most relevant to the current task, keeping the context focused rather than overwhelming the model with everything at once.
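The ranking idea can be sketched with plain power iteration over a "this file references symbols defined in that file" graph. This is an illustration of the general PageRank shape, not Aider's actual implementation (which differs in its graph construction and personalization); the file names are invented:

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    # Power iteration: each file passes a share of its rank to the files it
    # references. Heavily referenced files accumulate rank, so their
    # definitions survive when the repo map is trimmed to a token budget.
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in graph}
        for node, outs in graph.items():
            if outs:
                share = damping * rank[node] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                for target in graph:  # dangling node: spread rank evenly
                    new[target] += damping * rank[node] / n
        rank = new
    return rank

refs = {
    "main.py": ["utils.py", "auth.py"],
    "auth.py": ["utils.py"],
    "utils.py": [],
}
ranks = pagerank(refs)
print(max(ranks, key=ranks.get))
```

utils.py ends up ranked highest because two files depend on it — so its signatures earn their place in the compact map even when most of the codebase gets summarized away.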
This is architecturally different from RAG in a meaningful way: it exploits structure, not semantic similarity.
The Context Window Shift
Perhaps the biggest factor reducing RAG’s role in code navigation is the growth of LLM context windows.
Claude 3.5 Sonnet handles 200,000 tokens. GPT-4o handles 128,000. Gemini 1.5 Pro reaches 1 million. A typical mid-sized codebase might contain 50,000–150,000 tokens of code. When the context window is large enough, you can load the entire codebase and let the model reason over all of it directly.
This is sometimes called context stuffing — not always elegant, but often effective for smaller codebases. For larger ones, agents combine targeted file reads (loading only the files likely to be relevant) with a high-level structural summary of everything else. The LLM gets a focused but accurate picture of the codebase without a retrieval layer between it and the information.
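The stuff-or-target decision is essentially a budget check. A rough sketch under two stated assumptions: ~4 characters per token (a common rule of thumb, not exact), and size-based selection standing in for the relevance-based selection real agents use:

```python
def plan_context(files: dict[str, str], budget_tokens: int) -> dict:
    # Estimate ~4 characters per token. If the whole codebase fits the
    # budget, stuff it all in; otherwise load what fits and mark the rest
    # for a structural summary. (Real agents choose files by relevance to
    # the task, not by size; this sketch only shows the split.)
    est = {path: max(1, len(text) // 4) for path, text in files.items()}
    if sum(est.values()) <= budget_tokens:
        return {"strategy": "stuff", "load": sorted(files)}
    load, used = [], 0
    for path in sorted(files, key=est.get):
        if used + est[path] <= budget_tokens:
            load.append(path)
            used += est[path]
    return {
        "strategy": "targeted",
        "load": load,
        "summarize": [p for p in files if p not in load],
    }

repo = {"auth.py": "x" * 400, "generated.py": "y" * 40_000}
print(plan_context(repo, budget_tokens=500))
```

With a 500-token budget the small file is loaded in full and the large one falls back to a summary; raise the budget past the total and the strategy flips to stuffing everything.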
To understand more about how context windows affect agent reasoning, this overview of how large language models handle context is useful background.
RAG vs. File Search: A Direct Comparison
Here’s how the two approaches compare across factors that matter for real implementations:
| Factor | RAG (Vector Search) | File Search / AST Parsing |
|---|---|---|
| Best for | Unstructured documents | Structured code |
| Search type | Semantic (fuzzy match) | Exact or structural |
| Setup complexity | High (embed, index, host) | Low (grep, tree-sitter) |
| Index freshness | Requires continuous re-indexing | Always current |
| Handles code structure | No | Yes |
| Natural language queries | Excellent | Limited |
| Latency | Higher (embed query + search + fetch) | Lower (direct reads) |
| Works with large documents | Yes — designed for this | Impractical for large prose |
| Cost | Embedding + storage | Minimal |
Neither approach is universally better. They solve different problems.
Where RAG Still Wins
Establishing that coding agents prefer file search over vector RAG doesn’t make RAG irrelevant. There are clear categories where it remains the right architecture.
Large, Unstructured Knowledge Bases
If you’re building a system that answers questions from a library of PDFs, product documentation, legal contracts, or customer support tickets, RAG is exactly the right approach. These documents don’t have the rigid structure that AST parsing can exploit. Semantic similarity search is the practical mechanism for finding relevant content.
This is what RAG was designed for, and it still handles it well. You can learn more about building RAG-based knowledge bases on the MindStudio blog.
Cross-Domain Search
When users need to search across a mix of sources — code, documentation, Jira tickets, Slack messages — structural parsing breaks down. The sources are too heterogeneous for a single structural approach.
RAG provides a uniform retrieval layer that handles this diversity. You embed everything into the same vector space and query across all of it regardless of content type.
Natural Language Queries Against Code
Here’s where it gets nuanced: even in coding contexts, embeddings help for specific tasks.
If a user asks “show me all the places where we validate user input” or “find code related to our billing flow,” that’s a semantic query. The user doesn’t know the exact function names. Embedding-based search can surface relevant code that grep would miss.
Some tools, including Cursor, use embeddings as a supplement to structural search rather than as the primary mechanism. The architectural question isn’t “RAG or no RAG” — it’s where in the stack semantic search belongs.
Multi-Tenant Applications
In SaaS systems where different users should only access their own data, RAG infrastructure (vector databases with metadata filtering) handles permission scoping cleanly. Scoping vector search by user ID or organization is well-supported by databases like Pinecone, Weaviate, and Qdrant.
File system traversal doesn’t map as naturally to this model.
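The scoping pattern is simple: filter on tenant metadata first, then rank by similarity, so another tenant's records are never candidates. A toy sketch with an invented bag-of-words `embed` standing in for a real embedding model and in-memory records standing in for a vector database:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words stand-in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

records = [
    {"org": "acme", "text": "acme refund policy: 30 days"},
    {"org": "globex", "text": "globex refund policy: 14 days"},
]

def search(query: str, org: str) -> str:
    # Metadata filter first, then similarity rank — only the tenant's own
    # records are ever scored, mirroring vector-DB metadata filtering.
    candidates = [r for r in records if r["org"] == org]
    q = embed(query)
    return max(candidates, key=lambda r: cosine(q, embed(r["text"])))["text"]

print(search("refund policy", "globex"))
```

The same query returns different answers per tenant because the filter runs before any similarity scoring — the isolation is structural, not probabilistic.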
How MindStudio Fits Into This
If you’re building AI-powered workflows or agents — not a coding agent, but one that handles documents, answers questions, or processes business data — the RAG vs. file search distinction matters for how you architect retrieval.
MindStudio’s visual workflow builder lets you construct retrieval pipelines without managing embedding infrastructure yourself. You can connect knowledge bases for document retrieval, build structured query workflows against databases and APIs, or combine both approaches in a single agent — matching the retrieval strategy to what the data actually is.
For developers building agentic systems, MindStudio’s Agent Skills Plugin (@mindstudio-ai/agent) gives coding agents like Claude Code access to 120+ typed capabilities as simple method calls: agent.searchGoogle(), agent.runWorkflow(), structured database lookups, and more. The plugin handles infrastructure concerns like rate limiting, retries, and auth — so the agent focuses on reasoning, not plumbing.
The platform also supports building multi-step AI workflows where retrieval is one component among many — combining knowledge search with external API calls, data transformations, and action steps in a single flow.
Choosing the right retrieval strategy is a per-use-case decision, not a one-time architectural commitment. You can try MindStudio free at mindstudio.ai.
Frequently Asked Questions
Is RAG actually dead in 2025?
No. RAG isn’t dead — it’s being used more precisely. The “RAG is dead” claim is mostly directed at code navigation, where file system search and AST parsing outperform vector retrieval. For document Q&A, knowledge management, and applications dealing with large volumes of unstructured text, RAG remains the standard approach and continues to improve with better chunking strategies, reranking, and hybrid retrieval techniques.
What retrieval method does Claude Code use?
Claude Code uses bash-style tool calling: read_file, list_directory, search_files, and direct shell execution (grep, find, cat). It doesn’t use a vector database or semantic embeddings for code navigation. It reads files directly, uses exact-match search, and leverages its 200K token context window to reason over the results. The RAG vs. agentic retrieval comparison covers this distinction in more detail.
Does Cursor use RAG or vector embeddings?
Cursor uses a combination of approaches. Its codebase indexing feature does use embeddings, but they function as a supplementary layer for natural language queries — not as the sole retrieval mechanism. The primary retrieval relies on tree-sitter-based parsing and file system search. Cursor’s architecture is probably the most accurate representation of the hybrid model: structural search as the foundation, semantic search as a fallback.
When should I use RAG over file search for an AI project?
Use RAG when you’re working with unstructured documents — PDFs, articles, support tickets — when users query with natural language and don’t know the exact terms to search for, or when you need to search across heterogeneous data sources in a unified way. Use file search and structural parsing when your data is code or has a rigid schema, when precision matters more than recall, and when your data changes frequently enough that re-indexing would be a constant overhead.
What is a repo map and how is it different from RAG?
A repo map, as used in tools like Aider, is a compact structural representation of a codebase built from AST parsing. Instead of embedding code into vector space for similarity search, it extracts symbols, function signatures, class definitions, and file relationships into a structured summary. The LLM uses this map to navigate the codebase intelligently without reading everything in full. Unlike RAG, it relies on structural understanding of code — not semantic similarity of text.
Can you combine RAG with file search in the same agent?
Yes, and this is often the most practical architecture for complex agents. A hybrid approach might use file search and AST parsing as the primary code navigation mechanism, while using RAG for searching related documentation, commit messages, or issue tracker content. The retrieval strategy is matched to the data type rather than applied uniformly. Building this kind of hybrid pipeline is one area where a platform like MindStudio reduces the implementation overhead significantly.
Key Takeaways
The “RAG is dead” conversation is useful when it’s specific and misleading when it’s sweeping. Here’s what actually matters:
- Coding agents moved away from vector RAG because code navigation requires structural precision, not semantic similarity. File search, grep, and AST parsing are more accurate and require less infrastructure for code-specific tasks.
- Context window growth changed the calculus. When you can fit a codebase into 200K tokens, retrieval becomes less about smart selection and more about targeted reads.
- RAG is not dead. For unstructured documents, large knowledge bases, and natural language search against prose content, it remains the right architecture.
- The best retrieval approach depends on data type. Code → structural parsing. Prose → semantic search. Mixed sources → hybrid.
- Modern production agents use retrieval strategies selectively, not as a fixed architectural identity. The answer to “RAG or file search?” is almost always “depends on what you’re retrieving.”
If you’re building agents that need to retrieve and reason across multiple data types without rebuilding infrastructure from scratch, MindStudio gives you the tools to get retrieval right — matching the approach to the data, not the other way around.