
What Is Milvus? The Open-Source Vector Database for AI Agent Memory

Milvus is a high-performance vector store that scales to billions of records. Learn why it's a top choice for RAG pipelines and AI agent memory systems.

MindStudio Team

Why Vector Databases Are Central to Modern AI Agents

AI agents are only as useful as their memory. Without a reliable way to store and retrieve relevant context, even the most sophisticated language model will hallucinate, repeat itself, or miss critical information that was available two steps ago.

That’s where vector databases come in — and Milvus is one of the most capable options available today. Built specifically for high-dimensional vector search, Milvus is an open-source vector database that can handle billions of records while returning results in milliseconds. If you’re building RAG pipelines, semantic search, or any kind of AI agent memory system, it’s worth understanding what Milvus does, how it works, and when it makes sense to use it.

This article covers the core concepts, architecture, practical use cases, and how tools like MindStudio make it easier to put a vector store like Milvus to work without infrastructure headaches.


What Milvus Actually Is

Milvus is an open-source vector database purpose-built for storing and searching high-dimensional embeddings. It was originally developed by Zilliz and released in 2019, and it’s now a graduated project under the LF AI & Data Foundation, part of the Linux Foundation, which also hosts projects like ONNX.

The core idea is simple: AI models like OpenAI’s text-embedding models or Sentence Transformers convert raw data (text, images, audio) into numerical vectors — long lists of floating-point numbers that represent semantic meaning. Milvus stores those vectors and lets you search them efficiently by similarity, not by exact match.
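
As a rough illustration of that conversion, here is what it looks like with the open-source all-MiniLM-L6-v2 model from Sentence Transformers; the model choice and the example sentence are arbitrary, and any embedding model works the same way:

from sentence_transformers import SentenceTransformer

# A small, commonly used open-source embedding model (384-dimensional output)
model = SentenceTransformer("all-MiniLM-L6-v2")

vector = model.encode("Refund requests must be filed within 30 days.")
print(vector.shape)  # (384,) -- a dense array of floats encoding the sentence's meaning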


This is fundamentally different from a traditional database. When you search in SQL, you’re looking for rows where a column equals a value. When you search in Milvus, you’re asking: which stored vectors are most similar to this query vector? The answer reveals semantically related content, even if the exact words or pixels are completely different.

The Problem Milvus Solves

Standard databases weren’t designed for this. Running cosine similarity or Euclidean distance across millions of high-dimensional vectors naively is brutally slow. Milvus solves this with purpose-built indexing algorithms that make approximate nearest neighbor (ANN) search fast and scalable.

It’s not the only vector database on the market — Pinecone, Weaviate, Qdrant, and Chroma are all in the same category — but Milvus stands out for its ability to scale to billions of vectors while keeping query latency low, and for being fully open-source with an active community.


How Milvus Works: Core Concepts

You don’t need to understand every internal detail to use Milvus effectively, but knowing the key concepts will help you make better architectural decisions.

Collections, Fields, and Schemas

Milvus organizes data into collections, which are roughly analogous to tables in a relational database. Each collection has a schema that defines its fields. At minimum, a collection needs:

  • A primary key field (auto-generated or user-defined)
  • At least one vector field to store your embeddings
  • Optional scalar fields (strings, integers, floats, booleans) for metadata

For example, a collection for a document store might have a vector field for the embedded text, plus scalar fields for document ID, source URL, creation date, and category.
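
As a rough sketch of what that looks like in code, the pymilvus client lets you declare the schema explicitly. The field names and sizes below are illustrative, not a recommendation:

from pymilvus import MilvusClient, DataType

client = MilvusClient("milvus_demo.db")

schema = MilvusClient.create_schema(auto_id=False)
schema.add_field(field_name="doc_id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=1536)
schema.add_field(field_name="source_url", datatype=DataType.VARCHAR, max_length=512)
schema.add_field(field_name="category", datatype=DataType.VARCHAR, max_length=64)
schema.add_field(field_name="created_at", datatype=DataType.INT64)  # e.g. a Unix timestamp

client.create_collection(collection_name="documents", schema=schema)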

Indexing

Raw vector search without an index scans every vector on every query — that’s fine for small datasets but completely unworkable at scale. Milvus supports several indexing strategies:

  • FLAT — Exact brute-force search. Perfect accuracy, slow on large datasets.
  • IVF_FLAT / IVF_SQ8 / IVF_PQ — Inverted file index variants. Good balance of speed and accuracy for medium-to-large datasets.
  • HNSW — Hierarchical Navigable Small World graphs. High recall, fast queries, higher memory usage. Very popular for production workloads.
  • DiskANN — Index stored on disk rather than RAM, enabling massive-scale datasets with lower memory costs.

For sparse vectors (used in hybrid search with BM25-style keyword weighting), Milvus supports sparse indexing as well.
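
A minimal sketch of requesting an HNSW index on the collection defined above, assuming the client and field names from the earlier schema example; the parameter values are illustrative starting points rather than tuned recommendations:

index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="HNSW",          # swap in IVF_FLAT, DISKANN, etc. without changing anything else
    metric_type="COSINE",
    params={"M": 16, "efConstruction": 200},  # graph fan-out and build-time search width
)
client.create_index(collection_name="documents", index_params=index_params)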

Distance Metrics

When comparing vectors, Milvus supports multiple similarity metrics:

  • Cosine similarity — Measures the angle between vectors. Best for text embeddings where direction matters more than magnitude.
  • L2 (Euclidean distance) — Measures straight-line distance in vector space.
  • Inner product (IP) — Efficient for normalized vectors where inner product equals cosine similarity.

Choosing the right metric depends on how your embeddings were trained. Most text embedding models are optimized for cosine or inner product.
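
A quick way to see the difference between the metrics is with plain NumPy; nothing here is Milvus-specific:

import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = 2 * a  # same direction as a, twice the magnitude

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # 1.0  -- direction is identical
l2 = np.linalg.norm(a - b)                                       # 3.0  -- magnitudes differ
ip = np.dot(a, b)                                                # 18.0 -- matches cosine only for unit-length vectors

That last line is why many pipelines normalize embeddings before insertion: once every vector has length 1, the cheaper inner-product metric returns the same ranking as cosine similarity.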

Hybrid Search and Filtering

One of Milvus’s more powerful features is hybrid search, which combines vector similarity search with scalar field filtering. Instead of retrieving the 10 most semantically similar documents from your entire corpus, you can retrieve the 10 most similar documents where category = “legal” AND date > 2024-01-01. This dramatically improves result relevance in real-world applications.
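
A hedged sketch of what that query looks like with the pymilvus client, assuming a collection with category and created_at scalar fields and an already-embedded query vector; the filter string uses Milvus's boolean expression syntax:

results = client.search(
    collection_name="documents",
    data=[query_embedding],
    limit=10,
    # Scalar filter applied alongside the vector search; created_at is a Unix timestamp here
    filter='category == "legal" and created_at > 1704067200',  # 2024-01-01 UTC
    output_fields=["source_url", "category"],
)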


Milvus Architecture: Built for Scale

Milvus 2.x was redesigned from scratch as a cloud-native, distributed system. The architecture separates storage from compute, which means you can scale each layer independently.

Key Components

  • Proxy — Handles client connections and request routing.
  • Query nodes — Execute vector search and scalar filtering.
  • Data nodes — Manage data ingestion and segment operations.
  • Index nodes — Build and maintain indexes.
  • Root coordinator, data coordinator, query coordinator — Manage cluster topology and task scheduling.

Under the hood, Milvus uses object storage (S3, MinIO, or Azure Blob) for persistent data and a message queue (Pulsar or Kafka) for write-ahead logging and streaming.

This separation of concerns means Milvus can handle high-throughput writes and low-latency reads simultaneously without either bottlenecking the other. It’s why Milvus is often the choice when teams expect to grow from millions to billions of vectors.

Deployment Options

You have several ways to run Milvus:

  • Milvus Lite — A lightweight Python package that runs in-process. Great for local development, notebooks, and prototyping. No Docker or Kubernetes required.
  • Milvus Standalone — A single-node deployment via Docker. Suitable for small production workloads or testing.
  • Milvus Distributed — The full distributed deployment on Kubernetes. Designed for production at scale.
  • Zilliz Cloud — A fully managed cloud service built on Milvus. Removes the operational overhead of running the database yourself.

For most teams just starting out with AI agent memory or RAG, Milvus Lite or Standalone is the right starting point.
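
Conveniently, the client code barely changes between these modes; only the connection target does. A rough sketch, where the server URI, cluster endpoint, and token are placeholders:

from pymilvus import MilvusClient

# Milvus Lite: a local file, no server process
client = MilvusClient("milvus_demo.db")

# Standalone or Distributed: point at your Milvus server
client = MilvusClient(uri="http://localhost:19530")

# Zilliz Cloud: managed endpoint plus an API key
client = MilvusClient(uri="https://YOUR-CLUSTER.zillizcloud.com", token="YOUR_API_KEY")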


Practical Use Cases for Milvus

Milvus is general-purpose infrastructure, but it shows up most often in a handful of scenarios.

Retrieval-Augmented Generation (RAG)

RAG is the most common use case for vector databases right now. The pattern works like this:

  1. You chunk your source documents (PDFs, web pages, support tickets, etc.) into smaller pieces.
  2. You embed each chunk using an embedding model and store the resulting vectors in Milvus.
  3. At query time, you embed the user’s question and retrieve the most semantically similar chunks.
  4. You pass those chunks as context to a language model, which generates a grounded, accurate response.

Milvus handles steps 2 and 3 — storing embeddings efficiently and returning relevant results fast. The quality of your RAG pipeline depends heavily on how well your vector store scales and how accurately it retrieves relevant chunks under load.
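
Here is a compressed sketch of the whole loop, assuming OpenAI's text-embedding-3-small model and a couple of pre-chunked strings; any embedding model and chunking strategy would work the same way:

from openai import OpenAI
from pymilvus import MilvusClient

openai_client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
milvus = MilvusClient("rag_demo.db")
milvus.create_collection(collection_name="chunks", dimension=1536)

def embed(texts):
    # text-embedding-3-small returns 1536-dimensional vectors
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

# Steps 1-2: chunk your documents (assumed done), embed the chunks, store the vectors
chunks = ["Refunds are issued within 30 days of purchase.", "Shipping is free over $50."]
milvus.insert(collection_name="chunks", data=[
    {"id": i, "vector": v, "text": t}
    for i, (v, t) in enumerate(zip(embed(chunks), chunks))
])

# Step 3: embed the question and retrieve the closest chunks
hits = milvus.search(collection_name="chunks", data=embed(["What is the refund window?"]),
                     limit=3, output_fields=["text"])

# Step 4: pass the retrieved text to your language model as grounding context
context = "\n".join(hit["entity"]["text"] for hit in hits[0])

From there, the context string is folded into the model prompt; everything past that point is ordinary prompt construction.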

For more on how RAG pipelines work end to end, see our breakdown of retrieval-augmented generation.

AI Agent Long-Term Memory

Autonomous AI agents need memory to function effectively across multiple sessions. Without persistence, every conversation starts from scratch. Milvus enables agents to:

  • Store past interactions as embedded memories
  • Retrieve relevant past context based on semantic similarity to the current task
  • Avoid re-asking for information the user already provided

This is particularly valuable for customer-facing agents, personal assistants, and any workflow where continuity matters. Vector search lets the agent retrieve what’s relevant, not just what’s recent — which is often a critical distinction.
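
Reusing the client and embed() helper from the RAG sketch above, and assuming an agent_memory collection created the same way, the memory pattern is only a few lines:

# Store a finished interaction as an embedded memory
memory = "User prefers invoices as PDF, sent to billing@acme.com"
milvus.insert(collection_name="agent_memory", data=[
    {"id": 101, "vector": embed([memory])[0], "text": memory}
])

# Later, before acting on a new request, recall whatever is semantically relevant
task = "Send this month's invoice"
recalled = milvus.search(collection_name="agent_memory", data=embed([task]),
                         limit=3, output_fields=["text"])
relevant_memories = [hit["entity"]["text"] for hit in recalled[0]]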

Learn more about how AI agent memory systems work in production.

Semantic Search

Traditional keyword search breaks down when users don’t know the exact words used in a document. Semantic search using vector embeddings finds conceptually related results even when the vocabulary is completely different. Milvus powers semantic search across:

  • Internal knowledge bases and intranets
  • E-commerce product catalogs
  • Legal and compliance document libraries
  • Customer support ticket systems

Recommendation Systems

Recommendation engines are fundamentally a similarity problem: given a user’s history or a product they liked, find the most similar other items. Milvus handles this at scale. Companies have used it to power real-time recommendations across hundreds of millions of items.

Multimodal Search

Milvus isn’t limited to text. You can store embeddings from image models, audio models, or multimodal models like CLIP, enabling cross-modal search — for example, finding images that match a text description, or finding similar products from a photo upload.
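
As a sketch of the text-to-image direction, here is roughly what that looks like using the CLIP checkpoint that ships with Sentence Transformers; the collection, field names, and file path are illustrative:

from PIL import Image
from sentence_transformers import SentenceTransformer

clip = SentenceTransformer("clip-ViT-B-32")  # encodes images and text into the same vector space

# Index: embed a product photo and store it
client.insert(collection_name="product_images", data=[
    {"id": 1, "vector": clip.encode(Image.open("red_sneaker.jpg")).tolist(), "image_path": "red_sneaker.jpg"}
])

# Query: embed a text description and search the stored image vectors
matches = client.search(
    collection_name="product_images",
    data=[clip.encode("red running shoes with white soles").tolist()],
    limit=5,
    output_fields=["image_path"],
)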


Milvus vs. Other Vector Databases

Milvus competes with several other options, each with different trade-offs.

Feature              | Milvus                  | Pinecone              | Weaviate              | Chroma
Open source          | ✅                      | ❌                    | ✅                    | ✅
Managed cloud option | ✅ (Zilliz)             | ✅                    | ✅                    | Limited
Scale (vectors)      | Billions                | Hundreds of millions  | Hundreds of millions  | Millions
Hybrid search        | ✅                      | Limited               | ✅                    | Limited
Self-hosted          | ✅                      | ❌                    | ✅                    | ✅
Best for             | Large-scale production  | Simplicity/managed    | Knowledge graphs      | Local/dev

Pinecone wins on simplicity — it’s fully managed with no infrastructure to think about, but you give up control and the cost scales quickly. Chroma is excellent for local development and quick prototypes. Weaviate offers strong built-in vectorization and schema management.

Milvus wins when you need to operate at very large scale, want full control over your infrastructure, need advanced indexing options, or expect to run workloads that would be cost-prohibitive on a fully managed SaaS vector database.


Getting Started with Milvus

The quickest way to try Milvus is through Milvus Lite, which runs directly in Python with no external services.

Install and Connect

pip install pymilvus

from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")

That single line spins up a local Milvus instance backed by a file. For production, you’d replace the file path with a connection string to your Milvus server or Zilliz Cloud endpoint.

Create a Collection

client.create_collection(
    collection_name="documents",
    dimension=1536  # OpenAI text-embedding-3-small output size
)

Insert Vectors

data = [
    {"id": 1, "vector": embedding_1, "text": "source text here"},
    {"id": 2, "vector": embedding_2, "text": "another document"},
]

client.insert(collection_name="documents", data=data)

Run a Similarity Search

# query_embedding is the user's question run through the same embedding model
results = client.search(
    collection_name="documents",
    data=[query_embedding],
    limit=5,
    output_fields=["text"]
)

That’s the core loop. In practice, you’d add metadata fields, configure indexes for larger collections, and integrate this into your agent’s retrieval step. But the fundamental pattern doesn’t change much regardless of scale.


Building AI Agents with Vector Memory on MindStudio

If you’re building AI agents and want to work with vector databases without managing all the infrastructure yourself, MindStudio is worth looking at.

MindStudio is a no-code platform for building and deploying AI agents. It supports 200+ AI models out of the box and includes 1,000+ integrations — including connections to vector stores and external databases that let you build RAG-powered agents without writing backend code.

The practical value for Milvus users: you can configure an agent in MindStudio that calls out to your Milvus instance (or a Zilliz Cloud endpoint) for retrieval, passes the results to a language model, and returns a grounded response — all wired together in a visual builder. The average agent build takes 15 minutes to an hour.

For teams that want agent memory without standing up their own Milvus deployment, MindStudio’s built-in data store handles vector storage natively, so you don’t necessarily need Milvus at all to get semantic retrieval working. But if you’re already running Milvus at scale and want to layer agent logic on top of it, MindStudio’s webhook and API endpoint agents can connect directly to your existing infrastructure.

You can try MindStudio free at mindstudio.ai. Paid plans start at $20/month.


For a broader look at how to wire up retrieval and generation in an agent workflow, see how to build AI agents with RAG on MindStudio.


Frequently Asked Questions

What is Milvus used for?

Milvus is used to store and search high-dimensional vector embeddings. The most common applications are RAG pipelines (where it stores embedded document chunks for retrieval), AI agent memory systems (where it stores past interactions for semantic recall), semantic search, and recommendation engines. It’s also used for multimodal search across images, audio, and text.

Is Milvus free to use?

Yes. Milvus is fully open-source under the Apache 2.0 license, which means you can run it yourself at no cost. Zilliz, the company behind Milvus, also offers Zilliz Cloud — a managed hosted version with a free tier for small workloads and paid plans for larger-scale use.

How does Milvus differ from a traditional database?

Traditional relational databases search by exact or range matches on structured fields. Milvus searches by vector similarity — meaning it can find the most semantically similar items in your dataset even when no exact match exists. This makes it fundamentally different from PostgreSQL, MySQL, or even document stores like MongoDB. (That said, some traditional databases now have vector extensions — pgvector for PostgreSQL is a popular option for smaller-scale workloads.)

How many vectors can Milvus handle?

Milvus is designed to scale to billions of vectors in its distributed deployment mode. The practical upper bound depends on your hardware, indexing strategy, and query patterns. Zilliz has published benchmarks showing Milvus handling over a billion 128-dimensional vectors with sub-second query latency on appropriate hardware configurations.

What embedding models work with Milvus?

Milvus is model-agnostic — it stores any dense or sparse vector, regardless of what model produced it. Popular choices include OpenAI’s text-embedding-3-small and text-embedding-3-large, Cohere’s embedding models, and open-source models like BGE, E5, and all-MiniLM via Sentence Transformers. The only constraint is that your stored vectors and query vectors must use the same model and have the same dimension.

What’s the difference between Milvus and Zilliz Cloud?

Milvus is the open-source software you run yourself. Zilliz Cloud is a fully managed service built on Milvus that removes the need to handle deployment, scaling, backups, and upgrades. Both support the same pymilvus SDK, so switching between them doesn’t require code changes — just a different connection string.


Key Takeaways

  • Milvus is an open-source vector database built for similarity search at scale, handling billions of vectors with low-latency queries.
  • It stores and indexes high-dimensional embeddings produced by AI models, enabling semantic search rather than exact keyword matching.
  • Core use cases include RAG pipelines, AI agent long-term memory, semantic search, and recommendation systems.
  • Milvus supports flexible deployment: local (Milvus Lite), self-hosted (Standalone and Distributed), or fully managed (Zilliz Cloud).
  • For teams building AI agents without deep infrastructure investment, platforms like MindStudio make it possible to combine vector retrieval, language model inference, and business tool integrations in a single no-code environment — with or without a dedicated Milvus deployment.

If you’re serious about building AI agents that remember context, retrieve relevant information reliably, and scale to real production workloads, a vector database like Milvus is one of the most important pieces of infrastructure to understand.

Presented by MindStudio
