Milvus
Milvus is an open-source vector database built for AI applications, enabling efficient similarity search and analytics on embedding vectors.
Overview
- Versions: 2.6.3, 2.5.19, 2.4.11
- Cluster Support: ✅ Yes (Distributed architecture)
- Use Cases: Vector databases, AI embeddings, semantic search, RAG systems
- Features: Similarity search, indexing, distributed architecture, multiple index types
Key Features
- Billion-Scale Vector Search: Handle massive vector datasets efficiently
- Multiple Index Types: IVF variants, HNSW, FLAT, DISKANN, and more for different use cases
- Hybrid Search: Combine vector similarity with attribute filtering
- GPU Acceleration: Leverage GPUs for faster indexing and search
- Dynamic Schema: Flexible schema with collections and partitions
- Multiple Distance Metrics: L2, IP (Inner Product), Cosine similarity
- Time Travel: Query historical data states (deprecated in recent Milvus releases)
- Data Consistency: Tunable consistency levels (see the example after this list)
- Scalar Filtering: Filter vectors by metadata attributes
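Consistency can be set per collection and overridden per request. A minimal sketch, assuming the 128-dimensional "documents" collection created later on this page:
from pymilvus import Collection
# Collection-level default (levels include Strong, Bounded, Session, Eventually)
collection = Collection(name="documents", consistency_level="Bounded")
# Per-request override: Strong guarantees reads see all prior writes, at higher latency
results = collection.search(
    data=[[0.1] * 128],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=5,
    consistency_level="Strong"
)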
Deployment Modes
Milvus supports both single-node and cluster deployments:
Standalone (Single Node)
- Best for development and testing
- Lower cost, simpler setup
- All components in one container
- Suitable for smaller datasets (less than 1M vectors)
Cluster (Distributed)
- Recommended for production workloads
- High availability and horizontal scaling
- Distributed architecture with etcd and MinIO
- Separate nodes for query, data, and index
- Configurable from 3 to 10 data nodes
- Handles billions of vectors
Resource Tiers
| Tier | CPU (cores) | Memory | Disk | Best For |
|---|---|---|---|---|
| Small | 0.5 | 1GB | 10GB | Development, testing |
| Medium | 1 | 2GB | 25GB | Small production apps |
| Large | 2 | 4GB | 50GB | Production workloads |
| XLarge | 4 | 8GB | 100GB | Large-scale AI applications |
Creating a Milvus Add-on
- Navigate to Add-ons → Create Add-on
- Select Milvus as the type
- Choose a version (2.6.3, 2.5.19, or 2.4.11)
- Select deployment mode:
  - Standalone: For development/testing
  - Cluster: For production (distributed architecture)
- Configure:
  - Label: Descriptive name (e.g., "Vector Database")
  - Description: Purpose and notes
  - Environment: Development or Production
  - Resource Tier: Based on your workload requirements
- Configure backups:
  - Schedule: Daily recommended for production
  - Retention: 7+ days for production
- Click Create Add-on
Connection Information
After deployment, connection details are available in the add-on details page and automatically injected into your apps via STRONGLY_SERVICES.
Connection Format
Host: milvus-host
Port: 19530 (gRPC)
Accessing Connection Details
The same connection details can be read in Python, Node.js, or Go.
Python:
import os
import json
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType
# Parse STRONGLY_SERVICES
services = json.loads(os.environ.get('STRONGLY_SERVICES', '{}'))
# Get Milvus add-on connection
milvus_addon = services['addons']['addon-id']
# Connect
connections.connect(
    alias='default',
    host=milvus_addon['host'],
    port=milvus_addon['port'],
    user=milvus_addon.get('username', ''),
    password=milvus_addon.get('password', '')
)
# Verify connection
print("Connected to Milvus")
# Disconnect when done
connections.disconnect('default')
Node.js:
const { MilvusClient } = require('@zilliz/milvus2-sdk-node');
// Parse STRONGLY_SERVICES
const services = JSON.parse(process.env.STRONGLY_SERVICES || '{}');
const milvusAddon = services.addons['addon-id'];
// Connect
const client = new MilvusClient({
  address: `${milvusAddon.host}:${milvusAddon.port}`,
  username: milvusAddon.username || '',
  password: milvusAddon.password || ''
});
// Check connection (top-level await is unavailable in CommonJS, so wrap in an async IIFE)
(async () => {
  const health = await client.checkHealth();
  console.log('Milvus health:', health);
})();
Go:
package main
import (
    "context"
    "encoding/json"
    "fmt"
    "os"
    "github.com/milvus-io/milvus-sdk-go/v2/client"
)
type Services struct {
    Addons map[string]Addon `json:"addons"`
}
type Addon struct {
    Host     string `json:"host"`
    Port     int    `json:"port"`
    Username string `json:"username"`
    Password string `json:"password"`
}
func main() {
    var services Services
    json.Unmarshal([]byte(os.Getenv("STRONGLY_SERVICES")), &services)
    milvusAddon := services.Addons["addon-id"]
    ctx := context.Background()
    // Connect
    c, err := client.NewGrpcClient(
        ctx,
        fmt.Sprintf("%s:%d", milvusAddon.Host, milvusAddon.Port),
    )
    if err != nil {
        panic(err)
    }
    defer c.Close()
    fmt.Println("Connected to Milvus")
}
Core Concepts
Collections
Collections are similar to tables in relational databases, storing vectors and metadata.
from pymilvus import Collection, FieldSchema, CollectionSchema, DataType
# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=100)
]
schema = CollectionSchema(fields, description="Document embeddings")
# Create collection
collection = Collection(name="documents", schema=schema)
print(f"Collection created: {collection.name}")
Vectors and Fields
Vectors represent your embeddings, and fields store metadata.
# Insert data
data = [
    # embeddings (128-dimensional vectors)
    [[0.1] * 128, [0.2] * 128, [0.3] * 128],
    # text metadata
    ["Document 1", "Document 2", "Document 3"],
    # category metadata
    ["tech", "science", "tech"]
]
collection.insert(data)
collection.flush()
print(f"Inserted {collection.num_entities} entities")
Indexes
Indexes accelerate vector similarity search.
# Create IVF_FLAT index
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128}
}
collection.create_index(
    field_name="embedding",
    index_params=index_params
)
print("Index created")
Common Operations
Creating a Collection
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType
connections.connect(host='host', port=19530)
# Define schema with multiple field types
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
    FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=500),
    FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=5000),
    FieldSchema(name="timestamp", dtype=DataType.INT64),
    FieldSchema(name="score", dtype=DataType.FLOAT)
]
schema = CollectionSchema(
    fields,
    description="Article embeddings",
    enable_dynamic_field=True  # Allow dynamic fields
)
collection = Collection(name="articles", schema=schema)
Inserting Vectors
# Prepare data
import numpy as np
num_entities = 1000
embeddings = np.random.random((num_entities, 768)).tolist()
ids = list(range(num_entities))
titles = [f"Article {i}" for i in range(num_entities)]
contents = [f"Content of article {i}" for i in range(num_entities)]
timestamps = [1234567890 + i for i in range(num_entities)]
scores = np.random.random(num_entities).tolist()
# Insert
data = [ids, embeddings, titles, contents, timestamps, scores]
collection.insert(data)
collection.flush()
print(f"Total entities: {collection.num_entities}")
Creating Indexes
Different index types for different use cases:
# IVF_FLAT: Good balance of speed and accuracy
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 1024}
}
# HNSW: High accuracy, memory intensive
index_params = {
    "index_type": "HNSW",
    "metric_type": "L2",
    "params": {
        "M": 16,               # Max connections per layer
        "efConstruction": 200  # Search quality during build
    }
}
# IVF_PQ: Memory efficient, lower accuracy
index_params = {
    "index_type": "IVF_PQ",
    "metric_type": "L2",
    "params": {
        "nlist": 1024,
        "m": 8,      # PQ compression factor
        "nbits": 8
    }
}
collection.create_index(
    field_name="embedding",
    index_params=index_params
)
Searching Vectors
# Load collection to memory
collection.load()
# Prepare search vectors
search_vectors = [[0.1] * 768]
# Basic search
search_params = {
    "metric_type": "L2",
    "params": {"nprobe": 10}  # Number of clusters to search
}
results = collection.search(
    data=search_vectors,
    anns_field="embedding",
    param=search_params,
    limit=10,  # Top 10 results
    output_fields=["title", "content", "score"]
)
for hits in results:
    for hit in hits:
        print(f"ID: {hit.id}, Distance: {hit.distance}, Title: {hit.entity.get('title')}")
Hybrid Search (Vector + Scalar Filtering)
# Search with metadata filtering
expr = "score > 0.5 and timestamp > 1234567890"
results = collection.search(
    data=search_vectors,
    anns_field="embedding",
    param=search_params,
    limit=10,
    expr=expr,  # Filter expression
    output_fields=["title", "score", "timestamp"]
)
Querying by ID
# Query specific entities by ID
ids = [1, 5, 10, 15]
results = collection.query(
    expr=f"id in {ids}",
    output_fields=["id", "title", "content"]
)
for result in results:
    print(result)
Deleting Vectors
# Delete by expression
expr = "id in [1, 2, 3]"
collection.delete(expr)
# Delete by range
expr = "timestamp < 1234567890"
collection.delete(expr)
Distance Metrics
Milvus supports multiple distance metrics:
- L2 (Euclidean): sqrt(Σ(xᵢ - yᵢ)²), smaller is more similar
- IP (Inner Product): Σ(xᵢ · yᵢ), larger is more similar
- COSINE: (x · y) / (‖x‖ · ‖y‖), larger is more similar (Milvus returns cosine similarity, not distance)
# L2 distance (Euclidean)
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128}
}
# Inner Product
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "IP",
    "params": {"nlist": 128}
}
# Cosine similarity
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 128}
}
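For intuition, the three metrics can be computed by hand. A quick numpy sketch with illustrative values:
import numpy as np
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.0, 2.0])
l2 = np.sqrt(np.sum((x - y) ** 2))                  # smaller = more similar
ip = float(np.dot(x, y))                            # larger = more similar
cos = ip / (np.linalg.norm(x) * np.linalg.norm(y))  # larger = more similar
print(f"L2={l2:.4f}, IP={ip:.4f}, COSINE={cos:.4f}")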
Use Cases
Semantic Search
from sentence_transformers import SentenceTransformer
# Initialize embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')
# Encode documents
documents = [
    "Machine learning is a subset of artificial intelligence",
    "Deep learning uses neural networks with multiple layers",
    "Python is a popular programming language"
]
embeddings = model.encode(documents)
# Insert into Milvus
data = [
    list(range(len(documents))),  # IDs
    embeddings.tolist(),          # Vectors
    documents                     # Text
]
collection.insert(data)
collection.flush()
# Search with query
query = "What is AI and machine learning?"
query_vector = model.encode([query])
collection.load()
results = collection.search(
    data=query_vector.tolist(),
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=3,
    output_fields=["text"]
)
for hits in results:
    for hit in hits:
        print(f"Distance: {hit.distance:.4f}, Text: {hit.entity.get('text')}")
Image Similarity Search
from PIL import Image
import torch
from torchvision import transforms, models
# Load pre-trained model
model = models.resnet50(pretrained=True)
model.eval()
# Remove classification layer to get embeddings
model = torch.nn.Sequential(*(list(model.children())[:-1]))
# Image preprocessing
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
def get_image_embedding(image_path):
    image = Image.open(image_path).convert('RGB')
    image_tensor = transform(image).unsqueeze(0)
    with torch.no_grad():
        embedding = model(image_tensor)
    return embedding.squeeze().numpy()
# Index images
image_paths = ["img1.jpg", "img2.jpg", "img3.jpg"]
embeddings = [get_image_embedding(path) for path in image_paths]
data = [
    list(range(len(image_paths))),
    embeddings,
    image_paths
]
collection.insert(data)
# Search for similar images
query_embedding = get_image_embedding("query.jpg")
results = collection.search(
    data=[query_embedding.tolist()],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=5,
    output_fields=["image_path"]
)
RAG (Retrieval-Augmented Generation)
from openai import OpenAI
from sentence_transformers import SentenceTransformer
# Initialize models
embedder = SentenceTransformer('all-MiniLM-L6-v2')
llm = OpenAI()
# Index knowledge base
knowledge_base = [
    "The Eiffel Tower is located in Paris, France.",
    "It was built in 1889 for the World's Fair.",
    "The tower is 330 meters tall.",
]
embeddings = embedder.encode(knowledge_base)
data = [list(range(len(knowledge_base))), embeddings.tolist(), knowledge_base]
collection.insert(data)
collection.flush()
collection.load()
# RAG query
def rag_query(question):
    # Retrieve relevant context
    query_vector = embedder.encode([question])
    results = collection.search(
        data=query_vector.tolist(),
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"nprobe": 10}},
        limit=3,
        output_fields=["text"]
    )
    # Extract context
    context = "\n".join([hit.entity.get('text') for hit in results[0]])
    # Generate answer with LLM
    response = llm.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer based on the following context:\n" + context},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content
# Usage
answer = rag_query("How tall is the Eiffel Tower?")
print(answer)
Partitions
Partitions divide a collection for better organization and query performance.
# Create partitions
collection.create_partition("2024")
collection.create_partition("2023")
# Insert into specific partition
collection.insert(data, partition_name="2024")
# Search in specific partition
results = collection.search(
    data=search_vectors,
    anns_field="embedding",
    param=search_params,
    limit=10,
    partition_names=["2024"]
)
# List partitions
partitions = collection.partitions
for partition in partitions:
    print(f"Partition: {partition.name}, Entities: {partition.num_entities}")
Backup & Restore
Milvus add-ons use milvus-backup for backups, creating compressed archives of vector data and metadata.
Backup Configuration
- Tool: milvus-backup
- Format: .tar.gz
- Includes: Vector data + metadata
- Storage: AWS S3 (s3://strongly-backups/backups/<addon-id>/)
Manual Backup
- Go to add-on details page
- Click Backup Now
- Monitor progress in job logs
- Backup saved as backup-YYYYMMDDHHMMSS.tar.gz
Scheduled Backups
Configure during add-on creation or in settings:
- Daily backups: Recommended for production
- Retention: 7-14 days minimum for production
- Custom cron: For specific schedules (see the example below)
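A custom schedule uses standard five-field cron syntax. An illustrative entry (not a platform default):
# minute hour day-of-month month day-of-week
0 2 * * *  # every day at 02:00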
Restore Process
- Navigate to Backups tab
- Select backup from list
- Click Restore
- Confirm (add-on will stop temporarily)
- Data restored and add-on restarts
Restoring from a backup replaces ALL current data. If you may still need the existing data, create a fresh backup first.
Performance Optimization
Index Selection
Choose index based on your use case:
| Index Type | Speed | Memory | Accuracy | Use Case |
|---|---|---|---|---|
| FLAT | Slow | High | 100% | Small datasets, highest accuracy needed |
| IVF_FLAT | Fast | Medium | ~99% | General purpose |
| IVF_PQ | Fastest | Low | ~95% | Large datasets, memory constrained |
| HNSW | Fast | High | ~99% | High QPS, low latency |
Search Parameters
Tune search parameters for performance:
# Lower nprobe = faster but less accurate
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
# Higher nprobe = slower but more accurate
search_params = {"metric_type": "L2", "params": {"nprobe": 64}}
# HNSW search parameters
search_params = {"metric_type": "L2", "params": {"ef": 64}} # Higher ef = better accuracy
Batch Operations
Batch inserts and searches for better throughput:
# Batch insert
batch_size = 1000
for i in range(0, len(all_data), batch_size):
    batch = all_data[i:i + batch_size]
    collection.insert(batch)
# Batch search
search_vectors = [[0.1] * 768 for _ in range(100)]
results = collection.search(
    data=search_vectors,
    anns_field="embedding",
    param=search_params,
    limit=10
)
Monitoring
Monitor your Milvus add-on through the Strongly platform:
- CPU Usage: Track CPU utilization
- Memory Usage: Monitor memory consumption (important for large indexes)
- Disk Space: Watch storage usage
- Query Latency: Search performance metrics
- QPS (Queries Per Second): Throughput metrics
Collection Statistics
# Collection info
print(f"Entities: {collection.num_entities}")
# Index info
indexes = collection.indexes
for index in indexes:
    print(f"Index: {index.field_name}, Type: {index.params}")
# Partition info
for partition in collection.partitions:
    print(f"Partition: {partition.name}, Entities: {partition.num_entities}")
Best Practices
- Choose Right Index: Select index type based on dataset size and accuracy needs
- Batch Operations: Insert and search in batches for better performance
- Use Partitions: Organize data by time or category for faster queries
- Normalize Vectors: Normalize embeddings when using the IP metric (COSINE divides by vector norms internally); see the sketch after this list
- Monitor Memory: Large indexes require significant memory
- Tune Search Params: Balance nprobe/ef for speed vs accuracy
- Regular Backups: Enable daily backups for production
- Load Collections: Load collections to memory before searching
- Release Collections: Release unused collections to free memory
- Use Hybrid Search: Combine vector search with scalar filtering for better results
- Test Restore: Verify backup restoration works
- Scale Horizontally: Use cluster mode for large datasets
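A minimal sketch of the normalization point above, assuming float32 embeddings destined for an IP index (the normalize helper is illustrative, not a pymilvus API):
import numpy as np
def normalize(vectors):
    # Scale each row to unit length so inner product ranks like cosine similarity
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)
embeddings = normalize(np.random.random((1000, 768)).astype("float32"))
# Apply the same normalization to query vectors before searching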
Troubleshooting
Connection Issues
from pymilvus import connections
try:
    connections.connect(host='host', port=19530)
    print("Connected successfully")
except Exception as e:
    print(f"Connection failed: {e}")
Collection Not Found
from pymilvus import utility
# List all collections
collections = utility.list_collections()
print(f"Available collections: {collections}")
# Check if collection exists
exists = utility.has_collection("collection_name")
print(f"Collection exists: {exists}")
Out of Memory
# Release collection from memory
collection.release()
# Drop unused index
collection.drop_index()
# Use more memory-efficient index (IVF_PQ)
# Reduce nlist parameter
# Use partitions to limit search scope
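The comments above translate into code roughly as follows, a sketch with illustrative parameter values (assumes a 768-dimensional embedding field; m must divide the dimension):
# Rebuild with a memory-leaner index
collection.release()
collection.drop_index()
collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "IVF_PQ",  # product quantization compresses vectors in memory
        "metric_type": "L2",
        "params": {"nlist": 512, "m": 8, "nbits": 8}
    }
)
collection.load()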
Support
For issues or questions:
- Check add-on logs in the Strongly dashboard
- Review Milvus official documentation
- Contact Strongly support through the platform