Milvus

Milvus is an open-source vector database built for AI applications, enabling efficient similarity search and analytics on embedding vectors.

Overview

  • Versions: 2.6.3, 2.5.19, 2.4.11
  • Cluster Support: ✅ Yes (Distributed architecture)
  • Use Cases: Vector databases, AI embeddings, semantic search, RAG systems
  • Features: Similarity search, indexing, distributed architecture, multiple index types

Key Features

  • Billion-Scale Vector Search: Handle massive vector datasets efficiently
  • Multiple Index Types: FLAT, IVF, HNSW, DiskANN, and more for different use cases
  • Hybrid Search: Combine vector similarity with attribute filtering
  • GPU Acceleration: Leverage GPUs for faster indexing and search
  • Dynamic Schema: Flexible schema with collections and partitions
  • Multiple Distance Metrics: L2, IP (Inner Product), Cosine similarity
  • Data Consistency: Tunable consistency levels
  • Scalar Filtering: Filter vectors by metadata attributes
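
Tunable consistency surfaces in pymilvus as a `consistency_level` argument on read operations. The sketch below builds the search arguments offline; the helper name and the `embedding` field are illustrative assumptions, not part of the Milvus API:

```python
# Consistency levels Milvus accepts, strongest to weakest
CONSISTENCY_LEVELS = ["Strong", "Session", "Bounded", "Eventually"]

def search_kwargs(vectors, level="Bounded"):
    """Build keyword arguments for Collection.search with a per-request
    consistency level ("Strong" reads the latest writes; "Eventually"
    trades freshness for lower latency)."""
    if level not in CONSISTENCY_LEVELS:
        raise ValueError(f"unknown consistency level: {level}")
    return {
        "data": vectors,
        "anns_field": "embedding",
        "param": {"metric_type": "L2", "params": {"nprobe": 10}},
        "limit": 5,
        "consistency_level": level,
    }

kwargs = search_kwargs([[0.1] * 768], level="Strong")
# With a live connection: collection.search(**kwargs)
```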

Deployment Modes

Milvus supports both single-node and cluster deployments:

Standalone (Single Node)

  • Best for development and testing
  • Lower cost, simpler setup
  • All components in one container
  • Suitable for smaller datasets (less than 1M vectors)

Cluster (Distributed)

  • Recommended for production workloads
  • High availability and horizontal scaling
  • Distributed architecture with etcd and MinIO
  • Separate nodes for query, data, and index
  • 3-10 data nodes configurable
  • Handles billions of vectors

Resource Tiers

Tier   | CPU | Memory | Disk  | Best For
-------|-----|--------|-------|----------------------------
Small  | 0.5 | 1GB    | 10GB  | Development, testing
Medium | 1   | 2GB    | 25GB  | Small production apps
Large  | 2   | 4GB    | 50GB  | Production workloads
XLarge | 4   | 8GB    | 100GB | Large-scale AI applications

Creating a Milvus Add-on

  1. Navigate to Add-ons → Create Add-on
  2. Select Milvus as the type
  3. Choose a version (2.6.3, 2.5.19, or 2.4.11)
  4. Select deployment mode:
    • Standalone: For development/testing
    • Cluster: For production (distributed architecture)
  5. Configure:
    • Label: Descriptive name (e.g., "Vector Database")
    • Description: Purpose and notes
    • Environment: Development or Production
    • Resource Tier: Based on your workload requirements
  6. Configure backups:
    • Schedule: Daily recommended for production
    • Retention: 7+ days for production
  7. Click Create Add-on

Connection Information

After deployment, connection details are available in the add-on details page and automatically injected into your apps via STRONGLY_SERVICES.

Connection Format

Host: milvus-host
Port: 19530 (gRPC)

Accessing Connection Details

import os
import json
from pymilvus import connections

# Parse STRONGLY_SERVICES
services = json.loads(os.environ.get('STRONGLY_SERVICES', '{}'))

# Get Milvus add-on connection
milvus_addon = services['addons']['addon-id']

# Connect
connections.connect(
    alias='default',
    host=milvus_addon['host'],
    port=milvus_addon['port'],
    user=milvus_addon.get('username', ''),
    password=milvus_addon.get('password', '')
)

# Verify connection
print("Connected to Milvus")

# Disconnect when done
connections.disconnect('default')

Core Concepts

Collections

Collections are similar to tables in relational databases, storing vectors and metadata.

from pymilvus import Collection, FieldSchema, CollectionSchema, DataType

# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=100)
]

schema = CollectionSchema(fields, description="Document embeddings")

# Create collection
collection = Collection(name="documents", schema=schema)

print(f"Collection created: {collection.name}")

Vectors and Fields

Vectors represent your embeddings, and fields store metadata.

# Insert data (column order matches the schema, minus the auto_id field)
data = [
    # embeddings (128-dimensional vectors)
    [[0.1] * 128, [0.2] * 128, [0.3] * 128],
    # text metadata
    ["Document 1", "Document 2", "Document 3"],
    # category metadata
    ["tech", "science", "tech"]
]

collection.insert(data)
collection.flush()

print(f"Inserted {collection.num_entities} entities")

Indexes

Indexes accelerate vector similarity search.

# Create IVF_FLAT index
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128}
}

collection.create_index(
    field_name="embedding",
    index_params=index_params
)

print("Index created")

Common Operations

Creating a Collection

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType

connections.connect(host='host', port=19530)

# Define schema with multiple field types
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
    FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=500),
    FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=5000),
    FieldSchema(name="timestamp", dtype=DataType.INT64),
    FieldSchema(name="score", dtype=DataType.FLOAT)
]

schema = CollectionSchema(
    fields,
    description="Article embeddings",
    enable_dynamic_field=True  # Allow fields not declared in the schema
)

collection = Collection(name="articles", schema=schema)

Inserting Vectors

import numpy as np

# Prepare data
num_entities = 1000
embeddings = np.random.random((num_entities, 768)).tolist()
ids = list(range(num_entities))
titles = [f"Article {i}" for i in range(num_entities)]
contents = [f"Content of article {i}" for i in range(num_entities)]
timestamps = [1234567890 + i for i in range(num_entities)]
scores = np.random.random(num_entities).tolist()

# Insert (column order must match the schema)
data = [ids, embeddings, titles, contents, timestamps, scores]
collection.insert(data)
collection.flush()

print(f"Total entities: {collection.num_entities}")

Creating Indexes

Different index types for different use cases:

# IVF_FLAT: Good balance of speed and accuracy
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 1024}
}

# HNSW: High accuracy, memory intensive
index_params = {
    "index_type": "HNSW",
    "metric_type": "L2",
    "params": {
        "M": 16,               # Max connections per node per layer
        "efConstruction": 200  # Search breadth during index build
    }
}

# IVF_PQ: Memory efficient, lower accuracy
index_params = {
    "index_type": "IVF_PQ",
    "metric_type": "L2",
    "params": {
        "nlist": 1024,
        "m": 8,      # Number of PQ sub-vectors
        "nbits": 8   # Bits per sub-vector
    }
}

collection.create_index(
    field_name="embedding",
    index_params=index_params
)

Searching Vectors

# Load collection into memory
collection.load()

# Prepare search vectors
search_vectors = [[0.1] * 768]

# Basic search
search_params = {
    "metric_type": "L2",
    "params": {"nprobe": 10}  # Number of clusters to search
}

results = collection.search(
    data=search_vectors,
    anns_field="embedding",
    param=search_params,
    limit=10,  # Top 10 results
    output_fields=["title", "content", "score"]
)

for hits in results:
    for hit in hits:
        print(f"ID: {hit.id}, Distance: {hit.distance}, Title: {hit.entity.get('title')}")

Hybrid Search (Vector + Scalar Filtering)

# Search with metadata filtering
expr = "score > 0.5 and timestamp > 1234567890"

results = collection.search(
    data=search_vectors,
    anns_field="embedding",
    param=search_params,
    limit=10,
    expr=expr,  # Filter expression
    output_fields=["title", "score", "timestamp"]
)

Querying by ID

# Query specific entities by ID
ids = [1, 5, 10, 15]
results = collection.query(
    expr=f"id in {ids}",
    output_fields=["id", "title", "content"]
)

for result in results:
    print(result)

Deleting Vectors

# Delete by expression
expr = "id in [1, 2, 3]"
collection.delete(expr)

# Delete by range
expr = "timestamp < 1234567890"
collection.delete(expr)

Distance Metrics

Milvus supports multiple distance metrics:

  • L2 (Euclidean): sqrt(Σ(x-y)²) - Smaller is more similar
  • IP (Inner Product): Σ(x·y) - Larger is more similar
  • COSINE: (x·y)/(||x||·||y||) - Larger is more similar

# L2 distance (Euclidean)
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128}
}

# Inner Product
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "IP",
    "params": {"nlist": 128}
}

# Cosine similarity
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 128}
}

Use Cases

Semantic Search

from sentence_transformers import SentenceTransformer

# Initialize embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode documents
documents = [
    "Machine learning is a subset of artificial intelligence",
    "Deep learning uses neural networks with multiple layers",
    "Python is a popular programming language"
]

embeddings = model.encode(documents)

# Insert into Milvus
data = [
    list(range(len(documents))),  # IDs
    embeddings.tolist(),          # Vectors
    documents                     # Text
]
collection.insert(data)
collection.flush()

# Search with query
query = "What is AI and machine learning?"
query_vector = model.encode([query])

collection.load()
results = collection.search(
    data=query_vector.tolist(),
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=3,
    output_fields=["text"]
)

for hits in results:
    for hit in hits:
        print(f"Distance: {hit.distance:.4f}, Text: {hit.entity.get('text')}")

Image Similarity Search

from PIL import Image
import torch
from torchvision import transforms, models

# Load pre-trained model
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# Remove classification layer to get embeddings
model = torch.nn.Sequential(*(list(model.children())[:-1]))

# Image preprocessing
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

def get_image_embedding(image_path):
    image = Image.open(image_path).convert('RGB')
    image_tensor = transform(image).unsqueeze(0)

    with torch.no_grad():
        embedding = model(image_tensor)

    return embedding.squeeze().numpy()

# Index images
image_paths = ["img1.jpg", "img2.jpg", "img3.jpg"]
embeddings = [get_image_embedding(path).tolist() for path in image_paths]

data = [
    list(range(len(image_paths))),
    embeddings,
    image_paths
]
collection.insert(data)

# Search for similar images
query_embedding = get_image_embedding("query.jpg")
results = collection.search(
    data=[query_embedding.tolist()],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=5,
    output_fields=["image_path"]
)

RAG (Retrieval-Augmented Generation)

from openai import OpenAI
from sentence_transformers import SentenceTransformer

# Initialize models
embedder = SentenceTransformer('all-MiniLM-L6-v2')
llm = OpenAI()

# Index knowledge base
knowledge_base = [
    "The Eiffel Tower is located in Paris, France.",
    "It was built in 1889 for the World's Fair.",
    "The tower is 330 meters tall.",
]

embeddings = embedder.encode(knowledge_base)
data = [list(range(len(knowledge_base))), embeddings.tolist(), knowledge_base]
collection.insert(data)
collection.flush()
collection.load()

# RAG query
def rag_query(question):
    # Retrieve relevant context
    query_vector = embedder.encode([question])
    results = collection.search(
        data=query_vector.tolist(),
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"nprobe": 10}},
        limit=3,
        output_fields=["text"]
    )

    # Extract context
    context = "\n".join([hit.entity.get('text') for hit in results[0]])

    # Generate answer with LLM
    response = llm.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer based on the following context:\n" + context},
            {"role": "user", "content": question}
        ]
    )

    return response.choices[0].message.content

# Usage
answer = rag_query("How tall is the Eiffel Tower?")
print(answer)

Partitions

Partitions divide a collection for better organization and query performance.

# Create partitions
collection.create_partition("2024")
collection.create_partition("2023")

# Insert into a specific partition
collection.insert(data, partition_name="2024")

# Search in a specific partition
results = collection.search(
    data=search_vectors,
    anns_field="embedding",
    param=search_params,
    limit=10,
    partition_names=["2024"]
)

# List partitions
partitions = collection.partitions
for partition in partitions:
    print(f"Partition: {partition.name}, Entities: {partition.num_entities}")

Backup & Restore

Milvus add-ons use milvus-backup for backups, creating compressed archives of vector data and metadata.

Backup Configuration

  • Tool: milvus-backup
  • Format: .tar.gz
  • Includes: Vector data + metadata
  • Storage: AWS S3 (s3://strongly-backups/backups/<addon-id>/)

Manual Backup

  1. Go to add-on details page
  2. Click Backup Now
  3. Monitor progress in job logs
  4. Backup saved as backup-YYYYMMDDHHMMSS.tar.gz

Scheduled Backups

Configure during add-on creation or in settings:

  • Daily backups: Recommended for production
  • Retention: 7-14 days minimum for production
  • Custom cron: For specific schedules
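
If the platform accepts standard five-field cron expressions (an assumption; check your add-on settings for the exact format), a daily backup at 02:00 would look like:

```
0 2 * * *    # minute hour day-of-month month day-of-week: every day at 02:00
```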

Restore Process

  1. Navigate to Backups tab
  2. Select backup from list
  3. Click Restore
  4. Confirm (add-on will stop temporarily)
  5. Data restored and add-on restarts

Data Loss

Restoring from backup replaces ALL current data. Create a current backup first if needed.

Performance Optimization

Index Selection

Choose index based on your use case:

Index Type | Speed   | Memory | Accuracy | Use Case
-----------|---------|--------|----------|----------------------------------------
FLAT       | Slow    | High   | 100%     | Small datasets, highest accuracy needed
IVF_FLAT   | Fast    | Medium | ~99%     | General purpose
IVF_PQ     | Fastest | Low    | ~95%     | Large datasets, memory constrained
HNSW       | Fast    | High   | ~99%     | High QPS, low latency
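
The trade-offs above can be condensed into a small helper. The thresholds below are illustrative assumptions for a sketch, not official Milvus recommendations:

```python
def pick_index_params(num_vectors, memory_constrained=False):
    # Heuristic mapping of dataset size and memory pressure to an index
    # config; tune nlist / M / efConstruction for your workload.
    if num_vectors < 100_000:
        # Brute-force search: exact results, fine at small scale
        return {"index_type": "FLAT", "metric_type": "L2", "params": {}}
    if memory_constrained:
        # Product quantization compresses vectors at some accuracy cost
        return {"index_type": "IVF_PQ", "metric_type": "L2",
                "params": {"nlist": 1024, "m": 8, "nbits": 8}}
    # Graph index: high recall and low latency, higher memory use
    return {"index_type": "HNSW", "metric_type": "L2",
            "params": {"M": 16, "efConstruction": 200}}

params = pick_index_params(5_000_000)
# collection.create_index(field_name="embedding", index_params=params)
```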

Search Parameters

Tune search parameters for performance:

# Lower nprobe = faster but less accurate
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}

# Higher nprobe = slower but more accurate
search_params = {"metric_type": "L2", "params": {"nprobe": 64}}

# HNSW search parameters
search_params = {"metric_type": "L2", "params": {"ef": 64}} # Higher ef = better accuracy

Batch Operations

Batch inserts and searches for better throughput:

# Batch insert (assumes all_data is a list of row dicts,
# e.g. {"id": 1, "embedding": [...], "title": "..."})
batch_size = 1000
for i in range(0, len(all_data), batch_size):
    batch = all_data[i:i + batch_size]
    collection.insert(batch)

# Batch search: pass many query vectors in one call
search_vectors = [[0.1] * 768 for _ in range(100)]
results = collection.search(
    data=search_vectors,
    anns_field="embedding",
    param=search_params,
    limit=10
)

Monitoring

Monitor your Milvus add-on through the Strongly platform:

  • CPU Usage: Track CPU utilization
  • Memory Usage: Monitor memory consumption (important for large indexes)
  • Disk Space: Watch storage usage
  • Query Latency: Search performance metrics
  • QPS (Queries Per Second): Throughput metrics

Collection Statistics

# Collection info
print(f"Entities: {collection.num_entities}")

# Index info
indexes = collection.indexes
for index in indexes:
    print(f"Index: {index.field_name}, Params: {index.params}")

# Partition info
for partition in collection.partitions:
    print(f"Partition: {partition.name}, Entities: {partition.num_entities}")

Best Practices

  1. Choose Right Index: Select index type based on dataset size and accuracy needs
  2. Batch Operations: Insert and search in batches for better performance
  3. Use Partitions: Organize data by time or category for faster queries
  4. Normalize Vectors: Normalize embeddings when using IP or COSINE metrics
  5. Monitor Memory: Large indexes require significant memory
  6. Tune Search Params: Balance nprobe/ef for speed vs accuracy
  7. Regular Backups: Enable daily backups for production
  8. Load Collections: Load collections to memory before searching
  9. Release Collections: Release unused collections to free memory
  10. Use Hybrid Search: Combine vector search with scalar filtering for better results
  11. Test Restore: Verify backup restoration works
  12. Scale Horizontally: Use cluster mode for large datasets
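
Practice 4 (normalizing vectors) can be done with NumPy before insert; a minimal sketch:

```python
import numpy as np

def l2_normalize(vectors):
    # Scale each row to unit length so that inner product equals
    # cosine similarity, giving IP and COSINE metrics consistent rankings
    vecs = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave zero vectors untouched
    return vecs / norms

embeddings = l2_normalize(np.random.random((1000, 768)))
# Every row now has norm 1.0; insert as usual:
# collection.insert([ids, embeddings.tolist(), ...])
```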

Troubleshooting

Connection Issues

from pymilvus import connections

try:
    connections.connect(host='host', port=19530)
    print("Connected successfully")
except Exception as e:
    print(f"Connection failed: {e}")

Collection Not Found

from pymilvus import utility

# List all collections
collections = utility.list_collections()
print(f"Available collections: {collections}")

# Check if collection exists
exists = utility.has_collection("collection_name")
print(f"Collection exists: {exists}")

Out of Memory

# Release collection from memory
collection.release()

# Drop unused index
collection.drop_index()

# Use more memory-efficient index (IVF_PQ)
# Reduce nlist parameter
# Use partitions to limit search scope

Support

For issues or questions:

  • Check add-on logs in the Strongly dashboard
  • Review Milvus official documentation
  • Contact Strongly support through the platform