Qdrant 101

Introduction
Qdrant is an open-source vector database designed for storing, searching, and managing high-dimensional vectors. It's particularly useful for AI applications like semantic search, recommendation systems, and RAG (Retrieval-Augmented Generation) implementations.
Why Qdrant?
High Performance: Written in Rust for optimal speed
Scalability: Handles millions of vectors efficiently
Rich Filtering: Complex metadata filtering capabilities
Multiple Metrics: Support for various similarity metrics
Easy Integration: REST API and multiple language clients
Key Terminology
Core Concepts
Vector: A high-dimensional numerical representation of data (text, images, etc.)
# Example vector (384 dimensions)
vector = [0.1, -0.2, 0.3, ..., 0.5]
Embedding: The process of converting raw data into vector representations
# Text to vector embedding
text = "Artificial intelligence is transforming healthcare"
# After embedding: [0.023, -0.156, 0.891, ...]
Point: A vector combined with an optional payload (metadata) and unique ID
point = {
    "id": 1,
    "vector": [0.1, 0.2, 0.3, ...],  # 1536-dim vector
    "payload": {
        "title": "AI in Medical Diagnosis",
        "category": "healthcare",
        "confidence": 0.95
    }
}
Collection: A named set of vectors with the same dimensionality and distance metric
# Collections are like tables in traditional databases
collection_name = "documents"
Payload: Metadata associated with a vector for filtering and additional information
payload = {
    "title": "AI in Healthcare",
    "author": "John Doe",
    "published_date": "2024-01-15",
    "tags": ["AI", "healthcare", "machine learning"]
}
Distance Metric: Method to measure similarity between vectors. Qdrant supports three main metrics; a short NumPy sketch comparing them follows this list.
Cosine Distance: Measures angle between vectors
Range: 0 (identical) to 2 (opposite)
Best for: Text embeddings, when magnitude doesn't matter
Use case: Document similarity, semantic search
Euclidean Distance: Measures straight-line distance in space
Range: 0 (identical) to ∞
Best for: When both direction and magnitude matter
Use case: Image features, spatial data
Dot Product: Measures alignment and magnitude
Range: -∞ to +∞ (higher = more similar)
Best for: When you want to consider vector magnitude
Use case: Recommendation systems with user preferences
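The trade-offs are easiest to see side by side. Here is a minimal NumPy sketch (the vectors are made up for illustration) that computes all three metrics on the same pair of vectors, where b points in the same direction as a but has twice the magnitude:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the magnitude

# Cosine distance: 1 - cos(angle); ignores magnitude, so a and b look identical
cosine_dist = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean distance: straight-line distance; magnitude matters, so it is > 0
euclidean_dist = np.linalg.norm(a - b)

# Dot product: grows with both alignment and magnitude
dot = np.dot(a, b)

print(f"Cosine distance:    {cosine_dist:.3f}")     # 0.000
print(f"Euclidean distance: {euclidean_dist:.3f}")  # 3.742
print(f"Dot product:        {dot:.3f}")             # 28.000

Note how cosine distance reports the two vectors as identical while the other metrics do not; that is exactly the distinction described in the list above.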
Advanced Concepts
Quantization: Technique to reduce memory usage by compressing vectors
Why: Cuts the memory footprint by roughly 4x (scalar) up to 32x (binary), enabling larger datasets
Trade-off: Slight accuracy loss for significant memory savings
Types: Scalar (INT8), Product (PQ), Binary
Indexing: Data structure optimization for faster search
HNSW: Hierarchical Navigable Small World graphs
Purpose: Trade memory for search speed
Configuration: Affects build time, memory usage, and search accuracy
Sharding: Distributing data across multiple nodes (a configuration sketch for sharding and replication follows this list)
Horizontal scaling: Split collection across multiple machines
Load balancing: Distribute query load
Fault tolerance: Replicas ensure availability
Replication: Creating copies for fault tolerance
Data safety: Multiple copies prevent data loss
Read scaling: Distribute read queries across replicas
Consistency: Strong or eventual consistency models
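In a distributed Qdrant deployment, both sharding and replication are set when a collection is created. A minimal sketch, assuming a multi-node cluster is already running and reusing the client and VectorParams from the setup section later in this guide; the shard and replica counts are purely illustrative:

# Illustrative values: 4 shards spread across the cluster, 2 copies of each
client.create_collection(
    collection_name="distributed_documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    shard_number=4,         # horizontal scaling: split data across nodes
    replication_factor=2    # fault tolerance: each shard is stored twice
)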
Installation & Setup
Docker Installation (Recommended)
Basic Setup
# Basic setup
docker run -p 6333:6333 qdrant/qdrant
# With persistent storage
docker run -p 6333:6333 -v $(pwd)/storage:/qdrant/storage qdrant/qdrant
Docker Compose for Production
# docker-compose.yml
version: '3.7'

services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_storage:/qdrant/storage
    environment:
      - QDRANT__SERVICE__HTTP_PORT=6333
      - QDRANT__STORAGE__STORAGE_PATH=/qdrant/storage
    restart: unless-stopped

volumes:
  qdrant_storage:
Python Client Installation
# Pin openai below 1.0: the examples in this guide use the legacy Embedding API
pip install qdrant-client "openai<1.0" python-dotenv
Basic Connection Setup
import os
import openai
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# Setup (uses the pre-1.0 openai SDK interface)
os.environ["OPENAI_API_KEY"] = "your-key-here"
openai.api_key = os.getenv("OPENAI_API_KEY")

# Connect to Qdrant
client = QdrantClient("localhost", port=6333)

# Generate an OpenAI embedding for a piece of text
def get_embedding(text):
    response = openai.Embedding.create(
        input=text,
        model="text-embedding-ada-002"
    )
    return response['data'][0]['embedding']

# Test
test_embedding = get_embedding("Hello world")
print(f"Embedding dimension: {len(test_embedding)}")  # Should be 1536
Core Concepts
Vector Similarity
Vector similarity is the foundation of semantic search. Let's explore how it works:
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Similar medical texts
text1 = "AI is revolutionizing medical diagnosis"
text2 = "Artificial intelligence transforms healthcare"
text3 = "The weather is sunny today"

vec1 = get_embedding(text1)
vec2 = get_embedding(text2)
vec3 = get_embedding(text3)

print(f"Medical similarity: {cosine_similarity(vec1, vec2):.3f}")  # High
print(f"Different topic: {cosine_similarity(vec1, vec3):.3f}")     # Low
Basic Operations
1. Create Collection
# Create collection for OpenAI embeddings
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,  # OpenAI ada-002 dimension
        distance=Distance.COSINE
    )
)
2. Insert Documents
from qdrant_client.models import PointStruct
import uuid

def insert_document(title, content, category):
    # Generate embedding
    text = f"{title}. {content}"
    vector = get_embedding(text)

    # Create point
    point = PointStruct(
        id=str(uuid.uuid4()),
        vector=vector,
        payload={
            "title": title,
            "content": content,
            "category": category,
            "word_count": len(content.split())
        }
    )

    # Insert
    client.upsert(
        collection_name="documents",
        points=[point]
    )
    return point.id

# Example usage
doc_id = insert_document(
    "AI in Healthcare",
    "Machine learning is transforming medical diagnosis...",
    "technology"
)
3. Advanced Search with OpenAI Embeddings
def search_documents(query, limit=5):
    # Generate query embedding
    query_vector = get_embedding(query)

    # Search
    results = client.search(
        collection_name="documents",
        query_vector=query_vector,
        limit=limit,
        with_payload=True
    )

    # Format results
    return [{
        "title": r.payload["title"],
        "score": r.score,
        "category": r.payload["category"]
    } for r in results]

# Search example
results = search_documents("artificial intelligence medicine")
for result in results:
    print(f"{result['title']} - Score: {result['score']:.3f}")
4. Filter Search
from qdrant_client.models import Filter, FieldCondition, MatchValue

def search_with_filter(query, category, limit=5):
    query_vector = get_embedding(query)

    category_filter = Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value=category)
            )
        ]
    )

    results = client.search(
        collection_name="documents",
        query_vector=query_vector,
        query_filter=category_filter,
        limit=limit
    )
    return results

# Search only in technology category
tech_results = search_with_filter("machine learning", "technology")
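Filtered search gets much faster when the filtered field has a payload index, since Qdrant can then narrow candidates without scanning every point. A small sketch; the field name matches the payload used above, and the keyword schema type is an assumption about your data:

from qdrant_client.models import PayloadSchemaType

# Index the "category" field so category filters avoid a full payload scan
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)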
Performance Optimization
Quantization
Why Quantization? Quantization reduces memory usage by representing vectors with lower precision. This is crucial when dealing with millions of vectors.
Memory Savings:
Scalar (INT8): General purpose, 75% memory reduction, 1-3% accuracy loss
Product: Maximum compression, 87-95% reduction, 5-15% accuracy loss
Binary: Extreme compression, 96% reduction, 20-40% accuracy loss
from qdrant_client import models

# Apply INT8 scalar quantization (recommended for most cases)
client.update_collection(
    collection_name="documents",
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,
            quantile=0.99,
            always_ram=True
        )
    )
)
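At query time you can ask Qdrant to rescore the quantized candidates against the original vectors, which recovers most of the lost accuracy for a small latency cost. A sketch against the same collection; query_vector is assumed to come from get_embedding as above:

results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    limit=5,
    search_params=models.SearchParams(
        quantization=models.QuantizationSearchParams(
            rescore=True,      # re-rank candidates using original vectors
            oversampling=2.0   # fetch 2x candidates before rescoring
        )
    )
)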
Indexing Optimization
from qdrant_client import models

# Optimize for search speed
client.update_collection(
    collection_name="documents",
    hnsw_config=models.HnswConfigDiff(
        m=32,              # Higher = better quality, more memory
        ef_construct=200,  # Higher = better quality, slower build
        full_scan_threshold=10000
    )
)
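m and ef_construct are build-time settings; the search-time counterpart is hnsw_ef, which can be raised per query to trade latency for recall. A minimal sketch, again assuming query_vector from get_embedding:

results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    limit=5,
    search_params=models.SearchParams(hnsw_ef=128)  # higher = better recall, slower
)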
Advanced Operations
Complex Filtering
from qdrant_client.models import Range, MatchAny

# Complex filter: technology OR healthcare, with substantial content
complex_filter = Filter(
    must=[
        FieldCondition(key="category", match=MatchAny(any=["technology", "healthcare"])),
        FieldCondition(key="word_count", range=Range(gte=100))
    ],
    must_not=[
        FieldCondition(key="status", match=MatchValue(value="draft"))
    ]
)

# query_vector: embedding of the search query, e.g. from get_embedding(...)
results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    query_filter=complex_filter,
    limit=10
)
Batch Operations
def batch_insert_documents(documents, batch_size=100):
    points = []
    for doc in documents:
        text = f"{doc['title']}. {doc['content']}"
        vector = get_embedding(text)
        points.append(PointStruct(
            id=str(uuid.uuid4()),
            vector=vector,
            payload=doc
        ))

    # Insert in batches
    for i in range(0, len(points), batch_size):
        batch = points[i:i + batch_size]
        client.upsert(collection_name="documents", points=batch)
        print(f"Inserted batch {i//batch_size + 1}")

# Usage
sample_docs = [
    {"title": "Doc 1", "content": "Content 1...", "category": "tech"},
    {"title": "Doc 2", "content": "Content 2...", "category": "health"}
]
batch_insert_documents(sample_docs)
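The loop above still makes one embedding API call per document. The pre-1.0 OpenAI Embedding endpoint also accepts a list of inputs, so a whole batch can be embedded in a single call; a sketch:

def embed_batch(texts):
    # One API call for many texts; results come back in input order
    response = openai.Embedding.create(
        input=texts,
        model="text-embedding-ada-002"
    )
    return [item['embedding'] for item in response['data']]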
Best Practices
1. Embedding Generation
class EmbeddingManager:
    def __init__(self):
        self.model = "text-embedding-ada-002"

    def preprocess_text(self, text):
        # Clean and normalize text
        text = text.strip()
        text = ' '.join(text.split())  # Normalize whitespace

        # Truncate if too long (8191 tokens max for ada-002)
        max_chars = 8191 * 4  # ~4 chars per token
        if len(text) > max_chars:
            text = text[:max_chars]
        return text

    def get_embedding(self, text):
        processed_text = self.preprocess_text(text)
        response = openai.Embedding.create(
            input=processed_text,
            model=self.model
        )
        return response['data'][0]['embedding']

    def optimize_for_search(self, title, content):
        # Weight title more heavily by repeating it
        return f"{title}. {title}. {content}"
2. Error Handling
import time
from functools import wraps

def retry_on_failure(max_retries=3):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_retries - 1:
                        raise e
                    wait_time = 2 ** attempt  # Exponential backoff
                    print(f"Attempt {attempt + 1} failed, retrying in {wait_time}s...")
                    time.sleep(wait_time)
        return wrapper
    return decorator

@retry_on_failure(max_retries=3)
def safe_upsert(collection_name, points):
    return client.upsert(collection_name=collection_name, points=points)
3. Collection Design
def create_optimized_collection(name, expected_size):
    if expected_size < 10000:
        # Small collection - no quantization needed
        config = VectorParams(size=1536, distance=Distance.COSINE)
        quantization = None
    elif expected_size < 1000000:
        # Medium collection - use scalar quantization
        config = VectorParams(size=1536, distance=Distance.COSINE)
        quantization = models.ScalarQuantization(
            scalar=models.ScalarQuantizationConfig(
                type=models.ScalarType.INT8,
                quantile=0.99
            )
        )
    else:
        # Large collection - aggressive optimization
        config = VectorParams(
            size=1536,
            distance=Distance.COSINE,
            hnsw_config=models.HnswConfigDiff(m=16, ef_construct=100)
        )
        quantization = models.ProductQuantization(
            product=models.ProductQuantizationConfig(
                compression=models.CompressionRatio.X8
            )
        )

    client.create_collection(
        collection_name=name,
        vectors_config=config,
        quantization_config=quantization
    )
Production Considerations
Health Monitoring
def health_check():
    try:
        collections = client.get_collections()
        for collection in collections.collections:
            info = client.get_collection(collection.name)
            print(f"Collection: {collection.name}")
            print(f"  Points: {info.points_count}")
            print(f"  Indexed: {info.indexed_vectors_count}")
            if info.vectors_count != info.indexed_vectors_count:
                print(f"  Warning: Indexing behind!")
        return True
    except Exception as e:
        print(f"Health check failed: {e}")
        return False
Backup Strategy
import json

def backup_collection(collection_name, backup_file):
    all_points = []
    offset = None

    # Page through the collection with scroll
    while True:
        points, next_offset = client.scroll(
            collection_name=collection_name,
            offset=offset,
            limit=1000,
            with_payload=True,
            with_vectors=True
        )
        all_points.extend([{
            "id": p.id,
            "vector": p.vector,
            "payload": p.payload
        } for p in points])
        if next_offset is None:
            break
        offset = next_offset

    with open(backup_file, 'w') as f:
        json.dump(all_points, f)
    print(f"Backed up {len(all_points)} points")