Qdrant 101

Introduction

Qdrant is an open-source vector database designed for storing, searching, and managing high-dimensional vectors. It's particularly useful for AI applications like semantic search, recommendation systems, and RAG (Retrieval-Augmented Generation) implementations.

Why Qdrant?

  • High Performance: Written in Rust for optimal speed

  • Scalability: Handles millions of vectors efficiently

  • Rich Filtering: Complex metadata filtering capabilities

  • Multiple Metrics: Support for various similarity metrics

  • Easy Integration: REST API and multiple language clients

Key Terminologies

Core Concepts

Vector: A high-dimensional numerical representation of data (text, images, etc.)

# Example vector (384 dimensions)
vector = [0.1, -0.2, 0.3, ..., 0.5]

Embedding: The process of converting raw data into vector representations

# Text to vector embedding
text = "Artificial intelligence is transforming healthcare"
# After embedding: [0.023, -0.156, 0.891, ...]

Point: A vector combined with an optional payload (metadata) and unique ID

point = {
    "id": 1,
    "vector": [0.1, 0.2, 0.3, ...],  # 1536-dim vector
    "payload": {
        "title": "AI in Medical Diagnosis", 
        "category": "healthcare",
        "confidence": 0.95
    }
}

Collection: A named set of vectors with the same dimensionality and distance metric

# Collections are like tables in traditional databases
collection_name = "documents"

Payload: Metadata associated with a vector for filtering and additional information

payload = {
    "title": "AI in Healthcare",
    "author": "John Doe",
    "published_date": "2024-01-15",
    "tags": ["AI", "healthcare", "machine learning"]
}

Distance Metric: Method to measure similarity between vectors

  • Cosine Distance: Measures angle between vectors

    • Range: 0 (identical) to 2 (opposite)

    • Best for: Text embeddings, when magnitude doesn't matter

    • Use case: Document similarity, semantic search

  • Euclidean Distance: Measures straight-line distance in space

    • Range: 0 (identical) to ∞

    • Best for: When both direction and magnitude matter

    • Use case: Image features, spatial data

  • Dot Product: Measures alignment and magnitude

    • Range: -∞ to +∞ (higher = more similar)

    • Best for: When you want to consider vector magnitude

    • Use case: Recommendation systems with user preferences
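
To make these ranges concrete, here is a small NumPy sketch comparing the three metrics on two vectors that point in the same direction but differ in magnitude:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the magnitude

cosine_distance = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean_distance = np.linalg.norm(a - b)
dot_product = np.dot(a, b)

print(f"Cosine distance:    {cosine_distance:.3f}")    # 0.000 - direction is identical
print(f"Euclidean distance: {euclidean_distance:.3f}") # 3.742 - magnitude differs
print(f"Dot product:        {dot_product:.3f}")        # 28.000 - rewards magnitude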


Advanced Concepts

Quantization: Technique to reduce memory usage by compressing vectors

  • Why: Reduces memory footprint by 2-8x, enables larger datasets

  • Trade-off: Slight accuracy loss for significant memory savings

  • Types: Scalar (INT8), Product (PQ), Binary

Indexing: Data structure optimization for faster search

  • HNSW: Hierarchical Navigable Small World graphs

  • Purpose: Trade memory for search speed

  • Configuration: Affects build time, memory usage, and search accuracy

Sharding: Distributing data across multiple nodes

  • Horizontal scaling: Split collection across multiple machines

  • Load balancing: Distribute query load

  • Fault tolerance: Replicas ensure availability

Replication: Creating copies for fault tolerance

  • Data safety: Multiple copies prevent data loss

  • Read scaling: Distribute read queries across replicas

  • Consistency: Strong or eventual consistency models
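
Both sharding and replication are configured when a collection is created. A minimal sketch with the Python client (the shard and replica counts are illustrative and only take effect in a multi-node cluster):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient("localhost", port=6333)

client.create_collection(
    collection_name="documents_distributed",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    shard_number=4,        # split data across nodes for horizontal scaling
    replication_factor=2,  # keep two copies of each shard for fault tolerance
)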


Installation & Setup

Basic Setup

# Basic setup
docker run -p 6333:6333 qdrant/qdrant

# With persistent storage
docker run -p 6333:6333 -v $(pwd)/storage:/qdrant/storage qdrant/qdrant

Docker Compose for Production

# docker-compose.yml
version: '3.7'
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_storage:/qdrant/storage
    environment:
      - QDRANT__SERVICE__HTTP_PORT=6333
      - QDRANT__STORAGE__STORAGE_PATH=/qdrant/storage
    restart: unless-stopped

volumes:
  qdrant_storage:

Python Client Installation

pip install qdrant-client openai python-dotenv

Basic Connection Setup

import os
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# Setup (the OpenAI client reads OPENAI_API_KEY from the environment)
os.environ["OPENAI_API_KEY"] = "your-key-here"
openai_client = OpenAI()

# Connect to Qdrant
client = QdrantClient("localhost", port=6333)

# Test OpenAI embedding
def get_embedding(text):
    response = openai_client.embeddings.create(
        input=text,
        model="text-embedding-ada-002"
    )
    return response.data[0].embedding

# Test
test_embedding = get_embedding("Hello world")
print(f"Embedding dimension: {len(test_embedding)}")  # Should be 1536

Core Concepts

Vector Similarity

Vector similarity is the foundation of semantic search. Let's explore how it works:

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Similar medical texts
text1 = "AI is revolutionizing medical diagnosis"
text2 = "Artificial intelligence transforms healthcare"
text3 = "The weather is sunny today"

vec1 = get_embedding(text1)
vec2 = get_embedding(text2)
vec3 = get_embedding(text3)

print(f"Medical similarity: {cosine_similarity(vec1, vec2):.3f}")  # High
print(f"Different topic: {cosine_similarity(vec1, vec3):.3f}")    # Low

Basic Operations

1. Create Collection

# Create collection for OpenAI embeddings
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,  # OpenAI ada-002 dimension
        distance=Distance.COSINE
    )
)
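
Re-running setup code against an existing collection raises an error. Recent versions of the Python client expose collection_exists, which makes a simple guard:

# Only create the collection if it doesn't already exist (safe to re-run)
if not client.collection_exists("documents"):
    client.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
    )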

2. Insert Documents

from qdrant_client.models import PointStruct
import uuid

def insert_document(title, content, category):
    # Generate embedding
    text = f"{title}. {content}"
    vector = get_embedding(text)

    # Create point
    point = PointStruct(
        id=str(uuid.uuid4()),
        vector=vector,
        payload={
            "title": title,
            "content": content,
            "category": category,
            "word_count": len(content.split())
        }
    )

    # Insert
    client.upsert(
        collection_name="documents",
        points=[point]
    )
    return point.id

# Example usage
doc_id = insert_document(
    "AI in Healthcare", 
    "Machine learning is transforming medical diagnosis...",
    "technology"
)
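
To sanity-check the insert, fetch the point back by ID with the client's retrieve method:

# Confirm the upsert by reading the point back
stored = client.retrieve(
    collection_name="documents",
    ids=[doc_id],
    with_payload=True
)
print(stored[0].payload["title"])  # "AI in Healthcare"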

3. Advanced Search with OpenAI Embeddings

def search_documents(query, limit=5):
    # Generate query embedding
    query_vector = get_embedding(query)

    # Search
    results = client.search(
        collection_name="documents",
        query_vector=query_vector,
        limit=limit,
        with_payload=True
    )

    # Format results
    return [{
        "title": r.payload["title"],
        "score": r.score,
        "category": r.payload["category"]
    } for r in results]

# Search example
results = search_documents("artificial intelligence medicine")
for result in results:
    print(f"{result['title']} - Score: {result['score']:.3f}")

4. Filtered Search

from qdrant_client.models import Filter, FieldCondition, MatchValue

def search_with_filter(query, category, limit=5):
    query_vector = get_embedding(query)

    category_filter = Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value=category)
            )
        ]
    )

    results = client.search(
        collection_name="documents",
        query_vector=query_vector,
        query_filter=category_filter,
        limit=limit
    )

    return results

# Search only in technology category
tech_results = search_with_filter("machine learning", "technology")
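
On large collections, filtered search is considerably faster when the filtered field has a payload index. A minimal sketch:

# Index the "category" field so filtered queries don't scan every payload
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema="keyword"  # exact-match string field
)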

Performance Optimization

Quantization

Why Quantization? Quantization reduces memory usage by representing vectors with lower precision. This is crucial when dealing with millions of vectors.

Memory Savings:

  • Scalar (INT8): General purpose, 75% memory reduction, 1-3% accuracy loss

  • Product: Maximum compression, 87-95% reduction, 5-15% accuracy loss

  • Binary: Extreme compression, 96% reduction, 20-40% accuracy loss

from qdrant_client import models

# Apply INT8 quantization (recommended for most cases)
client.update_collection(
    collection_name="documents",
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,
            quantile=0.99,
            always_ram=True
        )
    )
)

Indexing Optimization

from qdrant_client import models

# Optimize for search speed
client.update_collection(
    collection_name="documents",
    hnsw_config=models.HnswConfigDiff(
        m=32,              # Higher = better quality, more memory
        ef_construct=200,  # Higher = better quality, slower build
        full_scan_threshold=10000
    )
)
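
These build-time settings have a query-time counterpart: hnsw_ef controls how much of the graph is explored per query. A sketch (query_vector as produced by get_embedding):

# Higher hnsw_ef = better recall at the cost of slower queries
query_vector = get_embedding("machine learning")
results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    search_params=models.SearchParams(hnsw_ef=128),
    limit=5
)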

Advanced Operations

Complex Filtering

from qdrant_client.models import Filter, FieldCondition, MatchValue, MatchAny, Range

# Complex filter: technology OR healthcare, with substantial content
complex_filter = Filter(
    must=[
        FieldCondition(key="category", match=MatchAny(any=["technology", "healthcare"])),
        FieldCondition(key="word_count", range=Range(gte=100))
    ],
    must_not=[
        FieldCondition(key="status", match=MatchValue(value="draft"))
    ]
)

query_vector = get_embedding("machine learning")  # example query for this snippet

results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    query_filter=complex_filter,
    limit=10
)

Batch Operations

def batch_insert_documents(documents, batch_size=100):
    points = []

    for doc in documents:
        text = f"{doc['title']}. {doc['content']}"
        vector = get_embedding(text)

        points.append(PointStruct(
            id=str(uuid.uuid4()),
            vector=vector,
            payload=doc
        ))

    # Insert in batches
    for i in range(0, len(points), batch_size):
        batch = points[i:i + batch_size]
        client.upsert(collection_name="documents", points=batch)
        print(f"Inserted batch {i//batch_size + 1}")

# Usage
sample_docs = [
    {"title": "Doc 1", "content": "Content 1...", "category": "tech"},
    {"title": "Doc 2", "content": "Content 2...", "category": "health"}
]
batch_insert_documents(sample_docs)
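
Embedding one document per request is the main bottleneck above. The OpenAI embeddings endpoint also accepts a list of inputs, so a batched variant (a sketch using openai_client from the setup section) embeds many texts per call:

def get_embeddings_batch(texts):
    # One API call for the whole batch instead of one per document
    response = openai_client.embeddings.create(
        input=texts,
        model="text-embedding-ada-002"
    )
    return [item.embedding for item in response.data]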

Best Practices

1. Embedding Generation

class EmbeddingManager:
    def __init__(self):
        self.model = "text-embedding-ada-002"

    def preprocess_text(self, text):
        # Clean and normalize text
        text = text.strip()
        text = ' '.join(text.split())  # Normalize whitespace

        # Truncate if too long (8191 tokens max for ada-002)
        max_chars = 8191 * 4  # ~4 chars per token
        if len(text) > max_chars:
            text = text[:max_chars]

        return text

    def get_embedding(self, text):
        processed_text = self.preprocess_text(text)

        response = openai_client.embeddings.create(
            input=processed_text,
            model=self.model
        )
        return response.data[0].embedding

    def optimize_for_search(self, title, content):
        # Weight title more heavily
        return f"{title}. {title}. {content}"
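
Usage:

manager = EmbeddingManager()
vector = manager.get_embedding(
    manager.optimize_for_search("AI in Healthcare", "Machine learning is transforming medical diagnosis...")
)
print(len(vector))  # 1536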

2. Error Handling

import time
from functools import wraps

def retry_on_failure(max_retries=3):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_retries - 1:
                        raise e
                    wait_time = 2 ** attempt
                    print(f"Attempt {attempt + 1} failed, retrying in {wait_time}s...")
                    time.sleep(wait_time)
        return wrapper
    return decorator

@retry_on_failure(max_retries=3)
def safe_upsert(collection_name, points):
    return client.upsert(collection_name=collection_name, points=points)

3. Collection Design

from qdrant_client import models
from qdrant_client.models import Distance, VectorParams

def create_optimized_collection(name, expected_size):
    if expected_size < 10000:
        # Small collection - no quantization needed
        config = VectorParams(size=1536, distance=Distance.COSINE)
        quantization = None
    elif expected_size < 1000000:
        # Medium collection - use scalar quantization
        config = VectorParams(size=1536, distance=Distance.COSINE)
        quantization = models.ScalarQuantization(
            scalar=models.ScalarQuantizationConfig(
                type=models.ScalarType.INT8,
                quantile=0.99
            )
        )
    else:
        # Large collection - aggressive optimization
        config = VectorParams(
            size=1536,
            distance=Distance.COSINE,
            hnsw_config=models.HnswConfigDiff(m=16, ef_construct=100)
        )
        quantization = models.ProductQuantization(
            product=models.ProductQuantizationConfig(
                compression=models.CompressionRatio.X8
            )
        )

    client.create_collection(
        collection_name=name,
        vectors_config=config,
        quantization_config=quantization
    )
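
For example:

# Thresholds are illustrative - tune them to your workload
create_optimized_collection("docs_small", expected_size=5000)
create_optimized_collection("docs_large", expected_size=5000000)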

Production Considerations

Health Monitoring

def health_check():
    try:
        collections = client.get_collections()

        for collection in collections.collections:
            info = client.get_collection(collection.name)
            print(f"Collection: {collection.name}")
            print(f"  Points: {info.points_count}")
            print(f"  Indexed: {info.indexed_vectors_count}")

            if info.vectors_count != info.indexed_vectors_count:
                print(f"  Warning: Indexing behind!")

        return True
    except Exception as e:
        print(f"Health check failed: {e}")
        return False

Backup Strategy

import json

def backup_collection(collection_name, backup_file):
    all_points = []
    offset = None

    while True:
        result = client.scroll(
            collection_name=collection_name,
            offset=offset,
            limit=1000,
            with_payload=True,
            with_vectors=True
        )

        points, next_offset = result
        all_points.extend([{
            "id": p.id,
            "vector": p.vector,
            "payload": p.payload
        } for p in points])

        if next_offset is None:
            break
        offset = next_offset

    with open(backup_file, 'w') as f:
        json.dump(all_points, f)

    print(f"Backed up {len(all_points)} points")
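
A matching restore helper (hypothetical; assumes the JSON format written above and the client and PointStruct from earlier sections):

def restore_collection(collection_name, backup_file, batch_size=100):
    # Reload points saved by backup_collection and upsert them in batches
    with open(backup_file) as f:
        all_points = json.load(f)

    points = [
        PointStruct(id=p["id"], vector=p["vector"], payload=p["payload"])
        for p in all_points
    ]

    for i in range(0, len(points), batch_size):
        client.upsert(collection_name=collection_name, points=points[i:i + batch_size])

    print(f"Restored {len(points)} points")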