
Ultimate RAG with Qdrant

Build production RAG systems using Qdrant, a high-performance vector database written in Rust — with rich filtering, horizontal scaling, and on-premise deployment options.

When to Use

Choose Qdrant when:

  • Need high-performance vector search with rich metadata filtering
  • Want on-premise deployment with full data control
  • Require horizontal scaling with sharding and replication
  • Building production systems that need both cloud and self-hosted options
  • Need advanced filtering (geo, range, nested objects)

Consider alternatives when:

  • Want fully managed with zero ops → Pinecone
  • Pure vector search at billion scale → FAISS
  • Already using Postgres → pgvector
  • Multi-modal with GraphQL → Weaviate

Quick Start

Installation

```bash
# Python client
pip install qdrant-client

# Run Qdrant locally (Docker)
docker run -p 6333:6333 qdrant/qdrant
```

Create Collection and Insert

```python
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct

client = QdrantClient("http://localhost:6333")

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Insert vectors with payload (metadata)
points = [
    PointStruct(
        id=1,
        vector=embedding_vector,
        payload={
            "source": "handbook.pdf",
            "page": 12,
            "section": "refund-policy",
            "text": "Refunds are processed within 5-7 business days...",
            "tags": ["policy", "refunds"],
            "date": "2024-01-15",
        },
    ),
    # ... more points
]

client.upsert(collection_name="documents", points=points)
```

Search with Filtering

```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    limit=5,
    query_filter=Filter(
        must=[
            FieldCondition(key="source", match=MatchValue(value="handbook.pdf")),
            FieldCondition(key="page", range=Range(gte=10, lte=50)),
        ]
    ),
)

for point in results:
    print(f"Score: {point.score:.4f}")
    print(f"Text: {point.payload['text']}")
```

Core Concepts

Payload Filtering

Qdrant supports rich filtering on payload (metadata) fields:

| Filter Type | Example | Use Case |
|---|---|---|
| Match | `MatchValue(value="finance")` | Exact match |
| Range | `Range(gte=10, lte=50)` | Numeric ranges |
| Geo | `GeoBoundingBox(...)` | Location-based |
| Full-text | `MatchText(text="refund")` | Keyword in text |
| Nested | `Nested(key="tags", ...)` | Array/object fields |
| Is Empty | `IsEmptyCondition(is_empty=...)` | Null check |

Index Optimization

```python
# Create payload indexes for filtered fields
client.create_payload_index(
    collection_name="documents",
    field_name="source",
    field_schema="keyword",
)
client.create_payload_index(
    collection_name="documents",
    field_name="date",
    field_schema="datetime",
)

# HNSW parameters
client.update_collection(
    collection_name="documents",
    hnsw_config={"m": 16, "ef_construct": 200},
)
```

Horizontal Scaling

```yaml
# Qdrant cluster configuration
storage:
  storage_path: /data/qdrant

cluster:
  enabled: true
  consensus:
    tick_period_ms: 100

collections:
  documents:
    shard_number: 6         # Distribute across nodes
    replication_factor: 2   # Data redundancy
```

Configuration

| Parameter | Default | Description |
|---|---|---|
| `size` | (required) | Vector dimension |
| `distance` | `COSINE` | Similarity metric (Cosine, Euclid, Dot) |
| `hnsw_m` | 16 | HNSW graph connectivity |
| `hnsw_ef_construct` | 100 | HNSW build quality |
| `shard_number` | 1 | Data partitions |
| `replication_factor` | 1 | Data copies |
| `on_disk` | False | Store vectors on disk |

Best Practices

  1. Create payload indexes for fields you filter on — dramatically improves filtered search
  2. Use quantization for large collections — scalar quantization reduces memory by 4x
  3. Set shard_number to at least the number of nodes for even distribution
  4. Use on_disk mode for collections larger than available RAM
  5. Batch inserts — upsert in batches of 100-500 for optimal throughput
  6. Monitor with Qdrant dashboard — built-in UI at port 6333/dashboard
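Practice 5 (batched inserts) can be sketched with a plain chunking helper; the batch size of 256 and the `documents` collection name are illustrative, and the actual `upsert` call assumes a running Qdrant instance:

```python
# Split a list of points into fixed-size batches for upserting.
def batched(items, batch_size=256):
    """Yield successive fixed-size chunks from a list."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]

# Example: 1000 points split into batches of up to 256
sizes = [len(b) for b in batched(list(range(1000)))]
print(sizes)  # [256, 256, 256, 232]

# With a live client (not executed here):
# for batch in batched(points, 500):
#     client.upsert(collection_name="documents", points=batch)
```

Keeping batches in the 100-500 range avoids oversized request payloads while still amortizing per-request overhead.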

Common Issues

Slow filtered search: Create payload indexes for filtered fields. Pre-filter narrows the search space dramatically. Check that HNSW ef_construct is high enough for your dataset size.

High memory usage: Enable scalar quantization to reduce memory by 4x. Use on_disk storage for vectors. Reduce HNSW m parameter (trades accuracy for memory).

Cluster node failure: Set replication_factor >= 2 for fault tolerance. Qdrant automatically handles node recovery. Monitor shard distribution to ensure even data spread.
