
Ultimate RAG with FAISS

Build high-performance vector similarity search systems using Facebook AI's FAISS library — supporting billion-scale datasets with GPU acceleration, approximate nearest neighbors, and production-optimized index types.

When to Use

Choose FAISS when:

  • Need fast similarity search on large datasets (millions to billions of vectors)
  • GPU acceleration is available and throughput matters
  • Pure vector similarity without complex metadata filtering
  • Batch processing of embeddings at scale
  • Self-hosted deployment with full control

Consider alternatives when:

  • Need rich metadata filtering → Qdrant, Pinecone, or Weaviate
  • Want managed infrastructure → Pinecone (serverless)
  • Small dataset (< 100K vectors) → ChromaDB or pgvector
  • Need real-time updates with consistency → Qdrant or Weaviate

Quick Start

Installation

```shell
# CPU only
pip install faiss-cpu

# GPU support (CUDA)
pip install faiss-gpu
```
Basic Usage

```python
import faiss
import numpy as np

# Create vectors (e.g., from an embedding model)
dimension = 1536  # text-embedding-3-small dimension
num_vectors = 100000
vectors = np.random.random((num_vectors, dimension)).astype('float32')

# Build index
index = faiss.IndexFlatL2(dimension)  # Exact L2 distance
index.add(vectors)
print(f"Total vectors: {index.ntotal}")

# Search
query = np.random.random((1, dimension)).astype('float32')
distances, indices = index.search(query, k=5)  # Top 5 nearest
print(f"Nearest indices: {indices[0]}")
print(f"Distances: {distances[0]}")
```

Production Index (IVF + PQ)

```python
import faiss
import numpy as np

dimension = 1536
num_vectors = 10_000_000
vectors = np.random.random((num_vectors, dimension)).astype('float32')

# IVF (Inverted File) + PQ (Product Quantization)
nlist = 1000  # Number of clusters
m = 64        # Number of sub-quantizers
nbits = 8     # Bits per sub-quantizer

quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFPQ(quantizer, dimension, nlist, m, nbits)

# Train index on a representative sample
training_data = vectors[:100000]  # Use a subset for training
index.train(training_data)
index.add(vectors)

# Search with probe parameter
index.nprobe = 50  # Search 50 clusters (accuracy vs. speed tradeoff)
query = np.random.random((1, dimension)).astype('float32')
distances, indices = index.search(query, k=10)
```

Core Concepts

Index Types

| Index | Query Time | Memory | Accuracy | Best For |
|---|---|---|---|---|
| IndexFlatL2 | O(n) | Full | Exact | < 1M vectors, baseline |
| IndexIVFFlat | O(n/nlist) | Full | High | 1-10M, good accuracy |
| IndexIVFPQ | O(n/nlist) | Low | Good | 10M-1B, memory constrained |
| IndexHNSWFlat | O(log n) | Full | Very high | 1-100M, low latency |
| IndexIVFScalarQuantizer | O(n/nlist) | Medium | High | 1-100M, balanced |

GPU Acceleration

```python
import faiss

# Move index to GPU
gpu_resource = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(gpu_resource, 0, cpu_index)  # GPU 0

# Multi-GPU
gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)

# Search on GPU (10-50x faster)
distances, indices = gpu_index.search(queries, k=10)
```

Persistence

```python
# Save index to disk
faiss.write_index(index, "my_index.faiss")

# Load index from disk
index = faiss.read_index("my_index.faiss")
```

Configuration

| Parameter | Default | Description |
|---|---|---|
| dimension | (required) | Vector dimension (must match embeddings) |
| nlist | sqrt(n) | Number of IVF clusters |
| nprobe | 1 | Clusters to search (accuracy vs. speed) |
| m | 8 | PQ sub-quantizers (memory vs. accuracy) |
| nbits | 8 | Bits per sub-quantizer code |
| metric | L2 | Distance metric (L2 or inner product) |

Tuning Guidelines

| Dataset Size | Index | nlist | nprobe | Expected Recall@10 |
|---|---|---|---|---|
| < 1M | IndexFlatL2 | — | — | 100% (exact) |
| 1-10M | IndexIVFFlat | 1000 | 50 | ~97% |
| 10-100M | IndexIVFPQ | 4096 | 128 | ~90% |
| 100M-1B | IndexIVFPQ | 16384 | 256 | ~85% |

Best Practices

  1. Train on representative data — IVF training data should match the distribution of your full dataset
  2. Tune nprobe for your accuracy/speed tradeoff — start at nprobe=nlist/10 and adjust
  3. Use HNSW for low-latency requirements — best query time without GPU
  4. GPU for throughput — batch queries of 100+ to fully utilize GPU
  5. Normalize vectors for cosine similarity — use faiss.normalize_L2() before adding
  6. Save and version indexes — rebuilding large indexes takes hours

Common Issues

Low recall with IVF index: Increase nprobe. If still low, increase nlist and retrain. Ensure training data is representative of the full dataset.

Out of memory: Use PQ compression (IndexIVFPQ) to reduce memory by 10-50x. Use memory-mapped indexes for datasets larger than RAM. Consider sharding across multiple machines.
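The compression factor follows directly from the code size: a float32 vector costs dimension × 4 bytes, while its PQ code costs m × nbits / 8 bytes (plus small per-index overhead for centroids). A back-of-envelope sketch using the production parameters from earlier — the exact ratio depends on your choice of m:

```python
# Per-vector memory: raw float32 vs. IVFPQ codes
dimension = 1536
m = 64       # sub-quantizers
nbits = 8    # bits per sub-quantizer code

flat_bytes = dimension * 4     # 6144 bytes per raw vector
pq_bytes = m * nbits // 8      # 64 bytes per PQ code
ratio = flat_bytes / pq_bytes  # 96x for these parameters

n = 100_000_000
print(f"Flat: {flat_bytes * n / 1e9:.0f} GB, PQ: {pq_bytes * n / 1e9:.1f} GB ({ratio:.0f}x)")
# → Flat: 614 GB, PQ: 6.4 GB (96x)
```

Smaller m compresses harder but loses accuracy, which is why the guidance above frames PQ as a memory/accuracy tradeoff.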

Slow training: Use a representative subset (10-50K vectors) for training, not the full dataset. Move training to GPU with index_cpu_to_gpu.
