Ultimate RAG with FAISS
Build high-performance vector similarity search systems using Facebook AI's FAISS library — supporting billion-scale datasets with GPU acceleration, approximate nearest neighbors, and production-optimized index types.
When to Use
Choose FAISS when:
- Need fast similarity search on large datasets (millions to billions of vectors)
- GPU acceleration is available and throughput matters
- Pure vector similarity without complex metadata filtering
- Batch processing of embeddings at scale
- Self-hosted deployment with full control
Consider alternatives when:
- Need rich metadata filtering → Qdrant, Pinecone, or Weaviate
- Want managed infrastructure → Pinecone (serverless)
- Small dataset (< 100K vectors) → ChromaDB or pgvector
- Need real-time updates with consistency → Qdrant or Weaviate
Quick Start
Installation
```bash
# CPU only
pip install faiss-cpu

# GPU support (CUDA)
pip install faiss-gpu
```
Basic Similarity Search
```python
import faiss
import numpy as np

# Create vectors (e.g., from embedding model)
dimension = 1536  # text-embedding-3-small dimension
num_vectors = 100000
vectors = np.random.random((num_vectors, dimension)).astype('float32')

# Build index
index = faiss.IndexFlatL2(dimension)  # Exact L2 distance
index.add(vectors)
print(f"Total vectors: {index.ntotal}")

# Search
query = np.random.random((1, dimension)).astype('float32')
distances, indices = index.search(query, k=5)  # Top 5 nearest
print(f"Nearest indices: {indices[0]}")
print(f"Distances: {distances[0]}")
```
Production Index (IVF + PQ)
```python
import faiss
import numpy as np

dimension = 1536
num_vectors = 10_000_000
# In practice, vectors come from your embedding model
vectors = np.random.random((num_vectors, dimension)).astype('float32')

# IVF (Inverted File) + PQ (Product Quantization)
nlist = 1000  # Number of clusters
m = 64        # Number of sub-quantizers
nbits = 8     # Bits per sub-quantizer

quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFPQ(quantizer, dimension, nlist, m, nbits)

# Train index on representative sample
training_data = vectors[:100000]  # Use subset for training
index.train(training_data)
index.add(vectors)

# Search with probe parameter
index.nprobe = 50  # Search 50 clusters (accuracy vs speed tradeoff)
query = np.random.random((1, dimension)).astype('float32')
distances, indices = index.search(query, k=10)
```
Core Concepts
Index Types
| Index | Time | Memory | Accuracy | Best For |
|---|---|---|---|---|
| IndexFlatL2 | O(n) | Full | Exact | < 1M vectors, baseline |
| IndexIVFFlat | O(n/nlist) | Full | High | 1-10M, good accuracy |
| IndexIVFPQ | O(n/nlist) | Low | Good | 10M-1B, memory constrained |
| IndexHNSWFlat | O(log n) | Full | Very high | 1-100M, low latency |
| IndexIVFScalarQuantizer | O(n/nlist) | Medium | High | 1-100M, balanced |
GPU Acceleration
```python
import faiss

# cpu_index: any index already built on CPU (e.g., the IVFPQ index above)
# Move index to GPU
gpu_resource = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(gpu_resource, 0, cpu_index)  # GPU 0

# Multi-GPU
gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)

# Search on GPU (10-50x faster)
distances, indices = gpu_index.search(queries, k=10)
```
Persistence
```python
import faiss

# Save index to disk
faiss.write_index(index, "my_index.faiss")

# Load index from disk
index = faiss.read_index("my_index.faiss")
```
Configuration
| Parameter | Default | Description |
|---|---|---|
| dimension | — | Vector dimension (must match embeddings) |
| nlist | sqrt(n) | Number of IVF clusters |
| nprobe | 1 | Clusters to search (accuracy vs speed) |
| m | 8 | PQ sub-quantizers (memory vs accuracy) |
| nbits | 8 | Bits per sub-quantizer code |
| metric | L2 | Distance metric (L2 or inner product) |
Tuning Guidelines
| Dataset Size | Index | nlist | nprobe | Expected Recall@10 |
|---|---|---|---|---|
| < 1M | IndexFlatL2 | — | — | 100% (exact) |
| 1-10M | IndexIVFFlat | 1000 | 50 | ~97% |
| 10-100M | IndexIVFPQ | 4096 | 128 | ~90% |
| 100M-1B | IndexIVFPQ | 16384 | 256 | ~85% |
Best Practices
- Train on representative data — IVF training data should match the distribution of your full dataset
- Tune nprobe for your accuracy/speed tradeoff — start at nprobe=nlist/10 and adjust
- Use HNSW for low-latency requirements — best query time without GPU
- GPU for throughput — batch queries of 100+ to fully utilize GPU
- Normalize vectors for cosine similarity — use faiss.normalize_L2() before adding
- Save and version indexes — rebuilding large indexes takes hours
Common Issues
Low recall with IVF index:
Increase nprobe. If still low, increase nlist and retrain. Ensure training data is representative of the full dataset.
Out of memory:
Use PQ compression (IndexIVFPQ) to reduce memory by 10-50x. Use memory-mapped indexes for datasets larger than RAM. Consider sharding across multiple machines.
Slow training:
Use a representative subset (10-50K vectors) for training, not the full dataset. Move training to GPU with index_cpu_to_gpu.
Similar Templates
Full-Stack Code Reviewer
Comprehensive code review skill that checks for security vulnerabilities, performance issues, accessibility, and best practices across frontend and backend code.
Test Suite Generator
Generates comprehensive test suites with unit tests, integration tests, and edge cases. Supports Jest, Vitest, Pytest, and Go testing.
Pro Architecture Workspace
Battle-tested skill for architectural decision-making frameworks. Includes structured workflows, validation checks, and reusable patterns for development.