# RAG with Pinecone
Build production RAG applications using Pinecone's managed, serverless vector database — with hybrid search, metadata filtering, namespaces, and automatic scaling.
## When to Use

Choose Pinecone when:

- You need a managed, serverless vector database (zero infrastructure)
- You are building production RAG applications that require auto-scaling
- Low latency is critical (<100 ms P99)
- You need hybrid search (dense + sparse vectors)
- You don't want to manage database infrastructure

Consider alternatives when:

- You need full on-premise data control → Qdrant
- Budget-constrained research → FAISS (free, self-hosted)
- Postgres is already in your stack → pgvector
- You need native multi-modal support → Weaviate
## Quick Start

### Installation

```bash
pip install pinecone
```

(The package was renamed from `pinecone-client`; the old name is deprecated.)
### Create Index and Upsert Vectors

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create serverless index
pc.create_index(
    name="my-rag-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("my-rag-index")

# Upsert vectors with metadata
vectors = [
    {
        "id": "doc-1-chunk-1",
        "values": embedding_vector,  # 1536-dim float list
        "metadata": {
            "source": "handbook.pdf",
            "page": 12,
            "section": "refund-policy",
            "text": "Refunds are processed within 5-7 business days..."
        }
    },
    # ... more vectors
]
index.upsert(vectors=vectors, namespace="production")
```
### Query with Metadata Filtering

```python
results = index.query(
    vector=query_embedding,
    top_k=5,
    namespace="production",
    filter={
        "source": {"$eq": "handbook.pdf"},
        "page": {"$gte": 10, "$lte": 20}
    },
    include_metadata=True
)

for match in results.matches:
    print(f"Score: {match.score:.4f}")
    print(f"Text: {match.metadata['text']}")
```
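The retrieved matches are typically assembled into a grounded prompt before calling an LLM. A minimal sketch, assuming matches shaped like the query response above (`build_rag_prompt` is a hypothetical helper, not part of the Pinecone SDK):

```python
def build_rag_prompt(question, matches, max_chunks=5):
    """Assemble retrieved chunks into a grounded prompt.

    `matches` is a list of dicts mirroring Pinecone query matches:
    {"score": float, "metadata": {"text": ..., "source": ..., "page": ...}}.
    """
    context_blocks = []
    for m in matches[:max_chunks]:
        meta = m["metadata"]
        # Tag each chunk with its provenance so the model can cite it
        context_blocks.append(f"[{meta['source']} p.{meta['page']}]\n{meta['text']}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer using only the context below. Cite sources in [brackets].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Because the chunk text lives in metadata (see Best Practices below), no second lookup against a document store is needed.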
### Hybrid Search (Dense + Sparse)

```python
# Requires: pip install pinecone-text
# Note: sparse-dense queries require an index created with metric="dotproduct"
from pinecone_text.sparse import BM25Encoder

# Fit BM25 on your corpus
bm25 = BM25Encoder()
bm25.fit(corpus_texts)

# Create hybrid query
sparse_vector = bm25.encode_queries(query_text)

results = index.query(
    vector=dense_embedding,        # Semantic
    sparse_vector=sparse_vector,   # Keyword
    top_k=10,
    include_metadata=True
)
```
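In practice you often want to control the balance between the semantic and keyword components. One common approach is convex weighting: scale the dense vector by `alpha` and the sparse values by `1 - alpha` before querying. A sketch, assuming the sparse dict shape produced by `BM25Encoder` (`weight_hybrid` is a hypothetical helper, not part of the SDK):

```python
def weight_hybrid(dense, sparse, alpha=0.75):
    """Weight dense (semantic) vs. sparse (keyword) query components.

    alpha=1.0 -> pure semantic; alpha=0.0 -> pure keyword.
    `sparse` is a dict: {"indices": [int, ...], "values": [float, ...]}.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse
```

Pass the returned pair as `vector=` and `sparse_vector=` in `index.query(...)` and tune `alpha` against your evaluation set.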
## Core Concepts

### Namespaces

Partition your index for multi-tenant or multi-environment setups:

```python
# Separate namespaces for different purposes
index.upsert(vectors, namespace="production")
index.upsert(vectors, namespace="staging")
index.upsert(vectors, namespace="customer-abc")

# Query only within a namespace
results = index.query(vector=q, top_k=5, namespace="customer-abc")
```
### Metadata Filtering

| Operator | Usage | Example |
|---|---|---|
| `$eq` | Equals | `{"category": {"$eq": "finance"}}` |
| `$ne` | Not equals | `{"status": {"$ne": "archived"}}` |
| `$gt` / `$gte` | Greater than (or equal) | `{"date": {"$gte": "2024-01-01"}}` |
| `$lt` / `$lte` | Less than (or equal) | `{"page": {"$lte": 50}}` |
| `$in` | In list | `{"tag": {"$in": ["urgent", "high"]}}` |
| `$and` / `$or` | Logical combinators | `{"$and": [{...}, {...}]}` |
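Filters built from the operators above are plain dicts, so they compose easily. A small sketch of combining per-field clauses into one `$and` filter (`combine_filters` is a hypothetical convenience wrapper, not a Pinecone API):

```python
def combine_filters(*clauses):
    """Combine individual metadata clauses into a single filter dict.

    Drops empty clauses; returns None when nothing remains, which
    callers can pass straight through as filter=None (no filtering).
    """
    clauses = [c for c in clauses if c]
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]          # no need to wrap a single clause
    return {"$and": list(clauses)}

f = combine_filters(
    {"source": {"$eq": "handbook.pdf"}},
    {"page": {"$gte": 10, "$lte": 20}},
)
```

The result can be passed directly as the `filter=` argument of `index.query(...)`.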
### Index Management

```python
# Get index statistics
stats = index.describe_index_stats()
print(f"Total vectors: {stats.total_vector_count}")
print(f"Namespaces: {stats.namespaces}")

# Delete vectors by ID
index.delete(ids=["doc-1-chunk-1", "doc-1-chunk-2"])

# Delete by metadata filter (pod-based indexes; serverless indexes
# delete by ID, ID prefix, or whole namespace instead)
index.delete(filter={"source": "old-doc.pdf"})

# Clear an entire namespace
index.delete(delete_all=True, namespace="staging")
```
## Configuration

| Parameter | Default | Description |
|---|---|---|
| `dimension` | — | Vector dimension (must match your embedding model) |
| `metric` | `"cosine"` | Distance metric (`cosine`, `euclidean`, `dotproduct`) |
| `cloud` | `"aws"` | Cloud provider |
| `region` | `"us-east-1"` | Deployment region |
| `top_k` | 10 | Number of results to return |
| `namespace` | `""` | Partition for multi-tenancy |
## Best Practices

- **Use namespaces for multi-tenancy.** Isolate customer data without creating separate indexes.
- **Include text in metadata.** Avoids a second lookup to retrieve the original content.
- **Batch upserts** in groups of 100-200 vectors for optimal throughput.
- **Use hybrid search in production.** Combines semantic understanding with keyword precision.
- **Set meaningful metadata.** `source`, `page`, `section`, and `date` fields enable powerful filtering.
- **Monitor index size.** Costs scale with vector count and dimension.
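The batching advice above can be sketched as a small helper (hypothetical, not part of the SDK; assumes vectors shaped like the upsert example in Quick Start):

```python
def batch_upsert(index, vectors, batch_size=100, namespace=""):
    """Upsert vectors in fixed-size batches instead of one call per vector.

    `index` is a Pinecone Index; `vectors` is a list of
    {"id": ..., "values": ..., "metadata": ...} dicts.
    Returns the number of batches sent.
    """
    sent = 0
    for start in range(0, len(vectors), batch_size):
        batch = vectors[start:start + batch_size]
        index.upsert(vectors=batch, namespace=namespace)
        sent += 1
    return sent
```

For large ingestion jobs, each batch is an independent request, so they can also be dispatched concurrently (e.g. from a thread pool).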
## Common Issues

**Slow upsert performance:** Batch vectors into groups of 100-200. Use async or parallel upserts for large ingestion jobs. Avoid upserting one vector at a time.

**Query returns irrelevant results:** Add metadata filters to narrow the scope. Use hybrid search if pure semantic search misses keyword-dependent queries. Check that your embedding model handles your domain vocabulary.

**High costs at scale:** Reduce vector dimensions (text-embedding-3-small supports dimension reduction). Delete stale vectors. Use namespaces instead of separate indexes.
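For dimension reduction with the text-embedding-3 models, prefer the API's `dimensions` parameter; what it does amounts to truncating the embedding and L2-renormalizing it. A sketch of that operation for illustration (`shorten_embedding` is a hypothetical helper):

```python
import math

def shorten_embedding(vec, dim):
    """Truncate an embedding to `dim` components and L2-renormalize.

    Illustrates the truncate-and-renormalize scheme behind the
    text-embedding-3 `dimensions` parameter; when available, use
    the API parameter itself rather than doing this client-side.
    """
    if dim > len(vec):
        raise ValueError("target dim exceeds embedding length")
    head = vec[:dim]
    norm = math.sqrt(sum(v * v for v in head))
    # Renormalize so cosine/dot-product scores stay comparable
    return [v / norm for v in head] if norm else head
```

Remember to create the Pinecone index with the reduced `dimension` and to shorten query embeddings the same way.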