# RAG with Pinecone
Build production RAG applications using Pinecone's managed, serverless vector database — with hybrid search, metadata filtering, namespaces, and automatic scaling.
## When to Use

Choose Pinecone when:

- You need a managed, serverless vector database (zero infrastructure)
- You are building production RAG applications that require auto-scaling
- Low latency is critical (<100 ms P99)
- You need hybrid search (dense + sparse vectors)
- You don't want to manage database infrastructure

Consider alternatives when:

- You need full on-premise data control → Qdrant
- Budget-constrained research → FAISS (free, self-hosted)
- Postgres is already in your stack → pgvector
- You need native multi-modal support → Weaviate
## Quick Start

### Installation

```bash
pip install pinecone
```

(The package was renamed from `pinecone-client`; the old name is deprecated.)
### Create Index and Upsert Vectors

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create serverless index
pc.create_index(
    name="my-rag-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("my-rag-index")

# Upsert vectors with metadata
vectors = [
    {
        "id": "doc-1-chunk-1",
        "values": embedding_vector,  # 1536-dim float list
        "metadata": {
            "source": "handbook.pdf",
            "page": 12,
            "section": "refund-policy",
            "text": "Refunds are processed within 5-7 business days..."
        }
    },
    # ... more vectors
]
index.upsert(vectors=vectors, namespace="production")
```
### Query with Metadata Filtering

```python
results = index.query(
    vector=query_embedding,
    top_k=5,
    namespace="production",
    filter={
        "source": {"$eq": "handbook.pdf"},
        "page": {"$gte": 10, "$lte": 20}
    },
    include_metadata=True
)

for match in results.matches:
    print(f"Score: {match.score:.4f}")
    print(f"Text: {match.metadata['text']}")
```
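The retrieved matches are typically assembled into a grounded prompt before calling an LLM. A minimal sketch, assuming matches shaped like the query response above (`build_rag_prompt` is a hypothetical helper, not part of the Pinecone SDK):

```python
def build_rag_prompt(question, matches, max_chunks=5):
    """Assemble retrieved chunks into a grounded prompt.

    `matches` is a list of dicts mirroring Pinecone query matches:
    {"score": float, "metadata": {"text": ..., "source": ..., "page": ...}}.
    """
    context_blocks = []
    for m in matches[:max_chunks]:
        meta = m["metadata"]
        # Tag each chunk with its provenance so the model can cite it
        context_blocks.append(f"[{meta['source']} p.{meta['page']}]\n{meta['text']}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer using only the context below. Cite sources in [brackets].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Because the chunk text lives in metadata (see Best Practices below), no second lookup against a document store is needed.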
### Hybrid Search (Dense + Sparse)

```python
# Requires: pip install pinecone-text
# Note: sparse-dense queries require an index created with metric="dotproduct"
from pinecone_text.sparse import BM25Encoder

# Fit BM25 on your corpus
bm25 = BM25Encoder()
bm25.fit(corpus_texts)

# Create hybrid query
sparse_vector = bm25.encode_queries(query_text)

results = index.query(
    vector=dense_embedding,        # Semantic
    sparse_vector=sparse_vector,   # Keyword
    top_k=10,
    include_metadata=True
)
```
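In practice you often want to control the balance between the semantic and keyword components. One common approach is convex weighting: scale the dense vector by `alpha` and the sparse values by `1 - alpha` before querying. A sketch, assuming the sparse dict shape produced by `BM25Encoder` (`weight_hybrid` is a hypothetical helper, not part of the SDK):

```python
def weight_hybrid(dense, sparse, alpha=0.75):
    """Weight dense (semantic) vs. sparse (keyword) query components.

    alpha=1.0 -> pure semantic; alpha=0.0 -> pure keyword.
    `sparse` is a dict: {"indices": [int, ...], "values": [float, ...]}.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse
```

Pass the returned pair as `vector=` and `sparse_vector=` in `index.query(...)` and tune `alpha` against your evaluation set.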
## Core Concepts

### Namespaces

Partition your index for multi-tenant or multi-environment setups:

```python
# Separate namespaces for different purposes
index.upsert(vectors, namespace="production")
index.upsert(vectors, namespace="staging")
index.upsert(vectors, namespace="customer-abc")

# Query only within a namespace
results = index.query(vector=q, top_k=5, namespace="customer-abc")
```
### Metadata Filtering

| Operator | Usage | Example |
|---|---|---|
| `$eq` | Equals | `{"category": {"$eq": "finance"}}` |
| `$ne` | Not equals | `{"status": {"$ne": "archived"}}` |
| `$gt` / `$gte` | Greater than (or equal) | `{"date": {"$gte": "2024-01-01"}}` |
| `$lt` / `$lte` | Less than (or equal) | `{"page": {"$lte": 50}}` |
| `$in` | In list | `{"tag": {"$in": ["urgent", "high"]}}` |
| `$and` / `$or` | Logical combinators | `{"$and": [{...}, {...}]}` |
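Filters built from the operators above are plain dicts, so they compose easily. A small sketch of combining per-field clauses into one `$and` filter (`combine_filters` is a hypothetical convenience wrapper, not a Pinecone API):

```python
def combine_filters(*clauses):
    """Combine individual metadata clauses into a single filter dict.

    Drops empty clauses; returns None when nothing remains, which
    callers can pass straight through as filter=None (no filtering).
    """
    clauses = [c for c in clauses if c]
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]          # no need to wrap a single clause
    return {"$and": list(clauses)}

f = combine_filters(
    {"source": {"$eq": "handbook.pdf"}},
    {"page": {"$gte": 10, "$lte": 20}},
)
```

The result can be passed directly as the `filter=` argument of `index.query(...)`.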
### Index Management

```python
# Get index statistics
stats = index.describe_index_stats()
print(f"Total vectors: {stats.total_vector_count}")
print(f"Namespaces: {stats.namespaces}")

# Delete vectors by ID
index.delete(ids=["doc-1-chunk-1", "doc-1-chunk-2"])

# Delete by metadata filter (pod-based indexes; serverless indexes
# delete by ID, ID prefix, or whole namespace instead)
index.delete(filter={"source": "old-doc.pdf"})

# Clear an entire namespace
index.delete(delete_all=True, namespace="staging")
```
## Configuration

| Parameter | Default | Description |
|---|---|---|
| `dimension` | — | Vector dimension (must match your embedding model) |
| `metric` | `"cosine"` | Distance metric (`cosine`, `euclidean`, `dotproduct`) |
| `cloud` | `"aws"` | Cloud provider |
| `region` | `"us-east-1"` | Deployment region |
| `top_k` | 10 | Number of results to return |
| `namespace` | `""` | Partition for multi-tenancy |
## Best Practices

- **Use namespaces for multi-tenancy.** Isolate customer data without creating separate indexes.
- **Include text in metadata.** Avoids a second lookup to retrieve the original content.
- **Batch upserts** in groups of 100-200 vectors for optimal throughput.
- **Use hybrid search in production.** Combines semantic understanding with keyword precision.
- **Set meaningful metadata.** `source`, `page`, `section`, and `date` fields enable powerful filtering.
- **Monitor index size.** Costs scale with vector count and dimension.
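The batching advice above can be sketched as a small helper (hypothetical, not part of the SDK; assumes vectors shaped like the upsert example in Quick Start):

```python
def batch_upsert(index, vectors, batch_size=100, namespace=""):
    """Upsert vectors in fixed-size batches instead of one call per vector.

    `index` is a Pinecone Index; `vectors` is a list of
    {"id": ..., "values": ..., "metadata": ...} dicts.
    Returns the number of batches sent.
    """
    sent = 0
    for start in range(0, len(vectors), batch_size):
        batch = vectors[start:start + batch_size]
        index.upsert(vectors=batch, namespace=namespace)
        sent += 1
    return sent
```

For large ingestion jobs, each batch is an independent request, so they can also be dispatched concurrently (e.g. from a thread pool).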
## Common Issues

**Slow upsert performance:** Batch vectors into groups of 100-200. Use async or parallel upserts for large ingestion jobs. Avoid upserting one vector at a time.

**Query returns irrelevant results:** Add metadata filters to narrow the scope. Use hybrid search if pure semantic search misses keyword-dependent queries. Check that your embedding model handles your domain vocabulary.

**High costs at scale:** Reduce vector dimensions (text-embedding-3-small supports dimension reduction). Delete stale vectors. Use namespaces instead of separate indexes.
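For dimension reduction with the text-embedding-3 models, prefer the API's `dimensions` parameter; what it does amounts to truncating the embedding and L2-renormalizing it. A sketch of that operation for illustration (`shorten_embedding` is a hypothetical helper):

```python
import math

def shorten_embedding(vec, dim):
    """Truncate an embedding to `dim` components and L2-renormalize.

    Illustrates the truncate-and-renormalize scheme behind the
    text-embedding-3 `dimensions` parameter; when available, use
    the API parameter itself rather than doing this client-side.
    """
    if dim > len(vec):
        raise ValueError("target dim exceeds embedding length")
    head = vec[:dim]
    norm = math.sqrt(sum(v * v for v in head))
    # Renormalize so cosine/dot-product scores stay comparable
    return [v / norm for v in head] if norm else head
```

Remember to create the Pinecone index with the reduced `dimension` and to shorten query embeddings the same way.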