Rag Pinecone Smart

Boost productivity with this managed vector database in production. Includes structured workflows, validation checks, and reusable patterns for AI research.

Skill · Cliptics · ai research · v1.0.0 · MIT

RAG with Pinecone

Build production RAG applications using Pinecone's managed, serverless vector database — with hybrid search, metadata filtering, namespaces, and automatic scaling.

When to Use

Choose Pinecone when:

  • You need a managed, serverless vector database (zero infrastructure)
  • You're running production RAG applications that require auto-scaling
  • Low latency is critical (< 100 ms P99)
  • You need hybrid search (dense + sparse vectors)
  • You don't want to manage database infrastructure

Consider alternatives when:

  • Need full data control on-premise → Qdrant
  • Budget-constrained research → FAISS (free, self-hosted)
  • Postgres already in stack → pgvector
  • Need multi-modal natively → Weaviate

Quick Start

Installation

```shell
pip install pinecone  # the legacy "pinecone-client" package is deprecated
```

Create Index and Upsert Vectors

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create serverless index
pc.create_index(
    name="my-rag-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("my-rag-index")

# Upsert vectors with metadata
vectors = [
    {
        "id": "doc-1-chunk-1",
        "values": embedding_vector,  # 1536-dim float list
        "metadata": {
            "source": "handbook.pdf",
            "page": 12,
            "section": "refund-policy",
            "text": "Refunds are processed within 5-7 business days..."
        }
    },
    # ... more vectors
]
index.upsert(vectors=vectors, namespace="production")
```

Query with Metadata Filtering

```python
results = index.query(
    vector=query_embedding,
    top_k=5,
    namespace="production",
    filter={
        "source": {"$eq": "handbook.pdf"},
        "page": {"$gte": 10, "$lte": 20}
    },
    include_metadata=True
)

for match in results.matches:
    print(f"Score: {match.score:.4f}")
    print(f"Text: {match.metadata['text']}")
```

Hybrid Search (Dense + Sparse)

```python
from pinecone_text.sparse import BM25Encoder  # pip install pinecone-text

# Fit BM25 on your corpus
bm25 = BM25Encoder()
bm25.fit(corpus_texts)

# Create hybrid query (the index must use the dotproduct metric)
sparse_vector = bm25.encode_queries(query_text)
results = index.query(
    vector=dense_embedding,       # semantic
    sparse_vector=sparse_vector,  # keyword
    top_k=10,
    include_metadata=True
)
```
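Dense and sparse signals are typically blended with a convex weighting: scale dense values by alpha and sparse values by 1 - alpha before querying. A minimal sketch of that normalization (the helper name `hybrid_score_norm` is our own, not part of the SDK):

```python
def hybrid_score_norm(dense, sparse, alpha):
    """Scale dense values by alpha and sparse values by (1 - alpha).

    alpha=1.0 -> pure semantic search; alpha=0.0 -> pure keyword search.
    `sparse` uses Pinecone's {"indices": [...], "values": [...]} format.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": sparse["indices"],
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse

# Weight semantic relevance 3:1 over keyword relevance
dense, sparse = hybrid_score_norm(
    [0.2, 0.4], {"indices": [7, 42], "values": [1.0, 3.0]}, alpha=0.75
)
```

Pass the scaled pair as `vector=` and `sparse_vector=` in the query; tuning alpha per workload usually beats a fixed 50/50 split.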

Core Concepts

Namespaces

Partition your index for multi-tenant or multi-environment setups:

```python
# Separate namespaces for different purposes
index.upsert(vectors=vectors, namespace="production")
index.upsert(vectors=vectors, namespace="staging")
index.upsert(vectors=vectors, namespace="customer-abc")

# Query only within a namespace
results = index.query(vector=q, top_k=5, namespace="customer-abc")
```

Metadata Filtering

| Operator | Usage | Example |
|---|---|---|
| `$eq` | Equals | `{"category": {"$eq": "finance"}}` |
| `$ne` | Not equals | `{"status": {"$ne": "archived"}}` |
| `$gt` / `$gte` | Greater than (or equal) | `{"date": {"$gte": "2024-01-01"}}` |
| `$lt` / `$lte` | Less than (or equal) | `{"page": {"$lte": 50}}` |
| `$in` | In list | `{"tag": {"$in": ["urgent", "high"]}}` |
| `$and` / `$or` | Logical combination | `{"$and": [{...}, {...}]}` |
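These operators compose with `$and`/`$or`. As an illustration, a small helper (our own, not an SDK feature) that assembles the compound filter used in the query example above from optional criteria:

```python
def build_filter(source=None, min_page=None, max_page=None, tags=None):
    """Compose a Pinecone metadata filter dict from optional criteria."""
    clauses = []
    if source is not None:
        clauses.append({"source": {"$eq": source}})
    if min_page is not None or max_page is not None:
        page = {}
        if min_page is not None:
            page["$gte"] = min_page
        if max_page is not None:
            page["$lte"] = max_page
        clauses.append({"page": page})
    if tags:
        clauses.append({"tag": {"$in": tags}})
    if not clauses:
        return None  # no criteria -> search the whole namespace
    if len(clauses) == 1:
        return clauses[0]  # a single clause needs no $and wrapper
    return {"$and": clauses}

f = build_filter(source="handbook.pdf", min_page=10, max_page=20)
```

The resulting dict is passed directly as the `filter=` argument of `index.query`.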

Index Management

```python
# Get index statistics
stats = index.describe_index_stats()
print(f"Total vectors: {stats.total_vector_count}")
print(f"Namespaces: {stats.namespaces}")

# Delete vectors by ID
index.delete(ids=["doc-1-chunk-1", "doc-1-chunk-2"])

# Delete by metadata filter (pod-based indexes only; not supported on serverless)
index.delete(filter={"source": "old-doc.pdf"})

# Clear an entire namespace
index.delete(delete_all=True, namespace="staging")
```

Configuration

| Parameter | Default | Description |
|---|---|---|
| `dimension` | (required) | Vector dimension (match your embedding model) |
| `metric` | `"cosine"` | Distance metric (`cosine`, `euclidean`, `dotproduct`) |
| `cloud` | `"aws"` | Cloud provider |
| `region` | `"us-east-1"` | Deployment region |
| `top_k` | `10` | Results to return |
| `namespace` | `""` | Partition for multi-tenancy |

Best Practices

  1. Use namespaces for multi-tenancy — isolate customer data without separate indexes
  2. Include text in metadata — avoids a second lookup to retrieve the original content
  3. Batch upserts in groups of 100-200 for optimal throughput
  4. Use hybrid search for production — combines semantic understanding with keyword precision
  5. Set appropriate metadata — source, page, section, and date enable powerful filtering
  6. Monitor index size — costs scale with vector count and dimension
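Practice 3 can be sketched with a small batching helper (the helper and the 200-vector batch size are illustrative, not SDK features):

```python
def batched(items, size=200):
    """Yield successive fixed-size batches (Pinecone recommends ~100-200)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Usage against a real index (requires an API key):
# for batch in batched(vectors, size=200):
#     index.upsert(vectors=batch, namespace="production")

batch_sizes = [len(b) for b in batched(list(range(450)), size=200)]
```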

Common Issues

Slow upsert performance: Batch vectors into groups of 100-200. Use async/parallel upserts for large ingestion jobs. Avoid upserting one vector at a time.

Query returns irrelevant results: Add metadata filters to narrow scope. Use hybrid search if pure semantic misses keyword-dependent queries. Check that your embedding model handles your domain vocabulary.

High costs at scale: Reduce vector dimensions (text-embedding-3-small supports dimension reduction). Delete stale vectors. Use namespaces instead of separate indexes.
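With text-embedding-3 models, OpenAI documents that a full-length embedding can also be shortened after the fact by truncating and re-normalizing. A sketch of that post-hoc reduction (assuming a unit-normalized input vector):

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and renormalize to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head]

reduced = truncate_embedding([0.6, 0.8, 0.0, 0.0], dim=2)
```

Halving the dimension roughly halves storage cost; validate retrieval quality on your own queries before committing to a smaller dimension.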
