
RAG Engineer Engine

Streamline your workflow with this expert toolkit for building retrieval-augmented generation (RAG) pipelines. Includes structured workflows, validation checks, and reusable patterns for AI research.

Skill · Cliptics · AI research · v1.0.0 · MIT

RAG Engineer Engine

Systems architecture toolkit for building production RAG pipelines — covering chunking strategy design, embedding selection, vector store architecture, retrieval optimization, and quality evaluation.

When to Use

Use this toolkit when:

  • Designing a new RAG system architecture from scratch
  • Optimizing an existing RAG pipeline for better retrieval quality
  • Selecting between vector databases, embedding models, and chunking strategies
  • Debugging poor answer quality in RAG-powered applications

Use simpler approaches when:

  • Documents fit in the LLM context window → direct prompting
  • Structured data only → SQL queries or API calls
  • Real-time web data → web search integration

Quick Start

Architecture Decision Template

```markdown
## RAG Architecture Decision Record

### Document Characteristics
- Type: {pdf, markdown, html, code}
- Volume: {number_of_documents}
- Update frequency: {static, daily, real-time}
- Average length: {pages_or_tokens}

### Query Patterns
- Type: {factual, analytical, comparative}
- Complexity: {simple, multi-hop, aggregation}
- Language: {single, multilingual}

### Recommended Stack
- Chunking: {strategy}
- Embeddings: {model}
- Vector Store: {database}
- Retrieval: {method}
- Generation: {llm}
```

Chunking Strategy Selection

```python
def select_chunking_strategy(doc_type, avg_length, query_type):
    """Select optimal chunking based on document and query characteristics."""
    if doc_type == "code":
        return {
            "strategy": "ast_based",
            "unit": "function",
            "context": "include class signature and imports",
            "overlap": "none",
        }
    elif doc_type == "technical_docs" and avg_length > 5000:
        return {
            "strategy": "hierarchical",
            "parent_size": 2000,
            "child_size": 500,
            "overlap": 100,
            "preserve": "section headers",
        }
    elif query_type == "multi_hop":
        return {
            "strategy": "semantic",
            "size": 1500,
            "overlap": 300,
            "method": "sentence_transformers_segmentation",
        }
    else:
        return {
            "strategy": "recursive",
            "size": 1000,
            "overlap": 200,
            "separators": ["\n\n", "\n", ". ", " "],
        }
```
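The default recursive branch can itself be sketched end to end. The splitter below is an illustrative simplification (overlap is omitted and separators are dropped at chunk boundaries), not a drop-in replacement for a library implementation:

```python
def recursive_split(text, size=1000, separators=("\n\n", "\n", ". ", " ")):
    """Split text on the coarsest separator available, recursing to finer
    separators only for pieces that still exceed `size` characters."""
    if len(text) <= size:
        return [text]
    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks, current = [], ""
        for piece in text.split(sep):
            candidate = current + sep + piece if current else piece
            if len(candidate) <= size:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                if len(piece) > size:
                    # Piece is still too big: retry with the finer separators.
                    chunks.extend(recursive_split(piece, size, separators[i + 1:]))
                    current = ""
                else:
                    current = piece
        if current:
            chunks.append(current)
        return chunks
    # No separator present at all: hard cut.
    return [text[i:i + size] for i in range(0, len(text), size)]
```

A real implementation would also re-attach overlap between adjacent chunks, per the `overlap` values above.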

Core Concepts

Embedding Model Selection

| Model | Dimensions | Speed | Quality | Best For |
|---|---|---|---|---|
| text-embedding-3-small | 1536 | Fast | Good | General purpose, cost-sensitive |
| text-embedding-3-large | 3072 | Medium | Excellent | High-accuracy requirements |
| Cohere embed-v3 | 1024 | Fast | Excellent | Multilingual |
| BGE-large-en-v1.5 | 1024 | Medium | Very good | Self-hosted, English |
| Voyage-3 | 1024 | Medium | Excellent | Code and technical docs |

Vector Store Selection

| Database | Hosting | Filtering | Scale | Best For |
|---|---|---|---|---|
| FAISS | Self-hosted | None | Billions | Batch processing, research |
| Pinecone | Managed | Rich | Billions | Production SaaS, zero-ops |
| Qdrant | Both | Rich | Billions | On-prem with filtering |
| Weaviate | Both | Rich | Millions | Multi-modal, GraphQL |
| pgvector | Self-hosted | SQL | Millions | Postgres users, small-medium |
| ChromaDB | Self-hosted | Basic | Thousands | Prototyping, development |
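For very small corpora, the behavior of a flat index (e.g. FAISS's exact-search index) can be approximated with a brute-force cosine scan. A minimal sketch with illustrative names, useful for prototyping before committing to a store:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def search(index, query_vec, top_k=5):
    """index: list of (doc_id, vector) pairs. Returns the top_k most similar
    (doc_id, score) pairs by exact cosine similarity."""
    scored = [(doc_id, cosine(vec, query_vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

Real stores add approximate-nearest-neighbor indexes (HNSW, IVF) on top of this idea to scale past brute force.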

Retrieval Optimization Pipeline

```
Query → Query Expansion → Hybrid Retrieval → Reranking → Context Assembly
  |          |                   |               |              |
  v          v                   v               v              v
Original  HyDE/Multi-query   BM25+Dense     Cohere Rerank    Deduplicate
query     expansion          fusion         or cross-encoder  + order
```
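A common way to implement the BM25+Dense fusion step is reciprocal rank fusion (RRF). A minimal sketch, assuming each retriever returns a ranked list of document IDs:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of doc IDs. Each doc scores 1/(k + rank)
    per list it appears in; k=60 is the conventional damping constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score normalization across retrievers, which is why it is a popular default for combining BM25 and dense results; weighted score fusion (e.g. the 0.6/0.4 split below) is the main alternative.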

Configuration

| Component | Parameter | Recommended |
|---|---|---|
| Chunking | `chunk_size` | 500-1500 tokens |
| Chunking | `overlap` | 10-20% of `chunk_size` |
| Embeddings | `model` | text-embedding-3-small |
| Retrieval | `top_k` | 5-10 initial, 3-5 after rerank |
| Retrieval | `hybrid_weight` | 0.6 semantic / 0.4 keyword |
| Reranking | `model` | Cohere rerank-v3 |
| Generation | `max_context` | 4000-8000 tokens |
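These defaults can be collected into a single config object. A sketch with illustrative field names, using mid-range values from the table:

```python
from dataclasses import dataclass

@dataclass
class RAGConfig:
    """Baseline pipeline settings; tune per corpus and query mix."""
    chunk_size: int = 1000            # 500-1500 tokens
    chunk_overlap: int = 150          # 10-20% of chunk_size
    embedding_model: str = "text-embedding-3-small"
    top_k_initial: int = 10           # candidates before reranking
    top_k_final: int = 5              # kept after reranking
    hybrid_semantic_weight: float = 0.6  # remainder (0.4) goes to keyword
    rerank_model: str = "rerank-v3"   # Cohere reranker
    max_context_tokens: int = 6000
```

Keeping the settings in one dataclass makes it easy to version configurations alongside evaluation results.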

Best Practices

  1. Profile your queries first — understand query patterns before choosing architecture
  2. Chunk with document awareness — respect section boundaries, headers, and logical units
  3. Always use hybrid retrieval in production — semantic alone misses keyword-dependent queries
  4. Add reranking — a reranker after initial retrieval improves precision significantly
  5. Evaluate retrieval independently — measure Recall@K and MRR before evaluating generation
  6. Version your embeddings — re-embedding is expensive, track which model version generated each vector
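Practice 5 (evaluating retrieval independently) reduces to two short metric functions. A sketch, assuming retrieved results are ordered lists of document IDs and relevance labels are ground-truth sets:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant docs that appear in the top-k retrieved list."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def mrr(queries):
    """Mean reciprocal rank over (retrieved, relevant) pairs: for each query,
    score 1/rank of the first relevant hit (0 if none), then average."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)
```

Running these over a held-out query set before touching the generation stage isolates retrieval regressions from prompt or model changes.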

Common Issues

Low recall — relevant documents not retrieved: Increase top_k. Try query expansion (HyDE or multi-query). Check that your embedding model handles domain-specific vocabulary. Add keyword search as a fallback.

High recall but low precision — too many irrelevant results: Add a reranking step. Use metadata filtering to narrow scope before vector search. Reduce chunk size to increase granularity.

Embedding drift after model update: All vectors must be re-embedded when changing the embedding model. Never mix vectors from different models. Use a migration pipeline that re-embeds in batches.
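A batched migration pipeline of the kind described can be sketched as follows; `embed_fn`, the document dict fields, and the version tag are illustrative, not any specific store's API:

```python
def _reembed(batch, embed_fn, model_version):
    """Embed one batch and tag each doc with the model version that produced it."""
    vectors = embed_fn([doc["text"] for doc in batch])
    for doc, vec in zip(batch, vectors):
        doc["vector"] = vec
        doc["embedding_model"] = model_version

def migrate_embeddings(docs, embed_fn, model_version, batch_size=100):
    """Re-embed every document with the new model in fixed-size batches,
    so a failure mid-run leaves clearly versioned, resumable state."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            _reembed(batch, embed_fn, model_version)
            batch = []
    if batch:
        _reembed(batch, embed_fn, model_version)
```

The per-vector version tag is what lets queries filter out (or a resumed job skip) documents still carrying the old model's vectors.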
