RAG Engineer Engine
Systems architecture toolkit for building production RAG pipelines — covering chunking strategy design, embedding selection, vector store architecture, retrieval optimization, and quality evaluation.
When to Use
Use this toolkit when:
- Designing a new RAG system architecture from scratch
- Optimizing an existing RAG pipeline for better retrieval quality
- Selecting between vector databases, embedding models, and chunking strategies
- Debugging poor answer quality in RAG-powered applications
Use simpler approaches when:
- Documents fit in the LLM context window → direct prompting
- Structured data only → SQL queries or API calls
- Real-time web data → web search integration
Quick Start
Architecture Decision Template
```markdown
## RAG Architecture Decision Record

### Document Characteristics
- Type: {pdf, markdown, html, code}
- Volume: {number_of_documents}
- Update frequency: {static, daily, real-time}
- Average length: {pages_or_tokens}

### Query Patterns
- Type: {factual, analytical, comparative}
- Complexity: {simple, multi-hop, aggregation}
- Language: {single, multilingual}

### Recommended Stack
- Chunking: {strategy}
- Embeddings: {model}
- Vector Store: {database}
- Retrieval: {method}
- Generation: {llm}
```
Chunking Strategy Selection
```python
def select_chunking_strategy(doc_type, avg_length, query_type):
    """Select optimal chunking based on document and query characteristics."""
    if doc_type == "code":
        return {
            "strategy": "ast_based",
            "unit": "function",
            "context": "include class signature and imports",
            "overlap": "none",
        }
    elif doc_type == "technical_docs" and avg_length > 5000:
        return {
            "strategy": "hierarchical",
            "parent_size": 2000,
            "child_size": 500,
            "overlap": 100,
            "preserve": "section headers",
        }
    elif query_type == "multi_hop":
        return {
            "strategy": "semantic",
            "size": 1500,
            "overlap": 300,
            "method": "sentence_transformers_segmentation",
        }
    else:
        return {
            "strategy": "recursive",
            "size": 1000,
            "overlap": 200,
            "separators": ["\n\n", "\n", ". ", " "],
        }
```
Core Concepts
Embedding Model Selection
| Model | Dimensions | Speed | Quality | Best For |
|---|---|---|---|---|
| text-embedding-3-small | 1536 | Fast | Good | General purpose, cost-sensitive |
| text-embedding-3-large | 3072 | Medium | Excellent | High-accuracy requirements |
| Cohere embed-v3 | 1024 | Fast | Excellent | Multilingual |
| BGE-large-en-v1.5 | 1024 | Medium | Very good | Self-hosted, English |
| Voyage-3 | 1024 | Medium | Excellent | Code and technical docs |
Vector Store Selection
| Database | Hosting | Filtering | Scale | Best For |
|---|---|---|---|---|
| FAISS | Self-hosted | None | Billions | Batch processing, research |
| Pinecone | Managed | Rich | Billions | Production SaaS, zero-ops |
| Qdrant | Both | Rich | Billions | On-prem with filtering |
| Weaviate | Both | Rich | Millions | Multi-modal, GraphQL |
| pgvector | Self-hosted | SQL | Millions | Postgres users, small-medium |
| ChromaDB | Self-hosted | Basic | Thousands | Prototyping, development |
Retrieval Optimization Pipeline
```
Query → Query Expansion → Hybrid Retrieval → Reranking → Context Assembly
  |            |                  |                |               |
  v            v                  v                v               v
Original   HyDE / multi-query   BM25 + dense   Cohere Rerank   Deduplicate
query      expansion            fusion         or cross-encoder  + order
```
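The Hybrid Retrieval stage above needs a way to merge the BM25 and dense rankings. One common choice is Reciprocal Rank Fusion; the sketch below is illustrative (the function name `rrf_fuse` and the damping constant `k=60` are assumptions, not part of the original pipeline):

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse multiple ranked lists with Reciprocal Rank Fusion.

    rankings: list of ranked doc-id lists, e.g. one from BM25 and one
    from dense retrieval. Larger k dampens the weight of top positions.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]
dense_hits = ["d1", "d9", "d3"]
fused = rrf_fuse([bm25_hits, dense_hits])
```

RRF is rank-based, so it avoids having to calibrate BM25 and cosine-similarity scores against each other; a weighted score sum (as in the hybrid_weight setting below) is the main alternative.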
Configuration
| Component | Parameter | Recommended |
|---|---|---|
| Chunking | chunk_size | 500-1500 tokens |
| Chunking | overlap | 10-20% of chunk_size |
| Embeddings | model | text-embedding-3-small |
| Retrieval | top_k | 5-10 initial, 3-5 after rerank |
| Retrieval | hybrid_weight | 0.6 semantic / 0.4 keyword |
| Reranking | model | Cohere rerank-v3 |
| Generation | max_context | 4000-8000 tokens |
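The recommended values above, collected into a single configuration object (a sketch; the key names and nesting are assumptions, not a required schema):

```python
# Defaults from the configuration table; tune per workload.
RAG_CONFIG = {
    "chunking": {"chunk_size": 1000, "overlap": 150},  # overlap = 15% of chunk_size
    "embeddings": {"model": "text-embedding-3-small"},
    "retrieval": {
        "top_k_initial": 10,   # before reranking
        "top_k_final": 5,      # after reranking
        "hybrid_weight": {"semantic": 0.6, "keyword": 0.4},
    },
    "reranking": {"model": "rerank-v3"},
    "generation": {"max_context_tokens": 8000},
}
```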
Best Practices
- Profile your queries first — understand query patterns before choosing architecture
- Chunk with document awareness — respect section boundaries, headers, and logical units
- Always use hybrid retrieval in production — semantic alone misses keyword-dependent queries
- Add reranking — a reranker after initial retrieval improves precision significantly
- Evaluate retrieval independently — measure Recall@K and MRR before evaluating generation
- Version your embeddings — re-embedding is expensive, track which model version generated each vector
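Evaluating retrieval independently, as recommended above, only needs two small metrics. A minimal sketch of Recall@K and MRR (function names and argument shapes are illustrative):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant doc ids that appear in the top-k retrieved."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mrr(retrieved_lists, relevant_sets):
    """Mean Reciprocal Rank over a batch of queries.

    For each query, score 1/rank of the first relevant hit (0 if none),
    then average across queries.
    """
    total = 0.0
    for retrieved, relevant in zip(retrieved_lists, relevant_sets):
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(retrieved_lists)
```

Run these against a labeled query set before touching the generation prompt: if Recall@K is low, no amount of prompt engineering will fix the answers.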
Common Issues
Low recall — relevant documents not retrieved: Increase top_k. Try query expansion (HyDE or multi-query). Check that your embedding model handles domain-specific vocabulary. Add keyword search as a fallback.
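One way to apply the multi-query expansion suggested above: retrieve for the original query plus LLM-generated paraphrases, then union the results in rank order. `retriever` and `generate_variants` are hypothetical adapters (the latter would wrap an LLM call):

```python
def expand_and_retrieve(query, retriever, generate_variants, top_k=10):
    """Multi-query expansion: union results across query paraphrases.

    retriever(query, top_k) -> ranked list of doc ids.
    generate_variants(query) -> list of paraphrased queries.
    """
    queries = [query] + generate_variants(query)
    seen, merged = set(), []
    for q in queries:
        for doc_id in retriever(q, top_k):
            if doc_id not in seen:   # keep first occurrence, preserve order
                seen.add(doc_id)
                merged.append(doc_id)
    return merged
```

Because the union grows the candidate set, pair expansion with the reranking step so precision is recovered afterwards.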
High recall but low precision — too many irrelevant results: Add a reranking step. Use metadata filtering to narrow scope before vector search. Reduce chunk size to increase granularity.
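The reranking step that addresses this is a thin orchestration layer around a scorer. In the sketch below, `score_fn` stands in for a cross-encoder or a managed rerank API (a hypothetical adapter, not a specific library call):

```python
def rerank(query, docs, score_fn, top_n=5):
    """Re-order retrieved docs by a pairwise relevance scorer.

    score_fn(query, doc) -> float relevance score; higher is better.
    Returns only the top_n docs, shrinking the context passed to the LLM.
    """
    scored = sorted(docs, key=lambda d: score_fn(query, d), reverse=True)
    return scored[:top_n]
```

In production the scorer would be a cross-encoder (which sees query and document together, unlike the bi-encoder used for initial retrieval) and is therefore far more precise, at the cost of scoring each candidate pair individually.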
Embedding drift after model update: All vectors must be re-embedded when changing the embedding model. Never mix vectors from different models. Use a migration pipeline that re-embeds in batches.
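A minimal sketch of such a batched migration pipeline, assuming hypothetical `fetch_text`, `embed`, and `upsert` adapters for your document store, embedding model, and vector index:

```python
def migrate_embeddings(doc_ids, fetch_text, embed, upsert, batch_size=100):
    """Re-embed every document in batches after an embedding-model change.

    fetch_text(doc_id) -> document text
    embed(texts)       -> list of vectors (one model call per batch)
    upsert(ids, vectors, metadata) -> write to the vector store

    Tagging each vector with an embedding_version makes mixed-model
    states detectable if the migration is interrupted.
    """
    for start in range(0, len(doc_ids), batch_size):
        batch = doc_ids[start:start + batch_size]
        texts = [fetch_text(doc_id) for doc_id in batch]
        vectors = embed(texts)
        upsert(batch, vectors, metadata={"embedding_version": "v2"})
```

Writing to a fresh index (or namespace) and cutting traffic over only after the migration completes avoids serving queries against a half-migrated index.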