Ultimate RAG Framework
End-to-end Retrieval-Augmented Generation framework covering document ingestion, chunking strategies, embedding generation, vector storage, retrieval optimization, and LLM-powered answer synthesis — with production-grade evaluation and monitoring.
When to Use
Deploy this framework when:
- Building production RAG systems that need reliable, grounded answers
- Combining multiple retrieval strategies (semantic + keyword + metadata)
- Running evaluation pipelines to measure retrieval and generation quality
- Managing large document collections (10K+ documents, millions of chunks)
Use simpler approaches when:
- Small document set (< 100 pages) → load directly into context window
- Simple Q&A over structured data → use SQL or API queries
- Real-time data only → use web search tools instead
Quick Start
Basic RAG Pipeline
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_anthropic import ChatAnthropic
from langchain.chains import RetrievalQA

# 1. Load and chunk documents
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""]
)
chunks = text_splitter.split_documents(documents)

# 2. Create embeddings and vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)

# 3. Build retrieval chain
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True
)

# 4. Query
result = qa_chain.invoke({"query": "What is the refund policy?"})
print(result["result"])
print("Sources:", [doc.metadata["source"] for doc in result["source_documents"]])
```
Advanced Pipeline with Hybrid Search
```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Semantic retriever (embedding-based)
semantic_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Keyword retriever (BM25)
bm25_retriever = BM25Retriever.from_documents(chunks, k=5)

# Hybrid: combine semantic + keyword with weights
hybrid_retriever = EnsembleRetriever(
    retrievers=[semantic_retriever, bm25_retriever],
    weights=[0.6, 0.4]  # 60% semantic, 40% keyword
)
```
Core Concepts
RAG Pipeline Architecture
```
Documents → Chunking  → Embedding  → Vector Store     → Retrieval → Generation
    |          |            |             |                 |           |
    v          v            v             v                 v           v
PDF/HTML   Recursive   text-emb-3   FAISS/Pinecone     Hybrid      Claude/GPT
Markdown   Semantic    ada-002      Qdrant/Weaviate    Reranking   Answer+Sources
Code       Agentic     cohere-v3    pgvector           Filtering
```
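The stages above can be sketched as plain functions composed into a pipeline. This is a minimal, library-free illustration: the chunker is fixed-size, the "embedding" is just a character-frequency vector, and retrieval is a dot-product ranking — stand-ins for the real components listed in the diagram.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str


def chunk(text: str, source: str, size: int = 20) -> list[Chunk]:
    # Fixed-size character chunking, for illustration only
    return [Chunk(text[i:i + size], source) for i in range(0, len(text), size)]


def embed(text: str) -> list[float]:
    # Stub embedding: letter-frequency vector (a real pipeline calls an embedding model)
    vocab = "abcdefghijklmnopqrstuvwxyz"
    return [text.lower().count(c) for c in vocab]


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    # Score each chunk against the query by dot product and keep the top k
    q = embed(query)
    def score(c: Chunk) -> float:
        return sum(a * b for a, b in zip(q, embed(c.text)))
    return sorted(chunks, key=score, reverse=True)[:k]


docs = chunk("Refunds are issued within 30 days. Shipping is free over $50.", "policy.md")
top = retrieve("refund", docs)
```

The generation stage would then format `top` into a prompt for the LLM, along with the `source` metadata for citations.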
Chunking Strategies
| Strategy | Best For | Chunk Size | Overlap |
|---|---|---|---|
| Fixed-size | General purpose | 500-1000 tokens | 100-200 |
| Recursive | Structured documents | 500-1500 tokens | 100-200 |
| Semantic | Technical docs | Variable | Sentence boundary |
| Document-aware | PDFs with sections | Section-level | Header context |
| Code-aware | Source code | Function/class level | Signature context |
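As a concrete illustration of the "Semantic" row, a minimal splitter that packs whole sentences into chunks up to a size budget, carrying trailing sentences forward as overlap. This is a sketch only — it splits on `". "` rather than using a real sentence detector, and counts characters rather than tokens.

```python
def sentence_chunks(text: str, max_chars: int = 80, overlap_sents: int = 1) -> list[str]:
    # Naive sentence split; production code would use a tokenizer or NLP sentence detector
    sentences = [s.strip().rstrip(".") + "." for s in text.split(". ") if s.strip()]
    chunks, current = [], []
    for sent in sentences:
        if current and sum(len(s) for s in current) + len(sent) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap_sents:]  # carry trailing sentences as overlap
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunk boundaries always fall between sentences, no sentence is ever split mid-thought — the property the "Sentence boundary" overlap column refers to.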
Retrieval Methods
| Method | Strength | Weakness | When to Use |
|---|---|---|---|
| Dense (embedding) | Semantic understanding | Misses exact terms | Conceptual queries |
| Sparse (BM25) | Exact keyword match | No semantic understanding | Technical queries |
| Hybrid | Best of both | More complex | Production systems |
| Reranking | Precision improvement | Added latency | Quality-critical |
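One common way to combine dense and sparse result lists is Reciprocal Rank Fusion (RRF), which scores each document by the sum of `1 / (k + rank)` across the rankings it appears in. The sketch below is a generic RRF implementation, not the exact fusion LangChain's `EnsembleRetriever` performs:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


semantic = ["d3", "d1", "d2"]  # ranking from the dense retriever
keyword = ["d1", "d4", "d3"]   # ranking from BM25
fused = rrf([semantic, keyword])
```

Documents that appear high in both lists (here `d1`) rise to the top of the fused ranking; the constant `k` damps the influence of any single list.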
Configuration
| Parameter | Default | Description |
|---|---|---|
| chunk_size | 1000 | Characters per chunk |
| chunk_overlap | 200 | Overlap between chunks |
| embedding_model | "text-embedding-3-small" | Embedding model |
| top_k | 5 | Documents to retrieve |
| similarity_threshold | 0.7 | Minimum relevance score |
| hybrid_weights | [0.6, 0.4] | Semantic vs keyword weight |
| reranker | None | Optional reranking model |
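These parameters can be captured in a small config object that validates them up front — a sketch of one possible shape, not a class this framework ships:

```python
from dataclasses import dataclass


@dataclass
class RAGConfig:
    chunk_size: int = 1000
    chunk_overlap: int = 200
    embedding_model: str = "text-embedding-3-small"
    top_k: int = 5
    similarity_threshold: float = 0.7
    hybrid_weights: tuple = (0.6, 0.4)
    reranker: str = None  # name of an optional reranking model

    def __post_init__(self):
        # Fail fast on configurations that would silently misbehave
        if self.chunk_overlap >= self.chunk_size:
            raise ValueError("chunk_overlap must be smaller than chunk_size")
        if abs(sum(self.hybrid_weights) - 1.0) > 1e-9:
            raise ValueError("hybrid_weights must sum to 1.0")
```

Validating at construction time catches mistakes like an overlap larger than the chunk size before any documents are ingested.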
Evaluation
Retrieval Metrics
| Metric | Measures | Target |
|---|---|---|
| Recall@K | Relevant docs in top K results | > 0.85 |
| Precision@K | Relevant ratio in top K | > 0.70 |
| MRR | Rank of first relevant result | > 0.80 |
| NDCG | Graded relevance ranking | > 0.75 |
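The first three retrieval metrics are simple enough to compute directly from ranked result lists and labeled relevant documents — a minimal reference implementation:

```python
def recall_at_k(retrieved: list, relevant: set, k: int) -> float:
    # Fraction of relevant documents that appear in the top k results
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0


def precision_at_k(retrieved: list, relevant: set, k: int) -> float:
    # Fraction of the top k results that are relevant
    return len(set(retrieved[:k]) & set(relevant)) / k


def mrr(all_retrieved: list, all_relevant: list) -> float:
    # Mean reciprocal rank of the first relevant result, averaged over queries
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(all_retrieved)
```

For a query whose relevant set is `{"d1", "d3"}` and whose top-3 results are `["d2", "d5", "d1"]`, recall@3 is 0.5 and precision@3 is 1/3 — two relevant docs exist but only one was retrieved.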
Generation Metrics
| Metric | Measures | Target |
|---|---|---|
| Faithfulness | Answer grounded in context | > 0.90 |
| Relevance | Answer addresses the query | > 0.85 |
| Completeness | All relevant info included | > 0.80 |
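Generation metrics are usually scored by an LLM judge or a framework such as Ragas. As a rough, purely lexical proxy for faithfulness — how much of the answer's vocabulary is actually present in the retrieved context — one could compute:

```python
def lexical_faithfulness(answer: str, context: str) -> float:
    # Fraction of answer tokens that also appear in the context. A crude proxy:
    # real evaluation uses an LLM judge or NLI model to check claim grounding.
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)
```

A score near 1.0 suggests the answer stays close to the source text; a low score flags answers that introduce vocabulary (and possibly claims) the context never mentions.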
Best Practices
- Chunk with overlap — 10-20% overlap prevents losing information at boundaries
- Use hybrid retrieval in production — semantic + keyword covers more query types
- Add metadata to chunks — source, page number, section title improve filtering and attribution
- Evaluate retrieval separately from generation — bad retrieval causes bad answers regardless of LLM quality
- Rerank for precision — a reranker after initial retrieval significantly improves answer quality
- Include source citations in every answer — users need to verify RAG outputs
Common Issues
Answers not grounded in documents: Retriever is returning irrelevant chunks. Lower the similarity threshold, increase top_k, or add a reranker. Check that your embedding model handles your domain vocabulary.
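Relaxing the threshold amounts to a post-retrieval similarity filter; vector databases apply this natively, but the logic is just cosine similarity against a cutoff (pure-Python sketch for illustration):

```python
import math


def cosine(a: list, b: list) -> float:
    # Cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_by_threshold(query_vec: list, chunk_vecs: list, threshold: float = 0.7) -> list:
    # Indices of chunks whose similarity clears the threshold; if nothing
    # survives, the threshold is likely too strict for this query or domain
    return [i for i, v in enumerate(chunk_vecs) if cosine(query_vec, v) >= threshold]
```

Logging how many chunks each query's filter discards makes it easy to spot a threshold that is starving the generator of context.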
Missing information in answers: Chunk size may be too small, splitting relevant content across chunks. Increase chunk_size or use semantic chunking that respects document structure.
Slow retrieval at scale: Switch from exact search to approximate nearest neighbors (HNSW index). Use metadata pre-filtering before vector search. Consider a managed vector database for production.