Ultimate RAG Framework

End-to-end Retrieval-Augmented Generation framework covering document ingestion, chunking strategies, embedding generation, vector storage, retrieval optimization, and LLM-powered answer synthesis — with production-grade evaluation and monitoring.

When to Use

Deploy this framework when:

  • You are building production RAG systems that need reliable, grounded answers
  • You need to combine multiple retrieval strategies (semantic + keyword + metadata)
  • You need evaluation pipelines to measure retrieval and generation quality
  • You are managing large document collections (10K+ documents, millions of chunks)

Use simpler approaches when:

  • Small document set (< 100 pages) → load directly into context window
  • Simple Q&A over structured data → use SQL or API queries
  • Real-time data only → use web search tools instead

Quick Start

Basic RAG Pipeline

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_anthropic import ChatAnthropic
from langchain.chains import RetrievalQA

# 1. Load and chunk documents
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""]
)
chunks = text_splitter.split_documents(documents)

# 2. Create embeddings and vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)

# 3. Build retrieval chain
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True
)

# 4. Query
result = qa_chain.invoke({"query": "What is the refund policy?"})
print(result["result"])
print("Sources:", [doc.metadata["source"] for doc in result["source_documents"]])
```
Hybrid Retrieval

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Semantic retriever (embedding-based)
semantic_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Keyword retriever (BM25)
bm25_retriever = BM25Retriever.from_documents(chunks, k=5)

# Hybrid: combine semantic + keyword with weights
hybrid_retriever = EnsembleRetriever(
    retrievers=[semantic_retriever, bm25_retriever],
    weights=[0.6, 0.4]  # 60% semantic, 40% keyword
)
```

Core Concepts

RAG Pipeline Architecture

Documents → Chunking → Embedding → Vector Store → Retrieval → Generation
    |           |           |           |              |           |
    v           v           v           v              v           v
  PDF/HTML  Recursive   text-emb-3  FAISS/Pinecone  Hybrid    Claude/GPT
  Markdown  Semantic    ada-002     Qdrant/Weaviate  Reranking Answer+Sources
  Code      Agentic     cohere-v3   pgvector         Filtering

Chunking Strategies

| Strategy | Best For | Chunk Size | Overlap |
|---|---|---|---|
| Fixed-size | General purpose | 500-1000 tokens | 100-200 |
| Recursive | Structured documents | 500-1500 tokens | 100-200 |
| Semantic | Technical docs | Variable | Sentence boundary |
| Document-aware | PDFs with sections | Section-level | Header context |
| Code-aware | Source code | Function/class level | Signature context |
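The fixed-size strategy with overlap can be sketched in a few lines of plain Python (character counts stand in for tokens here, and the function name is illustrative, not a library API):

```python
def chunk_fixed(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into fixed-size chunks; each chunk repeats the last
    `overlap` characters of the previous one so boundary content survives."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "x" * 2500
chunks = chunk_fixed(doc, chunk_size=1000, overlap=200)
print(len(chunks))  # 3 overlapping chunks
```

Note the 200/1000 default matches the 20% overlap recommended under Best Practices below.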

Retrieval Methods

| Method | Strength | Weakness | When to Use |
|---|---|---|---|
| Dense (embedding) | Semantic understanding | Misses exact terms | Conceptual queries |
| Sparse (BM25) | Exact keyword match | No semantic understanding | Technical queries |
| Hybrid | Best of both | More complex | Production systems |
| Reranking | Precision improvement | Added latency | Quality-critical |
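A common scheme for fusing the dense and sparse result lists is reciprocal rank fusion (RRF), which rewards documents that rank well in either list. A minimal sketch over plain document IDs (the IDs and the `k=60` smoothing constant below are illustrative):

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.
    Each document scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d3", "d1", "d7"]  # hypothetical embedding-based ranking
keyword = ["d1", "d9", "d3"]   # hypothetical BM25 ranking
print(rrf_fuse([semantic, keyword]))  # ['d1', 'd3', 'd9', 'd7']
```

Here `d1` wins because it ranks highly in both lists, even though neither retriever put it first.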

Configuration

| Parameter | Default | Description |
|---|---|---|
| `chunk_size` | 1000 | Characters per chunk |
| `chunk_overlap` | 200 | Overlap between chunks |
| `embedding_model` | `"text-embedding-3-small"` | Embedding model |
| `top_k` | 5 | Documents to retrieve |
| `similarity_threshold` | 0.7 | Minimum relevance score |
| `hybrid_weights` | `[0.6, 0.4]` | Semantic vs. keyword weight |
| `reranker` | `None` | Optional reranking model |
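These parameters map naturally onto a small config object, so the whole pipeline can be tuned in one place. A sketch (the `RAGConfig` class is an assumption for illustration, not part of any library):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class RAGConfig:
    """Pipeline configuration mirroring the defaults in the table above."""
    chunk_size: int = 1000
    chunk_overlap: int = 200
    embedding_model: str = "text-embedding-3-small"
    top_k: int = 5
    similarity_threshold: float = 0.7
    hybrid_weights: Tuple[float, float] = (0.6, 0.4)
    reranker: Optional[str] = None

config = RAGConfig(top_k=10)  # override only what differs from the defaults
```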

Evaluation

Retrieval Metrics

| Metric | Measures | Target |
|---|---|---|
| Recall@K | Relevant docs in top K results | > 0.85 |
| Precision@K | Relevant ratio in top K | > 0.70 |
| MRR | Rank of first relevant result | > 0.80 |
| NDCG | Graded relevance ranking | > 0.75 |
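Recall@K and MRR are simple enough to compute directly from ranked results and a labeled set of relevant documents. A minimal sketch (the document IDs are illustrative):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant result (0 if none is retrieved)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["d4", "d2", "d7", "d1", "d9"]
relevant = {"d2", "d1"}
print(recall_at_k(retrieved, relevant, k=5))  # 1.0
print(mrr(retrieved, relevant))               # 0.5 (first hit at rank 2)
```

In practice these are averaged over a query set; a single query tells you little.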

Generation Metrics

| Metric | Measures | Target |
|---|---|---|
| Faithfulness | Answer grounded in context | > 0.90 |
| Relevance | Answer addresses the query | > 0.85 |
| Completeness | All relevant info included | > 0.80 |
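Faithfulness is usually scored by an LLM judge (frameworks such as RAGAS do this), but a crude lexical proxy can catch obvious drift in unit tests. A sketch, with the caveat that word overlap is only a rough signal (the function name is illustrative):

```python
def lexical_grounding(answer: str, context: str) -> float:
    """Crude faithfulness proxy: fraction of answer words present in the
    retrieved context. Flags obvious hallucination; not a substitute for
    an LLM-judged faithfulness score."""
    answer_words = set(answer.lower().split())
    if not answer_words:
        return 0.0
    context_words = set(context.lower().split())
    return len(answer_words & context_words) / len(answer_words)

context = "refunds are issued within 30 days of purchase"
print(lexical_grounding("refunds within 30 days", context))      # 1.0
print(lexical_grounding("free shipping worldwide", context))     # 0.0
```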

Best Practices

  1. Chunk with overlap — 10-20% overlap prevents losing information at boundaries
  2. Use hybrid retrieval in production — semantic + keyword covers more query types
  3. Add metadata to chunks — source, page number, section title improve filtering and attribution
  4. Evaluate retrieval separately from generation — bad retrieval causes bad answers regardless of LLM quality
  5. Rerank for precision — a reranker after initial retrieval significantly improves answer quality
  6. Include source citations in every answer — users need to verify RAG outputs
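Practice 3 amounts to carrying provenance alongside every chunk so filtering and citation come for free later. A minimal sketch (the helper name and record shape are illustrative; LangChain's `Document` objects serve the same purpose):

```python
def attach_metadata(chunks, source, section):
    """Wrap raw chunk texts in records that carry provenance metadata,
    enabling metadata filtering at retrieval time and source citations
    in answers."""
    return [
        {
            "text": text,
            "metadata": {"source": source, "section": section, "chunk_index": i},
        }
        for i, text in enumerate(chunks)
    ]

records = attach_metadata(
    ["Refunds are issued within 30 days.", "Store credit is also available."],
    source="handbook.pdf",
    section="Refunds",
)
print(records[0]["metadata"])
```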

Common Issues

Answers not grounded in documents: Retriever is returning irrelevant chunks. Lower the similarity threshold, increase top_k, or add a reranker. Check that your embedding model handles your domain vocabulary.

Missing information in answers: Chunk size may be too small, splitting relevant content across chunks. Increase chunk_size or use semantic chunking that respects document structure.

Slow retrieval at scale: Switch from exact search to approximate nearest neighbors (HNSW index). Use metadata pre-filtering before vector search. Consider a managed vector database for production.
