LlamaIndex - Data Framework for LLM Applications
Overview
LlamaIndex is the leading framework for connecting large language models with your data. While other frameworks focus on general agent orchestration or chain composition, LlamaIndex is purpose-built for one thing: making your data queryable by LLMs. It provides a complete pipeline from data ingestion (300+ connectors on LlamaHub) through indexing, retrieval, and response synthesis, with first-class support for RAG (Retrieval-Augmented Generation) patterns.
LlamaIndex matters because RAG is the most practical way to make LLMs useful over private, domain-specific data without fine-tuning. The framework handles the hard parts: intelligent document chunking, embedding management, vector store abstraction, retrieval strategies (similarity, keyword, hybrid), response synthesis modes (compact, tree_summarize, refine), and evaluation metrics to verify that your system actually works. You can go from a folder of documents to a working Q&A system in 5 lines of code, then progressively customize every layer as your requirements grow.
The framework is organized as a modular package ecosystem: llama-index-core provides the base abstractions, and specific integrations (LLMs, embeddings, vector stores, data loaders) are installed as separate packages. This keeps your dependency tree lean.
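Before any framework code, the core embed-index-retrieve loop is easy to hold in your head. The following dependency-free sketch imitates it with a toy bag-of-words similarity; every name in it is illustrative, and none of it is LlamaIndex API:

```python
# Framework-free sketch of the ingestion -> indexing -> retrieval pipeline
# that LlamaIndex automates. A word-count vector stands in for a real
# embedding, and cosine similarity stands in for a vector store lookup.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_index(docs: list[str]) -> list[tuple[str, Counter]]:
    """'Indexing': pair every chunk with its vector."""
    return [(d, embed(d)) for d in docs]

def retrieve(index, query: str, top_k: int = 2) -> list[str]:
    """'Retrieval': rank chunks by similarity to the query vector."""
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

chunks = [
    "LlamaIndex connects large language models with your data.",
    "Vector stores hold embeddings for similarity search.",
    "The refine mode iteratively improves an answer chunk by chunk.",
]
index = build_index(chunks)
context = retrieve(index, "how do vector stores and embeddings work", top_k=1)
print(context[0])  # the vector-store chunk ranks highest
```

In a real pipeline, the retrieved chunks are then handed to an LLM for response synthesis; every stage shown here (embedding, storage, ranking) is swappable in LlamaIndex.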
When to Use
- Building RAG applications that answer questions over private documents
- Need document Q&A over PDFs, web pages, databases, APIs, or code repositories
- Ingesting data from many heterogeneous sources (300+ connectors via LlamaHub)
- Creating knowledge bases that ground LLM responses in factual data
- Building chatbots that reference enterprise documentation
- Need structured data extraction from unstructured documents
- Evaluating RAG quality with built-in relevancy and faithfulness metrics
- Building multi-modal RAG (images + text + tables)
- Want the simplest path from "I have documents" to "I can query them"
Quick Start
Installation
```bash
# Full starter package (includes OpenAI integration)
pip install llama-index

# Or minimal install with specific providers
pip install llama-index-core
pip install llama-index-llms-anthropic        # For Claude
pip install llama-index-llms-openai           # For GPT
pip install llama-index-embeddings-openai     # Embeddings
pip install llama-index-vector-stores-chroma  # Vector store

# Set API keys
export OPENAI_API_KEY="sk-..."
# Or for Anthropic:
export ANTHROPIC_API_KEY="sk-ant-..."
```
5-Line RAG
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# 1. Load all documents from a directory
documents = SimpleDirectoryReader("./data").load_data()

# 2. Build the index (chunks, embeds, and stores)
index = VectorStoreIndex.from_documents(documents)

# 3. Query
response = index.as_query_engine().query("What is the main topic of these documents?")
print(response)
```
Production-Ready RAG (with persistence)
```python
import os

from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"

if os.path.exists(PERSIST_DIR):
    # Load existing index from disk
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
    print("Loaded existing index.")
else:
    # Build new index and persist
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
    print(f"Built index from {len(documents)} documents and saved to {PERSIST_DIR}.")

# Query with configuration
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact",
    streaming=True,
)
response = query_engine.query("Summarize the key findings")
for text in response.response_gen:
    print(text, end="", flush=True)
```
Core Concepts
1. Data Connectors (Loaders)
LlamaIndex loads data from virtually any source into a normalized Document format.
```python
from llama_index.core import SimpleDirectoryReader, Document

# Local files (PDF, DOCX, TXT, MD, CSV, images, etc.)
documents = SimpleDirectoryReader(
    "./data",
    recursive=True,                         # Traverse subdirectories
    required_exts=[".pdf", ".md", ".txt"],  # Filter by extension
    filename_as_id=True,                    # Use filename as doc ID
).load_data()

# Web pages
from llama_index.readers.web import SimpleWebPageReader
documents = SimpleWebPageReader(html_to_text=True).load_data([
    "https://docs.python.org/3/tutorial/classes.html",
    "https://docs.python.org/3/tutorial/errors.html",
])

# GitHub repository
from llama_index.readers.github import GithubRepositoryReader
documents = GithubRepositoryReader(
    owner="run-llama",
    repo="llama_index",
    filter_file_extensions=[".py", ".md"],
    verbose=True,
).load_data(branch="main")

# Database
from llama_index.readers.database import DatabaseReader
reader = DatabaseReader(uri="postgresql://user:pass@localhost/db")
documents = reader.load_data(query="SELECT title, content FROM articles WHERE published = true")

# Manual document creation
doc = Document(
    text="This is custom content.",
    metadata={"source": "manual", "category": "tutorial", "date": "2025-06-15"},
)
```
2. Indices -- Data Structures for Retrieval
Indices organize your documents for efficient querying. Each index type optimizes for different access patterns.
```python
from llama_index.core import VectorStoreIndex, SummaryIndex, TreeIndex, KeywordTableIndex

# VectorStoreIndex (most common -- semantic similarity search)
vector_index = VectorStoreIndex.from_documents(documents)

# SummaryIndex (formerly ListIndex -- scans all nodes sequentially)
# Good for summarization tasks over the entire corpus
summary_index = SummaryIndex.from_documents(documents)

# TreeIndex (hierarchical summarization)
# Good for multi-level summarization
tree_index = TreeIndex.from_documents(documents)

# KeywordTableIndex (keyword-based retrieval)
# Good for precise keyword matching
keyword_index = KeywordTableIndex.from_documents(documents)

# Persist any index
vector_index.storage_context.persist(persist_dir="./vector_storage")
summary_index.storage_context.persist(persist_dir="./summary_storage")

# Load from disk
from llama_index.core import load_index_from_storage, StorageContext
storage = StorageContext.from_defaults(persist_dir="./vector_storage")
loaded_index = load_index_from_storage(storage)
```
3. Query Engines -- Ask Questions
Query engines combine retrieval and response synthesis into a single queryable interface.
```python
# Basic query engine
query_engine = index.as_query_engine()
response = query_engine.query("What are the main conclusions?")

# Configurable query engine
query_engine = index.as_query_engine(
    similarity_top_k=5,              # Retrieve top 5 chunks
    response_mode="tree_summarize",  # Synthesis strategy
    verbose=True,                    # Show retrieval details
)

# Response modes:
#   "compact"          - Stuff as many chunks as fit into one LLM call (default)
#   "tree_summarize"   - Hierarchically summarize chunks
#   "refine"           - Iteratively refine the answer with each chunk
#   "simple_summarize" - Simple concatenation and summarize
#   "no_text"          - Return retrieved nodes without LLM synthesis
#   "accumulate"       - Get a separate answer per chunk

# Streaming
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("Explain the architecture")
for token in response.response_gen:
    print(token, end="", flush=True)

# Access source nodes (for citations)
response = query_engine.query("What is the system design?")
print(response)
for node in response.source_nodes:
    print(f"  Score: {node.score:.3f}")
    print(f"  Source: {node.metadata.get('file_name', 'unknown')}")
    print(f"  Text: {node.text[:100]}...")
```
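To build intuition for how the synthesis modes differ, here is a framework-free sketch of three of them, with a plain function standing in for the LLM call. The names and prompt shapes are illustrative only, not LlamaIndex internals:

```python
# Toy illustration of "compact", "refine", and "accumulate" synthesis.
# fake_llm stands in for a real model call.
def fake_llm(prompt: str) -> str:
    return f"ANSWER({len(prompt)} chars of prompt)"

chunks = ["chunk one ...", "chunk two ...", "chunk three ..."]
question = "What are the findings?"

def compact(chunks, question):
    """Stuff all chunks that fit into a single call: 1 LLM call."""
    context = "\n".join(chunks)
    return fake_llm(f"{context}\nQ: {question}")

def refine(chunks, question):
    """One call per chunk; each call sees the previous draft: N LLM calls."""
    answer, calls = "", 0
    for c in chunks:
        answer = fake_llm(f"Existing answer: {answer}\nNew context: {c}\nQ: {question}")
        calls += 1
    return answer, calls

def accumulate(chunks, question):
    """Independent answer per chunk, returned side by side."""
    return [fake_llm(f"{c}\nQ: {question}") for c in chunks]

print(compact(chunks, question))        # one combined answer
print(refine(chunks, question)[1])      # number of LLM calls made
print(len(accumulate(chunks, question)))  # one answer per chunk
```

The cost trade-off follows directly: `compact` spends one call, `refine` and `accumulate` spend one per chunk, which is why `compact` is the recommended default.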
4. Retrievers -- Fine-Grained Control
When you need more control over what gets retrieved:
```python
# Vector retriever (default)
retriever = index.as_retriever(similarity_top_k=10)
nodes = retriever.retrieve("machine learning algorithms")
for node in nodes:
    print(f"Score: {node.score:.3f} | {node.text[:80]}...")

# Metadata filtering
from llama_index.core.vector_stores import MetadataFilters, MetadataFilter

filters = MetadataFilters(filters=[
    MetadataFilter(key="category", value="tutorial"),
    MetadataFilter(key="difficulty", value="beginner"),
])
retriever = index.as_retriever(
    similarity_top_k=5,
    filters=filters,
)

# Custom retriever
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore

class HybridRetriever(BaseRetriever):
    """Combines vector search with keyword matching."""

    def __init__(self, vector_retriever, keyword_retriever):
        self.vector_retriever = vector_retriever
        self.keyword_retriever = keyword_retriever
        super().__init__()

    def _retrieve(self, query_bundle):
        vector_nodes = self.vector_retriever.retrieve(query_bundle)
        keyword_nodes = self.keyword_retriever.retrieve(query_bundle)
        # Merge and deduplicate
        seen = set()
        merged = []
        for node in vector_nodes + keyword_nodes:
            if node.node.node_id not in seen:
                seen.add(node.node.node_id)
                merged.append(node)
        return sorted(merged, key=lambda x: x.score or 0, reverse=True)[:5]
```
Agents with Tools
LlamaIndex agents combine RAG with tool calling for complex reasoning tasks.
Basic Agent
```python
from llama_index.core.agent import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import FunctionTool

def search_codebase(query: str) -> str:
    """Search the codebase for functions matching the query."""
    # In production: actual code search
    return f"Found 3 functions matching '{query}': parse_config(), validate_config(), load_config()"

def run_tests(test_path: str) -> str:
    """Run tests at the given path and return results."""
    return f"Running tests at {test_path}: 12 passed, 0 failed"

def create_pull_request(title: str, description: str) -> str:
    """Create a GitHub pull request."""
    return f"Created PR: '{title}' - {description}"

# Wrap plain functions as tools
tools = [
    FunctionTool.from_defaults(fn=search_codebase),
    FunctionTool.from_defaults(fn=run_tests),
    FunctionTool.from_defaults(fn=create_pull_request),
]

# Create agent
llm = OpenAI(model="gpt-4o")
agent = FunctionAgent.from_tools(tools, llm=llm, verbose=True)

response = agent.chat(
    "Find all config-related functions, run their tests, "
    "and create a PR summarizing the test results."
)
print(response)
```
RAG Agent (documents + tools)
```python
from llama_index.core.tools import QueryEngineTool

# Create indices for different document sets
api_docs_index = VectorStoreIndex.from_documents(api_docs)
architecture_index = VectorStoreIndex.from_documents(arch_docs)

# Wrap each index as a tool
api_tool = QueryEngineTool.from_defaults(
    query_engine=api_docs_index.as_query_engine(),
    name="api_documentation",
    description="Search API documentation for endpoint details, request/response formats, and authentication.",
)
arch_tool = QueryEngineTool.from_defaults(
    query_engine=architecture_index.as_query_engine(),
    name="architecture_docs",
    description="Search architecture documentation for system design, data flow, and component relationships.",
)

# The agent can search both document sets and use the custom function tools
agent = FunctionAgent.from_tools(
    [
        api_tool,
        arch_tool,
        FunctionTool.from_defaults(fn=search_codebase),
        FunctionTool.from_defaults(fn=run_tests),
    ],
    llm=llm,
    verbose=True,
    system_prompt=(
        "You are a senior developer assistant. Use the documentation tools "
        "to find information, and the codebase tools to verify implementation details."
    ),
)

response = agent.chat("How does the authentication flow work? Check the API docs and architecture docs.")
```
Advanced RAG Patterns
Chat Engine (multi-turn conversation)
```python
# Condense + Context mode: condenses follow-up questions with chat history,
# then retrieves fresh context for each turn
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    verbose=True,
)

r1 = chat_engine.chat("What is the system architecture?")
print(r1)

r2 = chat_engine.chat("How does the caching layer work?")  # Builds on r1
print(r2)

r3 = chat_engine.chat("What are its failure modes?")  # Refers to caching
print(r3)

# Reset conversation
chat_engine.reset()
```
Structured Output
```python
from typing import List

from pydantic import BaseModel, Field
from llama_index.core.output_parsers import PydanticOutputParser

class DocumentSummary(BaseModel):
    title: str = Field(description="Document title")
    key_topics: List[str] = Field(description="Main topics covered")
    sentiment: str = Field(description="Overall sentiment: positive, negative, or neutral")
    actionable_items: List[str] = Field(description="Action items extracted from the document")

output_parser = PydanticOutputParser(output_cls=DocumentSummary)
query_engine = index.as_query_engine(output_parser=output_parser)

response = query_engine.query("Summarize the quarterly review document")

# response is a DocumentSummary instance
print(response.title)
print(response.key_topics)
print(response.actionable_items)
```
Multi-Modal RAG (images + text)
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# Load documents including images
documents = SimpleDirectoryReader(
    "./data",
    required_exts=[".pdf", ".png", ".jpg", ".md"],
).load_data()

# Build multi-modal index
index = VectorStoreIndex.from_documents(documents)

# Use a multi-modal LLM for queries about visual content
mm_llm = OpenAIMultiModal(model="gpt-4o")
query_engine = index.as_query_engine(llm=mm_llm)

response = query_engine.query("Describe the architecture diagram on page 5")
print(response)
```
Vector Store Integrations
```python
# Chroma (local, great for development)
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("my_docs")
vector_store = ChromaVectorStore(chroma_collection=collection)

# Pinecone (cloud, production scale)
from llama_index.vector_stores.pinecone import PineconeVectorStore
from pinecone import Pinecone

pc = Pinecone(api_key="your-key")
pinecone_index = pc.Index("my-index")
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# FAISS (fast local similarity search)
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

faiss_index = faiss.IndexFlatL2(1536)  # Dimension of your embeddings
vector_store = FaissVectorStore(faiss_index=faiss_index)

# Qdrant (self-hosted, production features)
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="my_docs")

# Use any vector store in an index
from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```
Customization
Swap LLM Provider
```python
from llama_index.core import Settings

# Use Anthropic globally
from llama_index.llms.anthropic import Anthropic
Settings.llm = Anthropic(model="claude-sonnet-4-5-20250929")

# Use a local model via Ollama
from llama_index.llms.ollama import Ollama
Settings.llm = Ollama(model="llama3.1", request_timeout=120.0)

# Per-query override (does not change the global default)
query_engine = index.as_query_engine(llm=Anthropic(model="claude-sonnet-4-5-20250929"))
```
Custom Embeddings
```python
from llama_index.core import Settings

# OpenAI embeddings (default)
from llama_index.embeddings.openai import OpenAIEmbedding
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# HuggingFace (free, local)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

# Cohere
from llama_index.embeddings.cohere import CohereEmbedding
Settings.embed_model = CohereEmbedding(model_name="embed-english-v3.0")
```
Custom Prompt Templates
```python
from llama_index.core import PromptTemplate

# Override the QA prompt
qa_prompt = PromptTemplate(
    "You are a technical documentation expert.\n"
    "Context from the documentation:\n"
    "-----\n"
    "{context_str}\n"
    "-----\n"
    "Question: {query_str}\n\n"
    "Rules:\n"
    "1. Only answer based on the provided context.\n"
    "2. If the answer is not in the context, say 'Not found in documentation.'\n"
    "3. Include the relevant section name in your answer.\n"
    "4. Use code examples from the context when available.\n\n"
    "Answer: "
)

query_engine = index.as_query_engine(text_qa_template=qa_prompt)
```
Custom Node Parsing (chunking)
```python
from llama_index.core.node_parser import (
    SentenceSplitter,
    SemanticSplitterNodeParser,
    MarkdownNodeParser,
    CodeSplitter,
)

# Sentence-based splitting (recommended default)
parser = SentenceSplitter(chunk_size=1024, chunk_overlap=200)

# Semantic splitting (splits at meaning boundaries)
from llama_index.embeddings.openai import OpenAIEmbedding
parser = SemanticSplitterNodeParser(
    embed_model=OpenAIEmbedding(),
    buffer_size=1,
    breakpoint_percentile_threshold=95,
)

# Markdown-aware splitting
parser = MarkdownNodeParser()

# Code-aware splitting
parser = CodeSplitter(language="python", chunk_lines=40, chunk_lines_overlap=10)

# Use in Settings
from llama_index.core import Settings
Settings.node_parser = parser
```
Evaluation
LlamaIndex provides built-in evaluation to measure RAG quality:
```python
from llama_index.core.evaluation import (
    RelevancyEvaluator,
    FaithfulnessEvaluator,
    BatchEvalRunner,
)

# Relevancy: does the response actually answer the question?
relevancy_evaluator = RelevancyEvaluator()

# Faithfulness: is the response supported by the retrieved context? (no hallucination)
faithfulness_evaluator = FaithfulnessEvaluator()

# Evaluate a single response
query = "What is the authentication flow?"
response = query_engine.query(query)

relevancy_result = relevancy_evaluator.evaluate_response(query=query, response=response)
faithfulness_result = faithfulness_evaluator.evaluate_response(query=query, response=response)

print(f"Relevant: {relevancy_result.passing} (score: {relevancy_result.score})")
print(f"Faithful: {faithfulness_result.passing} (score: {faithfulness_result.score})")

# Batch evaluation (run inside an async context)
eval_questions = [
    "How does authentication work?",
    "What is the database schema?",
    "How are errors handled?",
]

runner = BatchEvalRunner(
    {"relevancy": relevancy_evaluator, "faithfulness": faithfulness_evaluator},
    workers=4,
)
# Returns a dict keyed by evaluator name, each mapping to a list of results
eval_results = await runner.aevaluate_queries(query_engine, queries=eval_questions)

for i, query in enumerate(eval_questions):
    print(f"Q: {query}")
    print(f"  Relevancy: {eval_results['relevancy'][i].passing}")
    print(f"  Faithfulness: {eval_results['faithfulness'][i].passing}")
```
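The real evaluators use an LLM as judge, so they cannot be demonstrated offline. As a rough intuition for what "faithfulness" measures, here is a dependency-free keyword-overlap heuristic; it is purely illustrative and far cruder than what LlamaIndex actually does:

```python
# Toy heuristic: what fraction of the answer's content words appear in the
# retrieved context? Low overlap suggests the answer is not grounded.
import string

def content_words(text: str) -> set[str]:
    stop = {"the", "a", "an", "is", "are", "of", "in", "and", "to"}
    words = text.lower().translate(str.maketrans("", "", string.punctuation)).split()
    return {w for w in words if w not in stop}

def toy_faithfulness(answer: str, context: str) -> float:
    aw = content_words(answer)
    return len(aw & content_words(context)) / len(aw) if aw else 0.0

context = "Tokens are exchanged via OAuth2. The gateway validates every token."
grounded = "The gateway validates tokens exchanged via OAuth2."
hallucinated = "Authentication uses Kerberos tickets stored in Redis."

print(toy_faithfulness(grounded, context))      # high: well supported
print(toy_faithfulness(hallucinated, context))  # low: likely hallucination
```

An LLM judge replaces this word overlap with a semantic check, but the question it answers is the same one: is every claim in the response traceable to the retrieved chunks?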
Configuration Reference
Settings (Global Defaults)
| Setting | Type | Default | Description |
|---|---|---|---|
| `Settings.llm` | BaseLLM | `OpenAI("gpt-3.5-turbo")` | Default LLM for all operations |
| `Settings.embed_model` | BaseEmbedding | `OpenAIEmbedding` | Default embedding model |
| `Settings.node_parser` | NodeParser | `SentenceSplitter` | Default chunking strategy |
| `Settings.chunk_size` | int | 1024 | Default chunk size (tokens) |
| `Settings.chunk_overlap` | int | 20 | Default chunk overlap (tokens) |
| `Settings.num_output` | int | 256 | Max output tokens for the LLM |
| `Settings.callback_manager` | CallbackManager | None | For observability/tracing |
Query Engine Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `similarity_top_k` | int | 2 | Number of chunks to retrieve |
| `response_mode` | str | `"compact"` | Response synthesis strategy |
| `streaming` | bool | False | Enable streaming responses |
| `verbose` | bool | False | Show retrieval details |
| `text_qa_template` | PromptTemplate | default | Override the QA prompt |
| `refine_template` | PromptTemplate | default | Override the refine prompt |
Performance Benchmarks
| Operation | Typical Latency | Notes |
|---|---|---|
| Index 100 documents | 10-30s | One-time cost, persist to disk |
| Index 10,000 documents | 5-15min | Use batch embedding, persist |
| Vector query (top-5) | 200-500ms | Vector search only |
| Full RAG query | 1-3s | Retrieval + LLM synthesis |
| Streaming first token | 300-600ms | Much better perceived latency |
| Agent with 2 tool calls | 4-8s | Multi-step reasoning |
Best Practices
- Persist your index. Always call `index.storage_context.persist()` after building. Re-embedding documents on every startup wastes time and money.
- Use `VectorStoreIndex` as your default. It handles 90% of RAG use cases. Only reach for `TreeIndex` or `SummaryIndex` when you have specific summarization needs.
- Tune `similarity_top_k`. Start with 3-5 and adjust. Too few misses relevant context; too many dilutes with noise and increases LLM cost.
- Add metadata to documents. Metadata enables filtering, source attribution, and better retrieval. Always include at least `source`, `date`, and `category`.
- Use streaming for all user-facing queries. The difference between 2s of silence and immediate partial output fundamentally changes user perception.
- Choose the right response mode. `compact` is the best default. Use `tree_summarize` for long documents, `refine` for the highest quality (at higher cost), and `no_text` when you just need the retrieved chunks.
- Evaluate your RAG system. Use `RelevancyEvaluator` and `FaithfulnessEvaluator` to measure quality. A RAG system without evaluation is a guessing game.
- Use `chat_engine` for conversations, not repeated `query_engine` calls. The chat engine automatically handles history condensation and context management.
- Match chunk size to your content. Technical documentation benefits from larger chunks (1000-1500 tokens) to preserve context. Short Q&A pairs work better with smaller chunks (256-512 tokens).
- Use separate indices for separate concerns. Do not dump API docs, architecture docs, and meeting notes into one index. Create separate indices and wrap them as tools for an agent that can choose the right source.
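The chunk-size and overlap advice above is easier to see with a dependency-free sketch. The real `SentenceSplitter` is token-aware and respects sentence boundaries; this character-window version only shows the mechanics of `chunk_size` and `chunk_overlap`:

```python
# Character-window sketch of chunk_size / chunk_overlap semantics.
def split(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    assert 0 <= chunk_overlap < chunk_size
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghijklmnopqrstuvwxyz"
no_overlap = split(text, chunk_size=8, chunk_overlap=0)
overlapped = split(text, chunk_size=8, chunk_overlap=3)

print(no_overlap)  # ['abcdefgh', 'ijklmnop', 'qrstuvwx', 'yz']
print(overlapped)  # consecutive chunks share up to 3 characters
```

Without overlap, a sentence straddling a chunk boundary is cut in two and neither half retrieves well; overlap duplicates a little text at each border so context survives the split. That duplication is also why larger overlaps increase embedding cost.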
Troubleshooting
Query returns "I don't have enough information":
```python
# Increase the number of retrieved chunks
query_engine = index.as_query_engine(similarity_top_k=10)

# Check what's actually being retrieved
retriever = index.as_retriever(similarity_top_k=10)
nodes = retriever.retrieve("your query here")
for node in nodes:
    print(f"Score: {node.score:.3f} | {node.text[:100]}")

# If scores are low, your chunks may not match the query phrasing
```
Hallucinated answers (not grounded in context):
```python
# Use a stricter prompt template
from llama_index.core import PromptTemplate

strict_prompt = PromptTemplate(
    "Context:\n{context_str}\n\n"
    "Question: {query_str}\n\n"
    "IMPORTANT: Only answer from the context above. "
    "If the answer is not clearly stated in the context, respond with "
    "'The provided documents do not contain this information.'\n"
    "Answer: "
)
query_engine = index.as_query_engine(text_qa_template=strict_prompt)

# Also: evaluate with FaithfulnessEvaluator
```
Slow indexing on large document sets:
```python
# Use an ingestion pipeline with batched transformations and a progress bar
from llama_index.core import VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=1024, chunk_overlap=200),
        OpenAIEmbedding(model="text-embedding-3-small"),
    ]
)
nodes = pipeline.run(documents=documents, show_progress=True)
index = VectorStoreIndex(nodes)
```
Memory issues with large indices:
```python
# Use an external vector store instead of the in-memory default
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("my_docs")
vector_store = ChromaVectorStore(chroma_collection=collection)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```
Chat engine loses context after many turns:
```python
# Use condense_plus_context mode (re-retrieves on each turn)
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    verbose=True,
)
# This condenses the full chat history + new question into a standalone query,
# then retrieves fresh context each time
```
LlamaIndex vs LangChain
| Dimension | LlamaIndex | LangChain |
|---|---|---|
| Primary focus | RAG and data retrieval | General LLM applications |
| RAG quality | Best-in-class (core focus) | Good (one of many features) |
| Data connectors | 300+ via LlamaHub | 100+ via community |
| Index types | Vector, Tree, Summary, Keyword, KG | Vector store wrappers |
| Response synthesis | 5+ modes (compact, refine, tree) | Basic (stuff, map_reduce) |
| Evaluation | Built-in (relevancy, faithfulness) | Via LangSmith |
| Agent support | FunctionAgent, ReActAgent | AgentExecutor, tool calling |
| Learning curve | Easy for RAG, moderate for agents | Moderate for everything |
| When to choose | RAG is your primary use case | Agents + tools are primary |
Use LlamaIndex when your application is fundamentally about querying data -- document Q&A, knowledge bases, enterprise search, research assistants.
Use LangChain when your application is fundamentally about agent reasoning, tool orchestration, or you need the broadest integration ecosystem.
Use both together when you need LlamaIndex's superior RAG as a tool within a LangChain agent:
```python
# LlamaIndex index as a LangChain tool
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from langchain.tools import Tool

# Build the LlamaIndex index
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Wrap as a LangChain tool
doc_search_tool = Tool(
    name="DocumentSearch",
    func=lambda q: str(query_engine.query(q)),
    description="Search internal documentation for answers",
)

# Use in a LangChain agent
from langchain.agents import create_tool_calling_agent, AgentExecutor

agent = create_tool_calling_agent(llm, [doc_search_tool, ...], prompt)
```
Resources
- GitHub: https://github.com/run-llama/llama_index (45,100+ stars)
- Documentation: https://docs.llamaindex.ai
- LlamaHub: https://llamahub.ai (300+ data connectors)
- LlamaCloud: https://cloud.llamaindex.ai (managed RAG)
- Discord: https://discord.gg/dGcwcsnxhU
- Version: 0.14.7+
- License: MIT