
LlamaIndex Agents Kit

A comprehensive skill for building data-driven LLM applications with the LlamaIndex framework. Includes structured workflows, validation checks, and reusable patterns for AI research.


LlamaIndex - Data Framework for LLM Applications

Overview

LlamaIndex is the leading framework for connecting large language models with your data. While other frameworks focus on general agent orchestration or chain composition, LlamaIndex is purpose-built for one thing: making your data queryable by LLMs. It provides a complete pipeline from data ingestion (300+ connectors on LlamaHub) through indexing, retrieval, and response synthesis, with first-class support for RAG (Retrieval-Augmented Generation) patterns.

LlamaIndex matters because RAG is the most practical way to make LLMs useful over private, domain-specific data without fine-tuning. The framework handles the hard parts: intelligent document chunking, embedding management, vector store abstraction, retrieval strategies (similarity, keyword, hybrid), response synthesis modes (compact, tree_summarize, refine), and evaluation metrics to ensure your system actually works. You can go from a folder of documents to a working Q&A system in 5 lines of code, then progressively customize every layer as your requirements grow.

The framework is organized as a modular package ecosystem: llama-index-core provides the base abstractions, and specific integrations (LLMs, embeddings, vector stores, data loaders) are installed as separate packages. This keeps your dependency tree lean.

When to Use

  • Building RAG applications that answer questions over private documents
  • Need document Q&A over PDFs, web pages, databases, APIs, or code repositories
  • Ingesting data from many heterogeneous sources (300+ connectors via LlamaHub)
  • Creating knowledge bases that ground LLM responses in factual data
  • Building chatbots that reference enterprise documentation
  • Need structured data extraction from unstructured documents
  • Evaluating RAG quality with built-in relevancy and faithfulness metrics
  • Building multi-modal RAG (images + text + tables)
  • Want the simplest path from "I have documents" to "I can query them"

Quick Start

Installation

```bash
# Full starter package (includes OpenAI integration)
pip install llama-index

# Or minimal install with specific providers
pip install llama-index-core
pip install llama-index-llms-anthropic        # For Claude
pip install llama-index-llms-openai           # For GPT
pip install llama-index-embeddings-openai     # Embeddings
pip install llama-index-vector-stores-chroma  # Vector store

# Set API keys
export OPENAI_API_KEY="sk-..."
# Or for Anthropic:
export ANTHROPIC_API_KEY="sk-ant-..."
```

5-Line RAG

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# 1. Load all documents from a directory
documents = SimpleDirectoryReader("./data").load_data()

# 2. Build the index (chunks, embeds, and stores)
index = VectorStoreIndex.from_documents(documents)

# 3. Query
response = index.as_query_engine().query("What is the main topic of these documents?")
print(response)
```

Production-Ready RAG (with persistence)

```python
import os

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"

if os.path.exists(PERSIST_DIR):
    # Load existing index from disk
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
    print("Loaded existing index.")
else:
    # Build new index and persist
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
    print(f"Built index from {len(documents)} documents and saved to {PERSIST_DIR}.")

# Query with configuration
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact",
    streaming=True,
)
response = query_engine.query("Summarize the key findings")
for text in response.response_gen:
    print(text, end="", flush=True)
```

Core Concepts

1. Data Connectors (Loaders)

LlamaIndex loads data from virtually any source into a normalized Document format.

```python
from llama_index.core import SimpleDirectoryReader, Document

# Local files (PDF, DOCX, TXT, MD, CSV, images, etc.)
documents = SimpleDirectoryReader(
    "./data",
    recursive=True,                         # Traverse subdirectories
    required_exts=[".pdf", ".md", ".txt"],  # Filter by extension
    filename_as_id=True,                    # Use filename as doc ID
).load_data()

# Web pages
from llama_index.readers.web import SimpleWebPageReader
documents = SimpleWebPageReader(html_to_text=True).load_data([
    "https://docs.python.org/3/tutorial/classes.html",
    "https://docs.python.org/3/tutorial/errors.html",
])

# GitHub repository
from llama_index.readers.github import GithubRepositoryReader
documents = GithubRepositoryReader(
    owner="run-llama",
    repo="llama_index",
    filter_file_extensions=[".py", ".md"],
    verbose=True,
).load_data(branch="main")

# Database
from llama_index.readers.database import DatabaseReader
reader = DatabaseReader(sql_database_uri="postgresql://user:pass@localhost/db")
documents = reader.load_data(query="SELECT title, content FROM articles WHERE published = true")

# Manual document creation
doc = Document(
    text="This is custom content.",
    metadata={"source": "manual", "category": "tutorial", "date": "2025-06-15"},
)
```

2. Indices -- Data Structures for Retrieval

Indices organize your documents for efficient querying. Each index type optimizes for different access patterns.

```python
from llama_index.core import VectorStoreIndex, SummaryIndex, TreeIndex, KeywordTableIndex

# VectorStoreIndex (most common -- semantic similarity search)
vector_index = VectorStoreIndex.from_documents(documents)

# SummaryIndex (formerly ListIndex -- scans all nodes sequentially)
# Good for summarization tasks over entire corpus
summary_index = SummaryIndex.from_documents(documents)

# TreeIndex (hierarchical summarization)
# Good for multi-level summarization
tree_index = TreeIndex.from_documents(documents)

# KeywordTableIndex (keyword-based retrieval)
# Good for precise keyword matching
keyword_index = KeywordTableIndex.from_documents(documents)

# Persist any index
vector_index.storage_context.persist(persist_dir="./vector_storage")
summary_index.storage_context.persist(persist_dir="./summary_storage")

# Load from disk
from llama_index.core import load_index_from_storage, StorageContext
storage = StorageContext.from_defaults(persist_dir="./vector_storage")
loaded_index = load_index_from_storage(storage)
```

3. Query Engines -- Ask Questions

Query engines combine retrieval and response synthesis into a single queryable interface.

```python
# Basic query engine
query_engine = index.as_query_engine()
response = query_engine.query("What are the main conclusions?")

# Configurable query engine
query_engine = index.as_query_engine(
    similarity_top_k=5,              # Retrieve top 5 chunks
    response_mode="tree_summarize",  # Synthesis strategy
    verbose=True,                    # Show retrieval details
)

# Response modes:
# "compact"          - Stuff as many chunks as fit into one LLM call (default)
# "tree_summarize"   - Hierarchically summarize chunks
# "refine"           - Iteratively refine answer with each chunk
# "simple_summarize" - Simple concatenation and summarize
# "no_text"          - Return retrieved nodes without LLM synthesis
# "accumulate"       - Get separate answer per chunk

# Streaming
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("Explain the architecture")
for token in response.response_gen:
    print(token, end="", flush=True)

# Access source nodes (for citations)
response = query_engine.query("What is the system design?")
print(response)
for node in response.source_nodes:
    print(f"  Score: {node.score:.3f}")
    print(f"  Source: {node.metadata.get('file_name', 'unknown')}")
    print(f"  Text: {node.text[:100]}...")
```

4. Retrievers -- Fine-Grained Control

When you need more control over what gets retrieved:

```python
# Vector retriever (default)
retriever = index.as_retriever(similarity_top_k=10)
nodes = retriever.retrieve("machine learning algorithms")
for node in nodes:
    print(f"Score: {node.score:.3f} | {node.text[:80]}...")

# Metadata filtering
from llama_index.core.vector_stores import MetadataFilters, MetadataFilter

filters = MetadataFilters(filters=[
    MetadataFilter(key="category", value="tutorial"),
    MetadataFilter(key="difficulty", value="beginner"),
])
retriever = index.as_retriever(
    similarity_top_k=5,
    filters=filters,
)

# Custom retriever
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore

class HybridRetriever(BaseRetriever):
    """Combines vector search with keyword matching."""

    def __init__(self, vector_retriever, keyword_retriever):
        self.vector_retriever = vector_retriever
        self.keyword_retriever = keyword_retriever
        super().__init__()

    def _retrieve(self, query_bundle):
        vector_nodes = self.vector_retriever.retrieve(query_bundle)
        keyword_nodes = self.keyword_retriever.retrieve(query_bundle)
        # Merge and deduplicate
        seen = set()
        merged = []
        for node in vector_nodes + keyword_nodes:
            if node.node.node_id not in seen:
                seen.add(node.node.node_id)
                merged.append(node)
        return sorted(merged, key=lambda x: x.score or 0, reverse=True)[:5]
```

Agents with Tools

LlamaIndex agents combine RAG with tool calling for complex reasoning tasks.

Basic Agent

```python
from llama_index.core.agent import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import FunctionTool

def search_codebase(query: str) -> str:
    """Search the codebase for functions matching the query."""
    # In production: actual code search
    return f"Found 3 functions matching '{query}': parse_config(), validate_config(), load_config()"

def run_tests(test_path: str) -> str:
    """Run tests at the given path and return results."""
    return f"Running tests at {test_path}: 12 passed, 0 failed"

def create_pull_request(title: str, description: str) -> str:
    """Create a GitHub pull request."""
    return f"Created PR: '{title}' - {description}"

# Wrap plain functions as tools
tools = [
    FunctionTool.from_defaults(fn=search_codebase),
    FunctionTool.from_defaults(fn=run_tests),
    FunctionTool.from_defaults(fn=create_pull_request),
]

# Create agent
llm = OpenAI(model="gpt-4o")
agent = FunctionAgent.from_tools(tools, llm=llm, verbose=True)

response = agent.chat(
    "Find all config-related functions, run their tests, "
    "and create a PR summarizing the test results."
)
print(response)
```

RAG Agent (documents + tools)

```python
from llama_index.core.tools import QueryEngineTool, FunctionTool

# Create indices for different document sets
api_docs_index = VectorStoreIndex.from_documents(api_docs)
architecture_index = VectorStoreIndex.from_documents(arch_docs)

# Wrap each index as a tool
api_tool = QueryEngineTool.from_defaults(
    query_engine=api_docs_index.as_query_engine(),
    name="api_documentation",
    description="Search API documentation for endpoint details, request/response formats, and authentication.",
)
arch_tool = QueryEngineTool.from_defaults(
    query_engine=architecture_index.as_query_engine(),
    name="architecture_docs",
    description="Search architecture documentation for system design, data flow, and component relationships.",
)

# Agent can search both document sets + use custom tools
# (plain functions must be wrapped as FunctionTools before being passed in)
agent = FunctionAgent.from_tools(
    [
        api_tool,
        arch_tool,
        FunctionTool.from_defaults(fn=search_codebase),
        FunctionTool.from_defaults(fn=run_tests),
    ],
    llm=llm,
    verbose=True,
    system_prompt=(
        "You are a senior developer assistant. Use the documentation tools "
        "to find information, and the codebase tools to verify implementation details."
    ),
)

response = agent.chat("How does the authentication flow work? Check the API docs and architecture docs.")
```

Advanced RAG Patterns

Chat Engine (multi-turn conversation)

```python
# Condense + Context mode: condenses follow-up questions with chat history,
# then retrieves fresh context for each turn
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    verbose=True,
)

r1 = chat_engine.chat("What is the system architecture?")
print(r1)

r2 = chat_engine.chat("How does the caching layer work?")  # Builds on r1
print(r2)

r3 = chat_engine.chat("What are its failure modes?")  # Refers to caching
print(r3)

# Reset conversation
chat_engine.reset()
```

Structured Output

```python
from pydantic import BaseModel, Field
from typing import List
from llama_index.core.output_parsers import PydanticOutputParser

class DocumentSummary(BaseModel):
    title: str = Field(description="Document title")
    key_topics: List[str] = Field(description="Main topics covered")
    sentiment: str = Field(description="Overall sentiment: positive, negative, or neutral")
    actionable_items: List[str] = Field(description="Action items extracted from the document")

output_parser = PydanticOutputParser(output_cls=DocumentSummary)
query_engine = index.as_query_engine(output_parser=output_parser)

response = query_engine.query("Summarize the quarterly review document")

# response is a DocumentSummary instance
print(response.title)
print(response.key_topics)
print(response.actionable_items)
```

Multi-Modal RAG (images + text)

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# Load documents including images
documents = SimpleDirectoryReader(
    "./data",
    required_exts=[".pdf", ".png", ".jpg", ".md"],
).load_data()

# Build multi-modal index
index = VectorStoreIndex.from_documents(documents)

# Use multi-modal LLM for queries about visual content
mm_llm = OpenAIMultiModal(model="gpt-4o")
query_engine = index.as_query_engine(llm=mm_llm)

response = query_engine.query("Describe the architecture diagram on page 5")
print(response)
```

Vector Store Integrations

```python
# Chroma (local, great for development)
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("my_docs")
vector_store = ChromaVectorStore(chroma_collection=collection)

# Pinecone (cloud, production scale)
from llama_index.vector_stores.pinecone import PineconeVectorStore
from pinecone import Pinecone

pc = Pinecone(api_key="your-key")
pinecone_index = pc.Index("my-index")
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# FAISS (fast local similarity search)
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

faiss_index = faiss.IndexFlatL2(1536)  # Dimension of your embeddings
vector_store = FaissVectorStore(faiss_index=faiss_index)

# Qdrant (self-hosted, production features)
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="my_docs")

# Use any vector store in an index
from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```

Customization

Swap LLM Provider

```python
from llama_index.core import Settings

# Use Anthropic globally
from llama_index.llms.anthropic import Anthropic
Settings.llm = Anthropic(model="claude-sonnet-4-5-20250929")

# Use local model via Ollama
from llama_index.llms.ollama import Ollama
Settings.llm = Ollama(model="llama3.1", request_timeout=120.0)

# Per-query override (does not change global)
query_engine = index.as_query_engine(llm=Anthropic(model="claude-sonnet-4-5-20250929"))
```

Custom Embeddings

```python
from llama_index.core import Settings

# OpenAI embeddings (default)
from llama_index.embeddings.openai import OpenAIEmbedding
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# HuggingFace (free, local)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

# Cohere
from llama_index.embeddings.cohere import CohereEmbedding
Settings.embed_model = CohereEmbedding(model_name="embed-english-v3.0")
```

Custom Prompt Templates

```python
from llama_index.core import PromptTemplate

# Override the QA prompt
qa_prompt = PromptTemplate(
    "You are a technical documentation expert.\n"
    "Context from the documentation:\n"
    "-----\n"
    "{context_str}\n"
    "-----\n"
    "Question: {query_str}\n\n"
    "Rules:\n"
    "1. Only answer based on the provided context.\n"
    "2. If the answer is not in the context, say 'Not found in documentation.'\n"
    "3. Include the relevant section name in your answer.\n"
    "4. Use code examples from the context when available.\n\n"
    "Answer: "
)

query_engine = index.as_query_engine(text_qa_template=qa_prompt)
```

Custom Node Parsing (chunking)

```python
from llama_index.core.node_parser import (
    SentenceSplitter,
    SemanticSplitterNodeParser,
    MarkdownNodeParser,
    CodeSplitter,
)

# Sentence-based splitting (recommended default)
parser = SentenceSplitter(chunk_size=1024, chunk_overlap=200)

# Semantic splitting (splits by meaning boundaries)
from llama_index.embeddings.openai import OpenAIEmbedding
parser = SemanticSplitterNodeParser(
    embed_model=OpenAIEmbedding(),
    buffer_size=1,
    breakpoint_percentile_threshold=95,
)

# Markdown-aware splitting
parser = MarkdownNodeParser()

# Code-aware splitting
parser = CodeSplitter(language="python", chunk_lines=40, chunk_lines_overlap=10)

# Use in Settings
from llama_index.core import Settings
Settings.node_parser = parser
```

Evaluation

LlamaIndex provides built-in evaluation to measure RAG quality:

```python
from llama_index.core.evaluation import (
    RelevancyEvaluator,
    FaithfulnessEvaluator,
    BatchEvalRunner,
)

# Relevancy: Does the response actually answer the question?
relevancy_evaluator = RelevancyEvaluator()

# Faithfulness: Is the response supported by the retrieved context? (no hallucination)
faithfulness_evaluator = FaithfulnessEvaluator()

# Evaluate a single response
query = "What is the authentication flow?"
response = query_engine.query(query)

relevancy_result = relevancy_evaluator.evaluate_response(query=query, response=response)
faithfulness_result = faithfulness_evaluator.evaluate_response(query=query, response=response)

print(f"Relevant: {relevancy_result.passing} (score: {relevancy_result.score})")
print(f"Faithful: {faithfulness_result.passing} (score: {faithfulness_result.score})")

# Batch evaluation (run inside an async context, e.g. asyncio.run)
eval_questions = [
    "How does authentication work?",
    "What is the database schema?",
    "How are errors handled?",
]

runner = BatchEvalRunner(
    {"relevancy": relevancy_evaluator, "faithfulness": faithfulness_evaluator},
    workers=4,
)
eval_results = await runner.aevaluate_queries(
    query_engine, queries=eval_questions
)

# aevaluate_queries returns a dict keyed by evaluator name,
# with one result per query
for i, query in enumerate(eval_questions):
    print(f"Q: {query}")
    print(f"  Relevancy: {eval_results['relevancy'][i].passing}")
    print(f"  Faithfulness: {eval_results['faithfulness'][i].passing}")
```

Configuration Reference

Settings (Global Defaults)

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `Settings.llm` | `BaseLLM` | `OpenAI("gpt-3.5-turbo")` | Default LLM for all operations |
| `Settings.embed_model` | `BaseEmbedding` | `OpenAIEmbedding` | Default embedding model |
| `Settings.node_parser` | `NodeParser` | `SentenceSplitter` | Default chunking strategy |
| `Settings.chunk_size` | `int` | `1024` | Default chunk size (tokens) |
| `Settings.chunk_overlap` | `int` | `20` | Default chunk overlap (tokens) |
| `Settings.num_output` | `int` | `256` | Max output tokens for LLM |
| `Settings.callback_manager` | `CallbackManager` | `None` | For observability/tracing |
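Any of these defaults can be overridden once at startup and every later operation picks them up. A minimal configuration sketch (the specific values are illustrative, not recommendations):

```python
from llama_index.core import Settings

# Override the global defaults from the table above
Settings.chunk_size = 512     # smaller chunks for short Q&A-style content
Settings.chunk_overlap = 50
Settings.num_output = 512     # allow longer LLM answers
```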

Query Engine Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `similarity_top_k` | `int` | `2` | Number of chunks to retrieve |
| `response_mode` | `str` | `"compact"` | Response synthesis strategy |
| `streaming` | `bool` | `False` | Enable streaming responses |
| `verbose` | `bool` | `False` | Show retrieval details |
| `text_qa_template` | `PromptTemplate` | default | Override QA prompt |
| `refine_template` | `PromptTemplate` | default | Override refine prompt |

Performance Benchmarks

| Operation | Typical Latency | Notes |
| --- | --- | --- |
| Index 100 documents | 10-30 s | One-time cost, persist to disk |
| Index 10,000 documents | 5-15 min | Use batch embedding, persist |
| Vector query (top-5) | 200-500 ms | Vector search only |
| Full RAG query | 1-3 s | Retrieval + LLM synthesis |
| Streaming first token | 300-600 ms | Much better perceived latency |
| Agent with 2 tool calls | 4-8 s | Multi-step reasoning |
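These figures are typical, not guaranteed; to measure latency on your own corpus, a small library-free timing helper is enough (`query_engine.query` in the usage comment is assumed to exist from the earlier examples):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print its wall-clock latency, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed * 1000:.0f} ms")
    return result

# Usage (assuming a built query_engine):
# response = timed("full RAG query", query_engine.query, "Summarize the key findings")
```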

Best Practices

  1. Persist your index. Always call index.storage_context.persist() after building. Re-embedding documents on every startup wastes time and money.

  2. Use VectorStoreIndex as your default. It handles 90% of RAG use cases. Only reach for TreeIndex or SummaryIndex when you have specific summarization needs.

  3. Tune similarity_top_k. Start with 3-5 and adjust. Too few misses relevant context; too many dilutes with noise and increases LLM cost.

  4. Add metadata to documents. Metadata enables filtering, source attribution, and better retrieval. Always include at least source, date, and category.

  5. Use streaming for all user-facing queries. The difference between 2s of silence and immediate partial output fundamentally changes user perception.

  6. Choose the right response mode. compact is the best default. Use tree_summarize for long documents, refine for highest quality (at higher cost), and no_text when you just need retrieved chunks.

  7. Evaluate your RAG system. Use RelevancyEvaluator and FaithfulnessEvaluator to measure quality. A RAG system without evaluation is a guessing game.

  8. Use chat_engine for conversations, not repeated query_engine calls. The chat engine automatically handles history condensation and context management.

  9. Match chunk size to your content. Technical documentation benefits from larger chunks (1000-1500 tokens) to preserve context. Short Q&A pairs work better with smaller chunks (256-512 tokens).

  10. Use separate indices for separate concerns. Do not dump API docs, architecture docs, and meeting notes into one index. Create separate indices and wrap them as tools for an agent that can choose the right source.
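To make the chunk-size and overlap tradeoff in practice 9 concrete, here is a library-free sketch of fixed-size chunking with overlap, the mechanism `SentenceSplitter` builds on (simplified: whitespace tokens stand in for real tokens, and sentence boundaries are ignored):

```python
def chunk_tokens(tokens, chunk_size=1024, chunk_overlap=200):
    """Split a token list into overlapping chunks.

    Each chunk shares its first `chunk_overlap` tokens with the end of the
    previous chunk, so context spanning a boundary is never fully lost.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = [f"tok{i}" for i in range(2500)]
chunks = chunk_tokens(tokens, chunk_size=1024, chunk_overlap=200)
print(len(chunks))   # 3 chunks
print(chunks[1][0])  # second chunk starts at token 824 (1024 - 200)
```

Larger `chunk_size` means fewer, more contextual chunks per document; larger `chunk_overlap` means more redundancy (and embedding cost) but fewer answers lost at chunk boundaries.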

Troubleshooting

Query returns "I don't have enough information":

```python
# Increase the number of retrieved chunks
query_engine = index.as_query_engine(similarity_top_k=10)

# Check what's actually being retrieved
retriever = index.as_retriever(similarity_top_k=10)
nodes = retriever.retrieve("your query here")
for node in nodes:
    print(f"Score: {node.score:.3f} | {node.text[:100]}")
# If scores are low, your chunks may not match the query phrasing
```

Hallucinated answers (not grounded in context):

```python
# Use a stricter prompt template
from llama_index.core import PromptTemplate

strict_prompt = PromptTemplate(
    "Context:\n{context_str}\n\n"
    "Question: {query_str}\n\n"
    "IMPORTANT: Only answer from the context above. "
    "If the answer is not clearly stated in the context, respond with "
    "'The provided documents do not contain this information.'\n"
    "Answer: "
)
query_engine = index.as_query_engine(text_qa_template=strict_prompt)

# Also: evaluate with FaithfulnessEvaluator
```

Slow indexing on large document sets:

```python
# Use batch processing with progress bar
from llama_index.core import VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=1024, chunk_overlap=200),
        OpenAIEmbedding(model="text-embedding-3-small"),
    ]
)
nodes = pipeline.run(documents=documents, show_progress=True)
index = VectorStoreIndex(nodes)
```

Memory issues with large indices:

```python
# Use an external vector store instead of in-memory default
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("my_docs")
vector_store = ChromaVectorStore(chroma_collection=collection)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```

Chat engine loses context after many turns:

```python
# Use condense_plus_context mode (re-retrieves on each turn)
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    verbose=True,
)
# This condenses the full chat history + new question into a standalone query,
# then retrieves fresh context each time
```

LlamaIndex vs LangChain

| Dimension | LlamaIndex | LangChain |
| --- | --- | --- |
| Primary focus | RAG and data retrieval | General LLM applications |
| RAG quality | Best-in-class (core focus) | Good (one of many features) |
| Data connectors | 300+ via LlamaHub | 100+ via community |
| Index types | Vector, Tree, Summary, Keyword, KG | Vector store wrappers |
| Response synthesis | 5+ modes (compact, refine, tree) | Basic (stuff, map_reduce) |
| Evaluation | Built-in (relevancy, faithfulness) | Via LangSmith |
| Agent support | FunctionAgent, ReActAgent | AgentExecutor, tool calling |
| Learning curve | Easy for RAG, moderate for agents | Moderate for everything |
| When to choose | RAG is your primary use case | Agents + tools are primary |

Use LlamaIndex when your application is fundamentally about querying data -- document Q&A, knowledge bases, enterprise search, research assistants.

Use LangChain when your application is fundamentally about agent reasoning, tool orchestration, or you need the broadest integration ecosystem.

Use both together when you need LlamaIndex's superior RAG as a tool within a LangChain agent:

```python
# LlamaIndex index as a LangChain tool
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from langchain.tools import Tool

# Build LlamaIndex index
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Wrap as LangChain tool
doc_search_tool = Tool(
    name="DocumentSearch",
    func=lambda q: str(query_engine.query(q)),
    description="Search internal documentation for answers",
)

# Use in a LangChain agent
from langchain.agents import create_tool_calling_agent, AgentExecutor
agent = create_tool_calling_agent(llm, [doc_search_tool, ...], prompt)
```
