
LangChain - Build LLM Applications with Agents and RAG

Overview

LangChain is the most widely adopted framework for building applications powered by large language models. With 119,000+ GitHub stars and 500+ integrations, it provides the connective tissue between LLMs, tools, data sources, and memory systems. At its core, LangChain abstracts the common patterns of LLM application development -- prompt templating, chain composition, agent reasoning, retrieval-augmented generation, and conversational memory -- into a composable, provider-agnostic API.

LangChain matters because it eliminates the boilerplate of LLM application development. Swapping from OpenAI to Anthropic to a local Ollama model requires changing one line. Building a ReAct agent with tool calling takes fewer than 10 lines. Adding RAG over your documents is a well-paved path with dozens of vector store integrations. And when you need observability, LangSmith gives you full traces of every chain execution, tool call, and token.

The ecosystem splits into three packages: langchain-core (interfaces and base abstractions), langchain (chains, agents, and retrieval logic), and langchain-community (500+ third-party integrations). Provider-specific packages like langchain-openai and langchain-anthropic provide optimized model bindings.
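As a quick orientation to that layout (import paths can shift between releases, so treat this as illustrative):

# langchain-core: interfaces and base abstractions
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# langchain: chains, agents, and retrieval logic
from langchain.agents import AgentExecutor, create_tool_calling_agent

# langchain-community: third-party integrations
from langchain_community.document_loaders import WebBaseLoader

# Provider packages: optimized model bindings
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI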

When to Use

  • Building agents that reason about which tools to call (ReAct pattern, function calling)
  • Implementing RAG pipelines over documents, web pages, code, or databases
  • Prototyping LLM applications quickly with swappable model providers
  • Creating chatbots with conversation memory (buffer, summary, or window-based)
  • Getting structured output from LLMs (Pydantic models, JSON schemas)
  • Streaming agent execution steps to a frontend in real time
  • Running production deployments that need LangSmith tracing for debugging and monitoring
  • Combining multiple data sources (PDFs, CSVs, APIs, databases) into a single queryable interface

Quick Start

Installation

# Core + Anthropic (recommended for Claude Code users)
pip install -U langchain langchain-anthropic

# Or with OpenAI
pip install -U langchain langchain-openai

# Common extras
pip install langchain-community   # 500+ integrations
pip install langchain-chroma      # Chroma vector store
pip install langchain-pinecone    # Pinecone vector store

# Set API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

Hello World -- LLM Call

from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-sonnet-4-5-20250929")
response = llm.invoke("What are the three laws of thermodynamics? Be concise.")
print(response.content)

Hello World -- Agent with Tools

from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_anthropic import ChatAnthropic
from langchain.tools import tool
from langchain_core.prompts import ChatPromptTemplate

@tool
def get_stock_price(ticker: str) -> str:
    """Get the current stock price for a given ticker symbol."""
    # In production, call a real API
    prices = {"AAPL": 198.50, "GOOGL": 175.30, "MSFT": 425.80}
    price = prices.get(ticker.upper())
    if price is not None:
        return f"{ticker.upper()}: ${price}"
    return f"Ticker {ticker} not found."

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression. Input: valid Python math expression."""
    # Note: eval() is unsafe on untrusted input; restrict or sandbox it in production
    try:
        return str(eval(expression))
    except Exception as e:
        return f"Error: {e}"

# Build the agent
llm = ChatAnthropic(model="claude-sonnet-4-5-20250929")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a financial analyst. Use tools to answer questions accurately."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, [get_stock_price, calculate], prompt)
executor = AgentExecutor(agent=agent, tools=[get_stock_price, calculate], verbose=True)

result = executor.invoke({"input": "What is AAPL's stock price times 100 shares?"})
print(result["output"])

Core Concepts

1. Models -- Provider Abstraction

LangChain provides a unified interface across LLM providers. Swap models by changing one line:

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI

# All share the same interface
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
llm = ChatAnthropic(model="claude-sonnet-4-5-20250929", temperature=0.7)
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0.7)

# Streaming (works with any provider)
for chunk in llm.stream("Explain recursion in 3 sentences"):
    print(chunk.content, end="", flush=True)

# Batch processing
responses = llm.batch([
    "Summarize quantum computing",
    "Explain neural networks",
    "Describe blockchain",
])

2. Prompt Templates

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Simple template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert {role}. Respond in {language}."),
    ("human", "{question}"),
])

# With conversation history (a separate name, so it doesn't shadow `prompt` below)
agent_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful coding assistant."),
    MessagesPlaceholder("history"),           # Injected conversation history
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),  # For agent reasoning
])

# Invoke with variables
chain = prompt | llm
result = chain.invoke({
    "role": "physicist",
    "language": "English",
    "question": "What is dark matter?",
})

3. Chains -- Composable Pipelines

LangChain Expression Language (LCEL) lets you compose chains with the | operator:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Chain: prompt -> LLM -> parse output
summarize_chain = (
    ChatPromptTemplate.from_template("Summarize this in 3 bullet points:\n{text}")
    | llm
    | StrOutputParser()
)
result = summarize_chain.invoke({"text": "Long article text here..."})
print(result)  # Plain string output

# Chain composition -- pipe one chain into another
translate_chain = (
    ChatPromptTemplate.from_template("Translate to {language}:\n{text}")
    | llm
    | StrOutputParser()
)

# Summarize, then translate
full_chain = (
    summarize_chain
    | (lambda summary: {"text": summary, "language": "Spanish"})
    | translate_chain
)

4. Agents -- Autonomous Tool-Using Reasoning

Agents use the ReAct (Reasoning + Acting) pattern: the LLM decides which tool to call, observes the result, and continues reasoning until it can answer the question.

from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain.tools import tool
from langchain_core.prompts import ChatPromptTemplate

@tool
def search_documentation(query: str) -> str:
    """Search project documentation for relevant information."""
    # In production: vector store lookup
    return f"Documentation results for: {query}"

@tool
def run_sql_query(query: str) -> str:
    """Execute a read-only SQL query against the analytics database."""
    # In production: actual database connection
    return "Query returned 42 rows. Top result: revenue=$1.2M"

@tool
def send_slack_message(channel: str, message: str) -> str:
    """Send a message to a Slack channel."""
    return f"Message sent to #{channel}"

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a data analyst assistant. Use tools to find answers and communicate results."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

tools = [search_documentation, run_sql_query, send_slack_message]
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=10)

# The agent will: 1) query the DB, 2) analyze results, 3) send to Slack
result = executor.invoke({
    "input": "What was last quarter's revenue? Post the result to #analytics"
})

5. Memory -- Conversation State

from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain

# Keep the last 10 exchanges (20 messages)
memory = ConversationBufferWindowMemory(k=10)
conversation = ConversationChain(llm=llm, memory=memory, verbose=True)

conversation.predict(input="My name is Alice and I work on the backend team.")
conversation.predict(input="What team do I work on?")
# Output: "You mentioned you work on the backend team."

# Other memory types:
from langchain.memory import (
    ConversationBufferMemory,         # Keep everything (unbounded)
    ConversationSummaryMemory,        # Summarize older messages
    ConversationSummaryBufferMemory,  # Hybrid: recent messages + summary
    ConversationTokenBufferMemory,    # Bounded by token count
)

RAG (Retrieval-Augmented Generation)

Complete RAG Pipeline

from langchain_community.document_loaders import (
    WebBaseLoader, PyPDFLoader, DirectoryLoader
)
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.chains import RetrievalQA

# Step 1: Load documents from multiple sources
web_docs = WebBaseLoader([
    "https://docs.python.org/3/tutorial/classes.html",
    "https://docs.python.org/3/tutorial/errors.html",
]).load()
pdf_docs = PyPDFLoader("technical_spec.pdf").load()
all_docs = web_docs + pdf_docs

# Step 2: Split into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=150,
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents(all_docs)
print(f"Split into {len(chunks)} chunks")

# Step 3: Embed and store in a vector database
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="./chroma_db",
)

# Step 4: Create a retriever
retriever = vectorstore.as_retriever(
    search_type="mmr",  # Maximum Marginal Relevance (diverse results)
    search_kwargs={"k": 4, "fetch_k": 10},
)

# Step 5: Build the QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # Or "map_reduce", "refine", "map_rerank"
    retriever=retriever,
    return_source_documents=True,
)

# Step 6: Query
result = qa_chain.invoke({"query": "How do Python classes handle inheritance?"})
print(result["result"])
for doc in result["source_documents"]:
    print(f"  Source: {doc.metadata.get('source', 'unknown')}")

Conversational RAG (chatbot over documents)

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer",
)
chat_qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    return_source_documents=True,
)

# Multi-turn conversation
chat_qa.invoke({"question": "What is Python's GIL?"})
chat_qa.invoke({"question": "How does it affect multithreading?"})  # Remembers context
chat_qa.invoke({"question": "What are alternatives?"})              # Builds on previous answers

Advanced Patterns

Structured Output

from pydantic import BaseModel, Field
from typing import List

class CodeReview(BaseModel):
    file_path: str = Field(description="Path to the reviewed file")
    issues: List[str] = Field(description="List of identified issues")
    severity: str = Field(description="Overall severity: critical, high, medium, low")
    suggestions: List[str] = Field(description="Improvement suggestions")

# LLM returns structured data
structured_llm = llm.with_structured_output(CodeReview)
review = structured_llm.invoke(
    "Review this code: def add(a,b): return a+b  # no type hints, no docstring"
)
print(review.file_path)  # str
print(review.issues)     # List[str]
print(review.severity)   # str

Parallel Tool Execution

# Modern LLMs can call multiple tools simultaneously;
# LangChain's tool-calling agents handle this automatically.
from langchain.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get weather for a city."""
    return f"Sunny, 72F in {city}"

@tool
def get_population(city: str) -> str:
    """Get population of a city."""
    return f"{city} population: 2.1 million"

# When asked "Compare Paris and London", the agent can call
# get_weather("Paris"), get_weather("London"),
# get_population("Paris"), get_population("London")
# in parallel if the LLM supports it.

Streaming Agent Steps

# Stream individual steps as the agent reasons
async for event in executor.astream_events(
    {"input": "Research and summarize recent AI news"},
    version="v2",
):
    if event["event"] == "on_tool_start":
        print(f"Calling tool: {event['name']}")
    elif event["event"] == "on_tool_end":
        print(f"Tool result: {event['data']['output'][:100]}...")
    elif event["event"] == "on_llm_stream":
        print(event["data"]["chunk"].content, end="")

LangSmith Observability

import os

# Enable tracing (all chains and agents are traced automatically)
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls__your_api_key"
os.environ["LANGCHAIN_PROJECT"] = "production-chatbot"

# Every invoke() is now traced
result = executor.invoke({"input": "What is the quarterly revenue?"})

# View traces at https://smith.langchain.com
# See: latency per step, token usage, tool calls, full prompt/response

Vector Store Integrations

Vector Store | Type | Install | Best For
Chroma | Local/embedded | pip install langchain-chroma | Development, small datasets
FAISS | Local/in-memory | pip install langchain-community[faiss] | Fast similarity search
Pinecone | Cloud (managed) | pip install langchain-pinecone | Production, scalability
Weaviate | Self-hosted/cloud | pip install langchain-weaviate | Hybrid search
Qdrant | Self-hosted/cloud | pip install langchain-qdrant | Filtering + vector search
pgvector | PostgreSQL extension | pip install langchain-postgres | Existing Postgres infra
# Chroma (local development)
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(docs, embeddings, persist_directory="./db")

# FAISS (fast, in-memory)
from langchain_community.vectorstores import FAISS
vectorstore = FAISS.from_documents(docs, embeddings)
vectorstore.save_local("faiss_index")
# Newer versions may require allow_dangerous_deserialization=True here
vectorstore = FAISS.load_local("faiss_index", embeddings)

# Pinecone (production)
from langchain_pinecone import PineconeVectorStore
vectorstore = PineconeVectorStore.from_documents(docs, embeddings, index_name="prod")

Document Loaders and Text Splitters

# Loaders for every source type
from langchain_community.document_loaders import (
    WebBaseLoader,               # Web pages
    PyPDFLoader,                 # PDF files
    CSVLoader,                   # CSV files
    TextLoader,                  # Plain text
    JSONLoader,                  # JSON files
    GitLoader,                   # Git repositories
    NotionDirectoryLoader,       # Notion exports
    UnstructuredMarkdownLoader,  # Markdown files
)

# Text splitters for different content types
from langchain.text_splitter import (
    RecursiveCharacterTextSplitter,  # General text (recommended default)
    PythonCodeTextSplitter,          # Python source code
    MarkdownTextSplitter,            # Markdown documents
    HTMLSectionSplitter,             # HTML documents
)

# Semantic chunking (split by meaning, not character count)
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings
splitter = SemanticChunker(OpenAIEmbeddings())

Configuration Reference

Environment Variables

Variable | Purpose
OPENAI_API_KEY | OpenAI models and embeddings
ANTHROPIC_API_KEY | Anthropic Claude models
GOOGLE_API_KEY | Google Gemini models
LANGCHAIN_TRACING_V2 | Enable LangSmith tracing (set to "true")
LANGCHAIN_API_KEY | LangSmith API key
LANGCHAIN_PROJECT | LangSmith project name
LANGCHAIN_ENDPOINT | LangSmith API endpoint

Performance Benchmarks

Operation | Typical Latency | Notes
Simple LLM call | 1-3s | Varies by provider and model
Agent with 1 tool call | 3-5s | ReAct reasoning overhead
Agent with 3 parallel tools | 4-7s | Parallel execution helps
RAG retrieval + answer | 1-3s | Vector search (~200ms) + LLM call
Embed 1,000 documents | 10-30s | Batch embedding recommended
Streaming, first token | 200-500ms | Much better perceived latency

Best Practices

  1. Start with create_tool_calling_agent. It is the simplest, most reliable agent factory in LangChain. Use it unless you have a specific reason to use a different agent type.

  2. Enable streaming for all user-facing applications. Streaming reduces perceived latency from seconds to milliseconds for the first token. Use astream_events for fine-grained control.

  3. Use LCEL (| operator) over legacy chains. LCEL is the modern way to compose LangChain components. Legacy classes like LLMChain still work but are not actively developed.

  4. Optimize RAG chunk size. Start with 800-1000 characters and 150-200 character overlap. Too small loses context; too large wastes tokens. Test with your actual queries.

  5. Enable LangSmith in every environment. Even in development, traces are invaluable for debugging. In production, they give you latency, cost, and quality metrics.

  6. Handle tool errors gracefully. Tools fail. Wrap tool functions in try/except and return descriptive error strings instead of raising exceptions, so the agent can reason about the error (see the first sketch after this list).

  7. Use MMR retrieval over simple similarity. Maximum Marginal Relevance (search_type="mmr") returns diverse results instead of 4 near-duplicate chunks.

  8. Cache embeddings aggressively. Embedding the same documents repeatedly is expensive. Persist your vector store to disk and reload on startup.

  9. Set max_iterations on agents. Without a limit, a confused agent can loop indefinitely. Set it to 10-15 for most use cases.

  10. Version your prompts. Store prompt templates in files or a prompt registry (LangSmith Hub). Track changes over time to correlate prompt edits with quality changes (see the second sketch after this list).
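A minimal sketch of practice 6, assuming a hypothetical rate API (the endpoint and tool name are illustrative, not part of LangChain):

import requests
from langchain.tools import tool

@tool
def fetch_exchange_rate(pair: str) -> str:
    """Get the current exchange rate for a currency pair like 'USD/EUR'."""
    try:
        # Hypothetical endpoint -- substitute your real rate API
        resp = requests.get(f"https://api.example.com/rates/{pair}", timeout=5)
        resp.raise_for_status()
        return resp.text
    except requests.RequestException as e:
        # Return the failure as text so the agent can reason about it
        # (retry, pick another tool, or tell the user) instead of crashing
        return f"Error fetching rate for {pair}: {e}"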
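And a minimal sketch of practice 10, loading a template from a version-controlled file (the file path is illustrative):

from pathlib import Path
from langchain_core.prompts import ChatPromptTemplate

# prompts/summarize_v2.txt lives in git, so prompt edits show up in diffs
template_text = Path("prompts/summarize_v2.txt").read_text()
prompt = ChatPromptTemplate.from_template(template_text)
chain = prompt | llm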

Troubleshooting

Agent keeps calling the same tool in a loop:

# Set max_iterations to break the loop
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10,           # Force stop after 10 iterations
    handle_parsing_errors=True,  # Gracefully handle malformed tool calls
)

RAG returns irrelevant chunks:

# Increase fetch_k and use MMR for diversity
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20, "lambda_mult": 0.5},
)
# Also try: smaller chunk_size, more chunk_overlap, a better splitter

"Could not parse LLM output" errors:

# Enable error handling in the executor
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    handle_parsing_errors=True,  # Converts parsing errors to agent observations
)

Slow embedding on large document sets:

# Use batch embedding with concurrency
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    chunk_size=1000,  # Batch size for API calls
)

# Also: persist the vectorstore to avoid re-embedding
vectorstore = Chroma.from_documents(docs, embeddings, persist_directory="./db")

Memory growing too large:

# Use windowed memory instead of an unbounded buffer
from langchain.memory import ConversationBufferWindowMemory
memory = ConversationBufferWindowMemory(k=5)  # Keep last 5 exchanges

# Or token-limited memory
from langchain.memory import ConversationTokenBufferMemory
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=2000)

LangChain vs LangGraph

Dimension | LangChain | LangGraph
Abstraction level | High (opinionated defaults) | Low (explicit control)
Lines to start | <10 | ~30-50
Agent pattern | ReAct via AgentExecutor | Custom state machines
Cyclic workflows | Not supported | Native
Human-in-the-loop | Basic (tool approval) | Advanced (checkpoints, branching)
Multi-agent | Limited | Native sub-graphs
State management | Memory classes | Typed state objects
When to use | Quick prototyping, standard RAG | Complex production workflows

Use LangChain when you want to move fast. Use LangGraph when you need fine-grained control over a stateful, potentially cyclic workflow.
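For a concrete feel of that trade-off, here is a minimal LangGraph sketch of a cyclic workflow -- the kind of loop AgentExecutor cannot express. It assumes langgraph is installed; the node logic is illustrative and API details may vary by version:

from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    draft: str
    revisions: int

def write(state: State) -> State:
    # Illustrative node: revise the draft and count the pass
    return {"draft": state["draft"] + " ...revised", "revisions": state["revisions"] + 1}

def should_continue(state: State) -> str:
    # Cycle back to "write" until three revisions have been made
    return "write" if state["revisions"] < 3 else END

graph = StateGraph(State)
graph.add_node("write", write)
graph.set_entry_point("write")
graph.add_conditional_edges("write", should_continue)
app = graph.compile()
print(app.invoke({"draft": "Intro", "revisions": 0}))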
