Master Langfuse Suite
All-in-one skill covering the Langfuse open-source LLM engineering platform. Includes structured workflows, validation checks, and reusable patterns for AI research.
Overview
Langfuse is an open-source LLM engineering platform providing end-to-end observability, prompt management, and evaluation capabilities for AI applications. Acquired by ClickHouse in 2025, it has become the de facto standard for teams building production LLM systems who need visibility into model behavior, cost tracking, latency monitoring, and quality evaluation. Langfuse captures every interaction in your LLM pipeline as structured traces, enabling you to debug complex chains, monitor production performance, and iteratively improve prompt quality. The platform supports self-hosting via Docker or Kubernetes, and also offers a managed cloud service. Its Python and JavaScript SDKs integrate natively with LangChain, LlamaIndex, OpenAI SDK, LiteLLM, and any OpenTelemetry-instrumented library.
When to Use
- Production LLM monitoring: Track latency, token usage, costs, and error rates across all LLM calls in real time.
- Debugging agent workflows: Visualize multi-step agent traces, tool calls, and retrieval-augmented generation pipelines.
- Prompt iteration: Version and A/B test prompts with linked evaluation scores.
- Quality evaluation: Run LLM-as-a-judge evaluations, collect user feedback, and annotate outputs manually.
- Cost optimization: Identify expensive calls, compare model costs, and optimize token usage across providers.
- Dataset curation: Build evaluation datasets from production traces to benchmark prompt and model changes.
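As a back-of-the-envelope illustration of the cost-optimization use case, per-call cost is just token counts times per-token prices. A minimal sketch, where the per-1K-token prices are hypothetical placeholders rather than live rates:

```python
# Sketch: estimating per-call LLM cost from token usage, as cost-tracking
# dashboards do. Prices below are hypothetical, not current provider rates.
PRICES_PER_1K = {
    "gpt-4o": {"input": 0.0025, "output": 0.01},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one generation, given per-1K-token prices."""
    p = PRICES_PER_1K[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# A 2,000-token prompt with a 500-token answer on each model:
expensive = call_cost("gpt-4o", 2000, 500)
cheap = call_cost("gpt-4o-mini", 2000, 500)
print(f"gpt-4o: ${expensive:.4f}, gpt-4o-mini: ${cheap:.4f}")
```

Comparing these numbers per trace is exactly the kind of routing decision (cheap model for simple queries, expensive model for complex ones) that trace-level cost data enables.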
Quick Start
Installation
```shell
# Python SDK (OpenTelemetry-based)
pip install langfuse

# JavaScript/TypeScript SDK
npm install langfuse
```
Environment Configuration
```shell
# Set credentials (cloud or self-hosted)
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted URL
```
Minimal Tracing Example
```python
from langfuse import observe, get_client

langfuse = get_client()

@observe()
def process_query(user_input: str) -> str:
    """Automatically creates a trace with timing and metadata."""
    result = call_llm(user_input)
    return result

@observe(as_type="generation")
def call_llm(prompt: str) -> str:
    """Tracked as an LLM generation with token counts."""
    import openai
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Run it; traces appear in the Langfuse dashboard
result = process_query("Explain quantum computing in simple terms")
langfuse.flush()
```
Core Concepts
Tracing Architecture
Langfuse organizes observability data into a hierarchy of Traces, Spans, Generations, and Events:
```
Trace (top-level request)
├── Span: "retrieve-documents"
│   ├── Span: "embed-query"
│   └── Span: "vector-search"
├── Generation: "llm-call" (model, tokens, cost)
└── Event: "user-feedback-received"
```
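The hierarchy of traces and observations can be modeled as plain nested data. A sketch with illustrative field names (not the SDK's internal schema):

```python
from dataclasses import dataclass, field

# Sketch: the Trace/Span/Generation/Event hierarchy as plain dataclasses.
# Field names are illustrative, not the SDK's wire format.
@dataclass
class Observation:
    name: str
    type: str  # "SPAN", "GENERATION", or "EVENT"
    children: list = field(default_factory=list)

@dataclass
class Trace:
    name: str
    observations: list = field(default_factory=list)

trace = Trace(name="rag-request")
retrieve = Observation("retrieve-documents", "SPAN")
retrieve.children += [
    Observation("embed-query", "SPAN"),
    Observation("vector-search", "SPAN"),
]
trace.observations += [
    retrieve,
    Observation("llm-call", "GENERATION"),
    Observation("user-feedback-received", "EVENT"),
]
```

Nesting is what lets the dashboard render a multi-step pipeline as a collapsible tree rather than a flat event log.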
Decorator-Based Instrumentation
```python
from langfuse import observe

# embed_query, vector_store, and format_context are application-specific
# helpers assumed to exist in your codebase.
@observe()
def rag_pipeline(query: str) -> str:
    documents = retrieve_docs(query)
    context = format_context(documents)
    return generate_answer(query, context)

@observe()
def retrieve_docs(query: str) -> list:
    embedding = embed_query(query)
    return vector_store.search(embedding, top_k=5)

@observe(as_type="generation")
def generate_answer(query: str, context: str) -> str:
    import openai
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```
Context Manager Approach
```python
from langfuse import get_client

langfuse = get_client()

with langfuse.start_as_current_observation(
    as_type="span", name="data-pipeline"
) as span:
    data = fetch_data()
    span.update(metadata={"record_count": len(data)})

    with langfuse.start_as_current_observation(
        as_type="generation", name="summarize", model="gpt-4o-mini"
    ) as gen:
        summary = summarize(data)
        gen.update(
            output=summary,
            usage={"input_tokens": 500, "output_tokens": 150},
        )

langfuse.flush()
```
LangChain Integration
```python
from langfuse.callback import CallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

langfuse_handler = CallbackHandler(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",
)

# Pass to any LangChain runnable
chain = ChatPromptTemplate.from_template("Explain {topic}") | ChatOpenAI()
result = chain.invoke(
    {"topic": "neural networks"},
    config={"callbacks": [langfuse_handler]},
)
```
Prompt Management
```python
from langfuse import get_client

langfuse = get_client()

# Fetch versioned prompt from Langfuse
prompt = langfuse.get_prompt("summarization-prompt", version=3)

# Use in your application
compiled = prompt.compile(document=my_document, max_length=200)
```
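Under the hood, compiling a prompt substitutes variables into the stored template; Langfuse prompt templates use double-curly-brace placeholders. A standalone sketch of that substitution, where the template text and variable names are hypothetical:

```python
import re

# Sketch of what prompt compilation does: substitute {{variable}} placeholders
# in a stored template. Unknown variables are left untouched here; the real
# SDK's behavior may differ in edge cases.
def compile_prompt(template: str, **variables) -> str:
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

template = "Summarize the following in at most {{max_length}} words:\n{{document}}"
compiled = compile_prompt(template, document="Langfuse trace data ...", max_length=200)
print(compiled)
```

Because the template lives in Langfuse rather than in code, changing the wording or the variable defaults does not require a redeploy.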
Evaluation and Scoring
```python
from langfuse import get_client

langfuse = get_client()

# Attach scores to traces
langfuse.score(
    trace_id="trace-abc-123",
    name="relevance",
    value=0.92,
    comment="Highly relevant response",
)

# User feedback scoring
langfuse.score(
    trace_id="trace-abc-123",
    name="user-thumbs",
    value=1,  # thumbs up
    data_type="BOOLEAN",
)
```
Configuration Reference
| Parameter | Description | Default |
|---|---|---|
| LANGFUSE_PUBLIC_KEY | Project public key from Langfuse dashboard | Required |
| LANGFUSE_SECRET_KEY | Project secret key from Langfuse dashboard | Required |
| LANGFUSE_HOST | Langfuse server URL | https://cloud.langfuse.com |
| LANGFUSE_RELEASE | Application release/version tag | None |
| LANGFUSE_DEBUG | Enable debug logging | false |
| LANGFUSE_SAMPLE_RATE | Trace sampling rate (0.0 to 1.0) | 1.0 |
| LANGFUSE_FLUSH_AT | Batch size before flush | 15 |
| LANGFUSE_FLUSH_INTERVAL | Flush interval in seconds | 0.5 |
| LANGFUSE_ENABLED | Enable/disable tracing globally | true |
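The sampling behavior behind LANGFUSE_SAMPLE_RATE can be pictured as a deterministic hash of the trace ID, so every span of a given trace shares one keep/drop decision. This is a sketch of the idea, not the SDK's exact algorithm:

```python
import hashlib

# Sketch: deterministic trace sampling. Hashing the trace ID keeps the
# keep/drop decision stable per trace, so partial traces are never emitted.
# Illustrative only; the SDK's internal sampling may differ.
def should_sample(trace_id: str, sample_rate: float) -> bool:
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate

kept = sum(should_sample(f"trace-{i}", 0.2) for i in range(10_000))
print(f"kept {kept} of 10000 traces")  # roughly 2,000 at a 0.2 rate
```

A rate of 1.0 keeps everything (the development default) and 0.0 disables tracing entirely, matching the table's 0.0-1.0 range.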
Self-Hosting Configuration
| Parameter | Description | Default |
|---|---|---|
| DATABASE_URL | PostgreSQL connection string | Required |
| CLICKHOUSE_URL | ClickHouse connection string (v3+) | Required |
| NEXTAUTH_SECRET | Auth encryption secret | Required |
| SALT | API key hashing salt | Required |
| NEXTAUTH_URL | Application base URL | http://localhost:3000 |
| LANGFUSE_S3_BUCKET | S3 bucket for media storage | Optional |
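A minimal docker-compose sketch wiring these variables together. Image tags, ports, and secret values are placeholders, and a production v3 deployment also needs Redis and S3-compatible blob storage; consult the official self-hosting docs before relying on this layout:

```yaml
# Sketch only: minimal wiring of the required variables from the table above.
services:
  langfuse:
    image: langfuse/langfuse:latest
    ports: ["3000:3000"]
    environment:
      DATABASE_URL: postgresql://postgres:postgres@db:5432/langfuse
      CLICKHOUSE_URL: http://clickhouse:8123
      NEXTAUTH_SECRET: change-me        # placeholder secret
      SALT: change-me-too               # placeholder salt
      NEXTAUTH_URL: http://localhost:3000
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: postgres
  clickhouse:
    image: clickhouse/clickhouse-server:latest
```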
Best Practices
- Use the @observe() decorator liberally: Wrap every meaningful function in your LLM pipeline. The overhead is minimal and the debugging value is enormous. Nested decorators automatically create parent-child trace relationships.
- Attach metadata to traces: Include user IDs, session IDs, and request metadata so you can filter and segment traces in the dashboard. Use langfuse.update_current_observation(metadata={...}) within decorated functions.
- Version your prompts in Langfuse: Store prompts in the Langfuse prompt management system rather than hardcoding them. This enables A/B testing, rollback, and linking evaluation scores to specific prompt versions.
- Implement structured evaluation: Combine automated LLM-as-a-judge scoring with human annotation workflows. Track scores over time to detect quality regressions before users notice.
- Set up sampling in production: For high-throughput applications, use LANGFUSE_SAMPLE_RATE to control trace volume. Start at 1.0 during development and reduce to 0.1-0.3 in production to manage costs.
- Build evaluation datasets from production: Use the Langfuse dataset feature to curate representative examples from real traffic. These become your regression test suite for prompt changes.
- Monitor cost per trace: Use the cost tracking dashboard to identify expensive model calls. Consider routing simple queries to cheaper models while reserving powerful models for complex tasks.
- Flush before process exit: Always call langfuse.flush() before your application shuts down to ensure all buffered traces are sent. In serverless environments, flush at the end of each request handler.
- Use sessions for multi-turn conversations: Group related traces into sessions using session_id to track entire conversation flows rather than isolated requests.
- Integrate with CI/CD: Run evaluation datasets as part of your deployment pipeline. Block deployments when quality scores drop below configured thresholds.
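The CI/CD gate from the practices above can be as simple as comparing average evaluation scores against thresholds. A sketch where the scores are hard-coded stand-ins for a Langfuse dataset run, and the metric names and thresholds are hypothetical:

```python
# Sketch: a CI quality gate over evaluation scores. In a real pipeline the
# averages would come from a Langfuse dataset run; here they are hard-coded.
THRESHOLDS = {"relevance": 0.8, "faithfulness": 0.9}

def gate(avg_scores: dict) -> list:
    """Return the names of metrics that fall below their threshold."""
    return [
        name for name, minimum in THRESHOLDS.items()
        if avg_scores.get(name, 0.0) < minimum
    ]

failures = gate({"relevance": 0.85, "faithfulness": 0.87})
if failures:
    print(f"Blocking deployment: {failures} below threshold")
```

Wiring this into CI means a prompt change that quietly degrades faithfulness fails the build instead of reaching users.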
Troubleshooting
Traces not appearing in dashboard
Ensure LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are correctly set. Call langfuse.flush() explicitly and check LANGFUSE_DEBUG=true for error messages. Verify network connectivity to the Langfuse host.
High latency impact on application
Langfuse SDKs batch and send traces asynchronously. If you observe latency, check that you are not calling flush() synchronously in the request path. Adjust LANGFUSE_FLUSH_AT and LANGFUSE_FLUSH_INTERVAL for your throughput.
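The batching behavior controlled by LANGFUSE_FLUSH_AT and LANGFUSE_FLUSH_INTERVAL can be pictured as a buffer that drains on size or age. A synchronous sketch of the idea (the real SDK flushes from a background thread; all names here are illustrative):

```python
import time

# Sketch: size- and age-triggered batching, modeled synchronously.
class TraceBuffer:
    def __init__(self, flush_at: int = 15, flush_interval: float = 0.5):
        self.flush_at = flush_at
        self.flush_interval = flush_interval
        self.buffer = []
        self.sent_batches = []  # stand-in for batches POSTed to the server
        self._last_flush = time.monotonic()

    def add(self, event: dict) -> None:
        self.buffer.append(event)
        overdue = time.monotonic() - self._last_flush >= self.flush_interval
        if len(self.buffer) >= self.flush_at or overdue:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.sent_batches.append(self.buffer)
            self.buffer = []
        self._last_flush = time.monotonic()

buf = TraceBuffer(flush_at=3, flush_interval=60)
for i in range(7):
    buf.add({"trace": i})
buf.flush()  # drain the remainder before process exit
```

This is also why an explicit flush() in the hot request path hurts latency: it forces a synchronous send instead of letting the batch fill.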
Missing token counts or costs
Token usage must be reported by the LLM integration. When using the OpenAI SDK directly, wrap calls with @observe(as_type="generation") and the SDK auto-captures usage. For custom models, manually set usage={"input_tokens": N, "output_tokens": M}.
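For custom models, any reasonable token estimate works as long as it is passed explicitly in the usage payload. A crude sketch using whitespace splitting as a stand-in for a real tokenizer such as tiktoken; counts are approximate by construction:

```python
# Sketch: building a usage payload for a model whose API reports no token
# counts. Whitespace splitting is a deliberately crude tokenizer stand-in.
def estimate_usage(prompt: str, completion: str) -> dict:
    return {
        "input_tokens": len(prompt.split()),
        "output_tokens": len(completion.split()),
    }

usage = estimate_usage("Explain quantum computing", "It uses qubits instead of bits")
# Then pass it along when recording the generation, e.g.
# gen.update(output=completion, usage=usage)
print(usage)
```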
Self-hosted deployment fails to start
Verify DATABASE_URL points to a running PostgreSQL instance and CLICKHOUSE_URL is configured for Langfuse v3+. Run database migrations with langfuse-server migrate before starting the application.
Decorator nesting not working
Ensure all functions in the call chain use @observe(). The SDK uses Python contextvars for trace propagation, so async functions require the @observe() decorator on the async function itself.