Master Langfuse Suite
All-in-one skill covering the Langfuse open-source LLM engineering platform. Includes structured workflows, validation checks, and reusable patterns for AI research.
Overview
Langfuse is an open-source LLM engineering platform providing end-to-end observability, prompt management, and evaluation capabilities for AI applications. Acquired by ClickHouse in 2025, it has become the de facto standard for teams building production LLM systems who need visibility into model behavior, cost tracking, latency monitoring, and quality evaluation. Langfuse captures every interaction in your LLM pipeline as structured traces, enabling you to debug complex chains, monitor production performance, and iteratively improve prompt quality. The platform supports self-hosting via Docker or Kubernetes, and also offers a managed cloud service. Its Python and JavaScript SDKs integrate natively with LangChain, LlamaIndex, OpenAI SDK, LiteLLM, and any OpenTelemetry-instrumented library.
When to Use
- Production LLM monitoring: Track latency, token usage, costs, and error rates across all LLM calls in real time.
- Debugging agent workflows: Visualize multi-step agent traces, tool calls, and retrieval-augmented generation pipelines.
- Prompt iteration: Version and A/B test prompts with linked evaluation scores.
- Quality evaluation: Run LLM-as-a-judge evaluations, collect user feedback, and annotate outputs manually.
- Cost optimization: Identify expensive calls, compare model costs, and optimize token usage across providers.
- Dataset curation: Build evaluation datasets from production traces to benchmark prompt and model changes.
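As a back-of-the-envelope illustration of the cost-optimization use case, per-call cost is just token counts times per-token prices. A minimal sketch, where the per-1K-token prices are hypothetical placeholders rather than live rates:

```python
# Sketch: estimating per-call LLM cost from token usage, as cost-tracking
# dashboards do. Prices below are hypothetical, not current provider rates.
PRICES_PER_1K = {
    "gpt-4o": {"input": 0.0025, "output": 0.01},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one generation, given per-1K-token prices."""
    p = PRICES_PER_1K[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# A 2,000-token prompt with a 500-token answer on each model:
expensive = call_cost("gpt-4o", 2000, 500)
cheap = call_cost("gpt-4o-mini", 2000, 500)
print(f"gpt-4o: ${expensive:.4f}, gpt-4o-mini: ${cheap:.4f}")
```

Comparing these numbers per trace is exactly the kind of routing decision (cheap model for simple queries, expensive model for complex ones) that trace-level cost data enables.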
Quick Start
Installation
```shell
# Python SDK (OpenTelemetry-based)
pip install langfuse

# JavaScript/TypeScript SDK
npm install langfuse
```
Environment Configuration
```shell
# Set credentials (cloud or self-hosted)
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted URL
```
Minimal Tracing Example
```python
from langfuse import observe, get_client

langfuse = get_client()

@observe()
def process_query(user_input: str) -> str:
    """Automatically creates a trace with timing and metadata."""
    result = call_llm(user_input)
    return result

@observe(as_type="generation")
def call_llm(prompt: str) -> str:
    """Tracked as an LLM generation with token counts."""
    import openai
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Run it; traces appear in the Langfuse dashboard
result = process_query("Explain quantum computing in simple terms")
langfuse.flush()
```
Core Concepts
Tracing Architecture
Langfuse organizes observability data into a hierarchy of Traces, Spans, Generations, and Events:
```
Trace (top-level request)
├── Span: "retrieve-documents"
│   ├── Span: "embed-query"
│   └── Span: "vector-search"
├── Generation: "llm-call" (model, tokens, cost)
└── Event: "user-feedback-received"
```
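The hierarchy of traces and observations can be modeled as plain nested data. A sketch with illustrative field names (not the SDK's internal schema):

```python
from dataclasses import dataclass, field

# Sketch: the Trace/Span/Generation/Event hierarchy as plain dataclasses.
# Field names are illustrative, not the SDK's wire format.
@dataclass
class Observation:
    name: str
    type: str  # "SPAN", "GENERATION", or "EVENT"
    children: list = field(default_factory=list)

@dataclass
class Trace:
    name: str
    observations: list = field(default_factory=list)

trace = Trace(name="rag-request")
retrieve = Observation("retrieve-documents", "SPAN")
retrieve.children += [
    Observation("embed-query", "SPAN"),
    Observation("vector-search", "SPAN"),
]
trace.observations += [
    retrieve,
    Observation("llm-call", "GENERATION"),
    Observation("user-feedback-received", "EVENT"),
]
```

Nesting is what lets the dashboard render a multi-step pipeline as a collapsible tree rather than a flat event log.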
Decorator-Based Instrumentation
```python
from langfuse import observe

# embed_query, vector_store, and format_context are application-specific
# helpers assumed to exist in your codebase.
@observe()
def rag_pipeline(query: str) -> str:
    documents = retrieve_docs(query)
    context = format_context(documents)
    return generate_answer(query, context)

@observe()
def retrieve_docs(query: str) -> list:
    embedding = embed_query(query)
    return vector_store.search(embedding, top_k=5)

@observe(as_type="generation")
def generate_answer(query: str, context: str) -> str:
    import openai
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```
Context Manager Approach
```python
from langfuse import get_client

langfuse = get_client()

with langfuse.start_as_current_observation(
    as_type="span", name="data-pipeline"
) as span:
    data = fetch_data()
    span.update(metadata={"record_count": len(data)})

    with langfuse.start_as_current_observation(
        as_type="generation", name="summarize", model="gpt-4o-mini"
    ) as gen:
        summary = summarize(data)
        gen.update(
            output=summary,
            usage={"input_tokens": 500, "output_tokens": 150},
        )

langfuse.flush()
```
LangChain Integration
```python
from langfuse.callback import CallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

langfuse_handler = CallbackHandler(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",
)

# Pass to any LangChain runnable
chain = ChatPromptTemplate.from_template("Explain {topic}") | ChatOpenAI()
result = chain.invoke(
    {"topic": "neural networks"},
    config={"callbacks": [langfuse_handler]},
)
```
Prompt Management
```python
from langfuse import get_client

langfuse = get_client()

# Fetch versioned prompt from Langfuse
prompt = langfuse.get_prompt("summarization-prompt", version=3)

# Use in your application
compiled = prompt.compile(document=my_document, max_length=200)
```
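Under the hood, compiling a prompt substitutes variables into the stored template; Langfuse prompt templates use double-curly-brace placeholders. A standalone sketch of that substitution, where the template text and variable names are hypothetical:

```python
import re

# Sketch of what prompt compilation does: substitute {{variable}} placeholders
# in a stored template. Unknown variables are left untouched here; the real
# SDK's behavior may differ in edge cases.
def compile_prompt(template: str, **variables) -> str:
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

template = "Summarize the following in at most {{max_length}} words:\n{{document}}"
compiled = compile_prompt(template, document="Langfuse trace data ...", max_length=200)
print(compiled)
```

Because the template lives in Langfuse rather than in code, changing the wording or the variable defaults does not require a redeploy.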
Evaluation and Scoring
```python
from langfuse import get_client

langfuse = get_client()

# Attach scores to traces
langfuse.score(
    trace_id="trace-abc-123",
    name="relevance",
    value=0.92,
    comment="Highly relevant response",
)

# User feedback scoring
langfuse.score(
    trace_id="trace-abc-123",
    name="user-thumbs",
    value=1,  # thumbs up
    data_type="BOOLEAN",
)
```
Configuration Reference
| Parameter | Description | Default |
|---|---|---|
| LANGFUSE_PUBLIC_KEY | Project public key from Langfuse dashboard | Required |
| LANGFUSE_SECRET_KEY | Project secret key from Langfuse dashboard | Required |
| LANGFUSE_HOST | Langfuse server URL | https://cloud.langfuse.com |
| LANGFUSE_RELEASE | Application release/version tag | None |
| LANGFUSE_DEBUG | Enable debug logging | false |
| LANGFUSE_SAMPLE_RATE | Trace sampling rate (0.0 to 1.0) | 1.0 |
| LANGFUSE_FLUSH_AT | Batch size before flush | 15 |
| LANGFUSE_FLUSH_INTERVAL | Flush interval in seconds | 0.5 |
| LANGFUSE_ENABLED | Enable/disable tracing globally | true |
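The sampling behavior behind LANGFUSE_SAMPLE_RATE can be pictured as a deterministic hash of the trace ID, so every span of a given trace shares one keep/drop decision. This is a sketch of the idea, not the SDK's exact algorithm:

```python
import hashlib

# Sketch: deterministic trace sampling. Hashing the trace ID keeps the
# keep/drop decision stable per trace, so partial traces are never emitted.
# Illustrative only; the SDK's internal sampling may differ.
def should_sample(trace_id: str, sample_rate: float) -> bool:
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate

kept = sum(should_sample(f"trace-{i}", 0.2) for i in range(10_000))
print(f"kept {kept} of 10000 traces")  # roughly 2,000 at a 0.2 rate
```

A rate of 1.0 keeps everything (the development default) and 0.0 disables tracing entirely, matching the table's 0.0-1.0 range.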
Self-Hosting Configuration
| Parameter | Description | Default |
|---|---|---|
| DATABASE_URL | PostgreSQL connection string | Required |
| CLICKHOUSE_URL | ClickHouse connection string (v3+) | Required |
| NEXTAUTH_SECRET | Auth encryption secret | Required |
| SALT | API key hashing salt | Required |
| NEXTAUTH_URL | Application base URL | http://localhost:3000 |
| LANGFUSE_S3_BUCKET | S3 bucket for media storage | Optional |
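A minimal docker-compose sketch wiring these variables together. Image tags, ports, and secret values are placeholders, and a production v3 deployment also needs Redis and S3-compatible blob storage; consult the official self-hosting docs before relying on this layout:

```yaml
# Sketch only: minimal wiring of the required variables from the table above.
services:
  langfuse:
    image: langfuse/langfuse:latest
    ports: ["3000:3000"]
    environment:
      DATABASE_URL: postgresql://postgres:postgres@db:5432/langfuse
      CLICKHOUSE_URL: http://clickhouse:8123
      NEXTAUTH_SECRET: change-me        # placeholder secret
      SALT: change-me-too               # placeholder salt
      NEXTAUTH_URL: http://localhost:3000
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: postgres
  clickhouse:
    image: clickhouse/clickhouse-server:latest
```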
Best Practices
- Use the @observe() decorator liberally: Wrap every meaningful function in your LLM pipeline. The overhead is minimal and the debugging value is enormous. Nested decorators automatically create parent-child trace relationships.
- Attach metadata to traces: Include user IDs, session IDs, and request metadata so you can filter and segment traces in the dashboard. Use langfuse.update_current_observation(metadata={...}) within decorated functions.
- Version your prompts in Langfuse: Store prompts in the Langfuse prompt management system rather than hardcoding them. This enables A/B testing, rollback, and linking evaluation scores to specific prompt versions.
- Implement structured evaluation: Combine automated LLM-as-a-judge scoring with human annotation workflows. Track scores over time to detect quality regressions before users notice.
- Set up sampling in production: For high-throughput applications, use LANGFUSE_SAMPLE_RATE to control trace volume. Start at 1.0 during development and reduce to 0.1-0.3 in production to manage costs.
- Build evaluation datasets from production: Use the Langfuse dataset feature to curate representative examples from real traffic. These become your regression test suite for prompt changes.
- Monitor cost per trace: Use the cost tracking dashboard to identify expensive model calls. Consider routing simple queries to cheaper models while reserving powerful models for complex tasks.
- Flush before process exit: Always call langfuse.flush() before your application shuts down to ensure all buffered traces are sent. In serverless environments, flush at the end of each request handler.
- Use sessions for multi-turn conversations: Group related traces into sessions using session_id to track entire conversation flows rather than isolated requests.
- Integrate with CI/CD: Run evaluation datasets as part of your deployment pipeline. Block deployments when quality scores drop below configured thresholds.
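The CI/CD gate from the practices above can be as simple as comparing average evaluation scores against thresholds. A sketch where the scores are hard-coded stand-ins for a Langfuse dataset run, and the metric names and thresholds are hypothetical:

```python
# Sketch: a CI quality gate over evaluation scores. In a real pipeline the
# averages would come from a Langfuse dataset run; here they are hard-coded.
THRESHOLDS = {"relevance": 0.8, "faithfulness": 0.9}

def gate(avg_scores: dict) -> list:
    """Return the names of metrics that fall below their threshold."""
    return [
        name for name, minimum in THRESHOLDS.items()
        if avg_scores.get(name, 0.0) < minimum
    ]

failures = gate({"relevance": 0.85, "faithfulness": 0.87})
if failures:
    print(f"Blocking deployment: {failures} below threshold")
```

Wiring this into CI means a prompt change that quietly degrades faithfulness fails the build instead of reaching users.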
Troubleshooting
Traces not appearing in dashboard
Ensure LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are correctly set. Call langfuse.flush() explicitly and check LANGFUSE_DEBUG=true for error messages. Verify network connectivity to the Langfuse host.
High latency impact on application
Langfuse SDKs batch and send traces asynchronously. If you observe latency, check that you are not calling flush() synchronously in the request path. Adjust LANGFUSE_FLUSH_AT and LANGFUSE_FLUSH_INTERVAL for your throughput.
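The batching behavior controlled by LANGFUSE_FLUSH_AT and LANGFUSE_FLUSH_INTERVAL can be pictured as a buffer that drains on size or age. A synchronous sketch of the idea (the real SDK flushes from a background thread; all names here are illustrative):

```python
import time

# Sketch: size- and age-triggered batching, modeled synchronously.
class TraceBuffer:
    def __init__(self, flush_at: int = 15, flush_interval: float = 0.5):
        self.flush_at = flush_at
        self.flush_interval = flush_interval
        self.buffer = []
        self.sent_batches = []  # stand-in for batches POSTed to the server
        self._last_flush = time.monotonic()

    def add(self, event: dict) -> None:
        self.buffer.append(event)
        overdue = time.monotonic() - self._last_flush >= self.flush_interval
        if len(self.buffer) >= self.flush_at or overdue:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.sent_batches.append(self.buffer)
            self.buffer = []
        self._last_flush = time.monotonic()

buf = TraceBuffer(flush_at=3, flush_interval=60)
for i in range(7):
    buf.add({"trace": i})
buf.flush()  # drain the remainder before process exit
```

This is also why an explicit flush() in the hot request path hurts latency: it forces a synchronous send instead of letting the batch fill.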
Missing token counts or costs
Token usage must be reported by the LLM integration. When using the OpenAI SDK directly, wrap calls with @observe(as_type="generation") and the SDK auto-captures usage. For custom models, manually set usage={"input_tokens": N, "output_tokens": M}.
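For custom models, any reasonable token estimate works as long as it is passed explicitly in the usage payload. A crude sketch using whitespace splitting as a stand-in for a real tokenizer such as tiktoken; counts are approximate by construction:

```python
# Sketch: building a usage payload for a model whose API reports no token
# counts. Whitespace splitting is a deliberately crude tokenizer stand-in.
def estimate_usage(prompt: str, completion: str) -> dict:
    return {
        "input_tokens": len(prompt.split()),
        "output_tokens": len(completion.split()),
    }

usage = estimate_usage("Explain quantum computing", "It uses qubits instead of bits")
# Then pass it along when recording the generation, e.g.
# gen.update(output=completion, usage=usage)
print(usage)
```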
Self-hosted deployment fails to start
Verify DATABASE_URL points to a running PostgreSQL instance and CLICKHOUSE_URL is configured for Langfuse v3+. Run database migrations with langfuse-server migrate before starting the application.
Decorator nesting not working
Ensure all functions in the call chain use @observe(). The SDK uses Python contextvars for trace propagation, so async functions require the @observe() decorator on the async function itself.