
Master Langfuse Suite

All-in-one skill for Langfuse, the open-source LLM engineering platform. Includes structured workflows, validation checks, and reusable patterns for AI research.



Overview

Langfuse is an open-source LLM engineering platform providing end-to-end observability, prompt management, and evaluation capabilities for AI applications. Acquired by ClickHouse in 2025, it has become the de facto standard for teams building production LLM systems who need visibility into model behavior, cost tracking, latency monitoring, and quality evaluation. Langfuse captures every interaction in your LLM pipeline as structured traces, enabling you to debug complex chains, monitor production performance, and iteratively improve prompt quality. The platform supports self-hosting via Docker or Kubernetes, and also offers a managed cloud service. Its Python and JavaScript SDKs integrate natively with LangChain, LlamaIndex, OpenAI SDK, LiteLLM, and any OpenTelemetry-instrumented library.

When to Use

  • Production LLM monitoring: Track latency, token usage, costs, and error rates across all LLM calls in real time.
  • Debugging agent workflows: Visualize multi-step agent traces, tool calls, and retrieval-augmented generation pipelines.
  • Prompt iteration: Version and A/B test prompts with linked evaluation scores.
  • Quality evaluation: Run LLM-as-a-judge evaluations, collect user feedback, and annotate outputs manually.
  • Cost optimization: Identify expensive calls, compare model costs, and optimize token usage across providers.
  • Dataset curation: Build evaluation datasets from production traces to benchmark prompt and model changes.

Quick Start

Installation

```bash
# Python SDK (OpenTelemetry-based)
pip install langfuse

# JavaScript/TypeScript SDK
npm install langfuse
```

Environment Configuration

```bash
# Set credentials (cloud or self-hosted)
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted URL
```

Minimal Tracing Example

```python
from langfuse import observe, get_client

langfuse = get_client()

@observe()
def process_query(user_input: str) -> str:
    """Automatically creates a trace with timing and metadata."""
    result = call_llm(user_input)
    return result

@observe(as_type="generation")
def call_llm(prompt: str) -> str:
    """Tracked as an LLM generation with token counts."""
    import openai
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Run, and traces appear in the Langfuse dashboard
result = process_query("Explain quantum computing in simple terms")
langfuse.flush()
```

Core Concepts

Tracing Architecture

Langfuse organizes observability data into a hierarchy of Traces, Spans, Generations, and Events:

```
Trace (top-level request)
ā”œā”€ā”€ Span: "retrieve-documents"
│   ā”œā”€ā”€ Span: "embed-query"
│   └── Span: "vector-search"
ā”œā”€ā”€ Generation: "llm-call" (model, tokens, cost)
└── Event: "user-feedback-received"
```
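This parent-child structure can be modeled as a small tree. A minimal sketch with hypothetical types (`Observation` and `total_tokens` are illustrative names, not the SDK's internal classes), here used to aggregate token usage across every generation in a trace:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the trace hierarchy -- illustrative only,
# not the Langfuse SDK's internal data model.
@dataclass
class Observation:
    name: str
    kind: str = "span"          # "span", "generation", or "event"
    tokens: int = 0             # only meaningful for generations
    children: list = field(default_factory=list)

def total_tokens(node: Observation) -> int:
    """Sum token usage over every generation in the subtree."""
    own = node.tokens if node.kind == "generation" else 0
    return own + sum(total_tokens(c) for c in node.children)

trace = Observation("request", children=[
    Observation("retrieve-documents", children=[
        Observation("embed-query"),
        Observation("vector-search"),
    ]),
    Observation("llm-call", kind="generation", tokens=650),
    Observation("user-feedback-received", kind="event"),
])

print(total_tokens(trace))  # 650
```

The recursive walk mirrors how the dashboard rolls token counts and costs up from nested generations to the trace level.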

Decorator-Based Instrumentation

```python
from langfuse import observe

@observe()
def rag_pipeline(query: str) -> str:
    documents = retrieve_docs(query)
    context = format_context(documents)
    return generate_answer(query, context)

@observe()
def retrieve_docs(query: str) -> list:
    embedding = embed_query(query)
    return vector_store.search(embedding, top_k=5)

@observe(as_type="generation")
def generate_answer(query: str, context: str) -> str:
    import openai
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": query}
        ]
    )
    return response.choices[0].message.content
```

Context Manager Approach

```python
from langfuse import get_client

langfuse = get_client()

with langfuse.start_as_current_observation(
    as_type="span", name="data-pipeline"
) as span:
    data = fetch_data()
    span.update(metadata={"record_count": len(data)})

    with langfuse.start_as_current_observation(
        as_type="generation", name="summarize", model="gpt-4o-mini"
    ) as gen:
        summary = summarize(data)
        gen.update(
            output=summary,
            usage={"input_tokens": 500, "output_tokens": 150}
        )

langfuse.flush()
```

LangChain Integration

```python
from langfuse.callback import CallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

langfuse_handler = CallbackHandler(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com"
)

# Pass to any LangChain runnable
chain = ChatPromptTemplate.from_template("Explain {topic}") | ChatOpenAI()
result = chain.invoke(
    {"topic": "neural networks"},
    config={"callbacks": [langfuse_handler]}
)
```

Prompt Management

```python
from langfuse import get_client

langfuse = get_client()

# Fetch versioned prompt from Langfuse
prompt = langfuse.get_prompt("summarization-prompt", version=3)

# Use in your application
compiled = prompt.compile(document=my_document, max_length=200)
```
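Langfuse prompt templates use `{{variable}}` placeholders that `compile()` substitutes at runtime. A minimal sketch of that substitution step, using a hypothetical `compile_prompt` helper rather than the SDK itself:

```python
import re

# Minimal sketch of what prompt.compile() does: substitute {{variable}}
# placeholders in a stored template. Hypothetical helper, not the SDK.
def compile_prompt(template: str, **variables) -> str:
    def repl(match):
        name = match.group(1).strip()
        # Leave unknown placeholders untouched rather than erroring
        return str(variables.get(name, match.group(0)))
    return re.sub(r"\{\{([^}]+)\}\}", repl, template)

template = (
    "Summarize the following in at most {{max_length}} words:\n{{document}}"
)
print(compile_prompt(template, document="LLM tracing basics", max_length=200))
```

Keeping substitution in the platform (rather than string-formatting in application code) is what lets Langfuse link each compiled output back to a specific prompt version.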

Evaluation and Scoring

```python
from langfuse import get_client

langfuse = get_client()

# Attach scores to traces
langfuse.score(
    trace_id="trace-abc-123",
    name="relevance",
    value=0.92,
    comment="Highly relevant response"
)

# User feedback scoring
langfuse.score(
    trace_id="trace-abc-123",
    name="user-thumbs",
    value=1,  # thumbs up
    data_type="BOOLEAN"
)
```

Configuration Reference

| Parameter | Description | Default |
| --- | --- | --- |
| `LANGFUSE_PUBLIC_KEY` | Project public key from Langfuse dashboard | Required |
| `LANGFUSE_SECRET_KEY` | Project secret key from Langfuse dashboard | Required |
| `LANGFUSE_HOST` | Langfuse server URL | `https://cloud.langfuse.com` |
| `LANGFUSE_RELEASE` | Application release/version tag | None |
| `LANGFUSE_DEBUG` | Enable debug logging | `false` |
| `LANGFUSE_SAMPLE_RATE` | Trace sampling rate (0.0 to 1.0) | `1.0` |
| `LANGFUSE_FLUSH_AT` | Batch size before flush | `15` |
| `LANGFUSE_FLUSH_INTERVAL` | Flush interval in seconds | `0.5` |
| `LANGFUSE_ENABLED` | Enable/disable tracing globally | `true` |
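As a rough illustration of what `LANGFUSE_SAMPLE_RATE` controls, a client-side sampling decision can be sketched like this (an assumption about the mechanism for illustration, not the SDK's actual implementation):

```python
import os
import random

# Sketch of a client-side sampling decision driven by LANGFUSE_SAMPLE_RATE.
# Illustrative only -- the SDK's real sampling logic may differ.
def should_sample() -> bool:
    rate = float(os.environ.get("LANGFUSE_SAMPLE_RATE", "1.0"))
    return random.random() < rate

os.environ["LANGFUSE_SAMPLE_RATE"] = "0.0"
print(should_sample())  # False: nothing is sampled at rate 0.0
```

At the default of `1.0` every trace is kept; lowering the rate drops a proportional share of traces before they are ever sent.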

Self-Hosting Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| `DATABASE_URL` | PostgreSQL connection string | Required |
| `CLICKHOUSE_URL` | ClickHouse connection string (v3+) | Required |
| `NEXTAUTH_SECRET` | Auth encryption secret | Required |
| `SALT` | API key hashing salt | Required |
| `NEXTAUTH_URL` | Application base URL | `http://localhost:3000` |
| `LANGFUSE_S3_BUCKET` | S3 bucket for media storage | Optional |
Best Practices

  1. Use the @observe() decorator liberally: Wrap every meaningful function in your LLM pipeline. The overhead is minimal and the debugging value is enormous. Nested decorators automatically create parent-child trace relationships.

  2. Attach metadata to traces: Include user IDs, session IDs, and request metadata so you can filter and segment traces in the dashboard. Use langfuse.update_current_observation(metadata={...}) within decorated functions.

  3. Version your prompts in Langfuse: Store prompts in the Langfuse prompt management system rather than hardcoding them. This enables A/B testing, rollback, and linking evaluation scores to specific prompt versions.

  4. Implement structured evaluation: Combine automated LLM-as-a-judge scoring with human annotation workflows. Track scores over time to detect quality regressions before users notice.

  5. Set up sampling in production: For high-throughput applications, use LANGFUSE_SAMPLE_RATE to control trace volume. Start at 1.0 during development and reduce to 0.1-0.3 in production to manage costs.

  6. Build evaluation datasets from production: Use the Langfuse dataset feature to curate representative examples from real traffic. These become your regression test suite for prompt changes.

  7. Monitor cost per trace: Use the cost tracking dashboard to identify expensive model calls. Consider routing simple queries to cheaper models while reserving powerful models for complex tasks.

  8. Flush before process exit: Always call langfuse.flush() before your application shuts down to ensure all buffered traces are sent. In serverless environments, flush at the end of each request handler.

  9. Use sessions for multi-turn conversations: Group related traces into sessions using session_id to track entire conversation flows rather than isolated requests.

  10. Integrate with CI/CD: Run evaluation datasets as part of your deployment pipeline. Block deployments when quality scores drop below configured thresholds.
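Practice 8 (flushing before exit) can be wired up once at startup. A sketch assuming only that the client exposes a `flush()` method; the helper name is hypothetical:

```python
import atexit

# Sketch: guarantee buffered traces are flushed on shutdown.
# Works with any client object exposing flush(); errors are swallowed
# so a flaky network cannot break process exit.
def register_flush_on_exit(client):
    def _flush():
        try:
            client.flush()
        except Exception:
            pass  # never let telemetry failures block shutdown
    atexit.register(_flush)
    return _flush  # returned so it can also be invoked manually

# Stand-in client for demonstration
class FakeClient:
    def __init__(self):
        self.flushed = False
    def flush(self):
        self.flushed = True

client = FakeClient()
flush_now = register_flush_on_exit(client)
flush_now()
print(client.flushed)  # True
```

In serverless environments, call the returned function at the end of each request handler instead of relying on `atexit`, since the runtime may freeze the process rather than exit it.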

Troubleshooting

Traces not appearing in dashboard
Ensure LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are correctly set. Call langfuse.flush() explicitly, and set LANGFUSE_DEBUG=true to surface error messages in the logs. Verify network connectivity to the Langfuse host.

High latency impact on application
Langfuse SDKs batch and send traces asynchronously. If you observe latency, check that you are not calling flush() synchronously in the request path. Adjust LANGFUSE_FLUSH_AT and LANGFUSE_FLUSH_INTERVAL for your throughput.

Missing token counts or costs
Token usage must be reported by the LLM integration. When using the OpenAI SDK directly, wrap calls with @observe(as_type="generation") and the SDK auto-captures usage. For custom models, manually set usage={"input_tokens": N, "output_tokens": M}.
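When usage is reported manually, cost can be derived from the token counts and a per-token price table. A sketch with placeholder prices (`example-model` and its rates are hypothetical, not real provider pricing):

```python
# Hypothetical per-1K-token prices -- placeholders, not real pricing.
PRICE_PER_1K = {"example-model": {"input": 0.50, "output": 1.50}}

def cost_usd(model: str, usage: dict) -> float:
    """Compute cost from a manually reported usage dict."""
    p = PRICE_PER_1K[model]
    return (usage["input_tokens"] / 1000 * p["input"]
            + usage["output_tokens"] / 1000 * p["output"])

usage = {"input_tokens": 500, "output_tokens": 150}
print(cost_usd("example-model", usage))  # 0.475
```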

Self-hosted deployment fails to start
Verify DATABASE_URL points to a running PostgreSQL instance and CLICKHOUSE_URL is configured for Langfuse v3+. Run database migrations before starting the application.

Decorator nesting not working
Ensure all functions in the call chain use @observe(). The SDK uses Python contextvars for trace propagation, so async functions require the @observe() decorator on the async function itself.
