Evaluator Migration Guide

Langfuse has introduced running evaluators on observations as the recommended approach for live data LLM-as-a-Judge evaluations. This guide helps you migrate existing live data evaluators to the new system.

Why Migrate?

Benefits of Observation-Level Evaluators

1. Better Performance

  • Reduced database load enables faster evaluation processing
  • Scales better under high-volume workloads

2. Improved Reliability

  • More predictable behavior with evaluation targeting specific operations
  • Better error handling and retry logic

3. Greater Control

  • Evaluate specific observations (LLM calls, tool invocations, etc.) rather than entire traces
  • More precise filtering
  • Easier debugging when evaluations fail

4. Future-Proof

  • Built on Langfuse’s next-generation evaluation architecture

Understanding the Trade-offs

We recognize this migration may require work on your end. Here’s our perspective:

  • You can keep running evaluators on traces: They will continue to work for the foreseeable future
  • Some users benefit more than others: High-volume users or those with complex traces will see the biggest improvements
  • This enables long-term improvements: The architectural change allows us to build better, simpler features for everyone
  • We’re here to help: Use the built-in migration wizard and this guide

When to Migrate

✅ Migrate Now If:

  • You are running on or planning to upgrade to SDK >= 4.4.0 (JavaScript) or >= 3.9.0 (Python)
  • You are experiencing performance issues with current evaluators
  • You are setting up new evaluators and want the best experience

⏸️ Wait If:

  • You are blocked from upgrading to SDK >= 4.4.0 (JavaScript) or >= 3.9.0 (Python)
  • Your current evaluators work perfectly for your use case

Migration Process

Step 1: Check Your SDK Version

Ensure you’re running a compatible SDK version:

pip show langfuse
# Required: >= 3.9.0

To upgrade:

pip install --upgrade langfuse
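
If you are on the JavaScript SDK, the equivalent check and upgrade might look like this (assuming an npm-based setup; adapt for yarn/pnpm as needed):

```shell
# Check the installed version (required: >= 4.4.0)
npm ls langfuse

# Upgrade to the latest release
npm install langfuse@latest
```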

Step 2: Use the Upgrade Wizard

Langfuse provides a built-in wizard to migrate your evaluators.

  1. Navigate to your evaluators page

    • Go to your project → Evaluation → LLM-as-a-Judge
    • You’ll see a callout for evaluators marked “Legacy”
  2. Click “Upgrade” on any legacy evaluator

    • This opens the migration wizard
    • The wizard shows your current configuration on the left
  3. Review the migrated configuration

    • Left side: Your current (legacy) configuration (read-only)
    • Right side: Proposed configuration (editable)
  4. Adjust the new configuration

    • Filters: Add filters to narrow down the evaluation to the specific subset of data you’re interested in (observation type, trace name, trace tags, userId, sessionId, metadata, etc.)
    • Variable Mapping: Map variables from observation fields (input, output, metadata) to your evaluation prompt
  5. Choose what happens to the old evaluator

    • Keep both active: Test the new evaluator alongside the old one
    • Mark old as inactive (recommended initially): Old evaluator stops running, new one takes over
    • Delete old evaluator: Permanently remove the legacy evaluator

Step 3: Verify Evaluator Execution

Verify the new evaluator works correctly:

  1. Check execution metrics

    • Go to Evaluator Table → find new evaluator row → click “Logs”
    • View execution logs
  2. Compare results (if running both)

    • Review scores from both legacy and new evaluators. You might find our score analytics helpful to compare the results.
    • Ensure consistency in evaluation logic
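
If you run both evaluators in parallel, a quick way to sanity-check consistency is to compare their score distributions. A minimal stdlib-only sketch, assuming you have exported the numeric scores for each evaluator (e.g. via the Langfuse UI or API) into two lists:

```python
from statistics import mean, stdev

def compare_scores(legacy: list[float], new: list[float]) -> dict:
    """Summarize two score samples so large drifts stand out."""
    summary = {
        "legacy_mean": mean(legacy),
        "new_mean": mean(new),
        "legacy_stdev": stdev(legacy) if len(legacy) > 1 else 0.0,
        "new_stdev": stdev(new) if len(new) > 1 else 0.0,
    }
    summary["mean_drift"] = abs(summary["new_mean"] - summary["legacy_mean"])
    return summary

report = compare_scores([0.8, 0.9, 0.7], [0.82, 0.88, 0.74])
print(f"Mean drift: {report['mean_drift']:.3f}")
```

A small mean drift with similar standard deviations suggests the migrated evaluator behaves like the legacy one; a large drift warrants reviewing the variable mapping before deactivating the old evaluator.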

Migration Examples

Example 1: Simple Trace Evaluator

Likely, your trace input/output is equivalent to an observation’s input/output within that same trace. Your evaluator should now target this observation directly. In this example, let’s assume you have a generation observation named “chat-completion” that holds the same input/output as your trace.

Before (Trace-level):

Target: Traces
Filter: trace.name = "chat-completion"
Variables:
  - user_query: trace.input
  - assistant_response: trace.output

After (Observation-level):

Target: Observations
Filter: trace.name = "chat-completion" AND observation.type = "generation" AND observation.name = "chat-completion"
Variables:
  - user_query: observation.input
  - assistant_response: observation.output

Key Changes:

  • Additional filters at observation level to identify the specific observation you want to evaluate in the trace tree
  • Variables come from observation instead of trace (e.g. observation.input and observation.output)
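
Conceptually, the new filter and variable mapping select one observation from the trace tree and feed its fields into the judge prompt. A rough stdlib-only illustration of that selection logic (the dict shapes here are assumptions for the sketch, not the Langfuse API):

```python
def select_observation(observations, obs_type, obs_name):
    """Return the first observation matching the evaluator's filter."""
    for obs in observations:
        if obs["type"] == obs_type and obs["name"] == obs_name:
            return obs
    return None

def map_variables(obs):
    """Mirror the evaluator's variable mapping for the judge prompt."""
    return {
        "user_query": obs["input"],
        "assistant_response": obs["output"],
    }

# Hypothetical trace tree with two observations
trace_tree = [
    {"type": "span", "name": "retrieval", "input": "...", "output": "..."},
    {"type": "generation", "name": "chat-completion",
     "input": "What is Langfuse?",
     "output": "Langfuse is an LLM observability platform."},
]

target = select_observation(trace_tree, "generation", "chat-completion")
print(map_variables(target))
```

The observation-level filter is what makes this precise: instead of evaluating the whole trace, only the matching generation is scored.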

Troubleshooting

Variables Don’t Map Correctly

Problem: You were mapping variables from two different observations

Solution:

  • If possible, store necessary context in a single observation metadata during instrumentation
  • Consider breaking your single trace evaluator into multiple observation evaluators
  • Alternatively, hold off on migrating this evaluator for now: we do not yet have a direct translation to the new system, but we are actively working on one.

SDK Version-Specific Guidance

For Users on Old SDK Versions (Python < 3.9.0, JavaScript < 4.4.0)

You have two options:

Option 1: Upgrade Your SDK (Recommended)

  1. Update to latest SDK version
  2. Migrate evaluators using the wizard

Option 2: Continue with Evaluators Running on Traces

  • No changes needed
  • Evaluators will continue to work
  • Note: You may continue using trace evaluators on the new SDK version, but you will not get the performance improvements.

For Users on New SDK Versions (Python >= 3.9.0, JavaScript >= 4.4.0)

If you have existing evaluators running on traces:

Option 1: Migrate to Observations (Recommended)

  • Follow the migration wizard
  • Get full benefits of new architecture

Rollback Plan

If you need to revert after migration:

  1. If you kept both evaluators: Simply mark the new one as inactive
  2. If you deleted the old evaluator: Create a new evaluator with the old configuration
  3. Data is preserved: All historical evaluation results remain accessible

Last updated: January 29, 2026

Was this page helpful?