[FEATURE] Add Support for LlamaIndex OpenAI Client in Judgeval #455

@weicheng112

Description

Is your feature request related to a problem? Please describe.

When using judgeval with LlamaIndex's ReActAgent, I'm unable to trace the LLM calls made by the agent because judgeval's wrap() function doesn't support the llama_index.llms.openai.OpenAI client type. While I can successfully trace tool functions with @judgment.observe(span_type="tool") decorators and the main agent function with @judgment.observe(span_type="function"), I cannot trace the actual LLM interactions that drive the agent's reasoning process.

Describe the solution you'd like

I would like judgeval to add support for wrapping the llama_index.llms.openai.OpenAI client type in its wrap() function. This would allow for complete tracing of LlamaIndex-based agents, including all LLM calls made by the agent, while maintaining the existing tool and function tracing capabilities. Ideally, the solution would work seamlessly with the existing judgeval API, allowing users to simply do:

from llama_index.llms.openai import OpenAI
from judgeval.tracer import wrap

llm = OpenAI(model="gpt-4o", temperature=0.0)
wrapped_llm = wrap(llm)

Describe alternatives you've considered

I've tried two alternatives:

  1. Using only the @judgment.observe decorators on tool functions and the main agent function, which works well for tracing the flow of execution and tool usage, but misses the actual LLM calls.

  2. Using a separate wrapped OpenAI client for direct LLM calls alongside the LlamaIndex agent, but this creates duplicate LLM calls and doesn't capture the actual calls made by the agent.

Neither alternative provides a complete solution that captures both tool usage and LLM calls in an integrated way.

Which component(s) does this affect?

  • SDK (open for community contributions)
  • Website (internal development only)
  • Documentation (open for community contributions)
  • Not sure

Use case and impact

I'm building a Blackjack decision agent using LlamaIndex's ReActAgent and want to trace its performance using judgeval. Currently, I can successfully trace:

  • All tool function calls (when they're called, with what inputs, and what outputs they return)
  • The main agent function execution (start, end, total duration)

However, I cannot trace:

  • The actual LLM calls made by the agent

This feature would complete the tracing picture, allowing for comprehensive analysis of the agent's performance, reasoning, and cost.

Proposed API/Interface (if applicable)

The ideal API would maintain consistency with the current judgeval API:

from llama_index.llms.openai import OpenAI
from llama_index.core.agent import ReActAgent
from judgeval.tracer import Tracer, wrap

# Initialize Judgment tracer
judgment = Tracer(project_name="my_project")

# Wrap LlamaIndex OpenAI client
llm = OpenAI(model="gpt-4o", temperature=0.0)
wrapped_llm = wrap(llm)

# Use wrapped LLM with ReActAgent (`tools` is the agent's tool list, defined elsewhere)
agent = ReActAgent.from_tools(tools, llm=wrapped_llm, verbose=True)

Additional context

The current error when attempting to wrap a LlamaIndex OpenAI client is:

ValueError: Unsupported client type: <class 'llama_index.llms.openai.base.OpenAI'>

This occurs in the _get_client_config function in judgeval/common/tracer.py.

It's worth noting that the current implementation with @judgment.observe decorators on tool functions still provides valuable tracing capabilities for understanding the flow of execution and tool usage patterns. Adding support for wrapping the LlamaIndex OpenAI client would complement these existing capabilities rather than replace them.

Are you interested in contributing this feature?

The Judgment community is happy to provide guidance and review for contributions via Discord.

  • Yes, I'd like to implement this
  • Yes, I'd like to help with design/planning
  • No, but I'd be happy to test it
  • No
