Add litellm integration #296
Conversation
Hello @rishi763, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

As gemini-code-assist, I've reviewed this pull request, which introduces integration with the LiteLLM library. The core change is the addition of a custom LiteLLM callback handler (`JudgevalLitellmCallbackHandler`) that hooks into LiteLLM's API call lifecycle. This handler creates and updates `judgeval` trace spans for each LiteLLM interaction, capturing inputs, outputs, duration, and errors.

A significant part of this integration involved implementing a save coordination mechanism within the `judgeval` tracer. Because LiteLLM operations can be asynchronous and might complete after the main trace context manager exits, the tracer now includes logic to defer trace saving until all pending LiteLLM operations associated with that trace have completed. This ensures that LLM call details are fully captured before the trace is finalized and saved.

The pull request also includes a comprehensive test suite (`test_litellm.py`) to validate the callback handler's functionality, span creation, error handling, save coordination, and end-to-end tracing with mock LiteLLM calls.
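For orientation, here is a minimal sketch of how the handler would be wired up, using LiteLLM's standard callback registration; the handler's constructor arguments are an assumption for illustration, not confirmed by this PR:

```python
import litellm
from judgeval.common.tracer import Tracer
from judgeval.integrations.litellm_integration import JudgevalLitellmCallbackHandler

# A single shared Tracer instance (see the discussion below on parallel calls).
tracer = Tracer()

# Register via LiteLLM's custom-logger callback mechanism.
handler = JudgevalLitellmCallbackHandler(tracer)  # constructor args assumed
litellm.callbacks = [handler]

# Subsequent completion calls are now traced as 'llm' spans.
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```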
Highlights
- LiteLLM Integration: Adds a new `JudgevalLitellmCallbackHandler` that integrates with LiteLLM's custom logger mechanism to automatically trace LLM calls made via LiteLLM.
- LLM Span Creation: The callback handler creates 'llm' type spans for each LiteLLM API call, recording the model, messages (inputs), response content (output), duration, and any errors.
- Save Coordination: Introduces a mechanism in the `Tracer` and `TraceClient` to defer trace saving when LiteLLM operations are in progress, ensuring that asynchronous LLM call results are included in the final trace data.
- Comprehensive Testing: A new test file (`test_litellm.py`) is added with multiple tests covering the callback handler's lifecycle, span updates, error handling, save deferral, and end-to-end integration.
Changelog
- src/e2etests/test_litellm.py
  - Adds new end-to-end tests for the LiteLLM integration.
  - Includes tests for callback handler creation and registration.
  - Tests span creation, updates, and error handling for LiteLLM calls.
  - Validates the save coordination mechanism to ensure traces are saved after LiteLLM operations complete.
  - Tests tracing multiple LiteLLM calls within a single trace.
- src/judgeval/common/tracer.py
  - Adds `_safe_to_save`, `_deferred_save_pending`, `_deferred_save_args`, and `_save_lock` attributes to `Tracer` for save coordination.
  - Introduces `litellm_operation_started` and `litellm_operation_completed` methods in `Tracer` to manage the save state.
  - Modifies `TraceClient.save` to check the `_safe_to_save` flag and defer saving if necessary.
  - Extracts the actual save logic into a new private method, `_perform_actual_save`.
  - Adds an optional `span_id` parameter to `TraceClient.async_evaluate` (line 404) to allow external handlers to link evaluations to specific spans.
  - Includes minor formatting and comment adjustments.
- src/judgeval/integrations/litellm_integration.py
  - Adds a new file containing the `JudgevalLitellmCallbackHandler` class.
  - Implements the `litellm.integrations.custom_logger.CustomLogger` interface.
  - Hooks into `log_pre_api_call`, `log_success_event`, `log_failure_event`, and their async counterparts.
  - Creates and updates `TraceSpan` objects within the current `judgeval` trace for each LiteLLM call.
  - Signals the `Tracer` when a LiteLLM operation starts and completes to manage save coordination.
Code Review
Overall, this pull request introduces the LiteLLM integration and adds corresponding tests. The tests cover various aspects of the integration, which is good. The changes in `tracer.py` implement the necessary context propagation and save coordination logic. The core idea of using a LiteLLM callback handler to integrate with the tracer is a sound approach.

I've reviewed the code focusing on correctness, efficiency, maintainability, and security, keeping in mind the request to comment on medium, high, and critical severity issues. I've also considered common Python best practices, defaulting to principles aligned with PEP 8 where applicable, as no specific style guide was provided.

While the functionality seems to be implemented, there are a few areas related to maintainability and code structure that could be improved before merging. Please see the specific comments below.
Summary of Findings
- Broad Exception Handling: The LiteLLM callback handler uses a bare `except:` block, which is too broad and can hide unexpected errors (see the sketch after this list).
- Misplaced Function Definition: The `log` function in `tracer.py` is defined outside a class but uses `self`, indicating it's intended as a method but is incorrectly placed.
- Coupling of Tracer to Integration: The core `Tracer` class includes LiteLLM-specific methods and state for save coordination, creating tight coupling.
- Design of Save Coordination: The logic for deferring saves based on LiteLLM operations is embedded in `TraceClient.save`, raising questions about the best place for this coordination logic.
- Modification of Frame Locals: The deep tracing mechanism modifies frame local variables to manage context, which can be less predictable than other methods.
- Complexity of API Client Wrapping: The logic for wrapping various API clients and extracting data, particularly from streams, is complex and potentially fragile to changes in client libraries.
- Unusual Test File Structure: The test file `test_litellm.py` includes an `if __name__ == "__main__":` block, which is not standard for test files. (Not commented on due to severity settings.)
- Debug Print Statements: The LiteLLM callback handler includes numerous `print` statements for debugging that should be replaced with proper logging. (Not commented on due to severity settings.)
- Fragile Type Check: The `_finish_span` method uses `hasattr` to check if `start_time` is a datetime object, which is less explicit than `isinstance`. (Not commented on due to severity settings.)
- Commented-Out Code/Warnings: There is commented-out code and there are warnings in `tracer.py` that should either be removed or uncommented/addressed. (Not commented on due to severity settings.)
- Minor Formatting: There are minor formatting changes (blank lines, comment alignment) that don't affect functionality. (Not commented on due to severity settings.)
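As a sketch of the bare-`except:` fix from the first finding (handler internals like `_finish_span` are assumed from this PR's description, and the logging setup is illustrative):

```python
import logging

logger = logging.getLogger(__name__)

class JudgevalLitellmCallbackHandler:
    # ... other handler methods elided ...

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        try:
            self._finish_span(response_obj, start_time, end_time)
        except Exception:
            # Catch Exception rather than using a bare `except:`, which would
            # also swallow KeyboardInterrupt/SystemExit and hide real errors.
            logger.exception("judgeval: failed to finish LiteLLM span")
```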
Merge Readiness
This pull request introduces significant new functionality and includes good test coverage. However, there are several high and medium severity issues identified related to error handling, code structure, and maintainability, particularly concerning the coupling between the core tracer and the LiteLLM integration, and the complexity of the API client wrapping. These issues should be addressed before merging to ensure the stability and maintainability of the codebase. I am unable to approve this pull request; please have other reviewers review and approve this code after the suggested changes are made.
```python
# Check if it's safe to save (no pending LiteLLM operations)
with self.tracer._save_lock:
    if not self.tracer._safe_to_save:
        # Defer the save
        self.tracer._deferred_save_pending = True
        self.tracer._deferred_save_args = (self, overwrite)
        return self.trace_id, {}  # Return placeholder
```
The `TraceClient.save` method now includes logic specifically for coordinating with LiteLLM operations (`_safe_to_save`, `_deferred_save_pending`, `_deferred_save_args`, `_save_lock`). While necessary for the integration, placing this specific coordination logic directly within the core `TraceClient` couples it tightly to the LiteLLM integration. Could this save coordination mechanism be abstracted into a more generic pattern that other integrations could also use, or perhaps managed more directly by the `Tracer` instance based on signals from integrations?
```python
token = current_span_var.set(span_id)
frame.f_locals["_judgment_span_token"] = token
```
The deep tracing mechanism stores the context variable token (`_judgment_span_token`) directly in the frame's local variables. While this is a known technique for context propagation in some scenarios, modifying frame locals can sometimes be less predictable or have unintended interactions compared to other methods. Consider whether there's an alternative approach to managing the context variable token within the deep tracing logic that avoids direct frame local modification, potentially leveraging `contextvars` features more directly or using a different mechanism for passing state.
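For reference, the `contextvars` token pattern the comment alludes to looks like this in isolation (a generic sketch, not judgeval's actual deep-tracing code):

```python
import contextvars

current_span_var = contextvars.ContextVar("current_span", default=None)

def run_in_span(span_id, fn):
    # set() returns a Token that undoes exactly this assignment, so nested
    # spans restore the previous value correctly without touching frame locals.
    token = current_span_var.set(span_id)
    try:
        return fn()
    finally:
        current_span_var.reset(token)
```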
Unrelated to your PR - we can check this out later
```python
def litellm_operation_started(self):
    """Called by LiteLLM handler when an operation starts"""
    with self._save_lock:
        self._safe_to_save = False

def litellm_operation_completed(self):
    """Called by LiteLLM handler when an operation completes"""
    deferred_save_info = None

    with self._save_lock:
        self._safe_to_save = True
        if self._deferred_save_pending:
            # Get the deferred save info
            deferred_save_info = self._deferred_save_args
            self._deferred_save_pending = False
            self._deferred_save_args = None

    # Execute deferred save outside lock if needed
    if deferred_save_info:
        trace_client, overwrite = deferred_save_info
        trace_client._perform_actual_save(overwrite)
```
The `Tracer` class now includes methods (`litellm_operation_started`, `litellm_operation_completed`) and state (`_safe_to_save`, `_deferred_save_pending`, `_deferred_save_args`, `_save_lock`) specifically for coordinating saves with the LiteLLM integration. This introduces coupling between the core `Tracer` and a specific integration. Ideally, the core tracing mechanism should be integration-agnostic. Could the LiteLLM integration interact with the `Tracer` using a more generic interface (e.g., `tracer.defer_save()`, `tracer.allow_save()`) rather than the `Tracer` having explicit LiteLLM methods?
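A hedged sketch of what that generic interface could look like, reusing the PR's existing lock and deferred-save state (method names follow the reviewer's suggestion; this is a proposal, not code from the PR):

```python
import threading

class Tracer:
    def __init__(self):
        self._save_lock = threading.Lock()
        self._safe_to_save = True
        self._deferred_save_args = None

    # Generic hooks any integration may call; currently only the LiteLLM
    # integration uses them (kept as an internal-documentation note).
    def defer_save(self):
        with self._save_lock:
            self._safe_to_save = False

    def allow_save(self):
        deferred = None
        with self._save_lock:
            self._safe_to_save = True
            deferred = self._deferred_save_args
            self._deferred_save_args = None
        # Execute any deferred save outside the lock, as the PR already does.
        if deferred:
            trace_client, overwrite = deferred
            trace_client._perform_actual_save(overwrite)
```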
@alanzhang25 I was thinking about doing this, but I wasn't sure about the tradeoff between generality and making it easy to know why we have it.
I think we should switch it to the generic version, but leave a comment somewhere as internal documentation that this is used by LiteLLM.
Can the person reviewing fix the e2e test? I can't get it to work on my setup, but I'm certain the integration works; it's just the test that is messed up.
Nice work! Some questions, and I'm also wondering if I can get more info on the locking logic.
```python
self.tracer.litellm_operation_started()

span_id = str(uuid.uuid4())
self._current_span_id = span_id
```
Isn't there a method for this?
I'm not getting the current span id of the tracer? Not sure what you mean here.
I meant for setting the span_id
```python
    created_at=time.time(),
    span_type="llm"
)
span.inputs = {"model": model, "messages": messages}
```
Do we need to include the model as part of the input? Maybe take a look at how we save usage in tracer.py.
I think we should, I'm not sure how tracer would be able to get the model from LiteLLM since model used away
Should this be related to how we track usage and token cost? What do you mean by "LiteLLM since model used away"?
Some requests and questions!
```python
mock_response.choices[0].message.content = content
mock_response.model = "gpt-4o"
mock_response.usage = Mock()
mock_response.usage.prompt_tokens = 10
```
Where do these token numbers come from?
```python
updated_span = trace_client.span_id_to_span[span_id]
assert updated_span.duration == 1.0
assert "API Error" in str(updated_span.output)
```
Can you add a check for the `error` field on the `TraceSpan`?
Will do
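Something like this could cover it, assuming `TraceSpan` exposes an `error` attribute as the reviewer suggests (the exact shape of the stored error is an assumption):

```python
# Extend the failure-path assertions to cover the span's error field too.
updated_span = trace_client.span_id_to_span[span_id]
assert updated_span.error is not None
assert "API Error" in str(updated_span.error)
```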
```python
assert isinstance(result, str)
assert len(result) > 0
print(f"LiteLLM Response: {result}")
```
Add checks to see if we are tracking correctly
```python
def mock_save(overwrite=False):
    save_calls.append(("save", overwrite))
    # Check if we should defer
    with tracer._save_lock:
```
Where are these private methods from?
"""Post-processing - no action needed""" | ||
pass | ||
|
||
def _finish_span(self, response_obj, start_time, end_time, error=None): |
How are we tracking token usage and cost?
I didn't realise we tracked this; will do!
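A minimal sketch of pulling usage off the LiteLLM response inside `_finish_span`; LiteLLM normalizes responses to the OpenAI shape, but how the handler reaches the span and where the span stores usage are assumptions for illustration:

```python
def _finish_span(self, response_obj, start_time, end_time, error=None):
    span = self.trace_client.span_id_to_span[self._current_span_id]
    # Usage follows the OpenAI response shape when the provider reports it.
    usage = getattr(response_obj, "usage", None)
    if usage is not None:
        span.usage = {  # span attribute name assumed for illustration
            "prompt_tokens": usage.prompt_tokens,
            "completion_tokens": usage.completion_tokens,
            "total_tokens": usage.total_tokens,
        }
```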
```python
self.class_identifiers: Dict[str, str] = {}

# LiteLLM save coordination
self._safe_to_save = True
```
If we are running calls in parallel, how will this affect the locking?
Will the locking prevent saving the spans at the same time?
I'm not sure. From my discussions with Minh, he said there should be a single Tracer object. I think we can handle parallel calls by keeping a count of pending vs. resolved calls.
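A sketch of that counter-based variant, replacing the boolean flag so saves stay deferred until the last parallel operation finishes (hypothetical code, not in this PR; `_pending_ops` would start at 0 in `__init__`):

```python
def litellm_operation_started(self):
    with self._save_lock:
        self._pending_ops += 1  # hypothetical counter attribute
        self._safe_to_save = False

def litellm_operation_completed(self):
    deferred = None
    with self._save_lock:
        self._pending_ops -= 1
        if self._pending_ops == 0:  # only the last completion re-enables saving
            self._safe_to_save = True
            if self._deferred_save_pending:
                deferred = self._deferred_save_args
                self._deferred_save_pending = False
                self._deferred_save_args = None
    # Execute any deferred save outside the lock, as the PR already does.
    if deferred:
        trace_client, overwrite = deferred
        trace_client._perform_actual_save(overwrite)
```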