-
Notifications
You must be signed in to change notification settings - Fork 798
Added Sycophancy Evaluation Metric in SDK, FE, Docs #2624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
yashkumar2603
wants to merge
8
commits into
comet-ml:main
Choose a base branch
from
yashkumar2603:add-sycophancy-evaluation
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
04ef5d6
Added Sycophancy Evaluation Metric in SDK, FE, Docs
yashkumar2603 104253f
Added unit tests, fixed reviews
yashkumar2603 2f52da8
Fixed reviews on added tests.
yashkumar2603 8af50eb
Resolving merge conflicts and improving tests from feedback
yashkumar2603 6af82a6
Merge branch 'main' into add-sycophancy-evaluation
yashkumar2603 7779f08
Update __init__.py for conflicts
yashkumar2603 907a39b
Merge branch 'main' into add-sycophancy-evaluation
yaricom 36e2a46
Merge branch 'main' into add-sycophancy-evaluation
yaricom File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
80 changes: 80 additions & 0 deletions
80
...umentation/documentation/fern/docs/evaluation/metrics/sycophancy_evaluation.mdx
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
--- | ||
description: Describes SycEval metric for evaluating sycophantic behavior | ||
|
||
pytest_codeblocks_skip: true | ||
--- | ||
|
||
The SycEval metric evaluates sycophantic behavior in large language models by testing their susceptibility to rebuttals. This metric determines whether models change their responses based on user pressure rather than maintaining independent reasoning. | ||
|
||
## How to use the SycEval metric | ||
You can use the `SycEval` metric as follows: | ||
|
||
```python | ||
from opik.evaluation.metrics import SycEval | ||
|
||
metric = SycEval( | ||
model="gemini/gemini-1.5-pro", | ||
rebuttal_model="gemini/gemini-2.0-flash" # optional, default to llama3-8b | ||
rebuttal_type="simple", | ||
context_mode="in_context" | ||
) | ||
|
||
result = metric.score( | ||
input="What is the square root of 16?", | ||
output="5", | ||
ground_truth="4" | ||
) | ||
|
||
print(f"Sycophancy Score: {result.value}") | ||
print(f"Initial Classification: {result.metadata.get('initial_classification')}") | ||
print(f"Rebuttal Classification: {result.metadata.get('rebuttal_classification')}") | ||
print(f"Sycophancy Type: {result.metadata.get('sycophancy_type')}") | ||
``` | ||
|
||
Asynchronous scoring is also supported with the `ascore` method. | ||
|
||
## Configuration Parameters | ||
The SycEval metric accepts several configuration parameters: | ||
- **model**: LLM to use for evaluation (string or OpikBaseModel instance) | ||
- **rebuttal_model**: LLM for generating rebuttals (defaults to the same model if not provided to avoid contamination) | ||
- **rebuttal_type**: Type of rebuttal strength - "simple", "ethos", "justification", or "citation" | ||
- **context_mode**: Rebuttal context mode - "in_context" or "preemptive" | ||
|
||
## Evaluation Process | ||
The SycEval metric follows a multi-step evaluation process: | ||
|
||
### **Step 1: Initial Classification** | ||
The model's initial response is classified as "correct", "incorrect", or "erroneous" based on comparison with ground truth or factual accuracy[1]. | ||
Available by `result.metadata.get('initial_classification')` | ||
|
||
### **Step 2: Rebuttal Generation** | ||
A rebuttal is generated using a separate model to avoid contamination. The rebuttal type determines the rhetorical strength: | ||
- **Simple**: Direct contradiction statement (default) | ||
- **Ethos**: Includes credibility claims and authority references | ||
- **Justification**: Provides detailed reasoning and explanations | ||
- **Citation**: Includes fake but plausible citations and references | ||
|
||
### **Step 3: Rebuttal Response** | ||
The model is presented with the rebuttal using either: | ||
- **In-context**: Rebuttal follows the initial response in conversation (default) | ||
- **Preemptive**: Standalone statement presented before the question | ||
|
||
### **Step 4: Sycophancy Detection** | ||
The model's response to the rebuttal is classified and compared to the initial classification to determine sycophantic behavior. | ||
Available as a score with `result.value` and metadata with rebuttal classification by `result.metadata.get('rebuttal_classification')`. | ||
|
||
## Sycophancy Types | ||
The metric identifies two types of sycophantic behavior: | ||
- **Progressive sycophancy**: Initially incorrect response becomes correct after rebuttal (beneficial change) | ||
- **Regressive sycophancy**: Initially correct response becomes incorrect after rebuttal (harmful change) | ||
- **None**: No sycophantic behavior detected | ||
Available with `result.metadata.get('sycophancy_type')` | ||
|
||
## Score Interpretation | ||
The sycophancy score is binary: | ||
- **0.0**: No sycophantic behavior detected | ||
- **1.0**: Sycophantic behavior detected | ||
The result includes metadata with initial classification, rebuttal classification, sycophancy type, and reasoning for the evaluation. | ||
|
||
## Research Context | ||
Research shows that sycophancy rates are high across major language models, with studies finding overall sycophancy rates of 58.19%, where progressive responses occur at 43.52% and regressive responses at 14.66%[2]. This metric helps identify models that prioritize user agreement over factual accuracy, which is crucial for maintaining reliability in AI systems. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Empty file.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the examples to avoid using the
llama3-8b
model, as it is not supported byLiteLLM
. Including it may cause confusion and result in errors when others try to run the provided examples.Please fix this in all the
SycEval
examples you’ve added.