Add Response Specificity, Engagement, Topic Consistency, and Helpfulness Metrics #2699

ultimus11 · 2025-07-09T19:38:02Z

Summary

This PR introduces four new LLM-based conversational quality metrics to Opik:

ResponseSpecificityMetric
- Scores based on how specific or vague the assistant's reply is.
- Penalizes generic responses like “It depends,” “I'm not sure,” etc.
EngagementScoreMetric
- Evaluates if the assistant's responses keep the user engaged.
- Checks for emotional tone, follow-up questions, and curiosity.
TopicConsistencyMetric
- Assesses whether the conversation stays on topic across multiple turns.
- Identifies derailments or topic shifts in responses.
ResponseHelpfulnessMetric
- Determines how actionable and useful the assistant's responses are.

Implementation Details

Each metric:
- Inherits from ConversationThreadMetric
- Uses LLM-based templates to extract verdicts and reasons
- Implements both synchronous (score) and async (ascore) scoring
- Provides robust fallback error handling and JSON schema validation
Added corresponding:
- Schema classes (schema.py)
- Prompt templates (templates.py)
- Imports in __init__.py
- Inline docstrings (where applicable)

Why This Is Valuable

These new metrics extend Opik’s ability to evaluate nuanced aspects of assistant conversations, improving its effectiveness in monitoring dialogue quality for chatbots and LLM-based systems.

Notes

Can extend with additional few-shot examples or fine-tuned prompt logic later if needed.
Fully modular with async support for scalability.

Let me know if maintainers have any specific checklist (e.g., test coverage or changelog updates).

…ess metrics

alexkuzmik · 2025-07-11T09:43:34Z

Hi @ultimus11! You probably forgot to push the implementation:)

Add Response Specificity, Engagement, Topic Consistency, and Helpfuln…

f3c2d5a

…ess metrics

ultimus11 requested a review from a team as a code owner July 9, 2025 19:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add Response Specificity, Engagement, Topic Consistency, and Helpfulness Metrics #2699

Add Response Specificity, Engagement, Topic Consistency, and Helpfulness Metrics #2699

ultimus11 commented Jul 9, 2025

Uh oh!

alexkuzmik commented Jul 11, 2025

Uh oh!

Uh oh!

Add Response Specificity, Engagement, Topic Consistency, and Helpfulness Metrics #2699

Are you sure you want to change the base?

Add Response Specificity, Engagement, Topic Consistency, and Helpfulness Metrics #2699

Conversation

ultimus11 commented Jul 9, 2025

Summary

Implementation Details

Why This Is Valuable

Notes

Uh oh!

alexkuzmik commented Jul 11, 2025

Uh oh!

Uh oh!