Add Response Specificity, Engagement, Topic Consistency, and Helpfulness Metrics #2699

Open
wants to merge 1 commit into
base: main

ultimus11

Summary

This PR introduces four new LLM-based conversational quality metrics to Opik:

  1. ResponseSpecificityMetric

    • Scores how specific or vague the assistant's reply is.
    • Penalizes generic responses like “It depends,” “I'm not sure,” etc.
  2. EngagementScoreMetric

    • Evaluates whether the assistant's responses keep the user engaged.
    • Checks for emotional tone, follow-up questions, and curiosity.
  3. TopicConsistencyMetric

    • Assesses whether the conversation stays on topic across multiple turns.
    • Identifies derailments or topic shifts in responses.
  4. ResponseHelpfulnessMetric

    • Determines how actionable and useful the assistant's responses are.
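
To make the four rubrics above concrete, here is a rough sketch of how a judge prompt could be assembled per metric. The `RUBRICS` table and `build_prompt` helper are hypothetical names used only for illustration, and the JSON reply format is an assumption; the rubric wording is paraphrased from the list above rather than taken from the actual templates.

```python
from typing import Dict, List

# Hypothetical rubric text per metric, paraphrased from the PR summary.
RUBRICS: Dict[str, str] = {
    "ResponseSpecificityMetric": (
        "How specific is the assistant's reply? Penalize generic answers "
        "such as 'It depends' or 'I'm not sure'."
    ),
    "EngagementScoreMetric": (
        "Do the assistant's responses keep the user engaged? Consider "
        "emotional tone, follow-up questions, and curiosity."
    ),
    "TopicConsistencyMetric": (
        "Does the conversation stay on topic across turns? Note any "
        "derailments or topic shifts."
    ),
    "ResponseHelpfulnessMetric": (
        "How actionable and useful are the assistant's responses?"
    ),
}


def build_prompt(metric_name: str, conversation: List[Dict[str, str]]) -> str:
    """Combine a metric's rubric with the conversation transcript (illustrative only)."""
    rubric = RUBRICS[metric_name]
    transcript = "\n".join(
        f"{turn['role']}: {turn['content']}" for turn in conversation
    )
    return (
        f"{rubric}\n\nConversation:\n{transcript}\n\n"
        # Assumed reply format; the real templates may ask for something else.
        'Answer with JSON: {"score": <number between 0 and 1>, '
        '"reason": "<short explanation>"}'
    )
```

Keeping the rubric text in one table would let all four metrics share the same prompt-building path, which is one possible way to organize `templates.py`.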

Implementation Details

  • Each metric:

    • Inherits from ConversationThreadMetric
    • Uses LLM-based templates to extract verdicts and reasons
    • Implements both synchronous (score) and async (ascore) scoring
    • Validates the LLM output against a JSON schema and falls back to an error result when parsing fails
  • Added corresponding:

    • Schema classes (schema.py)
    • Prompt templates (templates.py)
    • Imports in __init__.py
    • Inline docstrings (where applicable)
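
The structure described above could look roughly like the following. This is a minimal sketch, not the PR's implementation: `ConversationThreadMetricSketch` is a hypothetical stand-in for Opik's `ConversationThreadMetric` (so the example runs without Opik installed), the `judge` and `ajudge` callables stand in for the LLM call, and the `{"score", "reason"}` reply shape and the 0.0 fallback score are assumptions.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List

Conversation = List[Dict[str, str]]


@dataclass
class ScoreResult:
    name: str
    value: float
    reason: str


class ConversationThreadMetricSketch:
    """Hypothetical stand-in for Opik's ConversationThreadMetric base class."""

    def __init__(self, name: str) -> None:
        self.name = name

    def score(self, conversation: Conversation) -> ScoreResult:
        raise NotImplementedError

    async def ascore(self, conversation: Conversation) -> ScoreResult:
        raise NotImplementedError


def parse_verdict(raw: str, name: str) -> ScoreResult:
    """Validate the judge's JSON reply; fall back to 0.0 on any parsing problem."""
    try:
        data = json.loads(raw)
        value = float(data["score"])
        reason = str(data["reason"])
        if not 0.0 <= value <= 1.0:
            raise ValueError("score out of range")
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        return ScoreResult(name, 0.0, f"Could not parse judge output: {exc}")
    return ScoreResult(name, value, reason)


class ResponseSpecificitySketch(ConversationThreadMetricSketch):
    def __init__(
        self,
        judge: Callable[[str], str],
        ajudge: Callable[[str], Awaitable[str]],
    ) -> None:
        super().__init__("response_specificity_sketch")
        self._judge = judge    # synchronous LLM call (assumed interface)
        self._ajudge = ajudge  # asynchronous LLM call (assumed interface)

    def _prompt(self, conversation: Conversation) -> str:
        turns = "\n".join(f"{t['role']}: {t['content']}" for t in conversation)
        return "Rate how specific the assistant's replies are.\n" + turns

    def score(self, conversation: Conversation) -> ScoreResult:
        return parse_verdict(self._judge(self._prompt(conversation)), self.name)

    async def ascore(self, conversation: Conversation) -> ScoreResult:
        raw = await self._ajudge(self._prompt(conversation))
        return parse_verdict(raw, self.name)
```

With this shape, `score` and `ascore` share the same prompt and validation code, so the only difference between the two paths is how the model is called.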

Why This Is Valuable

These new metrics extend Opik’s ability to evaluate nuanced aspects of assistant conversations, improving its effectiveness in monitoring dialogue quality for chatbots and LLM-based systems.

Notes

  • The prompts can be extended later with additional few-shot examples or refined prompt logic if needed.
  • Each metric is self-contained and supports async scoring.

Let me know if the maintainers have a specific checklist (e.g., test coverage or changelog updates).

@ultimus11 ultimus11 requested a review from a team as a code owner July 9, 2025 19:38
@alexkuzmik
Collaborator

Hi @ultimus11! You probably forgot to push the implementation :)
