Missing Async Evaluation API for Non-Blocking Operations #400

@prajwalun

Title: Missing async evaluation API for non-blocking agent operations

Description:
I'm building a resume analysis agent that must keep response times under one second to preserve a good user experience. The current async_evaluate() function still blocks the main thread in some scenarios, which prevents truly non-blocking evaluation in production agents.

Problem:
My agent needs to run evaluations without impacting user response times. I had to implement my own non-blocking evaluation pattern:

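def _trigger_judgment_evaluation(self, original_content, improved_content, section_type, job_analysis, primary_score, iteration):
    """JUDGMENT FRAMEWORK: Schedule a comprehensive evaluation without blocking the agent flow."""
    import asyncio

    # Schedule the evaluation on the running event loop (non-blocking).
    # asyncio.create_task() raises RuntimeError if no loop is running.
    task = asyncio.create_task(self._run_comprehensive_evaluation(
        original_content=original_content,
        improved_content=improved_content,
        section_type=section_type,
        job_analysis=job_analysis,
        primary_score=primary_score,
        iteration=iteration
    ))

    # Keep a strong reference: the event loop only holds weak references to
    # tasks, so an unreferenced task can be garbage-collected before it finishes.
    if not hasattr(self, "_background_tasks"):
        self._background_tasks = set()
    self._background_tasks.add(task)
    task.add_done_callback(self._background_tasks.discard)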
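
The workaround above relies on asyncio.create_task(), which needs a running event loop and leaves the task referenced only weakly by the loop. Below is a minimal, self-contained sketch of the same pattern with task tracking added. BackgroundEvaluator, fake_evaluation, and handle_request are hypothetical names used for illustration; they are not part of the Judgment SDK or of my agent.

```python
import asyncio


class BackgroundEvaluator:
    """Schedules coroutines without awaiting them and keeps them alive."""

    def __init__(self):
        self._tasks = set()  # strong references to in-flight tasks

    def fire(self, coro):
        # Must be called while an event loop is running.
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def drain(self):
        # Wait for outstanding evaluations, e.g. at shutdown.
        if self._tasks:
            await asyncio.gather(*self._tasks, return_exceptions=True)


async def fake_evaluation(results, label):
    # Hypothetical stand-in for self._run_comprehensive_evaluation(...).
    await asyncio.sleep(0.05)
    results.append(label)


async def handle_request(evaluator, results):
    evaluator.fire(fake_evaluation(results, "scored"))
    return "response"  # returned before the evaluation finishes


async def main():
    evaluator = BackgroundEvaluator()
    results = []
    reply = await handle_request(evaluator, results)
    seen_at_reply = list(results)  # snapshot taken when the reply is ready
    await evaluator.drain()
    return reply, seen_at_reply, results
```

Calling drain() at shutdown is a design choice in this sketch: without it, evaluations still pending when the event loop closes are cancelled rather than completed.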

Expected Behavior:
A truly fire-and-forget evaluation API that doesn't impact response times:

# Option 1: Fire-and-forget API
judgment.fire_evaluate(
    scorers=[AnswerRelevancyScorer(threshold=0.5)],
    input=task_input,
    actual_output=res,
    model="gpt-4.1"
)

# Option 2: Background evaluation with callback
judgment.background_evaluate(
    scorers=[AnswerRelevancyScorer(threshold=0.5)],
    input=task_input,
    actual_output=res,
    model="gpt-4.1",
    callback=handle_evaluation_result
)
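
To make Option 2 concrete, here is a rough sketch of how a callback-based background call could behave, built on the standard-library ThreadPoolExecutor. It is not an existing Judgment API. The evaluate_fn parameter, the (result, error) callback signature, and slow_evaluate are assumptions made for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Shared worker pool; the size is an arbitrary choice for this sketch.
_executor = ThreadPoolExecutor(max_workers=2)


def background_evaluate(evaluate_fn, callback=None, **kwargs):
    """Run evaluate_fn(**kwargs) on a worker thread and return immediately."""
    future = _executor.submit(evaluate_fn, **kwargs)
    if callback is not None:
        def _on_done(fut):
            # Assumed callback signature: callback(result, error).
            exc = fut.exception()
            callback(None if exc else fut.result(), exc)
        future.add_done_callback(_on_done)
    return future


def slow_evaluate(input, actual_output):
    # Hypothetical stand-in for a blocking evaluation request.
    time.sleep(0.05)
    return {"input": input, "passed": bool(actual_output)}
```

The callback runs on the worker thread (or on the calling thread if the future has already finished), so a real handler would need to be thread-safe.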

Current Behavior:
async_evaluate() can still cause delays, especially with:

  • Complex evaluations with multiple scorers
  • Slow Judgment service response times
  • Network latency issues
  • Large input/output data
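
One possible explanation for delays like these, which I have not confirmed against the SDK internals, is synchronous work running inside a coroutine. The sketch below shows the general effect: a blocking call made directly from a coroutine starves the event loop, while the same call offloaded with asyncio.to_thread() (Python 3.9+) does not. blocking_evaluation is a hypothetical stand-in, not the real async_evaluate().

```python
import asyncio
import time


def blocking_evaluation():
    # Hypothetical stand-in for a synchronous network call.
    time.sleep(0.2)
    return "done"


async def inline():
    blocking_evaluation()  # runs on the event loop thread


async def offloaded():
    await asyncio.to_thread(blocking_evaluation)  # runs on a worker thread


async def ticks_during(run_evaluation):
    """Count how often the loop ran other work while the evaluation was running."""
    ticks = 0

    async def heartbeat():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat start
    await run_evaluation()
    observed = ticks
    hb.cancel()
    return observed


async def main():
    return await ticks_during(inline), await ticks_during(offloaded)
```

Running asyncio.run(main()) returns two tick counts. The first is zero because the loop never gets a turn during the inline call, while the second is positive because the heartbeat keeps running during the offloaded call.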

Impact:
This prevents building truly responsive agents that keep user-facing latency low while still getting the benefits of comprehensive evaluation.
