weizhehuang0827
Collaborator

  • Predict execution time separately for prefill and decode sequences.

  • The constant term of the linear regression no longer accumulates as the number of sequences in a batch increases.

  • Prediction accuracy test (MAPE, mean absolute percentage error):

    • Prefill-only scenario: ~12% -> ~8%
    • Decode-only scenario: very large -> ~5%
    • Mixed prefill and decode scenario: very large -> ~7%
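
For readers who want a concrete picture of the two ideas in this description, here is a minimal sketch. It is not the code from this PR: the names `LinearCoeffs`, `predict_batch_ms`, and `mape`, the choice of token counts as the regression feature, and the coefficient values are all assumptions made for illustration. It shows one linear model per phase, with each model's constant term added once per batch rather than once per sequence, plus the MAPE metric used to report accuracy.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinearCoeffs:
    """Hypothetical fitted coefficients for one phase (prefill or decode)."""
    intercept: float  # fixed cost per batch, in ms (assumed unit)
    slope: float      # cost per token, in ms (assumed feature)


def predict_batch_ms(
    prefill_tokens: Sequence[int],
    decode_context_lens: Sequence[int],
    prefill: LinearCoeffs,
    decode: LinearCoeffs,
) -> float:
    """Predict batch latency with a separate linear model per phase.

    Each intercept is added at most once per batch, so it does not
    grow with the number of sequences in the batch.
    """
    total = 0.0
    if prefill_tokens:
        total += prefill.intercept + prefill.slope * sum(prefill_tokens)
    if decode_context_lens:
        total += decode.intercept + decode.slope * sum(decode_context_lens)
    return total


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = (abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * sum(errors) / len(actual)


if __name__ == "__main__":
    # Made-up coefficients, for illustration only.
    prefill = LinearCoeffs(intercept=2.0, slope=0.01)
    decode = LinearCoeffs(intercept=1.0, slope=0.001)
    print(predict_batch_ms([100, 200], [1000, 500], prefill, decode))
    print(mape([100.0, 200.0], [110.0, 180.0]))
```

Under these assumptions, splitting one decode sequence into two with the same total context length leaves the prediction unchanged, which is the non-accumulating behavior the second bullet describes.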

@weizhehuang0827 changed the title from "feat: improving overall prediction accuracy through separating prefill and decode." to "feat: improve overall prediction accuracy through separating prefill and decode." on Sep 26, 2025
@yq33victor merged commit 13f773b into jd-opensource:main on Sep 28, 2025
4 checks passed
3 participants