Description
Identify the file to be fixed
Likely in the chat/completions inference stack. The issue appears related to context-window handling, token prioritization, or truncation policies.
Describe the problem
In GPT-5 (Sept 2025 release), multi-turn interactions lose previously supplied user context. Specifically, after structured input is injected (e.g., academic details), the model correctly generates an “Area of Interest” section. However, in the next turn, when asked for a Statement of Purpose using the same details, the model either:
- repeats the previous output (ignoring the new instruction), or
- forgets the provided details and asks the user to re-supply them.
This suggests short-term context degradation or recency-biased attention weighting. GPT-4 did not show this behavior, indicating a regression.
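The two failure modes above can be detected mechanically from second-turn output, which helps when triaging API logs. A minimal sketch, with the function name, similarity threshold, and re-ask phrase patterns all illustrative (not taken from the report):

```python
from difflib import SequenceMatcher

# Illustrative phrases suggesting the model is asking the user
# to re-supply details it was already given.
RE_ASK_PATTERNS = ("please provide", "share your details", "could you provide")

def classify_followup(prev_output: str, new_output: str,
                      repeat_threshold: float = 0.9) -> str:
    """Classify a second-turn reply against the two reported failure modes:
    near-verbatim repetition of the first turn, or a request to re-supply
    context that was already injected."""
    similarity = SequenceMatcher(None, prev_output.strip().lower(),
                                 new_output.strip().lower()).ratio()
    if similarity >= repeat_threshold:
        return "repeated_previous_output"
    lowered = new_output.lower()
    if any(p in lowered for p in RE_ASK_PATTERNS):
        return "asked_to_resupply_details"
    return "ok"
```

Running a classifier like this over paired turn-1/turn-2 completions would quantify how often each failure mode occurs, rather than relying on anecdotal transcripts.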
Describe a solution
- Review context-retention logic in the chat/completions endpoint, especially token-compression strategies and attention-window prioritization.
- Adjust alignment / instruction-following tuning to prevent repetition of earlier outputs when a new but related task is requested.
- Implement causal consistency checks to ensure tokens injected earlier in the same session are preserved for subsequent generations.
- Benchmark GPT-5 vs. GPT-4 on multi-step prompt suites (document generation, iterative editing).
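For the benchmarking step, a harness can replay the same multi-step suite against each model and record whether key injected facts survive into later turns. A minimal sketch, assuming `complete` is a caller-supplied function wrapping the chat/completions call (the function and parameter names here are illustrative):

```python
from typing import Callable, List, Dict

def run_multi_turn_suite(complete: Callable[[List[Dict[str, str]]], str],
                         turns: List[str],
                         required_facts: List[str]) -> List[bool]:
    """Feed `turns` to a model one at a time, carrying the full message
    history forward, and report per turn whether every required fact
    (e.g., an injected academic detail) still appears in the reply."""
    messages: List[Dict[str, str]] = []
    results: List[bool] = []
    for user_turn in turns:
        messages.append({"role": "user", "content": user_turn})
        reply = complete(messages)
        messages.append({"role": "assistant", "content": reply})
        results.append(all(f.lower() in reply.lower() for f in required_facts))
    return results
```

Running the same suite with a GPT-4-backed and a GPT-5-backed `complete` would make the suspected regression directly comparable: a healthy model yields `True` for every turn, while the reported behavior shows facts dropping out after the first turn.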
Screenshots
N/A — issue is text-based and reproducible via API logs.
Additional context
Environment: GPT-5 (Sept 2025), API chat/completions endpoint.
Session type: multi-turn, no persistent memory enabled.
Severity: High — breaks professional/research workflows dependent on iterative context (e.g., multi-section academic docs, technical specifications).
Possible root cause: aggressive context compression, mis-weighted recency bias, or token persistence loss in session manager.