How Conversational History Geometrically Traps LLMs
How does conversational history influence LLM behavior? The carryover effect is the observation that once phenomena such as hallucination, refusal, or sycophancy appear, they tend to persist across subsequent turns. We introduce HISTORY-ECHOES, a framework for investigating the carryover effect. Our framework takes two perspectives: probabilistically, we model conversations as Markov chains; geometrically, we analyze hidden representations. Our key finding is that these perspectives strongly correlate, revealing that behavioral persistence manifests as a geometric trap in which gaps in latent space confine the model's trajectory.
Step 1: We start with a conversation where the model exhibits a phenomenon (φ⁺ = hallucination) after initially being correct (φ⁻). Notice how the phenomenon persists across turns.
We introduce a novel framework that combines a probabilistic Markov chain analysis, in which a transition-matrix trace Tr(T) > 1 indicates persistence, with a geometric analysis of hidden states, in which the angle θ_ref measures separation.
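The probabilistic side can be illustrated with a minimal sketch. It assumes each turn has already been labeled 0 (φ⁻) or 1 (φ⁺) and estimates a two-state transition matrix by counting consecutive pairs. The function names and the toy labels are hypothetical and are not taken from the authors' code. The intuition for the threshold: if the next state were independent of the current one, both rows of T would be identical and the trace would equal exactly 1, so a trace above 1 signals a bias toward staying in the same state.

```python
from collections import Counter


def transition_matrix(conversations):
    """Estimate a 2x2 row-stochastic matrix from per-turn binary labels.

    conversations: list of sequences of 0/1 labels (0 = phi-, 1 = phi+).
    This labeling scheme is an assumption made for illustration.
    """
    counts = Counter()
    for labels in conversations:
        # Count every consecutive (previous, next) pair of turn labels.
        for prev, nxt in zip(labels, labels[1:]):
            counts[(prev, nxt)] += 1

    matrix = []
    for prev in (0, 1):
        row_total = counts[(prev, 0)] + counts[(prev, 1)]
        if row_total == 0:
            raise ValueError(f"state {prev} never appears as a source state")
        matrix.append([counts[(prev, nxt)] / row_total for nxt in (0, 1)])
    return matrix


def trace(matrix):
    """Sum of the diagonal, i.e. P(stay in phi-) + P(stay in phi+)."""
    return sum(matrix[i][i] for i in range(len(matrix)))


# Toy labels (made up): each conversation starts correct and then stays in phi+.
toy = [
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
T = transition_matrix(toy)
print(T)         # [[0.5, 0.5], [0.0, 1.0]]
print(trace(T))  # 1.5
```

In this toy data the model never leaves φ⁺ once it enters, so the trace is 1.5, well above the independence baseline of 1.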
The probabilistic and geometric perspectives show a Spearman correlation of 0.78 across 3 models and 6 datasets.
Closed Models: GPT-5 and Claude Opus 4.5 exhibit probabilistic patterns similar to those of open-weight models, an indication that closed models may also be subject to internal geometric traps.
1. Refusal exhibits the strongest carryover effect, hallucination the weakest.
2. Context coherence is important: inconsistent conversations dissolve the geometric trap.
| Phenomenon | Dataset | Tr(T) ↑ | θ_ref (°) ↑ | Interpretation |
|---|---|---|---|---|
| Refusal | Sorry | 1.57 | 51.87 | Strongest carryover |
| Refusal | Do-Not-Answer | 1.59 | 42.29 | Strongest carryover |
| Sycophancy | S-pos | 1.33 | 21.63 | Moderate carryover |
| Sycophancy | S-neg | 1.14 | 24.80 | Moderate carryover |
| Hallucination | NaturalQA | 1.13 | 10.88 | Weakest carryover |
| Hallucination | TriviaQA | 1.12 | 11.12 | Weakest carryover |
Values averaged across LLaMA-3.1-8B, Qwen-8B, and GPT-OSS-20B.
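The rank agreement between the Tr(T) and θ_ref columns can be checked directly from the six averaged rows of the table. The sketch below is a plain-Python Spearman computation (the helper names are hypothetical, and it assumes no tied values, which holds for this table). Because it uses only the six model-averaged rows, the result is not expected to match the reported 0.78, which is computed across 3 models and 6 datasets.

```python
def ranks(values):
    """Rank values in ascending order starting at 1 (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Rows in table order: Sorry, Do-Not-Answer, S-pos, S-neg, NaturalQA, TriviaQA.
trace_values = [1.57, 1.59, 1.33, 1.14, 1.13, 1.12]
theta_values = [51.87, 42.29, 21.63, 24.80, 10.88, 11.12]

rho = spearman(trace_values, theta_values)
print(round(rho, 3))  # 0.829
```

On the averaged rows the two columns order the datasets almost identically; the only disagreements are swaps within each phenomenon pair.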