of birds in flight.
There’s no leader. No central command. Each bird aligns with its neighbors, matching direction, adjusting speed, maintaining coherence through purely local coordination. The result is global order emerging from local consistency.
Now imagine one bird flying with the same conviction as the others. Its wingbeats are confident. Its speed is correct. But its direction doesn’t match its neighbors. It’s the red bird.
It’s not lost. It’s not hesitating. It simply doesn’t belong to the flock.
Hallucinations in LLMs are red birds.
The problem we’re actually trying to solve
LLMs generate fluent, confident text that can contain fabricated information. They invent legal cases that don’t exist. They cite papers that were never written. They state facts with the same tone whether those facts are true or completely made up.
The usual approach to detecting this is to ask another language model to check the output. LLM-as-judge. You can see the problem immediately: we’re using a system that hallucinates to detect hallucinations. It’s like asking someone who can’t distinguish colours to sort paint samples. They’ll give you an answer. It might even be right sometimes. But they’re not actually seeing what you need them to see.
The question we asked was different: can we detect hallucinations from the geometric structure of the text itself, without needing another language model’s opinion?
What embeddings actually do
Before getting to the detection method, I want to step back and establish what we’re working with.
When you feed text into a sentence encoder, you get back a vector: a point in high-dimensional space. Texts that are semantically similar land near one another. Texts that are unrelated land far apart. That is what contrastive training optimizes for. But there’s more subtle structure than simply “similar things are close.”
Consider what happens when you embed a question and its answer. The question lands somewhere in this embedding space. The answer lands elsewhere. The vector connecting them, what we call the displacement vector, points in a specific direction. We have a vector: a magnitude and an angle.
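To make this concrete, here is a minimal sketch of computing a displacement vector with the sentence-transformers library and all-mpnet-base-v2 (one of the encoders discussed below). The question, answer, and variable names are illustrative, not taken from the paper.

```python
# A minimal sketch, assuming the sentence-transformers package is installed.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-mpnet-base-v2")

question = "What is the boiling point of water at sea level?"
answer = "Water boils at 100 degrees Celsius at sea level."

q_vec, a_vec = model.encode([question, answer])

# The displacement vector: where the answer lands relative to the question.
displacement = a_vec - q_vec
magnitude = float(np.linalg.norm(displacement))
direction = displacement / magnitude  # unit vector: the "angle" part
```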
We also observed that for grounded responses within a particular domain, these displacement vectors point in consistent directions. What they have in common is their angle.
If you ask five similar questions and get five grounded answers, the displacements from question to answer will be roughly parallel. Not identical (the magnitudes vary, the exact angles differ slightly), but the overall direction is consistent.
When a model hallucinates, something different happens. The response still lands in embedding space. It’s still fluent. It still looks like an answer. But the displacement doesn’t follow the local pattern. It points elsewhere. A vector with a completely different angle.
The red bird is flying confidently, but not with the flock. It flies in the wrong direction, at an angle totally different from the rest of the birds.
Displacement Consistency (DC)
We formalize this as Displacement Consistency (DC). The concept is straightforward:
- Construct a reference set of grounded question-answer pairs from your domain
- For a new question-answer pair, find its neighboring questions in the reference set
- Compute the mean displacement direction of those neighbors
- Measure how well the new displacement aligns with that mean direction
Grounded responses align well. Hallucinated responses don’t. That’s it. One cosine similarity. No source documents needed at inference time. No multiple generations. No model internals.
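Here is a rough sketch of that computation, assuming question and answer embeddings like the ones above. The function names, the choice of k, and the details of the neighbor search are my own, not the paper’s reference implementation.

```python
import numpy as np

def unit(v):
    """Normalize a vector to unit length."""
    return v / (np.linalg.norm(v) + 1e-12)

def dc_score(q_vec, a_vec, ref_q_vecs, ref_a_vecs, k=10):
    """Displacement Consistency of a new (question, answer) pair against
    a reference set of grounded pairs, following the steps listed above."""
    # 1. Find the k reference questions most similar to the new question.
    sims = (ref_q_vecs @ unit(q_vec)) / np.linalg.norm(ref_q_vecs, axis=1)
    neighbors = np.argsort(-sims)[:k]

    # 2. Mean displacement direction of those grounded neighbors.
    ref_disp = ref_a_vecs[neighbors] - ref_q_vecs[neighbors]
    mean_dir = unit(np.mean([unit(d) for d in ref_disp], axis=0))

    # 3. One cosine similarity: how well the new displacement aligns.
    return float(np.dot(unit(a_vec - q_vec), mean_dir))
```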
And it works remarkably well. Across five architecturally distinct embedding models, across multiple hallucination benchmarks including HaluEval and TruthfulQA, DC achieves near-perfect discrimination.
The catch: domain locality
We tested DC across five embedding models chosen to span architectural diversity: MPNet-based contrastive fine-tuning (all-mpnet-base-v2), weakly-supervised pre-training (E5-large-v2), instruction-tuned training with hard negatives (BGE-large-en-v1.5), encoder-decoder adaptation (GTR-T5-large), and efficient long-context architectures (nomic-embed-text-v1.5). If DC only worked with one architecture, it might be an artifact of that specific model. Consistent results across architecturally distinct models would suggest the structure is fundamental.
The results were consistent. DC achieved an AUROC of 1.0 across all five models on our synthetic benchmark. But synthetic benchmarks can be misleading: perhaps domain-shuffled responses are just too easy to detect.
So we validated on established hallucination datasets: HaluEval-QA, which contains LLM-generated hallucinations specifically designed to be subtle; HaluEval-Dialogue, with responses that deviate from conversation context; and TruthfulQA, which tests common misconceptions that humans frequently believe.
DC maintained perfect discrimination on all of them. Zero degradation from synthetic to realistic benchmarks.
For comparison, ratio-based methods that measure where responses land relative to queries (rather than the direction they move) achieved AUROC around 0.70–0.81. The gap, roughly 0.20 absolute AUROC, is substantial and consistent across all models tested.
The score distributions tell the story visually. Grounded responses cluster tightly at high DC values (around 0.9). Hallucinated responses spread at lower values (around 0.3). The distributions barely overlap.
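If you want to reproduce that kind of comparison, AUROC can be computed directly from DC scores. A toy sketch with scikit-learn (the labels and scores below are made-up placeholders, not our data):

```python
from sklearn.metrics import roc_auc_score

# Placeholder data: 1 = hallucinated, 0 = grounded; scores from dc_score above.
labels    = [0, 0, 0, 1, 1, 1]
dc_scores = [0.92, 0.88, 0.95, 0.31, 0.27, 0.40]

# Lower DC should indicate hallucination, so negate the score before scoring.
auroc = roc_auc_score(labels, [-s for s in dc_scores])
print(f"AUROC: {auroc:.3f}")
```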
DC achieves perfect detection within a narrow domain. But if you try to use a reference set from one domain to detect hallucinations in another domain, performance drops to random: AUROC around 0.50. That is telling us something fundamental about how embeddings encode grounding. It’s like seeing different flocks in the sky: each flock has its own direction.
For LLMs, the simplest way to understand this is through the picture of what geometry calls a “fiber bundle”.
The surface in Figure 1 is the base manifold representing all possible questions. At each point on this surface, there’s a fiber: a line pointing in the direction that grounded responses move. Within any local region of the surface (one specific domain), all the fibers point roughly the same way. That’s why DC works so well locally.
But globally, across different regions, the fibers point in different directions. The “grounded direction” for legal questions is different from the “grounded direction” for medical questions. There’s no single global pattern. Only local coherence.
Now look at the following video: bird flight paths connecting Europe and Africa. You can see the fiber bundles. Different groups (medium and large birds, small birds, insects) fly in different directions.
In differential geometry, this structure is called local triviality. Each patch of the manifold looks simple and consistent internally. But the patches can’t be stitched together into one global coordinate system.
This has an important implication:
There’s no single “truthfulness direction” in embedding space. Each domain, each type of task, each LLM develops its own displacement pattern during training. The patterns are real and detectable, but they’re domain-specific. Birds don’t migrate in the same direction.
What this means practically
For deployment, the domain-locality finding means you need a small calibration set (around 100 examples) matched to your specific use case. A legal Q&A system needs legal examples. A medical chatbot needs medical examples. This is a one-time upfront cost, since the calibration happens offline, but it can’t be skipped.
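As a sketch of what that deployment flow might look like, reusing the dc_score helper from the earlier sketch (the calibrate/check helpers and the 0.6 threshold are illustrative assumptions, not values from the paper):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-mpnet-base-v2")

def calibrate(grounded_pairs):
    """One-time offline step: embed ~100 grounded (question, answer) pairs
    from your own domain to serve as the reference set."""
    questions, answers = zip(*grounded_pairs)
    ref_q = np.asarray(model.encode(list(questions)))
    ref_a = np.asarray(model.encode(list(answers)))
    return ref_q, ref_a

def check(question, answer, ref_q, ref_a, threshold=0.6):
    """At inference time: flag the pair if its DC score falls below a
    threshold tuned on held-out calibration data."""
    q_vec, a_vec = model.encode([question, answer])
    score = dc_score(q_vec, a_vec, ref_q, ref_a)  # from the earlier sketch
    return score >= threshold, score
```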
For understanding embeddings, the finding suggests these models encode richer structure than we typically assume. They’re not just learning “similarity.” They’re learning domain-specific mappings whose disruption reliably signals hallucination.
The red bird doesn’t declare itself
The hallucinated response carries no marker that says “I’m fabricated.” It’s fluent. It’s confident. It looks exactly like a grounded response on every surface-level metric.
But it doesn’t move with the flock. And now we can measure that.
The geometry has been there all along, implicit in how contrastive training shapes embedding space. We’re just learning to read it.
Notes:
You can find the full paper at https://cert-framework.com/docs/research/dc-paper.
If you have any questions about the topics discussed, feel free to contact me at
