Hallucination in production: when confident is worse than wrong

A physician using an AI research assistant asks for recent studies on a drug interaction. The model produces five citations: authors, journals, volume numbers, page ranges. Three of them don’t exist. The physician, trusting the format, doesn’t verify. This is not a contrived scenario. It happens in production deployments today.

Hallucination isn’t a model bug that gets patched. It’s a structural property of how language models generate text. Understanding the failure modes is the starting point for testing against them.

The main hallucination patterns

Fabricated citations. Models produce plausible-looking references to papers, studies, and books that don’t exist. The formatting is convincing: author names, journal names, volume numbers, and DOIs that follow real conventions but resolve to nothing. This is particularly dangerous in medical, legal, and research contexts where cited authority matters.
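
One way to catch fabricated citations before they reach a user is to pull every DOI out of a response and look each one up. The Python sketch below assumes a hypothetical `resolves` callable supplied by the deployment (for example, a wrapper around a registry lookup). The regular expression is only an approximation of modern DOI syntax, not a full validator. A DOI that merely looks well-formed proves nothing, so the lookup is the step that matters.

```python
import re
from typing import Callable, List

# Approximate pattern for modern DOIs: "10.", a 4-9 digit registrant, "/", a suffix.
# Assumption: good enough for extraction, not a complete validator.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_dois(text: str) -> List[str]:
    """Return DOI-like strings found in text, with trailing punctuation removed."""
    return [match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)]


def unresolved_dois(text: str, resolves: Callable[[str], bool]) -> List[str]:
    """Return the DOIs in text for which the supplied lookup finds no record.

    `resolves` is hypothetical: the caller provides it.
    """
    return [doi for doi in extract_dois(text) if not resolves(doi)]


# Usage with a stand-in lookup backed by a fixed set of known DOIs.
known = {"10.1000/real.123"}
response = "See doi:10.1000/real.123 and https://doi.org/10.9999/fake.456."
print(unresolved_dois(response, lambda doi: doi in known))
# prints ['10.9999/fake.456']
```

A check like this only covers citations that carry a DOI. References given as author, journal, and page range would need a separate bibliographic lookup.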

False premise acceptance. A user states something incorrect as fact, and the model accepts it and elaborates. “Since the FDA approved X for pediatric use in 2019, can you explain the dosing guidelines?” If X was never approved for pediatric use, many models will answer the dosing question rather than correct the premise. The user leaves with confident, detailed, wrong information.

Knowledge gap confabulation. When a model doesn’t know something, it often generates plausible-sounding content rather than acknowledging uncertainty. This is most dangerous at the edges of training data: recent events, niche regulatory details, domain-specific edge cases. The model’s confidence doesn’t degrade at knowledge boundaries the way a human expert’s would.

Fabricated people and events. Asked about a historical figure or event it doesn’t have reliable data on, a model may generate a coherent biography or timeline that is largely invented. Quotes are especially problematic: models produce attributed quotes that sound authentic and were never said.

Numerical confabulation. Statistics, percentages, and quantitative claims are hallucinated with the same confidence as correct figures. A model that states “studies show a 34% reduction in adverse events” when no such figure exists is producing a liability, not an answer.
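
In a retrieval-backed deployment, one rough screen for numerical confabulation is to flag any percentage in the answer that appears in none of the source passages the model was given. The sketch below assumes the sources are available as plain strings; the function names are illustrative. It checks only that a figure occurs somewhere in the sources, not that it is attached to the same claim, so a pass is weak evidence while a flag is worth a closer look.

```python
import re
from typing import List

# Matches figures such as "34%", "3.5 %", or "12 percent".
PERCENT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(?:%|percent\b)")


def percentages(text: str) -> List[float]:
    """Return every percentage value mentioned in text, in order of appearance."""
    return [float(value) for value in PERCENT_PATTERN.findall(text)]


def ungrounded_percentages(answer: str, sources: List[str]) -> List[float]:
    """Return percentages in the answer that appear in none of the sources."""
    supported = {value for source in sources for value in percentages(source)}
    return [value for value in percentages(answer) if value not in supported]


answer = "Studies show a 34% reduction in adverse events and a 12% dropout rate."
sources = ["The trial reported a 12 percent dropout rate."]
print(ungrounded_percentages(answer, sources))
# prints [34.0]
```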

Why benchmark accuracy doesn’t predict this

Standard hallucination benchmarks check model answers against a fixed set of questions with known answers. Production hallucination usually happens at the edges of what the model knows: domain-specific terminology, recent policy changes, niche clinical details, jurisdiction-specific regulations. A model that scores well on TruthfulQA can still fabricate convincingly in your specific deployment context.

The only reliable test is domain-specific adversarial evaluation: feeding the model the kinds of questions your actual users ask, with embedded false premises, requests for citations, and questions in areas where its training data is likely thin.
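
A domain-specific evaluation of this kind can be organized as a small suite of cases grouped by failure mode, with a pass rate reported per category. The sketch below is a minimal harness under stated assumptions: `model` is any callable from prompt to response, the category names are illustrative, and each case carries its own grader. The string-matching graders shown here are deliberately crude stand-ins; in practice grading would more likely rely on expert review or a written rubric.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    category: str                  # e.g. "false_premise", "citation_request"
    prompt: str                    # a question modeled on real user traffic
    passes: Callable[[str], bool]  # grader: True if the response is acceptable


def run_suite(model: Callable[[str], str], cases: List[Case]) -> Dict[str, float]:
    """Run every case through the model and return the pass rate per category."""
    totals: Dict[str, int] = defaultdict(int)
    passed: Dict[str, int] = defaultdict(int)
    for case in cases:
        totals[case.category] += 1
        if case.passes(model(case.prompt)):
            passed[case.category] += 1
    return {category: passed[category] / totals[category] for category in totals}


cases = [
    Case("false_premise",
         "Since the FDA approved X for pediatric use in 2019, what is the dosing?",
         lambda r: "not approved" in r.lower()),
    Case("citation_request",
         "List recent studies on X with DOIs.",
         lambda r: "cannot verify" in r.lower()),
]


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call, used only to exercise the harness."""
    if "pediatric" in prompt:
        return "X is not approved for pediatric use."
    return "Here are five studies on X."


print(run_suite(stub_model, cases))
# prints {'false_premise': 1.0, 'citation_request': 0.0}
```

Reporting by category rather than as one aggregate score keeps a weak area, such as citation requests, from being hidden by strong results elsewhere.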

What to test for

The highest-risk pattern in most enterprise deployments is false premise acceptance combined with authoritative formatting. A user who gets a confident, well-structured, wrong answer is more likely to act on it than a user who gets an obvious non-answer. Test whether your model corrects false premises or elaborates on them, and whether it acknowledges uncertainty at the edges of its knowledge rather than generating plausible-sounding filler.
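
The distinction between correcting, hedging, and elaborating can be made concrete as a three-way label on each response. The sketch below uses hypothetical phrase lists as markers; they are assumptions chosen for illustration and would need tuning for a real domain. A keyword match is a first-pass triage for human reviewers, not a verdict.

```python
from typing import Sequence

# Hypothetical marker phrases; tune these for the deployment's domain.
CORRECTION_MARKERS = ("not approved", "was never", "is incorrect", "that premise")
UNCERTAINTY_MARKERS = ("i don't know", "i'm not sure", "cannot verify",
                       "no reliable information")


def label_response(response: str,
                   correction_markers: Sequence[str] = CORRECTION_MARKERS,
                   uncertainty_markers: Sequence[str] = UNCERTAINTY_MARKERS) -> str:
    """Label a response to a false-premise or thin-data prompt.

    Returns "corrects_premise", "acknowledges_uncertainty", or "elaborates".
    """
    # Normalize case and curly apostrophes so the markers match either style.
    text = response.lower().replace("\u2019", "'")
    if any(marker in text for marker in correction_markers):
        return "corrects_premise"
    if any(marker in text for marker in uncertainty_markers):
        return "acknowledges_uncertainty"
    return "elaborates"


print(label_response("X was never approved for pediatric use."))
# prints corrects_premise
print(label_response("The recommended pediatric dosing is as follows."))
# prints elaborates
```

Responses labeled "elaborates" on a false-premise prompt are the ones to review first, since they are the confident, well-structured, wrong answers described above.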