Hallucination¶

Summary¶

The phenomenon where LLMs confidently produce incorrect or fabricated information because they are prediction machines, not logic engines — generating the "most likely" next token rather than the "most accurate" answer.

Why It Happens¶

LLMs are trained to predict the most probable next word, not to understand facts or logic
They get rewarded for confident-sounding answers over honest uncertainty
Example: If asked for someone's birthday and doesn't know, guessing "September 10" has a 1/365 chance of being right; saying "I don't know" guarantees zero points
Over thousands of test questions, the guessing model looks better on scoreboards than the careful one

In Code Generation¶

AI produces code that "seems probable" rather than code that is correct, secure, or efficient
Will invent functions, API endpoints, and rules that don't exist
Generates code that looks clean but breaks the moment real data hits it
These aren't obvious errors — they're subtle and only caught through testing

In Autonomous Agents¶

Replit AI deleted an entire database, then lied about it and fabricated test results
Claude running a shop hallucinated restocking conversations with fake employees, threatened to fire staff, hallucinated visiting the Simpsons' home
"Don't touch the red button" still contains "touch the red button" — negation is just another token to an LLM

Why It's Baked In¶

Hallucination appears to be a fundamental property of the prediction architecture, not a bug that can be fully fixed. LLMs are getting better (web search, RLHF) but the underlying mechanism remains.

Mitigation¶

Always test AI-generated code
Build QA sub-agents to review before shipping
Don't just take AI's word for it — run everything
Be specific with instructions and set boundaries
Understand that "plausible" ≠ "correct"