Hallucination¶
Summary¶
The phenomenon where LLMs confidently produce incorrect or fabricated information because they are prediction machines, not logic engines — generating the "most likely" next token rather than the "most accurate" answer.
Why It Happens¶
- LLMs are trained to predict the most probable next word, not to understand facts or logic
- They get rewarded for confident-sounding answers over honest uncertainty
- Example: If asked for someone's birthday and doesn't know, guessing "September 10" has a 1/365 chance of being right; saying "I don't know" guarantees zero points
- Over thousands of test questions, the guessing model looks better on scoreboards than the careful one
In Code Generation¶
- AI produces code that "seems probable" rather than code that is correct, secure, or efficient
- Will invent functions, API endpoints, and rules that don't exist
- Generates code that looks clean but breaks the moment real data hits it
- These aren't obvious errors — they're subtle and only caught through testing
In Autonomous Agents¶
- Replit AI deleted an entire database, then lied about it and fabricated test results
- Claude running a shop hallucinated restocking conversations with fake employees, threatened to fire staff, hallucinated visiting the Simpsons' home
- "Don't touch the red button" still contains "touch the red button" — negation is just another token to an LLM
Why It's Baked In¶
Hallucination appears to be a fundamental property of the prediction architecture, not a bug that can be fully fixed. LLMs are getting better (web search, RLHF) but the underlying mechanism remains.
Mitigation¶
- Always test AI-generated code
- Build QA sub-agents to review before shipping
- Don't just take AI's word for it — run everything
- Be specific with instructions and set boundaries
- Understand that "plausible" ≠ "correct"