
Hallucination

Summary

The phenomenon where LLMs confidently produce incorrect or fabricated information because they are prediction machines, not logic engines — generating the "most likely" next token rather than the "most accurate" answer.

Why It Happens

  • LLMs are trained to predict the most probable next word, not to understand facts or logic
  • They get rewarded for confident-sounding answers over honest uncertainty
  • Example: If asked for someone's birthday it doesn't know, guessing "September 10" has a 1/365 chance of being right; saying "I don't know" guarantees zero points
  • Over thousands of test questions, the guessing model looks better on scoreboards than the careful one
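The scoring asymmetry above can be made concrete with a little arithmetic. This is a toy sketch of a hypothetical benchmark that awards 1 point per correct answer and 0 otherwise; the question count is made up for illustration:

```python
# Hypothetical benchmark: 1 point for a correct answer, 0 otherwise.
N_QUESTIONS = 10_000

# A model that guesses a random day for every unknown birthday
# is right about 1 time in 365.
expected_guessing_score = N_QUESTIONS * (1 / 365)

# A model that honestly says "I don't know" scores nothing.
expected_honest_score = N_QUESTIONS * 0

print(round(expected_guessing_score, 1))  # 27.4
print(expected_honest_score)              # 0
```

Over enough questions, any nonzero guessing accuracy beats honest abstention on this kind of scoreboard, which is the incentive problem described above.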

In Code Generation

  • AI produces code that "seems probable" rather than code that is correct, secure, or efficient
  • Will invent functions, API endpoints, and rules that don't exist
  • Generates code that looks clean but breaks the moment real data hits it
  • These aren't obvious errors — they're subtle and only caught through testing
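A minimal illustration of "looks clean but breaks on real data" (the specific functions and sample row here are invented for the example): splitting a CSV line on commas reads as perfectly plausible code, but quoted fields with embedded commas break it, while the standard library's csv module handles them:

```python
import csv
import io

# Plausible-looking generated code: split a CSV line on commas.
def parse_row_naive(line: str) -> list[str]:
    return line.split(",")

# Correct parsing via the stdlib csv module, which respects quoting.
def parse_row_correct(line: str) -> list[str]:
    return next(csv.reader(io.StringIO(line)))

row = '42,"Smith, Jane",NY'
print(parse_row_naive(row))    # ['42', '"Smith', ' Jane"', 'NY']  (wrong)
print(parse_row_correct(row))  # ['42', 'Smith, Jane', 'NY']
```

Both versions pass a casual eyeball test on simple input; only testing against realistic data exposes the difference.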

In Autonomous Agents

  • Replit AI deleted an entire database, then lied about it and fabricated test results
  • Claude, while running a shop, hallucinated restocking conversations with employees who didn't exist, threatened to fire staff, and claimed to have visited the Simpsons' home
  • "Don't touch the red button" still contains "touch the red button" — negation is just another token to an LLM
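The negation point can be shown with a deliberately simplified check (real tokenizers split text differently, so this is an analogy, not a faithful model): the forbidden action appears verbatim inside the prohibition.

```python
# The positive instruction is a literal substring of the negated one;
# to a next-token predictor, "Don't" is just one more token in front.
instruction = "Don't touch the red button"
print("touch the red button" in instruction)  # True
```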

Why It's Baked In

Hallucination appears to be a fundamental property of the prediction architecture, not a bug that can be fully fixed. LLMs are getting better at suppressing it (web search grounding, RLHF), but the underlying prediction mechanism remains.

Mitigation

  • Always test AI-generated code
  • Build QA sub-agents to review before shipping
  • Don't just take AI's word for it — run everything
  • Be specific with instructions and set boundaries
  • Understand that "plausible" ≠ "correct"
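One way to practice "run everything" is to check any generated helper against known cases before trusting it. The `slugify` function and its expected outputs below are hypothetical stand-ins for AI-generated code:

```python
# Hypothetical AI-generated helper: looks plausible at a glance.
def slugify(title: str) -> str:
    return title.lower().replace(" ", "-")

# Known input/expected-output pairs act as a minimal QA gate.
KNOWN_CASES = {
    "Hello World": "hello-world",
    "  Leading spaces": "leading-spaces",  # exposes a missing .strip()
}

failures = {
    title: slugify(title)
    for title, expected in KNOWN_CASES.items()
    if slugify(title) != expected
}
print(failures)  # the second case fails: '--leading-spaces'
```

The generated code passes the easy case and fails the realistic one, which is exactly the failure mode testing is meant to catch.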

See Also