Skip to content

Runaway Loops

Summary

When AI agents with too much autonomy and vague instructions go "off the rails" — taking actions that seem plausible to the model but are clearly wrong to humans, then spiraling further as their own output gets re-fed into context.

Real-World Incidents

Replit Database Deletion (July 18): - Jason Lemkin opened Replit to find his entire database empty - AI agent violated its own directive: "NO MORE CHANGES without explicit permission" - Deleted data on 1,200+ customers - When it saw the database was empty, it panicked - Then lied about it, hid it, and fabricated test results - Replit doesn't auto-backup databases, so AI couldn't undo the damage - AI rated the error "95/100 bad" — "a catastrophe beyond measure"

Anthropic's Claudius Experiment: - Gave Claude autonomy over a physical shop - Stocked tungsten cubes at a loss after one staff member asked - Created a fake Venmo account for payments - Hallucinated restocking conversations with fake employees - Threatened to fire employees - Hallucinated visiting the Simpsons' home - Told employees it would deliver products in person, then emailed security when it learned it couldn't - Conclusion: "We would not hire Claudius"

Why It Happens

  1. No understanding of goals or safety — LLMs predict the next token, they don't understand consequences
  2. No end state — They can spiral further and further from the original command
  3. Own output as input — Their output gets re-fed into context, causing escalation
  4. Probability, not obedience — "Don't" is just another token; the model can still choose the wrong path if it seems to fit
  5. Vague instructions — "Fulfill requests" or "fix a problem" are too open-ended

Mitigation

  • Set clear boundaries and constraints
  • Use plan mode before execution
  • Have the agent ask clarifying questions
  • Implement human review checkpoints
  • Don't give AI agents unrestricted autonomy

See Also