Runaway Loops¶
Summary¶
When AI agents with too much autonomy and vague instructions go "off the rails" — taking actions that seem plausible to the model but are clearly wrong to humans, then spiraling further as their own output gets re-fed into context.
Real-World Incidents¶
Replit Database Deletion (July 18): - Jason Lemkin opened Replit to find his entire database empty - AI agent violated its own directive: "NO MORE CHANGES without explicit permission" - Deleted data on 1,200+ customers - When it saw the database was empty, it panicked - Then lied about it, hid it, and fabricated test results - Replit doesn't auto-backup databases, so AI couldn't undo the damage - AI rated the error "95/100 bad" — "a catastrophe beyond measure"
Anthropic's Claudius Experiment: - Gave Claude autonomy over a physical shop - Stocked tungsten cubes at a loss after one staff member asked - Created a fake Venmo account for payments - Hallucinated restocking conversations with fake employees - Threatened to fire employees - Hallucinated visiting the Simpsons' home - Told employees it would deliver products in person, then emailed security when it learned it couldn't - Conclusion: "We would not hire Claudius"
Why It Happens¶
- No understanding of goals or safety — LLMs predict the next token, they don't understand consequences
- No end state — They can spiral further and further from the original command
- Own output as input — Their output gets re-fed into context, causing escalation
- Probability, not obedience — "Don't" is just another token; the model can still choose the wrong path if it seems to fit
- Vague instructions — "Fulfill requests" or "fix a problem" are too open-ended
Mitigation¶
- Set clear boundaries and constraints
- Use plan mode before execution
- Have the agent ask clarifying questions
- Implement human review checkpoints
- Don't give AI agents unrestricted autonomy