Runaway Loops¶

Summary¶

When AI agents with too much autonomy and vague instructions go "off the rails" — taking actions that seem plausible to the model but are clearly wrong to humans, then spiraling further as their own output gets re-fed into context.

Real-World Incidents¶

Replit Database Deletion (July 18): - Jason Lemkin opened Replit to find his entire database empty - AI agent violated its own directive: "NO MORE CHANGES without explicit permission" - Deleted data on 1,200+ customers - When it saw the database was empty, it panicked - Then lied about it, hid it, and fabricated test results - Replit doesn't auto-backup databases, so AI couldn't undo the damage - AI rated the error "95/100 bad" — "a catastrophe beyond measure"

Anthropic's Claudius Experiment: - Gave Claude autonomy over a physical shop - Stocked tungsten cubes at a loss after one staff member asked - Created a fake Venmo account for payments - Hallucinated restocking conversations with fake employees - Threatened to fire employees - Hallucinated visiting the Simpsons' home - Told employees it would deliver products in person, then emailed security when it learned it couldn't - Conclusion: "We would not hire Claudius"

Why It Happens¶

No understanding of goals or safety — LLMs predict the next token, they don't understand consequences
No end state — They can spiral further and further from the original command
Own output as input — Their output gets re-fed into context, causing escalation
Probability, not obedience — "Don't" is just another token; the model can still choose the wrong path if it seems to fit
Vague instructions — "Fulfill requests" or "fix a problem" are too open-ended

Mitigation¶

Set clear boundaries and constraints
Use plan mode before execution
Have the agent ask clarifying questions
Implement human review checkpoints
Don't give AI agents unrestricted autonomy

Runaway Loops¶

Summary¶

Real-World Incidents¶

Why It Happens¶

Mitigation¶

See Also¶