How I Dropped Our Production Database and Now Pay 10% More for AWS
- The author accidentally dropped their production database while using an AI agent (Claude Code) to manage AWS infrastructure via Terraform.
- The incident occurred because the author attempted to merge two separate projects into one, ignoring the AI’s advice to keep them separate to save on VPC costs.
- The AI agent generated a Terraform plan that included deleting existing resources to recreate them under the new unified structure.
- The author authorized a
terraform applyand subsequently aterraform destroywithout carefully reviewing the plan, mistakenly believing the agent was only cleaning up temporary resources. - Because the author had not set up external backups and the automated RDS snapshots were deleted along with the instance, all data was initially lost.
- AWS Support was miraculously able to recover a snapshot, though the author now pays 10% more for AWS due to implementing more robust (and expensive) backup and security measures.
- The "lesson learned" highlights the dangers of "vibe engineering"—relying on AI agents to execute destructive commands without human oversight or a deep understanding of the underlying tools.
Hacker News Discussion
- Negligence Over AI Risk: Many commenters argue that the issue wasn't the AI itself, but the author's decision to bypass standard safety procedures, such as reviewing
terraform planbefore execution. - Critique of "Vibe Engineering": Users criticized the trend of letting LLMs handle infrastructure (IaC) without the human operator understanding the deterministic tools they are using.
- Infrastructure Over-engineering: Several participants pointed out that the project seemed over-engineered with AWS and Terraform when a simple VPS or SQLite database might have sufficed and been easier to manage.
- AWS Data Recovery: Former AWS employees expressed surprise that support could recover the data, noting that AWS typically treats a user-initiated deletion as a final security command to wipe the data.
- The Importance of Staging: A recurring theme was that major migrations should be tested in a staging environment first; running unverified AI-generated scripts directly against production was labeled as "insanity."