Feeling this for myself has only reinforced my belief that these models constitute an addictive substance.
senses addictive component. The lure of magic I think.
As I felt myself bored to tears in this process, I realized that if this is what becomes of software development, not only will it be a terrible occupation,
Author experienced boredom from vibecoding, and sees coding as becoming a terrible job that way
There's a fundamental problem with these tools beyond the capacity of any deployment strategy to solve: the tool requires expertise to validate, but its use diminishes expertise and stunts its growth
the paradox here is that using algogens erodes the skills needed to judge their output. I think we already see that in the code leak from Anthropic.
the way mistakes can compound make this a dangerous proposition. This all worked with my small project, but a bigger one, with more dependencies or more complex project structures, would likely flummox the models. And reviewing changes across a more complex codebase becomes at once more difficult and more critical as the project size increases.
Compounding mistakes, especially in bigger / more complex projects: a negative ratchet. This to me points back to vibe assistance for components, less for the overall project: creating libraries of functions rather than always starting from scratch, and keeping tech choices clear.
Programming agents might work, but there are a lot of ways they can go wrong. The amount of guardrails necessary to keep the model in check doesn't scale. The more steps of a process rely on generative output, the higher the potential for error and catastrophic failure. Any process that involves these models must strive to maximize determinism and minimize model variance.
Stay close to deterministic. n:: Guardrails grow faster than the model behavior they are meant to keep in check; that doesn't scale.
The help that showed up I turned to generative models not only as an experiment, but out of desperation. I had a need for code that did not exist. Nobody was going to help me build it, nor should I expect help for a project such as this. In the past, I would have cobbled together something quick-and-dirty, probably at the expense of my mental and physical health to get it done. This time, I had another option. In this limited scope, the model was beneficial to all involved: myself, TTI's community, and my family
A reiteration of 'the help that showed up' and the effects it had on the author and his environment.
More plainly: I have no reason to expect this technology can succeed at the same level in law, medicine, or any other highly human, highly subjective occupation.
I think the author is wrong here; it depends on application scope. In law, translation is key, e.g. in the EU, where verdicts in 27 languages pertain to the same market. In medicine it's not algogens but other AI that is being deployed and scaled (e.g. for analysing imaging like radiology / MRI).
The "works" in "it works" is scoped strictly to coding tasks. I have no evidence, and seemingly no one else does, that the same kind of success is available outside the world of highly structured language with deterministic outputs.
important caveat and back to [[The arc of vibe coding bends towards determinism 20260214145739]]
One time, during a security fix, the model's code introduced a non-obvious DoS vector. Well, obvious from the perspective of how the code would be deployed, but not from the code itself. That's exactly why reading each change was so important. Once the issue was pointed out, the model produced code that both addressed the security issue and avoided the DoS.
this is a core issue: the algogen has no concept of 'deployment'; it only has the code itself. Even for simple things, not just security as here, it cannot look at the intention of a project outside the project. Is this a better anchor for the human in the loop: the connection to reality / intention?
I do know the code very well by way of careful reading of the code, the relevant libraries' documentation, and the proposed changes during the code's creation. But that safety comes down to human discipline. It is entirely possible (probable?) to take the easy road and trust the model to do the right thing.
n:: having a human in the loop for vibe coding entirely comes down to discipline. Which is a recipe for it not happening. (and for me points to having a fixed starter set of instructions etc, as opposed to coming up with them each time).
In this case, I was the audience rather than the author. I had to back my way into understanding the code, carefully reading and understanding the structure after it had been built. This is much more common for developers who work on large teams or with codebases they didn't build themselves. I have not had as much experience with that kind of development, so this all felt a little awkward.
vibe coding makes you the audience watching the process, and no longer the author. but: says this is a role various people already have in coding teams. Which may be relevant when looking at adoption patterns, imo.
Although I read each proposed change, knowing the codebase deeply was much more challenging. When I write a new application myself, I'm building an elaborate house of cards in my head, a gossamer structure of interlinked ideas and goals. It's a story I'm telling myself in code—and ultimately, a story I share with users.
reading everything during production is not the same as producing it. A mental model of the entire construct is not created. Interesting quote: you no longer have a story in your head about what it is you're doing. No helicopter view. The making is scaffolding for your understanding, and that is being cut out.
Methodology

To maximize determinism, each step of the build used test-driven development (TDD). Using the Markdown planning file as a starting point, the model generated tests for functions that would define the features, then implemented each in turn. After each coding round, cargo check and cargo test were run to confirm compilation and test passing. I reviewed every line of code the model generated. For initial drafts, very little had to change. Now to be fair, this is not a particularly complex app. It's a basic CRUD app with some specialized requirements. Still, getting it all right, including auth and data handling, really mattered.

After the initial drafting phase, I went through the entire app and made a list of tasks for improvement/change in the codebase. This TODO.md became the new starting point for model context in plan creation. Unexpectedly, as items were addressed in the document, the model updated the file with checkmarks and details of implementations. This was not an instruction I gave the model, but it was behavior I liked, since it created a trail of accountability.

After all the features I wanted were functional, context was cleared entirely and new instructions were provided to the model. Instead of acting as a software developer, I instructed the model to perform as a security auditor and secure code expert, finding vulnerabilities in the code and recommending remediations. The findings would be written to a FINDINGS.md file, keeping with the "Plan, Document, Execute, Log" pattern established in earlier rounds.
Stated aim was to maximise determinism. That sounds like a good point for any vibing effort. Also ties in with my general sentiment [[The arc of vibe coding bends towards determinism 20260214145739]]
test driven development. Markdown plan first, then function tests for feature definitions, only then making the functions. Reviewed all code himself. Also makes me wonder about building my own libraries from vibed results. (e.g. the forms I use, the css, the diff functions, although most of the interactive stuff I've written myself already, and use them as components)
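The test-first loop described in the quote above (plan → generated tests → implementation → run the checks) can be sketched generically. A minimal illustration in Python rather than the author's Rust, with a hypothetical `slugify` feature standing in for a real one:

```python
import re

def test_slugify():
    # Step 1: the test defines the feature before any implementation exists,
    # analogous to the model generating tests from the Markdown plan.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"

def slugify(text: str) -> str:
    # Step 2: the implementation is written only to satisfy the test.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)

# Step 3: run the tests after each coding round, analogous to
# `cargo check` / `cargo test` in the author's Rust workflow.
test_slugify()
print("all tests passed")
```

The feature and function names here are illustrative assumptions, not code from the article; only the plan → test → implement → verify sequence is taken from it.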
[[Doug Belshaw p]] vibecoded a calendly replacement after switching to Proton.
By Margaret-Anne Storey, Professor of Computer Science, University of Victoria and Canada Research Chair in Human and Social Aspects of Software Engineering https://en.wikipedia.org/wiki/Margaret-Anne_Storey
added to feedreader
I've experienced this myself on some of my more ambitious vibe-code-adjacent projects. I've been experimenting with prompting entire new features into existence without reviewing their implementations and, while it works surprisingly well, I've found myself getting lost in my own projects. I no longer have a firm mental model of what they can do and how they work, which means each additional feature becomes harder to reason about, eventually leading me to lose the ability to make confident decisions about where to go next.
Vibecoding and adjacent projects lead to losing the overview of your own work: no mental model of what you made, as you would otherwise have. Extending something becomes harder over time, because you don't know what you're actually extending from. This is a counter-force (not a counter-argument, I think) to the notion of genAI having deterministic automation as its endpoint.
https://web.archive.org/web/20260215105347/https://simonwillison.net/2026/Feb/15/cognitive-debt/
Simon Willison on cognitive debt (the consequences of vibecoding more or less).
llm code environment as claude code alternative.
OpenHands: Capable but Requiring Intervention
I connected my repository to OpenHands through the All Hands cloud platform. I pointed the agent at a specific issue, instructing it to follow the detailed requirements and create a pull request when complete. The conversational interface displayed the agent's reasoning as it worked through the problem, and the approach appeared logical.
Also used openhands for a test. says it needs intervention (not fully delegated iow)
When an agent doesn't deliver what you expected, the temptation is to engage in corrective dialogue, to guide the agent toward the right solution through feedback. While some agents support this interaction model, it's often more valuable to treat failures as specification bugs. Ask yourself: what information was missing that caused the agent to make incorrect decisions? What assumptions did I fail to make explicit?
This approach builds your specification-writing skills rapidly. After a few iterations, you develop an intuition for what needs to be explicit, what edge cases to cover, and how to structure instructions for maximum clarity. The goal isn't perfection on the first try, but rather continuous improvement in your ability to delegate effectively.
don't iterate for corrections. Redo and iterate the instructions. This is a bit like prompt engineering the oracle, no? AI isn't the issue, it's your instructions. Up to a point, but in flux too.
One effective technique for creating comprehensive specifications is to use AI assistants that have full awareness of your codebase.
ah, turtles all the way down. using AI to generate the task specs.
A complete task specification goes beyond describing what needs to be done. It should encompass the entire development lifecycle for that specific task. Think of it as creating a mini project plan that an intelligent but literal agent can follow from start to finish.
A discrete task description to be treated like a project in the GTD sense (anything above 2 steps is a project). At what point is this overkill? Templating this project description may well mean you already have the solution once you've written it.
we tend to underspecify because we're exploring, experimenting, and can provide immediate course corrections. We might type a quick prompt, see what the AI produces, and refine from there. This exploratory approach works when you're actively engaged
Indeed, as mentioned above too. n:: My sense is that this is a learning mode akin to the haptic feedback of working on things by hand.
The fundamental rule for working with asynchronous agents contradicts much of modern agile thinking: create complete and precise task definitions upfront. This isn't about returning to waterfall methodologies, but rather recognizing that when you delegate to an AI agent, you need to provide all the context and guidance that you would naturally provide through conversation and iteration with a human developer.
What I mentioned above: to delegate you need to be able to fully describe and provide context for a discrete task.
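What such a complete, upfront task definition could look like as a reusable template (my own hypothetical sketch, not a template from the article):

```markdown
## Task: <one-line goal>

### Context
- Relevant files/modules and why they matter
- Constraints: language version, style rules, dependencies to avoid

### Requirements
- Explicit, testable behaviours, including edge cases
- What is explicitly out of scope

### Acceptance criteria
- Commands that must pass (test suite, linter, build)

### Deliverable
- Branch name / PR expectations, what to include in the PR description
```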
The ecosystem of asynchronous coding agents is rapidly evolving, with each offering different integration points and capabilities:
- GitHub Copilot Agent: accessible through GitHub by assigning issues to the Copilot user, with additional VS Code integration
- Codex: OpenAI's hosted coding agent, available through their platform and accessible from ChatGPT
- OpenHands: open-source agent available through the All Hands web app or self-hosted deployments
- Jules: Google Labs product with GitHub integration capabilities
- Devin: the pioneering coding agent from Cognition that first demonstrated this paradigm
- Cursor background agents: embedded directly in the Cursor IDE
- CI/CD integrations: many command-line tools can function as asynchronous agents when integrated into GitHub Actions or continuous integration scripts
A list of async coding agents as of #2025/08: GitHub, OpenAI, Google mentioned. OpenHands is the only open-source one mentioned. Notes that command-line tools can be used too (if integrated with e.g. GitHub Actions to tie into the coding environment).
- [ ] check out the OpenHands agent by All Hands
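The CI/CD route mentioned above (a CLI tool acting as an async agent inside GitHub Actions) might look roughly like this workflow file. Everything here is a hypothetical sketch: the agent CLI name, its flags, and the secret name are assumptions, not a documented setup; only `actions/checkout` is a real action.

```yaml
# Hypothetical sketch: run a CLI coding agent when an issue gets a label.
# `some-agent-cli`, its flags, and MODEL_API_KEY are illustrative assumptions.
name: agent-on-issue
on:
  issues:
    types: [labeled]
jobs:
  run-agent:
    if: github.event.label.name == 'agent-task'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run coding agent against the issue
        run: |
          some-agent-cli --task "${{ github.event.issue.title }}" --create-pr
        env:
          MODEL_API_KEY: ${{ secrets.MODEL_API_KEY }}
```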
isn't just about saving time — it's about restructuring how software gets built.
not just time saving, but a restructuring. So, any description of how the structure changes (before / after style) further down?
several of these tasks running in parallel, each agent working independently on different parts of your codebase
Do multiple things in parallel. Note: the assumption here is that the context is coding.
why asynchronous agents deserve more attention than they currently receive, provides practical guidelines for working with them effectively, and shares real-world experience using multiple agents to refactor a production codebase.
3 things in this article: - why async agents deserve more attention - practical guidelines for effective deployment - real world examples
asynchronous coding agents represent a fundamentally different — and potentially more powerful — approach to AI-augmented software development. These background agents accept complete work items, execute them independently, and return finished solutions while you focus on other tasks.
Async coding agents are a different kind of vibe coding: you give one a defined, more complex task and it works in the background and comes back with an outcome.
https://web.archive.org/web/20260125124811/https://elite-ai-assisted-coding.dev/p/working-with-asynchronous-coding-agents Eleanor Berger, August 2025.
on asynchronous coding agents
One of the people [[💎 Claude + Obsidian Got a Level Up]] mentioned. Based in AMS, Kent de Bruin.
[[Eleanor Konik p]] on how her work in Obsidian with Claude Code is changing
By [[Frank Meeuwsen p]], in connection with the #2026/01/30 session in Utrecht
Ethan Mollick prompts Claude AI to come up with something that people will pay for and could make $1k/month
(via [[Stephen Downes p]])
Cursor is an AI using code editor. It connects only to US based models (OpenAI, Anthropic, Google, xAI), and your pricing tier goes piecemeal to whatever model you're using.
Both an editor, and a CLI environment, and integrations with things like Slack and Github. This seems a building block for US-centered agentic AI silo forming for dev teams.
I have yet to try a local model that handles Bash tool calls reliably enough for me to trust that model to operate a coding agent on my device.
this. Need to understand better conceptually the diff set-ups I have, and how I might switch between them.
My excitement for local LLMs was very much rekindled. The problem is that the big cloud models got better too—including those open weight models that, while freely available, were far too large (100B+) to run on my laptop.
Cloud models still improved much more than local models. Coding agents made a huge difference; with them, Claude Code becomes very useful.
The year of programming on my phone: I wrote significantly more code on my phone this year than I did on my computer.
vibe coding leads to a shift in using your phone to code. (not likely me, I hardly try to do anything productive on the limited interface my phone provides, but if you've already made the switch to speaking instructions I can see how this shift comes about)
The year of vibe coding: In a tweet in February Andrej Karpathy coined the term “vibe coding”, with an unfortunately long definition (I miss the 140 character days) that many people failed to read all the way to the end:
Ah, didn't know. Vibe-coding is a term coined by Andrej Karpathy in #2025/02 in a tweet. That took on a life of its own!
There’s a new kind of coding I call “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It’s possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like “decrease the padding on the sidebar by half” because I’m too lazy to find it. I “Accept All” always, I don’t read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I’d have to really read through it for a while. Sometimes the LLMs can’t fix a bug so I just work around it or ask for random changes until it goes away. It’s not too bad for throwaway weekend projects, but still quite amusing. I’m building a project or webapp, but it’s not really coding—I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
vibecoding original description by Andrej Karpathy
Quickly distorted to mean any code created w llm assistance. Note: [[Martijn Aslander p]] follows this dev quite closely (dictation, accept always, it mostly works)
The year I built 110 tools: I started my tools.simonwillison.net site last year as a single location for my growing collection of vibe-coded / AI-assisted HTML+JavaScript tools. I wrote several longer pieces about this throughout the year:
- Here’s how I use LLMs to help me write code
- Adding AI-generated descriptions to my tools collection
- Building a tool to copy-paste share terminal sessions using Claude Code for web
- Useful patterns for building HTML tools (my favourite post of the bunch)
The new browse all by month page shows I built 110 of these in 2025!
Simon Willison vibe coded over 100 personal tools in 2025. This chimes with what Frank and Martijn were suggesting. Up above he also indicates that this became possible at this scale only in 2025.
I love the asynchronous coding agent category. They’re a great answer to the security challenges of running arbitrary code execution on a personal laptop and it’s really fun being able to fire off multiple tasks at once—often from my phone—and get decent results a few minutes later.
async coding agents: prompt and forget
Vendor-independent options include GitHub Copilot CLI, Amp, OpenCode, OpenHands CLI, and Pi. IDEs such as Zed, VS Code and Cursor invested a lot of effort in coding agent integration as well.
non-vendor related coding agents. - [ ] which of these can I run locally? / integrate into VS Code
The major labs all put out their own CLI coding agents in 2025:
- Claude Code
- Codex CLI
- Gemini CLI
- Qwen Code
- Mistral Vibe
list of command line coding agents by major vendors
Personal tools built with vibecoding by Simon Willison. The resulting tools are mostly HTML and JavaScript, some Python.
You need three things. A Mac with Xcode, which is free to download. A $99 per year Apple Developer account. And an AI tool that can write code based on your descriptions.
Three elements for making his iPhone apps:
- Xcode (which I use)
- An Apple Developer account (99 USD / yr)
- AI support in coding (he uses Claude Code; cf. [[Mijn vibe coding set-up 20251220143401]])
Writing a good CLAUDE.md