47 Matching Annotations
  1. Apr 2026
    1. There's a fundamental problem with these tools beyond the capacity of any deployment strategy to solve: the tool requires expertise to validate, but its use diminishes expertise and stunts its growth

      The paradox here is that using algogens erodes the very skills needed to judge their output. I think we already see that in the code leak from Anthropic.

    2. the way mistakes can compound makes this a dangerous proposition. This all worked with my small project, but a bigger one, with more dependencies or more complex project structures, would likely flummox the models. And reviewing changes across a more complex codebase becomes at once more difficult and more critical as the project size increases.

      Compounding mistakes, especially in bigger / more complex projects: a negative ratchet. This to me points back to vibe assistance in components, less in the overall project: creating libraries of functions rather than always starting from scratch, and keeping tech choices clear.

    3. Programming agents might work, but there are a lot of ways they can go wrong. The amount of guardrails necessary to keep the model in check doesn't scale. The more steps of a process rely on generative output, the higher the potential for error and catastrophic failure. Any process that involves these models must strive to maximize determinism and minimize model variance.

      Stay close to deterministic. n:: Guardrails grow faster than what they keep in check; it doesn't scale.

    4. The help that showed up: I turned to generative models not only as an experiment, but out of desperation. I had a need for code that did not exist. Nobody was going to help me build it, nor should I expect help for a project such as this. In the past, I would have cobbled together something quick-and-dirty, probably at the expense of my mental and physical health to get it done. This time, I had another option. In this limited scope, the model was beneficial to all involved: myself, TTI's community, and my family.

      A reiteration of "the help that showed up" and the effects it had on the author and his environment.

    5. More plainly: I have no reason to expect this technology can succeed at the same level in law, medicine, or any other highly human, highly subjective occupation.

      I think the author is wrong here; it depends on application scope. In law, translation is key, e.g. in the EU, where we have verdicts in 27 languages pertaining to the same market. In medicine it's not algogens but other AI that is being deployed and scaled (e.g. for analysing imaging like radiology / MRI).

    6. The "works" in "it works" is scoped strictly to coding tasks. I have no evidence, and seemingly no one else does, that the same kind of success is available outside the world of highly structured language with deterministic outputs.

      important caveat and back to [[The arc of vibe coding bends towards determinism 20260214145739]]

    7. One time, during a security fix, the model's code introduced a non-obvious DoS vector. Well, obvious from the perspective of how the code would be deployed, but not from the code itself. That's exactly why reading each change was so important. Once the issue was pointed out, the model produced code that both addressed the security issue and avoided the DoS.

      This is a core issue: the algogen has no concept of 'deployment' and only has the code itself. Even for simple things, not just security as here, it will not be able to look at the intention of a project from outside the project. Is this a better anchor for the human in the loop: the connection to reality / intention?

    8. I do know the code very well by way of careful reading of the code, the relevant libraries' documentation, and the proposed changes during the code's creation. But that safety comes down to human discipline. It is entirely possible (probable?) to take the easy road and trust the model to do the right thing.

      n:: having a human in the loop for vibe coding entirely comes down to discipline. Which is a recipe for it not happening. (and for me points to having a fixed starter set of instructions etc, as opposed to coming up with them each time).

    9. In this case, I was the audience rather than the author. I had to back my way into understanding the code, carefully reading and understanding the structure after it had been built. This is much more common for developers who work on large teams or with codebases they didn't build themselves. I have not had as much experience with that kind of development, so this all felt a little awkward.

      Vibe coding makes you the audience watching the process, no longer the author. But: he says this is a role various people already have on coding teams, which may be relevant when looking at adoption patterns, imo.

    10. Although I read each proposed change, knowing the codebase deeply was much more challenging. When I write a new application myself, I'm building an elaborate house of cards in my head, a gossamer structure of interlinked ideas and goals. It's a story I'm telling myself in code—and ultimately, a story I share with users.

      reading everything during production is not the same as producing it. A mental model of the entire construct is not created. Interesting quote: you no longer have a story in your head about what it is you're doing. No helicopter view. The making is scaffolding for your understanding, and that is being cut out.

    11. Methodology: To maximize determinism, each step of the build used test-driven development (TDD). Using the Markdown planning file as a starting point, the model generated tests for functions that would define the features, then implemented each in turn. After each coding round, cargo check and cargo test were run to confirm compilation and passing tests. I reviewed every line of code the model generated. For initial drafts, very little had to change.

      Now, to be fair, this is not a particularly complex app. It's a basic CRUD app with some specialized requirements. Still, getting it all right, including auth and data handling, really mattered.

      After the initial drafting phase, I went through the entire app and made a list of tasks for improvement/change in the codebase. This TODO.md became the new starting point for model context in plan creation. Unexpectedly, as items were addressed in the document, the model updated the file with checkmarks and details of implementations. This was not an instruction I gave the model, but it was behavior I liked, since it created a trail of accountability.

      After all the features I wanted were functional, context was cleared entirely and new instructions were provided to the model. Instead of acting as a software developer, I instructed the model to perform as a security auditor and secure code expert, finding vulnerabilities in the code and recommending remediations. The findings would be written to a FINDINGS.md file, in keeping with the "Plan, Document, Execute, Log" pattern established in earlier rounds.

      Stated aim was to maximise determinism. That sounds like a good point for any vibing effort. Also ties in with my general sentiment [[The arc of vibe coding bends towards determinism 20260214145739]]

      Test-driven development: Markdown plan first, then function tests for feature definitions, only then making the functions. He reviewed all code himself. Also makes me wonder about building my own libraries from vibed results (e.g. the forms I use, the css, the diff functions; although most of the interactive stuff I've written myself already, and use as components).
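The check-after-each-round loop in this methodology can be sketched minimally as below. This is an illustration, not the author's actual tooling: the real commands were `cargo check` and `cargo test`, stubbed here with `true` so the sketch is self-contained.

```python
# Minimal sketch of the per-round gate: after each model coding round,
# run the project's compile check and test suite before moving on to
# line-by-line human review. Command strings are placeholders; in the
# article's Rust project they would be "cargo check" and "cargo test".
import subprocess

def run_round(check_cmd: str, test_cmd: str) -> str:
    """Run one coding round's checks; stop at the first failure."""
    for label, cmd in (("check", check_cmd), ("test", test_cmd)):
        if subprocess.run(cmd, shell=True).returncode != 0:
            return f"{label} failed: review the model's changes"
    return "round OK: proceed to line-by-line review"

print(run_round("true", "true"))
```

The point of the sketch is the ordering: the deterministic gates (compiler, tests) run before the human review, so the reviewer only spends attention on code that at least compiles and passes its own generated tests.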

  2. Feb 2026
    1. I've experienced this myself on some of my more ambitious vibe-code-adjacent projects. I've been experimenting with prompting entire new features into existence without reviewing their implementations and, while it works surprisingly well, I've found myself getting lost in my own projects. I no longer have a firm mental model of what they can do and how they work, which means each additional feature becomes harder to reason about, eventually leading me to lose the ability to make confident decisions about where to go next.

      Vibe coding and adjacent projects lead to losing overview of your own work: no mental model of what you made, as you would have otherwise. Extending something becomes harder over time, because you don't know what you're actually extending from. This is a counterforce (not a counterargument, I think) to the notion of genAI having deterministic automation as its endpoint.

  3. Jan 2026
    1. OpenHands: Capable but Requiring Intervention. I connected my repository to OpenHands through the All Hands cloud platform. I pointed the agent at a specific issue, instructing it to follow the detailed requirements and create a pull request when complete. The conversational interface displayed the agent's reasoning as it worked through the problem, and the approach appeared logical.

      Also used OpenHands for a test. Says it needs intervention (not fully delegated, in other words).

    2. When an agent doesn't deliver what you expected, the temptation is to engage in corrective dialogue — to guide the agent toward the right solution through feedback. While some agents support this interaction model, it's often more valuable to treat failures as specification bugs. Ask yourself: what information was missing that caused the agent to make incorrect decisions? What assumptions did I fail to make explicit? This approach builds your specification-writing skills rapidly. After a few iterations, you develop an intuition for what needs to be explicit, what edge cases to cover, and how to structure instructions for maximum clarity. The goal isn't perfection on the first try, but rather continuous improvement in your ability to delegate effectively.

      don't iterate for corrections. Redo and iterate the instructions. This is a bit like prompt engineering the oracle, no? AI isn't the issue, it's your instructions. Up to a point, but in flux too.

    3. A complete task specification goes beyond describing what needs to be done. It should encompass the entire development lifecycle for that specific task. Think of it as creating a mini project plan that an intelligent but literal agent can follow from start to finish.

      A discrete task description to be treated like a project in the GTD sense (anything above 2 steps is a project). At what point is this overkill? Templating this project description may well mean you already have the solution once you've written it.

    4. The fundamental rule for working with asynchronous agents contradicts much of modern agile thinking: create complete and precise task definitions upfront. This isn't about returning to waterfall methodologies, but rather recognizing that when you delegate to an AI agent, you need to provide all the context and guidance that you would naturally provide through conversation and iteration with a human developer.

      What I mentioned above: to delegate you need to be able to fully describe and provide context for a discrete task.
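A minimal sketch of what such a complete, upfront task specification might look like. The project, file layout, and acceptance criteria below are invented for illustration; the articles don't give a template, only the principle that context, scope, and "done" must be explicit.

```markdown
# Task: Add CSV export to the monthly report page

## Context
- Stack: Rust (axum), SQLite; report code lives in src/reports/ (hypothetical layout)
- Users can already view reports as HTML

## Requirements
- Add an "Export CSV" button to /reports/monthly
- CSV columns: date, category, amount (ISO 8601 dates, UTF-8)
- No new dependencies without justification in the PR description

## Out of scope
- PDF export; localisation of column headers

## Done when
- cargo check and cargo test pass
- New tests cover CSV edge cases (empty month, quoted fields)
- A pull request is opened with a summary of the changes
```

The "Out of scope" and "Done when" sections do the work that conversation and iteration would do with a human developer: they close off the decisions the agent would otherwise make on its own.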

    5. The ecosystem of asynchronous coding agents is rapidly evolving, with each offering different integration points and capabilities:
      • GitHub Copilot Agent: Accessible through GitHub by assigning issues to the Copilot user, with additional VS Code integration
      • Codex: OpenAI's hosted coding agent, available through their platform and accessible from ChatGPT
      • OpenHands: Open-source agent available through the All Hands web app or self-hosted deployments
      • Jules: Google Labs product with GitHub integration capabilities
      • Devin: The pioneering coding agent from Cognition that first demonstrated this paradigm
      • Cursor background agents: Embedded directly in the Cursor IDE
      • CI/CD integrations: Many command-line tools can function as asynchronous agents when integrated into GitHub Actions or continuous integration scripts

      A list of async coding agents in #2025/08: GitHub, OpenAI, Google mentioned. OpenHands is the only open source one mentioned. Notes that command line tools can be used as async agents (if integrated with e.g. GitHub Actions to tie into the coding environment).

      - [ ] check out OpenHands agent by All Hands

    6. why asynchronous agents deserve more attention than they currently receive, provides practical guidelines for working with them effectively, and shares real-world experience using multiple agents to refactor a production codebase.

      Three things in this article:
      - why async agents deserve more attention
      - practical guidelines for effective deployment
      - real-world examples

    7. asynchronous coding agents represent a fundamentally different — and potentially more powerful — approach to AI-augmented software development. These background agents accept complete work items, execute them independently, and return finished solutions while you focus on other tasks.

      Async coding agents are a different kind of vibe coding: you give the agent a defined, more complex task and it will work in the background and come back with an outcome.

    1. Cursor is an AI-assisted code editor. It connects only to US-based models (OpenAI, Anthropic, Google, xAI), and your pricing tier is spent piecemeal on whatever model you're using.

      Both an editor and a CLI environment, with integrations for things like Slack and GitHub. This seems a building block for US-centered agentic-AI silo forming for dev teams.

    1. My excitement for local LLMs was very much rekindled. The problem is that the big cloud models got better too—including those open weight models that, while freely available, were far too large (100B+) to run on my laptop.

      Cloud models got much better still than local models. Coding agents made a huge difference; with them, Claude Code becomes very useful.

    2. The year of programming on my phone: I wrote significantly more code on my phone this year than I did on my computer.

      vibe coding leads to a shift in using your phone to code. (not likely me, I hardly try to do anything productive on the limited interface my phone provides, but if you've already made the switch to speaking instructions I can see how this shift comes about)

    3. The year of vibe coding: In a tweet in February Andrej Karpathy coined the term “vibe coding”, with an unfortunately long definition (I miss the 140 character days) that many people failed to read all the way to the end:

      Ah, didn't know. Vibe coding is a term coined by Andrej Karpathy in #2025/02 in a tweet. That took on a life of its own!

    4. There’s a new kind of coding I call “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It’s possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like “decrease the padding on the sidebar by half” because I’m too lazy to find it. I “Accept All” always, I don’t read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I’d have to really read through it for a while. Sometimes the LLMs can’t fix a bug so I just work around it or ask for random changes until it goes away. It’s not too bad for throwaway weekend projects, but still quite amusing. I’m building a project or webapp, but it’s not really coding—I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.

      vibecoding original description by Andrej Karpathy

      Quickly distorted to mean any code created with LLM assistance. Note: [[Martijn Aslander p]] follows this development quite closely (dictation, always accept, it mostly works).

    5. The year I built 110 tools: I started my tools.simonwillison.net site last year as a single location for my growing collection of vibe-coded / AI-assisted HTML+JavaScript tools. I wrote several longer pieces about this throughout the year: “Here’s how I use LLMs to help me write code”; “Adding AI-generated descriptions to my tools collection”; “Building a tool to copy-paste share terminal sessions using Claude Code for web”; “Useful patterns for building HTML tools”, my favourite post of the bunch. The new browse-all-by-month page shows I built 110 of these in 2025!

      Simon Willison vibe-coded over 100 personal tools in 2025. This chimes with what Frank and Martijn were suggesting. Above he also indicates that it became possible at this scale only in 2025.

    6. I love the asynchronous coding agent category. They’re a great answer to the security challenges of running arbitrary code execution on a personal laptop and it’s really fun being able to fire off multiple tasks at once—often from my phone—and get decent results a few minutes later.

      async coding agents: prompt and forget

    7. Vendor-independent options include GitHub Copilot CLI, Amp, OpenCode, OpenHands CLI, and Pi. IDEs such as Zed, VS Code and Cursor invested a lot of effort in coding agent integration as well.

      Vendor-independent coding agents.

      - [ ] which of these can I run locally / integrate into VS Code?

  4. Dec 2025
    1. You need three things. A Mac with Xcode, which is free to download. A $99 per year Apple Developer account. And an AI tool that can write code based on your descriptions.

      Three elements for making his iPhone apps: Xcode (which I use), an Apple Developer account (99 USD / yr), and AI support in coding (he uses Claude Code, cf. [[Mijn vibe coding set-up 20251220143401]]).

    1. Writing a good CLAUDE.md
      • CLAUDE.md is a special onboarding file to familiarize Claude (an AI code assistant) with your codebase.
      • It should clearly outline the WHY (purpose of the project), WHAT (tech stack, project structure, key components), and HOW (development process, running tests, build commands) for Claude.
      • The file helps Claude understand your monorepo or multi-application project and know where to look for things without flooding it with unnecessary details.
      • Keep CLAUDE.md concise and focused; ideally, it should be under 300 lines, with many recommending less than 60 lines for clarity and relevance.
      • Use progressive disclosure: point Claude to where to find further information rather than including all details upfront, avoiding overwhelming the model’s context window.
      • Complement CLAUDE.md with tools like linters, code formatters, hooks, and slash commands to separate concerns like implementation and formatting.
      • CLAUDE.md is a powerful leverage point for getting better coding assistance but must be carefully crafted—not auto-generated.
      • The file should include core commands, environment setup, guidelines, and unexpected behaviors relevant to the repository.
      • Encouraging Claude to selectively read or confirm files before use can help maintain focus during sessions.
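A minimal sketch of a CLAUDE.md along these lines. All project details below are invented for illustration; the point is the WHY/WHAT/HOW shape, staying well under the recommended line counts, and pointing to further files rather than inlining everything (progressive disclosure).

```markdown
# CLAUDE.md

## Why
Invoice-tracking web app for a small nonprofit (single deployment).

## What
- Rust workspace: crates/api (axum), crates/db (sqlx), crates/web (templates)
- Auth lives in crates/api/src/auth/; do not modify without confirming first

## How
- Check: cargo check · Test: cargo test --workspace · Format: cargo fmt
- Migrations: see db/MIGRATIONS.md (read it before touching the schema)

## Gotchas
- Tests require DATABASE_URL to point at the test database
- Confirm with me before adding dependencies
```

Note how the "Migrations" line delegates detail to a sub-file instead of pasting it in, keeping the onboarding file short and the context window free.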

      Hacker News Discussion

      • Users emphasized the benefit of explicit instruction patterns like "This is what I'm doing, this is what I expect," which improves monitoring and recovery from errors.
      • Some commenters felt these markdown files had marginal gains and that model quality mattered more than the presence of CLAUDE.md.
      • A few highlighted the importance of writing documentation primarily for humans rather than solely for LLMs.
      • Discussion included anticipation of more stateful LLMs with better memory, which would impact how such onboarding files evolve.
      • Recommendations included hierarchical or recursive context structures in CLAUDE.md for large projects, allowing a root file plus targeted sub-files.
      • Comments supported having Claude address the user specifically to verify it is following instructions properly.
      • Some users noted improvements in model adherence compared to past versions, making CLAUDE.md files more effective now.
      • Practical tips were shared for managing large monorepos and integrating CLAUDE.md with version control status.