Why testing is much harder than "computer use": screenshots, video verification, and the "I know it works" merge moment
The "I know it works" merge moment captures something real: human engineers have a holistic intuition about whether a change is safe to merge, and current agents lack that intuition. Video-based verification is a fascinating workaround. It uses visual confirmation of a running application as a proxy for correctness. This suggests that testing for async agents differs fundamentally from unit testing, because it requires environmental validation (observing the software actually running), not just logical assertion about what the code returns.
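As a rough illustration of the difference between logical assertion and environmental validation, here is a minimal Python sketch using only the standard library. The greeting page, `render_greeting`, and `GreetingHandler` are hypothetical stand-ins, not part of any agent system. The first check calls a function and compares its return value. The second starts the app, then inspects what it actually serves over HTTP from the outside. Fetching HTML is a simplified, text-based stand-in for the screenshot and video checks discussed here, not a video-verification pipeline.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_greeting(name: str) -> str:
    """Hypothetical app logic: build the HTML for a greeting page."""
    return f"<h1>Hello, {name}</h1>"


# 1. Logical assertion: call the function directly; no running app is involved.
def unit_check() -> bool:
    return render_greeting("Ada") == "<h1>Hello, Ada</h1>"


class GreetingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves the greeting page over HTTP."""

    def do_GET(self) -> None:
        body = render_greeting("Ada").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        # Silence per-request logging to keep the output clean.
        pass


# 2. Environmental validation: launch the app, then observe it from outside,
#    the way a user (or an agent taking screenshots) would.
def environment_check() -> bool:
    # Port 0 asks the OS for any free port; the socket listens immediately.
    server = HTTPServer(("127.0.0.1", 0), GreetingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/"
        # An empty ProxyHandler bypasses any proxy settings in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as resp:
            page = resp.read().decode("utf-8")
            return resp.status == 200 and "Hello, Ada" in page
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("unit check passed:", unit_check())
    print("environment check passed:", environment_check())
```

Even this toy case shows why environmental checks are costlier: the second check needs a live server, a free port, a network round-trip, and cleanup, and a failure there can come from the environment rather than from the code under test.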