4 Matching Annotations
  1. Last 7 days
    1. Why testing is much harder than "computer use" Screenshots, video verification, and the "I know it works" merge moment

      The 'I know it works' merge moment captures something real: human engineers have a holistic intuition about whether a change is safe that current agents lack. Video-based verification is a fascinating workaround — using visual confirmation of a running application as a proxy for correctness. This suggests the testing problem for async agents is fundamentally different from unit tests: it requires environmental validation, not just logical assertion.

  2. Apr 2026
    1. but would fail recognize that the feature didn't work end-to-end

      这揭示了Agent在认知上的盲区:它容易陷入“代码视角”的自证预言,以为单元测试通过就等于功能完整。引入端到端浏览器自动化测试,是强迫Agent站在“用户视角”去验证,这是从开发者思维向产品思维跨越的关键。

  3. Sep 2023
  4. Jun 2020