We also introduce an agentic streaming inference framework that supports thousand-second-scale generation while mitigating drift.
大多数人认为长时间视频生成必然会导致内容漂移(drift)和质量下降,但作者声称他们的智能体推理框架能够支持千秒级生成同时减轻漂移,这挑战了关于长时间生成一致性的普遍认知。
We also introduce an agentic streaming inference framework that supports thousand-second-scale generation while mitigating drift.
大多数人认为长时间视频生成必然会导致内容漂移(drift)和质量下降,但作者声称他们的智能体推理框架能够支持千秒级生成同时减轻漂移,这挑战了关于长时间生成一致性的普遍认知。
what changed after the December 2025 model inflection , and why "spec to pull request" is now becoming a real production workflow.
'Spec to pull request' as a production workflow means the human's job becomes writing requirements, not code — a complete inversion of the current engineering process. The December 2025 inflection point is significant: it marks when models became capable enough to close the gap between high-level intent and production-ready implementation without constant human steering.
Progress is saved as the run goes, so a job that's interrupted picks up where it left off instead of starting over. Because the coordination happens outside the conversation, the plan stays on track no matter how big the task gets.
Persistent, resumable state for multi-hour agent runs solves a critical reliability problem that has limited agentic AI adoption. By moving coordination outside the conversation context, the system breaks free from the context window limit that bounds all single-session AI work — this is architecturally different from just a longer context.
Weirdly though, those things have started to blur for me already, which is quite upsetting.
Simon表达了对vibe coding和agentic engineering边界模糊的担忧,这让他感到不安。
Weirdly though, those things have started to blur for me already, which is quite upsetting. I thought we had a very clear delineation where vibe coding is the thing where you're not looking at the code at all. You might not even know how to program. You might be a non-programmer who asks for a thing, and gets a thing, and if the thing works, then great! And if it doesn't, you tell it that it doesn't work and cross your fingers.
作者原本认为vibe coding和agentic engineering有明确界限,但现在发现两者界限正在模糊,这让他感到不安。
LLMs accelerate the wrong part
【洞察】「LLM 加速了错误的部分」——这句话点破了 AI 编程工具的根本问题:它们加速了代码的「生成」(原本不是瓶颈),却无法加速代码的「理解、审查和维护」(真正的瓶颈)。与 a16z 报告的「10-20x 生产力提升」数据对照:生产力的提升是真实的,但被提升的维度是否是最应该被提升的维度,是一个完全不同的问题。
the more you rely on AI to write code, the less you're able to oversee what the AI writes
✉️【洞察·监督悖论】这是本周关于 AI 编程最深刻的一句话:越依赖 AI,越失去监督 AI 的能力。这是一个隐性的技能退化循环,与肌肉萎缩类似——不用则废。与 Uncle Bob「传统编程已终结」的乐观叙事正面交锋:如果开发者失去了理解代码的能力,他们还能做什么来保证 AI 生成代码的质量?
The model kept finding better approaches the longer it ran, which connects directly to the long horizon behavior that makes agentic models actually useful in production.
这个发现揭示了代理模型在长时间运行任务中的独特优势 - 它们能够持续改进而非达到性能上限。这与传统AI模型形成鲜明对比,后者通常在训练完成后性能相对固定。这种持续学习能力可能是代理模型在实际生产环境中超越其他模型的关键因素。
If ChatGPT was the moment consumers discovered AI could talk, OpenClaw may be the moment they discovered AI could act.
精准概括了从对话式 AI 到代理式 AI 的范式跃迁。「说」与「做」之间存在巨大鸿沟:前者只需理解,后者需要执行力和可靠性。OpenClaw 从个人项目到 GitHub 第一,说明开发者对「真正能干活的 AI」有强烈渴求。2026 年可能是 AI 从「聪明聊天者」变为「可靠执行者」的关键转折年。
Agentic AI is increasingly judged not by fluent output alone but by whether it can act, remember, and verify under partial observability, delay, and strategic observation.
大多数人认为AI系统的价值主要取决于其流畅的输出能力和表现,但作者认为AI应该被评估其行动能力、记忆能力和可验证性,因为这些因素在部分可观测性、延迟和战略观察的环境下更为关键。这一观点挑战了当前主流AI评估标准,强调了AI系统在复杂现实环境中的实际表现而非仅仅是语言流畅度。