Performance: dev-browser: 3m53s, $0.88, 100% success rate — beats MCP configs, Chrome extensions, 'browser skill' stacks.
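Surprisingly, this new technique not only outdoes traditional approaches in functionality but also holds a clear edge on performance metrics. The 100% success rate and relatively low cost point to its technical maturity and practicality, and could quickly make existing browser automation solutions obsolete.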
One Agent can now: open X (Twitter), scroll the feed, extract tweets, return clean JSON. No plugins. No extensions. No orchestration.
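Surprisingly, a single AI agent can now complete a complex social media data-extraction task on its own, with no plugins, extensions, or orchestration. This shows striking progress in AI's capacity for autonomous operation and could fundamentally change data collection and automation workflows.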
Add dev-tools package with wt worktree manager CLI - New packages/dev-tools with standalone wt CLI for git worktree management - Commands: wt new, wt scratch, wt prune - Uses Vertex AI (gemini-2.5-flash) for branch name generation via gcloud ADC
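Surprisingly, this project is not only a browser automation tool; it also ships a Git worktree manager that uses AI to generate branch names. It uses Google's Vertex AI and the gemini-2.5-flash model to create meaningful branch names automatically, an inventive application of AI in the development workflow.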
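To make the worktree idea concrete, here is a minimal sketch of what a "new worktree from a description" step could look like. It is not the project's `wt` implementation: the source only names the commands and says branch names come from Vertex AI. This sketch swaps in a plain deterministic slug for the AI-generated name, and `slugify_branch`, `worktree_add_command`, and the `../worktrees` directory are hypothetical names chosen for illustration. The only real interface relied on is `git worktree add -b <branch> <path>`.

```python
import re


def slugify_branch(description: str, max_len: int = 40) -> str:
    """Turn a free-text description into a branch-name-safe slug.

    Hypothetical stand-in for the AI-generated branch name.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    # Fall back to a fixed name if nothing usable is left.
    return slug[:max_len].rstrip("-") or "scratch"


def worktree_add_command(description: str, base_dir: str = "../worktrees") -> list[str]:
    """Build the git command that creates a new branch in its own worktree."""
    branch = slugify_branch(description)
    return ["git", "worktree", "add", "-b", branch, f"{base_dir}/{branch}"]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(worktree_add_command("Fix login bug!")))
```

The command list can be handed to `subprocess.run` inside a repository; building it separately keeps the naming step testable without touching git.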
Austin built the whole pipeline from his Claude Code terminal using the Notion API. He brain-dumped the desired outcome using Monologue, let Claude Code create the database and data pipeline, and pasted the generated instructions into the Notion custom agent setup.
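Surprisingly, non-technical people can describe what they need directly to an AI through a speech-to-text tool (Monologue), and the AI then builds the whole data pipeline and agent system. This lowers the technical barrier considerably, so non-technical team members can also build complex AI workflows.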
Open Loop + Infinite Demand = Creative Amplifiers. Content creation & marketing strategy. AI can generate a thousand ad variations or blog posts.
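Surprisingly, AI's capability in creative marketing has reached the point where it can instantly generate thousands of ad variations or blog posts, which shows its potential as a creative amplifier. The final selection still requires human judgment, however, which reveals the complementary relationship between AI and human creativity.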
Closed Loop + Finite Demand = Efficiency Plays. AI bookkeeping categorizes transactions, reconciles accounts, files returns. Deterministic rules applied to numbers.
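Surprisingly, even in finite-demand domains AI can deliver significant efficiency gains through deterministic rules. AI bookkeeping systems can automatically handle categorization, reconciliation, and tax filing, which suggests that even in finance, a field that traditionally required human judgment, AI can create value through standardized processes.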
accounting and auditing showing nearly a 20 percent jump on GDPval and even domains like police / detective work showing a nearly 30 percent improvement.
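Accounting and auditing capability up 20% in four months, police/detective work up nearly 30%: these two figures represent two very different threats. The first is that automation pressure on white-collar knowledge work (accountants) is accelerating. The second is more unsettling: AI's rapid progress in criminal investigation means surveillance and law-enforcement capabilities are advancing at the same pace. That GDPval puts both on the same axis is itself a design choice worth reflecting on.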
METR conclude that “the length of tasks AI can do is doubling every 7 months”. I’m not convinced that pattern will continue to hold, but it’s an eye-catching way of illustrating current trends in agent capabilities.
A potential pattern to watch, even if it doesn't follow an exponential trajectory. If the pattern stays intact, by August we should see days of SE work being done independently by models.
The chart shows tasks that take humans up to 5 hours, and plots the evolution of models that can achieve the same goals working independently. As you can see, 2025 saw some enormous leaps forward here with GPT-5, GPT-5.1 Codex Max and Claude Opus 4.5 able to perform tasks that take humans multiple hours—2024’s best models tapped out at under 30 minutes.
Interesting metric. Until 2024, models were capable of independently executing software engineering tasks that take a person under 30 minutes. This chimes with my personal observation that there was no real time saving involved, or that regular automation could handle it. In 2025 that jumped to tasks taking a person multiple hours, with Claude Opus 4.5 reaching 4:45 hrs. That is a big jump. How do you leverage that personally?
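The doubling claim quoted above can be turned into a rough projection. The sketch below assumes the trend holds exactly (which the quoted author doubts), takes the 4:45 hrs figure from the note as the starting point, and uses the 7-month doubling period from the METR quote. The function name and the choice of elapsed months are illustrative assumptions, not something METR publishes.

```python
def projected_horizon_hours(
    start_hours: float,
    months_elapsed: float,
    doubling_months: float = 7.0,
) -> float:
    """Task length a model could handle after `months_elapsed`,
    assuming a clean exponential: horizon = start * 2 ** (t / doubling)."""
    return start_hours * 2 ** (months_elapsed / doubling_months)


if __name__ == "__main__":
    start = 4.75  # 4 h 45 min, the Claude Opus 4.5 figure from the note
    for months in (0, 7, 14):
        hours = projected_horizon_hours(start, months)
        print(f"after {months:2d} months: {hours:.1f} h")
```

One doubling period takes 4.75 hours to 9.5 hours, just over a single 8-hour working day, so "days of work" needs roughly two doublings from that starting point if the pattern holds.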
If you define agents as LLM systems that can perform useful work via tool calls over multiple steps then agents are here and they are proving to be extraordinarily useful. The two breakout categories for agents have been for coding and for search.
Recognisable: AI agents as chunked / abstracted-away automation. This also creates the pitfall [[After claiming to redeploy 4,000 employees and automating their work with AI agents, Salesforce executives admit We were more confident about…. - The Times of India]] where regular automation is replaced by AI.
Most useful for search and for coding
Home security company Vivint, which uses Agentforce to handle customer support for 2.5 million customers, experienced these reliability problems firsthand. Despite providing clear instructions to send satisfaction surveys after each customer interaction, The Information reported that Agentforce sometimes failed to send surveys for unexplained reasons. Vivint worked with Salesforce to implement "deterministic triggers" to ensure consistent survey delivery.
wtf? Why ever use AI to send out a survey, something you probably already had fully automated beforehand. 'deterministic triggers' is a euphemism for regular scripted automation like 'clicking done on a ticket triggers an e-mail for feedback', which we've had for decades.
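What a "deterministic trigger" amounts to can be sketched in a few lines. This is a generic illustration of rule-based automation, not Salesforce's or Vivint's implementation: `Ticket`, `SurveyOutbox`, and `close_ticket` are hypothetical names, and the outbox simply records what would be sent instead of sending e-mail.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    ticket_id: str
    customer_email: str
    status: str = "open"


@dataclass
class SurveyOutbox:
    """Hypothetical stand-in for a mail service; it only records requests."""
    sent: list[tuple[str, str]] = field(default_factory=list)

    def send_survey(self, ticket: Ticket) -> None:
        self.sent.append((ticket.ticket_id, ticket.customer_email))


def close_ticket(ticket: Ticket, outbox: SurveyOutbox) -> None:
    """Mark a ticket done and always queue exactly one survey.

    The survey is tied to the state change itself, so there is no
    model in the loop that could decide to skip it.
    """
    if ticket.status == "done":
        return  # already closed: do not send a second survey
    ticket.status = "done"
    outbox.send_survey(ticket)


if __name__ == "__main__":
    outbox = SurveyOutbox()
    ticket = Ticket("T-1", "customer@example.com")
    close_ticket(ticket, outbox)
    close_ticket(ticket, outbox)  # repeated call is a no-op
    print(outbox.sent)
```

The survey follows from the status change by construction, which is the property the LLM-driven version could not guarantee.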
"All of us were more confident about large language models a year ago," Parulekar stated, revealing the company's strategic shift away from generative AI toward more predictable "deterministic" automation in its flagship product, Agentforce.
Salesforce moving back from fully embracing LLMs, towards regular automation. I think this is symptomatic of DIY enthusiasm too: there is likely an existing 'regular' automation that helps more.
On AI agents, and the engineering to get one going. A few things stand out at first glance: frames it as the next hype (cf. plateau in model dev), says it's for personal tools (doesn't square with the hype, which is VC-fuelled; personal tools are not of interest to them), and mentions a few personal use cases, e.g. automation; cf. [[Open Geodag 20241107100937]] Ed Parsons of Google AI on the same topic.
I believe the final policy shall contain robust rationale and, in the best way possible, avoids the perception of rAIcial discrimination
“You have to assume that things can go wrong,” shared Waymo’s head of cybersecurity, Stacy Janes. “You can’t just design for this success case – you have to design for the worst case.”
Future proofing by asking "what if we're wrong?"
Hoffman, R., Mueller, S., Klein, G., & Litman, J. (2021). Measuring Trust in the XAI Context. PsyArXiv. https://doi.org/10.31234/osf.io/e3kv9
Side note: When I flagged yours as a dupe during review, the review system slapped me in the face and seriously accused me of not paying attention, a ridiculous claim by itself since locating a (potential) dupe requires quite a lot of attention.
Yes, autoexpect is a good tool, but it is used just to automatically create TCL-expect scripts by watching the user. So it can be equivalent to writing expect scripts by hand.
You can now distribute your add-on. Note, however, that your add-on may still be subject to further review; if it is, you’ll receive notification of the outcome of the review later.
that can be partially automated but still require human oversight and occasional intervention
but then have a tool that will show you each of the change sites one at a time and ask you either to accept the change, reject the change, or manually intervene using your editor of choice.
Overestimating robots and AI underestimates the very people who can save us from this pandemic: Doctors, nurses, and other health workers, who will likely never be replaced by machines outright. They’re just too beautifully human for that.
Yes - we used to have human elevator operators and telephone operators that would manually connect your calls. We now have automated check-out lines in stores and toll booths. In the future, we will have automated taxis and, yes, even some automated health care. Automated healthcare will enable better healthcare coverage with the same number of healthcare workers (or the same level of coverage with fewer workers). There can be good things or bad things about it - the way we do it will absolutely matter. We just need to think through how best to obtain the good without much of the bad ... rather than assuming it won't ever happen.
the demand for products will keep climbing as well, as we’re seeing with this hiring bonanza.
Probably not. The increase in demand is a result of the social-distancing and the hoarding. This is not a steady state. The demand for many things will return to normal (or below) once people figure out what they are using and what is still available. For example - you don't use that much more toilet paper when you are at home ... but you buy more if you don't know when it will be available again.
Last week, Amazon officials announced that in response to the coronavirus they were hiring 100,000 additional humans to work in fulfillment centers and as delivery drivers, showing that not even this mighty tech company can do without people.
Amazon has adopted automation in a very big and increasing way. Just because it has not automated everything yet, doesn't mean that complete automation isn't possible. We already know automated delivery is in the works. Amazon, Uber and Google are all working on the details of autonomous navigation ... and the ultimate result will absolutely impact future drivers (pun intended).
Why haven’t the machines saved us yet?
because machines don't buy tickets to fly on planes and vacation on cruise ships.
And that’s all because of the vulnerabilities of the human worker.
It has more to do with the vulnerabilities of the human traveler and the human guest (and less to do with the workers). The demand for these services has simply gone down while people try to avoid spreading the virus.