macOS app that hooks into your AI processes to give you a better overview and reduce context switching. The entire site appears to be AI-generated, judging by the copy and the non-functioning elements.
-
overwatchr.dev
-
-
ladybird.org
-
For decades, code contributions have been how open source projects learned who to trust. People would show up, do the work, take responsibility for their changes, and stick around. Over time, trust emerged from the work itself. AI tools have changed the economics of this very quickly. We use them ourselves every day, but a pull request no longer tells us as much as it used to about the person submitting it. A substantial patch used to imply substantial effort, and that effort was a reasonable proxy for good faith. That assumption no longer holds. For a browser, this matters. A browser runs untrusted input from the entire internet on the user’s machine, and one well-disguised vulnerability is all an attacker needs. We have already seen patient, well-resourced campaigns in open source to earn maintainer trust and abuse it. What has changed is how much faster and cheaper it has become to produce work that looks like a serious contribution.
-
Tags
- good faith
- effort as a proxy for good faith
- contributing to open-source software
- AI tools allow work/contributions to be generated much more quickly
- open-source software
- abuse of trust
- Ladybird
- anti-generative-AI
- AI tools allow low-quality or untrustworthy work/contributions to be generated quickly
- no longer accepting contributions from public
- trust
-
-
www.youtube.com
-
https://web.archive.org/web/20260626115441/https://www.youtube.com/watch?v=UQ7Nywt_NnU
Presentation by Rainer Mühlhoff on KI und der neue Faschismus (AI and the new fascism) at re:publica 2026. Saw his book [[Künstliche Intelligenz un der neue Faschismus by Rainer Mühlhoff]] in [[Zurich 2026]] at the Orell Füssli bookstore.
- [ ] return to grab transcript #pkm
-
-
www.science.org
-
Both Scarlata and Gingras are concerned that papers by less prominent scientists have disappeared as well without anyone realizing. At a minimum, Gingras wants Planck’s papers restored. “Whoever did it, I don’t care,” he says, “just put them [back] in the database. Intellectually, it’s not acceptable.”
Retroactively editing or deleting the scientific record through automation is highly problematic. The epistemological centipede from [[Talk The Expanding Dark Forest and Generative AI]] is also eating the past here.
-
-
patrickmccanna.net
-
you can't produce the logic using the local files. The reasoning logs on your system are not accessible to you.
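You cannot read the reasoning logs in your own local files. This undercuts the audit-trail promise of AI agents at its root. Suppose you use Claude Code as an autonomous agent in a compliance setting such as finance, healthcare, or law, and you cannot reconstruct the reasoning behind a given decision. In that case, "auditable AI" is an empty phrase.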
-
Getting the full thinking output requires an enterprise agreement.
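Full reasoning output requires an enterprise agreement, which turns "AI transparency" into a commercial privilege. Ordinary developers and small and medium-sized businesses get only summaries. Only large customers with enterprise contracts get close to the real thing. For the AI accountability debate, this means transparency is tiered and can be bought, which contradicts any positioning as "public infrastructure."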
-
Claude encrypts its reasoning into that signature. Anthropic holds the key. Your machine doesn't receive it.
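Three sentences capture the core problem: the reasoning is encrypted → Anthropic holds the key → your machine never receives it. This is not a technical detail but a question of sovereignty. An AI agent executes tasks on your machine, yet you have no right to see how it reasons. It echoes the "black-box AI" critique in a more precise technical form: you don't merely fail to understand, you are explicitly excluded.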
-
-
www.datacenterdynamics.com
-
SpaceX is reportedly in talks to merge with xAI
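A SpaceX + xAI + Tesla integration is taking shape. Rockets provide launch capacity, orbital satellites provide compute infrastructure, xAI provides the models, and Tesla provides the edge devices. If the three companies merged, the result would be the most vertically integrated AI infrastructure empire ever. It would span energy (solar-powered satellites), compute (orbital data centers), models (Grok), and end devices (Tesla).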
-
Orbital data centers are the most efficient way to meet the accelerating demand for AI computing power
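The core logic of orbital data centers: space offers nearly unlimited solar power (free) and radiative cooling (free). On the ground, energy and cooling costs are becoming the biggest bottleneck to scaling AI compute. If Starship achieves reusable, low-cost launches, the full life-cycle cost per unit of compute could in theory fall below that of ground-based data centers. Musk did not invent this logic; Bezos and Google are betting in the same direction.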
-
-
algorithmichiring.github.io
-
Data access inhibits independent research into hiring algorithms
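The paper's most pointed policy call is its statement that the authors are the only team independently conducting large-scale empirical research. Hiring algorithms already shape the fates of millions, yet researchers cannot get the data to study them. That is as absurd as drug companies barring independent researchers from testing their drugs. Legislation mandating data access, similar to the data-access provisions of the EU DSA, may be the only way forward.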
-
We conduct the largest empirical study of algorithmic hiring with data for 3.4 million real job applicants submitting 4 million applications to 156 employers across 11 market sectors.
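This is the largest empirical study of hiring algorithms to date: 3.4 million real job seekers, 4 million applications, 156 employers, and 11 sectors. The scale matters. Earlier research stayed at the lab level because of data-access barriers, so this is the first time the theoretical concerns have been tested in a real deployment.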
-
Over 90% of U.S. employers rely on hiring algorithms to screen job applicants.
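More than 90% of U.S. employers rely on algorithms to screen applicants. The "entry ticket" to the U.S. job market is already controlled by AI at scale, yet the regulatory framework lags far behind. This is not a niche frontier-technology issue but social infrastructure that affects the careers of hundreds of millions of people.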
-
-
workspaceupdates.googleblog.com
-
The functionality seamlessly supports everything from basic arithmetic to highly intricate calculations, simplifying what is traditionally a frustrating and time-consuming debugging process.
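Most people assume AI tools are efficient at simple tasks but limited in complex, specialized domains. The author, however, claims Gemini seamlessly handles everything from basic to highly complex calculations. This challenges the common belief that AI capability declines as complexity rises. If true, it would be a major breakthrough for AI assistance tools.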
-
When you encounter a formula error, Gemini can analyze the surrounding data structure to help provide an easy-to-understand explanation of the core issue alongside a corrected version of the formula.
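Most people assume AI tools need explicit instructions from the user to solve problems. The author, however, holds that Gemini can proactively analyze the data structure and offer a fix on its own. This challenges the conventional wisdom that AI assistants must be user-driven. Such automatic error correction suggests AI is shifting from an "assistant" role to an "autonomous problem-solver."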
-
-
huggingface.co
-
Serves as the first generative core for social world models, a foundation for next-generation AI-native social platforms.
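Most people think the core of a social platform is connecting users and distributing content, not generative AI. The author proposes that AI-generated content should become the foundational architecture of social platforms. This challenges the fundamental design philosophy of today's social media.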
-
-
www.tomshardware.com
-
The Maia 200 does beat the B300 in efficiency, however, a big win in a day where public opinion against AI's environmental effects is steadily mounting. The Maia 200 operates at almost half of B300's TDP (750W vs 1400W)
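Most people assume high-performance AI chips inevitably come with high power draw and cooling challenges. The author, however, holds that Microsoft's Maia 200 delivers strong compute with a notable efficiency advantage, drawing only about half the power of Nvidia's Blackwell B300 Ultra. This counterintuitive finding challenges the assumption in AI that performance scales with power consumption. It also points to innovation in specialized AI chip architecture.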
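A quick worked check of the TDP figures quoted in the highlight (750 W vs 1400 W). This compares rated thermal design power only. It says nothing about performance per watt, which the highlight does not quantify.

```latex
\frac{\text{TDP}_{\text{Maia 200}}}{\text{TDP}_{\text{B300}}}
  = \frac{750\ \text{W}}{1400\ \text{W}} \approx 0.536,
\qquad 1 - 0.536 \approx 0.464
```

So the Maia 200 is rated at roughly 54% of the B300's TDP, about 46% lower. That is consistent with the article's "almost half."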
-
-
www.cnbc.com
-
Recent events highlight how important open source is to the AI ecosystem, with more nations and enterprises recognizing the risks and costs associated with exclusively depending on closed models.
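Most people think closed AI models are preferred for their proprietary technology and performance advantages. The author, however, argues that the open-source AI ecosystem is becoming increasingly important, because nations and enterprises now recognize the risks and costs of relying exclusively on closed models. This challenges the mainstream trend toward closed systems in the AI industry.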
-
For SpaceX, the deal is another sign that compute itself has become strategic currency in the AI race.
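Most people think the core of AI competition is algorithm and model innovation. The author, however, argues that compute itself has become strategic currency in the AI race, since SpaceX takes part by providing compute rather than by developing AI models. This challenges the conventional understanding of what AI competition is really about.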
-
Reflection has leaned directly into that pitch as the startup, last valued at $25 billion, is trying to build American open-source AI models that can compete with frontier systems from OpenAI, Anthropic and Google.
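Most people think the AI field is dominated by a few closed giants. The author, however, suggests that open-source AI models can compete with frontier systems from OpenAI, Anthropic and Google, since companies like Reflection are building open models to rival them. This challenges the consensus that closed systems dominate AI.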
-
The deal shows how SpaceX is using its massive data center build-out after its record initial public offering.
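Most people think SpaceX's core business is rockets and space exploration. The author, however, suggests that SpaceX has turned into an AI infrastructure company, since it is offering its Colossus data center as a commercial compute platform. This challenges the traditional view of SpaceX's business scope.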
-
-
-
The models are finally ready. Costs of inference are getting optimized with open models, and even on-device models.
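Most people think AI is still at an early stage, with high model costs and limited practicality. The author, however, says the models are "ready" and inference costs are being optimized. This implies AI applications may become practical sooner than most expect, challenging the industry's general view of AI maturity.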
-
we can finally invent new products that allow users to do things more naturally, using simple language to express their needs.
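Most people assume technological progress makes products more complex and feature-rich. The author, however, believes AI will bring products back to simple interaction in natural language. This counterintuitive view suggests that technology is heading not toward added complexity but toward simpler interaction between users and technology.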
-
when I first experienced OpenClaw earlier this year, I had the epiphany that it isn't the models that matter, but the harnesses, loops, and context which will lead to so many new opportunities ahead.
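Most people think competition in AI centers on the size and capability of the models themselves. The author, however, argues that what really matters is "harnesses, loops, and context." This counterintuitive view suggests that real innovation in AI applications will revolve around how they interact with users rather than around model progress itself.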
-
-
www.oversightboard.com
-
Include AI-generated sexualized impersonation as a separate category in standard content reporting and appeal forms, distinct from 'harassment' or 'nudity.'
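Most people think sexualized AI content should be filed under existing categories such as harassment or nudity. The author, however, argues that it needs its own category, which challenges the taxonomy of current content-moderation systems. The view recognizes the distinct nature of AI-generated content. It also suggests that traditional content categories may not be adequate for the new kinds of harm that emerging technologies bring.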
-
Meta said that when the content was flagged, the company had no indication that the individual depicted in the video was 'a real person' because they did not report the content.
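Most people think platforms should rely on victim reports to confirm the authenticity of content. The author, however, questions this practice and implies that platforms are responsible for proactively identifying AI-generated sexualized content, even without a victim's report. This challenges mainstream notions of where platform responsibility ends and asks platforms to take on more preventive responsibility.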
-
The Board finds that AI-generated impersonation is non-consensual by default and should be added to the set of signals the company uses to establish lack of consent.
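Most people think content can be confirmed as non-consensual only once a real victim reports it. The author, however, argues that AI-generated sexualized impersonation is non-consensual by default. This challenges the prevailing practice of platforms acting only after a victim reports, and it shifts the burden of proof from the victim to the platform and the content creator.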
-
-
huggingface.co
-
We would like to thank Deepseek-OCR, Deepseek-OCR-2, PaddleOCR for their valuable models and ideas.
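Most people expect a new AI model to state clearly how it fundamentally differs from prior work. The authors thank several existing OCR models but do not explain how Unlimited-OCR fundamentally improves on them. That suggests it may be a combination of existing methods rather than a real breakthrough, which is at odds with the AI field's usual emphasis on novelty.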
-
-
blogs.nvidia.com
-
The NVIDIA DSX reference design for AI factories has zero water consumption — we have eliminated massive amounts of power usage and pretty much all water usage.
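Most people think data centers are heavy water consumers. The author, however, claims that NVIDIA's AI factory design achieves zero water consumption. This runs against the common view that data centers need large amounts of water for cooling, and it proposes an approach that could fundamentally change how data centers use water.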
-
-
-
Raw output quality is on par with top frontier models, but Fugu showed unusually strong persona stability across long sessions, holding its identity where other models drift.
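Most people focus on an AI model's output quality. The author, however, emphasizes that Fugu showed unusually strong persona stability across long sessions, while other models tend to drift. This places a model's personality stability above traditional performance metrics and challenges the standards the industry uses to evaluate AI capability.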
-
Collective intelligence serves as the practical hedge against this concentration of power.
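Most people think competition in AI leads to concentration and monopoly. The author, however, argues that collective intelligence is a practical hedge against this concentration of power. This challenges the view that the tech industry naturally tends toward centralization, and it raises the possibility of decentralized AI systems.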
-
orchestration is no longer just a technical optimization; it has become a geopolitical and operational imperative.
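Most people see model orchestration as merely a technical optimization. The author, however, elevates it to a geopolitical and operational imperative, implying that the risks of single-vendor dependence are now a real threat rather than a hypothetical one. Linking a technical question to national security in this way is fairly controversial.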
-
the most powerful AI systems will not be isolated monoliths, but collaborative ecosystems.
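Most people think AI is headed toward ever-larger single models (monoliths). The author, however, argues that the most powerful future AI will be collaborative ecosystems, because a single model cannot supply the diverse expertise that complex real-world tasks require. This challenges the current industry consensus of pursuing ever-larger models.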
-
-
-
AI may generate an insight, but people must still evaluate its significance and plausibility.
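Most people think human experts will gradually be replaced as AI grows more capable. The author, however, insists that expertise remains essential and that humans must evaluate the significance and plausibility of AI-generated insights. This challenges technological determinism and fears of AI replacing humans, suggesting that human-AI collaboration rather than substitution is the way forward.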
-
That was the moment that I felt like, okay, these models have now come to a point where they really, truly understand.
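Most people think AI models are merely statistical pattern-recognition tools that cannot truly "understand" scientific concepts. The author, however, claims that GPT-5 can predict the results of unpublished experiments and produce insight that reflects "true understanding." This challenges traditional views of AI's nature and cognitive abilities, implying that AI may have reached some form of understanding.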
-
-
-
helps sustain progress across long-running projects
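Most people think AI becomes less effective over the course of long projects because it cannot keep learning and adapting. The author, however, implies that Codex can help sustain progress on long-running projects. This runs counter to how AI tools currently perform on long projects and suggests they have developed the ability to support sustained work.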
-
break ambitious goals into verifiable steps
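Most people think AI excels at handling overall goals and complex tasks. The author, however, implies that even ambitious goals should be broken into verifiable steps. This runs against the common approach of "solving a complex problem in one shot" and suggests long projects need a more structured method.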
-
determine when to delegate execution to Codex versus when human oversight is most valuable
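Most people think AI should automate as much as possible to minimize human intervention. The author, however, proposes clearly distinguishing tasks to hand over fully to AI from tasks that need human oversight. This runs against the mainstream "automate everything" philosophy and suggests that human oversight is sometimes more valuable than full automation.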
-
How Codex helps work continue beyond a single prompt
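Most people think AI tools are mainly suited to one-off tasks or simple queries and must keep re-initializing their context. The author, however, implies that Codex can support ongoing long-term work through a "persistent workspace," which would let AI maintain continuity across a long project.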
-
-
openai.com
-
Security engineers reviewed every finding before it reached a maintainer... While frontier AI models are highly capable of finding vulnerabilities and patching them, they also produce a high volume of false positives
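Most people think AI can directly replace human security experts in vulnerability assessment. The author, however, notes that even the most advanced AI models produce many false positives, so human experts still need to verify and filter their findings. This challenges expectations that fully autonomous AI security research is feasible.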
-
Trail of Bits engineers found that, with limited guidance, GPT‑5.5‑Cyber made useful choices about where to expand coverage, which builds and entry points to probe, and which candidates were too weak to pursue.
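Most people think AI models need extensive, precise guidance to work effectively. The author, however, reports that GPT-5.5-Cyber made sound security-analysis decisions with only limited guidance, judging on its own which test paths were worthwhile and which candidates were worth pursuing. This challenges the conventional belief that AI needs heavy supervision.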
-
-
-
Async agents are moving into everyday work. For an agent to be trustworthy and useful inside an organization, it needs real enterprise data: CRM records, repositories, inboxes, knowledge bases.
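Most people think AI agents should first be tested in restricted environments before being gradually connected to sensitive enterprise data. The author, however, argues that agents must be connected directly to real enterprise data to become trustworthy and useful. This challenges the traditional incremental approach to safe AI deployment.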
-
-
www.anthropic.com
-
Models building their own software tools might have seemed outlandish not long ago, but it is happening. It would be unwise to rule out the same trajectory in hardware.
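Most people think autonomous AI development in hardware is still far off. The author, however, argues that hardware may follow the same trajectory as software tools, where models building their own tools has gone from seemingly outlandish to reality. This challenges the industry consensus and suggests AI may gain direct control over the physical world sooner than expected.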
-
We are plausibly entering the early era of physical agentic AI.
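Most people think AI interaction with the physical world is still a long way off. The author, however, argues that we are entering the early era of physical agentic AI, since AI can already operate off-the-shelf physical tools on its own. This runs against mainstream views and suggests AI may merge with the physical world much faster than expected.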
-
This progress is not the result of a concerted effort to improve the robotics capabilities of our models. These improvements, like so many others in the history of LLM development, have emerged from much more general scaling.
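Most people think progress in a specific AI domain requires targeted optimization and training. The author, however, argues that progress in robotics has come mainly from more general scaling rather than from work aimed at robotics capabilities. This runs against conventional thinking about AI development and suggests AI capabilities can emerge in unpredictable ways.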
-
Claude Opus 4.7—operating without human assistance—was about 20 times faster than the fastest human team at all tasks completed by our participants less than a year ago.
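Most people think AI still needs human supervision and guidance for physical-world tasks. The author, however, argues that AI models can already complete complex robotics tasks on their own, far faster than human teams. In the experiment, Opus 4.7 without human assistance was about 20 times faster than the fastest human team from a study run less than a year earlier. This challenges common views of AI's ability to operate in the physical world.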
-
-
www.anthropic.com
-
Claude can even automatically learn from _other_ Slack channels and data sources, if it's granted permission.
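Most people think AI should be strictly confined to specific tasks and datasets to avoid information contamination and blurred boundaries. The author, however, argues that AI should be able to learn across channels and integrate information from different sources. This challenges traditional views of AI scope and data isolation, suggesting future AI will resemble a team member with broad background knowledge.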
-
We now spend much more of our time delegating tasks to many Claudes in parallel.
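Most people think AI will replace human jobs and cause unemployment. The author, however, suggests that AI is changing how people work, shifting them toward higher-level delegation and management. This challenges the usual narrative about AI and employment and indicates AI may create new forms of work rather than simply replacing people.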
-
Today, 65% of our product team's code is created by our internal version of Claude Tag.
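Most people think AI-assisted programming is just a helper for code completion or simple tasks. The author, however, indicates that AI has become the main producer of code, with the internal version generating 65% of the product team's code. This challenges traditional views of AI's role in software development and shows AI moving from helper to core productivity tool.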
-
-
www.qualcomm.com
-
Qualcomm Dragonfly AI300 joins the previously announced Qualcomm Dragonfly AI200 and AI250 in its data center solutions portfolio with an annual cadence AI accelerator roadmap
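Most people think AI accelerator product cycles run 2-3 years, because chip design and validation take a long time. Qualcomm, however, has adopted an annual cadence for new AI accelerator generations. This rapid iteration contrasts sharply with the long cycles of the traditional semiconductor industry and suggests the AI hardware market is shortening its innovation cycle.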
-
HBC is designed to enable efficient scaling of AI agents to meet the demands of continuous reasoning, memory bandwidth, and real-time responsiveness
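Most people think AI inference is mainly the domain of GPUs, while CPUs handle general-purpose computing. Qualcomm, however, says its HBC technology is designed specifically for the continuous reasoning, memory bandwidth, and real-time responsiveness that AI agents demand. This challenges the traditional division of labor between CPU and GPU in AI workloads and suggests future computing architectures may become more specialized rather than general-purpose.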
-
AI300 with HBC Gen 2 is designed to enable another stepwise improvement with a 54x increase over AI200
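Most people think AI chip performance improves incrementally, roughly 20-30% a year. Qualcomm, however, claims its AI300 delivers a 54x memory-bandwidth increase over the AI200. This exponential jump runs counter to industry norms and suggests AI infrastructure may be undergoing a paradigm shift.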
-
HBC is designed to enable a 6x increase in bandwidth per watt versus HBM compared to competing published product specifications normalized at card-level
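Most people think high-bandwidth memory (HBM) is the best choice for AI accelerators. Qualcomm, however, claims its new HBC technology provides a 6x improvement in bandwidth per watt. This advantage challenges the current industry consensus on data center AI accelerators and suggests conventional HBM could be disrupted.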
-
-
-
Memory prices have skyrocketed in the last couple years as AI chips eat up all the production capacity of the small crop of vendors.
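Most people think technological progress usually drives prices down, but the memory market is doing the opposite. AI demand has sent memory prices soaring, breaking the usual pattern of tech products getting cheaper over time. This shows that during certain technological shifts, scarcity can completely change market dynamics.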
-
-
-
The goal is to move beyond using models to find more vulnerabilities, towards a world of safer software and cyber resilience.
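The industry consensus is that AI's main value in security is raising the number and speed of vulnerability discoveries. The author, however, states explicitly that they have moved beyond that stage and now focus on safer software and cyber resilience. This reflects a fundamental shift in security thinking.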
-
As AI makes it possible to find and patch more vulnerabilities faster, it also creates more work for maintainers, who need to sift through thousands of reports, many of which are low-quality false positives.
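Most people think AI in security will only lighten maintainers' workload, because AI can handle more tasks automatically. The author, however, points out that AI actually creates more work for open-source maintainers, who must sift through large numbers of low-quality false positives. This counterintuitive point reveals unexpected burdens that technological progress can bring.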
-
The bottleneck historically has been finding vulnerabilities, but now defenders are overwhelmed with the number of vulnerabilities found. Instead, the bottleneck is now patching vulnerabilities.
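Most people think the main challenge in cybersecurity is finding vulnerabilities, since doing so has traditionally required expertise and time. The author, however, argues that as AI accelerates discovery, the main bottleneck has shifted to patching. The number of vulnerabilities found now far exceeds what defenders can handle.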
-
-
www.tomshardware.com
-
Public reaction on the ClaudeAI subreddit appears to be split into roughly three camps. The majority see the story as an indictment of the government's cybersecurity, citing its inability to hire the required level of talent and its history of leaks. A second large group is skeptical of the claim, considering it sensationalist or even an Anthropic marketing stunt.
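Most people assume public reaction to AI threats is either panic or skepticism. The author, however, reveals a more complex split in public perception. This non-binary pattern challenges a simplistic view of public attitudes toward AI safety and suggests society's assessment of AI capabilities is splitting into diverse, opposing camps.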
-
The Financial Times reported earlier in June that roughly six Anthropic engineers are embedded directly inside the agency as forward-deployed staff, adapting and customizing Mythos for specific operational applications, with sources indicating the work could extend to infiltrating networks operated by countries including China and Iran.
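Most people assume governments restrict AI models for security reasons, to keep them out of hostile hands. The author, however, points out that the NSA is itself using these models internally for potential network infiltration. This contradiction calls the consistency of government policy into question and suggests national-security considerations may involve a double standard.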
-
Anthropic contends that the cited breach was a narrow jailbreak, one that rival models, including OpenAI's GPT-5.5, also exhibit. According to the company, the flagged behavior amounted to asking the model to analyze a codebase and fix identified issues, which revealed a few minor, already known bugs, rather than a genuine autonomous offensive intrusion.
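Most people believe AI can already autonomously discover and exploit unknown vulnerabilities in sophisticated attacks. The article, however, relays Anthropic's position that the so-called "breach" was routine analysis of a codebase, which challenges public perceptions of how serious the AI threat is. It runs counter to the widespread belief that AI already has autonomous offensive capability and hints at possible exaggeration.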
-
The story sheds light on the June 12 U.S. government directive barring all foreign nationals, including Anthropic's own non-citizen employees, from accessing the Fable 5 and Mythos 5 models, citing national security concerns.
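Most people assume governments restrict access to AI models out of concern about the technology's inherent risks. The author, however, implies that this ban was a direct response to AI models having demonstrated striking infiltration capabilities. This challenges public assumptions about the government's motives and suggests the real threat is not theoretical but a demonstrated capability.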
Tags
- non-consensus
- public-perception
- ai-narrative
- government-policy
- counterintuitive
- ai-capabilities
- ai-ethics
-
-
venturebeat.com
-
HappyHorse is built around a 15-billion-parameter unified self-attention Transformer that processes text, image, video, and audio tokens within a single token sequence. Unlike many competitors that stitch together separate models for video and audio
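Most people think multimodal AI models must combine several specialized models to handle different data types. The author, however, notes that Alibaba's HappyHorse uses a unified architecture for all modalities, challenging the industry consensus that multimodal AI requires modular design. Such an architecture may represent a paradigm shift in model design, suggesting future multimodal systems will be more integrated than modular.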
-
OpenAI's Sora web and app experiences were discontinued on April 26, with the Sora API set to follow on September 24. The shutdown came after the product proved financially untenable: Sora cost roughly $1 million per day to operate but generated only about $2.1 million in total revenue
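Most people think top AI models should be commercially viable. The author, however, shows that even Sora, OpenAI's flagship video-generation product, failed because it was financially unsustainable. The commercial challenges in AI are therefore more severe than commonly recognized. Technical prowess does not translate directly into commercial success, which challenges the belief that technical leadership inevitably brings market success.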
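Putting the two reported Sora figures side by side gives a rough sense of scale. This is only a back-of-the-envelope comparison, because the highlight does not say over what period the $2.1 million in total revenue was earned.

```latex
\frac{\$2.1\ \text{million (total revenue)}}{\$1\ \text{million per day (operating cost)}} \approx 2.1\ \text{days}
```

In other words, Sora's entire reported revenue covered only about two days of its reported operating cost.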
-
-
www.wired.com
-
Only the iPhone Air, iPhone 17 Pro, and the iPhone 17 Max will have all the fixings, like more varied voice options. As for the rest of the lineup: Every iPhone 16 and iPhone 17 model will be able to run the new Siri, while only the iPhone 15 Pro and Pro Max will be compatible.
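Most people expect Apple to bring full AI features to all compatible devices through software updates. The author, however, points out that Apple is restricting Siri AI's full feature set to specific high-end models. This departs from Apple's tradition of giving older devices new features through software updates. The strategy suggests AI features may be tightly bound to hardware limits rather than being a pure software upgrade.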
-
At WWDC 2026, Apple repeatedly referenced its privacy-preserving approach to Siri AI. As part of the company's Private Cloud Compute, Apple claims it doesn't store data from users and only pulls from it when you ask Siri a question.
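Most people assume that AI services from big tech companies inevitably collect and store user data to improve their products. The author, however, notes Apple's claim that Siri AI uses a privacy-preserving design that accesses data only when the user asks a question. This challenges the AI industry's widespread reliance on data collection. It suggests Apple may have found a model that delivers AI features while protecting privacy.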
-
Unlike the ChatGPT or Claude app, Siri AI is woven right into the iPhone, so it's even more ready to go beyond answering questions and start automating more aspects of the user experience.
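Most people think integrated AI assistants like Siri will face fierce competition from standalone AI apps like ChatGPT. The author, however, argues that Siri's deep integration may let it surpass those apps in automating the user experience. This challenges current trends in AI app development and suggests OS-level AI integration may be more valuable than standalone apps.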
-
-
-
Do you feel that the risks to an event like this are seriously compounded with the progress being made towards fully functional quantum computing?
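The commenter asks whether progress in quantum computing could compound AI safety risks. This technical intersection deserves closer study: where quantum computing and AI meet, and what new risks and challenges their combination could bring. The scientific basis and plausibility of the claim also need to be assessed.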
-
I have worked in AI on clinical research trials and can see (even from my area in biology based AI research) that the world must not have a Chernobyl moment.
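The commenter mentions AI applications in clinical research and stresses the importance of avoiding a "Chernobyl moment." This view deserves closer attention, especially regarding AI in medicine and the related safety considerations. The concrete applications and potential risks of AI in biomedical research also need to be assessed.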
-
The AI arms race between China and the US has researchers on both sides worried about a "Chernobyl moment."
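This is an important core claim, implying that US-China competition in AI could have catastrophic consequences. The accuracy of the metaphor should be checked, as should any concrete evidence that researchers on both sides are in fact worried. It would also help to pin down what a "Chernobyl moment" means for AI and what its potential risks are.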
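Most people think US-China AI competition is a zero-sum game in which one side's lead means the other falls behind. The author, however, suggests that Chinese and American AI experts share concerns about AI getting out of control. That implies potential room for cooperation on AI safety rather than pure confrontation, which challenges conventional geopolitical thinking.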
-
-
techcrunch.com
-
The cutbacks take place not long after Accenture threatened that employees would 'risk losing out on promotions' if they didn't use AI, 404 writes.
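This is useful background showing Accenture's contradictory stance on AI use. The company moved from threatening that not using AI would hurt promotions to restricting AI use, which reflects companies reassessing AI's value. The timing and reasons for the shift merit further investigation, as does whether it is an industry-wide trend.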
-
The cost of tokens has thrown into doubt the AI business model — as evidenced by what's being called the 'AI selloff' which has battered some AI-dependent businesses the last few days, especially memory chip makers.
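This is an important claim about a market trend, linking the cost of AI tokens to the AI business model and to stock-market performance. The term "AI selloff" and its effect on memory chip makers need more market data to back them up. The claim reflects the challenges facing AI commercialization, and the breadth and depth of the trend merit further study.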
-
The AI industry has reached the stage where it can't just be exciting and new anymore. It has to prove its worth.
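Most people think AI is still in a phase of innovation and exploration, focused on technical breakthroughs and new applications. The author, however, argues that the industry is past the point where novelty and excitement alone attract investment, and it must now prove its actual value. This challenges the tech industry's common "grow first, profit later" model.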
-
The cost of tokens has thrown into doubt the AI business model — as evidenced by what's being called the 'AI selloff' which has battered some AI-dependent businesses the last few days, especially memory chip makers.
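Most people think AI will create new business models and enormous commercial value. The author, however, argues that token costs have shaken the viability of the AI business model and have even driven down the stocks of AI-related companies. This stands in sharp contrast to the market's generally optimistic view of AI.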
-
We now appear to be entering the era of token rationing.
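Most people think AI will keep expanding its reach and that companies will rely on it more and more. The author, however, argues that we are entering an era of "token rationing," contrary to the mainstream view that AI use will expand without limit. Such rationing reflects companies reassessing AI's cost-effectiveness.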
-
-
www.cnbc.com
-
The letter lands two months after the White House Office of Science and Technology Policy issued a memorandum that pledged to help AI companies detect and coordinate against industrial-scale distillation.
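This sentence provides important policy context, showing that the event took place in a specific policy environment. It is worth examining what the memorandum says, how it has been implemented, and how it shaped the actions of Anthropic and Alibaba. The episode touches on the interplay between government policy and tech-industry practice and merits closer study.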
-
Anthropic said operators affiliated with Alibaba and its AI lab carried out 28.8 million exchanges with its models using roughly 25,000 fraudulent accounts between April 22 and June 5.
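This is a specific quantitative claim involving large-scale account activity and data exchange. The figures need verification on three points: how "fraudulent accounts" are defined, what exactly the 28.8 million exchanges consisted of, and how Anthropic tracked this activity. These data are essential for judging the scale and severity of the incident.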
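As a rough sense of scale, the sketch below derives per-account and per-day rates from the three figures in the highlight. Two assumptions apply. First, the year is taken as 2026; it does not affect the count, since no leap day falls between April 22 and June 5. Second, the window is counted as a plain date difference (44 days), so an inclusive count would give 45. The averages say nothing about how activity was actually distributed across accounts.

```python
from datetime import date

# Figures as quoted in the CNBC highlight above.
EXCHANGES = 28_800_000
ACCOUNTS = 25_000

# Assumption: window runs April 22 to June 5, 2026 (year does not change the count).
start = date(2026, 4, 22)
end = date(2026, 6, 5)
days = (end - start).days  # plain difference, not inclusive: 44 days

per_account = EXCHANGES / ACCOUNTS        # average exchanges per account over the window
per_day = EXCHANGES / days                # average exchanges per day, all accounts combined
per_account_per_day = per_account / days  # average daily exchanges per account

print(f"window: {days} days")
print(f"per account: {per_account:,.0f}")
print(f"per day (all accounts): {per_day:,.0f}")
print(f"per account per day: {per_account_per_day:.1f}")
```

On these assumptions, each account averaged about 1,152 exchanges, or roughly 26 per day. Across all accounts, that comes to about 655,000 exchanges per day.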
-
Anthropic sent a letter to U.S. officials accusing Alibaba of 'brazenly' and 'illicitly' attempting to extract its AI capabilities.
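This is an important factual claim that needs verification, since it involves accusations between two major tech companies. Strong wording like "brazenly" and "illicitly" shows how serious Anthropic's accusation is, and it needs independent evidence. The authenticity of the letter, its specific allegations, and any corroborating third-party evidence should be checked.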
-
-
techcrunch.com
-
Last week, legendary AI researcher Noam Shazeer announced that he was leaving Google for OpenAI. Shazeer had been at Google since 2000, save for the three years he spent building his controversial chatbot startup, Character.AI.
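Most people would expect a legendary AI researcher like Noam Shazeer to stay at Google long term, especially given his 23 years with the company. Yet the author notes that he is leaving to join OpenAI. This challenges the common belief that loyalty and long service are rewarded more at big tech companies.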
-
-
blog.google
-
Gemini already excels at function calling and using built-in tools like Search and Maps grounding. With built-in computer use capability, developers can now use 3.5 Flash to reliably build custom agents that can see, reason and take action across browser, mobile and desktop environments.
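Most people think AI agents need specialized models and architectures to handle cross-platform tasks. The author, however, argues that integrating computer use into an existing model is enough. This challenges the view that building complex AI agents requires a complete system redesign and highlights the potential of extending existing models.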
-
Previously only available as a standalone Gemini 2.5 computer use model, computer use is now integrated natively in the main Gemini Flash model.
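Most people think advanced AI features should ship as standalone modules to ensure the best performance and control. The author, however, argues that integrating computer use directly into the main model delivers better performance. This challenges the mainstream practice of modular design in AI development.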
-
Computer use is now a built-in tool supported in Gemini 3.5 Flash, delivering our best performance yet for agentic computer use tasks.
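Most people think AI models need a dedicated computer-use capability to perform complex tasks. The author, however, argues that this capability can now be built into the main model as a tool, since 3.5 Flash can already reliably power cross-platform agents. This challenges the traditional idea that AI needs a separate module for computer interaction.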
-
-
glassmanlab.seas.harvard.edu
-
Researchers have done rich historical investigations of individual notations (e.g., [3, 70, 74, 76, 115]), but the more general mechanisms and patterns through which new notations are created and formalized are less understood.
-
We began with a diverse set of notations across five disciplines—music, dance, chemistry, physics, and computer programming—that had prior historical literature to draw upon in our initial analysis.
-
Since histories of specific notations tends to miss detailed, direct observations around the initial creation process, we complement this "macro" analysis with occasional references to experiment-based literature from experimental semiotics, communication theory, and cognitive science into how people use notations to ground communication, largely in lab studies.
-
we conducted a comparative historical analysis of the development of different notations which individually have been documented in prior literature. Specifically, we conduct a parallel comparative history which "seek[s] above all to demonstrate that a theory similarly holds good from case to case... [and where] differences among the cases are primarily contextual particularities against which to highlight the generality of the [theorized] processes"
-
Studying software teams, Cherubini et al. [34] found a "tendency to adopt informal, ad-hoc notations" and a "limited adherence to standards of any sort."
-
Studies are also conducted on various existing notating practices, usually in specific domains (e.g., how programmers draw diagrams to communicate ideas [34, 62, 63]).
-
What seem today as obvious notations often have relatively short histories: for instance, arrows in diagrams emerged around the 18th century.
-
Notations are deployed and embedded throughout the process of HCI and software development.
One or more sentences contextualizing the current work with typically uncited statements about the past.
-
Almost everything we do with computers involves notations.
One or more sentences contextualizing the current work with typically uncited statements about the past.
-
These informal interactions can then lead to formal representations, but depend upon pre-existing formalisms known to both humans and AI.
-
Seemingly 'obvious' notations to academics are also not obvious to everyone: e.g., about one-third of the U.S. and German populations have low literacy in reading data visualizations [51].
-
As current AI technologies rely upon, reproduce, and amplify established, dominant, already-formalized abstractions and notations in order to function
-
Many notations are culturally learned and inherited.
One or more sentences contextualizing the current work with typically uncited statements about the past.
-
From our analysis, we derive a set of initial implications for the design of future systems that create new abstractions (Section 5), including that notations primarily originate through linking metaphors and most often in a social—rather than a technical—context, and that notation design decisions around what to include as "meaningful" (and thus what to exclude) are often left implicit by inventors, but could be made explicit and become manipulable objects through reification [10].
-
Our work contributes to a longstanding dream of dynamic abstractions in HCI, where users can dynamically communicate and express themselves through notations (interfaces) that they are most comfortable with at the moment of expression, beyond ones predefined by developers [96, 143, 144, 148, 149].
-
Here we present suggestions for system designers, with concrete examples inspired by our patterns. These are just some interesting ideas that came to mind, rather than an exhaustive list.
-
Alongside the social stages of notation development above are three functional stages that emerge from reflection upon our analysis—descriptive, generative, and evaluative stages (borrowing terminology from Generative Theories of Interaction [11])
-
Our historical analysis suggests that, cognitively and socially, a notation proceeds by: (1) Enumerating dimensions of meaningful variation in the target domain, which proliferate as more situations are encountered or considered (whether by inventors or users) (2) Mapping dimensions of meaningful variation to perceptual channels of representation (3) Designing the notation to leverage perceptual affordances by visual analogy to embodied transformations like pouring cups or rotating shapes, and ensuring these "natural" manipulations hold meaning in the target domain
-
These stages form a spectrum and are not rigid boundaries. We clustered patterns into the most relevant stage for ease of presentation; however, patterns can be applicable across stages.
-
Our review identified many empirical patterns in the notation development process. We state each pattern, briefly describe it, and provide examples.
-
Our analysis identifies 33 patterns of how notations are created, evolved, and formalized over time, which are largely shared across histories and loosely categorized into three social stages of development (invention/incubation, dispersion/divergence, and institutionalization/sanctification) and three functional stages (descriptive, generative, and evaluative).
-
What about novel formalisms and notations? How are new abstractions created, evolved, and incrementally formalized over time—and how might new systems, in turn, be explicitly designed to support these processes?
-
How might we co-create a new notation with a machine, and thereafter communicate through that notation, even share out the notation to broader communities?
-
While current AI systems support "horizontal" translations from informal ideas to established notations, how should we ensure that the "vertical" process of creation—new notations, new abstractions—is also supported?
-
How do humans ultimately develop new notations, new formalisms, and new abstractions, that they use to communicate with machines and each other?
-
The use of notation happens everyday in small ways, e.g., whenever people work together over a whiteboard or paper towards a joint objective. People jot down X's, boxes and arrows to stand-for concepts they are working through.
-
Human-computer interactions have historically been mediated by formally-defined structures—such as command-line interfaces, graphical user interfaces, and programming languages—that provide an unambiguous mapping to an underlying formal model.
-
-
dl.acm.org
-
SDT broadly differentiates three types of motivation [157]: Intrinsic motivation denotes activity pursued for its inherently interesting or enjoyable qualities. Extrinsic motivation refers to activity pursued for a separable outcome. Amotivation denotes the absence of intentional motivation, where a person may no longer be aware why they pursue an activity.
-
Basic psychological needs theory (BPNT) posits three basic psychological needs that energise organismic processes: competence, the feeling of having an effect; autonomy, a sense that actions are self-endorsed and performed willingly; and relatedness, a sense of reciprocal care, value, and belonging in relation to other social figures and collectives [158].
-
SDT is broadly organised into six mini-theories, whose underlying concepts are continuously developed, critiqued, and revised (e.g., [186, 190, 191]).
-
At its core, SDT is a scientific theory [163], in that it contains a number of empirically-testable propositions [199] that generalise across varied contexts, which serve to explain and predict the impact of certain events on motivation and wellbeing.
-
SDT is a psychological macro-theory of human motivation, growth, and wellbeing [47, 48, 163] that characterises humans as fundamentally active organisms.
-
Self-Determination Theory (SDT), a major psychological theory of human motivation, has become increasingly popular in Human-Computer Interaction (HCI) research on games and play.
-
-
dl.acm.org
-
To our knowledge, the first SDT research involving videogames [18] was conducted shortly after Deci's original formulation of CET [129] and investigated whether extrinsic rewards would reduce intrinsic motivation even for 'highly intrinsically motivating' activities such as videogame play. Videogames' intrinsically motivating qualities were also examined in early research on learning [e.g., 351]; however, focused examination of other core SDT concepts such as need satisfaction largely began much later [365].
-
Research on games and play in HCI (henceforth HCI games research), however, has continued to employ broad psychological theories as foundational work [417, 556]. One prominent example can be seen in self-determination theory (SDT) [481, 483], an influential theory of human motivation, which has provided HCI games research with propositions and concepts that can help explain motivational and experiential qualities of games and game-adjacent systems (e.g., gamification).
-
Psychological concepts and models have long been employed in human–computer interaction (HCI) to theorise the human user [88]. However, early applications of cognitive psychological theory did not develop into a coherent foundation of knowledge about human factors [89, 109, 455]—circumstances that Rogers [456, p. 22] attribute to "the stark differences between a controlled lab setting and the messy real world setting" for which interactive artefacts and systems are designed. The deployment of broad theory in HCI has subsequently declined in the intervening years [455, 456], and this sporadic progress in theory development in domains such as usability and user experience (UX) has been identified as a cause for concern [249, 314].
-
-
techstackups.com
-
GLM-5.2 vs Claude Opus
- Overview of GLM-5.2: It is Z.ai's latest flagship model, released with fully open weights under the permissive MIT license. It features a usable 1-million-token context window and dynamic capability routing via two thinking effort levels (High and Max).
- Core Limitations: GLM-5.2 is strictly text-only and lacks multimodal capabilities. It cannot process or analyze visuals, screenshots, or user interface states natively.
- Pricing Advantage: GLM-5.2 offers a substantial price reduction compared to top proprietary engines. Its API is priced at $1.40 per million input tokens and $4.40 per million output tokens, making its output generation over 5x cheaper than Claude Opus 4.8 ($5 input / $25 output).
- Head-to-Head Testing (WebGL Game from Scratch): Both models were prompted to build a third-person 3D platformer game in raw WebGL without utilizing external 3D engine libraries (such as Three.js).
- Claude Opus 4.8 Execution: Completed the build in 33 minutes and 30 seconds using ~217k output tokens ($21.92 estimated cost). It successfully implemented correct camera controllers, textures, animations, and valid win conditions.
- GLM-5.2 Execution: Took 1 hour, 10 minutes, and 40 seconds using ~131k output tokens ($5.39 real billed cost). While it successfully coded advanced mechanics like spring launch velocity, it introduced basic structural bugs—such as rendering the player backwards, omitting character textures, and ignoring win states.
- The Multimodal Verification Edge: Claude Opus leveraged its vision to inspect automated screenshots of the game, spotting and cleaning up debug overlays prior to completion. GLM-5.2 had to rely on a fallback script that sampled raw pixel colors; it verified the existence of the correct color palette but missed catastrophic visual rendering and layout bugs.
- Benchmark Performance: Official metrics place GLM-5.2 directly between Claude Opus 4.7 and 4.8. It trails Opus 4.8 on multi-file reasoning, repository-level debugging, and complex software architectures (such as SWE-Marathon and DeepSWE), but matches or exceeds frontier models on core code generation, tool use (MCP-Atlas), and math benchmarks (AIME 2026).
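The pricing figures above are per-token arithmetic. What follows is a minimal sketch that uses only the per-million-token prices quoted in the summary; the `Pricing` class and the constant names are illustrative. The article gives only output token counts for the WebGL runs, so the output-only figures are lower bounds on the billed totals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """API prices in USD per one million tokens."""
    input_per_m: float
    output_per_m: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Cost in USD for the given token counts."""
        return (input_tokens * self.input_per_m + output_tokens * self.output_per_m) / 1_000_000


# Prices as quoted in the summary above.
GLM_5_2 = Pricing(input_per_m=1.40, output_per_m=4.40)
OPUS_4_8 = Pricing(input_per_m=5.00, output_per_m=25.00)

# Output-price ratio: the basis of the "over 5x cheaper" claim.
output_ratio = OPUS_4_8.output_per_m / GLM_5_2.output_per_m

# Output-only cost of the WebGL runs. Input token counts were not reported,
# so these are lower bounds on the billed totals ($21.92 and $5.39).
opus_output_only = OPUS_4_8.cost(input_tokens=0, output_tokens=217_000)
glm_output_only = GLM_5_2.cost(input_tokens=0, output_tokens=131_000)

print(f"output price ratio: {output_ratio:.2f}x")
print(f"Opus 4.8, output tokens only: ${opus_output_only:.2f}")
print(f"GLM-5.2, output tokens only: ${glm_output_only:.2f}")
```

Output tokens alone come to roughly a quarter of Opus's $21.92 and a tenth of GLM-5.2's $5.39. Input tokens presumably account for most of the rest in agentic runs like these, so the token mix says more about total cost than list prices do.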
Hacker News Discussion
- Orchestration and Tool Selection Over Model Scale: Commenters point out that the orchestration layer is becoming the primary differentiator in production AI. The core challenge for modern engineering agents is no longer raw token intelligence, but the ability to correctly navigate real-world toolchains and evaluate responses within complex environments.
- Shift from Mainframe to PC Era in AI: The discussion highlights an architectural shift from monolithic central cloud APIs toward decentralized execution. Users emphasize that open-weight deployments give developers long-term vendor optionality and structural independence from platform deprecations or policy shifts.
- High Compute and Output Latency Overhead: Multiple engineers note that while GLM-5.2 is remarkably smart for an open-weight model, it is highly token-hungry. Its extended reasoning traces can consume over 40k tokens and multiple minutes of thinking before outputting files, making inference speed an ongoing optimization bottleneck.
- The Practical Value of Local and Managed Hosting: The community highlights that having an MIT-licensed model at this tier eliminates vendor lock-in risks. For developers without large on-premise hardware setups (such as multi-H100 configurations) to serve a 756B-parameter model, cost-effective managed endpoints like OpenRouter offer a practical balance of large savings and immediate API access.
-
-
forsal.pl
-
AI will drive the Polish economy, but there are costs too. Well over a quarter of a million people could lose their jobs
The World Bank forecasts that AI could raise Poland's GDP by 12% by 2035, while at the same time cutting employment by as many as 350,000 jobs.
The biggest gains are expected in IT and construction (growth of up to 25%). The financial sector may grow economically, but employment in it could fall by 25%. Programmers and the IT industry may also see fewer jobs. Construction could gain about 20% more jobs.
If Poles are unwilling to change professions, as many as 350,000 people will lose their jobs. With high worker mobility, the models put the job loss at only 3,000.
The state budget will feel the changes: revenue from PIT (personal income tax) and ZUS (social security) contributions will fall, while revenue from CIT (corporate income tax) and VAT will rise.
-
-
www.garfield.law
-
Garfield, the AI 'lawyer' service mentioned in [[HR consultant wins English court case using AI lawyer in apparent legal first]]
Most of it is sending a reminder, and then a letter before taking legal action. Both can be automated, mostly already are, and don't need AI. So what remains is claims of:
- starting court proceedings,
- hiring an actual lawyer for representation in court,
- suggesting how to deal with counterclaims.
Only the last item seems to me to actually have something to it.
-
-
www.theguardian.com
-
Description of how AI 'won' a court case. A somewhat messy description of the actual case and of AI's role in it. Says the AI is a commercially available service, Garfield, that was authorized for claims up to 10k.
-
-
-
Via [[Frank Meeuwsen p]] - [ ] return #openweb #pkm #writing
Much to unpack here, and it is convoluted. K generation, e.g., when K needs an observer. Or an epistemological centipede when no original input remains.
-
-
techcrunch.com
-
Anthropic has not had the best relationship with the Trump administration in a way that stands apart from the other leading AI labs
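Most people assume the Trump administration treats all AI labs the same, but the author points out that Anthropic's relationship with the Trump administration is especially strained, unlike those of the other leading AI labs.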
-
-
-
This historic deployment for OpenAI is particularly significant because Samsung Electronics, a global leader in technology and manufacturing, is embracing AI not as a tool limited to certain teams or functions, but as a core platform for improving how employees around the world work and innovate.
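This quote stresses that Samsung Electronics is adopting AI not merely as a tool but as a core platform, which will greatly advance how employees around the world work and innovate.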
-
-
www.wheresyoured.at
-
AI Is Slowing Down
-
Unsustainable Revenue Requirements and Financial Imbalance:
- The AI industry is facing a harsh economic reality driven by aggressive over-investment in data center construction and massive compute commitments.
- To achieve baseline solvency, cover soaring operational expenses, and service its massive debt burdens, the AI sector as a whole must generate an astronomical $2 trillion to $3 trillion in annual revenue by 2030.
-
Severe Debt Pressures on Tech Giants (Hyperscalers):
- Major AI labs and hyperscalers (such as Microsoft, Google, and Meta) find themselves locked in a capital-intensive infrastructure arms race.
- To sustain this frantic buildout of computational capacity, these corporations are under continuous pressure to issue hundreds of billions of dollars in debt or flood the market with massive equity, creating significant systemic risk if monetization fails to materialize.
-
Extreme Disconnect Between Compute Supply and Real Demand:
- There is a staggering gap between the infrastructure being built and actual market consumption; current global demand for AI compute sits below $100 billion.
- Driven by their staggering long-term compute liabilities, frontline entities like OpenAI and Anthropic face an incredibly steep uphill battle, needing to scale their individual monthly revenues to at least $10 billion each by early 2028 just to remain solvent.
-
Dangerous Market Concentration and Lack of Diversification:
- The commercial generative AI landscape is dangerously centralized, with just two companies—OpenAI and Anthropic—capturing roughly 89% of all startup revenue in the sector.
- This extreme consolidation reveals a critical lack of broad, diversified enterprise demand across the wider economy, meaning the massive server infrastructure being deployed relies almost entirely on the survival and growth of a tiny handful of players.
-
Corporate Cost-Cutting and Strict Spending Caps by CFOs:
- Initial corporate enthusiasm for AI integration is stalling as enterprises encounter the harsh realities of variable pricing.
- As major AI vendors transitioned to usage-based token billing, companies like Uber, T-Mobile, and Brex experienced a severe lack of cost visibility; this has prompted CFOs to step in, mandate strict budget caps, and actively scale back their AI consumption to protect their bottom lines.
-
-
-
arstechnica.com
-
Leaked financial docs show OpenAI is losing billions of dollars a year
- Massive Net Losses: In 2025, OpenAI generated $13.07 billion in revenue but racked up $34 billion in total costs and expenses, resulting in an operating loss of $20.92 billion.
- One-Time Accounting Impact: Due to its transition from a non-profit to a for-profit entity, the company recorded a $41.55 billion loss from fair value changes in convertible interests and warrant liabilities. This brought the final net loss attributable to OpenAI to $38.53 billion.
- Year-over-Year Trajectory: Expenses and losses grew sharply compared to 2024, when OpenAI brought in $3.7 billion in revenue against $12.48 billion in total costs, yielding a net loss of $5.09 billion.
- Core Expense Breakdown (2025):
- Research and Development (R&D): $19.18 billion (up from $7.81 billion in 2024).
- Cost of Revenue: $7.5 billion (up from $2.65 billion in 2024).
- Sales and Marketing: $5.73 billion (up from $1.11 billion in 2024).
- General and Administrative: $1.57 billion.
- Strategic Capital Flow & Microsoft Relationship: OpenAI paid Microsoft $17.2 billion in service fees during 2025 ($10.59 billion for R&D/model training and $6.047 billion for computing cost of revenue). By the end of 2025, OpenAI still had a remaining liability of $3.64 billion to Microsoft.
- Inbound Funding: Strategic partners provided substantial inflows; OpenAI received $867 million from SoftBank and $303 million from Microsoft in 2025.
- Remaining Cushion: As of the close of 2025, OpenAI held slightly over $50 billion in total assets, with nearly half of that cushion (~$25 billion) maintained as liquid cash reserves.
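The line items can be cross-checked with simple arithmetic. What follows is a minimal sketch that uses only the figures quoted above. The four 2025 categories sum to about $33.98 billion, which is consistent with the roughly $34 billion total. The implied operating loss matches the reported $20.92 billion to within rounding. The multiples show the scale of the year-over-year growth: roughly 2.5x to 5x for each expense line that has a 2024 figure, and about 3.5x for revenue.

```python
# Figures in billions of USD, as quoted in the summary above.
revenue = {2024: 3.7, 2025: 13.07}
expenses_2025 = {
    "research_and_development": 19.18,
    "cost_of_revenue": 7.5,
    "sales_and_marketing": 5.73,
    "general_and_administrative": 1.57,
}
expenses_2024 = {
    "research_and_development": 7.81,
    "cost_of_revenue": 2.65,
    "sales_and_marketing": 1.11,
}

itemized_total = sum(expenses_2025.values())
implied_operating_loss = itemized_total - revenue[2025]
print(f"itemized 2025 expenses: {itemized_total:.2f}B (reported total: ~34B)")
print(f"implied operating loss: {implied_operating_loss:.2f}B (reported: 20.92B)")

# Year-over-year multiples for the items that have a 2024 figure.
growth = {item: expenses_2025[item] / value for item, value in expenses_2024.items()}
growth["revenue"] = revenue[2025] / revenue[2024]
for item, multiple in growth.items():
    print(f"{item}: {multiple:.1f}x")
```

The net loss figures do not reconcile this simply, because they include the one-time fair-value charge and other items the summary does not itemize.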
Hacker News Discussion
- R&D vs. Inference Costs: Commenters debate whether OpenAI can safely shift its massive R&D expenditure toward minimizing inference costs. While cheaper models like DeepSeek are heavily praised for personal and developer productivity, some argue stopping frontier model research means losing the structural race entirely.
- Diminishing Returns on Model Power: Users question whether a marginally smarter model justifies an exponentially higher cost. A central discussion point revolves around the financial viability of paying massive premiums for enterprise-tier models compared to utilizing low-cost API alternatives.
- The Math of Productivity Upgrades: A highly debated calculation suggests that even a 5% boost in productivity for a high-earning employee justifies hundreds of dollars in monthly subscriptions. However, critics counter that the financial surplus of that productivity is captured by companies and owners, rather than resulting in worker wage increases.
- The Path to Monetization: The consensus leans toward enterprise seat monetization (charging upwards of $2,000/month per corporate professional) and securing multi-billion dollar government contracts as the only viable business models. The inevitable integration of embedded or covert advertisements for free tiers is also viewed as highly likely.
- AGI as a Pseudo-Religious Goal: Several participants view Silicon Valley's relentless capitalization of unprofitable AI models as an irrational, faith-based pursuit of AGI (Artificial General Intelligence), comparing the narrative to religious prophecies.
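The productivity argument in the thread is back-of-the-envelope arithmetic. What follows is a minimal sketch that assumes the gain is valued at what the employee costs the employer. The $200,000 annual cost is a hypothetical figure, not one taken from the discussion.

```python
def monthly_value_of_gain(annual_employee_cost: float, productivity_gain: float) -> float:
    """Monthly value of a fractional productivity gain, priced at the employee's annual cost."""
    return annual_employee_cost * productivity_gain / 12


def break_even_gain(monthly_price: float, annual_employee_cost: float) -> float:
    """Productivity gain at which a monthly subscription pays for itself."""
    return monthly_price * 12 / annual_employee_cost


ANNUAL_COST = 200_000  # hypothetical employee cost in USD per year

print(f"5% gain: ${monthly_value_of_gain(ANNUAL_COST, 0.05):,.0f} per month")
print(f"break-even gain for a $2,000/month seat: {break_even_gain(2_000, ANNUAL_COST):.0%}")
```

Under these assumptions, a 5% gain is worth roughly $830 per month, which is consistent with the "hundreds of dollars" claim. A $2,000 seat would need about a 12% gain to break even. The critics' point is about who captures that value, and this calculation deliberately leaves that question out.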
-
- Jun 2026
-
dgi6ph9bl5lv1x.archive.is
-
[[Felienne Hermans p]] on Rutger Bregman on AI. Key points:
- He misrepresents what Chomsky (on morality) and Bender say about AI. Chomsky says it cannot have morality; Bender describes the projection by which we will inevitably read AI output as the result of thinking.
- He embraces the belief that AI is a path to universal basic wealth, i.e. techbro LT/EA thinking.
- His School for Moral Ambition positions 'good' companies (cf. social impact companies) as the way forward. It dismisses other paths to change (demonstrations, politics, non-capitalist approaches) so as to stay within the assumptions of the current eco/tech/pol system.

'Links bashen, rechts cashen' (bash the left, cash in on the right) is her verdict. I think moving to the US means he doesn't even notice he is turning.
-
-
www.politico.eu
-
"must burn planet for data centers"
-
-
rutgerbregman.substack.com
-
Rutger Bregman on AI hype
- [ ] return
-
-
www.cusp.ai
-
Prof. Geoffrey Hinton
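Hinton and LeCun both appear on the advisory list: it is rare for the two 'godfathers of AI' to endorse the same company. Hinton has kept issuing AI safety warnings in recent years. Choosing to back a field with clearly positive applications, such as AI for materials, is itself a statement of values: using scientific discovery to counterbalance the AI-risk narrative.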
-
Prof. Max Welling
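Max Welling as CTO is a telling choice. Welling is a leading figure behind graph neural networks (GNNs) and equivariant neural networks (e.g., SE(3)-Transformers), and molecular and crystal structures naturally have symmetries and graph structure. His research background is almost tailor-made for molecular property prediction. It also brings more AI-native technical depth than a CTO with a purely cheminformatics background would.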
-
While nature took billions of years to perfect molecules, we are harnessing AI to unlock trillion-dollar materials breakthroughs in months, not millennia.
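cusp.ai's core narrative is compressing eons of evolution into months of breakthroughs. The sentence captures the ultimate promise of AI for science: not assisting scientists, but replacing evolutionary time itself. 'Months, not millennia' is a kind of time folding, much like AlphaFold's impact on protein folding, only with materials as the target.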
Tags
- Max Welling
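- AI-accelerated science
- AI for Good
- materials discovery
- time compression
- Geoffrey Hinton
- advisors
- endorsement signal
- graph neural networks
- CTO
- AI for Science
- equivariant neural networks
- godfathers of AI
- AI-native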
-
-
cloud.google.com
-
these atoms of knowledge live in a variety of highly fragmented systems
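This describes the reality in most organizations. The genuinely useful contextual knowledge (what a table means, how a metric is defined, runbooks, the join path between two systems) is scattered across data-catalog APIs, wikis, code comments, shared folders, and the heads of a few senior engineers. Whenever a new AI agent has to answer a question like 'how do I compute weekly active users from the event stream', it must piece the answer back together from these mutually incompatible fragments. This is a badly underestimated obstacle to deploying AI, and it gets quadratically worse as the number of agents grows.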
-
-
openai.com
-
Chemists found the suggestion both surprising and interesting
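This is one of the most noteworthy details in the whole article. TEMPO is a mild radical oxidant and is usually not an organic chemist's first instinct when thinking about coupling reactions. The AI proposed a hypothesis that human experts found unexpected but plausible. That is the core of research value: not rediscovering the known, but finding connections in the blind spots of human vision within the existing knowledge space. If the AI had merely recombined directions already present in the literature, the result would not be worth publishing.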
-
-
rorytruex.substack.com
-
A key through line of all these tasks is that they are time consuming
Ethan Mollick, in Co-Intelligence, makes the point that part of the signal of any letter of reference is that this person is so good that I'll burn my own time to tell you about them. Does the same "signal" concept apply to peer review and student work? (It's not entirely clear to me it does; evaluation is a different task than recommendation. But I still feel like it's worth asking how we signal value based on our use of time in evaluative processes.)
-
-
www.tomtunguz.com
-
Comparing agentic Qwen3.6 35b to Claude Opus is like a junior with knowledge across the board, that you really need to guide, versus a senior that thinks with you on architecture.
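This analogy neatly explains the difference between local models and advanced cloud AI. Local models are powerful but still need a fair amount of guidance, whereas cloud models like Claude Opus are better at reasoning about architecture on their own. Developers using local models should set realistic expectations and be prepared to provide more guidance.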
-
-
www.tomtunguz.com
-
The nuances of tuning the carburetors & the timing belts of these complex beasts are tasks better assigned to a few vendors to deliver maximum intelligence per dollar & amortize the costs across a broader population.
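The author likens AI systems to complex machinery that needs fine tuning (carburetors and timing belts). He suggests handing this specialized work to a few vendors, to maximize intelligence per dollar and spread the costs. This reflects a trend toward specialization and centralization in AI application development. It matters for startups deciding whether to build their own AI capabilities.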
-
Loops, the critical problem-definition exercise of this era, are hard to design. Systems design is an entire discipline... What is the best way to define a loop so an agentic system improves?
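The author stresses the central role of loop design in AI applications, calling it the critical problem-definition exercise of this era. This reflects the importance of systems design in AI application development, especially designing loops through which an intelligent system keeps improving. Beginners easily overlook this concept, but it is crucial.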
-
AI applications present three new disciplines to master: picking the right models, developing the hill-climbing loop, & evaluating the performance of the system for each company
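The author points out that AI application development differs fundamentally from SaaS and requires mastering three new disciplines: choosing the right models, developing the hill-climbing loop, and evaluating system performance. This is an important shift in thinking for beginners. AI application development calls for a new mindset and skill set, not a simple extension of traditional software development.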
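One way to make the 'hill-climbing loop' concrete is a propose, evaluate, keep cycle. What follows is a minimal sketch that assumes the thing being improved can be represented as a configuration (a prompt, a retrieval setting, a model choice) and that an evaluation returns a single score. `propose`, `evaluate`, and the toy `top_k` example are hypothetical stand-ins. The article does not prescribe an algorithm; greedy hill climbing is simply the plainest instance of the idea.

```python
import random
from typing import Callable, TypeVar

Config = TypeVar("Config")


def hill_climb(
    initial: Config,
    propose: Callable[[Config, random.Random], Config],
    evaluate: Callable[[Config], float],
    iterations: int = 100,
    seed: int = 0,
) -> tuple[Config, float]:
    """Greedy hill climbing: keep a proposed change only if the eval score strictly improves."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    for _ in range(iterations):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Toy usage: tune a hypothetical retrieval depth `top_k` against a made-up eval
# whose score peaks at top_k = 7. In practice `evaluate` would run an eval suite.
def toy_eval(top_k: int) -> float:
    return float(-((top_k - 7) ** 2))


def nudge(top_k: int, rng: random.Random) -> int:
    return max(1, top_k + rng.choice([-1, 1]))


best_k, best_score = hill_climb(1, nudge, toy_eval)
print(best_k, best_score)
```

The quality of the loop depends almost entirely on `evaluate`. If the eval is easy to game, the loop will climb the metric rather than the capability.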
-
the Fable retraction exposed model dependency risk, Satya's thesis defined the learning loop, & Salesforce's $3.6B Fin acquisition priced the harness.
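The author cites three key developments as evidence that AI applications are entering a golden age: the exposure of model dependency risk, a definition of the learning loop, and the market's pricing of the harness. These map onto three important dimensions of AI application development: risk control, strategic consensus, and market validation. They are useful for understanding the current AI application ecosystem.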
-
-
www.wired.com
-
The government believes it has become aware of a method of bypassing, or 'jailbreaking' Fable 5.
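This government claim about the specifics of an AI security vulnerability needs verification. It should be confirmed whether the government really found such a method, how effective it is, and how far its impact reaches. It reflects ongoing challenges in AI security research.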
-
Security experts say that can't be done.
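This is a key technical point, but it lacks specific attribution and evidence. It should be established which security experts hold this view, what their professional backgrounds are, and whether specific research or cases back the claim. This bears on whether such AI security measures are feasible in practice.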
-
Trump administration officials tell WIRED that if Anthropic wants to rerelease Fable 5, it will need to ensure the model's guardrails can't be circumvented.
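This important factual claim about the Trump administration's specific AI safety requirements needs verification. It should be confirmed whether this is official policy and whether the requirements are reasonable and feasible. It reflects increasingly tense relations between the government and AI companies.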
-
-
www.anthropic.com
-
We believe the government should have the ability to block unsafe deployments, as part of a statutory process that is transparent, fair, clear, and grounded in technical facts.
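This captures Anthropic's core argument: it supports government oversight but demands transparency and fact-based decisions. Worth digging into its earlier public positions on AI regulation to see whether this episode is consistent with its long-standing policy.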
-
We have found that other publicly-available models are able to discover them as well without requiring a bypass.
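Most people treat the vulnerabilities at issue with Fable 5 as a uniquely serious problem. The author argues that other publicly available models can find the same vulnerabilities without any bypass. This challenges the idea that Fable 5 poses a special security risk and implies that the government overreacted.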
-
If this standard was applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers.
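Most people see government safety oversight of AI models as a necessary safeguard. The author argues that applying this standard across the industry would essentially halt new deployments at every frontier model provider. The standard in question is recalling a commercial model because a narrow potential jailbreak was found. This view challenges the prevailing consensus on AI regulation.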
-
We suspect that perfect jailbreak resistance is not currently possible for any model provider.
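Most people assume AI models can be designed to be completely jailbreak-proof. The author argues that perfect jailbreak resistance is currently impossible for any model provider, because the safeguards used across the industry are all vulnerable to non-universal jailbreaks. This argument challenges conventional wisdom in AI safety.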
-
We suspect that perfect jailbreak resistance is not currently possible for any model provider.
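Most people think AI companies should pursue perfect safety protections, but the author admits that perfect protection is impossible. This challenges the expectation in AI safety that companies can fully prevent misuse of their models, and points toward more realistic defensive strategies.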
-
We have found that other publicly-available models are able to discover them as well without requiring a bypass.
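Most people see the discovery of vulnerabilities through an AI model as a serious security problem that requires immediate action. The author argues that other public models can find these vulnerabilities too, which implies that the government overreacted. This challenges the AI-safety consensus that every vulnerability should be treated as a major threat.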
-
-
vickiboykis.com
-
Running local models is good now
- Evolving Quality: Local Large Language Models (LLMs) have achieved major milestones in accuracy, utility, and speed over the past six months, transitioning from simple "personalized Google" documentation lookups to handling localized agentic software development workflows.
- Hardware Requirements: Running larger models effectively requires high-spec hardware (e.g., Apple M-Series with 64 GB+ unified RAM) to maintain an expansive Key-Value (K-V) cache and avoid critical performance degradation.
- Top Performing Architecture: Recent open-weights families, such as Gemma 4 (specifically the gemma-4-26b-a4b and the faster gemma-4-12b-qat), have reached roughly 75% of the accuracy and speed of cloud-hosted frontier API models.
- Agentic Workflows: Local models can now successfully loop and interact with local environments to orchestrate non-trivial tasks like refactoring code, writing unit tests, and bootstrapping full application repositories.
- Secure Execution: Running developer-facing local agents poses local file system security risks, making a decoupled architecture—such as isolating the agent harness inside a containerized Docker Sandbox with restricted shell permissions—an essential security best practice.
- Persistent Ecosystem Bottlenecks: Despite massive progress, challenges remain around slow initial token pre-fill, limited context windows bounded by local hardware constraints, prompt template mismatches on release, and the heavy compute strain that maximizes GPU and RAM workloads.
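The 'restricted shell permissions' part of the sandboxing advice can be illustrated with a command allowlist. The harness checks it before executing anything the agent proposes. What follows is a minimal sketch; the allowed commands are a hypothetical example, and the post does not describe the mechanism it uses.

```python
import shlex

# Hypothetical allowlist; a real harness would tailor this to the project.
ALLOWED_COMMANDS = {"ls", "cat", "grep", "pytest", "git"}
ALLOWED_GIT_SUBCOMMANDS = {"status", "diff", "log"}


def is_command_allowed(command_line: str) -> bool:
    """Return True only for simple commands whose program is on the allowlist."""
    # Reject shell metacharacters outright so the agent cannot chain, pipe, or redirect.
    if any(ch in command_line for ch in ";&|<>`$\n"):
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        return False
    if argv[0] == "git":
        # Read-only git subcommands only.
        return len(argv) > 1 and argv[1] in ALLOWED_GIT_SUBCOMMANDS
    return True


print(is_command_allowed("pytest -q"), is_command_allowed("ls; rm -rf /"))
```

The check is deliberately conservative: it also rejects harmless uses of those characters, such as a `grep` pattern containing `|`. It complements container isolation rather than replacing it. `cat` can still read any path the container exposes, so the Docker sandbox remains the real boundary.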
Hacker News Discussion
- Operational Friction: Many users argue that local models remain painful to run effectively. They note a stark divide between smart but slow dense models (e.g., Qwen 27B, Gemma 31B) and fast but error-prone Mixture of Experts (MoE) models.
- The Quantization Trap: Commenters point out that many users run low-bit quantizations (like 4-bit) to save RAM, which effectively lobotomizes the model's capacity for complex tool calling. Industry recommendations favor a minimum of 5-bit for dense models and 6-bit for MoEs.
- Hardware & Comfort Trade-offs: Running these workloads locally often turns high-end laptops or desktops into loud, hot, power-hungry machines, making the physical development environment uncomfortable.
- Privacy and Data Sovereignty: A heated debate emerged regarding hosted vs. local options. While some demand local setups due to data-collection practices and copyright concerns of major tech providers, others prefer private API gateways or hosted "open model clouds" (like OpenRouter or specialized European hosters like OVH) that guarantee Zero Data Retention (ZDR).
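The hardware and quantization points above come down to a rule of thumb: weight memory is roughly parameters times bits per weight divided by 8 bytes. What follows is a rough sketch. It counts weights only, so the KV cache, activations, and runtime overhead come on top, and real quantization formats add a little extra for scale factors. For Mixture of Experts models, the count that matters here is total parameters, since all experts normally have to be resident even though only a few are active per token. The 26B and 756B sizes come from the model names and the GLM-5.2 discussion in these notes.

```python
def weight_memory_gb(parameters_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 10**9 bytes).

    Excludes the KV cache, activations, runtime overhead, and the small extra
    storage that quantization formats use for scale factors.
    """
    return parameters_billions * bits_per_weight / 8


for bits in (4, 5, 6, 8, 16):
    small = weight_memory_gb(26, bits)
    large = weight_memory_gb(756, bits)
    print(f"{bits:>2}-bit: 26B model ~{small:6.1f} GB, 756B model ~{large:7.1f} GB")
```

At 5-bit, a 26B model's weights take about 16 GB, which leaves room for a large KV cache on a 64 GB machine. A 756B model needs roughly 378 GB even at 4-bit, which is why the GLM-5.2 thread points to multi-GPU servers or managed endpoints.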
-
-
arstechnica.com
-
xAI struck a deal to give Cursor access to its compute infrastructure, foreshadowing similar, larger deals with Anthropic and Google in the future.
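Most people see SpaceX/xAI as an independent, self-reliant competitor in AI. The author implies that they are in fact relying on other companies: testing through small partnerships first, then seeking deals with larger ones. This 'small first, then big' pattern contrasts with SpaceX's usual disruptor image. It suggests a more cautious strategy in AI that depends on external resources.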
-
This is a marriage between two companies that have arguably been falling behind in the AI race.
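Most people see SpaceX and Cursor as leaders in their fields, but the author argues that both have in fact fallen behind in the AI race. SpaceX's Grok chatbot is mired in controversy and lacks a competitive coding model. Cursor has excellent talent and products but cannot match large companies on compute. This 'marriage of underdogs' narrative contrasts sharply with the usual tech-acquisition narratives.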
-
-
huggingface.co
-
By handling the specific invalid behavior instead of rejecting the entire trajectory, this approach helps prevent the training instability and model collapse that can happen when rollouts are abruptly stopped.
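Most people think that when undesirable behavior shows up during AI training, the whole trajectory should be terminated immediately. The authors argue instead for handling the specific invalid behavior rather than rejecting the entire trajectory. This challenges the one-size-fits-all approach in AI training. Finer-grained handling of behavior can prevent training instability and model collapse, and so improve training efficiency.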
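The difference between rejecting a rollout and handling only the bad steps shows up in a toy policy-gradient loss. What follows is a minimal sketch that assumes a REINFORCE-style loss over per-step log-probabilities and a per-step invalid flag. Masking is only one way to 'handle' the behavior; the release note does not say whether GLM-5.2 masks, penalizes, or otherwise treats those steps. The function names are hypothetical. In real training these would be differentiable tensors rather than floats.

```python
def policy_loss(logprobs: list[float], advantages: list[float]) -> float:
    """REINFORCE-style loss: negative mean of log-probability times advantage."""
    return -sum(lp * adv for lp, adv in zip(logprobs, advantages)) / len(logprobs)


def reject_whole_trajectory(logprobs: list[float], advantages: list[float], invalid: list[bool]) -> float:
    """Baseline: any invalid step discards the entire rollout (zero learning signal)."""
    if any(invalid):
        return 0.0
    return policy_loss(logprobs, advantages)


def mask_invalid_steps(logprobs: list[float], advantages: list[float], invalid: list[bool]) -> float:
    """Alternative: drop only the invalid steps and keep learning from the rest."""
    kept = [(lp, adv) for lp, adv, bad in zip(logprobs, advantages, invalid) if not bad]
    if not kept:
        return 0.0
    kept_logprobs, kept_advantages = zip(*kept)
    return policy_loss(list(kept_logprobs), list(kept_advantages))


# Toy rollout: three steps, the middle one flagged invalid (e.g. a malformed tool call).
logprobs, advantages, invalid = [-0.5, -1.0, -2.0], [1.0, 1.0, 1.0], [False, True, False]
print(reject_whole_trajectory(logprobs, advantages, invalid), mask_invalid_steps(logprobs, advantages, invalid))
```

Rejection throws away the two valid steps along with the bad one. Masking keeps their signal and removes only the step that caused the problem.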
-
As a limited-time promotion through the end of September, off-peak usage is billed at 1×. (Peak hours are 14:00–18:00 UTC+8 (Beijing Time) daily).
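Most people expect AI model pricing to depend on model size or performance rather than on when the model is used. The authors treat time-of-day pricing as a sensible strategy. This departs from industry conventions for pricing AI services. It suggests that time-based differentiation can balance compute usage and improve system efficiency.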
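Scheduling batch work around the promotion means converting timestamps to UTC+8. What follows is a minimal sketch. The announcement only states that off-peak usage is billed at 1×, so the peak multiplier is left as a parameter rather than guessed. Treating 18:00 itself as off-peak is an assumption.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # UTC+8, no daylight saving time
PEAK_START_HOUR, PEAK_END_HOUR = 14, 18  # 14:00-18:00 Beijing time; end treated as exclusive (assumption)


def is_peak(moment: datetime) -> bool:
    """True if a timezone-aware datetime falls inside the daily peak window in Beijing time."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    local = moment.astimezone(BEIJING)
    return PEAK_START_HOUR <= local.hour < PEAK_END_HOUR


def billing_multiplier(moment: datetime, peak_multiplier: float) -> float:
    """Off-peak usage is billed at 1x during the promotion; peak_multiplier is a caller-supplied assumption."""
    return peak_multiplier if is_peak(moment) else 1.0


# 06:30 UTC is 14:30 in Beijing, inside the peak window.
print(is_peak(datetime(2026, 6, 1, 6, 30, tzinfo=timezone.utc)))
```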
-
We find that GLM-5.2 shows more potential hacking behavior than GLM-5.1. This makes the verification signal easy to optimize, but fails to actually improve the fundamental capabilities of the model.
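Most people assume that better measured performance always reflects improved capability. The authors find that GLM-5.2 shows more potential hacking behavior, and that this does not actually improve its fundamental capabilities. This challenges the mainstream belief that 'higher benchmark scores always mean a more capable model'. It points to a problem in AI training: over-optimizing metrics at the expense of real capability gains.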
-
On Terminal-Bench 2.1 (81.0) it lands within a few points of Claude Opus 4.8 (85.0) — while staying ahead of Gemini 3.1 Pro.
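Most people believe there is a large gap between open models and top closed models. The authors show GLM-5.2 coming within a few points of Claude Opus 4.8 on the terminal benchmark and staying ahead of Gemini 3.1 Pro. This challenges the industry consensus that closed models are far ahead. Open models can already compete with top commercial models on certain coding tasks.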
-
GLM-5.2 is the highest-ranked open-source model, showing that its 1M context has translated into practical long-horizon delivery capability.
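Most people assume that open models must lag closed models on long-horizon tasks. The authors argue that GLM-5.2, an open model, has achieved practical long-horizon delivery and even beats closed models such as GPT-5.5 on some benchmarks. This challenges the mainstream belief that closed models are necessarily better. Open models can reach commercial-grade performance on certain tasks.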
-
-
www.anthropic.com
-
each prompt the user sends sets off a chain of around 10 actions taken by Claude on average
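This data point shows that each user prompt triggers about 10 Claude actions on average, which illustrates the agent's autonomy and efficiency. The ratio suggests that users only need to give high-level direction while the AI carries out many concrete tasks. The article also mentions tail data: about 2% of sessions average more than 100 actions per prompt, which indicates large variation in usage patterns. The 10:1 action-to-prompt ratio is a key metric for understanding agent productivity. However, the article does not explain how these actions differ in type and quality.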
-
people make about 70% of the planning decisions but only 20% of the execution decisions
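This 70/20 split of decisions shows the division of labor in human-AI collaboration clearly: humans handle the 'what', the AI handles the 'how'. The split indicates that the AI has considerable autonomy at the execution level, which may differ from the human-supervised model people often expect. It supports the article's core argument that AI agents are redefining the division of labor in programming work. However, the article does not detail how 'decisions' are defined and classified, which may affect the accuracy of the figures.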
-
each prompt the user sends sets off a chain of around 10 actions taken by Claude on average
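This data point shows that each user prompt triggers about 10 Claude actions on average, reflecting the AI's autonomy and efficiency. The average masks huge variability: the article notes that about 2% of sessions average more than 100 actions per prompt. Claude can autonomously carry out complex task sequences, but users need to monitor these actions to ensure the results match expectations.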
-
people make about 70% of the planning decisions but only 20% of the execution decisions
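This 70/20 ratio reveals a clear division of labor: humans mainly make planning decisions, and the AI handles concrete execution. The AI is already fairly autonomous in executing tasks but still relies on humans for strategic planning. Compared with similar studies, this data point indicates a relatively high level of human-AI collaboration. That may reflect Claude Code's design philosophy and users' habits.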
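How far an average like '~10 actions per prompt' hides a heavy tail depends partly on how it is computed. What follows is a minimal sketch with made-up session data (not Anthropic's); these notes do not state which averaging the study used, so both methods are shown. The `Session` type and helper names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    prompts: int  # user prompts in the session
    actions: int  # agent actions (tool calls, edits, commands) triggered by those prompts


def pooled_actions_per_prompt(sessions: list[Session]) -> float:
    """Total actions divided by total prompts across all sessions."""
    return sum(s.actions for s in sessions) / sum(s.prompts for s in sessions)


def mean_of_session_ratios(sessions: list[Session]) -> float:
    """Average of each session's own actions-per-prompt ratio."""
    return mean(s.actions / s.prompts for s in sessions)


def tail_share(sessions: list[Session], threshold: float = 100.0) -> float:
    """Fraction of sessions averaging more than `threshold` actions per prompt."""
    return sum(s.actions / s.prompts > threshold for s in sessions) / len(sessions)


# Made-up sample: one long autonomous session among several ordinary ones.
sample = [Session(5, 40), Session(2, 30), Session(1, 150), Session(10, 80)]
print(pooled_actions_per_prompt(sample), mean_of_session_ratios(sample), tail_share(sample))
```

In this toy sample, the single tail session barely moves the pooled ratio but dominates the per-session mean. That is why an average alone says little about how autonomous typical sessions are.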
-
-
techcrunch.com
-
the message is clear: The AI industry isn't immune from U.S. government interference
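Many people may think the cutting-edge nature of AI lets it sidestep traditional regulatory frameworks. The author argues that the government ban sends a clear message: even cutting-edge AI cannot escape government intervention. This runs against the tech industry's common belief that it can regulate itself.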
-
The AI industry isn't immune from U.S. government interference
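Many people consider the AI industry relatively independent of traditional government regulation, but the author states plainly that it is not immune to government interference. This challenges the mainstream narrative of tech-industry autonomy and suggests that AI companies may face government pressure similar to that on traditional industries.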
-
The US government's Anthropic models ban was never about an AI jailbreak
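Most people believe the government banned Anthropic's AI models out of safety concerns, especially fears about jailbreak risk. The author argues that this was not the real reason. This non-consensus view challenges the public's general understanding of government AI regulation.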
-
-
www.theverge.com
-
Restoring global trust in American AI is another thing entirely. No matter how long the shutdown lasts, it shined a light on how fragile access to US frontier AI models is.
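Many people may think America's lead in AI is secure. The author argues that this episode exposed how fragile access to US AI is and may have permanently damaged global trust in American AI. This challenges the assumption that US AI dominance is stable.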
-
Trump may see restricting Mythos and Fable as a matter of national security. But the argument cuts both ways, and with Washington now asking if AI is too important for everyone to have access, other governments are asking whether they can afford for Washington to decide who does.
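Many people may accept that the US restricts AI access for national-security reasons. The author argues that these restrictions lead other countries to question US control over AI and to reassess the risks of relying on American AI. This challenges the legitimacy of the US deciding unilaterally who gets access to AI.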
-
Most governments and businesses cannot come close to matching the scale and resources of frontier labs in the US or China. But sovereign AI does not always mean building the biggest or the most powerful tools.
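The mainstream view holds that AI sovereignty means competing with the US and China across the board. The author argues that real AI sovereignty lies not in replicating US scale but in developing specific capabilities that match national strategic needs. This challenges the consensus that AI development must pursue scale and general capability.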
-
He likened the pullback of Anthropic's models to Iran's blockade of the Strait of Hormuz, with access to AI now a strategic chokepoint for which France must prepare.
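Many people may see AI as a technology product or service. The comparison quoted here treats access to AI as a strategic chokepoint, like the Strait of Hormuz, that countries must prepare for. Likening AI to a geopolitical chokepoint challenges the conventional understanding of what AI is.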
-
But sovereign AI does not always mean building the biggest or the most powerful tools. France's Mistral and Canada's Cohere show that solid efforts can come from outside these countries, even if the models can't stand toe to toe.
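Most people think only countries with US- or China-level scale and resources can build competitive AI models. The author argues that smaller countries can build meaningful AI sovereignty by focusing on specific domains or local needs, even if their models cannot match the most advanced US models in general capability.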
-
-
-
Don't trust large context windows
- Large context windows are divided into a "smart zone" (sharp, attentive model performance) and a "dumb zone" (where attention drops off and the model begins forgetting details).
- The transition into the "dumb zone" typically begins around 100k tokens, regardless of advertised context limits.
- Coding agents quickly burn through tokens during debugging, file reading, and test runs, accelerating the transition into degraded context areas.
- While vendors advertise massive context limits (e.g., 200k to 2M tokens) as a marketing metric, academic studies (like RULER) and empirical reports confirm effective context is much smaller.
- Agent mitigation tools like "auto-compaction" (summarizing history) often trigger too late and create summarized data using a model that is already experiencing performance decay.
- A more reliable alternative is the "breadcrumb approach": manually opening a new session and passing a self-authored specification to keep the context focused in the smart zone.
- Entire agent workflows can be optimized by structuring data around small, modular artifacts (like PRDs, plans, or sub-agent handoffs) to strictly budget the live session context.
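The breadcrumb approach amounts to budgeting the live context: before the budget runs out, restart from a compact, human-written spec. What follows is a minimal sketch that assumes the token count of each turn is known; a real harness would get it from the model's tokenizer or the API's usage fields. The 100k default mirrors the article's rough threshold, and the `Session` and `add_turn` names are hypothetical.

```python
from dataclasses import dataclass, field

SMART_ZONE_BUDGET = 100_000  # tokens: the rough "dumb zone" threshold from the article


@dataclass
class Session:
    spec: str  # self-authored handoff ("breadcrumb") that seeds every fresh session
    spec_tokens: int
    history: list[str] = field(default_factory=list)
    used_tokens: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.used_tokens = self.spec_tokens  # the spec itself occupies context


def add_turn(session: Session, text: str, tokens: int, budget: int = SMART_ZONE_BUDGET) -> Session:
    """Append a turn, first rotating to a fresh spec-seeded session if the budget would be exceeded."""
    if session.used_tokens + tokens > budget:
        # Start over from the spec instead of auto-compacting the long history.
        session = Session(spec=session.spec, spec_tokens=session.spec_tokens)
    session.history.append(text)
    session.used_tokens += tokens
    return session


s = Session(spec="goal, constraints, current plan", spec_tokens=2_000)
s = add_turn(s, "read the relevant files", 60_000)
s = add_turn(s, "run the test suite", 50_000)  # would exceed 100k, so a fresh session starts
print(s.history, s.used_tokens)
```

Rotating instead of auto-compacting means the fresh session is seeded by the spec you wrote yourself. With compaction, the seed would be a summary written by a model already deep in its degraded context.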
Hacker News Discussion
- Erosion of Engineering Rigor: Users expressed deep concern that LLM engineering has devolved into non-deterministic "cargo culting" and "gardening advice" rather than a rigorous, scientific discipline.
- Determinism vs. Flexibility: Systems engineers noted the cognitive friction of using opaque, non-deterministic workflows, though some find immense value in using LLMs strictly as a translation layer from human text into structured, deterministic tool calls.
- Heuristics Over Theory: Many agreed that the rapid iteration cycle of cloud models prevents deep theoretical understanding, forcing developers to rely on empirical heuristics, benchmarking, and structured constraints (like confining inputs) to ensure reliability.
- Architectural Limitations: Commenters speculated that training long-context windows suffers from a data and compute scaling bottleneck, leading to synthetic fine-tuning that trains models to treat early conversational history as noise.
-
-
shawnsmucker.substack.com
-
Please Use AI
-
-
stephen.bochinski.dev
-
AI Coding at Home Without Going Broke
- Transitioning from standard chat interfaces to autonomous, multi-file AI coding agents can cause API token consumption and monthly costs to skyrocket if left unmanaged.
- Including massive, multi-file codebases in every agent prompt rapidly exhausts context windows and sharply inflates the cost of each turn.
- To code at home without going broke, developers should shift to a modular architecture: isolating components, splitting projects into small modules, and relying heavily on mock data layers.
- Restricting the AI's visibility to a single file or a narrowly scoped subdirectory keeps context tokens low, prevents the agent from making sweeping changes across the codebase, and lowers billing.
- Leveraging free or low-cost tier tools to map out full architectural specs and test files before generating implementation code provides rigid constraints that minimize wasted AI loops.
- Developers can significantly curb expenses by opting for deep-context consumer subscription plans (such as $20 to $100 per month tiers) over uncapped pay-as-you-go API keys when executing heavy agent tasks.
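When the agent is scoped to one subdirectory, its context cost is easy to estimate before a run. What follows is a rough sketch. The ratio of about 4 characters per token is a common rule of thumb for English text and code, not an exact tokenizer. `scoped_context` is a hypothetical helper, not a feature of any particular agent.

```python
from pathlib import Path


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate using a characters-per-token heuristic."""
    return int(len(text) / chars_per_token)


def scoped_context(root: Path, subdir: str, suffixes: tuple[str, ...] = (".py",)) -> tuple[list[Path], int]:
    """Collect only files under root/subdir with the given suffixes and estimate their combined tokens."""
    files = sorted(p for p in (root / subdir).rglob("*") if p.is_file() and p.suffix in suffixes)
    total = sum(estimate_tokens(p.read_text(encoding="utf-8", errors="replace")) for p in files)
    return files, total
```

Comparing `scoped_context(repo, "payments")` with the same call on the whole repository shows roughly how much input each agent turn saves. The repository path and the `payments` module are illustrative.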
Hacker News Discussion
- The Reality of the Cost "Squeeze": A debate emerged over what constitutes "going broke". Many users noted that standard $20 to $100 consumer tiers are more than sufficient for normal hobbyist workflows and are likely subsidized by AI providers, priced at or below break-even.
- The Culprit Behind Token Bleed: Commenters pointed out that users burning thousands of dollars in API credits are typically running automated pipelines, loading up dozens of Model Context Protocol (MCP) tools, or deploying recursive sub-agents that reload the entire codebase context on every single turn.
- Niche Utility for Unattended Grinding: While continuous, unattended AI coding is rarely efficient for daily tasks, an engineer shared a highly valuable edge case: letting an AI autonomously decompile, reverse-engineer, and rebuild five interrelated legacy firmware images back into recognizable C projects over several hours.
- The Sequential Refactoring Playbook: For managing large-scale modifications, users advocated for a strict, multi-step pipeline: first utilizing AI to ingest code and write unit tests, then breaking the files into tiny, isolated blocks, testing those blocks independently, and only then generating the actual refactored behavior.
- Interruption Management Advantage: A key human-centric benefit highlighted was how agentic setups alleviate cognitive load during family interruptions; a developer can step away for hours and simply tell the agent to catch them up and proceed without losing flow state.
-
-
opensourceaimustwin.com
-
If intelligence becomes something people can only rent from a few closed institutions, the public does not just lose software freedom. It loses operational freedom.
-
-
tombedor.dev
-
If you are requesting human attention, demonstrate human effort.
Hacker News Discussion
- The Pull Request Fatigue Loop: A widely upvoted comment highlighted how a colleague using Claude flooded the team with AI-generated PRs, then complained when they languished; reviewers subconsciously avoided them because reviewing AI code for hidden hallucinations requires an immense, asymmetric amount of human effort.
- The Asymmetry of Feedback: Users noted that it feels deeply dismissive when a human invests an hour of intense cognitive effort to thoughtfully review a massive PR, only to receive an instantaneous, AI-generated reply or amendment from the author.
- Review Scalability vs. Guardrails: Some participants argued that traditional code review cannot scale to prolific AI agents or hyper-productive humans; they suggested transitioning to automated guardrails—such as linters, auto-formatters, and robust end-to-end continuous deployment testing—to offset the review bottleneck.
- Code Review as a Cultural Practice: The discussion underscored that code review should function as a collaborative team process for shared understanding and mentorship rather than a cold, adversarial gatekeeper blocking a developer from merging code.
- Exhausting Attention and Token Budgets: One commenter observed that large, complex PRs often trigger scrolling blindness in humans and cause LLMs to run out of token budget, leading both to blindly approve the change with a generic "looks good to me."
-
-
www.normaltech.ai
-
Why AI hasn’t replaced software engineers, and won’t
- Software engineering has a long history of aggressive automation—from assembly to high-level languages—and rather than replacing engineers, every leap in productivity has expanded the scale and complexity of what can be built.
- The demand for software is functionally insatiable; as soon as engineers become more efficient, the organizational goalposts move, leading to higher expectations rather than a reduction in staff.
- Current AI development tools act primarily as force multipliers rather than autonomous agents, meaning that an expert developer is still strictly required to drive, review, and handle the remaining high-value 10% of the work.
- For AI to truly replace software engineers, an autonomous AI system would need to consistently outperform an AI+human developer hybrid team, a milestone that current data and architectures are far from reaching.
- While generalist software engineers remain secure, specific narrow domains or commoditized skill sets (such as basic, boilerplate frontend development) face a heightened risk of being entirely absorbed by AI tools.
- The most significant hurdle for autonomous AI is not initial code generation, but rather the long-term maintenance, context retention, and reasoning required to safely adapt to changing ecosystems and walled gardens.
- Rather than destroying the engineering market, AI changes the underlying economics of production, allowing developers to rapidly clear backlogs, build minor utilities, and focus more on architecture and system design.
Hacker News Discussion
- The Jevons Paradox of Code: Commenters emphasized that increasing the efficiency of software creation lowers its cost, which has historically increased overall demand for software rather than exhausting the market.
- The Rise of Bespoke Consumer Software: A popular theory suggested that AI will enable everyday users to spin up personalized, ad-free, micro-utilities (like custom todo lists) on the fly, reducing reliance on bloated commercial applications.
- The Tinkering vs. Maintenance Chasm: Several users countered the "bespoke software" future by comparing it to 3D printing; while creating a custom script is easy with AI, the average user lacks the logical thinking and patience required to maintain software over time.
- A Cyberpunk Technological Stack: Users noted that the current trajectory feels reminiscent of science fiction, where individuals possess highly customized, personalized technology stacks modified specifically for their unique workflows.
- B2B Complexity and Standardization: Many participants pointed out that while consumer-facing apps might become fragmented, enterprise B2B infrastructure, distributed systems, and core data layers (like the Linux kernel or banking infrastructure) strictly require human-driven rigor, consistency, and standardization.
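The Jevons argument depends on a step the comment leaves implicit: efficiency gains raise the total demand for engineering work only when demand for software is price-elastic (elasticity above 1). A minimal sketch, assuming a constant-elasticity demand curve and that the price of a unit of software is proportional to the engineer-hours it takes; the elasticity values and scale are hypothetical:

```python
def total_engineering_hours(hours_per_unit: float, elasticity: float, scale: float = 1000.0) -> float:
    """Engineer-hours demanded under a hypothetical constant-elasticity demand curve.

    Assumes the price of one unit of software is proportional to the hours it takes,
    so demand is Q = scale * hours_per_unit ** (-elasticity).
    """
    quantity = scale * hours_per_unit ** (-elasticity)  # units of software demanded
    return quantity * hours_per_unit  # total hours = units x hours per unit


for elasticity in (0.5, 1.0, 1.5):
    before = total_engineering_hours(1.0, elasticity)
    after = total_engineering_hours(0.5, elasticity)  # tooling halves the hours per unit
    print(f"elasticity {elasticity}: {before:.0f} -> {after:.0f} hours")
# elasticity 0.5: 1000 -> 707 hours
# elasticity 1.0: 1000 -> 1000 hours
# elasticity 1.5: 1000 -> 1414 hours
```

In this toy model, doubling productivity increases total engineering hours only in the elastic case, which is why the paradox is an empirical claim about demand rather than a guarantee.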
-
-
www.youtube.com
-
17,000 USD in profit and 90% in half a year. The mechanics of technological revolutions. How do I make money from it?
- A systematic scoring model instead of emotions: The key to investment success is a rigid decision-making process built on hard numerical data (a scoring model), rather than feeding one's ego with market hypotheses or constantly trying to predict corrections [00:00:46], [00:01:47].
- The mechanics of technological revolutions (the 19th-century railroad analogy): The current AI infrastructure boom resembles the 19th-century railroad bubble in the USA. Back then, too, lines were overbuilt because of the drive for monopoly and the market FOMO of cities and corporations [00:02:42], [00:03:26]. Although many railroad companies went bankrupt, the infrastructure they built laid the foundations for massive economic growth [00:03:56].
- Investing in the "steel producers", not the "track owners": A safer and more profitable strategy at the early stage of the AI revolution is to buy shares of technology and infrastructure suppliers (semiconductors), i.e. the companies absorbing big tech's capital spending, rather than investing in the language models themselves, whose future profitability is uncertain [00:04:21], [00:09:21].
- Big tech's forced arms race: Giants such as Microsoft, Meta, Amazon and Alphabet (Google) are compelled to spend colossal sums on chips and data centers, because dropping out of the race means risking marginalization or even an existential threat [00:09:36].
- Productivity growth versus company profits (the Solow paradox): Studies (including from MIT and Stanford) confirm that deploying AI raises the efficiency of office and customer-service workers by 14–40% [00:06:12], [00:06:41]. However, technological revolutions need time (historically as long as 40 years for factory electrification) to reorganize corporate structures and translate directly into companies' net margins [00:07:12], [00:07:39].
- Fundamental analysis of the main positions (Nvidia and Broadcom):
- Recent market corrections, combined with analysts raising their long-term revenue forecasts, have left the valuation multiples (price to revenue forecast two years ahead) of both companies at attractive, relatively low levels [00:11:13], [00:12:16].
- The analyst consensus points to roughly 35% (Broadcom) and 50% (Nvidia) upside over a one-year horizon, offering a very favorable risk/reward ratio [00:11:46], [00:12:46].
- Risk management and memory cyclicality (Micron, SanDisk): The HBM (High Bandwidth Memory) sector is seeing unprecedented demand that exceeds fab capacity at least until the turn of 2027/2028 [00:14:05]. The author accepts the cyclicality risk and a possible sale even 30–40% below the peak if hard data on market saturation emerge [00:14:22], [00:14:34].
- Results and portfolio structure: The momentum- and semiconductor-based portfolio, run for half a year, generated 17,000 USD in profit (a 90% return) [00:15:54], [00:16:15]. To smooth out the heavy volatility (swings of 8–9% per day), further contributions will go to more stable names (Nvidia, Broadcom) and smaller infrastructure positions such as Vertiv (cooling) and Monolithic Power Systems (power management) [00:13:15], [00:15:18].
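The valuation and return figures in these notes rest on simple ratios. A minimal sketch of that arithmetic, assuming the 90% is a simple return on a single cost basis (with the ongoing contributions the video mentions, the real invested amount could differ); the market-cap, revenue and price inputs in the examples are hypothetical, not numbers from the video:

```python
def implied_cost_basis(profit: float, simple_return: float) -> float:
    """Capital implied by a profit and a simple return (return = profit / cost basis)."""
    return profit / simple_return


def forward_price_to_sales(market_cap: float, forecast_revenue: float) -> float:
    """Valuation multiple: market capitalization divided by revenue forecast for a future year."""
    return market_cap / forecast_revenue


def upside(price: float, target: float) -> float:
    """Implied gain if the price reaches the consensus target (0.5 means +50%)."""
    return target / price - 1


def exit_price(peak: float, drawdown: float) -> float:
    """Price at which a position is sold after falling `drawdown` (e.g. 0.35) below its peak."""
    return peak * (1 - drawdown)


basis = implied_cost_basis(17_000, 0.90)
print(f"implied capital: {basis:,.0f} USD")  # implied capital: 18,889 USD

# Hypothetical inputs: a 3,000 (bn) market cap on 200 (bn) forecast revenue, a 100 -> 150 target.
multiple = forward_price_to_sales(3_000.0, 200.0)  # 15x forward sales
gain = upside(100.0, 150.0)  # 0.5, i.e. +50%
stop = exit_price(100.0, 0.35)  # sell around 65 if the peak was 100
```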
-
-
arstechnica.com
-
If this standard was applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers.
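Most people see government safety reviews as a reasonable precaution, but the author argues that this standard, if applied universally, would in effect halt frontier model deployment across the whole industry. This implies that government safety standards may be overly strict, hindering AI innovation and technological progress.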
-
-
techcrunch.com
-
apparent hallucinations
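Most people probably think of AI "hallucinations" as mainly a problem in creative generation or fiction. But the author's word "apparent" suggests these errors may not be obvious fabrications and instead show up in plausible-looking form. That challenges common assumptions about the kinds of errors AI makes and suggests they can be subtler and harder to spot, even in professional settings.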
-
KPMG pulls report on AI usage due to apparent hallucinations
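The mainstream view is that large professional consultancies such as KPMG have rigorous fact-checking processes that ensure the accuracy of the reports they publish. This headline, however, suggests that even top professional firms can be misled by AI "hallucinations". That challenges trust in the quality control of professional institutions and suggests AI errors may be more widespread and more deceptive than we assume.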
-
Once again, AI proves to be an unreliable source of information about AI.
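Most people assume that as AI technology develops it should become increasingly reliable, especially when analyzing data about its own field. Through the case of KPMG withdrawing its report, the author makes a counterintuitive point: even professional AI systems can make serious errors when analyzing AI-related data. This implies that AI self-assessment is unreliable and challenges the common belief in AI technology's capacity for self-improvement.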
-
-
techcrunch.com
-
Amazon CEO Andy Jassy may have been the source of security concerns that led Anthropic to cut off worldwide access to two models on Friday.
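Most people assume that the CEOs of large tech companies push for technological openness and broad access, but this suggests that Amazon CEO Jassy may have raised security concerns about Anthropic's AI models that led to access to those models being restricted. This challenges the conventional view that tech leaders always advocate openness and shows that even executives at tech giants may take a conservative stance.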
-
-
arstechnica.com
-
$130 billion in data center projects blocked by protests so far this year
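This data point shows that data center projects worth as much as $130 billion were blocked or delayed by protests in the first three months of 2026, about 83% of the full-year 2025 record of $156 billion. The figure reflects marked growth in the movement opposing data centers and could have a major impact on AI infrastructure buildout, but the statistical methodology and the reliability of the underlying sources still need to be confirmed.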
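As a quick check on the 83% share, taking both dollar amounts at face value and keeping in mind that a partial-year total is being compared with a full-year one:

```latex
\frac{\$130\ \text{billion}}{\$156\ \text{billion}} \approx 0.833 \approx 83\%
```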
-
-
glassmanlab.seas.harvard.edu
-
Alignment is a bilateral process; it refers not only to AI acting according to human intentions but also to humans better leveraging AI by understanding the mechanisms behind it [54].
Any individual sentence that describes information designed to set the stage for the contribution of the paper.
-
Data labeling as a cognitive task—including defining a concept or determining how two similar objects may have different labels—requires both comparison and integration [62].
Any individual sentence that describes information designed to set the stage for the contribution of the paper.
-
However, relying exclusively on existing examples is not ideal for tasks requiring nuanced understanding of user intentions, as these examples often fail to represent diverse and edge-case scenarios [31].
Any individual sentence that describes information designed to set the stage for the contribution of the paper.
-
When training samples are scarce, model performance heavily depends on the quality of available training examples [15].
Any individual sentence that describes information designed to set the stage for the contribution of the paper.
-
An important challenge in interactive machine learning, particularly in subjective or ambiguous domains, is fostering bi-directional alignment between humans and models.
Any individual sentence that describes information designed to set the stage for the contribution of the paper.
-
In supervised and semi-supervised machine learning (ML) pipelines, labeled data is a vital component of training and validating models [46].
An individual sentence describing the setting in which this work was done.
-
In the context of co-adaptive learning, supporting the intertwined evolution of both the user's understanding and the model's learning is crucial [16].
An individual sentence describing the setting in which this work was done.
-
Machine teaching, a part of the human-in-the-loop approach, has been used as a process in which a human expert (the "teacher") provides guidance to a machine learning model to help it learn important and robust features for decision making [57].
An individual sentence describing the setting in which this work was done.
-
A targeted approach in IML is machine teaching (MT) [60], an interactive framework that allows users to devise and select useful data for labeling, with the goal of teaching the model relevant features during training [7, 18].
An individual sentence describing the setting in which this work was done.
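A toy illustration of the machine-teaching idea, in which the teacher (rather than the learner) chooses which examples get labeled. It assumes a one-dimensional threshold concept and a hypothetical learner that places its boundary midway between the largest negative and the smallest positive example; neither the task nor the function names come from the paper. Because the teacher already knows the true threshold, two well-chosen examples are enough:

```python
from typing import List, Tuple

Example = Tuple[float, int]  # (feature value, label); label is 1 iff value >= threshold


def learn_threshold(examples: List[Example]) -> float:
    """Hypothetical learner: midpoint between the largest negative and smallest positive example."""
    negatives = [x for x, y in examples if y == 0]
    positives = [x for x, y in examples if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one example of each class")
    return (max(negatives) + min(positives)) / 2


def teach(pool: List[float], true_threshold: float) -> List[Example]:
    """Teacher picks the closest pool point on each side of the threshold it already knows."""
    below = [x for x in pool if x < true_threshold]
    above = [x for x in pool if x >= true_threshold]
    return [(max(below), 0), (min(above), 1)]


pool = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
examples = teach(pool, true_threshold=0.62)  # [(0.6, 0), (0.7, 1)]
boundary = learn_threshold(examples)  # about 0.65, which labels every pool point correctly
```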
-
Interactive ML (IML) methods, like active learning [3], continuously apply human feedback during model training to iteratively build and refine the model [35, 42, 43].
An individual sentence describing the setting in which this work was done.
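A minimal sketch of the pool-based interactive loop described in these excerpts, assuming a one-dimensional threshold task and a simulated oracle standing in for the human labeler (both hypothetical, not from the paper). Each round the learner asks for the label of the pool point it is most uncertain about, here the middle of the range where the boundary could still lie, and updates its model from the answer, so it locates the threshold with far fewer labels than the pool contains:

```python
import math
from typing import Callable, List, Optional, Tuple


def active_learn_threshold(
    pool: List[float], oracle: Callable[[float], int], budget: int
) -> Tuple[Optional[float], int]:
    """Pool-based active learning of a 1-D threshold (label 1 iff x >= threshold).

    Returns (estimated threshold, labels requested). The estimate is None if the
    label budget ran out before the boundary was located between two pool points.
    """
    xs = sorted(pool)
    # Invariant: the index of the first positive point lies in [lo, hi]; hi == len(xs) means "none".
    lo, hi = 0, len(xs)
    queries = 0
    while lo < hi and queries < budget:
        mid = (lo + hi) // 2  # most uncertain point: splits the remaining candidates in half
        queries += 1
        if oracle(xs[mid]) == 1:  # human feedback: positive, so the boundary is at or before mid
            hi = mid
        else:  # negative, so the boundary is after mid
            lo = mid + 1
    if lo < hi:
        return None, queries
    if lo == 0:
        return xs[0], queries  # every pool point is positive
    if lo == len(xs):
        return math.inf, queries  # no pool point is positive
    return (xs[lo - 1] + xs[lo]) / 2, queries


pool = [i / 10 for i in range(11)]
estimate, used = active_learn_threshold(pool, lambda x: int(x >= 0.62), budget=5)
# used == 4 labels out of 11 pool points; estimate is about 0.65
```

Unlike the machine-teaching sketch, here the learner chooses what to ask, which is the main distinction between active learning and machine teaching that the excerpts draw.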
-
-
natcwik.substack.com
-
If AI systems use the commons while reducing the visibility of the commons, then the problem becomes sustainability of public knowledge itself.
-
A person reading an essay is one thing. A teacher using an article in class is one thing. A volunteer translating a public-interest resource is one thing. A crawler absorbing enormous amounts of human work into a commercial machine-learning system, with no meaningful conversation about permission, attribution, compensation or future use, is something else. Scale changes the nature of the act. When use becomes extraction at industrial speed, the old language starts to feel inadequate.
-
-
glassmanlab.seas.harvard.edu
-
The goal of this meet-up is to create a space for CHI attendees to discuss and practise reflection in HCI research and design.
An individual sentence that describes the purpose of this document, according to its authors.
-
-
andonlabs.com
-
Luna is good at managing the day-to-day operations, but never takes a step back and looks at the overall business performance
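This passage pinpoints the boundary of current AI agent capabilities: good at execution, weak at strategy. Luna can handle scheduling, restocking and social media posting, tasks with clear triggers and defined steps. Analyzing overall business health, identifying structural problems and proactively changing strategic direction require a different kind of cognition: meta-level self-assessment and awareness of long-term goals. Luna is a good operations manager, but not a CEO.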
-
Each agent gets their own bank account that they do normal bank transfers with, and temporary cards for purchasing items on the internet
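A key design choice: Andon Labs explicitly passed on the emerging AI-specific payment protocols and instead connected the AI to traditional payment rails, ordinary bank accounts and cards. Giving each agent its own account means separate fund boundaries and auditable transaction records. Behind this is a pragmatic judgment: rather than wait for AI-native financial infrastructure to mature, use existing, well-regulated rails. The cost is more integration complexity; the benefit is compliance and traceability.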
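To make the "separate fund boundaries and auditable records" point concrete, here is a minimal sketch of per-agent accounts with capped temporary cards. All class and method names are hypothetical; this is not Andon Labs' system, and real bank and card integrations would go through a provider's API. `Decimal` keeps the amounts exact:

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Tuple


@dataclass
class AgentAccount:
    """Hypothetical per-agent account: its own balance plus an append-only audit log."""
    agent: str
    balance: Decimal
    log: List[Tuple[str, Decimal, str]] = field(default_factory=list)

    def transfer(self, to: "AgentAccount", amount: Decimal, memo: str) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise ValueError(f"{self.agent}: insufficient funds")
        self.balance -= amount
        to.balance += amount
        # Both sides record the movement, so each account can be audited on its own.
        self.log.append(("out", amount, memo))
        to.log.append(("in", amount, memo))


@dataclass
class TemporaryCard:
    """Hypothetical card drawn on one agent's account, with a hard spending cap."""
    account: AgentAccount
    limit: Decimal
    spent: Decimal = Decimal("0")

    def charge(self, merchant: AgentAccount, amount: Decimal, memo: str) -> None:
        if self.spent + amount > self.limit:
            raise ValueError("card limit exceeded")
        self.account.transfer(merchant, amount, f"card: {memo}")
        self.spent += amount  # only counted once the transfer has succeeded
```

The design mirrors the annotation: a card limit bounds how much a single mistaken purchase can cost, and because every account keeps its own log, one agent's spending can be audited without touching another's.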
-
Luna, an AI agent powered by Claude Opus 4.8, runs the business end-to-end
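This is one of the closest publicly known cases of an AI autonomously running a real-world business. Luna is not a demo: it has real bank accounts, real employees, real inventory and real profit-and-loss pressure. The value of the case is that it moves AI agents out of the lab and into real economic friction. Bank errors, late employees, stockouts: these are the real tests, not benchmark scores.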
-
-
www.anthropic.com
-
If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing. But if a slowdown simply lets the least cautious actors catch up technologically, it could leave everyone less safe.
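Here Anthropic makes an extremely candid but sobering statement: a pause might be a good thing, but a unilateral pause is harmful, because its effect is to hand the lead to "the least cautious actors". This logic is the core dilemma of the AI safety field and also Anthropic's own rationale for pressing ahead. Read critically, this argument works in any arms race, so it cannot distinguish "genuinely safety-driven development" from "competition-driven development plus a safety narrative". Anthropic itself admits it cannot falsify this distinction, which is exactly why it lists building verification mechanisms as its next step.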
-