2 Matching Annotations
  1. Jun 2026
    1. We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them.

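      That these capabilities emerged rather than being deliberately trained is the report's most profound policy implication: the ability to find and exploit vulnerabilities is a by-product of general reasoning ability and cannot be suppressed in isolation. It follows that any approach that tries to "train only defensive capabilities while blocking offensive ones" is fundamentally infeasible, because the same capabilities that make the model better at patching vulnerabilities also make it better at exploiting them. The implication for AI safety governance is that capability restrictions must be enforced at the model deployment layer rather than the training layer.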

  2. May 2026
    1. I also believe that the Scientist AI could even be more capable than the current approach, and that has to do with a number of design features. It is trained to explicitly reason in a structured way about the statements that it's asked to make a prediction over.

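      Bengio boldly predicts that Scientist AI could be more capable than current approaches because it is trained to reason in a structured way. This counterintuitive view challenges the assumption that safety and capability must be traded off, and offers a new perspective on safe AI.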