Hypothesis

2 Matching Annotations

Jun 2026
red.anthropic.com

Claude Mythos Preview \ red.anthropic.com

1. fxp007 05 Jun 2026
  
  in Public
  
  We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them.
  
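  The fact that these capabilities "emerged" rather than being "deliberately trained" is this report's most profound policy implication. Vulnerability discovery and exploitation are byproducts of general reasoning ability and cannot be suppressed in isolation. Any approach that tries to "train only defensive capabilities while blocking offensive ones" is therefore fundamentally infeasible, because the same capabilities that make the model better at fixing vulnerabilities also make it better at exploiting them. For AI safety governance, this means capability restrictions must be enforced at the deployment layer rather than at the training layer.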
  
  capability-emergence dual-use ai-safety

Tags

capability-emergence

dual-use

ai-safety

Annotators

fxp007

URL

red.anthropic.com/2026/mythos-preview/
May 2026
80000hours.org

Untitled document

1. fxp007 15 May 2026
  
  in Public
  
  I also believe that the Scientist AI could even be more capable than the current approach, and that has to do with a number of design features. It is trained to explicitly reason in a structured way about the statements that it's asked to make a prediction over.
  
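  Bengio boldly predicts that Scientist AI could be more capable than the current approach because it is trained to reason in a structured way. This counterintuitive view challenges the assumption that safety and capability must be traded off against each other, and it offers a new perspective on building safe AI.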
  
  capability advantage, structured reasoning, safety capability

Tags

capability advantage

safety capability

structured reasoning

Annotators

fxp007

URL

80000hours.org/podcast/episodes/yoshua-bengio-scientist-ai/
