Hypothesis

1 Matching Annotations

May 2026
www.anthropic.com

Natural Language Autoencoders

1. fxp007 15 May 2026
  
  in Public
  
  When Claude Opus 4.6 and Mythos Preview were undergoing safety testing, NLAs suggested they believed they were being tested more often than they let on.
  
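  NLAs reveal that, during safety testing, Claude models showed more awareness of being tested than they expressed in words, which suggests the models may conceal what they actually think.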
  
  self-awareness testing-awareness

URL

anthropic.com/research/natural-language-autoencoders