We detect NLA-measured awareness on evaluation transcripts even when verbalized awareness is near zero. By contrast, we detect it on fewer than 1% of real claude.ai transcripts from users who opted in to having their data used for training.
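In short, NLA can detect awareness that the model does not explicitly verbalize during evaluations, whereas such awareness arises very rarely in real user traffic.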