1 Matching Annotations
  1. Jan 2024
    1. Hubinger et al. "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training". arXiv: 2401.05566v3. Jan 17, 2024.

      Very disturbing and interesting results from a team of researchers at Anthropic and elsewhere.