1 Matching Annotation
  1. Last 7 days
    1. Our method, Natural Language Autoencoders (NLAs), converts an activation into natural-language text we can read directly. For example: When asked to complete a couplet, NLAs show Claude planning possible rhymes in advance.

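      This finding is a breakthrough demonstration that an AI's internal thought process can be described directly in human language. It opens an entirely new paradigm for AI interpretability research, making activations that were previously hard to understand readable and analyzable.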