An auditor equipped with NLAs successfully uncovered the target model's hidden motivation between 12% and 15% of the time, even without access to the training data that implanted it.
NLAs markedly improve an auditor's ability to uncover a model's hidden motivation without access to the training data.