When building containment and defense systems, we apply defenses to three main components: the environment in which the agent runs, the model the agent consults, and the external content the agent can reach.
行动建议:构建多层防御体系,同时保护运行环境、模型本身和外部内容三个层面。环境层设置硬边界,模型层使用提示和分类器引导行为,外部内容层限制工具权限。这种重叠防御策略能够有效应对不同类型的攻击向量。