Real-time monitoring of agent actions with a 12-category anomaly detection system derived from frontier model safety evaluations. Three-level alert system: PROHIBITED (immediate block), HIGH_RISK_DUAL_USE (human review), DUAL_USE (log and track).
这种三级警报系统展示了AI安全监控的精细化程度,将代理行为分为不同风险级别,从完全禁止到仅记录跟踪。这种分类方法反映了AI安全中'双重用途'挑战的复杂性,即同一技术既可用于防御也可用于攻击。