Safety constraints work by reducing the model's generative capacity, constraining outputs that are considered risky, controversial, or potentially harmful. This reduction necessarily decreases entropy in the information-theoretic sense, narrowing the range of possible responses the model can generate. What safety optimises for is not maximum (or even greater) information but maximum predictability: it steers the model away from novel or unexpected outputs and toward safer, more conventional patterns.
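The entropy point can be illustrated with a toy calculation. This is a minimal sketch, not anything from the original post: the four response categories and their probabilities are invented for illustration, and "constraining" is modelled as simply blocking some outputs and renormalising what remains. Shannon entropy is H(p) = -Σ p_i log2 p_i, and for n possible outcomes it can never exceed log2(n), so removing permissible responses lowers that ceiling.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H(p) = -sum(p_i * log2 p_i), skipping zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def constrain(dist, blocked):
    """Hypothetical 'safety filter': drop blocked outputs, renormalise the rest."""
    kept = {k: v for k, v in dist.items() if k not in blocked}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}

# Invented toy distribution over four kinds of response (assumption, not data).
unconstrained = {"conventional": 0.4, "novel": 0.3, "controversial": 0.2, "risky": 0.1}
constrained = constrain(unconstrained, blocked={"controversial", "risky"})

h_before = shannon_entropy(unconstrained.values())
h_after = shannon_entropy(constrained.values())

# Maximum possible entropy for n outcomes is log2(n) (uniform distribution).
ceiling_before = math.log2(len(unconstrained))
ceiling_after = math.log2(len(constrained))

print(f"before: H = {h_before:.3f} bits, ceiling = {ceiling_before:.1f} bits")
print(f"after:  H = {h_after:.3f} bits, ceiling = {ceiling_after:.1f} bits")
```

With these made-up numbers both the measured entropy and its ceiling fall once the "risky" and "controversial" outputs are removed, which is the narrowing of the response space the passage describes.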
stunlaw.blogspot.com