11 Matching Annotations
  1. Apr 2026
    1. If we took one task out of our task suite or added another task to our task suite, potentially instead of measuring this Claude Opus 4.6 time horizon of, I think, 14 and a half hours, we'd be measuring it at something like eight or 20 hours.

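      Adding or removing a single task can shift the measured horizon from about 14.5 hours to something like 8 or 20 hours. That means the entire METR time-horizon leaderboard is, in essence, a fragile measurement propped up by a very small number of "key tasks". When an evaluation is this sensitive to a single data point, its "precise numbers" should not be cited as facts; they should be treated as one sample from a noisy distribution. Yet right now, the media and the public are using exactly these numbers to make serious decisions.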

    1. high-level behavioral patterns like uncertainty management and self-verification are fragile and can be suppressed by irrelevant context

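      "High-level behavioral patterns are fragile": this statement points to a deep structural weakness in reasoning models. Self-verification is not a robust, built-in capability but a fragile emergent behavior that activates only under particular conditions. This closely matches findings in human cognitive science: under high cognitive load, the first thing to degrade is metacognition (the monitoring of one's own thinking). The model reproduces this human weakness without the human trigger of physiological fatigue; instead, "context length" stands in for "fatigue".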

  2. Jun 2024
  3. May 2023
  4. Aug 2022
  5. Dec 2021
    1. A good library is filled with mostly unread books. That’s the point. Our relationship with the unknown causes the very problem Taleb is famous for contextualizing: the black swan. Because we underestimate the value of what we don’t know and overvalue what we do know, we fundamentally misunderstand the likelihood of surprises. The antidote to this overconfidence boils down to our relationship with knowledge. The anti-scholar, as Taleb refers to it, is “someone who focuses on the unread books, and makes an attempt not to treat his knowledge as a treasure, or even a possession, or even a self-esteem enhancement device — a skeptical empiricist.” My library serves as a visual reminder of what I don’t know.

      I prefer the positive interpretation, of how much more there is to know. Quantifying anything in terms of how much we do not have is limited because we have finite knowledge out of an infinite set.

      Each book, instead of representing something we do not know, is a portal into things we have yet to learn.

      I think Nassim Taleb also mentioned elsewhere that having a lot of books you haven't read shows interest in a topic.

  6. Oct 2021
  7. Jul 2021
  8. Aug 2020
  9. Jul 2020
  10. Apr 2020