5 Matching Annotations
  1. May 2026
    1. In more detail, suppose we have a language model whose activations we want to understand. NLAs work as follows. We make three copies of this language model: The target model is a frozen copy of the original language model that we extract activations from.

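      NLAs make sense of a model's activations by creating three copies of the model: the target model, the activation verbalizer, and the activation reconstructor.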

  2. Apr 2026
    1. Gemma4-31B worked in an iterative-correction loop (with a long-term memory bank) for 2 hours to solve a problem that baseline GPT-5.4-Pro couldn't

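      Surprisingly, the smaller Gemma4-31B model, working for 2 hours in an iterative-correction loop with a long-term memory bank, solved a problem that GPT-5.4-Pro could not. This suggests that architectural innovation and reasoning ability may matter more than scaling alone, and it points to a new direction for AI development.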

    1. SOTA models of different architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training data rather than architecture itself.

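      Most people assume that models with different architectures have different failure modes and weaknesses. The authors instead find that, regardless of architecture and parameter scale, SOTA models fail in highly consistent ways on the same hard samples. This suggests that the performance bottleneck stems from shared deficiencies in training data rather than from architectural differences, a finding that challenges the conventional view of model diversity.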

  3. Sep 2021
  4. Aug 2020
    1. The RAT model sees software development as an off-line program-construction activity composed of these parts: defining, decomposing, estimating, implementing, assembling, and finishing

      This is what can lead to the 'there is only version 1.0' problem, where improvements and iterations fall by the wayside.

      This can have a number of consequences:

      • over-designed / over-engineered software
      • unnecessary work
      • lack of user feedback, and no ability to accommodate it
      • rigid / fragile architecture