5 Matching Annotations
  1. Last 7 days
    1. With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block.

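      Most people assume that training a deep neural network requires memory proportional to the network's depth, but the authors argue that this limit can be broken: with block-wise training, memory requirements no longer grow linearly with depth. This finding could change how large models are trained.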
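
As a rough illustration of the memory claim in this highlight, the sketch below compares peak activation memory when every layer's activations are held for backpropagation against training one block at a time. It is a simplified accounting model, not the DiffusionBlocks implementation: the function name, the equal per-layer activation size, and the example numbers are all assumptions, and parameter and optimizer-state memory are ignored.

```python
def peak_activation_memory(num_layers: int, num_blocks: int, bytes_per_layer: int):
    """Return (end_to_end, blockwise) peak activation memory in bytes.

    Assumptions (hypothetical, for illustration only):
    - every layer stores the same amount of activation memory;
    - end-to-end backprop keeps activations for all layers at once;
    - block-wise training keeps activations for one block at a time.
    """
    if num_blocks <= 0 or num_layers % num_blocks != 0:
        raise ValueError("num_layers must be a positive multiple of num_blocks")
    layers_per_block = num_layers // num_blocks
    end_to_end = num_layers * bytes_per_layer
    blockwise = layers_per_block * bytes_per_layer
    return end_to_end, blockwise


# Hypothetical example: 24 layers split into 4 blocks, 100 bytes per layer.
full, block = peak_activation_memory(24, 4, 100)
print(full, block)
```

Under these assumptions, doubling the depth while keeping the block size fixed doubles the end-to-end figure but leaves the block-wise figure unchanged.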

  2. Apr 2026
    1. In a 1-million-token context, V4-Pro uses only 27% of the computing power required by its previous model, V3.2, while cutting memory use to 10%.

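      Most people assume that processing longer contexts necessarily requires more computing resources, but the author argues that DeepSeek V4 achieves a striking efficiency gain through architectural innovation, sharply reducing compute and memory requirements. This counterintuitive finding challenges the industry assumption that "long context means high cost."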

  3. Oct 2022
    1. On the whole, his efficiency probably reduced the time required for taking and filing notes to the amount other historians spent in note-taking alone. What he wrote in his notes was brief, and yet specific enough so that he saved himself the job of searching at length for what he had read. His mind was free to reflect and appraise.

      Earl Pomeroy suggests that Paxson's note-taking method freed his mind to better reflect on and appraise his work. This makes the work more efficient, particularly in search and recall, and the overall process becomes easier with practice.

  4. Feb 2022
    1. We need a reliable and simple external structure to think in that compensates for the limitations of our brains

      Let's be honest: there are certainly methods for doing all of this within our brains without relying on external structures. That said, writing, literacy, and external structures do allow us to process things faster than before.


      Can we calculate how much efficiency this gains? What is the overall throughput difference in being able to write things down and forget them, and in not relying on communication with others? What does a back-of-the-envelope calculation for this look like?