8 Matching Annotations
  1. Apr 2026
    1. TriAttention matches Full Attention reasoning accuracy while achieving 2.5x higher throughput or 10.7x KV memory reduction

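      Most people assume that heavily compressing the KV cache inevitably sacrifices a model's reasoning accuracy, but the authors claim that TriAttention achieves a 10.7x memory reduction while still matching the reasoning accuracy of full attention. This result challenges the industry's assumed trade-off between KV compression and accuracy.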

  2. Mar 2026
  3. Oct 2021
  4. Nov 2020
    1. Throughput in Planned vs Unplanned Work: The graph to the left is even more interesting as it contains the initial hints at what’s actually happening. That graph measures throughput with an emphasis on unplanned work. Now, what’s unplanned work? Typically, everything related to features or improvements is planned, whereas bugs, re-work, and service interruptions are unplanned. Let’s see why unplanned work is relevant.

      [[throughput]] - [[planned work]] [[unplanned work]] - what falls under planned and unplanned work, and how does each affect throughput?

  5. Jul 2020
  6. Jul 2018
  7. Aug 2017
  8. Jan 2017
    1. Currently, each of the four pairs has a capacity of 10 terabits per second (Tbps), amounting to a total of 40Tbps on the TGN-A cable. At the time, a figure of 8Tbps was the current lit capacity on this Tata network cable.

      Yet still, we end-users are getting data-capped! How would you argue against this affirmation now, Virgin & Verizon, huh?!