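KV cache memory usage reduced by 10.7x
Surprisingly, KV cache memory usage dropped by a factor of 10.7, far more than a typical engineering optimization delivers. The KV cache is one of the main memory consumers in large-model inference, so a reduction this large means the same hardware can handle longer contexts or run more model instances at once.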
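To get a feel for what a 10.7x reduction buys, here is a rough back-of-the-envelope sketch. It uses the usual size estimate for a standard attention KV cache (keys plus values, per layer, per KV head, per token). The model shape below (32 layers, 32 KV heads, head dimension 128, 16-bit values, 4096-token context) is an assumption chosen for illustration, not something stated in the source, and the source does not say which technique produces the 10.7x figure, so the factor is simply applied as a divisor.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size for standard attention.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


REDUCTION = 10.7  # factor quoted in the note above

# Hypothetical model shape, for illustration only.
baseline = kv_cache_bytes(num_layers=32, num_kv_heads=32,
                          head_dim=128, seq_len=4096)
reduced = baseline / REDUCTION

# With the same memory budget, context length scales by the same factor.
longer_context = int(4096 * REDUCTION)

GIB = 1024 ** 3
print(f"baseline cache:  {baseline / GIB:.2f} GiB")
print(f"reduced cache:   {reduced / GIB:.2f} GiB")
print(f"context in the original budget: about {longer_context} tokens")
```

Under these assumed numbers, the memory that held one 4096-token sequence could instead hold one sequence roughly 10.7 times longer, or about ten sequences of the original length. Real deployments also spend memory on weights and activations, so the end-to-end gain will be smaller than the cache-only figure.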
On the whole, his efficiency probably reduced the time required for taking and filing notes to the amount other historians spent in note-taking alone. What he wrote in his notes was brief, and yet specific enough so that he saved himself the job of searching at length for what he had read. His mind was free to reflect and appraise.
Earl Pomeroy suggests that Paxson's note-taking method freed his mind to reflect and appraise. The efficiency gain comes mainly from easier search and recall, and the whole process gets easier with practice.
We need a reliable and simple external structure to think in that compensates for the limitations of our brains
To be fair, there are certainly methods for doing all of this within our brains, without relying on external structures. That said, writing, literacy, and external structures do let us process things faster than we could without them.
Can we calculate how much additional efficiency this allows? What is the overall throughput difference from being able to write things down and forget them, and from not having to rely on communication with others? What would a back-of-the-envelope calculation for this look like?