1 Matching Annotations
  1. Last 7 days
    1. the lack of KV sharing across requests leads to redundant prefill computation and wasted memory.

      KV sharing across concurrent requests is a non-obvious efficiency lever. When two users send prompts that share a prefix (for example, the same system prompt), each request's prefill still computes and stores its own copy of the KV states. CXL's shared memory pool makes cross-request KV reuse architecturally practical, since cached states can be read from the pool rather than moved through expensive GPU-to-GPU transfers.
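
A minimal sketch of the reuse idea in the note above, assuming KV states are cached in fixed-size blocks keyed by the token prefix they cover. The names `SharedKVPool`, `BLOCK`, and `prefill` are hypothetical and do not come from the annotated paper. A plain dictionary stands in for the shared memory pool, and strings stand in for the real KV tensors.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK = 4  # tokens per cached block (hypothetical value)


class SharedKVPool:
    """Toy stand-in for a shared pool of KV blocks visible to every request."""

    def __init__(self) -> None:
        self.blocks: Dict[str, str] = {}
        self.computed = 0  # blocks that had to run prefill
        self.reused = 0    # blocks served from the pool

    @staticmethod
    def _key(prefix: Tuple[int, ...]) -> str:
        # The key covers the whole prefix up to the block end, because a
        # block's KV states depend on every token that precedes it.
        return hashlib.sha256(repr(prefix).encode()).hexdigest()

    def prefill(self, tokens: List[int]) -> List[str]:
        """Return KV blocks for `tokens`, computing only the missing ones."""
        kv: List[str] = []
        # Only full blocks are cached; a trailing partial block is skipped here.
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            key = self._key(tuple(tokens[:end]))
            if key in self.blocks:
                self.reused += 1
            else:
                # Placeholder for the real attention key/value tensors.
                self.blocks[key] = f"kv[{end - BLOCK}:{end}]"
                self.computed += 1
            kv.append(self.blocks[key])
        return kv


pool = SharedKVPool()
system_prompt = list(range(8))                    # shared 8-token prefix
pool.prefill(system_prompt + [100, 101, 102, 103])  # first request
pool.prefill(system_prompt + [200, 201, 202, 203])  # second request
print(pool.computed, pool.reused)
```

In this sketch the second request recomputes only the block covering its own suffix; the two blocks covering the shared prefix come from the pool. Without sharing, both requests would compute all three blocks.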