1 Matching Annotations
  1. Sep 2024
    1. The Attention module splits its Query, Key, and Value parameters N-ways and passes each split independently through a separate Head.

      Q, K, and V are each split N ways. Each split is the input to one attention head, and the outputs of the N attention heads are merged into the final Attention Score.
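
The split-and-merge step described above can be sketched in NumPy. This is a minimal illustration, not the annotated article's implementation. It assumes Q, K, and V are already projected matrices of shape (seq_len, d_model), that d_model is divisible by the number of heads, and it omits the learned linear layers and masking. The function names (`split_heads`, `multi_head_attention`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def split_heads(x, n_heads):
    # (seq_len, d_model) -> (n_heads, seq_len, d_head)
    seq_len, d_model = x.shape
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    d_head = d_model // n_heads
    return x.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

def multi_head_attention(q, k, v, n_heads):
    # Split Q, K, V along the feature dimension, one slice per head.
    qh, kh, vh = (split_heads(t, n_heads) for t in (q, k, v))
    d_head = qh.shape[-1]
    # Scaled dot-product attention, computed independently for each head.
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq, seq)
    weights = softmax(scores, axis=-1)
    out = weights @ vh                                      # (n_heads, seq, d_head)
    # Merge: concatenate the head outputs back into (seq_len, d_model).
    seq_len = q.shape[0]
    return out.transpose(1, 0, 2).reshape(seq_len, n_heads * d_head)

# Usage: 4 tokens, d_model = 8, 2 heads (so each head sees 4 features).
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
out = multi_head_attention(q, k, v, n_heads=2)
```

Each head only sees its own slice of the feature dimension, so the heads do not interact until their outputs are concatenated at the end.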