The Attention module splits its Query, Key, and Value parameters N-ways and passes each split independently through a separate Head.
Q, K, and V are each split N ways, and each split is the input to one Attention Head. The outputs of the N heads are then merged into the final Attention Score.
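The split-and-merge step described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `split_heads` and `multi_head_attention` are hypothetical, the split is taken along the feature dimension, each head uses scaled dot-product attention, and the learned linear projections that normally precede and follow the heads are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def split_heads(x, num_heads):
    # (seq_len, d_model) -> (num_heads, seq_len, head_dim)
    seq_len, d_model = x.shape
    head_dim = d_model // num_heads
    return x.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

def multi_head_attention(q, k, v, num_heads):
    # Assumption: q, k, v all have shape (seq_len, d_model) and
    # d_model is divisible by num_heads.
    seq_len, d_model = q.shape
    assert d_model % num_heads == 0
    head_dim = d_model // num_heads

    # Split Q, K, V N ways: one slice of the features per head.
    qh, kh, vh = (split_heads(t, num_heads) for t in (q, k, v))

    # Each head computes attention independently on its own slice.
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(head_dim)  # (N, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ vh                                     # (N, seq, head_dim)

    # Merge the N head outputs back into one (seq_len, d_model) matrix.
    return heads.transpose(1, 0, 2).reshape(seq_len, d_model)
```

For example, with `d_model = 8` and `num_heads = 4`, each head sees a 2-dimensional slice of Q, K, and V, and the merged output has the same shape as the input.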