3 Matching Annotations
  1. Last 7 days
    1. In encoder-decoder transformers, the TTS alignment is learned in certain cross-attention heads of the decoder, while in decoder-only models, the alignment is learned in the self-attention layers.

      Good point of difference between encoder-decoder and decoder-only models.
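
To make the distinction in the highlighted passage concrete, here is a minimal sketch of where one might look for the text-to-speech alignment in each architecture. Everything in it is an illustrative assumption rather than something taken from the annotated paper: the attention weights are toy values, the decoder-only input is assumed to be laid out as text tokens followed by speech steps, and a per-step argmax is used as a simple stand-in for reading an alignment off an attention map. The helper names `argmax_alignment` and `text_to_speech_block` are hypothetical.

```python
def argmax_alignment(attn):
    """For each speech step (row), return the index of the text token (column)
    that receives the largest attention weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in attn]


def text_to_speech_block(self_attn, n_text):
    """Assumes the decoder-only sequence is [text tokens; speech steps].
    Keeps rows for speech positions and columns for text positions."""
    return [row[:n_text] for row in self_attn[n_text:]]


# Encoder-decoder case: one cross-attention head.
# Rows are decoder (speech) steps, columns are encoder (text) tokens.
cross_attn = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
cross_alignment = argmax_alignment(cross_attn)

# Decoder-only case: one causal self-attention head over
# 3 text tokens followed by 4 speech steps (7 positions in total).
n_text = 3
self_attn = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.3, 0.3, 0.4, 0.0, 0.0, 0.0, 0.0],
    [0.7, 0.1, 0.1, 0.1, 0.0, 0.0, 0.0],
    [0.5, 0.2, 0.1, 0.1, 0.1, 0.0, 0.0],
    [0.1, 0.6, 0.1, 0.1, 0.05, 0.05, 0.0],
    [0.05, 0.1, 0.6, 0.05, 0.1, 0.05, 0.05],
]
# The alignment lives in the speech-rows / text-columns block of the same matrix.
self_alignment = argmax_alignment(text_to_speech_block(self_attn, n_text))

print(cross_alignment)
print(self_alignment)
```

In the encoder-decoder case the alignment is read from a separate cross-attention matrix, whereas in the decoder-only case it has to be sliced out of the self-attention matrix, which also contains text-to-text and speech-to-speech weights.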

  2. Jun 2024