12 distinct heads also have the role of matching the time length
Distilling to shorter-output students using a deconvolutional upsampling head
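
As a rough illustration of how a deconvolutional (transposed-convolution) head can stretch a shorter student output to a longer time length, the sketch below implements a single-channel 1-D transposed convolution in plain Python. The function name `transposed_conv1d`, the fixed kernel weights, and the stride of 2 are assumptions made for illustration and are not taken from the source; in practice the kernel would be learned and would act on multi-channel features.

```python
def transposed_conv1d(x, kernel, stride):
    """Upsample a 1-D sequence with a transposed (deconvolutional) filter.

    Output length is (len(x) - 1) * stride + len(kernel), with no padding.
    """
    out_len = (len(x) - 1) * stride + len(kernel)
    y = [0.0] * out_len
    for i, value in enumerate(x):
        # Each input step scatters a scaled copy of the kernel into the output.
        for k, weight in enumerate(kernel):
            y[i * stride + k] += value * weight
    return y


# Hypothetical student output with 3 time steps (illustrative values only).
student_out = [1.0, 2.0, 3.0]
# Hypothetical fixed weights; a real head would learn these.
kernel = [1.0, 1.0]

upsampled = transposed_conv1d(student_out, kernel, stride=2)
print(upsampled)  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

When the kernel length equals the stride, as here, the output is exactly `stride` times longer than the input, which is the property needed to match a target time length. A longer kernel makes neighbouring contributions overlap and adds `len(kernel) - stride` extra steps that would have to be cropped or padded away.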