2 Matching Annotations
  1. Sep 2021
    1. Personalized ASR models. For each of the 432 participants with disordered speech, we create a personalized ASR model (SI-2) from their own recordings. Our fine-tuning procedure was optimized for our adaptation process, where we only have between ¼ and 2 h of data per speaker. We found that updating only the first five encoder layers (versus the complete model) worked best and successfully prevented overfitting [10].
    1. The researchers found that when the model is still confused by a given phoneme (an individual speech sound such as an “e” or “f”), two kinds of errors occur. First, the model fails to recognize the phoneme as the one the speaker intended, and so does not recognize the word. Second, it must guess which phoneme the speaker did intend, and may choose the wrong one when two or more words sound roughly similar.
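The selective fine-tuning described in the first annotation can be sketched as follows. This is a minimal, hypothetical illustration in plain Python: the `Layer` class, layer names, and layer counts are assumptions for demonstration, not the authors' actual model code, which would freeze parameters in a real ASR framework instead.

```python
# Hypothetical sketch of selective fine-tuning: mark only the first
# five encoder layers as trainable and freeze everything else, as the
# quoted passage describes. Layer names and counts are illustrative.

class Layer:
    def __init__(self, name):
        self.name = name
        self.trainable = True  # all layers start out updatable


def build_model(num_encoder_layers=17, num_decoder_layers=2):
    # An assumed encoder/decoder stack; the real architecture and
    # layer counts are not specified in the passage.
    encoder = [Layer(f"encoder.{i}") for i in range(num_encoder_layers)]
    decoder = [Layer(f"decoder.{i}") for i in range(num_decoder_layers)]
    return encoder + decoder


def freeze_all_but_first_encoder_layers(layers, n_trainable=5):
    """Leave only the first n_trainable encoder layers updatable."""
    for layer in layers:
        is_early_encoder = (
            layer.name.startswith("encoder.")
            and int(layer.name.split(".")[1]) < n_trainable
        )
        layer.trainable = is_early_encoder
    return layers


layers = freeze_all_but_first_encoder_layers(build_model())
trainable = [layer.name for layer in layers if layer.trainable]
# trainable now holds only encoder.0 through encoder.4
```

With only ¼ to 2 h of data per speaker, restricting updates to a small slice of the network limits the number of free parameters, which is one common way to reduce overfitting during per-speaker adaptation.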