Interestingly, CPC representations capture both speaker identity and speech content, as shown by the good accuracies that a simple linear classifier attains on top of them; these accuracies also come close to those of the oracle, fully supervised networks.
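The linear-classifier evaluation mentioned above keeps the encoder fixed and trains only a linear layer on its outputs, a setup often called a linear probe. The sketch below illustrates that protocol under stated assumptions and is not the authors' evaluation code. It uses multinomial logistic regression trained by plain gradient descent. The feature arrays are synthetic stand-ins for frozen CPC outputs, the labels could be speaker ids or phone ids, and the helper names `train_linear_probe`, `probe_accuracy`, and `sample` are hypothetical.

```python
import numpy as np


def train_linear_probe(features, labels, num_classes, lr=0.01, epochs=500):
    """Fit a multinomial logistic-regression (linear) probe on frozen features.

    features: (N, D) array of fixed encoder outputs; the encoder is never
              updated, only the linear layer (w, b) is trained.
    labels:   (N,) integer class ids, e.g. speaker ids or phone ids.
    """
    n, d = features.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy loss with respect to the logits.
        grad = (probs - onehot) / n
        w -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


def probe_accuracy(features, labels, w, b):
    """Fraction of examples whose highest linear score matches the label."""
    preds = np.argmax(features @ w + b, axis=1)
    return float(np.mean(preds == labels))


# Synthetic stand-in for frozen representations (NOT real CPC features):
# each class gets its own mean vector, plus unit Gaussian noise.
rng = np.random.default_rng(0)
num_classes, dim = 4, 16
centers = 3.0 * rng.normal(size=(num_classes, dim))


def sample(n):
    y = rng.integers(0, num_classes, size=n)
    x = centers[y] + rng.normal(size=(n, dim))
    return x, y


x_train, y_train = sample(400)
x_test, y_test = sample(200)

w, b = train_linear_probe(x_train, y_train, num_classes)
test_acc = probe_accuracy(x_test, y_test, w, b)
print(f"held-out linear-probe accuracy: {test_acc:.2f} "
      f"(chance = {1 / num_classes:.2f})")
```

Because only `w` and `b` are learned, a high held-out accuracy indicates that the label information is already linearly accessible in the frozen features. A fully supervised "oracle" would instead train the encoder and classifier jointly on the labels.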