Possibly confusing naming conventions
Possibly?! Most definitely! I kept trying to figure out the softmax loss function, but no 'loss' calculation takes place when softmax is performed!!
All softmax does is turn our raw class scores (the result of the hypothesis function f, which for this course seems to be Wx + b) into a convenient set of normalized probabilities.
Pretty much after we run those scores through softmax, the whole output is rebranded as q in the actual Cross-Entropy loss function.
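Here's a minimal sketch of that step (the scores are made up, and the max-shift is just the usual numerical-stability trick, not anything specific to the course):

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into normalized probabilities.

    Shifting by the max score is a standard numerical-stability
    trick; it doesn't change the result.
    """
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

# Hypothetical raw scores f(x) = Wx + b for a 3-class problem
scores = np.array([3.2, 5.1, -1.7])
q = softmax(scores)
print(q)  # e.g. [0.129 0.866 0.004] -- non-negative, sums to 1
```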
The Cross-Entropy loss function doesn't even seem to make use of all the values calculated by softmax. You convert your class scores to normalized probabilities, but the loss only cares about the probability assigned to the correct class. And since all we care about is minimizing the loss, that can only be done by increasing the correct class probability (due to the negative sign in front of the log in the cross-entropy formula).
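A minimal sketch of that point, assuming a one-hot target (the q values are carried over from the made-up example above):

```python
import numpy as np

def cross_entropy_loss(q, correct_class):
    """Cross-entropy against a one-hot target: only the probability
    assigned to the correct class contributes to the loss.
    """
    return -np.log(q[correct_class])

q = np.array([0.129, 0.866, 0.004])  # softmax output from above
print(cross_entropy_loss(q, correct_class=1))  # ~0.14: high prob on correct class, small loss
print(cross_entropy_loss(q, correct_class=2))  # ~5.52: low prob on correct class, large loss
```

Note the other probabilities still matter indirectly: because softmax normalizes everything to sum to 1, pushing the correct class's probability up necessarily pushes the others down.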