3 Matching Annotations
  1. Dec 2022
  2. Oct 2016
    1. Backpropagation is just gradient descent on individual errors. You compare the predictions of the neural network with the desired output and then compute the gradient of the error with respect to the weights of the network. This gives you a direction in weight space in which the error becomes smaller. Interestingly, because of the layered structure of the network and the chain rule for derivatives, the resulting formulas can be interpreted as propagating the error backwards through the network. But that is mostly a computational aside; what you really (just) do is gradient descent, that is, changing the weights of the network a little bit to make the error on your training examples smaller.
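       The idea above can be sketched in a few lines (a hypothetical minimal example, not from the annotation): a network with one input, one tanh hidden unit, and one linear output, trained by gradient descent on each example's squared error. The backward pass is nothing but the chain rule; the variable `dh` below is the "error propagated back" to the hidden layer.

       ```python
       import math

       def forward(w1, w2, x):
           h = math.tanh(w1 * x)   # hidden activation
           y = w2 * h              # linear output
           return h, y

       def train(samples, lr=0.1, epochs=5000):
           w1, w2 = 0.5, 0.5       # fixed init so the run is reproducible
           for _ in range(epochs):
               for x, t in samples:
                   h, y = forward(w1, w2, x)
                   # gradient of the squared error (y - t)^2 w.r.t. the output
                   dy = 2.0 * (y - t)
                   # chain rule: the error "propagated back" to the hidden layer
                   dh = dy * w2
                   # gradients w.r.t. the weights
                   dw2 = dy * h
                   dw1 = dh * (1.0 - h * h) * x   # tanh'(z) = 1 - tanh(z)**2
                   # gradient descent: step against the gradient
                   w1 -= lr * dw1
                   w2 -= lr * dw2
           return w1, w2

       # the target function is exactly representable with w1=1.5, w2=0.8
       samples = [(x, 0.8 * math.tanh(1.5 * x)) for x in (-1.0, -0.5, 0.5, 1.0)]
       w1, w2 = train(samples)
       loss = sum((forward(w1, w2, x)[1] - t) ** 2 for x, t in samples) / len(samples)
       ```

       Note that nothing here is specific to "back propagation" as a separate algorithm: the update rule is plain gradient descent, and the backward-looking structure falls out of applying the chain rule layer by layer.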