  1. Oct 2022
    1. instead of adapting learning rates based on the average first moment as in RMSP,

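      RMSProp adapts learning rates using the second moment of the gradients, not the first. Also, a moment is already an average (an expectation), so "average ... moment" is redundant. What RMSProp actually keeps is an exponential moving average (EMA) of the squared gradients, and that EMA is an estimate of the (uncentered) second moment.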
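The following is a minimal scalar sketch of the point in the note. It assumes the common textbook form of RMSProp, with decay rate `rho` and a small `eps` added to the denominator. The function names and default values are illustrative assumptions, not taken from the annotated source, and `eps` placement varies between libraries. The only running statistic is an EMA of `grad ** 2`, meaning the second moment. The demo uses gradients that alternate between +1 and -1. Their first moment (mean) is about 0, while their second moment is 1, so an optimizer driven by the first moment would behave very differently.

```python
import math


def ema(values, rho=0.9, init=0.0):
    """Exponential moving average: avg <- rho * avg + (1 - rho) * v."""
    avg = init
    for v in values:
        avg = rho * avg + (1 - rho) * v
    return avg


def rmsprop_step(param, grad, sq_avg, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSProp update for a scalar parameter (illustrative sketch).

    sq_avg is an EMA of grad**2, i.e. a running estimate of the uncentered
    second moment E[g^2]. No first-moment (mean) estimate is kept.
    """
    sq_avg = rho * sq_avg + (1 - rho) * grad ** 2
    param = param - lr * grad / (math.sqrt(sq_avg) + eps)
    return param, sq_avg


if __name__ == "__main__":
    # Gradients alternating +1, -1: mean ~ 0, but E[g^2] = 1.
    grads = [1.0 if i % 2 == 0 else -1.0 for i in range(200)]
    first = ema(grads)                     # EMA of g: stays near 0
    second = ema([g * g for g in grads])   # EMA of g^2: approaches 1
    print(f"EMA of g:   {first:.4f}")
    print(f"EMA of g^2: {second:.4f}")

    # A single RMSProp step starting from an empty second-moment estimate.
    p, s = rmsprop_step(param=1.0, grad=2.0, sq_avg=0.0, lr=0.1)
    print(f"param={p:.6f}, sq_avg={s:.6f}")
```

Adam, by contrast, keeps both an EMA of the gradients (first moment) and an EMA of their squares (second moment). The quoted passage appears to conflate the two optimizers.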