1 Matching Annotations
  1. Mar 2020
    1. Since Neural Networks are non-convex

      neither convex neither non-convex

      "The fact that J has multiple minima can also be interpreted in a nice way. In each layer, you use multiple nodes which are assigned different parameters to make the cost function small. Except for the values of the parameters, these nodes are the same. So you could exchange the parameters of the first node in one layer with those of the second node in the same layer, and accounting for this change in the subsequent layers. You'd end up with a different set of parameters, but the value of the cost function can't be distinguished by (basically you just moved a node, to another place, but kept all the inputs/outputs the same)."