7 Matching Annotations
  1. Dec 2017
  2. Jun 2017
    1. What ideas from this work are applicable to actor-critic RL? At first glance, I’m now very interested in investigating the magnitude of the actor gradients. If they tend to be very large or very small, we may have a similar saturation problem, and adding a Lipschitz bound through weight clamping could help.

      Good question.

    2. The weights $w$ are constrained to lie within $[-c, c]$, by clipping $w$ after every update to $w$.

      Tanh and sigmoid are allowed, but exp is not: the nonlinear function itself should be K-Lipschitz.

    3. Directly learn the probability density function $P_\theta$. Meaning, $P_\theta$ is some differentiable function such that $P_\theta(x) \ge 0$ and $\int_x P_\theta(x)\, dx = 1$. We optimize $P_\theta$ through maximum likelihood estimation.

      It's more like a classification model.

    4. $KL(P_r \| P_\theta)$.

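      Code $P_r$ with $P_\theta$: the extra code length paid when samples from $P_r$ are encoded with a code built for $P_\theta$.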
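
Two of the annotations above are easy to check numerically: the weight clipping and Lipschitz remark in annotation 2, and the coding reading of the KL divergence in annotation 4. The following is only a minimal sketch in plain Python, not code from the annotated article. The helper names (`clip_weights`, `max_slope`, `kl_bits`, `cross_entropy_bits`), the clipping bound `c = 0.01`, and the toy three-outcome distributions are all assumptions made for illustration.

```python
import math

def clip_weights(weights, c):
    """Clamp each weight into [-c, c]; meant to be called after every update."""
    return [max(-c, min(c, w)) for w in weights]

def max_slope(f, lo, hi, steps=1000):
    """Largest finite-difference slope of f on an even grid over [lo, hi].

    A rough numerical stand-in for the Lipschitz constant on that interval.
    """
    h = (hi - lo) / steps
    xs = [lo + i * h for i in range(steps + 1)]
    return max(abs(f(b) - f(a)) / h for a, b in zip(xs, xs[1:]))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cross_entropy_bits(p, q):
    """Average code length (bits) when samples from p use a code built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_bits(p_r, p_theta):
    """KL(P_r || P_theta) in bits for discrete distributions given as lists."""
    return sum(p * math.log2(p / q) for p, q in zip(p_r, p_theta) if p > 0)

# Annotation 2: clipping keeps every weight inside [-c, c].
clipped = clip_weights([-0.5, 0.003, 0.02], c=0.01)

# Annotation 2: tanh and sigmoid have bounded slope, exp does not.
tanh_slope = max_slope(math.tanh, -10.0, 10.0)
sigmoid_slope = max_slope(sigmoid, -10.0, 10.0)
exp_slope = max_slope(math.exp, -10.0, 10.0)  # keeps growing with the interval

# Annotation 4: KL is the extra bits paid for coding P_r with P_theta's code.
p_r = [0.5, 0.25, 0.25]       # hypothetical "real" distribution
p_theta = [0.25, 0.25, 0.5]   # hypothetical model distribution
extra_bits = cross_entropy_bits(p_r, p_theta) - cross_entropy_bits(p_r, p_r)

print(clipped, tanh_slope, sigmoid_slope, exp_slope)
print(kl_bits(p_r, p_theta), extra_bits)
```

Widening the interval passed to `max_slope` leaves the tanh and sigmoid estimates below 1 and 0.25 respectively, while the exp estimate grows without bound, which is the sense in which exp is not K-Lipschitz for any fixed K.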