7 Matching Annotations
- Dec 2017
-
medium.com
-
Each episode/game is relatively short, of approximately 200 actions
It's a short action sequence.
-
- Jun 2017
-
www.alexirpan.com
-
What ideas from this work are applicable to actor-critic RL? At a first glance, I’m now very interested in investigating the magnitude of the actor gradients. If they tend to be very large or very small, we may have a similar saturation problem, and adding a Lipschitz bound through weight clamping could help.
Good question.
-
The weights $w$ are constrained to lie within $[-c, c]$, by clipping $w$ after every update to $w$.
Tanh and sigmoid are allowed, but exp is not: the nonlinearity itself must be K-Lipschitz.
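A minimal NumPy sketch of the clipping step, assuming parameters are plain arrays updated by some optimizer that is not shown; `clip_weights` and `max_slope` are hypothetical helpers, not part of any library. It also checks numerically why exp is singled out: clipped weights bound each linear layer, but the composed network is only Lipschitz if the activation is too, and exp's slope keeps growing with the input range.

```python
import numpy as np

# Hypothetical helper: clamp each parameter array into [-c, c] in place,
# meant to be called right after every optimizer update (WGAN-style clipping).
def clip_weights(params, c=0.01):
    for w in params:
        np.clip(w, -c, c, out=w)
    return params

# Hypothetical helper: estimate the Lipschitz constant of a scalar activation
# on [lo, hi] as the largest finite-difference slope on a fine grid.
def max_slope(f, lo=-5.0, hi=5.0, n=100_001):
    x = np.linspace(lo, hi, n)
    y = f(x)
    return float(np.max(np.abs(np.diff(y) / np.diff(x))))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
params = [rng.normal(size=(4, 3)), rng.normal(size=3)]
clip_weights(params, c=0.01)
print(max(float(np.abs(w).max()) for w in params))  # at most c

print(max_slope(np.tanh))   # close to 1: tanh is 1-Lipschitz
print(max_slope(sigmoid))   # close to 0.25: sigmoid is 1/4-Lipschitz
print(max_slope(np.exp), max_slope(np.exp, hi=10.0))  # grows with hi: no global bound
```

The overall bound is the product of the per-layer bounds, which is why a single unbounded nonlinearity like exp breaks the guarantee even when every weight is clipped.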
-
Directly learn the probability density function $P_\theta$. Meaning, $P_\theta$ is some differentiable function such that $P_\theta(x) \ge 0$ and $\int_x P_\theta(x)\, dx = 1$. We optimize $P_\theta$ through maximum likelihood estimation
It's more like a classification model.
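A minimal sketch of maximum likelihood estimation, assuming $P_\theta$ is a 1-D Gaussian with $\theta = (\mu, \sigma)$ and synthetic data standing in for samples from $P_r$ (the distribution, sample size, and helper names are illustrative assumptions). For a Gaussian the MLE has a closed form, so the code just checks that moving $\theta$ away from it lowers the average log-likelihood. In the large-sample limit, maximizing this average is equivalent to minimizing $KL(P_r \| P_\theta)$.

```python
import numpy as np

# log P_theta(x) for a 1-D Gaussian density with theta = (mu, sigma).
def log_density(x, mu, sigma):
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)

# Average log-likelihood of the data under P_theta: the MLE objective.
def avg_log_likelihood(x, mu, sigma):
    return float(np.mean(log_density(x, mu, sigma)))

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=10_000)  # stand-in samples from P_r

# Closed-form Gaussian MLE: sample mean and the ddof=0 standard deviation.
mu_hat, sigma_hat = data.mean(), data.std()
best = avg_log_likelihood(data, mu_hat, sigma_hat)

# Any perturbation of theta away from the MLE lowers the objective.
for dmu, dsigma in [(0.1, 0.0), (0.0, 0.1), (-0.1, -0.1)]:
    print(dmu, dsigma, avg_log_likelihood(data, mu_hat + dmu, sigma_hat + dsigma) < best)
```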
-
$KL(P_r \| P_\theta)$.
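Read as coding $P_r$ with $P_\theta$: the expected extra bits paid when samples from $P_r$ are encoded with a code optimized for $P_\theta$.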
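A small sketch of the coding view of KL over discrete distributions: $KL(P_r \| P_\theta) = H(P_r, P_\theta) - H(P_r)$, the cross-entropy (bits needed when coding $P_r$ with $P_\theta$'s code) minus the entropy (bits needed with $P_r$'s own code). The example probabilities and helper names are made up for illustration. The last case shows the divergence becoming infinite when $P_\theta$ assigns zero probability to an outcome that $P_r$ can produce.

```python
import numpy as np

def entropy_bits(p):
    """H(p): expected code length in bits with a code optimal for p itself."""
    p = np.asarray(p, dtype=float)
    nz = p > 0  # terms with p(x) = 0 contribute nothing
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def cross_entropy_bits(p, q):
    """H(p, q): expected code length in bits for samples from p coded with q's code."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    with np.errstate(divide="ignore"):  # log2(0) = -inf when q misses p's support
        return float(-np.sum(p[nz] * np.log2(q[nz])))

def kl_bits(p, q):
    """KL(p || q) = H(p, q) - H(p): the extra bits paid for using the wrong code."""
    return cross_entropy_bits(p, q) - entropy_bits(p)

p_r = [0.5, 0.25, 0.25]
p_theta = [0.25, 0.25, 0.5]
print(entropy_bits(p_r), cross_entropy_bits(p_r, p_theta), kl_bits(p_r, p_theta))
# 1.5 1.75 0.25

print(kl_bits(p_r, [0.5, 0.5, 0.0]))  # inf: P_theta gives zero mass to an outcome of P_r
```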
-
-
offconvex.github.io
-
Trust region algorithms
Also see TRPO
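A minimal trust-region sketch: build a local quadratic model, take a step inside a ball of radius Δ, then grow or shrink Δ depending on how well the model predicted the actual decrease. It uses the Cauchy point (the model minimizer along the negative gradient, clipped to the ball) as the step, following the standard textbook scheme. The test function (a scaled Rosenbrock), the thresholds 0.25/0.75, η, and all function names are assumptions for illustration.

```python
import numpy as np

# Scaled Rosenbrock function with its exact gradient and Hessian.
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] - x[0] ** 2) ** 2

def grad(x):
    return np.array([
        2.0 * (x[0] - 1.0) - 40.0 * x[0] * (x[1] - x[0] ** 2),
        20.0 * (x[1] - x[0] ** 2),
    ])

def hess(x):
    return np.array([
        [2.0 - 40.0 * x[1] + 120.0 * x[0] ** 2, -40.0 * x[0]],
        [-40.0 * x[0], 20.0],
    ])

def cauchy_point(g, B, delta):
    """Minimize the quadratic model along -g inside the ball of radius delta."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(gnorm ** 3 / (delta * gBg), 1.0)
    return -tau * delta * g / gnorm

def trust_region(x0, delta=0.5, delta_max=2.0, eta=0.1, iters=500, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    history = [f(x)]
    for _ in range(iters):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = cauchy_point(g, B, delta)
        predicted = -(g @ p + 0.5 * p @ B @ p)  # m(0) - m(p), always > 0 here
        actual = f(x) - f(x + p)
        rho = actual / predicted
        if rho < 0.25:
            delta *= 0.25  # model was poor: shrink the region
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_max)  # good model, step hit the boundary: expand
        if rho > eta:
            x = x + p  # accept the step
        history.append(f(x))
    return x, history

x_final, hist = trust_region([-1.2, 1.0])
print(x_final, hist[0], hist[-1])
```

Because the predicted decrease at the Cauchy point is always positive, an accepted step (ρ > η) always lowers f, so the recorded history never increases. TRPO applies the same idea to policies, replacing the Euclidean ball with a bound on the KL divergence between the old and new policy.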
-
One explanation of non-convex optimization
-