Hypothesis

7 Matching Annotations

Dec 2017
medium.com

AlphaGo, in context – Andrej Karpathy – Medium

1
1. airsplay 21 Dec 2017
  
  in Public
  
  each episode/game is relatively short, of approximately 200 actions
  
  It's a show action.

Annotators

airsplay

URL

medium.com/@karpathy/alphago-in-context-c47718cb95a5
Jun 2017
www.alexirpan.com

Read-through: Wasserstein GAN

4
1. airsplay 28 Jun 2017
  
  in Public
  
  What ideas from this work are applicable to actor-critic RL? At a first glance, I’m now very interested in investigating the magnitude of the actor gradients. If they tend to be very large or very small, we may have a similar saturation problem, and adding a Lipschitz bound through weight clamping could help.
  
  Good question.
2. airsplay 28 Jun 2017
  
  in Public
  
  The weights $w$ are constrained to lie within $[-c, c]$, by clipping $w$ after every update to $w$.
  
  Tanh and sigmoid are allowed, but exp is not: the non-linear function itself should be K-Lipschitz.
3. airsplay 28 Jun 2017
  
  in Public
  
  Directly learn the probability density function $P_\theta$. Meaning, $P_\theta$ is some differentiable function such that $P_\theta(x) \ge 0$ and $\int_x P_\theta(x)\, dx = 1$. We optimize $P_\theta$ through maximum likelihood estimation
  
  It's more like a classification model.
4. airsplay 28 Jun 2017
  
  in Public
  
  $KL(P_r \| P_\theta)$.
  
  Code $P_r$ with $P_\theta$.
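
Two of the notes above (weight clipping to $[-c, c]$, and reading $KL(P_r \| P_\theta)$ as the cost of coding $P_r$ with $P_\theta$) can be illustrated with a small standard-library Python sketch. This is only an illustration under stated assumptions, not code from the annotated post: the function names, the clip value `c=0.01`, and the two toy discrete distributions are all made up here, and the KL helper assumes `p_theta` is nonzero wherever `p_r` is.

```python
import math


def clip_weights(weights, c):
    """Clamp every weight into [-c, c] (hypothetical helper, applied after an update)."""
    return [max(-c, min(c, w)) for w in weights]


def entropy_bits(p):
    """Average code length (bits) when samples from p use a code built for p."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def cross_entropy_bits(p_r, p_theta):
    """Average code length (bits) when samples from p_r use a code built for p_theta."""
    return -sum(p * math.log2(q) for p, q in zip(p_r, p_theta) if p > 0)


def kl_divergence_bits(p_r, p_theta):
    """KL(p_r || p_theta) in bits; assumes p_theta > 0 wherever p_r > 0."""
    return sum(p * math.log2(p / q) for p, q in zip(p_r, p_theta) if p > 0)


# Weight clipping: values outside [-c, c] are pulled back to the boundary.
clipped = clip_weights([-0.5, 0.003, 0.02, -0.008], c=0.01)

# Toy discrete distributions (illustrative values only).
p_r = [0.5, 0.25, 0.25]
p_theta = [0.25, 0.25, 0.5]

kl = kl_divergence_bits(p_r, p_theta)
extra_bits = cross_entropy_bits(p_r, p_theta) - entropy_bits(p_r)

print(clipped)
print(kl, extra_bits)
```

The last line prints the same number twice: the KL divergence equals the cross-entropy minus the entropy, which is the sense in which it measures the extra bits paid for coding $P_r$ with $P_\theta$.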

Annotators

airsplay

URL

alexirpan.com/2017/02/22/wasserstein-gan.html
offconvex.github.io

Escaping from Saddle Points

2
1. airsplay 14 Jun 2017
  
  in Public
  
  Trust region algorithms
  
  Also see TRPO
2. airsplay 14 Jun 2017
  
  in Public
  
  One explanation of non-convex optimization
  
  ML

Tags

ML

Annotators

airsplay

URL

offconvex.github.io/2016/03/22/saddlepoints/