epsilon greedy
the crux
so rewards, and reward functions, can be defined over S x A x S, i.e. as r(s, a, s')
Will they make exactly the same action selections and weight updates?
no, in Q-learning the greedy action in the Bellman-style target is taken before the update, but the next step's action is generated from the updated Q,
whereas in SARSA with a greedy policy, the same greedy action is used in the update target, and it is also the action taken to generate the next state
Q-learning considered an off-policy control method?
because the policy whose value Q estimates is the one that is greedy w.r.t. Q, while the policy generating the samples can be anything
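A minimal sketch of the difference noted above (my own toy code, not the book's pseudocode; Q is assumed to be a dict or array of numpy arrays, Q[s][a] -> value, and both agents act greedily):

```python
import numpy as np

def greedy(Q, s):
    return int(np.argmax(Q[s]))

def q_learning_step(Q, s, a, r, s2, alpha, gamma):
    # the greedy/max action in the target is evaluated *before* the update ...
    target = r + gamma * np.max(Q[s2])
    Q[s][a] += alpha * (target - Q[s][a])
    # ... but the action actually taken next comes from the *updated* Q
    return greedy(Q, s2)

def sarsa_greedy_step(Q, s, a, r, s2, alpha, gamma):
    # SARSA picks A' first, from the pre-update Q ...
    a2 = greedy(Q, s2)
    # ... and that same A' is used in the target *and* executed next
    target = r + gamma * Q[s2][a2]
    Q[s][a] += alpha * (target - Q[s][a])
    return a2
```

When S' happens to equal S the two can diverge: Q-learning's next action is chosen after Q(S, A) has changed.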
$\rho_t \, (R_{t+1} + \gamma \, G_{t+1:h})$
see 5.9, this is per-decision importance sampling
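For my own reference, as I understand per-decision importance sampling (notation mine): with behaviour policy $b$ and target policy $\pi$,

$\rho_t \doteq \dfrac{\pi(A_t|S_t)}{b(A_t|S_t)}, \qquad G_{t:h} = \rho_t\big(R_{t+1} + \gamma\, G_{t+1:h}\big),$

so each reward $R_{k+1}$ ends up weighted only by the ratios $\rho_t \cdots \rho_k$ up to its own time step, rather than by the full product of ratios out to the horizon.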
Information Criterion
usually a function of the maximized likelihood plus some penalty for model complexity
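For reference, the two common ones (with $\hat{L}$ the maximized likelihood, $k$ the number of parameters, $n$ the sample size):

$\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}$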
$t\sigma^2$
sum of t independent innovations
$1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2$
due to having uncorrelated innovations
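Quick numerical sanity check of that variance formula for an MA(2) (sketch; the theta values and sigma below are arbitrary choices):

```python
import numpy as np

# Simulate X_t = eps_t + theta1*eps_{t-1} + theta2*eps_{t-2} and check
# Var(X_t) = sigma^2 * (1 + theta1^2 + theta2^2), which holds because the
# innovations are uncorrelated (all cross terms vanish in expectation).
rng = np.random.default_rng(0)
theta = np.array([0.6, -0.3])
sigma = 1.5
n = 1_000_000

eps = rng.normal(0.0, sigma, size=n)
x = eps.copy()
x[1:] += theta[0] * eps[:-1]
x[2:] += theta[1] * eps[:-2]

print(x.var())                            # empirical variance
print(sigma**2 * (1 + np.sum(theta**2)))  # sigma^2 * (1 + theta1^2 + theta2^2)
```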
complex number $\lambda$: $|\lambda| > 1$, $(1 - \frac{1}{\lambda}L)^{-1}$
applying the inverse renders the process an infinite-order MA process
$\{z : |z| \leq 1\}$, i.e., $|\lambda_j| > 1$
geometric series
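Spelling the step out (assuming $|\lambda| > 1$, so $|1/\lambda| < 1$ and the series converges):

$\Big(1 - \tfrac{1}{\lambda}L\Big)^{-1} = \sum_{k=0}^{\infty} \Big(\tfrac{L}{\lambda}\Big)^{k} = 1 + \tfrac{1}{\lambda}L + \tfrac{1}{\lambda^2}L^2 + \cdots$

and since $L^k \eta_t = \eta_{t-k}$, applying this to the innovations gives an infinite-order MA with geometrically decaying coefficients.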
$L^i(\eta_t)$
a function of eta_t
$\eta_{t-i}$
how does this affect the process over time?
e.g. the effect of changing interest rates on the long-term economy
$p/n \to 0$
more data than parameters
St
S_t is a weighted average of white noise
$\theta \leftarrow \theta + \alpha^{\theta} I \,\delta\, \nabla \ln \pi(A|S,\theta)$
actor-critic with state value baseline update, with discounting!
del ln pi(A|S, theta) is actually the cross-entropy (negative log-likelihood) gradient for the sampled action
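A sketch of the whole one-step update as I read it (my names; grad_log_pi, v_hat and grad_v_hat are hypothetical callables for the policy log-gradient and the value function):

```python
def actor_critic_step(theta, w, I, s, a, r, s2, done,
                      grad_log_pi, v_hat, grad_v_hat,
                      alpha_theta, alpha_w, gamma):
    """One step of one-step actor-critic with a state-value baseline (sketch).

    I is the accumulated discount gamma^t; it scales the actor update so that
    discounting is handled as in the episodic pseudocode.
    """
    # TD(0) error: bootstrapped target minus the baseline v_hat(s, w)
    target = r if done else r + gamma * v_hat(s2, w)
    delta = target - v_hat(s, w)

    # critic: semi-gradient TD(0) update of the value weights
    w = w + alpha_w * delta * grad_v_hat(s, w)

    # actor: move theta along delta * grad ln pi(a|s, theta), scaled by I
    theta = theta + alpha_theta * I * delta * grad_log_pi(s, a, theta)

    return theta, w, I * gamma
```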
$(\delta_t + \theta_t^{\top}\phi_t - \theta_{t-1}^{\top}\phi_t)$
modified TD error in terms of regular TD error
$\alpha \phi_t \delta'_t$
A_{t+1}^t = I
$A^{t-1}_{0}\theta_0 + \alpha \sum_{i=0}^{t-1} A^{t-1}_{i+1}\phi_i G_i^{\lambda|t}$
we've achieved something special here: a recursive definition of theta_{t+1} in terms of theta_t
$= (I - \alpha\phi_t\phi_t^{\top})\theta_t + \alpha\sum_{i=0}^{t-1} A^{t}_{i+1}\phi_i(\gamma\lambda)^{t-i}\delta'_t + \alpha\phi_t(R_{t+1} + \gamma\theta_t^{\top}\phi_{t+1})$
this is already computationally quite nice, but these jokers want to incorporate the last term into the modified TD error
$G^{\lambda|t+1}$
lambda return = the lambda-weighted sum of all n-step returns available up to time t+1
$\gamma\lambda e_{t-1} + \phi_t - \alpha\gamma\lambda(e_{t-1}^{\top}\phi_t)\phi_t$
the dutch trace update rule
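Putting the dutch trace into code, the way I understand true online TD(lambda) for the linear case (sketch; x is the feature vector phi, and e and v_old must be reset to zero at the start of each episode):

```python
import numpy as np

def true_online_td_step(theta, e, v_old, x, r, x_next, alpha, gamma, lam):
    """One step of true online TD(lambda), linear function approximation (sketch)."""
    v = theta @ x
    v_next = theta @ x_next
    delta = r + gamma * v_next - v                        # the regular TD error

    # dutch trace: gamma*lam*e + phi_t - alpha*gamma*lam*(e . phi_t) * phi_t
    e = gamma * lam * e + x - alpha * gamma * lam * (e @ x) * x

    # weight update; the (v - v_old) terms are the correction that makes the
    # online algorithm match the forward-view lambda-return exactly
    theta = theta + alpha * (delta + v - v_old) * e - alpha * (v - v_old) * x

    return theta, e, v_next   # v_next becomes v_old on the next step

# usage: e = 0 and v_old = 0 at the start of each episode
theta, e, v_old = np.zeros(4), np.zeros(4), 0.0
theta, e, v_old = true_online_td_step(theta, e, v_old,
                                      x=np.eye(4)[0], r=1.0, x_next=np.eye(4)[1],
                                      alpha=0.1, gamma=0.9, lam=0.8)
```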
$G^{\lambda}_{1:3}$
lambda-weighted sum of all available n-step returns starting from t=1
changing the values of those past states for when they occur again in the future
assuming the special linear case, with a discrete state space and where the feature vector is a one-hot encoding, the TD error multiplied by the eligibility vector is the effect of the current TD error on each state, with the effect amplified (or de-amplified) by each state's recency of occurrence.
assign it backward to each prior state according to how much that state contributed to the current eligibility trace at that time.
the current TD error contributes less and less to those states that occur further back in time (using the linear case helps, where the eligibility trace is a sum of past, fading state input vectors).
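A tiny one-hot illustration of that credit assignment (sketch, using plain accumulating traces rather than dutch traces, just to see the fading weights):

```python
import numpy as np

# Tabular TD(lambda) with accumulating traces: the one-hot feature case, where
# each state's trace entry decays by gamma*lam per step since its last visit,
# so the current TD error is assigned backward with geometrically fading weight.
n_states, alpha, gamma, lam = 5, 0.1, 0.9, 0.8
V = np.zeros(n_states)
e = np.zeros(n_states)

visits = [0, 1, 2]          # a short trajectory of state indices
rewards = [0.0, 0.0, 1.0]   # reward received on leaving each state
for t, s in enumerate(visits):
    s_next = visits[t + 1] if t + 1 < len(visits) else None
    v_next = 0.0 if s_next is None else V[s_next]   # V(terminal) = 0
    delta = rewards[t] + gamma * v_next - V[s]

    e *= gamma * lam        # all traces fade...
    e[s] += 1.0             # ...and the current state's trace is bumped

    V += alpha * delta * e  # every previously visited state gets a share of delta
    print(t, round(delta, 3), e.round(3))
```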
$\lambda^{T-t-1} G_t$
sum of all remaining n-step returns
$Q_{t+n-1}(S_t, A_t)$
should this be inside the bracket??
How about the change in left-side outcome from 0 to −1 made in the larger walk? Do you think that made any difference in the best value of n?
yes, because rewards can propagate in from both sides now, making the optimal n shorter?
$G_{t:t+n} \doteq R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} Q_{t+n-1}(S_{t+n}, A_{t+n})$
G(t:t+n) needs all rewards from time t+1 up to t+n
$Q_{t+n}(S_t, A_t) \doteq Q_{t+n-1}(S_t, A_t) + \alpha\,[\,G_{t:t+n} - Q_{t+n-1}(S_t, A_t)\,]$
at the current time step t+n, we need the states and actions from time = t (i.e. n steps back)
$t+n-1$
use the most up-to-date version
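The two formulas combined in a sketch (my own toy code; Q is assumed to be a dict mapping (state, action) to a value):

```python
def n_step_sarsa_update(Q, transitions, s_boot, a_boot, alpha, gamma):
    """n-step Sarsa update (sketch). `transitions` is the list
    [(S_t, A_t, R_{t+1}), ..., (S_{t+n-1}, A_{t+n-1}, R_{t+n})] of the last n
    steps; (s_boot, a_boot) = (S_{t+n}, A_{t+n})."""
    n = len(transitions)
    s_tau, a_tau, _ = transitions[0]            # the pair updated: n steps back

    # G_{t:t+n} = R_{t+1} + gamma*R_{t+2} + ... + gamma^(n-1)*R_{t+n}
    G = sum(gamma**i * r for i, (_, _, r) in enumerate(transitions))
    # ... + gamma^n * Q_{t+n-1}(S_{t+n}, A_{t+n}): bootstrap with the current Q
    G += gamma**n * Q[(s_boot, a_boot)]

    # Q_{t+n}(S_t, A_t) = Q_{t+n-1}(S_t, A_t) + alpha*[G_{t:t+n} - Q_{t+n-1}(S_t, A_t)]
    Q[(s_tau, a_tau)] += alpha * (G - Q[(s_tau, a_tau)])
```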
expectation of being in a state depends only on the policy and the MDP transition probabilities
hmm, surely starting states can have an impact too?
[21, 39] directly use conventional CNN or deep belief networks (DBN)
interesting, read!
If $\tau+n < T$, then: $G \leftarrow G + \gamma^{n} V(S_{\tau+n})$
V(terminal state) = 0
$\alpha G_t \dfrac{\nabla\pi(A_t|S_t,\theta_t)}{\pi(A_t|S_t,\theta_t)}$
notice that the multiplier of the gradient here, G_t / pi(a|s), is positive (for non-negative returns), meaning we always move in the same direction as the gradient. Using a baseline, G_t - v(S_t), allows us to reverse this direction when G_t is lower than the baseline.
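Sketch of that point (note grad pi / pi = grad ln pi, so the update can be written with the log-gradient; grad_log_pi is a hypothetical callable):

```python
def reinforce_update(theta, G, s, a, grad_log_pi, alpha, baseline=0.0):
    """REINFORCE update for one time step (sketch).

    Without a baseline the coefficient on grad ln pi is G >= 0 (for
    non-negative returns), so probability is always pushed *toward* the
    sampled action. With a baseline (e.g. v_hat(S_t)), the coefficient
    G - baseline goes negative whenever the return falls below the baseline,
    and the update pushes probability *away* from the sampled action instead.
    """
    coeff = G - baseline
    return theta + alpha * coeff * grad_log_pi(s, a, theta)
```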
Actor–Critic with Eligibility Traces (continuing), for estimating $\pi_\theta \approx \pi_*$
actor critic algorithm one step TD:
(13.16)
very similar to box 199 but without h(s)
$w \leftarrow w + \alpha^{w} \,\delta\, \nabla\hat{v}(S,w)$
TD(0) update
$G_{t:t+1} - \hat{v}(S_t, w)$
same as REINFORCE MC baseline, but with the sampled G replaced with a bootstrapped G
That is, w is a single component, w.
constant baseline?
If there is discounting ($\gamma < 1$) it should be treated as a form of termination, which can be done simply by including a factor of $\gamma$ in the second term of
termination, because discounting by gamma is equivalent to an undiscounted problem in which each step terminates with probability 1 - gamma (i.e. continues with probability gamma)
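One way to see it: let the undiscounted process terminate independently with probability $1-\gamma$ after every step. Then, with termination independent of the rewards,

$\mathbb{E}\Big[\sum_{k\ge 0} R_{t+k+1}\,\mathbf{1}\{\text{not yet terminated after } k \text{ steps}\}\Big] = \sum_{k\ge 0} \gamma^{k}\,\mathbb{E}[R_{t+k+1}],$

which is exactly the discounted return, so $\gamma$ acts as a per-step continuation probability.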
Cheng et al. [92] design a multi-channel parts-aggregated deep convolutional network by integrating the local body part features and the global full-body features in a triplet training framework
TODO: read this and find out what the philosophy behind parts-based models is
adaptive average pooling
what is this?
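Partially answering my own question (sketch, assuming PyTorch): adaptive average pooling fixes the output spatial size and derives the pooling regions from the input size, so differently sized feature maps all come out the same shape.

```python
import torch
import torch.nn as nn

# AdaptiveAvgPool2d takes a target output size instead of a kernel size, so
# inputs with different spatial dimensions produce the same output shape.
pool = nn.AdaptiveAvgPool2d((1, 1))     # global average pooling per channel

x_small = torch.randn(8, 256, 12, 4)    # e.g. a re-ID feature map
x_large = torch.randn(8, 256, 24, 8)
print(pool(x_small).shape)  # torch.Size([8, 256, 1, 1])
print(pool(x_large).shape)  # torch.Size([8, 256, 1, 1])
```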
Generation/Augmentation
TODO: read
Using the annotated source data in the training process of the target domain is beneficial for cross-dataset learning
What? Clarify
Dynamic graph matching (DGM)
super interesting, but hardly applicable. do read though!
Sample Rate Learning
what
Singular Vector Decomposition (SVDNet)
seems interesting, "iteratively integrate the orthogonality constraint in CNN training"
Omni-Scale Network (OSNet)
read paper again to see if any good ideas for architecture
bottleneck layer
Bottleneck layers do a 1x1 convolution to reduce the channel dimensionality before the 3x3 convolution (followed by a 1x1 to expand back), to save computation
https://medium.com/@erikgaas/resnet-torchvision-bottlenecks-and-layers-not-as-they-seem-145620f93096
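Rough sketch of the block described above (PyTorch, simplified: no batch norm, stride, or channel-changing projection on the skip path):

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Simplified ResNet-style bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand + skip."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction
        self.reduce = nn.Conv2d(channels, mid, kernel_size=1)        # cheap 1x1: shrink channels
        self.conv3 = nn.Conv2d(mid, mid, kernel_size=3, padding=1)   # expensive 3x3 on fewer channels
        self.expand = nn.Conv2d(mid, channels, kernel_size=1)        # 1x1: restore channels
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.reduce(x))
        out = self.relu(self.conv3(out))
        out = self.expand(out)
        return self.relu(out + x)   # residual connection

# usage
block = Bottleneck(256)
y = block(torch.randn(1, 256, 32, 16))
print(y.shape)  # torch.Size([1, 256, 32, 16])
```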
Global Feature Representation Learning
something that came up whilst looking through papers on attention: https://arxiv.org/pdf/1709.01507.pdf squeeze-and-excitation
[68]
Parts-based paper, interesting approach