10 Matching Annotations
  1. Last 7 days
    1. Most contemporary implementations of Monte Carlo tree search are based on some variant of UCT

      The UCB algorithm for bandits comes back again as UCT (UCB applied to trees), forming the basis of the selection step in MCTS (see the sketch under the next annotation).

    2. The main difficulty in selecting child nodes is maintaining some balance between the exploitation of deep variants after moves with high average win rate and the exploration of moves with few simulations.

      Tree search makes this tradeoff very concrete: how many paths will you explore before you stop and act on the knowledge you already have?
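
      A minimal sketch of the UCT selection rule both of these notes point at. The `Node` fields and function names here are illustrative, and `c = sqrt(2)` is just the textbook exploration constant, not anything tied to a particular implementation:

      ```python
      import math
      from dataclasses import dataclass

      @dataclass
      class Node:
          wins: int = 0    # simulations won through this child
          visits: int = 0  # simulations routed through this child

      def uct_select(parent_visits, children, c=math.sqrt(2)):
          """UCB1 applied to tree search: exploitation is the average win
          rate, exploration is a bonus that shrinks as a child is visited."""
          def score(child):
              if child.visits == 0:
                  return float("inf")  # try every move at least once
              exploit = child.wins / child.visits
              explore = c * math.sqrt(math.log(parent_visits) / child.visits)
              return exploit + explore
          return max(children, key=score)

      # A well-explored strong move vs. a barely-tried one: the exploration
      # bonus picks the barely-tried child here despite its lower win count.
      children = [Node(wins=60, visits=100), Node(wins=1, visits=2)]
      print(uct_select(parent_visits=102, children=children))
      ```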

    3. An illustration of alpha–beta pruning. The grayed-out subtrees don't need to be explored (when moves are evaluated from left to right), since it is known that the group of subtrees as a whole yields the value of an equivalent subtree or worse, and as such cannot influence the final result. The max and min levels represent the turn of the player and the adversary, respectively.

      Alpha-beta pruning comes down to being smart about searching the tree of possible future game states: skip subtrees that provably cannot change the final minimax decision, so the search budget only goes to lines that matter.
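
      A minimal sketch of the cutoff logic on a toy tree (nested lists stand in for game states, and the leaf numbers are terminal evaluations; a real engine would walk actual positions):

      ```python
      def alphabeta(node, alpha, beta, maximizing):
          """Minimax with alpha-beta cuts: stop scanning a node's children
          once the result provably can't influence the choice higher up."""
          if isinstance(node, (int, float)):  # leaf: a terminal game value
              return node
          if maximizing:
              value = float("-inf")
              for child in node:
                  value = max(value, alphabeta(child, alpha, beta, False))
                  alpha = max(alpha, value)
                  if alpha >= beta:
                      break  # beta cutoff: the min player avoids this line
              return value
          else:
              value = float("inf")
              for child in node:
                  value = min(value, alphabeta(child, alpha, beta, True))
                  beta = min(beta, value)
                  if alpha >= beta:
                      break  # alpha cutoff: the max player avoids this line
              return value

      # Two leaves in this tree get pruned without ever being evaluated.
      tree = [[3, 5], [6, [9, 8]], [1, 2]]
      print(alphabeta(tree, float("-inf"), float("inf"), True))  # -> 6
      ```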

    4. For example, the chess computer Deep Blue (the first one to beat a reigning world champion, Garry Kasparov at that time) looked ahead at least 12 plies, then applied a heuristic evaluation function.[6]

      Deep Blue used a minimax-style search (with alpha-beta pruning) to beat Garry Kasparov at chess: at least 12 plies of lookahead, then a heuristic evaluation function at the horizon.
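
      A back-of-envelope on why the heuristic cutoff is unavoidable, assuming the commonly cited branching factor of roughly 35 for chess (the b^(d/2) best case for alpha-beta is the classic Knuth–Moore bound):

      ```python
      b, d = 35, 12                # rough chess branching factor, Deep Blue's minimum depth
      print(f"{b**d:.1e}")         # ~3.4e+18 positions with no pruning at all
      print(f"{b**(d // 2):.1e}")  # ~1.8e+09 in alpha-beta's best case, about b^(d/2)
      ```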

    5. 2024 paper arguing that other methods beyond PPO could be better for "value alignment" of LLMs

  2. Jul 2024
  3. Oct 2023