10 Matching Annotations
  1. Last 7 days
    1. Most contemporary implementations of Monte Carlo tree search are based on some variant of UCT

      The UCB algorithm for bandits comes back again as UCT (UCB applied to trees), forming the basis of the selection step in MCTS (see the sketch under the next annotation).

    2. The main difficulty in selecting child nodes is maintaining some balance between the exploitation of deep variants after moves with high average win rate and the exploration of moves with few simulations.

      Tree search makes this tradeoff very concrete: how many paths will you explore before you stop and act on the knowledge you already have?
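
      A minimal sketch of the UCT selection rule both of these notes point at. The `Node` fields and function names here are illustrative, and `c = sqrt(2)` is just the textbook exploration constant, not anything tied to a particular implementation:

      ```python
      import math
      from dataclasses import dataclass

      @dataclass
      class Node:
          wins: int = 0    # simulations won through this child
          visits: int = 0  # simulations routed through this child

      def uct_select(parent_visits, children, c=math.sqrt(2)):
          """UCB1 applied to tree search: exploitation is the average win
          rate, exploration is a bonus that shrinks as a child is visited."""
          def score(child):
              if child.visits == 0:
                  return float("inf")  # try every move at least once
              exploit = child.wins / child.visits
              explore = c * math.sqrt(math.log(parent_visits) / child.visits)
              return exploit + explore
          return max(children, key=score)

      # A well-explored strong move vs. a barely-tried one: the exploration
      # bonus picks the barely-tried child here despite its lower win count.
      children = [Node(wins=60, visits=100), Node(wins=1, visits=2)]
      print(uct_select(parent_visits=102, children=children))
      ```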

    3. An illustration of alpha–beta pruning. The grayed-out subtrees don't need to be explored (when moves are evaluated from left to right), since it is known that the group of subtrees as a whole yields the value of an equivalent subtree or worse, and as such cannot influence the final result. The max and min levels represent the turn of the player and the adversary, respectively.

      Alpha-beta pruning comes down to being smart about searching the tree of possible future game states: skip subtrees that provably cannot change the final minimax decision, so the search budget only goes to lines that matter.
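
      A minimal sketch of the cutoff logic on a toy tree (nested lists stand in for game states, and the leaf numbers are terminal evaluations; a real engine would walk actual positions):

      ```python
      def alphabeta(node, alpha, beta, maximizing):
          """Minimax with alpha-beta cuts: stop scanning a node's children
          once the result provably can't influence the choice higher up."""
          if isinstance(node, (int, float)):  # leaf: a terminal game value
              return node
          if maximizing:
              value = float("-inf")
              for child in node:
                  value = max(value, alphabeta(child, alpha, beta, False))
                  alpha = max(alpha, value)
                  if alpha >= beta:
                      break  # beta cutoff: the min player avoids this line
              return value
          else:
              value = float("inf")
              for child in node:
                  value = min(value, alphabeta(child, alpha, beta, True))
                  beta = min(beta, value)
                  if alpha >= beta:
                      break  # alpha cutoff: the max player avoids this line
              return value

      # Two leaves in this tree get pruned without ever being evaluated.
      tree = [[3, 5], [6, [9, 8]], [1, 2]]
      print(alphabeta(tree, float("-inf"), float("inf"), True))  # -> 6
      ```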

    4. For example, the chess computer Deep Blue (the first one to beat a reigning world champion, Garry Kasparov at that time) looked ahead at least 12 plies, then applied a heuristic evaluation function.[6]

      Deep Blue used a minimax-style search (with alpha-beta pruning) to beat Garry Kasparov at chess: at least 12 plies of lookahead, then a heuristic evaluation function at the horizon.
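
      A back-of-envelope on why the heuristic cutoff is unavoidable, assuming the commonly cited branching factor of roughly 35 for chess (the b^(d/2) best case for alpha-beta is the classic Knuth–Moore bound):

      ```python
      b, d = 35, 12                # rough chess branching factor, Deep Blue's minimum depth
      print(f"{b**d:.1e}")         # ~3.4e+18 positions with no pruning at all
      print(f"{b**(d // 2):.1e}")  # ~1.8e+09 in alpha-beta's best case, about b^(d/2)
      ```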

    5. 2024 paper arguing that other methods beyond PPO could be better for "value alignment" of LLMs

  2. Jul 2024
  3. Oct 2023