15 Matching Annotations
  1. Last 7 days
  2. Oct 2023
    1. (Chen, NeurIPS, 2021) Che1, Lu, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, and Mordatch. "Decision Transformer: Reinforcement Learning via Sequence Modeling". Arxiv preprint rXiv:2106.01345v2, June, 2021.

      Quickly a very influential paper with a new idea of how to learn generative models of action prediction using SARSA training from demonstration trajectories. No optimization of actions or rewards, but target reward is an input.

    1. Wu, Prabhumoye, Yeon Min, Bisk, Salakhutdinov, Azaria, Mitchell and Li. "SPRING: GPT-4 Out-performs RL Algorithms byStudying Papers and Reasoning". Arxiv preprint arXiv:2305.15486v2, May, 2023.

    2. Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RLbaselines, trained for 1M steps, without any training.

      Them's fighten' words!

      I haven't read it yet, but we're putting it on the list for this fall's reading group. Seriously, a strong result with a very strong implied claim. they are careful to say it's from their empirical results, very worth a look. I suspect that amount of implicit knowledge in the papers, text and DAG are helping to do this.

      The Big Question: is their comparison to RL baselines fair, are they being trained from scratch? What does a fair comparison of any from-scratch model (RL or supervised) mean when compared to an LLM approach (or any approach using a foundation model), when that model is not really from scratch.

    1. Wang et. al. "Scientific discovery in the age of artificial intelligence", Nature, 2023.

      A paper about the current state of using AI/ML for scientific discovery, connected with the AI4Science workshops at major conferences.

      (NOTE: since Springer/Nature don't allow public pdfs to be linked without a paywall, we can't use hypothesis directly on the pdf of the paper, this link is to the website version of it which is what we'll use to guide discussion during the reading group.)

    2. Petersen, B. K. et al. Deep symbolic regression: recovering mathematical expressions from data via risk-seeking policy gradients. In International Conference on Learning Representations (2020).

      Description: Reinforcement learning uses neural networks to generate a mathematical expression sequentially by adding mathematical symbols from a predefined vocabulary and using the learned policy to decide which notation symbol to be added next. The mathematical formula is represented as a parse tree. The learned policy takes the parse tree as input to determine what leaf node to expand and what notation (from the vocabulary) to add.

    1. Zecevic, Willig, Singh Dhami and Kersting. "Causal Parrots: Large Language Models May Talk Causality But Are Not Causal". In Transactions on Machine Learning Research, Aug, 2023.

    1. LaMDA: Language Models for Dialog Application

      "LaMDA: Language Models for Dialog Application" Meta's introduction of LaMDA v1 Large Language Model.