our GTPO hybrid advantage formulation eliminates the advantage misalignment problem
Most people assume that computing and optimizing the advantage function in reinforcement learning is a relatively straightforward process, but the authors point out an "advantage misalignment problem" and propose the GTPO hybrid advantage formulation to solve it. This challenges a basic assumption in reinforcement learning, showing that even a core concept like the advantage function needs careful design to work well in multi-turn tasks.
Hence, in a translanguaging classroom, one language is used to reinforce performance in other languages, and students learn many new words from each other, which enriches their vocabulary.
Finding: Reinforcement across languages enriches vocabulary and performance (p. 9). Why it matters: Ties CUP to observed learning gains: useful for my analysis.
Richard Sutton - YouTube interview
Summary: an interesting talk on learning; reminds me of Michael Levin's work. The priority is on goal-directed activity.
Playing Atari with Deep Reinforcement Learning 19 Dec 2013 · Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
The paper from 2013 that introduced the DQN algorithm, combining deep learning with reinforcement learning to play Atari games.
Welcome to the Era of Experience David Silver, Richard S. Sutton
"This is a preprint of a chapter that will appear in the book Designing an Intelligence, published by MIT Press"
MAPPING SOCIAL CHOICE THEORY TO RLHF Jessica Dai and Eve Fleisig ICLR Workshop on Reliable and Responsible Foundation Models 2024
Nice overview of how social choice theory has been connected to RLHF and AI alignment ideas.
Most contemporary implementations of Monte Carlo tree search are based on some variant of UCT
The UCB algorithm for bandits comes back again as UCT to form the basis for model estimation via MCTS
The main difficulty in selecting child nodes is maintaining some balance between the exploitation of deep variants after moves with high average win rate and the exploration of moves with few simulations.
Tree search makes this tradeoff very clear, how many paths will you explore before you stop and use the knowledge you already have?
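To make that tradeoff concrete, here is a minimal sketch of the UCB1 selection rule as used in UCT. The exploration constant `c` and the dict-based node representation are my own illustration, not from any particular MCTS implementation:

```python
import math

def ucb1_score(value_sum, visits, parent_visits, c=1.414):
    """UCT score for one child: average value (exploitation) plus an
    exploration bonus that shrinks as the child gets more visits."""
    if visits == 0:
        return float("inf")  # always try unvisited children first
    exploit = value_sum / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children, parent_visits):
    """Pick the child with the highest UCT score."""
    return max(children,
               key=lambda ch: ucb1_score(ch["value"], ch["visits"],
                                         parent_visits))

# A well-tried child with a decent average vs. a barely-tried one:
children = [{"value": 5.0, "visits": 10}, {"value": 1.0, "visits": 1}]
chosen = select_child(children, parent_visits=11)
```

With few simulations the bonus term dominates and the rarely-visited child is chosen; as visit counts grow, the average win rate takes over.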
The summary paper for AlphaGo.
Wikipedia: AlphaZero
2024 paper arguing that other methods beyond PPO could be better for "value alignment" of LLMs
Paper "Deep Reinforcement Learning that Matters" on evaluating RL algorithms.
T. Herlau, "Moral Reinforcement Learning Using Actual Causation," 2022 2nd International Conference on Computer, Control and Robotics (ICCCR), Shanghai, China, 2022, pp. 179-185, doi: 10.1109/ICCCR54399.2022.9790262. keywords: {Digital control;Ethics;Costs;Philosophical considerations;Toy manufacturing industry;Reinforcement learning;Forestry;Causality;Reinforcement learning;Actual Causation;Ethical reinforcement learning}
Can model-free reinforcement learning explain deontological moral judgments? Alisabeth Ayars, University of Arizona, Dept. of Psychology, Tucson, AZ, USA
Reading this one on Nov 27, 2023 for the reading group.
(Chen, NeurIPS, 2021) Chen, Lu, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, and Mordatch. "Decision Transformer: Reinforcement Learning via Sequence Modeling". arXiv preprint arXiv:2106.01345v2, June, 2021.
Quickly a very influential paper, with a new idea: learn generative models of action prediction by training on (state, action, reward) sequences from demonstration trajectories. There is no optimization of actions or rewards; instead, the target return is given as an input.
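As a small illustration of the "target reward is an input" idea, the conditioning token fed to the model at each timestep is the return-to-go, i.e. the suffix sum of the episode's rewards. A minimal sketch (my own illustration, not the authors' code):

```python
def returns_to_go(rewards):
    """Suffix sums of the reward sequence: the 'return-to-go' value
    that conditions the Decision Transformer at each timestep."""
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return rtg[::-1]

# Example: rewards [1, 0, 2] give returns-to-go [3, 2, 2].
print(returns_to_go([1.0, 0.0, 2.0]))
```

At inference time you simply set the first return-to-go token to the return you *want*, and the model generates actions consistent with achieving it.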
Wu, Prabhumoye, Yeon Min, Bisk, Salakhutdinov, Azaria, Mitchell and Li. "SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning". arXiv preprint arXiv:2305.15486v2, May, 2023.
Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training.
Them's fightin' words!
I haven't read it yet, but we're putting it on the list for this fall's reading group. Seriously, a strong result with a very strong implied claim. They are careful to say it comes from their empirical results; very worth a look. I suspect the amount of implicit knowledge in the papers, text, and DAG is helping to do this.
The big question: is their comparison to RL baselines fair? Are those baselines trained from scratch? And what does a fair comparison of any from-scratch model (RL or supervised) even mean against an LLM approach (or any approach using a foundation model), when that model is not really starting from scratch?
Training language models to follow instructions with human feedback
Original paper for discussion of the Reinforcement Learning from Human Feedback (RLHF) algorithm.
[Kapturowski, DeepMind, Sep 2022] "Human-level Atari 200x Faster"
Improving on the 2020 Agent57 performance to be more efficient.
Adaptive Stress Testing with Reward Augmentation for Autonomous Vehicle Validation
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
(Espeholt, ICML, 2018) "IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures"
This paper introduced the DPG Algorithm
Link to page with information about the paper: https://openreview.net/forum?id=rJeXCo0cYX
Yann LeCun released his vision for the future of Artificial Intelligence research in 2022, and it sounds a lot like Reinforcement Learning.
Paper that evaluated the existing Double Q-Learning algorithm on the new DQN approach and validated that it is very effective in the Deep RL realm.
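For reference, the key change in Double DQN is to decouple action *selection* (done by the online network with parameters \(\theta_t\)) from action *evaluation* (done by the target network with parameters \(\theta_t^-\)), which reduces the overestimation bias of the plain DQN max operator. Written from memory of the paper's notation:

```latex
y_t = r_{t+1} + \gamma \, Q\!\big(s_{t+1},\, \operatorname*{arg\,max}_{a} Q(s_{t+1}, a;\, \theta_t);\ \theta_t^{-}\big)
```

Plain DQN instead uses \(\max_a Q(s_{t+1}, a; \theta_t^-)\) for both roles, which systematically overestimates values under noise.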
This paper introduces the DDPG algorithm, which builds on the existing DPG algorithm from classic RL theory. The main idea is to define a deterministic policy, or nearly deterministic one, for situations where the environment is very sensitive to suboptimal actions and one action setting usually dominates in each state. This showed good performance, but could not beat algorithms such as PPO until the ideas behind SAC were added. SAC adds an entropy term to the objective that rewards policy uncertainty, encouraging exploration where the policy can afford it. With this, the deterministic policy gradient approach performs well.
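For reference, the maximum-entropy objective that SAC optimizes augments the expected return with a policy-entropy term weighted by a temperature \(\alpha\) (notation reconstructed from memory of the SAC paper):

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \big[\, r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \,\big]
```

Setting \(\alpha = 0\) recovers the standard RL objective; larger \(\alpha\) trades off reward for exploration.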
This famous paper gives a great review of the DQN algorithm a couple years after it changed everything in Deep RL. It compares six different extensions to DQN for Deep Reinforcement Learning, many of which have now become standard additions to DQN and other Deep RL algorithms. It also combines all of them together to produce the "rainbow" algorithm, which outperformed many other models for a while.
Arxiv paper from 2021 on reinforcement learning in a scenario where your aim is to learn a workable POMDP policy, but you start with a fully observable MDP and adjust it over time towards a POMDP.
Paper that introduced the PPO algorithm. PPO is, in a way, a response to the TRPO algorithm, trying to use the core idea but implement a more efficient and simpler algorithm.
TRPO defines the update as a straight constrained optimization problem; no learning, in the usual incremental sense, is actually involved.
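PPO replaces TRPO's hard trust-region constraint with a clipped surrogate objective, which is the "simpler and more efficient" implementation of the same core idea. From memory of the PPO paper's notation:

```latex
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\big)\Big],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

The clip keeps the probability ratio \(r_t(\theta)\) near 1, discouraging updates that move the policy too far from the one that collected the data.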
Bowen Baker et al. (OpenAI). "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos". arXiv, June 2022.
Introduction of VPT: a new semi-supervised pre-trained model for sequential decision making in Minecraft. Data are from human video playthroughs but are unlabelled.
Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"
Response paper to DQN showing that well designed Value Function Approximations can also do well at these complex tasks without the use of Deep Learning
A great paper showing how to think differently about the latest advances in Deep RL. All is not always what it seems!
Tom Schaul, John Quan, Ioannis Antonoglou and David Silver. "PRIORITIZED EXPERIENCE REPLAY", ICLR, 2016.
LeBlanc, D. G., & Lee, G. (2021). General Deep Reinforcement Learning in NES Games. Canadian AI 2021. Canadian Artificial Intelligence Association (CAIAC). https://doi.org/10.21428/594757db.8472938b
reinforcement-learning foundation-models pretrained-models proj-minerl minecraft
If my patient notes don’t include a question I haven’t yet asked, ChatGPT’s output will encourage me to keep missing that question. Like with my young female patient who didn’t know she was pregnant. If a possible ectopic pregnancy had not immediately occurred to me, ChatGPT would have kept enforcing that omission, only reflecting back to me the things I thought were obvious — enthusiastically validating my bias like the world’s most dangerous yes-man.
Things missing from a prompt will not appear in the output. This can reinforce one's own blind spots and omissions, lowering the probability of an intuitive leap to other possibilities. The machine helps you search under the light you switched on with your prompt, regardless of whether you're searching in the right place.
asks for the Minecraft domain.
They demonstrate the model on a "minecraft-like" domain (introduced earlier by someone else) where there are resources in the world and the agent has tasks.
Definition 3.2 (simple reward machine).
The MDP does not change; its dynamics are the same with or without the RM, just as with or without a standard reward model. Additionally, the rewards from the RM can be non-Markovian with respect to the MDP because they inherently have a kind of memory of where you've been, limited to the agent's "movement" (almost "in its mind") along the goals for this task.
We then show that an RM can be interpreted as specifying a single reward function over a larger state space, and consider types of reward functions that can be expressed using RMs
So by specifying a reward machine you are augmenting the state space of the MDP with higher level goals/subgoals/concepts that provide structure about what is good and what isn't.
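To make that concrete, here is a minimal sketch of a simple reward machine as a finite-state machine over abstract events. The class name, state names, and the wood/workbench task are my own illustration, not the paper's code:

```python
class SimpleRewardMachine:
    """A simple reward machine: a finite-state machine over abstract
    events. `delta` maps (rm_state, event) to (next_rm_state, reward);
    events with no transition leave the state unchanged with reward 0."""

    def __init__(self, initial_state, delta):
        self.state = initial_state
        self.delta = delta

    def step(self, event):
        next_state, reward = self.delta.get((self.state, event),
                                            (self.state, 0.0))
        self.state = next_state
        return reward

# Illustrative task: "get wood, then reach the workbench". The reward
# on reaching the bench depends on history, so it is non-Markovian in
# the raw MDP state -- but Markovian in (mdp_state, rm.state).
rm = SimpleRewardMachine("u0", {
    ("u0", "got_wood"): ("u1", 0.0),
    ("u1", "at_bench"): ("u_done", 1.0),
})
```

Running the cross-product of the MDP state with `rm.state` is exactly the "larger state space" interpretation: the RM state carries the memory of which subgoals are done.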
However, an agent that had access to the specification of the reward function might be able to use such information to learn optimal policies faster.
Fascinating idea, why not? Why are we hiding the reward from the agent really?
Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
[Icarte, JAIR, 2022] "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning"
Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
[Icarte, PMLR, 2018] "Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning"
Lee et. al. - NeurIPS 2022 "Multi-Game Decision Transformers"
[Neumann, Gros, NeurIPS, 2022] - "SCALING LAWS FOR A MULTI-AGENT REINFORCEMENT LEARNING MODEL"
We study whether sequence modeling can perform policy optimization by evaluating Decision Transformer on offline RL benchmarks
AAAI 2022 Paper : Decentralized Mean Field Games Happy to discuss online.
S. Ganapathi Subramanian, M. Taylor, M. Crowley, and P. Poupart, "Decentralized mean field games," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2022), vol. 36, pp. 9439–9447, February 2022.
A recent overview of RL methods used for autonomous driving.
Discussion on
Bellinger C, Drozdyuk A, Crowley M, Tamblyn I. Balancing Information with Observation Costs in Deep Reinforcement Learning. Proceedings of the Canadian Conference on Artificial Intelligence [Internet]. 2022 May 27; Available from: https://caiac.pubpub.org/pub/0jmy7gpd
Another piece to the "what can we do with eligibility traces" puzzle for Deep RL.
Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.
Hypothesis page to discuss this high level description of DeepMind's new Gato framework.
The paper that introduced the MineRL challenge dataset.
reinforcement
"Reinforcement means to the act of reinforcing."
Palminteri, S. (2021). Choice-confirmation bias and gradual perseveration in human reinforcement learning [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/dpqj6
Chadi, M.-A., & Mousannif, H. (2021). Reinforcement Learning Based Decision Support Tool For Epidemic Control [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/tcr8s
Using chemicals to improve our economy of attention and become emotionally "fitter" is an option that penetrated public consciousness some time ago.
Same is true of reinforcement learning algorithms.
Ozaita, J., Baronchelli, A., & Sánchez, A. (2020). The emergence of segregation: From observable markers to group specific norms. ArXiv:2009.05354 [Physics, q-Bio]. http://arxiv.org/abs/2009.05354
Ludwig, V. U., Brown, K. W., & Brewer, J. A. (2020). Self-Regulation Without Force: Can Awareness Leverage Reward to Drive Behavior Change? Perspectives on Psychological Science, 1745691620931460. https://doi.org/10.1177/1745691620931460
Harvey, A., Armstrong, C. C., Callaway, C. A., Gumport, N. B., & Gasperetti, C. E. (2020). COVID-19 Prevention via the Science of Habit Formation [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/57jyg
Radulescu, A., Holmes, K., & Niv, Y. (2020). On the convergent validity of risk sensitivity measures [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/qdhx4
Hertz, U. (2020). Cognitive learning processes account for asymmetries in adaptations to new social norms [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/7thku
Liu, L., Wang, X., Tang, S., & Zheng, Z. (2020). Complex social contagion induces bistability on multiplex networks. ArXiv:2005.00664 [Physics]. http://arxiv.org/abs/2005.00664
Ting, C., Palminteri, S., Lebreton, M., & Engelmann, J. B. (2020, March 25). The elusive effects of incidental anxiety on reinforcement-learning. https://doi.org/10.31234/osf.io/7d4tc
A survey of deep reinforcement learning
reinforcement-learning code and paper tutorials
We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data
Think of all the hard work and the sweat you put into the things that you're proudest of.
Always feels good to say, "I worked out today!"