327 Matching Annotations
  1. Jul 2023
    1. not sufficient to simply choose a fixed penalty coefficient β and optimize the penalized objective Equation (5) with SGD
    2. objective function (the “surrogate” objective) is maximized

      PPO is a response to the TRPO algorithm, trying to use the core idea but implement a more efficient and simpler algorithm.

      TRPO defines the problem as a straight optimization problem; no learning is actually involved.

    3. Generalizing this choice, we can use a truncated version of generalized advantage estimation
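
      A minimal sketch of what "truncated generalized advantage estimation" computes over a length-T segment, assuming numpy and ignoring episode termination for brevity (variable names are illustrative):

      ```python
      import numpy as np

      def truncated_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
          """Truncated GAE: A_hat_t = sum_l (gamma*lam)^l * delta_{t+l},
          where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
          T = len(rewards)
          values = np.append(values, last_value)   # append V(s_T) for bootstrapping
          advantages = np.zeros(T)
          gae = 0.0
          for t in reversed(range(T)):
              delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
              gae = delta + gamma * lam * gae
              advantages[t] = gae
          return advantages
      ```
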
    4. Without a constraint, maximization of L^CPI would lead to an excessively large policy update; hence, we now consider how to modify the objective, to penalize changes to the policy that move r_t(θ) away from 1

      The policy iteration objective proposes steps which are too large. It uses a likelihood ratio of the current policy with an older version of the policy, multiplied by the Advantage function. So it uses the change in the policy probability for an action to weight the Advantage function.

    5. our goal of a first-order algorithm that emulates the monotonic improvement of TRPO,
    6. A proximal policy optimization (PPO) algorithm that uses fixed-length trajectory segments is shown below. Each iteration, each of N (parallel) actors collect T timesteps of data. Then we construct the surrogate loss on these NT timesteps of data, and optimize it with minibatch SGD
    7. The first term inside the min is L^CPI. The second term, clip(r_t(θ), 1 − ε, 1 + ε) Â_t, modifies the surrogate objective by clipping the probability ratio, which removes the incentive for moving r_t outside of the interval [1 − ε, 1 + ε]. Finally, we take the minimum of the clipped and unclipped objective, so the final objective is a lower bound (i.e., a pessimistic bound) on the unclipped objective

      The "clip" function bounds the probability ratio, so changes that push it outside the interval [1 − ε, 1 + ε] bring no additional benefit to the objective.

    8. Clipped Surrogate Objective
    9. We can see that L^CLIP is a lower bound on L^CPI, with a penalty for having too large of a policy update

      The clipped objective is a lower bound on the unclipped objective used in TRPO. It is simpler to compute, and it will at least provide some guidance, since it never overestimates the true objective.
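
      To make the clipped objective concrete, here is a minimal PyTorch sketch of L^CLIP (names and the value of epsilon are illustrative, not taken from any particular implementation); it is returned negated so it can be minimized with SGD/Adam:

      ```python
      import torch

      def clipped_surrogate_loss(log_probs_new, log_probs_old, advantages, epsilon=0.2):
          """L^CLIP = E[ min( r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t ) ], negated as a loss."""
          ratio = torch.exp(log_probs_new - log_probs_old)                  # r_t(theta)
          unclipped = ratio * advantages                                    # the L^CPI term
          clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
          return -torch.min(unclipped, clipped).mean()                      # pessimistic bound
      ```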

    10. These methods have the stability and reliability of trust-region methods but are much simpler to implement, requiring only few lines of code change to a vanilla policy gradient implementation, applicable in more general settings
    11. shows how several objectives vary as we interpolate along the policy update direction
    12. Surrogate objectives, as we interpolate between the initial policy parameter θ_old, and the updated policy parameter, which we compute after one iteration of PPO.

      Another figure giving intuition for the approach: it shows how each objective term changes as we follow the policy update along the gradient direction.

    13. lower bound (i.e., a pessimistic bound
    14. Paper that introduced the PPO algorithm. PPO is, in a way, a response to the TRPO algorithm, trying to use the core idea but implement a more efficient and simpler algorithm.

      TRPO defines the problem as a straight optimization problem; no learning is actually involved.

    15. The theory justifying TRPO actually suggests using a penalty instead of a constraint, i.e., solving the unconstrained optimization problem \( \max_{\theta} \; \hat{\mathbb{E}}_t \left[ \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)} \hat{A}_t - \beta \, \mathrm{KL}\big[\pi_{\theta_{\text{old}}}(\cdot \mid s_t), \pi_\theta(\cdot \mid s_t)\big] \right] \) (Equation 5) for some coefficient β

      This parameter β is a bit mysterious. PPO works very well generally, but setting β is tricky, and it influences other parts of the algorithm.
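
      The paper's adaptive-KL variant deals with this by adjusting β after every policy update based on the measured KL divergence; a sketch of that update rule (the 1.5 and 2 factors are the heuristic values given in the paper, and kl_target is the user-chosen target d_targ):

      ```python
      def update_kl_coefficient(beta, measured_kl, kl_target):
          """Adaptive KL penalty coefficient update from the PPO paper."""
          if measured_kl < kl_target / 1.5:
              beta = beta / 2.0   # policy barely moved, relax the penalty
          elif measured_kl > kl_target * 1.5:
              beta = beta * 2.0   # policy moved too far, tighten the penalty
          return beta
      ```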

    1. New transitions
    2. bias towards recent transitions
    3. samples transitions with probability p_t relative to the last encountered absolute TD error
    4. RMSprop
    5. This means that in the loss above, the time index t will be a random time index from the last million transitions, rather than the current time.
    6. Multi-step learning.
    7. Prioritized replay.
    8. Prioritized
    9. parameters θ of the online network (which is also used to select actions
    10. ablation
    11. θ̄ represents the parameters of a target network
    12. a periodic copy of the online network which is not directly optimized.
    13. Noisy Nets. The limitations of exploring using ε-greedy policies are clear in games such as Montezuma’s Revenge, where many actions must be executed to collect the first reward
    14. t is a time step randomly picked from the replay memory
    15. DDQN
    16. This famous paper gives a great review of the DQN algorithm a couple years after it changed everything in Deep RL. It compares six different extensions to DQN for Deep Reinforcement Learning, many of which have now become standard additions to DQN and other Deep RL algorithms. It also combines all of them together to produce the "rainbow" algorithm, which outperformed many other models for a while.
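
      Regarding the prioritized replay annotations above: a minimal numpy sketch of proportional prioritization, where transition i is sampled with probability proportional to its last absolute TD error raised to an exponent ω (the full method also corrects the resulting bias with importance-sampling weights, omitted here):

      ```python
      import numpy as np

      def sample_prioritized(td_errors, batch_size, omega=0.5, eps=1e-6):
          """Sample transition indices with P(i) proportional to |TD error_i|^omega."""
          priorities = (np.abs(td_errors) + eps) ** omega
          probs = priorities / priorities.sum()
          indices = np.random.choice(len(priorities), size=batch_size, p=probs)
          return indices, probs[indices]
      ```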

    1. This means that the target values are constrained to change slowly, greatly improving the stability of learning.
    2. A major challenge of learning in continuous action spaces is exploration. An advantage of off-policy algorithms such as DDPG is that we can treat the problem of exploration independently from the learning algorithm.

      Learning and Exploration are handled separately.

    3. but modified for actor-critic and using “soft” target updates, rather than directly copying the weights.
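
      A minimal PyTorch sketch of the "soft" (Polyak-averaged) target update, assuming online and target networks with the same architecture; the small value of tau is what keeps the targets changing slowly:

      ```python
      import torch

      @torch.no_grad()
      def soft_update(target_net, online_net, tau=0.001):
          """theta_target <- tau * theta_online + (1 - tau) * theta_target."""
          for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
              t_param.mul_(1.0 - tau).add_(tau * o_param)
      ```
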
    4. This simple change moves the relatively unstable problem of learning the action-value function closer to the case of supervised learning, a problem for which robust solutions exist.
    5. One approach to this problem is to manually scale the features so they are in similar ranges across environments and units. We address this issue by adapting a recent technique from deep learning called batch normalization
    6. minimize covariance shift
    7. This paper introduces the DDPG algorithm, which builds on the existing DPG algorithm from classic RL theory. The main idea is to define a deterministic (or nearly deterministic) policy for situations where the environment is very sensitive to suboptimal actions and one action setting usually dominates in each state. This showed good performance, but could not beat algorithms such as PPO until the ideas behind SAC were added. SAC adds an entropy term to the objective, which essentially rewards keeping some uncertainty in the policy rather than collapsing to a single action too early. Using this, the deterministic policy gradient approach performs well.

    8. normalizes each dimension across the samples in a minibatch to have unit mean and variance
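
      For intuition, a numpy sketch of the normalization batch norm applies per feature dimension across a minibatch (leaving out the learned scale/shift parameters and the running statistics used at test time):

      ```python
      import numpy as np

      def normalize_minibatch(x, eps=1e-5):
          """Normalize each feature dimension across the minibatch (rows are samples)."""
          mean = x.mean(axis=0)
          var = x.var(axis=0)
          return (x - mean) / np.sqrt(var + eps)
      ```
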
    1. For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans upwards of 20 minutes (24,000 environment actions) of gameplay to accomplish
    2. Bowen Baker et al. (OpenAI), "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos", arXiv, June 2022.

      Introduction of VPT: a new semi-supervised pre-trained model for sequential decision making on Minecraft. Data are from human video playthroughs but are unlabelled.

    3. We extend the internet-scale pretraining paradigm to sequential decision domains through semi-supervised imitation learning wherein agents learn to act by watching online unlabeled videos.
    1. an agent is instructed to obtain a desired goal item

      Problem: Agent must complete the instructed task in Minecraft

    2. curriculum learning (at least using current RL methods) is that the agent achieves a small success probability (within available/reasonable compute) on a new task after mastering a previous task.

      Curriculum Learning

    3. We study curriculum learning on a set of goal-conditioned Minecraft tasks, in which the agent is tasked to collect one out of a set of 107 items from the Minecraft tech tree
    4. Results

      Experiments: They compared a variety of policies and training approaches

    5. It has 5 minutes (1500 time steps) to complete the task and obtains a reward of +1 upon success. After each success or failure a new task is selected without resetting the world or respawning the agent
      • Agent has 5 min to find item
      • Next item chosen without resetting world
    6. “Simon Says”
    7. Learning progress curriculum

      Approach: Curriculum Learning

    1. arXiv paper from 2021 on reinforcement learning in a scenario where your aim is to learn a workable POMDP policy, but you start with a fully observable MDP and adjust it over time towards a POMDP.

    1. Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"

      Response paper to DQN showing that well designed Value Function Approximations can also do well at these complex tasks without the use of Deep Learning

      A great paper showing how to think differently about the latest advances in Deep RL. All is not always what it seems!

    1. Few-Shot (FS) - the model is given a few demonstrations of the task at inference time as conditioning [RWC+19], but no weights are updated

      hints are given but the model is not updated
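
      A toy sketch of what "conditioning" means here: the demonstrations are simply concatenated into the prompt fed to a frozen model, and no gradient updates happen (the format and examples below are made up for illustration):

      ```python
      def build_few_shot_prompt(demonstrations, query):
          """Concatenate K demonstrations and the query; the model's weights stay fixed."""
          lines = [f"Q: {q}\nA: {a}" for q, a in demonstrations]
          lines.append(f"Q: {query}\nA:")
          return "\n\n".join(lines)

      prompt = build_few_shot_prompt(
          demonstrations=[("2 + 2", "4"), ("3 + 5", "8")],  # made-up demonstrations
          query="7 + 6",
      )
      ```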

  2. Jun 2023
    1. fuzzy

      fuzzy!

    2. [KMH+20] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Ra

      Justification for low learning rate in large language models.

    3. As found in [KMH+20, MKAT18], larger models can typically use a larger batch size, but require a smaller learning rate. We measure the gradient noise scale during training and use it to guide our choice of batch size [MKAT18]. Table A.1 shows the parameter settings we used. To train the larger models without running out of memory, we use a mixture of model parallelism within each matrix multiply and model parallelism across the layers of the network. All models were trained on V100 GPU’s on part of a high-bandwidth cluster. Details of the training process and hyperparameter settings are described in the appendix.

      Why is this?

    4. We use the same model and architecture as GPT-2

      What do they mean by "model" here? If they have retrained on more data, with a slightly different architecture, then the model weights after training must be different.

    1. While zero-shot performance establishes a baseline of the potential performance of GPT-2 on many tasks, it is not clear where the ceiling is with finetuning.

      So finetuning could lead to better models.

    2. 13.19%

      that's a lot!

    3. The Bloom filters were constructed such that the false positive rate is upper bounded by 1/10^8. We further verified the low false positive rate by generating 1M strings, of which zero were found by the filter

      Bloom filters used to determine how much overlap there is between train and test set, to be more sure of their results.

    4. Bloom filters

      Bloom Filter:

      The high level idea is to map elements x∈X to values y=h(x)∈Y using a hash function h, and then test for membership of x' in X by checking whether y'=h(x')∈Y, and do that using multiple hash functions h.

      Bloom Filter - Wikipedia
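
      A minimal, illustrative Bloom filter using several hashes derived from hashlib (not the paper's implementation; real versions size the bit array and choose the number of hashes to hit a target false positive rate):

      ```python
      import hashlib

      class BloomFilter:
          def __init__(self, num_bits=1_000_000, num_hashes=7):
              self.num_bits = num_bits
              self.num_hashes = num_hashes
              self.bits = bytearray(num_bits)  # one byte per "bit" for simplicity

          def _positions(self, item):
              for i in range(self.num_hashes):
                  digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
                  yield int(digest, 16) % self.num_bits

          def add(self, item):
              for pos in self._positions(item):
                  self.bits[pos] = 1

          def __contains__(self, item):
              # Can return True for items never added (false positive),
              # but never False for an item that was added.
              return all(self.bits[pos] for pos in self._positions(item))

      bf = BloomFilter()
      bf.add("the quick brown fox")
      assert "the quick brown fox" in bf
      ```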

    5. Recent work in computer vision has shown that common image datasets contain a non-trivial amount of near-duplicate images. For instance CIFAR-10 has 3.3% overlap between train and test images (Barz & Denzler, 2019). This results in an over-reporting of the generalization performance of machine learning systems.

      CIFAR-10 performance results are overestimates since some of the training data is essentially in the test set.

    1. Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"

      A great paper showing how to think differently about the latest advances in Deep RL. All is not always what it seems!

    1. We also add 8,000 text-instructions generated by the OpenAI API gpt-3.5-turbo model [38],

      How does this work? Does it take images as input as well?

    1. Paper from 2016, soon after the DQN paper, about how to use eligibility traces to improve performance further.

    1. LeBlanc, D. G., & Lee, G. (2021). General Deep Reinforcement Learning in NES Games. Canadian AI 2021. Canadian Artificial Intelligence Association (CAIAC). https://doi.org/10.21428/594757db.8472938b

    1. introducing a unified framework that converts all text-based language problems into a text-to-text format

      This is their goal: to have a single model, including hyperparameters and setup, that can be used for any NLP task.

    2. Paper introducing the T5 Text-to-Text transformer model from Google. (Raffel, JMLR, 2020)

  3. Apr 2023
    1. The Annotated S4: Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, and Christopher Ré.

      A new approach to transformers

    1. Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, and Christopher Ré. Department of Computer Science, Stanford University

    1. Bowen Baker et al. (OpenAI), "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos", arXiv, June 2022.

      New supervised pre-trained model for sequential decision making on Minecraft. Data are from human video playthroughs but are unlabelled.

      reinforcement-learning foundation-models pretrained-models proj-minerl minecraft

    1. In this way, an NTM can be thought of as simultaneously exploring all computational possibilities in parallel and selecting an accepting branch

      Non-deterministic Turing Machines are able to get lucky and choose the single path to the answer in polynomial time, or be given a "hint" or "proof" or "certificate" for that path. This isn't realistic, but it separates verifying a solution and finding one into two different tasks, which can have very different difficulty.

    2. Computational problems: Intuitively, a computational problem is just a question that can be solved by an algorithm. For example, "is the natural number n prime?" is a computational problem. A computational problem is mathematically represented as the set of answers to the problem. In the primality example, the problem (call it PRIME) is represented by the set of all natural numbers that are prime: PRIME = { n ∈ ℕ | n is prime }. In the theory of computation, these answers are represented as strings; for example, in the primality example the natural numbers could be represented as strings of bits that represent binary numbers. For this reason, computational problems are often synonymously referred to as languages, since strings of bits represent formal languages (a concept borrowed from linguistics); for example, saying that the PRIME problem is in the complexity class NP is equivalent to saying that the language PRIME is in NP.

      Explanation of why computational complexity class proofs with Turing Machines use "strings" instead of algorithms or programs.

    1. Presburger arithmetic is much weaker than Peano arithmetic, which includes both addition and multiplication operations. Unlike Peano arithmetic, Presburger arithmetic is a decidable theory. This means it is possible to algorithmically determine, for any sentence in the language of Presburger arithmetic, whether that sentence is provable from the axioms of Presburger arithmetic. The asymptotic running-time computational complexity of this algorithm is at least doubly exponential, however, as shown by Fischer & Rabin (1974).

      This is an example of a decision problem that is at least doubly exponential \(2^{2^n}\). It is a simpler form of arithmetic where the Halting problem/incompleteness theorem does not apply.

    1. NP-hard if everything in NP can be transformed in polynomial time into it even though it may not be in NP

      Definition of NP-hard problem

    2. At present, all known algorithms for NP-complete problems require time that is superpolynomial in the input size, in fact exponential in O(n^k) for some k > 0, and it is unknown whether there are any faster algorithms.

      So how hard are NP-complete problems?

    3. The Subgraph Isomorphism problem is NP-complete. The graph isomorphism problem is suspected to be neither in P nor NP-complete, though it is in NP. This is an example of a problem that is thought to be hard, but is not thought to be NP-complete. This class is called NP-Intermediate problems and exists if and only if P≠NP.

      There might even be some problems in NP but not in P and that are not NP-complete.

    4. NP-complete problems are often addressed by using heuristic methods and approximation algorithms.

      usually solved with approximation algorithms

    1. My annotations for the OpenAI GPT-4 info page.

    2. GPT-4 outperforms ChatGPT by scoring in higher approximate percentiles among test-takers.

      oh, great.

    3. 40% more likely to produce factual responses than GPT-3.5

      great, 40% more than what though?

    4. We used GPT-4 to help create training data for model fine-tuning and iterate on classifiers across training, evaluations, and monitoring.

      Interesting, you need to consider, is this like data augmentation, like bootstrapping, like adversarial training, or is it like overfitting to your data?

  4. Mar 2023
    1. tasks for the Minecraft domain.

      They demonstrate the model on a "minecraft-like" domain (introduced earlier by someone else) where there are resources in the world and the agent has tasks.

  5. Feb 2023
    1. Definition 3.2 (simple reward machine).

      The MDP does not change; its dynamics are the same, with or without the RM, just as they are with or without a standard reward model. Additionally, the rewards from the RM can be non-Markovian with respect to the MDP because they inherently have a kind of memory of where you've been, limited to the agent's "movement" (almost "in its mind") about where it is along the goals for this task.

    2. We then show that an RM can be interpreted as specifying a single reward function over a larger state space, and consider types of reward functions that can be expressed using RMs

      So by specifying a reward machine you are augmenting the state space of the MDP with higher level goals/subgoals/concepts that provide structure about what is good and what isn't.

    3. However, an agent that had access to the specification of the reward function might be able to use such information to learn optimal policies faster.

      Fascinating idea, why not? Why are we hiding the reward from the agent really?

    4. U is a finite set of states,

      Apply a set of logical rules to the state space to obtain a finite set of states.

    5. state-reward function,

      reward is a constant number assigned to each set of states
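
      To make the definition concrete, a tiny sketch of a simple reward machine as a finite-state machine over high-level events; the two-step "get coffee, then reach the office" task is illustrative:

      ```python
      class SimpleRewardMachine:
          """States U with transitions delta_u(u, event) -> u' and constant rewards delta_r(u, event)."""

          def __init__(self, initial_state, transitions):
              # transitions: {(u, event): (next_u, reward)}
              self.transitions = transitions
              self.u = initial_state

          def step(self, event):
              """Advance on the high-level event detected in the environment; return the reward."""
              self.u, reward = self.transitions.get((self.u, event), (self.u, 0.0))
              return reward

      # Reward 1 only after seeing 'coffee' and then 'office'; the RM state is the memory.
      rm = SimpleRewardMachine("u0", {
          ("u0", "coffee"): ("u1", 0.0),
          ("u1", "office"): ("u_done", 1.0),
      })
      print(rm.step("coffee"), rm.step("office"))  # 0.0 1.0
      ```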

    6. Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

      [Icarte, JAIR, 2022] "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning"

    1. Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning

      [Icarte, PMLR, 2018] "Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning"

    1. Bell’s theorem is about correlations (joint probabilities) of stochastic real variables and therefore does not apply to quantum theory, which neither describes stochastic motion nor uses real-valued observables

      strong statement, what do people think about this? is it accepted by anyone or dismissed?

  6. Jan 2023
  7. www.cs.princeton.edu
    1. "Finding Optimal Solutions to Rubik's Cube Using Pattern Databases" by Richard E. Korf, AAAI 1997.

      The famous "Korf Algorithm" for finding the optimal solution to any Rubik's Cube state.

    1. make up facts less often

      but not "never"

    2. On prompts submitted by our customers to the API,[1

      really? so that's how they make money.

      Question: what kind of bias does this introduce into the model?

      • which topics and questions get trained on?
      • what is the goal of training? truth? clickability?
    3. Blog post from OpenAI in Jan 2022 explaining some of the approaches they use to train, reduce, and tune their LLM for particular tasks. This was all a precursor to the ChatGPT system we now see.

  8. Dec 2022
  9. Nov 2022
    1. 10K

      Kind of ambiguous to use 10K when one of the most important variables is K.

    2. An embedding for each timestep is learned and added to each token – note this is different than the standard positional embedding used by transformers, as one timestep corresponds to three tokens

      one timestep corresponds to three tokens
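
      A PyTorch sketch of that idea: one learned embedding per environment timestep, shared by the three tokens (return-to-go, state, action) of that timestep (shapes and sizes are illustrative):

      ```python
      import torch
      import torch.nn as nn

      batch, timesteps, d_model = 4, 10, 128
      timestep_embed = nn.Embedding(1000, d_model)    # one row per environment timestep

      t = torch.arange(timesteps).unsqueeze(0).expand(batch, -1)   # (batch, T)
      time_emb = timestep_embed(t)                                 # (batch, T, d_model)

      # Each timestep yields three tokens, and all three get the same timestep embedding.
      time_emb_per_token = time_emb.repeat_interleave(3, dim=1)    # (batch, 3*T, d_model)
      ```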

    1. we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization a

      Using the attention mechanism to determine global dependencies between input and output instead of using recurrent links to past states. This is the essence of their new idea.
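
      For reference, the core operation referred to here, scaled dot-product attention, as a small single-head PyTorch sketch with no masking:

      ```python
      import torch
      import torch.nn.functional as F

      def scaled_dot_product_attention(q, k, v):
          """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
          d_k = q.size(-1)
          scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # pairwise query-key similarities
          weights = F.softmax(scores, dim=-1)             # each position attends over all others
          return weights @ v

      q = k = v = torch.randn(2, 5, 64)                   # (batch, sequence, d_k)
      out = scaled_dot_product_attention(q, k, v)         # (2, 5, 64)
      ```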

    1. "On the Opportunities and Risks of Foundation Models" This is a large report by the Center for Research on Foundation Models at Stanford. They are creating and promoting the use of these models and trying to coin this name for them. They are also simply called large pre-trained models. So take it with a grain of salt, but also it has a lot of information about what they are, why they work so well in some domains and how they are changing the nature of ML research and application.

  10. Sep 2022
    1. We study whether sequence modeling can perform policy optimization by evaluating Decision Transformer on offline RL benchmarks
    1. AAAI 2022 paper: "Decentralized Mean Field Games". Happy to discuss online.

      S. Ganapathi Subramanian, M. Taylor, M. Crowley, and P. Poupart, “Decentralized mean field games,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2022), vol. 36, pp. 9439–9447, February 2022.

  11. Jul 2022
    1. As a baseline model we took the feature representation from a large pre-trained CNN such as ResNet50, by using the model and excluding the final dense layer, and using this in place of our convolution layers. We had predicted that this would likely get us some performance, but would inherently be worse, since we had fixed some of our trainable parameters.

      They didn't try to train the CNN from scratch.
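
      A sketch of this kind of frozen-feature baseline, assuming PyTorch/torchvision (the authors' actual code, preprocessing, and head are not given, so everything here is illustrative):

      ```python
      import torch
      import torch.nn as nn
      from torchvision import models

      backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)  # torchvision >= 0.13 API
      backbone.fc = nn.Identity()           # drop the final dense (classification) layer
      for p in backbone.parameters():
          p.requires_grad = False           # keep the pre-trained features fixed

      head = nn.Linear(2048, 10)            # small trainable head; 10 classes is a placeholder

      x = torch.randn(8, 3, 224, 224)       # dummy batch of images
      with torch.no_grad():
          features = backbone(x)            # (8, 2048) feature vectors
      logits = head(features)
      ```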

  12. Jun 2022
  13. May 2022
    1. Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.

    1. Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.

  14. Mar 2022
    1. Weak supervision also objectively identifies relevant morphological features from the tissue microenvironment without any a priori knowledge or subjective annotation. In three separate analyses, we showed that our models can identify well-known morphological features and accordingly, has the capability of identifying new morphological features of diagnostic, prognostic, and therapeutic relevance.

      Their target images are very large and there is a known (supervised) label for the entire image, but no labels for parts of an image (e.g. where exactly is the tumor?). So the powerful property of their method is the ability to learn, on its own, what parts of the image relate to the label.

  15. Jan 2022
    1. The Canadian experiment has been built, in large part, around the American experiment: They have the melting pot, we have the cultural mosaic; they have the free market, we have sensible regulation; they have “life, liberty and the pursuit of happiness,” we have “peace, order and good government.”

      I agree with this.

    2. Northrop Frye once defined a Canadian as “an American who rejects the Revolution.”

      I see what he means but I wouldn't go this far. Canadians do have a separate cultural identity. It is defined by its lack of definition and certainty, in contrast to American certainty. This is why it is more resilient. It cannot have certainty because our nation was founded on the "two solitudes" of French and English, Catholic and Protestant, and also on the very different, though equally destructive, relationship of the European colonizers with the Indigenous Peoples of Canada.

    3. A flaw lurked right at the core of the experiment, as flaws so often do in works of ambitious genius.

      The flaw was an assumption that everyone had the nation's best interests at heart, that they all wanted the same thing deep down.

    4. Difference is the core of the American experience. Difference is its genius. There has never been a country so comfortable with difference, so full of difference.

      Diversity is Strength. This is really one of their founding principles, even in its hypocrisy. For them the diversity was in religious faith and ways of thinking but did not include gender, ethnicity or anything else. In time this changed and it is the only reason America has done so well.

  16. Jul 2021
    1. Such a map, plus the universal property of A, is in fact enough to reconstruct the entire Turing structure of C.

      The minimal data necessary to reconstruct the Turing structure.

    2. not necessarily extensional, only intensional)

      What's the difference?