- Jan 2024
-
cdn.openai.com
-
GPT-4 System Card. OpenAI, March 23, 2023.
-
- Oct 2023
-
-
Introduces RoBERTa, an improved analysis of and training approach for BERT NLP models.
-
-
arxiv.org
-
(Chen, NeurIPS, 2021) Chen, Lu, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, and Mordatch. "Decision Transformer: Reinforcement Learning via Sequence Modeling". arXiv preprint arXiv:2106.01345v2, June 2021.
Quickly became a very influential paper, with a new idea for learning generative models of action prediction from demonstration trajectories, trained on SARSA-style sequences of states, actions, and rewards. There is no explicit optimization of actions or rewards; instead, the target reward (the return-to-go) is an input.
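As a rough illustration of the "target reward is an input" idea, the sketch below computes the return-to-go for each step of a demonstration trajectory and interleaves it with states and actions into one sequence that a sequence model could be trained on. The function names and the tagged-tuple token format are hypothetical, chosen only for illustration; this is not the paper's code.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Return-to-go at step t: the sum of rewards from step t to the end."""
    rtg: List[float] = []
    running = 0.0
    # Accumulate from the last step backwards, then restore time order.
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def build_sequence(
    states: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
) -> List[Tuple[str, Any]]:
    """Interleave (return-to-go, state, action) per time step.

    Each element is a (kind, value) pair; a real model would embed these
    as tokens. The tagged-tuple format is an assumption for readability.
    """
    seq: List[Tuple[str, Any]] = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("rtg", g), ("state", s), ("action", a)])
    return seq


if __name__ == "__main__":
    for token in build_sequence(["s0", "s1", "s2"], ["a0", "a1", "a2"], [1.0, 0.0, 2.0]):
        print(token)
```

At inference time, the first return-to-go would be set to the desired target return rather than computed from data, and the model would be asked to predict the next action.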
-
-
www.nature.com
-
Wang et al. "Scientific discovery in the age of artificial intelligence", Nature, 2023.
A paper about the current state of using AI/ML for scientific discovery, connected with the AI4Science workshops at major conferences.
(NOTE: Springer/Nature don't allow public PDFs to be linked without a paywall, so we can't use Hypothesis directly on the PDF of the paper. This link is to the website version, which is what we'll use to guide discussion during the reading group.)
-
-
arxiv.org
-
Zecevic, Willig, Singh Dhami and Kersting. "Causal Parrots: Large Language Models May Talk Causality But Are Not Causal". In Transactions on Machine Learning Research, Aug, 2023.
-
-
cdn.openai.com
-
GPT-2 introduction paper
A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. "Language Models are Unsupervised Multitask Learners". 2019.
-
-
arxiv.org
-
"Attention Is All You Need". Foundational paper introducing the Transformer architecture.
-
-
-
GPT-3 introduction paper
-
-
arxiv.org
-
"Are Pre-trained Convolutions Better than Pre-trained Transformers?"
-
-
arxiv.org
-
LaMDA: Language Models for Dialog Applications
Google's introduction of the LaMDA v1 large language model.
-
-
arxiv.org
-
Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training.
Them's fighten' words!
I haven't read it yet, but we're putting it on the list for this fall's reading group. Seriously, a strong result with a very strong implied claim. They are careful to say it's from their empirical results, so it's very worth a look. I suspect the amount of implicit knowledge in the papers, text, and DAG is helping to do this.
The Big Question: is their comparison to RL baselines fair, given that the baselines are presumably trained from scratch? What does a fair comparison of any from-scratch model (RL or supervised) mean when compared to an LLM approach (or any approach using a foundation model), when that model is not really from scratch?
-
-
-
Benyamin Ghojogh and Ali Ghodsi. "Attention Mechanism, Transformers, BERT, and GPT: Tutorial and Survey"
-