- Dec 2022
-
scottaaronson.blog
-
Now, this can all be defeated with enough effort. For example, if you used another AI to paraphrase GPT’s output—well okay, we’re not going to be able to detect that. On the other hand, if you just insert or delete a few words here and there, or rearrange the order of some sentences, the watermarking signal will still be there. Because it depends only on a sum over n-grams, it’s robust against those sorts of interventions.
this mechanism can be defeated by paraphrasing the output with another model
-
Anyway, we actually have a working prototype of the watermarking scheme, built by OpenAI engineer Hendrik Kirchner. It seems to work pretty well—empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT. In principle, you could even take a long text and isolate which parts probably came from GPT and which parts probably didn’t.
Scott's team has already developed a prototype watermarking scheme at OpenAI, and it works pretty well
-
So then to watermark, instead of selecting the next token randomly, the idea will be to select it pseudorandomly, using a cryptographic pseudorandom function, whose key is known only to OpenAI.
Watermarking works by selecting output tokens with a cryptographic pseudorandom function (keyed by OpenAI) instead of true randomness
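The scheme described above can be sketched in a few lines. This is a minimal illustration, not OpenAI's implementation: HMAC-SHA256 stands in for the keyed pseudorandom function, the key and token probabilities are made up, and the selection rule is the standard "exponential/Gumbel trick" (pick the token maximising r**(1/p)), which samples from the model's distribution on average while being deterministic given the key and the preceding n-gram.

```python
import hmac
import hashlib

def prf_unit(key: bytes, context: tuple, token: str) -> float:
    """Keyed PRF mapping (previous n-gram, candidate token) to a
    pseudorandom number in (0, 1). HMAC-SHA256 is a stand-in here."""
    msg = ("|".join(context) + "#" + token).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    x = int.from_bytes(digest[:8], "big")
    return (x + 1) / (2**64 + 1)

def watermarked_choice(key: bytes, context: tuple, probs: dict) -> str:
    """Pick the token maximising r**(1/p): marginally this samples from
    `probs`, but it is deterministic given the key and the n-gram context,
    which is the signal a detector holding the key can later sum up."""
    return max(probs, key=lambda t: prf_unit(key, context, t) ** (1.0 / probs[t]))

key = b"hypothetical-secret-key"     # known only to the model provider
context = ("the", "cat", "sat")      # previous n-gram
probs = {"on": 0.6, "near": 0.3, "under": 0.1}
token = watermarked_choice(key, context, probs)
```

Because detection only needs a sum of these per-n-gram scores over the text, inserting or deleting a few words perturbs only a few terms of the sum, which is why the signal survives light edits but not wholesale paraphrasing.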
-
Eventually GPT will say, “oh, I know what game we’re playing! it’s the ‘give false answers’ game!” And it will then continue playing that game and give you more false answers. What the new paper shows is that, in such cases, one can actually look at the inner layers of the neural net and find where it has an internal representation of what was the true answer, which then gets overridden once you get to the output layer.
this is fascinating - GPT internally represents the true answer to a question, but that representation gets overridden in the later layers when the model plays along with the user's false-answer game
-
(3) A third direction, and I would say maybe the most popular one in AI alignment research right now, is called interpretability. This is also a major direction in mainstream machine learning research, so there’s a big point of intersection there. The idea of interpretability is, why don’t we exploit the fact that we actually have complete access to the code of the AI—or if it’s a neural net, complete access to its parameters? So we can look inside of it. We can do the AI analogue of neuroscience. Except, unlike an fMRI machine, which gives you only an extremely crude snapshot of what a brain is doing, we can see exactly what every neuron in a neural net is doing at every point in time. If we don’t exploit that, then aren’t we trying to make AI safe with our hands tied behind our backs?
Interesting metaphor - interpretability is a bit like fMRI for neural networks, but far more precise, since every neuron is visible at every point in time
-
“AI alignment”
AI alignment is about the Terminator-style existential risk scenario. This is distinct from AI ethics, which is more the concern around current models being racist etc.
-
And famously, self-driving cars have taken a lot longer than many people expected a decade ago. This is partly because of regulatory barriers and public relations: even if a self-driving car actually crashes less than a human does, that’s still not good enough, because when it does crash the circumstances are too weird. So, the AI is actually held to a higher standard. But it’s also partly just that there was a long tail of really weird events. A deer crosses the road, or you have some crazy lighting conditions—such things are really hard to get right, and of course 99% isn’t good enough here.
I think the emphasis is wrong here. The regulation is secondary. The long tail of weird events is the more important thing.
-
Okay, but one thing that’s been found empirically is that you take commonsense questions that are flubbed by GPT-2, let’s say, and you try them on GPT-3, and very often now it gets them right. You take the things that the original GPT-3 flubbed, and you try them on the latest public model, which is sometimes called GPT-3.5 (incorporating an advance called InstructGPT), and again it often gets them right. So it’s extremely risky right now to pin your case against AI on these sorts of examples! Very plausibly, just one more order of magnitude of scale is all it’ll take to kick the ball in, and then you’ll have to move the goal again.
the stochastic parrots argument could be defeated as models get bigger and more complex
-
-
jarche.com
-
If my interpretation of the Retrieval quadrant is correct, it will become much more difficult to be an average, or even above average, writer. Only the best will flourish. Perhaps we will see a rise in neo-generalists.
This is probably true of average or poor software engineers given that GPT-3 can produce pretty reasonable code snippets
-
-
garymarcus.substack.com
-
every country is going to need to reconsider its policies on misinformation. It’s one thing for the occasional lie to slip through; it’s another for us all to swim in a veritable ocean of lies. In time, though it would not be a popular decision, we may have to begin to treat misinformation as we do libel, making it actionable if it is created with sufficient malice and sufficient volume.
What to do then when our government reps are already happy to perpetuate "culture wars" and empty talking points?
-
anyone skilled in the art can now replicate their recipe.
Well, anyone skilled enough who also has $500k for the GPU bill, plus access to the corpus and the means to store it... So corporations, I guess... Yay!
-
-
arxiv.org
-
We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher
By training a smaller language model on more data, the authors achieve better performance than the larger models - which also reduces the cost of fine-tuning and running inference with the model.
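The back-of-envelope arithmetic behind the quote can be sketched as follows. It assumes the common approximation that training cost is about 6·N·D FLOPs for N parameters and D tokens, and the widely quoted Chinchilla rule of thumb that the optimal D is roughly 20 tokens per parameter (both are approximations, not figures taken from this annotation):

```python
def compute_optimal(budget_flops: float, tokens_per_param: float = 20.0):
    """Split a fixed compute budget C ~= 6*N*D between parameters N and
    training tokens D under the constraint D = tokens_per_param * N."""
    n = (budget_flops / (6.0 * tokens_per_param)) ** 0.5
    return n, tokens_per_param * n

# Gopher-scale budget: roughly 280B parameters trained on 300B tokens.
budget = 6 * 280e9 * 300e9
n, d = compute_optimal(budget)
# n lands in the vicinity of 70e9 parameters and d near 1.4e12 tokens,
# i.e. close to the Chinchilla configuration (70B params, ~4x Gopher's data).
```

The point of the exercise: for the same FLOPs, shrinking the model and enlarging the data is predicted to give a better model, which is what the Chinchilla results confirm empirically.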
-
- Nov 2022
-
arxiv.org
-
Extractive summarization may be regarded as a contextual bandit as follows. Each document is a context, and each ordered subset of a document's sentences is a different action
We can represent extractive summarization as a bandit problem by treating the document as the context and each ordered subset of its sentences as an action the agent could take
-
bandit is a decision-making formalization in which an agent repeatedly chooses one of several actions, and receives a reward based on this choice.
Definition of a bandit: an agent that repeatedly chooses one of several actions and receives a reward based on this choice.
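The contextual-bandit framing above can be sketched as one interaction round. This is a toy illustration under my own assumptions, not the paper's algorithm: the "policy" is just random choice, and a unigram-overlap score stands in for the ROUGE-style reward a real system would use.

```python
import itertools
import random

def bandit_round(document, reward_fn, k=2, policy=None):
    """One round of the contextual-bandit view of extractive summarization:
    the document is the context, each ordered k-subset of its sentences is
    an action, and the environment returns a scalar reward for the choice."""
    actions = list(itertools.permutations(range(len(document)), k))
    idx = policy(document, actions) if policy else random.choice(actions)
    summary = [document[i] for i in idx]
    return idx, summary, reward_fn(summary)

def overlap_reward(summary, reference="cats sleep all day"):
    """Toy reward: fraction of reference unigrams covered by the summary
    (a crude stand-in for ROUGE)."""
    words = set(" ".join(summary).lower().split())
    ref = set(reference.split())
    return len(words & ref) / len(ref)

doc = ["Cats sleep all day.", "Dogs bark at night.", "Birds sing at dawn."]
action, summary, reward = bandit_round(doc, overlap_reward, k=1)
```

A learned policy would replace `random.choice` with something that scores actions from the document encoding, updating on the observed reward.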
-
-
aclanthology.org
-
BanditSum a hierarchical bi-LSTM
BanditSum uses hierarchical bi-directional LSTM encoding to generate sentence-level representations
-
-
aclanthology.org
-
Misleading Templates There is no consistent relation between the performance of models trained with templates that are moderately misleading (e.g. {premise} Can that be paraphrased as "{hypothesis}"?) vs. templates that are extremely misleading (e.g., {premise} Is this a sports news? {hypothesis}). T0 (both 3B and 11B) perform better given misleading-moderate (Figure 3), ALBERT and T5 3B perform better given misleading-extreme (Appendices E and G.4), whereas T5 11B and GPT-3 perform comparably on both sets (Figure 2; also see Table 2 for a summary of statistical significances.) Despite a lack of pattern between
Their misleading templates really are misleading
{premise} Can that be paraphrased as "{hypothesis}"
{premise} Is this a sports news? {hypothesis}
-
Insum, notwithstanding prompt-based models’impressive improvement, we find evidence ofserious limitations that question the degree towhich such improvement is derived from mod-els understanding task instructions in waysanalogous to humans’ use of task instructions.
although prompts seem to help NLP models improve their performance, the authors find that the improvement persists even when the prompts are deliberately misleading, which is a bit weird - it suggests the models aren't understanding the instructions the way humans would
-
Suppose a human is given two sentences: “No weapons of mass destruction found in Iraq yet.” and “Weapons of mass destruction found in Iraq.” They are then asked to respond 0 or 1 and receive a reward if they are correct. In this setup, they would likely need a large number of trials and errors before figuring out what they are really being rewarded to do. This setup is akin to the pretrain-and-fine-tune setup which has dominated NLP in recent years, in which models are asked to classify a sentence representation (e.g., a CLS token) into some
This is a really excellent illustration of the difference in paradigm between "normal" text model fine tuning and prompt-based modelling
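The contrast can be made concrete as two input formats for the same NLI pair. This is just an illustration - the bracketed special tokens follow the common BERT convention and the prompt wording is my own, not a template from the paper:

```python
premise = "No weapons of mass destruction found in Iraq yet."
hypothesis = "Weapons of mass destruction found in Iraq."

# Pretrain-and-fine-tune: the model sees an opaque pair of sentences and
# must learn from the reward signal alone that the target is a 0/1
# entailment label - like the human guessing game described in the quote.
finetune_input = f"[CLS] {premise} [SEP] {hypothesis} [SEP]"

# Prompt-based: the task itself is stated in natural language, so a human
# (or a sufficiently capable LM) can answer correctly on the first try.
prompt_input = (
    f'{premise} Question: does this mean that '
    f'"{hypothesis}"? Yes or no?'
)
```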
-
-
aclanthology.org
-
Antibiotic resistance has become a growing worldwide concern as new resistance mechanisms are emerging and spreading globally, and thus detecting and collecting the cause – Antibiotic Resistance Genes (ARGs), have been more critical than ever. In this work, we aim to automate the curation of ARGs by extracting ARG-related assertive statements from scientific papers. To support the research towards this direction, we build SCIARG, a new benchmark dataset containing 2,000 manually annotated statements as the evaluation set and 12,516 silver-standard training statements that are automatically created from scientific papers by a set of rules. To set up the baseline performance on SCIARG, we exploit three state-of-the-art neural architectures based on pre-trained language models and prompt tuning, and further ensemble them to attain the highest 77.0% F-score. To the best of our knowledge, we are the first to leverage natural language processing techniques to curate all validated ARGs from scientific papers. Both the code and data are publicly available at https://github.com/VT-NLP/SciARG.
The authors use prompt tuning on pre-trained language models to build a classifier that identifies statements in scientific papers describing whether or not micro-organisms carry antibiotic-resistance genes.
-
-
arxiv.org
-
Our annotators achieve the highest precision with OntoNotes, suggesting that most of the entities identified by crowdworkers are correct for this dataset.
interesting that the mention detection algorithm gives poor precision on OntoNotes while the annotators get high precision. Does this imply that there are a lot of invalid mentions in this data, and that the OntoNotes guidelines are right to leave generic pronouns unmarked?
-
an algorithm with high precision on LitBank or OntoNotes would miss a huge percentage of relevant mentions and entities on other datasets (constraining our analysis)
these datasets have the most limited/constrained definitions of coreference and of what should be marked up, so it makes sense that an algorithm tuned to them misses many mentions in the other datasets
-
Procedure: We first launch an annotation tutorial (paid $4.50) and recruit the annotators on the AMT platform. At the end of the tutorial, each annotator is asked to annotate a short passage (around 150 words). Only annotators with a B3 score (Bagga
Annotators are asked to complete a quality control exercise and only annotators who achieve a B3 score of 0.9 or higher are invited to do more annotation
-
Annotation structure: Two annotation approaches are prominent in the literature: (1) a local pairwise approach, annotators are shown a pair of mentions and asked whether they refer to the same entity (Hladká et al., 2009; Chamberlain et al., 2016a; Li et al., 2020; Ravenscroft et al., 2021), which is time-consuming; or (2) a cluster-based approach (Reiter, 2018; Oberle, 2018; Bornstein et al., 2020), in which annotators group all mentions of the same entity into a single cluster. In ezCoref we use the latter approach, which can be faster but requires the UI to support more complex actions for creating and editing cluster structures.
ezCoref presents clusters of coreferences all at the same time - this is a nice efficient way to do annotation versus pairwise annotation (like we did for CD^2CR)
-
However, these datasets vary widely in their definitions of coreference (expressed via annotation guidelines), resulting in inconsistent annotations both within and across domains and languages. For instance, as shown in Figure 1, while ARRAU (Uryupina et al., 2019) treats generic pronouns as non-referring, OntoNotes chooses not to mark them at all
One of the big issues is that different co-reference datasets have significant differences in annotation guidelines even within the coreference family of tasks - I found this quite shocking as one might expect coreference to be fairly well defined as a task.
-
Specifically, our work investigates the quality of crowdsourced coreference annotations when annotators are taught only simple coreference cases that are treated uniformly across existing datasets (e.g., pronouns). By providing only these simple cases, we are able to teach the annotators the concept of coreference, while allowing them to freely interpret cases treated differently across the existing datasets. This setup allows us to identify cases where our annotators disagree among each other, but more importantly cases where they unanimously agree with each other but disagree with the expert, thus suggesting cases that should be revisited by the research community when curating future unified annotation guidelines
The aim of the work is to examine a simplified subset of co-reference phenomena which are generally treated the same across different existing datasets.
This makes spotting inter-annotator disagreement easier - presumably because for simpler cases there are fewer modes of failure?
-
In this work, we develop a crowdsourcing-friendly coreference annotation methodology, ezCoref, consisting of an annotation tool and an interactive tutorial. We use ezCoref to re-annotate 240 passages from seven existing English coreference datasets (spanning fiction, news, and multiple other domains) while teaching annotators only cases that are treated similarly across these datasets
this paper describes a new efficient coreference annotation tool which simplifies co-reference annotation. They use their tool to re-annotate passages from widely used coreference datasets.
-
-
www.researchgate.net
-
In recent years, neural network based topic models have been proposed for many NLP tasks, such as information retrieval [11], aspect extraction [12] and sentiment classification [13]. The basic idea is to construct a neural network which aims to approximate the topic-word distribution in probabilistic topic models. Additional constraints, such as incorporating prior distribution [14], enforcing diversity among topics [15] or encouraging topic sparsity [16], have been explored for neural topic model learning and proved effective.
Neural topic models are often trained to mimic the behaviours of probabilistic topic models - I should come back and look at some of the works:
- R. Das, M. Zaheer, and C. Dyer, “Gaussian LDA for topic models with word embeddings,”
- P. Xie, J. Zhu, and E. P. Xing, “Diversity-promoting bayesian learning of latent variable models,”
- M. Peng, Q. Xie, H. Wang, Y. Zhang, X. Zhang, J. Huang, and G. Tian, “Neural sparse topical coding,”
-
We argue that mutual learning would benefit sentiment classification since it enriches the information required for the training of the sentiment classifier (e.g., when the word “incredible” is used to describe “acting” or “movie”, the polarity should be positive)
By training a topic model that has "similar" weights to the word vector model, the sentiment task can also be improved (as per the example: "incredible" should be positive when used to describe "acting" or "movie" in this context)
-
However, such a framework is not applicable here since the learned latent topic representations in topic models can not be shared directly with word or sentence representations learned in classifiers, due to their different inherent meanings
Latent word vectors and topic models learn different and entirely unrelated representations
-