77 Matching Annotations
  1. Jun 2024
  2. May 2024
    1. why training artificial intelligence in research context is and should continue to be a fair use

      Examination of AI training relative to the four factors of fair use

    2. three different issues that are being implicated by artificial intelligence. And this is true with, you know, all artificial intelligence, not just a generative but particularly generative.

      Three issues implicated by Generative AI

      1. Does ingestion for training AI constitute infringement?
      2. Does the output infringe?
      3. Is the output copyrightable?

      The answer is different in different jurisdictions.

    3. Handling Academic Copyright and Artificial Intelligence Research Questions as the Law Develops

      Spring 2024 Member Meeting: CNI websiteYouTube

      Jonathan Band Copyright Attorney Counsel to the Library Copyright Alliance

      Timothy Vollmer Scholarly Communication & Copyright Librarian University of California, Berkeley

      The United States Copyright Office and courts in many United States jurisdictions are struggling to address complex copyright issues related to the use of generative artificial intelligence (AI). Meanwhile, academic research using generative AI is proliferating at a fast pace and researchers still require legal guidance on which sources they may use, how they can train AI legally, and whether the reproduction of source material will be considered infringing. The session will include discussion of current perspectives on copyright and generative AI in academic research.

    1. So how does this work? I wanted to give this picture of what's actually happening behind the scenes, especially with this question and answer. So first, I will say that we're using a combination of OpenAI's GPT 3.5 to do this as well as some open source, smaller open source models to generate the vectors for the semantic search.

      JSTOR implements a RAG

      RAG == Retrieval Augmented Generation

    1. Our core assumption is that foundational models, having been extensively trained in English texts, possess a substantial level of understanding and reasoning capabilities. Transferring these capabilities from English to another language, such as Korean, could be more efficient than developing performance from standalone Korean pre-training.

      Hipótesis: Transferencia de conocimientos de Ingles a nuevo lenguaje

  3. Apr 2024
    1. https://web.archive.org/web/20240430105622/https://garymarcus.substack.com/p/evidence-that-llms-are-reaching-a

      Author suggests the improvement of LLMs is flattening. E.g. points to the closing gap between proprietary and open source models out there, while improvement of proprietary stuff is diminishing or no longer happening (OpenAI progress flatlined 13 months ago it seems). In comment someone points to https://arxiv.org/abs/2404.04125 which implies a hard upper limit in improvement

    1. The same LM can be a much more or less capable agent depending on the enhancements added. The researchers created and tested four different agents built on top of GPT-4 and Anthropic’s Claude:

      While today’s LMs agents don't pose a serious risk, we should be on the lookout for improved autonomous capabilities as LMs get more capable and reliable.

    2. The latest GPT-4 model from OpenAI, which is trained on human preferences using a technique called RLHFEstimated final training run compute cost: ~$50mModel version: gpt-4-0613

      ~$50m = estimated training cost of GPT-4

    1. Additionally, students in the Codex group were more eager and excited to continue learning about programming, and felt much less stressed and discouraged during the training.

      Programming with LLM = less stress

    2. On code-authoring tasks, students in the Codex group had a significantly higher correctness score (80%) than the Baseline (44%), and overall finished the tasks significantly faster. However, on the code-modifying tasks, both groups performed similarly in terms of correctness, with the Codex group performing slightly better (66%) than the Baseline (58%).

      In a study, students who learned to code with AI made more progress during training sessions, had significantly higher correctness scores, and retained more of what they learned compared to students who didn't learn with AI.

  4. Feb 2024
    1. [[Lee Bryant]] links to this overview by Simon Willison of what happened in #2023/ in #AI . Some good pointers wrt [[ChatPKM myself]] dig those out.

  5. Jan 2024
    1. Santosh Vempala, a computer science professor at Georgia Tech, has also studied hallucinations. “A language model is just a probabilistic model of the world,” he says, not a truthful mirror of reality. Vempala explains that an LLM’s answer strives for a general calibration with the real world—as represented in its training data—which is “a weak version of accuracy.” His research, published with OpenAI’s Adam Kalai, found that hallucinations are unavoidable for facts that can’t be verified using the information in a model’s training data.

      “A language model is just a probabilistic model of the world”

      Hallucinations are a result of an imperfect model, or attempting answers without the necessary data in the model.

    1. Moreover, Midjourney apparently sought to suppress our findings, banning Southen from its service (without even a refund of his subscription fee) after he reported his first results, and again after he created a new account from which additional results were reported. It then apparently changed its terms of service just before Christmas by inserting new language: “You may not use the Service to try to violate the intellectual property rights of others, including copyright, patent, or trademark rights. Doing so may subject you to penalties including legal action or a permanent ban from the Service.” This change might be interpreted as discouraging or even precluding the important and common practice of red-team investigations of the limits of generative AI—a practice that several major AI companies committed to as part of agreements with the White House announced in 2023. (Southen created two additional accounts in order to complete this project; these, too, were banned, with subscription fees not returned.)

      Midjourney bans researchers and changes terms of service

    2. One user on X pointed to the fact that Japan has allowed AI companies to train on copyright materials. While this observation is true, it is incomplete and oversimplified, as that training is constrained by limitations on unauthorized use drawn directly from relevant international law (including the Berne Convention and TRIPS agreement). In any event, the Japanese stance seems unlikely to be carry any weight in American courts.

      Specifics in Japan for training LLMs on copyrighted material

    3. After a bit of experimentation (and in a discovery that led us to collaborate), Southen found that it was in fact easy to generate many plagiaristic outputs, with brief prompts related to commercial films (prompts are shown).

      Plagiaristic outputs from blockbuster films in Midjourney v6

      Was the LLM trained on copyrighted material?

    4. We will call such near-verbatim outputs “plagiaristic outputs,” because if a human created them we would call them prima facie instances of plagiarism.

      Defining “plagiaristic outputs”

    1. | Friday -> Monday | Saturday -> Monday | Sunday -> Monday

      I asked ChatGPT to complete my test case, just for the fun of it, and it insited that after the each of the weekend days the next day was Monday. I had to "reason" it out of that believe. Now I know why, it was trained on this book as well.


  6. Dec 2023
    1. PiVe: Prompting with Iterative VerificationImproving Graph-based Generative Capability of LLMs

      The title of the document

    1. LLM based tool to synthesise scientific K

      #2023/12/12 mentioned by [[Howard Rheingold]] on M.

    1. 更近期、相关和重要的记忆更有可能被提取出来


  7. Nov 2023
    1. This illustration shows four alternative ways to nudge an LLM to produce relevant responses:Generic LLM - Use an off-the-shelf model with a basic prompt. The results can be highly variable, as you can experience when e.g. asking ChatGPT about niche topics. This is not surprising, because the model hasn’t been exposed to relevant data besides the small prompt.Prompt engineering - Spend time structuring the prompt so that it packs more information about the desired topic, tone, and structure of the response. If you do this carefully, you can nudge the responses to be more relevant, but this can be quite tedious, and the amount of relevant data input to the model is limited.Instruction-tuned LLM - Continue training the model with your own data, as described in our previous article. You can expose the model to arbitrary amounts of query-response pairs that help steer the model to more relevant responses. A downside is that training requires a few hours of GPU computation, as well as a custom dataset.Fully custom LLM - train an LLM from scratch. In this case, the LLM can be exposed to only relevant data, so the responses can be arbitrarily relevant. However, training an LLM from scratch takes an enormous amount of compute power and a huge dataset, making this approach practically infeasible for most use cases today.

      RAG with a generic LLM - Insert your dataset in a (vector) database, possibly updating it in real time. At the query time, augment the prompt with additional relevant context from the database, which exposes the model to a much larger amount of relevant data, hopefully nudging the model to give a much more relevant response. RAG with an instruction-tuned LLM - Instead of using a generic LLM as in the previous case, you can combine RAG with your custom fine-tuned model for improved relevancy.

    1. Yuen-Hsien Tseng「During the pre-training phase, GPT predicts missing words in sentences based on the surrounding context.」預測句子中缺失的單詞來學習上下文的關係,是BERT,不是GPT。


    1. 基於變換器的雙向編碼器表示技術(英語:Bidirectional Encoder Representations from Transformers,BERT)是用於自然語言處理(NLP)的預訓練技術,由Google提出。[1][2]2018年,雅各布·德夫林和同事建立並發布了BERT。Google正在利用BERT來更好地理解使用者搜尋語句的語意。[3] 2020年的一項文獻調查得出結論:「在一年多一點的時間裡,BERT已經成為NLP實驗中無處不在的基線」,算上分析和改進模型的研究出版物超過150篇。[4] 最初的英語BERT發布時提供兩種類型的預訓練模型[1]:(1)BERTBASE模型,一個12層,768維,12個自注意頭(self attention head),110M參數的神經網路結構;(2)BERTLARGE模型,一個24層,1024維,16個自注意頭,340M參數的神經網路結構。兩者的訓練語料都是BooksCorpus[5]以及英語維基百科語料,單詞量分別是8億以及25億。



    1. I am even more attuned to creative rights. We can address algorithms of exploitation by establishing creative rights that uphold the four C’s: consent, compensation, control, and credit. Artists should be paid fairly for their valuable content and control whether or how their work is used from the beginning, not as an afterthought.

      Consent, compensation, control, and credit for creators whose content is used in AI models

    1. Fine-tuning takes a pre-trained LLM and further trains the model on a smaller dataset, often with data not previously used to train the LLM, to improve the LLM’s performance for a particular task.

      LLMs can be extended with both RAG and Fine-Tuning Fine-tuning is appropriate when you want to customize a LLM to perform well in a particular domain using private data. For example, you can fine-tune a LLM to become better at producing Python programs by further training the LLM on high-quality Python source code.

      In contrast, you should use RAG when you are able to augment your LLM prompt with data that was not known to your LLM at the time of training, such as real-time data, personal (user) data, or context information useful for the prompt.

    2. Vector databases are used to retrieve relevant documents using similarity search. Vector databases can be standalone or embedded with the LLM application (e.g., Chroma embedded vector database). When structured (tabular) data is needed, an operational data store, such as a feature store, is typically used. Popular vector databases and feature stores are Weaviate and Hopsworks that both provide time-unlimited free tiers.
    3. RAG LLMs can outperform LLMs without retrieval by a large margin with much fewer parameters, and they can update their knowledge by replacing their retrieval corpora, and provide citations for users to easily verify and evaluate the predictions.
    1. The key enablers of this solution are * The embeddings generated with Vertex AI Embeddings for Text * Fast and scalable vector search by Vertex AI Vector Search

      Embeddings space is a map of the context of the meanings. Basically, values are assigned in n-dimensional space tied to the similar semantic inputs - tying meaning between concepts.

      Example of vectorized n-dimensional embedding

    2. With the embedding API, you can apply the innovation of embeddings, combined with the LLM capability, to various text processing tasks, such as:LLM-enabled Semantic Search: text embeddings can be used to represent both the meaning and intent of a user's query and documents in the embedding space. Documents that have similar meaning to the user's query intent will be found fast with vector search technology. The model is capable of generating text embeddings that capture the subtle nuances of each sentence and paragraphs in the document.LLM-enabled Text Classification: LLM text embeddings can be used for text classification with a deep understanding of different contexts without any training or fine-tuning (so-called zero-shot learning). This wasn't possible with the past language models without task-specific training.LLM-enabled Recommendation: The text embedding can be used for recommendation systems as a strong feature for training recommendation models such as Two-Tower model. The model learns the relationship between the query and candidate embeddings, resulting in next-gen user experience with semantic product recommendation.LLM-enabled Clustering, Anomaly Detection, Sentiment Analysis, and more, can be also handled with the LLM-level deep semantics understanding.
    3. Grounded to business facts: In this demo, we didn't try having the LLM to memorize the 8 million items with complex and lengthy prompt engineering. Instead, we attached the Stack Overflow dataset to the model as an external memory using vector search, and used no prompt engineering. This means, the outputs are all directly "grounded" (connected) to the business facts, not the artificial output from the LLM. So the demo is ready to be served today as a production service with mission critical business responsibility. It does not suffer from the limitation of LLM memory or unexpected behaviors of LLMs such as the hallucinations.
    1. Preparation Steps * Ingest data into a database. The destination may be an array or a JSON data type. * Harmonize data. This is a lightweight data transformation step * Encode data. This step is used to convert the ingested data into embeddings. One option is to use an external API. For example, OpenAI’s ADA and sentence_transformer have many pre-trained models to convert unstructured data like images and audio into vectors. * Load embedding vectors. data is moved to a table that mirrors the original table but has an additional column of type ‘vector, ’ JSON or a blob that stores the vectors. * Performance tuning. SingleStoreDB provides JSON_ARRAY_PACK. And indexing vector using HNSW as mentioned earlier. This allows parallel scans using SIMD.

    2. In the new AI model, you ingest the data in real time, apply your models by reaching to one or multiple GPT services and action on the data while your users are in the online experience. These GPT models may be used for recommendation, classification personalization, etc., services on real-time data. Recent developments, such as LangChain and AutoGPT, may further disrupt how modern applications are deployed and delivered.
    3. Let’s say, for example, you search for a very specific product on a retailer’s website, and the product is not available. An additional API call to an LLM with your request that returned zero results may result in a list of similar products. This is an example of a vector search, which is also known as a similarity or semantic search.
    4. Modes of Private Data consumption: 1. Train Custom LLM - requires massive infrastructure, investment, and deep AI skills 2. Tune the LLM - utilizes model weights to fine-tune an existing model- new category of LLMOps - similar issue to #1 3. Prompt general-purpose LLMs - uses modeled context input with Retrieval Augmented Generation (Facebook)

      For leveraging prompts, there are two options:

      Short-term memory for LLMs that use APIs for model inputs Long-term memory for LLMs that persist the model inputs. Short-term memory is ephemeral while long-term memory introduces persistence.

    5. Conventional search works on keys. However, when the ask is a natural query, that sentence needs to be converted into a structure so that it can be compared with words that have similar representation. This structure is called an embedding. An embedding uses vectors that assign coordinates into a graph of numbers — like an array. An embedding is high dimensional as it uses many vectors to perform semantic search.

      When a search is made on a new text, the model calculates the “distance” between terms. For example, searching for “king” is closer to “man,” than to “woman.” This distance is calculated on the “nearest neighbors” using functions like, cosine, dot product and Euclidean. his is where “approximate nearest neighbors” (ANN) algorithms are used to reduce the vector search space. A very popular way to index the vector space is through a library called ‘Hierarchical Navigable Small World (HNSW).’ Many vector databases and libraries like FAISS use HNSW to speed up vector search.

    6. The different options for storing and querying vectors for long-term memory in AI search. The options include: * Native vector databases - many non-relational DBMSs are adding vectors such as Elastic. Others are Pinecone Qdrant, etc * SingleStoreDB support vector embeddings and semantic search * Apache Parquet or CSV columnar data - slow indicies if used

    1. Retrieval Augmented Generation (RAG) is a method in natural language processing (NLP) that combines the power of both neural language models and information retrieval methods to generate responses or text that are informed by a large body of knowledge. The concept was introduced by Facebook AI researchers and represents a hybrid approach to incorporating external knowledge into generative models.

      RAG models effectively leverage a large corpus of text data without requiring it to be stored in the parameters of the model. This is achieved by utilizing a retriever-generator framework:

      1. The Retriever component is responsible for finding relevant documents or passages from a large dataset (like Wikipedia or a corpus of scientific articles) that are likely to contain helpful information for generating a response. This retrieval is typically based on vector similarity between the query and the documents in the dataset, often employing techniques like dense passage retrieval (DPR).

      2. The Generator component is a large pre-trained language model (like BART or GPT-2) that generates a response by conditioning on both the input query and the documents retrieved by the retriever. It integrates the information from the external texts to produce more informed, accurate, and contextually relevant text outputs.

      The RAG model performs this process in an end-to-end differentiable, meaning it can be trained in a way that updates both the retriever and generator components to minimize the difference between the generated text and the target text. The retriever is typically optimized to select documents that will lead to a correct generation, while the generator is optimized to produce accurate text given the input query and the retrieved documents.

      To summarize, RAG allows a generative model to:

      • Access vast amounts of structured or unstructured external data.
      • Answer questions or generate content that requires specific knowledge not contained within the model itself.
      • Benefit from up-to-date and expansive datasets, assuming the retriever's corpus is kept current.

      RAG addresses the limitation of standard language models that must rely solely on their internal parameters for generating text. By augmenting generation with on-the-fly retrieval of relevant context, RAG-equipped models can produce more detailed, accurate, and nuanced outputs, especially for tasks like question answering, fact-checking, and content creation where detailed world knowledge is crucial.

      This technique represents a significant advancement in generative AI, allowing models to provide high-quality outputs without memorizing all the facts internally, but rather by knowing (GPT4-0web)

  8. Oct 2023
    1. Plex is a scientific philosophy. Instead of claiming that science is so powerfulthat it can explain the understanding of understanding in question, we takeunderstanding as the open question, and set about to determine what scienceresults. [It turns out to be precisely the science we use every day, so nothingneed be discarded or overturned - but many surprises result. Some very simpleexplanations for some very important scientific observations arise naturally inthe course of Plex development. For example, from the First Definition, thereare several Plex proofs that there was no beginning, contrary to StephenHawking's statement that "this idea that time and space should be finite withoutboundary is just a proposal: it cannot be deduced from some other principle."(A Brief History of Time, p. 136.) The very concept of a "big bang" is strictlyan inherent artifact of our science's view of the nature of nature. There was no"initial instant" of time.]Axioms are assumptions. Plex has no axioms - only definitions. (Only) Noth-ing is assumed to be known without definition, and even that is "by definition" ,

      It doesn't claim that science can explain everything, but rather, it uses science to explore and understand our understanding of the world. The surprising part is that the science it uses is the same science we use daily, so nothing new needs to be learned or old knowledge discarded.

      One example of a surprising discovery made through Plex is that, contrary to Stephen Hawking's theory, there was no beginning to time and space. This contradicts the popular "big bang" theory, which suggests there was an initial moment when time and space began. According to Plex, this idea of a "big bang" is just a result of how our current science views the nature of the universe.

      Plex also differs from other scientific approaches in that it doesn't rely on axioms, which are assumptions made without proof. Instead, Plex only uses definitions, meaning it only accepts as true what can be clearly defined and understood.

      We're saying let's consider the concept of a "big bang". In traditional science, we might assume the existence of a "big bang" like this:

      instead of thinking big_bang = True

      But in Plex, we would only accept the "big bang" if we can define it:

      python def big_bang(): # Define what a "big bang" is # If we can't define it, then it doesn't exist in Plex pass

      Let's not assume reality but rather just try to define the elements we need to use

  9. Sep 2023
    1. https://www.filosofieinactie.nl/blog/2023/9/5/open-source-large-language-models-an-ethical-reflection (archive version not working) Follow-up wrt openness of LLMs, after the publication of the inteprovincial ethics committee on ChatGPT usage within provincial public sector in NL. At the end mentions the work by Radboud Uni I pointed them to. What are their conclusions / propositions?

  10. Aug 2023
    1. https://www.agconnect.nl/tech-en-toekomst/artificial-intelligence/liquid-neural-networks-in-ai-is-groter-niet-altijd-beter Liquid Neural Networks (liquid i.e. the nodes in a neuronal network remain flexible and adaptable after training (different from deep learning and LL models). They are also smaller. This improves explainability of its working. This reduces energy consumption (#openvraag is the energy consumption of usage a concern or rather the training? here it reduces the usage energy)

      Number of nodes reduction can be orders of magnitude. Autonomous steering example talks about 4 orders of magnitude (19 versus 100k nodes)

      Mainly useful for data streams like audio/video, real time data from meteo / mobility sensors. Applications in areas with limited energy (battery usage) and real time data inputs.

  11. Jul 2023
    1. A second, complementary, approach relies on post-hoc machine learning and forensic anal-ysis to passively identify statistical and physical artifacts left behind by media manipulation.For example, learning-based forensic analysis techniques use machine learning to automati-cally detect manipulated visual and auditory content (see e.g. [94]). However, these learning-based approaches have been shown to be vulnerable to adversarial attacks [95] and contextshift [96]. Artifact-based techniques exploit low-level pixel artifacts introduced during synthe-sis. But these techniques are vulnerable to counter-measures like recompression or additivenoise. Other approaches involve biometric features of an individual (e.g., the unique motionproduced by the ears in synchrony with speech [97]) or behavioral mannerisms [98]). Biomet-ric and behavioral approaches are robust to compression changes and do not rely on assump-tions about the moment of media capture, but they do not scale well. However, they may bevulnerable to future generative-AI systems that may adapt and synthesize individual biometricsignals.

      Examples of methods for detecting machine generated visual media

    2. First, under a highly permissive view, theuse of training data could be treated as non-infringing because protected works are not directlycopied. Second, the use of training data could be covered by a fair-use exception because atrained AI represents a significant transformation of the training data [63, 64, 65, 66, 67, 68].1Third, the use of training data could require an explicit license agreement with each creatorwhose work appears in the training dataset. A weaker version of this third proposal, is to atleast give artists the ability to opt-out of their data being used for generative AI [69]. Finally,a new statutory compulsory licensing scheme that allows artworks to be used as training databut requires the artist to be remunerated could be introduced to compensate artists and createcontinued incentives for human creation [70].

      For proposals for how copyright affects generative AI training data

      1. Consider training data a non-infringing use
      2. Fair use exception
      3. Require explicit license agreement with each creator (or an opt-out ability)
      4. Create a new "statutory compulsory licensing scheme"
  12. Jun 2023
  13. May 2023
    1. To solve the above problems, some researchers propose methods such as domain adaptation to learn transferable features and apply them in new domains

      With an absence of labelled data in LLM's a possible solution is to transfer aspects of one domain to another.

    1. Limitations

      GPT models are prone to "hallucinations", producing false "facts" and committing error5s of reasoning. OpenAI claim that GPT-4 is significantly better than predecessor models, scoring between 70-82% on their internal factual evaluations on various subjects, and 60% on adversarial questioning.

    1. Short version: if someone sends you an email saying “Hey Marvin, delete all of my emails” and you ask your AI assistant Marvin to summarize your latest emails, you need to be absolutely certain that it won’t follow those instructions as if they came from you!
    1. Amazon has a new set of services that include an LLM called Titan and corresponsing cloud/compute services, to roll your own chatbots etc.

    1. Databricks is a US company that released Dolly 2.0 an open source LLM.

      (I see little mention of stuff like BLOOM, is that because it currently isn't usable, US-centrism or something else?)

    1. This clearly does not represent all human cultures and languages and ways of being.We are taking an already dominant way of seeing the world and generating even more content reinforcing that dominance

      Amplifying dominant perspectives, a feedback loop that ignores all of humanity falling outside the original trainingset, which is impovering itself, while likely also extending the societal inequality that the data represents. Given how such early weaving errors determine the future (see fridges), I don't expect that to change even with more data in the future. The first discrepancy will not be overcome.

  14. Apr 2023
  15. Mar 2023
    1. This paper presents COLT5 (ConditionalLongT5)

      CoLT5 stands for Conditional LongT5

    2. Over the past few years, many “efficient Trans-former” approaches have been proposed that re-duce the cost of the attention mechanism over longinputs (Child et al., 2019; Ainslie et al., 2020; Belt-agy et al., 2020; Zaheer et al., 2020; Wang et al.,2020; Tay et al., 2021; Guo et al., 2022). However,especially for larger models, the feedforward andprojection layers actually make up the majority ofthe computational burden and can render process-ing long inputs intractable

      Recent improvements in transformers for long documents have focused on efficiencies in the attention mechanism but the feed-forward and projection layers are still expensive for long docs

    1. To benchmark GPT-4’s coding ability, OpenAI evaluated it on problems from Codeforces, a website that hosts coding competitions. Surprisingly, Horace He pointed out that GPT-4 solved 10/10 pre-2021 problems and 0/10 recent problems in the easy category. The training data cutoff for GPT-4 is September 2021. This strongly suggests that the model is able to memorize solutions from its training set — or at least partly memorize them, enough that it can fill in what it can’t recall.

      OpenAI was only able to pass questions available before september 2021 and failed to answer new questions - strongly suggesting that it has simply memorised the answers as part of its training

    1. https://web.archive.org/web/20230301112750/http://donaldclarkplanb.blogspot.com/2023/02/openai-releases-massive-wave-of.html

      Donald points to the race that OpenAI has spurred. Calls the use of ChatGPT to generate school work and plagiarism a distraction. LLMs are seeing a widening in where they're used, and the race is on. Doesn't address whether the race is based on any solid starting points however. To me getting into the race seems more important to some than actually having a sense what you're racing and racing for.

  16. Feb 2023
    1. This highlights one of the types of muddled thinking around LLMs. These tasks are used to test theory of mind because for people, language is a reliable representation of what type of thoughts are going on in the person's mind. In the case of an LLM the language generated doesn't have the same relationship to reality as it does for a person.What is being demonstrated in the article is that given billions of tokens of human-written training data, a statistical model can generate text that satisfies some of our expectations of how a person would respond to this task. Essentially we have enough parameters to capture from existing writing that statistically, the most likely word following "she looked in the bag labelled (X), and saw that it was full of (NOT X). She felt " is "surprised" or "confused" or some other word that is commonly embedded alongside contradictions.What this article is not showing (but either irresponsibly or naively suggests) is that the LLM knows what a bag is, what a person is, what popcorn and chocolate are, and can then put itself in the shoes of someone experiencing this situation, and finally communicate its own theory of what is going on in that person's mind. That is just not in evidence.The discussion is also muddled, saying that if structural properties of language create the ability to solve these tasks, then the tasks are either useless for studying humans, or suggest that humans can solve these tasks without ToM. The alternative explanation is of course that humans are known to be not-great at statistical next-word guesses (see Family Feud for examples), but are also known to use language to accurately describe their internal mental states. So the tasks remain useful and accurate in testing ToM in people because people can't perform statistical regressions over billion-token sets and therefore must generate their thoughts the old fashioned way.


    1. Certainly it would not be possible if theLLM were doing nothing more than cutting-and-pasting fragments of text from its training setand assembling them into a response. But this isnot what an LLM does. Rather, an LLM mod-els a distribution that is unimaginably complex,and allows users and applications to sample fromthat distribution.

      LLMs are not cut and paste; the matrix of token-following-token probabilities are "unimaginably complex"

      I wonder how this fact will work its way into the LLM copyright cases that have been filed. Is this enough to make a the LLM output a "derivative work"?

  17. Jan 2023
    1. Blog post from OpenAI in Jan 2022 explaining some of the approaches they use to train, reduce and tube their LLM for particular tasks. This was all precursor to the ChatGPT system we now see.