3 Matching Annotations
  1. Jul 2023
    1. First, under a highly permissive view, the use of training data could be treated as non-infringing because protected works are not directly copied. Second, the use of training data could be covered by a fair-use exception because a trained AI represents a significant transformation of the training data [63, 64, 65, 66, 67, 68].¹ Third, the use of training data could require an explicit license agreement with each creator whose work appears in the training dataset. A weaker version of this third proposal, is to at least give artists the ability to opt-out of their data being used for generative AI [69]. Finally, a new statutory compulsory licensing scheme that allows artworks to be used as training data but requires the artist to be remunerated could be introduced to compensate artists and create continued incentives for human creation [70].

      Four proposals for how copyright affects generative AI training data

      1. Treat the use of training data as non-infringing
      2. Cover it under a fair-use exception
      3. Require an explicit license agreement with each creator (or at least an opt-out)
      4. Create a new "statutory compulsory licensing scheme" that pays artists
  2. Feb 2023
    1. Certainly it would not be possible if the LLM were doing nothing more than cutting-and-pasting fragments of text from its training set and assembling them into a response. But this is not what an LLM does. Rather, an LLM models a distribution that is unimaginably complex, and allows users and applications to sample from that distribution.

      LLMs are not cut-and-paste; the matrix of token-following-token probabilities is "unimaginably complex"

      I wonder how this fact will work its way into the LLM copyright cases that have been filed. Is this enough to make the LLM output a "derivative work"?
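To make "sample from that distribution" concrete, here is a minimal Python sketch built on heavy simplifying assumptions. `NEXT_TOKEN_PROBS` is a hypothetical, hand-written toy table that conditions only on the previous token, and `sample_sequence` is a hypothetical helper. A real LLM conditions on the whole preceding context and computes its probabilities with a neural network over a very large vocabulary. The sketch only shows the mechanism of drawing one token at a time from a probability distribution instead of retrieving stored passages. It is not how any particular model is implemented.

```python
import random

# Hypothetical toy "model": for each previous token, a probability
# distribution over the next token. Real LLMs condition on the full
# context and compute these probabilities with a neural network.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "model": 0.2},
    "a": {"cat": 0.4, "dog": 0.6},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"ran": 0.8, "<end>": 0.2},
    "model": {"sampled": 1.0},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
    "sampled": {"<end>": 1.0},
}


def sample_sequence(probs, rng, max_len=10):
    """Generate tokens by repeatedly sampling from the next-token distribution."""
    tokens = []
    current = "<start>"
    for _ in range(max_len):
        dist = probs[current]
        # Draw one next token, weighted by its probability.
        current = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        if current == "<end>":
            break
        tokens.append(current)
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    for _ in range(3):
        print(" ".join(sample_sequence(NEXT_TOKEN_PROBS, rng)))
```

Because the toy table is tiny, it can only produce a handful of short phrases. Different seeds yield different outputs from the same distribution, which is the sense in which the quoted passage says users "sample" from the model.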