Hypothesis

40 Matching Annotations

Sep 2024
github.com

wordfreq/SUNSET.md at master · rspeer/wordfreq

2
1. tonz 21 Sep 2024
  
  in Public
  
  I don't think anyone has reliable information about post-2021 language usage by humans. The open Web (via OSCAR) was one of wordfreq's data sources. Now the Web at large is full of slop generated by large language models, written by no one to communicate nothing. Including this slop in the data skews the word frequencies. Sure, there was spam in the wordfreq data sources, but it was manageable and often identifiable. Large language models generate text that masquerades as real language with intention behind it, even though there is none, and their output crops up everywhere.
  
  Robyn Speer will no longer update wordfreq. States that there is no reliable post-2021 language usage data! Wordfreq was using open web sources, but these are getting polluted by #algogens output.
  
  wordfreq llms dataquality corpus reverseturing epistomology_centipede
2. tonz 21 Sep 2024
  
  in Public
  
  The field I know as "natural language processing" is hard to find these days. It's all being devoured by generative AI. Other techniques still exist but generative AI sucks up all the air in the room and gets all the money. It's rare to see NLP research that doesn't have a dependency on closed data controlled by OpenAI and Google
  
  Robyn Speer says that natural language processing as a field has been taken over by #algogens, and that most NLP research now depends on closed data from the #algogens providers.
  
  wordfreq nlp algogens research corpus linguistics
Visit annotations in context

Tags

algogens

nlp

llms

linguistics

wordfreq

dataquality

corpus

research

epistomology_centipede

reverseturing

Annotators

tonz

URL

github.com/rspeer/wordfreq/blob/master/SUNSET.md
www.biblonia.com

Putting the body back into corporate

1
1. tonz 17 Sep 2024
  
  in Public
  
  In an age where "corporate" evokes images of towering glass buildings and faceless multinational conglomerates, it's easy to forget that the roots of the word lie in something far more tangible and human: the body. In the medieval period, the idea of a corporation wasn't about shareholder value or quarterly profits; it was about flesh and blood, a community bound together as a single "body"—a corpus.
  
  Via [[Lee Bryant]]
  
  Corporation derives from corpus. The medieval roots of the corporation were people brought together in a single purpose or economic entity: guilds, cities. Based on Roman law roots, where a corpus could have legal personhood status. Overtones of collective identity, governance. The pointer suggests a difference with how we see corporations, as does the first paragraph here, but the piece itself actually sees mostly parallels. Note that Roman/medieval corpora were about property and (royal) privileges. That differs from, e.g., the US, where corporates seek both to be a legal person (wrt politics/finance) and to distance themselves from the accountability a person would have (pollution, externalising negative impacts). I also treat a legal entity as a trade: it bestows certain protections and privileges on me as entrepreneur, but also certain conditions and obligations (public transparency, financial reporting, etc.).
  
  A contrast with ME corpus is seeing [[Corporations as Slow AI 20180201210258]] (anonymous processes, mindlessly wandering to a financial goal)
  
  corpus corporation corporates ai slowai
Visit annotations in context

Tags

ai

corporation

corpus

slowai

corporates

Annotators

tonz

URL

biblonia.com/p/putting-the-body-back-into-corporate
Jul 2024
www.google.com

high resolution addressing of disaggregated text corpus mapped to graph - Google Search

1
1. stopresetgo 30 Jul 2024
  
  in Public
  
  for - search - google - high resolution addressing of disaggregated text corpus mapped to graph - search results of interest - high resolution addressing of disaggregated text corpus mapped to graph
  
  search - google - high resolution addressing of disaggregated text corpus mapped to graph - https://www.google.com/search?q=high+resolution+addressing+of+disaggregated+text+corpus+mapped+to+graph&oq=high+resolution+addressing+of+disaggregated+text+corpus+mapped+to+graph&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIHCAEQIRigATIHCAIQIRigAdIBCTMzNjEzajBqN6gCALACAA&sourceid=chrome&ie=UTF-8
  
  to - search results of interest - high resolution addressing of disaggregated text corpus mapped to graph - A New Method for Graph-Based Representation of Text in - The use of a new text representation method to predict book categories based on the analysis of its content resulted in accuracy, precision, recall and an F1- ... - https://hyp.is/H9UAbk46Ee-PT_vokcnTqA/www.mdpi.com/2076-3417/10/12/4081 - Encoding Text Information with Graph Convolutional Networks - According to our understanding, this is the first personality recognition study to model the entire user text information corpus as a heterogeneous graph and ... - https://hyp.is/H9UAbk46Ee-PT_vokcnTqA/www.mdpi.com/2076-3417/10/12/4081
  
  search - google - high resolution addressing of disaggregated text corpus mapped to graph to - A New Method for Graph-Based Representation of Text in to - - Encoding Text Information with Graph Convolutional Networks - personality recognition
Visit annotations in context

Tags

to - A New Method for Graph-Based Representation of Text in

search - google - high resolution addressing of disaggregated text corpus mapped to graph

to - - Encoding Text Information with Graph Convolutional Networks - personality recognition

Annotators

stopresetgo

URL

google.com/search
www.mdpi.com

Encoding Text Information with Graph Convolutional Networks for Personality Recognition

1
1. stopresetgo 30 Jul 2024
  
  in Public
  
  The most commonly used personality model is the Big Five personality traits model, which describes personality in five aspects: extroversion, neuroticism, agreeableness, conscientiousness, and openness
  
  for - from - search - google - high resolution addressing of disaggregated text corpus mapped to graph
  
  from - search - google - high resolution addressing of disaggregated text corpus mapped to graph - https://hyp.is/ch_J9k43Ee-lGzfOapoCvQ/www.google.com/search?q=high+resolution+addressing+of+disaggregated+text+corpus+mapped+to+graph&oq=high+resolution+addressing+of+disaggregated+text+corpus+mapped+to+graph&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIHCAEQIRigATIHCAIQIRigAdIBCTMzNjEzajBqN6gCALACAA&sourceid=chrome&ie=UTF-8
  
  Big Five personality trait model from - search - google - high resolution addressing of disaggregated text corpus mapped to graph
Visit annotations in context

Tags

Big Five personality trait model

from - search - google - high resolution addressing of disaggregated text corpus mapped to graph

Annotators

stopresetgo

URL

mdpi.com/2076-3417/10/12/4081
www.mdpi.com

A New Method for Graph-Based Representation of Text in Natural Language Processing

2
1. stopresetgo 30 Jul 2024
  
  in Public
  
  An innovative element of the proposed approach is the use of common cliques in graphs representing documents to create a feature vector.
  
  for - further research - common cliques in graphs - question - relevance to disaggregating text corpus into sub-sentence graph nodes?
  
  further research - common cliques in graphs question - relevance to disaggregating text corpus into sub-sentence graph nodes?
2. stopresetgo 30 Jul 2024
  
  in Public
  
  for - from - search - google - high resolution addressing of disaggregated text corpus mapped to graph
  
  from - search - google - high resolution addressing of disaggregated text corpus mapped to graph - https://hyp.is/ch_J9k43Ee-lGzfOapoCvQ/www.google.com/search?q=high+resolution+addressing+of+disaggregated+text+corpus+mapped+to+graph&oq=high+resolution+addressing+of+disaggregated+text+corpus+mapped+to+graph&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIHCAEQIRigATIHCAIQIRigAdIBCTMzNjEzajBqN6gCALACAA&sourceid=chrome&ie=UTF-8
  
  from - search - google - high resolution addressing of disaggregated text corpus mapped to graph
Visit annotations in context

Tags

from - search - google - high resolution addressing of disaggregated text corpus mapped to graph

further research - common cliques in graphs

question - relevance to disaggregating text corpus into sub-sentence graph nodes?

Annotators

stopresetgo

URL

mdpi.com/2079-9292/12/13/2846
Apr 2024
www.newyorker.com

A New History of Arabia, Written in Stone

1
1. chrisaldrich 11 Apr 2024
  
  in Public
  
  Michael Macdonald amassed a vast collection of photographs of these texts and launched a digital Safaitic database, with the help of Laïla Nehmé, a French archeologist and one of the world’s leading experts on early Arabic inscriptions. “When we started working, Michael’s corpus was all on index cards,” Nehmé recalled. “With the database, you could search for sequences of words across the whole collection, and you could study them statistically. It worked beautifully.”
  
  Researcher Michael Macdonald created a card index database of Safaitic inscriptions, which he and French archaeologist Laïla Nehmé eventually turned into a digital database that included a collection of photographs of the extant texts.
  
  card index for dictionaries card index for philology safaitic script Michael Macdonald corpus linguistics Laïla Nehmé
Visit annotations in context

Tags

card index for philology

corpus linguistics

Michael Macdonald

safaitic script

Laïla Nehmé

card index for dictionaries

Annotators

chrisaldrich

URL

newyorker.com/culture/culture-desk/a-new-history-of-arabia-written-in-stone
May 2023
ourworldindata.org

Books

1
1. WHPrivate 27 May 2023
  
  in Public
  
  A book is defined as a published title with more than 49 pages.
  
  [24] AI - Bias in Training Materials
  
  AI Artificial Intelligence Subtle Bias Training Corpus
Visit annotations in context

Tags

AI

Training Corpus

Artificial Intelligence

Subtle Bias

Annotators

WHPrivate

URL

ourworldindata.org/books
Apr 2023
verskorpusz.elte-dh.hu

ELTE verskorpusz

1
1. B.barnabas 16 Apr 2023
  
  in Public
  
  korpusz ("corpus")
  
  corpus
  
  derives from the Latin word corpus (body) one use of a linguistic corpus is, e.g., the dictionary
Visit annotations in context

Tags

derives from the Latin word corpus (body)

one use of a linguistic corpus is, e.g., the dictionary

Annotators

B.barnabas

URL

verskorpusz.elte-dh.hu/
Feb 2023
wordcraft-writers-workshop.appspot.com

Wordcraft Writers Workshop

1
1. chrisaldrich 12 Feb 2023
  
  in Public
  
  The application is powered by LaMDA, one of the latest generation of large language models. At its core, LaMDA is a simple machine — it's trained to predict the most likely next word given a textual prompt. But because the model is so large and has been trained on a massive amount of text, it's able to learn higher-level concepts.
  
  Is LaMDA really able to "learn higher-level concepts" or is it just a large, straightforward, information-theoretic prediction engine?
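To make "predict the most likely next word" concrete, here is a minimal sketch of the simplest possible version of the idea: a bigram counter. This is not how LaMDA works (LaMDA is a large neural network); the toy corpus and the function names `train_bigrams` and `predict_next` are assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus; a real model would be trained on far more text.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(predict_next(model, "the"))
```

A model like this only ever looks at one preceding word, which is part of why the question of what much larger models "learn" remains open.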
  
  PAIR (Google) LaMDA information theory predictive text large langue models corpus linguistics Wordcraft artificial intelligence
Visit annotations in context

Tags

corpus linguistics

predictive text

artificial intelligence

LaMDA

Wordcraft

PAIR (Google)

information theory

large langue models

Annotators

chrisaldrich

URL

wordcraft-writers-workshop.appspot.com/learn
www.complexityexplorer.org

Complexity Explorer

1
1. chrisaldrich 07 Feb 2023
  
  in Public
  
  Rhetoric of encomium
  
  How do institutions form around notions of merit?
  
  Me: what about blurbs as evidence of implied social networks? Who blurbs whom? How are these invitations sent/received and by whom?
  
  diachronic: how blurbs evolve over time
  
  Signals, can blurbs predict: - the field of the work - gender - other
  
  Emergence or decrease of signals with respect to time
  
  Imitation of styles and choices. - how does this happen? contagion - I'm reminded of George Matthew Dutcher's admonition:
  
  Imitation to be avoided. Avoid the mannerisms and personal peculiarities of method or style of well-known writers, such as Carlyle or Macaulay. (see: https://hypothes.is/a/ROR3VCDEEe2sZNOy4rwRgQ )
  
  Systematic studies of related words within corpora. (this idea should have a clever name) word2vec, word correlations, information theory
  
  How does praise work?
  
  metaphors within blurbs (eg: light, scintillating, brilliant, new lens, etc.)
  
  metaphors blurbs praise writing style information theory corpus linguistics merit signaling merit
Visit annotations in context

Tags

corpus linguistics

merit

metaphors

signaling merit

information theory

blurbs

praise

writing style

Annotators

chrisaldrich

URL

complexityexplorer.org/courses/162-foundations-applications-of-humanities-analytics/segments/15635
Jan 2023
github.com

Coptic SCRIPTORIUM

1
1. chrisaldrich 16 Jan 2023
  
  in Public
  
  https://github.com/CopticScriptorium
  
  Tools and technologies for digital and computational research into Coptic language and literature
  
  Coptic linguistics corpus linguistics
Visit annotations in context

Tags

corpus linguistics

linguistics

Coptic

Annotators

chrisaldrich

URL

github.com/CopticScriptorium
genizalab.princeton.edu

Princeton Machine Learning and the Future of Philology Symposium

1
1. chrisaldrich 09 Jan 2023
  
  in Public
  
  https://genizalab.princeton.edu/events/2022/princeton-machine-learning-and-future-philology-symposium
  
  Was this recorded?
  
  machine learning philology symposia digital humanities manuscript studies artificial intelligence corpus linguistics incunabula handwriting recognition natural language processing
Visit annotations in context

Tags

corpus linguistics

symposia

natural language processing

philology

handwriting recognition

artificial intelligence

machine learning

digital humanities

incunabula

manuscript studies

Annotators

chrisaldrich

URL

genizalab.princeton.edu/events/2022/princeton-machine-learning-and-future-philology-symposium
Local file

Finding a Fragment in a Pile of Geniza: A Practical Guide to Collections, Editions, and Resources

2
1. chrisaldrich 09 Jan 2023
  
  in Public
  
  Friedberg Judeo-Arabic Project, accessible at http://fjms.genizah.org. This project maintains a digital corpus of Judeo-Arabic texts that can be searched and analyzed.
  
  The Friedberg Judeo-Arabic Project contains a large corpus of Judeo-Arabic text which can be manually searched to help improve translations of texts, but it might also be profitably mined using information theoretic and corpus linguistic methods to provide larger group textual translations and suggestions at a grander scale.
  
  Friedberg Jewish Manuscript Society Friedberg Judeo-Arabic Project corpus linguistics digital humanities information theory artificial intelligence natural language processing contextual clues contextual extrapolation
2. chrisaldrich 09 Jan 2023
  
  in Public
  
  More recent additions to the website include a “jigsaw puzzle” screen that lets users view several items while playing with them to check whether they are “joins.” Another useful feature permits the user to split the screen into several panels and, thus, examine several items simultaneously (useful, e.g., when comparing handwriting in several documents). Finally, the “join suggestions” screen provides the results of a technologically groundbreaking computerized analysis of paleographic and codicological features that suggests possible joins or items written by the same scribe or belonging to the same codex.
  
  Computer means can potentially be used to check or suggest potential "joins" of fragments of historical documents.
  
  An example of some of this work can be seen in the Friedberg Genizah Project and their digital tools.
  
  digital humanities joins codicology textual scholarship fragments artificial intelligence corpus linguistics Cairo Geniza epigraphy graphology Friedberg Genizah Project jigsaw puzzles
Tags

jigsaw puzzles

textual scholarship

digital humanities

contextual extrapolation

fragments

corpus linguistics

joins

natural language processing

Friedberg Jewish Manuscript Society

Friedberg Genizah Project

artificial intelligence

Friedberg Judeo-Arabic Project

epigraphy

graphology

contextual clues

information theory

Cairo Geniza

codicology

Annotators

chrisaldrich
Nov 2022
www.researchgate.net

(20) Robert Amsler

1
1. chrisaldrich 14 Nov 2022
  
  in Public
  
  Robert Amsler is a retired computational lexicologist, computational linguist, and information scientist. His Ph.D. was from UT-Austin in 1980. His primary work was in the area of understanding how machine-readable dictionaries could be used to create a taxonomy of dictionary word senses (which served as the motivation for the creation of WordNet) and in understanding how lexicon can be extracted from text corpora. He also invented a new technique in citation analysis that bears his name. His work is mentioned in Wikipedia articles on Machine-Readable dictionary, Computational lexicology, Bibliographic coupling, and Text mining. He currently lives in Vienna, VA and reads email at robert.amsler at utexas.edu. He is currently interested in chronological studies of vocabulary, esp. computer terms.
  
  https://www.researchgate.net/profile/Robert-Amsler
  
  Apparently follows my blog. :)
  
  Makes me wonder how we might better process and semantically parse people's personal notes, particularly when they're atomic and cross-linked?
  
  Robert Amsler linguistics dictionaries natural language processing corpus linguistics idea links open questions
Visit annotations in context

Tags

open questions

corpus linguistics

Robert Amsler

natural language processing

idea links

linguistics

dictionaries

Annotators

chrisaldrich

URL

researchgate.net/profile/Robert-Amsler
Oct 2022
Local file

Barthes: A Biography

1
1. chrisaldrich 07 Oct 2022
  
  in Public
  
  So there is nothing exaggerated in thinking that Barthes would have finally found the form for these many scattered raw materials (index cards, desk diaries, old or ongoing private diaries, current notes, narratives to come, the planned discussion of homosexuality) if death had not brought his work and his reflection to an end. The work would certainly not have corresponded to the current definition of the novel as narration and the unfolding of a plot, but the history of forms tells us that the word ‘novel’ has been used to designate the most diverse objects.
  
  Just as in Jason Lustig's paper about Gotthard Deutsch's zettelkasten, here is an example of an outside observer bemoaning the idea of things not done with a deceased's corpus of notes.
  
  It's almost like looking at the "corpus" of notes being reminiscent of the person who has died and thinking about what could have been if they had left. It gives the impression that, "here are their ideas" or "here is their brain and thoughts" loosely connected. Almost as if they are still with us, in a way that doesn't quite exist when looking at their corpus of books.
  
  Gotthard Deutsch's zettelkasten Jason Lustig corpus of notes posthumous notes
Tags

posthumous notes

Gotthard Deutsch's zettelkasten

Jason Lustig

corpus of notes

Annotators

chrisaldrich
Sep 2022
Local file

Introduction to the study of history

2
1. chrisaldrich 09 Sep 2022
  
  in Public
  
  The systematic order, or arrangement by subjects, is not to be recommended for the compilation of a Corpus or of regesta.
  
  corpus regesta order arrangement scientific note taking
2. chrisaldrich 08 Sep 2022
  
  in Public
  
  We distinguish between the historian who classifies verified documents for the purposes of historical work, and the scholar who compiles "Regesta." By the words "Regesta" and "Corpus" we understand methodically classified collections of historical documents. In a "Corpus" documents are reproduced in extenso; in "Regesta" they are analysed and described.
  
  A few technical words to define clearly within this context versus other related contexts.
  
  regesta in extenso classification corpus
Tags

classification

arrangement

in extenso

scientific note taking

corpus

regesta

order

Annotators

chrisaldrich
Aug 2022
Local file

Directions and Suggestions for the Writing of Essays or Theses in History

1
1. chrisaldrich 20 Aug 2022
  
  in Public
  
  Mechanical form. Use standard size (8½ x 11 in.) typewriter paper or the essay paper in standard use at the institution. For typing, use an unruled bond paper of good quality, such as “Paragon Linen” or “Old Hampshire Mills.” At the left of the page leave a margin of 1¼ to 1½ inches; and at the top, bottom, and right of the page, a margin of 1 inch. Write only on one side of the paper. In typing the lines should be double-spaced. Each chapter should begin on a new page. Theses for honors and degrees must be typed; other essays may be typed or legibly written in ink. Whether the essay is typed or written, the use of black ink is preferable. The original typewritten copy must be presented. In case two copies of a thesis are required, the second copy must be the first carbon and must be on the same quality of paper as the original.
  
  Definitely a paragraph aimed at the student in the manner of a syllabus, but also an interesting tidbit on the potential evolution of writing forms over time.
  
  How does language over time change with respect to the types and styles of writing forms, particularly when they're prescribed or generally standardized over time? How do these same standards evolve over time and change things at the level of the larger pictures?
  
  historical linguistics syllabi paper sizes paper standards corpus linguistics style vs. substance
Tags

corpus linguistics

syllabi

historical linguistics

style vs. substance

paper standards

paper sizes

Annotators

chrisaldrich
Local file

Language and Mind, Third Edition

1
1. chrisaldrich 11 Aug 2022
  
  in Public
  
  I recall being told by a distinguished anthropological linguist, in 1953, that he had no intention of working through a vast collection of materials that he had assembled because within a few years it would surely be possible to program a computer to construct a grammar from a large corpus of data by the use of techniques that were already fairly well formalized.
  
  rose colored glasses...
  
  corpus linguistics humans vs. computers artificial intelligence
Tags

corpus linguistics

artificial intelligence

humans vs. computers

Annotators

chrisaldrich
Apr 2022
docdrop.org

Too Much to Know: Managing Scholarly Information before the Modern Age

1
1. chrisaldrich 07 Apr 2022
  
  in Public
  
  Yeshiva teaching in the modern period famously relied on memorization of the most important texts, but a few medieval Hebrew manuscripts from the twelfth or thirteenth centuries include examples of alphabetical lists of words with the biblical phrases in which they occurred, but without precise locations in the Bible—presumably because the learned would know them.
  
  Prior to concordances of the Christian Bible there are examples of Hebrew manuscripts in the twelfth and thirteenth centuries that have lists of words and sentences or phrases in which they occurred. They didn't include exact locations with the presumption being that most scholars would know the texts well enough to quickly find them based on the phrases used.
  
  Early concordances were later made unnecessary as tools as digital search could dramatically decrease the load. However these tools might miss the value found in the serendipity of searching through broad word lists.
  
  Has anyone made a concordance search and display tool to automatically generate concordances of any particular texts? Do professional indexers use these? What might be the implications of overlapping concordances of seminal texts within the corpus linguistics space?
  
  Fun tools like the Bible Munger now exist to play around with find and replace functionality. https://biblemunger.micahrl.com/munge
  
  Online tools also have multi-translation versions that will show translational differences between the seemingly ever-growing number of English translations of the Bible.
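On the question above about automatically generating a concordance for any text: a minimal keyword-in-context sketch follows. It is a toy under stated assumptions, not an existing tool: the function names `build_concordance` and `print_concordance`, the `width` parameter, and the sample sentence are hypothetical, and the tokenizer only handles unaccented Latin letters and apostrophes.

```python
import re
from collections import defaultdict


def build_concordance(text, width=30):
    """Map each lowercased word to the list of contexts in which it occurs."""
    concordance = defaultdict(list)
    for match in re.finditer(r"[A-Za-z']+", text):
        start, end = match.span()
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        # Mark the keyword in brackets and collapse whitespace so each
        # context fits on a single line.
        context = " ".join((left + "[" + match.group() + "]" + right).split())
        concordance[match.group().lower()].append(context)
    return concordance


def print_concordance(concordance):
    """Print every word alphabetically, one context per line."""
    for word in sorted(concordance):
        for context in concordance[word]:
            print(f"{word:>12}  {context}")


# Hypothetical sample text; any plain-text document could be substituted.
sample = (
    "In the beginning God created the heaven and the earth. "
    "And the earth was without form, and void."
)
concordance = build_concordance(sample)
print_concordance({"earth": concordance["earth"]})
```

Unlike the medieval lists described above, which omitted locations, a sketch like this could also record the character offset of each match if precise references were wanted.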
  
  concordances search discovery find and replace text editors Bible Munger corpus linguistics satire bible translation
Visit annotations in context

Tags

corpus linguistics

bible translation

text editors

search

discovery

satire

Bible Munger

concordances

find and replace

Annotators

chrisaldrich

URL

docdrop.org/download_annotation_doc/Too-Much-to-Know_-Managing-Scho---Blair-Ann-M_-5eglr.pdf
Feb 2022
www.robinsloan.com

Writing with the machine

1
1. chrisaldrich 16 Feb 2022
  
  in Public
  
  Together: responsive, inline “autocomplete” powered by an RNN trained on a corpus of old sci-fi stories.
  
  I can't help but think, what if one used their own collected corpus of ideas based on their ever-growing commonplace book to create a text generator? Then by taking notes, highlighting other work, and doing your own work, you're creating a corpus of material that's eminently interesting to you. This also means that by subsuming text over time in making your own notes, the artificial intelligence will more likely also be using your own prior thought patterns to make something that, from an information theoretic standpoint, looks and sounds more like you. It would have your "hand" so to speak.
  
  hand information theory text generators le mot juste corpus linguistics adjacent possible commonplace books
Visit annotations in context

Tags

text generators

corpus linguistics

commonplace books

adjacent possible

information theory

hand

le mot juste

Annotators

chrisaldrich

URL

robinsloan.com/notes/writing-with-the-machine/
Jan 2022
vimeo.com

Eyeo 2017 - Robin Sloan

1
1. chrisaldrich 31 Jan 2022
  
  in Public
  
  https://vimeo.com/232545219
  
  from: Eyeo Conference 2017
  
  http://eyeofestival.com/2017/speaker/robin-sloan/
  
  https://vimeo.com/channels/eyeo2017/232545219
  
  Description
  
  Robin Sloan at Eyeo 2017 | Writing with the Machine | Language models built with recurrent neural networks are advancing the state of the art on what feels like a weekly basis; off-the-shelf code is capable of astonishing mimicry and composition. What happens, though, when we take those models off the command line and put them into an interactive writing environment? In this talk Robin presents demos of several tools, including one presented here for the first time. He discusses motivations and process, shares some technical tips, proposes a course for the future — and along the way, write at least one short story together with the audience: all of us, and the machine.
  
  Notes
  
  Robin created a corpus using If Magazine and Galaxy Magazine from the Internet Archive and used it as a writing tool. He talks about using a few other models for generating text.
  
  Some of the ideas here are reminiscent of the way John McPhee used the 1913 Webster Dictionary for finding words (or le mot juste) for his work, as tangentially suggested in Draft #4 in The New Yorker (2013-04-22)
  
  Cross reference: https://hypothes.is/a/t2a9_pTQEeuNSDf16lq3qw and https://hypothes.is/a/vUG82pTOEeu6Z99lBsrRrg from https://jsomers.net/blog/dictionary
  
  Croatian a cappella singing: klapa https://www.youtube.com/watch?v=sciwtWcfdH4
  
  Writing using the adjacent possible.
  
  Corpus building as an art [~37:00]
  
  Forgetting what one trained their model on and then seeing the unexpected come out of it. This is similar to Luhmann's use of the zettelkasten as a serendipitous writing partner.
  
  Open questions
  
  How might we use information theory to do this more easily?
  
  What does a person or machine's "hand" look like in the long term with these tools?
  
  Can we use corpus linguistics in reverse for this?
  
  What sources would you use to train your model?
  
  References:
  
  Andrej Karpathy. 2015. "The Unreasonable Effectiveness of Recurrent Neural Networks"
  
  Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, et al. 2015. "Generating Sentences from a Continuous Space." arXiv:1511.06349
  
  Stanislau Semeniuta, Aliaksei Severyn, and Erhardt Barth. 2017. "A Hybrid Convolutional Variational Autoencoder for Text generation." arXiv:1702.02390
  
  Soroush Mehri, et al. 2017. "SampleRNN: An Unconditional End-to-End Neural Audio Generation Model." arXiv:1612.07837 applies neural networks to sound and sound production
  
  writing tools for thought artificial intelligence neural networks Andrej Karpathy Eyeo Festival Robin Sloan corpus linguistics Webster's dictionary John McPhee Draft #4 2017 Croatia acapella klapa throat singing Milman Parry adjacent possible le mot juste
Visit annotations in context

Tags

2017

Milman Parry

John McPhee

Croatia

le mot juste

klapa

Eyeo Festival

corpus linguistics

acapella

writing

adjacent possible

Andrej Karpathy

artificial intelligence

tools for thought

throat singing

Webster's dictionary

Draft #4

neural networks

Robin Sloan

Annotators

chrisaldrich

URL

vimeo.com/232545219
Jun 2021
www.theatlantic.com

The Scandal Rocking the Evangelical World

1
1. chrisaldrich 15 Jun 2021
  
  in Public
  
  The viciousness of church politics can rival pretty much any other politics you can name; the difference is that the viciousness within churches is often cloaked in lofty spiritual language and euphemisms.
  
  It would be interesting to examine some of this language and these euphemisms to uncover the change over time.
  
  euphemisms corpus linguistics historical linguistics religion politics
Visit annotations in context

Tags

religion

corpus linguistics

historical linguistics

politics

euphemisms

Annotators

chrisaldrich

URL

theatlantic.com/ideas/archive/2021/06/russell-moore-sbc/619122/
Feb 2021
psyarxiv.com

Lessons from lockdown: Media discourse on the role of behavioural science in the UK COVID-19 response

1
1. XanaButt 16 Feb 2021
  
  in BehSci
  
  Sanders, J., Tosi, A., Obradović, S., Miligi, I., & Delaney, L. (2021). Lessons from lockdown: Media discourse on the role of behavioural science in the UK COVID-19 response. PsyArXiv. https://doi.org/10.31234/osf.io/dw85a
  
  is:preprint lang:en COVID-19 response UK behavioral science behavioral policy lockdown trust in science corpus linguistics media discourse analysis Twitter credibility
Visit annotations in context

Tags

lang:en

corpus linguistics

behavioral science

UK

lockdown

COVID-19

behavioral policy

is:preprint

trust in science

Twitter

media discourse analysis

response

credibility

Annotators

XanaButt

URL

psyarxiv.com/dw85a/
www.nybooks.com

Extraordinary Commonplaces

1
1. chrisaldrich 14 Feb 2021
  
  in Public
  
  Only fifteen of the thirty-seven commonplace books were written in his hand. He might have dictated the others to a secretary, but the nature of his authorship, if it existed, remains a matter of conjecture. A great deal of guesswork also must go into the interpretation of the entries in his own hand, because none of them are dated. Unlike the notes of Harvey, they consist of endless excerpts, which cannot be connected with anything that was happening in the world of politics.
  
  I find myself wondering what this study of his commonplace books would look like if it were digitized and cross-linked? Sadly the lack of dates on the posts would prevent some knowledge from being captured, but what would the broader corpus look like?
  
  Consider the broader digital humanities perspective of this. Something akin to corpus linguistics, but at the level of view of what a single person reads, thinks, and reacts to over the course of their own lifetime.
  
  How much of a person could be recreated from such a collection?
  
  open questions corpus linguistics commonplace books Frankenstein
Visit annotations in context

Tags

open questions

corpus linguistics

Frankenstein

commonplace books

Annotators

chrisaldrich

URL

nybooks.com/articles/2000/12/21/extraordinary-commonplaces/
voyant-tools.org

Creating a Corpus - Voyant Tools Help

1
1. chrisaldrich 06 Feb 2021
  
  in Public
  
  Looks like some serious power hiding in here.
  
  corpus linguistics tools bookmark
Visit annotations in context

Tags

corpus linguistics

bookmark

tools

Annotators

chrisaldrich

URL

voyant-tools.org/docs/
Oct 2020
www.aclweb.org

OpusFilter: A Configurable Parallel Corpus Filtering Toolbox

1
1. cindyRkt2015 22 Oct 2020
  
  in Public
  
  OpusFilter
  
  Corpus Tools
Visit annotations in context

Tags

Corpus Tools

Annotators

cindyRkt2015

URL

aclweb.org/anthology/2020.acl-demos.20.pdf
www.nytimes.com

Thomas Piketty Turns Marx on His Head

1
1. chrisaldrich 10 Oct 2020
  
  in Public
  
  To have, but maybe not to read. Like Stephen Hawking’s “A Brief History of Time,” “Capital in the Twenty-First Century” seems to have been an “event” book that many buyers didn’t stick with; an analysis of Kindle highlights suggested that the typical reader got through only around 26 of its 700 pages. Still, Piketty was undaunted.
  
  Interesting use of digital highlights--determining how "read" a particular book is.
  
  publishing unread books Amazon Kindle highlights corpus linguistics annotations
Visit annotations in context

Tags

highlights

corpus linguistics

publishing

unread books

Amazon Kindle

annotations

Annotators

chrisaldrich

URL

nytimes.com/2020/03/08/books/review/capital-and-ideology-thomas-piketty.html
adanewmedia.org

“Who Do You Think You Are?”: When Marginality Meets Academic Microcelebrity - Ada: A Journal of Gender, New Media, and Technology

1
1. chrisaldrich 10 Oct 2020
  
  in Public
  
  identity IndieWeb IndieWeb for Education Domain of One's Own sociology Economics attention economy social media media studies corpus linguistics
Visit annotations in context

Tags

sociology

corpus linguistics

IndieWeb for Education

IndieWeb

Domain of One's Own

media studies

Economics

attention economy

identity

social media

Annotators

chrisaldrich

URL

adanewmedia.org/2015/04/issue7-mcmillancottom/
Feb 2020
niklasblog.com

Review: David Adger – ‘Language Unlimited’

1
1. pivic 19 Feb 2020
  
  in Public
  
  The British National Corpus
  
  Find it here.
  
  british national corpus
Visit annotations in context

Tags

british national corpus

Annotators

pivic

URL

niklasblog.com/
Nov 2019
buttondown.email

Humane Ingenuity 9: GPT-2 and You

3
1. chrisaldrich 13 Nov 2019
  
  in Public
  
  From this perspective, GPT-2 says less about artificial intelligence and more about how human intelligence is constantly looking for, and accepting of, stereotypical narrative genres, and how our mind always wants to make sense of any text it encounters, no matter how odd. Reflecting on that process can be the source of helpful self-awareness—about our past and present views and inclinations—and also, some significant enjoyment as our minds spin stories well beyond the thrown-together words on a page or screen.
  
  And it's not just happening with text; it also happens with speech, as I've written before in "Complexity isn’t a Vice: 10 Word Answers and Doubletalk in Election 2016." In fact, in that case, looking at transcripts actually helps to reveal that the emperor had no clothes, because there's so much missing from the speech that the text doesn't have enough space to fill in the gaps the way the live speech did.
  
  doubletalk speech corpus linguistics
2. chrisaldrich 13 Nov 2019
  
  in Public
  
  The most interesting examples have been the weird ones (cf. HI7), where the language model has been trained on narrower, more colorful sets of texts, and then sparked with creative prompts. Archaeologist Shawn Graham, who is working on a book I’d like to preorder right now, An Enchantment of Digital Archaeology: Raising the Dead with Agent Based Models, Archaeogaming, and Artificial Intelligence, fed GPT-2 the works of the English Egyptologist Flinders Petrie (1853-1942) and then resurrected him at the command line for a conversation about his work. Robin Sloan had similar good fun this summer with a focus on fantasy quests, and helpfully documented how he did it.
  
  Circle back around and read this when it comes out.
  
  Similarly, these other references should be an interesting read as well.
  
  corpus linguistics information theory
3. chrisaldrich 13 Nov 2019
  
  in Public
  
  For those not familiar with GPT-2, it is, according to its creators OpenAI (a socially conscious artificial intelligence lab overseen by a nonprofit entity), “a large-scale unsupervised language model which generates coherent paragraphs of text.” Think of it as a computer that has consumed so much text that it’s very good at figuring out which words are likely to follow other words, and when strung together, these words create fairly coherent sentences and paragraphs that are plausible continuations of any initial (or “seed”) text.
  
  This isn't a very difficult problem and the underpinnings of it are well laid out by John R. Pierce in An Introduction to Information Theory: Symbols, Signals and Noise. In it he has a lot of interesting tidbits about language and structure from an engineering perspective including the reason why crossword puzzles work.
  
  close reading, distant reading, corpus linguistics
  
  close reading distant reading corpus linguistics
Visit annotations in context

Tags

distant reading

doubletalk

corpus linguistics

speech

information theory

close reading

Annotators

chrisaldrich

URL

buttondown.email/dancohen/archive/humane-ingenuity-9-gpt-2-and-you/
Sep 2019
www.theguardian.com

When Milton met Shakespeare: poet's notes on Bard appear to have been found

1
1. chrisaldrich 17 Sep 2019
  
  in Public
  
  He is now intending to collaborate with Bourne on a series of articles about the find. “Having these annotations might allow us to identify further books that have been annotated by Milton,” he said. “This is evidence of how digital technology and the opening up of libraries [could] transform our knowledge of this period.”
  
  information theory corpus linguistics
Visit annotations in context

Tags

corpus linguistics

information theory

Annotators

chrisaldrich

URL

theguardian.com/books/2019/sep/16/when-milton-met-shakespeare-poets-notes-on-bard-appear-to-have-been-found
Apr 2019
tressiemc.com

Why Is Digital Sociology?

1
1. chrisaldrich 05 Apr 2019
  
  in Public
  
  Digital sociology needs more big theory as well as testable theory.
  
  I can't help but think here about the application of digital technology to large bodies of literature in the creation of the field of corpus linguistics.
  
  If traditional sociology means anything, then a digital incarnation of it should create physical and trackable means that can potentially be more easily studied as a result. Just the same way that Mark Dredze has been able to look at Twitter data to analyze public health data like influenza, we should be able to more easily quantify sociological phenomena in aggregate by looking at larger and richer data sets of online interactions.
  
  There's also likely some value in studying the quantities of digital exhaust that companies like Google, Amazon, Facebook, etc. are using for surveillance capitalism.
  
  sociology corpus linguistics digital sociology surveillance capitalism
Visit annotations in context

Tags

sociology

surveillance capitalism

corpus linguistics

digital sociology

Annotators

chrisaldrich

URL

tressiemc.com/uncategorized/why-is-digital-sociology/
Dec 2018
en.wikipedia.org

ETAP-3 - Wikipedia

1
1. ildar 02 Dec 2018
  
  in Public
  
  A syntactically annotated corpus (treebank) is a part of Russian National Corpus.[2] It contains 40,000 sentences (600,000 words) which are fully syntactically and morphologically annotated. The primary annotation was made by ETAP-3 and then manually verified by competent linguists.
  
  syntax corpus
Visit annotations in context

Tags

syntax

corpus

Annotators

ildar

URL

en.wikipedia.org/wiki/ETAP-3
Jul 2017
www.ncbi.nlm.nih.gov

A corpus-based approach for automated LOINC mapping

1
1. SamRose 10 Jul 2017
  
  in Public
  
  opennlp uom corpus
Visit annotations in context

Tags

opennlp

uom

corpus

Annotators

SamRose

URL

ncbi.nlm.nih.gov/pmc/articles/PMC3912728/
