but increasingly primarily machines,
is there a point when human action will be completely obsolete?
There are ongoing and fierce debates on what exactly constitutes a 'persistent' identifier
solidifying a definition is difficult; it takes time for people to agree on a final one
Through a well-defined protocol
who/what decides a "well-defined protocol"
Data should be Findable. Data should be Accessible. Data should be Interoperable. Data should be Re-usable.
is data that is not all four illegitimate?
In reactions to the Turing test, one may easily discern a fear of machine intelligence underlying many of the counterarguments.
limits are essential, but where we decide to draw them is a difficult question
But even as these arguments are advanced, the detractions seem obvious
flawed definitions
Grants in excess of fifty thousand dollars are quite rare in the humanities; grants exceeding twice that amount are by no means unusual in DH
what's the reason for this large discrepancy?
Increasingly, people who publish things online that look like articles and are subjected to the usual system of peer review need not fear reprisal from a hostile review committee
does this imply that information is becoming less and less reliable?
articles about “new media” seldom
what defines "new media"?
Latino precincts are those in which at least 50% of the registered voters are Latinos. The actual Latino population is higher, especially in areas with high concentrations of Latinos without US citizenship
interesting
including number of registered Democrats and Republicans, and ethnic breakdowns of registered voters and the voting age population. Contextual data from the 2000 census by census tract were linked with the precinct-level data.
difficulty with people who aren't registered to vote. Also a likely difficulty with people who aren't US citizens and aren't recorded in the census
a consequence of the post-Cold War restructuring of the aerospace industry in Los Angeles
externalities
confusion between residence in the City or County of Los Angeles may reduce ease of political participation
natural difficulties
applied automatically over a large corpus of digitized texts.
essential
we start by finding references to specific political divisions
it says "we", but is this whole process computerized?
can locate place names using the document's context.
is it foolproof, or does it still misinterpret sometimes?
when a text mentions Paris, does the writer mean Paris, Texas, USA or Paris, France
more difficulties with the complexity of language
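To make the geoparsing problem above concrete, here is a minimal sketch of automated place-name extraction using spaCy's off-the-shelf named-entity recognizer; the context-based disambiguation of "Paris" at the end is a toy illustration of the idea, not the authors' actual method, and the example text is invented.

```python
# Minimal sketch of automated place-name extraction with spaCy's
# off-the-shelf NER (assumes the en_core_web_sm model is installed).
# The "Paris" disambiguation at the end is a toy illustration only.
import spacy

nlp = spacy.load("en_core_web_sm")

text = ("The committee met in Paris before travelling on to Austin. "
        "Texas newspapers covered the visit extensively.")

doc = nlp(text)
places = [ent.text for ent in doc.ents if ent.label_ == "GPE"]
print(places)  # e.g. ['Paris', 'Austin', 'Texas']

# Toy disambiguation: lean on other place names in the same document.
if any(p in {"Texas", "Austin"} for p in places):
    print("Paris -> Paris, Texas, USA")
else:
    print("Paris -> Paris, France")
```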
Such visualizations are possible because of the datasets that have been gathered by scholars over many decades
were all lynchings recorded? How do we know what percentage of lynchings we actually know about?
But both methods share a similar vision and a similar blindness.
consistency in difficulties no matter what methodology
aimed to bring the hidden patterns in the data on lynching to light.
intentionally patterns or subconscious commonalities?
a focus on the designer’s own subject position can help to expose the decision
this is how the scientific process is biased and subjective
Is our data the right type?
how can this question ever truly be answered?
One of the key contributions of STS has been to challenge the idea that science and/or technology is objective and neutral by demonstrating how scientific thought is situated in particular cultural, historical, economic, and social systems [77]
how can science be subjective? I can see how studies themselves can be, but scientific fact is scientific fact
People are linked through the groups they belong to. Groups are linked through the people they share.
all linked one way or another
links between people and some other kind of thing
correlation analysis?
All I know is whether someone was a member of an organization or not.
can process and evaluate general information, not specifics
has been used to collect and analyze information on more than two hundred and sixty persons (of varying degrees of suspicion) belonging variously to seven different organizations in the Boston area.
effectiveness in processing large amounts of data easily
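A rough sketch of the two-mode "people and groups" idea in these excerpts, assuming nothing beyond a binary membership matrix: its matrix products recover person-to-person links (shared groups) and group-to-group links (shared members). The names and memberships below are invented.

```python
# Sketch of the two-mode idea in the excerpt: a binary person-by-group
# membership matrix, whose products give person-person links (shared
# groups) and group-group links (shared members). Names are invented.
import numpy as np

people = ["Adams", "Revere", "Warren"]
groups = ["Lodge", "Caucus", "Committee"]

# M[i, j] = 1 if person i belongs to group j (invented memberships)
M = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])

person_links = M @ M.T   # entry (i, k): groups persons i and k share
group_links = M.T @ M    # entry (j, l): people groups j and l share

print(person_links)
print(group_links)
```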
none of the information that I’ve just told you is immediately evident in the letter on the screen.
oof
For instance, here’s an 1801 letter from Jefferson to William Evans, an innkeeper friend. You can see in this letter that Jefferson makes reference to a “former servant James,” underlined in green. If you click the link, it takes you to an editorial note that informs us that this “former servant” is in fact James Hemings, who, as it turns out, three years after the agreement with Jefferson was recorded and witnessed, finally received his freedom
difficulties in available data can stem from the societal norms of the time
thinks that Hemings most likely made a verbal agreement with Jefferson to return to America
"verbal agreement", assumptions and guesses can be made but we don't know anything for sure that isn't recorded
How does one identify and extract meaning from the unique set of documents that do remain
how valid is the meaning if so many things have been undocumented or lost?
this data to pursue their own lines of inquiry. Getting a broad sense of where the field has b
key distinction
Moreover, the overall increase in work on women’s history does not translate into an equitable
outside influences affecting data or the interpretation of data?
ting schema (such as Library of Congress subject headings), topic model
allows for the analysis of massive libraries, because no previous knowledge is necessary
Topic modeling, a computer science data mining technology that is arguably the state-of-the-art model for text document collections, allows for a m
simple statistical modeling
. America: History and Life (hereafter AHL) focuses on the history of the geographic regions th
reasonable sample size, this shows an effective use of text mining
By highlighting the process of topic modeling, Goldstone and Underwood reveal how different methodological choices may lead to contrasting results.
important, not all processes of topic modeling are the same
Rhody’s work is perhaps the best evidence thus-far that what we might have identified as cohesive “topics” are more complex than simple thematic connections
important to note
It describes a method of extracting clusters of words from sets of documents
statistical extraction, not extraction of meaning
before realizing there isn’t much immediately apparent you can actually do with it
again, need human thought
You’re introduced to topics, and how a computer came to generate them automatically without any prior knowledge of word definitions or grammar.
is this done through frequency of words? Does this still rely on the reader's interpretation of the discerned topic?
treating the works themselves as unceremonious “buckets of words,” and providing seductive but obscure results in the forms of easily interpreted (and manipulated) “topics.”
effective uses
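For a concrete sense of the "buckets of words" approach, here is a minimal topic-modeling sketch with scikit-learn's LDA implementation; the four tiny documents are placeholders, and real corpora plus preprocessing choices would shape the topics far more than this toy suggests.

```python
# Minimal "bucket of words" topic-modeling sketch with scikit-learn's LDA.
# The four tiny documents are placeholders; real corpora are far larger,
# and preprocessing choices strongly shape the resulting topics.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "vote election precinct registered voters ballot",
    "poem meter rhyme stanza elegy verse",
    "voters turnout election campaign precinct ballot",
    "verse poem sonnet rhyme lyric stanza",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)            # raw word counts: no grammar, no definitions
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

vocab = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [vocab[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")
```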
we conclude the chapter by saying that all the sentiment analysis tasks are very challenging
common theme, it's hard
The existing research assumes that the document is known to be opinionated
important, can't just throw any text in there
In many cases, opinions are hidden in long forum posts and blogs. It is difficult for a human reader to find relevant sources, extract related sentences with opinions, read them, summarize them, and organize them into usable forms.
weakness in human ability, sometimes unbiased analysis is essential for discovering biases
One of the main reasons for the lack of study on opinions is the fact that there was little opinionated text available before the World Wide Web
why is this? I understand the fact that the web made all text way more accessible, but weren't people still writing opinionated pieces before that?
In this chapter, we only focus on opinion expressions that convey people’s positive or negative sentiments
how would a computer analyze a double negative? Or statements that are societal expressions used to convey sentiment but don't make sense literally?
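A small sentiment-analysis sketch using NLTK's VADER lexicon scorer, which handles some negation and intensifiers but, as the question above suspects, can still misread double negatives, idioms, and sarcasm. The example sentences are my own.

```python
# Small lexicon-based sentiment sketch with NLTK's VADER scorer. It handles
# some negation and intensifiers, but double negatives, idioms, and sarcasm
# can still trip it up. Example sentences are invented.
import nltk
nltk.download("vader_lexicon")  # one-time download of the lexicon
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
for sentence in [
    "The service was great.",
    "The service was not great.",
    "The service was not bad.",         # double negative
    "Yeah, that went really well...",   # likely sarcasm; scored literally
]:
    print(sentence, sia.polarity_scores(sentence)["compound"])
```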
Most tools at our disposal have weak or nonexistent semantic capabilities; they count, compare, track, and represent words, but they do not produce meaning
important
Computers can help extend human reading and understanding, especially for large collections of texts that you couldn't read in a lifetime
how would it discern the meaning of the text?
With such rich and sophisticated analytic environments, do we even need to read texts anymore?
The analysis of texts and contextual meaning, I believe, keeps reading essential. Our minds cannot be entirely replaced
They provide a snapshot, but do not allow exploration and experimentation
common theme here
As a result, there is a significant inequality in the availability of digital texts, one that has a profound effect on the kinds of work that scholars are able to pursue
limiting research
struggling with how to derive meaning from texts, from high-school students researching an essay topic to journalists combing through leaked security documents, or from companies measuring social media reaction to a product launch to historians studying diversity of immigration based on more than two centuries of trial proceedings
finding the text one is looking for is computerized, but analyzing and drawing conclusions from such text remains a human capability
were it not for text-based searches of the title, description, and other metadata.
importance of text-based search capability --> it is essential in circumstances like this, which are non-opinionated and just statistical
You type in a word to search for and the interactive returns a simple bar graph that you can drop into a comment
application of text analysis, purely statistical
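What "count, compare, track" amounts to at its simplest is something like the sketch below: tally how often a search term appears per year and print a crude bar graph; assigning meaning to the tallies remains with the reader. The tiny corpus is invented.

```python
# What "count, compare, track" amounts to at its simplest: tally how often
# a search term appears per year and print a crude text "bar graph".
# Interpreting the tallies is left to the reader; the corpus is invented.
import re

corpus = {
    1850: "liberty and union now and forever",
    1851: "union of the states and liberty of the people",
    1852: "commerce railroads and trade",
}

term = "liberty"
for year in sorted(corpus):
    hits = len(re.findall(rf"\b{term}\b", corpus[year].lower()))
    print(year, "#" * hits)
```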
One of the stylistic breaks takes place in the 1870s (i.e. after the Civil War), the other in the 1920s (in the period of prosperity before the Great Depression); the third peak is not fully formed yet, even if one can observe an acceleration of language change at the end of the 20th century
interesting
Then we dismiss the original hypothesis, in order to test new ones: we iterate over the timeline, testing the years 1836, 1837, 1838, 1839, … for their discriminating power
interesting that the original hypothesis is dismissed; I don't quite understand why this is
The procedure randomly picks n text samples written before and after the assumed break;
if computer functions run with such high speed, why even select n samples? Why can't they go through every sample if it's not human time and resources being sacrificed?
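A rough sketch of the iterate-over-candidate-break-years procedure as I read it: for each year, draw n random texts from before and after the assumed break, train a simple classifier on word counts, and record how well it separates the two groups (balanced group sizes and repeated random draws are one common reason not to use every text every time). Everything here is synthetic and is not the authors' exact stylometric pipeline.

```python
# Rough sketch of the sampling-and-testing loop: for each candidate break
# year, draw n texts from before and after, train a simple classifier on
# word counts, and record how well it separates the groups. Texts and years
# are synthetic; this is not the authors' exact stylometric pipeline.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
years = np.arange(1800, 1900)
texts = ["olde worde thou hath" if y < 1860 else "modern word you have" for y in years]

def break_score(candidate_year, n=20):
    before = rng.choice(np.where(years < candidate_year)[0], n, replace=False)
    after = rng.choice(np.where(years >= candidate_year)[0], n, replace=False)
    idx = np.concatenate([before, after])
    X = CountVectorizer().fit_transform([texts[i] for i in idx])
    y = np.array([0] * n + [1] * n)   # before vs. after labels
    return cross_val_score(LogisticRegression(), X, y, cv=5).mean()

for year in range(1830, 1890, 10):
    print(year, round(break_score(year), 2))  # peaks near the true break (1860)
```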
Since none of the out-of-the-box techniques is suitable to analyze temporal datasets
what makes temporal data sets different in analyzing? why the difficulty?
is a tacit assumption that the researcher knows in advance which elements of the language are subject to change.
this seems very important; if the researcher doesn't know, false data and conclusions could be presented
Although some of these instances might refer to a body of flowing water, there is no guarantee that they all use the word in the same sense
more difficulties with discerning the true meanings of words because computers programs recognizing the words don't have the human awareness and context
statistical methods in applications like search engines, spellcheckers, autocomplete features, and computer vision systems.
better applied in definite statistical situations, not fluid interpretable situations
I argue that, when applied to the study of literary and cultural texts, statistical text-mining methods tend to reinforce conceptions of language and meaning that are, at best, overly dependent on the “literal” definitions of words
this seems obvious, how would a software possibly know to not interpret something literally? It can't sense the humanity in the language
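One modest answer to the ambiguity problem is a keyword-in-context (KWIC) view: the search finds every occurrence of a word, but judging which sense is meant is still left to the reader. The sentences below are invented, and "bank" is my own stand-in example.

```python
# Keyword-in-context (KWIC) sketch: the search finds every "bank", but the
# reader still judges whether each hit is a riverbank or a financial bank.
# The sentences are invented and "bank" is a stand-in example.
import re

text = ("We walked along the bank of the river at dawn. "
        "The bank refused to extend the loan. "
        "Fishermen lined the muddy bank after the flood.")

for match in re.finditer(r"\bbank\b", text):
    start, end = match.start(), match.end()
    print("...", text[max(0, start - 30):end + 30].strip(), "...")
```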
The odds of reporting a high level of psychological strain increase with age
I think a better way to do this, although difficult and maybe not possible, would be to ask the same people over, say, 10-year intervals to see how that individual's psychological strain changes with age
It appears that work has a different meaning for different people.
exactly, everyone views it differently, so seems difficult to generalize the effects of unemployment to make sweeping claims
The data are taken from the 1992 British Household Panel Study (BHPS), which includes data on 7897 individuals.
this is a long time ago... it would be interesting to do the study again as mental health issues seem to be dramatically growing in this day and age
level of general happiness.
how does one measure happiness?
individual characteristics on measurements of psychological well-being
I feel like a topic like psychological well-being is difficult to quantify because it varies so much from person to person... two people in the exact same circumstances may react completely differently to their situation
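A sketch of the kind of model that sits behind "the odds of reporting high strain increase with age": a logistic regression whose exponentiated coefficients read as odds ratios. The data below are simulated for illustration, not the BHPS.

```python
# Sketch of the kind of model behind "odds of high strain increase with age":
# a logistic regression whose exponentiated coefficients read as odds ratios.
# The data are simulated here, not the BHPS.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
age = rng.integers(18, 65, n)
employed = rng.integers(0, 2, n)

# Simulate: odds of high strain rise with age and fall with employment.
logit_p = -3.0 + 0.04 * age - 0.8 * employed
strain = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

X = sm.add_constant(np.column_stack([age, employed]))
model = sm.Logit(strain, X).fit(disp=False)
print(np.exp(model.params))  # odds ratios: intercept, age, employment
```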
that food prices were positively correlated with workhouse death rates,
correlation ≠ causation, so can we conclude anything from this?
Indeed, no measure is entirely immune to outside conditions.
so if no measure is immune to outside conditions, then is any data truly representative...?
The question is inherently relative
isn't all data relative?
A workhouse in which everyone died shortly after admittance might seem badly managed, but that would not necessarily be the case if it attracted only those in the most extreme state of need.
importance of data in context
Thus the population in a workhouse is a ‘choice-based’ sample
how is it choice based if some people entered for reasons beyond their control?
Though at its peak 3 million people out of a total population of 8.5 million were dependent on relief, for people in comfortable circumstances life went on more or less as normal.
generalizing data can cause misinterpretations... importance of fragmenting society into homogeneous groups in order to get accurate data
there are ways of addressing this most likely problem that often succeed
How does one know which method of intervention is appropriate for different studies?
variation between people, not variation within individuals over time.
important distinction
over time.
does length of time have any effect? (long-run vs short-run)
Globally, there appears to be bias toward males, but when individual graduate schools are taken into account, there seems to be bias toward females.
this makes it seem like either statistic could be used in an argument depending on the motives of the person arguing... To me, this emphasizes the importance of understanding the motives behind an argument
A treatment that appears effective at the population level may, in fact, have adverse consequences within each of the population’s subgroups
How does this work? Why and how does it appear effective at the population but not in subgroups? Wouldn't the population effect be the cumulative of subgroup effects?
We show that Simpson’s paradox is most likely to occur when inferences are drawn across different levels of explanation (e.g., from populations to subgroups, or subgroups to individuals).
proves the difficulty of generalizing about an entire population; subgroups will all be slightly different
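A small worked example of the paradox described here, with invented numbers: each department admits women at a higher rate, yet the aggregate rate favors men, because men mostly apply to the department that admits a larger share of its applicants.

```python
# Worked example of Simpson's paradox with invented numbers: every department
# admits women at a higher rate, yet the aggregate rate favors men, because
# men mostly apply to the department that admits a larger share of applicants.
admissions = {
    # dept: (male applicants, male admits, female applicants, female admits)
    "A": (800, 480, 100, 70),   # admit rates: men 60%, women 70%
    "B": (100, 10, 800, 160),   # admit rates: men 10%, women 20%
}

for dept, (ma, mad, fa, fad) in admissions.items():
    print(dept, f"men {mad/ma:.0%}", f"women {fad/fa:.0%}")

m_app = sum(v[0] for v in admissions.values())
m_adm = sum(v[1] for v in admissions.values())
f_app = sum(v[2] for v in admissions.values())
f_adm = sum(v[3] for v in admissions.values())
print("overall", f"men {m_adm/m_app:.0%}", f"women {f_adm/f_app:.0%}")
```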
The sample size of Census data is also considerably larger than the sample size of GSS data
The census and GSS each have a specific tradeoff of information that the other doesn't... so is data from either really reliable or "better"?
researchers can classify respondents as same-sex behaving only if they are in a cohabiting relationship,
misses out on a big chunk of potential data from people who aren't in a cohabiting relationship
public policy mechanisms must increase the number of tolerant workplaces and not simply punish discriminating firms
proactive vs. retroactive, I feel like this is a common theme in a lot of topics
Wage differentials for same-sex behaving men are surprising because sexual orientation is not a visible trait.
interesting... not at all like wage differences between men and women, where the difference is an observable trait, then?
These states represent 68 percent of released prisoners in 2005, 69 percent in 2010, and 67 percent in 2012
relatively low percentages- problematic?
Only 23 states provided data for the entire 2005-15 time frame
Did only these states provide data because only these states had favorable data? High chance that data is skewed
collects data submitted voluntarily by state departments of corrections and parole
"submitted voluntarily"- not the best way to collect data
l and legal environment in which they must operate. To deal with such issues, it
This is how it seems reliable data should be collected
Another finding is that death did not choose people at random. Analysis of data for Trinidad indicates that the annual death rate for the shortest quintile of
How can trends in Trinidad possibly be representative of the trends across the world...?
from 1750 to the present. One of the principal findings to date is that native-born Americans reached modern levels of height and nutrition by the time of the American Revolution, but there were long periods of declining nutrition and height during the
this is just odd
This is Trantor three centuries from now
Does anything really allow us to extrapolate this far?
Add to this the known probability of Imperial assassination, viceregal revolt, the contemporary recurrence of periods of economic depression, the declining rate of planetary explorations, the
is psychohistory more theoretical or factual?
you will learn to apply psychohistory to all problems as a matter of course
applicable to literally anything?
data gathered by the corporation is not nearly as complete or accurate as that gathered by community organizations in the state
problematic
chose to purchase California eviction data covering the same areas for $100,000
data=$$
but we never received an adequate response.
This doesn't seem like a good sign
Some of the groups
implying not all groups were consulted...? Seems sketchy
we have encountered major problems with Eviction Lab’s practices of big data production
If data can be manipulated by those collecting it, is any data reliable?
it only cares that you have an identity that is addressable by Facebook.
$$$, has this mindset of data collecting changed at all? Gotten more profit driven or less? More intrusive or less?
individual behavior is regulated through the enforcement of norms
Something I've never thought about... kinda creepy
they were still engaging in an act of identification
How could they possibly not engage in an act of identification? It seems like an unavoidable fact of being a user on the site
The more complex the data structure of a computer program, the simpler the algorithm needs to be, and vice versa.
interesting
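A small illustration of the quoted tradeoff, using my own toy example: with a flat list, a membership test needs an explicit search algorithm, while a richer hash-based structure makes the "algorithm" nearly disappear.

```python
# Toy illustration of the quoted tradeoff: with a flat list, membership needs
# an explicit search loop; with a hash-based set, the structure does the work
# and the "algorithm" shrinks to a one-line lookup. Values are invented.
names = ["ada", "turing", "hopper", "lovelace"]

# Simple structure, more algorithm: scan every element.
def contains(seq, target):
    for item in seq:
        if item == target:
            return True
    return False

# Richer structure, simpler algorithm: hash lookup.
name_set = set(names)

print(contains(names, "hopper"), "hopper" in name_set)  # True True
```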
Computer algorithm games, for instance, are experienced by their players as narratives.
Clarify; is it essentially one large database but narratives exist within that database?
It is more interesting to think about the ways in which search engine results perpetuate particular narratives that reflect historically uneven distributions of power in society
This had never crossed my mind before, interesting.
Published text on the web can have a plethora of meanings, so in my analysis of all of these results,
Very good thing to note; always be wary of the context
In the case of Google's history of racist bias
Why do these biases exist? Who "put" them there?
Understand which domains are go-to when it comes to finding domain experts.
What classifies any given domain as expert? Why would some be expert and others not?