108 Matching Annotations
  1. Nov 2018
    1. There are ongoing and fierce debates on what exactly constitutes a 'persistent' identifier

      solidifying definitions is difficult; it takes time for people to agree on a final definition

    1. In reactions to the Turing test, one may easily discern a fear of machine intelligence underlying many of the counterarguments.

      limits are essential, but deciding where to draw them is a difficult question

    2. Grants in excess of fifty thousand dollars are quite rare in the humanities; grants exceeding twice that amount are by no means unusual in DH

      what's the reason for this large discrepancy?

    3. Increasingly, people who publish things online that look like articles and are subjected to the usual system of peer review need not fear reprisal from a hostile review committee

      does this imply that information is becoming less and less reliable?

    1. Latino precincts are those in which at least 50% of the registered voters are Latinos. The actual Latino population is higher, especially in areas with high concentrations of Latinos without US citizenship

      interesting

    2. including number of registered Democrats and Republicans, and ethnic breakdowns of registered voters and the voting age population. Contextual data from the 2000 census by census tract were linked with the precinct-level data.

      difficulty with people who aren't registered to vote. Also a likely difficulty with people who aren't US citizens and aren't recorded in the census

    1. Such visualizations are possible because of the datasets that have been gathered by scholars over many decades

      were all lynchings recorded? How do we know what percentage of lynchings we actually know about?

    1. One of the key contributions of STS has been to challenge the idea that science and/or technology is objective and neutral by demonstrating how scientific thought is situated in particular cultural, historical, economic, and social systems [77]

      how can science be subjective? I can see how studies themselves can be, but scientific fact is scientific fact

  2. Oct 2018
    1. has been used to collect and analyze information on more than two hundred and sixty persons (of varying degrees of suspicion) belonging variously to seven different organizations in the Boston area.

      effectiveness in processing large amounts of data easily
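
      A minimal sketch of the kind of co-membership analysis this sort of metadata makes easy, using an invented person-by-organization table: multiplying the membership matrix by its transpose yields, for every pair of people, how many organizations they share.

      ```python
      import numpy as np

      # Invented membership matrix: rows are people, columns are organizations;
      # a 1 means that person belongs to that organization.
      people = ["Adams", "Revere", "Warren", "Church"]
      M = np.array([
          [0, 1, 1],   # Adams
          [1, 0, 1],   # Revere
          [1, 0, 1],   # Warren
          [0, 1, 0],   # Church
      ])

      # M @ M.T counts, for each pair of people, the organizations they share --
      # the raw material for a "who is connected to whom" network.
      shared = M @ M.T
      for i, a in enumerate(people):
          for j, b in enumerate(people):
              if i < j and shared[i, j] > 0:
                  print(f"{a} and {b} share {shared[i, j]} organization(s)")
      ```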

    1. For instance, here’s an 1801 letter from Jefferson to William Evans, an innkeeper friend. You can see in this letter that Jefferson makes reference to a “former servant James,” underlined in green. If you click the link, it takes you to an editorial note that informs us that this “former servant” is in fact James Hemings, who, as it turns out, three years after the agreement with Jefferson was recorded and witnessed, finally received his freedom

      difficulties in available data can stem from the societal norms of the time

    2. thinks that Hemings most likely made a verbal agreement with Jefferson to return to America

      "verbal agreement", assumptions and guesses can be made but we don't know anything for sure that isn't recorded

    3. How does one identify and extract meaning from the unique set of documents that do remain

      how valid is the meaning if so many things have been undocumented or lost?

    1. Moreover, the overall increase in work on women’s history does not translate into an equitable

      outside influences affecting data or the interpretation of data?

    2. ting schema (such as Library of Congress subject headings), topic model

      allows for the analysis of massive libraries, because no previous knowledge is necessary

    3. Topic modeling, a computer science data mining technology that is arguably the state-of-the-art model for text document collections, allows for a m

      simple statistical modeling

    4. . America: History and Life (hereafter AHL) focuses on the history of the geographic regions th

      reasonable sample size, this shows an effective use of text mining

    1. By highlighting the process of topic modeling, Goldstone and Underwood reveal how different methodological choices may lead to contrasting results.

      important, not all processes of topic modeling are the same

    2. Rhody’s work is perhaps the best evidence thus-far that what we might have identified as cohesive “topics” are more complex than simple thematic connections

      important to note

    3. You’re introduced to topics, and how a computer came to generate them automatically without any prior knowledge of word definitions or grammar.

      is this done through frequency of words? Does this still rely on the reader's interpretation of the discerned topic?
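
      A rough sketch of the idea, assuming a tiny invented corpus: the model only ever sees word counts (no definitions, no grammar), and each "topic" comes back as nothing more than a ranked word list that a human reader still has to interpret.

      ```python
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.decomposition import LatentDirichletAllocation

      # Invented toy corpus; real topic models are trained on thousands of documents.
      docs = [
          "the election results and the voters in the county",
          "voters registered for the county election",
          "the river flooded the farm and the crops",
          "crops along the river failed after the flood",
      ]

      # The model sees only word counts, never word meanings or grammar.
      vectorizer = CountVectorizer(stop_words="english")
      X = vectorizer.fit_transform(docs)
      lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

      # Each "topic" is just a ranking of words; naming it is left to the reader.
      words = vectorizer.get_feature_names_out()
      for k, weights in enumerate(lda.components_):
          top = [words[i] for i in weights.argsort()[::-1][:4]]
          print(f"topic {k}: {top}")
      ```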

    4. treating the works themselves as unceremonious “buckets of words,” and providing seductive but obscure results in the forms of easily interpreted (and manipulated) “topics.”

      effective uses

    1. In many cases, opinions are hidden in long forum posts and blogs. It is difficult for a human reader to find relevant sources, extract related sentences with opinions, read them, summarize them, and organize them into usable forms.

      weakness in human ability; sometimes unbiased analysis is essential for discovering biases

    2. One of the main reasons for the lack of study on opinions is the fact that there was little opinionated text available before the World Wide Web

      why is this? I understand the fact that the web made all text way more accessible, but weren't people still writing opinionated pieces before that?

    3. In this chapter, we only focus on opinion expressions that convey people’s positive or negative sentiments

      how would a computer analyze a double negative? Or statements that are societal expressions used to convey sentiment but don't make sense literally?
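
      A toy illustration (not any particular system) of why negation is hard for literal word matching: a naive lexicon scorer reads "not bad" as negative because "bad" appears, unless the negating word is handled explicitly. The word lists here are invented.

      ```python
      # Invented mini-lexicon for illustration only.
      POSITIVE = {"good", "great", "love"}
      NEGATIVE = {"bad", "terrible", "hate"}
      NEGATORS = {"not", "never", "no"}

      def naive_score(text):
          """Count positive words minus negative words, ignoring context."""
          words = text.lower().split()
          return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

      def negation_aware_score(text):
          """Flip a word's polarity when the previous word is a negator."""
          words = text.lower().split()
          score = 0
          for i, w in enumerate(words):
              polarity = (w in POSITIVE) - (w in NEGATIVE)
              if i > 0 and words[i - 1] in NEGATORS:
                  polarity = -polarity
              score += polarity
          return score

      print(naive_score("the movie was not bad at all"))           # -1: takes "bad" literally
      print(negation_aware_score("the movie was not bad at all"))  #  1: flips on "not"
      ```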

    1. Most tools at our disposal have weak or nonexistent semantic capabilities; they count, compare, track, and represent words, but they do not produce meaning

      important

    2. Computers can help extend human reading and understanding, especially for large collections of texts that you couldn't read in a lifetime

      how would it discern the meaning of the text?

    3. With such rich and sophisticated analytic environments, do we even need to read texts anymore?

      The analysis of texts and contextual meaning, I believe, keeps reading essential. Our minds cannot be entirely replaced

    4. As a result, there is a significant inequality in the availability of digital texts, one that has a profound effect on the kinds of work that scholars are able to pursue

      limiting research

    5. struggling with how to derive meaning from texts, from high-school students researching an essay topic to journalists combing through leaked security documents, or from companies measuring social media reaction to a product launch to historians studying diversity of immigration based on more than two centuries of trial proceedings

      finding the text one is looking for is computerized, but analyzing and drawing conclusions from such text remains a human capability

    6. were it not for text-based searches of the title, description, and other metadata.

      importance of text-based search capability --> it is essential in circumstances like this, which are non-opinionated and just statistical

    7. You type in a word to search for and the interactive returns a simple bar graph that you can drop into a comment

      application of text analysis, purely statistical
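
      A minimal sketch of what such an interactive might be doing behind the scenes, assuming a hypothetical list of (year, text) records: count occurrences of the search term per year and render a crude bar chart, with no interpretation involved.

      ```python
      from collections import Counter

      # Hypothetical corpus: (year, text) pairs stand in for the real collection.
      records = [
          (1850, "the railway opened and the railway company prospered"),
          (1851, "a new railway line reached the town"),
          (1852, "the harvest failed and prices rose"),
      ]

      def term_counts_by_year(term, records):
          counts = Counter()
          for year, text in records:
              counts[year] += text.lower().split().count(term)
          return counts

      # A crude text "bar graph" of hits per year.
      for year, n in sorted(term_counts_by_year("railway", records).items()):
          print(f"{year}: {'#' * n} ({n})")
      ```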

    1. One of the stylistic breaks takes place in the 1870s (i.e. after the Civil War), the other in the 1920s (in the period of prosperity before the Great Depression); the third peak is not fully formed yet, even if one can observe an acceleration of language change at the end of the 20th century

      interesting

    2. Then we dismiss the original hypothesis, in order to test new ones: we iterate over the timeline, testing the years 1836, 1837, 1838, 1839, … for their discriminating power

      interesting that the original hypothesis is dismissed; I don't quite understand why that is

    3. The procedure randomly picks n text samples written before and after the assumed break;

      if computer functions run with such high speed, why even select n samples? Why can't they go through every sample if it's not human time and resources being sacrificed?
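
      A rough sketch of the general approach (not the authors' exact procedure): for each candidate break year, draw equal-sized random samples of texts from before and after it, train a simple classifier, and treat its cross-validated accuracy as that year's "discriminating power". Sampling n texts per side, rather than using everything, keeps the two classes the same size and lets the test be repeated on fresh draws. The corpus here is invented.

      ```python
      import random
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline

      # Invented corpus of (year, text) pairs; a real study would load actual documents.
      corpus = [(y, f"placeholder text for the year {y}") for y in range(1800, 1900)]

      def discriminating_power(break_year, corpus, n=20, seed=0):
          """Sample n texts on each side of the candidate break and score a classifier."""
          rng = random.Random(seed)
          before = [t for y, t in corpus if y < break_year]
          after = [t for y, t in corpus if y >= break_year]
          docs = rng.sample(before, n) + rng.sample(after, n)
          labels = [0] * n + [1] * n
          model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
          return cross_val_score(model, docs, labels, cv=5).mean()

      # Iterate over candidate years and keep the one the classifier separates best.
      scores = {year: discriminating_power(year, corpus) for year in range(1830, 1870, 5)}
      print(max(scores, key=scores.get))
      ```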

    4. Since none of the out-of-the-box techniques is suitable to analyze temporal datasets

      what makes temporal data sets different in analyzing? why the difficulty?

    5. is a tacit assumption that the researcher knows in advance which elements of the language are subject to change.

      this seems very important; if the researcher doesn't know, false data and conclusions could be presented

    1. Although some of these instances might refer to a body of flowing water, there is no guarantee that they all use the word in the same sense

      more difficulties with discerning the true meanings of words, because the computer programs recognizing the words don't have human awareness and context
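
      One common workaround is to pull every occurrence back into its surrounding words so a human can check the sense; a minimal keyword-in-context (KWIC) sketch over an invented sentence:

      ```python
      def kwic(text, term, window=3):
          """Return each occurrence of term with a few words of context on either side."""
          words = text.lower().split()
          hits = []
          for i, w in enumerate(words):
              if w == term:
                  left = " ".join(words[max(0, i - window):i])
                  right = " ".join(words[i + 1:i + 1 + window])
                  hits.append(f"...{left} [{w}] {right}...")
          return hits

      # Invented example: the same word form, two different senses.
      text = ("they followed the bank of the river south "
              "while the bank refused to extend their loan")
      for line in kwic(text, "bank"):
          print(line)
      ```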

    2. statistical methods in applications like search engines, spellcheckers, autocomplete features, and computer vision systems.

      better applied in definite statistical situations, not fluid interpretable situations

    3. I argue that, when applied to the study of literary and cultural texts, statistical text-mining methods tend to reinforce conceptions of language and meaning that are, at best, overly dependent on the “literal” definitions of words

      this seems obvious; how would software possibly know not to interpret something literally? It can't sense the humanity in the language

    1. The odds of reporting a high level of psychological strain increase with age

      I think a better way to do this, although difficult and maybe not possible, would be to ask the same people at, say, 10-year intervals to see how each individual's psychological strain changes with age

    2. It appears that work has a different meaning for different people.

      exactly; everyone views it differently, so it seems difficult to generalize the effects of unemployment into sweeping claims

    3. The data are taken from the 1992 British Household Panel Study (BHPS), which includes data on 7897 individuals.

      this was a long time ago... it would be interesting to repeat the study, as mental health issues seem to be growing dramatically in this day and age

    4. individual characteristics on measurements of psychological well-being

      I feel like a topic like psychological well-being is difficult to quantify because it varies so much from person to person... two people in the exact same circumstances may react completely differently to their situation

    1. A workhouse in which everyone died shortly after admittance might seem badly managed, but that would not necessarily be the case if it attracted only those in the most extreme state of need.

      importance of data in context

    2. Though at its peak 3 million people out of a total population of 8.5 million were dependent on relief, for people in comfortable circumstances life went on more or less as normal.

      generalizing data can cause misinterpretations... importance of fragmenting society into homogeneous groups in order to get accurate data

    1. Globally, there appears to be bias toward males, but when individual graduate schools are taken into account, there seems to be bias toward females.

      this makes it seem like either statistic could be used in an argument depending on the motives of the person arguing... To me, this emphasizes the importance of understanding the motives behind an argument

    2. A treatment that appears effective at the population level may, in fact, have adverse consequences within each of the population’s subgroups

      How does this work? Why and how does it appear effective at the population level but not in the subgroups? Wouldn't the population effect be the cumulative result of the subgroup effects?
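
      A small invented example may help: the population figure is a weighted mix, not a simple sum, of the subgroup figures, so if the treated and untreated groups contain very different mixes of mild and severe cases, the pooled rates can point the opposite way from every subgroup.

      ```python
      # Invented counts: (recovered, total) for treated vs. untreated patients.
      data = {
          "mild":   {"treated": (80, 100),  "untreated": (600, 800)},
          "severe": {"treated": (400, 800), "untreated": (40, 100)},
      }

      def rate(recovered, total):
          return recovered / total

      # Within each subgroup the treatment looks better...
      for group, arms in data.items():
          print(f"{group}: treated {rate(*arms['treated']):.0%} "
                f"vs untreated {rate(*arms['untreated']):.0%}")

      # ...but pooled together the ordering flips, because most treated patients
      # were severe cases while most untreated patients were mild cases.
      for arm in ("treated", "untreated"):
          rec = sum(data[g][arm][0] for g in data)
          tot = sum(data[g][arm][1] for g in data)
          print(f"overall {arm}: {rate(rec, tot):.0%}")
      ```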

    3. We show that Simpson’s paradox is most likely to occur when inferences are drawn across different levels of explanation (e.g., from populations to subgroups, or subgroups to individuals).

      shows the difficulty of generalizing about an entire population; subgroups will all be slightly different

    1. The sample size of Census data is also considerably larger than the sample size of GSS data

      The Census and the GSS each capture information that the other doesn't... so is data from either one really more reliable or "better"?

    2. researchers can classify respondents as same-sex behaving only if they are in a cohabiting relationship,

      misses out on a big chunk of potential data from people who aren't in a cohabiting relationship

    3. public policy mechanisms must increase the number of tolerant workplaces and not simply punish discriminating firms

      proactive vs. retroactive; I feel like this is a common theme in a lot of topics

    4. Wage differentials for same-sex behaving men are surprising because sexual orientation is not a visible trait.

      interesting... so not at all like wage differences between men and women, where the difference is tied to an observable trait?

    1. Only 23 states provided data for the entire 2005-15 time frame

      Did only these states provide data because only these states had favorable data? High chance that data is skewed

    1. Another finding is that death did not choose people at random. Analysis of data for Trinidad indicates that the annual death rate for the shortest quintile of

      How can trends in Trinidad possibly be representative of the trends across the world...?

    2. from 1750 to the present. One of the principal findings to date is that native-born Americans reached modern levels of height and nutrition by the time of the American Revolution, but there were long periods of declining nutrition and height during the

      this is just odd

    1. Add to this the known probability of Imperial assassination, viceregal revolt, the contemporary recurrence of periods of economic depression, the declining rate of planetary explorations, the

      is psychohistory more theoretical or factual?

  3. Sep 2018
    1. we have encountered major problems with Eviction Lab’s practices of big data production

      If data can be manipulated by those collecting it, is any data reliable?

    1. it only cares that you have an identity that is addressable by Facebook.

      $$$, has this mindset of data collecting changed at all? Gotten more profit driven or less? More intrusive or less?

    2. they were still engaging in an act of identification

      How could they possibly not engage in an act of identification? It seems like an unavoidable fact of being a user on the site

    1. Computer algorithm games, for instance, are experienced by their players as narratives.

      Clarify; is it essentially one large database but narratives exist within that database?

    1. It is more interesting to think about the ways in which search engine results perpetuate particular narratives that reflect historically uneven distributions of power in society

      This had never crossed my mind before, interesting.

    2. Published text on the web can have a plethora of meanings, so in my analysis of all of these results,

      Very good thing to note; always be wary of the context

    1. Understand which domains are go-to when it comes to finding domain experts.

      What classifies any given domain as expert? Why would some be expert and others not?