1,192 Matching Annotations
  1. Mar 2016
    1. Therefore, it is worth noting here that a large majority of the tweets in our dataset (75.3%) are retweets. In contrast, only a small minority of tweets (7.6%) contain @-mentions outside of a retweet context.

      Wow, clearly retweets are huge on Twitter. I knew it was a big portion of the traffic but didn't know it was this big.

    2. Figure 2

      I love this presentation -- is it a standard visualization technique?

    3. To account for both a site’s birth date and content production, we use as a starting point the date that a site was first crawled by the Internet Wayback Machine.

      Another interesting use of the Internet Archive: determining when a website started amassing content. It may be one of the only ways to do it without talking to the people running the website, or doing a full crawl and then looking for dates.

    4. But we have known since the 1970s that networks consisting of weak ties are valuable for other reasons.[14] Specifically, they are critical for broadly and efficiently distributing information produced by network members.

      Twitter is also all about weak ties.

    5. One can imagine a potentially different structure for this network. It could be a dense network with many reciprocal ties—conducive to building trust between connections. Such trust would be necessary if what those trafficking in Black Lives Matter-related content on the Web were trying to do was, say, organize clandestine gatherings, or circulate ideas for how to mobilize, or develop strategic action plans.

      I imagine there are some personal email archives that might resemble that type of network.

    6. VOSON

      VOSON (http://voson.anu.edu.au/) is new to me. It seems like there could be some useful functionality here for DocNow. It looks like a desktop application.

    7. As an indication of this, the Internet Archive’s Wayback Machine did not first crawl the site until October 8, 2014.

      It is interesting that presence in the IA is used as a measure of how much content the website has created. A content analysis of the site itself would be better, but probably more time-consuming, and perhaps not worth the effort.

    8. Official websites are usually extensions of individuals’ and organizations’ digital identities. Accordingly, individuals and organizations often tether their Twitter and Web accounts to freely circulate content between them.

      The connection of profiles to websites seems like an important link I hadn't really considered for DocNow. It speaks to who the content creators are, or at least who they say they are.

    9. Consistent with BLM’s origin and Twitter activity patterns, the BlackLivesMatter.com website was created on July 17, 2013—just days after George Zimmerman was acquitted for killing Trayvon Martin

      I thought the website came later. DNS records could be useful in this analysis.

    10. Ultimately, looking beyond Twitter provides a more complete account of how online media have influenced the social and political discourse around race and criminal justice, both online and off

      It is interesting that they start off social media, on the Web in general. I would have thought the progression would be to look at their data first and then expand outward from there onto the Web. But perhaps the narrative about social media requires that we first understand how the Web in general works?

    11. These questions address at both a macro level and a micro level who was heard most frequently and what they said.

      Who was heard most frequently and what they said. I wonder if the analysis at the macro level informed the analysis at the micro level. Kind of like how generalized surveys can provide a basis for doing interviews.

    12. Eric Garner

      I didn't know that the BLM story focused on Eric Garner just prior to Michael Brown. I remember the I Can't Breathe hashtag trending after Ferguson hit the major news venues.

    13. We did not divide the data into equal time units, but rather set their boundaries at points when the Twitter discussion rose and fell drastically.

      Ahah, so the episodes are just looking at the overall numbers over time. This seems like an obvious thing for DocNow to do. But it requires access to historical data.

    14. Examining the data as a series of sequential time periods allows us to capture and describe this adaptive process in greater detail than considering the entire year as a single unit

      We took a similar approach in our study of Ferguson, but it was more driven by the data we collected. I wonder how they identified these episodes?

    15. site’s search rank

      Ooh this could be a useful metric in DocNow.

    16. 136,587 websites

      Websites not resources/documents!? Dang!

    17. research software package

      This sounds like something to look at for DocNow.

    18. another of 45 keywords

      I hope these are available somewhere...

    19. Twitter, BLM participant interviews, and the open Web

      These three sources might map well to DocNow: Twitter, web content, and interviews.

    20. We would also like to stress that this report is not a work of advocacy; that said, all of the authors personally share BLM’s core concerns, which directly affect each of us and our respective families. But we do not believe that fundamentally agreeing with BLM compromises this report’s rigor or findings any more than agreeing with the Civil Rights Movement or feminism compromises research on those topics. On the contrary, our strong interests in ending police brutality and advancing racial justice more generally inspire us to get the empirical story right, regardless of how it may reflect on the involved parties

      This is a super example of being transparent and self-reflective about research motivations while still emphasizing the focus on empirical methods.

    21. The report’s specific contribution is to draw a set of conclusions about the roles online media played in the movement during a critical time in its history.

      Focus on the system of social media, and its role/use in BLM.

    22. To clarify our discussions in the following pages, then, we will use the term “Black Lives Matter” to refer to the official organization; “#Blacklivesmatter” to refer to the hashtag; and “BLM” to refer to the overall movement.

      This seems like a useful nomenclature.

    23. Studies have revealed a decades-long decline in youth civic engagement as traditionally defined: that is, interacting with established civic and political institutions such as government, long-established community organizations, and K12 civics classes.

      I didn't know about this. I wonder if it is a general trend?

    24. The general idea here is that social media helps level a media playing field dominated by pro-corporate, pro-government, and (in the United States) anti-Black ideologies.

      I like how the argument is that it is a relative leveling. It's not like there aren't power systems at play on Twitter as well, but they are different from those of traditional mass media outlets.

    25. Because not every social movement uses online media in the same ways, it is important to understand each new movement’s digital activities on their own terms.

      Are different methodologies and tools needed as well?
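The report sets its episode boundaries "at points when the Twitter discussion rose and fell drastically," but the excerpt does not say how those points were picked. That is the question raised above. For DocNow, here is a minimal sketch of one way to do it. It assumes a list of daily tweet counts and a hypothetical rule: a new episode starts whenever the day-over-day count changes by at least a factor of `ratio`. The function name `split_into_episodes` and the default threshold are illustrative assumptions, not the authors' method.

```python
from typing import List, Tuple


def split_into_episodes(daily_counts: List[int], ratio: float = 3.0) -> List[Tuple[int, int]]:
    """Split daily tweet counts into episodes at drastic rises or falls.

    Hypothetical rule (not the report's method): a new episode starts on any
    day whose count is at least `ratio` times larger or smaller than the
    previous day's. Returns inclusive (start_day, end_day) index pairs.
    """
    if not daily_counts:
        return []
    episodes = []
    start = 0
    for day in range(1, len(daily_counts)):
        prev = max(daily_counts[day - 1], 1)  # guard against division by zero
        curr = max(daily_counts[day], 1)
        change = curr / prev
        # A drastic rise or fall closes the current episode.
        if change >= ratio or change <= 1 / ratio:
            episodes.append((start, day - 1))
            start = day
    episodes.append((start, len(daily_counts) - 1))
    return episodes


if __name__ == "__main__":
    counts = [120, 130, 110, 2400, 2600, 2100, 300, 280, 250]
    print(split_into_episodes(counts))  # [(0, 2), (3, 5), (6, 8)]
```

A ratio test is used instead of an absolute difference so that the same threshold works for both quiet and busy periods. Real data would probably need smoothing first.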

    1. we also need to recognize how we are implicated here as digital researchers into the politics we purport to critique

      This is hard to do. I wonder if there are some useful techniques for achieving it.

    2. tools that seek to make visible the power relations of the digital infrastructures but that actually generate those power relations in the act of making them visible (boyd and Crawford 2012)

      Such an important point.

    1. it puts full accountability on the authority sharing the data

      I imagine the authority sharing the data could actually be a chain of custody, where the organization sharing it has acquired it from another organization.

    2. perhaps it is more ethical to acknowledge that it is happening without explicit individual consent.

      Counterintuitive, but it's an interesting position.

    3. the data would gradually become repurposed in a process that the surveillance field terms ‘function creep’, making people’s consent meaningless

      They consented to many uses, not a particular use.

    4. control often becomes defined as care in emergency situations

      And control is all about who is doing the controlling and who is controlled. These contexts can shift & drift. Data collected for one purpose can be put to another purpose once it becomes available.

    5. This argument, although in line with all existing data protection rules and norms, is problematic in a practical context. Consent without purpose limitation – knowing what one is consenting to – is widely judged to be legally (and practically) meaningless.

      The documents are just too hard to parse, and place in context.

    1. However, very little is known about how these policies translate into the actual appraisal of Web content.

      If you have evidence to the contrary please do get in touch. It would actually help me move on to another problem if this is even partially addressed by someone else's research out there.

    1. This case study performs an in-depth investigation of the way that CDRs could be used to track Ebola, revealing that they’re only useful when re-identified, invalidating anonymization as a balancing approach to privacy, and thus legal, protection.

      This is a fascinating angle. You need to know the identities for the data to be useful in these situations.

    2. there is very little evidence to suggest that CDRs, especially those that have been anonymized, are useful to track the spread of the Ebola virus

      Is there evidence that it's not?

    3. Not only were these information systems unresponsive, they were disconnected from the responders, meaning they didn’t have any ability to answer questions, provide treatment, or even refer people to facilities that could provide treatment

      This is just sad.

    4. e-mail and Google Fusion tables

      Fascinating. Lowest barrier to entry.

    5. Political relationships are one of – if not the – most determinative factor in access to both information and funding support.

      Is political here another word for power?

    6. That job is no small feat - there were more than 50 separate technology systems introduced during the response effort alone

      This sounds like an interesting study. Do the 50 systems map to 50 organizations? How were they connected up?

    7. It is practically easier and financially beneficial for humanitarian organizations to develop their own information systems, instead of focusing on building functional communication sharing.

      Sharing data is harder than not sharing it. This seems almost obvious. Less obvious is that there can be negative incentives to sharing data.

    8. they discount the value and importance of building functional communication standards and coordination frameworks.

      Long-term vs. short-term thinking; triage in the emergency room.

    9. The assumption that open and interoperable data will lead to better health response is untested, as is the assumption that mobile network data records measurably improve health system response efforts.

      Is it untested because it seems so logical? If you know person A died of Ebola, and you are able to track A's whereabouts for the last 3 weeks and see who they came into contact with, shouldn't it be possible (in theory) to identify the people A may have transmitted the disease to?

    10. That these powers are largely being outsourced to international organizations without the institutional capacity, processes, regulation, standards, infrastructure, or appropriate risk frameworks, is why we should all be concerned

      These are some powerful organizations.

    11. There has been no public presentation about whether or how mobile data information was actually used – or what the effect of that use was.

      This is particularly damning. There should be some kind of public output when the public's privacy is breached like this.

    12. due process and fair compensation

      Kind of like the Bush Administration routing around the FISA court to obtain the same information in the US.

    13. These laws, taken together, form a broad protection for the privacy of mobile data – requiring user consent or a governmental invocation of emergency powers in order to compel their release.

      OK, so it looks like the ethics were pretty clear, at least in Liberia.

    14. applicable legal frameworks

      There was probably a fair amount of pressure to act quickly to stop the spread of the disease instead of letting lawyers debate and untangle the ethics of privacy in many different legal jurisdictions.

    15. However, Ebola is not a vector-borne disease, meaning that the same probabilities aren’t a useful indicator of transmission.

      I feel like I should understand what this means, but I don't.
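The contact-tracing reasoning in the comment on the untested assumption can be made concrete with a minimal sketch. It assumes hypothetical CDR-like records of `(subscriber_id, cell_tower, hour)` and treats "contact" as appearing on the same tower in the same hour. The record layout, the function name, and the co-location rule are all illustrative assumptions. The join only works on subscriber identifiers, which echoes the case study's point that CDRs are useful only when re-identified. Tower co-location is also a coarse proxy, because Ebola spreads through direct contact with bodily fluids rather than mere proximity.

```python
from typing import Iterable, Set, Tuple

# Hypothetical CDR-like record: (subscriber_id, cell_tower, hour_bucket).
Record = Tuple[str, str, int]


def co_located_subscribers(records: Iterable[Record], index_case: str) -> Set[str]:
    """Return subscribers seen on the same tower in the same hour as index_case.

    Illustrative sketch only: tower co-location is not physical contact, and
    the lookup depends on knowing who each subscriber is.
    """
    records = list(records)
    # Every (tower, hour) slot in which the index case appeared.
    slots = {(tower, hour) for sub, tower, hour in records if sub == index_case}
    return {
        sub
        for sub, tower, hour in records
        if sub != index_case and (tower, hour) in slots
    }


if __name__ == "__main__":
    cdrs = [
        ("A", "tower1", 9),
        ("B", "tower1", 9),
        ("C", "tower2", 9),
        ("A", "tower2", 14),
        ("D", "tower2", 14),
        ("B", "tower2", 15),
    ]
    print(sorted(co_located_subscribers(cdrs, "A")))  # ['B', 'D']
```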

    1. it has been shown that they are more unreliable than considering high-confidence machine tags

      Wow, that is odd.

    2. Since we are interested in determining how many photos are taken at night on a street, we count the number of pictures that are classified as night, and the number of those that are classified otherwise.

      Does Flickr generate these tags?

    3. We gather a random sample of 7M geo-referenced Flickr pictures within the bounding box of Central London.

      How?

    4. We collect information about all the 8K Foursquare venues in London.

      How?

    1. Our evaluation and validation for three different cities with varied physical layouts shows two important results. First, our methodology constitutes a good complement to model and understand in an affordable and near real-time manner land uses in urban environments. In fact, we have shown that residential, commercial and parks & recreation areas are well identified with coverage above 70%. Also, our approach is able to identify a land use, nightlife activity, not being considered up to now by city halls. This has implications from a planning perspective as these areas usually cause noise and security problems and can move over time.

      So there is very little discussion of the kind of lens that Twitter provides. What types of people are likely to use Twitter? What types of people don't use Twitter? What types of people enable geolocation? What does this say about the findings?

    2. On the other hand, Cluster 3 is associated to very large activity peaks at night (see Figure 2(c)). These peaks happen at around 20:00-21:00PM during weekdays and between 00:00-06:00AM during the weekends. We observe that the peaks happen earlier in London and Manhattan while a little bit later in Madrid suggesting that nightlife might continue until late hours in this city. Studying the physical layout of these clusters on the city maps, we observe that they cover areas like the East Village in Manhattan; the West End in London and Malasaña/Chueca and Alonso Martinez in Madrid (see Figure 4), areas associated with restaurants, pubs and discos. All these elements suggest that this cluster might represent nightlife activities

      How is this analysis not qualitative? And isn't it influenced by the researchers' knowledge of New York City?

    3. In order to validate our land use hypotheses, we compare the evaluation results against official land use data released by the NYC Department of City Planning and the NYC Department of Parks & Recreation through the NYC Open Data Initiative (NYC, 2013);

      So did they have no idea what this research said prior to conducting the study and coming up with the hypotheses?

    4. slightly shifted in time.

      Time zones?

    5. currently about one percent of the full Firehose set of tweets

      This is not accurate.

    6. The DB index is used to evaluate clustering algorithms, where the validation of how well the clustering has been done is made using quantities and features inherent to the dataset.

      This almost makes sense to me :-)

    7. Where N is the number of clusters (neurons), wi and wj represent the position in the two dimensional space of the SOM neurons i and j, and σi and σj represent the standard deviation of the elements (geolocated tweets) included in each cluster.

      Go ahead, blind me with science.

    8. SOM is a type of neural network trained using unsupervised learning that produces a two-dimensional representation of the training samples.

      I'm confused, I thought unsupervised learning didn't have training data?

    9. (1) lack of a formal validation of the results using independent land use data; (2) the studies are presented just for one city, somehow limiting the potential generality of the proposed approach; (3) some data sources (mainly cell phone traces) have strong privacy limitations and (3) in some cases supervised approaches are used, which implies the need of having initial knowledge of the city to derive land uses

      Are 1) and 2) related? Can you produce valid results if you only look at one location?

      I wonder what 3) means. Supervised learning leans on training data, so where does this training data come from?

    10. The results are qualitatively presented and validated and no land use information is actually used

      Is this a problem?

    11. GPS data

      What devices generate this?

    12. call detail records (CDRs) from cell-phones;

      Metadata...

    13. without accessing personal details or the content of the user-generated information.

      Isn't place personal? What does this phrase mean here?

    14. and they lack a quantitative validation of the results

      I'm curious, what does this validation look like?
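The comments above ask how a SOM can be "trained" without supervision and what the DB index measures. A minimal pure-Python sketch may help, assuming synthetic 2-D points rather than the paper's tweet features. The SOM's training samples carry no labels. Training only moves each neuron's weight vector toward the samples it best matches, and nudges its map neighbours a little, which is why it counts as unsupervised. The DB index then scores the clusters using the quantities the paper names: neuron positions w_i, w_j and per-cluster spreads σ_i, σ_j. The exact formula is not in the excerpt, so this uses the standard Davies-Bouldin form, DB = (1/N) Σ_i max_{j≠i} (σ_i + σ_j) / ||w_i − w_j||. Here σ_k is taken to be the root-mean-square distance of cluster members to their neuron (an assumption), and lower values indicate tighter, better-separated clusters. Grid size, learning rate, and neighbourhood schedule are arbitrary choices, not the paper's settings.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def train_som(data: Sequence[Point], rows: int = 2, cols: int = 2, epochs: int = 200,
              lr: float = 0.5, sigma: float = 1.0, seed: int = 0) -> List[Point]:
    """Train a tiny Self-Organizing Map. No labels or targets are used."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]  # neuron positions on the map
    # Initialize each neuron's weight vector from a randomly chosen sample.
    weights = [list(p) for p in rng.sample(list(data), rows * cols)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # shrink learning rate and radius linearly
        radius = sigma * decay + 1e-3
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            # Best-matching unit: the neuron whose weights are closest to the sample.
            bmu = min(range(len(weights)), key=lambda k: math.dist(weights[k], x))
            for k, w in enumerate(weights):
                d2 = (grid[k][0] - grid[bmu][0]) ** 2 + (grid[k][1] - grid[bmu][1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # map neighbours move less
                for i in range(len(w)):
                    w[i] += lr * decay * h * (x[i] - w[i])
    return [tuple(w) for w in weights]


def assign(data: Sequence[Point], weights: Sequence[Point]) -> List[int]:
    """Map each sample to the index of its closest neuron, i.e. its cluster."""
    return [min(range(len(weights)), key=lambda k: math.dist(weights[k], x)) for x in data]


def davies_bouldin(data: Sequence[Point], labels: Sequence[int], weights: Sequence[Point]) -> float:
    """Mean over clusters of the worst (sigma_i + sigma_j) / ||w_i - w_j|| ratio."""
    used = sorted(set(labels))
    if len(used) < 2:
        raise ValueError("the DB index needs at least two non-empty clusters")
    spread = {}
    for k in used:
        members = [x for x, lab in zip(data, labels) if lab == k]
        # sigma_k: root-mean-square distance of members to their neuron (assumption).
        spread[k] = math.sqrt(sum(math.dist(x, weights[k]) ** 2 for x in members) / len(members))
    total = 0.0
    for i in used:
        total += max(
            (spread[i] + spread[j]) / math.dist(weights[i], weights[j])
            for j in used if j != i
        )
    return total / len(used)


if __name__ == "__main__":
    rng = random.Random(42)
    points = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(50)]
    points += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(50)]
    neurons = train_som(points, rows=1, cols=2)
    labels = assign(points, neurons)
    if len(set(labels)) > 1:
        print("DB index:", round(davies_bouldin(points, labels, neurons), 3))
```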

    1. highlighting regions of the text

      Like this! It's actually enabled all over this website.

    1. Another fascinating tidbit: The prominence of the Michael Brown case, relative to some other stories of police violence, is somewhat counterintuitive. The incident that led to Eric Garner's death was captured on video and took place in New York City, the nation's largest media market, while Michael Brown's death was in a tiny suburb in the Midwest. Yet, while #ericgarner was appended to about 4.3 million tweets in the study period, #ferguson showed up in 21.6 million tweets, and #michaelbrown/mikebrown was used in about 9.4 million.

      It is fascinating. I guess part of the story might be the actual boots on the ground in Ferguson, and some savvy organizers on the scene.

  2. Feb 2016
    1. The central part of this pipeline – Open Analysis – has a basic problem: what’s the use of sharing analysis nobody can read or understand? It’s great that people put their analysis online for the world to see, but what’s the point if that analysis is written in dense code that can barely be read by its author?

      Indeed.

    1. That kind of standardization could benefit non-technical users, who would become more familiar with how such projects work and what to expect.

      What if you could interrogate a story, like a bot?

    2. These tools support news organizations in their push to develop new storytelling formats that highlight the relationships between news events and help provide readers with richer context.

      Seems to me that more could be done to automate responses to people who have questions about the media, rather than just pushing information out.

    3. Typically, we imagine an all-or-nothing scenario: all with humans or all with machines. She says that’s wrong; across all kinds of industries the approach to automation has changed to focus on more assistive technologies.

      This is an important insight.

    4. Tom Kent, the AP’s standards editor, acknowledges that mistakes are an issue that the AP takes seriously—but he also points out that human-written stories aren’t error free, either.

      The difference in scale here is significant, though, isn't it?

    5. Patterson says it’s wrong to blame automation for that kind of error. “If the data’s bad you get a bad story,” she says

      What happened to the actual stock price? Isn't this sort of thing what some people think triggered the real estate mortgage crisis?

    6. Now, the majority of stories go live on the wire without a human editor’s review

      Except for the large companies with huge legal teams? :-) You could imagine hooking up some litigation database to determine the likely negative consequences of misprinting something. Now that would be weird, right?

    7. this kind of information is easily handled by digital systems

      What was it about this data that made it easy to handle? Its dependence on numbers and statistics?

    8. Information of all types is increasingly accessible in the form of “structured data”—predictably organized information, like a spreadsheet, database, or filled-out form. This makes it well suited for analysis and presentation using computers.

      Is there any chance of journalists publishing structured data? Or are there too many forces working against it? Bots originally were for reading data on the Web, before they were creating it.

    9. Companies like Bloomberg and Thomson-Reuters have built empires on their ability to provide market data to business readers

      The data is now going to be extremely valuable.

    10. who translated those story models into code the computer could run to create a unique story for each new earnings release

      What do these story models look like? How are they translated into code?

    11. With new tools for discovering and understanding massive amounts of information, journalists and publishers alike are finding new ways to identify and report important, very human tales embedded in big data.

      Human in the loop, telling new types of stories.

    12. Automation is also opening up new opportunities for journalists to do what they do best: tell stories that matter.

      This is the dream, right?

    1. What must we know about how bots work in order to trust them?

      Fascinating question.

    2. ad-infested intermediary page

      LOL

    3. Beyond simply informing the reader, news bots can have more complex functions, such as: reporting/recommending breaking news (@WikiLiveMon, which draws on Wikipedia edits, recommends “breaking news candidates” based on the frequency of article edits in a given time period).

      Seems like it could be difficult to get at the why question.
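The quoted passage describes a bot that recommends "breaking news candidates" based on how often an article is edited within a given time period. A sliding-window counter is one plausible reading of that. What follows is a minimal sketch, not the bot's actual implementation, whose rules the excerpt does not give. It assumes a hypothetical stream of `(timestamp_seconds, article_title)` edit events arriving in time order, with arbitrary `window` and `min_edits` thresholds.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

# Hypothetical edit event: (timestamp in seconds, article title).
Edit = Tuple[float, str]


def breaking_candidates(edits: Iterable[Edit], window: float = 300.0,
                        min_edits: int = 5) -> List[Tuple[float, str]]:
    """Flag an article the first time it gets >= min_edits edits within `window` seconds.

    Assumes edits arrive sorted by timestamp. The thresholds are illustrative.
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    flagged = set()
    candidates = []
    for ts, title in edits:
        times = recent[title]
        times.append(ts)
        # Drop edits that have fallen out of the sliding window.
        while times and ts - times[0] > window:
            times.popleft()
        if len(times) >= min_edits and title not in flagged:
            flagged.add(title)
            candidates.append((ts, title))
    return candidates


if __name__ == "__main__":
    stream = [(0, "Quake"), (30, "Quake"), (60, "Cat"), (90, "Quake"),
              (120, "Quake"), (150, "Quake"), (1000, "Cat")]
    print(breaking_candidates(stream))  # [(150, 'Quake')]
```

A frequency count like this says *that* an article is suddenly busy, not *why*. The reason still has to come from reading the edits themselves.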

    1. Most studies find either no, or only very small, effects of personalization on search results.

      Then why are they doing it?

    1. Her willingness to share information has little to do with me or my ridiculous consent form, but it is about a kind of openness or mutual exchange.

      Information flows both ways. I can see why this would be important.

    2. And yet. Who is being protected by this consent form?

      Indeed, it's not operating to protect the participant. It's protecting the institution.

    1. The need for multiple local grain mills waned as the economy shifted with the arrival of the railroads from grain to beef production, and flour became readily available for purchase.

      It doesn't seem like they needed this study to discover this though.

    2. Farms and villages were most often connected by footpaths

      Good times.

    3. One of the main factors influencing the change in the agricultural economy that began in the mid-nineteenth century was a nation-wide expansion of the road system

      How do they know this if they didn't study it? {{Citation Needed}}

    4. The pond

      That is a pond!?

    5. We were able to walk in the field and see exactly where we were on the old maps, and found several new features, including old roadways and the remains of structures related to mills and ponds that are no longer extant.

      Maps lend themselves to some forms of archaeology.

    6. We have conducted research in this area for over 35 years

      The authors have worked together for 35 years?

    7. We highlight different successful strategies for integrating data that are often incongruent in scales of time and space, as they have been created for widely differing purposes with variable content, and uneven temporal and spatial coverage.

      What does this even mean?

    8. transdisciplinary

      Wait, now it is transdisciplinary instead of interdisciplinary? Make up your minds!

    9. but the overall result falls short of atruly integrated effort

      How do you know if they are integrated or not?

    10. integrate

      Is a single paper that takes those approaches considered integrated?

    11. multidisciplinary

      How is multidisciplinary different from interdisciplinary?

    12. The third group covers other interdisciplinary historical landscape research

      Doesn't this paper fall in that group? It seems like they are going out of their way to distinguish their approach as unique. Perhaps going a bit too far?

    1. Given that it is usually not feasible to seek additional consent, a professional judgement may have to be made about whether reuse of the data violates the contract made between subjects and the primary researchers (Hinds, Vogel and Clarke-Steffen 1997).

      Is this a consideration an IRB can make?

    1. One problematic answer is that everything is emergent, everything is interrelated, and therefore “it is complex” and nothing can be understood, let alone done.

      Sounds like Latour?

    2. In contrast to the traditional assumption that the organization is the bounded container for work and technology, many scientists work at a university but identify professionally with their discipline—an invisible academic college of similar scholars

      Isn't this paper, with authors from multiple organizations, an example of neo-STS work?

    3. Next we show how they can be applied to yield valuable insights into three novel working arrangements:

      This is why it won best paper: putting the idea into practice, not just a reformulation of prior research.

    4. The resulting work systems are better described as a “negotiated order” among different organizations and individuals (Strauss, 1978).

      Negotiated order from ethnography?

    5. new forms that are greater than their components

      Interesting way of looking at things. It seems like another way of looking at innovation?

    6. Context collapse in social networking systems (Vitak, 2012) is not the elimination of boundaries so much as their reconfiguration.

      Where context is an organization?

    7. not necessarily encapsulated within single organizations

      Ahah, yes -- a single organization is, and kind of always was, a myth or mirage.

    8. traditional organization

      Yes, traditional -- but that doesn't mean there is no organization.

    9. Free and Open Source Software (FLOSS) provides free, reusable, easily available software[7] and these infrastructure-based technologies are enduring even without an encapsulating organization to design and manage them.

      Are they really saying there are no organizations behind these FOSS projects?

    10. work is no longer tied to systems built and managed by a single organization, but is rather enabled by broader infrastructures

      Couldn't our idea of what an organization is be changing?

    11. informal communities of practice

      Aren't these still organizations?

    12. erosion and transcendence

      Odd juxtaposition of words.

    13. Much of the work in participatory design and end user involvement is rooted in the STS tradition, “jointly optimizing” the technology and related work with attention to the overall user conditions within an organization

      I didn't expect the connection to participatory design...

    14. IS scholarship that is rooted in the STS tradition will be limited in its ability to address the organization of work outside of traditional organizational containers. Organizations no longer create and control many of the IS their workers rely on. Infrastructures and systems exist outside of and independent of the organizations that use them.

      Innovation happens elsewhere.

    15. The technology reaches beyond traditional local sociotechnical ensembles, across large numbers of organizations, and shapes industries, institutions, and society.

      The World Wide Web.

    1. Find ways to integrate individual career needs with the achievement of team goals

      This seems super important.

    2. Of all the aspects of team science, sharing recognition and credit is among the most difficult to master.

      What is an author again ...

    1. to lead

      Could this idea of leadership be a bit antiquated?

    2. The emergent nature of these ad-hoc scientific collaborations surfaces the final important trend – the increased participation of the non-professional (Shirky 2008)

      Ahh, here is an anti-disciplinary thread.

    3. interdisciplinary, multidisciplinary and transdisciplinary

      Reminds me of Joi Ito's anti-disciplinary :-) how does that fit?

    4. the

      Is it the future or a future?

    5. All too often VSOs underestimate both the complexity and importance of their technical infrastructure, only to be frustrated later when spending more time debugging software than doing science.

      This sounds like infrastructure work instead of scientific work? Building foundations so that science can be done?

    6. ill-fitted to the tasks at hand

      HCI and usability interests here. How can tools be built for the tasks required of VSOs?

    7. Amateur astronomers or ornithologists are a classic example. Here the asset maps, governance structures and knowledge flows are more diffuse.

      Andrea Wiggins's work with ornithology -- I wonder how her findings relate to these VSOs?

    8. Optimal choices in assets, governance, and knowledge flow will also vary by the scale and scopeof the endeavor.

      Are there good examples of scaling up & down these organizations?

    9. In a short period of time, these VSOs need to quickly determine what assets they have, who makes decisions and who needs to talk to whom.

      How can artifacts & results outlive the VSO?

    10. The longevity of VSOs can also vary from temporary through recurring to permanent.

      What has happened with the Human Genome Project in the last 13 years? Is the data still open? How is it being used and built upon?

    11. Systems that support different-time, different-place interactions preserve the history of interactions in repositories, blogs, and wikis.

      Is it interesting/useful to think of timbl's World Wide Web as a product of VSO work?

    12. governance decisions can be codified into the technologies used by the VSO

      Software/platform studies relevant here?

    13. Do research teams that contributed data to the shared human genome database receive credit for their intellectual contributions that is equivalent to a journal publication?

      Reminds me of the What is an Author piece from last week.

    14. periodic face-to-face meetings of all 20 centers and weekly conference calls between the 5 largest centers to share advances in a "lab meeting" format.

      This doesn't sound so virtual.

    15. The DOE championed the project as a means of tracking mutations caused by radiation

      Weird. Was this some kind of perceived risk?

    16. Secondly, intellectual resources are becoming more evenly distributed around the world.

    1. However, this restriction will not apply in the event of the occurrence (certified by the United States Centers for Disease Control or successor body) of a widespread viral infection transmitted via bites or contact with bodily fluids that causes human corpses to reanimate and seek to consume living human flesh, blood, brain or nerve tissue and is likely to result in the fall of organized civilization.

      I guess sometimes a corner case is a coroner case.

  3. inst-fs-iad-prod.inscloudgate.net
    1. However, we would have the same confidence in her empirical findings as we do in Alfonse’s statements that stereotype threat reduces performance.

      It's funny that I wouldn't typically think of "empirical" as applying here, but it does. Just because the findings may not extend to a large population, it does not mean they aren't empirical evidence.

    2. The objective is saturation. An important component of case study design is that each subsequent case attempts to replicate the prior ones. Through ‘literal replication’ a similar case is found to determine whether the same mechanisms are in play; through ‘theoretical replication’ a case different according to the theory is found to determine whether the expected difference is found. Sampling logic is superior when asking descriptive questions about a population; case study logic is probably more effective when asking how or why questions about processes unknown before the start of the study.

      How or why questions.

    1. A second criticism is that the very idea of quantifying scientific impact is misguided.

      BINGO.

    2. Coming years will see evaluators playing an academic version of Moneyball (the statistical approach to US baseball): instead of trying to field teams of identical superstars, we will leverage nuanced impact data to build teams of specialists who add up to more than the sum of their parts.

      How will the functioning of these Moneyball algorithms be vetted and verified? Who will own them? Will they be businesses or infrastructural services we all invest in?

    3. RECONSTRUCTING PUBLISHING

      Where is the identity of the researcher in all this?

    4. Qualitative peer review will move into the open and become yet another public scholarly product — complete with its own altmetrics, citations and even other peer reviews

      Qualitative and altmetrics -- what does that even mean?

    5. As the former constituents of the article — data, tables, figures, reference lists and so on — fracture and dissolve into the fluid Web, they will leave behind the core of the article: the story.

      What does it mean for this article to be appearing in Nature -- one of the granddaddies of scientific publishing? Is it a red herring?

    6. Conversations, data collection, analysis and description will be born published.

      So being on the Web means being published. This makes a lot of sense. But are all Web resources created equal?

    7. This core approach is also the future for scholarly communication.

      What about the ability to trace ideas and their influences through citations? Isn't this still very important to research? What are the parallels in the altmetric view? Aren't altmetrics just another view on things like a journal's impact factor, which is really just a number?

    8. views on figshare, mentions in discussions between colleagues on Twitter, saves in a reference manager such as Zotero or Mendeley, citations in an open-access preprint, recommendations on Faculty of 1000, and many more

      altmetrics

    9. The Web opens the workshop windows to disseminate scholarship as it happens, erasing the artificial distinction between process and product.

      Interesting to see process and product used together here: MPLP (More Product, Less Process).

    1. Subjects in our experiment created solutions to a challenging bioinformatics problem that both industrial and academic labs face, and one that has been subject to a process of cumulative innovation outside of our experiment.

      It seems like the pressures in industrial labs are such that an open system would be very difficult to sustain.

    2. 10-digit binary code

      bizarre

    3. speed and accuracy

      Are the solutions known then?

    4. Subjects in our experiment had to develop de novo operational algorithms, written in computer code

      Why would anyone want to do this? What were the incentives?

    5. In open source software projects, all manner of software development instructions are instantly made available for others to see and reuse, as developers make submissions to the code base (Lerner and Tirole 2005; Lerner and Schankerman 2011)

      Many eyes.

    1. Moreover, an analysis using game theory by Engers et al. (1999) has suggested that alphabetical author lists can theoretically result in research of a poorer quality than if contribution strength is signaled by author order.

      This is kind of surprising, or at least unexpected. I wonder how game theory is used to measure quality...

    2. All authors should be able to defend the research presented in a publication if, for example, they are challenged by a colleague from another experiment at a conference or meeting. Thus, people not closely familiar with the work should not be listed as authors (or at least as “major” authors)

      This seems like a good idea...

    3. These contributions can be called “infrastructural.”

      Infrastructure is normally invisible. I guess this hyperauthorship model is making the infrastructure visible?

    4. In addition, some physicists described a need to be known as somebody who can come up with novel solutions to difficult problems. It is interesting and important to note that these problems need not be discovery-oriented. To be sure, there is significant value in solving, for example, an analysis problem that leads to a major discovery. At the same time, though, many participants indicated that a novel solution to a difficult detector design or construction problem also could carry significant weight.

      This need to be known may not require traditional scholarly publishing: perhaps blogs, software, and reputation matter too?

    1. I am suggesting that we take the time to rethink past research projects and connect this research with our new projects.

      Looking backwards as well as forwards...

    2. action research, participatory research

      I think I need a better understanding of action research. It seems like it could be particularly useful for the study of how software systems are designed and implemented (re: agile).

    3. praxis theory conceptualized by John Dewey in his book Experience and Education

      Might be a good one to read...

    4. Researchers often find themselves in the predicament of studying issues they have minimal tacit, intuitive, or experiential understanding.

      I would've thought researchers would be studying things they were interested in, and had some experience with. Perhaps that's a function of my age.

    5. a person must have a practical sense of the domain within which a phenomenon is situated in order to develop understanding

      I like this. It seems counter to previous notions from Grounded Theory that a researcher should have no biases from previous literature during data gathering.

    1. Beginning by identifying a hashtag of interest

      Identifying the hashtag of interest is in itself an interesting question, and one we're hoping to examine in DocNow. Could there be an iterative process or heuristic for deriving a good query?

    2. This might speak to a new collection strategy?

      It might be worth referencing TwitterVane, EventsArchive and iCrawl work here? Or not :-D

    3. Future historians may have difficulty studying the online advertisements – annoying as they can be – of our day.

      Unless the original data is deposited somewhere where it can be studied?

    4. prevent

      prevents

    5. Twitter’s term for turning a Twitter up to 100 Tweet IDs

      wording here is awkward

    6. only 20.34% or 68,112 existed at all in the Wayback Machine

      This sounds similar to what we saw with the Ferguson data. Very interesting! I sometimes worry that part of this are the awful query params that get added for tracking purposes.
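
      If tracking parameters are part of why so few URLs matched, one option is to normalize URLs before looking them up. The sketch below is a minimal illustration, assuming (hypothetically) that the parameters to drop are the common `utm_*` analytics tags; `strip_tracking` is an illustrative name, not part of any tool mentioned here, and it relies only on Python's standard `urllib.parse`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only query parameters with these prefixes count as tracking noise.
TRACKING_PREFIXES = ("utm_",)


def strip_tracking(url: str) -> str:
    """Return `url` with tracking query parameters removed, keeping all others."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL; urlunsplit omits the "?" when the query is empty.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    print(strip_tracking("http://example.com/story?id=7&utm_source=twitter&utm_medium=social"))
```

      Whether this actually raises the Wayback match rate would need to be checked against the data; some sites put the content identifier itself in the query string, which is why the sketch removes only an explicit list of prefixes.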

    7. Can't embed the actual map here.

      Great idea! You can always screenshot, and link the screenshot too.

    8. for date in dates:
           date_plus_one = date + pd.DateOffset(1)
           pretty_print = date.to_pydatetime().strftime('%Y%m%d')
           filename = 'elxn42-tweets-' + pretty_print + '.json'
           f = io.open(filename, 'w', encoding='utf-8')
           for line in fileinput.input():
               tweet = json.loads(line)
               created_at = dateutil.parser.parse(tweet["created_at"])
               created_at = created_at.astimezone(eastern)
               if ((created_at >= date) and (created_at < date_plus_one)):
                   f.write(unicode(json.dumps(tweet, ensure_ascii=False) + '\n'))

      Just as an aside, it might be a lot faster to take one pass through the tweets file and check which date each tweet falls on, rather than taking a full pass through the data for each date.
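
      A single-pass version of the date-splitting loop might look like the following sketch. It assumes line-delimited JSON using Twitter's v1.1 `created_at` format, uses only the standard library (so `datetime.strptime` stands in for `dateutil`), and keeps one open output file per day; `split_by_day` and its parameters are hypothetical names.

```python
import json
import os
from datetime import datetime, timezone, tzinfo
from typing import Dict, Iterable, TextIO

# Twitter API v1.1 timestamp format, e.g. "Mon Oct 19 23:59:00 +0000 2015".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def split_by_day(
    lines: Iterable[str],
    out_dir: str,
    tz: tzinfo = timezone.utc,
    prefix: str = "elxn42-tweets-",
) -> Dict[str, int]:
    """Write each tweet to a per-day file in one pass; return tweet counts per day."""
    handles: Dict[str, TextIO] = {}
    counts: Dict[str, int] = {}
    try:
        for line in lines:
            if not line.strip():
                continue  # skip blank lines
            tweet = json.loads(line)
            created_at = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
            # Convert to the target time zone before taking the calendar day.
            day = created_at.astimezone(tz).strftime("%Y%m%d")
            if day not in handles:
                # Open each day's file lazily, the first time a tweet for it appears.
                path = os.path.join(out_dir, prefix + day + ".json")
                handles[day] = open(path, "w", encoding="utf-8")
            handles[day].write(json.dumps(tweet, ensure_ascii=False) + "\n")
            counts[day] = counts.get(day, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

      On Python 3.9+, a call like `split_by_day(fileinput.input(), ".", tz=zoneinfo.ZoneInfo("America/Toronto"))` would approximate the original's Eastern-time grouping, provided the system has time zone data installed.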

    9. TwitterEthics Manifesto,

      Nice, I had not run across this before!

    10. fair dealing as a spectrum

      Is this lawyerese for something, or perhaps misphrased?

    11. twarc.py --stream "#elxn42" > elxn42-stream.json

      Syntax for this recently changed in v0.5.0 -- instead of --stream you now use --track. This is the parameter that Twitter use in their documentation. It was introduced to allow other streaming filter parameters to be added: --follow and --locations.

    12. as we do with all source bases

      Is it worth calling out selection bias here?

    13. Military historians will have access to the voices of soldiers, posting from overseas missions and their bases at home. And political historians will have a significant opportunity to see how people engaged with politicians and the political sphere, during both elections and between them. The scale boggles. Modern social movements, from the Canadian #IdleNoMore protest focusing on the situation of First Nations peoples to the global #Occupy movement that grew out of New York City, leave the sorts of records that would rarely, if ever, have been kept by previous generations

      💖 these examples ... the whole paragraph rocks

  4. Jan 2016
    1. And maybe my desire to submerge myself in that sediment, to weave The Cloud into the timelines of railroad robber-barons and military R&D, emerges from the same anxiety that makes me go try to find these buildings in the first place: that maybe we have mistaken The Cloud's fiction of infinite storage capacity for history itself. It is a misunderstanding that hinges on a weird, sad, very human hope that history might actually end, or at least reach some kind of perfect equipoise in which nothing terrible could ever happen again. As though if we could only collate and collect and process and store enough data points, the world’s infinite vaporware of real-time data dashboards would align into some kind of ultimate sand mandala of total world knowledge, a proprietary data nirvana without terror or heartbreak or bankruptcy or death, heretofore only gestured towards in terrifying wall-to-wall Accenture and IBM advertisements at airports.

      I love how this paragraph unpacks the metaphor of The Cloud! For some reason I found myself thinking about the Pyramids at Giza afterwards: monuments to an almost insane, but deeply human ambition.

    1. I say, fuck it.

      YASSS!

    2. Fuck Nuance*

      YASSS!

    3. Durkheim theorized like a pig for most of his career, snuffling through Philosophy and Anthropology to emerge, covered in dirt, with a few truffle-like ideas that he relentlessly pushed because they were so empirically productive.

      Wow, what an image -- Durkheim as pig. I'm not sure the shoe fits though. He looks more raven-like to me.

    4. Davis’s account is usefully dialogical. He has a convincing explanation of how interestingness depends on the relationship between the theoretical claim being made, the position of the person making it, and the composition of the audience hearing it. The same idea may be interesting or dull depending on these relationships.

      Seems like an interesting article to follow up on (hahahah). Seriously though, this argument for context as an important determinant of whether a theory is interesting or not seems spot on.

    1. historical memory is still in formation

      Is it really still in formation? Just because archives are perhaps rare doesn't mean historical memory is new.

  5. Dec 2015
    1. There’s a neat side effect to having a part of your system that knows all of the possible actions. A new developer can come on the project, open up the action creator files and see the entire API — all of the possible state changes — that your system provides.

      Having a comprehensive list of events that can take place in an application does seem like a really useful view to have, especially in an event driven programming environment like the modern Web browser.
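
      The quoted passage is about JavaScript action creators; as a rough, language-neutral illustration, the sketch below mimics the pattern in Python. Everything in it (the annotation actions, `reducer`, the dict-shaped actions) is hypothetical, meant only to show how one module of action creators can double as a list of every state change an app allows.

```python
from typing import Any, Dict

# The complete list of state changes this hypothetical app supports.
ADD_ANNOTATION = "ADD_ANNOTATION"
DELETE_ANNOTATION = "DELETE_ANNOTATION"

Action = Dict[str, Any]
State = Dict[str, str]  # annotation id -> annotation text


def add_annotation(ann_id: str, text: str) -> Action:
    """Action creator: describe adding an annotation."""
    return {"type": ADD_ANNOTATION, "id": ann_id, "text": text}


def delete_annotation(ann_id: str) -> Action:
    """Action creator: describe deleting an annotation."""
    return {"type": DELETE_ANNOTATION, "id": ann_id}


def reducer(state: State, action: Action) -> State:
    """Compute the next state from an action without mutating the old state."""
    if action["type"] == ADD_ANNOTATION:
        return {**state, action["id"]: action["text"]}
    if action["type"] == DELETE_ANNOTATION:
        return {k: v for k, v in state.items() if k != action["id"]}
    return state  # unknown actions leave the state unchanged
```

      Reading only the action-creator functions (`add_annotation`, `delete_annotation`) tells a newcomer every change the state can undergo, which is the property the passage highlights.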

    1. Contemporary literary and cultural theorists would take pains to deny any claim that linear perspective painting, photography, film, television, or computer graphics could achieve unmediated presentation.[27] For them the desire for immediacy through visual representation has become a somewhat embarrassing (because under-theorized) tradition.[28] Outside of the circles of theory, however, the discourse of the immediate has been and remains culturally compelling.

      This seems like a familiar disconnect. Is this sort of dichotomy the way that academe becomes a thing? Or is there more to it?

    2. observers cannot distinguish these images from photographs.

      how quaint!

    3. The desktop metaphor, which has replaced the wholly textual command-line interface, is supposed to assimilate the computer to the physical desktop and materials (file folders, sheets of paper, in-box, trash basket, etc.) familiar to office workers.

      Textual interfaces are still used heavily by a lot of users of computers.

  6. Nov 2015
    1. First, it posts many more HITs than are actually required at any time because only a fraction will actually be picked up within the first few minutes. These HITs are posted in batches, helping quikTurkit HITs stay near the top. Finally, quikTurkit supports posting multiple HIT variants at once with different titles or reward amounts to cover more of the first page of search results.

      Interesting technique. I guess this adds to the cost?
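
      One way to reason about the overposting trade-off is a simple binomial model. This is an assumption for illustration, not quikTurkit's documented logic: if each posted HIT is independently picked up within the first few minutes with probability `p`, how many must be posted so that at least `k` are picked up with a given confidence? The function names are illustrative.

```python
from math import comb


def prob_at_least(n: int, k: int, p: float) -> float:
    """P(at least k of n posted HITs are picked up), assuming independent pickups."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hits_to_post(k: int, p: float, target: float = 0.95, max_n: int = 1000) -> int:
    """Smallest number of HITs to post so that P(>= k pickups) reaches `target`."""
    # max_n is capped so binomial coefficients stay within float range.
    for n in range(k, max_n + 1):
        if prob_at_least(n, k, p) >= target:
            return n
    raise ValueError("target not reachable within max_n")


if __name__ == "__main__":
    # Hypothetical case: 3 workers needed, only 1 in 5 HITs picked up quickly.
    print(hits_to_post(3, 0.2))
```

      Under this model the extra HITs buy speed; whether they also add cost depends on how unclaimed HITs are billed, which the quoted passage does not say.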

    1. Why is a device iPod-like, but not an iPod?

      Good question!

    2. If assistive devices mark users as “other,” this may create social barriers to access even while such devices should help overcome them.

      One step forward, two steps back.

    3. The movement rejected the idea of disability as a medical condition,

      Interesting paradigm shift.

    4. As researchers who do not have disabilities

      Interesting personal statement.

    5. Google Faculty Research Award

      Kind of funny that the direct implications are for Apple :-D

    6. Finally, we aggregated the data based on videos rather than on individual users, an approach that could bias results toward users who had uploaded many videos

      This is a very important point.

    7. As such, there was little representation from users who cannot use a touchscreen at all.

      This is an important point.

    8. Users had made them out of different materials to keep from hitting other buttons on the screen, but these materials were often not very sturdy (e.g., paper, cardboard).

      This adaptation reminds me of Jackson's work on repair.

    9. We did not see any instances of pinch-to-zoom or other multitouch gestures in the videos. The built-in accessibility feature on iOS devices called AssistiveTouch could support these interactions. However, this feature was not used in a single video and only 3 survey respondents had ever used it.

      Wow!

    10. 6 users who responded to our survey had never heard of the iOS feature known as AssistiveTouch

      This seems significant, given the number of responses.

    11. Category of application(s) used

      I even wonder about the name of the application used.

    12. were the user from the video themselves, while the other 9 were caregivers or relatives answering for the main user.

      Is it important that the responses are mediated?

    13. Video Purpose

      I'd be interested to know for what purposes the videos were uploaded to YouTube, and what the coding options were.

    14. the iPad dominated the videos in the dataset

      Interesting. I wonder if this was expected? I would've expected mobile devices instead of tablets.

    15. The diversity of our dataset also highlights that YouTube can be a rich source of data for similar work.

      Was any thought put into the selection bias involved in using YouTube as a data source? Users who are able to record and upload video are likely to be more proficient with technology.
