1,134 Matching Annotations
  1. Aug 2016
    1. Latour, B. 2005. Reassembling the social: An introduction to actor-network-theory. Oxford University Press, USA.

      Maybe it would be good to read this one again?

    2. Woolgar, S. 1991. The turn to technology in social studies of science. Science, Technology & Human Values 16(1): 20.

      Overview of STS use in IS research.

    3. MacKenzie, D., and J. Wajcman. 1985. The social shaping of technology. Buckingham: Open University Press.

      Seminal STS work for studying ICT.

    4. Arguably, critical social theory in these studies is mostly rooted in the work of Jurgen Habermas and Michel Foucault.

      Two key social theorists used by IS folks.

    5. meta-theory:

      Meta-theory could be interesting, but difficult to use.

    6. What this “relation to machines” might be, and how it affects social actors’ practices, however, is not elaborated in structuration theory and analysis of these properties remains largely underdeveloped (Jones and Orlikowski 2007).

      This could make it problematic for looking at web archiving systems and archivists.

    7. Schultze, U., and W. J. Orlikowski. 2004. A practice perspective on technology-mediated network relations: the use of internet-based self-serve technologies. Information Systems Research 15(1): 87.


    8. DeSanctis, G., and M. S. Poole. 1994. Capturing the complexity in advanced technology use: Adaptive structuration theory. Organization Science: 121-147.

      First citation for adaptive structuration.

    9. Mumford, E. 2006. The story of socio-technical design: reflections on its successes, failures and potential. Information Systems Journal 16(4): 317.

      Could be a useful historical overview from one of the key theorists.

    10. Lower status positions (service, support and administration) were often outsourced or transferred to temporary employment and given far less voice or status in the contemporary firm.

      I wonder if it could be useful to think about archival work in this context?

    11. Carr, N.G. 2008. The big switch: Rewiring the world, from Edison to Google. New York: WW Norton & Company.

      The trend away from participatory design.

    12. However, the sociotechnical ideals of the Tavistock Institute found fertile ground in Scandinavian countries. In the late 1960s, “the Norwegian Industrial Democracy Projects” introduced the principle that technology innovation should improve work practices along with productivity measures (Thorsrud 1970). This was meant to empower employees to organize their own jobs. In the 1970s, figures like Kristen Nygaard – and more recently Bo Dahlbom, Pelle Ehn, Erik Stolterman and their students – pioneered the Scandinavian approaches to the social analyses of computing.

      Connection between Tavistock Institute and Norwegian participatory design principles.

    13. The essence of the early sociotechnical discourse in IS is found in the proceedings of the “Human Choice and Computers” conference (Mumford and Sackman 1975).

      Could provide some useful information on early formulations in sociotechnical theory.

    14. The second attribute of the Tavistock approach was the importance of worker involvement. At its core, Tavistock’s sociotechnical approach was interventionist and activist. As we note below, this orientation to workers’ interests and activism underlies the action-research orientation of Enid Mumford’s ETHICS and Peter Checkland’s Soft-Systems Method (SSM) (Checkland 1995; Mumford and Weir 1979), the participatory design principles that characterize the Scandinavian and Nordic scholarship, and perhaps some of the more contemporary design-centric approaches to IS

      The connection to participatory design is interesting here. I suspected, but didn't know that sociotechnical approaches had something in common with participatory design.

    15. Tavistock scholars advocated equal attention should be paid to providing a satisfactory work environment for employees. In this regard, the main innovation of the Tavistock research was the design of technology-supported work arrangements that could enrich work practices using multi-skilled jobs with workers organized into teams.

      The focus on environmental concerns was new -- fitting technologies to their social situation, rather than the other way around.

    16. In contrast to a-contextualized and de-temporalized approaches, the sociotechnical perspective is premised on the embedding of the ICT/IS into the more complex world of situated action: a world that is tightly tied to the characteristics of where the actions occur.

      Is situated action a pre-existing field of work that STS draws on?

    17. The underlying premise of mutual constitution is co-evolution among that which is technological and that which is social. The focus on interdependency among technology and human organization is done by attending to material triggers, actions of social groups, pressures from contextual influences, and the complex processes of development, adoption, adaptation, and use of new (digital) technologies in people’s social worlds (Jones and Orlikowski 2007).

      A nice definition of mutual constitution in sociotechnical theory. The reference might be good to follow up on.

    1. The poem is something seen, not just conceived

      It is discovered more than created?

    1. “There is no one ‘magic algorithm’ for identifying terrorist content on the Internet,” the company said. But it deploys technologies such as proprietary spam-fighting tools to supplement reports from the public to help identify people who violate Twitter’s user policies. During the past six months, these tools have helped the firm to automatically identify more than one-third of the accounts that were ultimately suspended for promoting terrorism, the company said.

      It would be interesting to know more about how Twitter is doing this work. If anyone has any tips or leads I'd love to hear from you.

    1. Or so I thought! After deleting about 3200 tweets, however, I started hitting the end of the road. I couldn’t go back any further. I was super confused–why wouldn’t I be able to see my new 3200 most recent tweets, now that all the other crud had been deleted? Was it a caching issue? Had I hit a speed limit? Then I read the documentation more closely: The value of count is best thought of as a limit to the number of tweets to return because suspended or deleted content is removed after the count has been applied. Meaning: Twitter doesn’t actually delete your tweets, I guess. I suppose they just mark them as deleted, but keep those tweets you don’t want anymore to themselves. If that’s the case, that’s pretty creepy to me–and also kind of hilarious, since I’m pretty sure not deleting deleted tweets is against their TOS. Here’s what Twitter said while talking about Politwoops, a service that showed politicians’ deleted tweets that Twitter shut down:

      This is a really interesting finding, that content is not deleted but simply blocked for access. What are the implications for things like right to be forgotten?
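The documented semantics can be sketched without touching the API at all: `count` caps how many stored tweets are considered *before* deleted ones are filtered out, which would explain why the author ran out of road early. A toy model of that behavior (my own sketch, not Twitter's code):

```python
def fetch_timeline(stored_tweets, count):
    """Toy model of the documented behavior: 'count' limits how many stored
    tweets are considered, and deleted ones are removed *after* that cut."""
    window = stored_tweets[:count]                   # count applied first
    return [t for t in window if not t["deleted"]]   # deletions filtered second

# Ten stored tweets, every even-numbered one "deleted".
stored = [{"id": i, "deleted": i % 2 == 0} for i in range(10)]
print(len(fetch_timeline(stored, 6)))  # asked for 6, got only 3 back
```

So a request for N tweets can return fewer than N even when plenty of undeleted tweets exist further back, exactly the "end of the road" effect described above.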

    1. the perfect web archive does not exist

      I totally agree. I think you could remove "web" from this sentence and it would still work.

    1. How can we extend the same care and attention to the billion or so people who post to Facebook each day?

      Does this question even make sense? What does it mean to extend care & attention to a billion people? The only thing that can feasibly do that is computation, isn't it?

    2. The Facebook user and the amateur intelligence agent are both enmeshed in a “neoliberal system of free labor,” voluntarily producing value for others — NATO, Facebook, whomever — while expecting no formal commitment or compensation in return.

      What a weird world we live in.

    3. Not only did institutional power survive the coming of the network society, it appeared to be thriving.

      This arc from techno-liberation to techno-surveillance was so jarring. It's hard to imagine it not being at play the whole time.

  2. dev.twitter.com
    1. Utility used to post the Tweet, as an HTML-formatted string. Tweets from the Twitter website have a source value of web.

      Interesting to see how this was used as part of http://varianceexplained.org/r/trump-tweets/
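For example, the HTML-formatted `source` value can be reduced to a plain client name with a small regex; this is a sketch assuming the v1.1 JSON field shape described above (anchor tag for client apps, bare "web" for the website):

```python
import re

def client_name(source_html):
    """Extract the client name from a Tweet's HTML-formatted 'source' field.

    Tweets from the Twitter website have the bare value "web"; other
    clients are wrapped in an anchor tag."""
    match = re.match(r'<a href="[^"]*"[^>]*>([^<]+)</a>', source_html)
    return match.group(1) if match else source_html

android = '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'
print(client_name(android))  # → Twitter for Android
print(client_name('web'))    # → web
```

Splitting a timeline on this field is essentially what the trump-tweets analysis does to separate Android and iPhone tweets.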

    1. The Left needs a vocabulary, and a self-understanding, that highlights and foregrounds the importance of constructing and expanding anti-systemic movements that aim to defeat systems of oppressive and exploitative power.

      I find this language of defeat problematic. I guess I'd prefer overcome, sidestep, dismantle, etc.

    2. Moreover, the way we challenge everyday impacts should be informed by our understanding that they are not produced simply by individual actions, but by the operation of large-scale systems

      I'm confused: is it possible to understand the systems without considering the individual acts that comprise them?

    3. But if we are to defeat colonialism and capitalism, we cannot do so one person at a time, or one interaction or relationship at a time.

      Reminds me of the old "think global, act local".

    4. It seems clear that the attentiveness in today’s Left activist subcultures to interpersonal dynamics within the movement reflects a genuine learning process. It is a step toward beginning to address problems that were, in effect, glossed over and ignored by phrases like “the people” and a complacent view of the prospects for building genuine “solidarity” and “alliances.”

      I wonder how much feminist ethics is at play here with the shift to focusing on interpersonal dynamics?

    1. These archiving endeavours were initiated by institutions such as the American University in Cairo, media initiatives such as Mosireen, by artists and citizens.

      It would be interesting to learn more about how these projects came to be, and their methods/tools.

    2. And so, new distribution networks appeared. Art as we knew it not only left the gallery spaces for the streets but also for online platforms, that is, for platforms of knowledge and spaces of resistance. Online projects, web-platforms and communities (Facebook groups) supporting the revolution were also in full bloom.

      I like this idea of looking at these Web spaces as new distribution networks for art & activism.

    3. The present reason for this silence is obvious, and serious: that speaking out can put people in danger, in particular those who are legally responsible for NGOs or who are authors of even marginally critical work.

      Chilling effects.

    4. While surveillance systems have always been an integral part of Egypt's governance, if this is true, it is the first time that such an extensive system as the Deep Packet Inspection technology – enabling geo-location, tracking, and combing through Facebook, Skype, Twitter, among other social networks – is being used in Egypt (and most certainly like in many more countries in the world.) Since 2014, the crackdown on the Internet has been relentless.

      Social Media is a tool now for the activists and the state.

    5. Can we imagine a deed or gift for digital archives? Can activists/archivists negotiate consent between the owner of the content, creator of the content and subject of the content of the archives in the digital age?

      A key question for those of us working on the Documenting the Now project!

    1. Meanwhile, we who build digital libraries (which we hope future researchers will utilize) are designing new trauma archives. The open source software we build enables discovery for the collections of the United States Holocaust Memorial Museum, and archival collections at Stanford like the records of the STOP AIDS Project. One should not attempt to engage with these collections without a sense of embodiment and advocacy. We need to push back against the notion of the dispassionate researcher, the dispassionate archivist.

      Verne Harris is a good read in this vein.

    2. From listening to historians like Davis, and from our understanding of psychology, it is clear that design decisions which divorce scholarship from emotional response are not in the best interests of our users

      Makes me wonder the degree to which this attention to emotion is aligned with what people are talking about when they talk about impact.

    3. because they represent a single majority-rules point-of-view masquerading as neutrality

      Are they really masquerading as neutrality, or are they limited by the affordances offered by the list of things? What alternatives are there to a list of things?

    4. Try Google searches on every variation you can think of for women’s and girls’ identities and you will see many of the ways in which commercial interests have subverted a diverse (or realistic) range of representations

      Maybe things have improved? I tried "african american girls learn use internet", which seemed to work reasonably well?

    5. Although Google’s exact search algorithms are trade secrets and shift over time, we do know that they are based on a patented and published algorithm called PageRank, and that they work by defining relevance and significance by looking at what pages on the Internet are linked to most often on a given subject (https://en.wikipedia.org/wiki/PageRank). This creates a majority-rules definition of relevance that masquerades as neutrality.

      Isn't writing a paper about the algorithm and publishing it actually not a bad example of self-disclosure of the bias built into the system that is (or was) Google search?
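For reference, the core of the published PageRank algorithm is a short power iteration; a minimal sketch over a toy link graph (the page names are made up):

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank over a dict mapping page -> outbound links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}  # teleportation share
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:  # each outbound link passes on an equal share
                    new[q] += share
            else:
                for q in pages:  # dangling page: spread its rank evenly
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# "c" is linked to most often, so it ends up with the highest rank --
# the "majority rules" effect the author describes.
toy = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy)
```

The sketch makes the critique concrete: rank flows toward whatever is already most linked-to, with no notion of whether those links represent a diverse or realistic set of perspectives.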

  3. Jul 2016
    1. The techniques and software of surveillance are freely shared between practitioners on both sides.

      http://www.orbooks.com/catalog/when-google-met-wikileaks/ to be taken with some Assange flavored grains of salt.

    2. We obsess over these fake problems while creating some real ones. In our attempt to feed the world to software, techies have built the greatest surveillance apparatus the world has ever seen. Unlike earlier efforts, this one is fully mechanized and in a large sense autonomous. Its power is latent, lying in the vast amounts of permanently stored personal data about entire populations.

      Once you see what Maciej is saying here it's impossible to unsee it. It's frightening, but so important.

    1. I’m particularly drawn to the meso-levels of infrastructures, where people create and rely upon new forms of data as information. It is at the meso-level where ethnographers who examine information systems (such as Peter Botticelli, Kalpana Shankar, and Susan Leigh

      This focus appeals to me too; getting at the everyday practices, and building theory based upon observed activity.

    2. Micro refers to the individual or personal level, the day-to-day practices that make up our lives. The meso-scale is the organizational or institutional change that we see with groups of people across weeks and years. Finally, the macro-scale refers to infrastructure over long periods of time, decades or even centuries (what some have called the “long now”)

      Useful to think about these different levels. I've found my own work getting muddled without thinking about them.

    3. To my mind, data’s impact on society and studies of data have reached a point for which it is now time for historians of computing to historicize data directly.

      It's interesting that computing has reached this inflection point where it can now be the subject of history.


    1. However, if Twitter does not prominently describe how its Trending Topic algorithm works, then Zelda’s informational power is entirely dependent on the accuracy of other sources she may (or may not) have used to build her understanding of the information flows on Twitter.

      Information power can be used by a variety of actors, including those that may use that power in ways that are at odds with the organizational goals of the platform.

    2. What is important is that access to information about how this part of the platform works creates the possibility for the individual to make a choice. Choice creates the possibility for the expression of informational power. These possibilities are closed off when users do not have the basis of informational power from which to enter these fields of action.

      Being able to recognize choices that can be made, and being able to make them is an elegantly simple indicator of information power.

  4. May 2016
    1. The restrictions that arise from authoritative management of knowledge can be minimized with the participatory, inclusive, and representative knowledge ecology that is fostered by social, community tools, although an approach that is too decentralized runs the risk of having a chaotic approach to standards, or no standards at all.

      Balancing decentralized and centralized models is difficult. Now that we have WARC, is a chaotic approach to standards that much of a problem?

    2. As outlined, there are several approaches to building web archives—some developing from institutional mission statements, some from frustration with existing resources, and all from limited understanding of the end users’ needs. Temporary ad-hoc practices that are developed to circumvent obstacles were discussed in several interviews. All respondents described similar obstacles whatever their disciplinary background. The ways in which these obstacles are handled determines, among many things, the character of the resulting archive, the limitations of use as set by access points to the resulting archive, and ultimately the perceived value the resulting web archive offers to different communities of researchers

      Seems like the ad-hoc solutions aren't being seen as potential sources of innovation.

    3. But we just found that there was no proper solution to do it. We looked at Archive-It (the one from archive.org) but it uses Java and has to be installed on a TomCat server and there aren’t many web hosting companies that do that. We also talked to Hanzo and it was just way too expensive. So we realized we would have to build something ourselves and that meant getting funding. But no one was interested, they didn’t get why we wanted to archive something that was happening right then.

      Tooling sucks and is expensive.

    4. Beyond the public views of web archiving are those of funding institutions and research communities that do not see the purpose of archiving something that is happening now on the web.

      Kind of funny in light of the funding Mellon provided for DocNow.

    5. Collaboration and partnership are complex issues that are essential to the success of large-scale web archiving projects

      Why are they essential? I understand the spirit of this, but it also seems to be operating from a given position.

    6. The more we isolate access to web archives from other archives, the less attention they will receive, and the less progress will be made.

      The need to mainstream web archives into archival collections -- it's interesting that archives have struggled to do this with other materials before the Web, though.

    7. but through each theme the conceptual and methodological differences between stakeholders can be seen as a foundational rift.

      rift between people doing things, which doesn't account for the sociotechnical system that they are a part of?

    8. ethical guides

      Interesting that lack of ethical guidance was a problem. I guess it is unclear what can be collected from the Web, and how it can be used.

    9. The lack of shared practices, accessible tools, and clear legal and ethical guides were repeatedly named as obstacles to advancing web archiving

      One thing that has happened since is the emergence of Archive-It as the de facto accessible tool.

    10. For instance, several interviewees spoke about what could be categorized as ontological and epistemological approaches to web artifacts and identified their involvement in both concrete local decisions and more wide-ranging professional debates on how best to integrate archives of such objects into existing collections

      Here's that ontology again, this time with epistemology. What's going on here?

    11. They all spoke about obstacles to advancing web archiving, reasons for those obstacles, and potential solutions to overcome those obstacles in different ways

      problems and solutions -- possibly similar to breakdown?

    12. Each interview aimed to solicit opinions, ideas and reflection from the respondent based specifically on his or her own personal experience with web archiving.

      Could be useful to add information to my article indicating that the goal was to hone in on stories about the seed list.

    13. Of the 17 people interviewed, 4 are researchers in the social sciences and humanities, 9 are archivists or librarians working on digital preservation projects at their institutions, and 4 are technicians or software engineers building tools to support digital preservation

      It would be useful to characterize the people in my study as well.

    14. Since the web archiving community is relatively small, we used a purposive sampling method to identify some of the key experts in web archiving

      Important to show how my study is different because it does not focus on experts so much as it does working archivists who build seedlists.

    15. The risk, of course, is that without an ontological understanding of those methods and collection development policies, these collections may be difficult for other researchers to use. Furthermore, because these archives are built on a shoestring budget by a researcher who may have little to no understanding of archiving procedures, and no real technological infrastructure to rely on, they are often inaccessible to others, residing on the hard drives of individual researchers

      The problems with subject based collections. The use of ontological here is interesting.

    16. Approaches to web archiving tend to fall into three categories: large-scale collections, smaller-scale thematic collections, and idiosyncratic collections (Dougherty, Meyer, Madsen, van den Heuvel, Thomas, & Wyatt, 2010).

      This looks like a useful characterization of web archive work. Could be useful to add to my literatures review.

    17. (Cho & Garcia-Molina, 2000; Fetterly, Manasse, Najork, & Wiener, 2004),

      These might be good sources for the rate of change on the Web.

    18. Although these pages are updated and refreshed continuously, older versions are rarely archived by content producers

      Just because there is change doesn't mean (necessarily) that content is disappearing.

    19. There is currently a wide gap between the researchers who need archival data sets to support their studies of online phenomena, and the archivists and other practitioners who have the expertise to build such collections and the tools to manage and access them

      Gap between researchers and archivists. This is the space that Ian Milligan is working in.


    1. HALL: For the most part, the rules were basically you have to collaborate. So what they did was create kind of an internal Facebook page - for a lack of a better reference - where we all created groups. We joined groups, and then we would post as we researched, so we had one system in which we did the research, another where we posted what we were finding. And what was remarkable about it - this was what - in the truest sense collaboration. If I found something from India or Russia that I thought looked of interest, you know, you'd find somebody. You'd spend a little time looking at it. You'd go do an Internet search and see if you can find a footprint of this person anywhere. And then if you found that you'd post it. You might post some corresponding links. And particularly in some of the Latin American countries where resources are pretty slim and many of them are under threat, we are able to do a lot of the groundwork for them that then they could run with.

      I wonder what this system is that allowed the journalists to collaborate in groups on the documents: sharing links, notes, etc. I didn't see it detailed in the writeup at The Source.

      Perhaps it's Nuix, but it sounds like more than a system for ocr, indexing, etc.

      Just an aside, it's kind of weird that the Source article doesn't even mention Nuix. Perhaps there were different teams working with different technologies?


    1. A recorded album can be just the same 20 years later, but software has to change
    1. How do we demonstrate that the digitized evidence of human rights abuse contributes in the dispensation of justice or in preventing or ending injustice? In what ways do efforts to provide access to digital surrogates contribute positively toward social equity and access to life opportunities? Furthermore, what infrastructure must be in place so that this is achieved?

      Putting records to use to effect social change.

    2. Moreover, while data collection strategies that record numbers of visits and frequency of requests or borrowing may provide useful information, these data do not offer reliable measures of institutional impact or nuanced portraits of audience engagement (Saracevic 2009). Recent studies have noted the inability of a significant portion of institutions to demonstrate the value of their work beyond simple usage statistics and frequency of visits (Davies 2002; Fraser et al. 2002; Lakos and Phipps 2004; Duff et al. 2008; Franklin and Plum 2010; Carter 2012; Chapman and Yakel 2012; Hughes 2012; Duff et al. 2013).

      What are ways to measure impact absent these superficial and mass appeal driven statistics?

    3. Projects such as the Inuvialuit Living History (http://www.inuvialuitlivinghistory.ca) and the Plateau Peoples’ Web Portal (http://plateauportal.wsulibs.wsu.edu) provide platforms that support Indigenous knowledge systems and values over digital access to the archival holdings of various institutions (Christen 2011)

      Need to follow up on these.

    4. Meanwhile, corporations have stepped in to partially fill the economic void by offering funding and labor for digitization projects in exchange for control of information, causing the increased privatization of public records.

      Public Private Partnerships.

    5. As funding has been slashed for professional positions, repositories have increasingly relied on unpaid intern and volunteer labor, raising serious challenges for students and new professionals and lingering questions about the sustainability of the profession

      Sad and true.

    6. Here, the challenge is not just how to get more faces of color at the table but to interrogate the cultural foundations and accompanying power structures upon which the table is built.

      Nice metaphor.

    7. For example, widespread professional resistance to Indigenous ways of knowing, as evidenced by the debate surrounding the Protocols for the Native American Archival Materials and the ultimate failure of the Society of American Archivists to endorse them, reveals how much more work is needed to open up pathways for pluralism in mainstream archival practice.

      I'm not familiar with this story. Seems like a pretty important thing to know about.

    8. Furthermore, archival studies cannot continue to ignore burgeoning critiques of “human rights” as a neocolonial industry emerging from other fields for much longer (Posner 2014)

      I hadn't quite considered how human rights could reinscribe the problems it is trying to redress.

    9. from the psychological impact that processing records of violence may have on archivists

      Diane's research interest.

    10. How would the archival conversation shift if we abandoned the rights-based model in favor of a feminist ethics of care? What happens when we begin to think of record keepers and archivists less as enforcers or violators of human rights and more as caregivers, bound to records creators, subjects, and users through a web of mutual responsibility?

      This is a much more attractive and fruitful seeming approach.

    11. an “ethics of care,” which stresses the ways people are linked to each other and larger communities through webs of responsibilities, is a more inclusive and apt model for envisioning and enacting a more just society (Gilligan 1982; Card 1991; Cole and Coultrap-McQuin 1992; Frazer, Hornsby, and Lovibond 1992)

      good stuff to read

    12. At its core is the active acknowledgment of cultural difference, Indigenous epistemologies, and multiple ways of knowing as equally valid perspectives of knowledge creation (Christen 2011; McKemmish, Faulkhead, and Russell 2011; PACG 2011)

      How do multiple ways of knowing impact the archive? Are there multiple archives?

    13. To encourage larger societal participation in archival endeavors, archivists are called to relinquish their role as authoritative professionals in order to assume a more facilitative role in crucial archival practices of appraisal, description, and development of access systems.

      This is an interesting idea. Reminds me of what Sam has been trying to do in Wisconsin.

    14. provenance has been recast as a dynamic concept that includes not only the initial creators of the records, who might be agents of a dominant colonial or oppressive institution, but more importantly the subjects of the records themselves, the archivists who processed those records, and the various instantiations of their interpretation and use by researchers.

      provenance has a broader application than just to the record creators

    15. Wallace (2010) writes that the term can appear so broad that its meaning can be elusive at best and, at worst, is watered down or co-opted by the very powers it seeks to critique.

      I can relate.


  5. Apr 2016
    1. sci-fi’s outsider heroes interrogate systems of power.

      What does it mean to interrogate power?

    1. Ethnography in this dynamic arena eventually necessitates a ‘technologized’ researcher (Lash, 2002; Lunenfeld, 2000).

      Reminds me of issues/concerns from the digital humanities.

    2. I will argue in this article that everyday life takes place on the internet: there is no difference between online and offline interpersonal communication (IC).

      This is an odd claim, given that it seems easy to imagine differences between online and offline interpersonal communication.


    1. It is a disservice to users and ourselves to ask only how much or how often and to avoid understanding why or how. User research methods work best as an accumulation of triangulation points of data in a mutually supportive, on-going inquiry. More data points from multiple methods mean deeper insights and a deeper understanding.

      Qualitative and quantitative methods are mutually reinforcing. Love the diagram.

    2. In other words, we only know what we know and can only ask questions framed about what we know.

      This seems to highlight how important it is to have looseness built into the methods like interviews -- so there is space for surprise.

    3. The most exciting part of my own work is feeling surprised with a new insight or observation of what our users do, say, and believe. In research on various topics, we’ve seen and heard so many surprising answers.

      How do these moments of surprise register? What constitutes them?

    4. This is all to say that insights, answers, and explanations are limited by the breadth of a researcher’s understanding of users’ behaviors

      This seems like the key insight. We must acknowledge the limits of our own understanding of the world, and how they frame our study of the world.

    1. He also splurged on the very best fake Twitter profiles; they’d been maintained for at least a year, giving them a patina of believability.

      Curation gone wrong.

  6. Mar 2016
    1. recursion, mixed-media or combinations of text-and-image, and virtual reality invention

      ok, this sounds interesting and revisionist

    2. What makes a DH project experimental now? How do the digital humanities benefit from experimental projects? And within the parameters of DH work that is now recognized as scholarship that counts, how can we make room for experimentation, for projects that might and sometimes do fail?

      This idea that an archive isn't radical or experimental kinda irks me a bit.

    1. In sum, documents provide background and context, additional questions to be asked, supplementary data, a means of tracking change and development, and verification of findings

      nice summary of the uses of document analysis


    1. Understanding the political preference of an audience can be important for presenting tailored information (including or excluding information to the user’s tastes


    2. In our case, these are Representatives using Twitter

      So these are the House of Representatives and Senators that are on Twitter?

    3. liberal/conservative bias

      Are political preferences really a single axis?

    4. removed for anonymity


    5. We present a method for computing the political preference of an audience by analyzing their “following” behavior and illustrate this approach with media outlets, government, and interest groups and think tanks.

      Where political preference becomes a coke vs pepsi type of decision?
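      A toy sketch of how "following" behavior could be turned into a preference score. The anchor accounts and their scores below are hypothetical, and the simple average is my own guess; in the paper the anchors are Representatives with estimated ideological positions and the aggregation is presumably more sophisticated:

```python
# Hypothetical ideology scores for "anchor" accounts (negative = liberal,
# positive = conservative); in the paper these would be Representatives
# on Twitter with known estimated positions.
anchor_scores = {"rep_a": -0.8, "rep_b": -0.5, "rep_c": 0.6, "rep_d": 0.9}

def audience_preference(followed):
    """Score a user by averaging the scores of the anchors they follow."""
    known = [anchor_scores[a] for a in followed if a in anchor_scores]
    return sum(known) / len(known) if known else None

# A user following mostly liberal anchors lands on the liberal end.
print(audience_preference(["rep_a", "rep_b", "rep_c"]))
```

      Note that any scheme like this collapses preference onto one axis by construction, which is exactly what makes it feel like a coke-vs-pepsi decision.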


    1. In follow-on research, we seek to collect ground truth on PPD in new mothers, wherein we can empirically validate the relationship between the measures we predict and blues and deeper depression experienced by some mothers

      It is kind of astounding that they could publish this paper without the ground truth, isn't it?

      I wonder if they have done it?

    2. The ability to predict significant changes in behavior and mood postpartum has broad implications. The postpartum behavioral markers exhibited by the subset of mothers who we identified as showing extreme changes, resonate with the feelings of hopelessness, dejection, and depressive tendencies seen in postpartum depression

      This seems like the main cognitive leap that they are making.

    3. We conjecture that the general social and psychological distancing characterizing the circumstances of new motherhood is linked to such high attentional focus on oneself, and turns out to be a strong predictor of postpartum change for the extreme-changing mothers for these measures of linguistic style

      And what does this have to do with depression again?

    4. We compare several different parametric and non-parametric classifiers to empirically determine the best suitable classification technique, including linear, quadratic, discriminant classifiers, naïve Bayes, k-nearest neighbor, decision trees, and Support Vector Machines with a radial-basis function (RBF) kernel [10]. The best performing classifier was found to be the SVM across all measures, which outperformed the other methods with prediction accuracy improvements in the range of 10-35%.

      I'm still confused about how they know if the prediction is correct.
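      For what it's worth, the standard way such predictions are checked is cross-validation: train on part of the labeled data, predict on a held-out part, and score against the held-out labels. A minimal sklearn sketch of that comparison, with synthetic stand-in features (the paper's actual measures and labels are not public):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-ins: rows are mothers, columns are behavioral measures,
# y marks (hypothetically) who showed extreme postpartum change.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

classifiers = {
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
    "svm_rbf": SVC(kernel="rbf"),
}
results = {}
for name, clf in classifiers.items():
    # Accuracy is measured on held-out folds the model never trained on.
    results[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(name, round(results[name], 3))
```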

    5. [S]uch decreases indicate that these women are posting less, suggesting a possible loss of social connectedness following childbirth.

      Couldn't it also mean the mother is kinda busy taking care of a baby, and social media seems kind of unimportant?

    6. We filter the Twitter Firehose stream (made available to us via a contract with Twitter)

      Wow, ok.

    7. We chose Twitter because it is public and provides a longitudinal record of the events, thoughts, and emotions experienced in daily life.

      Twitter is public.

    8. 71% when we leverage behavioral data from only the prenatal period

      This is surprising, that it can be predicted based on messages from before the birth.

    9. training data

      Where did the training data come from?

    10. Within this context, we investigate the feasibility of forecasting future behavioral changes of mothers following the important life event of childbirth.

      Who will use the forecasts?


    1. Importantly, given the ever-increasing amount of digital traces people leave behind, it becomes difficult for individuals to control which of their attributes are being revealed. For example, merely avoiding explicitly homosexual content may be insufficient to prevent others from discovering one’s sexual orientation

      Why is this framed as if homosexuality is something to be hidden?

    2. given appropriate training data, it may be possible to reveal other attributes as well.

      and pigs might fly

      ... and pigs might possibly fly ...

    3. moderately indicative of being gay

      What the heck does this mean? Didn't Kinsey show that pretty much everyone is moderately gay?

    4. best predictors of high intelligence

      What does intelligence mean here?

    5. African Americans and Caucasian Americans were correctly classified in 95% of cases, and males and females were correctly classified in 93% of cases

      How did they verify the predictions?

    6. 58,466 volunteers

      Is this a significant sample?

    7. obtained through the myPersonality Facebook application

      An app.

    8. were obtained from users’ Facebook profiles

      Scraping public profiles?

    9. 52,700

      This is a lot of people!

    10. In contrast to these other sources of information, Facebook Likes are unusual in that they are currently publicly available by default.

      Is this still the case?

    11. The analysis presented is based on a dataset of over 58,000 volunteers who provided their Facebook Likes, detailed demographic profiles, and the results of several psychometric tests.

      If they know the demographics then aren't some of those properties already known: gender, age, race, etc.?

    1. However, with the exception of the group of academics working with King’s College Archives, it became clear that it was easier, faster and more productive to get focus group members to respond to visual material.

      Visual material more useful in focus groups.

    2. [M]uch of the content is very emotive so may not be appropriate and also safeguarding issues at the College limit our use of student comments and their names

      Interesting that the very thing that makes these descriptions of value is what could exclude them from the record.

    3. The responses East Sussex encountered at the Chailey School were similarly powerful, often angry and disturbing, always revealing and creative

      juxtaposition of angry & disturbing with revealing & creative is interesting

    4. required to think about records in a more lateral way

      broadening perspectives -- possibly from hearing other voices.

    5. A relationship of trust has developed between the group and the record office that will be revisited in the future.

      I wonder if the focus group methodology helped foster that?

    6. The evaluation of the pilots showed that the focus group model gave an effective framework within which record offices could work with a variety of diverse individuals and groups.

      Focus groups seemed to work?

    7. other offices were keen to explore the expertise or experiences of people who had worked with or been subject to the records.

      Bring in people who had used the archives or were themselves in the archive in some fashion.

    8. Certainly, some of the archive pilots used this approach in choosing collections of records and focus groups: the Royal Geographical Society sought the expertise of a group of Tanzanians on its East African collections.

      Purposefully selecting focus groups.

    9. [S]imilarly, after the cataloguing event, most archivists are happy to accept, though not always to incorporate, a correction, amplification or comment about an existing catalogue.

      How do we even know this? It sounds like wishful thinking to me.

    10. RC’s major departure from earlier practice was that the resulting user-generated contributions were moderated and added to the museum catalogue.

      An important step, that is probably not taken that often.

    11. This participation can manifest itself in our website comments, Wikipedia entries and online reviews; we contribute to consultation processes, we are interviewed for our opinions and we blog, tweet or otherwise share them across our social networks

      The influence of the Web and Social Media on archival expectations.

    12. If we step outside the archival goldfish bowl to look at the larger picture we can see other powerful influences aligning with some of these ideas

      An important step!

    13. So, citing Hayden White, he argued that the archive should be used ‘not merely as a storage technique but primarily as a force for de-legitimation of mythified and traditionalized memories’

      Wow, so actively working against the grain of the past.

    14. ‘It might mean providing space for researchers to embed their own stories of use within the descriptive layerings... It would require engagement with the marginalized and silenced. Space would be given to the sub-narratives and counter-narratives... It would embrace a ‘politics of ambiguity and multiplicity’

      emphasis on those not typically represented in archives -- the marginalized

    15. One common assumption underpinning this recent work has been that the user, rather than archivists or the records themselves, should be the central priority.

      This is an interesting shift in focus. User-centered archives?

    16. a methodology for systematically capturing and incorporating the comments and contributions of individuals outside the profession as to the accuracy, completeness and attractiveness of archival catalogues and finding aids

      sounds more nuanced than participatory archives somehow

    17. This article will argue the case for a broader adoption of RAC by record offices by first considering some of the larger ideas and theory that have shaped it and then by summarizing and evaluating the work of the participating record offices to draw conclusions as to the more general applicability of the methodology.

      It looks like they liked it. It will be interesting to see how they evaluate the process.


    1. For ethical reasons, both Kazemi and thricedotted avoid making bots that—for lack of a better description—seem like people.

      Not wanting to imitate a human seems like an important step toward perhaps being more useful.

    2. “My bot is not me, and should not be read as me. But it’s something that I’m responsible for. It’s sort of like a child in that way—you don’t want to see your child misbehave.”

      The child metaphor is interesting because, in one sense, they are you (or were), and in another sense they are not.

    1. Using software designed by this report’s first author, we sorted all users within each period into a set of communities. At the end of this process, each user belonged to one and only one community within each period. We labeled each of the 10 largest communities in each period based on the shared identities of its hubs.

      Was the assignment of the users to particular communities done automatically? It must've been, right?

    2. We use a research method drawn from the field of network analysis called community detection to sort the users in our dataset into subsets called communities based on recurring patterns of retweeting and mentioning. (Additional technical details on this method can be found in Appendix A.)

      Community detection seems like a core use case for DocNow. It could be very useful for finding the documents out on the Web that are of actual historical significance?
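      A minimal sketch of what community detection on a retweet/mention graph might look like, assuming networkx's greedy modularity method (the report's own software is likely different, e.g. operating on a directed, weighted graph):

```python
import networkx as nx
from networkx.algorithms import community

# Toy retweet/mention graph: an edge (a, b) means user a retweeted or
# mentioned user b during one time period.
G = nx.Graph([
    ("alice", "bob"), ("carol", "alice"),
    ("dan", "erin"), ("frank", "erin"),
])

# Greedy modularity maximization puts each user in exactly one community,
# matching the report's description of the sorting process.
communities = community.greedy_modularity_communities(G)
partition = {user: i for i, c in enumerate(communities) for user in c}
print(partition)
```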

    3. Therefore, it is worth noting here that a large majority of the tweets in our dataset (75.3%) are retweets. In contrast, only a small minority of tweets (7.6%) contain @-mentions outside of a retweet context.

      Wow, clearly retweets are huge on Twitter. I knew it was a big portion of the traffic but didn't know it was this big.

    4. Figure 2

      I love this presentation -- is it a standard visualization technique?

    5. To account for both a site’s birth date and content production, we use as a starting point the date that a site was first crawled by the Internet Wayback Machine.

      Another interesting use of the Internet Archive, to determine when a website started amassing content. It seems like one of the only ways to do it perhaps, without talking to people running the website, or attempting to do a full crawl and then looking for dates.
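      The earliest Wayback capture date can be pulled programmatically from the Internet Archive's CDX index; a rough sketch (the `first_capture` helper is my own naming, and it hits the network):

```python
import urllib.request

CDX = "http://web.archive.org/cdx/search/cdx"

def parse_first_timestamp(cdx_text):
    """Earliest capture timestamp (YYYYMMDDhhmmss) from a one-column CDX reply."""
    lines = cdx_text.strip().splitlines()
    return lines[0] if lines else None

def first_capture(url):
    # Ask the CDX index for just the first capture's timestamp.
    query = f"{CDX}?url={url}&limit=1&fl=timestamp"
    with urllib.request.urlopen(query) as resp:  # network call
        return parse_first_timestamp(resp.read().decode())

# e.g. first_capture("blacklivesmatter.com") should return a timestamp
# around October 2014 if the report's observation holds.
```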

    6. But we have known since the 1970s that networks consisting of weak ties are valuable for other reasons.14 Specifically, they are critical for broadly and efficiently distributing information produced by network members. I

      Twitter is also all about weak ties.

    7. One can imagine a potentially different structure for this network. It could be a dense network with many reciprocal ties—conducive to building trust between connections. Such trust would be necessary if what those trafficking in Black Lives Matter-related content on the Web were trying to do was, say, organize clandestine gatherings, or circulate ideas for how to mobilize, or develop strategic action plans.

      I imagine there are some personal email archives that might resemble that type of network.

    8. VOSON

      VOSON is new to me: http://voson.anu.edu.au/ seems like there could be some useful functionality here for DocNow. It looks like a desktop application.

    9. As an indication of this, the Internet Archive’s Wayback Machine did not first crawl the site until October 8, 2014.

      It is interesting that the presence in IA is a measure of how much content the website has created. A content analysis of the site itself would be better, but probably more time consuming, and perhaps not worth the effort.

    10. Official websites are usually extensions of individuals’ and organizations’ digital identities. Accordingly, individuals and organizations often tether their Twitter and Web accounts to freely circulate content between them.

      The connection of profiles to websites seems like an important link I hadn't really considered for DocNow. It speaks to who the content creators are, or at least who they say they are.

    11. Consistent with BLM’s origin and Twitter activity patterns, the BlackLivesMatter.com website was created on July 17, 2013—just days after George Zimmerman was acquitted for killing Trayvon Martin

      I thought the website came later. DNS records could be useful in this analysis.

    12. Ultimately, looking beyond Twitter provides a more complete account of how online media have influenced the social and political discourse around race and criminal justice, both online and off

      It is interesting that they start off of social media, on the Web in general. I would've thought the progression would've been looking at their data and then expanding outwards from there onto the Web. But perhaps the narrative about social media requires that we understand the way the Web in general works first?

    13. These questions address at both a macro level and a micro level who was heard most frequently and what they said. W

      Who was heard most frequently and what they said. I wonder if the analysis at the macro level informed the analysis at the micro level. Kind of like how generalized surveys can provide a basis for doing interviews.

    14. Eric Garne[r]

      I didn't know that the BLM story focused on Eric Garner just prior to Michael Brown. I remember the I Can't Breathe hashtag trending after Ferguson hit the major news venues.

    15. We did not divide the data into equal time units, but rather set their boundaries at points when the Twitter discussion rose and fell drastically.

      Aha, so the episodes are just looking at the overall numbers over time. This seems like an obvious thing for DocNow to do. But it requires access to historical data.
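      A crude sketch of that kind of volume-based segmentation -- split the timeline wherever daily counts rise or fall drastically (the threshold and logic here are my own guesses, not the report's method):

```python
def split_into_periods(daily_counts, spike_ratio=3.0):
    """Start a new period wherever volume rises or falls sharply.

    Returns (start, end) index pairs; spike_ratio is an arbitrary knob.
    """
    boundaries = [0]
    for i in range(1, len(daily_counts)):
        lo = min(daily_counts[i - 1], daily_counts[i])
        hi = max(daily_counts[i - 1], daily_counts[i])
        if lo == 0 or hi / lo >= spike_ratio:
            boundaries.append(i)
    boundaries.append(len(daily_counts))
    return list(zip(boundaries, boundaries[1:]))

# Quiet days, a spike (a Ferguson-scale event), then a fall-off.
print(split_into_periods([10, 12, 11, 90, 85, 80, 9, 8]))
```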

    16. Examining the data as a series of sequential time periods allows us to capture and describe this adaptive process in greater detail than considering the entire year as a single unit

      We took a similar approach in our study of Ferguson, but it was more driven by the data we collected. I wonder how they identified these episodes?

    17. site’s search rank

      Ooh this could be a useful metric in DocNow.

    18. 136,587 websites

      Websites not resources/documents!? Dang!

    19. research software package

      This sounds like something to look at for DocNow.

    20. another of 45 keywords

      I hope these are available somewhere...

    21. Twitter, BLM participant interviews, and the open Web

      These three sources might map well to DocNow: twitter, web content and interviews.

    22. We would also like to stress that this report is not a work of advocacy; that said, all of the authors personally share BLM’s core concerns, which directly affect each of us and our respective families. But we do not believe that fundamentally agreeing with BLM compromises this report’s rigor or findings any more than agreeing with the Civil Rights Movement or feminism compromises research on those topics. On the contrary, our strong interests in ending police brutality and advancing racial justice more generally inspire us to get the empirical story right, regardless of how it may reflect on the involved parties

      This is a super example of being transparent and self-reflective about research motivations while still emphasizing the focus on empirical methods.

    23. The report’s specific contribution is to draw a set of conclusions about the roles online media played in the movement during a critical time in its history.

      Focus on the system of social media, and its role/use in BLM.

    24. To clarify our discussions in the following pages, then, we will use the term “Black Lives Matter” to refer to the official organi-zation; “#Blacklivesmatter” to refer to the hashtag; and “BLM” to refer to the overall movement.

      This seems like a useful nomenclature.

    25. Studies have revealed a decades-long decline in youth civic engagement as traditionally defined: that is, interacting with established civic and political institutions such as government, long-established community organizations, and K12 civics classes.

      I didn't know about this. I wonder if it is a general trend?

    26. The general idea here is that social media helps level a media playing field dominated by pro-corporate, pro-government, and (in the United States) anti-Black ideologies.

      I like how the argument is that it is a relative leveling. It's not like there aren't power systems at play on Twitter as well...but they are different from traditional mass media outlets.

    27. Because not every social movement uses online media in the same ways, it is important to under-stand each new movement’s digital activities on their own terms.

      Are different methodologies and tools needed as well?

    1. we also need to recognize how we are implicated here as digital researchers into the politics we purport to critique

      This is hard to do. I wonder if there are some useful techniques for achieving it.

    2. tools that seek to make visible the power relations of the digital infrastructures but that actually generate those power relations in the act of making them visible (boyd and Crawford 2012)

      Such an important point.

    1. it puts full accountability on the authority sharing the data

      I imagine the authority sharing the data could actually be a chain of custody; where the organization sharing it has acquired it from another organization.

    2. perhaps it is more ethical to acknowledge that it is happening without explicit individual consent.

      counter-intuitive, but it's an interesting position.

    3. the data would gradually become repurposed in a process that the surveillance field terms ‘function creep’, making people’s consent meaningless

      They consented to many uses, not a particular use.

    4. control often becomes defined as care in emergency situations

      And control is all about who is doing the controlling and who is controlled. These contexts can shift & drift. Data collected for one purpose can be put to another purpose once it becomes available.

    5. This argument, although in line with all existing data protection rules and norms, is problematic in a practical context. Consent without purpose limitation – knowing what one is consenting to – is widely judged to be legally (and practically) meaningless.

      The documents are just too hard to parse, and place in context.

    1. However, very little is known about how these policies translate into the actual appraisal of Web content.

      If you have evidence to the contrary please do get in touch. It would actually help me move on to another problem if this is even partially addressed by someone else's research out there.

    1. This case study performs an in-depth investigation of the way that CDRs could be used to track Ebola, revealing that they’re only useful when re-identified, invalidating anonymization as a balancing approach to privacy, and thus legal, protection.

      This is a fascinating angle. You need to know the identities for the data to be useful in these situations.

    2. there is very little evidence to suggest that CDRs, especially those that have been anonymized, are useful to track the spread of the Ebola virus

      Is there evidence that it's not?

    3. Not only were these information systems unresponsive, they were disconnected from the responders, meaning they didn’t have any ability to answer questions, provide treatment, or even refer people to facilities that could provide treatment

      This is just sad.

    4. e-mail and Google Fusion tables

      Fascinating. Lowest barrier to entry.

    5. Political relationships are one of – if not the – most determinative factor in access to both information and funding support.

      Is political here another word for power?

    6. [T]hat job is no small feat - there were more than 50 separate technology systems introduced during the response effort alone

      This sounds like an interesting study. Do the 50 systems map to 50 organizations? How were they connected up?

    7. It is practically easier and financially beneficial for humanitarian organizations to develop their own information systems, instead of focusing on building functional communication sharing.

      Data sharing is harder than not data sharing. This seems almost obvious? But it's less logical that there can be negative incentives to data sharing.

    8. they discount the value and importance of building functional communication standards and coordination frameworks.

      long term vs short term thinking ; triage in the emergency room

    9. The assumption that open and interoperable data will lead to better health response is untested, as is the assumption that mobile network data records measurably improve health system response efforts.

      Is it not tested because it seems so logical? If you know person A died of Ebola, and you are able to track A's whereabouts for the last 3 weeks, and see who they came into contact with, it's possible (in theory) to identify people A may have transmitted the disease to?

    10. That these powers are largely being outsourced to international organizations without the institutional capacity, processes, regulation, standards, infrastructure, or appropriate risk frameworks, is why we should all be concerned

      These are some powerful organizations.

    11. There has been no public presentation about whether or how mobile data information was actually used – or what the effect of that use was.

      This is particularly damning. There should be some kind of public output when the public's privacy is breached like this.

    12. due process and fair compensation

      Kind of like the Bush Administration routing around the FISA court to obtain the same information in the US.

    13. These laws, taken together, form a broad protection for the privacy of mobile data – requiring user consent or a governmental invocation of emergency powers in order to compel their release.

      Ok, so it looks like the ethics were pretty clear, at least in Liberia.

    14. [A]pplicable legal frameworks

      There was probably a fair amount of pressure to act quickly to stop the spread of the disease instead of letting lawyers debate and untangle the ethics of privacy in many different legal jurisdictions.

    15. However, Ebola is not a vector-borne disease, meaning that the same probabilities aren’t a useful indicator of transmission.

      I feel like I should understand what this means, but I don't.


    1. it has been shown that they are more unreliable than considering high-confidence machine tags

      Wow, that is odd.

    2. Since we are interested in determining how many photos are taken at night on a street, we count the number of pictures that are classified as night, and the number of those that are classified otherwise.

      Flickr generate these tags?

    3. We gather a random sample of 7M geo-referenced Flickr pictures within the bounding box of Central London.


    4. We collect information about all the 8K Foursquare venues in London.



    1. Our evaluation and validation for three different cities with varied physical layouts shows two important results. First, our methodology constitutes a good complement to model and understand in an affordable and near real-time manner land uses in urban environments. In fact, we have shown that residential, commercial and parks & recreation areas are well identified with coverage above 70%. Also, our approach is able to identify a land use, nightlife activity, not being considered up to now by city halls. This has implications from a planning perspective as these areas usually cause noise and security problems and can move over time.

      So there is very little discussion of the type of lens that Twitter provides. What types of people are likely to use Twitter? What types of people don't use Twitter? What types of people enable geolocation? What does this say about the findings?

    2. On the other hand, Cluster 3 is associated to very large activity peaks at night (see Figure 2(c)). These peaks happen at around 20:00-21:00PM during weekdays and between 00:00-06:00AM during the weekends. We observe that the peaks happen earlier in London and Manhattan while a little bit later in Madrid suggesting that nightlife might continue until late hours in this city. Studying the physical layout of these clusters on the city maps, we observe that they cover areas like the East Village in Manhattan; the West End in London and Malasaña/Chueca and Alonso Martinez in Madrid (see Figure 4), areas associated with restaurants, pubs and discos. All these elements suggest that this cluster might represent nightlife activities

      How is this analysis not qualitative? And isn't influenced by the researchers knowledge of New York City?

    3. [I]n order to validate our land use hypotheses, we compare the evaluation results against official land use data released by the NYC Department of City Planning and the NYC Department of Parks & Recreation through the NYC Open Data Initiative (NYC, 2013);

      So did they have no idea what this research said prior to conducting the study and coming up with the hypotheses?

    4. slightly shifted in time.


    5. currently about one percent of the full Firehose set of tweets

      This is not accurate.

    6. The DB index is used to evaluate clustering algorithms, where the validation of how well the clustering has been done is made using quantities and features inherent to the dataset.

      This almost makes sense to me :-)
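      A small sketch of the idea using sklearn's Davies-Bouldin implementation on synthetic data: the index is computed only from quantities inherent to the dataset (within-cluster scatter vs between-cluster separation), with lower scores meaning tighter, better-separated clusters, so it can guide the choice of k. This is illustrative, not the paper's actual setup:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Three well-separated synthetic blobs stand in for the activity patterns.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

scores = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Lower Davies-Bouldin score = better clustering.
    scores[k] = davies_bouldin_score(X, labels)
    print(k, round(scores[k], 3))
```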