1,135 Matching Annotations
  1. Jul 2023
    1. Prior to its destruction, the library had reached new levels of growth with laptops, a Wi-Fi hub, and a tent donated by author and rock legend Patti Smith and dubbed “Fort Patti.”

      Fort Patti

  2. Mar 2023
    1. This tension persists through the broader history of the web.

      this is so true -- I think it's very evident in the httpRange-14 debate, which (shameless plug) I tried to write about in: https://arxiv.org/abs/1302.4591

      Also, https://www.ibiblio.org/hhalpin/homepage/publications/hayes-halpin-final-copyedited6.pdf is a good read.

    2. Vulgar Linked Data

      Love this idea of Vulgar Linked Data for how it draws on the history of Vulgar Latin. I wonder how https://linkeddatafragments.org/ might fit into this picture by de-centering the knowledge graph?

    3. The merger of AI shit and knowledge graph shit

      Yes, maybe it's another paper, but it is interesting to consider GPT-n and the massive web-scale data extraction that took place to create these LLMs, which are gatekept behind APIs that people are binding their services to with abandon.

      Maybe there's a connection to be made here to the debate around whether these models just represent surface-level statistics (Markov chains) or whether there is some sort of underlying alien representational model: https://thegradient.pub/othello/

    4. As with search, we should be particularly wary of information infrastructures that are technically open13 but embed design logics that preserve the hegemony of the organizations that have the resources to make use of them.

      well said!

    5. google


    6. “Open” standards are yet another fraught domain of openness. For an example within academia, the seemingly-open Digital Object Identifier (DOI) system was concocted as a means for publishers to retain control of indexing research, avoiding the impact of the proposed free repository PubMedCentral and the high overhead of linking documents between publishers11 (see sec. 3.1.1 in [73]). The nonprofit standards body NISO’s standards for indicating journal article versions [74] and licensing [75] are used by publishers to enforce their intellectual property monopolies and programmatically scour the web to prevent free access to publicly funded information

      The Handle System that underlies DOI is particularly opaque, and thus has led to the single point of failure that is https://dx.doi.org/

    7. After shuttering Freebase, Google has donated a substantial amount of money to kickstart its successor [69] Wikidata,

      Wikidata's inception actually predates Freebase's donation by about two years...

      Denny Vrandečić helped start Wikidata then went to Google to help start their Knowledge Graph and then went back to Wikimedia Foundation: https://en.wikipedia.org/wiki/Denny_Vrande%C4%8Di%C4%87

    8. They are coproductive with the corporate and technical structure of surveillance capitalism, facilitating conglomerates that gobble up as many platforms and data sources as possible to stitch them into an expanding, heterogeneous graph of data.

      The knowledge graph that was meant to live on the web is extracted from the web and kept almost completely private, with small rich snippets showing up in search results like the tip of an iceberg, the assertions and structured data hidden below the surface.

    9. Palantir

      Glad you made this connection. In case it's of interest, there are some details about how entities are linked together in https://logicmag.io/commons/enter-the-dragnet/

    10. That’s all within biomedical sciences, but RELX’s risk division also provides “comprehensive data, analytics, and decision tools for […] life insurance carriers” [35], so while we will never have the kind of external visibility into its infrastructure to say for certain, it’s not difficult to imagine combining its diverse biomedical knowledge graph with personal medical information in order to sell risk-assessment services to health and life insurance companies.

      This section is reminding me of how biomedical use cases were some of the first "real world" implementations of semweb technology.

    11. asymmetry

      This is such an important keyword that sums up how APIs disciplined the web as an information space during the "Web2" period.

    12. We were recast from our role as people creating a digital world to consumers of subscriptions and services.

      Yes, Google recast knowledge graphs as an SEO problem -- "publish your linked data this way, and it will show up in our search results like this".

    13. The mutation from “Linked Open Data” [21] to “Knowledge Graphs” is a shift in meaning from a public and densely linked web of information from many sources to a proprietary information store used to power derivative platforms and services.

      I think this is right. The outlier here being Wikidata I guess?

    14. driven more by an empirical approach of trying to realize these systems on the wilds of the web, creating some of the first public “Linked Open Data” systems like DBPedia and Freebase.

      Also driven by these unconference-style gatherings, VoCamps: http://vocamp.org/wiki/Main_Page

      Although arguably these were meant to be more oriented around liberating people to create their own vocabularies, instead of aligning everyone to using the same ones. So perhaps mentioning vocamps here doesn't make sense. They almost represent a deferred future, or road not taken...

    15. [13]

      love this quote!

    16. “people were frightened of getting lost in it. You could follow links forever.”
    17. triplet links

      maybe "triples" will be a more familiar term?

    18. dissolve the Silos

      This reminded me of timbl's original purpose of the WWW: http://cds.cern.ch/record/369245/files/dd-89-001.pdf

  3. Aug 2021
    1. The tool relies on a new algorithm designed to recognize known child sexual abuse images, even if they have been slightly altered. Apple says this algorithm is extremely unlikely to accidentally flag legitimate content, and it has added some safeguards, including having Apple employees review images before forwarding them to the National Center for Missing and Exploited Children. But Apple has allowed few if any independent computer scientists to test its algorithm.

      Even with the white paper they published, a lot of questions remain about how this self-supervised ConvNet model is generated and used.

    1. We are privacy and cybersecurity researchers whose careers are built on protecting users. That’s why we’ve been so careful to make sure that our Ad Observer tool collects only limited and anonymous information from the users who agreed to participate in our research. And it is also why we made the tool’s source code public so that Facebook and others can verify that it does what we say it does.

      This is absolutely key.

    2. our work shows that the archive of political ads that Facebook makes available to researchers is missing more than 100,000 ads.

      Need to follow up and see the ads that are missing from the archive. It would be super interesting to see if they are different from the public ones in some way.

    1. Collecting data via scraping is an industry-wide problem that jeopardizes people’s privacy, and we’ve been clear about our public position on this as recently as April.

      How ironic that FB would fall victim to scraping, the very thing they did to bootstrap their startup by collecting women's faces from university websites.

    2. The extension also collected data about Facebook users who did not install it or consent to the collection.

      What information?

    3. usernames, ads, links to user profiles and “Why am I seeing this ad?” information, some of which is not publicly-viewable on Facebook.

      Why is the "Why am I seeing this ad?" information not public? Why are ads not public? Maybe this is the problem, and not the fact that they had to scrape them?

  4. Mar 2021
    1. "The Facebooks and the Googles are taking over, and they want to make money," Bailey said. The more people act on the internet behind a password and the more the web becomes corporate, the more the open internet ethos fades away from the public consciousness, easing the way toward that splintering that Kahle fears.

      And IA doesn't want to make money?

    2. force

      This is the right word to use here.

    3. Imagine if each of us could look back on our great-grandparents and know what they said or thought at age 15, and then 25, and 50. The Archive would allow that.

      What does this type of memory do to our culture? Is it really a win-win?

    4. Social media companies want us to focus on tomorrow, not on the posts we made a year ago. Publishers do, too. HarperCollins is suing the archive to try to prevent it from sharing out-of-print books in its digital library, arguing that publicly sharing out-of-print books is a massive violation of copyright laws. While at first it might seem odd that publishers would care about books that aren't in print anymore, for companies whose business depends on people buying new things, archiving so that people can focus on the past is not in their financial interest.

      IA is on the wrong side of this one. People are tired of Silicon Valley disruptions. Book publishers might like to dip into their back catalog and republish things. How often does that happen? Why is book circulation practice ripe for disruption by this one organization that stands to profit from it?

    5. The people building open-source translation tools at Mozilla have also found the internet archive's collection of websites in multiple languages useful for training their translation tools

      Interesting, I wonder what that project was and how it worked.

    6. "That's always the dilemma of the librarian."

      Yes, it is an old problem, and not just an idle one of the anxious as Jefferson seems to imply. Or maybe he was taken out of context.

    7. the most important fraction

      Oh, it's that easy? Just find the most important stuff!

    8. There's no use being anxious over what's outside your control," he said.

      What a brush off.

    9. "I'd look like an idiot," he said — because no one really can guess the size or scale of the internet. (Don't get there in your head, if you can avoid it. How would you even measure: by data size? Number of objects? Number of distinct URLs?)

      Companies that have a larger crawl breadth than IA, e.g. Google, are one way to measure.

    10. Web pages have an average lifespan of about 90 days before they change or disappear, and so the Archive needs to capture those pages at a minimum of every 90 days to preserve a full picture of the web over time.

      A citation would be good here.

    11. Section 230 — which protects website owners from legal liability for content created and posted by its users — would destroy the delicate legal framework that protects the Internet Archive's work (as well as Wikipedia and user-contributed projects),

      So it appears the IA and the platforms are allied in not updating Section 230, which allows these corporations to profit from distributing the disinformation and lies of the powerful. The baby shouldn't be thrown out with the bathwater, but clearly some adjustments need to be made to hold these companies legally accountable for what happens on them?

    12. At the end of the day, we're just a library.

      Increasingly just the library. This is the problem.

    13. Facebook is the hardest, because the company is archiving-unfriendly in general

      There are some really good reasons why social media should be hard to archive. See https://www.jstor.org/stable/j.ctt7t09g

    14. That could soon change, however. "Are we at risk of locking down? Yes, absolutely," he said. The Internet Archive is currently blocked in China, and occasionally as well in Russia, India and Turkey, and that's just at the whim of nation-state governments that have the tools to make that work. According to Kahle and Bailey, corporations are just as capable of fracturing the web in ways that make it harder to access and archive; even "user lock-in" to a specific browser and products could one day create internet bubbles, and then walls, based on the products people pay for.

      This has always been the case with the web, ever since cookies and site logins were created.

    15. a professor's ID

      Minsky or Hillis? My bet would be Hillis.

    16. until the end of time

      The end of time you say? Now I will have to keep reading. But I guess that was the point.

    17. the Internet Archive will keep doing what it's been doing since 1996: preserving every fragment of text you or I are ever likely to read

      Hyperbole much? Who is likely to read what? The Web is much bigger than the Internet Archive's view of the Web. Little statements like this don't help us understand what is being preserved from the web.

  5. Dec 2020
    1. noarchive

      This implies that the cached content is there, but a link to the content is not provided publicly.
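      Concretely, the directive being honored here is the robots meta tag (an equivalent `X-Robots-Tag` HTTP response header also exists); the snippet below is just an illustrative fragment:

```html
<!-- tells crawlers they may index the page but not expose a cached copy -->
<meta name="robots" content="noarchive">
```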

  6. Aug 2020
    1. Epilogue: The Uncertain Climb

      Thinking about Luther, the Catholic Church, and the printing press provides a lens for thinking about the complicated ways that the Internet is shaping and being shaped by our social and political lives.

  7. Nov 2019
    1. Do any creative projects — including personal poetry, expository writing, etc. — in a journal or on your personal computer, using your personal Google Drive, Microsoft 365 account, or native text document tools like Notes or TextEdit.

      It's sad to me that things have come to this point where teenagers are asked to not do creative work on the web.

  8. Oct 2019
    1. The new gold rush in the context of artificial intelligence is to enclose different fields of human knowing, feeling, and action, in order to capture and privatize those fields. When in November 2015 DeepMind Technologies Ltd. got access to the health records of 1.6 million identifiable patients of Royal Free hospital, we witnessed a particular form of privatization: the extraction of knowledge value. 53 A dataset may still be publicly owned, but the meta-value of the data – the model created by it – is privately owned.

      AI is part of an old colonial project.

    2. Increasingly, the process of quantification is reaching into the human affective, cognitive, and physical worlds. Training sets exist for emotion detection, for family resemblance, for tracking an individual as they age, and for human actions like sitting down, waving, raising a glass, or crying. Every form of biodata – including forensic, biometric, sociometric, and psychometric – are being captured and logged into databases for AI training.

      The dark side of all the sensors we place around us.

    3. dysprosium
    4. Terbium
    5. With Amazon Mechanical Turk, it may seem to users that an application is using advanced artificial intelligence to accomplish tasks. But it is closer to a form of ‘artificial artificial intelligence’, driven by a remote, dispersed and poorly paid clickworker workforce that helps a client achieve their business objectives.

      Artificial Artificial Intelligence

    6. While ‘off the shelf’ machine learning tools, like TensorFlow, are becoming more accessible from the point of view of setting up your own system, the underlying logics of those systems, and the datasets for training them are accessible to and controlled by very few entities.

      The data is integral; it's not just about having the tools. That's why the tools are made open source.

    7. satellite picture
    8. In 2009, China produced 95% of the world's supply of these elements, and it has been estimated that the single mine known as Bayan Obo contains 70% of the world's reserves.

      This was 10 years ago. I wonder what it looks like today.

    9. There are 17 rare earth elements, which are embedded in laptops and smartphones, making them smaller and lighter. They play a role in color displays, loudspeakers, camera lenses, GPS systems, rechargeable batteries, hard drives and many other components. They are key elements in communication systems from fiber optic cables, signal amplification in mobile communication towers to satellites and GPS technology. But the precise configuration and use of these minerals is hard to ascertain. In the same way that medieval alchemists hid their research behind cyphers and cryptic symbolism, contemporary processes for using minerals in devices are protected behind NDAs and trade secrets.

      Difficult to understand how these rare earth metals are being deployed.

    1. Data science can indeed play a role in addressing deep inequities. Progressive critics of algorithmic decision making suggest focusing on transparency, accountability and human-centered design to push big data toward social justice.

      open source and transparency

    2. Contemporary proponents of poverty analytics believe that public services will improve if we use these data to create “actionable intelligence” about fraud and waste. Daniels, for example, promised that Indiana would save $500 million in administrative costs and another $500 million by identifying fraud and ineligibility over the 10 years of the contract.

      Cost savings is the promise.

    1. The Latin alphabet evolved from the visually similar Etruscan alphabet, which evolved from the Cumaean Greek version of the Greek alphabet, which was itself descended from the Phoenician alphabet, which in turn derived from Egyptian hieroglyphics.[1] The Etruscans, who ruled early Rome, adopted the Cumaean Greek alphabet, which was modified over time to become the Etruscan alphabet, which was in turn adopted and further modified by the Romans to produce the Latin alphabet.

      But did the Latin alphabet really come from Greek?

  9. May 2019
  10. Apr 2019
    1. Organization Guests

      I think this will allow us to have contractors on particular projects?

  11. Mar 2019
  12. archivaria.ca
    1. The documentary heritage should be formed according to an archival conception, historically assessed, which reflects the consciousness of the particular period for which the archives is responsible and from which the source material to be appraised is taken

      This is the heart of appraisal, but how does one measure group consciousness?

  13. Feb 2019
    1. At any one moment of time there are X amount of tweets in the public firehose. You're allowed to be served up to 1% of whatever X is per a "streaming second." If you're streaming from the sample hose at https://stream.twitter.com/1/statuses/sample.json, you'll receive a steady stream of tweets, never exceeding 1% X tweets in the public firehose per "streaming second." If you're using the filter feature of the Streaming API, you'll be streamed Y tweets per "streaming second" that match your criteria, where Y tweets can never exceed 1% of X public tweets in the firehose during that same "streaming second." If there are more tweets that would match your criteria, you'll be streamed a rate limit message indicating how many tweets fell outside of 1%.

      I'm not sure I've seen this documented elsewhere in the current Twitter documentation. But I believe something like this is still in operation when retrieving data from the filter stream.
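      As a sketch of how one might tally delivered tweets against those limit notices when reading the line-delimited JSON of the v1.1 filter stream — the `{"limit": {"track": N}}` message shape is from memory of the old streaming API, so treat the field names as assumptions:

```python
import json

def tally_stream(lines):
    """Tally tweets vs. rate-limit notices from a line-delimited JSON
    stream like the v1.1 statuses/filter endpoint. Limit notices look
    like {"limit": {"track": N}}, where N is a running count of tweets
    withheld because matches exceeded 1% of the public firehose."""
    delivered, withheld = 0, 0
    for line in lines:
        if not line.strip():
            continue  # keep-alive newline
        msg = json.loads(line)
        if "limit" in msg:
            # "track" is cumulative, so keep the highest value seen
            withheld = max(withheld, msg["limit"]["track"])
        elif "id" in msg:
            delivered += 1
    return delivered, withheld
```

      Feeding it four messages, two tweets interleaved with two limit notices, would report 2 tweets delivered and 9 withheld (the final cumulative `track` value).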

  14. Nov 2018
    1. 1,484,166 total views Share Facebook Twitter LinkedIn

      That's a lot of views, and now they are gone?

    1. donations from people like you

      taxes from citizens like you

  15. Sep 2018
    1. With the release of GPGMail 3.0 stable, we will start charging a small fee for GPGMail

      Notification of business model change.

  16. Jul 2018
    1. #### Parker Higgins ##### About Parker Higgins is an activist at the Electronic Frontier Foundation, working on issues of copyright, free speech, and electronic privacy. He also co-authors the weekly IP newsletter \[[Five Useful Articles](http://five.usefularticl.es/)\]. Follow him on Twitter at [@xor](https://twitter.com/xor).

      Hi Markdown, fancy seeing you here.

  17. May 2018
    1. Assange’s previously active Twitter account has had no activity since then

      Both @wikileaks and @julianassange accounts seem active right now (2018-05-16). But I guess the #ReconnectJulian campaign has taken them over?


    1. statistically

      We talked about dropping the word "statistically" since some signals may just be assertions "A was written by B" which in itself isn't statistical.

  18. Mar 2018
    1. Media professionals and everyday users found common ground in noting that at the very least, a news outlet should contact users to let them know that their tweets may be used in a story. As with Asian-American Twitter, both journalists and regular users in Black Twitter said a simple DM could open a line of communication between a reporter and a potential source. Jesse Holland, an Associated Press reporter, said he contacts users to verify tweets, which are essentially quotes. “Verification is probably the first and foremost thing,” Holland said. “Doing that means that you’re actually having a conversation, either by email or in person. It’s very rare that I would just take someone’s tweet and say, ‘This person said that.’” Initiating conversation with Twitter users equips reporters to provide accurate context by going beyond the metrics of what is being retweeted, and why. Simply searching for high retweets and “favorites” can link false narratives to Black Twitter via popular hashtags. For instance, the far right-wing account @prisonplanet had three out of four of the highest retweet counts in our data set, amassing just over 21,000 and 18,000 retweets for two tweets using the #notmypresident hashtag, which Black Twitter used to signal disdain for President-elect Donald Trump. The account gained an additional 13,000 tweets by linking to a video that the user claimed “would be devastating for #blacklivesmatter.” Verification of the identity and intention of users like this, preferably through conversation, is key to understanding the message that is being communicated through hashtags that gain traction on Twitter. Simply relying on Twitter trends to tell the story will not suffice.

      I think this applies to archivists too, because the content is being collected for long-term preservation.

    1. Any favorite examples?

      This is a great question! I'll look forward to reading the article :-)

  19. Feb 2018
    1. By “document” here, I mean to capture a comprehensive record, or at least a good approximation, of the present reality that can be consulted today and brought forward into the future.

      I find this idea of documenting "reality" a bit troubling. Archives serve purposes, and if we aren't explicit about these purposes, and instead simply talk about how effectively they document reality we're not doing our job.

      Lynch claims to be offering up "pragmatic" approaches several times in this piece. But the key measure of pragmatism (in the philosophical sense) is the degree to which something is useful. Documenting reality is not a use. What purposes does the documentation need to serve right now?

      I'm as guilty as anyone for pointing to the future and imagining some user who will want to know that something happened. But it's just not a satisfying story to tell.

  20. Dec 2017
    1. Over two-thirds of users were unsure whether Twitter gives public Tweets to the Library of Congress for archiving (and another 11.5% were incorrect). This raises questions about what Twitter users think happens to Tweets in the long term. It also raises questions about whether they are truly giving informed consent for this archiving. Finally, only a slim majority of users accurately indicated that Tweets are set to be public by default. Given the common refrain that Twitter is a “public” platform, having 33% of respondents indicate they are uncertain whether or not Twitter is public by default suggests some users may not actively perceive it this way. This raises many questions about the kinds of literacy work that needs to be done to improve user understanding of what it means for a platform to be “public.” Together, these individual findings suggest that the problems of inaccurate knowledge of information flow highlighted by these three anecdotes may be more common across a wider swath of users.

      This is a really important finding for people who are actively archiving social media, and Twitter in particular. It shows why archivists shouldn't throw consent out the window as it shifts to archiving "public" content on the web.

  21. Nov 2017
    1. Label bots as automated accounts. This is technically achievable and would increase transparency in online political conversation.

      It's interesting to think about who would do the labeling. On the one hand, Twitter could try to identify the bots themselves and label them as such; on the other, bot creators could identify an account as a bot. Maybe there could be two labels?

  22. Oct 2017
    1. If the fool would persist in his folly he would become wise.

      I sure hope this one is true!

    1. 6.4.4 for 'dns' scheme

      It would be interesting to look at what DNS records a tool like Heritrix puts into a WARC file.
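      For illustration, a DNS capture in a WARC file is (as I understand Heritrix's behavior — the exact header values below are invented for the sketch) a record whose target URI uses the dns scheme, with the resolved addresses in the body:

```
WARC/1.0
WARC-Type: response
WARC-Target-URI: dns:example.org
WARC-Date: 2017-10-05T12:00:00Z
Content-Type: text/dns
Content-Length: 62

20171005120000
example.org.  3600  IN  A  93.184.216.34
```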

    1. However, relatively little attention in the literature has been paid to articulating specifically how Web-based materials fit into this larger body of cultural heritage materials.

      I wonder if Rogue Archives: Digital Cultural Memory and Media Fandom by Kosnik could help answer this?

    2. Duncan and Blumenthal claim that a collaborative approach has been critical to the success of NYARC’s Web archiving efforts, allowing curatorial and appraisal effort to be spread across member institutions, and helping to meet a variety of Web archiving challenges, including technical difficulties and resource deficiencies. Rollason-Cass and Reed also cite the importance of collaborating across institutions to create and grow the #blacklivesmatter Web Archives. Duncan and Blumenthal suggest that similar trans-institutional collaboration could be encouraged through national organizations like the NDSA.

      Collaboration, or participatory archives could provide a useful framework for future work.

    3. Pearce-Moses and Kaczmarek

      This looks like an interesting article.

    4. For other kinds of collecting efforts, archivists often require donor agreements from previous owners of archival materials, expressly handing over control to the archival institution; however, it is often not feasible to gain the consent of copyright holders for Web-based materials due to the sheer scale of collecting and due to the fact that it may not be possible to locate the copyright holder in many cases.

      A real challenge for consent.

    5. Determining the scope, scale, intensity, and frequency of collecting, all constitute important appraisal decisions that shape the resulting Web archives.

      There is a decidedly academic archive/library feel to this paper--but it's not really clearly scoped that way.

    6. Rollason-Cass and Reed describe this approach as a Spontaneous Events model, or a Living Archives model, as these collecting programs respond and adapt to ongoing developments, offering the example of the #blacklivesmatter Web Archives at the IA.

      Does the concept of a Living Archives come from somewhere else?

    7. While it is difficult for archivists to measure the success of their appraisal activity,

      Indeed, what does success even look like for an archive?

    8. If everything from the past is saved, it becomes close to impossible to actually find significant materials.

      There is also the significance of wanting to forget material. Thinking about Mayer-Schönberger's Delete.


    1. In the below text, we extend these metrics to encompass dynamic graphs, as well as define some new metrics that are unique to dynamic graphs.

      Extending metrics work of Dunne & Shneiderman.

    1. One could, for example, imagine an honest business model – in which people paid an annual subscription for a service that did not rely on targeting people on the basis of the 98 data-points that the company holds on every user.

      This is why I've started paying Medium. At least they are trying. Twitter, are you listening?

    1. function get_resource_info(url) {
         ajax("HEAD", url, function(response) {
           if (response.status == 200) {
             $wmloading.style.display = 'none';
             var dt = response.getResponseHeader('Memento-Datetime');
             var dt_span = document.createElement('span');
             var dt_result = datetime_diff(dt);
             var style = dt_result.highlight ? "color:red;" : "";
             dt_span.innerHTML = " " + dt_result.text;
             dt_span.title = dt;
             dt_span.setAttribute('style', style);
             var ct = response.getResponseHeader('Content-Type');
             var url = response.responseURL.replace(window.location.origin, "");
             var link = document.createElement('a');
             // remove /web/timestamp/ from appearance
             link.innerHTML = url.split("/").splice(3).join("/");
             link.href = url;
             link.title = ct;
             link.onmouseover = highlight_on;
             link.onmouseout = highlight_off;
             link.setAttribute('style', style);
             var el = document.createElement('div');
             el.setAttribute('data-delta', dt_result.delta);
             el.appendChild(link);
             el.append(dt_span);
             $capresources.appendChild(el);
             if (dt_result.highlight === true && show_warning_icon === false) {
               display_warning_icon();
             }
             // sort elements by delta in a descending order and update container
             var items = Array.prototype.slice.call($capresources.childNodes, 0);
             items.sort(function(a, b) {
               return b.getAttribute('data-delta') - a.getAttribute('data-delta');
             });
             $capresources.innerHTML = "";
             for (var i = 0, len = items.length; i < len; i++) {
               $capresources.appendChild(items[i]);
             }
           }
         });
       }

      A little function that uses a HEAD request to the Wayback Machine to determine the time gap between an archived web page and its constituent parts.
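      A rough Python equivalent of the delta computation, assuming the `Memento-Datetime` header uses the standard RFC 1123 date format; the `head_memento_datetime` helper is a hypothetical sketch requiring network access:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
import urllib.request

def memento_age_days(memento_datetime, reference=None):
    """Days between a Memento-Datetime header value
    (e.g. "Wed, 27 Sep 2017 12:00:00 GMT") and a reference time."""
    captured = parsedate_to_datetime(memento_datetime)
    reference = reference or datetime.now(timezone.utc)
    return (reference - captured).days

def head_memento_datetime(archived_url):
    """HEAD an archived resource (e.g. a /web/<timestamp>/<url> capture)
    and return its Memento-Datetime response header."""
    req = urllib.request.Request(archived_url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Memento-Datetime")
```

      Comparing the age of a page's capture against the ages of its embedded resources' captures is what lets the original function flag (in red) parts archived far apart in time.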

    1. The Twitter account, @Blacktivists, provided several clues that in hindsight indicate it was not what it purported to be. In several tweets, it employed awkward phrasing that a native English speaker would be unlikely to use. It also consistently posted using an apostrophe facing the wrong way, i.e. "it`s" instead of "it's."

      Well that`s interesting.

  23. Sep 2017
    1. a pre-study we conducted to select the pair of least diverging topics

      This was an important part of the study. To make sure that the story didn't bias the results that were supposed to be about the graphics.

    2. Amazon Mechanical Turk(AMT).

      Do they have any idea who these people are?

    3. Although we never intended to test all of them—our goal was to assess whether anthropographics generally have an effect on empathy and donating behavior, not to test for most effective designs—and although we do not claim they are exhaustive, we wanted to get a sense of the creative possibilities.

      A more interesting experiment might have been to compare all these variations?

    4. Second, because we believed using only proportional data—instead of using proportions and absolute numbers—would help avoid possible confounds linked to the proportion dominance bias

      Couldn't there be other biases built into the stories?

    5. standard chart(baseline)

      What is a standard chart?

    6. It is sometimes unclear whether they share the same definition of empathy

      This is important, and is directly related to how empathy is measured, since this is an empirical study.

    7. This complements Bateman et al.’s call to learn more about the effects of different types of visual embellishment in charts [6], and opens new perspectives for exploring the benefits of anthropographics.

      Isn't anthropographics a bit of a redundant term? All graphics are meant for humans aren't they? What other assumptions is this anthropological approach cooking into it?


  24. Aug 2017
    1. Ed Summers, a software developer at the Maryland Institute for Technology in the Humanities, graciously offered to grab some basic information about the more than 11,500 suspected new bot followers that were still following my account earlier this morning. An analysis of that data indicates that more than 75 percent of the accounts (8,836) were created before 2013 — with the largest group of accounts (3,366) created six years ago.

      It's nice to get a shout out from Mr Krebs.

    1. Google Scholar supports Highwire Press tags (e.g., citation_title), Eprints tags (e.g., eprints.title), BE Press tags (e.g., bepress_citation_title), and PRISM tags (e.g., prism.title). Use Dublin Core tags (e.g., DC.title) as a last resort - they work poorly for journal papers because Dublin Core doesn't have unambiguous fields for journal title, volume, issue, and page numbers.

      It looks like Google Scholar look for a variety of metadata.
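      A minimal, invented sketch of what those Highwire Press tags look like in a page's <head> (all values are made up for illustration):

```html
<meta name="citation_title" content="An Example Article on Web Archiving">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_publication_date" content="2017/08/24">
<meta name="citation_journal_title" content="Journal of Examples">
<meta name="citation_volume" content="12">
<meta name="citation_firstpage" content="101">
```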

  25. Jul 2017
    1. While innovation — the social process of introducing new things — is important, most technologies around us are old, and for the smooth functioning of daily life, maintenance is more important.

      This seems so obvious, but it's so overlooked. The addiction to constant growth and newness seems so closely tied to our ideas of how markets operate. Consumption and waste trump conservation and repair.

    1. One strategy that UTL employed in collaboration with project partners to address challenges of agency, differential access to resources, and the most direct application of benefit was very deliberate transactional use of project funding. Rather than assume transfer of documentation to UTL—either through donation or purchase—as required under the custodial paradigm, UTL instead helped to arrange and purchased negotiated access to documentation that remained in the custody or control of the partner organization. Project funds were put toward the arrangement, description, preservation, and digitization of documentation, just as they would have been if the archival materials were at UTL. But the investments were made not in Texas, but locally with the partner organizations. In this way, the partner organizations and in some cases communities were able to build infrastructure and skills in digitization, metadata, software development, and preservation appropriate to the context of their organizational goals and uses of the documentation. And in two cases at least, the human rights organization developed significant local expertise that served them well beyond their partnership with UTL. Additionally, rather than acquire the original records themselves—as called for under the custodial paradigm—UTL sometimes purchased digitized copies of documentation or gained non-exclusive access to documentation as they and partners made it available online. Though somewhat unusual for a custodial archival repository, this system was very familiar and comfortable for UTL as an academic library that annually spent hundreds of thousands of dollars for access to databases.

      I love this logic for post-custodial investment in record keeping infrastructures which draws the comparison to the way we pay for access to information that we do not own, but which instead only lines the coffers of corporations.


  26. Jun 2017
    1. Designers often manipulate the circle visualization that purports to track app-download progress, front-loading it so that it moves slowly at first but then speeds up at the end. This allows the download to please us by seeming to beat our expectations, which were established by the contrived slowness.

      It would be interesting to learn the source of this.

    1. Many public libraries have active local history collections of print materials documenting their region. These materials, however, are now increasingly published exclusively online. But technical hurdles, the absence of training resources on web archiving for local history collection development, and the lack of an active network of peer practitioners have hindered the capacity of public libraries to expand into community-focused web archiving.

      It's not just a technical problem. How do librarians and archivists decide what to collect from the web? Will people come and ask to donate their content? Should they focus on public domain government material? How can they adapt existing collection development policies to meet the web?

    1. Hans Ulrich Obrist and the artists Philippe Parreno and Olafur Eliasson all used the same word to describe his oeuvre: it’s a “toolbox”, they said, from which they can pluck useful ideas.

      Huh, a pragmatist perhaps?

    2. Morton means not only that irreversible global warming is under way, but also something more wide-reaching. “We Mesopotamians” – as he calls the past 400 or so generations of humans living in agricultural and industrial societies – thought that we were simply manipulating other entities (by farming and engineering, and so on) in a vacuum, as if we were lab technicians and they were in some kind of giant petri dish called “nature” or “the environment”. In the Anthropocene, Morton says, we must wake up to the fact that we never stood apart from or controlled the non-human things on the planet, but have always been thoroughly bound up with them.

      This idea that we're all Mesopotamians strangely reminds me of Phillip K Dick's idea that the empire never ended, but in this case the one from Iraq, not Rome.

    3. He says that we’re already ruled by a primitive artificial intelligence: industrial capitalism.

      This is the kind of idea that you can't unthink once you've thought it.

    1. \\

      In MARCBreaker format a backslash is used instead of a space.
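      As a hedged illustration (the record itself is invented), a couple of fields in MARCBreaker's mnemonic format, with `\` standing in for blanks:

```
=245  10$aAn example title :$bwith a subtitle /$cJane Doe.
=260  \\$a[Place not identified] :$bExample Press,$c2017.
```

      In the 260 field, the two backslashes mark both indicator positions as blank.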

  27. May 2017
    1. “Tastes, Ties and Time” dataset comprising four years’ worth of complete Facebook profile data harvested from the accounts of cohort of 1,700 college students.

      It's significant that the data is called a dataset which is a collection of data pulled from a website. In this aggregated form it affords different types of analysis. Sharing this dataset on the web takes the 4 years of work and makes it instantly available to anyone else in the world.

    1. Second - Researchers affiliated with an academic institution accredited by a member of the Council for Higher Education Accreditation remain able to share an unlimited number of Tweet IDs for non-commercial research purposes, subject to all of the other provisions and rules of the Developer Policy and Agreement.

      This is super news!

    1. So I’m going to appropriate the term to put a label on a few ideas.
    2. Firstly, there’s more that could be done to build better ways to deep link into pages, e.g. to allow sharing of individual page elements. But people have been trying to do that on and off for years without much visible success. It’s a hard problem, particularly if you want to allow someone to link to a piece of text. It could be time for a standards body to have another crack at it. Or I might have missed some exciting process, so please tell me if I have! But I think something like this would need some serious push behind. You need support from not just web frameworks and the major CMS platforms, but also (probably) browser vendors.

      What does success look like? You can, for example, link to this paragraph using a Hypothesis annotation.


      It would be nice if annotation was built into browsers somehow so annotation wasn't so dependent on a particular service. Maybe we'll get there someday. Or maybe we just need to use the tools that already exist more?

    1. Firstly, there’s more that could be done to build better ways to deep link into pages, e.g. to allow sharing of individual page elements. But people have been trying to do that on and off for years without much visible success. It’s a hard problem, particularly if you want to allow someone to link to a piece of text. It could be time for a standards body to have another crack at it. Or I might have missed some exciting process, so please tell me if I have! But I think something like this would need some serious push behind. You need support from not just web frameworks and the major CMS platforms, but also (probably) browser vendors.

      Hypothesis annotation lets you do this, try this URL https://hyp.is/A9qIOj7dEeef5tPxZEUiYw/blog.ldodds.com/

      It would be nice if annotation was built into browsers somehow so it wasn't so dependent on a particular service. Maybe we'll get there someday.

    1. Facebook activity, if concentrated strategically, could be influential. Was the activity mostly in swing states? Did it occur in the months of the Republican primaries and originate with accounts seeded from Russia? Or did fake-news and fake- account activity peak in the three days before the election?

      Answering these questions is of great public interest. The question about whether activity clustered in swing states is of particular interest because it would suggest a clear motive to influence the election and not simply generate clicks.

      I'm guessing it is possible to see if a Facebook account was created by someone with a Russian IP address, and if a post came from a Russian IP address. But I think it's important to remember that this doesn't necessarily mean the fake news campaigns are an act of the Russian government. The posts could very well be the product of a business enterprise that is making money by generating fake news for clients. The question then would be, who are the clients? Follow the money.

    1. For him, like most of us, e-mail is a “habitat” rather than an application (Ducheneaut and Bellotti 2001),

      This could be an interesting lead to think about in terms of appraising social media content.


    1. Whereas public archives should be appraised and preserved for both evidential value and informational value, private manuscripts do not possess evidential value and are preserved only for their informational or research value, or their potential for use in research.

      Doesn't this presume that public records are not appraised? The reality is that not everything is kept, and some are selected -- just as is the case with personal archives.

    2. Most acquiring archives engage in some selection and arrangement that would threaten the “archive character” of the fonds.

      The same is true of non-personal records too isn't it?

    3. Jenkinson accordingly was suspicious of the practice of acquisition, where an archives acquired a fonds created by another individual or organization, of which he observed: “Turning to the other kind of Archives, that of documents written originally by one person or body and preserved by another, we have not of course the same guarantee against forgery or tampering, because there are now two sides involved and either may have a motive for deceiving the other.”

      ... such devotion to the institution, as if individuals working in government didn't alter the record themselves ...

    4. His Creed, the Sanctity of Evidence; his Task, the conservation of every scrap of Evidence attaching to the Documents committed to his charge; his Aim to provide, without prejudice or thought, for all who wish to know the Means of Knowledge.

      reminds me so much of Open Data mantras


  28. Apr 2017
    1. They already police their networks for pornography, and quite well.

      I'm no expert, but is this really the case?

    1. their ia_archiver web crawler consults a publisher’s robots.txt to determine what parts of a website to archive and how often

      I've since heard from several people that the Internet Archive does not respect robots.txt when crawling at all, and that the robots.txt is only consulted when deciding what archived content to make available in the Wayback Machine.

      I've never actually looked at my logs closely enough to confirm this, I guess because I've never actually told ia_archiver to go away either ...
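      For reference, the conventional robots.txt stanza that addresses the Internet Archive's crawler looks like this; whether it blocks crawling or only Wayback Machine playback is exactly the ambiguity noted above:

```
User-agent: ia_archiver
Disallow: /
```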

    1. In identifying the need to shift from post-modern theory to archival performance, Schwartz brings diplomatic and photographic theory together to demonstrate the archival values of one specific form—of photographs

      Archival performance -- I like the sound of that, and how it could relate to performativity more generally.

    2. With the myths of the simplicity of the format-neutral world beginning to unravel, archivists can now grasp the complexities of debates about form and practice that new formats, in particular electronic records, are stimulating. Ultimately, the advent of the special format of electronic records has the potential finally to eradicate the pestilence of the traditional format-neutral stance from archival thinking and practice

      Strong words! So can we even talk about an "archival science" after this?

    3. the critical characteristic is that a record has to be linked to doing something – it is inherently transactional in its nature

      love this focus on "doing something" does that come from Jenkinson, or somewhere else?

    4. The movement from an archive to a collection is characterised by a change in the unity and coherence which is derived from the collector who ‘‘constructs a narrative of luck which replaces the narrative of production’’ (Stewart 1984, p. 165).

      It's interesting to see the archive put in conversation with collections like this. It reminds me of some discussions about web archives and whether they in fact had more to do with collections than archives.

    5. However, many historians, including the late Raphael Samuel who considered archivists and librarians as the ‘‘Poor Bloody Infantry of the profession’’ (Samuel 1994, pp. 18–19), have yet to understand the active role that archivists and librarians play in ‘pre-cooking’ the raw materials of history (Elkner 2003, p. 55).

      Trouillot too.

    6. purportedly format-neutral approach to archival research and practices

      reminds me of ricky's position on needing to specialize in different formats -- the ideas need not necessarily translate across formats?


    1. A return to the older notion of “multiplying the copies” may make more sense: such copies will not, by definition, be unique, but will that make any difference?


    2. Unlike manuscripts, printed documents, photographs, and other traditional forms of records, electronic records have no material existence—at least none that can be perceived without the intervention of both hardware and software

      Makes me think of Kirschenbaum's Mechanisms.

    3. With tongues deep in their cheeks, archivists might try to assert that this represented a unique assemblage of information, but as an image constructed deliberately to lie, to misinform (“disinform,” perhaps), does it have value? The assumption that uniqueness is a positive quality in records—keep the information that is unique and disregard that which is not—is thus under serious attack.

      Reminds me of fake news.

    4. One medieval historian, for example, has estimated that a book of laws, compiled from other sources in ninth-century Italy, cost the equivalent of ninety-six two-pound loaves of bread, a staggering sum for the time.20

      Haha, how random. A book of laws sounds kinda voluminous though.

    5. Such diversity of opinion in specifying what the idea of uniqueness really means may indicate that, like many other archival ideas, this one is clearest if one has in mind a very narrow range of archival materials.

      This is really interesting, so the nature of the materials being considered shapes the type of uniqueness that is considered?

    6. Writers who approach uniqueness in this way have taken a step back from both the documents themselves and the information in them, emphasizing instead the processes that generate both.

      This makes me wonder about the approach we are taking in DocNow to appraisal. What are the ways that we can provide insight into the processes rather than the documents themselves? Perhaps the DocNow application itself is an embodiment of a macro-appraisal process?

    7. What have we meant by them in the past, and how have those meanings, which we encounter as fixed absolutes, evolved through time?

      Tracing this evolution over time does seem like a valuable thing to do. To unpack these assumptions that we kind of take for granted.


    1. This level of institutional mediation in providing access to cultural heritage information supports what Bourdieu has termed the “hierarchy of genres.” Within the fields that facilitate the production of culture, the symbolic production of art and literature is defined by their institutional treatment. This status creates a hierarchy of genres within each field that has been debated from Plato to the nineteenth-century Salons of Paris. The present-day translation of Bourdieu’s hierarchy as it applies to the field of art manifests in the digital environment, where the most important creators and artistic genres are reproduced online at an extremely high frequency (e.g., images of the Mona Lisa, paintings by Picasso, etc.), while works lower in the hierarchy may require more specific search terms. Within the digital environment the hierarchy is expressed through metadata.

      This seems like an interesting connection: the hierarchy of genres and metadata being considered together. The primacy of metadata in the digital preservation field in part speaks to the role that text plays in archival systems. Perhaps this may be supplanted by other forms of image reading, e.g. by machine-learning, etc.

    2. In physical form, there exist “complex problems with the relationship of physical structure, intellectual integrity, and the representation of spatial hierarchy,” which are eliminated or left out in digital form.

      Don't these relationships exist in the digital as well -- they are just different, and reflect the digitization practices.

    3. Therefore, an implication of digitization of which archivists should become aware is the loss of physicality and the material information that supports archival value.

      This seems like an important but somewhat obvious observation. It kind of reminds me of Benjamin and the idea of aura.

    4. Through an exploration of a work’s materiality by considering the evidence of its manufacture, as well as the work’s origins, history and social existence, it becomes apparent that there is a distinct separation of form and content that leads to a further consideration of the object as a document.

      What is going on here?

    5. She questions how much longer textual models can be applied to visual materials with impunity, and suggests that it is necessary to reach outside of the archival discipline in order to improve the standard approaches to the processes of appraisal, arrangement, and description of visual materials

      I wonder if some of these ideas could apply to multimedia as well.

    6. The problem with Taylor’s assessment techniques is that he is determining the value of a work of art based on content alone, and ignoring the object’s material qualities.

      What does this mean?


    1. Opting out of a site like Google would mean opting out of much of online life.

      It's doable.

    2. You should be able to stop the malware on your refrigerator from posting racist rants on Twitter while still keeping your beer cold.

      OMG, this is sci-fi right?

    3. is the world’s de facto email server


    4. This year especially there’s an uncomfortable feeling in the tech industry that we did something wrong, that in following our credo of “move fast and break things”, some of what we knocked down were the load-bearing walls of our democracy.

      One sentence that sums up so much. Maciej is such a fine writer.

    1. One avenue of future research is the dynamic relationship between the people who work in digitization factories and the machines and materials of their labor.

      Reminds me of Sarah Roberts work on Commercial Content Moderation (CCM). https://illusionofvolition.com/behind-the-screen/

    2. University College London scholar Geoffrey Yeo (2009, p. 59) pinpoints the source of ‘‘secondary provenance’’ in the material–custodial history of archival collections. He argues that the interpretation of records is affected by the ‘‘previous selection and aggregation decisions’’ taken by both creators and custodians.

      It's interesting to think about the moments when creator and custodian roles overlap.

    3. Thomassen frees the analysis of the archives to encompass information resources that are not or never have been a part of a formal archive, including digital surrogates.

      It's interesting that Conway finds something new in this piece by Thomassen, who was just trying to give an overview and not posit new things about the archive. It sounds like Thomassen could be an interesting read.

    4. Generations of archivists, beginning with Sir Hilary Jenkinson (Ellis, p. 197), have rejected the archival nature of surrogates, considering them ‘‘artificial collections’’ at least one step removed from the original source and therefore subject to even stricter tests of authenticity and reliability (Smith 1999).

      Ahh, so this is the counter-position that motivates the argument.

    5. The distinction between digitizing for access and digitizing for preservation, so deeply embedded in the professional perspectives of archivists and librarians, is artificial and misleading. In the digital world, access is the natural and obvious outcome of digital transformation, even if access is fully realized only through functioning electronic networks and the legal frameworks that manage permissions.

      Preservation is access, in the future. Apologies to @dbrunton ...

    6. This article will argue—and provide some supporting evidence—that one of the most significant requirements for the long-term care of collections of digital surrogates is to respect these collections as archives in their own right, worthy of management, and maintenance as a record of their creation, organic existence, and use

      Did anyone ever argue otherwise?


    1. when we reflect on the core of digital libraries weeasily observe that they may be libraries by name, but they are archives bynature.

      This is a great way of summing up why archives matter on the web. I ran across it in Paul Conway's https://link.springer.com/article/10.1007/s10502-014-9219-z

      Aside: It's kind of funny how the conversion from PDF to text transformed "we easily" to "weasily"

    1. All résumé writers in this corpus tended to use certain types of subcategories of adjectives more than others when constructing their texts.

      Is it statistically different than the use of adjectives in other types of written or spoken language? Is an assumption being made that all these categories should be equal?

    2. Noun Subcategories

      how was this conceptual grouping of words done?

    3. The three most commonly used parts of speech relating to content words in the résumé corpus (1,631 tokens) were nouns, verbs, and adjectives,

      this is not unusual I take it?

    4. same set of communicative purposes


    5. Bawarshi (2003) states: “There are the various social forces that constitute the scene of production within which the writer’s cognition as well as his or her text are situated and shaped” (p. 5) and that “genres function [for] their writers, readers and contexts” (p. 8). Thus, in addition to understanding the linguistic forms of a language, writers need also to recognize the social functions of a text before constructing it.

      The function of text is important for being able to construct it. This is context. It's funny how 'text' is in the word 'context'.

    6. A text type can be considered a genre if there is sufficient repetition for the readers and writers to recognize a text as a systematic example

      importance of repetition to defining a genre


    1. If relationships between mainstream archival repositories and the community archives of the near future are to be mutually beneficial, then they must be based on mutual respect, flexibility and sensitivity towards the concerns of community archive groups for autonomy and ownership.

      Some good lessons here, for archives not just of Occupy but communities of all kinds.

    2. [t]he relationship between sustainability and autonomy is one of the most important dilemmas...for community archives’

      This is a significant observation ... that the two are interrelated.

    3. Bold argues that the purpose of the group ‘should NOT be merely in collecting the objects we determine have worth in the history of the movement, but trying to offer a framework in which everyone can define their history’.

      Good aspiration, but kind of vague really. I much prefer the way @Carter:2017 positions the archive as an instrument of power for these groups.

    4. Just as wide participation is important in the Occupy movement itself as a means of organizing without leaders or hierarchies, so too participation in the community archives of the movement is a means of constructing archives without hierarchical relationships between professional archivists and a non-professional user community.

      While there may not be a hierarchical relationship there is still a relationship. The nature of this relationship would be useful to investigate further.


    1. While Duranti’s argument centres around the idea of the formal institutional archive as the site of archival power, the idea of ‘archives as a place’ is arguably even more valid for community groups who look after their own records. For these groups, it is the autonomy and freedom of owning their own records and independent space which grants the archive ‘recognition and empowerment.

      This is such a crucial observation, and nicely rehabilitates/humanizes Duranti's ideas about provenance and power.

    2. Instead, there is an implicit understanding that resources are there to be used; the ephemerality that surrounds the archive serves as an invitation for readers to explore the collections

      Love this...

    3. This notion of ‘marking out a space’ is interesting in relation to the archive, as it reinforces the political nature of archival activity as rooted in the struggle for independence; with the emphasis firmly on carving out identity, culture and space.

      It's interesting to note the scale between Duranti's notion of archive as place, and the places that are being described here. The difference in power dynamic is significant, but in both cases the archive empowers.

    4. The archive serves a role as a ‘future-past’, a way of using history to look forward to a more hopeful future. For Southwark Notes, the archive has to serve this more practical, politically engaged function: ‘if we’re not inspired into the future, then what’s the point? [... if the archive] doesn’t collectivise the people coming together, in a way it’s failed in its historical task’

      This is a great quote!

    5. put it out there on the web and let people make of it what they will

      Access & Transparency

    6. ‘the rebellion of the archivist against his normal role is not [...] the politicising of a neutral craft, but the humanising of an inevitably political craft.’

      So not a rebellion at all really -- just woke, as it were.

    7. as evidenced by the recent special issue of Archival Science focusing entirely on archiving activism and activist archiving, as well as the ethnographic research carried out by Susan Pell, in which she discusses how housing activist groups use archival power as an ‘enabling force’ in knowledge production

      All good things to read!


    1. I would argue, then, that the DIY institutions of this study are not simply storage sites of popular music’s material past. Rather, DIY institutions of popular music are, like other types of community archives, ‘a mediated social and cultural practice’, in which memory is mobilized to engage with the present and work towards an understanding of the future

      I like this definition. What does it have in common with non-DIY archives? Is there such a thing as a non-DIY archive?

    2. These grassroots archives, museums and halls of fame, sustained as they are by the enthusiasm of volunteers, are indicative of a community-based desire to control how popular music’s past might come to be remembered. These are communities of consumption with a deep investment in the culture being collected and significant concerns over the way music’s past will appear for the future.

      What would you call this mode of thinking? Is it archival thinking? I almost wanted to say teleological but that's not right.

    3. The notion of the DIY institution is closely related to Andrew Flinn and colleagues’ conceptualization of ‘community archives’. Flinn’s framing of the community archive informed the ways in which institutions were categorized as ‘DIY’ in the database.

      It's probably important to follow up on this work of Flinn's in the context of the Documenting the Now grant.

    4. ‘DIY institutions’

      It's interesting/amusing to consider what DIY has in common with the "lone archivist" scenario.

    5. site observations and 125 semi-structured ethnographic interviews with founders, volunteers and other heritage workers,

      Holy fuck that's a lot of interviews.

    6. DIY institutions are akin to community archives; what Flinn, Stevens and Shepherd refer to as ‘any collection of material that documents one or many aspects of a community’s heritage, collected in, by and for that community and looked after by its members’

      This seems like a useful definition that isn't overburdened with too much archival terminology.


    1. The approach also decentralises curation to the participants who both contribute to and use the archive. It also allows communities and contexts to take form within the archive instead of assuming a community as a precoordinated entity.

      But the system selected for building the archive comes with a lot of precoordinated ideas and functionality doesn't it?

    2. Participatory archive assumes no consensus on order, no first order of order (Weinberger 2007, pp. 17–19), just the necessity of keeping information findable.

      This goal kind of reminds me of MPLP.

    3. In wiki systems, the most significant archival functions-related problem was the unstructured nature of the records and the resulting difficulty of maintaining the integrity of an archive.

      It's interesting that lack of structure is desirable and a drawback...

    4. number of digital library-related software packages were designed to house a collection of books or other published digital material. The systems did not accommodate storing archival records or archaeological data.

      It would be interesting to dig into this a bit more. What functionality was lacking?

    5. Even though the informants agreed with the benefits of standardised descriptions, they also underlined the need to be able to insert data and descriptions as they are. Conventions vary between different scientific disciplines and some preliminary and new observations may not be formal enough to fit in a standardised descriptor format. Similarly, it was agreed to be more important to capture as much relevant information as possible than to strictly enforce a formal descriptive scheme.

      This is an important observation I think -- and one that has a lot of ramifications.

    6. Thus, the purpose of the digital archive is to reunite the collection and make these scattered materials available in a single location on the web.

      Ricky's dissertation topic -- archival reunification.

    7. Qualitative document analysis

      A research methodology being used to analyze archival materials. How common is this?

    8. Traditionally, the principal group of users has been researchers. Researchers, primarily with social science background, are still in the majority, but a significant portion of visitors are at the moment people looking for information instead of data.

      Is this an easy distinction between data and information, in the archival context?


    1. Though Prince had no formal role with the Trump campaign or transition team, he presented himself as an unofficial envoy for Trump to high-ranking Emiratis involved in setting up his meeting with the Putin confidant, according to the officials, who did not identify the Russian.

      There's only one Prince. Blasphemy.

  29. Mar 2017
    1. but the latter is just as important to sustained success

      What does success even mean in this context? And what does failure look like? Are these useful ways of framing the work that libraries do when you are doing broken-world thinking? As Jackson points out, failure is the site of repair, and of innovation. It is a beginning, and an opportunity to reinvent.

    2. In a sense, bibliographic vocabularies have always been "broken".

      It's important to come to terms with this :-) I kinda wish the whole article was about this sentence.

    3. What does broken world thinking mean for vocabularies? Recognition of the fragility of current systems and preparation for inevitable breakdowns; building maintenance functions directly into tools, workflows, and budgets, and including documentation, preservation, and terms of use from the moment projects are conceived.

      I think it also means surfacing and showcasing this maintenance work as Wikipedia does. Not just the vocabulary designers but the people who use the vocabulary to describe things. Who are the people behind our records?

    4. Note, however, that the principle of vocabulary reuse is not universally valid or held.

      It's also a false dichotomy IMHO. You can do both.

    5. What allows MARC to endure, even while ostensibly better options exist is that MARC is still the basis for the Library Management Systems widely in use. Library vendors have been slow to look at new formats for data and integrate them into their offerings, largely because they represent a significant investment in an environment shrinking rapidly by vendor mergers and acquisitions.

      It's interesting to think of the resistance to move from MARC as a function of the maintenance work that goes into doing description ... as opposed to doing standards work.

    6. The result has been that most activity based on shifting to more Semantic Web practices focuses almost entirely on exposing data on the Web, rather than figuring out how to use data provided by others. This reality has tended to limit creation and distribution of vocabularies not part of the traditional practices of library description.

      It's also a question of dependencies. There are risks associated with making your own data workflows dependent on those of another organization.

    7. Moreover, some of the most consequential innovations involve new workflows for seemingly mundane tasks. Consider the profound impact of Git, for example, on distributed version control or Slack on team messaging and file management.

      And what could be more mundane than standards work, which is where efforts around bibliographic description tend to focus so much of their attention. I wonder though if standards work is the opposite of mundane in the bibliographic world. It is a site of political maneuvering, and where the stakes are the highest.

    8. We suggest that a lack of appreciation for broken world constraints leads to unbalanced priorities, where investments flow largely to novelty projects and prototypes, and insufficiently to persons and protocols that enable maintenance and repair, and thus sustainable innovation.

      This is an important insight for managers of data infrastructures. The trouble with maintenance work is that it is often invisible until it breaks. How can data maintenance work be made more visible?

    1. As part of the process, the software recognised which parts of a page were pictures in order to discard them. Mr Leetaru's code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format.

      Using the OCR data (as represented in the epub) to extract images from images of book pages.
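A minimal sketch of the idea described above: reuse the OCR engine's region classification (which labels picture areas so it can skip them) to recover crop boxes for the pictures themselves. The region dictionaries and field names here are hypothetical stand-ins for whatever the OCR output actually provides.

```python
def picture_crop_boxes(regions, padding=5, page_size=(1000, 1500)):
    """Return (left, top, right, bottom) crop boxes for regions the
    OCR pass classified as pictures, padded and clamped to the page."""
    width, height = page_size
    boxes = []
    for region in regions:
        if region["type"] != "picture":
            continue  # keep only the areas the OCR program ignored
        left = max(region["x"] - padding, 0)
        top = max(region["y"] - padding, 0)
        right = min(region["x"] + region["w"] + padding, width)
        bottom = min(region["y"] + region["h"] + padding, height)
        boxes.append((left, top, right, bottom))
    return boxes

# Hypothetical OCR region data for one scanned page.
regions = [
    {"type": "text", "x": 50, "y": 40, "w": 600, "h": 200},
    {"type": "picture", "x": 100, "y": 300, "w": 400, "h": 350},
]
print(picture_crop_boxes(regions))  # → [(95, 295, 505, 655)]
```

Each box could then be handed to an image library's crop call and written out as a separate JPEG, one file per picture.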

    1. Yeah, I think that’s one of the struggles with this case. It has gotten an extraordinary amount of attention, which Jocelyn can share how that has really energized her father and energized her family. The amount of people that are fighting back. In some ways, that can have the kind of deterrent effect that we know that Donald Trump wants. We know that he wants, as we heard in some of the previous stories, to make people afraid of coming to the United States. To make this country unwelcoming of immigrants. And a case like this getting the kind of attention that it’s got can have that effect, unless we use this case to fight back. Unless we refuse to be desensitized to the kind of horror and trauma that we see in the video that Fatima shot. Unless we use that as an opportunity to mobilize and to fight back and to say, “Not in our backyards, not in our cities, not in our community, not in our country.” We’re going to not be complicit in this kind of brutal immigration enforcement actions from this administration or any administration.

      This is a super important point re: the Trump administration's psychological manipulation of the media.

    1. The majority of video archiving solutions today focus on just the default version of a video or the highest resolution version, rather than attempting to archive all editions of a stream.

      Is this a problem, especially in a world where it's difficult to archive video because of storage constraints?

    2. While numerous utilities exist that are able to reverse the streaming protocols used by major video hosting sites, the sites themselves rarely offer officially sanctioned APIs for bulk downloading large volumes of their content as raw video source files.

      Utilities like (the amazing) youtube-dl which works with a lot more than just YouTube.
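For reference, youtube-dl already exposes the pieces needed to capture more than the default rendition: `--list-formats` enumerates every stream the site offers, and `-f` with an output template keeps each edition in its own file. A minimal sketch (the URL is a placeholder, and whether a site exposes multiple renditions varies):

```shell
URL="https://example.com/watch?v=VIDEO_ID"   # hypothetical placeholder

# 1. Enumerate every rendition the hosting site exposes for the video.
youtube-dl --list-formats "$URL"

# 2. Fetch a specific rendition by its format code (e.g. 137), naming
#    the file so different editions of the same video stay distinct.
youtube-dl -f 137 -o "%(id)s.%(format_id)s.%(ext)s" "$URL"
```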

  30. Feb 2017
    1. Deciding what, exactly, is worth archiving is largely left up to the commander in chief, said Sharon Fawcett, assistant archivist for presidential libraries at the National Archives and Records Administration from 1969 to 2011.

      largely eh?

    1. Almost immediately, this method of traditional, hierarchical arrangement broke down as the university began its relentless shifting of administrative reporting lines.

      What does this breakdown actually look like in an archive?

    2. Robyns borrowed the following elements of the Canadian model to help appraise the relative importance of office functions

      shows that they are consulting the research literature -- one of Ricky's pet peeves when they do not...


    1. Archivists have come to acknowledge and participate in such documentary activities, but a professional consensus has not emerged about their legitimacy or necessity as a regular part of the responsibility of any institutional archivist.

      This is an interesting aspect to the work of the archivist. Have things changed in the almost 50 years since this was written? Is it possible that other professions have absorbed archival thinking: law, humanities, etc?

    2. Rather than asking what exists, the question should be what is the value of the available information to provide evidence about the phenomenon.

      This seems like a key question to ask.