53 Matching Annotations
  1. Mar 2025
    1. Page not found This question was removed from Stack Overflow for reasons of moderation. Please refer to the help center for possible explanations why a question might be removed.

      A link leads to this page. I want to see what was here before.

      1. This is too generic an error message! Why was it removed?
      2. I assert that it would be better to keep it around than to delete it and prevent people from enjoying the content that was found there. Simply deleting the question, its answers, and everyone's comments and contributions is very heavy-handed and unfair to those who still want the content.

      Here is a snapshot, but crucially, the "next page" and "show more comments" links are broken: https://web.archive.org/web/20101008061929/http://stackoverflow.com/questions/164432/what-real-life-bad-habits-has-programming-given-you/164556
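
      The snapshot link above follows the Wayback Machine's usual URL layout: a 14-digit capture timestamp followed by the original address. A minimal sketch of pulling the two apart, assuming that layout and assuming the timestamp is in UTC (the function name `split_wayback_url` is made up for illustration):

```python
from datetime import datetime, timezone

WAYBACK_PREFIX = "https://web.archive.org/web/"


def split_wayback_url(snapshot_url):
    """Split a snapshot URL into (capture time, original URL).

    Assumes the plain form https://web.archive.org/web/<YYYYMMDDhhmmss>/<url>;
    timestamps carrying a modifier suffix are not handled in this sketch.
    """
    if not snapshot_url.startswith(WAYBACK_PREFIX):
        raise ValueError("not a Wayback Machine snapshot URL")
    rest = snapshot_url[len(WAYBACK_PREFIX):]
    # Everything up to the first slash is the timestamp; the rest is the page.
    timestamp, _, original = rest.partition("/")
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    # Assumption: capture timestamps are expressed in UTC.
    return captured.replace(tzinfo=timezone.utc), original


if __name__ == "__main__":
    when, page = split_wayback_url(
        "https://web.archive.org/web/20101008061929/"
        "http://stackoverflow.com/questions/164432/"
        "what-real-life-bad-habits-has-programming-given-you/164556"
    )
    print(when.isoformat(), page)
```

      Knowing the original address makes it easier to look for other captures of the same question, which may have working pagination.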

  2. Feb 2025
    1. Browser add-on: Save Page WE (Firefox / Chrome). A Firefox/Chrome add-on that is lighter than the web-recorder mentioned below and that worked well for a subset of use cases. Configurable, flexible, and can optionally scroll pages in order to retrieve lazy-loaded content. It inlines images, scripts, fonts, etc. as data URLs, producing a single big standalone HTML file.
    2. It's not possible to do this with many websites these days. And for sites that seem like it's possible, it would still require some JavaScript experience for reverse-engineering and "fixing" the scripts that are saved to your computer. There is no single method that works for all websites; you have to work through each unique problem for every site you try to save.
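
      The data-URL inlining mentioned in the Save Page WE annotation can be sketched roughly as below. This is not the add-on's code, only an illustration of the idea under simplifying assumptions: the page is already on disk, only double-quoted `<img src="...">` attributes are handled, and the helper names `to_data_url` and `inline_images` are hypothetical.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches <img ... src="..."> with a double-quoted src (a simplification).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def to_data_url(path):
    """Encode a local file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(str(path))
    mime = mime or "application/octet-stream"
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def inline_images(html, base_dir):
    """Replace local image references with data URLs; leave the rest alone."""
    base = Path(base_dir)

    def replace(match):
        src = match.group(2)
        target = base / src
        # Skip remote or already-inlined images and files we cannot find.
        if src.startswith(("data:", "http:", "https:", "//")) or not target.is_file():
            return match.group(0)
        return match.group(1) + to_data_url(target) + match.group(3)

    return IMG_SRC.sub(replace, html)
```

      Scripts that build the page at runtime are exactly what a static pass like this cannot capture, which is the difficulty the second annotation describes.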
  3. www.webcitation.org
    1. Authors increasingly cite webpages and other digital objects on the Internet, which can "disappear" overnight. In one study published in the journal Science, 13% of Internet references in scholarly articles were inactive after only 27 months. Another problem is that cited webpages may change, so that readers see something different than what the citing author saw.
    1. A U.S. court has recently (Jan 19th, 2006) ruled that caching does not constitute a copyright violation, because of fair use and an implied license (Field vs Google, US District Court, District of Nevada, CV-S-04-0413-RCJ-LRL, see also news article on Government Technology). Implied license refers to the industry standards mentioned above: If the copyright holder does not use any no-archive tags and robot exclusion standards to prevent caching, WebCite® can (as Google does) assume that a license to archive has been granted. Fair use is even more obvious in the case of WebCite® than for Google, as Google uses a “shotgun” approach, whereas WebCite® archives selectively only material that is relevant for scholarly work. Fair use is therefore justifiable based on the fair-use principles of purpose (caching constitutes transformative and socially valuable use for the purposes of archiving, in the case of WebCite® also specifically for academic research), the nature of the cached material (previously made available for free on the Internet, in the case of WebCite® also mainly scholarly material), amount and substantiality (in the case of WebCite® only cited webpages, rarely entire websites), and effect of the use on the potential market for or value of the copyrighted work (in the case of Google it was ruled that there is no economic effect, the same is true for WebCite®).
  4. Dec 2022
  5. Nov 2022
    1. Preserving web content never really left my mind ever since taking screenshots of old sites and putting them in my personal museum. The Internet Archive’s Wayback Machine is a wonderful tool that currently stores 748 billion webpage snapshots over time, including dozens of my own web design attempts, dating back to 2001. But that data is not in our hands. Should it be? It should. Ruben says: archive it if you care about it: The only way to be sure you can read, listen to, or watch stuff you care about is to archive it. Read a tutorial about yt-dlp for videos. Download webcomics. Archive podcast episodes.

      Should people have their own web archive? A long list of pros and cons comes to mind. For several purposes a third-party archive is key; for others, having things locally is good enough. In other situations an off-site location is of interest. Is this less a question of web archiving and more a question of how wide the scope of one's own 3-2-1 backup choices should be? I find myself more frequently thinking about the processes at e.g. the National Archive in The Hague, where a lot comes down to knowing what you will not keep.

  6. Mar 2022
  7. Feb 2022
  8. Jan 2022
    1. The databases include multiple copies of some titles. But they will never provide all the copies of, say, “The Wealth of Nations” and the early responses it provoked.

      The exact same could be said of the early web, which hasn't been evenly archived and isn't easily searchable, so responses to early blog articles may not be easily found amidst a mountain of noise.

  9. Dec 2021
  10. Nov 2021
  11. Oct 2021
  12. Aug 2021
  13. Jul 2021
    1. It's great to enhance the Internet Archive, but you can bet I'm keeping my local copy too.

      Like the parent comment by derefr, my actual, non-hypothetical practice is saving to the Wayback Machine. Right now I'm probably saving things at a rate of half a dozen a day. For those who are paranoid and/or need offline availability, there's Zotero https://www.zotero.org. Zotero uses Gildas's SingleFile for taking snapshots of web pages, not PDF. As it turns out, Zotero is pretty useful for stowing and tracking any PDFs that you need to file away, too, for documents that are originally produced in that format. But there's no need to (clumsily) shoehorn webpages into that paradigm.

      If you do the print-to-PDF workflow outlined earlier in the thread, you'll realize it doesn't scale well, requiring too much manual intervention and discipline (including taking care to make sure it's filed correctly; hopefully you remember the ad hoc system you thought up last time you saved something), that it's destructive, and that it ultimately gives you an opaque blob. SingleFile-powered Zotero mostly solves all of this, and it does it in a way that's accessible in one or two clicks, depending on your setup. If you ever actually need a PDF, you can of course go back to your saved copy and produce a PDF on-demand, but it doesn't follow that you should archive the original source material in that format.

      My only reservation is that there is no inverse to the SingleFile mangling function, AFAIK. For archival reasons, it would be nice to be able to perfectly reconstruct the original, pre-mangled resources, perhaps by storing some metadata in the file that details the exact transformations that are applied.

    1. Ebooks don’t have those limitations, both because of how readily new editions can be created and how simple it is to push “updates” to existing editions after the fact. Consider the experience of Philip Howard, who sat down to read a printed edition of War and Peace in 2010. Halfway through reading the brick-size tome, he purchased a 99-cent electronic edition for his Nook e-reader: As I was reading, I came across this sentence: “It was as if a light had been Nookd in a carved and painted lantern …” Thinking this was simply a glitch in the software, I ignored the intrusive word and continued reading. Some pages later I encountered the rogue word again. With my third encounter I decided to retrieve my hard cover book and find the original (well, the translated) text. For the sentence above I discovered this genuine translation: “It was as if a light had been kindled in a carved and painted lantern …” A search of this Nook version of the book confirmed it: Every instance of the word kindle had been replaced by nook, in perhaps an attempt to alter a previously made Kindle version of the book for Nook use. Here are some screenshots I took at the time: It is only a matter of time before the retroactive malleability of these forms of publishing becomes a new area of pressure and regulation for content censorship. If a book contains a passage that someone believes to be defamatory, the aggrieved person can sue over it, and receive monetary damages if they’re right. Rarely is the book’s existence itself called into question, if only because of the difficulty of putting the cat back into the bag after publishing.

      This story of find and replace has chilling future potential. What if a dictatorial government doesn't like your content? It can be all too easy to remove the digital versions and replace them wholesale with "approved" ones.

      Where does democracy live in such a world? Consider similar instances when the Trump administration forced the disappearance of government websites and data.

  14. Apr 2021
  15. Mar 2021
  16. Jan 2021
    1. Twitter threads gave illness a name and a face, grounding the dread in particular bodies and disparate — if often overlapping — experiences. They placed these experiences in history, creating an archive of disease, fear, rage, and hope that will persist even as these feelings — and some of these people — have passed.

      Archives are only worth their weight in water if interested parties can find what they're looking for. When artifacts aren't gathered and curated into public-facing unities or collections, then history elides them until further notice. These threads are still floating in the sprawl of the Twitterverse, placed into history and drowned out by an ocean of pure, frantic noise. What this piece makes evident to me is the need for restoration: that they need to be resurfaced, preserved, made visible again.

  17. Nov 2020
  18. Aug 2020
  19. Jun 2020
  20. Apr 2020
    1. However, as stated by Pourret [18], a majority of the journals in geochemistry also have a green colour according to the SHERPA/RoMEO grading system, indicating that preprint (and the peer-reviewed postprint version) articles submitted to these journals can be freely shared on a preprint server, without compromising authors’ abilities to publish in parallel in those journals. Moreover, Pourret et al. [17] highlighted that the majority of journals in geochemistry allow authors to share preprints of their articles (47/56; 84%).
      • Most journals in geochemistry permit green-mode archiving (Green OA), that is, archiving research documents, data, and preprint versions of papers in non-profit repositories (for example, a university repository).

      • In 2020, this fact is still not widely known among lecturers and researchers. They tend to accept being controlled by the journals during the publication process, with no desire to argue for keeping their ownership rights over their papers (to retain copyrights).

  21. Dec 2019
  22. Nov 2019
  23. Aug 2018
  24. Jul 2018
  25. Sep 2017
  26. www.softwareheritage.org
  27. Jul 2017
  28. Apr 2017
    1. What technology does the archive use? The archiving system fetches links using an enhanced version of wget, with a little extra intelligence about fetching dependencies. Every crawled page gets stored in a single directory, and the links rewritten to point to the local copy.

      Simple explanation
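
      The link-rewriting step in the quoted answer can be illustrated with a short sketch. This is not the archive's code (the source only says it uses an enhanced wget); it assumes the crawler has already produced a mapping from fetched URLs to file names in the archive directory, and the names `rewrite_links` and `local_files` are hypothetical.

```python
import re
from urllib.parse import urljoin

# Double-quoted href/src attributes only (a simplification for this sketch).
ATTR = re.compile(r'\b(href|src)="([^"]*)"', re.IGNORECASE)


def rewrite_links(html, page_url, local_files):
    """Point href/src attributes at local copies where one was fetched.

    local_files maps absolute URLs to file names inside the single
    archive directory (an assumed structure, not taken from the source).
    """
    def replace(match):
        attr, value = match.group(1), match.group(2)
        # Resolve relative links against the page they appeared on.
        absolute = urljoin(page_url, value)
        local = local_files.get(absolute)
        if local is None:
            return match.group(0)  # not archived: keep the original link
        return f'{attr}="{local}"'

    return ATTR.sub(replace, html)


if __name__ == "__main__":
    page = '<a href="/about">About</a> <img src="logo.png">'
    fetched = {
        "http://example.com/about": "about.html",
        "http://example.com/blog/logo.png": "logo.png",
    }
    print(rewrite_links(page, "http://example.com/blog/post.html", fetched))
```

      Links to pages that were never fetched are left pointing at the live web, so the saved copy degrades gracefully rather than breaking.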

  29. Sep 2014
    1. The cacophony of the crowd erases the past and affirms the present. It started with search and now it's accelerated with the now web. I don't know where it leads, but I almost want a remember button, like the like or favorite: something that registers something as a memory, as a salient fact that I for one can draw out of the stream at a later time.