- Mar 2025
-
stackoverflow.com
-
Page not found. This question was removed from Stack Overflow for reasons of moderation. Please refer to the help center for possible explanations why a question might be removed.
A link leads to this page. I want to see what was here before.
- This is too generic an error message! Why was it removed?
- I assert that it would be better to keep the question around than to delete it and prevent people from enjoying the content that was found there. Simply deleting the question, the answers, and everyone's comments and contributions is very heavy-handed and unfair to those who still want the content.
Here is a snapshot, but crucially, the "next page" and "show more comments" links are broken: https://web.archive.org/web/20101008061929/http://stackoverflow.com/questions/164432/what-real-life-bad-habits-has-programming-given-you/164556
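For readers following the snapshot link above, here is a minimal Python sketch of how such a URL decomposes. It assumes the common `https://web.archive.org/web/<14-digit timestamp>/<original URL>` layout; `Snapshot` and `parse_snapshot_url` are hypothetical names chosen for illustration, not part of any archive tooling.

```python
import re
from datetime import datetime
from typing import NamedTuple


class Snapshot(NamedTuple):
    captured_at: datetime  # capture time encoded in the URL
    original_url: str      # the URL that was archived


# Assumed layout: https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
_SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(url: str) -> Snapshot:
    """Split a snapshot URL into its capture time and original URL."""
    match = _SNAPSHOT_RE.match(url)
    if match is None:
        raise ValueError(f"not a recognised snapshot URL: {url!r}")
    timestamp, original = match.groups()
    return Snapshot(datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original)
```

Recovering the original URL this way makes it easier to look for other captures of the same page, which may help when a particular snapshot has broken pagination links.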
-
- Feb 2025
-
iipc.github.io Welcome
-
-
github.com
-
superuser.com
-
but that simply just launches a headless browser and downloads the requested URL. This approach is useless in my case since I want to save the Reddit page with the modifications that I've personally and manually made (i.e., with the desired comment threads manually expanded).
-
-
-
webrecorder.net
-
community.notepad-plus-plus.org
-
Over the years I found the MHTML format faster to acquire (there’s no messing around with PDF output formatting options such as headers and pagination), and it’s much faster to reload on later viewings.
-
-
superuser.com
-
Browser add-on: Save Page WE (Firefox / Chrome). A Firefox/Chrome add-on that is lighter than the web-recorder mentioned below, and which worked well for a subset of use cases. It is configurable and flexible, and can optionally scroll pages in order to retrieve lazy-loaded content. It inlines images, scripts, fonts, etc. as data URLs, producing a single big standalone HTML file.
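The inlining step described above can be sketched in a few lines of Python. This is only an illustration of the general data-URL technique, not Save Page WE's actual implementation; `to_data_url` and `inline_image` are hypothetical helper names, and the naive string replacement assumes the `src` attribute is written exactly as `src="..."`.

```python
import base64


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_image(html: str, src: str, data: bytes, mime_type: str) -> str:
    """Replace one exact src="..." reference with an inlined data URL."""
    return html.replace(f'src="{src}"', f'src="{to_data_url(data, mime_type)}"')
```

A real tool would parse the HTML rather than replace strings, and would also handle stylesheets, fonts, and scripts; the sketch only shows why the result is one self-contained (and larger) file.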
-
It's not possible to do this with many websites these days. And for sites where it seems possible, it would still require some JavaScript experience for reverse-engineering and "fixing" the scripts that are saved to your computer. There is no single method that works for all websites; you have to work through each unique problem for every site you try to save.
-
-
en.wikipedia.org
-
-
netpreserve.org
-
Web archiving is the process of collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use.
-
-
www.webcitation.org WebCite
-
Authors increasingly cite webpages and other digital objects on the Internet, which can "disappear" overnight. In one study published in the journal Science, 13% of Internet references in scholarly articles were inactive after only 27 months. Another problem is that cited webpages may change, so that readers see something different than what the citing author saw.
-
-
www.webcitation.org
-
A U.S. court has recently (Jan 19th, 2006) ruled that caching does not constitute a copyright violation, because of fair use and an implied license (Field vs Google, US District Court, District of Nevada, CV-S-04-0413-RCJ-LRL, see also news article on Government Technology). Implied license refers to the industry standards mentioned above: If the copyright holder does not use any no-archive tags and robot exclusion standards to prevent caching, WebCite® can (as Google does) assume that a license to archive has been granted. Fair use is even more obvious in the case of WebCite® than for Google, as Google uses a “shotgun” approach, whereas WebCite® archives selectively only material that is relevant for scholarly work. Fair use is therefore justifiable based on the fair-use principles of purpose (caching constitutes transformative and socially valuable use for the purposes of archiving, in the case of WebCite® also specifically for academic research), the nature of the cached material (previously made available for free on the Internet, in the case of WebCite® also mainly scholarly material), amount and substantiality (in the case of WebCite® only cited webpages, rarely entire websites), and effect of the use on the potential market for or value of the copyrighted work (in the case of Google it was ruled that there is no economic effect, the same is true for WebCite®).
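The opt-out mechanism mentioned above (robot exclusion) can be illustrated with Python's standard `urllib.robotparser`. This is a sketch under stated assumptions: the robots.txt content and the `example-archiver` user agent are hypothetical, `may_archive` is an illustrative helper name, and nothing here reflects how WebCite or any other archive actually implements its checks.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one archiver is excluded, others are not.
ROBOTS_TXT = """\
User-agent: example-archiver
Disallow: /
"""


def may_archive(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt does not exclude user_agent from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the sample rules, `may_archive(ROBOTS_TXT, "example-archiver", "http://example.com/page")` is refused, while an agent not named in the file is allowed. Whether the absence of such rules amounts to an implied license is the legal question discussed in the quoted passage, not something the code can settle.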
-
Caching and archiving webpages is widely done (e.g. by Google, Internet Archive etc.), and is not considered a copyright infringement, as long as the copyright owner has the ability to remove the archived material and to opt out.
-
Services such as the Internet Archive (Wayback Machine) or Google archive Internet documents in a shotgun approach by a crawler, not focussing on academic references.
-
- Aug 2021
-
www.usenix.org
-
A subscription to a paper journal provides the library with an archival copy of the content. Subscribing to a Web journal rents access to the publisher's copy.
-
- Jul 2021
-
news.ycombinator.com
-
It's great to enhance the Internet Archive, but you can bet I'm keeping my local copy too.
Like the parent comment by derefr, my actual, non-hypothetical practice is saving to the Wayback Machine. Right now I'm probably saving things at a rate of half a dozen a day. For those who are paranoid and/or need offline availability, there's Zotero https://www.zotero.org. Zotero uses Gildas's SingleFile for taking snapshots of web pages, not PDF. As it turns out, Zotero is pretty useful for stowing and tracking any PDFs that you need to file away, too, for documents that are originally produced in that format. But there's no need to (clumsily) shoehorn webpages into that paradigm.
If you do the print-to-PDF workflow outlined earlier in the thread, you'll realize it doesn't scale well, requiring too much manual intervention and discipline (including taking care to make sure it's filed correctly; hopefully you remember the ad hoc system you thought up last time you saved something), that it's destructive, and that it ultimately gives you an opaque blob. SingleFile-powered Zotero mostly solves all of this, and it does it in a way that's accessible in one or two clicks, depending on your setup. If you ever actually need a PDF, you can of course go back to your saved copy and produce a PDF on-demand, but it doesn't follow that you should archive the original source material in that format.
My only reservation is that there is no inverse to the SingleFile mangling function, AFAIK. For archival reasons, it would be nice to be able to perfectly reconstruct the original, pre-mangled resources, perhaps by storing some metadata in the file that details the exact transformations that are applied.
-
- Jan 2021
-
reallifemag.com
-
Twitter threads gave illness a name and a face, grounding the dread in particular bodies and disparate — if often overlapping — experiences. They placed these experiences in history, creating an archive of disease, fear, rage, and hope that will persist even as these feelings — and some of these people — have passed.
Archives are only worth their weight in water if interested parties can find what they're looking for. When artifacts aren't gathered and curated into public-facing unities or collections, then history elides them until further notice. These threads are still floating in the sprawl of the Twitterverse, placed into history and drowned out by an ocean of pure, frantic noise. What this piece makes evident to me is the need for restoration: that they need to be resurfaced, preserved, made visible again.
-
- Dec 2019
-
wellcomeopenresearch.org
-
1
Given that this document cites a number of non-persistent web resources, I have archived a copy of https://wellcomeopenresearch.org/articles/4-170/v1 at http://web.archive.org/web/20191224000829/https://wellcomeopenresearch.org/articles/4-170/v1 using the "Save outlinks" mode.
Probably a good idea to do this routinely for all articles in the journal.
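Doing this routinely could start from something like the following Python sketch, which only builds capture-request URLs and sends nothing. It assumes the Wayback Machine's Save Page Now accepts requests of the form `https://web.archive.org/save/<URL>`; `SAVE_ENDPOINT` and `save_request_urls` are hypothetical names, and the "Save outlinks" option used above is not covered here.

```python
SAVE_ENDPOINT = "https://web.archive.org/save/"  # assumed Save Page Now prefix


def save_request_urls(article_urls):
    """Return one capture-request URL per article URL, skipping duplicates."""
    seen = set()
    requests = []
    for url in article_urls:
        if url in seen:
            continue  # avoid requesting the same capture twice
        seen.add(url)
        requests.append(SAVE_ENDPOINT + url)
    return requests
```

A journal-wide job would feed this the list of article URLs and then issue the requests at a polite rate; that part is left out because it depends on the archive's current usage limits.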
-
- Jul 2018
-
webcitation.org WebCite
-
Archiving service with an emphasis on scholarly publishing.
-
-
ageofshitlords.com
-
Archiving pages that block it.
-
- Sep 2014
-
www.borthwick.com
-
The cacophony of the crowd erases the past and affirms the present. It started with search and now it's accelerated with the now web. I don't know where it leads, but I almost want a remember button, like the like or favorite. Something that registers something as a memory, as a salient fact that I for one can draw out of the stream at a later time.
-