“Archiving and Preserving the Web” from the Internet Archive perspective

In case you were wondering what some of the back-channel discussion on the #code4lib IRC channel was about on Tuesday, Ed Summers and I were watching an EDUCAUSE webcast on the Internet Archive’s Archive-It project. Archive-It is a subscription service that allows institutions to crawl and search their own web archive through a web application. On Tuesday, the EDUCAUSE Live! webcast included the project manager and Senior Crawl Engineer (what a title!) from the Internet Archive to talk about not only the server, but also the open source web crawler and ARC access tools (copied from the project home page):

Aggregation of Risk in Pursuit of Disruptive Technologies

An open letter to Clayton Christensen as well as colleagues and practitioners of the theories of disruptive innovation:

State agencies in Ohio responsible for primary, secondary and higher education are coming together to share the risk of exploring disruptive technologies and to shepherd the adoption of successful technologies into the mainstream. We call this group “Collective Action”, and the model of disruptive innovations is a guiding element. On behalf of the Collective Action group, I am seeking wisdom and thoughts on the potential pitfalls of this approach of aggregating risk capital in a loosely-coupled organization.

Germany is at it, too

This just in — at least to my INBOX — Germany is working on a unified repository as well. Called the eSciDoc project, it closely mirrors what the DRC is going to be:

The aim of the Max-Planck Society’s sInfo program is to significantly improve the effectiveness of its scientists and institutes by systematically exploiting the new technical opportunities (Internet, digitalization, communication, open access). It addresses many facets of scientific work: information retrieval, processing and evaluating information, distributing and storing information, scientific work in the laboratory and at the desk, scientific work performed by individuals and in groups.

Introducing Geographic Scope to Physical Collections

So I don’t know how this one slipped past me: you can link directly into Open WorldCat via an ISBN/ISSN REST-based URL.

Now any Web site can create “Find in a Library” links for specific titles. The syntax for link URLs is straightforward and keyed on common numeric identifiers.

For instance, a URL that gets directly to the ARL SPEC Kit on Patron Privacy that I wrote a number of years ago is:


Folks, it doesn’t get much easier than that. Compared to the junk our OPACs are creating, that URL is absolutely gorgeous. (And it is a real URL: it doesn’t pretend to be a real URL by redirecting you to some other, perhaps messier, place. Nice work, OCLC!)
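
To make the "keyed on common numeric identifiers" idea concrete, here is a rough sketch of how a site might generate such links. The function name `find_in_library_url`, the `base_url` parameter, and the `base/type/number` path layout are all my assumptions for illustration; this is not OCLC's documented syntax, so check their linking documentation for the real pattern before using it.

```python
def find_in_library_url(base_url: str, id_type: str, identifier: str) -> str:
    """Build a link keyed on an ISBN or ISSN.

    Hypothetical helper: the base URL and the <base>/<type>/<number>
    layout are assumptions, not OCLC's published syntax.
    """
    id_type = id_type.lower()
    if id_type not in ("isbn", "issn"):
        raise ValueError(f"unsupported identifier type: {id_type}")

    # Drop the hyphens and spaces people commonly type in ISBNs and ISSNs,
    # and upper-case a trailing 'x' check digit.
    normalized = identifier.replace("-", "").replace(" ", "").upper()
    if not normalized:
        raise ValueError("identifier is empty")

    return f"{base_url.rstrip('/')}/{id_type}/{normalized}"


if __name__ == "__main__":
    # example.org stands in for the real service address.
    print(find_in_library_url("https://example.org/find", "isbn", "0-306-40615-2"))
```

Passing the base address in as a parameter keeps the sketch honest about what it does not know: only the identifier type and number come from the description above.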