Earlier this week, Aaron Swartz of the Internet Archive announced the demonstration website of the Open Library project, a new kind of book catalog that brings together traditional publisher and library bibliographic data in an interface with the user-contributed paradigm of Wikipedia. Okay, I’ll pause for a moment while you parse that last sentence. Think you got it? Read — and watch — further.
Open Library has been mentioned a bit in the blogs this week, but not as much as I think a project of this magnitude deserves. So I recorded a screencast introduction (in Flash Video format below, followed by a rough transcript) that looks not only at the browsing side of the system but also at the record editing and record creation aspects of Open Library. As I say at the end of the recording, Open Library is one of those mind-bending, assumption-shattering projects that, at least for me, is challenging my thoughts about what library service could be and should be. Congratulations to the team at the Internet Archive, and I’m looking forward to future enhancements and directions for the project.
A rough transcript of the screencast is below.
Hello, and welcome to this screencast overview of the Open Library project. Open Library is an effort by the Internet Archive to create a comprehensive catalog of every book. As the project’s “about” page says, “Not every book on sale, or every important book, or even every book in English, but simply every book.” The about page goes on to describe the characteristics of the Open Library project: it is enabled by Internet technology because no physical space could hold it, and it aims to pull together records from publishers and libraries. It is also a project in the same vein as Wikipedia, meaning that any user can create and edit the records in the system.
In this overview, I’ll lead you through searching and browsing the Open Library’s demonstration website from the perspective of any modern library catalog interface. Then I’ll show you where it deviates from traditional library catalogs by exposing the underlying wiki nature of the database; we’ll examine the changes that users have made and we’ll even make a change ourselves. And finally I’ll show the process of creating entirely new records in the system. So let’s get started.
We’re looking at the home page of the Open Library project demonstration site. In the middle is a search box with a suggested search — “tom sawyer adventure”. That is a good suggestion so we’ll click on Go. Open Library returns a classic, relevance ranked list of matching records with some book covers along the left side and a faceted list of refinements along the right. So right away you can see that there are some authority control problems here in the author names — Twain comma Mark, Mark comma Twain, and Twain comma Mark with birth and death dates — and here in the language field. But I have high hopes that the developer team will find some intriguing ways to address these problems.
Back over here in the results area we have the various editions of Samuel Clemens’s “The Adventures of Tom Sawyer”. Let’s pick one. There. We have the publisher, publication date and place, language, and a summary or review of sorts at the bottom. We also see signs of the availability of full text: over here in the options box there is a download from the Internet Archive link, a “Scan Sponsor” field here and a “View this book” graphic. This is one of the items scanned by the Open Content Alliance and made available by the Internet Archive through the Open Library project. A very nice interface for paging through the book. So one could imagine that the Open Library could become the primary vehicle by which Open Content Alliance materials are made available to the public.
So let’s go back here to the metadata page. Remember in the introduction that I said that the data was malleable in a wiki-like fashion. The Open Library developers created a system that allows for user-contributed updates (a la Wikipedia) to fielded data (like your classic bibliographic record). The two hints that the record is modifiable are this big edit button in the middle of the metadata and this more subtle history link near the top of the page. Let’s start with the history link to see what has been done to this record.
This page should look familiar to those who have worked with wikis before. It shows a listing of edits that were made to this record from most recent to the very first edit, who made the change (identified by IP addresses in this case because the people making the changes were not logged in to an account at the time), an editor-supplied comment about what was done, and when the change was made. We can go back in time and see the page at a particular version through the links under the “When” column, or we can use the compare function to see the difference between two versions. In this case, we see that the editor added “Canada” as the place of publication. On this page you start to see the fielded nature of this wiki structure, but the best place to see it is to look at the record edit screen itself.
These are all full-text fields on this page with no controlled vocabulary. You’ll note the absence of any MARC field names here, but as you scroll through you’ll see the evidence of MARC and AACR2 in the field labels. Down at the bottom is an edit summary to describe the changes made to the record, then save, preview and delete version buttons — all classic wiki functions.
Now, I’d like to show the full record editing process, but since I don’t have this Mark Twain book in hand, I’m going to bring up another record that I created yesterday — ““. Before showing the editing process, let’s linger here a moment at the “options” box along the right side. Since this is a more modern book (as opposed to the Tom Sawyer book we saw first), there are additional options here for purchasing the book through these various vendors or borrowing the book through a very nice link into Open Worldcat and two web-based book trading sites.
But back to the metadata. There is one error and one omission in this record; perhaps this is a subtle demonstration of the problems that creep in with user-generated content. First, the error: there is an extra digit in the ISBN-10 field, which is a big problem because the links in the options box use the ISBN as a linking field, and at the time of this recording they don’t work. They will work in a moment, though. The second problem is that I forgot to put in the publication date. But hey, no problem, all I need to do is “Edit” this record.
So we are back to the edit screen, and I’m going to scroll down and fix the ISBN-10 field like so, then scroll down a little further and add the publication date. Then I’ll scroll all the way to the bottom and type in an edit summary, “Fixed the ISBN and added a publication date”, and hit save. We’re now back at the metadata display screen and the link to Open Worldcat now works. So, as an aside, one wonders what the folks in Dublin, Ohio, think about this. It is competition on the one hand since Worldcat is also aiming to be the most comprehensive catalog of books in the world. On the other hand, perhaps there is room for cooperation by somehow getting vetted changes to Open Library records into the OCLC union catalog. Who knows?
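As an aside outside the screencast: an extra digit like the one described above is the sort of slip a simple check could flag before a record is saved. The sketch below applies the standard ISBN-10 length and checksum rule. The function name `is_valid_isbn10` is hypothetical, and nothing in the demonstration site suggests that Open Library runs such a check.

```python
def is_valid_isbn10(raw: str) -> bool:
    """Return True if `raw` is a well-formed ISBN-10 (hypothetical helper).

    Standard rule: exactly 10 characters after removing hyphens and spaces,
    digits only (the last may be 'X', meaning 10), and the weighted sum
    10*d1 + 9*d2 + ... + 1*d10 must be divisible by 11.
    """
    chars = raw.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        # An extra (or missing) digit is caught here.
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch == "X" and position == 9:
            value = 10
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


if __name__ == "__main__":
    print(is_valid_isbn10("0-306-40615-2"))   # a commonly cited valid ISBN-10
    print(is_valid_isbn10("0-306-40615-22"))  # same number with an extra digit
```

A check like this only catches malformed numbers; a well-formed ISBN that belongs to a different book would still pass.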
Creating a New Record
Alright, back to current reality. Let’s add a record to Open Library, and in this case I’m going to use an ARL SPEC Kit that I wrote a number of years ago called “Library Patron Privacy”. First let’s run a search in Open Library to see if it is there, and no, it isn’t. The only way I’ve figured out how to enter a new item is to go to the URL where the page would be located and get the classic wiki “This page does not exist. Create it?” message.
One of the quirks I found in the system is that I have to create author wiki pages before book wiki pages — otherwise I’ll get a Python error message on the screen. I’ve reported this to the Open Library developers, but in the meantime just know authors need to be created before their books. Which is to say that authors have wiki pages in Open Library in addition to books. The structure of URLs to Open Library author pages is the letter “a” followed by a slash followed by the author’s last, first and middle names separated by underscore characters. So I’ll go to the URL of that form, then click on the “Create it” link.
Now here is one of the tricky parts of the existing interface. The page type starts as “type/page”, and as you can see it doesn’t have any of the fielded elements that we saw in previous examples. What you have to do is change the page type to “type/author” and then you get the fielded HTML form. So I’m going to go through here and fill in some of the parts. Then go down to the edit summary field and write a summary of this change, then click save. Now that the author page exists, let’s create the record for the book.
You’ve seen the structure of the URLs to book pages before — a “b” followed by a slash followed by the book title with spaces replaced by underscore characters. I’ll put that in the URL field and get the default page type. This needs to be changed to “type/edition” in order to get the bibliographic record fields. There. Now I’ll go through here and enter the data. When we get down to the author field we enter it in the same format that we used to create it — an “a” followed by a slash followed by the name with spaces replaced by underscores.
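As an aside outside the screencast, the two URL patterns described above can be sketched in a few lines. This is only an illustration of the naming scheme as narrated: the function names are hypothetical, the sample author name is invented, and how the demonstration site treats punctuation, capitalization, or duplicate names is not covered here.

```python
def to_key(prefix: str, name: str) -> str:
    """Build a page path: prefix, a slash, then the name with spaces
    replaced by underscores (runs of whitespace collapse to one underscore)."""
    return prefix + "/" + "_".join(name.split())


def author_key(last: str, first: str, middle: str = "") -> str:
    """Author pages: 'a/' followed by last, first and middle names."""
    parts = [part for part in (last, first, middle) if part]
    return to_key("a", " ".join(parts))


def book_key(title: str) -> str:
    """Book pages: 'b/' followed by the title."""
    return to_key("b", title)


if __name__ == "__main__":
    print(book_key("Library Patron Privacy"))  # b/Library_Patron_Privacy
    print(author_key("Doe", "Jane", "Q"))      # a/Doe_Jane_Q (invented name)
```

The value returned by `author_key` is also the form entered in the author field of the book record, which is why the author page has to exist first.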
So we’ll just finish up here and come down to the edit summary field, put something in here, and hit save. The new record is now in the system, and you can see the public display here along with the links on the right because I entered an ISBN. I haven’t quite figured out how to get a cover image into the system yet. I expect there is a file upload interface somewhere, but I haven’t found it.
So that’s all there is, and I don’t say that in a way to denigrate the work that has been done by the development team so far. As the URL and site banner indicate, it is a demonstration system, and a compelling demonstration it is. All sorts of questions immediately come to mind, of course: Will there be a controlled vocabulary or authority control built into the system? Can data be exported out of records and, for that matter, can end-users bulk import data into the system? Are there Web 2.0 niceties like tagging and RSS feeds in the works? And so forth.
Even with all of those questions, Open Library is one of those mind-bending, assumption-shattering projects that, at least for me, is challenging my thoughts about what library service could be and should be. Congratulations to the team at the Internet Archive, and I’m looking forward to future enhancements and directions for the project.
The text was modified to update a link from http://blogs.talis.com/panlibus/archives/2007/07/license_for_ope.php to http://blogs.capita-libraries.co.uk/panlibus/2007/07/17/license_for_ope/ on August 27th, 2012.
The text was modified to update a link from http://www.libraryjournal.com/blog/1090000309/post/1800011980.html to http://www.thedigitalshift.com/2007/07/roy-tennant-digital-libraries/the-peoples-catalog/ on November 8th, 2012. (This post was updated on 05-Jun-2014.)