“Using Access Data for Paper Recommendations”

Here is a pair of papers that I’d like a chance to digest at some point. The first is “Recommending Related Papers Based on Digital Library Access Records” by Stefan Pohl, Filip Radlinski, and Thorsten Joachims. According to the notes on the paper, it is to appear in the proceedings of JCDL’07. The abstract:

An important goal for digital libraries is to enable researchers to more easily explore related work. While citation data is often used as an indicator of relatedness, in this paper we demonstrate that digital access records (e.g. http-server logs) can be used as indicators as well. In particular, we show that measures based on co-access provide better coverage than co-citation, that they are available much sooner, and that they are more accurate for recent papers.

Disseminators As the Core of an Object Repository

I’ve been working to get JBoss Seam tied into Fedora, and along the way thought it would be wise to stop and document a core concept of this integration: the centrality of Fedora Disseminators in the design of the Ohio Digital Resource Commons. Although there is nothing specific to JBoss Seam (a Java Enterprise Edition application framework) in these concepts, making an object “render itself” does make the Seam-based interface application easier to code and understand. A disseminator-centric architecture also allows us to put our code investment where it matters the most, in the repository framework, and exploit that investment in many places. So what does it mean to have a disseminator-centric architecture and have objects “render themselves”?

Brewster Kahle on the Economics and Feasibility of Mass Book Digitization

Brewster Kahle, Director of the Internet Archive, was interviewed this week in a Chronicle of Higher Education podcast on the Economics and Feasibility of Mass Book Digitization. Among the many interesting points in the interview was that one of the biggest challenges to such a mass digitization effort is believing that digitizing massive numbers of books and making them available is actually possible. The Open Content Alliance has put together a suite of technology that brings down the cost for a color scan with OCR to 10 cents per page or about $30 per book. He then goes on to perform this calculation: the library system in the U.S. is a $12B industry. One million books digitized a year is $30M, or “a little less than .3 percent of one year’s budget of the United States library system would build a 1 million book library that would be available to anyone for free.” He also covers copyright concerns, including the more liberal copyright laws in countries such as China.
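As a quick sanity check of that arithmetic, here is a minimal Python sketch. The variable and function names are mine, not anything from the interview. The dollar figures are the ones quoted above, and the pages-per-book number is only what the two quoted costs imply when divided.

```python
# Figures as quoted in the interview summary above.
COST_PER_PAGE_CENTS = 10                   # color scan with OCR, per page
COST_PER_BOOK_DOLLARS = 30                 # approximate cost per book
BOOKS_PER_YEAR = 1_000_000                 # one million books digitized a year
LIBRARY_BUDGET_DOLLARS = 12_000_000_000    # U.S. library system, $12B


def digitization_share(books, cost_per_book, budget):
    """Return the total yearly cost and its fraction of the given budget."""
    total = books * cost_per_book
    return total, total / budget


# Pages per book implied by the two quoted costs (an inference, not a quoted figure).
implied_pages = (COST_PER_BOOK_DOLLARS * 100) // COST_PER_PAGE_CENTS

total, share = digitization_share(
    BOOKS_PER_YEAR, COST_PER_BOOK_DOLLARS, LIBRARY_BUDGET_DOLLARS
)

print(f"Implied pages per book: {implied_pages}")
print(f"Yearly digitization cost: ${total:,}")
print(f"Share of one year's library budget: {share:.2%}")
```

The share works out to a quarter of one percent, which agrees with the “a little less than .3 percent” in the quote.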

What Librarians Could Learn From Journalists

On Tuesday, the Poynter Institute (a school for journalists, future journalists, and teachers of journalists) released results of their EyeTrack07 study, an examination of reader behavior in print and online media. An article on their website goes into more detail about the initial data, but what caught my eye as of interest to the library community is the headline (“The Myth of Short Attention Spans”) and this conclusion: “The reading-deep phenomenon [thoroughly reading a selected story] is even stronger online than in print.” Their website has a video which explains the process and some of the initial results.

Survey on Digital Preservation Systems is Seeking Respondents

There are just a few days left to respond to the “International Digital Preservation Systems Survey” being run by Karim Boughida and Sally Hubbard from the Getty Research Institute. From the survey description:

This survey is intended to provide an overview of digital preservation system (DPS) implementation. DPS is defined here as an assembly of computer hardware, software and policies equivalent to a TDR (trusted digital repository) “whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now, and in the future”.

“Draft Principles for Digitized Content” from the Digitization Policy Task Force of ALA’s Office for Information Technology Policy

A note on the LITA-L mailing list from M. Claire Stewart (a member of the American Library Association Office of Information Technology Policy Task Force on Digitization Policy) announces the availability of the Draft Principles for Digitized Content in the form of a series of blog postings on the ALA website. Stewart’s message notes:

Update to ‘Embedded Web Video in a Standards-Compliant, Accessible, and Successful Way’

With the release of Microsoft’s Windows Media Player version 11, the Microsoft Media Server (MMS) protocol is officially no longer supported. (Except, of course, for the confusing/amusing footnote on that page that says ‘mms://’ URIs are “highly recommended” as a protocol rollover URL — only Microsoft can at the same time make something deprecated and highly recommended.) As Ryan Eby noted earlier this year, those generating ASX files for Windows Media Player need to adjust their scripts.

The Intersection of the Web Architecture with Scholarly Communication

Two previous posts on dltj.org have described the OAI Object Reuse and Exchange (ORE) project and the theory behind what has become known as the ‘Web Architecture’. These two areas now meet in this post, which describes the issues surrounding the raw Web Architecture as applied to a web of scholarly communication and gives a basic outline of what the ORE project hopes to accomplish.

Problems With the Web Architecture

Working With the Web Architecture

As you may have noticed, the web has evolved a set of common principles that are a mix of ratified standards and ad hoc practices. The notion of a Web Architecture was codified in a W3C technical report called “Architecture of the World Wide Web” (http://www.w3.org/TR/2004/REC-webarch-20041215/), or simply ‘Web Architecture.’ Those projects and protocols that align with the ‘Web Architecture’ are more likely to be picked up and used than those that do not. As a result, the OAI Object Reuse and Exchange (ORE) project seeks to provide an infrastructure for web-based information systems that exploit and enhance the Web Architecture, and therefore overlay cleanly on the existing web.