"Using Access Data for Paper Recommendations"
Here is a pair of papers that I'd like to digest at some point. The first is "Recommending Related Papers Based on Digital Library Access Records" by Stefan Pohl, Filip Radlinski, and Thorsten Joachims. According to the notes on the paper, it is to appear in the proceedings of JCDL '07. The abstract:
An important goal for digital libraries is to enable researchers to more easily explore related work. While citation data is often used as an indicator of relatedness, in this paper we demonstrate that digital access records (e.g. http-server logs) can be used as indicators as well. In particular, we show that measures based on co-access provide better coverage than co-citation, that they are available much sooner, and that they are more accurate for recent papers.
This is a two-page summary. The meatier version is "Using Access Data for Paper Recommendations on ArXiv.org", a master's thesis by Stefan Pohl that runs to about 70 pages. Its abstract:
This thesis investigates in the use of access log data as a source of information for identifying related scientific papers. This is done for arXiv.org, the authority for publication of e-prints in several fields of physics.
Compared to citation information, access logs have the advantage of being immediately available, without manual or automatic extraction of the citation graph. Because of that, a main focus is on the question, how far user behavior can serve as a replacement for explicit meta-data, which potentially might be expensive or completely unavailable. Therefore, we compare access, content, and citation-based measures of relatedness on different recommendation tasks. As a final result, an online recommendation system has been built that can help scientists to find further relevant literature, without having to search for them actively.
Stefan's work would seem to bring the old adage "Do as I Do (access), Not as I Say (cite)" to bear on information retrieval. Co-access is more fluid and dynamic than PageRank or other forms of web citation, so it would seem to let the wisdom of the crowd show through more readily.
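To make "co-access" concrete, here is a minimal Python sketch of one simple way to turn access logs into related-paper suggestions: treat papers opened in the same session as co-accessed, then rank a paper's neighbors by raw co-access count. The `(session_id, paper_id)` log format, the function names, and the use of raw counts are all assumptions for illustration. This is not the measure Pohl et al. actually use; the paper and thesis describe their own.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_access_counts(log):
    """Count how many sessions accessed each pair of papers together.

    `log` is an iterable of (session_id, paper_id) tuples, a hypothetical
    format standing in for parsed http-server log entries.
    """
    # Group papers by session; a set ignores repeat hits on the same paper.
    sessions = defaultdict(set)
    for session_id, paper_id in log:
        sessions[session_id].add(paper_id)

    # For every pair of papers seen in one session, bump both directions.
    counts = defaultdict(Counter)
    for papers in sessions.values():
        for a, b in combinations(sorted(papers), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, paper_id, k=3):
    """Return up to k papers most often co-accessed with `paper_id`."""
    return [p for p, _ in counts[paper_id].most_common(k)]


# Toy log: four sessions over four hypothetical papers A-D.
log = [
    ("s1", "A"), ("s1", "B"),
    ("s2", "A"), ("s2", "B"), ("s2", "C"),
    ("s3", "A"), ("s3", "C"), ("s3", "C"),
    ("s4", "B"), ("s4", "D"),
]
counts = co_access_counts(log)
print(recommend(counts, "B"))  # ['A', 'C', 'D']
```

Raw counts favor popular papers; a real system would likely normalize them (for example, by how often each paper is accessed overall), which is one of the design choices such work has to make.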