This week is a mostly Google edition of DLTJ Thursday Threads. Below is a high-level overview of Google’s Book Search algorithm, how Google is helping web servers improve the speed at which content loads, and how Google’s internet traffic is growing as a percentage of all internet traffic. But first, there is an uprising on the RDA test records in the WorldCat database.
If you find these interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. If you would like a more raw and immediate version of these types of stories, watch my FriendFeed stream (or subscribe to its feed in your feed reader). Comments and tips, as always, are welcome.
Memorandum Against RDA Test
We have found ourselves in an unenviable position of opposing the work that supposedly has been authorized by agencies representing our interests. I might compare it to a military coup d’état. I mean here the RDA “test” and its implications on the cataloging world at large. After extensive discussions on the PCC, OCLC cataloging e-mail lists with opinions from the British Library, Australia and North America, we can safely conclude that there is a broad consensus against principles of RDA and the way RDA “test” has been imposed on the cataloging world.
The original post on the OCLC-CAT list by Wojciech Siemaszkiewicz of the New York Public Library is behind a must-subscribe-and-authenticate form, but it has been copied to an open website by Becky Yoose (thanks, Becky!). The subsequent discussion resulted in a Petition against the RDA Test by Jacqueline Byrd at Indiana University. The link to the petition has been posted to the open AUTOCAT list, and there has been subsequent discussion there. (Hat tip to Kirsten Davis.)
Inside the Google Books Algorithm
Rich Results is the latest in a series of smaller front-end tweaks that have been matched by backend improvements. Now, the book search algorithm takes into account more than 100 “signals,” individual data categories that Google statistically integrates to rank your results. When you search for a book, Google Books doesn’t just look at word frequency or how closely your query matches the title of a book. They now take into account web search frequency, recent book sales, the number of libraries that hold the title, and how often an older book has been reprinted.
Alexis Madrigal's article on TheAtlantic.com compares the techniques and algorithms used for web search with those used for book materials. The need for relevant search results is the same, but books don't have the same inter-page linking hints that drive the PageRank algorithm for web search. The use of anonymized circulation data in creating clustered bibliographic descriptions was mentioned at the ALA Midwinter ALCTS Forum on Mashups of Bibliographic Data, and apparently it is also used in the relevance ranking of Google Books search results. (Hat tip to Ron Murray.)
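To make the idea of combining "signals" a bit more concrete, here is a minimal sketch of a weighted-sum ranker. It is not Google's method: the article does not say how the signals are integrated, so the weighted sum, the weight values, the 0-to-1 normalization, and the sample books are all assumptions made for illustration. Only the signal names come from the quoted passage.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    # Hypothetical signal values, each assumed to be normalized to 0..1.
    signals: dict = field(default_factory=dict)


# Hypothetical weights; the real system reportedly uses 100+ signals.
WEIGHTS = {
    "title_match": 0.40,
    "web_search_frequency": 0.15,
    "recent_sales": 0.15,
    "library_holdings": 0.20,
    "reprint_count": 0.10,
}


def score(book, weights=WEIGHTS):
    """Weighted sum of the signals a book has; missing signals count as 0."""
    return sum(weights.get(name, 0.0) * value
               for name, value in book.signals.items())


def rank(books, weights=WEIGHTS):
    """Return books sorted from highest to lowest score."""
    return sorted(books, key=lambda b: score(b, weights), reverse=True)


# Made-up sample data: one book matches the query title closely,
# the other is widely held by libraries and selling well.
close_title = Book("Close Title Match",
                   {"title_match": 0.9, "library_holdings": 0.2})
widely_held = Book("Widely Held Title",
                   {"title_match": 0.6, "library_holdings": 0.9,
                    "recent_sales": 0.8})

ranked = rank([close_title, widely_held])
```

With these made-up weights, the widely held book outranks the closer title match, which is the kind of effect the library-holdings signal would have in a scheme like this.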
Google Releases mod_pagespeed
Google Sets Internet Traffic Record
Google now represents an average 6.4% of all Internet traffic around the world. This number grows even larger (to as much as 8-12%) if I include estimates of traffic offloaded by the increasingly common Google Global Cache (GGC) deployments and error in our data due to the extremely high degree of Google edge peering with consumer networks. Keep in mind that these numbers represent increased market share — Google is growing considerably faster than overall Internet volumes which are already increasing 40-45% each year.
Craig Labovitz of Arbor Networks notes that if Google were an internet service provider, it would now be “the second largest carrier on the planet.” Wow! That is a lot of data sloshing around on its own internal network!
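The point about market share can be checked with a little arithmetic: a source's share of total traffic only rises if its own traffic grows faster than the total. The sketch below assumes simple year-over-year growth rates. The 6.4% share and the 45% overall growth figure come from the quote above, while the 80% growth rate for Google is a made-up number used only to show the direction of the effect.

```python
def next_share(share, source_growth, total_growth):
    """Share of total traffic one year later.

    share: current fraction of total traffic (0.064 means 6.4%)
    source_growth: yearly growth of the source's own traffic (0.80 means 80%)
    total_growth: yearly growth of all traffic (0.45 means 45%)
    """
    return share * (1 + source_growth) / (1 + total_growth)


# Growing at the same pace as the whole Internet leaves the share flat.
flat = next_share(0.064, 0.45, 0.45)

# A hypothetical 80% growth rate against 45% overall growth raises the share.
rising = next_share(0.064, 0.80, 0.45)
```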
The text was modified to update a link from http://dltj.org/wp-content/uploads/2010/11/wa.exe?A2=ind1011a&L=oclc-cat&D=0&F=P&T=0&X=5D7D8800A8770C99F0&P=2298 to http://listserv.oclc.org/scripts/wa.exe?A2=ind1011a&L=oclc-cat&D=0&F=P&T=0&X=5D7D8800A8770C99F0&P=2298 on August 22nd, 2012.