This week is a mostly Google edition of DLTJ Thursday Threads. Below are a high-level overview of Google’s Book Search algorithm, a look at how Google is helping web servers speed up the loading of content, and a report on how Google’s internet traffic is growing as a percentage of all internet traffic. But first, there is an uprising over the RDA test records in the WorldCat database.
I’m starting something new on DLTJ: Thursday Threads, a set of summaries of and pointers to stories, services, and other stuff that I found interesting in the previous seven days. These are culled from entries that I post to my FriendFeed lifestream through various channels (Google Reader shared items, citations shared in Zotero, Twitter posts, etc.), but since I know not everyone uses those services, it might be useful to post the best of the selections here once a week. Why Thursday? Long ago I read somewhere that Thursday at 11am is the best time to publish a blog post because Thursday lunch through Friday is when readers are most active. I have no idea whether that is true, but lacking any evidence to the contrary, Thursday morning will do fine. (Obviously I’m a little late on this first one, but I’ll try to do better next time. Or not; maybe this will be a one-off weekly thing.)
Late, late in the day last Friday, the principal parties in the Google Book Search case submitted a revised settlement agreement to the court. This post takes a look at the changes to the settlement from a library perspective. To keep this manageable, I’m not including discussion of library-oriented elements that haven’t changed; to read more about those, I recommend the ALA/ACRL/ARL paper and/or previous posts on DLTJ. I’m also not including discussion of some aspects of the legal impact of the settlement (the appropriateness of setting policy via class action, the antitrust considerations of Google’s sole license to unclaimed works, etc.); for that I encourage browsing the writings of James Grimmelmann (any posting of his prefaced with “GBS” in the title). I will link to some of the library-oriented discussion pieces by Grimmelmann and others in this post. If you really want an in-depth view of the settlement and the surrounding discussion, visit The Public Index, a website devoted to chronicling and commenting on aspects of the settlement.
EBSCOhost Connect was announced in the spring of 2006, as near as I can recall. (I can’t find the press release about it on the EBSCO website. As close as I can come to a date is from.) After three years, I’ve finally seen an EBSCOhost Connect result in Google web search results. This screencast and accompanying transcript (below) show what I’ve found.
It has been a wild few weeks in search engines (or search-engine-like services). We’ve seen the introduction of no fewer than three high-profile tools … Wolfram|Alpha, Microsoft Bing, and … each with its own strengths and each needing its own techniques (or, at least, its own distinct frame of reference) to get the most out of it. This post describes these three services, what they’re generally good for, and how to use them. We’ll also run a couple of sample searches to show how each is useful in its own way.
On Tuesday, the University of Michigan and Google executed an amendment to the original agreement that started Google’s efforts to create a collection of scanned books. The amendment was publicized in a press release by the University of Michigan and described in a page that summarized the changes. That summary page is the first place to start if you want to know more about the changes reflected in the amendment, but in comparing the amendment to the original agreement, I found some other interesting tidbits. The amendment amounts to an endorsement of the Settlement Agreement by the University of Michigan and, as noted by the New York Times, it also gives Google an opportunity to “rebut some criticism” of the Settlement Agreement (or at least to clarify and expand on some of its library-related terms).
Via a post and an interview on the O’Reilly Radar blog, Google announced limited support for parsing RDFa statements and microformat properties in the HTML of web pages and for using those statements to enhance search results as so-called “rich snippets”. In looking at the example review markup outlined in the O’Reilly post, though, I was struck by some unusual and unexpected markup. Specifically, the namespace was http://rdf.data-vocabulary.org/, which I had never seen before, and the “rating” property had no corresponding range that would make its numeric value useful in a computational sense.
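To illustrate why a bare rating number is hard to use computationally, here is a minimal Python sketch. It assumes a hypothetical `Rating` record with optional `worst` and `best` bounds; these names are illustrative only and are not the properties of the rdf.data-vocabulary.org vocabulary or of any Google API. Without both bounds, a value like 4 cannot be placed on a common scale or compared with ratings from other sites.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rating:
    """A review rating as it might be extracted from page markup (hypothetical structure)."""
    value: float
    worst: Optional[float] = None  # lowest possible score, if the markup states one
    best: Optional[float] = None   # highest possible score, if the markup states one


def normalized(rating: Rating) -> Optional[float]:
    """Map a rating onto 0.0-1.0, or return None when the scale is unknown."""
    if rating.worst is None or rating.best is None or rating.best == rating.worst:
        # A bare number like 4 means nothing without knowing the range it came from.
        return None
    return (rating.value - rating.worst) / (rating.best - rating.worst)


print(normalized(Rating(4)))         # no range given, so no meaningful comparison
print(normalized(Rating(4, 1, 5)))   # 4 on a 1-5 scale
print(normalized(Rating(4, 0, 10)))  # 4 on a 0-10 scale
```

The same value of 4 normalizes to 0.75 on a 1 to 5 scale but to 0.4 on a 0 to 10 scale, which is the ambiguity a rating property without a stated range leaves open.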
Today was to be the deadline for objecting to, opting out of, and/or filing briefs with the court on the Google Book Search Settlement. That was the plan, at least, when the preliminary approval statement from the court was issued last year. That deadline changed, and that is part of a recent flurry of activity surrounding the proposed Settlement. This post provides a summary of recent news and an index of documents that you might want to read for more information.
The American Library Association (through the Association’s Washington Office and the Association of College and Research Libraries Division) and the Association of Research Libraries filed a brief [PDF] with the court in support of the Google Book Search Settlement while asking the judge to “exercise vigorous oversight” over details of the settlement. In the 22-page amicus brief, the library associations say they do not oppose the settlement, but they do request that the court provide strict oversight of the activities of Google and the Book Rights Registry. From page 2 of the brief:
The Settlement, therefore, will likely have a significant and lasting impact on libraries and the public, including authors and publishers. But in the absence of competition for the services enabled by the Settlement, this impact may not be entirely positive. The Settlement could compromise fundamental library values such as equity of access to information, patron privacy, and intellectual freedom. In order to mitigate the possible negative effects the Settlement may have on libraries and the public at large, the Library Associations request that this Court vigorously exercise its jurisdiction over the interpretation and implementation of the Settlement.
The brief then describes “concerns with the Settlement, and how the Court’s oversight can ameliorate those concerns.”
We are starting to see objections to the Google Book Search Settlement this month in advance of the May 5th deadline set by the court. The first comes from the consumer advocacy group Consumer Watchdog (found by way of the American Libraries news feed). They have submitted a letter to the U.S. Justice Department asking the antitrust division to delay the settlement until the “‘most favored nation’ clause favoring Google is removed and the deal’s ‘orphan works’ provision is extended to cover all who might digitize books, not only Google.” The letter in PDF is available on the Consumer Watchdog website. The objections revolve around the provision that requires the Book Rights Registry to give Google the same terms as anyone else who enters into agreements with the Registry (noting that a new party might need more favorable terms in order to compete with Google), as well as the fact that the copyright infringement protection for digitizing orphan works extends only to Google.