Thursday Threads: Refining Data, Ebook Costs, Open Bibliographic Data, Copyright Infringement

It has been a long week, so for many of you this edition of DLTJ Thursday Threads will actually be read on Friday. The spirit was willing, the topics were certainly out there in the past seven days, but the necessary distractions were numerous. Please enjoy this edition whenever you read it. As always, there is lots more on my FriendFeed aggregation page.

Google Refine 2.0, a power tool for data wranglers

Thursday Threads: RDA Revolt, Google Book Search Algorithm, Google Helps Improve Web Servers, Google’s Internet Traffic Hugeness

This week is a mostly Google edition of DLTJ Thursday Threads. Below is a high-level overview of Google’s Book Search algorithm, how Google is helping web servers improve the speed at which content loads, and how Google’s internet traffic is growing as a percentage of all internet traffic. But first, there is an uprising on the RDA test records in the WorldCat database.

Thursday Threads: Print-on-Demand, Video Changing the World, Puzzling Out Public Domain, and more

I’m starting something new on DLTJ: Thursday Threads — summaries and pointers of stories, services, and other stuff that I found interesting in the previous seven days. This is culled from entries that I post to my FriendFeed lifestream through various channels (Google Reader shared items, citations shared in Zotero, Twitter posts, etc.), but since I know not everyone is using those services, it might be useful to post the best-of-the-selected here once a week. Why Thursday? Somewhere long ago I read that Thursday at 11am is the best time to put a post on a blog because Thursday lunch through Friday are the most active time for readers. I have no idea whether that is true or not, but lacking any evidence to the contrary, Thursday morning will do fine. (Obviously I’m a little late on this first one, but I’ll try to do better next time. Or not — maybe this will be a one-off weekly thing.)

Revised Google Book Search Settlement from a Library Perspective

Late, late in the day last Friday, the principal parties in the Google Book Search case submitted a revised settlement agreement to the court. This post takes a look at the changes to the settlement from a library perspective. To keep this manageable, I’m not including discussion of library-oriented elements that haven’t changed; to read more about that I recommend the ALA/ACRL/ARL paper and/or previous posts on DLTJ. I’m also not including discussion on some aspects of the legal impact of the settlement (the appropriateness of setting policy via class action, the antitrust considerations of Google’s sole license to unclaimed works, etc.); for that I encourage browsing the writings of James Grimmelmann (any posting of his prefaced with “GBS” in the title). I will link off to some of the library-oriented discussion pieces of Grimmelmann and others in this post. If you really want the in-depth view of the settlement and the surrounding discussion, visit The Public Index, a website devoted to chronicling and commenting on aspects of the settlement.

EBSCOhost Connection Records Found In-The-Wild

EBSCOhost Connect was announced in the spring of 2006, as near as I can recall. (I can’t find the press release about it on the EBSCO website. As close as I can come to a date is from an announcement at the Oregon School Library Information System.) After three years, I’ve finally seen an EBSCOhost Connect record in Google web search results. This screencast and accompanying transcript (below) show what I’ve found.

Three New Search Services: Wolfram|Alpha, Microsoft Bing, Google Squared

It has been a wild few weeks in search engines, or search-engine-like services. We’ve seen the introduction of no fewer than three high-profile tools (Wolfram|Alpha, Microsoft Bing, and Google Squared), each with its own strengths and needing its own techniques, or at least its own distinct frame of reference, in order to maximize its usefulness. This post describes these three services, what they’re generally good for, and how to use them. We’ll also do a couple of sample searches to show how each is useful in its own way.

Interesting Bits in the Univ of Michigan Amendment to Google Book Search Agreement

On Tuesday, the University of Michigan and Google executed an amendment to the original agreement that started Google’s efforts to create a collection of scanned books. The amendment was publicized in a press release by the University of Michigan and described in a page that summarized the changes. That summary page is the first place to start if you want to know more about the changes reflected in the amendment, but in comparing the amendment to the original agreement, I found some other interesting tidbits. The amendment amounts to an endorsement of the Settlement Agreement by the University of Michigan and, as noted by the New York Times, it also gives Google an opportunity to “rebut some criticism” (or at least clarify and expand on some of the library-related terms) of the Settlement Agreement.

Google Search Engine Adds Support for RDFa, Or Do They?

Via a post and an interview on the O’Reilly Radar blog, Google announced limited support for parsing RDFa statements and microformat properties in web page HTML coding and using those statements to enhance the relevance of search results as so-called “rich snippets”. In looking at the example review markup outlined in the O’Reilly post, though, I was struck by some unusual and unexpected markup. Specifically, the namespace was one that I had never seen before, and the “rating” property didn’t have any corresponding range that would make that numeric value useful in a computational sense.

Summary of Recent Google Book Search Settlement Activities

Today was to be the deadline for objecting to, opting out of, and/or filing briefs with the court on the Google Book Search Settlement. That was the plan, at least, when the preliminary approval statement from the court was issued last year. That deadline changed, and that is part of a recent flurry of activity surrounding the proposed Settlement. This post provides a summary of recent news and an index of documents that you might want to read for more information.

Library Associations File Amicus Brief for Google Book Search Settlement

The American Library Association (through the Association’s Washington Office and the Association of College and Research Libraries Division) and the Association of Research Libraries filed a brief [PDF] with the court in support of the Google Book Search Settlement while asking the judge to “exercise vigorous oversight” over details of the settlement. In the 22-page amicus [1] brief, the library associations say they do not oppose the settlement, but they do request that the courts provide strict oversight of the activities of Google and the Book Rights Registry. From page 2 of the brief:

The Settlement, therefore, will likely have a significant and lasting impact on libraries and the public, including authors and publishers. But in the absence of competition for the services enabled by the Settlement, this impact may not be entirely positive. The Settlement could compromise fundamental library values such as equity of access to information, patron privacy, and intellectual freedom. In order to mitigate the possible negative effects the Settlement may have on libraries and the public at large, the Library Associations request that this Court vigorously exercise its jurisdiction over the interpretation and implementation of the Settlement.

The brief then describes “concerns with the Settlement, and how the Court’s oversight can ameliorate those concerns.”


  1. Latin: “friend”, informal form of amicus curiae, or “friend of the court” (Wiktionary)