Thursday Threads: Refining Data, Ebook Costs, Open Bibliographic Data, Copyright Infringement


This article was imported from this blog's previous content management system (WordPress), and may have errors in formatting and functionality. If you find these errors are a significant barrier to understanding the article, please let me know.

It has been a long week, so for many of you this edition of DLTJ Thursday Threads will actually be read on Friday. The spirit was willing, the topics were certainly out there in the past seven days, but the necessary distractions were numerous. Please enjoy this edition whenever you read it. As always, there is lots more on my FriendFeed aggregation page.

Google Refine 2.0, a power tool for data wranglers

Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases. Version 2.0 introduces a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase), and a ton of new transformation commands and expressions.

Google's Open Source blog has this announcement of a major new release of their "Refine" software package. It is software that runs on your Windows, Mac, or UNIX machine and you access it with your web browser. If your first inclination for cleaning up data sets is to drag out Excel or write a script using regular expressions, check out the three demonstration videos and see if Refine might get you to your end result faster.
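For comparison, here is a minimal sketch of the "write a script using regular expressions" approach that Refine aims to replace. It is only an illustration: the `normalize` helper and the sample publisher names are hypothetical and do not come from the Refine announcement or from Refine itself.

```python
import re
from collections import Counter


def normalize(value: str) -> str:
    """Lowercase a messy label, drop punctuation, and collapse whitespace."""
    value = value.strip().lower()
    value = re.sub(r"[^\w\s]", "", value)  # remove punctuation such as commas and periods
    value = re.sub(r"\s+", " ", value)     # collapse runs of whitespace into one space
    return value


# Hypothetical messy column of publisher names (made up for illustration)
raw = ["Acme Press", "acme  press", "Acme Press, Inc.", " ACME PRESS "]

# Count how many raw values fall under each normalized form
counts = Counter(normalize(v) for v in raw)
print(counts)
```

A script like this handles the easy cases (case, spacing, punctuation), but it still treats "acme press" and "acme press inc" as different values. Spotting and merging near-duplicates of that kind is the sort of cleanup the Refine announcement describes.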

Why Do eBooks Cost So Much? (A Publisher’s Perspective)

So far in our experience at Thomas Nelson, the elimination of manufacturing and distribution costs are being offset by retail price reductions and the three additional costs I have outlined. The good news is that we are making about the same margins, regardless of whether we sell the book in physical form or digital. As a result, I don’t expect eBook retail prices to come down any more. If they do, then publishers will have to figure out how to make it work. But for right now, I think the pricing is fair, based on the associated costs.

This post comes from the chairman and CEO of Thomas Nelson Publishers. In it he describes the shifting costs of physical versus digital production from a publisher's perspective. His practical upshot? "I don't expect eBook retail prices to come down any more."

Principles for Open Bibliographic Data

For some time now the OKFN Working Group on Open Bibliographic Data has been working on Principles on Open Bibliographic Data. While first attempts were mainly directed towards libraries and other public institutions we decided to broaden the principle’s scope by amalgamating it with Peter Murray-Rust’s draft publisher guidelines. The results can be seen below. We ask anyone to review these principles, discuss the text and suggest improvements.

Here are the highlights of the five principles mentioned in the post: when publishing data make an explicit and robust license statement; use a recognized waiver or license that is appropriate for metadata; if you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition; we strongly recommend explicitly placing bibliographic data in the public domain via PDDL or CC0; and we urge creators of bibliographic metadata to explicitly either dedicate it to the public domain or use an open license.

"Copyright Infringement and Me" -- The Sad Tale of Cooks Source

My 2005 Ice Dragon entry, called "A Tale of Two Tarts" was apparently printed without my knowledge or permission in a magazine and I am apparently the victim of copyright infringement.

That is the beginning of a tale of significant copyright infringements by a small advertising-supported publication in western New England. The details have been summarized by others. What I find useful, though, are the posts that talk about lessons learned as the internet speeds the spread of memes and how significant remedies for copyright infringement can be difficult to obtain. Since Facebook plays such a central role, the tale of Cooks Source might make for a useful case study for the Facebook generation.


The text was modified to update a link on November 19th, 2012.