Thursday Threads: Refining Data, Ebook Costs, Open Bibliographic Data, Copyright Infringement

It has been a long week, so for many of you this edition of DLTJ Thursday Threads will actually be read on Friday. The spirit was willing, the topics were certainly out there in the past seven days, but the necessary distractions were numerous. Please enjoy this edition whenever you read it. As always, there is lots more on my FriendFeed aggregation page.

Google Refine 2.0, a power tool for data wranglers

Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases. Version 2.0 introduces a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase), and a ton of new transformation commands and expressions.

Google’s Open Source blog has this announcement of a major new release of their “Refine” software package. Refine runs locally on your Windows, Mac, or UNIX machine, and you use it through your web browser. If your first inclination for cleaning up data sets is to drag out Excel or write a script using regular expressions, check out the three demonstration videos and see if Refine might get you to your end result faster.

Why Do eBooks Cost So Much? (A Publisher’s Perspective)

So far in our experience at Thomas Nelson, the elimination of manufacturing and distribution costs are being offset by retail price reductions and the three additional costs I have outlined. The good news is that we are making about the same margins, regardless of whether we sell the book in physical form or digital. As a result, I don’t expect eBook retail prices to come down any more. If they do, then publishers will have to figure out how to make it work. But for right now, I think the pricing is fair, based on the associated costs.

This post comes from the chairman and CEO of Thomas Nelson Publishers. In it he describes the shifting costs of physical versus digital production from a publisher’s perspective. His practical upshot? “I don’t expect eBook retail prices to come down any more.”

Principles for Open Bibliographic Data

For some time now the OKFN Working Group on Open Bibliographic Data has been working on Principles on Open Bibliographic Data. While first attempts were mainly directed towards libraries and other public institutions we decided to broaden the principle’s scope by amalgamating it with Peter Murray-Rust’s draft publisher guidelines. The results can be seen below. We ask anyone to review these principles, discuss the text and suggest improvements.

Here are the highlights of the five principles mentioned in the post: when publishing data, make an explicit and robust license statement; use a recognized waiver or license that is appropriate for metadata; if you want your data to be effectively used and added to by others, it should be open as defined by the Open Knowledge/Data Definition; we strongly recommend explicitly placing bibliographic data in the public domain via PDDL or CC0; and we urge creators of bibliographic metadata to explicitly either dedicate it to the public domain or use an open license.

“Copyright Infringement and Me” — The Sad Tale of Cooks Source

My 2005 Ice Dragon entry, called “A Tale of Two Tarts” was apparently printed without my knowledge or permission in a magazine and I am apparently the victim of copyright infringement.

That is the beginning of a tale of significant copyright infringements by a small advertising-supported publication in western New England. The details have been summarized by others. What I find useful, though, are the posts that talk about the lessons learned: how social media speeds the spread of memes, and how difficult it can be to obtain significant remedies for copyright infringement. Since Facebook plays such a central role, the tale of Cooks Source might make a useful case study for the Facebook generation.


The text was modified to update a link from to on November 19th, 2012.

Passion Quilt Meme: Take Time to Wonder

[Image: a girl closely examining a caterpillar crawling on a white gate, with the caption "Take time to Wonder"]

I found this meme via Karen Schneider’s entry. Although I wasn’t explicitly tagged, I thought it was interesting enough to add an entry to the meme’s Flickr pool.

With all due respect to Karen (and I agree that a love of reading is important), I think it is a sense of wonder that encourages a love of reading and all sorts of other critical character traits. This is a picture of my daughter when she was about three years old. She is on the back deck of our Connecticut house watching a caterpillar crawl up our gate. She loves to read (now, three years later, she is reading scores of books on horses and dolphins from the elementary school library), and as her father I hope the same sense of curiosity will sustain her love for reading, arts, sciences, and life.

Since I wasn’t tagged, I’m not inflicting the meme on anyone else.