Thursday Threads: Structured Data on the Web, Ebook Indexes, Amazon Disintermediates Publishers

Receive DLTJ Thursday Threads:

by E-mail

by RSS

Delivered by FeedBurner

DLTJ Thursday Threads for two weeks in a row! I’m getting back in the groove. This week has pointers to geeky things (learning about structured data on the web) and not quite so geeky things (thoughts about indexes in ebooks and Amazon’s tactics for end-to-end control of book publishing). Well, admittedly, for only certain definitions of “not quite so geeky” … still I hope you enjoy the pointers and be sure to let me know what you think.

On a sad note, just as I was finishing composing this week’s Thursday Threads I saw a tweet that Michael Hart — instigator, founder, and tireless advocate for Project Gutenberg — died on Tuesday. The Project Gutenberg site has a wiki page with an obituary written by Dr. Gregory B. Newby. I never met Michael, but I ran into his writings and work soon after I started my own online journey in 1987. Before I got to know of his Project Gutenburg work I found I could easily identify his writings by his astonishing ability to full-justify his paragraphs without the need for added spaces. Brewster Kahle’s remembrance of Michael has an example. That one eccentric trait, for me at least, is evidence of the passion he brought to the desire to make public domain works available to everyone well before we knew what an e-book was. Rest in peace, Michael Hart; the world is a better place from your efforts.

Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. If you would like a more raw and immediate version of these types of stories, watch my FriendFeed stream (or subscribe to its feed in your feed reader). Comments and tips, as always, are welcome.

Learn about and play with microformats, microdata, RDFa and JSON-LD

More and more of the world’s data is moving onto the Web. We want to share, re-mix and use this data to build more awesome Web applications. Using structured data technologies to mark up people, places, events, recipes, ratings, music, movies and products on the Web makes everybody’s life easier. This site will help you learn about big data, the semantic web, and the practical application of technologies such as Microformats, RDFa, Microdata and JSON-LD.

This is a nice site that just popped up on the web. The similarities and differences between microformats (an older and, in my opinion, more “hackish” way to make data machine readable on web pages), RDFa (what the hard core standardistas wished we would use to encode data), microdata (part of the new HTML5 specification), and JSON-LD (linked data as JavaScript Object Notation — a new one to me) can be hard to ferret out. This site brings together tutorials on these techniques with links to “playground” areas and utilities to use test and extract data from web pages. These techniques are the emerging ways that data is getting encoded into web pages for others to find and use, and to the extent that we want our bibliographic data to integrate into other domains we should be using these techniques.

Why an ebook still needs an index [and how they could be designed]

A draft screen design of an index entry for the term millet, from Peter Meyers' article

In sum, an index is a kind of a collection of pre-made searches: rather than diving headlong and unawares into a search oval’s do-it-yourself void, an index presents would-be searchers with an already assembled, alphabetized list of the 500 or so most common query items. (Microsoft’s effort to brand Bing as a “decision engine” offers an apt analogy; index is to search as a decision engine is to a search engine.) Speaking of search: of course, the standard ebook search oval has its role in the world of digital books. But for the kinds of guided lookup missions listed above, it’s a poor substitute for an index.

- Why an ebook still needs an index, by Peter Meyers, O’Reilly Radar

Professional indexers, take note! We still need your skills in the e-book world. Meyers argues in favor of robust indexes in e-books and functionality in ebook readers, and gives us some back-of-the-napkin sketches for how it could be implemented.

Amazon continues on its mission to disintermediate publishers

What if you could ask the author of a book a question while you were reading the book? That’s the kind of world Amazon wants to offer with its new @author feature, which the online bookstore launched on Wednesday with a group of writers including Susan Orlean and self-help guru Tim Ferriss. Readers can ask questions directly from their Kindles while they are reading a book, and the question gets sent to the author’s Twitter account as well as to their home page at Amazon. In addition to creating what the company hopes will be a kind of reader community around Kindle titles — something it has been pushing in other ways as well — this new feature looks like another step in Amazon’s quest to cut publishers out of the equation and build relationships directly with authors.

My reaction was “if publishers are getting disintermediated in relationships between readers and authors, I wonder where that put libraries? I don’t see evidence that libraries exist in Amazon’s world.” My next thought was, “I wonder if there is a role for standards here such that libraries/librarians could participate in the conversations along side the author.” After all, we have seen what can happen when a librarian is brought in virtually to a classroom discussion. Could we see the same effect if public librarians were assigned to virtually participate in book groups? Then the cynical side kicked in and remembered that Amazon has again snubbed the standards process by recently promoting its own “print replica” ebook format over the established PDF standard.1 If their goal is to disintermediate everyone in the book world, do we really think they would play nicely with standards?

Footnotes

  1. Of course, we’ve come to learn that the Kindle Print Replica format is really a PDF wrapped in a proprietary file format. []
(This post was updated on 10-Sep-2011.)