Google Book Search Privacy, Orphan Works, and Monopoly

A few weeks ago, a reporter at the Chronicle of Higher Education interviewed Adam Smith, Google's director of product management, about the Google Book Search settlement and posted the interview in audio form. The page isn't dated, but judging from metadata in the URL, it was posted around the publication of the paper issue dated June 26, 2009. I'm calling out this particular interview because Mr. Smith said things that I hadn't yet heard elsewhere -- Google's intentions about privacy in Google Book Search, an explicit statement about the Book Rights Registry releasing information about the status of orphan works, and a statement on what Google expects the size of the orphan works problem to be once the Registry has been in operation for a while.

Below is a rough transcript of portions of the interview. I've added emphasis in the transcript to the parts that I hadn't heard Google representatives say before.

Chronicle host: There has been a lot of concern among librarians and in the library community about access and privacy. Can you allay some of those fears?

Adam Smith: There has been a lot of discussion about how this settlement affects things such as access and privacy, and what we are really looking at is creating a product that will be broadly accessible to the university community as well as the internet community generally. [...] I think with respect to privacy, Google hasn't designed the product yet so it is hard to have a privacy policy for it, but we fully intend to have a policy that is consistent with a lot of the standard procedures in the library community today. Things such as allowing authentication to happen via IP. But we take privacy seriously and it will be consistent with Google's privacy policy as well as have some specific provisions when we actually get down to designing the product.

Chronicle host: There has been a lot of interest in and concern about so-called "orphan works" -- where do those fit into the settlement, and how do you respond to some of the anxiety about that?

Adam Smith: So there is no technical definition of "orphan works" but for the purposes here we'll say a book for which no rightsholder exists. Google's mission in this is to really provide broad access to all of these books and when you look at the corpus as a whole, the percentage of books that are available -- say -- is about 20% are in the public domain or more, about 5% are kind of in print. What that leaves is this center of books that are not in print but may or may not be in copyright. And what we believe is through the settlement agreement and the establishment of the Book Rights Registry, which is an author- and publisher-controlled entity that will try to track down the rightsholders of the particular book, we believe that over time what will happen is that rightsholders will come forward to claim the money that was generated via the economic models and this will allow for better identification of the specific rightsholders to the works. And the Book Rights Registry has committed to making any information -- or making the information about whether or not a book has been claimed -- making that public so that someone who's interested in making use of one of these potentially orphan works can understand whether or not a rightsholder has come forward for that particular book.

[...]

Chronicle host: Another concern, maybe the one that Google encounters the most, is the question of monopoly. Why should we be happy with the idea that a private company has essential control over 10 million plus works?

Adam Smith: So I think at its root what's really important here is to look at the agreements. And Google has non-exclusive agreements at the root of all of its agreements. So, its agreements with its library partners are non-exclusive, its agreements with its publishers and authors are non-exclusive. So anyone is free to enter into agreements with those institutions or those publishers. With respect to the settlement agreement, for all works for which a rightsholder comes forward, the Book Rights Registry will have the ability to license or enter into economic models with other parties for those works. So really this is not an exclusive license to Google, but rather it's establishing the ability for them to get access to these. Obviously for the public domain works, there is no rights or contract associated with that. So what this really leaves is what we believe is a very thin slice of the remaining books, which are the orphan works books.

I'm glad to see some sensitivity to the notion of privacy in Mr. Smith's response to that question. Privacy goes beyond using IP address authentication to give institutional subscribers access to the scanned books, of course. It extends to the collection and disposition of log files related to individuals' use of the Google books database. I wonder if Google will really consider severing the link between reader and work, as is common practice in libraries today. In the case of online books, that would mean not collecting -- or at least immediately anonymizing -- the IP address of the machine used to read portions of the book. Time will tell, and this is certainly an area where I hope there is more dialog between Google and academic libraries (should the settlement agreement be approved).

It is interesting that a Google representative is making statements about what the Book Rights Registry will do with orphan works information. I would think it would be up to the Registry's board of directors to decide whether or not to publicly release information about the orphan status of a work. I don't recall reading in the settlement agreement that such disclosure would be mandatory.

Mr. Smith's answer to the monopoly question ignores the "most favored nation" clause in the settlement agreement that says the Registry cannot offer licensing terms to another party that are more favorable than the ones offered to Google. While that might not be a monopoly in the strictest sense, it certainly makes it harder for any other entity to compete effectively with Google. That same answer also shows Google's optimism in estimating that only "a very thin slice" of works will turn out to be orphans -- in copyright but without an identified rightsholder. I can only assume that they have internal research to back that up. My gut tells me that there is considerably more than a thin slice, but that part of Mr. Smith's answer plays well with the notion that Google won't really have a monopoly, because there will be so few books that Google alone is protected in digitizing under the class action lawsuit settlement.

Adam Smith also has answers to questions about why Google didn't fight it out in court, what Google is doing to help the settlement be approved, and what Google's reaction might be if the settlement isn't approved.

The text was modified to update a link from http://chronicle.com/media/audio/v55/i40/smith/ to http://chronicle.com/article/Audio-Whats-Next-for-Google/48349/ on January 20th, 2011.