Model Language on Library Data Ownership

In September, Carl Grant wrote a blog post on the ownership of library data ("We have a problem... another vendor appearing to need education about exactly WHO owns library data") that has been rolling around in my own thoughts for, well, months. The spark of Carl's post was a Twitter conversation where a major library system vendor appeared to be taking steps to limit what library customers can do with their own data.

SerialsSolutions sez: we own the data you entered into 360. You can't share it with a competing Discovery vendor. #VendorLove

— Wally Grotophorst (@grotophorst) August 15, 2012

Awful! Seriously... RT @grotophorst SerialsSolutions sez: we own the data you entered into 360. No sharing data! #VendorLove

— ASERL (@ASERLJEB) August 16, 2012

@aserljeb Awful but true. And I spoke with the VP of Discovery at SS. To quote: "well, there is a competitive advantage in that data."

— Wally Grotophorst (@grotophorst) August 16, 2012

This harkens back to the OCLC records use policy furor of 2008-2010. (In case you don't remember, the heart of the matter was a proposed transition to a policy that seemed to significantly limit the reusability of descriptive information in WorldCat, particularly in light of new desired use cases like library linked data. I was one of those in favor of a more open policy.) There is a critical difference to note, though. In the OCLC records use policy case, the feedback prompted the cooperative's board of trustees to create a public forum for debate and rough consensus building. The result was a revised policy that was true to the needs of the cooperative while enabling new uses of data to be tried and implemented. In fact, we see tangible results of this effort in the recent announcement of library linked data embedded into WorldCat.org pages. (As an aside, I was critical of the lack of openness in the process, but I'll readily acknowledge that the output of the process is a great compromise. The clarification that linking with an OCLC URL is sufficient to denote attribution in linked data contexts is particularly helpful. In retrospect, the process worked well.)

What we have here is something quite different. The case that Carl points out involves a private company making a decision about a library's data. And I don't think this is limited to the named vendor; there can easily be other cases where vendors want to treat library data as a competitive advantage and make that data difficult to get in order to keep the advantage. The channels for making the company's management respond are limited and narrow. In fact, a public calling out may not be sufficient to get action. Carl's post has suggestions on what libraries can do to protect themselves. I'd like to add one other possibility: the creation of model language that libraries can use at subscription or renewal time that spells out their expectations and desires for their data.

When I entered the library profession about 20 years ago, it was somewhat common for libraries to put software escrow clauses in their contracts; in the event the company no longer provided support for software the library purchased to run on its own systems (whether due to bankruptcy or other reasons), the library had the right to access the source code and internal system documentation stored with a third party. It was a form of protection for the library so that it could continue operations in the event of a failure of the software company. Such clauses fell out of fashion, coinciding somewhat with the move from purchase to subscription service models. This is one form of precedent for libraries managing technology risk.

A more recent example is the NISO Shared Electronic Resource Understanding (SERU) best practice. This is an agreement between content distributors and content licensors that covers the common cases of uses of digital information under copyright law without the overhead of getting lawyers involved in complex and detailed license negotiations. Representatives from distributors and licensors came together to forge this common understanding for the benefit of all.

Yet another contemporary example is the data liberation portal set up by Google staff. Through this portal you can see instructions on how to access data in various Google services in the event you want to transfer that data to another service.

What is called for in this case is some combination of these three exemplars. I think what we need is:

clauses in legal agreements with providers that govern the ability for libraries to access their own data about their own operations;
combined with a process through which representatives in the profession come to agreement on the best model language for contract clauses;
plus the transparency of commitments and process exemplified by data liberation.

This could be a project that goes through the formal process of best practices work in a forum like NISO. Or it could start as an ad hoc effort using the Code4Lib wiki plus a mailing list to manage discussion. Or some other mechanism the community creates.

I'm interested in this from two perspectives. First, as a library professional, I think it is an important way for a library to mitigate risks to its operations. Second, as someone employed by a not-for-profit providing Software-as-a-Service options to libraries, I want to ensure any model language is workable for the service provider.

Are you interested? Comment here, link to this post with thoughts from Twitter, Facebook, Google+, Tumblr, Pinterest, a mailing list or your own blog, or get in touch with me personally.