Introducing the OAI Object Reuse and Exchange Initiative


This article was imported from this blog's previous content management system (WordPress), and may have errors in formatting and functionality. If you find these errors are a significant barrier to understanding the article, please let me know.

Over the past few months, a new group has formed to tackle the problem of representing and exchanging complex digital objects in a web-based environment. I am proud to serve on the group's technical committee. Over the next few postings, I aim to introduce the library community to the work of the Open Archives Initiative Object Reuse and Exchange group and to seek feedback from the wisdom of this crowd.

Vision and Scope

OAI Object Reuse and Exchange (ORE) is a new effort conducted under the umbrella of the Open Archives Initiative. The summary vision statement is to develop, identify, and profile extensible standards and protocols to allow repositories, agents, and services to interoperate in the context of use and reuse of compound digital objects beyond the boundaries of the holding repositories. A key aspect of this statement is that it is about working with objects, not only with metadata. This sets the ORE work apart from the previous OAI work, the Protocol for Metadata Harvesting (PMH).

The aim of the ORE effort is to promote (through creation or endorsement) effective and consistent mechanisms:

  • to facilitate discovery of compound digital objects;
  • to reference (or 'link to') these objects (as well as parts thereof);
  • to obtain a variety of disseminations of these objects;
  • to aggregate and disaggregate objects; and
  • to enable processing of objects by automated agents.

Although these mechanisms may apply to more general web activities, the use cases we are working from are firmly grounded in the needs of the academic community. Generally speaking, those use cases seek to establish the basis for a digital scholarly communication system composed of two types of systems: a) applications that manage content (such as institutional repositories); and b) applications that leverage managed content (such as search engines, personal productivity tools, and data and text analysis services). Other application domains are possible, of course. As with the initial OAI-PMH work, the intent is to start with a familiar domain and keep an eye towards more general applications as appropriate.

Compound Digital Objects

Key to understanding the ORE vision and scope is a definition of the phrase "compound digital object." In this case, a compound digital object is content with multiple components that vary on:

  • Content, or semantic, types (including: text, datasets, simulations, software, dynamic knowledge representations, machine readable chemical structures, bibliographic and other types of metadata);
  • Media types (including IANA registered MIME types and other type/format registries such as GDFR);
  • Network locations (including content from institutional repositories, scientific data repositories, social networking sites and the general web); and
  • Relationships (where the digital object is part of a complex graph of objects related by lineage, versions, and derivations).

In the abstract, this definition is understandably hard to grasp. The technical committee uses some conceptual examples to keep focus on the task at hand. One example is a paper in the arXiv repository with different disseminations. Although the primary artifact in this example might simply be the paper itself, even in this case there is a compound digital object surrounding that artifact. Its components represent the paper in PDF and PostScript formats, Dublin Core descriptive metadata, and an HTML "splash page" for the paper. Another example is an issue of an overlay journal built from ePrints distributed across different repositories. The nature of "the paper" itself is changing. In e-science, the text of the paper is combined with data sets and simulations. In e-humanities, the text of the paper is combined with primary content (such as scanned items) and the scholar's derived content.
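To make the definition a little more concrete, here is a minimal Python sketch of the arXiv-style example. It models a compound object as a set of components, each with a network location, a semantic type, and a media type, plus typed relationships between them. This is a hypothetical model for illustration only and not an ORE specification. The class names, the `repository.example.org` URIs, and the use of Dublin Core's `isFormatOf` term as a relationship label are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple
from urllib.parse import urlparse


@dataclass(frozen=True)
class Component:
    """One part of a compound object (hypothetical model, not an ORE spec)."""
    uri: str            # network location of the component
    semantic_type: str  # content type, e.g. "text" or "metadata"
    media_type: str     # IANA-registered MIME type


@dataclass
class CompoundObject:
    uri: str
    components: List[Component] = field(default_factory=list)
    # Typed links, stored as (subject URI, relationship, object URI) triples.
    relationships: List[Tuple[str, str, str]] = field(default_factory=list)

    def add(self, component: Component) -> None:
        self.components.append(component)

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        self.relationships.append((subject, predicate, obj))

    def media_types(self) -> Set[str]:
        return {c.media_type for c in self.components}

    def hosts(self) -> Set[str]:
        """Distinct network locations the components are served from."""
        return {urlparse(c.uri).netloc for c in self.components}


# One paper with several disseminations (all URIs are made up).
base = "https://repository.example.org/paper-1"
paper = CompoundObject(uri=base)
paper.add(Component(base + "/splash.html", "text", "text/html"))
paper.add(Component(base + "/paper.pdf", "text", "application/pdf"))
paper.add(Component(base + "/paper.ps", "text", "application/postscript"))
paper.add(Component(base + "/dc.xml", "metadata", "application/xml"))
# The PDF and PostScript files are alternate formats of the same paper.
for fmt in ("/paper.pdf", "/paper.ps"):
    paper.relate(base + fmt, "isFormatOf", base)

print(sorted(paper.media_types()))
print(paper.hosts())
```

In this example all components share one host. The e-science and e-humanities cases would differ mainly in that `hosts()` returns several repositories and the relationship list grows into a larger graph.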

Description of Work

An accurate way to view the OAI-ORE work, then, is that it seeks to enrich the content-sharing landscape. ORE is about enabling digital objects to float between systems that manage content and systems that leverage managed content. In the first category are applications such as institutional repositories, research-group and managed personal (ePortfolio) repositories, discipline-oriented repositories, publisher repositories, dataset repositories, cultural heritage repositories, learning object repositories, and digitized book and manuscript collections. In the second category are applications such as search engines, authoring tools, citation management, collaborative environments, social network applications, data/text mining applications, relationship graph analysis tools, preservation services, workflow tools, and report generation tools.

A key point to remember is that OAI-ORE is not necessarily about transferring the digital assets from one system to another. It is the goal of the technical work to enable new, complex objects to be built without necessarily transferring all of the component parts from disparate content repositories to a single system. (Reflect for a moment on the overlay journal concept -- the papers that make up the issue of the overlay journal could certainly remain dispersed in the repositories where they are originally located; the overlay journal pulls together the contents in a virtual construct that represents an issue of that overlay journal.) In some use cases, transfer of the digital object content is required; preservation mirroring is one such example. In many cases, however, full transfer is not permitted (by terms of use), impractical (as in a dataset that is terabytes in size) or simply superfluous.
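The overlay journal case can be sketched as an aggregation that holds only references. In the minimal Python sketch below, an issue records the URIs of ePrints, and the ePrints themselves stay in their home repositories. No content is fetched or copied. The `OverlayIssue` class and the example repository hosts are hypothetical and serve only to illustrate the "virtual construct" idea. They do not describe any ORE-defined structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List
from urllib.parse import urlparse


@dataclass
class OverlayIssue:
    """A virtual journal issue that stores references (URIs) to ePrints,
    never copies of their content. Hypothetical model for illustration."""
    title: str
    article_uris: List[str] = field(default_factory=list)

    def add_article(self, uri: str) -> None:
        # Only the reference is recorded; the ePrint stays where it lives.
        if uri not in self.article_uris:
            self.article_uris.append(uri)

    def by_repository(self) -> Dict[str, List[str]]:
        """Group the issue's articles by the repository host holding them."""
        groups: Dict[str, List[str]] = {}
        for uri in self.article_uris:
            groups.setdefault(urlparse(uri).netloc, []).append(uri)
        return groups


issue = OverlayIssue("Volume 1, Issue 1")
issue.add_article("https://ir.university-a.example/eprint/17")
issue.add_article("https://ir.university-b.example/eprint/42")
issue.add_article("https://ir.university-a.example/eprint/23")
issue.add_article("https://ir.university-b.example/eprint/42")  # duplicate, ignored

for repo, uris in issue.by_repository().items():
    print(repo, len(uris))
```

Because the issue holds only URIs, a use case such as preservation mirroring would be a separate, explicit step that dereferences those URIs. Terms of use and the size of the content would govern that step.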

At its first meeting, the technical committee identified some motivating use cases to guide our work. At this point they are little more than general statements of the activities that can be done better with an ORE framework in place. Over the course of the next few weeks the technical committee will elaborate on these general statements and turn them into stories.

  • Find, collect, analyze, relate, and publish data-oriented scholarly objects
  • Preserve compound digital objects
  • Remote submission of compound digital objects
  • Citation management
  • Object equivalence recognition (de-duping) to aid resource discovery
  • Graph-based quality assessment of data-centric scholarship
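The use cases are still only general statements, but the de-duping one suggests a simple starting point. The Python sketch below shows one possible approach, chosen here as an assumption rather than anything the committee has proposed. It treats two compound objects as equivalent when they contain byte-identical components, regardless of order. Each component is hashed with SHA-256, and the sorted digests are then hashed again. The function names and sample URIs are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def fingerprint(component_bytes: Iterable[bytes]) -> str:
    """Order-independent fingerprint of a compound object's components."""
    # Sorting the per-component digests makes the result ignore component order.
    digests = sorted(hashlib.sha256(b).hexdigest() for b in component_bytes)
    return hashlib.sha256("".join(digests).encode("ascii")).hexdigest()


def find_duplicates(objects: Dict[str, Iterable[bytes]]) -> Dict[str, List[str]]:
    """Map each shared fingerprint to the object URIs that produce it."""
    groups: Dict[str, List[str]] = {}
    for uri, parts in objects.items():
        groups.setdefault(fingerprint(parts), []).append(uri)
    return {fp: uris for fp, uris in groups.items() if len(uris) > 1}


objects = {
    "https://repo-a.example/obj/1": [b"%PDF paper", b"<dc>metadata</dc>"],
    "https://repo-b.example/obj/9": [b"<dc>metadata</dc>", b"%PDF paper"],
    "https://repo-c.example/obj/5": [b"%PDF another paper"],
}
dups = find_duplicates(objects)
print(list(dups.values()))
```

Exact hashing catches only identical copies. Recognizing equivalence across different formats of the same work (a PDF and a PostScript file, say) would need relationship data or content analysis, which is part of why this is listed as an open use case.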

OAI-ORE Project Organization

The project is organized much like the Protocol for Metadata Harvesting effort. Carl Lagoze (Cornell University) and Herbert Van de Sompel (Los Alamos National Laboratory) are the principal investigators on the project. They coordinate the efforts of an international group of volunteers who form an Advisory Committee, a Technical Committee (of which I am a member), and a Liaison Committee. The membership lists for these committees are available on the OAI-ORE website. It is worth noting that the participants are not exclusively from the library domain. In particular, there is an emphasis not just on text/image/video objects but also on scientific data objects. The Andrew W. Mellon Foundation is funding the work for a 24-month period that began in October 2006, with additional support from the National Science Foundation.

The impetus behind the OAI-ORE effort was a meeting in April 2006 of representatives from institutional repository projects, scholarly content repositories, registry projects, and various other projects that touch on interoperability. See http://msc.mellon.org/Meetings/Interop/ for more information.

The ORE work does not imply that the OAI-PMH specification is being dropped or replaced. OAI-PMH will continue to exist as one approach to interoperability. OAI-ORE will complement OAI-PMH when richer functionality is desired as part of a multi-level interoperability stack. One might think of OAI-ORE as resource-centric, in contrast to OAI-PMH's metadata-centric approach.

The technical committee has met once (the report of the meeting is available from the ORE website). It will hold a conference call this week in advance of a second face-to-face meeting in May. Right now we are fleshing out the use cases as a tool for testing models that we create or adapt from other uses. We want to make sure that what we specify will really work in our application domain. Once we have a good sense of what a model of the ORE scope of work entails, we'll review existing related technologies. The intent is to adapt what is currently available to meet the needs of the ORE model, and to create new specifications and protocols only when it is really necessary. Some early candidates among related technologies are OAI-PMH, RSS/Atom, OpenURL, and METS/DIDL.

The technical committee members have agreed to use the tag 'oaiore' in the various social web tools (Technorati, Connotea, and del.icio.us, for example) as a way to co-locate material on this topic. Others are encouraged to do the same. Fellow technical committee member Pete Johnson (with follow up) has already started a conversation, and you can listen to an interview with Herbert Van de Sompel via EDUCAUSE.

In subsequent postings, I'll go into some detail about the inner workings of "The Web Architecture." I'll cover how it both helps and hinders the interaction of compound digital objects in our domain, and why it is a force too powerful to be ignored in either case.