On the Need for a General Purpose Digital Object Repository


This article was imported from this blog's previous content management system (WordPress), and may have errors in formatting and functionality. If you find these errors are a significant barrier to understanding the article, please let me know.

Digital objects -- we've all got 'em. Billions and billions of them. And we put them in individual content silos, stratified along such unhelpful lines as media type, owning entity, and other equally meaningless categories. At least meaningless to the end user. So, let's ask ourselves: what is the job the user is trying to get done? And how can we structure our digital object repositories to help them out?

What is a Digital Object?

Kahn and Wilensky ((Kahn, Robert and Wilensky, Robert. 1995. "A Framework for Distributed Digital Object Services." http://www.cnri.reston.va.us/home/cstr/arch/k-w.html)) define a digital object as "a data structure whose principal components are digital material, or data, plus a unique identifier for this material, called a handle (and, perhaps, other material)." Lagoze et al. further define 'other material' to include "terms and conditions … an encapsulation of access rules on use of the object" ((Lagoze, Carl. 1995. A Secure Repository Design for Digital Libraries. D-Lib Magazine. http://www.dlib.org/dlib/december95/12lagoze.html)) and metadata ("data about data" ((Daniel, Ron, Jr.; Lagoze, Carl; and Payette, Sandra D. A Metadata Architecture for Digital Libraries. http://www.cs.cornell.edu/lagoze/papers/ADL98/dar-adl.html))).
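To make the definition concrete, here is a minimal sketch of a digital object as a data structure: data, a handle, and the optional "other material" (terms and conditions, metadata). The class name, field names, and the `hdl:example/` handle prefix are all assumptions made for illustration; none of them come from the cited papers.

```python
from dataclasses import dataclass, field
import uuid


@dataclass(frozen=True)
class DigitalObject:
    """Digital material plus a unique handle, with optional 'other material'."""
    data: bytes                                   # the digital material itself
    handle: str                                   # unique identifier for the material
    terms: tuple = ()                             # access rules, e.g. ("read:public",)
    metadata: dict = field(default_factory=dict)  # "data about data"


def mint(data: bytes, **metadata) -> DigitalObject:
    # Hypothetical handle scheme: a random UUID under a made-up prefix.
    handle = f"hdl:example/{uuid.uuid4()}"
    return DigitalObject(data=data, handle=handle, metadata=metadata)


paper = mint(b"%PDF-1.4 ...", title="Thesis draft", media_type="application/pdf")
print(paper.handle, paper.metadata["title"])
```

Freezing the dataclass is a design choice in this sketch: the handle and the data it names cannot be reassigned after the object is created.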

Furthermore, Daniel, Lagoze and Payette distill the principles of a digital object repository within a model they call "Distributed Active Relationships":

  • There is no essential distinction between data and metadata. We can only make such a distinction in terms of a particular "about" relationship. As a result, what is metadata in the context of one "about" relationship may be data in another.
  • There is no single "about" relationship. There are many different and important relationships between data resources.
  • Resources can be related without regard for their location. The connectivity in networked information architectures makes it possible to have data in one repository describe data in another repository.
  • The computational power of the networked information environment makes it possible to consider active or dynamic relationships between data sets. This adds considerable power to the "data about data" definition. First, data about another data set may not physically exist, but may be automatically derived. Second, the "about" relationship may be an executable object -- in a sense interpretable metadata.

Within such a general framework, it is possible to refer to digital objects, their metadata, and the services/operations/transformations applied to them as distinct entities.
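The four principles above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Relationship` class, the `about` function, and the `repoA/…` and `repoB/…` handles are hypothetical names, and the two repositories are flattened into a single dictionary for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Relationship:
    kind: str                       # e.g. "describes", "reviews"
    target: str                     # handle of the resource being described
    source: Optional[str] = None    # handle of the describing resource, if stored
    derive: Optional[Callable[[bytes], bytes]] = None  # computes a description on demand


# Two hypothetical repositories, flattened into one lookup table.
store = {
    "repoA/video-1": b"raw video bytes",
    "repoB/record-9": b"Title: Campus lecture",
    "repoA/note-2": b"Record checked by cataloger",
}

relationships = [
    # Data in one repository describes data in another.
    Relationship("describes", "repoA/video-1", source="repoB/record-9"),
    # An active relationship: the description is derived, not stored.
    Relationship("size-in-bytes", "repoA/video-1", derive=lambda d: str(len(d)).encode()),
    # The catalog record is metadata above, but plain data here.
    Relationship("reviews", "repoB/record-9", source="repoA/note-2"),
]


def about(target: str) -> list:
    """Return (kind, description) pairs for every relationship pointing at target."""
    found = []
    for rel in relationships:
        if rel.target != target:
            continue
        if rel.derive is not None:
            found.append((rel.kind, rel.derive(store[target])))
        else:
            found.append((rel.kind, store[rel.source]))
    return found


print(about("repoA/video-1"))
print(about("repoB/record-9"))
```

Note that `repoB/record-9` plays both roles: it is metadata in the "describes" relationship and ordinary data in the "reviews" relationship.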

Where Digital Objects Reside Today

Many systems that have been, or are being, deployed on college campuses include, at least in part, a subsystem for storing digital objects in the sense of this general definition.

Digital Library systems (examples of specific instances are Greenstone and ContentDM) store and present the bitstreams along with their associated, typically very rich metadata. The library, archive, or museum staff play an important, and sometimes sole, role in ingesting content into the system. Content is presented as an online exhibition or through a search/retrieve interface. Digital library systems usually also have the goal of preserving bitstreams, metadata, and context for scholars in the distant future.

Institutional Repositories (IRs - DSpace, Digital Commons and Symposia) offer a web-based interface to those who perform the content ingestion process (in most use cases their roles are faculty, instructor, researcher, or the administrative assistant of one of these). There is a hierarchical representation of communities within communities, into which digital objects are placed. Presentation of content reflects the same organization of communities within communities. Like digital library systems, the design and implementation of subsystems within an IR are guided by best practices for preserving information in a digital form.

Electronic Journals (ejournals - eprints, DpubS and BEPress) require specialized handling and presentation of content. Authors submit digital objects to editors; editors in turn use a machine-mediated workflow to farm out, track, and collate comments from reviewers. Revised digital objects are packaged together in "issues" and "volumes" for presentation on the website. There is usually a commitment to ongoing preservation of ejournal digital objects.

Collaborative Learning Environments (CLEs - Sakai, Moodle, WebCT, Blackboard, Angel and Desire2Learn) ingest and present content in the course of teaching and learning. Digital objects come into the system, are combined and manipulated into packages of learning objects by instructors, and are presented to learners as part of taking a course. Learners, in turn, create and ingest their own digital objects into the CLE as evidence of learning; instructors then annotate and/or grade those objects.

Electronic Portfolios (ePortfolio - OSPI plus just about all of the CLE vendors) hold digital objects on behalf of learners and scholars that demonstrate that individual's competency for a particular field or skill. The digital objects can result from a learning experience in a CLE, a paper published in an ejournal, a research report submitted to an IR community, or a source not identified in this whitepaper. The portfolio's owner creates a presentation context for a selection of objects and grants access to selected groups.

The Case for a Unified Repository

Each of these systems - digital libraries, institutional repositories, electronic journals, collaborative learning environments, and electronic portfolios - is, at least in part, a repository of digital objects. Content (a digital object in the form of data and metadata) is added to the repository through a variety of ingestion workflows, manipulated by a variety of business tools, and expressed to repository users through a variety of presentation mechanisms. Internally to each system, metadata is usually distantly linked to its underlying object through numerous relational database tables or other schemes.

The content repository of each system can be viewed as a silo; content resident in the system is not easily accessible to other systems. In some cases, the content repository is closely linked to the ingestion and presentation functions of the system. In other systems, the linkage is looser, making it possible to substitute alternate repository systems. In almost all cases, sharing content between silos involves words like "export/import" or "copy" or "migrate" - signaling a practice of storing content in multiple, unsynchronized locations in order to meet the needs of the workflow and presentation of the system surrounding each content silo.

But what if, beneath the facade of workflows, interfaces and business rules for each of these tasks, there were a single unified repository of all digital objects?

Shared Objects

At a fundamental level, objects placed in the repository via one system are visible to all of the other systems that use the same unified repository. An article ingested through the e-journal interface is available as evidence of a competency in an e-portfolio. A segment of a video imported through the digital library interface is described and annotated (and inserted back into the repository as a new object) for use as part of a learning object in the collaborative learning environment. A student submits a paper to an instructor for evaluation through a collaborative learning environment course, and that paper, along with the resulting grading and commentary, is immediately visible as an object in the student's portfolio.

The recent unification of Sakai as a CLE and OSP as an ePortfolio platform is evidence of the evolution of thinking for single-repository/multiple-service systems.

Shared Workflow

Another immediate benefit of the unified repository concept is the generalization of tasks that promote efficiency through automation. For example, the process by which a paper submitted to a journal is reviewed by peers is similar to processes in dissertation defenses and promotion/tenure boards. An object - or a collection of objects in a portfolio combined into a single object following the Daniel et al model of "Distributed Active Relationships" - follows a process whereby reviewers submit comments and annotations on "an object" to a central point for aggregation, processing and reporting. In the case of a journal article, it is peer reviewers submitting comments to an editor. In the case of dissertation defenses and promotion/tenure boards, it is automation of the review of a candidate's work by internal committee members and external referees. In all cases, the underlying digital objects remain immutable and are augmented by relationships to other objects representing annotations and other actions (subsequent publications, acceptance by defense committee, award of promotion or tenure).
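A rough sketch of that shared workflow follows, assuming hypothetical names throughout (`Annotation`, `ReviewProcess`, the example handles and reviewers). The object under review is never modified; each comment is a separate, immutable annotation that points at the object's handle, and one class serves both the journal case and the tenure case.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    target: str    # handle of the object under review
    reviewer: str
    role: str      # e.g. "peer reviewer", "external referee"
    comment: str


class ReviewProcess:
    """Collects annotations on one object for a central recipient."""

    def __init__(self, target: str, recipient: str):
        self.target = target
        self.recipient = recipient      # editor, committee chair, ...
        self._annotations = []

    def submit(self, reviewer: str, role: str, comment: str) -> Annotation:
        note = Annotation(self.target, reviewer, role, comment)
        self._annotations.append(note)
        return note

    def report(self) -> dict:
        # Aggregate comments by reviewer role for the recipient.
        by_role = defaultdict(list)
        for note in self._annotations:
            by_role[note.role].append(f"{note.reviewer}: {note.comment}")
        return dict(by_role)


# The same process serves a journal editor...
journal = ReviewProcess("hdl:example/article-7", recipient="editor")
journal.submit("Reviewer 1", "peer reviewer", "Methods need more detail")
journal.submit("Reviewer 2", "peer reviewer", "Accept with minor edits")

# ...and a promotion/tenure board reviewing a portfolio.
tenure = ReviewProcess("hdl:example/portfolio-3", recipient="committee chair")
tenure.submit("Dr. A", "committee member", "Strong teaching record")
tenure.submit("Dr. B", "external referee", "Well-cited research")

print(journal.report())
print(tenure.report())
```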

Preconditions to Ensure Success

Such a unified repository model presupposes a number of factors. First, that it is possible to uniquely identify individuals and their roles in relationship to objects through the numerous interfaces atop the object repository. The foundation for this precondition exists with the Shibboleth federated identity management system.

Second, that it is possible to decouple the existing repository function from the workflow, business rule, and presentation layers of the underlying system (e.g. CLE, IR, ePortfolio). In most cases, it may be necessary for the unified repository to emulate the repository store and retrieve functions of the incumbent system.
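One way to picture this emulation is an adapter that keeps an incumbent system's store and retrieve calls intact while delegating to the unified repository. The sketch below is an assumption-laden illustration: `save_file` and `load_file` stand in for whatever interface a real CLE expects, and they are not the API of Sakai or any other product.

```python
class UnifiedRepository:
    """A single store of digital objects, addressed by handle."""

    def __init__(self):
        self._objects = {}

    def put(self, handle: str, data: bytes) -> None:
        self._objects[handle] = data

    def get(self, handle: str) -> bytes:
        return self._objects[handle]


class CourseFileAdapter:
    """Emulates a hypothetical CLE file API on top of the unified repository."""

    def __init__(self, repo: UnifiedRepository):
        self._repo = repo

    def _handle(self, course_id: str, filename: str) -> str:
        # Made-up mapping from the CLE's addressing scheme to a handle.
        return f"cle/{course_id}/{filename}"

    def save_file(self, course_id: str, filename: str, data: bytes) -> str:
        handle = self._handle(course_id, filename)
        self._repo.put(handle, data)
        return handle

    def load_file(self, course_id: str, filename: str) -> bytes:
        return self._repo.get(self._handle(course_id, filename))


repo = UnifiedRepository()
cle = CourseFileAdapter(repo)

# The CLE stores a student paper through its familiar interface...
handle = cle.save_file("HIST101", "essay.txt", b"My essay on archives")

# ...and another system, such as an ePortfolio, reads the same object by handle.
print(repo.get(handle))
```

The CLE's workflow and presentation layers would keep calling `save_file` and `load_file` unchanged; only the storage behind those calls moves.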

Conclusion

Digital Objects are at the heart of the systems emerging on higher education campuses. These systems each employ repositories to hold objects used in accomplishing the goals of the system, but these repository silos are not an efficient use of storage space and perpetuate the problems of copying objects from system to system. A unified repository, on the other hand, promotes a single universe of digital objects enabled by a multifaceted network of relationships and metadata that allow each system to use any digital object to fulfill its needs. The end result is a seamless, efficient presentation of content available to learners and researchers through the system that best meets the users' needs.

Modified to remove a link to http://www.umi.com/proquest/digitalcommons/.

The text was modified to update a link from http://www.cs.cornell.edu/lagoze/papers/ADL98/dar-adl.html to http://web.archive.org/web/20120617040554/http://www.cs.cornell.edu/lagoze/papers/ADL98/dar-adl.html on October 21st, 2013.