Would the Real “Dublin Core” Please Stand Up?

Posted on     6 minute read

This article was imported from this blog's previous content management system (WordPress), and may have errors in formatting and functionality. If you find these errors are a significant barrier to understanding the article, please let me know.

I've been following the discussion by Stu Weibel on his blog about the relationship between the Resource Description Framework (RDF) and the Dublin Core Abstract Model (DCAM), and I think I'm as confused as ever. It comes as a two-part posting with comments by Pete Johnston (I first credited Andy Powell; apologies, Pete), Mikael Nilsson, Jonathan Rochkind, and Ed Summers. Jonathan's and Ed's comments describe the same knowledge black hole that I've been facing as well; in Ed's words: "The vocabulary I get--the DCAM is a tougher nut for me to crack."

I'm struggling to get beyond Dublin Core as simply the definition of metadata terms. That does seem to be the heart of Dublin Core, doesn't it? The Mission and Scope of the Dublin Core Metadata Initiative (DCMI), as described on the DCMI's "about" page, is:

The development and maintenance of a core set of metadata terms (the DCMI Metadata Terms) continues to be one of the main activities of DCMI. In addition, DCMI is developing guidelines and procedures to help implementers define and describe their usage of Dublin Core metadata in the form of Application Profiles. This work is done in a work structure that provide discussion and cooperation platforms for specific communities (e.g. education, government information, corporate knowledge management) or specific interests (e.g. technical architecture, accessibility).

Terms? ... yeah, it's in there. Application Profiles? -- how one would actually use DC? ... yeah, it's there too. An abstract model? Either it is so fundamental to Dublin Core that it doesn't get mentioned as a work activity, or its definition is somehow secondary to the work of the DCMI. To be honest, I'm not sure which it is (or even if this is a fair dichotomy).

An outsider's view of the history of Dublin Core

There doesn't seem to be a "brief history of Dublin Core" document out there; if there is, I can't find it. [Update 20080218T1514: My wife, rightly so, asked if I had done an actual literature search for one; alas, no, I had not. So I Googled "dublin core" history (relax, I used Google Scholar) and came up with a book chapter by Stu Weibel and others called "Dublin core: process and principles" that takes the history through 2002. There is also a rather unflattering article by Jeffrey Beall called "Dublin Core: an obituary", but it doesn't really contain much history. I might need to try a more rigorous literature search later.] I think I get the history of Dublin Core -- in very broad strokes, it goes something like this.

First we had the Dublin Core Metadata Element Set (DCMES, sometimes DCES) -- otherwise known simply as "Dublin Core" as it was the first product of the Dublin Core body -- which defined the 15 elements that we all think we know and love: contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, and type. This "Dublin Core" was codified and ratified in all sorts of places: IETF RFC 5013 (version 1.1, which obsoletes version 1.0, RFC 2413), ANSI/NISO Standard Z39.85, and ISO Standard 15836:2003. In the common vernacular, when one refers to "Dublin Core" one is talking about these 15 elements. (It is sort of like how the Open Archives Initiative Protocol for Metadata Harvesting -- OAI-PMH -- is called OAI even though OAI is certainly now bigger than just PMH with the work on the definition of Object Reuse and Exchange.)

Next we had Qualified Dublin Core, which (in part) added attributes to some of the 15 "core" elements (the ones that always leap to mind are "spatial" and "temporal" for the core term "coverage"). This was a tweak to the DCMES -- done in such a way, it would appear, so as not to invalidate all of the nicely codified and ratified versions. I imagine all of that codifying and ratifying took a lot of effort; I wouldn't intentionally want to mess it up either.

But then the story gets sort of fuzzy. Dublin Core is successful, and so an effort starts to define why it is successful. To me, this seems like the W3C work on the Architecture of the World Wide Web. It isn't an attempt at revisionist history as much as it is trying to put the genie back in the bottle by coming up with formal definitions for all of the stuff that is successful. (In the W3C, this seems to result in efforts by the Technical Architecture Group to reconcile the way things are with the way one would like things to be.) In the case of the DCMI, it is the release last year of the Dublin Core Abstract Model, followed by a corresponding realignment of the "DCMI Metadata Terms" to match the DCAM.

By the way, in case you hadn't noticed, the core 15 elements plus the qualifications on some of the elements were expanded in 2002 to include a lot more. In fact, in the latest definition of DCMI Metadata Terms, the original 15 are called "legacy terms":

Implementers may freely choose to use these fifteen properties either in their legacy dc: variant (e.g., http://purl.org/dc/elements/1.1/creator) or in the dcterms: variant (e.g., http://purl.org/dc/terms/creator) depending on application requirements.... Over time..., implementers are encouraged to use the semantically more precise dcterms: properties, as they more fully follow emerging notions of best practice for machine-processable metadata.
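Because the two variants share local names and differ only in namespace, the choice described in the quotation can be sketched in a few lines of Python. This is an illustration only, not anything DCMI publishes: the helper name `to_dcterms` is hypothetical, and the namespace URIs are copied from the quotation above.

```python
# Namespace URIs as given in the DCMI Metadata Terms quotation above.
DC_LEGACY = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# The fifteen original element names listed earlier in this post.
CORE_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def to_dcterms(uri: str) -> str:
    """Map a legacy dc: property URI to its dcterms: counterpart.

    Hypothetical helper: any URI that is not one of the fifteen
    legacy properties is returned unchanged.
    """
    if uri.startswith(DC_LEGACY):
        local_name = uri[len(DC_LEGACY):]
        if local_name in CORE_ELEMENTS:
            return DCTERMS + local_name
    return uri


if __name__ == "__main__":
    print(to_dcterms("http://purl.org/dc/elements/1.1/creator"))
```

A mechanical rename like this only swaps the property URI. It says nothing about whether the values attached to the property actually fit the more precise dcterms: definitions, which is presumably why the quotation leaves the choice to "application requirements."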

And where does RDF fit in?

This is where it gets really fuzzy for me, and I, too, am trying to reconcile what differences exist between RDF and the DCAM based on these postings and comments from Stu's blog. The DCAM, on the surface, makes complete sense as a model for defining the description of a digital object. The use of URIs from the DCMI Metadata Terms as predicates of triples in RDF makes perfect sense, too. What confuses me is the overlap between the DCMI Description Set Model and RDF -- in particular the model's apparent redefinition of RDF's URI references and plain/typed literals as value surrogates and value strings.
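To make the RDF side of that comparison concrete, here is a minimal sketch of DCMI term URIs used as predicates, with one value given as a literal and one as a URI reference. Everything under example.org is hypothetical, the `URIRef`, `Literal`, and `triple` names are made up for this sketch (no RDF library is used), and the output is only N-Triples-style text, not a complete serializer.

```python
from dataclasses import dataclass

DCTERMS = "http://purl.org/dc/terms/"


@dataclass(frozen=True)
class URIRef:
    """A URI reference: identifies a resource or a property."""
    value: str

    def nt(self) -> str:
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    """A plain literal with an optional language tag."""
    value: str
    lang: str = ""

    def nt(self) -> str:
        # Escape only backslashes and double quotes; a fuller serializer
        # would also handle newlines and other control characters.
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"' + (f"@{self.lang}" if self.lang else "")


def triple(subject: URIRef, predicate: URIRef, obj) -> str:
    """Serialize one statement as a line of N-Triples-style text."""
    return f"{subject.nt()} {predicate.nt()} {obj.nt()} ."


if __name__ == "__main__":
    post = URIRef("http://example.org/posts/real-dublin-core")  # hypothetical
    print(triple(post, URIRef(DCTERMS + "title"),
                 Literal("Would the Real Dublin Core Please Stand Up?", "en")))
    print(triple(post, URIRef(DCTERMS + "creator"),
                 URIRef("http://example.org/people/author")))  # hypothetical
```

The two object types are the RDF notions named above. How exactly the DCAM's value strings and value surrogates line up with them is the part this sketch does not settle.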

Stu's second post says:

The abstract model provides a syntax-independent (hence the abstract bit) set of conventions for expressing metadata on the web. RDF is the natural idiom for the expression of the DCAM, but it is NOT essential. You can build any arbitrary syntactical representation of the metadata according to DCAM, and a lossless transformation to any other arbitrary syntactical representation should be possible between two machines that grok both syntaxes.
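One way to picture the "lossless transformation" claim is a round trip: hold a description as syntax-independent data, write it out in two unrelated syntaxes, and check that nothing is lost along the way. The sketch below does that with JSON and a small XML format. Both formats, the statement shape (property URI, value kind, value), and all function names are invented for illustration; none of them is a DCMI-defined encoding.

```python
import json
import xml.etree.ElementTree as ET
from typing import List, Tuple

# One statement: (property URI, value kind, value). The shape is an
# assumption made for this sketch, not the DCAM's own structure.
Statement = Tuple[str, str, str]


def to_json(statements: List[Statement]) -> str:
    return json.dumps([list(s) for s in statements])


def from_json(text: str) -> List[Statement]:
    return [tuple(item) for item in json.loads(text)]


def to_xml(statements: List[Statement]) -> str:
    root = ET.Element("description")
    for prop, kind, value in statements:
        el = ET.SubElement(root, "statement", {"property": prop, "kind": kind})
        el.text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> List[Statement]:
    root = ET.fromstring(text)
    return [
        (el.get("property"), el.get("kind"), el.text or "")
        for el in root.findall("statement")
    ]


if __name__ == "__main__":
    description = [
        ("http://purl.org/dc/terms/title", "literal", "An example title"),
        ("http://purl.org/dc/terms/creator", "uri",
         "http://example.org/people/author"),  # hypothetical URI
    ]
    # JSON -> abstract statements -> XML -> abstract statements
    round_tripped = from_xml(to_xml(from_json(to_json(description))))
    print(round_tripped == description)
```

The round trip works here only because both toy syntaxes carry exactly the same three fields per statement. Whether real-world encodings of Dublin Core line up that neatly is the open question.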

One of the concepts that I think I'm missing here is the value, either by description or by example, of other syntactical representations of the DCAM that get us further than RDF. It is bad enough that the original native representation of "Dublin Core" was XML when one considers "RDF is the natural idiom for the expression of the DCAM." I think I'm in tune with an RDF view of the world, but I suspect that for many others RDF is a foreign, albeit graspable, notion. Now to layer on top of this that RDF is natural but not essential really muddies the waters.

So what is "Dublin Core"? Is it the abstract model? Is it the set of terms that can be used as predicates in RDF expressions? Is it the legacy 15-element XML-based standard for describing digital objects? Count me in among those who want more help in trying to figure this out....

The text was modified to update a link from http://www.niso.org/international/SC4/n515.pdf to http://www.iso.org/iso/catalogue_detail.htm?csnumber=37629 on January 19th, 2011.