A Report on Namespaces Used by OAI-PMH Repositories


This article was imported from this blog's previous content management system (WordPress), and may have errors in formatting and functionality. If you find these errors are a significant barrier to understanding the article, please let me know.

I needed a survey of the metadata namespaces used by OAI-PMH repositories, so I wrote a quick shell script and XSLT style sheet to parse the list of Registered Data Providers on the OpenArchives.org website. The results are pretty interesting. Some highlights:

  • Dublin Core is, as you would expect, the most widely used descriptive metadata standard. Every service that reported using any namespace at all reported Dublin Core as a record harvesting option. For some, it was the only option (which I find rather sad). One problem, though, is the variety of declared namespace URIs that all appear to mean the same thing: http://www.openarchives.org/OAI/2.0/oai_dc/, http://www.openarchives.org/OAI/2.0/oai_dc (note the missing trailing slash), http://purl.org/dc/elements/2.0/ (apparently used only by the ProQuest Digital Commons product), and http://purl.org/dc/elements/1.1/. Strictly, the oai_dc URI names the OAI-PMH wrapper format, while the purl.org URI names the Dublin Core elements inside it. The Dublin Core Metadata Initiative has published no version 2.0 of its element set, so the 2.0 URI does not correspond to any DCMI namespace. Because XML namespace names are compared as exact strings, software expecting one of these URIs will not recognize the others; even that missing trailing slash is significant!
  • The next most popular namespace URI is http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1807.txt, which would seem to identify IETF RFC 1807, "A Format for Bibliographic Records." You can see what one of these records looks like: although RFC 1807 predates XML (it was published in June 1995), someone evidently turned the format into XML along the way. Very interesting...
  • The next most popular is http://www.ndltd.org/standards/metadata/etdms/1.0/, corresponding to the Interoperability Metadata Standard for Electronic Theses and Dissertations (ETD-MS), followed closely by http://www.openarchives.org/OAI/1.1/oai_marc. That schema fell out of favor years ago when the Library of Congress published MARCXML, its XML schema for MARC 21 records, which uses the namespace http://www.loc.gov/MARC21/slim. Unfortunately, MARCXML doesn't seem to have been picked up by most of the OAI-PMH data providers that used the older oai_marc schema.
  • Toward the bottom of the first list, there are all sorts of interesting variants of qualified Dublin Core and other one-off schemas.
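
To illustrate the exact-match problem with the Dublin Core URIs, here is a minimal Python sketch (not part of the original survey) using the standard library's xml.etree.ElementTree. It parses two otherwise identical oai_dc stubs that differ only by the trailing slash and shows that the parser reports two different namespaces. The FAMILIES table and its labels are hypothetical, an assumption about how one might group variant URIs when tallying results; they are not an official mapping.

```python
import xml.etree.ElementTree as ET

# Two minimal oai_dc stubs that differ only in the trailing slash of the namespace URI.
WITH_SLASH = '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"/>'
NO_SLASH = '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc"/>'


def root_namespace(xml_text: str) -> str:
    """Return the namespace URI of the root element ('' if it has none)."""
    tag = ET.fromstring(xml_text).tag  # ElementTree expands names to "{uri}local"
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


# Hypothetical grouping table: maps each observed URI, verbatim, to a family label.
FAMILIES = {
    "http://www.openarchives.org/OAI/2.0/oai_dc/": "simple Dublin Core",
    "http://www.openarchives.org/OAI/2.0/oai_dc": "simple Dublin Core",
    "http://purl.org/dc/elements/1.1/": "simple Dublin Core",
    "http://purl.org/dc/elements/2.0/": "simple Dublin Core",
    "http://www.openarchives.org/OAI/1.1/oai_marc": "MARC-based",
    "http://www.loc.gov/MARC21/slim": "MARC-based",
}


def family(uri: str) -> str:
    """Exact-string lookup; any URI not listed verbatim falls through to 'other'."""
    return FAMILIES.get(uri, "other")


if __name__ == "__main__":
    a, b = root_namespace(WITH_SLASH), root_namespace(NO_SLASH)
    print(a == b)                  # False: the parser sees two different namespaces
    print(family(a) == family(b))  # True: only the explicit table groups them
```

The grouping has to be spelled out by hand because nothing in XML itself treats these URIs as equivalent; a URI that is off by a single character still lands in "other".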

Your thoughts and observations? I've filed away the Unix shell script and XSLT style sheet. If there is interest in seeing something like this in the future, let me know and I can dig them out.
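
For anyone who wants to try something similar, here is a rough Python sketch of the tallying step. It is not the shell script or XSLT described above; it assumes you have already fetched each repository's OAI-PMH 2.0 ListMetadataFormats response (for example, by requesting the repository's base URL with verb=ListMetadataFormats) and kept the XML as strings. The two sample responses are trimmed, made-up illustrations, not real repository data.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 protocol namespace, used to qualify element names in the response.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def namespaces_in(response_xml: str) -> list[str]:
    """Return the metadataNamespace values listed in one ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:metadataFormat/oai:metadataNamespace"
    return [el.text.strip() for el in root.iterfind(path, OAI_NS) if el.text]


def tally(responses: list[str]) -> Counter:
    """Count how many repositories advertise each namespace URI (once per repository)."""
    counts: Counter = Counter()
    for xml_text in responses:
        counts.update(set(namespaces_in(xml_text)))
    return counts


# Illustrative sample responses (made up for this sketch, not real repository data).
SAMPLE_DC_AND_MARC = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>marc21</metadataPrefix>
      <schema>http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd</schema>
      <metadataNamespace>http://www.loc.gov/MARC21/slim</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

SAMPLE_DC_ONLY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

if __name__ == "__main__":
    for uri, n in tally([SAMPLE_DC_AND_MARC, SAMPLE_DC_ONLY]).most_common():
        print(n, uri)
```

Counting each URI once per repository (via set) gives "how many providers offer X" numbers rather than raw occurrence counts. Note that the two oai_dc variants in the samples are counted separately, since the URIs are compared as exact strings.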

The text was modified to update a link from http://www.ndltd.org/standards/metadata/current.html to http://www.ndltd.org/standards/metadata/etd-ms-v1.00-rev2.html/ on January 19th, 2011.