
A Report on Namespaces Used by OAI-PMH Repositories


I needed a survey of the metadata namespaces used by OAI-PMH repositories, so I wrote a quick shell script and XSLT style sheet to parse the list of Registered Data Providers at the OpenArchives.org website. The results are pretty interesting. Some of them:

  • Dublin Core is, as you would expect, the most widely used descriptive metadata standard. Every service (or at least every one that reported using any namespace at all) reported Dublin Core as a record harvesting option. For some, it was the only option (which I find rather sad). One problem, though, is the variety of declared namespace URIs that all appear to mean the same thing: http://www.openarchives.org/OAI/2.0/oai_dc/, http://www.openarchives.org/OAI/2.0/oai_dc (note the missing trailing slash), http://purl.org/dc/elements/2.0/ (used exclusively by the ProQuest Digital Commons product, it would seem), and http://purl.org/dc/elements/1.1/ (the difference between 2.0 and 1.1 is not clear to me). To be processable, a namespace URI must match exactly, character for character, so even that missing trailing slash is significant!
  • The next most popular namespace URI is http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1807.txt, which would seem to identify IETF RFC 1807, A Format for Bibliographic Records. You can see what one of these things looks like. Although RFC 1807 predates XML (it dates from mid-1995), it looks like someone turned the metadata format into XML along the way. Very interesting…
  • The next most popular is http://www.ndltd.org/standards/metadata/etdms/1.0/, corresponding to the Interoperability Metadata Standard for Electronic Theses and Dissertations, followed closely by http://www.openarchives.org/OAI/1.1/oai_marc, a schema that fell out of favor years ago when the Library of Congress published its MARCXML schema for MARC 21 (which goes by the namespace http://www.loc.gov/MARC21/slim). Unfortunately, the newer schema doesn’t seem to have been picked up by the majority of OAI-PMH data providers that used the older oai_marc schema.
  • As you get towards the bottom of the first list, there are all sorts of interesting variants on qualified Dublin Core and other one-off schemas.
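
For anyone who wants to try a similar tally, here is a minimal Python sketch. It is not the shell script and XSLT style sheet described above. It assumes you have already fetched each repository's ListMetadataFormats response as a string. The function names (namespaces_in_response, tally_namespaces, sample_response) and the sample responses are made up for illustration. Counting is by exact string, which is why the two oai_dc variants end up as separate entries.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the OAI-PMH 2.0 protocol envelope itself.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def namespaces_in_response(xml_text):
    """Return the metadataNamespace URIs declared in one ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    tag = "{%s}metadataNamespace" % OAI_NS
    return [el.text.strip() for el in root.iter(tag) if el.text and el.text.strip()]


def tally_namespaces(responses):
    """Count how many repositories declare each namespace URI (exact string match)."""
    counts = Counter()
    for xml_text in responses:
        # set() so a repository that lists a URI twice is counted once.
        counts.update(set(namespaces_in_response(xml_text)))
    return counts


def sample_response(*namespace_uris):
    """Build a made-up ListMetadataFormats response (hypothetical test data)."""
    formats = "".join(
        "<metadataFormat>"
        "<metadataPrefix>p%d</metadataPrefix>"
        "<schema>http://example.org/s%d.xsd</schema>"
        "<metadataNamespace>%s</metadataNamespace>"
        "</metadataFormat>" % (i, i, uri)
        for i, uri in enumerate(namespace_uris)
    )
    return (
        '<OAI-PMH xmlns="%s"><ListMetadataFormats>%s</ListMetadataFormats></OAI-PMH>'
        % (OAI_NS, formats)
    )


if __name__ == "__main__":
    responses = [
        sample_response("http://www.openarchives.org/OAI/2.0/oai_dc/"),
        sample_response(
            "http://www.openarchives.org/OAI/2.0/oai_dc",  # no trailing slash
            "http://www.loc.gov/MARC21/slim",
        ),
    ]
    for uri, count in tally_namespaces(responses).most_common():
        print(count, uri)
```

With the two made-up responses, the slash and no-slash oai_dc URIs are tallied as two different namespaces, one repository each. Whether to normalize such variants before counting is a judgment call. Leaving them distinct mirrors how a namespace-aware XML processor would treat them.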

Your thoughts and observations? I’ve filed away the UNIX script and XSLT style sheet. If there is interest in seeing something like this in the future, let me know and I can dig them out.

2 Comments

  1. Sarah Shreeves | March 20, 2007 at 5:45 pm

    Have you seen the work that Tom Habing has done at the University of Illinois on a Registry of OAI data providers? He’s added all sorts of interesting reports on OAI data providers and has probably the biggest list of OAI data providers since he pulls in data providers that are not registered at openarchives.org.

    See http://gita.grainger.uiuc.edu/registry/.

    sarah

  2. the jester | March 20, 2007 at 9:16 pm

    Thank you, Sarah! Tom’s Distinct Metadata Schemas is much more comprehensive, and more useful, than my quick scripting. I’m grateful for the pointer to his work.



From the Disruptive Library Technology Jester (http://dltj.org/), printed on Friday the 14th of November 2008 at 6:51:57 PM EST (-0500). The URL to this page is http://dltj.org/article/oai-pmh-namespaces/

[Creative Commons Logo] This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/us/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.