Best Practice Proposal for a DESCRIPTION Datastream

OhioLINK is deep in the process of migrating content from our old Bulldog/Documentum-based system to, well, something else, and we’ve been talking about the treatment of the metadata in the course of the migration. I think it is safe to say that the Bulldog asset management system (and Documentum, which bought and integrated Bulldog into its product line about five years ago) is not really known for its rich handling of metadata. Or at least not metadata as the library community thinks of it: Dublin Core, MIX, MODS, MARC, VRA Core, PREMIS, FGDC, etc., all at the same time in the same application engine with structured crosswalks between them. 1 I think it is also safe to say that pure, unqualified Dublin Core, the only datastream that is required for every FEDORA object, does not completely encompass the descriptive fidelity needed for our objects. These observations, combined with reading a mid-term project report from the RepoMMan effort in the U.K., got me thinking about metadata and how we should store it in FEDORA objects. The outcome of that line of thinking is this proposal: “to establish a practice of creating an in-line XML datastream with the label ‘DESCRIPTION’ that contains the primary descriptive metadata for each object.”

Rationale

Although FEDORA mandates an unqualified Dublin Core datastream for every object, unqualified Dublin Core is not expressive enough to describe our objects. Therefore I recommend establishing this practice so subsequent agents/consumers of the objects (internal disseminators and external applications) will know the location of the most expressive metadata for the object.

Risks/Unknowns

  • FEDORA does not provide a mechanism to keep elements of the DESCRIPTION datastream in sync with the DC datastream. Do we store common data elements (e.g. “creator”) in both places? If so, our front-end applications would need to change the value of “creator” in two places and there is always the risk that they will get out of sync. How much real value is there in maintaining the FEDORA-mandated DC datastream?
  • There is no convention (that I know of) for a “primary descriptive metadata” datastream label in a FEDORA object, so “DESCRIPTION” is an arbitrary choice at this point. Future practices may go against this decision (although the choice does set us up to start using datastream labels like “PRESERVATION” for PREMIS metadata and so forth).
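
One way to sidestep the synchronization risk would be to treat DESCRIPTION as the only record that front-end applications edit and to regenerate the DC datastream from it on every change. The Python below is a minimal sketch of that idea, not something FEDORA provides. It assumes the DESCRIPTION datastream is built from elements in the dc and dcterms namespaces; the function name derive_dc and the REFINES table are made up for illustration, and the table covers only a handful of dcterms refinements.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Illustrative subset: each qualified dcterms element is "dumbed down"
# to the unqualified DC element it refines.
REFINES = {
    "created": "date",
    "issued": "date",
    "spatial": "coverage",
    "isPartOf": "relation",
    "alternative": "title",
}


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def derive_dc(description_xml):
    """Build an unqualified oai_dc:dc element from a DESCRIPTION record.

    Assumption: the metadata elements are direct children of the root.
    Elements outside the dc/dcterms namespaces, or dcterms elements that
    are missing from REFINES, are skipped rather than guessed at.
    """
    source = ET.fromstring(description_xml)
    target = ET.Element(f"{{{OAI_DC}}}dc")
    for child in source:
        namespace, local = split_tag(child.tag)
        if namespace == DC:
            name = local
        elif namespace == DCTERMS and local in REFINES:
            name = REFINES[local]
        else:
            continue
        ET.SubElement(target, f"{{{DC}}}{name}").text = child.text
    return target
```

With something like this in place, "creator" would be edited only in DESCRIPTION, and the DC datastream becomes a derived artifact that cannot drift. The cost is that any value written directly into DC would be overwritten on the next regeneration.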

Background

In their “Experiences with Fedora” report, the RepoMMan team noted:

…working with Fedora’s compulsory Dublin Core (DC) datastream started one thinking about the metadata that a repository object would eventually need and how this might be mapped onto the Dublin Core fields. It was some considerable time later than an e-mail on the Fedora-users list made it clear that the inherent DC datastream was intended solely for Fedora’s internal use and not as the basis of external searches. 2

Even with our simplest collection, we already know that unqualified Dublin Core will not be sufficient (most specifically, we had discussions about the lack of precision of “Date” and “Coverage” as compared to the field labels we already have in the Bulldog data dictionary). It is important that our metadata be parsable by machine processes, so I would advocate the proposed practice rather than trying to “shoe-horn” our descriptions into unqualified Dublin Core with text labels added to the values and the like. And if we keep the metadata machine-parsable, we will have a wider variety of options for indexing the data and displaying it at the presentation layer.
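
To make the machine-parsability point concrete, here is a small Python sketch that turns a record into index fields. It is only an illustration: the record wrapper element, the function name index_fields, and the choice of dcterms:created and dcterms:issued for the two date values are assumptions, as are the "Created:" and "Issued:" text labels in the shoe-horned variant.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

PREFIXES = {
    "http://purl.org/dc/elements/1.1/": "dc",
    "http://purl.org/dc/terms/": "dcterms",
}

# Hypothetical record using qualified elements: one element per kind of date.
QUALIFIED = (
    '<record xmlns:dcterms="http://purl.org/dc/terms/">'
    "<dcterms:created>1908-12</dcterms:created>"
    "<dcterms:issued>2003-04-17T00:00:00</dcterms:issued>"
    "</record>"
)

# The same values shoe-horned into unqualified DC with text labels.
SHOEHORNED = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:date>Created: 1908-12</dc:date>"
    "<dc:date>Issued: 2003-04-17T00:00:00</dc:date>"
    "</record>"
)


def index_fields(record_xml):
    """Map each 'prefix:element' name to the list of its text values."""
    fields = defaultdict(list)
    for element in ET.fromstring(record_xml):
        if not element.tag.startswith("{"):
            continue  # not namespaced, so not a dc/dcterms element
        namespace, local = element.tag[1:].split("}", 1)
        prefix = PREFIXES.get(namespace)
        text = (element.text or "").strip()
        if prefix and text:
            fields[f"{prefix}:{local}"].append(text)
    return dict(fields)
```

Calling index_fields(QUALIFIED) yields a separate field for each kind of date, so an indexer can sort or filter on the creation date directly. Calling index_fields(SHOEHORNED) yields a single dc:date field whose values still carry their labels, so every consumer has to parse strings before it can tell the dates apart.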

The “in-line XML” part of this proposal means that the DESCRIPTION datastream would be “managed” by the FEDORA server (i.e., not external or referenced), so it would become part of the object in the content store.

Example

If we take for a moment what is displayed in the presentation layer for a sample object from the Forestry collection as the sum total of all of the descriptive metadata for an object of this collection, a corresponding DESCRIPTION datastream would look something like:
[xml]
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://drc.ohiolink.edu/schema/
http://drc.ohiolink.edu/schema/schema.xsd"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/">

Catalpa speciosa, bignonoides and Kampfera seeds.
Ohio Agricultural Experiment Station. Dept. of
Forestry.

Catalpa speciosa, bignonoides and Kampfera seeds.
Item #2

Ohio Agricultural Research and Development
Center

1908-12

2003-04-17T00:00:00

photographic prints
hdl:21151
2
Ohio
Copyright: Ohio State University

http://library.osu.edu/sites/dlib/terms.html


[/xml]

Comments?

Reactions to the proposal? A rational step forward, or is there a better way?

Footnotes

  1. Reality check for those in the “library community” … do you think of metadata in this way?
  2. Richard Green, “Experiences with Fedora during the project’s first year,” Report D-D8, July 2006; page 8; retrieved 28-Aug-2006 from http://www.hull.ac.uk/esig/repomman/downloads/D-D8-fedora-exp-v10.pdf.

4 Comments

  1. Ryan | September 12, 2006 at 11:58 am | Permalink

    This is definitely a reasonable approach. Indiana, Tufts, and Virginia are all taking similar approaches. The only difference is in the details. At Indiana, we’re keeping as little as possible in Fedora’s default DC datastream, and lumping all other metadata into a METS document in a METADATA datastream (philosophy, sample object). Tufts keeps a fairly complete record in the default DC, but the fully-complete records are in DCA-ADMIN and DCA-META (sample object). Virginia has their own metadata format, which can be found on their metadata site.

  2. the jester | September 12, 2006 at 12:32 pm | Permalink

    At Indiana, we’re keeping as little as possible in Fedora’s default DC datastream, and lumping all other metadata into a METS document in a METADATA datastream (philosophy, sample object).

    I would note for those that haven’t followed the “philosophy” link, that “as little as possible” in Indiana’s case is:

    It currently includes only these items:

    Title
    PURL for the object (if this is an item-level object) in an Identifier field
    Fedora PID for the object in an Identifier field

    The “real” DC record (if present) will be in the [descriptive metadata] of the METS document, alongside any other descriptive metadata.
    http://wiki.dlib.indiana.edu/confluence/pages/viewpage.action?pageId=441#FedoraMetadataStoragePhilosophy-DublinCore

    That certainly is bare-bones, but it makes a great deal of sense.

    Thanks for the comment and the links to the examples, Ryan!

  3. Sergio Berna | October 17, 2006 at 10:24 am | Permalink

    “to establish a practice of creating an in-line XML datastream with the label ‘DESCRIPTION’ that contains the primary descriptive metadata for each object.”

    Interesting proposal. Do you know of any recopilation project of these denominations of fedora inline datastreams?

    In my case I have need for several descriptive datastreams. That need a richer vocabulary.

    The first datastream would be the descriptive datastream, using the tag you propose.

    Then would follow preservation datastreams, Rights management datastream, format especification datastream, relations management datastream and many more that maybe thinking a little ahead could be equally named such as your proposal implies.

  4. the jester | October 19, 2006 at 10:55 am | Permalink

    Interesting proposal. Do you know of any recopilation project of these denominations of fedora inline datastreams?

    There are two areas where I think the concepts proposed here might get picked up. The first is in the Content Model Forum on the FEDORA website wiki, which also has an entry for the effort to create a formalized content model. The second is only tangentially related but could also play into the role of datastreams in objects, and that is the Asset Actions work out of the Digital Library Federation.

    This is definitely an area of work where lots of smart people need to come together to figure it out.

From the Disruptive Library Technology Jester (http://dltj.org/), printed on Thursday the 24th of July 2008 at 11:29:26 PM EDT (-0400). The URL to this page is http://dltj.org/article/description-datastream/

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/us/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.