Best Practice Proposal for a DESCRIPTION Datastream
OhioLINK is deep in the process of migrating content from our old Bulldog/Documentum-based system to, well, something else, and we've been talking about how to treat the metadata during the migration. I think it is safe to say that the Bulldog asset management system (and Documentum, which bought Bulldog and integrated it into its product line about five years ago) is not really known for its rich handling of metadata, or at least not metadata as the library community thinks of it: Dublin Core, MIX, MODS, MARC, VRA Core, PREMIS, FGDC, etc., all at the same time in the same application engine with structured crosswalks between them.
Rationale
Although FEDORA mandates an unqualified Dublin Core datastream for every object, unqualified Dublin Core is not expressive enough to describe our objects. Therefore I recommend establishing this practice so subsequent agents/consumers of the objects (internal disseminators and external applications) will know the location of the most expressive metadata for the object.
Risks/Unknowns
- FEDORA does not provide a mechanism to keep elements of the DESCRIPTION datastream in sync with the DC datastream. Do we store common data elements (e.g. "creator") in both places? If so, our front-end applications would need to change the value of "creator" in two places and there is always the risk that they will get out of sync. How much real value is there in maintaining the FEDORA-mandated DC datastream?
- There is no convention (that I know of) for a "primary descriptive metadata" datastream label in a FEDORA object, so "DESCRIPTION" is an arbitrary choice at this point. Future practices may go against this decision (although the choice does set us up to start using datastream labels like "PRESERVATION" for PREMIS metadata and so forth).
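One way to sidestep the synchronization risk in the first bullet would be to treat the DESCRIPTION datastream as the only editable copy and regenerate the DC datastream from it on every save. What follows is a minimal sketch in Python, not a FEDORA feature: the `derive_dc` function, the `CROSSWALK` table, and the description element names (`photographer`, `dateTaken`, `county`, `filmType`) are all invented for illustration and are not taken from the Bulldog data dictionary.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical crosswalk: rich DESCRIPTION element -> unqualified DC element.
CROSSWALK = {
    "title": "title",
    "photographer": "creator",
    "dateTaken": "date",
    "county": "coverage",
}


def derive_dc(description_xml: str) -> str:
    """Build an oai_dc record from a DESCRIPTION document (one-way sync)."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    source = ET.fromstring(description_xml)
    dc_root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for child in source:
        dc_name = CROSSWALK.get(child.tag)
        value = (child.text or "").strip()
        if dc_name is None or not value:
            continue  # no DC equivalent in the crosswalk, or an empty value
        ET.SubElement(dc_root, f"{{{DC_NS}}}{dc_name}").text = value
    return ET.tostring(dc_root, encoding="unicode")


# Placeholder values only; this is not a real record from any collection.
SAMPLE = """<description>
  <title>Example title</title>
  <photographer>Example Photographer</photographer>
  <dateTaken>1935-06-01</dateTaken>
  <county>Example County</county>
  <filmType>glass plate</filmType>
</description>"""

dc_record = derive_dc(SAMPLE)
print(dc_record)
```

With this arrangement a front-end application would only ever write "creator" in one place, and the DC copy would be disposable. Elements with no DC equivalent (here `filmType`) are simply left out of the derived record.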
Background
In their "Experiences with Fedora" report, the RepoMMan team noted:
...working with Fedora's compulsory Dublin Core (DC) datastream started one thinking about the metadata that a repository object would eventually need and how this might be mapped onto the Dublin Core fields. It was some considerable time later than an e-mail on the Fedora-users list made it clear that the inherent DC datastream was intended solely for Fedora's internal use and not as the basis of external searches.
Richard Green, "Experiences with Fedora during the project's first year" Report D-D8, July 2006; page 8; retrieved 28-Aug-2006 from http://www.hull.ac.uk/esig/repomman/downloads/D-D8-fedora-exp-v10.pdf.
Even with our simplest collection, we already know that unqualified Dublin Core will not be sufficient (most specifically, we had discussions about the lack of precision of "Date" and "Coverage" as compared to the field labels we already have in the Bulldog data dictionary). It is important that our metadata be parsable by machine processes, so I would advocate the proposed practice rather than trying to "shoe-horn" our descriptions into unqualified Dublin Core with text labels added to the values and the like. And if we keep the metadata machine-parsable, we will have a wider variety of options for indexing the data and displaying it at the presentation layer.
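To make the machine-parsing point concrete, here is a small Python sketch contrasting the two approaches. The element names and values are hypothetical, not drawn from the Bulldog data dictionary. With labels embedded in unqualified `dc:date` values, a consumer has to pattern-match free text and hope the label wording never drifts; with one element per kind of date, it can address each value directly.

```python
import re
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Approach 1: "shoe-horned" unqualified DC, with labels embedded in the values.
SHOEHORNED = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:date>Date photographed: 1935-06-01</dc:date>
  <dc:date>Date digitized: 2005-03-15</dc:date>
</record>"""

# Approach 2: a richer schema with one element per kind of date (hypothetical names).
STRUCTURED = """<description>
  <datePhotographed>1935-06-01</datePhotographed>
  <dateDigitized>2005-03-15</dateDigitized>
</description>"""


def dates_from_shoehorned(xml_text):
    """Recover labeled dates by splitting each value on its first colon."""
    found = {}
    for element in ET.fromstring(xml_text).iter(DC + "date"):
        match = re.match(r"\s*(.+?):\s*(\S+)\s*$", element.text or "")
        if match:  # values that do not follow the label convention are lost
            found[match.group(1)] = match.group(2)
    return found


def dates_from_structured(xml_text):
    """Read each date straight from its own element; no text convention needed."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


shoehorned_dates = dates_from_shoehorned(SHOEHORNED)
structured_dates = dates_from_structured(STRUCTURED)
```

Both functions recover the same two dates here, but only the first depends on every cataloger typing the label the same way.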
The "in-line XML" part of this proposal means that the DESCRIPTION datastream would be "managed" by the FEDORA server (i.e. not external or referenced), so it would become part of the object in the content store.
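For orientation, the Python sketch below builds the kind of FOXML fragment an in-line XML datastream might correspond to. It rests on assumptions that should be checked against the FOXML schema for the FEDORA version in use: the `info:fedora/fedora-system:def/foxml#` namespace, the attribute names shown, `CONTROL_GROUP="X"` as the marker for in-line XML, and "DESCRIPTION" used as the datastream ID. The `build_description_datastream` helper, the `LABEL` text, and the `description` payload with its `title` element are placeholders.

```python
import xml.etree.ElementTree as ET

# Assumed FOXML namespace; verify against the schema shipped with your FEDORA release.
FOXML_NS = "info:fedora/fedora-system:def/foxml#"


def build_description_datastream(payload_xml: str) -> ET.Element:
    """Wrap a descriptive-metadata document in an in-line XML datastream element."""
    ET.register_namespace("foxml", FOXML_NS)
    datastream = ET.Element(
        f"{{{FOXML_NS}}}datastream",
        {"ID": "DESCRIPTION", "STATE": "A", "CONTROL_GROUP": "X", "VERSIONABLE": "true"},
    )
    version = ET.SubElement(
        datastream,
        f"{{{FOXML_NS}}}datastreamVersion",
        {"ID": "DESCRIPTION.0", "LABEL": "Primary descriptive metadata", "MIMETYPE": "text/xml"},
    )
    # The metadata travels inside the object XML itself rather than by reference.
    content = ET.SubElement(version, f"{{{FOXML_NS}}}xmlContent")
    content.append(ET.fromstring(payload_xml))
    return datastream


# Placeholder payload; a real one would follow whatever schema the collection adopts.
fragment = build_description_datastream(
    "<description><title>Placeholder title</title></description>"
)
print(ET.tostring(fragment, encoding="unicode"))
```

Because the payload sits inside the object's own XML, exporting or backing up the object carries the descriptive metadata with it.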
Example
If we take for a moment what is displayed in the presentation layer for a sample object from the Forestry collection as the sum total of all of the descriptive metadata for an object of this collection, a corresponding DESCRIPTION datastream would look something like:
[Example XML listing (26 lines) not preserved.]
Comments?
Reactions to the proposal? A rational step forward, or is there a better way?
The text was modified to remove a link to http://worlddmc.ohiolink.edu/Science/Details?oid=4005859 on December 31st, 2010.