Disruptive Library Technology Jester

We're Disrupted, We're Librarians, and We're Not Going to Take It Anymore

Best Practice Proposal for a DESCRIPTION Datastream

Posted on September 6, 2006 by Peter Murray
This entry was posted in DRC, Fedora and tagged DRC, Dublin Core, Fedora, libraries, metadata.

OhioLINK is deep in the process of migrating content from our old Bulldog/Documentum-based system to, well, something else, and we’ve been talking about the treatment of the metadata in the course of the migration. I think it is safe to say that the Bulldog asset management system (and Documentum, which bought and integrated Bulldog into its product line about five years ago) is not really known for its rich handling of metadata, or at least not for metadata as the library community thinks of it: Dublin Core, MIX, MODS, MARC, VRA Core, PREMIS, FGDC, etc., all at the same time in the same application engine with structured crosswalks between them. [1] I think it is also safe to say that pure, unqualified Dublin Core, the only datastream that is required for every FEDORA object, does not completely encompass the descriptive fidelity needed for our objects. These observations, combined with reading a mid-term project report from the RepoMMan effort in the U.K., got me thinking about metadata and how we should store it in FEDORA objects. The outcome of that line of thinking is this proposal: “to establish a practice of creating an in-line XML datastream with the label ‘DESCRIPTION’ that contains the primary descriptive metadata for each object.”

Rationale


Although FEDORA mandates an unqualified Dublin Core datastream for every object, unqualified Dublin Core is not expressive enough to describe our objects. Therefore I recommend establishing this practice so subsequent agents/consumers of the objects (internal disseminators and external applications) will know the location of the most expressive metadata for the object.

Risks/Unknowns

  • FEDORA does not provide a mechanism to keep elements of the DESCRIPTION datastream in sync with the DC datastream. Do we store common data elements (e.g. “creator”) in both places? If so, our front-end applications would need to change the value of “creator” in two places and there is always the risk that they will get out of sync. How much real value is there in maintaining the FEDORA-mandated DC datastream?
  • There is no convention (that I know of) for a “primary descriptive metadata” datastream label in a FEDORA object, so “DESCRIPTION” is an arbitrary choice at this point. Future practices may go against this decision (although the choice does set us up to start using datastream labels like “PRESERVATION” for PREMIS metadata and so forth).
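
One way to sidestep the double-entry risk in the first bullet would be to treat DESCRIPTION as the single editable source and regenerate the DC datastream from it. The following is a minimal sketch of that idea, not a FEDORA feature: the `description` wrapper element, the partial `DUMB_DOWN` mapping, the function name `derive_unqualified_dc`, and the assignment of the sample values to particular elements are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical, partial mapping from a qualified term to the unqualified
# Dublin Core element it refines.
DUMB_DOWN = {
    "created": "date",
    "issued": "date",
    "spatial": "coverage",
    "temporal": "coverage",
    "alternative": "title",
    "extent": "format",
    "medium": "format",
}


def derive_unqualified_dc(description_xml: str) -> ET.Element:
    """Build an oai_dc-style record from a richer DESCRIPTION document."""
    source = ET.fromstring(description_xml)
    target = ET.Element(f"{{{OAI_DC}}}dc")
    for child in source:
        if not child.tag.startswith("{"):
            continue  # skip elements that carry no namespace
        namespace, local = child.tag[1:].split("}", 1)
        if namespace == DC:
            name = local  # already unqualified, copy as-is
        elif namespace == DCTERMS and local in DUMB_DOWN:
            name = DUMB_DOWN[local]  # fall back to the broader element
        else:
            continue  # nothing sensible to map to, so leave it out
        ET.SubElement(target, f"{{{DC}}}{name}").text = (child.text or "").strip()
    return target


# Assumed element names; only the values come from the Forestry sample.
SAMPLE = """<description xmlns:dc="http://purl.org/dc/elements/1.1/"
                         xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title>Catalpa speciosa, bignonoides and Kampfera seeds.</dc:title>
  <dc:creator>Ohio Agricultural Experiment Station. Dept. of Forestry.</dc:creator>
  <dcterms:created>1908-12</dcterms:created>
  <dcterms:spatial>Ohio</dcterms:spatial>
</description>"""

dc_record = derive_unqualified_dc(SAMPLE)
pairs = [(el.tag.split("}", 1)[1], el.text) for el in dc_record]
print(pairs)
```

With this approach a front-end application would only ever edit DESCRIPTION; the DC datastream becomes a derived, lossy view, which fits the question of how much value the mandated DC datastream really has.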

Background

In their “Experiences with Fedora” report, the RepoMMan team noted:

…working with Fedora’s compulsory Dublin Core (DC) datastream started one thinking about the metadata that a repository object would eventually need and how this might be mapped onto the Dublin Core fields. It was some considerable time later than an e-mail on the Fedora-users list made it clear that the inherent DC datastream was intended solely for Fedora’s internal use and not as the basis of external searches. [2]

Even with our simplest collection, we already know that unqualified Dublin Core will not be sufficient (most specifically, we had discussions about the lack of precision of “Date” and “Coverage” as compared to the field labels we already have in the Bulldog data dictionary). It is important that our metadata be parsable by machine processes, so I would advocate the proposed practice rather than trying to “shoe-horn” our descriptions into unqualified Dublin Core with text labels added to the values and the like. And if we keep the metadata machine-parsable, we will have a wider variety of options for indexing the data and displaying it at the presentation layer.
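
To make the machine-parsability point concrete, here is a small sketch comparing the two approaches. The label convention (`Created: ...`), the `description` wrapper element, and the choice of `dcterms:created` are assumptions made for illustration; none of them is taken from the Bulldog data dictionary.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Unqualified Dublin Core with a (hypothetical) text label inside the value.
SHOEHORNED = """<description xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:date>Created: 1908-12</dc:date>
  <dc:date>Digitized: 2003-04-17T00:00:00</dc:date>
</description>"""

# The same creation date carried by a qualified element instead.
QUALIFIED = """<description xmlns:dcterms="http://purl.org/dc/terms/">
  <dcterms:created>1908-12</dcterms:created>
</description>"""


def created_from_labels(xml_text):
    """Recover the creation date by pattern-matching a label in each value."""
    root = ET.fromstring(xml_text)
    for el in root.findall("dc:date", NS):
        match = re.match(r"\s*Created:\s*(.+?)\s*$", el.text or "")
        if match:
            return match.group(1)
    return None


def created_from_element(xml_text):
    """Recover the creation date from the element name alone."""
    root = ET.fromstring(xml_text)
    return root.findtext("dcterms:created", default=None, namespaces=NS)


print(created_from_labels(SHOEHORNED))
print(created_from_element(QUALIFIED))
```

The label-based function depends on every cataloger typing the label the same way; the element-based lookup depends only on the markup, which is what an indexer or presentation layer can rely on.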

The “in-line XML” part of this proposal means that the DESCRIPTION datastream would be “managed” by the FEDORA server (i.e., not external or referenced), so it would become part of the object in the content store.

Example


If we take for a moment what is displayed in the presentation layer for a sample object from the Forestry collection as the sum total of all of the descriptive metadata for an object of this collection, a corresponding DESCRIPTION datastream would look something like:
[xml]
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://drc.ohiolink.edu/schema/
http://drc.ohiolink.edu/schema/schema.xsd"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/">

Catalpa speciosa, bignonoides and Kampfera seeds.
Ohio Agricultural Experiment Station. Dept. of
Forestry.

Catalpa speciosa, bignonoides and Kampfera seeds.
Item #2

Ohio Agricultural Research and Development
Center

1908-12

2003-04-17T00:00:00

photographic prints
hdl:21151
2
Ohio
Copyright: Ohio State University

http://library.osu.edu/sites/dlib/terms.html



[/xml]

Comments?


Reactions to the proposal? A rational step forward, or is there a better way?

The text was modified to remove a link to http://worlddmc.ohiolink.edu/Science/Details?oid=4005859 on December 31st, 2010.

Footnotes

  1. Reality check for those in the “library community” … do you think of metadata in this way?
  2. Richard Green, “Experiences with Fedora during the project’s first year,” Report D-D8, July 2006; page 8; retrieved 28-Aug-2006 from http://www.hull.ac.uk/esig/repomman/downloads/D-D8-fedora-exp-v10.pdf.

(This post was updated on 30-Dec-2010.)

Series Table of Contents for Why FEDORA?

  1. On the Need for a General Purpose Digital Object Repository
  2. Why Fedora? Because You Don’t Need Fedora
  3. Thinking about Our Fedora Disseminators
  4. Processing Raw Fedora Objects
  5. Fedora Disseminators to Enable Accessible Repository Content
  6. Representing Collections In FEDORA
  7. Best Practice Proposal for a DESCRIPTION Datastream
  8. Why FEDORA? Answers to the FEDORA Users Interview Survey

  • 4 Replies
  • 3 Comments
  • 1 Pingback
Last reply was February 22, 2010
  1. Ryan
    September 12, 2006

    This is definitely a reasonable approach. Indiana, Tufts, and Virginia are all taking similar approaches. The only difference is in the details. At Indiana, we’re keeping as little as possible in Fedora’s default DC datastream, and lumping all other metadata into a METS document in a METADATA datastream (philosophy, sample object). Tufts keeps a fairly complete record in the default DC, but the fully complete records are in DCA-ADMIN and DCA-META (sample object). Virginia has their own metadata format, which can be found on their metadata site.

    The text was modified to remove a link to http://fedora.dlib.indiana.edu:9090/fedora/get/iudl:4420 on January 19th, 2011.

  2. the jester
    September 12, 2006

    [quote comment="3837"]At Indiana, we’re keeping as little as possible in Fedora’s default DC datastream, and lumping all other metadata into a METS document in a METDADATA datastream (philosophy, sample object).[/quote]

    I would note for those that haven’t followed the “philosophy” link, that “as little as possible” in Indiana’s case is:

    It currently includes only these items:

    Title
    PURL for the object (if this is an item-level object) in an Identifier field
    Fedora PID for the object in an Identifier field

    The “real” DC record (if present) will be in the [descriptive metadata] of the METS document, alongside any other descriptive metadata.
    http://wiki.dlib.indiana.edu/confluence/pages/viewpage.action?pageId=441#FedoraMetadataStoragePhilosophy-DublinCore

    That certainly is bare-bones, but it makes a great deal of sense.

    Thanks for the comment and the links to the examples, Ryan!

    The text was modified to remove a link to http://fedora.dlib.indiana.edu:9090/fedora/get/iudl:4420 on January 19th, 2011.

  3. Sergio Berna
    October 17, 2006

    [quote post="109"]“to establish a practice of creating an in-line XML2 datastream with the label ‘DESCRIPTION’ that contains the primary descriptive metadata for each object.”[/quote]

    Interesting proposal. Do you know of any project that is compiling these naming conventions for Fedora in-line datastreams?

    In my case I need several descriptive datastreams, each with a richer vocabulary.

    The first would be the descriptive datastream, using the label you propose.

    Then would follow preservation datastreams, a rights management datastream, a format specification datastream, a relations management datastream, and many more that, thinking a little ahead, could be named along the lines your proposal implies.

  4. Re: [Fedora-commons-users] Where to store things you need to search?
    February 22, 2010

    Kramer auto Pingback[...] Steve. I also ran across this quotation at http://dltj.org/article/description-datastream/ "working with Fedora's compulsory Dublin Core (DC) datastream started one thinking about the [...]


Copyright

This work by Peter Murray is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States.
