Disruptive Library Technology Jester

We're Disrupted, We're Librarians, and We're Not Going to Take It Anymore

“Cautiously Optimistic”

Posted on June 13, 2006 by Peter Murray
This entry was posted in Disruption in Libraries and tagged digital libraries, Joint Conference on Digital Libraries 2006, metadata, National Science Digital Library, standards, xml by Peter Murray.

During the cookies and lemonade break at JCDL this afternoon I surprised one of the well-respected elders of the field with this question: are we really making progress? Are we winning a fight against entropy [1]? I wasn’t out for a quote for publication at the time so I won’t reveal the individual’s name, but I will report that there was a chuckle and then the reply: “cautiously optimistic.”

This person went on to say that access to raw information has improved much over the last five years: the internet and its tools have increased the capacity to publish and retrieve information. ‘Sure,’ s/he went on to say, ‘we have a number of hard problems to solve (linking related objects to each other and so forth), but we are making progress.’ I, too, offered a chuckle and agreed, and we went back to our cookies and lemonade.

Entropy and chaos are powerful forces, however, and it was just after this brief encounter that we heard from Carl Lagoze with a talk called Metadata aggregation and “automated digital libraries”: A Retrospective on the NSDL experience. Although the paper is a modestly dry report on the issues resolved and overcome in “running a relatively large-scale digital library (over a million objects) by collecting, processing, storing, and using metadata” [2], the oral presentation was anything but dry. In fact, it offered a sobering reminder of how hard this is and the challenges before us. He did it with four questions:

  1. What is a digital library anyway?
  2. What is the role of metadata in a digital library?
  3. What is “low barrier” technology? [This one was tied to the observation that OAI-PMH, while modestly simple compared to other protocols, still requires a lot of effort to get right. See reality lesson #4 below.]
  4. Where should expensive and limited human energy be allocated?

… and seven reality lessons:

  • Reality lesson #1: Metadata is not being created
    In truth, there is not a lot of funding set aside in projects to create metadata.
  • Reality lesson #2: Participating as a metadata provider is complicated by a “knowledge gap”
    Doing so requires three skill sets that are frequently distinct: Domain expertise (e.g. “mathematics”); Metadata expertise (e.g. “Dublin Core”); and Technical expertise (e.g. encode it in XML and use a formal protocol).
  • Reality lesson #3: Harvested metadata is not necessarily useful metadata
    “Correct” metadata is not necessarily “rich” metadata. The general problem of metadata quality remains unsolved — even the best automated/automatic transformations are not good enough.
  • Reality lesson #4: OAI-PMH is not necessarily low-barrier and automatic
    Doing OAI-PMH right involves lots of details and assumed knowledge (UTF-8, XML schema validation, URL encoding, date stamping, resumption tokens, etc.). And even after what is sometimes months of hand-holding a data provider, the initial success does not persist in the majority of cases; the failure rate of subsequent harvests is high. And the “incremental harvest” functionality is a nice concept but it doesn’t work: support for “deleted” records is inconsistent among data providers; fewer than 50% of providers claim to persist deletions, and many of those persistence claims are faulty. Too often server failures and harvest failures require a full harvest ‘resync’.
  • Reality lesson #5: Human cost of large-scale harvesting is high
    In the case of NSDL, their metrics show that they exchange 170 messages per year per provider and that it takes, on average, 98 exchanged messages for the first harvest to succeed (and, as previously noted, later harvests often fail).
  • Reality lesson #6: Matching individual metadata records of equivalent resources is hard
    I didn’t have anything in my notes about this, but as I recall his comments were about the lack of ways to uniformly handle these surrogate objects in the OAI-PMH protocol.
  • Reality lesson #7: Lots of (even good) metadata does not make a complete digital library (and maybe not even a digital library that is highly useful for education)
    There is a real need to understand the value-add of a digital library: capturing the wisdom of the community served as well as focusing less on structured information and more on relationships among resources and user-derived relationships and annotations.
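
To make reality lesson #4 a little more concrete, here is a minimal sketch of what a harvester has to get right on the client side: URL-encoding the request, following resumption tokens, doing an incremental harvest with a `from` datestamp, and separating "deleted" headers from live records. This is not NSDL's harvester. The function names, the injected `fetch` callable, and the `example.org` endpoint in the usage note are my own assumptions for illustration; only the OAI-PMH verb, argument, and element names come from the protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, token=None):
    """Build a ListRecords request URL; urlencode does the URL encoding."""
    if token:
        # resumptionToken is an exclusive argument: it must not be
        # combined with metadataPrefix or from.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Incremental harvest: only records changed since this datestamp.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (live_ids, deleted_ids, next_token) for one response page."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        if error.get("code") == "noRecordsMatch":
            # Normal outcome of an incremental harvest with no changes.
            return [], [], None
        raise RuntimeError("OAI-PMH error: " + error.get("code", "unknown"))
    live, deleted = [], []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", default="",
                                     namespaces=OAI_NS)
        # Providers that track deletions mark the header status="deleted".
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            live.append(identifier)
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    # An empty or absent resumptionToken means the list is complete.
    return live, deleted, token or None


def harvest(base_url, fetch, from_date=None):
    """Follow resumption tokens until the provider reports a complete list.

    `fetch` is any callable that takes a URL and returns the response XML.
    """
    live, deleted, token = [], [], None
    while True:
        url = build_list_records_url(base_url, from_date=from_date, token=token)
        page_live, page_deleted, token = parse_list_records(fetch(url))
        live.extend(page_live)
        deleted.extend(page_deleted)
        if token is None:
            return live, deleted
```

A real `fetch` could be as simple as `lambda url: urllib.request.urlopen(url).read()`, and a call might look like `harvest("http://example.org/oai", fetch, from_date="2006-06-01")`. Even this sketch leaves out most of what the talk listed as hard: schema validation, retry after server failures, and the full 'resync' needed when a provider's deletion tracking cannot be trusted.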

So what do I think? You know — I’m not sure. These are tough problems, and the world would be a better place if they were solved. We can demand answers, but sometimes there just isn’t enough of a shoulder to stand on from the giant below. Still, one can’t help but wonder if all of the energy put into the collective “digital library” problem so far has just dissipated into chaos.

Footnotes

  1. Defined as: “Measure of disorganization or degradation in the universe that reduces available energy, or tendency of available energy to dwindle. Chaos, opposite of order.” Do you remember your Second Law of Thermodynamics?
  2. Lagoze, C., Krafft, D. B., Cornwell, T., Dushay, N., Eckstrom, D., Saylor, J. 2006. Metadata aggregation and “automated digital libraries”: A retrospective on the NSDL experience. In Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (Chapel Hill, NC, USA, June 11 – 15, 2006). JCDL ’06. ACM Press, New York, NY, 231. [arXiv:cs.DL/0601125]
(This post was updated on 19-Jun-2006.)

Copyright

This work by Peter Murray is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States.

From the Disruptive Library Technology Jester (http://dltj.org/), printed on Saturday the 25th of May 2013 at 11:10:05 PM UTC (+0000). The URL to this page is

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/us/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.