OhioLINK is deep in the process of migrating content from our old Bulldog/Documentum-based system to, well, something else, and we’ve been talking about the treatment of the metadata in the course of the migration. I think it is safe to say that the Bulldog asset management system (and Documentum, which bought and integrated Bulldog into its product line about five years ago) is not really known for its rich handling of metadata. Or at least not metadata as the library community thinks of it: Dublin Core, MIX, MODS, MARC, VRA Core, PREMIS, FGDC, etc., all at the same time in the same application engine with structured crosswalks between them. I think it is also safe to say that pure, unqualified Dublin Core, the only datastream that is required for every FEDORA object, does not completely encompass the descriptive fidelity needed for our objects. These observations, combined with reading a mid-term project report from the RepoMMan effort in the U.K., got me thinking about metadata and how we should store it in FEDORA objects. The outcome of that line of thinking is this proposal: “to establish a practice of creating an in-line XML datastream with the label ‘DESCRIPTION’ that contains the primary descriptive metadata for each object.”
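To make the proposal a little more concrete, here is a minimal sketch of what such an in-line XML datastream might look like when built programmatically. It is an illustration only, not the actual OhioLINK practice: the choice of MODS as the descriptive schema, the use of "DESCRIPTION" as the datastream ID as well as the label, the version ID "DESCRIPTION.0", and the function name `build_description_datastream` are all assumptions made for this example. The element and attribute names follow the FOXML serialization as commonly documented, with control group "X" marking in-line XML.

```python
import xml.etree.ElementTree as ET

# Namespaces: FOXML for the FEDORA object wrapper, MODS for the
# descriptive payload (MODS is an assumption for this sketch).
FOXML_NS = "info:fedora/fedora-system:def/foxml#"
MODS_NS = "http://www.loc.gov/mods/v3"


def build_description_datastream(title: str) -> ET.Element:
    """Build a hypothetical in-line XML datastream labeled DESCRIPTION."""
    ET.register_namespace("foxml", FOXML_NS)
    ET.register_namespace("mods", MODS_NS)

    # CONTROL_GROUP "X" marks the datastream as in-line XML.
    datastream = ET.Element(
        f"{{{FOXML_NS}}}datastream",
        {"ID": "DESCRIPTION", "STATE": "A", "CONTROL_GROUP": "X"},
    )
    version = ET.SubElement(
        datastream,
        f"{{{FOXML_NS}}}datastreamVersion",
        {"ID": "DESCRIPTION.0", "MIMETYPE": "text/xml", "LABEL": "DESCRIPTION"},
    )
    content = ET.SubElement(version, f"{{{FOXML_NS}}}xmlContent")

    # The primary descriptive metadata lives inside xmlContent.
    mods = ET.SubElement(content, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    return datastream


if __name__ == "__main__":
    element = build_description_datastream("Sample object")
    print(ET.tostring(element, encoding="unicode"))
```

The required Dublin Core datastream would still sit alongside this one; the sketch only shows the richer record travelling inside the same object.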
Some questions and observations on the analysis of the XTF/FEDORA integration have come in through mechanisms other than blog comments. I’ve reproduced those here for the sake of completeness, but also be sure to go back to the first two entries in this series to read the comments there as well.
Indiana University’s Observations
As it turns out, Indiana University is considering much the same path. They have an existing FEDORA-based repository and a number of XTF projects that have been in development for a while. They, too, are looking to put these two technologies together and have a page on their project website with IU’s observations of an XTF plus FEDORA (plus more!) combination.
This is a continuation of the investigation, started earlier, into integrating the California Digital Library’s XTF software with the FEDORA digital object repository. This analysis looks at the textIndexer module in particular, starting with an overview of how textIndexer works now with filesystem-based objects and ending with an outline of how this could work with reading objects from a FEDORA repository instead.
XTF’s Native File System handler
Natively, XTF wants to read content out of the file system. The core of the processing is done in these two class files:
We’re experimenting pretty heavily now with the California Digital Library’s XTF framework as a front-end to a FEDORA object repository. Initial efforts look promising. Thanks go out to Brian Tingle and Kirk Hastings of CDL; Jeff Cousens, Steve DiDomenico, and Bill Parod from Northwestern; and Ross Wayland from UVa for helping us along in the right direction.
XTF into Eclipse How-To
As we get more serious about XTF, I wrote up a how-to for importing XTF into Eclipse so that it can be deployed as a dynamic web application. Let me know if you find it useful. Definitely let me know if you find it in error. We haven’t put a version of XTF into OhioLINK’s source code repository, but that might follow shortly.
One of the DRC developers had a question recently that sparked a discussion about what to do with collections of objects. In order to answer the question of how to represent the notion of a collection within the repository, we’re going to have to get pretty heavy into RDF: the Resource Description Framework. RDF is a language created by the World Wide Web Consortium “for representing information about resources in the World Wide Web.” If you already know about RDF, or just want to see what a proposed solution is, you can skip down to the “RDF for Collections in FEDORA” heading.
I am excited almost beyond description to be sharing a panel with Sandy Payette (Cornell University, USA), Andrew Treloar (Monash University, Australia), Matthias Razum (FIZ Karlsruhe, Germany), and Carl Lagoze (Cornell University, USA) at the upcoming Joint Conference on Digital Libraries. The tutorial is on Sunday afternoon (June 11, 2006, 1:30-5:00pm local time) with the title “The Fedora Service Framework – Advanced Applications and Panel Discussion”. Sandy’s recent announcement includes this abstract:
Although we were a little concerned right about this time last week, you came through with a wonderful suite of applications with OhioLINK as the mentoring organization for the Google Summer of Code. In the end, we are blown away not only by the increase in quantity over last year, but also by the quality. We received seven for the video snapshot idea, five for the grid-based bulk video conversion tool, one each for the JPIP-based disseminator and applet client, plus a half-dozen proposals for things we didn’t have on our list.
Calling all accessibility technology experts! What follows is a line of thinking about using characteristics of the FEDORA digital object repository to enable access to content through non-graphical interfaces. Thanks to Linda Newman from the University of Cincinnati and others on the Friday morning DRC Developers conference call for triggering this line of thinking.
In a recent post defining universal disseminators for every object in our repository (if the last dozen words didn’t make sense, please read the linked article and come back), I hinted at having an auditory derivative of each object, at least at the preview level. During today’s conference call, Linda asked if such a disseminator could be used to offer different access points for non-GUI users. Well, why not? Let’s look back at the “presentation” part of the disseminator label:
Michael J. Giarlo wrote a very nice summary of my FEDORA trilogy (only three parts so far — I think there are more good things to say about FEDORA; and besides, I like Douglas Adams’ concept of what a trilogy should be), and added a piece that I hadn’t considered:
- Having one’s objects stored as XML on the filesystem also opens up opportunities to see how tools which act thereupon might be glued into the repository infrastructure. One such example might be for an XML-aware search engine (such as amberfish, Lucene, or Zebra). Since you’ve got low-level access to these files, it would be fairly simple to tack on a search & indexing system that is independent of your choice of repository.
Another reason to consider the FEDORA digital object repository system, if having the ability to put all of your content in one place and reducing the complexity of digital preservation aren’t enough, is the capability to create and define behaviors that the content can perform. In the FEDORA world, these behaviors are called disseminators.