OhioLINK was excited and privileged to participate in the second annual Google Summer of Code — a program to inspire young developers and provide students in Computer Science and related fields the opportunity to do work related to their academic pursuits during the summer, and to support existing open source projects and organizations. This is the first of three posts summarizing the efforts of three students; this one details the work of Juan Pablo Garcia Ortiz, a Ph.D. candidate at the University of Almeria in Spain, to build a JPEG2000 JPIP Streaming Server and Client Browser Viewer Applet. This is an edited version of his final report.
Theis of the high-level sense of passion and commitment inherent in the Fedora community. I’ve posted some answers back to the FEDORA wiki on behalf of OhioLINK, and am also including the responses here as it fits into the “Why FEDORA?” series of blog postings. (If you are reading this through an RSS news reader, I think you’ll have to actually come to the DLTJ website and scroll down to the bottom of this post to see the table of contents of the series.) On with the responses!
OhioLINK is deep in the process of migrating content from our old Bulldog/Documentum-based system to, well, something else, and we’ve been talking about the treatment of the metadata in the course of the migration. I think it is safe to say that the Bulldog asset management system (and Documentum, which bought and integrated Bulldog into its product line about five years ago) is not really known for its rich handling of metadata. Or at least how the library community thinks of metadata: Dublin Core, MIX, MODS, MARC, VRA Core, PREMIS, FGDC, etc., all at the same time in the same application engine with structured crosswalks between them. I think it is also safe to say that pure, unqualified Dublin Core, the only datastream that is required for every FEDORA object, does not completely encompass the descriptive fidelity needed for our objects. These observations, combined with reading a mid-term project report from the RepoMMan effort in the U.K., got me thinking about metadata and how we should store it in FEDORA objects. The outcome of that line of thinking is this proposal: “to establish a practice of creating an in-line XML datastream with the label ‘DESCRIPTION’ that contains the primary descriptive metadata for each object.”
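To make the proposal a little more concrete, here is a minimal sketch of what such a datastream might look like when built programmatically. It is an illustration under stated assumptions, not the layout the post settles on. The element and attribute names follow the FOXML serialization as I understand it (control group "X" for in-line XML); the use of "DESCRIPTION" as both the datastream ID and the version label, the choice of MODS as the payload, and the helper name build_description_datastream are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespaces: FOXML is Fedora's object serialization; MODS is used here only
# as an example of a richer descriptive schema than unqualified Dublin Core.
FOXML_NS = "info:fedora/fedora-system:def/foxml#"
MODS_NS = "http://www.loc.gov/mods/v3"


def build_description_datastream(title: str) -> ET.Element:
    """Build a hypothetical in-line XML 'DESCRIPTION' datastream element."""
    ET.register_namespace("foxml", FOXML_NS)
    ET.register_namespace("mods", MODS_NS)

    # CONTROL_GROUP="X" marks the datastream as in-line XML (assumed layout).
    datastream = ET.Element(
        f"{{{FOXML_NS}}}datastream",
        {"ID": "DESCRIPTION", "STATE": "A", "CONTROL_GROUP": "X"},
    )
    version = ET.SubElement(
        datastream,
        f"{{{FOXML_NS}}}datastreamVersion",
        {"ID": "DESCRIPTION.0", "LABEL": "DESCRIPTION", "MIMETYPE": "text/xml"},
    )
    content = ET.SubElement(version, f"{{{FOXML_NS}}}xmlContent")

    # The primary descriptive metadata lives inside xmlContent.
    mods = ET.SubElement(content, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    return datastream


if __name__ == "__main__":
    element = build_description_datastream("Sample object title")
    print(ET.tostring(element, encoding="unicode"))
```

The point of the sketch is only that the required Dublin Core datastream can stay minimal while a second, consistently named datastream carries whatever schema best fits the object.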
Some questions and observations on the analysis of the XTF/FEDORA integration have come in through mechanisms other than blog comments. I’ve reproduced those here for the sake of completeness, but also be sure to go back to the first two entries in this series to read the comments there as well.
Indiana University’s Observations
As it turns out, Indiana University is considering much the same path. They have an existing FEDORA-based repository and a number of XTF projects that have been in development for a while. They, too, are looking to put these two technologies together and have a page on their project website with IU’s observations of an XTF plus FEDORA (plus more!) combination.
This is a continuation of the investigation, started earlier, into integrating the California Digital Library’s XTF software with the FEDORA digital object repository. This analysis looks at the textIndexer module in particular, starting with an overview of how textIndexer works now with filesystem-based objects and ending with an outline of how this could work when reading objects from a FEDORA repository instead.
XTF’s Native File System handler
Natively, XTF wants to read content out of the file system. The core of the processing is done in these two class files:
We’re experimenting pretty heavily now with the California Digital Library’s XTF framework as a front-end to a FEDORA object repository. Initial efforts look promising. Thanks go out to Brian Tingle and Kirk Hastings of CDL; Jeff Cousens, Steve DiDomenico, and Bill Parod from Northwestern; and Ross Wayland from UVa for helping us along in the right direction.
XTF into Eclipse How-To
As we get more serious about XTF, I wrote up a how-to for bringing XTF into Eclipse so that it can be deployed as a dynamic web application. Let me know if you find it useful. Definitely let me know if you find it in error. We haven’t put a version of XTF into OhioLINK’s source code repository, but that might follow shortly.
The August 2006 edition of “The DPubS Report” produced by Cornell University Libraries for the DPubS community announced work underway at Penn State to bridge the worlds of FEDORA and DPubS. Here is the line from the newsletter:
--------------------------------------------------------------------------
SOFTWARE DEVELOPMENT UPDATE
--------------------------------------------------------------------------
[...]
NEAR-TERM SCHEDULED WORK
[...]
* Penn State is working on Fedora interoperability. The plan is to
have that capability in the September release, with a working version
for testing in late August.
The newsletter goes on to say that the work will be made available under an open source license, so I for one can’t wait to see what it looks like and how we might apply it to our own needs.
One of the DRC developers had a question recently that sparked a discussion about what to do with collections of objects. In order to answer the question of how to represent the notion of a collection within the repository, we’re going to have to get pretty heavy into RDF: the Resource Description Framework. RDF is a language created by the World Wide Web Consortium “for representing information about resources in the World Wide Web.” If you already know about RDF, or just want to see what a proposed solution is, you can skip down to the “RDF for Collections in FEDORA” heading.
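For readers new to RDF, the core idea is that every statement is a subject-predicate-object triple, and a collection can be expressed as a set of "this object is a member of that collection" statements. The sketch below models that idea with plain Python tuples. It is not necessarily the solution proposed in the post. The predicate URI is taken from Fedora's external relationships vocabulary as I recall it, and the demo: object identifiers and the members_of helper are hypothetical names invented for the illustration.

```python
# Assumed predicate from Fedora's external relationships vocabulary.
RELS_EXT = "info:fedora/fedora-system:def/relations-external#"
IS_MEMBER_OF_COLLECTION = RELS_EXT + "isMemberOfCollection"

# Each RDF statement is a (subject, predicate, object) triple.
# The demo: identifiers are hypothetical.
TRIPLES = {
    ("info:fedora/demo:photo1", IS_MEMBER_OF_COLLECTION, "info:fedora/demo:postcards"),
    ("info:fedora/demo:photo2", IS_MEMBER_OF_COLLECTION, "info:fedora/demo:postcards"),
    ("info:fedora/demo:map1", IS_MEMBER_OF_COLLECTION, "info:fedora/demo:atlases"),
}


def members_of(collection: str, triples: set) -> list:
    """Return the sorted subjects that assert membership in the collection."""
    return sorted(
        subject
        for subject, predicate, obj in triples
        if predicate == IS_MEMBER_OF_COLLECTION and obj == collection
    )


if __name__ == "__main__":
    for member in members_of("info:fedora/demo:postcards", TRIPLES):
        print(member)
```

Note that the membership statement hangs off the member, not the collection, so adding an object to a collection never requires editing the collection object itself.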
Building on the shoulders of others: isn’t that how that quote goes? There has been a stack of printouts on my desk for a while now for various access management and service provisioning technologies. Rather than keep the paper, I’m putting the list here so I know how to get back to them if/when I need to. (Of course, along the way if you’d like to comment on them or suggest others to look at, please feel free to do so in the comments.) Note, too, that by listing them here I’m not proposing that all of these pieces come together into a coherent structure, nor am I even sure that they do.
Open Repositories 2007 is coming up next year, and it looks to be an interesting meeting. The first day is open user group meetings for DSpace, Fedora, and Eprints, followed by general conference sessions that cover issues that cut across all of the open repository systems. This year, the user groups will partition their programs into Plenary, Technical Issues, and Management Issues and the partitions will be staggered so that IT managers can attend all plenary sessions, technical staff can attend all technical sessions, etc.