Looking Forward to Version 2.2 of FEDORA


This article was imported from this blog's previous content management system (WordPress), and may have errors in formatting and functionality. If you find these errors are a significant barrier to understanding the article, please let me know.

Sandy Payette, Co-Director of the Fedora Project and Researcher in the Cornell Information Science department, announced a tentative date for the release of version 2.2 of the FEDORA digital object repository:

The Fedora development team would like to announce that Fedora 2.2 will be released on Friday, January 19, 2007.

This new release will contain many significant new features and enhancements, including [numbers added to the original for the sake of subsequent commentary]:

  1. Fedora repository is now a web application (.war) that can be installed in any container
  2. Fedora authentication has been refactored to use servlet filters (no longer Tomcat realms)
  3. A new Fedora installer makes it easy to get started with Fedora (with both "quick" and "custom" install options)
  4. GSearch service (backed by Lucene or Zebra) - flexible, configurable, indexes any datastream
  5. Journaling service to create a backup/mirror repository
  6. New checksum features for datastreams
  7. Support for Postgres database configuration
  8. Standard system logging with Log4J
  9. Over 40 bug fixes
  10. Many other enhancements

Be on the lookout for the release announcement [in] the new year! Also, there will be opportunities to talk with the Fedora development team at Open Repositories 2007 (http://openrepositories.org/).

This is great news and a major step forward for the project. Here are some reasons why I think this is true.

1. Fedora repository is now a web application (.war)

Until now, the FEDORA repository application has been distributed pre-bundled inside a Tomcat Java servlet container. The binding has been pretty tight, with certain dependencies written into the Tomcat configuration itself. That made it very difficult to install FEDORA into an organization's existing servlet container (be it another installation of Tomcat or Jetty/JBoss/Glassfish, etc.). Even more problematic, there were reports of trouble getting JSP-based applications to work inside the FEDORA-supplied container (we ran into this ourselves), meaning that organizations wanting to run both FEDORA and another servlet-based application needed to run two servlet containers; pretty inefficient. (OhioLINK was in this position in its early implementations of the Ohio DRC project.)

With release 2.2, the core developers have effectively turned the software distribution inside out. The primary output of the new build process is a standard Web ARchive (or WAR) file that can be put inside any servlet container. The new installation program (see #3 below) comes with a Tomcat distribution, should a new installation need it, but it is no longer required. There have been reports that the new WAR-based distribution works inside the Jetty servlet container; we're hoping it will work in the JBoss Application Server as well (since that is what we're using to build our next generation interface).

2. Fedora authentication has been refactored to use servlet filters

I'm not quite sure what this means, but I have hopes that it will make integration with Shibboleth easier. Can anyone else see the path between FEDORA and Shibboleth and comment on it?
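In general terms (this is the Java Servlet API pattern, not FEDORA's own code), a servlet filter is a component the container runs in front of a servlet: it can inspect a request, attach an authenticated identity to it, or reject it before the servlet ever sees it. The minimal Python sketch below models that chain of filters. The `Request` class, `credential_filter`, `run_chain`, `repository_servlet`, the hard-coded user table, and the simplified `user:password` header are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)
    user: Optional[str] = None  # filled in by the authentication filter


# A handler plays the role of the servlet: request in, response string out.
Handler = Callable[[Request], str]
# A filter receives the request plus a callable that continues down the chain.
Filter = Callable[[Request, Handler], str]

# Hypothetical credential store; a real deployment would use a proper user store.
USERS = {"alice": "secret"}


def credential_filter(request: Request, next_in_chain: Handler) -> str:
    """Reject requests without valid credentials; otherwise pass them along."""
    # Simplified: real HTTP Basic auth base64-encodes "user:password".
    name, _, password = request.headers.get("Authorization", "").partition(":")
    if not name or USERS.get(name) != password:
        return "401 Unauthorized"  # the servlet is never invoked
    request.user = name
    return next_in_chain(request)


def run_chain(filters: List[Filter], servlet: Handler, request: Request) -> str:
    """Run each filter in order; the last link in the chain is the servlet."""
    def invoke(position: int, req: Request) -> str:
        if position == len(filters):
            return servlet(req)
        return filters[position](req, lambda r: invoke(position + 1, r))
    return invoke(0, request)


def repository_servlet(request: Request) -> str:
    # Hypothetical servlet: it can trust request.user because the filter set it.
    return f"200 OK: {request.path} for {request.user}"


if __name__ == "__main__":
    chain = [credential_filter]
    print(run_chain(chain, repository_servlet,
                    Request("/get/demo:1", {"Authorization": "alice:secret"})))
    # prints "200 OK: /get/demo:1 for alice"
    print(run_chain(chain, repository_servlet, Request("/get/demo:1")))
    # prints "401 Unauthorized"
```

One consequence worth noting: filters are declared in the web application's own deployment descriptor (`web.xml`), whereas realms are a Tomcat-specific configuration, so moving authentication into filters fits naturally with the WAR-based packaging described in #1.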

3. A new Fedora installer makes it easy to get started with Fedora

From the start, FEDORA required a Java servlet container in order to run. To make the installation job easier for those who are not familiar with Java servlet containers, the FEDORA installation process did everything for you. Now that the relationship between the FEDORA application and the servlet container has been flipped around (see #1 above), the core developers have devised an easy-to-use installation program that keeps the simplicity of the previous installation style while allowing others to run FEDORA as an integrated application within an existing servlet container.

4. GSearch service

The original FEDORA search service, the appropriately named "basic search," indexes only the Dublin Core (DC) datastream of each object. As has been mentioned on the Fedora-Users mailing list several times, the DC datastream is really meant as an administrative metadata datastream and not necessarily the full description of the object; that full description can be stored in other datastreams of a FEDORA object. Basic search indexes neither these other descriptive metadata datastreams nor the full text of PDF, plain-text, and other indexable datastreams.

GSearch — where "G" stands for "General" but could equally well stand for "Gert" Schmeltz Pedersen, its lead developer from the Technical University of Denmark — does all of the above as a new component in the FEDORA Service Framework. We extend our gratitude to Gert and his colleagues for contributing their work to the general FEDORA distribution as well as to DEFF, Denmark's Electronic Research Library, which funded the GSearch project.
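To make the difference between the two services concrete, here is a toy Python sketch of an inverted index built two ways: once over the DC datastream only, as basic search does, and once over every datastream, which is the idea behind GSearch. The object layout, the `MODS` and `FULLTEXT` datastream IDs, and the function names are hypothetical. The sketch assumes text has already been extracted from PDFs and similar formats, and GSearch itself hands the real indexing work to Lucene or Zebra rather than anything like this code.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Hypothetical repository content: PID -> {datastream ID -> extracted text}.
OBJECTS: Dict[str, Dict[str, str]] = {
    "demo:1": {"DC": "Annual Report 2006",
               "FULLTEXT": "Enrollment grew across Ohio campuses"},
    "demo:2": {"DC": "Campus Photograph",
               "MODS": "Photograph of the Ohio Statehouse in Columbus"},
}


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def build_index(objects: Dict[str, Dict[str, str]],
                only: Optional[Set[str]] = None) -> Dict[str, Set[str]]:
    """Map each term to the PIDs containing it.

    If `only` is given, index just those datastream IDs; otherwise index all.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for pid, datastreams in objects.items():
        for ds_id, text in datastreams.items():
            if only is not None and ds_id not in only:
                continue
            for term in tokenize(text):
                index[term].add(pid)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the PIDs that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


basic_index = build_index(OBJECTS, only={"DC"})  # DC only, like basic search
general_index = build_index(OBJECTS)             # every datastream
```

With this data, `search(basic_index, "ohio")` finds nothing, because "Ohio" never appears in either DC record, while `search(general_index, "ohio")` returns both `demo:1` and `demo:2`.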

5. Journaling service

Like a journaling file system or a journaling database, this capability allows one to capture all of the transactions applied to the repository and replay them against a secondary repository instance or to restore a repository from backup.
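A minimal Python sketch of the idea: every change to a primary repository is appended to a journal, and replaying that journal in order against an empty instance reproduces the same state. The operation names (`ingest`, `add_datastream`, `purge`), the JSON entry format, and the class names are hypothetical; they are not FEDORA's actual journal format or API.

```python
import json
from typing import Dict, List


class Repository:
    """Toy repository: PID -> {datastream ID -> content}."""

    def __init__(self) -> None:
        self.objects: Dict[str, Dict[str, str]] = {}

    def apply(self, entry: dict) -> None:
        op = entry["op"]
        if op == "ingest":
            self.objects[entry["pid"]] = {}
        elif op == "add_datastream":
            self.objects[entry["pid"]][entry["ds_id"]] = entry["content"]
        elif op == "purge":
            del self.objects[entry["pid"]]  # KeyError if the PID is unknown
        else:
            raise ValueError(f"unknown operation: {op}")


class JournalingRepository(Repository):
    """Repository that records every successful transaction in a journal."""

    def __init__(self) -> None:
        super().__init__()
        self.journal: List[str] = []  # append-only, one JSON entry per line

    def apply(self, entry: dict) -> None:
        # Apply first, then record, so the journal holds only transactions
        # that succeeded and replaying it does not re-run failed operations.
        super().apply(entry)
        self.journal.append(json.dumps(entry))


def replay(journal: List[str], target: Repository) -> Repository:
    """Re-apply journal entries, in their original order, to another instance."""
    for line in journal:
        target.apply(json.loads(line))
    return target


if __name__ == "__main__":
    primary = JournalingRepository()
    primary.apply({"op": "ingest", "pid": "demo:1"})
    primary.apply({"op": "add_datastream", "pid": "demo:1",
                   "ds_id": "DC", "content": "Annual Report 2006"})
    mirror = replay(primary.journal, Repository())
    print(mirror.objects == primary.objects)  # prints True
```

The same `replay` covers both uses mentioned above: feed the journal continuously to a secondary instance to keep a mirror, or restore a backup and then replay only the entries recorded since that backup was taken.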

6. Datastream checksums

As part of its ingestion and maintenance functions, the FEDORA software can now calculate, store, and verify checksums of datastreams. This helps ensure the integrity of the repository content, or at least detect when something goes wrong.
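The calculate/store/verify cycle can be sketched in a few lines of Python using the standard `hashlib` module. The `Datastream` class is hypothetical and SHA-1 is an assumed default; the point is only that a checksum recorded at ingest lets a later check detect, though not repair, a change in the stored bytes.

```python
import hashlib


def compute_checksum(content: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of `content` using the named hashlib algorithm."""
    return hashlib.new(algorithm, content).hexdigest()


class Datastream:
    """Hypothetical datastream that records its checksum when created."""

    def __init__(self, content: bytes, algorithm: str = "sha1") -> None:
        self.content = content
        self.algorithm = algorithm
        self.checksum = compute_checksum(content, algorithm)  # stored at ingest

    def verify(self) -> bool:
        """Recompute the checksum and compare it with the stored value."""
        return compute_checksum(self.content, self.algorithm) == self.checksum


if __name__ == "__main__":
    ds = Datastream(b"Annual Report 2006")
    print(ds.verify())                   # prints True
    ds.content = b"Annual Report 2007"   # simulate silent corruption
    print(ds.verify())                   # prints False
```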

7. Support for PostgreSQL

In the battle over which relational database engine is best, FEDORA now supports most of the big ones out of the box: Oracle, MySQL, and now PostgreSQL. Here at OhioLINK, we started with MySQL but are considering a migration to PostgreSQL, our preferred in-house RDBMS, so the timing of this announcement is great.

8. Standard system logging with Log4J

Put this one in the category of "playing nicely with others." We've already reaped the benefit of the refactored logging code in the client JAR file in a pre-release version of the code.

9 and 10. Bug fixes and many other enhancements

The core code is evolving along a nice trajectory. This is good to see for the health of the overall project!

Version 2.2 represents another monumental step towards the vision of a Flexible, Extensible Digital Object Repository Architecture. Congratulations to the core developers on what promises to be a great release.

The text was modified to update a link from http://comm.nsdl.org/pipermail/fedora-users/2006-December/002330.html to http://article.gmane.org/gmane.comp.cms.fedora-commons.user/2330/ on January 19th, 2011.