Building an Institutional Repository Interface Using EJB3 and JBoss Seam

Posted on 13 minute read

× This article was imported from this blog's previous content management system (WordPress), and may have errors in formatting and functionality. If you find these errors are a significant barrier to understanding the article, please let me know.

This tour is designed to show the overall architecture of a FEDORA digital object repository application within the JBoss Seam framework while at the same time pointing out individual design decisions and extension points that are specific to the Ohio Digital Resource Commons application. Geared towards software developers, a familiarity with Java Servlet programming is assumed, although not required. Knowledge of JBoss Seam, Hibernate/Java Persistence API, EJB3 and Java EE would be helpful but not required; brief explanations of core concepts of these technologies are included in this tour.

The tour is based on revision 709 of /drc/trunk and was last updated on 18-Jan-2007.

This tour will also be incorporated into a presentation at Open Repositories 2007 on Tuesday afternoon.

Directory Layout

The source directory tree has four major components: 'lib', 'resources', 'src', and 'view'.

lib - libraries required by the application. The lib directory contains all of the JAR libraries required by the application. Its contents is a mix of the Seam-generated skeleton (pretty much everything at the top level of the 'lib' directory) and JAR libraries that are specific to the DRC application (in subdirectories of 'lib' named for the library in use). For instance, the 'commons-codec-1.3' and the 'hibernate-all' and the 'jboss-seam' JAR files were all brought into the project via 'seam-gen' while 'lib/commons-net-1.4.1/commons-net-1.4.1.jar' library was added specifically for this project. A convention has been established whereby new libraries added to the project appear as entries in the file which is used by series of directives in the build.xml file to setup the classpaths for compiling and for building the EJB JAR. This is done to make the testing and transition of new libraries into the application more explicit and easily testable. Note that the newly included library directory also includes a copy of any license file associated with that library; this is not only a requirement to use some libraries but is also a good practice to show the lineage of some of the lesser known libraries. (For an example of what is required, see the changes to build.xml and to in order to bring the Apache Commons Net library into the application.)

resources - configuration files and miscellaneous stuff. The resources directory holds the various configuration files required by the application plus other files used for testing and demonstration. Much of this was generated by the Seam-generated skeleton as well. Some key files here are the import.sql file (SQL statements that are used to preload the RDBMS used by Hibernate as the mocked up repository system) and the test-datastreams directory which has sample files for each of the media types.

src - Java source code. The src directory contains all of the Java source code for the application. Everything exists in a package called 'edu.ohiolink.drc' with subpackages for classes handling actions from the view component of the MVC, entity beans (sometimes known as Data Access Objects -- or DAOs -- I think), exception classes (more on this below), classes for working with FEDORA (not currently used), media type handler classes (more on this below), unit test classes (not currently used), and utility classes.

view - XHTML templates, CSS files, and other web interface needs. The view directory holds all of the files for the "view" aspect of the Model-View-Controller paradigm. More information about the view components is below.

Entity Classes

The entity beans package has three primary entity beans defined:,, and (The entity bean is not used at this time.) is the primary bean that represents an object in the repository. and are component beans that only exist in the lifecycle of an bean; holds a representation of a FEDORA object datastream and holds a representation of a Dublin Core datastream for that object.

The Datastream and Description objects are annotated with @Embedded in the source; this is Hibernate's way of saying that these objects do not stand on their own. also has numerous methods marked with a @javax.persistence.Transient annotation meaning that this is information not stored in the backing Hibernate database; these methods are for the various content handlers, which will be outlined below.

Mock Repository

As currently configured, the entity beans pull their information from a static RDBMS using Hibernate rather than from an underlying FEDORA digital object repository. (You'll need to go back to revision 691 to see how far we got with the FEDORA integration into JBoss Seam before we switched our development focus to the presentation 'view' aspects of the application.) As currently configured, Hibernate uses an embedded Hypersonic SQL database for its datastore. As part of the application deploy process, the Java EE container will instantiate a Hypersonic database and preload it with the contents of the import.sql file. (The import.sql file contains just three sample records at the moment: one each for a text file, a PDF file, and a graphic file.)

All of the data for a repository object is contained in a single table record. Hibernate manages the process for us of reading that record out of the database and creating the three corresponding Java objects: Item, Datastream and Description. (Hibernate could also handle the process of updating the underlying table record if we were to change a value in one of the Java objects.) The mapping of table column to Java object field is handled by the @Column(name="xx") annotations in the entity beans.

For Datastream, what is stored in the database is not the datastream content itself but rather a filename that points to the location of the datastream file. The file path in this field can either be absolute (meaning a complete path starting from the root directory of the filesystem) or a relative path. In the case of the latter, the path is relative to the deployed application's WAR directory (something like ".../jboss-4.0.5.GA/server/default/deploy/drc.ear/drc.war/" for instance). Note that the getter/setter methods for the contentLocation are private -- the rest of the application does not need to know the location of the datastreams; this will also be true when the DRC application is connected to a FEDORA digital object repository. The method marked public instead is getContent, and the implementation of getContent hides the complexity of the fact that the datastream is coming from a disk file rather than a FEDORA repository call. For the three records/repository-objects currently defined in 'import.sql' there are three corresponding demo datastreams in the test-datastreams directory.

In all likelihood, this representation of the FEDORA repository will be too simple for us to move forward much further. In particular, the current notion of one datastream per repository object is too simplistic. The Datastream embedded object will likely need to be broken out into a separate table and as a corresponding distinct Java applet. (We may reach the same point soon for the Description object as well.)

By using the Entity Beans as a buffer between the business logic and the view components of the rest of the application, I hope we can minimize/localize the changes required in the future in order to replace the mock repository with a real underlying FEDORA repository.

View Templates

The preferred view technology for JBoss Seam is Facelets, an implementation of Java Server Faces that does not require the use of Java Server Pages (JSP). Although the '.xhtml' pages in the view directory bear a passing resemblance to JSP, behind the scenes they are radically different. Of note for us is the clean templating system used to generate pages. The home.xhtml file has a reference to the template.xhtml file in the 'layout' directory. If you read through the template.xhtml file, you can see where the Facelets engine will pull in other .xhtml files in addition to the content within the tag of home.xhtml.

Content Handlers

The paradigm of handling different media types within the DRC application is guided in large part by the notion of disseminators for FEDORA objects and the Digital Library Federation Aquifer Asset Actions experiments. The underlying concept is to push the media-specific content handling into the digital object repository and to have the presentation interface consume those content handlers as it is preparing the end-user presentation.

For instance, the DRC will need to handle content models for PDFs, images, video, and so forth. Furthermore, how a video datastream from the Digital Video Collection is offered to the user may be different than how a video datastream from a thesis is offered to the user. Rather than embedding the complexity of making those interface decisions into the front-end DRC application, this model of content handlers pushes that complexity closer to the objects themselves by encoding those behaviors a disseminators of the object. What the presentation layer gets from the object is a chunk of XHTML that it inserts into the dynamically generated HTML page at the right place.

There is work beginning on a framework for FEDORA disseminators at /BaseDisseminator/trunk in the source code repository; that work has been put on hold at the moment in favor of focusing on the presentation interface. In order to prepare for the time when the presentation behaviors are encoded as FEDORA object disseminators, the current presentation layer makes use of Content Handlers for each of the media types. The Handler interface defines the methods required by each handler and the TextHandler class, the ImageHandler class, and the PdfHandler class implement the methods for the three media types already defined.

Of these, TextHandler class is the most complete, so I'll use it as an example.

  • The getRawDatastream method takes the datastream and sends it back to the browser with the HTTP headers that cause a File-Save dialog box to open.
  • The getFullDisplay method returns a chunk of XHTML that presents the full metadata in a manner that can be included in a full metadata display screen.
  • The getRecordDisplay method (currently unwritten) returns a chuck of XHTML used to represent the object in a list of records that resulted from a user's search or browse request.
  • The getThumbnail method (currently unwritten) returns a static graphic thumbnail rendition of the datastream (e.g. a cover page, a key video frame, etc.).

By making these content handlers distinct classes, it is anticipated that the rendering code for each of these methods can be more easily moved to FEDORA object disseminators with minimal impact to the surrounding DRC interface application.

Exception Handling

The DRC application follows the practice suggested by Barry Ruzek in Effective Java Exceptions (found via this link on The Server Side). The article can be summarized as:

One type of exception is a contingency, which means that a process was executed that cannot succeed because of a known problem (the example he uses is that of a checking account, where the account has insufficient funds, or a check has a stop payment issued.) These problems should be handled by way of a distinct mechanism, and the code should expect to manage them.

The other type of exception is a fault, such as the IOException. A fault is typically not something that is or should be expected, and therefore handling faults should probably not be part of a normal process.

With these two classes of exception in mind, it's easy to see what should be checked and should be unchecked: the contingencies should be checked (and descend from Exception) and the faults should be unchecked (and descend from Error).

All unchecked exceptions generated by the application are subclasses of DrcBaseAppException. (DrcBaseApplication itself is a subclass of RuntimeException.) For an example, see NoHandlerException. By setting up all of the applications exceptions to derive from this point, we have one place where logging of troubleshooting information can take place (although this part of the application has not been set up yet). Except when there is good reason to do otherwise, this pattern should be maintained.

At this point, no checked (or contingency) exceptions specific to the DRC have been defined. When they are needed, though, they will follow the same basic structure with a base exception derived from Exception.

The text was modified to update a link from to on January 19th, 2011.

The text was modified to update a link from to on January 20th, 2011.


p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to on November 6th, 2012.