Getting Around Drupal’s Prohibition of @ Characters in User Ids

A while back we created an LDAP directory to consolidate account information for various back-room services, and when we created it we decided to use the individual’s e-mail address as the account identifier (uid in LDAP-speak). It seemed like the logical thing to do — it is something that the user already knows, and it is a cheap and easy way to ensure that the account identifiers will be unique. This is not uncommon for many internet services, of course.

Now we’re bringing up a Drupal content management system and of course want to tie its authentication into the existing LDAP directory. The initial configuration appeared to work, but there were odd, unexplained failures — most notably, Drupal would not consider it a ‘real’ account because it didn’t have an e-mail field. Even weirder was the fact that we had configured Drupal to know exactly which LDAP attribute to use as the e-mail address (mail, in LDAP-speak). It wasn’t until one of our system engineers wondered out loud whether the at-sign (‘@’) in the user id was causing problems that we started making progress towards a solution.

As it turns out, he was right. Without having spent enough time in the guts of the Drupal code to know exactly whether this is true, it seems like Drupal wants to reserve the ‘@something’ construct for inter-Drupal authentication. In other words, if you have an account on one Drupal server (let’s call it DrupalA) and want to access a second (let’s call it DrupalB) — and if the two servers agree to share user accounts — the account from DrupalA would be recorded in the database of DrupalB as “UserId@DrupalA”.

The ‘at’ symbol for us, though, is just a normal part of an e-mail address. We really didn’t want to reconstruct our LDAP account scheme, so the best choice seemed to be to find a way to trick Drupal into accepting these account identifiers. This, unfortunately, was no easy task. I couldn’t find the root cause of the problem, but I did diagnose enough of the symptoms to force a patch into the system. The patch, in the form of a new module (code included below), forces the account to have two necessary attributes that seem to go missing whenever a ‘@’ character appears in the user id. If you have similar problems, I can’t claim that this will work for you, nor can I guarantee this approach will be supportable in the future. All I know is that it seems to work for us in our situation right now.

function olinkldap_help($section) {
  $output = '';
  switch ($section) {
    case 'admin/modules#olinkldap':
      $output = 'olinkldap';
      break;
    case 'admin/modules#description':
    case 'admin/help#olinkldap':
      $output = t('Sets up OhioLINK-specific LDAP parameters.');
      break;
  }
  return $output;
}

function olinkldap_settings() { }

function olinkldap_user($op, &$edit, &$user, $category = NULL) {
  switch ($op) {
    case 'load':
      olinkldap_user_load($user);
      break;
  }
}

function olinkldap_user_load(&$user) {
  // Calculate the DN for the user -- you'll need to adjust this to match your LDAP base DN
  $ldap_dn = sprintf("uid=%s,ou=People,dc=somewhere,dc=outthere", $user->name);
  // Create a new array with the two LDAP-specific values that seem to be missing.
  $forced_data = array('ldap_authentified' => 1, 'ldap_dn' => $ldap_dn);
  // It seems like this should work, but it doesn't (it throws a segmentation fault):
  //   user_save($user, array($forced_data));
  // so we're going to interact directly with the database.
  if ($user->uid) {
    // Get the 'data' field for the user and put it in the $data array
    $data = unserialize(db_result(db_query('SELECT data FROM {users} WHERE uid = %d', $user->uid)));
    // Put all of the attributes from $forced_data into $data
    foreach ($forced_data as $key => $value) {
      $data[$key] = $value;
    }
    // Reserialize the $data array and update it in the database
    $v[] = serialize($data);
    db_query("UPDATE {users} SET data = '%s' WHERE uid = %d", array_merge($v, array($user->uid)));
  }
}

Save this as ‘olinkldap.module’, update the DN to reflect your LDAP server’s base DN (see the comment in the code), copy it into your Drupal modules directory, and activate it. Your ‘@’-impaired userids should start working again. If you are using inter-Drupal account sharing (we’re not), this might break something for you. That’s not interesting for us, so I’m not testing it against that condition. If you use this and find that it works or doesn’t work, or you have a better way of solving the problem, please leave a comment or trackback…

Building an Institutional Repository Interface Using EJB3 and JBoss Seam

This tour is designed to show the overall architecture of a FEDORA digital object repository application within the JBoss Seam framework while at the same time pointing out individual design decisions and extension points that are specific to the Ohio Digital Resource Commons application. The tour is geared towards software developers; a familiarity with Java Servlet programming is assumed, although not strictly required. Knowledge of JBoss Seam, Hibernate/Java Persistence API, EJB3, and Java EE would be helpful but is not required; brief explanations of the core concepts of these technologies are included in this tour.

The tour is based on revision 709 of /drc/trunk and was last updated on 18-Jan-2007.

This tour will also be incorporated into a presentation at Open Repositories 2007 on Tuesday afternoon.

Directory Layout

The source directory tree has four major components: ‘lib’, ‘resources’, ‘src’, and ‘view’.

lib – libraries required by the application. The lib directory contains all of the JAR libraries required by the application. Its contents are a mix of the Seam-generated skeleton (pretty much everything at the top level of the ‘lib’ directory) and JAR libraries that are specific to the DRC application (in subdirectories of ‘lib’ named for the library in use). For instance, the ‘commons-codec-1.3’, ‘hibernate-all’, and ‘jboss-seam’ JAR files were all brought into the project via ‘seam-gen’, while the ‘lib/commons-net-1.4.1/commons-net-1.4.1.jar’ library was added specifically for this project. A convention has been established whereby new libraries added to the project appear as entries in a file that is used by a series of directives in the build.xml file to set up the classpaths for compiling and for building the EJB JAR. This is done to make the testing and transition of new libraries into the application more explicit and more easily testable. Note that a newly included library directory also includes a copy of any license file associated with that library; this is not only a requirement of some libraries but is also a good practice to show the lineage of some of the lesser-known libraries. (For an example of what is required, see the changes to build.xml and to that file needed to bring the Apache Commons Net library into the application.)

resources – configuration files and miscellaneous stuff. The resources directory holds the various configuration files required by the application plus other files used for testing and demonstration. Much of this comes from the Seam-generated skeleton as well. Some key files here are the import.sql file (SQL statements that are used to preload the RDBMS used by Hibernate as the mocked-up repository system) and the test-datastreams directory, which has sample files for each of the media types.

src – Java source code. The src directory contains all of the Java source code for the application. Everything exists in a package called ‘edu.ohiolink.drc’ with subpackages for classes handling actions from the view component of the MVC, entity beans (sometimes known as Data Access Objects — or DAOs — I think), exception classes (more on this below), classes for working with FEDORA (not currently used), media type handler classes (more on this below), unit test classes (not currently used), and utility classes.

view – XHTML templates, CSS files, and other web interface needs. The view directory holds all of the files for the “view” aspect of the Model-View-Controller paradigm. More information about the view components is below.

Entity Classes

The entity beans package has three primary entity beans defined: Item, Datastream, and Description. (One additional entity bean is defined but not used at this time.) Item is the primary bean that represents an object in the repository. Datastream and Description are component beans that only exist in the lifecycle of an Item bean; Datastream holds a representation of a FEDORA object datastream and Description holds a representation of a Dublin Core datastream for that object.

The Datastream and Description objects are annotated with @Embedded in the Item source; this is Hibernate’s way of saying that these objects do not stand on their own. Item also has numerous methods marked with a @javax.persistence.Transient annotation, meaning that this information is not stored in the backing Hibernate database; these methods are for the various content handlers, which are outlined below.

Mock Repository

As currently configured, the entity beans pull their information from a static RDBMS using Hibernate rather than from an underlying FEDORA digital object repository. (You’ll need to go back to revision 691 to see how far we got with the FEDORA integration into JBoss Seam before we switched our development focus to the presentation ‘view’ aspects of the application.) As currently configured, Hibernate uses an embedded Hypersonic SQL database for its datastore. As part of the application deploy process, the Java EE container will instantiate a Hypersonic database and preload it with the contents of the import.sql file. (The import.sql file contains just three sample records at the moment: one each for a text file, a PDF file, and a graphic file.)

All of the data for a repository object is contained in a single table record. Hibernate manages the process for us of reading that record out of the database and creating the three corresponding Java objects: Item, Datastream and Description. (Hibernate could also handle the process of updating the underlying table record if we were to change a value in one of the Java objects.) The mapping of table column to Java object field is handled by the @Column(name="xx") annotations in the entity beans.
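
As a rough sketch of how this mapping hangs together (the field names, column names, and class bodies below are illustrative assumptions, not the actual DRC source):

import javax.persistence.Column;
import javax.persistence.Embeddable;
import javax.persistence.Embedded;
import javax.persistence.Entity;
import javax.persistence.Id;

// Shown together for brevity; each class would normally live in its own file.
@Entity
public class Item {
    @Id
    @Column(name = "pid")
    private String pid;              // maps to the "pid" column of the item table

    @Embedded
    private Datastream datastream;   // stored in the same table record as the Item

    @Embedded
    private Description description; // Dublin Core fields, also part of the same record

    // getters and setters omitted
}

@Embeddable
class Datastream {
    @Column(name = "content_location")
    private String contentLocation;  // a filename, not the datastream content itself
}

@Embeddable
class Description {
    @Column(name = "title")
    private String title;
}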

For Datastream, what is stored in the database is not the datastream content itself but rather a filename that points to the location of the datastream file. The file path in this field can either be absolute (meaning a complete path starting from the root directory of the filesystem) or a relative path. In the case of the latter, the path is relative to the deployed application’s WAR directory (something like “…/jboss-4.0.5.GA/server/default/deploy/drc.ear/drc.war/” for instance). Note that the getter/setter methods for the contentLocation are private — the rest of the application does not need to know the location of the datastreams; this will also be true when the DRC application is connected to a FEDORA digital object repository. The method marked public instead is getContent, and the implementation of getContent hides the complexity of the fact that the datastream is coming from a disk file rather than a FEDORA repository call. For the three records/repository-objects currently defined in ‘import.sql’ there are three corresponding demo datastreams in the test-datastreams directory.
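
A minimal sketch of that idea, assuming a file-backed datastream; the method signature and the way the base directory is supplied are illustrative assumptions rather than the actual DRC code:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Datastream {
    private String contentLocation;  // absolute, or relative to the deployed WAR directory

    // Private: the rest of the application never sees where the bytes live.
    private String getContentLocation() { return contentLocation; }
    private void setContentLocation(String location) { this.contentLocation = location; }

    // Public: callers ask only for content, whether it comes from a disk file
    // (as here) or, eventually, from a FEDORA repository call.
    public InputStream getContent(String warDirectory) throws IOException {
        File f = new File(contentLocation);
        if (!f.isAbsolute()) {
            f = new File(warDirectory, contentLocation);  // resolve relative paths against the WAR directory
        }
        return new FileInputStream(f);
    }
}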

In all likelihood, this representation of the FEDORA repository will be too simple for us to move forward much further. In particular, the current notion of one datastream per repository object is too simplistic. The Datastream embedded object will likely need to be broken out into a separate table with a corresponding distinct Java entity. (We may reach the same point soon for the Description object as well.)

By using the Entity Beans as a buffer between the business logic and the view components of the rest of the application, I hope we can minimize/localize the changes required in the future in order to replace the mock repository with a real underlying FEDORA repository.

View Templates

The preferred view technology for JBoss Seam is Facelets, a templating framework for Java Server Faces that does not require the use of Java Server Pages (JSP). Although the ‘.xhtml’ pages in the view directory bear a passing resemblance to JSP, behind the scenes they are radically different. Of note for us is the clean templating system used to generate pages. The home.xhtml file has a reference to the template.xhtml file in the ‘layout’ directory. If you read through the template.xhtml file, you can see where the Facelets engine will pull in other .xhtml files in addition to the content within the <ui:define name="body"> tag of home.xhtml.

Content Handlers

The paradigm of handling different media types within the DRC application is guided in large part by the notion of disseminators for FEDORA objects and the Digital Library Federation Aquifer Asset Actions experiments. The underlying concept is to push the media-specific content handling into the digital object repository and to have the presentation interface consume those content handlers as it is preparing the end-user presentation.

For instance, the DRC will need to handle content models for PDFs, images, video, and so forth. Furthermore, how a video datastream from the Digital Video Collection is offered to the user may be different from how a video datastream from a thesis is offered to the user. Rather than embedding the complexity of making those interface decisions into the front-end DRC application, this model of content handlers pushes that complexity closer to the objects themselves by encoding those behaviors as disseminators of the object. What the presentation layer gets from the object is a chunk of XHTML that it inserts into the dynamically generated HTML page at the right place.

There is work beginning on a framework for FEDORA disseminators at /BaseDisseminator/trunk in the source code repository; that work has been put on hold at the moment in favor of focusing on the presentation interface. In order to prepare for the time when the presentation behaviors are encoded as FEDORA object disseminators, the current presentation layer makes use of Content Handlers for each of the media types. The Handler interface defines the methods required by each handler and the TextHandler class, the ImageHandler class, and the PdfHandler class implement the methods for the three media types already defined.

Of these, TextHandler class is the most complete, so I’ll use it as an example.

  • The getRawDatastream method takes the datastream and sends it back to the browser with the HTTP headers that cause a File-Save dialog box to open.
  • The getFullDisplay method returns a chunk of XHTML that presents the full metadata in a manner that can be included in a full metadata display screen.
  • The getRecordDisplay method (currently unwritten) returns a chunk of XHTML used to represent the object in a list of records that resulted from a user’s search or browse request.
  • The getThumbnail method (currently unwritten) returns a static graphic thumbnail rendition of the datastream (e.g. a cover page, a key video frame, etc.).
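
Put together, the Handler interface might look something like the sketch below; the exact signatures are my assumptions (the real interface may pass different parameters or return types), but the point is that each media type implements the same four operations.

// A sketch of the Handler interface; TextHandler, ImageHandler, and PdfHandler
// would each implement these four methods for their media type.
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

public interface Handler {
    // Stream the raw datastream back to the browser with headers that
    // trigger a File-Save dialog.
    void getRawDatastream(HttpServletResponse response) throws IOException;

    // XHTML fragment for the full metadata display screen.
    String getFullDisplay();

    // XHTML fragment representing the object in a search or browse result list.
    String getRecordDisplay();

    // Reference to a static graphic thumbnail rendition of the datastream
    // (a cover page, a key video frame, etc.).
    String getThumbnail();
}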

By making these content handlers distinct classes, it is anticipated that the rendering code for each of these methods can be more easily moved to FEDORA object disseminators with minimal impact to the surrounding DRC interface application.

Exception Handling

The DRC application follows the practice suggested by Barry Ruzek in Effective Java Exceptions (found via this link on The Server Side). The article can be summarized as:

One type of exception is a contingency, which means that a process was executed that cannot succeed because of a known problem (the example he uses is that of a checking account, where the account has insufficient funds, or a check has a stop payment issued.) These problems should be handled by way of a distinct mechanism, and the code should expect to manage them.

The other type of exception is a fault, such as the IOException. A fault is typically not something that is or should be expected, and therefore handling faults should probably not be part of a normal process.

With these two classes of exception in mind, it’s easy to see what should be checked and what should be unchecked: the contingencies should be checked (and descend from Exception) and the faults should be unchecked (and descend from RuntimeException).

All unchecked exceptions generated by the application are subclasses of DrcBaseAppException. (DrcBaseAppException itself is a subclass of RuntimeException.) For an example, see NoHandlerException. By setting up all of the application’s exceptions to derive from this point, we have one place where logging of troubleshooting information can take place (although this part of the application has not been set up yet). Except when there is good reason to do otherwise, this pattern should be maintained.

At this point, no checked (or contingency) exceptions specific to the DRC have been defined. When they are needed, though, they will follow the same basic structure with a base exception derived from Exception.
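
In code, the pattern looks roughly like this (the class bodies are simplified sketches; the real classes may carry logging hooks and richer constructors, and the contingency class name is hypothetical):

// Faults: unchecked, so normal code paths are not forced to catch them.
// Shown together for brevity; each class would live in its own file.
public class DrcBaseAppException extends RuntimeException {
    public DrcBaseAppException(String message) { super(message); }
    public DrcBaseAppException(String message, Throwable cause) { super(message, cause); }
}

// Example subclass: thrown when no content handler exists for a requested
// media type.  The constructor message here is illustrative.
class NoHandlerException extends DrcBaseAppException {
    public NoHandlerException(String mediaType) {
        super("No content handler available for media type: " + mediaType);
    }
}

// Contingencies, when they are defined, would be checked and derive from
// Exception instead (this class name is hypothetical).
class DrcContingencyException extends Exception {
    public DrcContingencyException(String message) { super(message); }
}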


Java Application for Batch Processing FEDORA Objects

We had a need today to transform an XML file with a custom DTD into Dublin Core; the custom XML file is a datastream in our FEDORA repository and we want to put the Dublin Core XML file back into the FEDORA object as the DC datastream. This took a slew of technologies and techniques: reading a datastream out of the FEDORA repository using API-A, parsing XML documents using the Java DOM library, creating a new document with the correct namespaces using Java DOM, and modifying the DC datastream in the repository using API-M.

I’m posting the code here in case someone else might find it useful. Of course, if you know a better way please let me know. We’ll probably need to do things like this again…

/*
 * Copyright (C) 2006 OhioLINK
 * This file is part of the OhioLINK Digital Resource Commons (DRC) Project.
 *
 * The OhioLINK DRC is free software; you can redistribute it and/or
 * modify it under the terms of the Affero General Public License as
 * published by Affero, Inc. -- either version 1 of the License, or
 * (at your option) any later version.
 *
 * The OhioLINK DRC Project is distributed in the hope that it will be
 * useful, but WITHOUT ANY WARRANTY -- without even the implied warranty
 * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * Affero General Public License for more details.
 *
 * You should have received a copy of the Affero General Public
 * License in the LICENSE.txt file that comes with the DRC project;
 * if not, write to DRC Development Team, OhioLINK, 2455 North Star Rd,
 * Suite 300, Columbus, OH 43221, USA.
 */
package batch;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.net.MalformedURLException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import fedora.client.FedoraClient;
import fedora.server.access.FedoraAPIA;
import fedora.server.management.FedoraAPIM;
import fedora.server.types.gen.DatastreamDef;
import fedora.server.types.gen.MIMETypedStream;
public class Batch {
 public static void main(String[] args) {
  for (int i = 80; i < 81; i++) {
   // "hdl" is our FEDORA PID prefix
   String pid = "hdl:" + i;
   try {
    // The base URL below is a placeholder -- change it to point at your own Fedora server
    FedoraClient client = new FedoraClient(
      "http://localhost:8080/fedora", "fedoraAdmin", "password");
    FedoraAPIA apia = client.getAPIA();
    FedoraAPIM apim = client.getAPIM();
    // Get the list of datastreams for this object.  For each one, we're
    // going to look for an identifier that ends in "etd"
    DatastreamDef[] datastreams = apia.listDatastreams(pid, null);
    for (int j = 0; j < datastreams.length; j++) {
     DatastreamDef def = datastreams[j];
     String itemId = def.getID();
     if (itemId.endsWith("etd")) {
      // If we've found it, get it out of the FEDORA server and
      // create a XML DOM document for it
      MIMETypedStream ds = apia.getDatastreamDissemination(pid, itemId, null);
      byte[] file = ds.getStream();
      InputStream inputStream = new ByteArrayInputStream(file);
      // String fileStr = new String(file, "ascii");
      // System.out.println(fileStr);
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      Document sourceDoc = builder.parse(inputStream);
      // Now build an empty XML DOM document for the Dublin Core
      Document destDoc = builder.newDocument();
      Element rootElement = destDoc.createElementNS(
        "http://www.openarchives.org/OAI/2.0/oai_dc/", "oai_dc:dc");
      rootElement.setAttribute("xmlns:dc", "http://purl.org/dc/elements/1.1/");
      destDoc.appendChild(rootElement);
      // Now copy the values from the ETD XML document into the DC XML
      // document.  The mapping of ETD fields to DC elements below
      // (abstract -> dc:description, docyear -> dc:date, subjects -> dc:subject)
      // is our best guess; adjust it to fit your own DTD.
      Element e;
      String value;
      value = sourceDoc.getElementsByTagName("title").item(0).getTextContent().replaceAll("[\t ]*\n[\t ]*", " ").replaceAll("[\t ][\t ]+", " ").trim();
      e = destDoc.createElement("dc:title");
      e.setTextContent(value);
      rootElement.appendChild(e);
      // author's name comes in many parts; this'll put them together
      e = destDoc.createElement("dc:creator");
      String nameFields[] = { "authfname", "authmname", "authlname", "authsuffix" };
      String author = new String();
      for (String field : nameFields) {
       value = sourceDoc.getElementsByTagName(field).item(0).getTextContent().replaceAll("[\t ]*\n[\t ]*", " ").replaceAll("[\t ][\t ]+", " ").trim();
       if (value != null && !value.equals("")) {
        author = author.concat(value).concat(" ");
       }
      }
      e.setTextContent(author.trim());
      rootElement.appendChild(e);
      value = sourceDoc.getElementsByTagName("language").item(0).getTextContent().replaceAll("[\t ]*\n[\t ]*", " ").replaceAll("[\t ][\t ]+", " ").trim();
      e = destDoc.createElement("dc:language");
      e.setTextContent(value);
      rootElement.appendChild(e);
      value = sourceDoc.getElementsByTagName("abstract").item(0).getTextContent().replaceAll("[\t ]*\n[\t ]*", " ").replaceAll("[\t ][\t ]+", " ").trim();
      e = destDoc.createElement("dc:description");
      e.setTextContent(value);
      rootElement.appendChild(e);
      value = sourceDoc.getElementsByTagName("docyear").item(0).getTextContent().replaceAll("[\t ]*\n[\t ]*", " ").replaceAll("[\t ][\t ]+", " ").trim();
      e = destDoc.createElement("dc:date");
      e.setTextContent(value);
      rootElement.appendChild(e);
      value = sourceDoc.getElementsByTagName("subjects").item(0).getTextContent().replaceAll("[\t ]*\n[\t ]*", " ").replaceAll("[\t ][\t ]+", " ").trim();
      e = destDoc.createElement("dc:subject");
      e.setTextContent(value);
      rootElement.appendChild(e);
      // Use a Transformer for output
      TransformerFactory tFactory = TransformerFactory.newInstance();
      Transformer transformer = tFactory.newTransformer();
      transformer.setOutputProperty(javax.xml.transform.OutputKeys.OMIT_XML_DECLARATION, "yes");
      DOMSource source = new DOMSource(destDoc);
      StringWriter strWriter = new StringWriter();
      StreamResult result = new StreamResult(strWriter);
      transformer.transform(source, result);
      String xmlAsString = strWriter.getBuffer().toString();
      // System.out.println(xmlAsString);
      byte[] normalarr = xmlAsString.getBytes("UTF-8");
      // Lastly, write the modified DC datastream back to the FEDORA server
      apim.modifyDatastreamByValue(pid, "DC", null, "Dublin Core", false, "text/xml", null, normalarr, "A", "Batch program to add DC datastream from ETD XML file", false);
     }
    }
   } catch (MalformedURLException e) {
    System.out.println(pid + " " + e.getLocalizedMessage());
   } catch (Exception e) {
    System.out.println(pid + " " + e.getLocalizedMessage());
   }
  }
 }
}

Traditional Development/Integration/Staging/Production Practice for Software Development

Recently, I was asked to outline a plan for a structured process for software development that maximizes productivity and reduces bugs that reach the user. This was originally an internal OhioLINK document, but the process described is pretty traditional and others might find a use for this as well. You are welcome to use this; please honor the Creative Commons licensing terms and contact me in advance if you need something different.

NOTE! This article has been translated into Serbo-Croatian by Anja Skrba, into Slovak by Juraj Rehorovsky of Everycloud, into Belarusian by Vicky Rotarova, and into Romanian by Milos Onichanu.

Creating Applications in Four Tiers

Let’s start first with a description of the four tiers for software development.

Development Environment. Optional. This is the working environment for individual developers or small teams. Working in isolation from the rest of the tiers, the developer(s) can try radical changes to the code without adversely affecting the rest of the development team.
Integration Environment. A common environment where all developers commit code changes. The goal of this environment is to combine and validate the work of the entire project team so it can be tested before being promoted to the Staging environment. It is possible for Development and Integration to be the same environment (as in the case where the developer does not use a local copy of the source code).
Staging Environment. An environment that is kept as identical to the Production environment as possible; its purpose is to simulate the Production environment as closely as possible. The Staging environment can also double as a Demonstration/Training environment.
Production Environment. The production tier might include a single machine or a huge cluster comprising many machines.

These tiers speak of “environments” rather than “machines” or “servers.” It is certainly possible for multiple Development environments and the Integration environment to be on the same physical machine, or the Integration and Staging environments to be on the same machine. If at all possible, the Production environment should be by itself and not shared with any of the other environments.

Resources at Each Tier


Staging Environment

  • Identical software configuration as the production machine and a complete, independent copy of the production database so it is a true basis for QA testing. If you are using a Storage Area Network (SAN) or Network Attached Storage (NAS), you may be able to use the “snapshot” capabilities of the storage device to simulate a copy of the production database without requiring an entire duplicate copy of the data (and corresponding hardware).
  • Comparable hardware configuration to the production system so an accurate forecast of capacity can be made by performance testing against it and then multiplying its performance by the number of machines that will be deployed in production.

Integration Environment

  • Limited subset of data that is useful for testing “boundary conditions” in the application. It may be wise to refresh this subset of data frequently to remove the artifacts of software development and testing on the Integration environment.

Development Environment

  • The same limited subset of data as the Integration environment.

Moving Between Tiers

This graphic shows the nature of the work performed in each environment, the responsibilities of actors in each environment, and relative rate of software builds and deployments.

Process for Moving Through Development/Integration/Staging/Production Stages

In narrative form, the software developer writes code in his or her development environment (1) and checks it into the Subversion source code repository (2). As other developers report bugs (3), more changes are made (5) and checked in (6). Remember that the Development and Integration environments can be the same actual environment, so these two boxes can be collapsed; it is important to note, though, that in such a case changes are still being checked into Subversion.

When the developers are happy with the behavior of the Integration environment (6), the Release Master creates a copy or “tag” of the code in Subversion and updates the Staging environment to this tag (7). At this point the quality assurance (QA) testers start their review (8). QA testers can be both internal staff and external reviewers; the Staging area also doubles as a training environment when the Production release is ready. QA reports go back to the developer (9) who fixes them (10) and checks the changes into Subversion (11). After all of the bugs are fixed, the release manager promotes a new version to staging (12).

This process continues until the QA team declares the staging version is “okay to release” (13). The release manager packages up the release version from Subversion (14) and deploys it on the production servers (15). As time goes on, bug reports and feature requests are made (16) for which the developer writes code (17) and checks in the changes to the source code repository (18). Steps (17) and (18) are functionally equivalent to (1) and (2) above. Repeat until the end user is completely satisfied.

Important Notes

Developers only make changes to the Development and Integration environments. If a bug fix is to be made, the developer makes it in Subversion at the Integration stage. In order to maintain the integrity of the source code repository, at no point does a developer make changes directly to the Staging or Production environments.

For each deployment to Production, there are multiple versions in Staging, and for each deployment into Staging, there are multiple versions in Development/Integration. By design, end users are isolated from the rapid and occasionally buggy process of developing software. It is assumed that most bugs will be caught early and that repeated versions at the early stages will find bugs faster.

Only the “release manager” can deploy versions to the next stage. There can be different release managers for deployment from Integration-to-Staging and from Staging-to-Production; the release manager can even change from version to version. Of course, on some projects the developer, the release manager, and the QA tester can actually be the same person. The important point, though, is that there is always only one person responsible for deploying the new version.

Although the vertical boxes in the graphic imply the Integration environment is turned off at step (7), in reality the version of the software on the Integration environment never really goes away. Instead, the version of software that is promoted to the Staging environment is “tagged” off of “trunk” in the source code repository and it is the tagged version that is copied to Staging. Work by the developers then continues on the “trunk”. The same holds true for the promotion of the Staging version to Production.

Coding Standards

To facilitate transfer of applications from the development server through the staging and into production, the code should be free of any server-dependent variables.

  • If necessary, determine the hostname programmatically rather than specifying it explicitly in the code or in the configuration. One can use either javax.servlet.ServletRequest.getServerName or javax.servlet.ServletRequest.getLocalName to determine this in a servlet (see the sketch after this list).
  • To ensure code is portable across file systems, use relative pathnames in code and, if necessary, set the base directory in the application’s configuration. This will enable applications and scripts to be moved from one directory on the development machine to another directory (possibly with a different path) on the production machine.
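
Putting both points together, a minimal servlet sketch might look like the following (the init-param name “baseDirectory” and the file path are illustrative assumptions, not an OhioLINK convention):

import java.io.File;
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class PortableServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // Hostname comes from the request, so the same WAR works on dev, staging, and production.
        String host = request.getServerName();

        // Base directory comes from configuration (a web.xml init-param here);
        // everything else is resolved relative to it.
        String baseDir = getServletConfig().getInitParameter("baseDirectory");
        File dataFile = new File(baseDir, "data/sample.txt");

        response.setContentType("text/plain");
        response.getWriter().println("Serving " + dataFile.getPath() + " from host " + host);
    }
}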

Picking a Java Web Application Framework

We’re beginning a new phase of our digital library development at OhioLINK and an oversimplification of one of the consequences of this new phase is that we will be developing more software from scratch rather than adapting stuff that we find out there on the net. (Another consequence of this new phase is our interest in applying the Service-Oriented Architecture paradigm to library applications.) In previous phases, we were somewhat at the mercy of whatever development framework was used in the application we were adopting. As we start this new development where we control more of our own destiny, we wanted to take a step back and look at the available frameworks to support our development efforts. The options we identified at the start were plain Java servlets, Apache Struts, Spring Framework, and EJB3 with JBoss SEAM.

Our analysis is admittedly by proxy rather than through direct experience. If we had more time, we would start our new phase in each one of these (with the associated learning curve for some of them) and pick the one that worked out the best. We have neither the luxury of time nor of excess developer talent, so we looked at what others were saying, created some sample applications, and discussed our gut reactions to each option. In the end, we decided to start with EJB3/SEAM.

Summary of Options

Over the years, the Java community has developed frameworks to support the creation of applications in the Java programming language. These frameworks consist of a combination of code libraries, programming rules, and best practices that have evolved based on the research and experience of the developer community. Just as almost no one designs, codes, debugs, and documents their own low-level file I/O libraries any more, these frameworks provide a level of abstraction over the raw programming language that enables developers to create better code faster.

Within the web application arena, arguably the earliest successful framework was Apache Struts. It was one of the first frameworks to promote a Model-View-Controller (MVC) architecture: the Model represents the business or database code, the View represents the page design code, and the Controller represents the navigational code. By creating these three layers, Struts promoted a practice of design and development away from commingling database code, page design code, and control flow code in the same JavaServer Pages files. In practice it was found that unless these three concerns are separated, larger applications become difficult to maintain.

The latest revolution in design architectures is Dependency Injection (DI). Sometimes also referred to as Inversion of Control (or IoC), it is a programming design pattern and architectural model in which the responsibility for object creation and object linking is removed from the code of the objects themselves. As the term “Dependency Injection” may imply, it is the framework that instantiates the component objects and binds them to each other through the use of setters and/or constructors rather than the components linking themselves together. In this pattern, the objects themselves become loosely coupled, allowing for a more dynamic and testable integration of the component objects. The creation and binding of objects is defined in an XML configuration file (Spring Framework) or via Java Annotations (EJB3).
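
As a small, hedged illustration of the difference (the class names ItemRepository and SearchService are invented for this sketch, not part of any OhioLINK code), compare a class that constructs its own dependency with one that lets the EJB3 container inject it:

import javax.ejb.EJB;
import javax.ejb.Local;
import javax.ejb.Stateless;

// Shown together for brevity; each type would live in its own file.
@Local
interface ItemRepository {
    String findTitle(String pid);
}

@Stateless
class ItemRepositoryBean implements ItemRepository {
    public String findTitle(String pid) { return "title for " + pid; }
}

// Traditional style: the class constructs its own dependency and is tightly coupled to it.
class SearchServiceTraditional {
    private ItemRepository repository = new ItemRepositoryBean();
}

// Dependency-injection style (EJB3 annotations): the container locates an ItemRepository
// implementation and assigns it to the field; the class never calls "new", so a mock
// implementation can be substituted in tests.
@Stateless
class SearchServiceInjected {
    @EJB
    private ItemRepository repository;

    public String titleOf(String pid) { return repository.findTitle(pid); }
}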

Two primary frameworks have emerged for the Dependency Injection paradigm: Spring Framework and EJB3. The Spring Framework is the grandparent of subsequent IoC efforts (including EJB3). A de facto standard, it is widely used in the Java programming community with a large number of tools and documentation. EJB3 is a formal specification adopted by participants in the Java Community Process for standardizing Java-related developments. Coupled with the JBoss SEAM framework, the EJB3/SEAM combination is roughly on par with the raw technical capabilities of the Spring Framework. SEAM removes some of the complexity of raw EJB3 by providing and preconfiguring popular choices for views (JSF) and models (Hibernate) as well as integrated AJAX-based Remoting and jBPM. Both promote the same MVC architecture as Struts.

The distinguishing characteristics are that Spring acts as an open integration tool at the expense of a complicated XML-based configuration process whereas EJB3/SEAM is simpler in its configuration and more restrictive in what other tools can be easily brought into the framework. In other words, Spring has a wide array of choices for the various framework components and for the most part the developer must manage the integration; EJB3/SEAM lessens the complexity of the framework by limiting the choices available for the various components to only the most popular options. Note that there is overlap between Spring and EJB3; the Pitchfork project allows for JSR-250 (Common Annotations) dependency injection and EJB 3.0 style interception as an add-on to the Spring framework. One should also note that a full EJB3 implementation is a kind of superset of SEAM, and if the needs of our applications go beyond that of SEAM it is possible to reduce our dependency on the SEAM framework by beginning to integrate other choices (and managing that integration ourselves).

OhioLINK staff would have a learning curve associated with both Spring and EJB3/SEAM (and Apache Struts as well, although it has been discounted as an option as a mostly dead-end architecture at this point). On the question of whether the learning curve is greater than the effort of doing things the “plain Java” way, one can point to the large number of developers who have made the leap to these frameworks and are arguably more productive for doing so. Since we are beginning the development of a new code base, it seems to be the ideal time to start up that learning curve with the creation and deployment of a new service. The learning curve might be easier for Spring than for EJB3/SEAM due to the wide variety of materials already produced for Spring, although the corporate backers of EJB3/SEAM seem to be filling this gap at a steady pace. The learning curve for EJB3/SEAM may be shallower, though, because SEAM simplifies the configuration of the framework itself.

At our development team meeting yesterday, we decided that OhioLINK will adopt a Dependency Injection (DI) paradigm over use of Apache Struts and plain Java servlets because of the anticipated large productivity gains after the learning curve. Of the two DI/IoC frameworks widely available now, we decided to adopt EJB3 coupled with JBoss SEAM. The standards-driven nature of EJB3 reduces the risk of adopting a technology that may not be supported in the medium-term. It is expected that JBoss SEAM will flatten the learning curve to make EJB3 more readily understood in the short term while providing a migration path to a broader EJB3 implementation (beyond the capabilities of SEAM) if needed in the future. The Spring Framework, to its credit, also makes it possible to envelop EJB3 objects as part of a Spring-based application, which could smooth the transition to that framework should it become necessary. We also understand that there is some risk of “vendor lock-in” by relying on JBoss SEAM — in our limited experience, applications seemed to deploy better under JBoss Application Server as opposed to a stock Apache Tomcat 5.5 installation. On the other hand (if the “standards” nature of EJB3 is to be trusted), it should be possible to move our code to other application servers, leaving SEAM behind, with minimal changes. We may also be able to further flatten the learning curve by buying a support contract with JBoss in the short- to medium-term.

Details about the Candidates

Enterprise Java Beans 3 (EJB3)

The EJB 3.0 framework is a standard framework defined by the Java Community Process (JCP) and supported by all major J2EE vendors. Open source and commercial implementations of the EJB 3.0 specification are already available from JBoss and Oracle. EJB 3.0 makes heavy use of Java annotations.

  • Based on standards work (the Java Community Process)
  • “Configuration by default” using Java Annotations backed by more complicated XML configuration files, if needed
  • A more compact, rigidly-defined framework stack; easier configuration, but fewer choices
  • Brand new (recently ratified); fewer tools, books, tutorials available

JBoss Seam

JBoss Seam is a powerful new application framework to build next generation Web 2.0 applications by unifying and integrating popular service oriented architecture (SOA) technologies like AJAX, Java Server Faces (JSF), Enterprise Java Beans (EJB3), Java Portlets and Business Process Management (BPM) and workflow.

Seam has been designed from the ground up to eliminate complexity at the architecture and the API level. It enables developers to assemble complex web applications with simple annotated Plain Old Java Objects (POJOs), componentized UI widgets and very little XML. The simplicity of Seam 1.0 will enable easy integration with the JBoss Enterprise Service Bus (ESB) and Java Business Integration (JBI) in the future.
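
As a flavor of what that looks like, here is a tiny sketch of a plain JavaBean Seam component (the class name, component name, and navigation outcome are invented for this example):

import org.jboss.seam.annotations.In;
import org.jboss.seam.annotations.Name;

@Name("itemSearch")                  // Seam registers this POJO under the name "itemSearch"
public class ItemSearchAction {
    @In(required = false)
    private String searchTerms;      // a context variable Seam injects by name, if one is present

    public String search() {
        // ... query the repository using searchTerms ...
        return "/searchResults.xhtml";   // JSF navigation outcome
    }
}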

  • Based on EJB3
  • Integrates together some of the most widely adopted components such as JavaServer Faces and Hibernate
  • Can reportedly be used outside of JBoss Application Server (our direct experience shows some problems with this)

Spring Framework

The Spring framework is a popular but non-standard open source framework. It is primarily developed by and controlled by Interface21 Inc. The architecture of the Spring framework is based upon the Dependency Injection (DI) design pattern. Spring can work standalone or with existing application servers and makes heavy use of XML configuration files.

  • Explicit configuration via XML file
  • More flexible than EJB3/SEAM in swapping various modules in and out, although decisions made early on can limit the ability to swap components later.
  • Several years old; widely available tools, books, tutorials
  • De facto standard of an open source community with benevolent control by a corporate entity.

Apache Struts

Apache Struts is a free open-source framework for creating Java web applications.

  • (Summary opinion) Generally seen as a deprecated framework; few new projects start with Struts.
  • Weak integration of new techniques such as AJAX.

Plain Java Servlet (No Framework)

  • Must build the entire application support layer (object data storage, presentation interfaces, helper functions, etc.) from scratch.
  • No ready-made integration of new techniques such as AJAX.
