Minutes of the FEDORA Workflow Working Group meeting of 18-Jun-2006

Please note — this is a copy of the FEDORA Workflow Working Group minutes from the FEDORA Wiki. It is being posted here in order to get it into the blogosphere at the right places. Please make comments on the FEDORA Wiki “talk” page rather than on this posting.

FEDORA Workflow Working Group Meeting

18-Jun-2006, University of Virginia

Attending: Grace Agnew, Rutgers U.; Chris Awre, U. of Hull; Dan Davis, Harris Corp.; Richard Green, U. of Hull; Peter Murray, OhioLINK; Matthias Razum, FIZ Karlsruhe; Bill Parod, Northwestern U.; Adam Soroka, U. of Virginia; Thorny Staples, U. of Virginia; Ross Wayland, U. of Virginia

Review of Minutes from 6 Dec 2005

One of the assumptions in the 6 Dec 2005 minutes is: “an object repository built on FEDORA should be seen as a write rarely, read mostly application”. Some participants noted that the repository is viewed as part of a working tool for scholars, which makes “writes” not so ‘rare’. This kind of assumption is useful for characterizing a FEDORA-based application as something other than an RDBMS. A clarifying comparison is the frequency of transactions when “saving a file” (once every 10 minutes) versus “banking transactions” (thousands of transactions/second). This characterization helps define the efficiency needs of reads over writes.

A revised assumption was offered: “FEDORA is typically an application where reads exceed writes.”

It was also noted that the working group charter and the assumptions make reference to “BPEL” as the only workflow orchestration engine, excluding other tools such as JBPM. This is not intentional, and does not necessarily reflect a decision to use only BPEL-based engines. The participants noted, however, that no one envisions writing a new workflow engine from scratch.

Participant Desires and Status Review of Workflow-related Activities

FIRE/Workflow 0.1 client update

The December meeting envisioned work on the NSDL NCCS/FIRE client as the starting point for embedding workflow capabilities into FEDORA. The FEDORA core development team looked at both ActiveBPEL and JBPM for integration into the FEDORA-SF and determined that, within their available time, they needed to hardwire the workflow into the client, with the aim of applying a workflow orchestration engine in a later release.

FEDORA Repository update

Transforming FEDORA to be built as a WAR file is almost completed. Adding generic “messaging” to FEDORA is one task on the list of desired work for the core developer team to prioritize and assign in September.

Max Planck Society: eSciDoc


The Max Planck Society is made up of 82 institutes, each with many working groups. The eSciDoc project is building one repository for all of these institutes and working groups in disciplines ranging from humanities to sciences. In interviews with users, the eSciDoc development team has determined that each user group will likely want a near-infinite diversity of combinations of workflow steps. eSciDoc developers have built a framework layer around FEDORA and will be adding a workflow engine for human-in-the-loop and process-to-process steps.


The eSciDoc development team has decided to use JBPM. In an experiment with ActiveBPEL, without using advanced tools (at the time of the experiment, the ActiveBPEL editor tool was not freely available), setting up a workflow involving interactions between two services took several days and required editing 13 BPEL-related files, as opposed to editing just one file with JBPM. Their thinking is that the eSciDoc developers will interview users to model the end users’ workflow needs, then create the workflow script in JBPM for that group.

Harris Corporation


Harris Corp. submitted a bid for NARA’s ERA system (records management archive) but was not successful. Harris Corp. is still working with FEDORA and is looking for other opportunities to use it in service integration proposals. Harris Corp. is using the IBM engine (which has both the human-in-the-loop and the process-to-process capabilities); it is arguably the most advanced out there, but it is still not mature as this field continues to grow. Existing engines “make it look easy” by limiting the kinds of workflow activities that can be done, and open source implementations are many steps back because the advanced tools are only now coming out.

U. of Hull: RepoMMan


The RepoMMan project seeks to give researchers some ‘extra services’ almost without them being aware of it: managing work in progress; access for themselves and a small collaborative group; and safety (backup) of work. The aim is for it to be as easy to use as a network drive on their desktop, plus an interface for repository-related activities. The RepoMMan project is funded to bring a BPEL workflow to a FEDORA system to fulfill a part of this vision.

The development team has surveyed and interviewed researchers about how they did research to understand what repository processes and features might help them. In the autumn, the development team plans to use the same survey and interview techniques to gather the teaching/learning needs and administrator needs. They are managing the evolving expectations for a general document management system from the university’s administration.


Use of ActiveBPEL is mandated by the JISC funding. Initially, the inability to validate FEDORA’s WSDL was a showstopper, but the WSDL was recently rewritten by the FEDORA core developer team and is being tested now. The RepoMMan developers have gained a great deal of experience with the BPEL engine outside of FEDORA. With a rewritten WSDL, the development is expected to move faster.


Overlap with FEDORA Preservation Services Working Group

It was noted and discussed that the Preservation Services Working Group has begun defining the ‘events’ associated with objects in the repository and was looking for an ‘event management engine’ of some sort. It is clear that there is some overlap in these working group activities and some coordination would be beneficial to both groups. The Preservation Services Working Group refers to this activity as “Event Management” which seems to be equivalent to our “Orchestration Management”.

Areas where the Workflow Working Group can make a meaningful contribution

It was observed that collectively we’re reaching for the final plateau of the desired service, but there are many layers between where we are now and there that need to be built (e.g. message-oriented middleware). In the course of the afternoon discussion, two work areas were identified.

First, defining and coding the building blocks of repository activities as discrete, WSDL-addressable web services. This work needs to be done in conjunction with the FEDORA Preservation Services Working Group, building on their existing work and aiding in specifying these events as discrete web services. These ‘event’ building blocks are likely to be generic to many workflow orchestration engines or other mechanisms to build workflow steps into an application.

Second, building a reference implementation of a workflow stack into the FEDORA-SF using open source components. It is envisioned that this stack will consist of a “workflow engine” layer (BPEL-based, JBPM-based, other, or a combination of these); a “workflow management system” that provides the human interface to instantiate and track workflow scripts in flight (interfaces could be JSR-168 portlets, dedicated web servlet, a desktop Swing-based app, etc.); and a “workflow editor system” for creating new workflow scripts.
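The layering described in these two work areas can be illustrated with a minimal sketch. It is not part of the minutes and not a proposed design: every name in it (`ingest`, `validate`, `WorkflowEngine`, `WorkflowInstance`, the `demo:1` identifier) is hypothetical, and the building blocks are plain Python functions standing in for what would be WSDL-addressable web services. The point is only that the steps stay independent of the engine that sequences them, and that a management interface could track a script in flight by reading the instance's history.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "building block": one discrete repository activity.
# Hypothetical stand-in for a WSDL-addressable web service.
Step = Callable[[Dict[str, str]], Dict[str, str]]


def ingest(obj: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical step: mark the object as ingested."""
    return {**obj, "state": "ingested"}


def validate(obj: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical step: mark the object as validated."""
    return {**obj, "state": "validated"}


@dataclass
class WorkflowInstance:
    """A workflow script in flight: the ordered step names, the object
    being processed, and the steps completed so far (what a workflow
    management interface would display)."""
    script: List[str]
    obj: Dict[str, str]
    history: List[str] = field(default_factory=list)


class WorkflowEngine:
    """Toy engine layer: runs the named steps in order. A real stack
    would delegate this to a BPEL- or JBPM-based engine."""

    def __init__(self, steps: Dict[str, Step]) -> None:
        self.steps = steps

    def run(self, instance: WorkflowInstance) -> WorkflowInstance:
        for name in instance.script:
            instance.obj = self.steps[name](instance.obj)
            instance.history.append(name)
        return instance


engine = WorkflowEngine({"ingest": ingest, "validate": validate})
instance = engine.run(
    WorkflowInstance(script=["ingest", "validate"], obj={"pid": "demo:1"})
)
print(instance.history, instance.obj["state"])
```

Because the script is just a list of step names, the same building blocks could in principle be reused under a different engine, which is the property the first work area is after.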

Next steps

  1. Create a narrative of these two work areas
  2. Participants review Preservation WG workflows
    1. Peter will get with Ron Jantz to begin the dialog
    2. WG members will review the Preservation WG “events” as the basis of creating the orchestration “building block” activities

To BPEL or not to BPEL, quite a good question

OhioLINK is actively looking at BPEL as an option for workflow orchestration for the DRC project. I was asked recently about that choice, especially in light of a preliminary report from another team looking to use Fedora in a manner similar to the DRC. The preliminary report has not been published (I’ll update this posting when it is), and the organization involved is intentionally not mentioned here. Their questions, though, do allow for an opportunity to explain some of my own thinking on the topic.

[We] had a look both at workflow engines with proprietary definition languages and at ones with BPEL support. [The] recommendation was not to use a BPEL engine because in their opinion, the current BPEL standard lacks at least one important feature: being able to assign a process to a user. They say that BPEL is targeted at workflow orchestration and not so much at modelling a workflow where human interactions are required. To my understanding, ingestion and especially quality assurance in most cases will include non-technical steps which require the assignment of the ongoing process to a person.

Your team has identified one known problem: the lack of a standards-based way to script human interactions in a BPEL orchestration engine. There are ways around it (as your team noted and as described in “Yes, BPEL has a human side”) and also work being done to resolve it (“WS-BPEL Extension for People”).

Speaking only for myself and OhioLINK for the moment, my desire to look seriously at BPEL, acknowledging that it is an immature and rapidly developing standard, is two-fold. First, work on this standard is being done under the auspices of OASIS which has a good track record for bringing consensus to the industry and generating reasonable, open specifications for the industry to rally around. From a digital preservation perspective, I believe that identifying and adhering to key standards aids in the overall ability to preserve the interactive nature of systems and their underlying digital objects. This characteristic is what makes the Fedora repository, and the inherent ability to recreate the system from its underlying METS-like objects, very attractive.

Second, OhioLINK increasingly sees Fedora playing the role of a centralized content repository in a Service-Oriented Architecture environment. The repository and workflow orchestration engine will need to interoperate with an inventory control system (a component of today’s Integrated Library System model), purchasing and accounts payable systems (for acquiring and claiming physical and digital content), and expertise systems (automated and human-mediated “reference” services). And these activities will be conducted over a variety of interfaces: HTML, SOAP, Interactive Voice Response, etc. Our thinking along these lines is preliminary and purely theoretical, but the surface analysis we’ve done so far seems to indicate that we have a lot of others’ work to leverage if we adopt these emerging industry philosophies, standards, and blueprints.

Additionally, they came up with the issue of complexity. Whereas they needed just one configuration file for jBPM to model their simplistic test case, it took them eleven configuration files (including WSDL) to set up their “hello world” example with ActiveBPEL. Adding a second web service to the workflow required changing several of these configuration files. Their impression was that going the BPEL way would let us end up in a maintenance nightmare (we expect to have many highly configurable workflows with lots of changes to them).

Unfortunately, this is where your team’s coding experience trumps my theoretical understanding. I’ve done some reading on the background and theory of BPEL, less on the actual implementation of a BPEL system, and no hands-on work (yet). Eleven files strikes me as extreme (and a point of concern because I expect to offer our users highly configurable workflow scenarios as well).