Minutes of the FEDORA Workflow Working Group meeting of 18-Jun-2006

by Peter E. Murray
Please note -- this is a copy of the FEDORA Workflow Working Group minutes from the FEDORA Wiki. It is being posted here in order to get it into the blogosphere at the right places. Please make comments on the FEDORA Wiki "talk" page rather than on this posting.

FEDORA Workflow Working Group Meeting

18-Jun-2006, University of Virginia

Attending: Grace Agnew, Rutgers U.; Chris Awre, U. of Hull; Dan Davis, Harris Corp.; Richard Green, U. of Hull; Peter Murray, OhioLINK; Matthias Razum, FIZ Karlsruhe; Bill Parod, Northwestern U; Adam Soroka, U. of Virginia; Thorny Staples, U. of Virginia; Ross Wayland, U. of Virginia

Review of Minutes from 6 Dec 2005

One of the assumptions in the 6 Dec 2005 minutes is: "an object repository built on FEDORA should be seen as a write rarely, read mostly application". Some participants noted that the repository is viewed as part of a working tool for scholars, which makes "writes" not so 'rare'. This kind of assumption is useful for characterizing a FEDORA-based application as something other than an RDBMS. A clarifying comparison is the frequency of transactions when "saving a file" (once every 10 minutes) versus "banking transactions" (thousands of transactions/second). This characterization helps define the efficiency needs of reads over writes.

A revised assumption was offered: "FEDORA is typically an application where reads typically exceed writes."

It was also noted that the working group charter and the assumptions make reference to "BPEL" as the only workflow orchestration engine, excluding other tools such as JBPM. This is not intentional, and does not necessarily reflect a decision to use only BPEL-based engines. The participants noted, however, that no one envisions writing a new workflow engine from scratch.

Participant Desires and Status Review of Workflow-related Activities

FIRE/Workflow 0.1 client update

The December meeting envisioned work on the NSDL NCCS/FIRE client as the starting point for embedding workflow capabilities into FEDORA. The FEDORA core development team looked at both ActiveBPEL and JBPM for integration into the FEDORA-SF and determined that, within their available time, the core developers needed to hardwire the workflow into the client, with the desire to apply a workflow orchestration engine in a later release.

FEDORA Repository update

Transforming FEDORA to be built as a WAR file is almost complete. Adding generic "messaging" to FEDORA is one item on the list of desired tasks for the core developer team to prioritize and assign in September.

Max Planck Society: eSciDoc

Overview

The Max Planck Society is made up of 82 institutes, each with many working groups. The eSciDoc project is building one repository for all of these institutes and working groups, in disciplines ranging from the humanities to the sciences. In interviews with users, the eSciDoc development team has determined that each user group will likely want a nearly infinite variety of combinations of workflow steps. eSciDoc developers have built a framework layer around FEDORA and will be adding a workflow engine for human-in-the-loop and process-to-process steps.

Update

The eSciDoc development team has decided to use JBPM. In an experiment with ActiveBPEL, without using advanced tools (at the time of the experiment, the ActiveBPEL editor tool was not freely available), setting up a workflow involving interactions between two services took several days and required editing 13 BPEL-related files, as opposed to editing just one file with JBPM. Their thinking is that the eSciDoc developers will interview users to model the end users' workflow needs, then create the workflow script in JBPM for that group.

Harris Corporation

Update

Harris Corp. submitted a bid for NARA's ERA system (records management archive) but was not successful. Harris Corp. is still working with FEDORA and is looking for other opportunities to use it in service integration proposals. Harris Corp. is using the IBM engine (which has both the human-in-the-loop and the process-to-process capabilities); it is arguably the most advanced out there, but it is still not mature as this field continues to grow. Existing engines "make it look easy" by limiting the kinds of workflow activities that can be done, and open source implementations are many steps back because the advanced tools are only now coming out.

U. of Hull: RepoMMan

Desires

The RepoMMan project seeks to give researchers some 'extra services' almost without them being aware of it: managing work in progress; access for themselves and a small collaborative group; safety (backup) of work -- and to be as easy to use as a network drive on their desktop, plus an interface for repository-related activities. The RepoMMan project is funded to bring a BPEL workflow to a FEDORA system to fulfill a part of this vision.

The development team has surveyed and interviewed researchers about how they did research to understand what repository processes and features might help them. In the autumn, the development team plans to use the same survey and interview techniques to gather the teaching/learning needs and administrator needs. They are managing the evolving expectations for a general document management system from the university's administration.

Technology

Use of ActiveBPEL is mandated by the JISC funding. Initially, the inability to validate FEDORA's WSDL was a showstopper, but the WSDL was recently rewritten by the FEDORA core developer team and is being tested now. The RepoMMan developers have gained a great deal of experience with the BPEL engine outside of FEDORA. With a rewritten WSDL, the development is expected to move faster.

Discussion

Overlap with FEDORA Preservation Services Working Group

It was noted and discussed that the Preservation Services Working Group has begun defining the 'events' associated with objects in the repository and was looking for an 'event management engine' of some sort. It is clear that there is some overlap in these working group activities and some coordination would be beneficial to both groups. The Preservation Services Working Group refers to this activity as "Event Management" which seems to be equivalent to our "Orchestration Management".

Areas where the Workflow Working Group can make a meaningful contribution

It was observed that collectively we're reaching for the final plateau of the desired service, but many layers between where we are now and that goal still need to be built (e.g. message-oriented middleware). In the course of the afternoon discussion, two work areas were identified.

First, defining and coding the building blocks of repository activities as discrete, WSDL-addressable web services. This work needs to be done in conjunction with the FEDORA Preservation Services Working Group -- building on their existing work and aiding in specifying these events as discrete web services. These 'event' building blocks are likely to be generic to many workflow orchestration engines or other mechanisms to build workflow steps into an application.

Second, building a reference implementation of a workflow stack into the FEDORA-SF using open source components. It is envisioned that this stack will consist of a "workflow engine" layer (BPEL-based, JBPM-based, other, or a combination of these); a "workflow management system" that provides the human interface to instantiate and track workflow scripts in flight (interfaces could be JSR-168 portlets, a dedicated web servlet, a desktop Swing-based app, etc.); and a "workflow editor system" for creating new workflow scripts.
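
As an illustration only, the three layers described above might be separated along the following lines. This is a minimal sketch and not something the working group specified: the class names (WorkflowEditor, WorkflowEngine, WorkflowManager), the step names "validate" and "ingest", and the use of plain Python functions in place of WSDL-addressable services are all assumptions made for the example. A real stack would delegate the engine layer to a BPEL-based or JBPM-based product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A step takes a context dictionary and returns an updated copy.
# Hypothetical stand-in for a call to a repository "building block" service.
Step = Callable[[dict], dict]


@dataclass
class WorkflowScript:
    """An ordered list of step names, produced by the editor layer."""
    name: str
    steps: List[str]


class WorkflowEditor:
    """Editor layer (hypothetical): creates new workflow scripts."""

    def create(self, name: str, steps: List[str]) -> WorkflowScript:
        return WorkflowScript(name=name, steps=list(steps))


class WorkflowEngine:
    """Engine layer (hypothetical): runs the steps of a script in order."""

    def __init__(self) -> None:
        self._steps: Dict[str, Step] = {}

    def register(self, name: str, step: Step) -> None:
        self._steps[name] = step

    def run(self, script: WorkflowScript, context: dict) -> dict:
        for step_name in script.steps:
            context = self._steps[step_name](context)
        return context


class WorkflowManager:
    """Management layer (hypothetical): starts scripts and tracks their status."""

    def __init__(self, engine: WorkflowEngine) -> None:
        self._engine = engine
        self.status: Dict[int, str] = {}

    def start(self, script: WorkflowScript, context: dict) -> Tuple[int, dict]:
        instance_id = len(self.status) + 1
        self.status[instance_id] = "running"
        result = self._engine.run(script, dict(context))
        self.status[instance_id] = "done"
        return instance_id, result


# Two illustrative steps; the names and behavior are assumptions.
def validate(context: dict) -> dict:
    return {**context, "valid": bool(context.get("title"))}


def ingest(context: dict) -> dict:
    return {**context, "ingested": context["valid"]}
```

In this sketch the manager is the only layer a person would interact with, and the engine could be swapped for another implementation without changing the scripts or the steps.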

Next steps

  1. Create a narrative of these two work areas
  2. Participants review Preservation WG workflows
    1. Peter will get with Ron Jantz to begin the dialog
    2. WG members will review the Preservation WG "events" as the basis of creating the orchestration "building block" activities