LYRASIS’ “Reposervice” Setup Pushed to GitHub

Earlier this month I published ‘reposervice’ to GitHub. Reposervice is a “self-contained” Islandora installation source tree that is intended to smooth the LYRASIS deployment of repository services between development servers, a staging server, and production servers. It is a bit of a work-in-progress at the moment, but others might find it useful as well.

(By the way, if you had looked at Reposervice prior to June 18th, you may have noticed that a critical element was missing: the Drupal submodule. It was critical not because you couldn’t add Drupal yourself, but because the Reposervice clone has relative soft symlinks to the Islandora modules positioned in the top-level Reposervice directory.)

The goals of this setup are listed in the README file:

  • Put (most) everything in a self-contained directory using relative paths for most components with a configuration script that generates configuration files for absolute paths.
  • Make it easy to track the upstream Islandora work so that you can bring selected commits into your own environment, if desired [using git submodules].
  • Put the configuration of Fedora Commons, FedoraGSearch, SOLR, and other associated components under version control.
  • Use Drupal Features to store the Drupal configuration and put it under version control.
  • Support multi-site setups for separate Islandora/Drupal instances using a common Fedora Commons, SOLR, and djatoka installation.

The first four goals are in place, along with hints of the fifth. (There is some as-yet unfinished, uncommitted code that automates much of the work of creating multi-site setups under a single Drupal installation.)
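To make the “self-contained directory” idea concrete, here is a purely illustrative sketch of the kind of layout these goals describe; the names and paths are shorthand for the sketch, not necessarily the actual Reposervice tree:

reposervice/                  # self-contained top-level directory
├── drupal/                   # Drupal core (git submodule)
│   └── sites/all/modules/
│       └── islandora -> ../../../../islandora   # relative soft symlink
├── islandora/                # Islandora modules, tracked as git submodules
├── fedora/                   # Fedora Commons, GSearch, and SOLR
│                             # configuration under version control
└── configure                 # script that generates absolute-path configs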

When I sent a note about this to the Islandora community mailing list, I got a helpful reply back from Nick Ruest pointing to some work that Graham Stewart of the University of Toronto had done using Chef.

Date: Thu, 13 Jun 2013 12:39:50 -0400
From: Nick Ruest
Subject: Re: [islandora] A ‘DevOps’ Configuration for Islandora

I nearly forgot! Graham Steward at UofT has a few recipes up in his
Github account[1] and there is a recording of his presentation from the
2012 Access[2].



The recording of the presentation is a great introduction to Chef from a library perspective; Graham builds an Islandora 6.x installation from scratch in 624 seconds. The Ruby-based islandora12.rb recipe indeed bears a great deal of resemblance to the bash scripts I was creating to deploy the components into the central directory tree. I’m going to have to add Chef to my list of things to learn, and Graham’s call for cooperation in building library-oriented recipes is a compelling one.

There are a few LYRASIS-specific things in here, but I’ve tried to make the basic building blocks replicable for others. This git repo, as it is further developed and documented, will be the foundation of a presentation I’m giving at Open Repositories next month. Comments, questions, observations, and even pull requests (should you find this setup useful in your own work) welcome!

Open Repositories 2011 Report: Day 2 with DSpace plus Fedora and Lots of Lightning Talks

Today was the second day of the Open Repositories conference, and the big highlight of the day for me was the panel discussion on using Fedora as a storage and service layer for DSpace. This seems like such a natural fit, but with two pieces of complex software the devil is in the details. Below that summary are some brief paragraphs about some of the 24×7 lightning talks.

Open Repositories 2011 Report: DSpace on Spring and DuraSpace

This week I am attending the Open Repositories conference in Austin, Texas, and yesterday was the second preconference day (and the first day I was in Austin). Arriving as I did, I only had time to attend two preconference sessions: one on the integration (or maybe “invasion”) of the Spring Framework into DSpace, and one on the introduction of the DuraCloud service and code.

Fedora plus Sakai, Any Interest?

There was a time when I was moving in both the worlds of the Sakai Collaboration and Learning Environment and the Fedora Commons digital content repository. It seemed like a good idea to bring these two worlds together — Fedora as a content repository for Sakai learning objects. Back in 2006, I logged a ticket in Sakai’s tracker to see if anyone was interested. This morning I got notification that they are thinking of closing the ticket.

I’ve since moved away from both communities into other areas of interest, but felt one final duty of stewardship over this idea before it drifts out to sea. Perhaps integration between Sakai and Fedora Commons is already happening and this ticket is anachronistic. Perhaps this wasn’t such a good idea to begin with and it should die on the vine. Perhaps someone else will think this is a good idea and become the champion for it. At this point, though, my professional interests are elsewhere and I can’t carry this one forward.

Here is the text of the message:

Fedora as Sakai’s Content Repository

Issue Type: Feature Request
Status: Open
Priority: Major
Assignee: Unassigned
Reporter: Peter Murray
Created: 27-Apr-2006 08:20
Updated: 23-Feb-2010 06:32

The following issue has been updated.

Updater: David Horwitz
Date: 23-Feb-2010 06:32

MAINT TEAM REVIEW: This feature request is currently unassigned and will be reviewed. In line with stated Jira practice [] Feature requests that are not going to be implemented will be closed with a status of "wont fix". If you intend implementing this issue please ensure that its up to date and assigned correctly

Project: Sakai
Components: Content service (Pre-K1/2.6), Resources
Affects Versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1

Provide an alternative to the existing ContentHostingService interface that stores content in database tables, directories, and files with one that stores and retrieves content in a Fedora Digital Object Repository.

Digital Preservation Activities: NSF’s “DataNet” and the NSF/Mellon Blue Ribbon Task Force

The past few weeks have seen announcements of large digital preservation programs. I find it interesting that the National Science Foundation is involved in both of them.

Sustainable Digital Data Preservation and Access Network Partners

The NSF’s Office of Cyberinfrastructure has announced a request for proposals with the name Sustainable Digital Data Preservation and Access Network Partners (DataNet). The lead paragraph of its synopsis is:

Science and engineering research and education are increasingly digital and increasingly data-intensive. Digital data are not only the output of research but provide input to new hypotheses, enabling new scientific insights and driving innovation. Therein lies one of the major challenges of this scientific generation: how to develop the new methods, management structures and technologies to manage the diversity, size, and complexity of current and future data sets and data streams. This solicitation addresses that challenge by creating a set of exemplar national and global data research infrastructure organizations (dubbed DataNet Partners) that provide unique opportunities to communities of researchers to advance science and/or engineering research and learning.

The introduction in the solicitation goes on to say:

Chapter 3 (Data, Data Analysis, and Visualization) of NSF’s Cyberinfrastructure Vision for 21st Century Discovery presents a vision in which “science and engineering digital data are routinely deposited in well-documented form, are regularly and easily consulted and analyzed by specialists and non-specialists alike, are openly accessible while suitably protected, and are reliably preserved.” The goal of this solicitation is to catalyze the development of a system of science and engineering data collections that is open, extensible and evolvable.

The full program solicitation is available (here’s a hint if the left side of the PDF version is cut off when printing — in the Acrobat print dialog, reduce the document size to 94% of the paper size). There will be up to five awards of $20 million each for five years with the possibility of continuing funding.

The part that I find interesting, from a library technologist’s perspective, is this: “Successfully providing stability for long-term preservation and agility both to embrace constant technological change and to engage evolving research challenges requires a novel combination of expertise in library and archival sciences, computer, computational, and information sciences, cyberinfrastructure, and the other domain sciences and engineering. A goal of this solicitation is to support the creation of new types of organizations that fully integrate all of these capabilities.” Undertaking such an endeavor must be a truly cross-discipline attempt — marrying up the best of library and archive practices with other forms of science and engineering to accomplish the task.

It would seem that the Fedora Commons platform is a great starting point for the technological infrastructure. It is as if the solicitation could have been written with Fedora in mind: “content heterogeneity requires that each awardee create a resource that serves a broad disciplinary and subject matter range, manages a diverse array of data types and formats, and provides the capability to support collections at the research, resource, and reference levels.” Another component of the program goals — developing models for economic and technological sustainability — is similar to OhioLINK’s attempts to aggregate the creation and support of content repositories at state-wide economies of scale.

Peter Brantley, Executive Director of the Digital Library Federation, has established a group on Nature’s Network service for those who want to collaborate or get further information (open to participation from anyone, but registration is required). There is a kernel of a group in Ohio that is considering the possibility of a joint application; if interested, please let me know. Peter also has a post on the topic on O’Reilly’s Radar.

Blue Ribbon Task Force on Sustainable Digital Preservation and Access

The National Science Foundation (NSF) and the Andrew W. Mellon Foundation are funding a blue-ribbon task force to address the issue of economic sustainability for digital preservation and persistent access. Co-chaired by Fran Berman of the San Diego Supercomputer Center and Brian Lavoie of OCLC, the task force will meet over the next two years to look at the issue. It is intended as an international effort; support is also coming from JISC in the U.K.

In its final report, the Task Force is charged with developing a comprehensive analysis of current issues, and actionable recommendations for the future to catalyze the development of sustainable resource strategies for the reliable preservation of digital information. During its tenure, the Task Force also will produce a series of articles about the challenges and opportunities of digital information preservation, for both the scholarly community and the public.1

The only news so far appears to be the press releases linked above. Now I recognize it is a two-year effort and they only got started late last month, but I half expect some public face to the work of the task force to be available somewhere, even in the early stages. If DLTJ readers see anything, please mention it in this posting’s comments.


  1. From the OCLC press release.

Disseminators As the Core of an Object Repository

I’ve been working to get JBoss Seam tied into Fedora, and along the way thought it would be wise to stop and document a core concept of this integration: the centrality of Fedora Disseminators in the design of the Ohio Digital Resource Commons. Although there is nothing specific to JBoss Seam (a Java Enterprise Edition application framework) in these concepts, making an object “render itself” does make the Seam-based interface application easier to code and understand. A disseminator-centric architecture also allows us to put our code investment where it matters the most — in the repository framework — and exploit that investment in many places. So what does it mean to have a disseminator-centric architecture and have objects “render themselves”?

How It Works

This is a sequence diagram showing all of the pieces:

  • Browser: The user’s browser
  • DRCseam: A JBoss Seam application that generates the user interface and performs much of the business logic. DRCseam, however, does not render the objects or their metadata into browser-consumable artifacts. Read on!
  • Fedora: A basic Fedora digital object repository.
  • Disseminator: A simple servlet that performs various transformations on object datastreams to render content usable by the browser.

With these components in play, here is the description of a sequence to render a page showing the metadata for a repository item:

  1. request item page: The browser follows a link to an item detail page.
  2. API-A ObjectProfile: The interface application asks the repository for the ‘Object Profile’ of the item…
  3. return object profile: …which the repository returns. The interface application now knows basic details about the object: that it exists, the creation and updated timestamps, and so forth.
  4. API-A DatastreamDissemination for fullDisplay: The interface application needs the object’s metadata display, so it asks the object to “render itself” by making a call to the Fedora repository for the object’s “FullDisplay” disseminator.
  5. call getFullDisplay: The Fedora repository in turn calls the object’s disseminator with the Persistent Identifier (PID) of the object as a parameter.
  6. API-A Datastream for metadata: Using the object PID, the disseminator calls back to the Fedora repository for the descriptive metadata datastream (the DC datastream, in this case)…
  7. XML metadata: …which the Fedora repository returns.
  8. transform metadata: The disseminator performs some transformation or derivation on the descriptive datastream to create an XHTML representation…
  9. XHTML fragment: …which it returns to the Fedora software…
  10. XHTML fragment: …which is returned to the interface application…
  11. XHTML page: …which inserts it at the appropriate place in the XHTML page it has built and returns the XHTML page to the browser.

Step #4 is where we diverge from previous architectures. Instead of making the interface application transform the metadata into a human-readable format, the interface application calls the object’s disseminator to do the job.
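To make the flow concrete, here is a minimal sketch of what a disseminator along these lines could look like as a servlet. This is an illustration rather than the actual DRC code: the pid parameter name, the /fedora/get/{pid}/DC access URL, and the dc-to-xhtml.xsl stylesheet location are assumptions of the sketch.

import java.io.IOException;
import java.io.InputStream;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BaseDisseminatorServlet extends HttpServlet {
    private Templates dcToXhtml;  // compiled stylesheet, reused across requests

    @Override
    public void init() throws ServletException {
        try {
            // Stylesheet packaged in the WAR (step 8, "transform metadata")
            InputStream xsl = getServletContext()
                    .getResourceAsStream("/WEB-INF/dc-to-xhtml.xsl");
            dcToXhtml = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(xsl));
        } catch (TransformerConfigurationException e) {
            throw new ServletException(e);
        }
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Step 5: Fedora calls the disseminator with the object PID
        String pid = req.getParameter("pid");
        // Step 6: call back to the repository for the DC datastream
        String dcUrl = "http://localhost:8080/fedora/get/" + pid + "/DC";
        resp.setContentType("text/html");
        try {
            // Steps 7-9: fetch the XML metadata, transform it, and return
            // the XHTML fragment to the Fedora software
            dcToXhtml.newTransformer().transform(
                    new StreamSource(dcUrl),
                    new StreamResult(resp.getWriter()));
        } catch (TransformerException e) {
            resp.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR,
                    e.getMessage());
        }
    }
}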

The Heart of It All: The Disseminator

The key to this architecture is asking the object to “render itself”. This puts the task of creating the appropriate representation at the object level. The object can be an image, a video, a spreadsheet, or a PDF file. More importantly, the object can be a PDF of a journal article or a PDF of a thesis; in both cases the metadata describing that PDF file will be different (journal/volume/issue in one case and department/degree/advisor in the other).

Rather than putting special case code in the interface application to render the description of the journal article one way and the thesis another way, that special case code is bound to the object in the form of a “disseminator”. The disseminator methods for the journal article and the thesis share the same name — getFullDisplay — but will return entirely different XHTML fragments — one for a journal article and one for a thesis. For both objects, though, the interface application will make a call to the object in the Fedora repository asking for the output of each getFullDisplay dissemination. In the case of a Dublin Core description, the dissemination output could look like this:

<table class="drc_dublinCore_table">
<tr class="drc_dublinCore_row drc_dublinCore_title">
<td class="drc_dublinCore_label drc_dublinCore_title">Title:</td>
<td class="drc_dublinCore_value drc_dublinCore_title">Jester Example</td>
<tr class="drc_dublinCore_row drc_dublinCore_identifier">
<td class="drc_dublinCore_label drc_dublinCore_identifier">Identifier:</td>
<td class="drc_dublinCore_value drc_dublinCore_identifier">demo:exampleObject</td>

You’ll note that there is a liberal application of CSS styles on all of the XHTML elements, allowing for the look of the dissemination to be further transformed in the browser via CSS stylesheets. A getFullDisplay dissemination for a journal article could look like this:

<table class="drc_ejc_table">
<tr class="drc_ejc_row drc_ejc_title">
<td class="drc_ejc_label drc_ejc_title">Article Title:</td>
<td class="drc_ejc_value drc_ejc_title">Taking Advantage of Fedora Disseminations</td>
<tr class="drc_ejc_row drc_ejc_volume">
<td class="drc_ejc_label drc_ejc_volume">Volume:</td>
<td class="drc_ejc_value drc_ejc_volume">3</td>
<tr class="drc_ejc_row drc_ejc_issue">
<td class="drc_ejc_label drc_ejc_issue">Issue:</td>
<td class="drc_ejc_value drc_ejc_issue">2</td>

Looking at the Pieces

There is a demonstration system set up for a short period of time that shows all of the pieces. First, the disseminator:


Next, how this disseminator looks as accessed through the Fedora repository:


And finally, how this result looks through the Seam-based interface application. (A note about this application — only this URL works at the moment even though there are other links on the page. This is also the ‘trunk’ version of our interface code, so it is likely to change and/or break and/or work better at any time.)


Fedora Setup

In addition to the Seam-based interface application and the disseminator code, there is setup required at the Fedora repository — specifically, the creation of a Behavior Definition (bDef) that describes the disseminators that the objects share in common and the creation of a Behavior Mechanism (bMech) that describes the implementation of that definition for a particular object type. Below is a series of screen shots that show the steps to create the bDef and bMech.

Disseminator Behavior Definition (bDef)

Using the Fedora Admin client, under the “Builders” menu, select “Behavior Definition Builder”. In the first pane (“General”), use a specific PID of ‘demo:bDefExample’ and put something in for the Behavior Object Name, Behavior Object Description, and one of the Dublin Core Metadata fields. (It doesn’t matter what you put in for these values.)
Fedora Admin Behavior Definition Builder “General” pane

Under the “Abstract Methods” pane, create new definitions for each of the disseminator methods.
Fedora Admin Behavior Definition Builder “Abstract Methods” pane

Under the “Documentation” pane, put something in the first entry. Again, it doesn’t matter what is put in for these values, but they are required.
Fedora Admin Behavior Definition Builder “Documentation” pane

Select “Ingest” at the bottom of the window, and the demo:bDefExample bDef will be created. Alternatively, you could import the demo:bDefExample saved in the DRC source code repository (choose “original format” at the bottom of that page).

Disseminator Mechanism Definition (bMech)

The bMech is a little more complicated. Under the “Builders” menu, select “Behavior Mechanism Builder”. In the first pane (“General”), use a specific PID of ‘demo:bMechExample’ and put something in for the Behavior Object Name, Behavior Object Description, and one of the Dublin Core Metadata fields. (It doesn’t matter what you put in for these values.) In the “Behavior Definition Contract”, pick the bDef just created (demo:bDefExample).
Fedora Admin Behavior Mechanism Builder “General” pane

In the “Service Profile” pane, put values in the “General” area (it doesn’t matter what). In the Service Binding area, make sure the Message Protocol is HTTP GET, put in “text/html, text/xml” for Input MIME Types, and put in “text/html, text/xml, text/plain” for Output MIME Types.
Fedora Admin Behavior Mechanism Builder “Service Profile” pane

Under the Service Methods pane, put in http://localhost:8080/BaseDisseminator for the Base URL. (The disseminator is also loaded in the same servlet as the Fedora repository and the Seam interface application, and it is loaded at the “/BaseDisseminator” context path in the servlet.) Create Service Method Definitions that correspond to the Abstract Methods in the bDef.
Fedora Admin Behavior Mechanism Builder “Service Methods” pane

Select “Properties” for each one of the Service Method Definitions in turn. “echo” is a unique disseminator method that simply echoes back the context parameters of the disseminator request. This is useful for seeing exactly what the Fedora server is going to give to the disseminator.
Fedora Admin Behavior Mechanism Builder “Service Methods” Definitions for “echo” Method

With the exception of “echo” all of the other Service Method Definitions are the same. The Method Binding consists of the disseminator method followed by a slash and the PID placeholder followed by a question mark and ‘dc’ equals the DC placeholder. Since the Method Binding field has two placeholders, there are two entries in the Method Parameter Definitions area. The first is for PID — a “Default” parameter that is required and passed by value to the disseminator. The default value is the special value $PID, which the repository software will replace with the PID of the object as the disseminator is called. The second is for DC, a “Datastream” parameter that is required and passed to the disseminator by URL reference. The disseminator doesn’t actually use this reference to a datastream, but it is a requirement that all bMechs pass a datastream of one sort or another to the disseminator.
Fedora Admin Behavior Mechanism Builder “Service Methods” Definitions for “getFullDisplay” Method
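Pieced together from that description, the Method Binding for getFullDisplay would look something like the pattern below. This is a reconstruction for illustration (using Fedora’s parenthesized placeholder notation), not a value copied from the screenshots:

getFullDisplay/(PID)?dc=(DC)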

If you have followed all of the steps so far, under the “Datastream Input” pane there will be one entry for DC in the table. The only thing that needs to be done here is adding “text/xml” in the MIMEType column.
Fedora Admin Behavior Mechanism Builder “Datastream Input” pane

Under the “Documentation” pane, put something in the first entry. Again, it doesn’t matter what is put in for these values, but they are required.
Fedora Admin Behavior Mechanism Builder “Documentation” pane

Select “Ingest” at the bottom of the window, and the demo:bMechExample bMech will be created. Alternatively, you could import the demo:bMechExample saved in the DRC source code repository (choose “original format” at the bottom of that page).

Sample Object

The last step is to add this disseminator bDef/bMech combination to an object. Edit any object in the repository and go to the “Disseminators” pane. If there are other disseminators already defined for this object, select “New” along the left side. Put in a label — any label will do. Next to “Behavior defined by…” select demo:bDefExample. Then next to “Mechanism” select demo:bMechExample. The admin client will prompt for a DC binding; select “Add” and choose the DC datastream in the pop-up window.
Fedora Admin Sample Object’s “Disseminators” pane in progress

Select “Save Changes” at the bottom. The completed disseminator looks like this:
Fedora Admin Sample Object’s “Disseminators” pane completed

There is a sample object in the DRC source code repository that has the disseminator already defined.


Comments about this architecture are certainly welcome. I’m sure I’ll be writing about it more in the future, but here are some thoughts at this point:

Future Directions

In this case, I’m using an XSLT stylesheet to transform the Dublin Core XML into an XHTML table. That stylesheet is stored in the BaseDisseminator WAR file. The stylesheet could just as easily be a datastream of a special “formatting” object in the repository. One of the key distinctions of OhioLINK’s Fedora implementation is that institutions using the repository will be able to “brand” their content in any way they choose. Having the flexibility of storing metadata transformations just like any other object in the repository would seem to be of great advantage in that scenario.
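As a sketch of that variation, the disseminator’s initialization could just as easily compile the stylesheet from a datastream URL instead of the WAR; the “formatting” object PID and datastream ID below are hypothetical:

// Hypothetical: the stylesheet lives in the repository as the XSLT
// datastream of a "formatting" object rather than in the WAR file
String styleUrl = "http://localhost:8080/fedora/get/demo:formatting/XSLT";
dcToXhtml = TransformerFactory.newInstance()
        .newTemplates(new StreamSource(styleUrl));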

On a related front, this style of implementation would be greatly enhanced by the work of the Fedora Content Model Dissemination Architecture (CMDA). Because disseminators must be bound to specific objects rather than classes of objects, management of the variety of bMechs in a scenario such as this will likely become difficult very soon. I’m heartened by the fact that the CMDA work is going on and will cut our management complexity dramatically when it becomes available.


These concepts are based in part on the work of the Digital Library Federation’s Aquifer Asset Actions technical working group and discussions among members of the OAI Object Reuse and Exchange technical committee as well as conversations with many Fedora developers and implementors. Thanks, everyone.

[Update 20070426T1147 : Fixed the sample object URL. Thanks, Jodi.]


Presentation Summary: “MPTStore: Implementing a fast, scalable, and stable RDBMS-backed triplestore for Fedora and the NSDL”

Chris Wilper gave this presentation on behalf of the work that he and Aaron Birkland did to improve the performance of the Fedora Resource Index.

Version 2.0 of the Fedora digital object repository software added a feature called the Resource Index (RI). Based on Resource Description Framework (RDF) triples, the RI provided quick access to relationships between objects as well as to the descriptive elements of the object itself. After about two years of use with the Kowari software, the RI has pointed to a number of challenges for “triplestores”: scalability (few triplestores are designed for greater than 100 million triples); performance; and stability (frequent “rebuilds”).

The real motivation behind experimenting with a new triplestore, however, was the NSDL use case. The National Science Digital Library (NSDL) is a moderately large repository (4.7 million objects, 250 million triples) with a lot of write activity (driven by periodic OAI harvests; primarily mixed ingests and datastream modifications). The NSDL data model also includes existential/referential integrity constraints that must be enforced. Querying the RI to determine correct repository state proved to be difficult: Kowari aggressively buffers triples, sometimes on the order of seconds, before writing them to disk. Flushing the buffer after every write is also computationally expensive (hence the drive to use buffers in the first place).

The NSDL team also encountered corruption under concurrent use and with abnormal shutdowns, forcing rebuilds of the triplestore. And the solution was not scaling well; performance was becoming notably worse. In looking for solutions, other triplestores were considered but rejected. Using an RDBMS seemed attractive — efficient transactions, very stable, generally speedy — but a “one big table” paradigm to store all of the relations did not seem to give them the desired scalability.

NSDL developers observed that the total number of distinct predicates is much lower than the number of subjects or objects; NSDL has about 50 distinct predicates. Based on this observation, their solution, called “Mapped Predicate Tables,” creates a table for every predicate in the triplestore; a brief sketch of the idea follows below. This has several advantages: a low computational cost for triple adds and deletes, queries for known predicates are fast, complex queries benefit from the relatively mature RDBMS planner having finer-granularity statistics and query plans, and flexible data partitioning helps address scalability. This solution comes with several disadvantages, however: one needs to manage the predicate-to-table mapping, complex queries crossing many predicates require more effort to formulate, and with a naive approach simple unbound queries scale linearly with the number of predicates.
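Here is a minimal sketch of the mapped-predicate-table idea in plain JDBC. It illustrates the technique only; it is not the MPTStore API, and the table and column names are invented:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

// One two-column (subject, object) table per distinct predicate, plus a
// map from predicate URI to table name.
public class MappedPredicateTables {
    private final Connection db;
    private final Map<String, String> tables = new HashMap<String, String>();

    public MappedPredicateTables(Connection db) { this.db = db; }

    // Look up (or create) the table assigned to a predicate.
    private String tableFor(String predicate) throws SQLException {
        String table = tables.get(predicate);
        if (table == null) {
            table = "t" + tables.size();  // ~50 tables for NSDL's predicates
            Statement st = db.createStatement();
            st.executeUpdate("CREATE TABLE " + table
                    + " (s VARCHAR(255), o VARCHAR(255))");
            st.close();
            tables.put(predicate, table);
        }
        return table;
    }

    // Adding (or deleting) a triple touches only one small table.
    public void add(String s, String p, String o) throws SQLException {
        PreparedStatement ps = db.prepareStatement(
                "INSERT INTO " + tableFor(p) + " (s, o) VALUES (?, ?)");
        ps.setString(1, s);
        ps.setString(2, o);
        ps.executeUpdate();
        ps.close();
    }

    // A pattern with a bound predicate, such as (?s <p> ?o), scans just
    // that predicate's table. An unbound predicate would have to visit
    // every table, which is the linear-scaling disadvantage noted above.
    public ResultSet query(String predicate) throws SQLException {
        return db.createStatement().executeQuery(
                "SELECT s, o FROM " + tableFor(predicate));
    }
}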

So the NSDL team created the MPTStore triplestore and contributed it back to the Fedora core developers for use by the community. MPTStore is a Java library that handles all of the predicate mapping and accounting behind the scenes. The basic API remains the same as for other triplestores, performing triple writes and queries, and the library hides all of the implementation details of translating queries from a particular language (SPO, SPARQL) into SQL statements. The library is also designed to expose transaction/connection semantics should the developer wish to have direct access to the predicate tables.

A solution like MPTStore is well suited for the NSDL use case. The NSDL team was very familiar with the operations of RDBMS administration: performance tuning, backups, etc. The stored triplestore data is transparent and “hackable” — ad hoc SQL queries and analysis are relatively simple. In fact, the RDBMS triplestore helped track down Fedora middleware bugs that resulted in an inconsistent state. Fixing these bugs also improved the performance of the Kowari-based RI.

[Updated 20070129T1447 to include links to Chris’ presentation on SlideShare.]

Open Source for Open Repositories — New Models for Software Development and Sustainability

This is a summary of a presentation by James L. Hilton, Vice President and CIO of the University of Virginia, at the opening keynote session of Open Repositories 2007. I tried to capture the essence of his presentation, and omissions, contradictions, and inaccuracies in this summary are likely mine and not the presenter’s.

Setting the stage

This is a moment in which institutions may be willing to invest in open source development in a systematic way (as opposed to what could currently be characterized as an ad hoc fashion) driven by these factors:

  • Fear. Prior to Oracle’s hostile takeover of PeopleSoft, the conventional wisdom of universities was that they needed to buy their core enterprise applications rather than build them. In doing so, they sought the comfort of buying the security of a leading platform. Oracle’s actions diminished that comfort level. Blackboard’s acquisition of WebCT and its lawsuit against a competitor do not help either.
  • Disillusionment and ERP fatigue. What was largely thought to be an outsourced project was found to be an endless upgrade cycle. Organizations need to build entire support units to handle the upgrades for large ERP systems rather than supporting the needs of the users.
  • Incredulity — we’re supposed to do what? The application of technology typically has a disruptive impact (one cannot predict the end state), the stakes are incredibly high (higher education and/or research could be lost in a decade), it tends to be expensive, and the most common survival strategy is to seed many expensive experiments in the hope that one will be in the right place at the time the transition needs to happen. The massive investment anticipated for technology to support academic computing (libraries, high-performance clusters, etc.) will pale in comparison to the investment in administrative computing.
  • Rising tide of collaboration. This is a realization that the only way to succeed is through collaboration. To paraphrase Hilton, “In the new order it will be picking the right collaborative partners where the new competitive advantage will come from.”


Hilton offered these definitions and contrasts as a way to frame the rest of his discussion. First was open or “free” software: free as in beer, or free as in “adopt a puppy.” The software comes with the ability to do what you want with the code, not just the ability to use the code. Then he defined the term license as a contract: whatever you agree to, you are bound to; you cannot count on copyright law to protect you. The rules and conditions that are applied to the software do matter.

Lastly, he talked about copyleft or “viral” licensing. There are different interpretations of “open” in open source. “Copyleft” has come to mean that code should be freely available to be used and modified, and that it should never be locked up; the GPL is an example. This is often called “viral” because if you include software with this license in any other work that is released, the additional software must be released under the same license. This is seen by some as valuable because it prevents open source from being encircled by proprietary code. Copyleft is contrasted with an “open/open” license, which places no restrictions on what users do with the code in derivative software packages.

Case Study — Michigan’s Sakai Sojourn

Hilton briefly described why UMich went down the Sakai path in 2001-2002:

  • Legacy system with no positive trajectory forward. It could never be released into open source; all of the development would have to be carried on UMich’s shoulders forever.
  • Saw market consolidation in CMS. This was mostly evident in the commercial sector with Blackboard and WebCT being the dominant choices. They had concerns about the cost of licenses in this environment down the road.
  • Saw the potential of tapping the institution’s core competencies and starting a virtuous cycle of development, teaching and research. Or, put another way, they didn’t want core competencies in teaching and research held hostage to a commercial development cycle.
  • Strategic desire to blur the distinction between the laboratory/classroom and between knowledge creation/digestion. They realized that the functions of a research support tool and a course support tool were pretty much the same under different skins, and they sought to blur that distinction even more.
  • NRC report and the need for collaboration. UMich was willing to fund the project internally for two years but knew it needed to find collaborative partners after that, by the fifth year, in order for the project to be declared a success.
  • A moment of time opportunity that synchronized the development process of several partners with funding provided by the Mellon Foundation.

There were also specific goals for the Sakai project. The new system had to replicate the functionality of existing course and research collaboration environments. They also wanted experience in finding partners willing to collaborate. Hilton said, “Sakai was/is at least as interesting from a collaboration perspective as it is from the technology perspective.” Bringing together disparate organizations with different beliefs on how things should be done is a challenge. Additionally, they wanted to get better as an institution at discerning open source winners; it shouldn’t be like a lottery. Lastly, they wanted to implement software parts that were not built at UMich. Each partner institution committed to implementing the same thing even if it wasn’t built at that institution. This is tough to do, but they knew they needed to do it for their own good in the long run.

What happened? Not only did the original partners show up, but the community came, too. Even more interesting was that the community was formed with dues-paying members — even in a world where the software is free. It became a vibrant community, too, with a conference every six months. Sakai was released under an open/open license model, and corporate partners showed up as well (selling support services, hosting services, or hardware for the software). The software did grow up and leave its home; a separate foundation now holds the intellectual property of the code (originally the partners assigned copyright to UMich). They also positioned Sakai to be a credible threat to the commercial entities in order to force them to the standards table.

Takeaway lessons that generalize to open source development

First, the benefits of open source development.

  • Destiny control (but only when you really need to drive). Having the control is not always a good thing. Is it worth the effort? Is the project core to the institution’s mission? (Does it directly support scholarship and teaching?)
  • Builds community and camaraderie (in the case of Sakai, both locally at UMich and internationally).
  • Unbundles software ownership from its support, inspiring more competition in the implementation and support space.
  • Community source provides institutions an opportunity to leverage links between open source, open access, and the culture of the academy/wider world (a.k.a. put up or shut up).

Then, the challenges of open source development.

  • Guaranteeing clean code (IP) is hard (read as “impossible”). A certain amount of faith is required about the code you get, and there needs to be consideration for mitigating risks.
  • Figuring out who is authorized to license institutionally-owned code is challenging, and then you have to convince them to give it away. No one in the institution typically has been appointed or given the authority to release code. One of the things that the Sakai licensing discussions highlighted was institutional differences in requirements and aesthetics.
  • Patent quagmire always looming. How do you know your software is not infringing? How do you make sure you don’t inadvertently give away all institution patents? Be careful when looking at licenses from an institutional perspective versus an individual perspective.
  • There is also the inevitable lawsuit risk. Or, as your counsel might say to you, “Let me get this straight, we can get sued but there’s no one we can sue.”

Then, some discoveries that they made along the way.

  • An open source project is not a silver bullet. The commitment to build rather than buy must align with institutional priorities and competencies; it is not right for every project/application.
  • Licensing does matter; it is a contract: whatever you stick in its rules is what sticks. There are probably too many open source license options, and some sort of standardization is needed. Also keep in mind that if you release something under an open/open license, you can’t include any copyleft components.
  • Communities don’t just happen, they require: specific shared purpose (when visions vary, or when they change, collaborations struggle); and governance (e.g., separate board with dedicated developers sitting between institutions). Cooperation (“I won’t hurt you if you don’t hurt me”) is not collaboration.
  • Open (community) source requires real project discipline. “It is as spontaneous as a shuttle launch.” Along the way one needs to learn to balance pragmatics and ideals. One also needs to learn to trust your partners. “It really requires learning to let go.” Letting go, and having the community make the decisions, may be the quickest path to efficiency.

Reflection on open/community source for repositories

Repositories are at the center of everything at the institution. A repository connects with the library, with the presses/scholarly publishing operation, with classroom teaching, with the laboratory, and with the world. It is a core piece of infrastructure for the university of the 21st century. As institutions, we need to make sustaining investments in our repositories.

Hilton sees three different approaches to “community” in the existing projects:

  • DSpace: a community of user/developers. They come together to talk about what they want to do, write code, and support each other. Clearly there are enthusiastic users as developers.
  • EPrints: appears more like a vendor talking with customers, wanting the community to help shape the direction.
  • Fedora: in transition from a combination of the previous two models towards a Sakai-like model; it will require institutions to make commitments to it.

In the end, Hilton asked some thought-provoking questions. Is now the time for institutional investment in open/community source? Will a coherent community (or communities) emerge in ways that are sustainable? Is there a shared vision?


A Vision for FEDORA’s Future, an Implementation Plan to Get There, and a Project Update

This morning, Sandy Payette of Cornell University and FEDORA project co-director, gave an update on the FEDORA project including a statement of a vision for FEDORA’s future, information about the emerging FEDORA Commons non-profit, and a status report/roadmap for the software itself. Below is a summary based on my notes of Sandy’s comments and slide content.

Vision for FEDORA’s Future

From her perspective, Sandy sees many kinds of projects using FEDORA, and she sees them falling into these general categories: Scholarly Workbenches (capturing, managing, and publishing the process of scholarship); Linking Data and Publications (complex objects built up of relationships with different types of internal and external objects); Reviews and Annotations of Objects (blogs and wikis on top of information spaces; collaborations surrounding a repository object); and Museum Exhibits with K-12 Lesson Plans.

Based on these observations, she can envision the evolution of FEDORA as an open source software for building robust information spaces with these major components:

  • repository services: manage, access, versioning, and storage of digital objects
  • preservation services: repository integrity checking, monitoring, alerting, migration, and replication
  • process management services: workflow centered around digital objects and messaging with peer applications
  • collaboration services: annotation, discussion, and rating of digital objects

The collaboration services suite has not been part of the core FEDORA project to date. Other people have found clever ways to put services such as blogs and wikis on top of a FEDORA content repository, but there are functions that can be put into the FEDORA system that can enable and enhance collaborative services.

FEDORA, of course, does not exist in isolation from other activities on the internet, and there are implications of what is commonly called “Web 2.0” for the FEDORA system. The key theme of Web 2.0 is an “architecture of participation”: the capability to remix and transform data sources — building on top of objects that already exist — to harness the collective intelligence of the community. Some specific examples are collaborative classification (del.icio.us), content sharing and feedback (YouTube), the power of collective intelligence (Wikipedia, Amazon reviews), and alternative trust models (such as eBay’s, based on reputation). This emergent behavior is influencing upcoming generations of scholars and scientists; they will have completely different expectations regarding the technology they use for learning and research.

Taken as a whole, the vision for FEDORA is to enable “object-centric” collaborations. FEDORA is evolving into an open source platform that integrates a robust core (repositories and enterprise SOA) with dynamic content access (collaborative applications and web access/re-use). It is a technology for complex digital objects. As contrasted with a technology such as Wikipedia’s MediaWiki — ideal for working with wiki-based resources — FEDORA is great for many different applications, including as a content store for wikis. In other words, one is not tied to one particular application or use case.

Fedora Commons Non-Profit

FEDORA as a project is evolving into FEDORA as an organization. That organization, called Fedora Commons, will be a non-profit “to enable software developers to collaborate and create open source software (repository, services, tools) that enables information-oriented communities to collaborate in creating new forms of network-based knowledge objects (educational, scholarly, cultural) and that, ultimately, enables institutions to manage and preserve these information objects over time in a robust repository and service architecture.” FEDORA Commons will be a custodian of the software platform and the means to steer its direction.

Structurally, it is envisioned as a 501c3 (as in the section of the IRS tax code) non-profit charitable organization. A proposal is being prepared for the Moore Foundation for a grant to provide the initial start-up funds for the Fedora Commons, focusing on sustainability and community building. The Commons may also seek matching funds from other foundations (Getty, Mellon) in later years until the organization is fully self-sustaining. The current thinking is that the Commons will achieve “steady state” with its own business model in 2010. The startup funds will extend the funding for the core development team as well as foster a community of contributors to the project and committers to the code base. The plans include a board of directors and several funded positions: executive director, technology architect (supervising a sysadmin and build master as well as developers), a financial/accounting specialist, and a communications specialist.

Sustainability in this context means increasing the installed base of FEDORA as well as moving towards a community leadership model. One model is the Eclipse Foundation, with four technical councils (collaboration, repository, enterprise, preservation) and corresponding community outreach councils. The community will also need to develop an income-generating model, be it corporate membership (a dues structure like Eclipse’s) and/or university and government members.

Fedora Project Status Report

Fedora 2.2 was released on January 19th, and Sandy went through the major changes and features. First is FEDORA as a web application; it has been refactored and repackaged so it can now run in different (even existing) servlet/web containers. Along with this is a new installer application that steps one through the process of bringing up the software. There is a “Quick” option to get running immediately and a “Custom” option to set Fedora up optimally for a particular environment.

Within FEDORA itself, datastreams can now have checksums, and this is supported with new repository configuration options. This enables trusted client/server collaboration and offers on-demand integrity checking of the repository. The manner in which FEDORA handles authentication has changed as well; version 2.2 uses servlet filters instead of Tomcat realms, which decouples FEDORA authentication from Tomcat. Three filters come with the core software: username/password file, LDAP, and Pubcookie. A generic sketch of the filter pattern follows below.
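The decoupling works because a servlet filter lives inside the web application rather than in the container configuration. This sketch is the standard Filter pattern, not Fedora’s actual filter code; the HTTP Basic check stands in for the username/password-file, LDAP, or Pubcookie logic:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SimpleAuthFilter implements Filter {
    public void init(FilterConfig config) throws ServletException {}

    public void doFilter(ServletRequest req, ServletResponse resp,
                         FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest httpReq = (HttpServletRequest) req;
        HttpServletResponse httpResp = (HttpServletResponse) resp;
        // Credential checking happens here, independent of the container;
        // this stub just demands an Authorization header.
        if (httpReq.getHeader("Authorization") == null) {
            httpResp.setHeader("WWW-Authenticate", "Basic realm=\"fedora\"");
            httpResp.sendError(HttpServletResponse.SC_UNAUTHORIZED);
            return;
        }
        chain.doFilter(req, resp);  // authenticated: continue to the servlet
    }

    public void destroy() {}
}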

FEDORA 2.2 also includes several modules from community committers: GSearch (configurable search for datastreams in FEDORA); Journaling (replication/recovery module for repositories); and MPTStore (new high-performing triplestore).

Sandy also covered the roadmap. The Mellon Phase 2 grant runs through 4Q2007, and the remaining work includes content models, the content model dissemination architecture, a basic messaging service, and preservation services. Next is “FEDORA Enterprise” (in the form of a grant proposal now in front of Mellon, ending in 2Q2009), to include a workflow engine and supporting tools, message-oriented middleware for an enterprise service bus (ESB), and distributed transactions. Finally, there is the FEDORA Commons 501c3 work (starting 3Q2007) in two parts: the technical (evolution of the integrated platform) and community building (fostering development and outreach, evolving a business model, and tapping ongoing sources of funding).

[Updated 20070129T1655 to correct the section of the U.S. Tax Code in the last paragraph. I don’t think we want anything to do with 26 USC 301c3.]

Open Repositories Presentation: Building an IR Interface Using EJB3 and JBoss Seam

Below is the outline of the Ohio DRC presentation from today’s FEDORA session at the Open Repositories conference. Comments welcome!