Open Source for Open Repositories — New Models for Software Development and Sustainability

This article was imported from this blog's previous content management system (WordPress), and may have errors in formatting and functionality. If you find these errors are a significant barrier to understanding the article, please let me know.

This is a summary of a presentation by James L. Hilton, Vice President and CIO of the University of Virginia, at the opening keynote session of Open Repositories 2007. I tried to capture the essence of his presentation, and omissions, contradictions, and inaccuracies in this summary are likely mine and not those of the presenter.

Setting the stage

This is a moment in which institutions may be willing to invest in open source development in a systematic way (as opposed to what could currently be characterized as an ad hoc fashion) driven by these factors:

  • Fear. Prior to Oracle's hostile takeover of PeopleSoft, the conventional wisdom of universities was that they needed to buy their core enterprise applications rather than build them. In doing so, they sought the comfort and security of buying a leading platform. Oracle's actions diminished that comfort level. Blackboard's acquisition of WebCT and its lawsuit against a competitor do not help either.
  • Disillusionment and ERP fatigue. What was largely thought to be an outsourced project was found to be an endless upgrade cycle. Organizations need to build entire support units to handle the upgrades for large ERP systems rather than supporting the needs of the users.
  • Incredulity -- we're supposed to do what? The application of technology typically has a disruptive impact (one cannot predict the end), the stakes are incredibly high (higher education and/or research could be lost in a decade), it tends to be expensive, and the most common survival strategy is to seed many expensive experiments in the hope that one will be in the right place at the time the transition needs to happen. The massive investment anticipated for technology to support academic computing (libraries, high-performance clusters, etc.) will pale in comparison to the investment in administrative computing.
  • Rising tide of collaboration. This is a realization that the only way to succeed is through collaboration. To paraphrase Hilton, "In the new order it will be picking the right collaborative partners where the new competitive advantage will come from."

Distinctions

Hilton offered these definitions and contrasts as a way to frame the rest of his discussion. First was Open or "free" software. Free as in beer, or free as in "adopt a puppy." The software comes with the ability to do what you want with the code, not just the ability to use the code. Then he defined the term License as a contract -- whatever you agree to, you are bound to; you cannot use copyright law to protect you. The rules and conditions that are applied to the software do matter.

Lastly, he talked about Copyleft or "viral" licensing. There are different interpretations of "open" in open source. "Copyleft" has come to mean that code should be freely available to be used and modified, and it should never be locked up. The GPL is an example. This is often called "viral" because if you include software with this license in any other work that is released, the additional software must be released under the same license. This is seen by some as valuable because it prevents open source from being encircled by proprietary code. Copyleft is contrasted with an "open/open" license, which places no restrictions on what users do with the code, including in derivative software packages.

Case Study -- Michigan's Sakai Sojourn

Hilton briefly described why UMich went down the Sakai path in 2001-2002:

  • Legacy system with no positive trajectory forward. It could never be released into open source; all of the development would have to be carried on UMich's shoulders forever.
  • Saw market consolidation in course management systems (CMS). This was mostly evident in the commercial sector, with Blackboard and WebCT being the dominant choices. They had concerns about the cost of licenses in this environment down the road.
  • Saw the potential of tapping the institution's core competencies and starting a virtuous cycle of development, teaching and research. Or, put another way, they didn't want core competencies in teaching and research held hostage to a commercial development cycle.
  • Strategic desire to blur the distinction between the laboratory and the classroom, and between knowledge creation and knowledge digestion. They realized that the functions of a research support tool and a course support tool were pretty much the same under different skins, and they sought to blur that distinction even more.
  • NRC report and the need for collaboration. UMich was willing to fund the project internally for two years, but knew that after that it needed to find collaborative partners by the fifth year for the project to be declared a success.
  • A moment-in-time opportunity that synchronized the development process of several partners with funding provided by the Mellon Foundation.

There were also specific goals for the Sakai project. The new system had to replicate the functionality of existing course and research collaboration environments. They also wanted experience in finding partners willing to collaborate. Hilton said, "Sakai was/is at least as interesting from a collaboration perspective as it is from the technology perspective." Bringing together disparate organizations with different beliefs on how things should be done is a challenge. Additionally, they wanted to get better as an institution at discerning open source winners; it shouldn't be like a lottery. Lastly, they wanted to implement software parts that were not built at UMich. Each partner institution committed to implementing the same thing even if it wasn't built at that institution. This is tough to do, but they knew they needed to do it for their own good in the long run.

What happened? Not only did the original partners show up, but the community came, too. Even more interesting was that the community was formed with dues-paying members -- even in a world where the software is free. It became a vibrant community, too, with a conference every six months. Sakai was released under an open/open license model, and corporate partners showed up as well (selling support services, hosting services, or hardware for the software). The software did grow up and leave its home; a separate foundation now holds the intellectual property of the code (originally partners assigned copyright to UMich). They also positioned Sakai to be a credible threat to the commercial entities in order to force them to the standards table.

Takeaway lessons that generalize to open source development

First, the benefits of open source development.

  • destiny control (but only when you really need to drive). Having control is not always a good thing. Is it worth the effort? Is the project core to the institution's mission? (Does it directly support scholarship and teaching?)
  • builds community and camaraderie (in the case of Sakai, both locally at UMich and internationally)
  • unbundles software ownership from its support, which inspires more competition in the implementation and support space.
  • community source provides institutions an opportunity to leverage links between open source, open access and culture of the academy/wider world (a.k.a. put up or shut up)

Then, the challenges of open source development.

  • Guaranteeing clean code (IP) is hard (read as "impossible"). Institutions have to take a certain amount on faith about the code they get, and there needs to be consideration for mitigating risks.
  • Figuring out who is authorized to license institutionally-owned code is challenging, and then you have to convince them to give it away. Typically no one in the institution has been appointed or given the authority to release code. One of the things that the Sakai licensing discussions highlighted was institutional differences in requirements and aesthetics.
  • Patent quagmire always looming. How do you know your software is not infringing? How do you make sure you don't inadvertently give away all of the institution's patents? Be careful when looking at licenses from an institutional perspective versus an individual perspective.
  • There is also the inevitable lawsuit risk. Or, as your counsel might say to you, "Let me get this straight, we can get sued but there's no one we can sue."

Then, some discoveries that they made along the way.

  • An open source project is not a silver bullet. The commitment to build rather than buy must align with institutional priorities and competencies; it is not right for every project/application.
  • Licensing does matter; it is a contract: whatever you stick in its rules is what sticks. There are probably too many open source license options and some sort of standardization is needed. Also keep in mind that if you release something under an open/open license, you can't include any copyleft components.
  • Communities don't just happen, they require: specific shared purpose (when visions vary, or when they change, collaborations struggle); and governance (e.g., separate board with dedicated developers sitting between institutions). Cooperation ("I won't hurt you if you don't hurt me") is not collaboration.
  • Open (community) source requires real project discipline. "It is as spontaneous as a shuttle launch." Along the way one needs to learn to balance pragmatics and ideals. One also needs to learn to trust one's partners. "It really requires learning to let go." Letting go, and having the community make the decisions, may be the quickest path to efficiency.

Reflection on open/community source for repositories

Repositories are at the center of everything at the institution. They connect with the library, with the presses/scholarly publishing operation, with classroom teaching, with the laboratory, and with the world. They are a core piece of infrastructure for the university of the 21st century. As institutions, we need to make sustaining investments in our repositories.

Hilton sees three different approaches to "community" in the existing projects:

  • DSpace: a community of user/developers. They come together to talk about what they want to do, write code, and support each other. Clearly there are enthusiastic users acting as developers.
  • EPrints: appears more like a vendor talking with customers, wanting the community to help shape the direction.
  • Fedora: in transition from a combination of the previous two models, moving towards a Sakai-like model. It will require institutions to make commitments to it.

In the end, Hilton asked some thought-provoking questions. Is now the time for institutional investment in open/community source? Will a coherent community (or communities) emerge in ways that are sustainable? Is there a shared vision?

The text was modified to update a link from http://www.virginia.edu/vpcio/bio.html to http://www.virginia.edu/vpcio/biography.html on January 19th, 2011.