Seeking feedback on database design for an open source software registry

Peter Murray
Last updated: July 15, 2011

As part of the Mellon Foundation grant funding the start-up of LYRASIS Technology Services, LTS is establishing a registry to provide in-depth comparative, evaluative, and version information about open source products. The registry will be free to view and edit for all libraries, not just LYRASIS members, and for any provider offering services for open source software in libraries. Drupal will be the underlying content system, and LYRASIS will host the registry.

I'm seeking input on a data model that is intended to answer these questions:

  • What open source options exist to meet a particular need of my library?
  • What are the strengths and weaknesses of an open source package?
  • My library has developers with skills in specific technologies. What open source packages mesh well with the skills my library has in-house?
  • Where can my library go to get training, documentation, hosting, and/or contract software development for a specific open source package?
  • Are any peers using this open source software?
  • Where is there more information about this open source software package?

The entity-relationship data model and the narrative surrounding it are on the Code4Lib wiki. You can comment on the data model by editing the wiki document, replying here, or e-mailing me directly. In addition to comments on the data model, I'm particularly interested in answers to these questions (also listed at the bottom of the wiki page):

  1. The model does not provide for a relationship between a person and a software package. Would such a relationship be useful? E.g., individuals self-identifying as affiliated with an open source software package.
  2. The initial planning process did not account for packages that are not themselves end products. Should code libraries and support programs be included as packages in the registry? The model could conceivably be adjusted in several ways to account for this. The simplest would only require adding a new PackageType enumeration value (e.g., “code library”), but this would not allow searching for packages that use a given code library (e.g., answering the question “What repositories use the djatoka JPEG2000 viewer system?”). Another simple change would be to add “code library” to the TechType enumeration, but then the code library would not have the benefit of links to other relationships and entities. A more complicated change would do both, but there would be no relationship between the code library as a Package and as a Technology. Are there better ways to add code libraries to the model?
  3. Some who have reviewed the concept for the registry suggested other attributes. Should these be added? (And what is missing?)
    • Package – Translations
    • Package – Intended audience (e.g. developers, patrons/desktop, patrons/web, library-staff/desktop, library-staff/web)
    • Version – Code maturity (e.g., alpha, beta, release candidate, formal release)
  4. To answer the question “Are any peers using this open source software?”, is it necessary to have an enumeration of library types, such as public library, school library, university library, community college library, special library, and museum? Are there others?
  5. Is the location of Institutions and Providers desired? One reason it might be desirable is to do a geography-based search (e.g. training providers within a 60-mile radius).
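
To make question 5 concrete, here is a minimal sketch of what a geography-based search could look like if Providers carried a latitude and longitude. It is only an illustration under stated assumptions: the `Provider` fields, the `providers_within` helper, and the sample providers are hypothetical and are not part of the data model on the wiki. Distances use the standard haversine great-circle formula.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass
class Provider:
    # Hypothetical fields for illustration; the registry's actual
    # Provider entity may be shaped differently.
    name: str
    service: str   # e.g. "training", "hosting"
    lat: float     # degrees
    lon: float     # degrees


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def providers_within(providers, lat, lon, radius_miles, service=None):
    """Return providers within radius_miles of (lat, lon), nearest first.

    If service is given, only providers offering that service are kept.
    """
    hits = []
    for p in providers:
        if service is not None and p.service != service:
            continue
        distance = haversine_miles(lat, lon, p.lat, p.lon)
        if distance <= radius_miles:
            hits.append((distance, p))
    hits.sort(key=lambda pair: pair[0])
    return [p for _, p in hits]


# Made-up sample data: two providers near Atlanta, GA and one in Nashville, TN.
SAMPLE_PROVIDERS = [
    Provider("Example Training Co.", "training", 33.9526, -84.5499),
    Provider("Example Hosting Co.", "hosting", 33.7756, -84.3963),
    Provider("Far Away Training", "training", 36.1627, -86.7816),
]

if __name__ == "__main__":
    # Training providers within a 60-mile radius of downtown Atlanta.
    for p in providers_within(SAMPLE_PROVIDERS, 33.7490, -84.3880, 60, "training"):
        print(p.name)
```

A linear scan like this would be adequate for a registry with a modest number of providers. Whether location should be stored as coordinates, a postal address, or both is exactly the kind of feedback the question is after.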

Feel free to add to the list of questions. I'm looking forward to your thoughts.