Looking For a Comprehensive Discovery Layer


This article was imported from this blog's previous content management system (WordPress), and may have errors in formatting and functionality. If you find these errors are a significant barrier to understanding the article, please let me know.

Earlier today, Wright State University issued an invitation-to-negotiate (ITN) for a discovery layer on behalf of OhioLINK. (An invitation to negotiate is a competitive process whereby suppliers and contractors have an opportunity to initially submit pricing proposals for consideration. Once the proposals are reviewed, the University has the opportunity to determine which proposers it wishes to negotiate with in order to arrive at the terms deemed to be in the best interest of the University and OhioLINK.) This post contains the details specific to the project, and I'm posting it here to cast a wide net for parties that may be interested in responding. Please note that this posting is not the official call for proposals, and nothing said here is binding on the proposal process. If you are interested in proposing a solution, or part of a solution, for what we are seeking, please contact Mary Pasquinelli, Sr. Purchasing Agent at Wright State University, and reference ITN 601908 (Room 301 University Hall, 3640 Colonel Glenn Hwy., Dayton, Ohio 45435; phone 937-775-2411; fax 937-775-3711; email: mary.pasquinelli@wright.edu).

Introduction and overview

OhioLINK is evolving its suite of services in the face of changes in user expectations. To that end, we seek to employ a state-of-the-art unified search interface for all OhioLINK content and that of its members.  The key deliverables of this new interface are to:

  • Provide a unified, general search environment for the entire consortium for all types of content in the local integrated library systems, institutional repositories (including, but not limited to, those housed in the Digital Resource Commons), the OhioLINK Electronic Journal and Electronic Book Centers, the Electronic Theses and Dissertations service, consortially-supplied databases, and locally subscribed databases for each individual OhioLINK institution.
  • Supplement, not replace, existing search interfaces for specific content types.
  • Provide basic and advanced search features, flexible search session and user profile features, and rich export options.
  • Provide a search experience closer to that of common Internet search engines, and thus closer to the end user's daily online experience.
  • Incorporate modern Web 2.0 interface features like faceted searching, tag clouds, end-user tagging and ratings, and search history breadcrumbs.
  • Deliver the appropriate copy (electronic or physical) requested by users while hiding the varied and intricate nature of library services such as OpenURL, OhioLINK PCIRC, and interlibrary loan.
  • Make all of the above available to non-library service points/systems such as campus portals and learning management systems.

OhioLINK believes the best course of action to provide this service is to build a pre-computed index of as much metadata as possible, while using a federated search tool to interrogate the remaining databases for which metadata is not available for pre-indexing. Results from the pre-computed index and the federated search are available to the new end-user library interface proposed herein as well as to other non-library service points. Users are led to the appropriate copy through a delivery resolver. These four components make up the envisioned system:

Unified, Pre-computed Index

OhioLINK houses a wide variety of content in a wide variety of formats and serves it through specialized interfaces. Content includes MARC records describing physical and digital items, records from general and specialized index/abstract databases, citations and full text from electronic journals and electronic books, and metadata in derivatives of Dublin Core that describe images, videos, audio files, and documents. Our concept of the unified, pre-computed index is to bring all of the metadata records together under a common index structure for the purpose of searching and browsing. This will involve harvesting, transforming, and computing relevancy rankings for disparate metadata sets, and returning results to search queries in such a way that other applications can make use of the data. Most data is universally available to OhioLINK members; some metadata sets are limited to particular member institutions.
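To make the idea of a common index structure concrete, here is a minimal Python sketch, not a proposed implementation. It assumes a simplified view of MARC (title in 245$a, main entry in 100$a) and of Dublin Core; the names `IndexRecord`, `UnifiedIndex`, and the institution codes are hypothetical. It shows records from different formats normalized into one structure, an inverted index built over them, and records limited to particular member institutions hidden from everyone else.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class IndexRecord:
    """One entry in a common index structure (hypothetical schema)."""
    record_id: str
    title: str
    creators: list
    source: str
    # An empty set means the record is available to every OhioLINK member.
    restricted_to: set = field(default_factory=set)


def from_marc(rec_id, marc, source):
    # Assumes a simplified MARC view: "245a" is the title, "100a" the main entry.
    creators = [marc["100a"]] if "100a" in marc else []
    return IndexRecord(rec_id, marc.get("245a", ""), creators, source)


def from_dublin_core(rec_id, dc, source, restricted_to=()):
    # Dublin Core derivatives: a title plus a repeatable creator element.
    return IndexRecord(rec_id, dc.get("title", ""), list(dc.get("creator", [])),
                       source, set(restricted_to))


def tokenize(text):
    return re.findall(r"\w+", text.lower())


class UnifiedIndex:
    def __init__(self):
        self.records = {}
        self.postings = defaultdict(dict)  # term -> {record_id: term frequency}

    def add(self, rec):
        self.records[rec.record_id] = rec
        for term in tokenize(rec.title + " " + " ".join(rec.creators)):
            self.postings[term][rec.record_id] = self.postings[term].get(rec.record_id, 0) + 1

    def search(self, query, institution):
        scores = defaultdict(int)
        for term in tokenize(query):
            for rec_id, tf in self.postings.get(term, {}).items():
                scores[rec_id] += tf
        # Hide records that are limited to other member institutions.
        visible = [r for r in scores
                   if not self.records[r].restricted_to
                   or institution in self.records[r].restricted_to]
        # Plain term-frequency ranking; ties broken by record ID for stable output.
        return sorted(visible, key=lambda r: (-scores[r], r))


idx = UnifiedIndex()
idx.add(from_marc("marc:1", {"245a": "Ohio river history", "100a": "Smith, Jane"}, "catalog"))
idx.add(from_dublin_core("dc:7", {"title": "River photographs", "creator": ["Doe, John"]},
                         "repository", restricted_to={"inst_a"}))
print(idx.search("river", "inst_a"))  # ['dc:7', 'marc:1']
print(idx.search("river", "inst_b"))  # ['marc:1']
```

Term-frequency scoring stands in for whatever relevancy model a respondent would actually propose; the point is that every source format lands in the same structure before ranking.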

Federated Search

Although OhioLINK has a great deal of metadata under its control, some metadata sets are not harvestable in their entirety from external metadata providers. In these cases, OhioLINK seeks to deploy a federated search engine to retrieve records from the metadata provider. Results of search queries must be returned in such a way that other applications can make use of the data. Most external metadata providers are universally available to OhioLINK members; some metadata providers are limited to particular member institutions.
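The following sketch illustrates how a federated search component might query several external providers concurrently and pass each provider's results along as soon as they arrive, skipping providers the user's institution cannot use. The providers, their latencies, and their access restrictions are simulated with `asyncio.sleep`; real providers would be reached over a network protocol, and the class and function names are hypothetical.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Provider:
    """An external metadata provider that cannot be harvested (hypothetical)."""
    name: str
    latency: float  # simulated response time, in seconds
    records: list
    restricted_to: set = field(default_factory=set)  # empty = open to all members

    async def search(self, query):
        # Stands in for a real remote search (for example over Z39.50 or SRU).
        await asyncio.sleep(self.latency)
        return [r for r in self.records if query.lower() in r.lower()]


async def federated_search(providers, query, institution):
    """Query every provider the institution may use; yield hits as each answers."""
    allowed = [p for p in providers
               if not p.restricted_to or institution in p.restricted_to]

    async def run(provider):
        return provider.name, await provider.search(query)

    tasks = [asyncio.create_task(run(p)) for p in allowed]
    for next_done in asyncio.as_completed(tasks):
        yield await next_done


async def main():
    providers = [
        Provider("slow-db", 0.2, ["River ecology", "Lake studies"]),
        Provider("fast-db", 0.01, ["River transport"]),
        Provider("local-db", 0.0, ["River deeds"], restricted_to={"inst_a"}),
    ]
    arrivals = []
    async for name, hits in federated_search(providers, "river", "inst_b"):
        arrivals.append((name, hits))  # a UI could bump its "new hits" counter here
    return arrivals


arrivals = asyncio.run(main())
print(arrivals)  # [('fast-db', ['River transport']), ('slow-db', ['River ecology'])]
```

Yielding plain `(provider, hits)` tuples keeps the output usable by other applications, not only by the end-user interface.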

End-user Interface

The end-user interface combines search results from both the unified, pre-computed index and the federated search engine with social media tools (tags, recommendations, etc.) in a coherent user interface. The end-user interface must take into account the orders-of-magnitude difference in response time between the unified index component and the federated search engine component. The ideal interface returns the results of the unified index component to the user as soon as possible. The federated search component searches the configured resources and periodically updates the user interface with the number of new hits found. At a minimum, federated search results are displayed separately from the pre-computed index results. At the user's request, the end-user interface retrieves the most current results from the federated search component, combines them with the previous unified index results (applying best-effort relevancy ranking and inserting the new results into the existing facets), and then returns the result to the user's browser. The federated search component continues searching and updating the counter of new hits in the user interface until it has exhausted all external metadata providers.
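A rough sketch of the "combine on request" step, assuming each federated provider reports only a rank order while the pre-computed index supplies a normalized score. The linear rank-to-score mapping, the score ceiling, and the de-duplication on record ID are simplifying assumptions made for illustration; `Hit`, `normalize_federated`, and `merge` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    record_id: str
    score: float   # relevance, assumed normalized to 0..1
    format: str    # facet value, e.g. "book" or "article"
    origin: str    # "index" or the name of a federated provider


def normalize_federated(provider, ranked, ceiling=0.9):
    """Turn a provider's rank order into scores comparable with the index.

    Position is mapped linearly into (0, ceiling]. The ceiling is an assumption
    that keeps best-effort federated scores below strong index matches.
    """
    n = len(ranked)
    return [Hit(rid, ceiling * (n - i) / n, fmt, provider)
            for i, (rid, fmt) in enumerate(ranked)]


def merge(current, incoming):
    """Insert newly fetched federated hits into the results already shown."""
    seen = {h.record_id for h in current}
    # Naive de-duplication on record ID; real cross-source matching is harder.
    combined = current + [h for h in incoming if h.record_id not in seen]
    combined.sort(key=lambda h: (-h.score, h.record_id))
    facets = Counter(h.format for h in combined)  # recompute facet counts
    return combined, facets


shown = [Hit("idx:1", 0.95, "book", "index"), Hit("idx:2", 0.40, "article", "index")]
fetched = normalize_federated("fast-db", [("fed:9", "article"), ("idx:2", "article")])
merged, facets = merge(shown, fetched)
print([h.record_id for h in merged])  # ['idx:1', 'fed:9', 'idx:2']
print(facets)  # Counter({'article': 2, 'book': 1})
```

Capping federated scores is one way to keep late-arriving results from reshuffling the top of a page the user is already reading; other weighting schemes are equally plausible.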

Delivery Resolver

The final component is a delivery resolver that will help the user retrieve the most appropriate content for their needs.  It takes the form of an OpenURL resolver that is programmed with the various delivery mechanisms available to members of the OhioLINK community. The delivery resolver includes the initiation of a PCIRC request through the union catalog.
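As an illustration of how a delivery resolver might work, the sketch below encodes a citation as an OpenURL 1.0 (Z39.88-2004) key/encoded-value query and then routes it to a delivery mechanism. The resolver address, the holdings sets, the placeholder identifiers, and the preference order (electronic copy first, then a PCIRC request, then interlibrary loan) are assumptions for the example, not OhioLINK's actual rules.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

RESOLVER_BASE = "https://resolver.example.edu/openurl"  # hypothetical address

# Placeholder holdings; a real resolver would consult the e-journal knowledge
# base and the union catalog rather than these hard-coded sets.
EJOURNAL_ISSNS = {"1234-5678"}
UNION_CATALOG_ISBNS = {"9780000000001"}


def make_openurl(citation):
    """Encode a citation as an OpenURL 1.0 (Z39.88-2004) key/encoded-value query."""
    params = {"url_ver": "Z39.88-2004", "ctx_ver": "Z39.88-2004"}
    if "issn" in citation:
        params["rft_val_fmt"] = "info:ofi/fmt:kev:mtx:journal"
        params["rft.atitle"] = citation["title"]
        params["rft.jtitle"] = citation["journal"]
        params["rft.issn"] = citation["issn"]
    else:
        params["rft_val_fmt"] = "info:ofi/fmt:kev:mtx:book"
        params["rft.btitle"] = citation["title"]
        params["rft.isbn"] = citation["isbn"]
    return RESOLVER_BASE + "?" + urlencode(params)


def resolve(openurl):
    """Choose a delivery route: electronic copy, then PCIRC, then ILL (assumed order)."""
    query = {k: v[0] for k, v in parse_qs(urlsplit(openurl).query).items()}
    if query.get("rft.issn") in EJOURNAL_ISSNS:
        return "electronic"
    if query.get("rft.isbn") in UNION_CATALOG_ISBNS:
        return "pcirc"
    return "ill"


article = make_openurl({"title": "River sediments", "journal": "Journal of Examples",
                        "issn": "1234-5678"})
print(resolve(article))  # electronic
print(resolve(make_openurl({"title": "Ohio atlas", "isbn": "9780000000001"})))  # pcirc
print(resolve(make_openurl({"title": "Rare monograph", "isbn": "9780000000002"})))  # ill
```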

Modularity and interoperability

The new discovery layer system should be modular in design. Each component of the system should be interoperable with other modules and other systems via common standards and protocols. Respondents should address modularity and interoperability of each component that they propose.

Any respondent can respond to any or all portions of the project. Respondents who address some subset of system components, but not the complete system, must explain interactions and standards compliance.  They must explain how each module interacts with other vendors’ products, describing standards, APIs, and any other appropriate aspects of interoperability.

The OhioLINK authentication environment

User authentication: general. OhioLINK has deployed a Shibboleth infrastructure for user authentication, authorization, and identification. OhioLINK, as a service provider (Shib-SP), is a member of the InCommon federation. A handful of member institutions, as identity providers (Shib-IdP), are members of the InCommon federation. Since not all OhioLINK member institutions have deployed Shibboleth IdPs, OhioLINK runs a gateway between the legacy authentication system (described below) and an OhioLINK-hosted Shib-IdP. This OhioLINK-hosted Shib-IdP is not a member of the InCommon federation. Instead, it is configured in the OhioLINK-hosted Shib-SP services as a bilateral trust. Proposed systems are expected to operate in this environment.

User authentication: PCIRC. OhioLINK's PCIRC service is based on the Innovative Interfaces INN-Reach software. Users identify themselves to this system based on a combination of a barcode/unique-identifier index, a fuzzy match of the user's name, and an optional PIN.

User database. OhioLINK does not operate a union database of users from its member institutions. Instead, OhioLINK relies on member institutions to authenticate users based on local campus mechanisms (whether through a Shib-IdP or by supplying attributes to the OhioLINK-hosted Shib-IdP). Several existing OhioLINK services have personalization databases that are created in an ad-hoc manner by users at member institutions. The ideal candidate solution under this ITN would leverage Shibboleth attributes from the IdP (e.g. Shibboleth Targeted-ID or eduPersonPrincipalName) as a key into a database of user personalization, as opposed to forcing the user to create an account specific to the candidate solution.
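A minimal sketch of keying personalization on an IdP-released attribute instead of a solution-specific account, using an in-memory SQLite database. The attribute names (`targeted_id`, `eppn`), the table layout, and the sample values are hypothetical; in practice they depend on how the Shibboleth SP is configured to expose attributes.

```python
import sqlite3


def user_key(attributes):
    """Derive a stable personalization key from attributes released by the IdP.

    Prefers the opaque targeted ID and falls back to eduPersonPrincipalName.
    The dictionary keys "targeted_id" and "eppn" are assumptions; the real
    names depend on how the Shibboleth SP maps and exposes attributes.
    """
    for name in ("targeted_id", "eppn"):
        value = attributes.get(name)
        if value:
            return f"{name}:{value}"
    raise PermissionError("IdP released no usable identifier")


class Personalization:
    """Saved searches keyed by the IdP identifier; no local account needed."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS saved_search ("
                     "user_key TEXT NOT NULL, query TEXT NOT NULL, "
                     "PRIMARY KEY (user_key, query))")

    def save(self, attributes, query):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute("INSERT OR IGNORE INTO saved_search VALUES (?, ?)",
                              (user_key(attributes), query))

    def searches(self, attributes):
        rows = self.conn.execute(
            "SELECT query FROM saved_search WHERE user_key = ? ORDER BY query",
            (user_key(attributes),))
        return [row[0] for row in rows]


store = Personalization(sqlite3.connect(":memory:"))
session = {"targeted_id": "idp.example.edu!sp.example.org!abc123",
           "eppn": "jdoe@example.edu"}
store.save(session, "ohio river")
store.save(session, "canal history")
print(store.searches(session))  # ['canal history', 'ohio river']
print(store.searches({"eppn": "other@example.edu"}))  # []
```

Because the key comes from the IdP, a returning user's saved searches follow them without a separate sign-up; preferring the opaque targeted ID also avoids storing a human-readable identifier when one is not needed.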

The OhioLINK systems environment

Candidate solutions that use Linux as the underlying operating system are strongly preferred (Red Hat Enterprise Linux or the Ubuntu distribution). Solutions based on Microsoft Windows Server will not be considered.

Candidate solutions that use PostgreSQL or MySQL as a relational database management system are preferred. Oracle is acceptable.

Candidate solutions that use Apache HTTPD and/or Tomcat are preferred. OhioLINK strongly prefers systems that allow its technical staff to control the configuration of the HTTPD and Tomcat services.

OhioLINK prefers installations of software on servers controlled by OhioLINK and/or its member institutions. Software-as-a-Service (vendor-hosted implementations) will be considered, however.

The MARC data for member-curated collections comes from 63 discrete installations of Millennium systems from Innovative Interfaces. Most MARC records from these systems are aggregated into a Millennium-based union catalog run by OhioLINK using the INN-Reach software. Since some records are unique to institutions, an ideal candidate solution will harvest bibliographic and holdings records from each of these installations rather than relying on the aggregation of the OhioLINK union catalog. Candidate solutions are expected to integrate into the PCIRC service at the union catalog.
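The following sketch shows why harvesting each installation, rather than relying only on the union catalog, can matter: records that the aggregation misses still reach the unified index, and holdings from every institution are attached to the shared record. The record layout, the match key (OCLC number when present, otherwise the local record number), the institution codes, and the union catalog contents are all assumptions for illustration.

```python
from collections import defaultdict


def merge_harvests(harvests):
    """Merge bibliographic and holdings records harvested from each local system.

    `harvests` maps an institution code to records exported from that
    institution's own Millennium installation. Records are matched on an
    assumed key (the OCLC number when present, otherwise the local record
    number), standing in for real de-duplication rules.
    """
    merged = {}
    for inst, records in harvests.items():
        for rec in records:
            key = f"ocm:{rec['oclc']}" if rec.get("oclc") else f"{inst}:{rec['local_id']}"
            entry = merged.setdefault(key, {"title": rec["title"],
                                            "holdings": defaultdict(list)})
            entry["holdings"][inst].extend(rec.get("items", []))
    return merged


harvests = {
    "inst_a": [{"oclc": "123", "local_id": "b1", "title": "Shared title", "items": ["c.1"]}],
    "inst_b": [{"oclc": "123", "local_id": "b9", "title": "Shared title", "items": ["c.1", "c.2"]},
               {"oclc": None, "local_id": "b10", "title": "Local archive", "items": ["box 1"]}],
}
union_catalog_keys = {"ocm:123"}  # hypothetical contents of the aggregated union catalog

merged = merge_harvests(harvests)
print(dict(merged["ocm:123"]["holdings"]))  # {'inst_a': ['c.1'], 'inst_b': ['c.1', 'c.2']}
print(sorted(set(merged) - union_catalog_keys))  # ['inst_b:b10']
```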