Drupal as the Foundation of Ohio Textbook Portal

At the end of last month, the Ohio Board of Regents announced the University System of Ohio Textbook Portal. The service has been talked about in the media, in trade publications, and in numerous blog postings. Enough time has passed now that word has gotten out, and I won’t be stealing any of the chancellor’s thunder about the project. I did the back-end development work for the portal and wrote this document as an introduction to the project for our development team and anyone else interested in the project.

The textbook portal is based on the Drupal (version 6) content management system. In particular, the portal makes heavy use of the search module to execute and format search results. If you are familiar with Drupal, the portal is different enough that you’ll still want to read this to see why some decisions were made. If you are not familiar with Drupal, this document will give you a head start on understanding the Drupal way of the world.

A couple of points before we start. First, before starting this project I had only a passing familiarity with PHP as a programming language and no experience with code development for Drupal.[1] Read the code with that frame of mind; if you have more experience in either of these areas and know of a better way to do something, please let me know and I will gratefully incorporate your suggestions into the code. Second, you can find the code in OhioLINK’s public Subversion repository and reference to it in OhioLINK’s public Trac project server, should you want to take a look at it yourself.

The code in the Subversion repository corresponds to everything under the /sites directory of a Drupal installation. In the basic Drupal installation, there are two subdirectories in this directory: all and default. In a multi-site Drupal installation, the “all” directory is supposed to hold modules/themes that are made available to all sites within an installation, while the “default” directory is intended for modules/themes for the “default site”. I’m using the distinction somewhat differently. Everything in the “all” directory is third-party modules and everything in the “default” directory is stuff I’ve created. It is an arbitrary, unnecessary distinction, but I think it will help with maintenance.

At a very high level, you can look at the Installation Documentation for the ETextbook Portal. This document was written from the perspective of a bare metal restore of the service. (Well, not quite: it assumes Ubuntu is installed on the server.) It lists the various applications and modules that need to be installed to get the site up and running. This should make a good checklist should you wish to reproduce the portal. Knowing how to install Drupal comes in handy, but the installation process itself is pretty easy.

If you follow the documentation up to the point of restoring the database, you’ll have a good foundation. But doing so will mean that there are several configuration options you’ll need to set that would otherwise be in the database backup. You’ll need to activate these modules:

eTextbook Metasearch
Integrates the results from the various textbook search modules. This corresponds to the Drupal node type “all”.
CourseSmart Search
Searches the CourseSmart eTextbook Database. This corresponds to the Drupal node type “csmart”.
OhioLINK E-Books Search
Searches the OhioLINK E-book Center. This corresponds to the Drupal node type “ebooks”.
OhioLINK Library Catalog Search
Searches the OhioLINK Central Catalog. This corresponds to the Drupal node type “libcat”.
Safari Search
Searches Safari Books Online. This corresponds to the Drupal node type “safari”.

Each of them has minor, but important, configuration parameters that you’ll need to set up under the Drupal installation’s /admin/settings path. In particular, the CourseSmart Search module will have parameters for the discount coupon code plus the username/password for the private API (the private API is discussed in the module-specific section below).

Structure of the Search Modules


Each of the search modules (CourseSmart, OhioLINK EBC, OhioLINK Library Catalog, and Safari) follows the same basic structure. (The “all” metasearch module is a little different and is covered below.) The outline, hooks followed by supporting functions, is:

function module_menu() {
  ...
}
 
 
function module_perm() {
  ...
}
 
 
function module_search($op = 'search', $keys = NULL) {
  ...
}
 
 
function module_form_alter(&$form, $form_state, $form_id) {
  ...
}
 
 
function module_search_process($keys) {
  ...
}
 
 
function module_format_result($item) {
  ...
}
 
 
function module_search_box_form_submit($form, &$form_state) {
  ...
}
 
 
function module_search_query($keys = '', $query = array(), $search = 'web', $version = 'v1') {
  ...
}

Some explanation for each of these:

  • module_menu() is a Drupal hook that defines the menu options for the settings screen. The code to generate the menus themselves will be in a file in the module called “module.admin.inc”.
  • module_perm() is a Drupal hook for defining the user permissions appropriate for this module. It isn’t really used in the portal. (The settings screens look for the “administer site configuration” user permission value.)
  • module_search() is a Drupal hook that defines a custom search routine for nodes of this type. The code pattern in other Drupal modules seems to be to use this as a level of indirection to a non-hook function, such as module_search_process().
  • module_form_alter() is a Drupal hook for changing the behavior of a form before it is rendered in the HTML back to the user. In conjunction with module_search_box_form_submit(), the code in this hook will turn FORM POST requests into pretty URLs.
  • module_search_process() is the function called by the module_search() function. This function prepares the query, including the pagination-of-results calculation, and calls another function, module_search_query(), to do the actual searching. We’re adding this level of indirection because the “metasearch” module will also call module_search_query() to get results, but it does so directly rather than going through module_search_process().
  • module_format_result() is called with information about the search hit, and formats it in a way that can be fed back into the Drupal search.module output engine. The issue here is that we’ve got fielded data (author, copyright year, publisher, and ISBN) that we want to display as fielded, but Drupal doesn’t give us a way to do that. Rather, Drupal’s standard search module is looking for an array with keys for ‘title’ of the hit, ‘link’ of the hit, and a ‘snippet’ to display to give the user context for the result. (See the “Return Value” heading of the hook_search API documentation.) So this module will create a snippet of HTML that builds a nice display of the fielded data.
  • module_search_box_form_submit(), in conjunction with module_form_alter(), forms the callback to turn FORM POST requests into pretty URLs.
  • module_search_query() performs whatever functions are required to get hits from the remote service. This, of course, is the real heart of what we’re doing. Rather than searching text of nodes internal to Drupal, this function will return an array of results that comes from a query of a remote service. The array returned has two elements: ‘total’ — an integer representing the total number of hits for the query, and ‘items’ — an array of individual hits from this search.
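
The division of labor among the last few functions can be sketched in plain PHP. This is a minimal illustration, not the portal's code: the example_* function names, the sample record, and the use of htmlspecialchars() in place of Drupal's own escaping are all assumptions, and the pagination step is left out.

```php
<?php
// Hypothetical stand-in for module_search_query(). A real module would call
// the remote service here; this sample record is invented for illustration.
function example_search_query($keys = '') {
  $items = array(
    array(
      'title' => 'Sample Textbook',
      'link' => 'http://example.org/book/1',
      'author' => 'A. Author',
      'year' => '2008',
      'publisher' => 'Example Press',
      'isbn' => '9780000000002',
    ),
  );
  // The two-element shape described above: a hit count plus the hits.
  return array('total' => count($items), 'items' => $items);
}

// Hypothetical stand-in for module_format_result(). The fielded data is
// folded into the 'snippet' string, because the search output only knows
// about 'title', 'link', and 'snippet'.
function example_format_result($item) {
  $labels = array(
    'author' => 'Author',
    'year' => 'Copyright year',
    'publisher' => 'Publisher',
    'isbn' => 'ISBN',
  );
  $rows = array();
  foreach ($labels as $field => $label) {
    if (!empty($item[$field])) {
      $rows[] = '<dt>' . $label . '</dt><dd>' . htmlspecialchars($item[$field]) . '</dd>';
    }
  }
  return array(
    'title' => $item['title'],
    'link' => $item['link'],
    'snippet' => '<dl>' . implode('', $rows) . '</dl>',
  );
}

// Hypothetical stand-in for module_search_process(): run the query, then
// format each hit. Pagination is omitted from this sketch.
function example_search_process($keys) {
  $found = example_search_query($keys);
  return array_map('example_format_result', $found['items']);
}

$results = example_search_process('sample');
echo $results[0]['snippet'], "\n";
```

Keeping the query function separate from the formatting and paging logic is what lets a second caller, such as the metasearch module, reuse the raw 'total' and 'items' array.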

Module-specific details


Although each of the search modules follows this general code pattern, they each have their idiosyncrasies.

CourseSmart is probably the simplest module of the bunch and a good place to start when looking at the code. Note that we are using the private API (appending md=1 to the end of the URL) in order to get the ISBNs as listed on the CourseSmart website. Calls to the private API are restricted to particular IP addresses, so in order to use it you’ll need to contact CourseSmart. CourseSmart is also a little funky in that they will return items in their inventory that they won’t sell. These items are designated with an esubscription price of $0, and are filtered out in the module_search_process() function.
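
The filtering of unsellable items might look something like the following. This is only a sketch: the function name and the 'esubscription_price' array key are assumptions made for illustration, not the field names used by the CourseSmart API or by the module.

```php
<?php
// Hypothetical filter: drop items whose e-subscription price is zero.
// The key 'esubscription_price' is an assumed name for illustration.
// Items with no price at all are also dropped, which is an assumption
// of this sketch rather than documented behavior.
function example_filter_unsold(array $items) {
  $kept = array_filter($items, function ($item) {
    return isset($item['esubscription_price'])
      && (float) $item['esubscription_price'] > 0;
  });
  // Re-index so the result is a plain list again.
  return array_values($kept);
}

$items = array(
  array('title' => 'For sale', 'esubscription_price' => '54.50'),
  array('title' => 'Not for sale', 'esubscription_price' => '0.00'),
);
foreach (example_filter_unsold($items) as $item) {
  echo $item['title'], "\n";
}
```

Note that filtering after the remote call means the count of displayed hits can be smaller than the total the service reports.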

OhioLINK EBook Center uses the SRU interface to the underlying XTF installation in order to get search results out. The search results come back in an XML document returned with multiple namespaces, which complicates somewhat the DOM parsing of that document. Basically, it means one has to register the namespaces with the XPath processor and take them into account when using XPath to pull out elements for formatting the result record.
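
As a rough illustration of the namespace handling, here is a minimal sketch using PHP's DOMXPath. The sample response is invented, and the assumption that the records carry Dublin Core elements inside the standard SRU namespace is mine; the actual XTF output may be laid out differently.

```php
<?php
// Invented SRU-style response; the record content is made up.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/"
                            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <srw:numberOfRecords>1</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordData>
        <dc:title>Sample E-Book</dc:title>
        <dc:creator>A. Author</dc:creator>
      </srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
XML;

// Hypothetical parser returning the 'total' and 'items' shape.
function example_parse_sru($xml) {
  $doc = new DOMDocument();
  $doc->loadXML($xml);

  // Each namespace must be registered under a prefix before XPath
  // expressions can address elements in it.
  $xpath = new DOMXPath($doc);
  $xpath->registerNamespace('srw', 'http://www.loc.gov/zing/srw/');
  $xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');

  $total = (int) $xpath->evaluate('string(//srw:numberOfRecords)');
  $items = array();
  foreach ($xpath->query('//srw:record') as $record) {
    // The second argument makes the expression relative to this record.
    $items[] = array(
      'title' => $xpath->evaluate('string(.//dc:title)', $record),
      'author' => $xpath->evaluate('string(.//dc:creator)', $record),
    );
  }
  return array('total' => $total, 'items' => $items);
}

$parsed = example_parse_sru($xml);
echo $parsed['items'][0]['title'], "\n";
```

The prefixes registered with the XPath processor do not have to match the prefixes used in the document; only the namespace URIs have to agree.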

OhioLINK Library Catalog uses the Shrew PHP class created by David Walker at California State University. Shrew hacks through the MARC display of records for an Innovative Interfaces WebPAC and returns a MARCXML document. Without this, I’d really be stuck as to how to efficiently get the library catalog search results into the portal. I’m grateful to him for releasing the code at exactly the right time and to Rob Casson at Miami who pointed me in David’s direction when I was considering having to write the Shrew-equivalent myself.

Safari Books Online is using the same underlying engine as CourseSmart to deliver materials, so the search module is very similar.

Structure of the Metasearch Module


The eTextbook Metasearch module (a.k.a. “all”) is structured very similarly to the other search modules, but deviates in several important ways.

  • When the Drupal all_search hook is called with the ‘search’ operation parameter, it returns a results array with exactly one “result”, with the search keys as the item. What we’re really doing is faking out the Drupal Search module into thinking that there are actually results so we can get to the all_search_page() hook. If we didn’t set the number of results to a value greater than zero, Drupal would display the “no hits found” message for us (which we don’t want it to do).
  • The undocumented hook_search_page(), when defined for a module, is called by Drupal rather than using the built-in internal search results page. (The other modules use the built-in results page rendering.) We implement the hook as all_search_page(), and that function calls each of the module_search_query() functions for the four remote sources in sequence. The results are then put into an output block and the block is returned to the calling core code.
  • “all” also contains several utility functions used by the other modules. all_parse_keys() will look at the user’s search string for ISBN values and return the user’s search string as an array of an ISBN and everything else. all_proxyify_url() will determine whether a user is outside of a campus network and prepend the OhioLINK proxy server string to the URL.
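
To make the two utility functions more concrete, here is a hedged sketch of what functions along these lines could look like. It is not the code of all_parse_keys() or all_proxyify_url(): the example_* names, the return keys, the ISBN pattern (which checks only length and characters, not the check digit), and the idea of passing in the on-campus flag and proxy prefix are all assumptions.

```php
<?php
// Hypothetical take on splitting a search string into an ISBN (if any)
// and everything else. Only the first ISBN-looking word is captured.
function example_parse_keys($keys) {
  $isbn = '';
  $rest = array();
  foreach (preg_split('/\s+/', trim($keys), -1, PREG_SPLIT_NO_EMPTY) as $word) {
    // Strip hyphens, then accept 13 digits, or 9 digits plus a digit or X.
    $bare = str_replace('-', '', $word);
    if ($isbn === '' && preg_match('/^(\d{13}|\d{9}[\dXx])$/', $bare)) {
      $isbn = strtoupper($bare);
    }
    else {
      $rest[] = $word;
    }
  }
  return array('isbn' => $isbn, 'keywords' => implode(' ', $rest));
}

// Hypothetical take on prepending a proxy string for off-campus users.
// The on-campus test and the proxy prefix are passed in, because the
// real values are not given in this document.
function example_proxyify_url($url, $on_campus, $proxy_prefix) {
  return $on_campus ? $url : $proxy_prefix . $url;
}

$parsed = example_parse_keys('calculus 978-0-00-000000-2');
echo $parsed['isbn'], ' / ', $parsed['keywords'], "\n";
```

Separating the ISBN from the remaining words lets each search module decide whether to send the remote service an ISBN lookup or a keyword query.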

Plans for Enhancements


Some ideas and plans for making this better.

  • We want to include bookstores in the search results. In particular, where possible, we’d like to search the bookstore’s inventory control system and display results right in the metasearch results.
  • For the metasearch results, each of the target remote services is called in sequence. Ideally, the four services would be called in parallel. Even better, perhaps, would be to render the base page, then inject search results from the remote services via AJAX as they become available.

The text was modified to update a link from http://drupal.org/getting-started/6 to http://drupal.org/getting-started on January 28th, 2011.

The text was modified to update a link from http://www.ljndawson.com/permalink/2008/09/03/USO_and_CourseSmart.html to http://web.archive.org/web/20081211220047/http://www.ljndawson.com/permalink/2008/09/03/USO_and_CourseSmart.html on November 13th, 2012.

Footnotes

  1. These may seem like odd choices to make for a project that had a short conception-to-production timeline, but a) there were already some helpful pieces written in PHP that sped development of some aspects of the portal, and b) I thought drinking the Kool-Aid of Drupal would be a good way to see what it was all about.
(This post was updated on 13-Nov-2012.)