Analysis of PubGet -- An Expedited Fulltext Service for Life Science Journal Articles

by Peter E. Murray · 6 minutes reading time

In June, PubGet, a new service that speeds access to life sciences literature, reached a milestone: it activated its 50th institution. PubGet reduces the number of clicks needed to reach the full text of an article. Using its own proprietary "pathing engine", it links directly to the full text on the publisher's website. It does this by understanding the link structure for each journal of each publisher and constructing the link to the full text from information in the citation. The service focuses on the life sciences journals indexed in PubMed -- hence the play on names: PubMed to PubGet.

How It Works

[Figure: Typical View of a PubGet Article Display]

[Figures: OLINKS screen, link resolver screen, and EBSCOhost screen for the Christensen article]

In a typical interaction, a user starts at a web page with a journal article citation that has a link to the user's OpenURL resolver. That link contains the citation metadata that identifies the specific article. Clicking it leads to the OpenURL resolver web page for that article, which contains links to any online versions of the article and might also include links to library catalog records for physical copies and options to search for similar articles. An example of one of these pages is this one from my place of work for an article by Clayton Christensen in the Harvard Business Review. (When you are coming from an OhioLINK member institution, it looks like the screen image to the right.) Clicking the link that says "Full text of this article at EBSCO" takes you to yet another page -- this time from EBSCOhost -- that repeats the citation data and offers options for viewing or taking other actions on the article. Once there, it is one more click to the HTML or PDF full text of the article.

From the perspective of the creators of PubGet, that is two clicks and two screens too many. PubGet's pathing engine knows the structure of links on each publisher's website, so it creates a link directly from the citation in the search results list to the article PDF.
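PubGet's pathing engine is proprietary, so its internals are not public. As a rough illustration of the general idea only, the sketch below fills a per-journal URL template from citation fields. The template table, the ISSN, the `publisher.example` URL pattern, and the `direct_link` function are all hypothetical.

```python
# Hypothetical per-journal link templates keyed by ISSN.
# Real publisher URL schemes differ and would have to be learned per journal.
LINK_TEMPLATES = {
    "1234-5678": "https://publisher.example/content/{volume}/{issue}/{spage}.full.pdf",
}


def direct_link(citation):
    """Return a direct full-text URL for a citation dict, or None if it cannot be built."""
    template = LINK_TEMPLATES.get(citation.get("issn"))
    if template is None:
        return None  # no known link structure for this journal
    try:
        return template.format(**citation)
    except KeyError:
        return None  # the citation lacks a field the template needs


citation = {"issn": "1234-5678", "volume": "12", "issue": "3", "spage": "45"}
print(direct_link(citation))
```

The point of the sketch is that once the link structure of a journal is known, the citation alone is enough to build the URL, with no intermediate screens.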

The pathing engine is one of three components that make up the service. The other two are a search engine and a personalization feature. The search engine indexes the citation and abstract fields; it is not nearly as sophisticated as the thesaurus-driven search engine native to PubMed, but it does the job when you have a known citation. The personalization feature lets you tie your PubGet account to an institution. With that knowledge, the service knows exactly what digital rights your institution has for each journal and can create links to the full-text article that go through your institution's proxy server. The account system also lets you have new articles matching your search criteria sent to you and mark articles in the search results for later bulk downloading. (Since the articles are not held within the PubGet service itself, bulk downloading requires a Firefox plugin so that the article requests go from your browser to the publisher's site.)
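How PubGet stores institutional rights and builds proxied links is not documented here. A minimal sketch of the idea, assuming a prefix-style proxy that takes the target URL as a query parameter, might look like this. The institution record, the proxy prefix, and the `link_for_user` function are hypothetical.

```python
from urllib.parse import quote

# Hypothetical institution records: a proxy prefix plus the ISSNs the library licenses.
INSTITUTIONS = {
    "example-university": {
        "proxy_prefix": "https://proxy.example.edu/login?url=",
        "licensed_issns": {"1234-5678"},
    },
}


def link_for_user(institution_id, issn, full_text_url):
    """Return a proxied full-text link, or None if the institution has no rights."""
    institution = INSTITUTIONS.get(institution_id)
    if institution is None or issn not in institution["licensed_issns"]:
        return None
    # Percent-encode the target so it survives as a single query parameter value.
    return institution["proxy_prefix"] + quote(full_text_url, safe="")


print(link_for_user("example-university", "1234-5678", "https://publisher.example/a.pdf"))
```

Whether a given proxy expects the target URL encoded or unencoded is an assumption to check against that proxy's own documentation.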

Thinking About PubGet in a Wider Information Ecosystem

One quandary I have with PubGet is that it bypasses OpenURL as the open standard for linking to full-text content. To take advantage of PubGet's unique characteristic -- the pathing engine to get straight to the article text -- you need to start at the PubGet site itself in order to get the direct URLs to the articles. This is a pretty significant downside to the service: you can't combine the pathing engine with the powerful PubMed search engine.

It would be nice if PubGet could be set up as an OpenURL target that, when it receives a request, translates it to the direct link to the full text using its pathing engine. That way I could set up PubGet as the OpenURL resolver in my PubMed account, and the article links in PubMed would automatically go to the full text. I don't know how this would work as a business model for PubGet, though, because acting as an OpenURL resolver in this manner makes the PubGet website invisible -- through a series of browser redirects I'd go from PubMed to PubGet to the publisher site. (If the point is to make money selling advertising for related industries, a configuration that completely bypasses any visible signs of PubGet would cut into that revenue source.)
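To make the redirect idea concrete, here is a minimal sketch of what such an OpenURL target might do with an incoming request. It is not PubGet's implementation: the template table, the `publisher.example` URL, and the `resolve` function are hypothetical, and only the `rft.issn`, `rft.volume`, `rft.issue`, and `rft.spage` keys are read from the query string.

```python
from urllib.parse import parse_qs

# Hypothetical link template for one journal, keyed by ISSN.
TEMPLATES = {
    "1234-5678": "https://publisher.example/content/{volume}/{issue}/{spage}.full.pdf",
}


def resolve(query_string):
    """Map an OpenURL key/value query string to (status, headers) for an HTTP response."""
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    template = TEMPLATES.get(params.get("rft.issn"))
    fields = {name: params.get("rft." + name) for name in ("volume", "issue", "spage")}
    if template is None or None in fields.values():
        return 404, {}  # unknown journal or incomplete citation
    # Redirect the browser straight to the publisher's full text.
    return 302, {"Location": template.format(**fields)}


print(resolve("rft.issn=1234-5678&rft.volume=12&rft.issue=3&rft.spage=45"))
```

Because the response is only a redirect, the user never sees a page from the resolver, which is exactly the visibility problem described above.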

As I was sharing background on PubGet with Thomas Dowling, a colleague at OhioLINK, he pointed out something I didn't know about OpenURL: it is within the standard to specify a "service type" in the OpenURL Context Object. Section 5.1 of the NISO standard for OpenURL says a service type is "The resource that defines the type of service (pertaining to the Referent) that is requested." And there are indeed service types registered as part of the SAP2 Community Profile: abstract, citation, fulltext, holdings, ill, and any. So the recipient of an OpenURL request, using the "fulltext" Service Type, should be able to replicate the proprietary PubGet pathing engine using a standard OpenURL structure. In a brief bit of experimentation, though, I was not able to find an OpenURL resolver that a) knew how to handle a Service Type parameter, and/or b) knew how to honor that parameter by getting directly to the full text. As OpenURL undergoes its 5-year review this year, it might be worthwhile to emphasize this part of the standard with examples and descriptions of best practices so it is more widely adopted.
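For illustration, a request that asks for the "fulltext" service type could be built along these lines. This is a sketch under assumptions: the resolver base URL is hypothetical, and the `svc_val_fmt` and `svc.fulltext` key names assume the key/value encoding of the registered service-type format, so they should be verified against the standard and its registry before use.

```python
from urllib.parse import urlencode


def fulltext_openurl(resolver_base, citation):
    """Build an OpenURL that describes a journal article and requests its full text."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    # Referent (article) metadata, e.g. rft.issn, rft.volume, rft.spage.
    params += [("rft." + key, value) for key, value in citation.items()]
    # Service type: assumed key names for the registered "fulltext" service.
    params += [
        ("svc_val_fmt", "info:ofi/fmt:kev:mtx:sch_svc"),
        ("svc.fulltext", "yes"),
    ]
    return resolver_base + "?" + urlencode(params)


url = fulltext_openurl(
    "https://resolver.example.edu/openurl",  # hypothetical resolver address
    {"issn": "1234-5678", "volume": "12", "issue": "3", "spage": "45"},
)
print(url)
```

A resolver that honored the service type would answer such a request with the full text itself rather than a menu page.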

Other Articles on PubGet