Interesting Shibboleth Use Case: Enforcing Geographic Restrictions

Posted on 4 minute read

This article was imported from this blog's previous content management system (WordPress), and may have errors in formatting and functionality. If you find these errors are a significant barrier to understanding the article, please let me know.

Last month's HathiTrust newsletter had an interesting technical tidbit at the top about access to out-of-print and brittle or missing items:

One of the lawful uses of in-copyright works HathiTrust has been pursuing is to provide access on an institutional basis to works that fall under United States Copyright Law Section 108 conditions: works in HathiTrust that are not available on the market at a fair price, and for which print copies owned by HathiTrust member institutions are damaged, deteriorating, lost or stolen. As a part of becoming a member, institutions are required to submit information about their print holdings for fee calculation purposes. We have also been requesting information about the holdings status and condition of works, to facilitate uses of works where permissible by law (specifications for HathiTrust holdings data are available at http://www.hathitrust.org/print_holdings).

As of December 2012, we are using the holdings status and condition information submitted by United States member institutions, in combination with information about the market availability of works stored in the HathiTrust rights database, to determine whether or not access to applicable in-copyright works in HathiTrust is allowed. The specific terms of access are as follows:

  • Access is only available to users affiliated with HathiTrust member institutions in the United States, and only from U.S. soil.
  • In order to gain access, users from member institutions must be authenticated into HathiTrust via Shibboleth using their institutional login.
  • Print copies of the works in HathiTrust must be owned currently or have been owned previously by the institution’s library system.
  • The number of users who can access a given digital copy at a time is determined by the number of print copies held (or previously held) in the library system. If a library system only has one print copy, only one user at a time will be able to access the digital copy.

A general scenario for how out of print determinations are made and communicated to HathiTrust is available in the HathiTrust rights database documentation: http://www.hathitrust.org/rights_database#op. Additional information on the service is available at http://www.hathitrust.org/out-of-print-brittle.

It is the first three conditions (in the first two bullets) that I find interesting: that access is only available to affiliated users, that access is available only from "U.S. soil", and that users must authenticate using a HathiTrust member institution's Shibboleth identity provider. The only way I can think of for HathiTrust to enforce the first two conditions is to use Shibboleth. Only through Shibboleth would HathiTrust have assurance that the user is a member of the community and is at a particular place: the identity provider vouches for the user's affiliation, and because the user's browser connects to HathiTrust directly, HathiTrust sees the user's own IP address and can geolocate it. (Let's set aside for a moment the relatively trivial ways that IP address geolocation can be fooled: VPN services, web proxies, etc. If you want to know more, just Google "how to bypass geographical restrictions".) Libraries more commonly use rewriting proxy servers, like EZproxy, to facilitate access to restricted or licensed material. Rewriting proxy servers effectively hide the location of the user, because to HathiTrust the user's location would appear to be wherever the proxy server is.
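
To make the moving parts concrete, here is a minimal sketch in Python of how a service might fold the newsletter's four terms of access into one decision. Everything in it is an assumption for illustration: the names `AccessRequest` and `may_open`, the shape of the holdings and session tables, and the `country_of_ip` callback are all hypothetical, and none of it reflects HathiTrust's actual code. The country lookup in particular is a stand-in for whatever IP geolocation database a real service would consult.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

# (institution, volume_id) -> a count; hypothetical shape for holdings and sessions.
HoldingsKey = Tuple[str, str]


@dataclass(frozen=True)
class AccessRequest:
    authenticated_via_shibboleth: bool
    institution: str   # hypothetical member identifier, e.g. "example.edu"
    client_ip: str     # the address the service sees for this connection
    volume_id: str


def may_open(
    request: AccessRequest,
    us_members: Set[str],
    print_copies: Dict[HoldingsKey, int],
    active_readers: Dict[HoldingsKey, int],
    country_of_ip: Callable[[str], str],
) -> bool:
    """Return True only if all four terms from the newsletter are met."""
    # The user must have logged in through the institution's identity provider.
    if not request.authenticated_via_shibboleth:
        return False
    # The institution must be a U.S. member.
    if request.institution not in us_members:
        return False
    # The request must come from a U.S. address. This is only meaningful
    # when the service sees the user's own IP rather than a proxy's.
    if country_of_ip(request.client_ip) != "US":
        return False
    key = (request.institution, request.volume_id)
    # The library system must hold (or have held) at least one print copy.
    copies = print_copies.get(key, 0)
    if copies == 0:
        return False
    # Simultaneous readers are capped at the number of print copies.
    return active_readers.get(key, 0) < copies
```

Note where a rewriting proxy would break this: `client_ip` would be the proxy's address, so the geolocation step would be checking the proxy's location instead of the reader's.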

I dug a little deeper to see if I could find a definition of "affiliated" -- does it mean "only students, faculty and staff" or does it include looser forms of affiliation like "alumni" or "parent" or "guest"? One of the great strengths of Shibboleth (generally) and of identity management federations like InCommon (specifically) is that they have fairly rigorous definitions of "member" and "affiliated" -- piggybacking on the eduPerson eduPersonAffiliation attribute definition. I didn't find a firm linkage to those defined eduPerson terms, but I did find an interesting declaration in HathiTrust Digital Library Access and Use Policies: "Users must be authenticated members of a HathiTrust institution or individuals using a computer on a HathiTrust institution's library premises." That would seem to make the Shibboleth requirement redundant in cases where access comes from an on-campus IP address, and at the same time make the question about the definition of affiliation moot -- by that statement, anyone using a library terminal would have access even if they weren't otherwise a member of the campus community. Hmmm, I wonder how they are resolving that contradiction?
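
The gap between the two readings is easy to see in a small Python sketch. The vocabulary below is the controlled list of values the eduPerson specification permits for eduPersonAffiliation. The two policy functions, and the choice of which values count as "affiliated" under a strict reading, are assumptions made for illustration and are not HathiTrust's documented mapping.

```python
from typing import Iterable

# Permissible eduPersonAffiliation values per the eduPerson specification.
EDUPERSON_AFFILIATIONS = frozenset({
    "faculty", "student", "staff", "alum",
    "member", "affiliate", "employee", "library-walk-in",
})

# Assumption: a strict reading of "affiliated" admits only these values.
STRICT_AFFILIATED = frozenset({"faculty", "student", "staff", "employee", "member"})


def strict_policy(affiliations: Iterable[str]) -> bool:
    """Access only for users whose asserted affiliations include a strict value."""
    return any(a in STRICT_AFFILIATED for a in affiliations)


def premises_policy(affiliations: Iterable[str], on_library_premises: bool) -> bool:
    """The Access and Use Policies wording: authenticated member OR on premises."""
    return on_library_premises or strict_policy(affiliations)
```

Under these assumptions, `strict_policy(["alum"])` is false, while `premises_policy([], True)` is true: a walk-in reader with no asserted affiliation at all gets in under the second rule, which is exactly the contradiction described above.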

Digging further, I found the HathiTrust Shibboleth technical details page where they talk about the kinds of attributes required to use the service. They do require eduPersonScopedAffiliation, so they can see the types of membership someone has with an institution. (eduPersonScopedAffiliation is nearly the same as eduPersonAffiliation; it just tacks "@" and the institution's scope, typically its domain name, on the end, as in "member@example.edu".) It is also refreshing that the only other element they require is eduPersonTargetedID -- the "persistent, non-reassigned, privacy-preserving identifier" known only to the institution and the service. (The eduPerson definition goes on to say: "This attribute is designed to preserve the principal's privacy and inhibit the ability of multiple unrelated services from correlating principal activity by comparing values. It is therefore REQUIRED to be opaque, having no particular relationship to the principal's other identifiers, such as a username or eduPersonPrincipalName. It SHOULD be considerably difficult for an observer to guess the value that would be returned to a given service provider.") It is great to see HathiTrust using the privacy-enhancing aspects of Shibboleth as they were meant to be used. Because they are using eduPersonTargetedID, a prosecuting party would need to subpoena records from both HathiTrust (to get the eduPersonTargetedID of the person they were interested in) and the member institution (to see who that eduPersonTargetedID was assigned to) to pin research activities to a specific individual.
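
As a rough illustration of those two attributes, the Python sketch below splits a scoped affiliation into its parts and derives an opaque, per-service identifier from a secret held only by the institution. The function names and the sample entity IDs are hypothetical, and HMAC-SHA256 is a stand-in chosen for the example; it is not necessarily how any particular identity provider computes eduPersonTargetedID.

```python
import hashlib
import hmac
from typing import Tuple


def parse_scoped_affiliation(value: str) -> Tuple[str, str]:
    """Split a value like 'member@example.edu' into (affiliation, scope)."""
    affiliation, separator, scope = value.partition("@")
    if not separator or not affiliation or not scope:
        raise ValueError(f"not a scoped affiliation: {value!r}")
    return affiliation, scope


def opaque_id(institution_secret: bytes, local_user: str, service_entity_id: str) -> str:
    """Derive a stable identifier that differs for every (user, service) pair.

    Assumption: a keyed hash stands in for the identity provider's real
    algorithm. Without the institution's secret, the value cannot be tied
    back to local_user or compared across services.
    """
    message = f"{service_entity_id}!{local_user}".encode("utf-8")
    return hmac.new(institution_secret, message, hashlib.sha256).hexdigest()
```

For example, `opaque_id(secret, "jdoe", "https://sp-one.example.org")` and `opaque_id(secret, "jdoe", "https://sp-two.example.org")` return different strings, so two services cannot correlate the same reader by comparing values, and the service alone cannot work backwards to "jdoe". That is the property behind the two-subpoena point above.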