Encryption of Patron Data in Modern Integrated Library Systems

“How much effort do you want to spend securing your computer systems? Well, how much do you not want to be in front of a reporter’s microphone if a security breach happens?” I don’t remember the exact words, but that quote strongly resembles something I said to a boss at a previous job. Securing systems is unglamorous detail work. One slip-up plus one persistent (or lucky) attacker means years of dedicated effort are all for naught as personal information is inadvertently released. See, for example, Sony Online Entertainment’s recent troubles.

It was in that frame of mind that I responded to a series of questions from a librarian taking a computer science class. (As someone who also straddles the computer-science/library-science divide, I wanted to encourage this line of thinking!) Library systems typically don’t hold credit card information, so they may not be as attractive to individuals who seek to expose or exploit personal information. But our systems do have physical addresses, e-mail addresses, and sometimes birthdays or other personal data. And we have a professional ethic to keep patron use information private.

The person who sent me these questions asked that I not mention a name or affiliation, but said it was okay for me to repost the questions along with my replies. I’m hoping this encourages some discussion, because my understanding of the use of encryption in ILS products is very narrow and only somewhat deep (and is getting shallower by the day, as my direct experience is now going on ten years old).

Background on the project: during our encryption unit, I realized that I didn’t know anything about what libraries do to back up our strongly stated policies about protecting patron privacy, so I wanted to find out more about it.

Questions:

  1. What encryption tools/standards, if any, are used to safeguard patron accounts (name, items checked out, databases accessed, etc.) at the library?
  2. Where in the systems do these tools typically fit — at the ILS level, or somewhere else? (e.g., university ID systems)
  3. How are circulation and other records expunged? I.e., are they permanently deleted in such a way that hard drive forensics couldn’t bring them back?

In my experience, this patron information is not encrypted in integrated library systems. The difficulty is that if those bits of information are encrypted, they must be decrypted by the program in order to be useful (generating an overdue notice means the patron’s address must be known to the program, displaying the patron’s name on his/her account information screen means the name must be decrypted, etc.). And for programs to decrypt, they must have the secret key. And if the programs know the secret key, it is relatively easy for an attacker who gets into the system to get the key as well. Encryption also adds work: there would be additional system load from all of the encryption and decryption of bits of information. (Good encryption is computationally “expensive” in a different sense: it makes guessing the correct key prohibitively expensive for an attacker.)
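To make the key problem concrete, here is a minimal sketch of field-level encryption done in the application. It is a toy built only from Python’s standard library (HMAC-SHA256 used as a keystream in counter mode, plus an HMAC tag over the ciphertext), not something to deploy; a real system would use a vetted authenticated cipher such as AES-GCM from a maintained library. The key names and the `overdue_notice` helper are hypothetical. The point is in the last function: the code that prints the notice cannot do its job without holding the key.

```python
import hashlib
import hmac
import secrets

# Hypothetical application-held keys. Whatever code prints overdue notices
# must be able to read these, which is exactly the weakness described above.
ENC_KEY = secrets.token_bytes(32)
MAC_KEY = secrets.token_bytes(32)


def _keystream(nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(ENC_KEY, nonce || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = nonce + counter.to_bytes(8, "big")
        out.extend(hmac.new(ENC_KEY, msg, hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def encrypt_field(plaintext: str) -> bytes:
    """Return nonce || ciphertext || tag, i.e. what the database column holds."""
    data = plaintext.encode("utf-8")
    nonce = secrets.token_bytes(16)  # fresh per value, so equal inputs differ
    ct = bytes(a ^ b for a, b in zip(data, _keystream(nonce, len(data))))
    tag = hmac.new(MAC_KEY, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def decrypt_field(blob: bytes) -> str:
    """Verify the tag, then undo the XOR. Impossible without both keys."""
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(MAC_KEY, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("field was tampered with or the key is wrong")
    plain = bytes(a ^ b for a, b in zip(ct, _keystream(nonce, len(ct))))
    return plain.decode("utf-8")


def overdue_notice(stored_email: bytes) -> str:
    # Generating the notice requires decrypting, which requires the key.
    return f"To: {decrypt_field(stored_email)}\nSubject: Overdue item"
```

Usage: `stored = encrypt_field("patron@example.org")` produces unreadable bytes for the database, and `overdue_notice(stored)` recovers the address only because the same process holds `ENC_KEY` and `MAC_KEY`. An attacker who can read those keys can do the same.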

Password Hashing Flowchart

Note that passwords are a special case. Passwords are not really encrypted in a database; rather, the output of a “one-way hash” algorithm is stored. When the user tries to log in, the same one-way hash algorithm is applied to the text string entered as a password, and if the output matches what is stored in the database, the user is let in.

As the diagram shows, during a login attempt the stored hashed password is never decrypted; instead, the output of the hash algorithm is compared to what is known to be the hashed password.
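The hash-and-compare flow can be sketched in a few lines of Python. This goes slightly beyond a plain one-way hash, following common current practice: a random per-user salt (so identical passwords produce different stored values) and a deliberately slow key-derivation function, PBKDF2-HMAC-SHA256, which `hashlib` provides. The iteration count is an assumed work factor, not a recommendation for any particular ILS.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor: slower for attackers and for logins


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); both are stored, the password itself never is."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(attempt: str, salt: bytes, stored: bytes) -> bool:
    """Hash the attempt the same way and compare; nothing is ever decrypted."""
    candidate = hashlib.pbkdf2_hmac("sha256", attempt.encode("utf-8"), salt, ITERATIONS)
    # compare_digest avoids leaking how many leading bytes matched via timing
    return hmac.compare_digest(candidate, stored)
```

At account creation the system stores the `(salt, digest)` pair; at login it calls `check_password` with what the user typed. The stored digest is only ever compared, matching the flowchart.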

[Aside: I'm trying an experiment in this post. The diagram is a Scalable Vector Graphic (SVG) file. It seems to be showing up fine in the browsers I'm testing, but I have no idea how it will appear in the RSS feed or if you are using an RSS reader or receiving this post via FeedBurner e-mail. If you don't see the graphic, try viewing the post via the DLTJ website.]

The most effective encryption would be at the database management system layer. For instance, Oracle has a “Transparent Data Encryption” feature: “Data is automatically encrypted when it is written to disk and automatically decrypted when accessed by the application.” At the time of writing, automatic encryption was not built into MySQL, but you can use MySQL-specific functions such as AES_ENCRYPT() to encrypt a field. PostgreSQL has a contributed module (pgcrypto) that performs the same function.

Another option, other than database-level encryption, is to have the operating system encrypt the underlying filesystem (for example, the Red Hat Encrypted Filesystem). That way all of the database storage files in that filesystem would be encrypted.

Note, though, that in any of these cases, the key is known to the computer somehow, and so it is possible for an attacker to recover the key and decrypt the data. There are, of course, varying levels of obscurity one can apply to the key, but I think we’re getting pretty far off on a tangent.

How, and how often, circulation and other records are expunged depends on each software system’s implementation, but in general I don’t think a strong deletion mechanism is used to obliterate data on the disk. I’d be happy to be proven wrong. And as you consider hard drive forensics, also think about pulling the same information off backup tapes; that would probably be easier to get to.
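The difference between plain deletion and zeroing out can be demonstrated with SQLite, used here only as a convenient stand-in because it ships with Python; it is not what ILS products run on, and other database engines behave differently. SQLite’s `secure_delete` pragma controls whether deleted content is overwritten with zeros or merely marked as free space. The table name and patron string are hypothetical.

```python
import os
import sqlite3
import tempfile


def record_survives_delete(secure_delete: bool) -> bool:
    """Insert a patron row, delete it, and report whether its text is still
    present in the raw database file on disk."""
    marker = "patron-jane-doe-555-0100"  # hypothetical patron data
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ils.db")
        conn = sqlite3.connect(path)
        # OFF: freed space is only marked free, so the old bytes stay put.
        # ON: SQLite overwrites deleted content with zeros.
        conn.execute(f"PRAGMA secure_delete = {'ON' if secure_delete else 'OFF'}")
        conn.execute("CREATE TABLE patrons (id INTEGER PRIMARY KEY, name TEXT)")
        with conn:  # commits the inserts
            conn.execute("INSERT INTO patrons (name) VALUES (?)", (marker,))
            conn.execute("INSERT INTO patrons (name) VALUES (?)", ("other patron",))
        with conn:  # commits the delete
            conn.execute("DELETE FROM patrons WHERE name = ?", (marker,))
        conn.close()
        with open(path, "rb") as f:
            raw = f.read()
    return marker.encode("utf-8") in raw
```

With `secure_delete` off, the "deleted" patron’s text can still be found by reading the file’s bytes; with it on, it cannot. Even then, copies can survive elsewhere: in rollback journals or write-ahead logs, in blocks the filesystem or an SSD has remapped, and on backup tapes.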

In a follow-up, I was asked:

WRT your response on Q2, do you have an idea at what level “most” or “some” libraries might have encryption, or were you speaking purely from a view of what ideal/good situations might look like?

On 3, I have heard from a few others that there seems to be just deletion with no zeroing out features or the like and that it does take a period of time (1-2 months) for backup tapes to be overwritten. So it strikes me that the weakest link may be in the area we talk most about protecting.

With regard to the database-level or the filesystem-level encryption, I was speaking from the point of view of what ideal/good situations might look like. One of the outcomes of posting these questions to a wider group of readers is, I hope, more real-world experience reports from people who run systems that actually do this.

Yes, I think those are weak links, with the backup tapes being the biggest problem. One can’t predict when blocks on a live filesystem disk will be overwritten, but tape rotation is pretty predictable, and getting data off a tape is easy because one doesn’t need access to the live system.
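That predictability can be written down as simple date arithmetic. The sketch below assumes a hypothetical rotation scheme (one full backup per night, each tape overwritten exactly `retention_days` after it was written); the 60 days in the example is just the upper end of the 1-2 months mentioned above, not a measured figure.

```python
from datetime import date, timedelta


def last_tape_copy_overwritten(deleted_on: date, retention_days: int) -> date:
    """Date on which the last backup tape holding a deleted record is reused.

    Assumes (hypothetically) one full backup per night and that each tape is
    overwritten exactly `retention_days` days after it was written.
    """
    # The most recent backup that still contains the record is the one made
    # the night before the record was deleted.
    last_backup_with_record = deleted_on - timedelta(days=1)
    # Until that tape comes back around in the rotation, anyone holding it
    # can restore the "deleted" record without touching the live system.
    return last_backup_with_record + timedelta(days=retention_days)


# A record purged on 13 June 2014 under a 60-day rotation
print(last_tape_copy_overwritten(date(2014, 6, 13), 60))  # 2014-08-11
```

Under these assumptions, a record expunged from the live system remains recoverable from tape for roughly the full retention period.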

(This post was updated on 13-Jun-2014.)