The Security Implications of Teaching Librarians to Program
Should librarians be learning how to develop software? This theme has come up repeatedly in the past few years ((Going back to Karin Dalziel’s 2008 Why every Library Science student should learn programming, continuing through Diane Hillmann’s keynote at Code4Lib 2011 to this year’s LITA/ALCTS Library Code Year Interest Group and OCLC’s recent announcement of a Platform University [PPTX].)) and I think it is a good thing. I once had a boss who told his group, “I want you guys to automate yourself out of your job, because there are far more interesting things you could be working on.” I think that is an empowering philosophy for staff of any type.
There is one thing that has me worried, though, in the enthusiasm to teach ad hoc software development skills to everyone who is interested in learning: security. I was reminded of this by a recent New York Times Bits Blog posting, Hackers Breach 53 Universities and Dump Thousands of Personal Records Online. A paragraph from that post:
To breach servers, the hackers used a technique known as an SQL injection, in which they exploit a software vulnerability and enter commands that cause a database to dump its contents. In the case of some universities, the hackers breached multiple servers. In several cases, hackers breached student and alumni blogs -- which contained things like usernames and passwords -- not the university servers themselves. At Princeton, for instance, hackers breached a WordPress blog for Princeton alums based in the United Kingdom which contained several usernames and encoded passwords.
SQL injection is a form of attack in which malicious users get the server to execute spurious database commands by tacking them onto the end of web form fields (among other methods). The classic example is to add `; DROP TABLE` to the end of a text input field. If that were actually executed, it would delete an entire table of information from the database. That would be bad. It is somewhat easy to protect against -- don't take the user's input at face value; always "clean" it -- but it is an extra step the developer needs to remember to do. And it is one thing to know to do it (is cleaning of user inputs being taught in the coding-for-everyone workshops?) but quite another to have the discipline to do it for every user input. (Or to have the sophistication to create and use code functions that do it for you.) Forget one user input and the game is up. The bad guys have programs they can run to scrape your website for forms and systematically try to break through your defenses.
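To make the danger and the defense concrete, here is a minimal Python sketch (the table, column, and function names are invented for illustration). The unsafe version pastes the user's input straight into the SQL string, so an input like `' OR '1'='1` dumps every row -- exactly the kind of breach described in the Times article. The safe version uses a parameterized query, which is the "code function that does it for you":

```python
import sqlite3

# A throwaway in-memory database standing in for a real patron table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patrons (name TEXT, barcode TEXT)")
conn.execute("INSERT INTO patrons VALUES ('Ada', '12345'), ('Grace', '67890')")

# UNSAFE: the user's input becomes part of the SQL statement itself.
# An input of  ' OR '1'='1  turns the WHERE clause into a tautology
# and the query returns every row in the table.
def find_patron_unsafe(name):
    query = "SELECT * FROM patrons WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

# SAFE: a parameterized query hands the input to the database driver
# separately, so it is always treated as data and can never become SQL.
def find_patron_safe(name):
    return conn.execute(
        "SELECT * FROM patrons WHERE name = ?", (name,)
    ).fetchall()

print(find_patron_unsafe("' OR '1'='1"))  # dumps ALL patrons
print(find_patron_safe("' OR '1'='1"))    # returns nothing, as it should
```

The lesson is that the safe version is no harder to write than the unsafe one; the trick is knowing to write it that way every single time.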
You see, when you create applications to be used by others, you take on the responsibility not only of writing the code to do the function you set out to do but also of accounting for all the things that could go wrong. The Open Web Application Security Project catalogs hundreds of possible code vulnerabilities and attack points that need to be thought through and written into the application. If I were a system administrator, before code written by someone just learning to program was put in a place where the world could reach it, I would want a skilled developer to check that code for security problems. If I didn't, I would run the risk of being the sysadmin on the hook to explain why there was a serious security or privacy breach. ((A side note: I don't mean to say that professional, trained programmers make no mistakes. A piece of code I wrote early in my career was the source of a security breach at OhioLINK, and I once found (and reported) a command injection flaw at a major integrated library system vendor through which an attacker could take over a server by entering a malicious e-mail address.))
So if librarians are going to learn to program and we don't want to put our public-facing servers at risk, what kinds of software development tasks could librarians use to cut their teeth? Here are some ideas:
- Data manipulation. A great example of this is an article proposal now under review by the Code4Lib Journal editorial committee: short scripts that clean up ebook vendor records by testing URLs, making local changes and enhancements, correcting problems by moving fields, and so on. Or it could be a program that slices and dices reference encounter statistics to answer questions about coverage requirements. These programs are written and run on your desktop machine or in your own server account, so there is no exposure to malicious users. (The first sketch after this list gives a flavor of this kind of script.)
- Browser enhancements. Can you think of a shortcut to a workflow but can't get the software creator to implement it? There are a number of ways to enhance browsers to change the way websites behave. Some of the easiest are to write userscripts (using Greasemonkey for Firefox or Tampermonkey for Chrome) and/or userstyles (using Stylish for Firefox and Chrome) that insert JavaScript and custom Cascading Style Sheets (CSS) at the browser level to change how a website operates or looks. These are generally safe because execution is limited to your own browser, but you shouldn't distribute your userscripts and userstyles without having someone knowledgeable about browser security look at them.
- Intranet services. I'll mention this possibility, but with reluctance. When you write and deploy web applications or scripts on your intranet, you are really just limiting where the malicious attacks can come from -- hopefully to just the users who can log into your intranet. That is a substantial reduction of risk, but you need assurance that only local users can actually reach the application or script. (The second sketch after this list shows one simple way to get that assurance.)
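For the data manipulation category, here is a minimal, hypothetical Python sketch in the spirit of that Code4Lib Journal proposal: it reads vendor-supplied ebook links from a CSV file and reports any that no longer respond. The file name and column names are invented for illustration. Because it runs entirely on your own machine against a file you already have, there is no attack surface for outsiders:

```python
import csv
import urllib.error
import urllib.request

# Hypothetical input: a CSV exported from vendor records, one row per
# title, with "title" and "url" columns. The file name is illustrative.
INPUT_FILE = "ebook_links.csv"

def url_is_alive(url, timeout=10):
    """Return True if the URL answers with a non-error HTTP status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError):
        # Covers connection failures, HTTP errors, and malformed URLs.
        return False

with open(INPUT_FILE, newline="", encoding="utf-8") as handle:
    for row in csv.DictReader(handle):
        if not url_is_alive(row["url"]):
            print(f"BROKEN: {row['title']} -> {row['url']}")
```

(A refinement would be to fall back to a GET request for the occasional server that rejects HEAD, but the shape of the script is the point.)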
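For the intranet category, one simple way to get that assurance -- sketched here with Python's standard library, with addresses invented for illustration -- is to bind the service to an interface that outsiders cannot reach at all, rather than relying on a login page to keep them out:

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Binding to the loopback address (or an internal-only IP such as
# "10.0.5.12") means the service never listens on a public network,
# regardless of any bugs in the application code itself.
BIND_ADDRESS = "127.0.0.1"
PORT = 8000

server = HTTPServer((BIND_ADDRESS, PORT), SimpleHTTPRequestHandler)
print(f"Serving on http://{BIND_ADDRESS}:{PORT} (unreachable from outside)")
server.serve_forever()
```

Network-level restrictions like this are no substitute for careful code, but they shrink the pool of possible attackers from "the entire internet" to "people already on your network."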
Note! Jon has some further suggestions for development in the comments.
Getting librarians and other library staff fluent in programming is important: it maximizes their effectiveness and empowers them to solve their own problems. Just as important, though, is doing so while ensuring the integrity of the systems they touch, and that concern should be at the core of any instruction program.