Learnings from the British Library Cybersecurity Report
The British Library suffered a major cyber attack in October 2023 that encrypted and destroyed servers, exfiltrated 600GB of data, and is still disrupting library services four months later. Yesterday, the Library published an 18-page report on the lessons they are learning. (There are also some community annotations on the report on Hypothes.is.)
Their investigation found the attackers likely gained access through compromised credentials on a remote access server and had been monitoring the network for days prior to the destructive activity. The attack was a typical ransomware job: get in, search for personal data and other sensitive records to copy out, and encrypt the remainder while covering your tracks. The Library did not pay the ransom and has started the long process of recovering its systems.
The report describes in some detail how the Library recognized that its conglomeration of disparate systems over the years left them vulnerable to service outages and even cybersecurity attacks. They had started a modernization effort to address these problems, but the attack dramatically exposed these vulnerabilities and accelerated their plans to replace infrastructure and strengthen processes and procedures.
The report concludes with lessons learned for the library and other institutions to enhance cyber defenses, response capabilities, and digital modernization efforts. The library profession should be grateful to the British Library for their openness in the report, and we should take their lessons to heart.
Note! Simon Bowie has some great insights on the LSE Impact blog, including about how the hack can be seen as a call for libraries to invest more in controlling their own destinies.
The Attack
The report admits that some information needed to determine the attackers' exact path is likely lost. Their best-effort estimate is that a set of compromised credentials was used on a Microsoft Terminal Services server (now called Remote Desktop Services). Multi-factor authentication (MFA, sometimes called 2FA) was used in some areas of the network, but connections to this server were not covered. The attackers tripped at least one security alarm, but the sysadmin released the hold on the account after running malware scans.
Starting in the overnight hours from Friday to Saturday, the attackers copied 600GB of data off the network. This seems to be mostly personnel files and personal files that Library staff stored on the servers. The network provider could see this traffic when looking back at network flows, but it is unclear whether the transfer itself tripped any alarms at the time. Although their Integrated Library System (an Aleph 500 system according to Marshall Breeding's Library Technology Guides site) was affected, the report does not make clear whether patron demographic or circulation activity was taken.
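To make the "looking back at network flows" idea concrete, here is a minimal sketch of how hourly outbound volume per host could be totalled and checked against a threshold. Everything in it is an assumption for illustration: the record layout, the host names, the dates, the byte counts, and the fixed threshold are invented, and nothing here reflects how the Library or its network provider actually monitors traffic.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical flow records: (start time, internal host, bytes sent out of the network).
# Real flow exports carry many more fields; these values are invented.
FLOWS = [
    (datetime(2024, 1, 5, 14, 5), "fileserver-01", 40_000_000),
    (datetime(2024, 1, 5, 15, 10), "fileserver-01", 55_000_000),
    (datetime(2024, 1, 6, 1, 20), "fileserver-01", 90_000_000_000),
    (datetime(2024, 1, 6, 1, 45), "fileserver-01", 120_000_000_000),
    (datetime(2024, 1, 6, 2, 15), "workstation-17", 3_000_000),
]


def hourly_outbound(flows):
    """Sum outbound bytes per (host, hour bucket)."""
    totals = defaultdict(int)
    for start, host, sent in flows:
        bucket = start.replace(minute=0, second=0, microsecond=0)
        totals[(host, bucket)] += sent
    return dict(totals)


def flag_spikes(flows, threshold_bytes):
    """Return (host, hour, bytes) for every hour above the threshold, largest first."""
    totals = hourly_outbound(flows)
    spikes = [(host, hour, n) for (host, hour), n in totals.items() if n > threshold_bytes]
    return sorted(spikes, key=lambda s: s[2], reverse=True)


if __name__ == "__main__":
    # Assumed threshold: 10 GB sent by one host in one hour.
    for host, hour, sent in flag_spikes(FLOWS, 10_000_000_000):
        print(f"{host} sent {sent / 1e9:.0f} GB in the hour starting {hour}")
```

A fixed threshold is the crudest possible rule. A real deployment would more likely compare each host against its own baseline and route the result to someone who is awake at 1 a.m. on a Saturday.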
Recovery—Rebuild and Renew
Reading between the lines a little bit, it sounds like the Library had a relatively flat network with few boundaries between systems: "our historically complex network topology ... allowed the attackers wider access to our network than would have been possible in a more modern network design, allowing them to compromise more systems and services." Elevated privileges on one system led to elevated privileges on many systems, which allowed the attacker to move freely across the network. Systems are not structured like that today, tending instead to follow the model of "least privilege", and it seems like the Library is moving away from the flat structure towards a segmented structure.
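The difference between a flat and a segmented network can be sketched as a reachability question: starting from one compromised host, how many other hosts can be reached by following permitted connections? The host names and connection rules below are hypothetical and are not taken from the report. The model also only covers network connectivity, which is a simplification of how privileges actually propagate.

```python
from collections import deque


def reachable(start, allowed):
    """Breadth-first search over permitted connections; returns hosts reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        host = queue.popleft()
        for nxt in allowed.get(host, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


# Hypothetical hosts, for illustration only.
HOSTS = ["remote-access", "staff-files", "catalogue", "backup", "finance"]

# Flat network: every host may connect to every other host.
flat = {h: [o for o in HOSTS if o != h] for h in HOSTS}

# Segmented network: each host may open only the connections it needs (assumed rules).
segmented = {
    "remote-access": ["staff-files"],
    "staff-files": [],
    "catalogue": ["backup"],
    "finance": ["backup"],
    "backup": [],
}

if __name__ == "__main__":
    print("flat:", sorted(reachable("remote-access", flat)))
    print("segmented:", sorted(reachable("remote-access", segmented)))
```

In the flat case the compromised remote access host reaches all four other hosts. In the segmented case it reaches only the one system it was explicitly allowed to talk to.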
As the report notes, recovery isn't just a matter of restoring backups to new hardware. The system can't go back to the vulnerable state it was in. It also seems like some software systems themselves are not recoverable due to age. The British Library's program is one of "Rebuild and Renew" — rebuilding with fresh infrastructure and replacing older systems with modern equivalents. In the never-let-a-good-crisis-go-to-waste category, "the substantial disruption of the attack creates an opportunity to implement a significant number of changes to policy, processes, and technology that will address structural issues in ways that would previously have been too disruptive to countenance."
The report notes "a risk that the desire to return to ‘business as usual’ as fast as possible will compromise the changes", and this point is well taken. Somewhere I read that the definition of “personal character” is the ability to see an action through after the emotion of the commitment to action has passed. The British Library was a successful institution, and it will want to return to that position of being seen as a thriving institution as quickly as possible. This will need to be a continuous process. What is cutting edge today will become legacy tomorrow. As our layers of technology get stacked higher, the bottom layers get squeezed and compressed into thin slivers that we tend to assume will always exist. We must maintain visibility in those layers and invest in their maintenance and robustness.
Backups
They also found "viable sources of backups ... that were unaffected by the cyber-attack and from which the Library’s digital and digitised collections, collection metadata and other corporate data could be recovered." That is fortunate—even if the older systems have to be replaced, they have the data to refill them.
They describe their new model as "a robust and resilient backup service, providing immutable and air-gapped copies, offsite copies, and hot copies of data with multiple restoration points on a 4/3/2/1 model." I’m familiar with the 3/2/1 strategy for backups (three copies of your data on two distinct media with one stored off-site), but I hadn’t heard of the 4/3/2/1 strategy. Judging from this article from Backblaze, the additional layer accounts for a copy that is fully air-gapped or otherwise not reachable online. An example is the “Object Lock” feature of AWS S3, a cloud version of Write-Once-Read-Many (WORM) storage. Although the backed-up object is online and can be read ("Read-Many"), there are technical controls that prevent its modification or deletion until a set period of time elapses ("Write-Once"). Presumably, the time period is long enough to find and evict anyone who has compromised the systems before the object lock expires.
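As a rough illustration of the 3/2/1 rule and the extra protected copy, here is a small sketch that checks an inventory of copies against those conditions. The `BackupCopy` fields, the copy names, and the reading of the fourth layer as "at least one immutable or air-gapped copy" are my assumptions, following the interpretation above. The report does not spell out its 4/3/2/1 definition at this level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data; all fields are illustrative."""
    name: str
    medium: str                   # e.g. "disk", "tape", "object-storage"
    offsite: bool
    immutable_or_airgapped: bool


def check_3_2_1(copies):
    """Three copies, on two distinct media, with one stored off-site."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media": len({c.medium for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
    }


def check_with_protected_copy(copies):
    """3/2/1 plus the assumed extra layer: a copy an intruder cannot alter."""
    result = check_3_2_1(copies)
    result["protected_copy"] = any(c.immutable_or_airgapped for c in copies)
    return result


if __name__ == "__main__":
    inventory = [
        BackupCopy("production", "disk", offsite=False, immutable_or_airgapped=False),
        BackupCopy("local-backup", "disk", offsite=False, immutable_or_airgapped=False),
        BackupCopy("cloud-locked", "object-storage", offsite=True, immutable_or_airgapped=True),
    ]
    for rule, ok in check_with_protected_copy(inventory).items():
        print(f"{rule}: {'ok' if ok else 'MISSING'}")
```

A check like this only says that the copies exist on paper. It says nothing about whether a restore from them works, which is why recovery tests matter as much as the backup plan.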
Improved Processes
The lessons include the need for better network monitoring, retained external security expertise, multi-factor authentication, and intrusion response processes. The need for comprehensive multi-factor authentication is clear. (Dear reader: if you don't have a comprehensive plan to manage credentials, including enforcement of MFA, then this is an essential takeaway from this report.)
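One small piece of such a plan is simply knowing which entry points still accept a password alone. A minimal sketch, assuming you can produce an inventory like the one below (the `EntryPoint` fields and the service names are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryPoint:
    """A way into the network; all fields are illustrative."""
    name: str
    internet_facing: bool
    mfa_enforced: bool


def mfa_gaps(entry_points):
    """Names of internet-facing entry points that do not enforce MFA."""
    return [e.name for e in entry_points if e.internet_facing and not e.mfa_enforced]


if __name__ == "__main__":
    inventory = [
        EntryPoint("webmail", internet_facing=True, mfa_enforced=True),
        EntryPoint("vpn", internet_facing=True, mfa_enforced=True),
        EntryPoint("remote-desktop-gateway", internet_facing=True, mfa_enforced=False),
        EntryPoint("print-server", internet_facing=False, mfa_enforced=False),
    ]
    print("MFA gaps:", mfa_gaps(inventory))
```

The hard part is keeping the inventory complete and current. An entry point that nobody remembered to list is exactly the one that will be missing MFA.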
Another outcome of the recovery is better processes for refreshing hardware and software systems as they age. Digital technology is not static. (And certainly not as static as putting a printed book on a climate-controlled shelf.) It is difficult (at least for me) to envision the kind of comprehensive change management that will be required to build a culture of adaptability and resilience to reduce the risk of this happening again.
Some open questions...
I admire the British Library's willingness to publish this report that describes in a frank manner their vulnerabilities, the impacts of the attack, and what they are doing to address the problems. I hope they continue to share their findings and plans with the library community. Here are some things I hope to learn:
- To what extent was the patron data (demographic and circulation activity) in the integrated library system sought and copied out?
- How will they prioritize, plan, and create replacement software systems that cannot be recovered or are deemed too insecure to put back on the network?
- What, in greater detail, are the changes to their data backup plans and recovery tests? What can be taught to other cultural heritage institutions with similar data?
- This is about as close to "green-field" development as you can get in an organization with many existing commitments and requirements. What change management exercises and policies helped the staff (and public) through these changes?
Cybersecurity is a group effort. It would be easy to pin this chaos on the tech who removed a block on the account that may have been the beachhead for this attack. As this report shows, the organization allowed this environment to flourish, culminating in that one bit-flip that brought the organization down.
I’ve never been in that position, but I am mindful that I could someday be in a similar position looking back at what my actions or inactions allowed to happen. I’ll probably be at risk of being in that position until the day I retire and destroy my production work credentials. I hope the British Library staff and all involved in the recovery are treating themselves well. Those of us on the outside are watching and cheering them on.