Options in Storage for Digital Preservation



A last-minute change to my plans for ALA Midwinter came on Tuesday when I was sought out to fill in for a speaker who canceled at the ALCTS Digital Preservation Interest Group meeting. Options for outsourcing storage and services for preserving digital content have been a recent interest of mine, so I volunteered to combine two earlier DLTJ blog posts with some new information and present it to the group for feedback. The reaction was great, and here are the promised slide deck, links to further information, and some thoughts from the audience response.

Slide Deck and References

[Slide deck: "Options in Storage for Digital Preservation"]
In the presentation there is a table about costs that uses a scenario from an earlier DLTJ blog post. The text of the scenario is:

To examine the similarities and differences in costs, let’s use the OhioLINK Satellite Image collection as a prototypical example. It consists of about 2 terabytes (2TB) of high-quality images in TIFF format, with about 7.5GB of data going into the repository each month. In the interest of exploring everything that S3 can do, there is an assumption that approximately 4GB of data will be transferred out of the archive each month; OCLC’s Digital Archive does not have an end-user dissemination component.

The point of showing this scenario is to illustrate the widest range of costs -- from a storage-only solution like Amazon S3 to a soup-to-nuts service like OCLC's Digital Archive. A word about the redacted costs: some of the numbers in OCLC's Digital Archive response (from 2008) came from a confidential quote, so they were removed from the public table. The values that are publicly listed come from Barbara Quint's article.
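For the storage-only end of that range, the scenario's arithmetic is simple enough to sketch in a few lines. The per-gigabyte prices below are illustrative assumptions for the sake of the example -- they are not actual Amazon S3 rates or OCLC quotes:

```python
# Back-of-the-envelope monthly cost for the scenario above.
# Prices are ASSUMED for illustration, not actual vendor rates.
STORAGE_PRICE_PER_GB = 0.14  # assumed $/GB-month for stored data
EGRESS_PRICE_PER_GB = 0.15   # assumed $/GB for data transferred out


def monthly_cost(stored_gb, egress_gb):
    """Return (storage, egress, total) monthly cost in dollars."""
    storage = stored_gb * STORAGE_PRICE_PER_GB
    egress = egress_gb * EGRESS_PRICE_PER_GB
    return storage, egress, storage + egress


# Scenario: ~2TB (2048 GB) stored, ~4GB/month transferred out
storage, egress, total = monthly_cost(2048, 4)  # total ≈ $287/month
```

At these assumed rates, storage dominates; the 4GB of monthly egress adds well under a dollar, which is why the interesting comparison across vendors is what you get bundled on top of the raw bytes.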

The articles and blog posts I referenced in the course of the presentation were:

Iglesias, Edward and Wittawat Meesangnil (2010). Using Amazon S3 in Digital Preservation in a mid sized academic library: A case study of CCSU ERIS digital archive system. The Code4Lib Journal, issue 12, retrieved 5-Jan-2011 from http://journal.code4lib.org/articles/4468

Murray, Peter (2008). Long-term Preservation Storage: OCLC Digital Archive versus Amazon S3. Disruptive Library Technology Jester. Retrieved 5-Jan-2011 from http://dltj.org/article/oclc-digital-archive-vs-amazon-s3/

Murray, Peter (2009). Can We Outsource the Preservation of Digital Bits?. Disruptive Library Technology Jester. Retrieved 5-Jan-2011 from http://dltj.org/article/outsource-digital-bits/

Quint, Barbara (2008). OCLC Introduces High-Priced Digital Archive Service. Information Today. Retrieved 5-Jan-2011 from http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=49018

Some Thoughts

There was a great deal of discussion after the presentation about how good a guarantee is good enough. Amazon S3 offers two stated levels of assurance: "Designed to provide 99.999999999% durability and 99.99% availability of objects over a given year." The question was whether that slight risk of loss is "good enough" for our purposes. Coming to grips with digital storage, can we (as the librarian profession) get someone from Amazon to talk about what they do to assure that data remains available? Can the terms they use be translated into terms that we use and understand? Can we reach a level of familiarity and comfort with what they do, enough to trust them as a long-term data warehouse? Can we pull out the appropriate questions from the Trusted Repositories Audit & Certification: Criteria and Checklist to see how Amazon S3 measures up?
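One way to make that 99.999999999% durability figure concrete is to translate it into expected annual losses. Taking the stated durability at face value as an annual per-object survival probability (an assumption -- Amazon's exact failure model is not public), the arithmetic looks like this:

```python
# Interpreting S3's stated "11 nines" durability as the annual
# probability that any single stored object survives. This is a
# simplifying assumption about Amazon's failure model.
DURABILITY = 0.99999999999


def expected_annual_loss(num_objects):
    """Expected number of objects lost per year, assuming
    independent per-object loss at the stated durability."""
    return num_objects * (1 - DURABILITY)


# For a 10-million-object collection, the expected loss is on the
# order of one ten-thousandth of an object per year -- i.e., you
# would wait roughly 10,000 years to expect a single loss.
losses = expected_annual_loss(10_000_000)
```

The sketch does not settle whether that figure is trustworthy -- that is exactly what the TRAC-style questions in the discussion above would probe -- but it does put the stated number on a scale the profession can argue about.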