JPEG2000 for Digital Preservation

Last month was an interesting month for discussion and news of JPEG2000 as an archival format. First, there was a series of posts on the IMAGELIB list about the rationale for using JPEG2000 for master files. It started with a posting by Tom Blake of Boston Public Library asking these questions:

What can I do with a JPEG2000 that I can't do with a TIFF, a good version
of Zoomify, and a well-designed DAMS?

Don't you need to rely on a proprietary version/flavor of JPEG2000 and a
viewer to utilize its full potential?

In a follow-up message, Bill Snead from Duke offered pointers to Aware's "Why JPEG2000?" whitepaper, the Olsen-Melville case study from the University of Connecticut, and Princeton's statement on using JPEG2000 files as surrogates derived from a TIFF master.

I leapt into the conversation by offering the opinion that JPEG2000 is a compelling replacement for a TIFF-based practice because:

  1. JPEG2000 offers a single format for both access and preservation of digital imagery. Eliminating the complexity of managing derivatives in the creation, processing, and delivery of images is a good thing, I think. Said another way, the archival master can be the same file as the production master, with the access derivatives generated on the fly using JPEG2000's inherent scaling capabilities. Also, if one's preservation master is also one's access master, then one will know very quickly if something is wrong with the preservation master -- it no longer renders in the access system.
  2. It offers truly lossless compression -- one can get back to a bit-for-bit identical TIFF if desired.
  3. The standard has built-in support for bundling (embedding) metadata with the image bitstream. For the long term, that means we just need to be concerned about managing one file and not a series of image-plus-metadata files based on a strategy (fragile, in my opinion) of holding file names constant and varying the file extension. It could equally be argued, of course, that wrapping the image codestream and the associated metadata in a METS or MPEG-21 file achieves the same end. I'm not sure which is better -- the metadata in the image file or the image file inside a metadata wrapper.
  4. JPEG2000 is at the core of a number of technologies that are bigger than we are, with constituencies that have deeper pockets than ours to ensure JPEG2000 support is perpetuated, well, indefinitely. For instance, JPEG2000 is at the heart of the DICOM medical imaging standard that is gaining wide adoption. Also, motion pictures are being delivered to theaters using Motion JPEG2000. I can't confirm this because it seems wildly outrageous, but a reliable source told me that very soon motion picture film stock will not be produced anymore and that all films will be shot and edited digitally. I don't know if Motion JPEG2000 is used in the filming or editing process, but if it is then I'm even less worried about the future of reading JPEG2000 files.
  5. JPEG2000 is an open standard with defined and emerging protocols for guaranteeing compliance with the standard. (This is one area, I think, where our community could help support the JPEG2000 standard, by the way. We have a vested interest in making sure images that we capture with one piece of software are processable by other bits of software.) If the text of the standard is printed on acid-free paper or etched onto a nickel plate, we've got a very good shot at creating a program to read these files in whatever future computing system comes about.
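The lossless claim in point 2 is something one could test rather than take on faith. Here is a minimal sketch of the check, not a real JPEG2000 workflow: the function name is hypothetical, and `zlib` stands in for a JPEG2000 codec so that the example runs with only the Python standard library. In practice the `encode` and `decode` callables would wrap whatever JPEG2000 software is in use, and `pixels` would be the decoded image data from the TIFF.

```python
import hashlib
import zlib
from typing import Callable


def round_trip_is_lossless(
    pixels: bytes,
    encode: Callable[[bytes], bytes],
    decode: Callable[[bytes], bytes],
) -> bool:
    """Encode then decode the pixel data and compare digests of the two."""
    restored = decode(encode(pixels))
    return hashlib.sha256(restored).digest() == hashlib.sha256(pixels).digest()


if __name__ == "__main__":
    raw = bytes(range(256)) * 64  # stand-in for decoded image data

    # ASSUMPTION: zlib is only a placeholder for a lossless JPEG2000 codec.
    print(round_trip_is_lossless(raw, zlib.compress, zlib.decompress))

    # A deliberately lossy stand-in (drops the low four bits of every byte).
    def lossy(data: bytes) -> bytes:
        return bytes(b & 0xF0 for b in data)

    print(round_trip_is_lossless(raw, lossy, lambda data: data))
```

Comparing digests of the decoded pixel data, rather than of the files themselves, keeps the check independent of how any one program happens to lay out its TIFF headers.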

One note about #3 above. I would not argue that an access system based around JPEG2000 files would use the embedded metadata as the core of the DAMS. Rather, I would propose that the authoritative version of the metadata would be in the JPEG2000 file itself. The surrounding DAMS database would be used for efficiencies in accessing and manipulating the metadata. If the DAMS were blown away (or if one wanted to migrate to a new DAMS), the subsequent asset management system could be built based on the metadata stored in the JPEG2000 artifacts. Any kind of modification to the metadata would be written back to the JPEG2000 file, and the requisite checksums would need to be recalculated. (If we were storing our metadata in the same trusted fashion as the image bitstream, then we would be calculating checksums on the metadata anyway.)
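To make the checksum bookkeeping concrete, here is a rough sketch of what "recalculate after every metadata write" might look like. Everything in it is an assumption for illustration: the function names, the choice of SHA-256, and the in-memory dictionary standing in for wherever a repository would really keep its fixity records.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large masters need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(manifest: dict, path: Path) -> str:
    """Store (or refresh) the checksum for a file; call after each write."""
    manifest[str(path)] = sha256_of(path)
    return manifest[str(path)]


def verify_fixity(manifest: dict, path: Path) -> bool:
    """True if the file on disk still matches its recorded checksum."""
    return manifest.get(str(path)) == sha256_of(path)
```

The point of the sketch is the ordering: a metadata edit changes the file, so `verify_fixity` will fail until `record_fixity` is called again, and that call needs to be part of the same controlled operation as the edit.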

One of the concerns about JPEG2000 is some language from the JPEG2000 website about how "undeclared and obscure submarine patents may still present a hazard..." to open use of the standard. This seems like lawyer CYA to me, as nothing has come up that I'm aware of in the seven years since the standard was ratified. Here, too, there are deeper pockets than those of libraries and cultural heritage institutions that would fight something like this. And if, in the end, it is found that a patent would cause an embargo on 'unlicensed' versions of JPEG2000 codecs for some period of years, we can always run a batch conversion back to TIFF until the embargo period is up and/or something better comes along.
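That fallback batch conversion is not much code. Below is a sketch that assumes the OpenJPEG command-line tool `opj_decompress` is installed and was built with TIFF support; the function names and directory layout are made up for illustration. The planning step only builds the command lines, so it can be reviewed before anything is run.

```python
import subprocess
from pathlib import Path


def plan_tiff_conversion(src_dir: Path, dst_dir: Path) -> list[list[str]]:
    """Build one opj_decompress command per .jp2 file found under src_dir."""
    commands = []
    for jp2 in sorted(src_dir.rglob("*.jp2")):
        # Mirror the source directory layout under dst_dir.
        tif = dst_dir / jp2.relative_to(src_dir).with_suffix(".tif")
        commands.append(["opj_decompress", "-i", str(jp2), "-o", str(tif)])
    return commands


def run_conversion(commands: list[list[str]]) -> None:
    """Run the planned commands, stopping at the first failure."""
    for cmd in commands:
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        # ASSUMPTION: opj_decompress is on PATH and can write TIFF output.
        subprocess.run(cmd, check=True)
```

A real migration would also want to record new checksums for the TIFFs and confirm the pixel data matches before retiring anything.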

To sum up, I believe that JPEG2000 is a compelling and real replacement for TIFF. I'm not the only one, though. In section 2.3 ("The Establishment of a Scalable Workflow for Digitizing a Wide Variety of Materials") of the Harvard University Library Open Collections Program Final Report, Harvard announced the "Adoption by Office of Information Systems of the jpeg2000 standard for archival master images stored in the Digital Repository Service." That goes further than the previous early-adopter statement by the Library of Congress in the Digital Format Registry entry for JPEG2000, where they said "[a]s adoption and implementation increases in other sectors, the use of the format for the Library's master images may become more and more appealing."