Preserving Digital Video
My place of work is looking to acquire educational videos in a digital form with an eye towards long-term preservation. At this point we receive a physical form (preferably DVD, but sometimes VHS) and digitize it to a very lossy access format (RealMedia, in this case). With this change, we would get a preservation-worthy digital copy from the producer/distributor and forgo the physical version.
There is quite a lot written on preserving video, but I wanted to distill the requirements down into statements that vendors could reasonably provide today. I think these are pretty sound requirements, but I'm looking for feedback. In particular, I'm not quite sure how to handle the transfer of closed caption text from the publisher/distributor; suggestions are welcome.
[Jester's note: I just realized that an earlier version of this posting went out to the net about two hours before this "final" version. Sorry about publishing the work-in-progress early; I must have hit the wrong button in the new version of WordPress...]
File Formats
Some of the clearest guidance on file formats comes from this short excerpt from the Moving Image section of the U.K. Arts and Humanities Data Service Preservation Handbook:
Guidance on the preservation of digital video should, by necessity, change over time. [...] The MPEG-2 and MPEG-4 formats are better suited to high-quality digital video. MPEG-2 is better known for its use as a format for DVD-Video, which encourages confidence when considering the likelihood that the format will be readable in the long-term. The format has an average transfer rate of 2-5 megabits per second, but there may be disk space restraints and the software tools necessary to convert and store this format are costly. MPEG-4 has a lower transfer rate of 1-2 megabits per second and is intended for streaming video. Other codecs, such as QuickTime, Windows Media, Real Video and Open DIVX, are useful for specific purposes, but not suitable for preservation. ((Knight, G., & McHugh, J. (2005). Preservation Handbook: Moving Image. p. 3.))
The Library of Congress Sustainability of Digital Formats site has an entry for MPEG-2 (also known as H.262) and an entry for MPEG-4 (more completely, MPEG-4 file format version #2) that give the nitty-gritty details for the file formats.
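To put the "disk space restraints" mentioned in the excerpt into perspective, here is a rough back-of-the-envelope sketch. It assumes a constant bitrate, uses the 2-5 megabits per second range quoted above for MPEG-2, and adds a 224 kb/s audio track; the function name and the one-hour running time are my own illustrative choices, not anything a vendor would supply.

```python
def storage_gigabytes(video_mbps, audio_kbps, minutes):
    """Estimate file size in decimal gigabytes, assuming a constant bitrate."""
    bits_per_second = video_mbps * 1_000_000 + audio_kbps * 1_000
    total_bytes = bits_per_second * minutes * 60 / 8  # 8 bits per byte
    return total_bytes / 1_000_000_000


# Low and high ends of the MPEG-2 range quoted in the excerpt.
for rate in (2, 5):
    size = storage_gigabytes(video_mbps=rate, audio_kbps=224, minutes=60)
    print(f"{rate} Mb/s video + 224 kb/s audio, 60 min: {size:.2f} GB")
```

Under these assumptions an hour of video lands at roughly 1.0 GB at the low end and about 2.35 GB at the high end. Real files will vary, since DVD-style MPEG-2 is usually encoded at a variable bitrate.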
The preservation master copies we want to store have a frame size of 720 pixels by 480 pixels. (That size is for NTSC-format videos, common in the USA, Canada and Japan. Master copies of PAL-format videos, common in Australia, New Zealand, the United Kingdom and most of Europe, are 720 x 576.) This is the standard resolution used in MPEG-2-compressed commercially distributed DVD movies. ((Audio/Video Capture and Management (2002).)) These frame sizes are appropriate for analog video signals. ("As defined by ITU-R Recommendation BT.601, more commonly known by the abbreviations Rec. 601 or BT.601 or its former name, CCIR 601. [It is] a standard published by the CCIR (now ITU-R) for encoding interlaced analogue video signals in digital form." (("Rec. 601" (2008).)) ) The audio is 48 kHz stereo at 224 kb/s or better.
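As a sketch of how these requirements might be checked when a file arrives, the snippet below compares a file's reported properties against the numbers above. The dictionary keys (`standard`, `width`, `height`, and so on) are hypothetical; in practice the values would have to come from whatever media inspection tool is on hand, and this is not tied to any particular tool's output.

```python
# Frame sizes stated in the text, keyed by broadcast standard.
FRAME_SIZES = {"NTSC": (720, 480), "PAL": (720, 576)}
AUDIO_SAMPLE_RATE_HZ = 48_000
AUDIO_CHANNELS = 2          # stereo
MIN_AUDIO_KBPS = 224


def check_master(info):
    """Return a list of problems; an empty list means the file passes."""
    problems = []

    expected = FRAME_SIZES.get(info.get("standard"))
    actual = (info.get("width"), info.get("height"))
    if expected is None:
        problems.append(f"unknown broadcast standard: {info.get('standard')!r}")
    elif actual != expected:
        problems.append(f"frame size {actual} should be {expected}")

    if info.get("audio_sample_rate_hz") != AUDIO_SAMPLE_RATE_HZ:
        problems.append("audio sample rate should be 48 kHz")
    if info.get("audio_channels") != AUDIO_CHANNELS:
        problems.append("audio should be stereo (2 channels)")
    if (info.get("audio_kbps") or 0) < MIN_AUDIO_KBPS:
        problems.append(f"audio bitrate should be at least {MIN_AUDIO_KBPS} kb/s")

    return problems


# Hypothetical example: a PAL file delivered with a low-bitrate audio track.
sample = {
    "standard": "PAL", "width": 720, "height": 576,
    "audio_sample_rate_hz": 48_000, "audio_channels": 2, "audio_kbps": 192,
}
for problem in check_master(sample):
    print(problem)
```

Returning a list of problems rather than a single pass/fail value makes it easier to send a vendor one complete report instead of a string of rejections.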
Captioning Text
There appear to be two primary schemes for binding closed caption text to video files. One, from the W3C, is Synchronized Multimedia Integration Language (or SMIL), an XML format that is used by many media players. The other is Microsoft's Synchronized Accessible Media Interchange (or SAMI), a pseudo-HTML format that is only read by Windows Media Player.
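To give a feel for the SMIL approach, here is a minimal sketch that builds a bare-bones SMIL document pairing a video file with a separate caption text stream, using the `par` (play in parallel), `video` and `textstream` elements. The file names are made up for illustration, and the sketch leaves out the XML namespace and the layout regions that a real player would most likely want.

```python
import xml.etree.ElementTree as ET


def build_smil(video_src, caption_src):
    """Build a minimal SMIL document that plays video and captions together."""
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    # Children of <par> are presented in parallel, i.e. at the same time.
    par = ET.SubElement(body, "par")
    ET.SubElement(par, "video", src=video_src)
    ET.SubElement(par, "textstream", src=caption_src)
    return ET.tostring(smil, encoding="unicode")


# Hypothetical file names; the caption file would hold the timed text.
print(build_smil("lecture01.mpg", "lecture01-captions.rt"))
```

The appeal for preservation is that the caption text stays in its own file, separate from the video, with the SMIL document acting only as the glue between the two.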
To make matters more complicated, a whole set of different schemes is used for DVDs. (On VHS recordings, closed caption text was encoded in one of the non-visible lines that make up the video signal. Since the DVD format only included visible lines, other schemes were required.) The most popular seems to be the Scenarist Closed Caption (SCC) format. This is a binary file that exists on the DVD alongside the video files.
Resources Consulted
Arms, C. R., & Fleischhauer, C. Sustainability of Digital Formats: Planning for Library of Congress Collections. National Digital Information Infrastructure and Preservation Program. Retrieved April 8, 2008, from http://www.digitalpreservation.gov/formats/.
Audio/Video Capture and Management. (2002). In NINCH Guide to Good Practice (1st). Retrieved April 8, 2008, from http://www.nyu.edu/its/humanities/ninchguide/VII/.
Guideline H: Provide access to multimedia presentations for users with sensory disabilities. Accessible Digital Media: Design Guidelines for Electronic Publications, Multimedia and the Web. Retrieved April 14, 2008, from http://ncam.wgbh.org/publications/adm/guideline_h.html.
Knight, G., & McHugh, J. (2005). Preservation Handbook: Moving Image. AHDS Preservation Handbook. 8 p. Arts and Humanities Data Service. Retrieved April 8, 2008, from http://ahds.ac.uk/preservation/video-preservation-handbook.pdf.
Rec. 601. (2008, April 8). Wikipedia, the free encyclopedia. Retrieved April 8, 2008, from http://en.wikipedia.org/wiki/Rec._601 (version at time of citation).
The text was modified to update a link from http://ahds.ac.uk/ to http://www.ahds.ac.uk/ on January 28th, 2011.
The text was modified to update a link from http://ahds.ac.uk/preservation/ahds-preservation-documents.htm to http://www.ahds.ac.uk/preservation/ahds-preservation-documents.htm on January 28th, 2011.
The text was modified to update a link from http://ahds.ac.uk/preservation/video-preservation-handbook.pdf to http://www.ahds.ac.uk/preservation/video-preservation-handbook.pdf on January 28th, 2011.
The text was modified to update a link from http://ncam.wgbh.org/publications/adm/guideline_h.html to http://ncam.wgbh.org/invent_build/web_multimedia/accessible-digital-media-guide/guideline-h-multimedia on January 28th, 2011.