<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"><channel><title>Disruptive Library Technology Jester &#187; preservation</title> <atom:link href="http://dltj.org/tag/preservation/feed/" rel="self" type="application/rss+xml" /><link>http://dltj.org</link> <description>We&#039;re Disrupted, We&#039;re Librarians, and We&#039;re Not Going to Take It Anymore</description> <lastBuildDate>Mon, 06 Feb 2012 20:04:22 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <cloud domain='dltj.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' /> <creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/3.0/us/</creativeCommons:license> <item><title>Thursday Threads: Digital Legacies, Zettabytes of Information, Digital Books, Alternate Network Architectures</title><link>http://dltj.org/article/thursday-threads-2011w19/</link> <comments>http://dltj.org/article/thursday-threads-2011w19/#comments</comments> <pubDate>Thu, 12 May 2011 10:19:46 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Thursday Threads]]></category> <category><![CDATA[ebooks]]></category> <category><![CDATA[information processing]]></category> <category><![CDATA[internet]]></category> <category><![CDATA[Peer-to-Peer Networks]]></category> <category><![CDATA[preservation]]></category><guid isPermaLink="false">http://dltj.org/?p=2872</guid> <description><![CDATA[Receive DLTJ Thursday Threads:by&#160;E-mailby&#160;RSSDelivered by FeedBurner Mind-expanding topics this week. 
The threads start with a potentially morbid, but definitely intriguing, topic: what is to become of our personal digital legacies? If that isn&#8217;t enough to blow your mind, the next &#8230; <a href="http://dltj.org/article/thursday-threads-2011w19/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2872"></abbr><div id="feedburner-thursday-threads-email-2011w19" class="wp-caption alignright noprint noFrontPage" style="width: 230px;;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><form style="border: 1px solid rgb(204, 204, 204); padding: 3px; margin: 0pt; text-align: center;" action="http://feedburner.google.com/fb/a/mailverify" method="post" target="popupwindow" onsubmit="window.open('http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads', 'popupwindow', 'scrollbars=yes,width=550,height=520');return true"><p>Receive <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym></i> Thursday Threads:</p><p>by&nbsp;<a href="http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads&amp;loc=en_US" title="D.L.T.J. Thursday Threads Email Subscription">E-mail</a><br /><input style="width: 140px;" name="email" value="Your e-mail address" onfocus="if (this.defaultValue==this.value) this.value = ''" type="text"/><input value="thursday-threads" name="uri" type="hidden"/><input name="loc" value="en_US" type="hidden"/><input value="Subscribe" type="submit"/></p><p>by&nbsp;<a href="http://feeds.dltj.org/thursday-threads/" title="D.L.T.J. Thursday Threads RSS Feed">RSS</a></p><p style="font-size: 80%;">Delivered by <a href="http://feedburner.google.com" target="_blank" title="Google Feedburner Service">FeedBurner</a></p></form></div><p> Mind-expanding topics this week.  
The threads start with a potentially morbid, but definitely intriguing, topic: <a href="#p2872-personal-digital-legacies">what is to become of our personal digital legacies</a>?  If that isn&#8217;t enough to blow your mind, the next topic is an <a href="#p2872-information-processed">accounting of the amount of information processed in 2008</a>.  Still hanging in there?  Then think about <a href="#p2872-digital-book">what could become of the book</a> if we take advantage of its digital nature.  You might not have much room to think big thoughts after those threads, but if you do the last one explores <a href="#p2872-new-networking-models">what could become of how our machines talk to each other</a>.<br /><span id="more-2872"></span><br />My apologies for missing last week&#8217;s <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym> Thursday Threads</i>.  I compose these entries on Wednesday evening and last week I snuggled with one of the children as he went to bed and I didn&#8217;t get back up again.  That&#8217;s 31 straight weeks of <i>Thursday Threads</i> without interruption; not bad for an experiment many months ago.  I haven&#8217;t mentioned this recently, so let me say it now:  thank you for all of the positive feedback and for your interest in this series.  It has been fun and useful to me to look back on the highlights of the week and put them in some context, and judging by the rising subscription count some readers find it useful too.</p><p>Feel free to send this to others you think might be interested in the topics.  If you find these threads interesting and useful, you might want to add the <a href="http://feeds.dltj.org/thursday-threads/" title="RSS Feed for DLTJ Thursday Threads">Thursday Threads RSS Feed</a> to your feed reader or subscribe to e-mail delivery using the form to the right.  
If you would like a more raw and immediate version of these types of stories, watch <a href="http://friendfeed.com/dltj" title="Peter Murray - FriendFeed">my FriendFeed stream</a> (or subscribe to <a href="http://friendfeed.com/dltj?format=atom" title="Atom feed for Peter Murray's FriendFeed account">its feed</a> in your feed reader).  Comments and tips, as always, are <a href="http://dltj.org/contact">welcome</a>.</p><p><h2 id="p2872-personal-digital-legacies">Personal Digital Legacies</h2></p><blockquote><p>Here it is. I&#8217;m dead, and this is my last post to my blog. In advance, I asked that once my body finally shut down from the punishments of my cancer, then my family and friends publish this prepared message I wrote—the first part of the process of turning this from an active website to an archive.<div style="text-align: right; width: 100%;"><cite>- <a href="http://www.penmachine.com/2011/05/the-last-post" title="The last post - Penmachine - Derek K. Miller">The last post</a>, Penmachine, Derek K. Miller</cite></div></blockquote><p>Ed Summers pointed to this blog post in a <a href="https://twitter.com/#!/edsu/status/67759330342088704">tweet</a> in which Ed also said: &#8220;apart from being incredibly moving [this post] makes me wonder (again) what archiving services exist for depositing online work.&#8221;  Also earlier this week was an article forwarded to me from Ron Murray in New Scientist with the title <a href="http://www.newscientist.com/article/dn20445-digital-legacy-respecting-the-digital-dead.html" title="Digital legacy: Respecting the digital dead | New Scientist">Digital legacy: Respecting the digital dead</a>.  The article covers the efforts of the British Library in working with personal digital archives, offering an overview of the techniques that border on the field of digital forensics to preserve the digital legacies of donated personal archives.  
And those two items follow a book I recently read called <i><a href="http://www.yourdigitalafterlife.com/" title="Your Digital Afterlife: A book about digital death and legacy.">Your Digital Afterlife</a></i> by Evan Carroll and John Romano (which itself I found by way of an <a href="http://www.npr.org/2011/01/10/132617124/after-death-protecting-your-digital-afterlife" title="After Death, Protecting Your 'Digital Afterlife' | NPR">NPR news story</a>).</p><p>It has me wondering and considering (if not yet acting on) my digital legacy.  Some of it is highly personal, like letting my spouse know how to access all of the digital bill-paying sites that I use for the family&#8217;s finances.  Other parts are fairly public, like what to do with this blog and my social media accounts.  And it all brings me back around to Ed&#8217;s thought:  is there a role for libraries in this space?  Is some of what is represented in personal digital legacies the kind of highly local content that libraries should be preserving?  
Or, put another way, how would our profession respond to the preservation desires of a patron who has used the Google tools shown in this <a href="http://googlesystem.blogspot.com/2011/05/google-chromes-emotional-ad.html" title="Google Chrome's Emotional Ad">90-second video from Google</a> (via the Google Operating System blog, Unofficial news and tips about Google)?</p><p><h2 id="p2872-information-processed">Accounting of Information Processed</h2></p><blockquote><p><div id="p2872-capacity" class="wp-caption alignright" style="width: 510px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><a href="http://ucsdnews.ucsd.edu/newsrel/general/04-05BusinessInformation.asp" title="Business Information Consumption: 9,570,000,000,000,000,000,000 Bytes per Year"><img alt="" src="http://cdn.dltj.org/wp-content/uploads/2011/05/Appendix.Counting.jpg" title="Counting Very Large Numbers" width="500" height="242" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Comparisons of Digital Capacity Measurements</p></div><p>Three years ago, the world&#8217;s 27 million business servers processed 9.57 zettabytes, or 9,570,000,000,000,000,000,000 bytes of information.</p><p>Researchers at the School of International Relations and Pacific Studies and the San Diego Supercomputer Center at the University of California, San Diego, estimate that the total is equivalent to a 5.6-billion-mile-high stack of books stretching from Earth to Neptune and back to Earth, repeated about 20 times.</p><div style="text-align: right; width: 100%;"><cite>- <a href="http://www.networkworld.com/news/2011/050911-worlds-servers-process-957zb-of.html" title="World's servers process 9.57ZB of data a year | Network World">World&#8217;s servers process 9.57ZB of data a year</a>, by Lucas Mearian, NetworkWorld</cite></div></blockquote><p>Those numbers come from a <a 
href="http://ucsdnews.ucsd.edu/newsrel/general/04-05BusinessInformation.asp" title="Business Information Consumption: 9,570,000,000,000,000,000,000 Bytes per Year | University of California, San Diego">news release</a> of a <a href="http://hmi.ucsd.edu/pdf/HMI_2010_EnterpriseReport_Jan_2011.pdf" title="How Much Information?: 2010 Report on Enterprise Server Information">report</a> from the University of California at San Diego that measures the amount of information swirling around the world&#8217;s computers.  And it includes this handy table of comparisons of digital capacity measurements that attempts to put it into perspective.  (Although trying to imagine the number of home computer hard drives in the state of Minnesota still boggles the mind.)  I also find it to be a nice reality check.  After all, my field&#8217;s contribution to that number can&#8217;t be too big, can it?  So our problems really aren&#8217;t that big at all&#8230;</p><p><h2 id="p2872-digital-book">The Digital Book</h2></p><blockquote><p><div id="p2872-digital-book" class="wp-caption alignright" style="width: 456px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><object width="446" height="326"><param name="movie" value="http://video.ted.com/assets/player/swf/EmbedPlayer.swf"></param><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always"/><param name="wmode" value="transparent"></param><param name="bgColor" value="#ffffff"></param><param name="flashvars" 
value="vu=http://video.ted.com/talk/stream/2011/Blank/MikeMatas_2011-320k.mp4&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/MikeMatas-2011.embed_thumbnail.jpg&#038;vw=432&#038;vh=240&#038;ap=0&#038;ti=1134&#038;lang=&#038;introDuration=15330&#038;adDuration=4000&#038;postAdDuration=830&#038;adKeys=talk=mike_matas;year=2011;theme=new_on_ted_com;theme=a_taste_of_ted2011;theme=the_creative_spark;theme=words_about_words;theme=what_s_next_in_tech;event=What%27s+Next+in+Tech;tag=Design;tag=Entertainment;tag=Technology;tag=demo;tag=software;&#038;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" /><embed src="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" pluginspace="http://www.macromedia.com/go/getflashplayer" type="application/x-shockwave-flash" wmode="transparent" bgColor="#ffffff" width="446" height="326" allowFullScreen="true" allowScriptAccess="always" flashvars="vu=http://video.ted.com/talk/stream/2011/Blank/MikeMatas_2011-320k.mp4&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/MikeMatas-2011.embed_thumbnail.jpg&#038;vw=432&#038;vh=240&#038;ap=0&#038;ti=1134&#038;lang=&#038;introDuration=15330&#038;adDuration=4000&#038;postAdDuration=830&#038;adKeys=talk=mike_matas;year=2011;theme=new_on_ted_com;theme=a_taste_of_ted2011;theme=the_creative_spark;theme=words_about_words;theme=what_s_next_in_tech;event=What%27s+Next+in+Tech;tag=Design;tag=Entertainment;tag=Technology;tag=demo;tag=software;"></embed></object><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Presentation by Mike Matas at TED (4 minutes)</p></div><p>The idea behind Push Pop Press [is] a digital creation tool designed to blow up the concept of the book.  
Frictionless self-publishing is a fertile new space, but this particular startup got a little help from former vice president Al Gore, whose exacting demands on an app version of his book <em><a href="http://www.amazon.com/Our-Choice-Solve-Climate-Crisis/dp/1594867348/ref=sr_1_1?ie=UTF8&amp;qid=1303940712&amp;sr=8-1" title="Our Choice: A Plan to Solve the Climate Crisis [Paperback] | Amazon.com">Our Choice: A Plan to Solve the Climate Crisis</a></em> gave this would-be company its first real boost.</p><p>Developed by former Apple employees Mike Matas and Kimon Tsinteris, Push Pop Press will be a publishing platform for authors, publishers and artists to turn their books into interactive iPad or iPhone apps — no programming skills required.</p><div style="text-align: right; width: 100%;"><cite>- <a href="http://www.wired.com/gadgetlab/2011/04/app-stars-push-pop-press" title="Gore, Ex-Apple Engineers Team Up to Blow Up the Book | Wired.com">Gore, Ex-Apple Engineers Team Up to Blow Up the Book</a>, by Brian X. Chen, Wired.com Gadget Lab</cite></div></blockquote><p>I won&#8217;t call what these gentlemen show an &#8220;ebook&#8221; &#8212; something that brings to my mind static words on a digital page.  No, this is a <em>digital book</em> &#8212; something wholly new to the process of communicating ideas from author to reader.  To see why this is different, watch the <a href="http://www.ted.com/talks/mike_matas.html" title="Mike Matas: A next-generation digital book | Video on TED.com">four-minute TED Talk video</a>.</p><p><h2 id="p2872-new-networking-models">New Ways of Internetworking</h2></p><blockquote><p>Imagine a web where our browsers connected directly to each other to do voice, video, media sharing and run applications, using P2P and real-time APIs, rather than going through centralized servers that controlled traffic and permissions.   
That&#8217;s a potent idea and if implemented properly could future-proof a part of the web from authoritarian crack-downs, disruptions by disasters and more.  It could also establish a permanent lawless zone of connected devices with no central place to stop anyone from doing anything in particular.</p><p>It just so happens that something like that may now be under development in the most official of venues.  The World Wide Web Consortium (W3C) announced today the formation of a new <a href="http://www.w3.org/2011/04/webrtc-charter.html" title="Web Real-Time Communications Working Group Charter">Web Real-Time Communications Working Group</a> to define client-side APIs to enable Real-Time Communications in Web browsers, without the need for server-side implementation.  The Group is chaired by engineers from Google and Ericsson.</p><div style="text-align: right; width: 100%;"><cite>- <a href="http://www.readwriteweb.com/archives/his_could_be_big_decentralized_web_standard_under.php" title="This Could be Big: Decentralized Web Standard Under Development by W3C | ReadWriteWeb">This Could be Big: Decentralized Web Standard Under Development by W3C</a>, by Marshall Kirkpatrick, ReadWriteWeb</cite></div></blockquote><blockquote><p>A team of researchers at Rutgers University have launched the latest of a group of wireless network initiatives aiming to create a more open alternative to the Internet. MondoNet aims to enable a mesh network that lets a hybrid collection of new and existing Wi-Fi, WiMax and other wireless devices connect to each other without going through a central carrier.</p><p><strong><a href="http://www.mondonet.org/MondoNetNCApaper_draft.pdf" title="Weaving a New ‘Net: A Mesh-Based Solution for Democratizing Networked Communications">A draft proposal</a></strong> for MondoNet describes its premise as well as how it will gather the best of existing technologies for mobile ad-hoc wireless mesh networks (MANETs). 
The project&#8217;s goal to create a system that provides both greater freedom and privacy for individual users than today&#8217;s Web.</p><div style="text-align: right; width: 100%;"><cite>- <a href="http://www.eetimes.com/electronics-news/4215577/Rutgers-team-proposes-Net-alternative" title="Rutgers team proposes Net alternative | EE Times">Rutgers team proposes Net alternative</a>, Rick Merritt, EE Times</cite></div></blockquote><p>I&#8217;m including these as forward-looking points of interest.  I don&#8217;t know if either will amount to anything substantial, but I do think it is interesting that researchers are looking at the next evolutionary steps in networking.  Either of these proposals would dramatically change the flow of information between network users.  I like how both seem to be building a fundamental pillar of privacy into the design.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/thursday-threads-2011w19/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>Iron Mountain to Close its Virtual File Store Service</title><link>http://dltj.org/article/outsource-digital-bits-redux/</link> <comments>http://dltj.org/article/outsource-digital-bits-redux/#comments</comments> <pubDate>Tue, 12 Apr 2011 01:00:42 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Raw Technology]]></category> <category><![CDATA[preservation]]></category> <category><![CDATA[storage]]></category><guid isPermaLink="false">http://dltj.org/?p=2794</guid> <description><![CDATA[About two years ago I wrote a blog post wondering if we could outsource the preservation of digital bits. What prompted that blog post was an announcement from Iron Mountain of a Cloud-Based File Archiving service. 
Since then there have &#8230; <a href="http://dltj.org/article/outsource-digital-bits-redux/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2794"></abbr><p>About two years ago I wrote a blog post wondering if <a href="http://dltj.org/article/outsource-digital-bits/" title="Can We Outsource the Preservation of Digital Bits? | Disruptive Library Technology Jester">we could outsource the preservation of digital bits</a>.  What prompted that blog post was an announcement from Iron Mountain of a <a href="http://www.ironmountain.com/news/2009/impr02232009.asp" title="Iron Mountain Digital Introduces the Industry’s First Enterprise Solution for Cloud-Based File Archiving" class="broken_link" rel="nofollow">Cloud-Based File Archiving</a> service.  Since then there have been a number of other services that have sprung up that are more attuned to the needs of cultural heritage communities (<a href="http://duraspace.org/duracloud.php" title="DuraCloud | Duraspace">DuraCloud</a> and <a href="http://chronopolis.sdsc.edu/" title="Chronopolis -- Digital Preservation Program">Chronopolis</a> come to mind), but I have wondered if the commercial sector had a way to do this cheaply and efficiently.  
The answer to that question is &#8220;maybe not&#8221; as <a href="http://www.gartner.com/DisplayDocument?id=1626215" title="Iron Mountain Becomes Third Provider to Exit Public Cloud Storage Market | Gartner Group Research">Iron Mountain has told Gartner Group</a> (<a href="http://cdn.dltj.org/wp-content/uploads/2011/04/iron_mountain_becomes_third__211632.pdf" title="Iron Mountain Becomes Third Provider to Exit Public Cloud Storage Market [PDF]">PDF archive</a>) that it is closing its <span class="removed_link" title="http://www.ironmountain.com/digital-archiving/inactive-data-file-archiving.html">Virtual File Store</span> services and its <a href="http://www.ironmountain.com/storage/storage-as-a-service.html" title="Storage as a Service, STaaS Storage Outsourcing, Data Storage Solution | Iron Mountain" class="broken_link" rel="nofollow">Archive Service Platform</a>.</p><p>The Gartner analysis goes on to say: &#8220;Virtual File Store customers that stay with Iron Mountain will be transferred to a higher-value offering, File System Archiving (FSA) in 2012. The new offering will be a hybrid that leverages policy-based archiving on site and in the cloud with indexing and classification capabilities.&#8221; <a href="http://www.theregister.co.uk/2011/04/11/iron_mountain_exits_public_storage_cloud/" title="Bruised Iron Mountain gives up on storage cloud | The Register">The Register has more details and speculation</a> about what happened.  As always, the full story might be more interesting than what the news reports are saying.  
In any case &#8212; just to close this loop &#8212; if you were thinking of trying this particular option, think no further.<p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://www.ironmountain.com/digital-archiving/inactive-data-file-archiving.html on June 9th, 2011.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/outsource-digital-bits-redux/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>Thursday Threads: Kindle Singles and Kindle Accessibility, Sped-up Discourse, ISBN Troubles</title><link>http://dltj.org/article/thursday-threads-2011w4/</link> <comments>http://dltj.org/article/thursday-threads-2011w4/#comments</comments> <pubDate>Thu, 27 Jan 2011 11:50:30 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Thursday Threads]]></category> <category><![CDATA[accessibility]]></category> <category><![CDATA[Amazon]]></category> <category><![CDATA[ebooks]]></category> <category><![CDATA[identifier]]></category> <category><![CDATA[ISBN]]></category> <category><![CDATA[Kindle]]></category> <category><![CDATA[Kindle Singles]]></category> <category><![CDATA[preservation]]></category> <category><![CDATA[scholarly communication]]></category><guid isPermaLink="false">http://dltj.org/?p=2408</guid> <description><![CDATA[Receive DLTJ Thursday Threads:by&#160;E-mailby&#160;RSSDelivered by FeedBurner This week Amazon takes center stage of DLTJ Thursday Threads with a report of their new Kindle Singles program for medium-form digital content and a screen-reader-aware version of the Kindle reader application for PCs. 
&#8230; <a href="http://dltj.org/article/thursday-threads-2011w4/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2408"></abbr><div id="feedburner-thursday-threads-email-2011w04" class="wp-caption alignright noprint noFrontPage" style="width: 230px;;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><form style="border: 1px solid rgb(204, 204, 204); padding: 3px; margin: 0pt; text-align: center;" action="http://feedburner.google.com/fb/a/mailverify" method="post" target="popupwindow" onsubmit="window.open('http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads', 'popupwindow', 'scrollbars=yes,width=550,height=520');return true"><p>Receive <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym></i> Thursday Threads:</p><p>by&nbsp;<a href="http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads&amp;loc=en_US" title="D.L.T.J. Thursday Threads Email Subscription">E-mail</a><br /><input style="width: 140px;" name="email" value="Your e-mail address" onfocus="if (this.defaultValue==this.value) this.value = ''" type="text"/><input value="thursday-threads" name="uri" type="hidden"/><input name="loc" value="en_US" type="hidden"/><input value="Subscribe" type="submit"/></p><p>by&nbsp;<a href="http://feeds.dltj.org/thursday-threads/" title="D.L.T.J. 
Thursday Threads RSS Feed">RSS</a></p><p style="font-size: 80%;">Delivered by <a href="http://feedburner.google.com" target="_blank" title="Google Feedburner Service">FeedBurner</a></p></form></div><p> This week Amazon takes center stage of <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym> Thursday Threads</i> with a report of their new <a href="#kindle-singles">Kindle Singles program</a> for medium-form digital content and a <a href="#kindle-accessibility">screen-reader-aware version</a> of the Kindle reader application for PCs.  After that is a look at how <a href="#trial-by-twitter">scholarly discourse is changing</a> &#8212; radically! &#8212; with the availability and use of near-real-time feedback loops.  And we close out with a peek at <a href="#ebook-isbn">shaky ground</a> in the world of ISBN identifiers.</p><p>As a sidenote to last week&#8217;s comment about this blog migrating to Amazon&#8217;s service&#8230;there are still a few hiccups.  For instance, last week&#8217;s edition of <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym> Thursday Threads</i> wasn&#8217;t published via the RSS feed until late in the day and it wasn&#8217;t until Friday that the e-mail subscribers received it.  I think those issues are ironed out now, but if you notice any other problems <a href="http://dltj.org/contact">please let me know</a>.</p><p><h2 id="kindle-singles">Kindle Singles — Compelling Ideas Expressed at Their Natural Length — Now Available in the Kindle Store</h2></p><blockquote><p>Before the advent of digital reading, writers often had to choose between making their work short enough for a magazine article or long enough to deliver the &#8220;heft&#8221; required for book marketing and distribution. Three months ago, Amazon made a call to serious writers, thinkers, scientists, business leaders, historians, politicians and publishers to join Kindle in making a new kind of content available to readers—Kindle Singles. 
Typically between 5,000 and 30,000 words, each Kindle Single is intended to allow a single killer idea &#8212; well researched, well argued and well illustrated &#8212; to be expressed at its natural length. Today, Amazon is introducing the first set of Kindle Singles to the Kindle Store. &#8230;</p><p>The new Kindle Singles section of the Kindle Store is now available at <a href="http://www.amazon.com/kindlesingles" title="Kindle Singles | Amazon.com">www.amazon.com/kindlesingles</a>. Available to both Kindle device and app users, and priced between $0.99 and $4.99, the first set of Kindle Singles include original reporting, essays, memoirs and fiction. Amazon plans to frequently launch many more Kindle Singles over time.</p></blockquote><p>Is there room for commercial content between &#8220;short enough for a magazine article&#8221; and a full-fledged book?  Amazon seems to think so with this <a href="http://www.businesswire.com/news/home/20110126006018/en/Kindle-Singles" title="Kindle Singles -- Compelling Ideas Expressed at Their Natural Length -- Now Available in the Kindle Store | Business Wire">announcement of the Kindle Singles</a> program.  Among the first are <a href="http://blog.ted.com/2011/01/26/introducing-tedbooks/" title="Introducing TED Books | TED Blog">three works from TEDTalk speakers</a> priced at $2.99 each.  The content is only available in digital form and only in the proprietary Kindle format.  This may be a problem for a library trying to acquire this content for its collection (although this is just a subset of the more general issue of acquiring content saddled in proprietary formats with restrictive digital rights management).  
What makes this problem more acute, though, is that Amazon is seeking high quality content for the Kindle Singles channel (&#8220;Singles will be a highly curated group of content they feel is valuable to their readers&#8221; <a href="http://www.kindleexpert.com/kindle-singles-are-coming%E2%80%A6-and-here%E2%80%99s-what-you-need-to-know/" title="Kindle Singles are coming | Kindleexpert.com">according to the Kindle Expert website</a>).  That might make the content more desirable by patrons and more likely to be considered preservation-worthy.  (You can read about <a href="http://www.zdnet.com/blog/btl/review-my-amazon-kindle-single-publishing-experiment/43911" title="My Amazon Kindle Single publishing experiment | ZDNet">one author&#8217;s perspective</a> on publishing in the Kindle Singles program.)</p><p><h2 id="kindle-accessibility">Kindle for PC with Accessibility Plugin</h2></p><blockquote><p>Kindle for PC with Accessibility Plugin is a free application for your Windows PC. It provides the following accessibility features:</p><ul><li>Text-to-speech reading with adjustable voice settings</li><li>Voice-guided menu navigation</li><li>Large font sizes</li><li>High contrast reading mode</li><li>Keyboard navigation</li><li>Accessible shortcuts</li></ul><p>Because this software is an assistive technology, there are no restrictions on text-to-speech reading. In order to use the text-to-speech feature, an external screen reader program must be installed and running on the Windows PC.  Tested screen readers include: JAWS and NVDA. 
An external screen reader is used to read aloud menus and navigation items, while book text is read by a built-in text-to-speech engine.</p></blockquote><p>Although I&#8217;m hard pressed to find the formal announcement, a version of the <a href="http://www.amazon.com/gp/feature.html/ref=kin_pcacc_surl&#038;docId=1000632481" title="Kindle for PC with Accessibility Plugin">Kindle for PC with Accessibility Plugin</a> was made available earlier this month.  The National Federation of the Blind has a <a href="http://www.nfb.org/nfb/NewsBot.asp?MODE=VIEW&#038;ID=751" title="Amazon Kindle for PC | National Federation of the Blind">review of the software</a> with some constructive criticism that hopefully Amazon will take to heart.  What is interesting is that one can use a screen reading program such as the commercial <a href="http://www.freedomscientific.com/products/fs/jaws-product-page.asp" title="JAWS for Windows Screen Reading Software | Freedom Scientific">Jaws for Windows</a> or the open source <a href="http://www.nvda-project.org/" title="NVDA homepage">NonVisual Desktop Access</a> (NVDA) to have the text of the book read aloud &#8220;regardless of a publisher&#8217;s [text-to-speech] &#8230; choice.&#8221;  If you are serving a population of users with a sight impairment, this may be an option to look at to expand the universe of accessible materials to everything available in the Kindle store.</p><p><h2 id="trial-by-twitter">Peer review: Trial by Twitter</h2></p><blockquote><p>For many researchers, the pace and tone of this online review can be intimidating — and can sometimes feel like an attack. How are authors supposed to respond to critiques coming from all directions? Should they even respond at all? Or should they confine their replies to the conventional, more deliberative realm of conferences and journals? 
&#8220;The speed of communication is ahead of the sheer time needed to think and get in the lab and work,&#8221; said Felisa Wolfe-Simon, a postdoctoral fellow at the NASA Astrobiology Institute in Mountain View, California, and the lead author on the arsenic paper. Aptly enough, she circulated that comment as a tweet on Twitter, which is used by many scientists to call attention to longer articles and blog posts.</p><p>To bring some order to this chaos, it looks as though a new set of cultural norms will be needed, along with an online infrastructure to support them. The idea of open, online peer review is hardly new. Since Internet usage began to swell in the 1990s, enthusiasts have been arguing that online commenting could and should replace the traditional process of pre-publication peer review that journals carry out to decide whether a paper is worth publishing.</p></blockquote><p>This <a href="http://www.nature.com/news/2011/110119/full/469286a.html" title="Peer review: Trial by Twitter : Nature News">article in Nature News</a> points out the problem when commentary on scientific studies moves at Twitter speed.  The old mechanism of published peer-reviewed articles, followed by commentary in later issues of the same journal in the form of published letters, is being challenged by the internet world of blogs and tweets.  As the author says, a new set of cultural norms is required, as well as mechanisms to track the discourse.  [Via Eric Schmell]</p><p><h2 id="ebook-isbn">eBook Identifier Confusion Shakes Book Industry</h2></p><blockquote><p>Last Thursday, I was fortunate to be at a presentation of the Book Industry Study Group (BISG) about identification of eBooks. BISG hired Michael Cairns, the principal of <a href="http://infomediapartners.blogspot.com/" title="Information Media Partners" class="broken_link" rel="nofollow">Information Media Partners</a>, to do a study of the use, issues and practice surrounding assignment of ISBNs in the US book industry. 
Think of him as a structural engineer hired to inspect the damage to the supply chain&#8217;s supporting infrastructure after an earthquake. Cairns conducted 55 separate interviews with a total of 75 industry experts from all facets of the industry. (I was interviewed for my expertise in the use of ISBN in library linking systems).</p><ul><li>BISG eBook ISBN Study Findings Released <a href="http://personanondata.blogspot.com/2011/01/bisg-ebook-isbn-study-findings-released.html" title="BISG eBook ISBN Study Findings Released | Personanondata">Michael Cairns&#8217; blog</a></li><li>Summary of BISG Presentation <a href="http://www.bisg.org/docs/BISG_identification_of_e-books_research_project_summary_findings.pdf" title="Book Industry Study Group's Identification of E-Books Research Project, Summary of Report Findings">From BISG, PDF 730 KB</a></li></ul><p>Cairns (<a href="http://twitter.com/#%21/personanondata" title="personanondata on Twitter">@personanondata</a> on Twitter) is an industry veteran- he&#8217;s held senior executive positions at Bowker and other companies. His presentation was clear and direct, and he quickly went to the heart of the matter. He found very little support for the policy set forth by the 2005 revision of the ISBN standard regarding when to assign a new ISBN to an ebook.</p></blockquote><p>Eric Hellman writes about <a href="http://go-to-hellman.blogspot.com/2011/01/ebook-identifier-confusion-shakes-book.html" title="eBook Identifier Confusion Shakes Book Industry | Go To Hellman">his views of the dysfunction surrounding ISBN assignments for ebooks</a>.  &#8220;What problems?&#8221; you might ask &#8212; Eric has an example of how Barnes and Noble was enhancing some ebooks for their Nook platform.  By itself, this activity wouldn&#8217;t result in assigning a new ISBN.  
But because publishers are now exerting more control over setting the prices of ebooks (the so-called &#8220;<a href="http://www.libraryjournal.com/article/CA6721294.html" title="Macmillan CEO Explains &#039;Agency Model&#039; for Selling Ebooks | Library Journal">agency model</a>&#8221;), the existence of these Nook-enhanced versions needs to cross back-and-forth between the publisher&#8217;s and retailer&#8217;s electronic systems.  The only commonly agreed upon identifier?  The ISBN.  And this proliferation of ISBN assignments is making trouble for libraries&#8217; efforts to effectively identify material &#8212; which is to say nothing about what it is doing to our efforts to shoehorn these distinctions between various works into the MARC format used by our catalogs.  Is that a separate record for that manifestation with a different ISBN?</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/thursday-threads-2011w4/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>Options in Storage for Digital Preservation</title><link>http://dltj.org/article/preservation-storage-options/</link> <comments>http://dltj.org/article/preservation-storage-options/#comments</comments> <pubDate>Sun, 09 Jan 2011 22:45:10 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Meeting]]></category> <category><![CDATA[Amazon S3]]></category> <category><![CDATA[Association for Library Collections and Technical Services]]></category> <category><![CDATA[Chronopolis]]></category> <category><![CDATA[DAITS]]></category> <category><![CDATA[Duracloud]]></category> <category><![CDATA[Iron Mountain]]></category> <category><![CDATA[LOCKSS]]></category> <category><![CDATA[OCLC]]></category> <category><![CDATA[preservation]]></category> <category><![CDATA[storage]]></category> <category><![CDATA[trac]]></category><guid isPermaLink="false">http://dltj.org/?p=2101</guid> <description><![CDATA[A last-minute change to my plans for ALA Midwinter came 
on Tuesday when I was sought out to fill in for a speaker who canceled at the ALCTS Digital Preservation Interest Group meeting. Options for outsourcing storage and services for &#8230; <a href="http://dltj.org/article/preservation-storage-options/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2101"></abbr><p>A last-minute change to <a href="http://dltj.org/article/alamw11-schedule/">my plans for ALA Midwinter</a> came on Tuesday when I was sought out to fill in for a speaker who canceled at the <a href="http://connect.ala.org/node/119686" title="Digital Preservation Interest Group (ALCTS) | ALA Connect">ALCTS Digital Preservation Interest Group meeting</a>.  Options for outsourcing storage and services for preserving digital content have been a recent interest, so I volunteered to combine two earlier <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym></i> blog posts with some new information and present it to the group for feedback.  
The reaction was great, and here is the promised slide deck, links to further information, and some thoughts from the audience response.</p><p><h2>Slide Deck and References</h2><br /><div id="slideshare-options-in-storage" class="wp-caption alignright" style="width: 435px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><div style="width:425px" id="__ss_6499127"><strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/DataGazetteer/options-in-storage-for-digital-preservation" title="Options in Storage for Digital Preservation">Options in Storage for Digital Preservation</a></strong><object id="__sse6499127" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=201101digitalpreservationinterestgroupalcts-110109162838-phpapp02&#038;stripped_title=options-in-storage-for-digital-preservation&#038;userName=DataGazetteer" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse6499127" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=201101digitalpreservationinterestgroupalcts-110109162838-phpapp02&#038;stripped_title=options-in-storage-for-digital-preservation&#038;userName=DataGazetteer" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object></div><p><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Slides for &#039;Options in Storage for Digital Preservation&#039;</p></div><br />In the presentation there is a <a href="http://www.slideshare.net/DataGazetteer/options-in-storage-for-digital-preservation/14" title="Options in Storage for Digital Preservation">Table About Costs</a> that uses a scenario from an <a href="http://dltj.org/article/oclc-digital-archive-vs-amazon-s3/">earlier <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym></i> blog 
post</a>.  The text of the scenario is:<br /><blockquote>To examine the similarities and differences in costs, let’s use the OhioLINK Satellite Image collection as a prototypical example. It consists of about 2 terabytes (2TB) of high-quality images in TIFF format, with about 7.5GB of data going into the repository each month. In the interest of exploring everything that S3 can do, there is an assumption that approximately 4GB of data will be transferred out of the archive each month; OCLC’s Digital Archive does not have a end-user dissemination component.</p></blockquote><p>The point of showing this scenario is to show the widest range of costs &#8212; from a storage-only solution like Amazon S3 to a soup-to-nuts service like OCLC Digital Archive.  A word about the redacted costs.  Some of the numbers for OCLC&#8217;s Digital Archive response (from 2008) came from a confidential quote, so the numbers were removed from the public table.  For the numbers that are publicly listed, the values come from <a href="http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=49018" title="OCLC Introduces High-Priced Digital Archive Service">Barbara Quint&#8217;s article</a>.</p><p>The articles and blog posts I referenced in the course of the presentation were:</p><p>Iglesias, Edward and Wittawat Meesangnil (2010). Using Amazon S3 in Digital Preservation in a mid sized academic library: A case study of CCSU ERIS digital archive system. <i>The Code4Lib Journal</i>, issue 12, retrieved 5-Jan-2011 from <a href="http://journal.code4lib.org/articles/4468" title="Using Amazon S3 in Digital Preservation in a mid sized academic library: A case study of CCSU ERIS digital archive system | The Code4Lib Journal">http://journal.code4lib.org/articles/4468</a></p><p>Murray, Peter (2008). Long-term Preservation Storage: OCLC Digital Archive versus Amazon S3. 
<i>Disruptive Library Technology Jester.</i> Retrieved 5-Jan-2011 from <a href="http://dltj.org/article/oclc-digital-archive-vs-amazon-s3/">http://dltj.org/article/oclc-digital-archive-vs-amazon-s3/</a></p><p>Murray, Peter (2009). Can We Outsource the Preservation of Digital Bits?. <i>Disruptive Library Technology Jester.</i> Retrieved 5-Jan-2011 from <a href="http://dltj.org/article/outsource-digital-bits/">http://dltj.org/article/outsource-digital-bits/</a></p><p>Quint, Barbara (2008). OCLC Introduces High-Priced Digital Archive Service. <i>Information Today.</i> Retrieved 5-Jan-2011 from <a href="http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=49018" title="OCLC Introduces High-Priced Digital Archive Service | Information Today">http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=49018</a></p><ul><li>Amazon S3. Retrieved 5-Jan-2011 from <a href="http://aws.amazon.com/s3/" title="Amazon Simple Storage Service (Amazon S3)">http://aws.amazon.com/s3/</a></li><li>Chronopolis. Retrieved 5-Jan-2011 from <a href="http://chronopolis.sdsc.edu/" title="Chronopolis -- Digital Preservation Program -- Long-Term Mass-Scale Federated Digital Preservation">http://chronopolis.sdsc.edu/</a></li><li>DAITSS. Retrieved 5-Jan-2011 from <a href="http://daitss.fcla.edu/" title="DAITSS - Trac">http://daitss.fcla.edu/</a></li><li>DuraCloud. Retrieved 5-Jan-2011 from <a href="https://wiki.duraspace.org/display/duracloud/DuraCloud">https://wiki.duraspace.org/display/duracloud/DuraCloud</a></li><li>Iron Mountain. Retrieved 5-Jan-2011 from <a href="http://www.ironmountain.com/news/2009/impr02232009.asp" title="Iron Mountain Digital Introduces the Industry" class="broken_link" rel="nofollow">http://www.ironmountain.com/news/2009/impr02232009.asp</a></li><li>OCLC Digital Archive. 
Retrieved 5-Jan-2011 from <a href="http://www.oclc.org/us/en/digitalarchive/" title="Digital Archive [OCLC - Digital Collection Services]">http://www.oclc.org/us/en/digitalarchive/</a></li><li>Private LOCKSS Networks. Retrieved 5-Jan-2011 from <a href="http://lockss.stanford.edu/lockss/Private_LOCKSS_Networks" title="Private LOCKSS Networks - LOCKSS">http://lockss.stanford.edu/lockss/Private_LOCKSS_Networks</a></li></ul><p><h2>Some Thoughts</h2><br />There was a great deal of discussion after the presentation about how good of a guarantee is good enough.  Amazon S3 quotes two figures, one for durability and one for availability:  &#8220;Designed to provide 99.999999999% durability and 99.99% availability of objects over a given year.&#8221;  The question was whether that slight risk of loss is &#8220;good enough&#8221; for our purposes.  As we come to grips with digital storage, can we (as the library profession) get someone from Amazon to talk about what they do to assure that data is available?  Can the terms that they use be translated into terms that we use and understand?  Can we become familiar and comfortable enough with what they do to trust them as a long-term data warehouse?  
Can we pull out the appropriate questions of the <a href="http://www.dcc.ac.uk/resources/tools-and-applications/trustworthy-repositories" title="Trustworthy Repositories | Digital Curation Centre">Trusted Repositories Audit &amp; Certification: Criteria and Checklist</a> to see how Amazon S3 measures up?</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/preservation-storage-options/feed/</wfw:commentRss> <slash:comments>10</slash:comments> </item> <item><title>Thursday Threads: Amazon Pressures Publishers, Academic Spam, Mechanical Turk Spam, Multispectral Imaging</title><link>http://dltj.org/article/thursday-threads-2010w52/</link> <comments>http://dltj.org/article/thursday-threads-2010w52/#comments</comments> <pubDate>Thu, 30 Dec 2010 12:07:28 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Thursday Threads]]></category> <category><![CDATA[Amazon]]></category> <category><![CDATA[Amazon Mechanical Turk]]></category> <category><![CDATA[digitization]]></category> <category><![CDATA[Google Scholar]]></category> <category><![CDATA[jpeg2000]]></category> <category><![CDATA[preservation]]></category> <category><![CDATA[publishing]]></category> <category><![CDATA[search engine]]></category> <category><![CDATA[spam]]></category><guid isPermaLink="false">http://dltj.org/?p=1931</guid> <description><![CDATA[Receive DLTJ Thursday Threads:by&#160;E-mailby&#160;RSSDelivered by FeedBurner With the close of the year approaching, this issue marks the 14th week of DLTJ Thursday Threads. 
This issue has a publisher&#8217;s view of Amazon&#8217;s strong-arm tactics in book pricing, research into the possibility &#8230; <a href="http://dltj.org/article/thursday-threads-2010w52/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=1931"></abbr><div id="feedburner-thursday-threads-email-w52" class="wp-caption alignright" style="width: 230px;;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><form style="border:1px solid #ccc;padding:3px;margin:0;text-align:center;" action="http://feedburner.google.com/fb/a/mailverify" method="post" target="popupwindow" onsubmit="window.open('http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads', 'popupwindow', 'scrollbars=yes,width=550,height=520');return true"><p>Receive <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym></i> Thursday Threads:</p><p>by&nbsp;<a href="http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads&#038;loc=en_US" title="D.L.T.J. Thursday Threads Email Subscription">E-mail</a><br /><input type="text" style="width:140px" name="email" value="Your e-mail address" onFocus="if (this.defaultValue==this.value) this.value = ''"/><input type="hidden" value="thursday-threads" name="uri"/><input type="hidden" name="loc" value="en_US"/><input type="submit" value="Subscribe" /></p><p>by&nbsp;<a href="http://feeds.dltj.org/thursday-threads/" title="D.L.T.J. Thursday Threads RSS Feed">RSS</a></p><p style="font-size: 80%">Delivered by <a href="http://feedburner.google.com" target="_blank" title="Google Feedburner Service">FeedBurner</a></p></form></div><p> With the close of the year approaching, this issue marks the 14th week of <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym> Thursday Threads</i>.  
This issue has a publisher&#8217;s view of Amazon&#8217;s strong-arm tactics in book pricing, research into the possibility that academic authors could game Google Scholar with spam, demonstrations of how Amazon&#8217;s Mechanical Turk drives down the cost of enlisting humans to overwhelm anti-spam systems, and a story of multispectral imaging adding information in the process of digital preservation.</p><p>As the new year approaches, I wish you the best professionally and personally.</p><p><h2><a name="books_after_amazon">Books After Amazon</a></h2></p><blockquote><p>What happens when an industry concerned with the production of culture is beholden to a company with the sole goal of underselling competitors? Amazon is indisputably the king of books, but the issue remains, as Charlie Winton, CEO of the independent publisher Counterpoint Press puts it, “what kind of king they’re going to be.” A vital publishing industry must be able take chances with new authors and with books that don’t have obvious mass-market appeal. When mega-retailers have all the power in the industry, consumers benefit from low prices, but the effect on the future of literature—on what books can be published successfully—is far more in doubt.</p></blockquote><p><a href="http://www.bostonreview.net/BR35.6/roychoudhuri.php" title="Boston Review &amp;mdash; Onnesha Roychoudhuri: Books After Amazon">Onnesha Roychoudhuri publishes this view of Amazon&#8217;s marketing practices</a> in the latest issue of the <a href="http://www.bostonreview.net/" title="Boston Review &amp;mdash; Home">Boston Review</a>.  From the publisher&#8217;s perspective, the strong-arm tactics described sound horrible.  But the story also points to cracks appearing &#8212; at least for the bigger publishers.  That may leave smaller, independent publishers in a big squeeze.  
[Via OCLC Research's <a href="http://www.oclc.org/research/publications/newsletters/abovethefold/2010-12-17.htm" title="http://www.oclc.org/research/publications/newsletters/abovethefold/2010-12-17.htm">Above-the-Fold</a>]</p><p><h2><a name="academic_spam">Academic Search Engine Spam and Google Scholar&#8217;s Resilience Against it</a></h2></p><blockquote><p>Abstract: In a previous paper we provided guidelines for scholars on optimizing research articles for academic search engines such as Google Scholar. Feedback in the academic community to these guidelines was diverse. Some were concerned researchers could use our guidelines to manipulate rankings of scientific articles and promote what we call ‘academic search engine spam’. To find out whether these concerns are justified, we conducted several tests on Google Scholar. The results show that academic search engine spam is indeed—and with little effort—possible: We increased rankings of academic articles on Google Scholar by manipulating their citation counts; Google Scholar indexed invisible text we added to some articles, making papers appear for keyword searches the articles were not relevant for; Google Scholar indexed some nonsensical articles we randomly created with the paper generator SciGen; and Google Scholar linked to manipulated versions of research papers that contained a Viagra advertisement. At the end of this paper, we discuss whether academic search engine spam could become a serious threat to Web-based academic search engines.</p></blockquote><p><a href="http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0013.305" title="Academic Search Engine Spam and Google Scholar's Resilience Against it">Joeran Beel and Bela Gipp have this article</a> in the most recent issue of <a href="http://www.journalofelectronicpublishing.org/" title="The Journal of Electronic Publishing: Welcome">Journal of Electronic Publishing</a>.  
In addition to being able to game <a href="http://scholar.google.com/" title="Google Scholar">Google Scholar</a>, the authors note that <a href="http://academic.research.microsoft.com/" title="Microsoft Academic Search">Microsoft Academic Search</a> and <a href="http://citeseer.ist.psu.edu/" title="CiteSeerX">CiteSeer</a> (as well as their own academic search engine currently under development &#8212; <a href="http://SciPlore.org/" title="SciPlore: Exploring Science">SciPlore</a>) have the same issues.  Although it is possible, we don&#8217;t know if it is being done &#8212; or even if there would be any penalties in the academic community for doing so.</p><p><h2><a name="mechanical_turk_spam">Mechanical Turk: Now with 40.92% spam</a></h2></p><blockquote><p>At this point, Amazon Mechanical Turk has reached the mainstream. Pretty much everyone knows about the concept. Post small tasks online, pay people cents, and get thousands of micro-tasks completed. Unfortunately, this resulted in some unfortunate trends. Anyone who frequents just a little bit the market will notice the tremendous number of spammy HITs. (HIT = a task posted for completion in the market; stands for Human Intelligence Task). &#8220;Test if the ads in my website work&#8221;. &#8220;Create a Twitter account and follow me&#8221;. &#8220;Like my YouTube video&#8221;. &#8220;Download this app&#8221;. &#8220;Write a positive review on Yelp&#8221;. A seemingly endless amount of spam HITs come to the market, mainly with the purpose of spamming &#8220;social media&#8221; metrics. So, with Dahn Tamir and Priya Kanth (MS student at NYU), we decided to examine how big is the problem. How many spammers join the market? 
How many spam HITs are there?</p></blockquote><p>This post from Panos Ipeirotis, Associate Professor at the IOMS Department at Stern School of Business of New York University, describes a <a href="http://behind-the-enemy-lines.blogspot.com/2010/12/mechanical-turk-now-with-4092-spam.html" title="Mechanical Turk: Now with 40.92% spam. - A Computer Scientist in a Business School">review of activities</a> posted to <a href="https://www.mturk.com/mturk/welcome">Amazon&#8217;s Mechanical Turk</a> service.  Spam is everywhere, and it appears that the Mechanical Turk is reducing the friction between buyers and workers of spam activity. [Via Ron Murray]</p><p><h2><a name="multispectral_imaging">Cutting-Edge Imaging Helps Scholar Reveal 8th-Century Manuscript</a></h2></p><blockquote><p>With a manuscript like the St. Chad Gospels, multispectral imaging—a series of scans, each based on a single part of the color spectrum—allows his team to create images that have the equivalent of three-dimensional detail, down to revealing the thickness of brush strokes on letters and illustrations. Cockled pages can be virtually flattened out so that all their details can be studied. Studied color band by color band, the chemical composition of ink can be determined.</p></blockquote><p>This <a href="http://chronicle.com/article/Cutting-Edge-Imaging-Helps/125616/" title="Cutting-Edge Imaging Helps Scholar Reveal 8th-Century Manuscript - Research - The Chronicle of Higher Education">article</a> by Jennifer Howard at the Chronicle of Higher Education reviews the story of how 8th-century documents in England were digitized by scholars at the University of Kentucky.  It caught my eye because of the mention of multispectral imaging; this is something that the JPEG2000 file format can natively store.  Digitization at this level doesn&#8217;t just provide alternative, online access to documents &#8212; it actually adds new information to the process of researching those documents.  
[Note: the link is behind a publisher paywall. If you would like to see it, send me an e-mail and I'll forward you a short-term link from the Chronicle's website.]</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/thursday-threads-2010w52/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>Latest Views on JPEG2000 for Presentation and Archiving</title><link>http://dltj.org/article/jpeg2000-uk-report/</link> <comments>http://dltj.org/article/jpeg2000-uk-report/#comments</comments> <pubDate>Mon, 29 Nov 2010 20:22:17 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[JPEG2000]]></category> <category><![CDATA[imaging]]></category> <category><![CDATA[jpeg2000]]></category> <category><![CDATA[presentation]]></category> <category><![CDATA[preservation]]></category><guid isPermaLink="false">http://dltj.org/?p=1874</guid> <description><![CDATA[Earlier this month, the JPEG 2000 Implementation Working Group, the Wellcome Trust Library, and the U.K. Digital Preservation Coalition hosted a free one-day seminar called JPEG2000 for the Practitioner. The presentation slides are now linked to the seminar program and &#8230; <a href="http://dltj.org/article/jpeg2000-uk-report/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=1874"></abbr><p>Earlier this month, the <a href="http://jp2k-uk.wikidot.com/" title="JP2K-UK Working Group wiki">JPEG 2000 Implementation Working Group</a>, the <a href="http://library.wellcome.ac.uk/" title="The Wellcome Library">Wellcome Trust Library</a>, and the <a href="http://www.dpconline.org/" title="Digital Preservation Coalition homepage">U.K. 
Digital Preservation Coalition</a> hosted a free one-day seminar called <a href="http://www.dpconline.org/events/details/19-jpeg-2000-for-the-practioner" title="Events - JPEG 2000 for the Practioner | Digital Preservation Coalition">JPEG2000 for the Practitioner</a>.  The presentation slides are now linked to the <a href="http://www.dpconline.org/events/details/19-jpeg-2000-for-the-practioner" title="Events - JPEG 2000 for the Practioner | Digital Preservation Coalition">seminar program</a> and there is a <a href="http://jpeg2000wellcomelibrary.blogspot.com/2010/11/jpeg-2000-seminar-edited-highlights-1.html" title="JPEG 2000 at the Wellcome Library: JPEG 2000 seminar - edited highlights #1">short</a> <a href="http://jpeg2000wellcomelibrary.blogspot.com/2010/11/jpeg-2000-seminar-edited-highlights-2.html" title="JPEG 2000 at the Wellcome Library: JPEG 2000 seminar - edited highlights #2">report</a> of the event by Christy Henshaw of Wellcome Library.  The presentation slides by themselves carry a great deal of depth even without a recording of the audio.  In particular I can recommend &#8220;<a href="http://www.dpconline.org/component/docman/doc_download/525-jp2knov2010tanner" title="Presentation slides from &#039;What did JPEG 2000 ever do for us?&#039; by Simon Tanner">What did JPEG 2000 ever do for us?</a>&#8221; by Simon Tanner and &#8220;<a href="http://www.dpconline.org/component/docman/doc_download/522-jp2knov2010clark" title="Presentation slides from &#039;JPEG 2000 standardization - a pragmatic viewpoint&#039; by Richard Clark">JPEG 2000 standardization &#8211; a pragmatic viewpoint</a>&#8221; by Richard Clark.  
Both are brief introductions to where we&#8217;ve been with JPEG 2000 and where we could go.</p><p>Hat tip to Ron Murray for pointing this out to me.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/jpeg2000-uk-report/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>Can We Outsource the Preservation of Digital Bits?</title><link>http://dltj.org/article/outsource-digital-bits/</link> <comments>http://dltj.org/article/outsource-digital-bits/#comments</comments> <pubDate>Thu, 05 Mar 2009 19:55:25 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Raw Technology]]></category> <category><![CDATA[preservation]]></category> <category><![CDATA[storage]]></category><guid isPermaLink="false">http://dltj.org/?p=800</guid> <description><![CDATA[A colleague forwarded an article from The Register with news of a new service from Iron Mountain for Cloud-Based File Archiving. It is billed as a &#8220;storage archiving service designed to help companies reduce costs of storing and managing static &#8230; <a href="http://dltj.org/article/outsource-digital-bits/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=800"></abbr><p>A colleague forwarded <a href="http://www.theregister.co.uk/2009/03/04/iron_mountain_cloud_filestore/" title="Iron Mountain punts subterranean data storage (The Register)">an article from <i>The Register</i></a> with news of a new service from Iron Mountain for <a href="http://www.ironmountain.com/news/2009/impr02232009.asp" title="Iron Mountain Digital Introduces the Industry&#039;s First Enterprise Solution for Cloud-Based File Archiving" class="broken_link" rel="nofollow">Cloud-Based File Archiving</a>.  
It is billed as a &#8220;storage archiving service designed to help companies reduce costs of storing and managing static data files.&#8221;  My place of work is facing an increasing need for large-scale digital preservation storage with the acquisition of a large collection of music and the conversion of our educational videos from physical DVD preservation to digital preservation.  We&#8217;re talking terabytes of content that we need to keep in its archival form &#8212; uncompressed, high-quality media files (not the lower-quality derivatives for day-to-day access).  It doesn&#8217;t make sense to keep that on expensive SAN storage, of course, so this article struck me at just the right time to consider alternatives.</p><div id="attachment_801" class="wp-caption alignnone" style="width: 789px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center;"><img src="http://cdn.dltj.org/wp-content/uploads/2009/03/virtual-file-store-architecture.png" alt="Architecture Diagram for Iron Mountain&#039;s Virtual File Store service, showing the placement of the Virtual File Store appliance relative to other assets on the data center network" title="Virtual File Store Architecture" width="779" height="644" class="size-full wp-image-801" /><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Architecture Diagram for Iron Mountain's Virtual File Store service, showing the placement of the Virtual File Store appliance relative to other assets on the data center network.  Graphic from product datasheet (http://www.ironmountain.com/resources/vfs/virtual_file_store_datasheet.pdf)</p></div><p>According to the product literature, the service works by putting a black box on your network where one can drop files via CIFS or NFS.  The black box transfers the files over the internet to two Iron Mountain data centers.  
Files can then be retrieved via an on-line on-demand service or by exchanging physical media with Iron Mountain for bulk retrieval needs.</p><p>We know, of course, that digital preservation is more than just preserving the digital bits:  it is the intellectual exercise of describing the stored information, the effort of maintaining an accurate catalog of that information, and the burden of migrating file formats or emulating platforms to read old file formats.  Handling the raw bits is a big deal, too &#8212; checksumming to ensure unaltered status, refreshing files to new storage media, and protection from physical disasters.  This Iron Mountain solution seems to address this more mechanical portion of digital preservation, and it is one that probably can benefit from aggregating the service needs of many customers (and so is ripe for outsourcing).</p><p>Is anyone doing something similar with their physical preservation of digital media?  Are there other companies that do the same thing?  (I know of <a href="http://www.oclc.org/us/en/digitalarchive/" title="OCLC Digital Archive product page">OCLC&#8217;s Digital Archive service</a> &mdash; I did a <a href="http://dltj.org/article/oclc-digital-archive-vs-amazon-s3/" title="Long-term Preservation Storage: OCLC Digital Archive versus Amazon S3">comparison of it with Amazon S3</a> last year.)</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/outsource-digital-bits/feed/</wfw:commentRss> <slash:comments>10</slash:comments> </item> <item><title>Survey Responses Sought:  JPEG2000 for Still Images</title><link>http://dltj.org/article/jpeg2000-survey/</link> <comments>http://dltj.org/article/jpeg2000-survey/#comments</comments> <pubDate>Thu, 04 Sep 2008 19:31:50 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[JPEG2000]]></category> <category><![CDATA[digital libraries]]></category> <category><![CDATA[jpeg2000]]></category> <category><![CDATA[preservation]]></category> 
<category><![CDATA[survey]]></category><guid isPermaLink="false">http://dltj.org/?p=465</guid> <description><![CDATA[David Lowe, Preservation Librarian at the University of Connecticut, is coordinating a survey of JPEG2000 use for digital imagery. The survey asks questions about the use of the JPEG2000 file format (for archival purposes or for access systems), tools used &#8230; <a href="http://dltj.org/article/jpeg2000-survey/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=465"></abbr><p>David Lowe, Preservation Librarian at the University of Connecticut, is coordinating a survey of JPEG2000 use for digital imagery.  The survey asks questions about the use of the JPEG2000 file format (for archival purposes or for access systems), tools used (both JPEG2000 toolkits and software that embeds JPEG2000 toolkits), and considerations of mathematically lossless versus visually lossless compression settings.</p><p>This is his announcement:<br /><blockquote> I am writing to solicit your help with a survey of library-related digital project staff regarding the implementation of the JPEG 2000 standard for digital images (specifically still images and not motion). We estimate that this task will take approximately 15 minutes of your time. It is available now at: <a href="http://www.surveymonkey.com/s.aspx?sm=WXFAJwyRNZZilRWzrnum_2fw_3d_3d" title="JPEG2000 Survey">http://www.surveymonkey.com/s.aspx?sm=WXFAJwyRNZZilRWzrnum_2fw_3d_3d</a></p><p>The survey will remain active until October 31, 2008. 
Afterward, we will post the results via a report uploaded to our institutional repository, <a href="http://digitalcommons.uconn.edu/" title="DigitalCommons@UConn">digitalcommons.uconn.edu</a>.</p><p>Please note that in our report, personal information from the survey will not be revealed, and any comments used will remain unattributed unless the respondent prefers to be credited and indicates that desire in a separate email to me directly at david.lowe@uconn.edu.</p><p>Thank you for your help,</p><p>David Lowe<br />Preservation Librarian<br />UConn Libraries</p></blockquote><p>I encourage you to take the survey as well.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/jpeg2000-survey/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>Long-term Preservation Storage:  OCLC Digital Archive versus Amazon S3</title><link>http://dltj.org/article/oclc-digital-archive-vs-amazon-s3/</link> <comments>http://dltj.org/article/oclc-digital-archive-vs-amazon-s3/#comments</comments> <pubDate>Fri, 16 May 2008 11:55:36 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Raw Technology]]></category> <category><![CDATA[Amazon]]></category> <category><![CDATA[Amazon EC2]]></category> <category><![CDATA[Amazon S3]]></category> <category><![CDATA[OCLC]]></category> <category><![CDATA[preservation]]></category><guid isPermaLink="false">https://dltj.org/?p=361</guid> <description><![CDATA[Last month OCLC announced a new service offering for long-term storage of libraries&#8217; digital collections. 
Called Digital Archive&#8482;, it provides &#8220;a secure storage environment for you to easily manage and monitor the health of your master files and digital originals.&#8221; &#8230; <a href="http://dltj.org/article/oclc-digital-archive-vs-amazon-s3/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="https://dltj.org/?p=361"></abbr><p>Last month <a href="http://www.oclc.org/us/en/news/releases/200810.htm" title="Press Release: OCLC offers Digital Archive service for long-term digital storage">OCLC announced a new service offering for long-term storage of libraries&#8217; digital collections</a>.  Called <a href="http://www.oclc.org/us/en/digitalarchive/" title="OCLC Digital Archive homepage">Digital Archive&trade;</a>, it provides &#8220;a secure storage environment for you to easily manage and monitor the health of your master files and digital originals.&#8221;  Barbara Quint has an article in Information Today called &#8220;<a href="http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=49018" title="Information Today Article: OCLC Introduces High-Priced Digital Archive Service">OCLC Introduces High-Priced Digital Archive Service</a>&#8221; in which she makes a comparison to <a href="http://www.amazon.com/S3-AWS-home-page-Money/b/ref=sc_fe_l_2?ie=UTF8&amp;node=16427261" title="Amazon S3 product description">Amazon&#8217;s Simple Storage Service</a> (or &#8220;S3&#8221;) from primarily a cost perspective: &#8220;The price for S3 storage at Amazon Web Services is 15 cents a gigabyte a month or $1.80 a year, in comparison to OCLC’s $7.50 a gig.&#8221;  Barbara also goes into some of the technical differences, but I think it might be worthwhile to go into a little more depth on them.</p><p><h2>OCLC&#8217;s Digital Archive</h2><br />According to the <a href="http://www.oclc.org/us/en/digitalarchive/overview/" title="OCLC Digital Archive Service Overview">service 
overview</a>, Digital Archive is a content hosting service that provides:</p><ul type="square"><li>Systems management</li><li>Physical security</li><li>Data security</li><li>Data backups</li><li>Disaster recovery</li><li>ISO 9001 certification</li><li>Manifest verification</li><li>Virus check</li><li>Format verification</li><li>Fixity check</li></ul><p>It is targeted towards the preservation of digital masters.  There is a document on the Digital Archive website called <a href="http://www.oclc.org/us/en/digitalarchive/about/commitment/default.htm" title="&#039;Our commitment&#039; page on OCLC Digital Archive product site">Our commitment</a> that describes other aspects of a digital preservation program:  &#8220;OCLC is actively developing processes for full preservation of digital assets to ensure complete renderability, regardless of technology changes. This preservation system will likely involve a combination of migration and emulation.&#8221;  But it is not clear whether these services, beyond &#8220;bit preservation&#8221; activities, are part of the Digital Archive service or will be part of an add-on service to be developed later.</p><p>This &#8220;Digital Archive&#8221; is a revamping of an older product from OCLC, also called &#8220;Digital Archive&#8221; but one that included a web harvesting tools component.  The service and support documentation on the OCLC website still refers to the former version of Digital Archive, so there is little information about how the service works beyond what one can infer from the sales information.</p><p><h2>Amazon&#8217;s S3</h2><br />Amazon describes S3 as &#8220;a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. 
It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites.&#8221;  Files are transferred across the internet to Amazon&#8217;s services and stored in multiple data centers.  Files can be retrieved using standard HTTP mechanisms (the same protocol that powers the web) and are protected by an optional access control mechanism.  S3 does have a <a href="http://www.amazon.com/gp/browse.html?node=379654011" title="Amazon Web Services S3 Service Level Agreement">Service Level Agreement</a> (SLA) that offers guarantees on performance.</p><p>The SLA seems to extend only to availability of the service, not to a long-term commitment to keeping track of files on the service.<br /><blockquote>AWS [Amazon Web Services, LLC] will use commercially reasonable efforts to make Amazon S3 available with a Monthly Uptime Percentage (defined below) of at least 99.9% during any monthly billing cycle (the &#8220;Service Commitment&#8221;). In the event Amazon S3 does not meet the Service Commitment, you will be eligible to receive a Service Credit as described below.</p></blockquote><p> There is no mention specifically in the S3 SLA about permanence of file storage.  In lieu of that, one seems to be covered by the overarching <a href="http://www.amazon.com/AWS-License-home-page-Money/b/ref=sc_fe_c_0_16427261_10?ie=UTF8&#038;node=3440661" title="Amazon Web Services Customer Service Agreement">Amazon Web Services Customer Agreement</a>, which has several points of interest from a preservation use perspective:<br /><blockquote>3.3. Termination or Suspension by Us Other Than for Cause.<br />3.3.2. <i>Paid Services&#8230;</i>. 
We may suspend your right and license to use any or all Paid Services (and any associated Amazon Properties)&#8230;, or terminate this Agreement in its entirety (and, accordingly, cease providing all Services to you), for any reason or for no reason, at our discretion at any time by providing you sixty (60) days&#8217; advance notice in accordance with the notice provisions set forth in Section 15 below.</p></blockquote><p> So if they desire to terminate a library&#8217;s use of the service (assuming there was no specific cause &#8212; such as a violation of the terms of use &#8212; to do so), they have to give 60 days&#8217; notice.  That&#8217;s when the &#8220;Data Preservation in the Event of Suspension or Termination&#8221; clause kicks in:<br /><blockquote>3.7.2. In the Event of Termination Other Than for Cause. In the event of any termination by us of any Service or any set of Services, or termination of this Agreement in its entirety, other than a for cause termination under Section 3.4.1, (i) we will not take any action to intentionally erase any of your data stored on the Services for a period of thirty (30) days after the effective date of termination; and (ii) your post termination retrieval of data stored on the Services will be conditioned on your payment of Service data storage charges for the period following termination, payment in full of any other amounts due us, and your compliance with terms and conditions we may establish with respect to such data retrieval.</p></blockquote><p> The customer agreement then goes on to say:<br /><blockquote>3.8. 
Post-Termination Assistance. Following the suspension or termination of your right to use the Services by us or by you for any reason other than a for cause termination (i.e., a termination under Section 3.2 or under Section 3.3), you shall be entitled to take advantage of any post-termination assistance we may generally make available with respect to the Services, such as data retrieval arrangements we may elect to make available. We may also endeavor to provide you unique post-suspension or post-termination assistance, but we shall be under no obligation to do so. Your right to take advantage of any such assistance, whether generally made available with respect to the Services or made available uniquely to you, shall be conditioned upon your acceptance of and compliance with any fees and terms we specify for such assistance.</p></blockquote><p>Perhaps the most troubling aspect, from a preservation point-of-view, deals with data security and backups.  Specifically, Amazon says that data security and backups are the responsibility of the customer.  The Amazon Web Services Customer Agreement says (emphasis added):<br /><blockquote>7.2. Security. We strive to keep Your Content secure, but cannot guarantee that we will be successful at doing so, given the nature of the Internet. Accordingly, without limitation to Section 4.3 above and Section 11.5 below, <strong>you acknowledge that you bear sole responsibility for adequate security, protection and backup of Your Content.</strong> We strongly encourage you, where available and appropriate, to use encryption technology to protect Your Content from unauthorized access and to routinely archive Your Content. We will have no liability to you for any unauthorized access or use, corruption, deletion, destruction or loss of any of Your Content.</p></blockquote><p> That kind of security and data backup is something you&#8217;d want in a preservation service.  
Since activity against S3 storage is limited only by knowing a private &#8220;key&#8221;<sup><a href="http://dltj.org/article/oclc-digital-archive-vs-amazon-s3/#footnote_0_361" id="identifier_0_361" class="footnote-link footnote-identifier-link" title="S3 uses secret keys &amp;#8212; a 40-character password &amp;#8212; to verify the identity of the client making the request.  If the private key becomes known, anyone on the internet can perform operations as the content owner.">1</a></sup> (as opposed to limiting to particular IP addresses or not allowing deletes/modifications from the web at all), it is a real possibility that the archive can be harmed if the private key is disclosed.  Furthermore, S3 does not have a backup/restore service for retrieving files that were accidentally or maliciously deleted.</p><p><h2>Feature Comparison</h2><br />It is useful to compare Amazon&#8217;s S3 on a point-by-point basis with OCLC&#8217;s Digital Archive service to try to put some meaning behind the cost numbers.</p><table><tr><th></th><th style="padding: .25em 1.5em;">OCLC Digital Archive</th><th>Amazon S3</th></tr><tr><td>Systems management</td><td>Yes</td><td>Yes</td></tr><tr><td>Physical security</td><td>Yes</td><td>Yes</td></tr><tr><td>Data security</td><td>Yes</td><td>No</td></tr><tr><td>Data backups</td><td>Yes</td><td>No</td></tr><tr><td>Disaster recovery</td><td>Yes</td><td>unclear</td></tr><tr><td>ISO 9001 certification</td><td colspan="2">whatever the heck that might mean in this context</td></tr><tr><td>Manifest verification</td><td>Yes</td><td>No</td></tr><tr><td>Format verification</td><td>Yes</td><td>No</td></tr><tr><td>Virus check</td><td>Yes</td><td>No</td></tr><tr><td>Fixity check</td><td>Yes</td><td>No</td></tr><tr><td>&#8220;Light archive&#8221; capability</td><td>No</td><td>Yes</td></tr></table><p>This is a useful comparison because it would indicate what one would have to layer on top of S3 to reach the level of service provided by Digital 
Archive.  For instance, it would be possible to create an application that would perform the manifest and format verifications as well as the periodic virus and fixity checks against the files in S3.  It would even be possible to run that application in <a href="http://www.amazon.com/EC2-AWS-Service-Pricing/b/ref=sc_fe_l_2?ie=UTF8&amp;node=201590011&amp;" title="Amazon Web Services EC2 homepage">Amazon&#8217;s Elastic Compute Cloud</a> (EC2) &#8212; a &#8220;virtual computing environment&#8221; that allows developers to easily create and deploy software on the internet.  Since data transferred between Amazon EC2 and Amazon S3 is free of charge, there wouldn&#8217;t be the S3 cost of periodically downloading the data to perform the virus and fixity checks.</p><p>One advantage to note about the S3 solution is that it can perform as a &#8220;light archive&#8221; &#8212; meaning the data is available to users in addition to being part of the content repository.  In contrast, the OCLC Digital Archive service is a &#8220;dark archive&#8221;: access to the data is highly or completely restricted.  Still, the lack of automated backups and of a robust data security infrastructure in S3 is notable from a preservation data service perspective.</p><p><h2>Cost Comparison</h2><br />To examine the similarities and differences in costs, let&#8217;s use the OhioLINK Satellite Image collection as a prototypical example.  It consists of about 2 terabytes (2TB) of high-quality images in TIFF format, with about 7.5GB of data going into the repository each month.  
In the interest of exploring everything that S3 can do, there is an assumption that approximately 4GB of data will be transferred out of the archive each month; OCLC&#8217;s Digital Archive does not have an end-user dissemination component.</p><table><tr><th></th><th colspan="2" style="text-align:center;padding: .25em 1.5em; border-bottom: 1px solid black;">OCLC Digital Archive</th><th colspan="2" style="text-align:center; border-bottom: 1px solid black;">Amazon S3</th></tr><tr><th></th><th>Rate</th><th>Cost</th><th>Rate</th><th>Cost</th></tr><tr><td>Setup Cost</td><td colspan="2" style="text-align:center;"><i>- &#8211; - redacted &#8211; - -</i></td><td colspan="2" style="text-align:center;"><i>- &#8211; - none &#8211; - -</i></td></tr><tr><td>Startup Ingest Cost</td><td colspan="2" style="text-align:center;"><i>- &#8211; - redacted &#8211; - -</i></td><td style="padding-right: 1.25em;">$0.10/GB into S3 [#1]</td><td>$200</td></tr><tr><td>Initial Storage Cost</td><td style="padding-right: 1.25em;">$750/100GB/year [#2]</td><td>$15,000/year</td><td>$0.15/GB/month</td><td>$3,600/year</td></tr><tr><td colspan="5"><hr style="width: 85%;" /></td></tr><tr><td>Ongoing Ingest Cost</td><td colspan="2" style="text-align:center;"><i>- &#8211; - redacted &#8211; - -</i></td><td>$0.10/GB into S3 [#1]</td><td>$9/year</td></tr><tr><td valign="top" style="padding-right: 1.25em;">Ongoing Storage Cost</td><td valign="top">$750/100GB/year [#2]</td><td style="margin-right: 1.25em;">previous year<br />plus $750/year [#3]</td><td valign="top">$0.15/GB/month</td><td>previous year<br />plus $10.80/year [#3]</td></tr><tr><td colspan="5"><hr style="width: 85%;" /></td></tr><tr><td>Ongoing Access Cost</td><td colspan="2" style="text-align:center;"><em>Not available</em></td><td>varies [#1, #4]</td><td>$8.16/year</td></tr></table><div style="font-size: 85%; margin-left: 2em; margin-top: 1em;">Note #1: Amazon S3 also adds charges by HTTP request, but those are considered negligible for the data 
load and the ongoing accesses.</p><p>Note #2: As listed in <a href="http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=49018" title="Information Today Article: OCLC Introduces High-Priced Digital Archive Service">Barbara Quint&#8217;s article</a>.  Charge is for any part of 100GB used.</p><p>Note #3: Additions each year factor in the assumption of adding 90GB/year to the collection.</p><p>Note #4: Costs for transfers out of S3 are:  $0.17/GB for the first 10TB/month; $0.13/GB for the next 40TB/month; $0.11/GB for the next 100TB/month; and $0.10/GB for outflowing data over 150TB/month.</p></div><p>For this prototypical example, S3 would cost $3,800 in the first year and roughly $3,615 per year after that, with the added benefit that end-users could access the content without using our infrastructure.  There are costs associated with the OCLC Digital Archive service that had to be redacted from the public version of this table due to a confidentiality clause, but the costs that are assumed for ongoing storage based on Barbara Quint&#8217;s article are comparable to the quote I got from OCLC and represent a large portion of the total yearly costs.</p><p>By way of comparison, we are planning the purchase of 50TB of storage this summer for roughly $250K; that is about $5,000/TB.  Amortize the cost of the hardware over five years, assume that an additional 150% of the purchase price covers maintenance, personnel support, and other factors, and we get $2,500/TB/year ($5,000 &#215; 2.5 &#247; 5 years).  This doesn&#8217;t include software costs, so it is comparable to S3 in the functions table above; software would have to be written to verify the manifest and file formats on ingest as well as the monthly fixity and virus scanning.  
It also represents only one copy of the data; it does not include the duplication across data centers that both Digital Archive and S3 provide.</p><p><h2>Conclusions</h2></p><p>OCLC&#8217;s Digital Archive product goes pretty far down the path of a preservation-worthy archive of digital files.  The value-added services, in addition to simply storing and retrieving files, make it as close to a one-stop shop as I&#8217;ve seen so far.  Whether outsourced digital preservation services make sense &#8212; particularly at this price point &#8212; remains to be seen, especially since it is hard to make a comparison: I&#8217;m betting that most of us aren&#8217;t (yet) doing all of the ongoing activities with digital preservation masters that Digital Archive is doing.</p><p>Amazon&#8217;s S3 is an inexpensive, network-oriented file hosting service, and as such it doesn&#8217;t have many of the features built into it that we would want to see in a preservation archive service.  Beyond raw file service, one would need to add layers of software and human activities to perform the functions that Digital Archive provides now.</p><p>Looking at OCLC&#8217;s Digital Archive and Amazon S3 is almost an apples-to-oranges comparison, both in price and in functionality.  Comparing functionality first, S3 is missing critical components of a preservation storage system &#8212; namely, rigorous access control and a content backup/restore facility.  Comparing costs, though, S3 is dramatically cheaper&#8230;and has the benefit of serving up large files to end-users using Amazon&#8217;s distributed infrastructure.</p><p>It is possible to level the functionality playing field a bit by taking responsibility for the ongoing maintenance of files in the S3 archive &#8212; those things that Digital Archive offers as value-added services over raw file storage.  An EC2 virtual machine running in Amazon&#8217;s infrastructure can perform the virus and fixity scanning.  
And with good key maintenance (as with passwords, regularly changing the private key and securing it appropriately), S3 could conceivably serve as the offsite copy of content that is also stored offline (e.g. burned to preservation-quality optical media).  Again, in this scenario one has to take responsibility for refreshing the offline media and occasionally running comparisons against the S3 offsite copy.</p><h2>Footnotes</h2><ol class="footnotes"><li id="footnote_0_361" class="footnote">S3 uses secret keys &#8212; a 40-character password &#8212; to verify the identity of the client making the request.  If the private key becomes known, anyone on the internet can perform operations as the content owner.</li></ol>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/oclc-digital-archive-vs-amazon-s3/feed/</wfw:commentRss> <slash:comments>15</slash:comments> </item> <item><title>Preserving Digital Video</title><link>http://dltj.org/article/preserving-digital-video/</link> <comments>http://dltj.org/article/preserving-digital-video/#comments</comments> <pubDate>Tue, 08 Apr 2008 20:22:14 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Raw Technology]]></category> <category><![CDATA[accessibility]]></category> <category><![CDATA[digitization]]></category> <category><![CDATA[preservation]]></category> <category><![CDATA[standards]]></category> <category><![CDATA[video]]></category><guid isPermaLink="false">https://dltj.org/?p=348</guid> <description><![CDATA[My place of work is looking to acquire educational videos in a digital form with an eye towards long-term preservation. 
At this point we receive a physical form (preferably DVD, but sometimes VHS) and digitize it to a very lossy &#8230; <a href="http://dltj.org/article/preserving-digital-video/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="https://dltj.org/?p=348"></abbr><p>My place of work is looking to acquire educational videos in a digital form with an eye towards long-term preservation.  At this point we receive a physical form (preferably DVD, but sometimes VHS) and digitize it to a very lossy access format (RealMedia, in this case).  With this change, we would get a preservation-worthy digital copy from the producer/distributor and forego the physical version.</p><p>There is quite a lot written on preserving video, but I wanted to distill the requirements down into statements that vendors could reasonably provide today.  I think these are pretty sound requirements, but I&#8217;m looking for feedback.  In particular, I&#8217;m not quite sure how to handle the transfer of closed caption text from the publisher/distributor; suggestions are welcome.<br /><span id="more-348"></span><br />[Jester's note:  I just realized that an earlier version of this posting went out to the net about two hours before this "final" version.  Sorry about publishing the work-in-progress early; I must have hit the wrong button in the new version of WordPress...]</p><p><h2>File Formats</h2><br />Some of the clearest guidance on file formats comes from this short excerpt from the Moving Image section of the <a href="http://www.ahds.ac.uk/" title="The Arts and Humanities Data Service homepage">U.K. Arts and Humanities Data Service</a> <a href="http://www.ahds.ac.uk/preservation/ahds-preservation-documents.htm" title="AHDS Repository Policies and Procedures">Preservation Handbook</a>:</p><blockquote><p>Guidance on the preservation of digital video should, by necessity, change over time. [...] 
The MPEG-2 and MPEG-4 formats are better suited to high-quality digital video. MPEG-2 is better known for its use as a format for DVD-Video, which encourages confidence when considering the likelihood that the format will be readable in the long-term. The format has an average transfer rate of 2-5 megabits per second, but there may be disk space restraints and the software tools necessary to convert and store this format are costly. MPEG-4 has a lower transfer rate of 1-2 megabits per second and is intended for streaming video. Other codecs, such as QuickTime, Windows Media, Real Video and Open DIVX, are useful for specific purposes, but not suitable for preservation. <sup><a href="http://dltj.org/article/preserving-digital-video/#footnote_0_348" id="identifier_0_348" class="footnote-link footnote-identifier-link" title="Knight, G., &amp;amp; McHugh, J. (2005). Preservation Handbook: Moving Image.  p. 3.">1</a></sup></p></blockquote><p>The Library of Congress Sustainability of Digital Formats site has <a href="http://www.digitalpreservation.gov/formats/fdd/fdd000028.shtml" title="http://www.digitalpreservation.gov/formats/fdd/fdd000028.shtml">an entry for MPEG-2</a> (also known as H.262) and <a href="http://www.digitalpreservation.gov/formats/fdd/fdd000155.shtml" title="MPEG-4 File Format, Version 2">an entry for MPEG-4</a> (more completely, MPEG-4 file format version #2) that give the nitty-gritty details for the file formats.</p><p>The preservation master copies we want to store have a frame size of 720 pixels by 480 pixels.  (That size is for NTSC format videos, common in USA, Canada and Japan.  Master copies of PAL-format videos, common in Australia, New Zealand, the United Kingdom and most of Europe, are 720 x 576.)  
This is the standard resolution used in MPEG-2-compressed commercially distributed DVD movies.<sup><a href="http://dltj.org/article/preserving-digital-video/#footnote_1_348" id="identifier_1_348" class="footnote-link footnote-identifier-link" title="Audio/Video Capture and Management (2002).">2</a></sup> These frame sizes are appropriate for analog video signals.  (&#8220;As defined by ITU-R Recommendation BT.601, more commonly know by the abbreviations Rec. 601 or BT.601 or its former name, CCIR 601. [It is] a standard published by the CCIR (now ITU-R) for encoding interlaced analogue video signals in digital form.&#8221;<sup><a href="http://dltj.org/article/preserving-digital-video/#footnote_2_348" id="identifier_2_348" class="footnote-link footnote-identifier-link" title="&amp;#8220;Rec. 601&amp;#8221; (2008).">3</a></sup> )  The audio is 48 kHz stereo at 224 kb/s or better.</p><p><h2>Captioning Text</h2><br />There appear to be two primary schemes for binding closed-caption text with video files.  One, from the W3C, is <a href="http://www.w3.org/AudioVideo/" title="http://www.w3.org/AudioVideo/">Synchronized Multimedia Integration Language</a> (or SMIL), an XML format used by many media players.  The other is Microsoft&#8217;s <a href="http://msdn2.microsoft.com/en-us/library/ms971327.aspx" title="Object moved">Synchronized Accessible Media Interchange</a> (or SAMI), a pseudo-HTML format that is read only by Windows Media Player.</p><p>To make matters more complicated, a whole set of different schemes are used for DVDs.  (On VHS recordings, closed caption text was encoded in one of the non-visible lines that make up the video signal.  Since the DVD format only included visible lines, other schemes were required.)  The most popular seems to be the <a href="http://www.fileinfo.net/extension/scc" title="SCC File Extension - Open .SCC files">Scenarist Closed Caption (SCC) format</a>.  
This is a binary file that exists on the DVD alongside the video files.</p><p><h2>Resources Consulted</h2></p><div style="line-height:1.1em;margin-left:0.5in;text-indent:-0.5in;margin-top:1.5em;"><p style="margin:0">Arms, C. R., &amp; Fleischhauer, C. Sustainability of Digital Formats: Planning for Library of Congress Collections. <span style="font-style:italic;">National Digital Information Infrastructure and Preservation Program</span>. Retrieved April 8, 2008, from <a href="http://www.digitalpreservation.gov/formats/" title="Sustainability of Digital Formats: Planning for Library of Congress Collections">http://www.digitalpreservation.gov/formats/</a>.</p><p style="margin:0"><span style="font-style:italic;">Audio/Video Capture and Management</span>. (2002). In <span style="font-style:italic;">NINCH Guide to Good Practice</span> (1st). Retrieved April 8, 2008, from <a href="http://www.nyu.edu/its/humanities/ninchguide/VII/" title="NINCH Guide to Good Practice">http://www.nyu.edu/its/humanities/ninchguide/VII/</a>.</p><p style="margin:0">Guideline H: Provide access to multimedia presentations for users with sensory disabilities. <span style="font-style:italic;">Accessible Digital Media: Design Guidelines for Electronic Publications, Multimedia and the Web</span>.  Retrieved 14-Apr-2008 from <a href="http://ncam.wgbh.org/invent_build/web_multimedia/accessible-digital-media-guide/guideline-h-multimedia" title="Accessible Digital Media: Guideline H: Multimedia">http://ncam.wgbh.org/publications/adm/guideline_h.html</a>.</p><p style="margin:0">Knight, G., &amp; McHugh, J. (2005). <span style="font-style:italic;">Preservation Handbook: Moving Image</span>. AHDS Preservation Handbook. 8 p. Arts and Humanities Data Service. 
Retrieved April 8, 2008, from <a href="http://www.ahds.ac.uk/preservation/video-preservation-handbook.pdf" title="AHDS&#039;s Preservation Handbook: Moving Image">http://ahds.ac.uk/preservation/video-preservation-handbook.pdf</a>.</p><p style="margin:0">Rec. 601. (2008, April 8). <span style="font-style:italic;">Wikipedia, the free encyclopedia</span>. Retrieved April 8, 2008, from <a href="http://en.wikipedia.org/wiki/Rec._601" title="http://en.wikipedia.org/wiki/Rec._601">http://en.wikipedia.org/wiki/Rec._601</a> (<a href="http://en.wikipedia.org/wiki/Rec._601?oldid=204278564" title="http://en.wikipedia.org/wiki/Rec._601?oldid=204278564">version at time of citation</a>).</p></div><p style="padding:0;margin:0;font-style:italic;">The text was modified to update a link from http://ahds.ac.uk/ to http://www.ahds.ac.uk/ on January 28th, 2011.</p><p style="padding:0;margin:0;font-style:italic;">The text was modified to update a link from http://ahds.ac.uk/preservation/ahds-preservation-documents.htm to http://www.ahds.ac.uk/preservation/ahds-preservation-documents.htm on January 28th, 2011.</p><p style="padding:0;margin:0;font-style:italic;">The text was modified to update a link from http://ahds.ac.uk/preservation/video-preservation-handbook.pdf to http://www.ahds.ac.uk/preservation/video-preservation-handbook.pdf on January 28th, 2011.</p><p style="padding:0;margin:0;font-style:italic;">The text was modified to update a link from http://ncam.wgbh.org/publications/adm/guideline_h.html to http://ncam.wgbh.org/invent_build/web_multimedia/accessible-digital-media-guide/guideline-h-multimedia on January 28th, 2011.</p><h2>Footnotes</h2><ol class="footnotes"><li id="footnote_0_348" class="footnote">Knight, G., &amp; McHugh, J. 
(2005). <span style="font-style:italic;"><a href="http://www.ahds.ac.uk/preservation/video-preservation-handbook.pdf" title="http://ahds.ac.uk/preservation/video-preservation-handbook.pdf">Preservation Handbook: Moving Image</a></span>.  p. 3.</li><li id="footnote_1_348" class="footnote"><a href="http://www.nyu.edu/its/humanities/ninchguide/VII/" title="Audio/Video Capture and Management chapter of NINCH Guide to Good Practice">Audio/Video Capture and Management</a> (2002).</li><li id="footnote_2_348" class="footnote">&#8220;Rec. 601&#8221; (2008).</li></ol>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/preserving-digital-video/feed/</wfw:commentRss> <slash:comments>5</slash:comments> </item> </channel> </rss>
<!-- Served from: dltj.org @ 2012-02-11 12:16:07 by W3 Total Cache -->
