<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"><channel><title>Disruptive Library Technology Jester &#187; Internet Archive</title> <atom:link href="http://dltj.org/tag/internetarchive/feed/" rel="self" type="application/rss+xml" /><link>http://dltj.org</link> <description>We&#039;re Disrupted, We&#039;re Librarians, and We&#039;re Not Going to Take It Anymore</description> <lastBuildDate>Mon, 06 Feb 2012 20:04:22 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <cloud domain='dltj.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' /> <creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/3.0/us/</creativeCommons:license> <item><title>Thursday Threads: Open Source in Health Care, The Big Deal, Archives of Web Pages</title><link>http://dltj.org/article/thursday-threads-2011w11/</link> <comments>http://dltj.org/article/thursday-threads-2011w11/#comments</comments> <pubDate>Thu, 17 Mar 2011 10:46:53 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Thursday Threads]]></category> <category><![CDATA[ejournal]]></category> <category><![CDATA[Internet Archive]]></category> <category><![CDATA[open source]]></category><guid isPermaLink="false">http://dltj.org/?p=2719</guid> <description><![CDATA[We&#8217;re taking a break this week from the HarperCollins e-book story; although the commentary continues from librarians (and a few authors), there hasn&#8217;t been anything new 
(that I&#8217;ve seen) from HarperCollins itself. &#8230; <a href="http://dltj.org/article/thursday-threads-2011w11/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2719"></abbr><div id="feedburner-thursday-threads-email-2011w11" class="wp-caption alignright noprint noFrontPage" style="width: 230px;;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><form style="border: 1px solid rgb(204, 204, 204); padding: 3px; margin: 0pt; text-align: center;" action="http://feedburner.google.com/fb/a/mailverify" method="post" target="popupwindow" onsubmit="window.open('http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads', 'popupwindow', 'scrollbars=yes,width=550,height=520');return true"><p>Receive <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym></i> Thursday Threads:</p><p>by <a href="http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads&#038;loc=en_US" title="D.L.T.J. Thursday Threads Email Subscription">E-mail</a><br /><input style="width: 140px;" name="email" value="Your e-mail address" onfocus="if (this.defaultValue==this.value) this.value = ''" type="text"/><input value="thursday-threads" name="uri" type="hidden"/><input name="loc" value="en_US" type="hidden"/><input value="Subscribe" type="submit"/></p><p>by <a href="http://feeds.dltj.org/thursday-threads/" title="D.L.T.J. 
Thursday Threads RSS Feed">RSS</a></p><p style="font-size: 80%;">Delivered by <a href="http://feedburner.google.com" target="_blank" title="Google Feedburner Service">FeedBurner</a></p></form></div><p> We&#8217;re taking a break this week from the HarperCollins e-book story; although the <a href="http://scienceblogs.com/confessions/2011/03/around_the_web_harpercollins_l.php" title="">commentary continues</a> from librarians (and a few authors), there hasn&#8217;t been anything new (that I&#8217;ve seen) from HarperCollins itself.  There is still plenty more to look at, though.  First up is a <a href="#health-care">report from the health care sector</a> on the applicability of open source and open systems.  Next is an <a href="#big-deal">interview with a financial analyst</a> that sees the end of the &#8220;big deal&#8221; for library journal subscriptions.  And lastly is a list of <a href="#web-archive">web archive services</a> that you could use to find old copies of web pages.</p><p>Feel free to send this to others you think might be interested in the topics.  If you find these threads interesting and useful, you might want to add the <a href="http://feeds.dltj.org/thursday-threads/" title="RSS Feed for DLTJ Thursday Threads">Thursday Threads RSS Feed</a> to your feed reader or subscribe to e-mail delivery using the form to the right.  If you would like a more raw and immediate version of these types of stories, watch <a href="http://friendfeed.com/dltj" title="Peter Murray - FriendFeed">my FriendFeed stream</a> (or subscribe to <a href="http://friendfeed.com/dltj?format=atom" title="Atom feed for Peter Murray's FriendFeed account">its feed</a> in your feed reader).  
Comments and tips, as always, are <a href="http://dltj.org/contact">welcome</a>.</p><p><h2 id="health-care">Open Source, Open Standards, and Health Care Information Systems</h2></p><blockquote><p>Recognition of the improvements in patient safety, quality of patient care, and efficiency that health care information systems have the potential to bring has led to significant investment. Globally the sale of health care information systems now represents a multibillion dollar industry. As policy makers, health care professionals, and patients, we have a responsibility to maximize the return on this investment. To this end we analyze alternative licensing and software development models, as well as the role of standards. We describe how licensing affects development. We argue for the superiority of open source licensing to promote safer, more effective health care information systems. We claim that open source licensing in health care information systems is essential to rational procurement strategy.</p></blockquote><p>This might be a useful data point for libraries considering open source for their mission-critical applications.  Two U.K. authors have published a <a href="http://www.jmir.org/2011/1/e24/" title="Open Source, Open Standards, and Health Care Information Systems | Journal of Medical Internet Research">report</a> that reviews the general benefits of open source and open standards, noting in one heading that &#8220;Open Standards Facilitate Competition Between Open Source Software and Proprietary Software&#8221;.  They also compare open source development practices with those of proprietary software development and look at barriers to the adoption of open source software.  Much of the analysis is specific to health care information systems, but the report would be a useful template for applying the same analysis to core library systems.  
[Via <a href="http://technews.acm.org/archives.cfm?fo=2011-03-mar/mar-09-2011.html" title="ACM TechNews for March 9, 2011">ACM TechNews</a>]</p><p><span class="Z3988" title="ctx_ver=Z39.88-2004&#038;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&#038;rft.jtitle=Journal+of+Medical+Internet+Research&#038;rft_id=info%3Adoi%2F10.2196%2Fjmir.1521&#038;rfr_id=info%3Asid%2Fresearchblogging.org&#038;rft.atitle=Open+Source%2C+Open+Standards%2C+and+Health+Care+Information+Systems&#038;rft.issn=1438-8871&#038;rft.date=2011&#038;rft.volume=13&#038;rft.issue=1&#038;rft.spage=&#038;rft.epage=&#038;rft.artnum=http%3A%2F%2Fwww.jmir.org%2F2011%2F1%2Fe24%2F&#038;rft.au=Reynolds%2C+Carl+J.&#038;rft.au=Wyatt%2C+Jeremy+C.&#038;rfe_dat=bpr3.included=1;bpr3.tags=Computer+Science+%2F+Engineering" style="font-style:italic;">Reynolds, Carl J., &#038; Wyatt, Jeremy C. (2011). Open Source, Open Standards, and Health Care Information Systems. <span style="font-style: italic;">Journal of Medical Internet Research, 13</span> (1) DOI: <a rev="review" href="http://dx.doi.org/10.2196/jmir.1521" title="DOI redirect to journal article">10.2196/jmir.1521</a></span></p><p><h2 id="big-deal">The Demise of the Big Deal?</h2></p><blockquote><p><i>Interview question: You, however, believe that publishers will simply have to accept that their revenues are going to fall, because there really is no more money?</i></p><p><b>Claudio Aspesi:</b> I have no doubt that — over time — adjustments would be made. 
But it remains to be seen if they need all the 2,200/2,400 journals that the each of the largest publishers maintain today.</p><p>You know, my job is not to pass judgement on how people run their business or to decry capitalism, only to advise investors whether they should buy or sell stocks.</p><p>I can observe, however, that there is something unhealthy about an industry which has managed to alienate its customers to the point their membership associations increasingly focus time and attention on how to overturn the industry structure. It is not a good thing to have your customers spend their time trying to put you out of business.</p></blockquote><p>Richard Poynder <a href="http://poynder.blogspot.com/2011/03/demise-of-big-deal.html" title="The Demise of the Big Deal? | Poynder's Open and Shut blog">interviews Claudio Aspesi</a>, a financial analyst based at the <a href="http://en.wikipedia.org/wiki/Sell_side" title="Sell side | Wikipedia">sell-side</a> research firm <a href="https://www.bernsteinresearch.com/BRWEB/Public/Login.aspx?ReturnUrl=%2fbrweb%2fHome.aspx" title="Bernstein Research homepage">Sanford Bernstein</a>.  Aspesi issued a report last year that was critical of the financial outlook of Reed Elsevier and more recently has downgraded the outlook to “<a href="http://www.investopedia.com/terms/u/underperform.asp" title="Underperform Definition | Investopedia">underperform</a>”.  This interview gets into the reasoning behind Aspesi&#8217;s decision.</p><p><h2 id="web-archive">Archives of Dead Web Pages: Wayback, Cache, and More</h2></p><blockquote><p>The Web changes constantly, and sometimes that page that had just the information you needed yesterday (or last month or two years ago) is not available today. At other times you may want to see how a page&#8217;s content or design has changed. There are several sources for finding Web pages as they used to exist. 
While Google&#8217;s cache is probably the best known, the others are important alternatives that may have pages not available at Google or the Wayback Machine plus they may have an archived page from a different date. The table below notes the name of the service, the way to find the archived page, and some notes that should give some idea as to how old a page the archive may contain.</p></blockquote><p>Although <a href="http://www.searchengineshowdown.com/others/archive.shtml" title="Archives of Dead Web Pages: Wayback, Cache, and More | Search Engine Showdown">this list</a> is over three years old, many of the services are still active.  One notable addition is a <a href="http://waybackmachine.org/" title="Internet Archive Wayback Machine">beta test version</a> of the Internet Archive&#8217;s Wayback Machine, which has an improved interface and a more up-to-date archive of pages.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/thursday-threads-2011w11/feed/</wfw:commentRss> <slash:comments>6</slash:comments> </item> <item><title>Thursday Threads: HarperCollins Ebook Terms, Internet Archive Ebook Sharing, Future of Collections</title><link>http://dltj.org/article/thursday-threads-2011w9/</link> <comments>http://dltj.org/article/thursday-threads-2011w9/#comments</comments> <pubDate>Thu, 03 Mar 2011 03:35:45 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Thursday Threads]]></category> <category><![CDATA[David Lewis]]></category> <category><![CDATA[disruptive innovation]]></category> <category><![CDATA[ebooks]]></category> <category><![CDATA[HarperCollins-OverDrive controversy]]></category> <category><![CDATA[Internet Archive]]></category> <category><![CDATA[licensing]]></category> <category><![CDATA[Open Library]]></category><guid isPermaLink="false">http://dltj.org/?p=2690</guid> <description><![CDATA[It is an all e-books edition of DLTJ 
Thursday Threads this week. The biggest news was the announcement of the policy change by HarperCollins for ebooks distributed through OverDrive. Beyond that, though, was an &#8230; <a href="http://dltj.org/article/thursday-threads-2011w9/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2690"></abbr><div id="feedburner-thursday-threads-email-2011w09" class="wp-caption alignright noprint noFrontPage" style="width: 230px;;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><form style="border: 1px solid rgb(204, 204, 204); padding: 3px; margin: 0pt; text-align: center;" action="http://feedburner.google.com/fb/a/mailverify" method="post" target="popupwindow" onsubmit="window.open('http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads', 'popupwindow', 'scrollbars=yes,width=550,height=520');return true"><p>Receive <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym></i> Thursday Threads:</p><p>by&nbsp;<a href="http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads&amp;loc=en_US" title="D.L.T.J. Thursday Threads Email Subscription">E-mail</a><br /><input style="width: 140px;" name="email" value="Your e-mail address" onfocus="if (this.defaultValue==this.value) this.value = ''" type="text"/><input value="thursday-threads" name="uri" type="hidden"/><input name="loc" value="en_US" type="hidden"/><input value="Subscribe" type="submit"/></p><p>by&nbsp;<a href="http://feeds.dltj.org/thursday-threads/" title="D.L.T.J. Thursday Threads RSS Feed">RSS</a></p><p style="font-size: 80%;">Delivered by <a href="http://feedburner.google.com" target="_blank" title="Google Feedburner Service">FeedBurner</a></p></form></div><p> It is an all e-books edition of <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym> Thursday Threads</i> this week.  
The biggest news was the <a href="#hcod">announcement of the policy change</a> by HarperCollins for ebooks distributed through OverDrive.  Beyond that, though, was an announcement of a <a href="#ia-ol-ill">new sharing model and program</a> through the Internet Archive.  Lastly is a slidecast recording of a presentation by David Lewis on the <a href="#collections-futures">future of library collections</a>.</p><p>Before continuing, a quick apology and explanation.  E-mail readers received a pair of extra Thursday Threads messages and RSS subscribers got a dump of unrelated posts; I&#8217;m sorry.  The cause was an update of this blog&#8217;s WordPress software to <a href="http://wordpress.org/news/2011/02/threeone/" title="WordPress 3.1, lots of fun">version 3.1</a> and a conflict (<a href="http://wordpress.org/support/topic/plugin-simple-tags-category-archive-wordpress-31" title="WordPress &#8250; Support &raquo; [Plugin: Simple Tags] Category Archive - WordPress 3.1">maybe this one</a>) with the <a href="http://wordpress.org/extend/plugins/simple-tags/" title="WordPress &#8250; Simple Tags &laquo; WordPress Plugins">SimpleTags</a> plugin.  I believe all is well, but I won&#8217;t know until this post is published.</p><p>Feel free to send this to others you think might be interested in the topics.  If you find these threads interesting and useful, you might want to add the <a href="http://feeds.dltj.org/thursday-threads/" title="RSS Feed for DLTJ Thursday Threads">Thursday Threads RSS Feed</a> to your feed reader or subscribe to e-mail delivery using the form to the right.  If you would like a more raw and immediate version of these types of stories, watch <a href="http://friendfeed.com/dltj" title="Peter Murray - FriendFeed">my FriendFeed stream</a> (or subscribe to <a href="http://friendfeed.com/dltj?format=atom" title="Atom feed for Peter Murray's FriendFeed account">its feed</a> in your feed reader).  
Comments and tips, as always, are <a href="http://dltj.org/contact">welcome</a>.</p><p><h2 id="hcod">HarperCollins Puts 26 Loan Cap on Ebook Circulations</h2></p><blockquote><p>In the first significant revision to lending terms for ebook circulation, HarperCollins has announced that new titles licensed from library ebook vendors will be able to circulate only 26 times before the license expires.</p><p>Mention of the new terms was first made in a letter from OverDrive CEO Steve Potash to customers yesterday. He wrote [emphasis in original]:</p><blockquote><p>[W]e have been required to accept and accommodate new terms for eBook lending as <strong><em>established by certain publishers</em>.</strong> Next week, OverDrive will communicate a licensing change from a publisher that, while still operating under the one-copy/one-user model, will include a checkout limit for each eBook licensed. Under this publisher&#8217;s requirement, for every new eBook licensed, the library (and the OverDrive platform) will make the eBook available to one customer at a time until the total number of permitted checkouts is reached.</p></blockquote><p>Though the letter leaves the publisher unnamed, HarperCollins confirmed today to <em>[Library Journal]</em> that it is the publisher referred to.</p></blockquote><p>In an odd one-two punch, this past week saw a disturbance in the status quo of e-book licensing.  The first punch came in the <a href="http://librarianbyday.net/localwp-content/uploads/2011/02/OverDrive-Library-Partner-Update-from-Steve-Potash-2-24-2011.pdf" title="Letter from Steve Potash of OverDrive">letter from OverDrive</a> [PDF] (part of which is quoted in the Library Journal article excerpted above).  
The second came in <a href="http://www.libraryjournal.com/lj/home/889452-264/harpercollins_puts_26_loan_cap.html.csp" title="HarperCollins Puts 26 Loan Cap on Ebook Circulations | Library Journal">that Library Journal article</a>, where we learned that the publisher pushing for the change of terms is HarperCollins.  Since then the change has been the subject of a great deal of discussion by librarians and a few <a href="http://www.courtneymilan.com/ramblings/2011/02/25/on-eating-your-seed-corn/" title="On eating your seed corn | Courtney Milan&#8217;s Blog">authors</a>, much of it in the form of <a href="http://search.twitter.com/search?q=%23hcod" title="#hcod - Twitter Search">tweets with the hashtag &#8220;#hcod&#8221;</a> (short for HarperCollins-OverDrive).  Damage control has come in the form of open letters from <a href="http://overdriveblogs.com/library/2011/03/01/a-message-from-overdrive-on-harpercollins-new-ebook-licensing-terms/" title="A message from OverDrive on HarperCollins&#8217; new eBook licensing terms | OverDrive&#039;s Digital Library Blog">OverDrive</a> and <a href="http://harperlibrary.typepad.com/my_weblog/2011/03/open-letter-to-librarians.html" title="Open Letter to Librarians | Library Love Fest">HarperCollins</a>.  There has been a <a href="http://loosecannonlibrarian.net/?p=396" title="On Boycotts and Readers&#8217; Rights | Loose Cannon Librarian">call</a> for a <a href="http://boycottharpercollins.com/" title="Boycott HarperCollins">boycott</a>.  
Bobbi Newman, <a href="http://librarianbyday.net/2011/02/25/publishing-industry-forces-overdrive-and-other-library-ebook-vendors-to-take-a-giant-step-back/" title="Publishing Industry Forces OverDrive and Other Library eBook Vendors to Take a Giant Step Back | Librarian by Day">one of the first to jump on the story</a>, is maintaining a <a href="http://www.delicious.com/librarianbyday/hcod" title="librarianbyday's hcod Bookmarks   on Delicious">list of news articles and commentary</a>.</p><div id="attachment_2673" class="wp-caption alignright" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><br /><style type='text/css'>.bbpBox41505956953067520{background:url(http://a3.twimg.com/a/1298584552/images/themes/theme1/bg.png) #C0DEED;padding:20px}p.bbpTweet{background:#fff;padding:10px
12px 10px 12px;margin:0;min-height:48px;color:#000;font-size:18px !important;line-height:22px;-moz-border-radius:5px;-webkit-border-radius:5px}p.bbpTweet
span.metadata{display:block;width:100%;clear:both;margin-top:8px;padding-top:12px;height:40px;border-top:1px solid #fff;border-top:1px solid #e6e6e6}p.bbpTweet span.metadata
span.author{line-height:19px}p.bbpTweet span.metadata span.author
img{float:left;margin:0
7px 0 0px;width:38px;height:38px}p.bbpTweet a:hover{text-decoration:underline}p.bbpTweet
span.timestamp{font-size:12px;display:block}</style><div class='bbpBox41505956953067520'><p class='bbpTweet'>We&#8217;re reading your posts &#038; listening to our authors. If you want to share longer thoughts w us, email library.ebook@harpercollins.com <a href="http://twitter.com/search?q=%23hcod" title="#hcod" class="tweet-url hashtag" rel="nofollow">#hcod</a><span class='timestamp'><a title='Sat Feb 26 14:32:45 +0000 2011' href='http://twitter.com/#!/HarperCollins/status/41505956953067520'>Feb 26, 2011</a> via <a href="http://www.hootsuite.com" rel="nofollow" title="HootSuite">HootSuite</a></span><span class='metadata'><span class='author'><a href='http://twitter.com/HarperCollins' title="http://twitter.com/HarperCollins"><img src="http://cdn.dltj.org/wp-content/uploads/2011/03/FireWater_normal.gif" alt="HarperCollins" /></a><strong><a href='http://twitter.com/HarperCollins' title="http://twitter.com/HarperCollins">HarperCollins</a></strong><br />HarperCollins</span></span></p></div><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Tweet from HarperCollins</p></div><p>As you can see, much has already been said about the issue, and since collection development is not my specialty, you probably shouldn&#8217;t look to me for an informed opinion.  (If pressed, I will suggest that it is a perfectly reasonable collection development policy not to buy access to material on terms that are not in the best interest of the library and its patrons.)  Instead, there is so much oddness in how this new policy was rolled out that I find I can&#8217;t put myself in HarperCollins&#8217; shoes.  First, using OverDrive as a proxy for announcing this policy change seems wrong (and, frankly, unfair to OverDrive).  Then, as word spreads, you don&#8217;t make your own announcement but instead talk to a reporter from Library Journal.  
Then, as news spreads through the day, you send a <a href="http://twitter.com/#!/HarperCollins/status/41505956953067520" title="Tweet from HarperCollins">single tweet</a>.  In fact, you don&#8217;t really respond publicly <a href="http://harperlibrary.typepad.com/my_weblog/2011/03/open-letter-to-librarians.html" title="Open Letter to Librarians | Library Love Fest">until five days after</a> the Twitter universe and biblio-blogosphere started talking about it.  And when the response comes, it is largely a non-engaging, public relations statement.  (You do get credit, though, for allowing open comments on your blog post.  But some of that credit is taken back because you aren&#8217;t using a company-branded blog.  Really? A typepad.com blog?  One of the tenets of most information literacy courses I&#8217;ve seen is to look at the source of the information, and it takes extra effort to take this blog seriously because it isn&#8217;t in the harpercollins.com domain.)</p><p>My guess is that this was a trial balloon, badly floated.  I certainly can&#8217;t fault HarperCollins for trying something new in the ebook licensing world, but this one has fallen flat.</p><p><h2 id="ia-ol-ill">Internet Archive and Library Partners Develop Joint Collection of 80,000+ eBooks To Extend Traditional In-Library Lending Model</h2></p><blockquote><p>Today [February 22, 2011], a group of libraries led by the Internet Archive announced a new, cooperative <a href="http://openlibrary.org/borrow" title="Borrow Books (Open Library)">80,000+ eBook lending collection</a> of mostly 20th century books on OpenLibrary.org, a site where it’s already possible to read over 1 million eBooks without restriction. During a library visit, patrons with an OpenLibrary.org account can borrow any of these lendable eBooks using laptops, reading devices or library computers. This new twist on the traditional lending model could increase eBook use and revenue for publishers. 
&#8230;</p><p>Any OpenLibrary.org account holder can borrow up to 5 eBooks at a time, for up to 2 weeks. Books can only be borrowed by one person at a time. People can choose to borrow either an in-browser version (viewed using the Internet Archive’s BookReader web application), or a PDF or ePub version, managed by the free Adobe Digital Editions software. &#8230;</p><p>Publishers selling their eBooks to participating libraries include Cursor and OR Books. Books purchased will be lent to readers as well as being digitally preserved for the long-term. This continues the traditional relationship and services offered by publishers and libraries.</p></blockquote><p>This press release from the <a href="http://www.archive.org/post/349420/in-library-ebook-lending-program-launched" title="Internet Archive and Library Partners Develop Joint Collection of 80,000+ eBooks To Extend Traditional In-Library Lending Model">Internet Archive</a> largely went unnoticed on the eve of the <a href="#hcod">#hcod</a> onslaught.  It was covered in the <a href="http://chronicle.com/blogs/wiredcampus/collaboration-seeks-to-provide-easier-access-to-e-books/30054" title="Collaboration Seeks to Provide Easier Access to E-Books | The Chronicle of Higher Education Wired Campus blog">Chronicle of Higher Education&#8217;s Wired Campus blog</a> and in <a href="http://www.libraryjournal.com/lj/home/889508-264/internet_archive_tests_new_ebook.html.csp" title="Internet Archive Tests New Ebook Lending Waters: In-Library, and License-Free | Library Journal">Library Journal</a>.  
The latter has a few more helpful details: &#8220;IA founder Brewster Kahle and director Peter Brantley also told <em>LJ</em> that small independent publishers <a href="http://thinkcursor.com/" title="Cursor homepage">Cursor</a>, <a href="http://www.orbooks.com/" title="OR Books homepage">OR Books</a>, and <a href="http://www.smashwords.com/" title="Smashwords homepage">Smashwords</a> will donate ebooks license-free to the Open Library for lending to all Open Library members. With this venture, IA hopes to establish a &#8216;first-sale precedent&#8217; for e-lending, according to Brantley.&#8221;  To borrow from the in-library collection, one must visit one of the <a href="http://openlibrary.org/libraries" title="Libraries (Open Library)">participating libraries</a>.  In my experience, most Internet Archive efforts start with a subtle announcement that isn&#8217;t widely picked up, then slowly grow into something substantial.  I expect this project will follow much the same path and will leave a noticeable imprint on the profession in a few years.</p><p><h2 id="collections-futures">Slidecast of David Lewis’ “Collections Futures” Talk</h2></p><blockquote><ul type="circle"><li>Context<ul type="disc"><li>The Big Shift</li><li>Interlude with Clay Shirky</li><li>A Bit of Disruptive Innovation Theory</li></ul></li><li>Collections in “A Strategy for Academic Libraries in the First Quarter of the 21st Century”</li><li>What Will Be Easy and What Will Be Hard</li></ul></blockquote><p>So far in <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym> Thursday Threads</i> I&#8217;ve intentionally avoided pointing to items inside this blog, preferring to link to events, resources, and conversations elsewhere.  I&#8217;m going to make a partial exception in this case because what I&#8217;m ultimately pointing to is not my work.  
It is a <a href="http://dltj.org/article/collections-futures/">slidecast (recorded audio synchronized to slides) of David Lewis&#8217; presentation</a> at the <a href="http://www.oclc.org/research/events/2010-06-09a.htm" title="2010 RLG Partnership Annual Meeting Agenda">2010 Annual RLG Partnership Meeting</a>.  Building on the &#8220;<a href="http://www.johnhagel.com/shiftindex.pdf" title="Measuring the forces of long-term change: The 2009 Shift Index">Shift Index</a>&#8221; [PDF] from John Hagel III, John Seely Brown, and Lang Davison, Clay Shirky&#8217;s &#8220;<a href="http://www.ted.com/talks/clay_shirky_how_cellphones_twitter_facebook_can_make_history.html" title="Clay Shirky: How social media can make history | Video on TED.com">How Social Media Can Make History</a>&#8221; TED Talk, and Clayton Christensen&#8217;s theories of disruptive innovation, David walks through the possibilities for three strategic issues facing academic libraries: completing the migration from print to electronic collections; retiring legacy print collections; and shifting the focus of collections from purchasing materials to curating content.  
The slidecast is about 75 minutes long and well worth the time as a thought-provoking view of what libraries should be doing to survive the next few decades.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/thursday-threads-2011w9/feed/</wfw:commentRss> <slash:comments>6</slash:comments> </item> <item><title>Mashups of Bibliographic Data: A Report of the ALCTS Midwinter Forum</title><link>http://dltj.org/article/mashups-of-bib-data/</link> <comments>http://dltj.org/article/mashups-of-bib-data/#comments</comments> <pubDate>Wed, 27 Jan 2010 21:14:52 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Meeting]]></category> <category><![CDATA[ALA Midwinter Conference 2010]]></category> <category><![CDATA[Association for Library Collections and Technical Services]]></category> <category><![CDATA[Dewey Decimal Classification]]></category> <category><![CDATA[Google Book Search]]></category> <category><![CDATA[Internet Archive]]></category> <category><![CDATA[MARC]]></category> <category><![CDATA[OCLC]]></category> <category><![CDATA[onix]]></category> <category><![CDATA[Open Library]]></category> <category><![CDATA[WorldCat]]></category><guid isPermaLink="false">http://dltj.org/?p=1478</guid> <description><![CDATA[This year the ALCTS Forum at ALA Midwinter brought together three perspectives on massaging bibliographic data of various sorts in ways that use MARC, but where MARC is not the end goal. 
What do you get when you swirl MARC, &#8230; <a href="http://dltj.org/article/mashups-of-bib-data/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=1478"></abbr><p>This year the <a href="http://connect.ala.org/node/91406" title="ALCTS Forum: Mix and Match: Mashups of Bibliographic Data | ALA Connect"><acronym title="Association for Library Collections and Technical Services">ALCTS</acronym> Forum at <acronym title="American Library Association">ALA</acronym> Midwinter</a> brought together three perspectives on massaging bibliographic data of various sorts in ways that <em>use</em> <acronym title="Machine Readable Cataloging">MARC</acronym>, but where MARC is not the end goal.  What do you get when you swirl MARC, <acronym title="ONline Information eXchange">ONIX</acronym>, and various other formats of metadata in a big pot?  Three projects:  ONIX Enrichment at OCLC, the Open Library Project, and Google Book Search metadata.<br /><span id="more-1478"></span><br />Below is a summary of how these three projects are messin&#8217; with metadata, as told by the Forum panelists.  I also recommend reading Eric Hellman&#8217;s <a href="http://go-to-hellman.blogspot.com/2010/01/google-exposes-book-metadata-privates.html" title="Google Exposes Book Metadata Privates at ALA Forum | Go-to-Hellman">Google Exposes Book Metadata Privates at ALA Forum</a> for his recollection and views of the same meeting.</p><p><h2 id="post-1478-h2-OCLC-ONIX">ONIX Enrichment at OCLC</h2></p><p><span class="removed_link" title="http://www.oclc.org/speakers/bios/register_renee.htm">Renee Register</span>, Global Product Manager for OCLC Cataloging and Metadata Services, was the first to present on the panel.  Her talk looked at a new and evolving product at OCLC on the enhancement of ONIX records with WorldCat records, and vice versa. 
<sup><a href="http://dltj.org/article/mashups-of-bib-data/#footnote_0_1478" id="identifier_0_1478" class="footnote-link footnote-identifier-link" title="For those not familiar with ONIX, it is a suite of standards promulgated by EDItEUR for the interchange of information on books and serial publications.  It is primarily used as the communication channel between the publishing industry through distribution chains to retail establishments.">1</a></sup></p><p>Renee said that in libraries &#8220;our instincts are collaborative,&#8221; but &#8220;our data and workflow silos encourage redundancy and inhibit interoperability.&#8221;  Beyond the obvious differences in metadata formats, the workflows of libraries differ dramatically from those of other metadata providers and consumers. In libraries (with the exception of <acronym title="Cataloging in Print">CIP</acronym> and brief on-order records), the major work of bibliographic production is performed at the end of the publication cycle and ends with the receipt of the published item.  In the publisher supply chain, bibliographic data evolves over time, usually beginning months before publication and continuing to grow for months and years (sales information, etc.) after publication.  
Renee had a graphic showing the current flow of metadata around the broader bibliographic universe that highlighted the isolation of library activity relative to publisher, wholesaler, and retailer activity.</p><p><div id="attachment_1484" class="wp-caption alignright" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><a href="http://www5.oclc.org/downloads/presentations/MDS4Pubs_August_Webinar_200908.ppt" title="Slides from Publisher Supply Chain Webinar, August 2009"><img src="http://cdn.dltj.org/wp-content/uploads/2010/01/ONIX-enhancement-300x225.jpg" alt="" title="Diagram of the Process of Enhancing ONIX Records" width="300" height="225" class="size-medium wp-image-1484" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Diagram of the Process of Enhancing ONIX Records, from OCLC Services for the Publisher Supply Chain Webinar, August 2009</p></div>Renee went on to describe a &#8220;next generation cataloging data flow&#8221; in which OCLC facilitates the inclusion of publisher data in <a href="http://www.worldcat.org/" title="WorldCat homepage" rel="homepage">WorldCat</a> and enhances publisher data with information extracted from WorldCat.  To the right is a version of the graphic she used at Midwinter, taken from an earlier presentation on the same topic.  It shows ONIX-formatted metadata coming into WorldCat, being cross-walked and matched with existing MARC data in WorldCat, and finally being extracted and cross-walked back to ONIX, resulting in <a href="http://publishers.oclc.org/en/metadata/default.htm" title="OCLC Metadata Services for Publishers"> enhanced ONIX metadata</a> for publishers to use in their supply chain.  
If there is an exact match for the incoming ONIX record in WorldCat, the WorldCat record is enhanced with certain fields from the ONIX record (descriptions, author biographies, web links) &#8212; being careful not to override authority work done by libraries, but adding enhancements that libraries might not otherwise input.  In turn, enhancements from the exact-match record and from FRBR work set records (hardcover versus softcover versus audiobook, etc.) are added to the ONIX record: non-English subject headings are added, a Dewey Decimal Classification (DDC) field is copied from another similar record if one doesn&#8217;t already exist, and the author field is changed to an authority-controlled version.  If there is no exact match for the ONIX record in WorldCat, a new WorldCat record is built from the ONIX record and is subsequently enhanced with metadata found in the FRBR work set records.  In doing so, we are &#8220;increasing the goodness of metadata in the marketplace,&#8221; as Renee put it in her presentation.  OCLC is also creating a mapping between <a href="http://www.bisg.org/what-we-do-20-73-bisac-subject-headings-2009-edition.php" title="Standards &amp; Best Practices | Classification Schemes | BISAC Subject Headings 2009 Edition | Book Industry Study Group">BISAC Subject Headings</a><sup><a href="http://dltj.org/article/mashups-of-bib-data/#footnote_1_1478" id="identifier_1_1478" class="footnote-link footnote-identifier-link" title="By the way, it seems like BISAC is an acronym for &amp;#8220;Book Industry Systems Advisory Committee&amp;#8221;, the former name of the Book Industry Study Group.">2</a></sup> and the DDC system.  
This allows ONIX records to be enhanced with suggested BISAC Subject Terms, and WorldCat records to be enhanced with generic DDC fields based on an incoming BISAC Subject Term value from the ONIX record.</p><p>Drawing on her experience, Renee said that libraries need ways to let their metadata evolve over time and to allow publisher-created metadata to merge effectively with library-created metadata.  The bibliographic record needs to be a &#8220;living, growing&#8221; thing throughout the lifecycle of a title and beyond.  In concluding her remarks, she offered several resources to explore for further information:  the OCLC/NISO study on <a href="http://www.niso.org/publications/white_papers/StreamlineBookMetadataWorkflowWhitePaper.pdf" title="Streamlining Book Metadata Workflow">Streamlining Book Metadata Workflow</a>, the U.K. Research Information Network report on <a href="http://rin.ac.uk/creating-catalogues" title="Creating Catalogues: Bibliographic Records in a Networked World">Creating Catalogues: Bibliographic Records in a Networked World</a>, the Library of Congress <a href="http://www.loc.gov/bibliographic-future/news/" title="News, Press Releases and Reports - Working Group on the Future of Bibliographic Control (Library of Congress)">Study of the North American MARC Records Marketplace</a>, the Library of Congress <a href="http://cip.loc.gov/onixpro.html" title="LC ONIX Pilot Project" class="broken_link" rel="nofollow">CIP/ONIX Pilot Project</a>, and the <a href="http://publishers.oclc.org/en/default.htm" title="OCLC Publisher Supply Chain Website">OCLC Publisher Supply Chain Website</a>.</p><p><h2 id="post-1478-h2-Open-Library">From MARC to Wiki with Open Library</h2><br />The second presenter on the panel was <a href="http://kcoyle.net/" rel="homepage" title="Karen Coyle's home page">Karen Coyle</a>, talking about the mashup of metadata at the <a href="http://openlibrary.org/" title="Open Library project homepage" rel="homepage">Open Library</a> project 
at the <a href="http://archive.org/" title="Internet Archive homepage" rel="homepage">Internet Archive</a>.  The slides from her presentation are <a href="http://kcoyle.net/presentations/ol_boston.pdf" title="Open Library - Mix and Match Metadata presentation slides [PDF]">available from her website</a>.</p><p>Karen said right at the start that the Open Library project is different from most of what happens in libraries &#8212; it is &#8220;someone outside the library world making use of library data&#8221; &#8212; although its goal is arguably the same as that of other projects &#8212; &#8220;<a href="http://openlibrary.org/about" title="About Us (Open Library)">One web page for every book ever published</a>.&#8221;  As such, the Open Library isn&#8217;t a library catalog as librarians think of it, in that it does not represent a library&#8217;s inventory. It has metadata for every book it can know about and pointers to places where the book can be found, including all of the electronic books in the Internet Archive (<a href="http://www.opencontentalliance.org/" rel="homepage" title="Open Content Alliance (OCA)">Open Content Alliance</a>, Google Public Domain, etc.) as well as pointers back to OCLC WorldCat.  Karen&#8217;s role for the project is that of &#8220;Library Data Informant.&#8221; The Internet Archive decided that they needed someone who understood library data in order to try to use it.  From Karen&#8217;s perspective, she is trying to be a resource for the project without giving the developers any guidance on how to implement the service.  She is curious to see what the project will do when bibliographic data is viewed from a non-librarian perspective.  If the developers have questions, or if they make wrong assumptions about the data, then she intervenes.</p><p>Karen went on to briefly describe the Open Library system.  Open Library doesn&#8217;t have records; rather, it has field types and data properties.  In this way, it uses semantic web concepts.  
&#8220;Author&#8221; is a type, &#8220;Author birthdate&#8221; is another type, and so forth.  There is no fixed set of field types, so if the project gets data from a source for which a type doesn&#8217;t yet exist, it can create a new one.  Each type can have data properties such as string, boolean, text, link, etc.  Nothing is required and everything is repeatable.  Everything &#8212; types, properties, and values &#8212; gets a <acronym title="Uniform Resource Identifier">URI</acronym> (a URI is an identifier like a URL, but conceptually a superset of the universe of URLs).  Titles, authors, subjects, author birthdates, and so on have URIs.  Lastly, the underlying data structures are based on wiki principles: all edits are saved and viewable, anyone can edit any value, anyone can add new types or properties, anyone can develop their own displays, etc.</p><p>The data that is now in Open Library came from a variety of sources.  The project started with a copy of the Library of Congress&#8217;s book records and continues to receive the weekly updates. It performed a crawl of Amazon&#8217;s book data.  It has also gotten data from publishers, libraries, and individual users.  The last source is perhaps the most interesting, because those contributors are mainly people outside the Western world who would otherwise have trouble getting their works recognized.</p><p><h3 id="post-1478-h3-Problems-Issues">Problems, Issues, Challenges, and Opportunities with the Data</h3><br />People who use library data without the biases or assumptions of librarians come up with interesting ways to view the data.  Karen described a few of them.</p><dl class="inlineClass"><dt>Names -</dt><dd>&#8220;These library forms of names? Honestly no one but us can stand them.&#8221;  Even something as simple as the last-name-comma-first-name form is troublesome.  No one else, including Amazon and Wikipedia, uses this form of the name.  
In processing these names, any information in parentheses was deleted, and birth and death dates were moved into separate field types.</dd><dt>Titles -</dt><dd>In working with the Open Library developers, Karen tried to insist on applying one library practice here:  knowing the initial article.  For us, this is important for sorting books in alphabetical order.  The developers&#8217; response: why do we have to sort in alphabetical order?  &#8220;Where else but library catalogs do we see things sorted in alphabetical order?  Not in Google, not in Amazon, not anywhere.  Alphabetical order is not in the mindset anymore.&#8221;  They also found that the title might include extraneous data.  Amazon, for instance, appends the series title in parentheses to the main title.  This demonstrates how other communities are less concerned with strong typing and separating information into fields. Amazon, of course, has reasons for putting series information into the main title: it helps sell books.</dd><dt>Product dimensions -</dt><dd>Publishers and distributors need to know characteristics of an item such as height, width, depth, and weight; they, of course, need to put it in a box and ship it.  Libraries, concerned with placing the item on the shelf, record just height.  Recording pagination is different, too: libraries use odd notations such as &#8220;ill. (some col)&#8221; and &#8220;xv, 200p.&#8221; versus simply &#8220;200 pages.&#8221;</dd><dt>Birthdates -</dt><dd>Librarians use birthdates to distinguish names; if there is no need to distinguish a name, birth and death dates are not added.  Someone looking at this from the outside would ask &#8216;Why don&#8217;t all authors have birth and death dates?&#8217;  This can be useful information for viewing the context of an item, not just for distinguishing author names.  
Open Library ran author names against Wikipedia to pick up not only birth and death years, but also the actual dates.</dd><dt>Subject headings -</dt><dd>Using Library of Congress Subject Headings as-is was out of the question for Open Library. In processing the data, the Open Library developers simply broke the headings apart into segments and used those. But because they were able to do data mining on the subject field types, they found statistical relationships between the disassembled precoordinated headings and were able to present those to the user.</dd><dt>The View of the Data -</dt><dd>Rather than the traditional library view of long author-title lists, the Open Library (in its next version, coming in February) will have several different views into the mass of data: Authors; Books (what we would call <acronym title="Functional Requirements for Bibliographic Records">FRBR</acronym> &#8216;manifestations&#8217;); Works; Subjects; and eventually places, publishers, etc.  For example, when searching for an author one would get the author page, which lists all of the works by the author as well as other biographical information.  It looks similar to a WorldCat Identities page, except that it is the actual user interface built into the system.  Similarly, every work will have a page, and at the bottom of it one will see all of the editions of the work.  Also, each subject will have a page listing the works with that subject as well as the authors who write on that subject.  As Karen said, &#8220;The subject itself becomes an object of interest in the database, not just something that is just tacked on to the bottom of the library record.&#8221;</dd><dt>Data mining -</dt><dd>With the data in this format, it is possible to perform data mining against it; for instance, simple analyses of country of publication, popular places that appear, etc.  
When they had the problem of author names &#8212; knowing when to reverse surname and forename &#8212; they ran the names against Amazon and Wikipedia and retained the names whose order matched in those sources. The Open Library developers are also experimenting with data mining to find publisher names.  Publisher names, of course, vary dramatically, but by using ISBN prefixes they can pull together related items into a &#8220;publisher&#8221; view.</dd></dl><p>Karen suggested watching <a href="http://edwardbetts.com/ol/" title="Index of /ol">Edward Betts&#8217;s site</a>; Betts is one of the Open Library developers and keeps an eye on the data mining aspects.  She said it is fun to look at our data when it can be viewed from this different point of view.  She also said to watch for a new version of the <a href="http://openlibrary.org/" title="Open Library (Open Library)">Open Library website</a> coming in February.</p><p><h2 id="post-1478-h2-Google-Book-Search-Metadata">Google Book Search Metadata</h2><br />The final presenter was <a href="http://www.google.com/profiles/kurt.groetsch" title="Kurt Groetsch's Google Profile">Kurt Groetsch</a>, Technical Collections Specialist at Google, where he works to provide understanding and insight into library partner collections and the digitized books from Google.  Kurt said that &#8220;Google has been fairly circumspect over the years about what we do on the Book Search project.&#8221;  He said it was a bit of a cultural legacy from the rest of the company and also possibly an artifact of the copyright litigation, but he is hoping to change that.  His presentation looked at how Google works with book metadata from three vantage points &#8212; the inputs into Google&#8217;s system, parsing by Google&#8217;s algorithms, and analysis and output into the public interfaces.</p><p>On the input side, Google is getting bibliographic metadata from over 100 sources in a variety of formats. 
MARC records come from libraries, union catalogs, commercial providers (OCLC), and publishers and retailers (one publisher supplies records in MARC format).  Google also gets ONIX records from commercial providers (such as Ingram and Bowker), publishers, and retailers.  Google is especially interested in data from non-U.S. retailers because it is a source of information about books published outside the United States; it helps Google discover items that it may not otherwise encounter in the <a href="https://books.google.com/partner/">publisher</a> and <a href="http://www.google.com/googlebooks/library.html" title="Google Books Library Project">library</a> programs.  Google also receives records in a variety of &#8220;idiosyncratic formats&#8221; &#8212; for example, publisher-contributed metadata (via the Publisher Partner Program); information associating books with jacket images; name authority records (from LC); reviews; popularity signals (sales data as well as <a name="anonymized_circulation_data">anonymized circulation data</a> from some library partners, useful for feeding into the relevancy ranking algorithm); and internally generated metadata (for instance, whether a book is commercially available or not).  Google processes all of this information to come up with a single record that describes each book.  At this point Google has over 800 million bibliographic records and one trillion bits of information in those records.</p><p>All of these records from all of these sources are processed and remixed with Google&#8217;s parsing algorithms about twice a week.  The first step is to transform the incoming records into a &#8220;less verbose format&#8221; for storage and processing, a SQL-like structure that allows elements of the metadata to be queried.  
Records are then parsed to extract specific bits of information, which are transformed as necessary and written to an internal &#8220;resolved records&#8221; data structure (a subset of the data coming from the input formats).  In the presentation, Kurt had examples of how making inferences from data coming from both MARC and ONIX can be troublesome.  Parsing also involves extracting &#8220;bibkeys&#8221; from the records to aid in matching across sources of data.  Four types of identifiers are extracted from bibliographic records: OCLC numbers, <acronym title="Library of Congress Control Numbers">LCCN</acronym>s, ISBNs, and ISSNs.  They usually provide useful signals when matching bibliographic records and help support assertions that two records describe the same manifestation.  Google also tries to parse item data (enumeration and chronology) when it is present in records representing multivolume works.  Google will also treat a barcode as a form of bibkey if it comes from a library.  The parsing algorithm will also split records containing multiple ISBNs that represent different product forms (e.g., hardback and paperback).</p><p>With all of this data parsed into records, Google starts its clustering process, in which records are examined and attached to each other.  Bibkeys provide significant evidence for relating records to each other, but bibkeys are not always present in a record (non-U.S. records and older records frequently contain no bibkeys).  In those cases, the algorithms fall back on text similarity matching using title, subtitle, contributor, and other fields such as publisher and publication year.  The results are clusters of records representing the same manifestation. An algorithm then attempts to derive a single &#8220;best-of&#8221; record for each cluster from all of its parsed input records.  
This is done through a field-by-field voting process based on the trustworthiness of individual fields from each record source.</p><p>Kurt went into some of the challenges facing the team building the clustering and best-of record creation algorithms.  For instance, in dealing with multivolume works they know of five numbering schemes with three number types in 15 different languages.  Enumeration is now showing in the public display, but the development team is still working with unparsable item data caused by inconsistent cataloging practices between institutions&#8230;and sometimes within a single institution.  Another problem is non-unique identifiers. In the current data set, ISBN 7899964709 is shared by 75 books and ISBN 7533305353 is associated with 1413 books. There are also poor-quality or &#8220;junk&#8221; records.  Kurt said his favorite was &#8220;The Mosaic Navigator&#8221; by Sigmund Freud, published in 1939.  These are hard to identify with an algorithm, so the team relies on problem reports that enable the developers to go in and &#8220;kill&#8221; the troublesome record.  Another example is a book by Virginia Woolf whose incoming record had conflicting information: it had two 260 fields with different dates (1961, which is correct, and 1900), along with fixed-field information that strongly suggested that 1900 was the single date of publication.  When the data problem is systematic, they can identify it and compensate for it.  Kurt&#8217;s example for this case was &#8220;The United States Since 1945,&#8221; published in 1899.  This one was highlighted in <a href="http://chronicle.com/article/Googles-Book-Search-A/48245/" title="Google's Book Search: A Disaster for Scholars - The Chronicle Review - The Chronicle of Higher Education">Geoffrey Nunberg&#8217;s criticism of Google Books metadata</a>.  In this case, a metadata source from Brazil used 1899 whenever the date of publication was unknown.  
When Google went back and looked at the date distribution of books, there was a huge spike in 1899.  Once Google knew about it, they were able to go in and kill that information from that source&#8217;s records. <sup><a href="http://dltj.org/article/mashups-of-bib-data/#footnote_2_1478" id="identifier_2_1478" class="footnote-link footnote-identifier-link" title="A side note: Google isn&amp;#8217;t the only one tripped up by this.  If one searches for the ISBN of the item, 0195038487, you get to more than one site that has the same incorrect publication date.  At least Google is attempting to clean up the data!">3</a></sup></p><p>In closing, Kurt said that Google is committed to engaging with the library community on improving metadata and metadata processing.</p><p style="padding:0;margin:0;font-style:italic;">The text was modified to update a link from http://www.niso.org/publications/white_papers/Stream lineBookMetadataWorkflowWhitePaper.pdf to http://www.niso.org/publications/white_papers/StreamlineBookMetadataWorkflowWhitePaper.pdf on January 19th, 2011.</p><p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://www.oclc.org/speakers/bios/register_renee.htm on February 11th, 2011.</p><h2>Footnotes</h2><ol class="footnotes"><li id="footnote_0_1478" class="footnote">For those not familiar with <a href="http://www.editeur.org/8/ONIX/" title="ONIX Overview">ONIX</a>, it is a suite of standards promulgated by <a href="http://www.editeur.org/" title="EDItEUR homepage" rel="homepage">EDItEUR</a> for the interchange of information on books and serial publications.  
It is primarily used as the communication channel from the publishing industry, through distribution chains, to retail establishments.</li><li id="footnote_1_1478" class="footnote">By the way, it seems like BISAC is an acronym for &#8220;Book Industry Systems Advisory Committee&#8221;, the former name of the <a href="http://www.bisg.org/" title="Book Industry Study Group homepage" rel="homepage">Book Industry Study Group</a>.</li><li id="footnote_2_1478" class="footnote">A side note: Google isn&#8217;t the only one tripped up by this.  If one searches for the ISBN of the item, 0195038487, one will find <a href="http://www.biggerbooks.com/book/9780195038484" title="The United States Since 1945 at BiggerBooks.com -  Leuchtenburg, 9780195038484, History">more</a> <a href="http://www.chegg.com/details/the-united-states-since-1945/0195038487/" title="Chegg.com: The United States Since 1945 by Leuchtenburg">than</a> <a href="http://www.amazon.co.uk/The-United-States-Since-1945/dp/0195038487" title="The United States Since 1945: Amazon.co.uk: Books">one</a> site that has the same incorrect publication date.  
At least Google is attempting to clean up the data!</li></ol>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/mashups-of-bib-data/feed/</wfw:commentRss> <slash:comments>23</slash:comments> </item> <item><title>Comments on Google Book Search Settlement Coming to a Head (Again)</title><link>http://dltj.org/article/gbs-comments-due/</link> <comments>http://dltj.org/article/gbs-comments-due/#comments</comments> <pubDate>Thu, 03 Sep 2009 18:40:57 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[policy]]></category> <category><![CDATA[Amazon]]></category> <category><![CDATA[American Library Association]]></category> <category><![CDATA[Google Book Search]]></category> <category><![CDATA[Internet Archive]]></category> <category><![CDATA[legal]]></category><guid isPermaLink="false">http://dltj.org/?p=1252</guid> <description><![CDATA[Ah, it is the beginning of September when thoughts turn to going back to school, the days turn a little colder (in the northern hemisphere) and the smell of lawsuit briefs is in the air. Well, okay &#8212; the latter &#8230; <a href="http://dltj.org/article/gbs-comments-due/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=1252"></abbr><p>Ah, it is the beginning of September when thoughts turn to going back to school, the days turn a little colder (in the northern hemisphere) and the smell of lawsuit briefs is in the air.  Well, okay &#8212; the latter might not be what you expect, but this is a special September, after all. <a href="http://dltj.org/article/gbs-news/" title="Intervention by IA Denied; Deadline for Objections Extended | DLTJ">Postponed from May</a>, the deadline for filing comments in the Google Book Search settlement is coming up.  And everyone is weighing in (&#8220;again&#8221; for some) on the details of the settlement.  
A couple of highlights.</p><p>The <a href="http://www.wo.ala.org/districtdispatch/?p=3579" title="Library associations submit supplemental filing, call for increased oversight of Google agreement">American Library Association (ALA), the Association of College and Research Libraries (ACRL), and the Association of Research Libraries (ARL)</a> again offered their support for the settlement, provided that the court would exercise vigorous oversight of the pricing and privacy practices of Google and the Book Rights Registry.  This came in the form of a <a href="http://wo.ala.org/gbs/ala-acrl-arl-brief-to-court/" title="ALA-ACRL-ARL supplemental brief">supplemental filing</a> to the <a href="http://wo.ala.org/gbs/wp-content/uploads/2009/05/googlebrieffinal.pdf" title="ALA-ACRL-ARL original brief">brief</a> that the <a href="http://dltj.org/article/gbs-libraries-brief/" title="Library Associations File Amicus Brief for Google Book Search Settlement">three organizations filed in May</a> (just prior to the first comment deadline).</p><p><div id="attachment_1257" class="wp-caption alignright" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><a href="http://cdn.dltj.org/wp-content/uploads/2009/09/Google-Search-for-Open-Book-Alliance.png"><img src="http://cdn.dltj.org/wp-content/uploads/2009/09/Google-Search-for-Open-Book-Alliance-300x222.png" alt="Google Search for Open Book Alliance" title="Google Search for Open Book Alliance" width="300" height="222" class="size-medium wp-image-1257" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Google Search for Open Book Alliance</p></div>An odd group of bedfellows has also <a href="http://www.openbookalliance.org/2009/08/opening-the-book/" title="'Opening the Book' posting">gotten together</a> to oppose the settlement.  
Called the &#8220;<a href="http://www.openbookalliance.org/" title="Open Book Alliance homepage" rel="homepage">Open Book Alliance</a>&#8221;, it is made up of (at the moment): <a href="http://amazon.com/" rel="homepage" title="Amazon homepage">Amazon</a>, the <a href="http://asja.org/" rel="homepage" title="American Society of Journalists and Authors homepage">American Society of Journalists and Authors</a>, the <a href="http://clmp.org/" rel="homepage" title="Council of Literary Magazines and Presses homepage">Council of Literary Magazines and Presses</a>, the <a href="http://archive.org/" rel="homepage" title="Internet Archive homepage">Internet Archive</a>, <a href="http://microsoft.com/" rel="homepage" title="Microsoft homepage">Microsoft</a>, the <a href="http://nyla.org/" rel="homepage" title="New York Library Association homepage">New York Library Association</a>, the <a href="http://spdbooks.org/" rel="homepage" title="Small Press Distribution homepage">Small Press Distribution</a>, the <a href="http://www.sla.org/" rel="homepage" title="Special Libraries Association homepage">Special Libraries Association</a>, and <a href="http://yahoo.com/" rel="homepage" title="Yahoo! homepage">Yahoo!</a>.  Sound vaguely familiar?  That&#8217;s understandable; if you match up the interested parties of the OBA with the OCA (the <a href="http://www.opencontentalliance.org/" title="Open Content Alliance homepage" rel="homepage">Open Content Alliance</a>), you&#8217;ll find Microsoft, the Internet Archive, and Yahoo in common.  A search for &#8220;Open Book Alliance&#8221; in Google, in fact, still brings up the &#8220;Open Content Alliance&#8221; as the top hit.  
Amazon, the biggest new party in the formation (or reconstitution, if you will) of this group, is undoubtedly the inspiration behind a press release from the Authors Guild (a party to the settlement agreement) with a biting title: <a href="http://authorsguild.org/advocacy/articles/amazon-accuses-someone-else-of-monopolizing.html" title="The Authors Guild - Amazon Accuses Someone Else of Monopolizing Bookselling">Amazon Accuses Someone Else of Monopolizing Bookselling</a>.</p><p>Lest you think the fun will be over too soon, the deadline for filing briefs has been <a href="http://thepublicindex.org/docs/case_order/20090902.pdf" title="Judge Chin's order extending the deadline for filing briefs">extended</a> yet again from close of business tomorrow (Friday, September 4th, 2009) until 10:00am Tuesday.  Apparently, the court&#8217;s electronic filing system will be unavailable from 2pm today until 8am on Tuesday the 8th.  The main <a href="http://www.googlebooksettlement.com/" title="Google Book Search Settlement" rel="homepage">settlement website</a> says explicitly that the deadline for rights holders to opt out of the settlement remains September 4th.</p><p>But seriously, if you are looking for thoughtful commentary on the commentary, I recommend James Grimmelmann&#8217;s blog, <a href="http://laboratorium.net/" title="http://laboratorium.net/">The Laboratorium</a>.  Although there isn&#8217;t a single page that brings together all of his postings about the Google Book Search settlement, he helpfully prepends &#8220;GBS:&#8221; to the title of all such postings.  If you are looking to <em>participate</em> in the discussion surrounding the settlement, the best place I know of is the <a href="http://thepublicindex.org/settlement" title="The Public Index's version of the settlement statement">interactive version of the settlement notice</a> hosted at The Public Index.  
There you can comment on the settlement section by section, follow the comments of others, and browse a <a href="http://thepublicindex.org/documents" title="The Public Index documents section">catalog of documents and links</a> from others regarding the settlement.</p><p>The next big event after the filing deadline is the Final Fairness Hearing, scheduled for 10am on October 7, 2009 (or, at least, scheduled for that day and time at the moment).  At the fairness hearing, we will hear from the court as it considers whether to grant final approval of the Settlement.  Somehow, though, I don&#8217;t think even that will be close to the final word on the settlement.  Stay tuned&#8230;<p style="padding:0;margin:0;font-style:italic;">The text was modified to update a link from http://www.wo.ala.org/districtdispatch/wp-content/uploads/2009/09/supplementbrief-FINAL.pdf to http://wo.ala.org/gbs/ala-acrl-arl-brief-to-court/ on January 20th, 2011.</p><p style="padding:0;margin:0;font-style:italic;">The text was modified to update a link from http://sla.org/ to http://www.sla.org/ on February 11th, 2011.</p><div class='series_links'><a href='http://dltj.org/article/gbs-chronicle-highered/' title='Google Book Search Privacy, Orphan Works, and Monopoly'>Previous in series</a> <a href='http://dltj.org/article/gbs-hearing-postponed/' title='Google Book Search Settlement Hearing Is Likely Postponed'>Next in series</a></div>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/gbs-comments-due/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>Intervention by IA Denied; Deadline for Objections Extended</title><link>http://dltj.org/article/gbs-news/</link> <comments>http://dltj.org/article/gbs-news/#comments</comments> <pubDate>Wed, 29 Apr 2009 15:46:59 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[policy]]></category> <category><![CDATA[copyright]]></category> <category><![CDATA[Google Book Search]]></category> 
<category><![CDATA[Internet Archive]]></category> <category><![CDATA[legal]]></category><guid isPermaLink="false">http://dltj.org/?p=901</guid> <description><![CDATA[New York Judge Denny Chin recently issued two rulings in the Google Book Search settlement. In the first, he denied the request by the Internet Archive to intervene as a defendant in the lawsuit (and thus, presumably, be on firmer &#8230; <a href="http://dltj.org/article/gbs-news/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=901"></abbr><p>New York Judge <a href="http://en.wikipedia.org/wiki/Denny_Chin" title="Wikipedia: Denny Chin">Denny Chin</a> recently issued two rulings in the <a href="http://books.google.com/booksrightsholders/" title="Google Book Search Settlement Notice to Rights-holders">Google Book Search settlement</a>.  In the first, he <a href="http://www.scribd.com/doc/14596133/SDNY-Judge-Chin-Intervention-Denied" title="SDNY Judge Chin: Intervention Denied">denied</a> the <a href="http://www.opencontentalliance.org/2009/04/17/internet-archive-files-intervention-request/" title="Internet Archive files Intervention Request; Open Content Alliance (OCA) Blog">request</a> by the Internet Archive to intervene as a defendant in the lawsuit (and thus, presumably, be on firmer footing to guide aspects of the settlement).  In his response, Judge Chin said:<br /><blockquote>The Court has received requests for pre-motion conferences by the Internet Archive, <a href="http://www.scribd.com/doc/14227449/Letter-to-Request-Intervention-in-Authors-Guild-v-Google" title="Letter to Request Intervention in Author's Guild v Google">Lewis Hyde, Harry Lewis, and the Open Access Trust</a>, Inc. seeking leave to intervene in this action.  I have construed their letters as motions to intervene, and the motions are denied.  
The proposed interveners are, however, free to file objections to the proposed settlement or amicus briefs, either of which must be filed by the May 5, 2009 objection deadline.</p></blockquote><p> (The <a href="http://openaccesstrust.org/" title="Open Access Trust">Open Access Trust</a> is a proposal to form a legal trust for the revenue generated by unclaimed orphan works.)</p><p>In the second, Judge Chin <a href="http://www.scribd.com/doc/14741799/SDNY-Order-Extending-Deadline-to-September-4" title="SDNY - Order Extending Deadline to September 4">granted</a> a four-month extension to the deadline for class members to opt out of the settlement or file objections.  The new deadline is September 4th.  Requests for the extension came from a group of authors (including Steinbeck&#8217;s heirs) in an <a href="http://www.scribd.com/doc/14685855/Authors-Letter-Extension-Request" title="Authors Letter Extension Request">April 24th letter</a> and from a group of academic authors represented by Berkeley School of Law Professor <a href="http://www.law.berkeley.edu/php-programs/faculty/facultyProfile.php?facID=346" title="Pamela Samuelson Faculty Profile">Pamela Samuelson</a> (who recently wrote <a href="http://radar.oreilly.com/2009/04/legally-speaking-the-dead-soul.html" title="Legally Speaking: The Dead Souls of the Google Booksearch Settlement (O'Reilly Radar blog)">an eloquent post about the settlement on the O&#8217;Reilly Radar blog</a>).  Attorneys in the case <a href="http://www.scribd.com/doc/14685856/Ltr-M-Boni-to-J-Chin-42409" title="Ltr - M. Boni to J. 
Chin - 4.24.09">requested a two-month extension</a>.</p><p>In related news, the court has received and logged three letters that object to the settlement:  from authors <a href="http://www.scribd.com/doc/14764020/GBS-Settlement-Objection-Letter-from-Hope-Ryden" title="GBS Settlement Objection Letter from Hope Ryden">Hope Ryden</a> and <a href="http://www.scribd.com/doc/14764116/GBS-Settlement-Objection-Letter-from-Lee-Killough" title="GBS Settlement Objection Letter from Lee Killough">Lee Killough</a> as well as <a href="http://www.scribd.com/doc/14764193/GBS-Settlement-Objection-Letter-from-Jenny-Darling-Associates" title="GBS Settlement Objection Letter from Jenny Darling &amp; Associates">Jenny Darling &amp; Associates</a> in Australia.  More may come next week from those anticipating the May 5th deadline, but I expect most will continue to flow in around the new September 4th date.</p><p>Found via <span class="removed_link" title="http://www.publishersweekly.com/article/CA6654190.html">two</span> <span class="removed_link" title="http://www.publishersweekly.com/article/CA6654845.html">articles</span> and a <a href="http://twitter.com/timoreilly/status/1643231746" title="Twitter / Tim O'Reilly: Excellent: Google book sea ...">tweet by Tim O&#8217;Reilly</a>.<p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://www.publishersweekly.com/article/CA6654190.html on January 28th, 2011.</p><p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://www.publishersweekly.com/article/CA6654845.html on January 28th, 2011.</p><div class='series_links'><a href='http://dltj.org/article/first-formal-gbs-objections/' title='Letters Begin Flying in Objection to the Proposed Google Book Search Settlement'>Previous in series</a> <a href='http://dltj.org/article/gbs-libraries-brief/' title='Library Associations File Amicus Brief for Google Book Search 
Settlement'>Next in series</a></div>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/gbs-news/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>Letters Begin Flying in Objection to the Proposed Google Book Search Settlement</title><link>http://dltj.org/article/first-formal-gbs-objections/</link> <comments>http://dltj.org/article/first-formal-gbs-objections/#comments</comments> <pubDate>Fri, 17 Apr 2009 18:41:17 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[policy]]></category> <category><![CDATA[Google]]></category> <category><![CDATA[Google Book Search]]></category> <category><![CDATA[Internet Archive]]></category> <category><![CDATA[Open Content Alliance]]></category><guid isPermaLink="false">http://dltj.org/?p=867</guid> <description><![CDATA[We are starting to see objections to the Google Book Search Settlement this month in advance of the May 5th deadline set up by the court. The first comes from the consumer advocacy group Consumer Watchdog (found by way of &#8230; <a href="http://dltj.org/article/first-formal-gbs-objections/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=867"></abbr><p>We are starting to see objections to the <a href="http://books.google.com/booksrightsholders/" title="Google Book Search Settlement Notice to Rights-holders - Books &amp; Inserts Registry">Google Book Search Settlement</a> this month in advance of the May 5th deadline set up by the court.  
The <a href="http://www.consumerwatchdog.org/corporateering/articles/?storyId=26117" title="Consumer Watchdog - Consumer Group Calls On Justice Department To Intervene In Google Book Settlement">first</a> comes from the consumer advocacy group Consumer Watchdog (<a href="http://www.ala.org/ala/alonline/currentnews/newsarchive/2009/april2009/googlescanobjections.cfm" title="Objection to Google Scanning Settlement Filed (American Libraries News)">found</a> by way of the American Libraries news feed).  They have submitted a letter to the U.S. Justice Department asking its antitrust division to delay the settlement until the &#8220;&#8216;most favored nation&#8217; clause favoring Google is removed and the deal&#8217;s &#8216;orphan works&#8217; provision is extended to cover all who might digitize books, not only Google.&#8221;  The <a href="http://www.consumerwatchdog.org/resources/ltrjusticegooglebook040109.pdf" title="Letter from Consumer Watchdog to the U.S. Justice Department">letter in PDF</a> is available on the Consumer Watchdog website.  The objections center on two points: the provision that requires the Books Rights Registry to give Google the same terms as anyone else who enters into agreements with the Registry (the letter notes that a new party might need more favorable terms in order to compete with Google), and the fact that the copyright infringement protection for digitizing orphan works extends only to Google.</p><p>The American Libraries news piece also says:  &#8220;ALA, in conjunction with the Association for Research Libraries and ALA&#8217;s Association of College and Research Libraries, plans to file an <i>amicus</i> brief with the court about the settlement.&#8221;  We&#8217;ll likely see that before the May deadline.</p><p>Another objection came from the Open Content Alliance.  
On behalf of the OCA, the <a href="http://www.opencontentalliance.org/2009/04/17/internet-archive-files-intervention-request/" title="Internet Archive files Intervention Request">Internet Archive filed an intervention request</a> with the court, seeking to become a party to the settlement.  In the filing, <a href="http://www.scribd.com/doc/14308286/Internet-Archive-Intervention-Google-Book-Search-" title="Internet Archive Intervention: Google Book Search">located on the Scribd service</a> (an odd choice, if you were to ask me), the Archive <a href="http://www.law.cornell.edu/rules/frcp/Rule24.htm" title="Federal Rules of Civil Procedure - Rule 24 (LII 2007 ed.)">is asking</a> to be added as a defendant because the rights of Internet content providers and of the public were not represented in the negotiations.  In the middle of the letter, it says &#8220;All other persons, including Internet content providers such as the [Internet] Archive, would not be able to use orphan works broadly without being exposed to claims to infringement.&#8221;  And further down:  &#8220;Google has negotiated for itself certainty in its use of orphan works under the terms of the settlement through the mechanism of the [Books Rights Registry], whereas marketplace competitors are able to negotiate with the [Registry] only for commercial exploitation of those works with identified rightsholders.&#8221;</p><p>Stay tuned &#8212; I think this is going to get more interesting in the next few weeks.</p><p><h2>Update</h2></p><dl class="dltj-updates"><dt>17-Apr-2009</dt><dd>The O&#8217;Reilly Radar blog has a <a href="http://radar.oreilly.com/2009/04/legally-speaking-the-dead-soul.html" title="Legally Speaking:  The Dead Souls of the Google Booksearch Settlement - O'Reilly Radar">post</a> by <a href="http://people.ischool.berkeley.edu/~pam/" title="Pamela Samuelson">Pamela Samuelson</a>, a professor at the University of California at Berkeley with a joint appointment in the School of Information 
and the School of Law, on this very topic.  Her post is not a formal filing with the court, but she explains the legal implications of the settlement in very clear language.<br /><blockquote>This column argues that the proposed settlement of this lawsuit is a privately negotiated compulsory license primarily designed to monetize millions of orphan works. It will benefit Google and certain authors and publishers, but it is questionable whether the authors of most books in the corpus (the &#8216;dead souls&#8217; to which the title refers) would agree that the settling authors and publishers will truly represent their interests when setting terms for access to the Book Search corpus.</p></blockquote></dd></dl><div class='series_links'><a href='http://dltj.org/article/gbs-online-market/' title='What Does the Google Book Settlement Mean for the Online Book Market?'>Previous in series</a> <a href='http://dltj.org/article/gbs-news/' title='Intervention by IA Denied; Deadline for Objections Extended'>Next in series</a></div>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/first-formal-gbs-objections/feed/</wfw:commentRss> <slash:comments>6</slash:comments> </item> <item><title>Open Library Demonstration Screencast</title><link>http://dltj.org/article/open-library/</link> <comments>http://dltj.org/article/open-library/#comments</comments> <pubDate>Fri, 20 Jul 2007 14:05:12 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Disruption in Libraries]]></category> <category><![CDATA[description]]></category> <category><![CDATA[Internet Archive]]></category> <category><![CDATA[library 2.0]]></category> <category><![CDATA[ngc4lib]]></category> <category><![CDATA[Open Library]]></category> <category><![CDATA[screencast]]></category><guid isPermaLink="false">http://dltj.org/2007/07/open-library/</guid> <description><![CDATA[Earlier this week, Aaron Swartz of the Internet Archive <a href="http://www.aaronsw.com/weblog/openlibrary" title="Announcing the 
Open Library (Aaron Swartz&#039;s Raw Thought)">announced</a> the <a href="http://demo.openlibrary.org/" title="The Open Library demonstration site homepage">demonstration website of the Open Library project</a>, a new kind of book catalog that brings together traditional publisher and library bibliographic data in an interface with the user-contributed paradigm of Wikipedia.  Okay, I'll pause for a moment while you parse that last sentence.  Think you got it?  Read -- and watch -- further. Open Library has been <a href="http://www.librarything.com/thingology/2007/07/open-library.php" title="Open Library (Thingology - LibraryThing&#039;s ideas blog)">mentioned</a> a <a href="http://digitaleccentric.blogspot.com/2007/07/open-library.html" title="Open Library (Digital Eccentric blog)">bit</a> in the <a href="http://blogs.talis.com/panlibus/archives/2007/07/license_for_ope.php" title="License for Open Library? (panlibus blog)">blogs</a> <a href="http://www.libraryjournal.com/blog/1090000309/post/1800011980.html" title="The People&#039;s Catalog (Roy Tennant&#039;s blog)">this week</a>, but not to the extent that I think the magnitude of the project deserves.  So I recorded a screencast introduction (in Flash Video format below, followed by a rough transcript) that looks at not only the browsing side of the system but also the record editing and record creation aspects of Open Library.  As I say at the end of the recording, Open Library is one of those mind-bending, assumption-shattering projects that, at least for me, is challenging my thoughts about what library service could be and should be.  Congratulations to the team at the Internet Archive, and I'm looking forward to future enhancements and directions for the project. 
<a href="http://dltj.org/article/open-library/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/2007/07/open-library/"></abbr><p>Earlier this week, Aaron Swartz of the Internet Archive <a href="http://www.aaronsw.com/weblog/openlibrary" title="Announcing the Open Library (Aaron Swartz&#039;s Raw Thought)">announced</a> the <a href="http://demo.openlibrary.org/" title="The Open Library demonstration site homepage">demonstration website of the Open Library project</a>, a new kind of book catalog that brings together traditional publisher and library bibliographic data in an interface with the user-contributed paradigm of Wikipedia.  Okay, I&#8217;ll pause for a moment while you parse that last sentence.  Think you got it?  Read &#8212; and watch &#8212; further.</p><p>Open Library has been <a href="http://www.librarything.com/thingology/2007/07/open-library.php" title="Open Library (Thingology - LibraryThing&#039;s ideas blog)">mentioned</a> a <a href="http://digitaleccentric.blogspot.com/2007/07/open-library.html" title="Open Library (Digital Eccentric blog)">bit</a> in the <a href="http://blogs.talis.com/panlibus/archives/2007/07/license_for_ope.php" title="License for Open Library? (panlibus blog)" class="broken_link" rel="nofollow">blogs</a> <a href="http://www.libraryjournal.com/blog/1090000309/post/1800011980.html" title="The People&#039;s Catalog (Roy Tennant&#039;s blog)" class="broken_link" rel="nofollow">this week</a>, but not to the extent that I think the magnitude of the project deserves.  So I recorded a screencast introduction (in Flash Video format below, followed by a rough transcript) that looks at not only the browsing side of the system but also the record editing and record creation aspects of Open Library.  
As I say at the end of the recording, Open Library is one of those mind-bending, assumption-shattering projects that, at least for me, is challenging my thoughts about what library service could be and should be.  Congratulations to the team at the Internet Archive, and I&#8217;m looking forward to future enhancements and directions for the project.<br /><br /><object type="application/x-shockwave-flash" data="http://dltj.org/wp-content/plugins/pb-embedflash/swf/mediaplayer.swf?width=720&amp;height=500" width="720" height="500" class="embedflash"><param name="movie" value="http://dltj.org/wp-content/plugins/pb-embedflash/swf/mediaplayer.swf?width=720&amp;height=500" /><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="flashvars" value="file=http://drc-dev.ohiolink.edu/presentations/open-library-screencast.flv&amp;searchbar=false" /><small>(Please open the article to see the flash file or player.)</small></object></p><p>A rough transcript of the screencast is below.</p><p><h2>Introduction</h2></p><p>Hello, and welcome to this screencast overview of the <a href="http://openlibrary.org/" title="The Open Library homepage">Open Library project</a>.  Open Library is an effort by the Internet Archive to create a comprehensive catalog of every book.  As the <a href="http://demo.openlibrary.org/about" title="About Us (The Open Library)">project&#8217;s &#8220;about&#8221; page</a> says, &#8220;Not every book on sale, or every important book, or even every book in English, but simply every book.&#8221;  The about page goes on to describe the characteristics of the Open Library project &#8212; that it is a project enabled by Internet technology because no physical space could hold it and that it aims to pull together records from publishers and libraries.  
It is also a project in the same vein as Wikipedia, meaning that any user can create and edit the records in the system.</p><p>In this overview, I&#8217;ll lead you through searching and browsing the Open Library&#8217;s demonstration website just as you would any modern library catalog interface.  Then I&#8217;ll show you where it deviates from traditional library catalogs by exposing the underlying wiki nature of the database; we&#8217;ll examine the changes that users have made and we&#8217;ll even make a change ourselves.  And finally I&#8217;ll show the process of creating entirely new records in the system.  So let&#8217;s get started.</p><p><h2>Searching</h2></p><p>We&#8217;re looking at the <a href="http://demo.openlibrary.org/" title="The Open Library demonstration site homepage">home page of the Open Library project demonstration site</a>.  In the middle is a search box with a suggested search &#8212; &#8220;tom sawyer adventure&#8221;.  That is a good suggestion, so we&#8217;ll click on Go.  Open Library returns <a href="http://demo.openlibrary.org/search?q=tom+sawyer+adventure" title="Search Results (The Open Library)">a classic, relevance-ranked list of matching records</a> with some book covers along the left side and a faceted list of refinements along the right.  So right away you can see that there are some authority control problems here in the author names &#8212; Twain comma Mark, Mark comma Twain, and Twain comma Mark with birth and death dates &#8212; and here in the language field.  But I have high hopes that the developer team will find some intriguing ways to address these problems.</p><p>Back over here in the results area we have the various editions of Samuel Clemens&#8217;s &#8220;The Adventures of Tom Sawyer&#8221; &#8212; let&#8217;s pick <span class="removed_link" title="http://demo.openlibrary.org/b/adventures_of_Tom_Sawyer">the 1876 edition to see the full record display</span> &#8212; there.  
We have the publisher, publication date and place, language, and a summary or review of sorts at the bottom.  We also see signs of the availability of full text &#8212; over here in the options box there is a <a href="http://openlibrary.org/details/adventuresoftoms00twaiuoft" title="Open Library: Details: The adventures of Tom Sawyer">download from the Internet Archive link</a>, a &#8220;Scan Sponsor&#8221; field here and a &#8220;View this book&#8221; graphic.  This is one of the items scanned by the Open Content Alliance and made available by the Internet Archive through the Open Library project.  It has a very nice interface for paging through the book.  So one could imagine that the Open Library could become the primary vehicle by which Open Content Alliance materials are made available to the public.</p><p>So let&#8217;s go back here to the metadata page.  Remember that in the introduction I said the data was malleable in a wiki-like fashion.  The Open Library developers created a system that allows for user-contributed updates (a la Wikipedia) to fielded data (like your classic bibliographic record).  The two hints that the record is modifiable are this big edit button in the middle of the metadata and this more subtle <span class="removed_link" title="http://demo.openlibrary.org/b/adventures_of_Tom_Sawyer?m=history">&#8220;[history]&#8221; link</span> near the top of the page.  Let&#8217;s start with the history link to see what has been done to this record.</p><p>This page should look familiar to those who have worked with wikis before.  It lists the edits made to this record, from the most recent to the very first, along with who made each change (identified by IP addresses in this case because the people making the changes were not logged in to an account at the time), an editor-supplied comment about what was done, and when the change was made.  
We can go back in time and see the page at a particular version through the links under the &#8220;When&#8221; column, or we can use the compare function to see the difference between two versions.  In the case of <span class="removed_link" title="http://demo.openlibrary.org/b/adventures_of_Tom_Sawyer?b=3&amp;a=2&amp;m=diff">the changes between version 2 and version 3</span>, we see that the editor added &#8220;Canada&#8221; as the place of publication.  On this page you start to see the fielded nature of this wiki structure, but the best place to see it is the record edit screen itself.</p><p>All of the fields on this page are free-text fields with no controlled vocabulary.  You&#8217;ll note the absence of any MARC field names here, but as you scroll through you&#8217;ll see the evidence of MARC and AACR2 in the field labels.  Down at the bottom is an edit summary to describe the changes made to the record, then save, preview and delete version buttons &#8212; all classic wiki functions.</p><p><h2>Editing</h2></p><p>Now, I&#8217;d like to show the full record editing process, but since I don&#8217;t have this Mark Twain book in hand, I&#8217;m going to bring up another record that I created yesterday &#8212; &#8220;<span class="removed_link" title="http://demo.openlibrary.org/b/Eric_Meyer_on_CSS">Eric Meyer on CSS</span>&#8221;.  Before showing the editing process, let&#8217;s linger here a moment at the &#8220;options&#8221; box along the right side.  Since this is a more modern book (as opposed to the Tom Sawyer book we saw first), there are additional options here for purchasing the book through these various vendors or borrowing it through a very nice link into Open Worldcat and two web-based book trading sites.</p><p>But back to the metadata.  There is one error and one omission in this record &#8212; perhaps this is a subtle demonstration of problems that creep in with user-generated content.  
First, the error: there is an extra digit in the ISBN-10 field, which is a big problem because the links in the options box use the ISBN as a linking field, and at the time of this recording they don&#8217;t work.  They will work in a moment, though.  The second problem is that I forgot to put in the publication date.  But hey, no problem, all I need to do is &#8220;Edit&#8221; this record.</p><p>So we are back to <a href="http://demo.openlibrary.org/b/Eric_Meyer_on_CSS?m=edit" title="edit Eric Meyer on CSS : Mastering the Language of Web Design (The Open Library)">the edit screen</a>, and I&#8217;m going to scroll down and fix the ISBN-10 field like so, then scroll down a little further and add the publication date.  Then I&#8217;ll scroll all the way to the bottom and type in an edit summary &#8212; &#8220;Fixed the ISBN and added a publication date&#8221; &#8212; and hit save.  We&#8217;re now back at the metadata display screen and <a href="http://worldcat.org/isbn/0-73571-245-x" title="Eric Meyer on CSS : mastering the language of Web design [WorldCat.org]">the link to Open Worldcat</a> now works.  As an aside, one wonders what the folks at OCLC in Dublin, Ohio, think about this.  On the one hand, it is competition, since Worldcat also aims to be the most comprehensive catalog of books in the world.  On the other hand, perhaps there is room for cooperation, with vetted changes to Open Library records somehow making their way into the OCLC union catalog.  Who knows?</p><p><h2>Creating a New Record</h2></p><p>Alright, back to current reality.  Let&#8217;s add a record to Open Library, and in this case I&#8217;m going to use an ARL SPEC Kit that I wrote a number of years ago called &#8220;Library Patron Privacy&#8221;.  First let&#8217;s run a search in Open Library to see if it is there, and no, it isn&#8217;t.  
The only way I&#8217;ve found to enter a new item is to go to the URL where the page would be located and get the classic wiki &#8220;This page does not exist. Create it?&#8221; message.</p><p>One of the quirks I found in the system is that I have to create author wiki pages before book wiki pages &#8212; otherwise I&#8217;ll get a Python error message on the screen.  I&#8217;ve reported this to the Open Library developers, but in the meantime just know that authors need to be created before their books.  Which is to say that in Open Library, authors have wiki pages just as books do.  The structure of URLs to Open Library author pages is the letter &#8220;a&#8221; followed by a slash followed by the author&#8217;s last, first and middle names separated by underscore characters.  So I&#8217;ll go to the URL of that form, then click on the &#8220;Create it&#8221; link.</p><p>Now here is one of the tricky parts of the existing interface.  The page type starts as &#8220;type/page&#8221;, and as you can see it doesn&#8217;t have any of the fielded elements that we saw in previous examples.  What you have to do is change the page type to &#8220;type/author&#8221;, and then you get the fielded HTML form.  So I&#8217;m going to go through here and fill in some of the parts.  Then I&#8217;ll go down to the edit summary field, write a summary of this change, and click save.  Now that <span class="removed_link" title="http://demo.openlibrary.org/a/Murray_Peter_E">Open Library knows who I am</span>, let&#8217;s create the record for the book.</p><p>You&#8217;ve seen the structure of the URLs to book pages before &#8212; a &#8220;b&#8221; followed by a slash followed by the book title with spaces replaced by underscore characters.  I&#8217;ll put that in the URL field and get the default page type.  This needs to be changed to &#8220;type/edition&#8221; in order to get the bibliographic record fields.  There.  Now I&#8217;ll go through here and enter the data.  
When we get down to the author field, we enter the author in the same format we used to create the author page &#8212; an &#8220;a&#8221; followed by a slash followed by the name with spaces replaced by underscores.</p><p>So we&#8217;ll just finish up here and come down to the edit summary field, put something in here, and hit save. <span class="removed_link" title="http://demo.openlibrary.org/b/Library_Patron_Privacy">This record</span> is now in the system, and you can see the public display here along with the links on the right, which appear because I entered an ISBN.  I haven&#8217;t quite figured out how to get a cover image into the system yet &#8212; I expect there is a file upload interface somewhere, but I haven&#8217;t found it.</p><p><h2>Conclusions</h2></p><p>So that&#8217;s all there is, and I don&#8217;t say that to denigrate the work that has been done by the development team so far.  As the URL and site banner indicate, it is a demonstration system &#8212; and a compelling demonstration it is.  All sorts of questions immediately come to mind, of course &#8212; will there be a controlled vocabulary or authority control built into the system, can data be exported from records &#8212; and, for that matter, can end-users bulk import data into the system, are there Web 2.0 niceties like tagging and RSS feeds in the works, and so forth.</p><p>Even with all of those questions, Open Library is one of those mind-bending, assumption-shattering projects that, at least for me, is challenging my thoughts about what library service could be and should be.  
Congratulations to the team at the Internet Archive, and I&#8217;m looking forward to future enhancements and directions for the project.</p><p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://demo.openlibrary.org/b/adventures_of_Tom_Sawyer on January 19th, 2011.</p><p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://demo.openlibrary.org/b/adventures_of_Tom_Sawyer?m=history on January 19th, 2011.</p><p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://demo.openlibrary.org/b/adventures_of_Tom_Sawyer?b=3&#038;a=2&#038;m=diff on January 19th, 2011.</p><p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://demo.openlibrary.org/b/Eric_Meyer_on_CSS on January 19th, 2011.</p><p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://demo.openlibrary.org/a/Murray_Peter_E on January 19th, 2011.</p><p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://demo.openlibrary.org/b/Library_Patron_Privacy on January 19th, 2011.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/open-library/feed/</wfw:commentRss> <slash:comments>9</slash:comments> <enclosure url="http://drc-dev.ohiolink.edu/presentations/open-library-screencast.flv" length="0" type="video/x-flv" /> </item> </channel> </rss>
