<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"><channel><title>Disruptive Library Technology Jester &#187; MARC</title> <atom:link href="http://dltj.org/tag/marc/feed/" rel="self" type="application/rss+xml" /><link>http://dltj.org</link> <description>We&#039;re Disrupted, We&#039;re Librarians, and We&#039;re Not Going to Take It Anymore</description> <lastBuildDate>Mon, 06 Feb 2012 20:04:22 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <cloud domain='dltj.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' /> <creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/3.0/us/</creativeCommons:license> <item><title>Thursday Threads: Beyond MARC, Library-controlled DRM, Spam Study</title><link>http://dltj.org/article/thursday-threads-2011w21/</link> <comments>http://dltj.org/article/thursday-threads-2011w21/#comments</comments> <pubDate>Fri, 27 May 2011 01:01:21 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Thursday Threads]]></category> <category><![CDATA[digital rights management]]></category> <category><![CDATA[ebooks]]></category> <category><![CDATA[Library of Congress]]></category> <category><![CDATA[MARC]]></category> <category><![CDATA[spam]]></category><guid isPermaLink="false">http://dltj.org/?p=2906</guid> <description><![CDATA[Threads this week without commentary. 
(It has been a long week in which only one of my four flights happened without a delay, cancellation, or redirection.) This week&#8217;s big announcements are one from the Library &#8230; <a href="http://dltj.org/article/thursday-threads-2011w21/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2906"></abbr><div id="feedburner-thursday-threads-email-2011w21" class="wp-caption alignright noprint noFrontPage" style="width: 230px;;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><form style="border: 1px solid rgb(204, 204, 204); padding: 3px; margin: 0pt; text-align: center;" action="http://feedburner.google.com/fb/a/mailverify" method="post" target="popupwindow" onsubmit="window.open('http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads', 'popupwindow', 'scrollbars=yes,width=550,height=520');return true"><p>Receive <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym></i> Thursday Threads:</p><p>by&nbsp;<a href="http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads&amp;loc=en_US" title="D.L.T.J. Thursday Threads Email Subscription">E-mail</a><br /><input style="width: 140px;" name="email" value="Your e-mail address" onfocus="if (this.defaultValue==this.value) this.value = ''" type="text"/><input value="thursday-threads" name="uri" type="hidden"/><input name="loc" value="en_US" type="hidden"/><input value="Subscribe" type="submit"/></p><p>by&nbsp;<a href="http://feeds.dltj.org/thursday-threads/" title="D.L.T.J. Thursday Threads RSS Feed">RSS</a></p><p style="font-size: 80%;">Delivered by <a href="http://feedburner.google.com" target="_blank" title="Google Feedburner Service">FeedBurner</a></p></form></div><p> Threads this week without commentary.  
(It has been a long week in which only one of my four flights happened without a delay, cancellation, or redirection.)  This week&#8217;s big announcements are one from the Library of Congress to <a href="#p2906-replace-marc">re-envision the way bibliographic information travels</a>, one about Douglas County (Colorado) Library&#8217;s <a href="#p2906-cipa-dcl">experiment with taking ownership of ebooks and applying its own digital rights management</a>, and a <a href="#p2906-spam">study on the ecosystem of spam</a>.</p><p>Feel free to send this to others you think might be interested in the topics.  If you find these threads interesting and useful, you might want to add the <a href="http://feeds.dltj.org/thursday-threads/" title="RSS Feed for DLTJ Thursday Threads">Thursday Threads RSS Feed</a> to your feed reader or subscribe to e-mail delivery using the form to the right.  If you would like a rawer, more immediate version of these types of stories, watch <a href="http://friendfeed.com/dltj" title="Peter Murray - FriendFeed">my FriendFeed stream</a> (or subscribe to <a href="http://friendfeed.com/dltj?format=atom" title="Atom feed for Peter Murray's FriendFeed account">its feed</a> in your feed reader).  Comments and tips, as always, are <a href="http://dltj.org/contact">welcome</a>.</p><p><h2 id="p2906-replace-marc">Transforming our Bibliographic Framework: A Statement from the Library of Congress</h2></p><blockquote><p>Spontaneous comments from participants in the US RDA Test show that a broad cross-section of the community feels budgetary pressures but nevertheless considers it necessary to replace MARC 21 in order to reap the full benefit of new and emerging content standards.  
The Library now seeks to evaluate how its resources for the creation and exchange of metadata are currently being used and how they should be directed in an era of diminishing budgets and heightened expectations in the broader library community.<div style="text-align: right; width: 100%;"><cite>- <a href="http://www.loc.gov/marc/transition/news/framework-051311.html" title="Transforming our Bibliographic Framework: A Statement from the Library of Congress">Transforming our Bibliographic Framework: A Statement from the Library of Congress</a>, Bibliographic Framework Transition Initiative</cite></div></blockquote><p>Also see John Mark Ockerbloom&#8217;s <a href="http://everybodyslibraries.com/2011/05/24/open-datas-role-in-transforming-our-bibliographic-framework/" title="Open data’s role in transforming our bibliographic framework « Everybody's Libraries">Open data’s role in transforming our bibliographic framework</a> for more details and links to other posts talking about the <a href="http://www.loc.gov/marc/transition/index.html" title="Bibliographic Framework Transition Initiative | Library of Congress">Bibliographic Framework Transition Initiative</a>.</p><p><h2 id="p2906-cipa-dcl">Douglas County Library to Distribute Ebooks with its own DRM</h2></p><blockquote><p>We are pleased to announce a partnership between the <a href="http://www.CIPABooks.com" target="_blank" title="Join CIPA - We're Independent Publishers Working Together&amp;nbsp; 303-365-CIPA (303-365-2472)">Colorado Independent Publishers Association (CIPA)</a>, and two Colorado libraries: <a href="http://www.rrcc.edu/library/" target="_blank" title="Red Rocks Community College :: Success Your Way">Red Rocks Community College Library</a>, and Douglas County Libraries.</p><p>Many members of CIPA have entered the world of digital publishing. 
By June of 2011, Red Rocks Community College Library and Douglas County Libraries will not only offer eBooks from CIPA’s authors for checkout through their library catalogs, but will also allow click-through purchases of these titles.</p><div style="text-align: right; width: 100%;"><cite>- <a href="http://douglascountylibraries.org/content/new-e-book-partnership" title="New e-book partnership | Douglas County Libraries">New e-book partnership</a>, Douglas County Libraries</cite></div></blockquote><p>There are more details in a <a href="http://www.equacc.ala.org/2011/05/20/library-signs-agreement-with-independent-publishers/" title="Library signs agreement with independent publishers | EQUACC">post</a> on the ALA Presidential Task Force on Equitable Access to Electronic Content blog, along with an earlier post about that library&#8217;s experiments with <a href="http://www.equacc.ala.org/2011/04/25/adobe-content-server/" title="Adobe Content Server | EQUACC">Adobe Content Server</a>.</p><p><h2 id="p2906-spam">Study Says Spam Can Be Cut by Blocking Card Transactions</h2></p><blockquote><p>For years, a team of computer scientists at two <a href="http://topics.nytimes.com/topics/reference/timestopics/organizations/u/university_of_california/index.html" title="More articles about the University of California in the New York Times">University of California</a> campuses has been looking deeply into the nature of <a href="http://topics.nytimes.com/top/reference/timestopics/subjects/s/spam_electronic_mail/index.html" title="More articles about spam in the New York Times">spam</a>, the billions of unwanted e-mail messages generated by networks of zombie computers controlled by the rogue programs called botnets. They even coined a term, “<a title="The related research paper." 
href="http://www.icsi.berkeley.edu/pubs/networking/2008-ccs-spamalytics.pdf">spamalytics</a>,” to describe their work.</p><p>Now they have concluded an experiment that is not for the faint of heart: for three months they set out to receive all the spam they could (no quarantines or filters need apply), then systematically made purchases from the Web sites advertised in the messages.</p><div style="text-align: right; width: 100%;"><cite>- <a href="http://www.nytimes.com/2011/05/20/technology/20spam.html?_r=2" title="Study Says Spam Can Be Cut by Blocking Card Transactions | New York Times">Study Says Spam Can Be Cut by Blocking Card Transactions</a>, by John Markoff, New York Times</cite></div></blockquote>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/thursday-threads-2011w21/feed/</wfw:commentRss> <slash:comments>6</slash:comments> </item> <item><title>Recordings from Code4Lib Virtual Lightning Talks Available</title><link>http://dltj.org/article/code4lib-virtual-lightning-talk-recordings/</link> <comments>http://dltj.org/article/code4lib-virtual-lightning-talk-recordings/#comments</comments> <pubDate>Mon, 02 May 2011 18:56:41 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Raw Technology]]></category> <category><![CDATA[code4lib]]></category> <category><![CDATA[eprints]]></category> <category><![CDATA[MARC]]></category> <category><![CDATA[solr]]></category> <category><![CDATA[vufind]]></category><guid isPermaLink="false">http://dltj.org/?p=2849</guid> <description><![CDATA[Thanks to everyone for participating in the first Code4Lib Virtual Lightning Talks on Friday. 
In particular, my gratitude goes out to Ed Corrado, Luciano Ramalho, Michael Appleby, and Jay Luker for being the first presenters to try this scheme for connecting &#8230; <a href="http://dltj.org/article/code4lib-virtual-lightning-talk-recordings/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2849"></abbr><p>Thanks to everyone for participating in the first <a href="http://wiki.code4lib.org/index.php/Virtual_Lightning_Talks" title="Virtual Lightning Talks | Code4Lib">Code4Lib Virtual Lightning Talks</a> on Friday.  In particular, my gratitude goes out to Ed Corrado, Luciano Ramalho, Michael Appleby, and Jay Luker for being the first presenters to try this scheme for connecting library technologists.  My apologies also to those who couldn&#8217;t connect, in particular to Elias Tzoc Caniz, who had signed up but found himself locked out by a cap on simultaneous users in the presentation system.  Recordings of the presentation audio and screen capture video <a href="http://www.archive.org/search.php?query=subject%3A%22Code4Lib%20Virtual%20Lightning%20Talks%22" title="Search for &#039;Code4Lib Virtual Lightning Talks&#039; in the Internet Archive">are now up in the Internet Archive</a>.</p><table><tr style="text-align: left;"><th>Name</th><th> Topic</th></tr><tr><td> Edward M. 
Corrado</td><td> <a href="http://www.archive.org/details/CodaboxUsingE-printsForASmallScalePersonalRepository" title="Recording of CodaBox: Using E-Prints for a small scale personal repository">CodaBox: Using E-Prints for a small scale personal repository</a></td></tr><tr><td> Luciano Ramalho</td><td> <a href="http://www.archive.org/details/Marc-dmAJavascriptApiForIndexingMarc-jsonRecordsInCouchdb" title="Recording of MARC-DM: a JavaScript API for indexing MARC-JSON records in CouchDB">MARC-DM: a JavaScript API for indexing MARC-JSON records in CouchDB</a></td></tr><tr><td> Michael Appleby</td><td> <a href="http://www.archive.org/details/ExtendingVufindForCross-collectionSearch" title="Recording of Extending VuFind for cross-collection search">Extending VuFind for cross-collection search</a></td></tr><tr><td> Jay Luker</td><td> <a href="http://www.archive.org/details/ExtendingSolrsDefaultSimilarityScoringForLongerFulltextDocuments" title="Recording of Extending Solr's default Similarity scoring for longer, fulltext documents">Extending Solr&#8217;s default Similarity scoring for longer, fulltext documents</a></td></tr></table><p><h2>Lessons Learned</h2><br />First, people were locked out when they shouldn&#8217;t have been.  The most we saw online at any particular time was 25, but the room was supposed to be able to hold 60.  I think the problem was how I entered e-mail addresses into the system to reserve slots for the presenters and the people who signed up in advance.  (That obviously didn&#8217;t work, because one of the presenters and at least one of the attendees who signed up in advance didn&#8217;t get in.)  Should we do this again (see below), I&#8217;ll try to debug the problem.</p><p>Second, some comments I got were about cranky Java applets and applications.  LYRASIS has two conference tools at its disposal &#8212; Java-based Centra and Flash-based Acrobat Connect &#8212; and I chose Centra because running Flash on Linux is an issue.  
Maybe this will need to be revisited (or maybe there is another Java-based conference system that can do better).</p><p>Third, since we were not limited by room or schedule constraints, can the five-minutes-per-presenter limit be relaxed?  I have mixed feelings about this; I think defined time limits promote better presentations, but all four presentations this first go-around ran to the end of the five-minute limit, leaving no opportunity for questions or audience interaction.</p><p>On the whole, it seemed like a positive experience, both from my perspective and from the feedback I&#8217;ve received so far.  I&#8217;m going to start a conversation thread in <a href="http://groups.google.com/group/code4libcon?pli=1" title="code4libcon | Google Groups">Code4LibCon</a> (where all of the Code4Lib meeting planning discussion takes place) to see if it is worthwhile to do again and to identify what should be done differently.  If you are interested, please consider joining and contributing to the discussion.  Or e-mail me privately and I&#8217;ll relay your comments to the group discussion.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/code4lib-virtual-lightning-talk-recordings/feed/</wfw:commentRss> <slash:comments>9</slash:comments> </item> <item><title>What To Do With ISO 2709:2008?</title><link>http://dltj.org/article/iso-2709/</link> <comments>http://dltj.org/article/iso-2709/#comments</comments> <pubDate>Wed, 27 Apr 2011 01:43:48 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Raw Technology]]></category> <category><![CDATA[ISO2709]]></category> <category><![CDATA[MARC]]></category> <category><![CDATA[National Information Standards Organization]]></category> <category><![CDATA[standards]]></category><guid isPermaLink="false">http://dltj.org/?p=2822</guid> <description><![CDATA[My employer recently became a member of NISO, and I was made the primary representative. 
This is my first formal interaction with the standards organization hierarchy (NISO &#8594; ANSI &#8594; ISO), and as a side effect I&#8217;m being &#8230; <a href="http://dltj.org/article/iso-2709/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2822"></abbr><div><p>My employer recently became a member of NISO, and I was made the primary representative.  This is my first formal interaction with the standards organization hierarchy (<abbr title="National Information Standards Organization">NISO</abbr> &rarr; <abbr title="American National Standards Institute">ANSI</abbr> &rarr; <abbr title="International Organization for Standardization">ISO</abbr>), and as a side effect I&#8217;m being asked to advise NISO on how to cast its vote on relevant ISO ballots.  Much of it has been pretty routine so far, but today one jumped out at me &#8212; the systematic review for the standard <a href="http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=41319" title="ISO 2709:2008 - Information and documentation -- Format for information exchange">ISO 2709:2008</a>, otherwise blandly known as <a href="http://en.wikipedia.org/wiki/ISO_2709" title="ISO 2709 - Wikipedia, the free encyclopedia">Information and documentation — Format for information exchange</a>.  You might know it as the underlying structure of MARC.  (To describe it accurately, though, MARC is a subset or profile of ISO 2709.)  
And the voting options are: Confirm (as is), Revise/Amend, Withdraw (the standard), or Abstain (from the vote).<br /><span id="more-2822"></span><br /><h2>What is ISO 2709?</h2><br />The scope statement of the standard is:<br /><blockquote><p>This International Standard specifies the requirements for a generalized exchange format which will hold records describing all forms of material capable of bibliographic description as well as other types of records. It does not define the length or the content of individual records and does not assign any meaning to tags, indicators or identifiers, these specifications being the functions of an implementation format.</p><p>This International Standard describes a generalized structure, a framework designed specially for communications between data processing systems and not for use as a processing format within systems.</p></blockquote><p> The <a href="http://en.wikipedia.org/wiki/ISO_2709" title="ISO 2709 | Wikipedia">Wikipedia page for ISO 2709</a> pretty much sums up what is in the standard itself without all of the gory definitions and details, and if you are used to dealing with MARC records, it&#8217;ll look familiar.</p><p>According to the documentation I can find, ISO 2709 was last revised in 2008 when it was &#8220;technically revised to incorporate specification of the use of ISO/IEC 10646 using 8-bit Unicode Transformation Format (UTF-8) encoding.&#8221;  The ballot in play now is a &#8220;systematic review&#8221;<sup><a href="http://dltj.org/article/iso-2709/#footnote_0_2822" id="identifier_0_2822" class="footnote-link footnote-identifier-link" title="&amp;#8220;In addition to the continuous maintenance of the standard described above, a comprehensive review of a database standard at regular intervals may be necessary which is organized in accordance with the rules in the ISO/IEC Directives and the ISO Supplement for the systematic review process.&amp;#8221; Procedure for the development and maintenance of 
standards in database format. Annex ST of the ISO supplement to the ISO/IEC Directives.">1</a></sup> of the 2008 revision of the standard.</p><p><h2>What are my choices again?</h2><br />As a member of NISO, I can cast an advisory vote to recommend how NISO &#8212; the U.S. representative to ISO for this <a href="http://www.iso.org/iso/iso_technical_committee.html?commid=48798" title="ISO - Technical committees - TC 46/SC 4 - Technical interoperability">technical committee</a> &#8212; casts its single vote among the voting countries of this technical committee.  In that capacity, I can vote to confirm the standard, revise it, or ask that it be withdrawn.  Here is my quandary.  As a standard for a &#8220;generalized exchange format which will hold records describing all forms of material capable of bibliographic description,&#8221; it works okay, but it is hard to deny that information exchange formats have moved well beyond this sort of format.  (My favorite interchange format is XML, but some now advocate JSON as a universal exchange format.)</p><p>So here is where I need help.  Should I vote to confirm the <i>status quo</i>?  Or should I vote to revise/amend with a comment that says it is time to take this interchange format into XML, and in doing so set a path for the eventual deprecation of what we know as ISO 2709:2008?  
Should I take the bold step and vote to withdraw the standard (which seems extreme given its wide current use in libraries and closely related fields)?</p><p>What would you do with ISO 2709?</p></div><h2>Footnotes</h2><ol class="footnotes"><li id="footnote_0_2822" class="footnote">&#8220;In addition to the continuous maintenance of the standard described above, a comprehensive review of a database standard at regular intervals may be necessary which is organized in accordance with the rules in the ISO/IEC Directives and the ISO Supplement for the systematic review process.&#8221; <a href="http://www.iso.org/sites/ConsumersStandards/en/pdf/ISO%20Supplement%20-%20Annex%20.pdf" title="http://www.iso.org/sites/ConsumersStandards/en/pdf/ISO%20Supplement%20-%20Annex%20.pdf" class="broken_link" rel="nofollow">Procedure for the development and maintenance of standards in database format</a>. Annex ST of the <a href="http://www.iso.org/directives" title="ISO/IEC Directives and ISO supplement ">ISO supplement to the ISO/IEC Directives</a>.</li></ol>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/iso-2709/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>Real Life Example of Creative Commons License Applied to MARC Records</title><link>http://dltj.org/article/cc0-marc-records/</link> <comments>http://dltj.org/article/cc0-marc-records/#comments</comments> <pubDate>Fri, 18 Mar 2011 16:45:06 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[policy]]></category> <category><![CDATA[cc0]]></category> <category><![CDATA[Creative Commons]]></category> <category><![CDATA[MARC]]></category> <category><![CDATA[University of Florida]]></category> <category><![CDATA[WorldCat]]></category><guid isPermaLink="false">http://dltj.org/?p=2727</guid> <description><![CDATA[Eric Morgan posted a message to the Next Generation Catalog for Libraries mailing list this morning that points to an announcement by the University of 
Florida library that they are now applying a Creative Commons Public Domain Dedication statement to &#8230; <a href="http://dltj.org/article/cc0-marc-records/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2727"></abbr><p><a href="http://infomotions.com/" title="Infomotions, LLC">Eric Morgan</a> posted a <a href="http://article.gmane.org/gmane.culture.libraries.ngc4lib/9018" title="NGC4LIB mailing list message with the subject 'university of florida' by Eric Morgan on March 18, 2011 | Gmane">message</a> to the Next Generation Catalog for Libraries mailing list this morning that points to an <a href="http://www.uflib.ufl.edu/catmet/creativecommons.html" title="Creative Commons License | University of Florida George A. Smathers Libraries">announcement</a> by the <a href="http://www.uflib.ufl.edu/" title="University of Florida George A. Smathers Libraries homepage">University of Florida library</a> that they are now applying a <a href="http://creativecommons.org/publicdomain/zero/1.0/" title="CC0 1.0 Universal | Creative Commons">Creative Commons Public Domain Dedication</a> statement to <abbr title="MAchine Readable Cataloging">MARC</abbr> records they create.  Their announcement says:</p><blockquote><p>Beginning March 2011, the University of Florida Smathers Libraries implemented a policy to include a Creative Commons license in all of its original cataloging records. The records are considered public domain with unrestricted downstream use for any purpose.</p><p>The following MARC 588 field (Source of Description Note) is added to new records contributed to WorldCat. It has not been added retrospectively to University of Florida original records in WorldCat.</p><p style="padding-left:2em;font-family:monospace;">588::|a This bibliographic record is available under a Creative Commons CC0 license. 
The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.</p></blockquote><p>Their announcement page also provides links to some examples from the OPAC.  Scroll to the bottom of each record&#8217;s page to see the CC0 graphic and declaration.</p><ul><li><a href="http://uf.catalog.fcla.edu/permalink.jsp?20UF005023800" title="http://uf.catalog.fcla.edu/permalink.jsp?20UF005023800">Latinoamericanismo : historia intelectual de una geografía inestable</a></li><li> <a href="http://uf.catalog.fcla.edu/permalink.jsp?20UF005056555" title="http://uf.catalog.fcla.edu/permalink.jsp?20UF005056555">Mapa Everest de carreteras, España y Portugal</a></li><li> <a href="http://uf.catalog.fcla.edu/permalink.jsp?20UF005023882" title="http://uf.catalog.fcla.edu/permalink.jsp?20UF005023882">The Song of Ceylon</a></li></ul><p>The University of Florida joins the <a href="http://www.lib.umich.edu/open-access-bibliographic-records" title="Open Access Bibliographic Records Available for Download and Use | Library Information Technology | MLibrary">University of Michigan</a> in making original cataloging records available under CC0.  
To refresh your memory, the CC0 &#8220;Public Domain Dedication&#8221; statement (<a href="http://www.plagiarismtoday.com/2009/02/25/cc0-waiving-copyrights/" title="CC0: Waiving Copyrights">it isn&#8217;t a license!</a>) says:<br /><blockquote><p>The person who associated a work with this deed has <b>dedicated</b> the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.</p><p>You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.</p></blockquote><p>Now this is an interesting development because it is a kind of viral declaration of the same sort that <a href="http://wiki.code4lib.org/index.php/OCLC_Policy_Change" title="OCLC Policy Change - Code4Lib">OCLC proposed</a> with the <a href="http://dltj.org/article/oclc-review-board-initial-recommendations/">withdrawn</a> draft of the record use policy.  In that draft, OCLC was going to add a <a href="http://www.oclc.org/us/en/bibformats/en/9xx/996.shtm" title="996 WorldCat Record Use Policy Link [OCLC]">996 field</a> to all records exported from WorldCat that would say:</p><table border="0" cellpadding="1" cellspacing="0" style="padding-left:2em;font-family:monospace;"><tbody><tr valign="top"><td width="30" align="left">996</td><td width="10" align="right"></td><td width="17" align="left"></td><td width="90%" align="left">OCLCWCRUP &Dagger;i Use and transfer of this record is governed by the OCLC&reg; Policy for Use and Transfer of WorldCat&reg; Records &Dagger;u http://purl.org/oclc/wcrup</td></tr></tbody></table><p>The University of Florida&#8217;s 588 note does something similar, except that it makes the public domain declaration viral on records added to OCLC WorldCat.  
(It is probably also an oversight that the <a href="http://www.oclc.org/us/en/bibformats/en/9xx/996.shtm" title="996 WorldCat Record Use Policy Link [OCLC]">996 field</a> documentation is still on OCLC&#8217;s site.)  Does this begin to segment WorldCat into records that can and cannot be used?  Or is it redundant since some think that MARC records, as a recitation of facts, cannot be copyrighted anyway?</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/cc0-marc-records/feed/</wfw:commentRss> <slash:comments>15</slash:comments> </item> <item><title>Thursday Threads: Personal Book Digitizer, Status of Book Piracy, Core Elements of Description</title><link>http://dltj.org/article/thursday-threads-2011w3/</link> <comments>http://dltj.org/article/thursday-threads-2011w3/#comments</comments> <pubDate>Thu, 20 Jan 2011 11:50:44 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Thursday Threads]]></category> <category><![CDATA[digital rights management]]></category> <category><![CDATA[digitization]]></category> <category><![CDATA[Karen Smith-Yoshimura]]></category> <category><![CDATA[MARC]]></category> <category><![CDATA[metadata]]></category> <category><![CDATA[piracy]]></category> <category><![CDATA[publishing]]></category> <category><![CDATA[textbook]]></category><guid isPermaLink="false">http://dltj.org/?p=2330</guid> <description><![CDATA[It wasn&#8217;t too long ago that the music industry was in an uproar about stories of how easy it was to copy digital audio files and make digital copies with high fidelity. 
It was predicted &#8230; <a href="http://dltj.org/article/thursday-threads-2011w3/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2330"></abbr><div id="feedburner-thursday-threads-email-2011w03" class="wp-caption alignright noprint noFrontPage" style="width: 230px;;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><form style="border: 1px solid rgb(204, 204, 204); padding: 3px; margin: 0pt; text-align: center;" action="http://feedburner.google.com/fb/a/mailverify" method="post" target="popupwindow" onsubmit="window.open('http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads', 'popupwindow', 'scrollbars=yes,width=550,height=520');return true"><p>Receive <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym></i> Thursday Threads:</p><p>by&nbsp;<a href="http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads&amp;loc=en_US" title="D.L.T.J. Thursday Threads Email Subscription">E-mail</a><br /><input style="width: 140px;" name="email" value="Your e-mail address" onfocus="if (this.defaultValue==this.value) this.value = ''" type="text"/><input value="thursday-threads" name="uri" type="hidden"/><input name="loc" value="en_US" type="hidden"/><input value="Subscribe" type="submit"/></p><p>by&nbsp;<a href="http://feeds.dltj.org/thursday-threads/" title="D.L.T.J. Thursday Threads RSS Feed">RSS</a></p><p style="font-size: 80%;">Delivered by <a href="http://feedburner.google.com" target="_blank" title="Google Feedburner Service">FeedBurner</a></p></form></div><p>It wasn&#8217;t too long ago that the music industry was in an uproar about stories of how easy it was to copy digital audio files and make digital copies with high fidelity.  
It was predicted that we would see the same thing in other media forms, and this week&#8217;s <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym> Thursday Threads</i> has two stories on the topic of book publishing.  First is news of another inexpensive and simple (and now to be commercially produced) <a href="#booksaver">book digitizing system</a>.  Although &#8220;ripping&#8221; a book from its physical medium might take longer than ripping an audio track, devices like this one are emerging that will make it simple to do.  What happens with the digital copy after that?  The second Thursday Threads pointer is to an <a href="#book-piracy">interview</a> with the founder of a book publishing industry consultancy about the state of book piracy, how it is measured, and why digital rights management software is a poor way to stop it.  The last entry this week is a <a href="#corebibdescr">short excerpt of a brief summary</a> of a study conducted by OCLC last year on the usage of MARC tags in cataloging records.<br /><span id="more-2330"></span><br />As a side note, apologies to <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym></i> readers who had problems reading some of the content here over the past couple of weeks.  A series of problems with my personal server &#8212; driven by the fact, I believe, that the server was first set up about 10 years ago and all the patches, tweaks, and updates over the decade have finally driven performance into the ground &#8212; prompted me to migrate this blog to the Amazon Web Services cloud.  
It is now running on a micro <a href="http://aws.amazon.com/ec2/" title="Amazon Elastic Compute Cloud (Amazon EC2)">Elastic Compute Cloud (EC2)</a> virtual machine backed by <a href="http://aws.amazon.com/s3/" title="Amazon Simple Storage Service (Amazon S3)">Simple Storage Service (S3)</a> and the <a href="http://aws.amazon.com/cloudfront/" title="Amazon CloudFront">CloudFront</a> content distribution network.  I&#8217;ve also been optimizing the snot out of the configuration &#8212; employing all sorts of new tricks for reducing the time it takes to deliver pages to your browser.  I have another blog post in draft with the details for when anyone (even me!) wants to replicate it.  Given enough personal time, watch for that in the next week or so.</p><p>All of that said, if you are seeing things that don&#8217;t look or function right, <a href="http://dltj.org/contact/">please let me know</a>.</p><p><h2 id="booksaver">Book Saver &#8211; A personal book digitization setup from ION</h2><br /><div id="attachment_2333" class="wp-caption alignright" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><a href="http://www.ionaudio.com/booksaver" title="http://www.ionaudio.com/booksaver"><img src="http://cdn.dltj.org/wp-content/uploads/2011/01/booksaver_angle_lrg-300x187.jpg" alt="Booksaver from ION" title="Booksaver from ION" width="300" height="187" class="size-medium wp-image-2333" /></a><br /><iframe title="YouTube video player" class="youtube-player" type="text/html" width="298" height="198" src="http://www.youtube.com/embed/annCmIa-a08" frameborder="0"></iframe><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Picture and Demonstration Video of the Book Saver from ION</p></div></p><blockquote><p>Book Saver has two cameras that take separate images in rapid succession of each page within an open book. 
Both cameras of Book Saver also have a flash for allowing the page to be fully illuminated during the scanning process. Book Saver’s cradle, where the book is placed during the scanning process, is also angled as to not require you to hold pages down to get a flat, even surface. While similar devices require up to seven seconds per one page, Book Saver takes only one second per two pages!</p></blockquote><p>News of the new <a href="http://www.ionaudio.com/booksaver" title="http://www.ionaudio.com/booksaver">Book Saver</a> product comes from <a href="http://www.librarybazaar.com/2011/01/15/book-saver-vs-drm/" title="Book Saver vs. DRM? | Library Bazaar">Fiacre O&#8217;Duinn</a>.  It is a cradle-based device for digitizing books.  The promotional literature says it takes about 15 minutes to digitize a 200-page book.  The product was <a href="http://www.ionaudio.com/content380172" title="http://www.ionaudio.com/content380172">announced</a> in time for the Consumer Electronics Show earlier this month, but is not yet available.  It is expected to ship this summer with a <a href="http://www.crunchgear.com/2011/01/12/ion-audio-book-saver-does-just-that-saves-books/" title="Ion Audio Book Saver Does Just That, Saves Books">manufacturer&#8217;s suggested retail price of $189</a> (I&#8217;m already seeing price points of <a href="http://www.mobilemag.com/2011/01/12/ions-book-saver-book-scanner-scans-200-page-books-in-15-minutes/" title="Ion Book Scanner digitizes your 200-page books in 15 minutes for eReading | Mobile Magazine">$149</a> mentioned).</p><p>One of the &#8220;Key Features&#8221; listed on the product page is that the device &#8220;eliminates the need to purchase electronic versions of reading material you already own.&#8221;  As Fiacre points out in his post, this really brings down the cost (in equipment and in effort) of digitally reproducing books.  Are we about to see a new wave of personal book sharing/piracy?  
And what will the impact on libraries be?  In the higher education arena, it is already being mentioned as a way to <a href="http://www.hackcollege.com/blog/2011/1/10/hands-on-with-the-ion-audio-book-saver.html" title="Hands On with the Ion Audio Book Saver | HackCollege">digitize textbooks</a>.  It is conceivable that students would <a href="http://dltj.org/article/textbooks-on-reserve/" title="Textbooks On Reserve Program at Miami University | DLTJ">borrow textbooks</a> from our libraries, digitize them in an afternoon, and return them &#8212; or maybe just digitize them in the library.  Do we need to get ahead of devices like this with education and policy initiatives?</p><p><h2 id="book-piracy">Book Piracy: Less DRM, More Data</h2></p><blockquote><p>As digital book publishing continues to expand at a rapid pace to meet reader demands, piracy rears its head at the forefront of many a discussion in publisher circles. Many publishers respond to the perceived threat with strict digital rights management (DRM) software. But is this the best solution? And does it even provide protection from piracy?</p><p>In the following interview, <a href="http://magellanmediapartners.com/" title="Magellan Media Partners">Magellan Media</a> founder and TOC 2011 speaker <a href="http://www.toccon.com/toc2011/public/schedule/speaker/5146?cmp=il-radar-tc11-oleary-piracy" title="Speaker: Brian O’Leary: O'Reilly Tools of Change for Publishing Conference 2011 - O'Reilly Conferences, February 14 - 16, 2011, New York">Brian O&#8217;Leary</a> (<a href="http://twitter.com/brianoleary" title="http://twitter.com/brianoleary">@brianoleary</a>) discusses the current state of book piracy, how measurement data isn&#8217;t sufficient to determine its impact, and why DRM is a poor anti-piracy tool.</p></blockquote><p>The same arguments in favor of digital rights management for the music sector are now being made in the book publishing sector. 
<a href="http://radar.oreilly.com/2011/01/book-piracy-drm-data.html" title="Book piracy: Less DRM, more data - O'Reilly Radar">This interview</a> argues that DRM is the wrong answer to the perceived problem of book piracy.  The backdrop is <a href="https://en.oreilly.com/toc2011/public/register?cmp=il-radar-tc11-oleary-piracy">O&#8217;Reilly Media&#8217;s Tools of Change for Publishing</a> conference to be held next month in New York City.</p><p><h2 id="corebibdescr">Core Bibliographic Description</h2></p><blockquote><p>Those “outliers” can be categorized according to three general purposes:</p><ul><li><em>Provenance and Identity</em>: identifiers (e.g. ISBN, OCLC, etc.) and cataloging source (040)</li><li><em>Elements useful for discovery:</em> title statement (245), personal names (100, 700) and subject (650)</li><li><em>Elements useful for understanding and evaluation:</em> publication statement (260), physical description (300), and notes (500)</li></ul><p>That’s it. In a nutshell you have the very core of bibliographic description as defined by librarians over the last century or so.</p></blockquote><p>This <a href="http://hangingtogether.org/?p=834" title="The Core of Bibliographic Description | hangingtogether.org">post</a> by <a href="http://hangingtogether.org/?page_id=207" title="Roy Tennant Biography">Roy Tennant</a> briefly summarizes the work of OCLC Research staff member <a href="http://www.oclc.org/research/people/smith-yoshimura.htm" title="Karen Smith-Yoshimura | OCLC - People">Karen Smith-Yoshimura</a>.  
The goal of the research was to <a href="http://www.oclc.org/research/activities/attributes/default.htm" title="Gather Evidence to Inform Changes in MARC Metadata Practices [OCLC - Activities]">gather evidence to inform changes in MARC metadata practices</a>, and that project page includes a <a href="http://www.oclc.org/research/publications/library/2010/2010-06.pdf" title="Implications of MARC Tag Usage on Library Metadata Practices report in PDF">72-page report</a> [PDF] and an Excel <a href="http://cdn.dltj.org/wp-content/uploads/2011/01/2010-06a.xls" title="Full Data Tables Related to MARC Tag Usage in WorldCat">spreadsheet of data tables</a> along with <a href="http://www5.oclc.org/downloads/research/webinars/20100318mtu.wmv" title="Audio in WMV format of results webinar">audio</a> and <a href="http://www5.oclc.org/downloads/research/webinars/20100318mtu.mp4" title="Video recording in MPEG4 format of the results webinar">video</a> of a <a href="http://www.catalogingfutures.com/catalogingfutures/2010/04/webinar-implications-of-marc-tag-usage-on-library-metadata.html" title="Cataloging Futures: Webinar: Implications of MARC tag usage on library metadata">one-hour webinar</a> on the report.  In my <a href="http://friendfeed.com/dltj/710d04c0/core-of-bibliographic-description-oclc" title="The Core of Bibliographic Description | Peter Murray's FriendFeed">FriendFeed posting of Roy&#8217;s article</a>, <a href="http://waltcrawford.name/" title="Walt Crawford">Walt Crawford</a> noted a similar finding in his 1986 book <a href="http://books.google.com/books?id=9NXgAAAAMAAJ&#038;dq=Bibliographic+Displays+in+the+Online+Catalog&#038;hl=en&#038;ei=ZHI3TeCzLIH-8Ab79s2cBA&#038;sa=X&#038;oi=book_result&#038;ct=result&#038;resnum=1&#038;ved=0CC8Q6AEwAA" title="Bibliographic displays in the online catalog | Google Book Search">Bibliographic displays in the online catalog</a>.  
As Walt notes, &#8220;somehow it&#8217;s not surprising that it&#8217;s still true in 2010.&#8221;</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/thursday-threads-2011w3/feed/</wfw:commentRss> <slash:comments>8</slash:comments> <enclosure url="http://www5.oclc.org/downloads/research/webinars/20100318mtu.wmv" length="68512623" type="video/asf" /> <enclosure url="http://www5.oclc.org/downloads/research/webinars/20100318mtu.mp4" length="288204112" type="video/mp4" /> </item> <item><title>Defining Metadata and Making Metadata Accessible</title><link>http://dltj.org/article/defining-metadata-accessibility/</link> <comments>http://dltj.org/article/defining-metadata-accessibility/#comments</comments> <pubDate>Wed, 17 Nov 2010 01:40:20 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[L/IS Profession]]></category> <category><![CDATA[accessibility]]></category> <category><![CDATA[description]]></category> <category><![CDATA[Karen Coyle]]></category> <category><![CDATA[MARC]]></category> <category><![CDATA[metadata]]></category> <category><![CDATA[Resource Description and Access]]></category><guid isPermaLink="false">http://dltj.org/?p=1842</guid> <description><![CDATA[In preparation for the last webinar of the three-part series &#8220;Using RDA: Moving into the Metadata Future&#8221;, I&#8217;m reading again Karen Coyle&#8217;s &#8220;Library Data in a Modern Context&#8221; &#8212; the first chapter of Understanding the Semantic Web: Bibliographic Data and &#8230; <a href="http://dltj.org/article/defining-metadata-accessibility/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=1842"></abbr><p>In preparation for the last webinar of the three-part series &#8220;<a href="http://www.alastore.ala.org/detail.aspx?ID=3125" title="Using RDA: Moving into the Metadata Future (A Three-part ALA TechSource Workshop) - ALA 
Store">Using RDA: Moving into the Metadata Future</a>&#8221;, I&#8217;m rereading <a href="http://www.kcoyle.net/" title="Karen Coyle's home page" rel="homepage">Karen Coyle</a>&#8217;s &#8220;Library Data in a Modern Context&#8221; &#8212; the first chapter of <cite><a href="http://alatechsource.metapress.com/content/g212v1783607/" title="Understanding the Semantic Web: Bibliographic Data and Metadata - ALA TechSource">Understanding the Semantic Web: Bibliographic Data and Metadata</a></cite>.  Right at the start she has a clear and useful definition of this thing we call &#8220;metadata.&#8221;<br /><span id="more-1842"></span></p><blockquote><p>The most common definition of <i>metadata</i> is “data about data.” This short, catchy definition is worthy of a successful advertising campaign. Unfortunately, it doesn&#8217;t really help us understand metadata, and is actually somewhat incorrect. A more useful definition is decidedly less snappy, but can help us understand the helpful role that metadata can play in facilitating information access. In fact, a functional definition gives us a viable roadmap for our own studies of metadata utility and quality.</p><p>So here it goes—metadata is constructed, constructive, and actionable:</p><ul><li><b>Constructed:</b> Metadata is not found in nature. It is entirely an invention; it is an artificiality.</li><li><b>Constructive:</b> Metadata is constructed for some purpose, some activity, to solve some problem. The proliferation of metadata formats that seem similar on the surface is often evidence of different definitions of needs or of different contexts. We may dream of a universal set of metadata for some set of things, like biological entities, printed books, or a calendar of events, but are likely to be disappointed in practice.</li><li><b>Actionable:</b> The point of metadata is to be useful in some way. 
This means that it is important that one can act on the metadata in a way that satisfies some needs.<sup><a href="http://dltj.org/article/defining-metadata-accessibility/#footnote_0_1842" id="identifier_0_1842" class="footnote-link footnote-identifier-link" title="Coyle, Karen. &ldquo;Library Data in a Modern Context.&rdquo; Library Technology Reports 46.1 (2010): 5-13.">1</a></sup></li></ul></blockquote><p>A little further on, Karen focuses on the actionability of metadata.  I have a heightened awareness of the need for other-than-visual access to information based on the last few months of activity with my previous employer, so I reread this section with &#8220;new eyes&#8221; (so to speak):</p><blockquote><p>&#8230;today&#8217;s metadata must be in a form that can be processed by computers, and the sense that it is “actionable” really needs to be interpreted as being “actionable by electronic machines.” Even when the final goal is to display the data to humans in an understandable form, the data will undergo some machine processing on the way to its destination on a screen [or] in printed form <strong style="font-style:italic">or when read aloud by a screen reader</strong>.</p></blockquote><p>I added that last part.  The transformation of the meaning of the metadata into a visual form is but one possible sensory input across the human-computer divide.  It is also important to design interfaces that bring meaning to data by supplying labels for values in alternative ways.  For the <a href="http://www.loc.gov/marc/bibliographic/bd300.html" title="MARC 21 Format for Bibliographic Data: 300: Physical Description">MARC 300 field</a>, it is the difference between &#8220;ix, 74 p. : ill. ; 23 cm&#8221; and &#8220;9 pages of introductory material followed by 74 numbered pages. Includes illustrations. 23 centimeters high.&#8221;  If the only way to transmit this information were auditory, which one of these would you like spoken to you?  Is it: &#8220;eye-ex, seventy four pee. 
ill. twenty three cem&#8221;?</p><p>Now let&#8217;s try to engineer that backwards.  Is the auditory version easier to generate from:</p><div class="wp_syntax"><div class="code"><pre class="txt" style="font-family:monospace;">300    |aix, 74 p. :|bill. ;|c23 cm</pre></div></div><p>or from something like this made-up, <a href="http://www.loc.gov/standards/mods/" title="Metadata Object Description Schema: MODS (Library of Congress)"><acronym title="Metadata Object Description Schema">MODS</acronym></a>-like markup:</p><div class="wp_syntax"><div class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;physicaldescription<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;extent<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;pagination<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
       <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;pages</span> <span style="color: #000066;">type</span>=<span style="color: #ff0000;">&quot;introductory&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>9<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/pages<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
       <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;pages</span> <span style="color: #000066;">type</span>=<span style="color: #ff0000;">&quot;numbered&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>74<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/pages<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/pagination<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;illustration</span> <span style="color: #000000; font-weight: bold;">/&gt;</span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;height</span> <span style="color: #000066;">unit</span>=<span style="color: #ff0000;">&quot;cm&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>23<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/height<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/extent<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/physicaldescription<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></div></div><p>With the second, we can produce something like the first (the MARC field or its abbreviated display) as well as the auditory version.  But it is considerably more difficult to create the auditory version from the first, particularly given the wide variation in punctuation that <acronym title="International Standard Bibliographic Description">ISBD</acronym> allows.  It just isn&#8217;t machine actionable, which makes it difficult to transform, reuse, and translate that data in another context.</p><p>I&#8217;m reminded too of <a href="http://bibwild.wordpress.com/2010/11/03/alcts-rda-presentation/" title="ALCTS RDA presentation &laquo; Bibliographic Wilderness">this recent quote from Jonathan Rochkind</a>:  &#8220;Of course, our legacy environment is even worse, with the ‘data model’ being supplied by an unholy combination of ISBD &#8230; and MARC&#8230;.&#8221;  It would be good to stop doing our data entry in the language of the computer (e.g. MARC).  Based on the chat from the first webinar in the series, we wouldn&#8217;t expect catalogers to type out the XML fragment above.  There should be computer-assisted workflows to capture the data and store it with all the required semantics.  That XML would be used for machine-to-machine communication and transformation into the output desired by the user &#8212; be it a shorthand visual display or an auditory reading of information.</p><h2>Footnotes</h2><ol class="footnotes"><li id="footnote_0_1842" class="footnote">Coyle, Karen. 
“Library Data in a Modern Context.” Library Technology Reports 46.1 (2010): 5-13.</li></ol>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/defining-metadata-accessibility/feed/</wfw:commentRss> <slash:comments>11</slash:comments> </item> <item><title>MARC isn&#8217;t Dead, but it is a Dead End</title><link>http://dltj.org/article/marc-as-dead-end/</link> <comments>http://dltj.org/article/marc-as-dead-end/#comments</comments> <pubDate>Fri, 29 Oct 2010 16:29:02 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[L/IS Profession]]></category> <category><![CDATA[AACR]]></category> <category><![CDATA[American Library Association]]></category> <category><![CDATA[Functional Requirements for Bibliographic Records]]></category> <category><![CDATA[Karen Coyle]]></category> <category><![CDATA[linked data]]></category> <category><![CDATA[MARC]]></category> <category><![CDATA[metadata]]></category> <category><![CDATA[Resource Description and Access]]></category> <category><![CDATA[semantic web]]></category><guid isPermaLink="false">http://dltj.org/?p=1823</guid> <description><![CDATA[This week I sat in on the first of the three &#8220;Using RDA: Moving into the Metadata Future&#8221; webinars being hosted by ALA. This one was hosted by Karen Coyle with the title New Models of Metadata where she talked &#8230; <a href="http://dltj.org/article/marc-as-dead-end/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=1823"></abbr><p>This week I sat in on the first of the three &#8220;<a href="http://www.alastore.ala.org/detail.aspx?ID=3125" title="Using RDA: Moving into the Metadata Future (A Three-part ALA TechSource Workshop)">Using RDA: Moving into the Metadata Future</a>&#8221; webinars being hosted by <acronym title="American Library Association">ALA</acronym>.  
This one was presented by <a href="http://kcoyle.net/" title="Karen Coyle's home page" rel="homepage">Karen Coyle</a> with the title <a href="http://www.alatechsource.org/blog/2010/10/continuing-the-conversation-new-models-of-metadata.html" title="Continuing the Conversation: New Models of Metadata | ALA TechSource">New Models of Metadata</a>, in which she talked about library-specific efforts such as <acronym title="Resource Description and Access"><a href="http://www.rdatoolkit.org/" title="RDA Toolkit">RDA</a></acronym> and <acronym title="Functional Requirements for Bibliographic Records"><a href="http://www.ifla.org/en/publications/functional-requirements-for-bibliographic-records" title="Functional Requirements for Bibliographic Records | IFLA">FRBR</a></acronym> as well as the <a href="http://linkeddata.org/" title="Linked Data - Connect Distributed Data across the Web">linked data</a> effort in the wider world of information.  Participants expressed a great deal of concern in the chat window about the future of cataloging, of catalogers, and of <acronym title="MAchine-Readable Cataloging"><a href="http://www.loc.gov/marc/" title="MARC STANDARDS (Network Development and MARC Standards Office, Library of Congress)">MARC</a></acronym>.  The latter brought up memories of <a href="http://roytennant.com/professional.html" title="Roy Tennant: Professional Life">Roy Tennant</a>&#8217;s &#8220;<a href="http://www.libraryjournal.com/article/CA250046.html" title="MARC Must Die | Library Journal">MARC Must Die</a>&#8221; declaration.  My takeaway, though, is not so much that MARC is dead as that MARC is a dead end.<br /><span id="more-1823"></span><br /><div id="attachment_1824" class="wp-caption alignright" style="width: 190px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><a href="http://www.wfhowes.co.uk/catalogue/titles.php?&amp;t=4401" title="W. F. 
Howes Ltd (UK) - Audio Book &amp; Large Print Publishers"><img src="http://cdn.dltj.org/wp-content/uploads/2010/10/Library-of-the-Dead-cover-art-180x300.jpg" alt="" title="&#039;Library of the Dead&#039; cover art" width="180" height="300" class="size-medium wp-image-1824" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Cover art from 'Library of the Dead' audio book</p></div><br /><h2>MARC, not dead yet?</h2><br />We know that MARC isn&#8217;t dead; the communications format, along with its <acronym title="Anglo-American Cataloguing Rules, Second Edition"><a href="http://www.aacr2.org/" title="AACR2">AACR2</a></acronym> companion rules for describing bibliographic resources, is deeply and daily ingrained in our systems and processes.  For the same reason, I think it is fair to say that MARC isn&#8217;t dying.  (The fate of AACR2 with respect to RDA may be a little closer to the edge.)  What I propose, though, is that MARC is a dead end.  Karen makes a comment &#8212; <a href="http://www.alatechsource.org/blog/2010/10/continuing-the-conversation-new-models-of-metadata.html#comment-2803" title="Continuing the Conversation: New Models of Metadata | ALA TechSource">On the brokenness of MARC</a> &#8212; that starts to enumerate some of the basic issues with the MARC format.  (Karen&#8217;s <a href="http://www.kcoyle.net/marcdead.html" title="Is MARC Dead? by Karen Coyle">writings from 10 years ago</a> list even more details.)  Also, as Karen pointed out in her presentation (as many others have before her), MARC is a format that is only used in the library community.  As a communications format, it is cumbersome &#8212; requiring those outside the library community to use custom code toolkits to read and write the format.  
That is a pretty high barrier for anyone in the wider world who wants to use library bibliographic data encoded in MARC.</p><p>What trips up our community even more, I think, is our tendency to equate this communications format with the mental model of how we describe things from a bibliographic point of view.  We think of discrete records that describe these things rather than a network (or, more accurately, a <a href="http://en.wikipedia.org/wiki/Graph_theory" title="Graph theory - Wikipedia">graph</a>) of interrelated nodes.  This forces us to focus on the textual content of fields rather than on the relationships between things.  And in doing so, we are not making the best use of our limited efforts to describe the things in our curatorial care.</p><p>MARC may not be dead, but it is a dead end.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/marc-as-dead-end/feed/</wfw:commentRss> <slash:comments>20</slash:comments> </item> <item><title>Thursday Threads: RDF, Digital Document Tampering, and Amazon&#8217;s Mechanical Turk</title><link>http://dltj.org/article/thursday-threads-2010w42/</link> <comments>http://dltj.org/article/thursday-threads-2010w42/#comments</comments> <pubDate>Thu, 21 Oct 2010 12:49:02 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Thursday Threads]]></category> <category><![CDATA[Amazon]]></category> <category><![CDATA[Amazon Mechanical Turk]]></category> <category><![CDATA[description]]></category> <category><![CDATA[Federal Library Depository Program]]></category> <category><![CDATA[government documents]]></category> <category><![CDATA[Jenn Riley]]></category> <category><![CDATA[MARC]]></category> <category><![CDATA[metadata]]></category> <category><![CDATA[ProPublica]]></category> <category><![CDATA[RDF]]></category> <category><![CDATA[semantic web]]></category><guid isPermaLink="false">http://dltj.org/?p=1746</guid> <description><![CDATA[
This is definitely becoming a habit&#8230;welcome to the fourth edition of DLTJ&#8217;s Thursday Threads. If you find these interesting and useful, you might want to add the Thursday Threads RSS &#8230; <a href="http://dltj.org/article/thursday-threads-2010w42/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=1746"></abbr><div id="feedburner-thursday-threads-email" class="wp-caption alignright" style="width: 310px;;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><form style="border:1px solid #ccc;padding:3px;text-align:center;" action="http://feedburner.google.com/fb/a/mailverify" method="post" target="popupwindow" onsubmit="window.open('http://feedburner.google.com/fb/a/mailverify?uri=thursday-threads', 'popupwindow', 'scrollbars=yes,width=550,height=520');return true"><p>Enter your email address to receive <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym></i> Thursday Threads:</p><input type="text" style="width:140px" name="email"/><input type="hidden" value="thursday-threads" name="uri"/><input type="hidden" name="loc" value="en_US"/><input type="submit" value="Subscribe" /><p>Delivered by <a href="http://feedburner.google.com" target="_blank" title="Google Feedburner Service">FeedBurner</a></p></form></div><p>This is definitely becoming a habit&#8230;welcome to the fourth edition of <a href="http://dltj.org/category/thursday-threads/"><i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym>&#8217;s</i> Thursday Threads</a>.  If you find these interesting and useful, you might want to add the <a href="http://feeds.dltj.org/thursday-threads/">Thursday Threads RSS Feed</a> to your feed reader or subscribe to e-mail delivery using the form to the right.  
If you would like a rawer and more immediate version of these types of stories, watch <a href="http://friendfeed.com/dltj" title="Peter Murray - FriendFeed">my FriendFeed stream</a> (or subscribe to <a href="feed://friendfeed.com/dltj?format=atom" title="Peter Murray - FriendFeed - Atom Feed">its feed</a> in your feed reader).  Comments, as always, are welcome.<br /><span id="more-1746"></span><br /><h2>Defining Linked Data By Analogy</h2></p><blockquote><p>RDF is the grammar for a language of data.  URIs are the words of that language.  As in natural language, these words (i.e., the URIs) belong to grammatical categories.  RDF properties (such as &#8220;isReferencedBy&#8221;) function a bit like verbs, RDF classes like nouns.</p><p>As in natural languages, where utterances are meaningful only if they follow a sentence grammar, RDF statements follow a simple and consistent three-part grammar of subject, predicate, and object.  Analogously to paragraphs, RDF statements are aggregated into RDF graphs.</p></blockquote><p>This is a <a href="http://lists.w3.org/Archives/Public/public-lld/2010Oct/0088.html" title="Good grammar and proper footnotes for data from Thomas Baker on 2010-10-18 (public-lld@w3.org from October 2010)">posting from Thomas Baker</a> on the <acronym title="World Wide Web Consortium">W3C</acronym> Library Linked Data Incubator Group mailing list. It compares <acronym title="Resource Description Framework">RDF</acronym> to natural languages using analogies of grammar, words, sentences, and paragraphs. I think this is a useful way to think about RDF and linked data, although as an initial introduction to the topic, you might want to see the presentation below.</p><p><h2>RDF For Librarians presentation recording</h2></p><blockquote><p>The RDF model underlying Semantic Web technologies is frequently described as the future of structured metadata. Its adoption in libraries has been slow, however. 
This is due in no small part to fundamental differences in the modeling approach that RDF takes, representing a &#8220;bottom up&#8221; architecture where a description is distributed and can be made up of any features deemed necessary, whereas the record-centric approach taken by libraries tends to be more &#8220;top down&#8221; relying on prespecified feature sets that all should strive to make the best use of. This presentation will delve deeply into the differences between these two approaches to explore why the RDF approach has proven difficult for libraries, look at some RDF-based initiatives that are happening in libraries and how they are allowing different uses of this metadata than was previously possible, and pose some questions about how libraries might best.</p></blockquote><p>Jenn Riley gave this hour-long presentation to the Indiana University Digital Library Brown Bag on September 22, 2010.  The recording, with slides synchronized to the audio, is at <a href="http://breeze.iu.edu/p48776227/" title="Digital Library Brown Bag, RDF for Librarians, 9/22/2010">http://breeze.iu.edu/p48776227/</a>.  The <a href="http://www.dlib.indiana.edu/education/brownbags/fall2010/rdf/rdf.pdf" title="Digital Library Brown Bag, RDF for Librarians presentation slides, 9/22/2010">presentation slides</a> and the <a href="http://www.dlib.indiana.edu/education/brownbags/fall2010/rdf/rdfhandout.pdf" title="Digital Library Brown Bag, RDF for Librarians presentation handout, 9/22/2010">handout</a> from the session are available as well.  I highly recommend spending an hour with this presentation to learn how linked data compares and contrasts with MARC records. 
(via <a href="http://managemetadata.org/blog/2010/10/05/jenn-riley-on-rdf/" title="Jenn Riley on RDF | Metadata Matters">Diane Hillmann</a>)</p><p><h2>The Future of the Federal Depository Libraries</h2></p><blockquote><p>[ProPublica's Dafna] Linzer&#8217;s expose of government tampering with a court docket is an example of the problem on which the LOCKSS Program has been working for more than a decade, how to make the digital record resistant to tampering and other threats. The only reason this case was detected was because Linzer created and kept a copy of the information the government published, and this copy was not under their control. Maintaining copies under multiple independent administrations (i.e. not all under control of the original publisher) is a fundamental requirement for any scheme that can recover from tampering (and in practice from many other threats).</p></blockquote><p>David Rosenthal <a href="http://blog.dshr.org/2010/10/future-of-federal-depository-libraries.html" title="DSHR's Blog: The Future of the Federal Depository Libraries">summarizes</a> a story about how a document published by the U.S. government was later altered, and explains why we need highly distributed copies of government documents to detect and recover from tampering.  There are big implications here for the future of government document depository programs.</p><p><h2>ProPublica’s Guide to Mechanical Turk</h2></p><blockquote><p>Amazon Mechanical Turk – or mTurk – is an online marketplace, set up by the online shopping site Amazon, where anyone can hire workers to complete short, simple tasks over the Internet. Amazon originally developed it as an in-house tool, and commercialized it in 2005. The mTurk workforce now numbers more than 100,000 workers in 200 countries, according to Amazon. At ProPublica, we use it for tasks like collecting, reformatting, and de-duplicating data. This is a guide to journalists looking to use Mechanical Turk in their data projects. 
It’s meant for users who are already familiar with mTurk and are looking for ways to improve their results.</p></blockquote><p>Do you have repetitive digital conversion or analysis jobs that can be broken down into manageable-sized chunks?  ProPublica published <a href="http://www.propublica.org/article/propublicas-guide-to-mechanical-turk" title="ProPublica&#8217;s Guide to Mechanical Turk - ProPublica">this guide</a> on using <a href="https://requester.mturk.com/mturk/resources" title="Amazon Mechanical Turk Resources">Amazon&#8217;s Mechanical Turk</a> service to outsource this activity.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/thursday-threads-2010w42/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>Mashups of Bibliographic Data: A Report of the ALCTS Midwinter Forum</title><link>http://dltj.org/article/mashups-of-bib-data/</link> <comments>http://dltj.org/article/mashups-of-bib-data/#comments</comments> <pubDate>Wed, 27 Jan 2010 21:14:52 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Meeting]]></category> <category><![CDATA[ALA Midwinter Conference 2010]]></category> <category><![CDATA[Association for Library Collections and Technical Services]]></category> <category><![CDATA[Dewey Decimal Classification]]></category> <category><![CDATA[Google Book Search]]></category> <category><![CDATA[Internet Archive]]></category> <category><![CDATA[MARC]]></category> <category><![CDATA[OCLC]]></category> <category><![CDATA[onix]]></category> <category><![CDATA[Open Library]]></category> <category><![CDATA[WorldCat]]></category><guid isPermaLink="false">http://dltj.org/?p=1478</guid> <description><![CDATA[This year the ALCTS Forum at ALA Midwinter brought together three perspectives on massaging bibliographic data of various sorts in ways that use MARC, but where MARC is not the end goal. 
What do you get when you swirl MARC, &#8230; <a href="http://dltj.org/article/mashups-of-bib-data/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=1478"></abbr><p>This year the <a href="http://connect.ala.org/node/91406" title="ALCTS Forum: Mix and Match: Mashups of Bibliographic Data | ALA Connect"><acronym title="Association for Library Collections and Technical Services">ALCTS</acronym> Forum at <acronym title="American Library Association">ALA</acronym> Midwinter</a> brought together three perspectives on massaging bibliographic data of various sorts in ways that <em>use</em> <acronym title="Machine Readable Cataloging">MARC</acronym>, but where MARC is not the end goal.  What do you get when you swirl MARC, <acronym title="ONline Information eXchange">ONIX</acronym>, and various other formats of metadata in a big pot?  Three projects:  ONIX Enrichment at OCLC, the Open Library Project, and Google Book Search metadata.<br /><span id="more-1478"></span><br />Below is a summary of how these three projects are messin&#8217; with metadata, as told by the Forum panelists.  I also recommend reading Eric Hellman&#8217;s <a href="http://go-to-hellman.blogspot.com/2010/01/google-exposes-book-metadata-privates.html" title="Google Exposes Book Metadata Privates at ALA Forum | Go-to-Hellman">Google Exposes Book Metadata Privates at ALA Forum</a> for his recollection and views of the same meeting.</p><p><h2 id="post-1478-h2-OCLC-ONIX">ONIX Enrichment at OCLC</h2></p><p><span class="removed_link" title="http://www.oclc.org/speakers/bios/register_renee.htm">Renee Register</span>, Global Product Manager for OCLC Cataloging and Metadata Services, was the first to present on the panel.  Her talk covered a new and evolving OCLC product that enhances ONIX records with WorldCat records, and vice versa. 
<sup><a href="http://dltj.org/article/mashups-of-bib-data/#footnote_0_1478" id="identifier_0_1478" class="footnote-link footnote-identifier-link" title="For those not familiar with ONIX, it is a suite of standards promulgated by EDItEUR for the interchange of information on books and serial publications.  It is primarily used to communicate book information from the publishing industry through distribution chains to retail establishments.">1</a></sup></p><p>Renee said that as libraries, &#8220;our instincts are collaborative,&#8221; but &#8220;our data and workflow silos encourage redundancy and inhibit interoperability.&#8221;  Beyond the obvious differences in metadata formats, the workflows of libraries differ dramatically from those of other metadata providers and consumers. In libraries (with the exception of <acronym title="Cataloging in Publication">CIP</acronym> and brief on-order records), the major work of bibliographic production is performed at the end of the publication cycle, upon receipt of the published item.  In the publisher supply chain, bibliographic data evolves over time, usually beginning months before publication and continuing to grow for months and years (sales information, etc.) after publication.  
Renee had a graphic showing the current flow of metadata around the broader bibliographic universe that highlighted the isolation of library activity relative to publisher, wholesaler, and retailer activity.</p><p><div id="attachment_1484" class="wp-caption alignright" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><a href="http://www5.oclc.org/downloads/presentations/MDS4Pubs_August_Webinar_200908.ppt" title="Slides from Publisher Supply Chain Webinar, August 2009"><img src="http://cdn.dltj.org/wp-content/uploads/2010/01/ONIX-enhancement-300x225.jpg" alt="" title="Diagram of the Process of Enhancing ONIX Records" width="300" height="225" class="size-medium wp-image-1484" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Diagram of the Process of Enhancing ONIX Records, from OCLC Services for the Publisher Supply Chain Webinar, August 2009</p></div>Renee went on to describe a &#8220;next generation cataloging data flow&#8221; where OCLC facilitates the inclusion of publisher data into <a href="http://www.worldcat.org/" title="WorldCat homepage" rel="homepage">WorldCat</a> and enhances publisher data with information extracted from WorldCat.  To the right is a version of the graphic she used at Midwinter, taken from an earlier presentation on the same topic.  It shows ONIX-formatted metadata coming into WorldCat, being crosswalked and matched with existing MARC data in WorldCat, and finally extracted and crosswalked back to ONIX, resulting in <a href="http://publishers.oclc.org/en/metadata/default.htm" title="OCLC Metadata Services for Publishers"> enhanced ONIX metadata</a> for publishers to use in their supply chain.  
If there is an exact match for the incoming ONIX record in WorldCat, the WorldCat record is enhanced with certain fields from the ONIX record (descriptions, author biographies, web links), taking care not to override authority work done by libraries while adding enhancements that libraries may not otherwise input.  In turn, enhancements from the exact-match record and from the other records in its FRBR work set (hardcover versus softcover versus audiobook, etc.) are added to the ONIX record: non-English subject headings, a Dewey Decimal Classification (DDC) number taken from a similar record if the ONIX record doesn&#8217;t already have one, and an authority-controlled form of the author name.  If there is not an exact match for the ONIX record in WorldCat, a new WorldCat record is built from the ONIX record and is subsequently enhanced with metadata found in the FRBR work set records.  In doing so, we are &#8220;increasing the goodness of metadata in the marketplace,&#8221; as Renee put it in her presentation.  OCLC is also creating a mapping between <a href="http://www.bisg.org/what-we-do-20-73-bisac-subject-headings-2009-edition.php" title="Standards &amp; Best Practices | Classification Schemes | BISAC Subject Headings 2009 Edition | Book Industry Study Group">BISAC Subject Headings</a><sup><a href="http://dltj.org/article/mashups-of-bib-data/#footnote_1_1478" id="identifier_1_1478" class="footnote-link footnote-identifier-link" title="By the way, it seems like BISAC is an acronym for &#8220;Book Industry Systems Advisory Committee&#8221;, the former name of the Book Industry Study Group.">2</a></sup> and the DDC system.  
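</p><p>To make the match-and-enhance flow concrete, here is a minimal Python sketch of it.  It is an illustration under my own assumptions, not OCLC&#8217;s implementation: records are plain dictionaries with hypothetical keys (description, author_bio, web_links, ddc, author_authorized), an exact match simply means the same ISBN, the FRBR work set comes from a caller-supplied function, and the one-entry BISAC-to-DDC table is illustrative rather than taken from OCLC&#8217;s mapping.</p>

```python
from copy import deepcopy

# Hypothetical field names; the talk did not list OCLC's actual field sets.
ONIX_TO_WORLDCAT_FIELDS = ("description", "author_bio", "web_links")

# Illustrative BISAC -> DDC pair only, not taken from OCLC's mapping.
BISAC_TO_DDC = {"HIS036000": "973"}  # HISTORY / United States / General


def enhance(onix, worldcat, work_set_of):
    """Merge one incoming ONIX record with WorldCat; return an enhanced copy.

    onix: dict for the incoming publisher record (must contain "isbn").
    worldcat: dict mapping ISBN -> WorldCat record dict; updated in place.
    work_set_of: callable returning the other records in a record's FRBR work set.
    """
    onix = deepcopy(onix)
    wc = worldcat.get(onix["isbn"])
    if wc is None:
        # No exact match: build a new WorldCat record from the ONIX record.
        wc = {k: v for k, v in onix.items() if k != "bisac"}
        worldcat[onix["isbn"]] = wc
    else:
        # Exact match: add publisher-supplied fields the library record lacks,
        # never overwriting what catalogers already recorded.
        for field in ONIX_TO_WORLDCAT_FIELDS:
            if field in onix and field not in wc:
                wc[field] = onix[field]

    # Borrow a DDC number from the matched record or its work set,
    # falling back to the BISAC -> DDC mapping.
    related = [wc] + work_set_of(wc)
    ddc = next((r["ddc"] for r in related if "ddc" in r), None)
    if ddc is None:
        ddc = BISAC_TO_DDC.get(onix.get("bisac"))
    if ddc is not None:
        wc.setdefault("ddc", ddc)
        onix.setdefault("ddc", ddc)

    # Replace the publisher's author string with the authority-controlled form.
    if "author_authorized" in wc:
        onix["author"] = wc["author_authorized"]
    return onix


if __name__ == "__main__":
    # Hypothetical records for illustration.
    worldcat = {
        "9780000000002": {"isbn": "9780000000002", "title": "An Example History",
                          "author_authorized": "Doe, Jane, 1950-", "ddc": "973"},
    }
    incoming = {"isbn": "9780000000002", "title": "An Example History",
                "author": "Jane Doe", "description": "Publisher blurb."}
    enhanced = enhance(incoming, worldcat, lambda rec: [])
    print(enhanced["ddc"], enhanced["author"])
    print(worldcat["9780000000002"]["description"])
```

<p>Run as a script, the example prints the borrowed DDC number and the authority form of the author name for the ONIX copy, then the publisher&#8217;s description now attached to the WorldCat record.  Using setdefault means an existing DDC number is never replaced, a small nod to Renee&#8217;s point about not overriding work that libraries have already done.</p><p>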
The mapping allows ONIX records to be enhanced with suggested BISAC Subject Headings, and WorldCat records to be enhanced with generic DDC numbers derived from the BISAC Subject Heading in an incoming ONIX record.</p><p>Drawing on her experience, Renee said that libraries need ways to let their metadata evolve over time and to allow publisher-created metadata to merge effectively with library-created metadata.  The bibliographic record needs to be a &#8220;living, growing&#8221; thing throughout the lifecycle of a title and beyond.  In concluding her remarks, she offered several resources to explore for further information:  the OCLC/NISO study on <a href="http://www.niso.org/publications/white_papers/StreamlineBookMetadataWorkflowWhitePaper.pdf" title="Streamlining Book Metadata Workflow">Streamlining Book Metadata Workflow</a>, the U.K. Research Information Network report on <a href="http://rin.ac.uk/creating-catalogues" title="Creating Catalogues: Bibliographic Records in a Networked World">Creating Catalogues: Bibliographic Records in a Networked World</a>, the Library of Congress <a href="http://www.loc.gov/bibliographic-future/news/" title="News, Press Releases and Reports - Working Group on the Future of Bibliographic Control (Library of Congress)">Study of the North American MARC Records Marketplace</a>, the Library of Congress <a href="http://cip.loc.gov/onixpro.html" title="LC ONIX Pilot Project" class="broken_link" rel="nofollow">CIP/ONIX Pilot Project</a>, and the <a href="http://publishers.oclc.org/en/default.htm" title="OCLC Publisher Supply Chain Website">OCLC Publisher Supply Chain Website</a>.</p><p><h2 id="post-1478-h2-Open-Library">From MARC to Wiki with Open Library</h2><br />The second presenter on the panel was <a href="http://kcoyle.net/" rel="homepage" title="Karen Coyle's home page">Karen Coyle</a>, who talked about the mashup of metadata at the <a href="http://openlibrary.org/" title="Open Library project homepage" rel="homepage">Open Library</a> project 
at the <a href="http://archive.org/" title="Internet Archive homepage" rel="homepage">Internet Archive</a>.  The slides from her presentation are <a href="http://kcoyle.net/presentations/ol_boston.pdf" title="Open Library - Mix and Match Metadata presentation slides [PDF]">available from her website</a>.</p><p>Karen said right at the start that the Open Library project is different from most of what happens in libraries: it is &#8220;someone outside the library world making use of library data,&#8221; although the goal is arguably the same as others: &#8220;<a href="http://openlibrary.org/about" title="About Us (Open Library)">One web page for every book ever published</a>.&#8221;  As such, the Open Library isn&#8217;t a library catalog as librarians think of it, in that it is not a representation of a library&#8217;s inventory. It has metadata for every book it can know about and pointers to places where the book can be found, including all of the electronic books in the Internet Archive (<a href="http://www.opencontentalliance.org/" rel="homepage" title="Open Content Alliance (OCA)">Open Content Alliance</a>, Google Public Domain, etc.) as well as pointers back to OCLC WorldCat.  Karen&#8217;s role on the project is that of &#8220;Library Data Informant.&#8221; The Internet Archive decided that it needed someone who understood library data in order to try to use that data.  From Karen&#8217;s perspective, she is trying to be a resource for the project without giving it any guidance on how to implement the service.  She is curious to see what the project will do when bibliographic data is viewed from a non-librarian perspective.  If the developers have questions, or if they make wrong assumptions about the data, then she intervenes.</p><p>Karen went on to briefly describe the Open Library system.  Open Library doesn&#8217;t have records; rather, it has field types and data properties.  In this way, it uses semantic web concepts.  
&#8220;Author&#8221; is a type, &#8220;Author birthdate&#8221; is another type, and so forth.  There is no fixed set of field types, so if the project gets data from a source for which a type doesn&#8217;t yet exist, it can create a new one.  Each type can have data properties such as string, boolean, text, link, etc.  Nothing is required and everything is repeatable.  Everything (types, properties, and values) gets a <acronym title="Uniform Resource Identifier">URI</acronym> (a URI is an identifier like a URL; conceptually, URLs are a subset of URIs).  Titles, authors, subjects, author birthdates, and so on have URIs.  Lastly, the underlying data structures are based on wiki principles: all edits are saved and viewable, anyone can edit any value, anyone can add new types or properties, anyone can develop their own displays, etc.</p><p>The data that is now in Open Library came from a variety of sources.  They started with a copy of the Library of Congress book records and continue to receive the weekly updates. They performed a crawl of Amazon&#8217;s book data.  They have also gotten data from publishers, libraries, and individual users.  The last is perhaps the most interesting because those users are mainly people outside the Western world who otherwise have trouble getting their works recognized.</p><p><h3 id="post-1478-h3-Problems-Issues">Problems, Issues, Challenges, and Opportunities with the Data</h3><br />People who use library data without the biases or assumptions of librarians come up with interesting ways to view the data.  Karen described a few of them.</p><dl class="inlineClass"><dt>Names -</dt><dd>&#8220;These library forms of names? Honestly no one but us can stand them.&#8221;  Even something as simple as the last-name-comma-first-name form is troublesome.  No one else (Amazon, Wikipedia, etc.) uses this form of the name.  
In processing these names, any information in parentheses was deleted, and birth and death dates were moved into separate field types.</dd><dt>Titles -</dt><dd>In working with the Open Library developers, this is one place where Karen tried to insist on applying a library practice:  identifying the initial article.  For us, this is important for sorting books in alphabetical order.  The developers&#8217; response: why do we have to sort in alphabetical order?  &#8220;Where else but library catalogs do we see things sorted in alphabetical order?  Not in Google, not in Amazon, not anywhere.  Alphabetical order is not in the mindset anymore.&#8221;  They also found that the title might include extraneous data.  Amazon, for instance, appends the series title in parentheses to the main title.  This demonstrates that other communities are not as concerned with strongly typing data and separating information into fields. Amazon, of course, has reasons for putting series information into the main title: it helps sell books.</dd><dt>Product dimensions -</dt><dd>Publishers and distributors need to know characteristics of an item such as height, width, depth, and weight; after all, they need to put it in a box and ship it.  Libraries, concerned about placing the item on the shelf, record just height.  Recording pagination and illustrations is different, too: libraries use odd notations such as &#8220;ill. (some col)&#8221; and &#8220;xv, 200p.&#8221; versus simply &#8220;200 pages.&#8221;</dd><dt>Birthdates -</dt><dd>Librarians use birthdates to distinguish names; if there is no need to distinguish a name, birth and death dates are not added.  Someone looking at this from the outside would ask, &#8216;Why don&#8217;t all authors have birth and death dates?&#8217;  This can be useful information for understanding the context of an item, not just for distinguishing author names.  
Open Library ran author names against Wikipedia to pick up not only birth and death years but also full dates.</dd><dt>Subject headings -</dt><dd>Using Library of Congress Subject Headings as they stand was out of the question for Open Library. In processing the data, the Open Library developers simply broke the headings apart into segments and used the segments. But because they were able to do data mining on the subject field types, they did find statistical relationships among the segments of the disassembled precoordinated headings and were able to present those to the user.</dd><dt>The View of the Data -</dt><dd>Rather than the traditional library view of long author-title lists, the Open Library (in its next version, coming in February) will have several different views into the mass of data: Authors; Books (what we would call <acronym title="Functional Requirements for Bibliographic Records">FRBR</acronym> &#8216;manifestations&#8217;); Works; Subjects; and eventually places, publishers, etc.  For example, searching for an author will bring up the author page, which lists all of the author&#8217;s works as well as other biographical information.  It looks similar to a WorldCat Identities page, except that it is the actual user interface built into the system.  Similarly, every work will have a page, and at the bottom of it one will see all of the editions of the work.  Each subject will also have a page listing the works on that subject as well as the authors who write on it.  As Karen said, &#8220;The subject itself becomes an object of interest in the database, not just something that is just tacked on to the bottom of the library record.&#8221;</dd><dt>Data mining -</dt><dd>With the data in this format, it is possible to mine it, for instance for simple facts such as countries of publication or places that appear frequently.  
When they faced the problem of author names (knowing when to reverse surname and forename), they ran the names against Amazon and Wikipedia and retained the names whose order matched the entries there. The Open Library developers are also experimenting with data mining to find publisher names.  Publisher names, of course, vary dramatically, but by using ISBN prefixes they can pull together related items into a &#8220;publisher&#8221; view.</dd></dl><p>Karen suggested watching <a href="http://edwardbetts.com/ol/" title="Index of /ol">Edward Betts&#8217;s site</a> for the data mining aspects; Edward is one of the developers of the Open Library project.  She said it is fun to look at our data when it can be viewed from this different point of view.  She also said to watch for a new version of the <a href="http://openlibrary.org/" title="Open Library (Open Library)">Open Library website</a> coming in February.</p><p><h2 id="post-1478-h2-Google-Book-Search-Metadata">Google Book Search Metadata</h2><br />The final presenter was <a href="http://www.google.com/profiles/kurt.groetsch" title="Kurt Groetsch's Google Profile">Kurt Groetsch</a>, Technical Collections Specialist at Google, where he works to provide understanding of and insight into library partner collections and the books Google has digitized.  Kurt said that &#8220;Google has been fairly circumspect over the years about what we do on the Book Search project.&#8221;  He said this was partly a cultural legacy from the rest of the company and possibly also an artifact of the copyright litigation, but he is hoping to change that.  His presentation looked at how Google works with book metadata from three vantage points: the inputs into Google&#8217;s system, parsing by Google&#8217;s algorithms, and analysis and output into the public interfaces.</p><p>On the input side, Google is getting bibliographic metadata from over 100 sources in a variety of formats. 
MARC records come from libraries, union catalogs, commercial providers (OCLC), and publishers and retailers (one publisher supplies records in MARC format).  Google also gets ONIX records from commercial providers (such as Ingram and Bowker), publishers, and retailers.  Google is especially interested in data from non-U.S. retailers because it is a source of information about books published outside the United States; it helps Google discover items that it may not otherwise encounter in the <a href="https://books.google.com/partner/">publisher</a> and <a href="http://www.google.com/googlebooks/library.html" title="Google Books Library Project">library</a> programs.  Google also receives records in a variety of &#8220;idiosyncratic formats&#8221;: for example, publisher-contributed metadata (via the Publisher Partner Program); information associating books with jacket images; name authority records (from LC); reviews; popularity signals (sales data as well as <a name="anonymized_circulation_data">anonymized circulation data</a> from some library partners, useful for feeding into the relevancy ranking algorithm); and internally generated metadata (for instance, whether or not a book is commercially available).  Google processes all of this information to come up with a single record that describes each book.  At this point they have over 800 million bibliographic records and one trillion bits of information in those records.</p><p>All of these records from all of these sources are processed and remixed with Google&#8217;s parsing algorithms about twice a week.  The first step is to transform the incoming records into a &#8220;less verbose format&#8221; for storage and processing, a SQL-like structure that allows individual elements of the metadata to be queried.  
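</p><p>Kurt did not describe the &#8220;less verbose format&#8221; beyond calling it SQL-like, so the following is only a rough illustration of the idea using Python&#8217;s built-in sqlite3 module: each incoming record is flattened into hypothetical (record_id, source, field, value) rows so that a single metadata element can be queried across every source at once.  The table layout, source names, and the second record&#8217;s title variant are my assumptions; the ISBNs are the ISBN-10 and ISBN-13 forms of the book in footnote 3.</p>

```python
import sqlite3


def load(conn, records):
    """Flatten records into one (record_id, source, field, value) row per value.

    records: {record_id: (source, {field: [values]})}
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS elements "
        "(record_id TEXT, source TEXT, field TEXT, value TEXT)"
    )
    rows = [
        (record_id, source, field, value)
        for record_id, (source, fields) in records.items()
        for field, values in fields.items()
        for value in values
    ]
    conn.executemany("INSERT INTO elements VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
# Hypothetical records: one MARC-derived, one ONIX-derived, same book.
load(conn, {
    "marc-1": ("library-a", {"title": ["The United States Since 1945"],
                             "isbn": ["0195038487"]}),
    "onix-7": ("retailer-b", {"title": ["United States Since 1945"],
                              "isbn": ["9780195038484"]}),
})

# Query one metadata element across all sources at once.
for row in conn.execute(
    "SELECT record_id, source, value FROM elements "
    "WHERE field = 'isbn' ORDER BY record_id"
):
    print(row)
```

<p>The query returns the ISBN-10 from the library record and the ISBN-13 from the retailer record side by side; recognizing that they identify the same book is left to the later parsing and clustering steps.</p><p>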
Records are then parsed to extract specific bits of information, transform the bits as necessary, and write the information to an internal &#8220;resolved records&#8221; data structure (a subset of the data coming from the input formats).  In the presentation, Kurt had examples of how making inferences from data coming from both MARC and ONIX can be troublesome.  Parsing also involves extracting &#8220;bibkeys&#8221; from the records to aid in matching across sources of data.  Four types of identifiers are extracted from bibliographic records: OCLC numbers, <acronym title="Library of Congress Control Numbers">LCCN</acronym>s, ISBNs, and ISSNs.  They usually provide useful signals when matching bibliographic records and help support assertions that two records describe the same manifestation.  Google also tries to parse item data (enumeration and chronology) when it is present in records representing multivolume works.  They will also treat a barcode as a form of &#8220;bibkey&#8221; if they get it from a library.  The parsing algorithm will also split records containing multiple ISBNs representing different product forms (e.g., hardback and paperback).</p><p>With all of this data parsed into records, Google starts its clustering process, in which records are examined and attached to each other.  Bibkeys provide significant evidence for relating records to each other, but bibkeys are not always present in a record (non-U.S. records and older records frequently contain no bibkeys).  The algorithms then fall back on text similarity matching using title, subtitle, contributor, and other fields such as publisher and publication year.  The results are clusters of records representing the same manifestation.  An algorithm then attempts to derive a single &#8220;best-of&#8221; record for each cluster from all of its parsed input records.  
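</p><p>The bibkey, clustering, and best-of steps above can be sketched in a few dozen lines of Python.  This is a toy under stated assumptions, not Google&#8217;s code: records are dictionaries with hypothetical field names; ISBN-10s are converted to ISBN-13 so both forms act as one bibkey; an identifier shared by more than a hypothetical max_share records is ignored, one possible answer to the non-unique ISBNs Kurt mentioned; records without a usable bibkey are matched only to one another on a normalized title and contributor, a crude stand-in for real text-similarity scoring; and each field of the best-of record is picked by a vote weighted by a made-up per-source trust score.</p>

```python
import re
from collections import Counter, defaultdict

# Fields that hold identifiers (bibkeys) rather than descriptive values.
BIBKEY_FIELDS = ("isbn", "oclc", "lccn", "issn")


def isbn13(raw):
    """Normalize an ISBN-10 or ISBN-13 string to a bare ISBN-13."""
    digits = re.sub(r"[^0-9Xx]", "", raw)
    if len(digits) == 10:
        core = "978" + digits[:9]
        total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return digits


def bibkeys(rec):
    """Collect prefixed identifier keys from one parsed record."""
    keys = {"isbn:" + isbn13(v) for v in rec.get("isbn", [])}
    for field in BIBKEY_FIELDS[1:]:
        keys |= {field + ":" + v for v in rec.get(field, [])}
    return keys


def _norm(text):
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def text_key(rec):
    """Crude stand-in for text similarity: normalized title plus contributor."""
    return "text:" + _norm(rec.get("title", "")) + "|" + _norm(rec.get("contributor", ""))


def cluster(records, max_share=3):
    """Group record indexes sharing a usable bibkey or, lacking one, a text key."""
    members = defaultdict(list)
    for i, rec in enumerate(records):
        for key in bibkeys(rec):
            members[key].append(i)
    # Treat an identifier shared by too many records as non-unique noise.
    usable = {k: ids for k, ids in members.items() if max_share >= len(ids)}
    keyed = {i for ids in usable.values() for i in ids}
    for i, rec in enumerate(records):
        if i not in keyed:
            usable.setdefault(text_key(rec), []).append(i)

    parent = list(range(len(records)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for ids in usable.values():
        for j in ids[1:]:
            parent[find(j)] = find(ids[0])

    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)
    return list(groups.values())


def best_of(records, ids, trust):
    """Pick each descriptive field's value by a vote weighted by source trust."""
    votes = defaultdict(Counter)
    for i in ids:
        rec = records[i]
        weight = trust.get(rec["source"], 1.0)
        for field, value in rec.items():
            if field != "source" and field not in BIBKEY_FIELDS:
                votes[field][value] += weight
    return {field: tally.most_common(1)[0][0] for field, tally in votes.items()}


if __name__ == "__main__":
    # Hypothetical parsed records.
    records = [
        {"source": "library-a", "title": "The United States Since 1945", "isbn": ["0195038487"]},
        {"source": "retailer-b", "title": "United States Since 1945 (Paperback)",
         "isbn": ["978-0-19-503848-4"]},
        {"source": "retailer-c", "title": "The United States Since 1945", "isbn": ["9780195038484"]},
        {"source": "library-d", "title": "An Example Title", "contributor": "Doe, Jane"},
        {"source": "library-e", "title": "An example title.", "contributor": "Doe, Jane"},
    ]
    trust = {"library-a": 2.0}  # hypothetical per-source trust scores
    for ids in cluster(records):
        print(ids, best_of(records, ids, trust))
```

<p>The three records for the same book fall into one cluster even though they carry the ISBN in different forms, and the library record&#8217;s extra trust weight settles the title vote in favor of the form without the retailer&#8217;s (Paperback) suffix.  Union-find keeps the merging transitive, so two records linked only through a third still end up together.</p><p>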
The best-of record is assembled through a field-by-field vote based on the trustworthiness of individual fields from each record source.</p><p>Kurt went into some of the challenges facing the team building the clustering and best-of record creation algorithms.  For instance, in dealing with multivolume works they know of 5 numbering schemes with 3 number types in 15 different languages.  Enumeration is now showing in the public display, but the development team is still working with unparsable item data caused by inconsistent cataloging practices between institutions, and sometimes within a single institution.  Another problem is non-unique identifiers. In the current data set ISBN 7899964709 is shared by 75 books and ISBN 7533305353 is associated with 1413 books. There are also poor-quality or &#8220;junk&#8221; records.  Kurt said his favorite was &#8220;The Mosaic Navigator&#8221; by Sigmund Freud published in 1939.  These are hard to identify algorithmically, so the team relies on problem reports that let the developers go in and &#8220;kill&#8221; the troublesome record.  Another example is a book by Virginia Woolf where the incoming record had conflicting information: it had two 260 fields with different dates (1961, correct, and 1900), and fixed field information that strongly suggested that 1900 was the single date of publication.  When the data problem is systematic, they can identify it and compensate for it.  Kurt&#8217;s example for this case was &#8220;The United States Since 1945,&#8221; listed as published in 1899.  This one was highlighted in <a href="http://chronicle.com/article/Googles-Book-Search-A/48245/" title="Google's Book Search: A Disaster for Scholars - The Chronicle Review - The Chronicle of Higher Education">Geoffrey Nunberg&#8217;s criticism of Google Books metadata</a>.  In this case, a metadata source from Brazil used 1899 whenever it didn&#8217;t know the date of publication.  
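</p><p>A source-specific default like this shows up as a spike when publication years are counted separately for each source.  Kurt did not say how the spike was spotted beyond looking at the date distribution, so this is only a minimal sketch of that kind of check: it assumes each parsed record reduces to a (source, year) pair and flags a source when one year accounts for more than a hypothetical threshold share of its dated records.</p>

```python
from collections import Counter, defaultdict


def suspicious_years(records, threshold=0.2, min_count=5):
    """Flag sources where a single publication year dominates.

    records: iterable of (source, year) pairs.
    threshold: hypothetical cut-off for one year's share of a source's dates.
    min_count: skip sources with too few dated records to judge.
    Returns a list of (source, year, count) tuples.
    """
    by_source = defaultdict(Counter)
    for source, year in records:
        by_source[source][year] += 1

    flagged = []
    for source, years in by_source.items():
        total = sum(years.values())
        if min_count > total:
            continue
        year, count = years.most_common(1)[0]
        if count / total > threshold:
            flagged.append((source, year, count))
    return flagged


if __name__ == "__main__":
    # Hypothetical data: source-b falls back to 1899 when the date is unknown.
    dates = [("source-a", y) for y in range(1950, 1960)]
    dates += [("source-b", 1899)] * 8
    dates += [("source-b", y) for y in (1901, 1923, 1950, 1977)]
    print(suspicious_years(dates))  # [('source-b', 1899, 8)]
```

<p>A fixed share is the simplest possible rule; comparing each source&#8217;s distribution with the combined distribution of all other sources would be less likely to flag a collection that legitimately clusters around one year.</p><p>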
When Google went back and looked at the date distribution of books, there was a huge spike in 1899.  Once Google knew about it, they were able to go in and kill that information from that source&#8217;s records. <sup><a href="http://dltj.org/article/mashups-of-bib-data/#footnote_2_1478" id="identifier_2_1478" class="footnote-link footnote-identifier-link" title="A side note: Google isn&#8217;t the only one tripped up by this.  If one searches for the ISBN of the item, 0195038487, one finds more than one site that has the same incorrect publication date.  At least Google is attempting to clean up the data!">3</a></sup></p><p>In closing, Kurt said that Google is committed to engaging with the library community on improving metadata and metadata processing.</p><p style="padding:0;margin:0;font-style:italic;">The text was modified to update a link from http://www.niso.org/publications/white_papers/Stream lineBookMetadataWorkflowWhitePaper.pdf to http://www.niso.org/publications/white_papers/StreamlineBookMetadataWorkflowWhitePaper.pdf on January 19th, 2011.</p><p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://www.oclc.org/speakers/bios/register_renee.htm on February 11th, 2011.</p><h2>Footnotes</h2><ol class="footnotes"><li id="footnote_0_1478" class="footnote">For those not familiar with <a href="http://www.editeur.org/8/ONIX/" title="ONIX Overview">ONIX</a>, it is a suite of standards promulgated by <a href="http://www.editeur.org/" title="EDItEUR homepage" rel="homepage">EDItEUR</a> for the interchange of information on books and serial publications.  
It is primarily used to communicate book information from the publishing industry through distribution chains to retail establishments.</li><li id="footnote_1_1478" class="footnote">By the way, it seems like BISAC is an acronym for &#8220;Book Industry Systems Advisory Committee,&#8221; the former name of the <a href="http://www.bisg.org/" title="Book Industry Study Group homepage" rel="homepage">Book Industry Study Group</a>.</li><li id="footnote_2_1478" class="footnote">A side note: Google isn&#8217;t the only one tripped up by this.  If one searches for the ISBN of the item, 0195038487, one finds <a href="http://www.biggerbooks.com/book/9780195038484" title="The United States Since 1945 at BiggerBooks.com -  Leuchtenburg, 9780195038484, History">more</a> <a href="http://www.chegg.com/details/the-united-states-since-1945/0195038487/" title="Chegg.com: The United States Since 1945 by Leuchtenburg">than</a> <a href="http://www.amazon.co.uk/The-United-States-Since-1945/dp/0195038487" title="The United States Since 1945: Amazon.co.uk: Books">one</a> site that has the same incorrect publication date.  
At least Google is attempting to clean up the data!</li></ol>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/mashups-of-bib-data/feed/</wfw:commentRss> <slash:comments>23</slash:comments> </item> <item><title>Further Consideration of OCLC Records Use Policy</title><link>http://dltj.org/article/oclc-records-use-policy-2/</link> <comments>http://dltj.org/article/oclc-records-use-policy-2/#comments</comments> <pubDate>Thu, 29 Jan 2009 01:37:26 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[policy]]></category> <category><![CDATA[ALA Midwinter 2009]]></category> <category><![CDATA[Biblios]]></category> <category><![CDATA[copyright]]></category> <category><![CDATA[description]]></category> <category><![CDATA[Google Book Search]]></category> <category><![CDATA[MARC]]></category> <category><![CDATA[OCLC]]></category> <category><![CDATA[Open Library]]></category><guid isPermaLink="false">http://dltj.org/?p=701</guid> <description><![CDATA[At ALA Midwinter, ALCTS sponsored a panel discussion about sharing library-created data inside and outside the library community, with a particular focus on cataloging data. I was honored to be asked to speak on the topic from the perspective of &#8230; <a href="http://dltj.org/article/oclc-records-use-policy-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=701"></abbr><p>At ALA Midwinter, ALCTS sponsored a panel discussion about sharing library-created data inside and outside the library community, with a particular focus on cataloging data. I was honored to be asked to speak on the topic from the perspective of a consortial office. 
This is the second and final post in a series that represents an approximation of what I said on the panel.</p><p>The <a href="http://dltj.org/article/oclc-records-use-policy-1/">first part</a> examined the nature of surrogate records that we create as a means to get users to content.  The post looked at where we get records, how humans and machines can create them, and the rights associated with the component data that makes up the records.</p><p><h2>Right to reuse records without restrictions</h2><br />One way to handle the clouded nature of surrogate ownership is to follow the lead of &Dagger;biblios.net and the Open Library Project:  publish the surrogates with a public domain dedication or with an &ldquo;open data&rdquo; license.  This is going to become increasingly important as the variety of systems that use this kind of data evolves and, in some cases, moves outside the library space.</p><p>The first area where this is important is discovery layers.  A new generation of discovery layers is taking surrogates from a variety of sources &ndash; catalogs, publishers, index/abstract services, etc. &ndash; and performing actions such as consolidating records and building relationships between surrogates.  These derived surrogates are presented to users in new interfaces or new portals into existing interfaces.  Examples of these systems are the <a href="http://www.extensiblecatalog.org/" title="About the eXtensible Catalog project">Extensible Catalog project</a> at the University of Rochester and the newly announced <a href="http://www.serialssolutions.com/summon/" title="Summon from Serials Solutions">Serials Solutions Summon product</a>.  OhioLINK recently <a href="http://dltj.org/article/discovery-layer-itn/">solicited responses from vendors</a> for a new discovery layer in which this kind of capability is a key feature.
Other projects (such as subject-specific portals) also seek to re-purpose the data &ndash; mixing it with other sources of data to create new uses and views specific to a particular user community.  Anything other than a permissive-for-all-by-default policy will put up roadblocks and cause builders of these systems to seek data from other services.</p><p>In addition to presenting the surrogates to users in new ways, libraries are also investigating new forms of workflow and collaborative activities surrounding the creation and maintenance of bibliographic records.  One of the strong desires of many involved in the <a href="http://oleproject.org/" title="The OLE Project homepage">OLE Project</a> is cooperative purchasing and cooperative technical services.  OhioLINK has also recently issued an RFI seeking new options for highly collaborative workflows in the maintenance of surrogate records.  Old models of charging for use of records can hinder the ability of cooperating institutions to optimize the costs and efforts of back-room library operations.</p><p>The elephant in the room is the recently proposed OCLC Records Use Policy.  Setting aside the debatable legal framework under which OCLC asserted the right to set a usage policy on records from the cooperative, there were clauses in the proposed policy that would jeopardize the usability of records and, as a consequence, the viability of the cooperative as a whole.  Actions that restrict use of data or create uncertainty around the use of data lessen the value of that data.  I think few would dispute that value can be created by aggregating services on top of the data; the activities in the <a href="http://www.worldcat.org/devnet/wiki/Services" title="Services - WorldCat Developers&#039; Network">WorldCat Grid</a> and <a href="http://www.worldcat.org/devnet/index.php/Main_Page" title="Main Page - WorldCat Developers&#039; Network">Developer&rsquo;s Network</a> point to that.
Revenue could be generated through fees charged to non-members of the cooperative.  It is conceptually important to separate the hosting of the surrogates from the services layered on top of them &ndash; WorldCat Local, mediated ILL, and collection analysis, for example.</p><p><h2>Parting thoughts</h2><br />OCLC was <a href="http://www.oclc.org/about/history/default.htm" title="History of OCLC">created forty years ago</a> based on the use of new technologies and the relationships that technology enabled.  While we all want the cooperative to exist and flourish, it should not do so by engaging in activities whose sole purpose is to protect it.  Portions of the proposed policy appear to mandate that OCLC be in the middle of any exchange of records.  While one can appreciate the ability of a large web footprint like &ldquo;<a href="http://www.worldcat.org/" title="WorldCat homepage">worldcat.org</a>&rdquo; to drive traffic to local libraries, when it comes to sharing factual and non-factual data in surrogate records, being in the middle might not always be the most efficient way to make use of bibliographic data.  OhioLINK&rsquo;s efforts are based on a state mandate to be more efficient and effective for the users of higher education libraries in Ohio.  On balance, the rules of the cooperative cannot trump what might be in the best interests of the members.  Asserting the right to impose policy restrictions on records</p><p><h2>Post-panel Thoughts</h2><br />At the end of co-panelist Karen Calhoun&#8217;s remarks, she encouraged attendees to send comments to the Review Board of Shared Data Creation and Stewardship via the <a href="mailto:recorduse@oclc.org">recorduse@oclc.org</a> e-mail address.  I certainly encourage interested parties to do that, but also to find some way to post their comments in a public forum.  The <a href="http://wiki.code4lib.org/index.php/OCLC_Policy_Change" title="OCLC Policy Change - Code4Lib">discussion of the proposed policy</a> has been both spirited and informative.
Since this is a matter at the core of the cooperative, I don&#8217;t think the discussion should be limited to a one-way feed of information into the review board.  The discussion should also occur between us:  the members of the OCLC cooperative and community.  If you have a blog, post about it.  If not, consider <a href="http://lisnews.org/user/register" title="User account | LISNews">creating one at LISnews.org</a> and <span class="removed_link" title="http://lisnews.org/node/add/blog">posting about it</span> there.  Or use mailing lists such as <a href="http://listserv.syr.edu/archives/autocat.html" title="Archives of AUTOCAT@LISTSERV.SYR.EDU">Autocat</a> and <a href="http://www.listserv.uga.edu/archives/radcat.html" title="Archives of RADCAT@LISTSERV.UGA.EDU">Radcat</a>.  OCLC already has a community forum platform &#8212; <a href="http://www.webjunction.org/home" title="WebJunction homepage">WebJunction</a> &#8212; and it would be good to see OCLC use that as a forum for public discussion.</p><p>Thanks to Charles Wilt, Executive Director of ALCTS, for inviting me to speak at the <a href="http://www.ala.org/ala/mgrps/divs/alcts/alcts.cfm" title="Association for Library Collections and Technical Services (ALCTS) homepage">ALCTS</a> Forum and to Karen Calhoun for facilitating the invitation.
My appreciation also goes out to my co-panelists:  Karen Calhoun (who has <a href="http://www.slideshare.net/amarintha/creating-and-sustaining-communities-around-shared-data-the-case-of-oclc-presentation" title="Creating and Sustaining Communities Around Shared Data: The Case of OCLC - SlideShare">posted her slides online</a>), <a href="http://everybodyslibraries.com/2009/01/28/open-catalog-apis-and-data-ala-presentation-notes-posted/" title="Open catalog APIs and data: ALA presentation notes posted &amp;laquo; Everybody&amp;#8217;s Libraries">John Mark Ockerbloom</a> (who also <a href="http://works.bepress.com/john_mark_ockerbloom/10/" title="Open records, open possibilities">posted his slides and approximate speech transcript</a>), and Brian Schottlaender (who eloquently summarized statements from the other panelists and took point in fielding questions from the audience).<p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://lisnews.org/node/add/blog on January 13th, 2011.</p><div class='series_links'><a href='http://dltj.org/article/oclc-records-use-policy-1/' title='Consideration of OCLC Records Use Policy'>Previous in series</a> <a href='http://dltj.org/article/guardian-correction/' title='Correction Added to Guardian Story on OCLC Record Use Policy'>Next in series</a></div>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/oclc-records-use-policy-2/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> </channel> </rss>
