<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"><channel><title>Disruptive Library Technology Jester &#187; privacy</title> <atom:link href="http://dltj.org/tag/privacy/feed/" rel="self" type="application/rss+xml" /><link>http://dltj.org</link> <description>We&#039;re Disrupted, We&#039;re Librarians, and We&#039;re Not Going to Take It Anymore</description> <lastBuildDate>Fri, 18 May 2012 15:43:10 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <cloud domain='dltj.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' /> <creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/3.0/us/</creativeCommons:license> <item><title>&#8220;The Challenges of User Consent&#8221; &#8212; Handling Shibboleth User Attributes</title><link>http://dltj.org/article/shibboleth-user-attributes/</link> <comments>http://dltj.org/article/shibboleth-user-attributes/#comments</comments> <pubDate>Fri, 06 May 2011 20:51:38 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Raw Technology]]></category> <category><![CDATA[privacy]]></category> <category><![CDATA[Shibboleth]]></category><guid isPermaLink="false">http://dltj.org/?p=2868</guid> <description><![CDATA[One of the great things about the Shibboleth inter-institution single sign-on software package is the ability for the Identity Provider to limit how much a Service Provider knows about a user&#8217;s request for service. (Not familiar with those capitalized terms? 
&#8230; <a href="http://dltj.org/article/shibboleth-user-attributes/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2868"></abbr><p>One of the great things about the <a href="http://shibboleth.internet2.edu/" title="Shibboleth homepage">Shibboleth</a> inter-institution single sign-on software package is the ability for the Identity Provider to limit how much a Service Provider knows about a user&#8217;s request for service.  (Not familiar with those capitalized terms?  Read on for definitions.)  But with this capability comes great flexibility, and with the flexibility can come lots of management overhead.  So I was intrigued to see the <a href="https://lists.internet2.edu/sympa/arc/shibboleth-announce/2011-04/msg00007.html" title="IAM Online May 11 - The Challenges of User Consent | shibboleth-announce mailing list">announcement</a> for an <a href="http://www.incommon.org/iamonline/" title="Identity and Access Management Online">online webinar</a> from the InCommon Shibboleth Federation with the title &#8220;The Challenges of User Consent&#8221; covering the issues of managing who gets access to what information about users.<br /><span id="more-2868"></span><br />From the webinar description:<br /><blockquote><p>Are you starting to see more requests from SPs seeking user attributes? Would you like to explore methods that would simplify the attribute release process? &nbsp;You aren’t alone. 
Campuses are seeking a scalable approach to managing attribute release that will minimize admin involvement and allow users to access sites like those that support collaborative work and want such attributes as EPPN, name, and email.</p><p>Automating the user consent procedure, combined with metadata-driven attribute release, provides an approach that greatly simplifies this process for all parties, and allows users to reach sites without delay.</p><p>Join us for a discussion and demonstration from Brown University and the University of Southern California.</p><p><strong>Host/Moderator: </strong>Tom Barton, University of Chicago and InCommon Technical Advisory Committee</p><p><strong>Presenters:<br /> Steven Carmody</strong>, Brown University and InCommon TAC<br /><strong>Russ Beall</strong>, University of Southern California</p></blockquote><p>Lots more abbreviations and technical terms there, so here is a short primer:</p><dl><dt>Service Provider (SP)</dt><dd>A web server protected by Shibboleth that a user wants to access.</dd><dt>Identity Provider (IdP)</dt><dd>A web server that can authenticate a user (determine who the user is, typically with username/password) and store User Attributes.</dd><dt>User Attributes</dt><dd>Data about a user, including name, email address, affiliation status (student, employee, faculty, etc.), eduPersonPrincipalName, and TargetedIDs.</dd><dt>eduPersonPrincipalName (EPPN)</dt><dd>A string in the form of <i>user</i>@<i>domain</i> that uniquely identifies the user at an Identity Provider.  (<a href="http://www.incommonfederation.org/attributesummary.html#eduPersonPrincipal" title="Attribute Summary | InCommon">InCommon technical definition</a>)</dd><dt>TargetedID</dt><dd>An opaque string stored/generated by the Identity Provider that is unique to each user and Service Provider pair.  
Passed as a User Attribute between the Identity Provider and the Service Provider, it can facilitate long-term user sessions at the Service Provider without revealing the identity of the user.</dd></dl><p>This is all stuff that as librarians we should be concerned about.  Arguably, a Service Provider should only have enough information to satisfy the demands of a license agreement, and in most cases those demands can be satisfied with an assertion that a user is of a proper affiliation with a library (e.g. &#8220;patron&#8221; or &#8220;student&#8221; or &#8220;employee&#8221; or simply &#8220;member&#8221;).  It is baked into the Shibboleth trust model that the Service Provider will honor the User Attributes presented by the Identity Provider.</p><p>What makes the announcement of this webinar interesting is that Service Providers seem to be asking for the non-opaque eduPersonPrincipalName attribute.  I&#8217;ve long thought that TargetedID &#8212; an opaque/random string shared between the Identity Provider and Service Provider &#8212; is a much better answer to enabling privacy for functions like marked-item-lists, relevance ranking based on user search history, and other services that are unique to an individual.  Because TargetedID doesn&#8217;t give away the person&#8217;s identity yet is guaranteed by the IdP to be unique to one person at one SP, it is ideal for situations when the SP doesn&#8217;t really need to know exactly <em>who</em> is making the request.  (Sure, if a user coming to an SP with a TargetedID then gives the SP his/her name or e-mail address, then that person is no longer anonymous but that was a choice the user made.)</p><p>So I&#8217;m planning on tuning in next Wednesday to get caught up on what is happening with User Attributes in Shibboleth-land.  
If you care about this kind of stuff, perhaps you can join me, too.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/shibboleth-user-attributes/feed/</wfw:commentRss> <slash:comments>6</slash:comments> </item> <item><title>Encryption of Patron Data in Modern Integrated Library Systems</title><link>http://dltj.org/article/ils-encryption/</link> <comments>http://dltj.org/article/ils-encryption/#comments</comments> <pubDate>Wed, 04 May 2011 00:30:24 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[L/IS Profession]]></category> <category><![CDATA[encryption]]></category> <category><![CDATA[integrated library system]]></category> <category><![CDATA[privacy]]></category> <category><![CDATA[security]]></category><guid isPermaLink="false">http://dltj.org/?p=2853</guid> <description><![CDATA[&#8220;How much effort do you want to spend securing your computer systems? Well, how much do you not want to be in front of a reporter&#8217;s microphone if a security breach happens?&#8221; I don&#8217;t remember the exact words, but that &#8230; <a href="http://dltj.org/article/ils-encryption/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2853"></abbr><p>&#8220;How much effort do you want to spend securing your computer systems?  Well, how much do you not want to be in front of a reporter&#8217;s microphone if a security breach happens?&#8221;  I don&#8217;t remember the exact words, but that quote strongly resembles something I said to a boss at a previous job.  Securing systems is unglamorous detail work.  One slip-up plus one persistent (or lucky) attacker means years of dedicated efforts are all for naught as personal information is inadvertently released.  
See, for example, <a href="http://news.consumerreports.org/electronics/2011/05/sony-25-million-more-accounts-hacked-but-were-really-sorry.html" title="Sony: 25 million more accounts hacked, but we're really sorry | Consumer Reports">Sony Online Entertainment&#8217;s</a> recent troubles.</p><p>It was in that frame of mind that I responded to a series of questions from a librarian taking a computer science class.  (As someone else who straddles the computer-science/library-science divide, I wanted to encourage this line of thinking!)  Now library systems typically don&#8217;t have credit card information, so they may not be attractive to individuals that seek to expose or exploit personal information.  But our systems do have physical addresses, e-mail addresses, and sometimes birthdays or other personal data.  And we have a <a href="http://www.ala.org/ala/issuesadvocacy/intfreedom/librarybill/interpretations/privacy.cfm" title="An Interpretation of the Library Bill of Rights: Privacy | ALA">professional ethic to keep patron use information private</a>.</p><p>The person that sent me these questions asked that I not mention a name or affiliation, but that it was okay that I repost the questions along with my replies.  I&#8217;m hoping this encourages some discussion because my understanding of the use of encryption in ILS products is very narrow and only somewhat deep (and is getting shallower by the day as my direct experience is going on ten years old).</p><blockquote><p>Background on the project is that during our encryption unit, I realized that I didn&#8217;t know anything about what libraries do to back up our strongly stated policies about protecting patron privacy, so I wanted to find out more about it.</p><p>Questions:</p><ol type="1" start="1"><li>What encryption tools/standards, if any, are used to safeguard patron accounts (name, items checked out, databases accessed, etc.) 
at the library?</li><li>Where in the systems do these tools typically fit &#8212; at the ILS level, or somewhere else? (e.g., university ID systems)</li><li>How are circulation and other records expunged? I.e., are they permanently deleted in such a way that hard drive forensics couldn&#8217;t bring them back?</li></ol></blockquote><p>In my experience, this patron information is not encrypted in integrated library systems.  The difficulty is that if those bits of information are encrypted, they must be decrypted by the program in order to be useful (generating an overdue notice means the patron&#8217;s information must be known to the program, displaying the patron&#8217;s name on his/her account information screen, etc.).  And for programs to decrypt they must have the secret key.  And if the programs know the secret key it is trivial for an attacker to get the key as well.  And since good encryption, by its nature, is computationally &#8220;expensive&#8221; there would be a lot of system load with all of the encryption and decryption of bits of information.  (Computationally expensive is good because it makes it harder for an attacker to guess the correct key.)</p><div id="attachment_2856" class="wp-caption alignright" style="width: 458px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><object width="448" height="379" type="image/svg+xml" data="http://cdn.dltj.org/wp-content/uploads/2011/05/Password-Hashing.svg.gzip"><img src="http://cdn.dltj.org/wp-content/uploads/2011/05/Password-Hashing.png" alt="" title="Password Hashing Flowchart" class="size-full wp-image-2856" /></object><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Password Hashing Flowchart</p></div><p>Note that passwords are a special case.  Passwords are not really encrypted in a database; rather the output of a &#8220;one way hash&#8221; algorithm is stored.  
When the user tries to log in, the same one way hash algorithm is applied to the text string entered as a password and if the output matches what is stored in the database the user is let in.</p><p>As the diagram shows, with the login attempts the hashed password is not decrypted; the output of the hash algorithm is compared to what is known to be the hashed password.</p><p>[Aside: I'm trying an experiment in this post.  The diagram is a Scalable Vector Graphic (SVG) file.  It seems to be showing up fine in the browsers I'm testing, but I have no idea how it will appear in the RSS feed or if you are using an RSS reader or receiving this post via <a href="http://feedburner.google.com/fb/a/mailverify?uri=DisruptiveLibraryTechnologyJester&amp;loc=en_US" title="FeedBurner Email Subscription">FeedBurner e-mail</a>.  If you don't see the graphic, try viewing the post via the <a href="http://dltj.org/article/ils-encryption/"><i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym></i> website.</a>]</p><p>The most effective encryption would be at the database management system layer.  For instance, Oracle has a &#8220;<a href="http://www.oracle.com/technetwork/database/options/advanced-security/index-099011.html" title="Transparent Data Encryption | Oracle">Transparent Data Encryption</a>&#8221; feature.  &#8220;Data is automatically encrypted when it is written to disk and automatically decrypted when accessed by the application.&#8221;  Automatic encryption is not built into MySQL, but you can use a <a href="http://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_aes-encrypt" title="Encryption and Compression Functions | MySQL 5.5 Reference Manual">MySQL-specific function to encrypt a field</a>.  
PostgreSQL has a <a href="http://www.postgresql.org/docs/current/static/pgcrypto.html" title="pgcrypto | PostgreSQL Documentation">contributed module</a> that performs the function.</p><p>Another option &#8212; other than database-level encryption &#8212; is to have the operating system encrypt the underlying filesystem (for example, the <a href="http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/filesysnew-efs.html" title="Encrypted File System | Red Hat documentation">Red Hat Encrypted Filesystem</a>).  That way all of the database storage files &#8212; stored in that filesystem directory &#8212; would be encrypted.</p><p>Note, though, that in any of these cases, the key is known to the computer somehow, and so it is possible for an attacker to recover the key and decrypt the data.  There are, of course, varying levels of obscurity one can apply to the key, but I think we&#8217;re getting pretty far off on a tangent.</p><p>How often circulation and other records would be expunged would depend on implementations in each software system, but as a general guideline I don&#8217;t think a strong deletion mechanism is used to obliterate data on the disk.  I&#8217;d be happy to be proven otherwise.  And as you consider hard drive forensics, also think about pulling the same information off backup tapes; that would probably be easier to get to.</p><p>In a follow-up, I was asked:</p><blockquote><p>WRT your response on Q2, do you have an idea of what level &#8220;most&#8221; or &#8220;some&#8221; libraries might have the encryption, or were you speaking purely from a view of what ideal/good situations might look like?</p><p>On 3, I have heard from a few others that there seems to be just deletion with no zeroing out features or the like and that it does take a period of time (1-2 months) for backup tapes to be overwritten. 
So it strikes me that the weakest link may be in the area we talk most about protecting.</p></blockquote><p>With regards to the database-level or the filesystem-level encryption, I was speaking from a point of view of what ideal/good situations might look like.  One of the outcomes of posting these questions to a wider group of readers is, I hope, more real-world experience reports from people who might be running systems that actually do this.</p><p>Yes, I think those are weak links, with the backup tapes being the biggest problem.  One can&#8217;t predict when blocks on a live filesystem disk will be overwritten, but overwriting tapes is pretty predictable &#8212; and easy because one doesn&#8217;t need access to the live system.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/ils-encryption/feed/</wfw:commentRss> <slash:comments>10</slash:comments> </item> <item><title>Full Text of ARL SPEC Kit 278 on Library Patron Privacy Now Online</title><link>http://dltj.org/article/library-patron-privacy-fulltext/</link> <comments>http://dltj.org/article/library-patron-privacy-fulltext/#comments</comments> <pubDate>Mon, 02 May 2011 16:27:05 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[L/IS Profession]]></category> <category><![CDATA[Association of Research Libraries]]></category> <category><![CDATA[privacy]]></category><guid isPermaLink="false">http://dltj.org/?p=2839</guid> <description><![CDATA[Almost a decade ago while at the University of Connecticut I conducted a survey of ARL libraries on their patron privacy practices. The full text of that survey and ARL member responses are available from Google Books and from HathiTrust. 
&#8230; <a href="http://dltj.org/article/library-patron-privacy-fulltext/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=2839"></abbr><p>Almost a decade ago while at the University of Connecticut I conducted a survey of <abbr title="Association of Research Libraries">ARL</abbr> libraries on their patron privacy practices.  The full text of that survey and ARL member responses are available from <a href="http://books.google.com/books?id=d77gAAAAMAAJ&amp;printsec=frontcover&amp;source=gbs_atb" title="Library patron privacy: SPEC kit - Google Books">Google Books</a> and from <a href="http://catalog.hathitrust.org/Record/004725133" title="Library patron privacy : SPEC kit | Hathi Trust Digital Library">HathiTrust</a>.  Lee Anne George of ARL confirmed via e-mail that permission has been given for full view of SPEC Kits up through 2005 as well as other ARL publications.  Lee Anne said that there are over 400 titles now in full view.</p><p>This information is most likely of historical interest only &#8212; privacy on the internet has certainly moved from where it was eight years ago.  The survey was done right at the height of concern over the <a href="http://en.wikipedia.org/wiki/USA_PATRIOT_Act" title="USA PATRIOT Act | Wikipedia">USA PATRIOT Act</a> and when models like the <a href="http://www.truste.com/" title="Privacy Seals &amp; Services | Online Trust &amp; Safety from TRUSTe">TRUSTe</a> were gaining traction.  (I don&#8217;t think the need for explicit privacy policies and defined practices has gone away.  But any policy drafted on the internet the way it was eight years ago probably needs to be reviewed and updated.)  
So I&#8217;m glad that ARL has decided to make this and similar studies openly available.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/library-patron-privacy-fulltext/feed/</wfw:commentRss> <slash:comments>6</slash:comments> </item> <item><title>Views on Sharing (or, What Do We Want From OCLC?)</title><link>http://dltj.org/article/views-on-sharing/</link> <comments>http://dltj.org/article/views-on-sharing/#comments</comments> <pubDate>Wed, 29 Sep 2010 01:51:16 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[L/IS Profession]]></category> <category><![CDATA[Carl Grant]]></category> <category><![CDATA[cooperatives]]></category> <category><![CDATA[discovery]]></category> <category><![CDATA[OCLC]]></category> <category><![CDATA[privacy]]></category> <category><![CDATA[SkyRiver/Innovative versus OCLC lawsuit]]></category> <category><![CDATA[WorldCat]]></category><guid isPermaLink="false">http://dltj.org/?p=1681</guid> <description><![CDATA[Within the span of a recent week we&#8217;ve had two views of the OCLC cooperative. In one we have a proposition that OCLC has gone astray from its core roots and in the other a celebration of what OCLC can &#8230; <a href="http://dltj.org/article/views-on-sharing/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=1681"></abbr><p>Within the span of a recent week we&#8217;ve had two views of the OCLC cooperative.  In one we have a proposition that OCLC has gone astray from its core roots and in the other a celebration of what OCLC can do.  One proposes a new mode of cooperation while the other extols the virtues of the existing cooperative.  Both writers claim &#8212; independently &#8212; to &#8220;talk to librarians&#8221; and represent the prevailing mood of the profession.  
Can these two viewpoints be reconciled?</p><p><h2>&#8220;Too Many Cooks?&#8221;</h2><br />The pro-establishment view first.  In a <a href="http://community.oclc.org/cooperative/2010/09/too-many-cooks.html" title="Too many cooks? - The OCLC Cooperative Blog">post</a> by <a href="http://www.oclc.org/speakers/bios/nilges_chip.htm" title="William &#039;Chip&#039; Nilges [OCLC]">Chip Nilges</a> on the <a href="http://community.oclc.org/cooperative/" title="The OCLC Cooperative Blog: Insights and information from OCLC staff on topics that are fundamental to your cooperative.">OCLC Cooperative Blog</a>, we get the view that the backing of the wider librarian community is key to OCLC being able to <a href="http://www.oclc.org/news/releases/2010/201049.htm" title="H.W. Wilson databases indexed in WorldCat Local [OCLC]">negotiate with content vendors like H.W. Wilson</a>.  Chip&#8217;s &#8220;talk to librarians&#8221; quote is:<br /><blockquote>I spend quite a bit of time talking both to librarians and industry partners&#8211;publishers, booksellers, Web-technology providers, search engine companies&#8211;all kinds of people doing interesting things in our space. And in those talks, there is often a discussion of one of the following: content, technology or community. 
What I&#8217;ve come to realize, though, is that the best results come from places where all three come together.</p></blockquote><p> Chip&#8217;s post is short but clear in its view that the community of OCLC members is something special and that it adds value to member libraries.</p><p><h2>&#8220;The Cooperative We Need&#8221;</h2><br />The other perspective comes from <a href="http://www.exlibrisgroup.com/?catid=%7B795BD8B6-47DE-4722-8D5D-B664EEEFB34C%7D" title="Bio: Carl Grant">Carl Grant</a> in a <a href="http://commentary.exlibrisgroup.com/2010/09/cooperative-we-need-open-collaborative.html" title="The cooperative we need: Open &amp; Collaborative Library Content" class="broken_link" rel="nofollow">post</a> on his <a href="http://commentary.exlibrisgroup.com/" title="Commentary from Carl Grant" class="broken_link" rel="nofollow">Ex Libris blog</a>.  His thesis is that OCLC has an important role to play in adding value to bibliographic data, but that its motives are too intertwined with for-profit interests to carry out this role effectively.  Carl&#8217;s &#8220;talk to librarians&#8221; quote is:<br /><blockquote>It appears to me that the interests of the OCLC we know today do not appear to be in total alignment with the needs and interests of its overall actual membership. Perhaps they are in alignment with the interests of the Board, Council, and other governing and administrative arms, but the feeling I get in talks with librarians is that it is not in alignment with what they want. 
As I talk to librarians, across the country today, I hear that what they want is an organization, a cooperative that is focused on developing and providing open and collaborative library content and services that are widely accessible by all in order that they (the librarians) can focus on re-establishing and/or maintaining the value of libraries in our society.</p></blockquote><p> Carl goes on to propose the creation of a utility that aggregates the ratings and rankings of individual users into a database that can enhance the relevance ranking of the emerging generation of discovery layer products.</p><p><h2>My Thoughts</h2><br />This &#8220;talk to librarians&#8221; thread through the two posts makes me reflect on a question I asked earlier on <i><acronym title="Disruptive Library Technology Jester">DLTJ</acronym></i>: <a href="http://dltj.org/article/oclc-social-contract/" title="What Does It Mean to Be a Member of OCLC? | Disruptive Library Technology Jester">&#8220;What Does it Mean to be a Member of OCLC?&#8221;</a> Although I probably haven&#8217;t talked to nearly the number of librarians as Chip and Carl, in my discussions within the profession I still haven&#8217;t come to a resolution to this basic question.  That question itself is tied to another question coming through in the contrast between these two posts:  What Do We Want From OCLC?</p><p>Carl describes the problem in his post.  When a not-for-profit vendor acquires a significant number of for-profit companies (and spins them back out again), how can we (members, vendors, and the library community in general) understand how the mix of commercial and non-commercial interests are playing out at the management level?  
Can the OCLC that is the bibliographic utility, the metadata <a href="http://orweblog.oclc.org/archives/001611.html" title="Platforming a library network: destination and switch - Lorcan Dempsey's Weblog">switch</a> between bibliographic-based services, and the <acronym title="Research and Development">R&#038;D</acronym> braintrust co-exist with the for-profit businesses, motivations, and operations?  Or, to put it more sharply, does the negotiation of H.W. Wilson content for use on the subscription-based WorldCat database hinder the evolution of discovery layers that are being developed by companies that don&#8217;t have the tax-advantaged not-for-profit status?  (And don&#8217;t forget about the allegations of anti-competitive behavior in the <a href="http://www.librarytechnology.org/web/breeding/skyriver-vs-oclc/" title="http://www.librarytechnology.org/web/breeding/skyriver-vs-oclc/">SkyRiver/Innovative-versus-OCLC lawsuit</a>.)</p><p>In closing this section, I want to pull out and emphasize another quotation from Carl&#8217;s post:<br /><blockquote>In the end, all of these business initiatives, and now resulting lawsuit, strongly work against OCLC being able to do what it does best—building collaboration, content, and related services as a non-profit entity to serve the larger profession.</p></blockquote><p> Agreed.</p><p><h2>Carl&#8217;s Grand Idea</h2><br />What might get lost if you only closely read the first half of Carl&#8217;s post &#8212; as it initially did for me &#8212; is the second half where he describes the concept for enhancing WorldCat in a manner that benefits all&#8230;both library members and commercial entities.  He does this by noting that the &#8220;valuable points of open source software&#8221; can be applied &#8212; in a social media fashion &#8212; to a service that aggregates usage, ratings, and comments in a way that advances relevance ranking of discovery tools.  
Now initially the mind swirls with concerns of privacy and informed user consent in gathering this data in one central pool.  I don&#8217;t think we know enough yet in the library community about building privacy-robust systems that meet an American librarian&#8217;s information privacy ethos.  But done right it also has the ability to build a reputation-based social feedback loop that adds important new information to the bibliographic utility.  And because of its better-when-bigger characteristic, only a neutral party like the not-for-profit OCLC cooperative could serve as an aggregator and distributor of this data.</p><p>I highly recommend reading <a href="http://commentary.exlibrisgroup.com/2010/09/cooperative-we-need-open-collaborative.html" title="The cooperative we need: Open &amp; Collaborative Library Content" class="broken_link" rel="nofollow">Carl&#8217;s post</a> and thinking about ways of answering the question &#8220;What Do We Want From OCLC?&#8221;  I commend Carl for his courage and vision in articulating his points and proposing something new for the profession to drive towards.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/views-on-sharing/feed/</wfw:commentRss> <slash:comments>11</slash:comments> </item> <item><title>Google Book Search Privacy, Orphan Works, and Monopoly</title><link>http://dltj.org/article/gbs-chronicle-highered/</link> <comments>http://dltj.org/article/gbs-chronicle-highered/#comments</comments> <pubDate>Mon, 29 Jun 2009 12:37:39 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[policy]]></category> <category><![CDATA[copyright]]></category> <category><![CDATA[orphan works]]></category> <category><![CDATA[privacy]]></category><guid isPermaLink="false">http://dltj.org/?p=1100</guid> <description><![CDATA[A few weeks ago, a reporter at the Chronicle of Higher Education interviewed Adam Smith, Google&#8217;s director of product management, about the Google Book Search settlement and 
posted the interview in audio form. The page isn&#8217;t dated, but guessing from &#8230; <a href="http://dltj.org/article/gbs-chronicle-highered/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=1100"></abbr><p>A few weeks ago, a reporter at the <i>Chronicle of Higher Education</i> interviewed Adam Smith, Google&#8217;s director of product management, about the Google Book Search settlement and <a href="http://chronicle.com/article/Audio-Whats-Next-for-Google/48349/" title="Audio: Adam Smith: What&#039;s Next for Google Book Search?  - Chronicle.com">posted the interview in audio form</a>.  The page isn&#8217;t dated, but guessing from metadata in the URL it was somewhere around the publication of the paper issue dated June 26, 2009.  I&#8217;m calling out this particular interview because Mr. Smith said things that I hadn&#8217;t heard in other forms yet &#8212; Google&#8217;s intentions about privacy in Google Book Search, an explicit statement about the Book Rights Registry releasing information about the status of orphan works, and a statement on what Google expects the size of the orphan works problem to be once the Registry has been in operation for a while.<br /><span id="more-1100"></span><br />Below is a rough transcript of portions of the interview.  I&#8217;ve added emphasis in the transcript to the parts that I hadn&#8217;t heard Google representatives say before.</p><blockquote><p>Chronicle host:  There has been a lot of concern among librarians and in the library community about access and privacy.  Can you allay some of those fears?</p><p>Adam Smith:  There has been a lot of discussion about how this settlement affects things such as access and privacy, and what we are really looking at is creating a product that will be broadly accessible to the university community as well as the internet community generally. [...] 
I think with respect to privacy, Google hasn&#8217;t designed the product yet so it is hard to have a privacy policy for it, but <strong>we fully intend to have a policy that is consistent with a lot of the standard procedures in the library community today</strong>.  Things such as allowing authentication to happen via IP.  But we take privacy seriously and it will be consistent with Google&#8217;s privacy policy as well as have some specific provisions when we actually get down to designing the product.</p><p>Chronicle host:  There has been a lot of interest and concern in so called &#8220;orphan works&#8221; &#8212; where do those fit into the settlement and how do you respond to some of the anxiety about that.</p><p>Adam Smith:  So there is no technical definition of &#8220;orphan works&#8221; but for the purposes here we&#8217;ll say a book for which no rightsholder exists.  Google&#8217;s mission in this is to really provide broad access to all of these books and when you look at the corpus as a whole, the percentage of books that are available &#8212; say &#8212; is about 20% are in the public domain or more, about 5% are kind of in print.  What that leaves is this center of books that are not in print but may be or may be not in copyright.  And what we believe is through the settlement agreement and the establishment of the Books Rights Registry, which is an author- and publisher-controlled entity that will try to track down the rights holders of the particular book, we believe that over time what will happen is that rightsholders will come forward to claim the money that was generated via the economic models and this will allow for better identification of the specific rightsholders to the works. 
<strong>And the Books Rights Registry has committed to making any information &#8212; or making the information about whether or not a book has been claimed &#8212; making that public so that someone who&#8217;s interested in making use of one of these potentially orphan works can understand as to whether or not a rightsholder has come forward for that particular book.</strong></p><p>[...]</p><p>Chronicle host: Another concern is maybe the one that Google encounters the most &#8212; is the question of monopoly.  And why we should be happy that the idea that a private company has essential control over 10 million plus works?</p><p>Adam Smith: So I think at its root what&#8217;s really important here is to look at the agreements.  And Google has non-exclusive agreements at the root of all of its agreements.  So, its agreements with its library partners are non-exclusive, its agreements with its publishers and authors are non-exclusive.  So anyone is free to enter into agreements with those institutions or those publishers.  With respect to the settlement agreement, for all works for which a rightsholder comes forward, the Books Rights Registry will have the ability to license or enter into economic models with other parties for those works. So really this is not an exclusive license to Google, but rather it&#8217;s establishing the ability for them to get access to these.  Obviously for the public domain works, there is no rights or contract associated with that. <strong>So what this really leaves is what we believe is a very thin slice of the remaining books, which are the orphan worked books</strong>.</p></blockquote><p>I&#8217;m glad to see some sensitivity to the notion of privacy in Mr. Smith&#8217;s response to that question.  
The notion of privacy goes beyond using IP address authentication to enable institutional subscription users to access the scanned books, of course &#8212; specifically to the collection and disposition of log files related to individuals&#8217; use of the Google books database.  I wonder if Google will really consider severing the link between reader and work, as is common practice in libraries today.  In the case of online books, that would mean not collecting &#8212; or at least immediately anonymizing &#8212; the IP address of the machine used to read portions of the book.  Time will tell, and this is certainly an area where I hope there is more dialog between Google and academic libraries (should the settlement agreement be approved).</p><p>It is interesting that a Google representative is making statements about what the Books Rights Registry will do with orphan works information.  I would think it would be up to the registry&#8217;s board of directors to decide whether or not they publicly release information about the orphan status of a work.  I don&#8217;t recall reading in the settlement agreement that it would be mandatory.</p><p>Mr. Smith&#8217;s answer to the monopoly question ignores the &#8220;most favored nation&#8221; clause in the settlement agreement that says the Registry cannot offer licensing terms to another party that are more favorable than the ones offered to Google.  While that might not be a monopoly in the strictest sense, it certainly makes it harder for any other entity to compete effectively with Google.  That same answer also shows Google&#8217;s optimism in the estimate that there will be &#8220;a very thin slice&#8221; of works that will turn out to be orphans &#8212; in copyright but without an identified rightsholder.  I can only assume that they have internal research to back that up.  My gut tells me that there is considerably more than a thin slice, but that part of Mr. 
Smith&#8217;s answer plays well with the notion that Google won&#8217;t really have a monopoly because there will be so few books that Google will have the exclusive protections in the class action lawsuit settlement to digitize.</p><p>Adam Smith also has answers to questions about why Google didn&#8217;t fight it out in court, what Google is doing to help the settlement be approved, and what Google&#8217;s reaction might be if the settlement isn&#8217;t approved.</p><p style="padding:0;margin:0;font-style:italic;">The text was modified to update a link from http://chronicle.com/media/audio/v55/i40/smith/ to http://chronicle.com/article/Audio-Whats-Next-for-Google/48349/ on January 20th, 2011.</p><div class='series_links'><a href='http://dltj.org/article/gbs-umich-amendment/' title='Interesting Bits in the Univ of Michigan Amendment to Google Book Search Agreement'>Previous in series</a> <a href='http://dltj.org/article/gbs-comments-due/' title='Comments on Google Book Search Settlement Coming to a Head (Again)'>Next in series</a></div>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/gbs-chronicle-highered/feed/</wfw:commentRss> <slash:comments>15</slash:comments> </item> <item><title>Clay Shirky on the Need for Better Information Filters</title><link>http://dltj.org/article/clay-shirky-on-information-filters/</link> <comments>http://dltj.org/article/clay-shirky-on-information-filters/#comments</comments> <pubDate>Mon, 06 Oct 2008 02:56:42 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Meta Category]]></category> <category><![CDATA[Clay Shirky]]></category> 
<category><![CDATA[filters]]></category> <category><![CDATA[privacy]]></category> <category><![CDATA[productivity]]></category> <category><![CDATA[publishing]]></category><guid isPermaLink="false">http://dltj.org/?p=528</guid> <description><![CDATA[Last month, Clay Shirky gave a presentation with the title &#8220;It&#8217;s Not Information Overload. It&#8217;s Filter Failure&#8221; at the Web 2.0 Expo. 1 Shirky admits up front at the start of the talk that the topic is something new that &#8230; <a href="http://dltj.org/article/clay-shirky-on-information-filters/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/?p=528"></abbr><p>Last month, <a href="http://shirky.com/" title="Clay Shirky's homepage">Clay Shirky</a> gave a presentation with the title &#8220;<a href="http://en.oreilly.com/webexny2008/public/schedule/detail/4817" title="It's Not Information Overload. It's Filter Failure.: Web 2.0 Expo New York 2008 - Co-produced by TechWeb &amp;amp; O'Reilly Conferences, September 16 - 19, 2008, New York, NY">It&#8217;s Not Information Overload.  It&#8217;s Filter Failure</a>&#8221; at the <a href="http://www.web2expo.com/" title="Web 2.0 Expo homepage">Web 2.0 Expo</a>. <sup><a href="http://dltj.org/article/clay-shirky-on-information-filters/#footnote_0_528" id="identifier_0_528" class="footnote-link footnote-identifier-link" title="Web 2.0 Expo, co-produced by TechWeb and O&amp;#8217;Reilly Media, &amp;#8220;is a global annual gathering of technical, design, marketing, and business professionals who are building the next generation web. 
Web 2.0 Expo features the most innovative and successful Internet industry figures and companies providing attendees with examples of business models, development paradigms, and design strategies to enable mainstream businesses and new arrivals to the Web 2.0 world to take advantage of this new generation of services and opportunities.&amp;#8221;">1</a></sup> Shirky admits at the start of the talk that the topic is something new that he is exploring, and as a result the ideas are not fully formed.  (I get lost in how the last of his three examples applies to the topic at hand, for instance.)  But his viewpoint offers a refreshing perspective on the issue of &#8220;information overload,&#8221; and it is worth looking at even in this raw stage.  For starters, he says that we&#8217;ve been facing information overload for the past 500 years &#8212; since the introduction of the Gutenberg movable type press gave readers more books than they could possibly read.  What has changed in the last decade is that past information &#8220;filters&#8221; are no longer effective.</p><div id="video_1" class="wp-caption aligncenter" style="width: 494px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;"><embed src="http://blip.tv/play/Ac6tVwA" type="application/x-shockwave-flash" width="480" height="390" allowscriptaccess="always" allowfullscreen="true"></embed><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Video of Clay Shirky&#8217;s talk at Web 2.0 Expo.  23 minutes, 51 seconds.</p></div><p>Shirky posits that the expense of printing a book made publishers both the creators of the object and filters for information printed in objects.  The relatively high up-front costs of producing the book put publishers in the position of selecting only the best information to print.  
Publishers were, in effect, a kind of filter of quality for the onslaught of information as a way of reducing their risks of printing content that no one would want to read.  The internet has driven the cost of publishing to near zero, and as such the &#8220;pre-publication&#8221; filter that publishers provided is no longer in place.  (He calls this &#8220;post-Gutenberg economics.&#8221;)  In Shirky&#8217;s words, &#8220;the filter for quality is way downstream from the site of production.&#8221;</p><p>Shirky points to some examples of filters and talks about their effectiveness.  For inbound communication, the example is e-mail spam and how spam filters must be constantly tuned.  This is a pretty clear example of what he is talking about &#8212; the cost of production is cheap and the assessment of quality is done by the reader, not the producer.  The second example is one of outbound communication; Shirky tells the story of a colleague who attempted to use Facebook privacy settings to slowly disseminate the fact that she had broken up with her colleague.  (That isn&#8217;t what happened.  P.S.:  Karen Schneider &#8212; your name pops up briefly in one of Clay&#8217;s screenshots!)  The third example is that of a <a href="http://www.thestar.com/News/Canada/article/347442" title="Ryerson student won't be expelled | The Star">student who faced expulsion from a Canadian university because he started a Facebook homework group</a>.  
Shirky&#8217;s point with this example seems to be that a filter-of-inconvenience was removed through the use of technology &#8212; that a study group of 147 students wouldn&#8217;t actually occur in real life but was replicated on Facebook.</p><p>Some other quotes that caught my ear:</p><ul type="disc"><li>&#8220;Managing your privacy practices is an unnatural act&#8230;  Privacy is a way of managing information flow&#8230;  The big question we&#8217;re facing around privacy now is that we&#8217;re not moving from one engineered system to another engineered system with different characteristics.  We&#8217;re moving from an evolved system to an engineered system.&#8221;</li><li>&#8220;The inefficiency of information flow wasn&#8217;t a bug, it was a feature.  That&#8217;s what privacy was.&#8221;</li><li>&#8220;What the internet does is allows large systems that are free-rider tolerant rather than free-rider resistant.&#8221;</li><li>&#8220;It really is about rethinking the [higher education] institutional model.  You have to have group conversation.  You have to have individual effort.  You have to design a system that accommodates both.&#8221;</li><li>&#8220;If you have the same problem for a long time, maybe it&#8217;s not a problem.  Maybe it is a fact.&#8221;  &#8211;Yitzhak Rabin</li></ul><h2>Footnotes</h2><ol class="footnotes"><li id="footnote_0_528" class="footnote">Web 2.0 Expo, co-produced by <a href="http://www.techweb.com/" title="TechWeb homepage">TechWeb</a> and <a href="http://www.oreilly.com/" title="O'Reilly homepage">O&#8217;Reilly Media</a>, &#8220;is a global annual gathering of technical, design, marketing, and business professionals who are building the next generation web. 
Web 2.0 Expo features the most innovative and successful Internet industry figures and companies providing attendees with examples of business models, development paradigms, and design strategies to enable mainstream businesses and new arrivals to the Web 2.0 world to take advantage of this new generation of services and opportunities.&#8221;</li></ol>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/clay-shirky-on-information-filters/feed/</wfw:commentRss> <slash:comments>12</slash:comments> </item> <item><title>&#8220;Everyone&#8217;s Guide to By-Passing Internet Censorship for Citizens Worldwide&#8221;</title><link>http://dltj.org/article/bypassing-internet-censorship/</link> <comments>http://dltj.org/article/bypassing-internet-censorship/#comments</comments> <pubDate>Thu, 18 Oct 2007 20:36:17 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Raw Technology]]></category> <category><![CDATA[encryption]]></category> <category><![CDATA[privacy]]></category> <category><![CDATA[system administration]]></category> <category><![CDATA[tor]]></category><guid isPermaLink="false">http://dltj.org/2007/10/bypassing-internet-censorship/</guid> <description><![CDATA[The title of this post is the same as the report it describes, Everyone&#8217;s Guide to By-Passing Internet Censorship for Citizens Worldwide [PDF]. It was announced by Ronald Deibert last week on his blog at Citizen Lab. 
The one sentence &#8230; <a href="http://dltj.org/article/bypassing-internet-censorship/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/2007/10/bypassing-internet-censorship/"></abbr><p><a href="http://www.nartv.org/mirror/circ_guide.pdf" title="Cover of “Everyone’s Guide to By-Passing Internet Censorship for Citizens Worldwide”"><img src="http://cdn.dltj.org/wp-content/uploads/2007/10/circ_guide.jpg" alt="Cover of “Everyone’s Guide to By-Passing Internet Censorship for Citizens Worldwide”" style="width: 25%; border-right: 2px solid gray; border-bottom: 2px solid gray; margin: 0 0 1.5em 2em; float: right;" /></a>The title of this post is the same as the report it describes, <a href="http://www.nartv.org/mirror/circ_guide.pdf" title="Full text of report: &#039;Everyone&#039;s Guide to By-Passing Internet Censorship for Citizens Worldwide&#039;">Everyone&#8217;s Guide to By-Passing Internet Censorship for Citizens Worldwide</a> [PDF].  It was <a href="http://deibert.citizenlab.org/2007/10/everyones-guide-to-by-passing-internet-censorship-for-citizens-worldwide-new-release/" title="Blog post: Everyone&#039;s Guide to By-Passing Internet Censorship for Citizens Worldwide">announced by Ronald Deibert</a> last week on his blog at Citizen Lab.  The one sentence synopsis goes like this:  &#8220;This guide is meant to introduce non-technical users to Internet censorship circumvention technologies, and help them choose which of them best suits their circumstances and needs.&#8221;</p><p>Although the stated audience is non-technical users, I found the description of techniques and circumstances under which one might deploy the techniques very interesting.  The document provides guidance for those seeking circumvention and those who want to provide it.  
After a brief introduction to censorship activities worldwide (including in the United States), it walks the reader through an analysis of needs and describes solutions that meet the needs based on the user&#8217;s technical skills.  I knew &#8216;<a href="http://tor.eff.org/" title="Tor: anonymity online">tor</a>&#8216; &#8212; a long-time favorite of mine &#8212; would be in there, but I was surprised by the range of other options.</p><p>To put a library spin on the report, some of the solutions offered are usable on &#8220;public computers&#8221; &#8212; such as, say, what one might find in a library.  One could take the report and read about the techniques with the intent to block them on your public workstations, but I think another reading of it would say that such attempts are ultimately futile because of the likelihood of other similar services popping up to take their place.  Unless you are running a white-list-only setup (that is to say, your public workstations are explicitly set to <em>only</em> allow access to a prescribed set of sites), any user can walk up to any public workstation and access the circumvention sites described in the report or any other ones that spring into existence.</p><p>The circumvention techniques, of course, do not provide an assurance of privacy.  Even though the network traffic is encrypted, the activities of the user can still be monitored by keystroke loggers and other techniques in the workstation itself.  
In order to get around that, one would need to restart the public workstation with a <span class="removed_link" title="http://www.tech-faq.com/bootable-linux-distributions.shtml">bootable Linux distribution</span>, but that is perhaps a report for another time&#8230;<p style="padding:0;margin:0;font-style:italic;">The text was modified to update a link from http://deibert.citizenlab.org/Circ_guide.pdf to http://www.nartv.org/mirror/circ_guide.pdf on January 28th, 2011.</p><p style="padding:0;margin:0;font-style:italic;">The text was modified to update a link from http://deibert.citizenlab.org/blog/_archives/2007/10/10/3282831.html to http://deibert.citizenlab.org/2007/10/everyones-guide-to-by-passing-internet-censorship-for-citizens-worldwide-new-release/ on January 28th, 2011.</p><p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://www.tech-faq.com/bootable-linux-distributions.shtml on January 28th, 2011.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/bypassing-internet-censorship/feed/</wfw:commentRss> <slash:comments>5</slash:comments> </item> <item><title>On the Internet, nobody knows you’re a dog.  
But we can tell if you are a major news organization or corporation.</title><link>http://dltj.org/article/wikipedia-credibility/</link> <comments>http://dltj.org/article/wikipedia-credibility/#comments</comments> <pubDate>Wed, 15 Aug 2007 15:16:19 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[policy]]></category> <category><![CDATA[culture]]></category> <category><![CDATA[privacy]]></category> <category><![CDATA[Wikipedia]]></category><guid isPermaLink="false">http://dltj.org/2007/08/wikipedia-credibility/</guid> <description><![CDATA[Published in The New Yorker July 5, 1993.Image from The Cartoon Bank As the saying, now a part of Internet lore, goes: &#8220;On the Internet, nobody knows you&#8217;re a dog.&#8221; That may be true, but now we must add: &#8220;But &#8230; <a href="http://dltj.org/article/wikipedia-credibility/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/2007/08/wikipedia-credibility/"></abbr><div style="width:300px; font-size:85%; float: right; padding: 0 0 1.5em 2em;"><img src="http://www.cartoonbank.com/assets/1/22230_m.gif" alt="Illustration of a dog, sitting at a computer terminal, talking to another dog.  
Includes caption: “On the Internet, nobody knows you’re a dog.”" />Published in <i>The New Yorker</i> July 5, 1993.<br />Image from <a href="http://www.cartoonbank.com/item/22230" title="Peter Steiner : &amp;#8220;On the Internet, nobody knows you&amp;#8217;re a dog.&amp;#8221; - Cartoonbank.com">The Cartoon Bank</a></div><p> As the saying, <a href="http://query.nytimes.com/gst/fullpage.html?res=9F00E7DE113FF937A25751C1A9669C8B63&#038;sec=&#038;spon=&#038;partner=permalink&#038;exprod=permalink" title="Cartoon Captures Spirit of the Internet  - New York Times">now a part of Internet lore</a>, goes:  &#8220;On the Internet, nobody knows you&#8217;re a dog.&#8221;  That may be true, but now we must add: &#8220;But we do know if you are from a major news organization or corporation.&#8221;</p><p>Wired magazine <a href="http://www.wired.com/politics/onlinerights/news/2007/08/wiki_tracker?currentPage=all" title="Wired News:  See Who&#039;s Editing Wikipedia - Diebold, the CIA, a Campaign">reports</a> on the efforts of <a href="http://virgil.gr" title="VIRGIL.GRiffith">Virgil</a> Griffith to <a href="http://wikiscanner.virgil.gr/" title="List anonymous wikipedia edits from interesting organizations">expose the source of anonymous edits to Wikipedia</a>.  In Virgil&#8217;s words, &#8220;I came up with the idea when I heard about Congressmen getting caught for white-washing their wikipedia pages.&#8221;  So he created a searchable database of anonymous edits to Wikipedia pages indexed by the IP address of the computer that made the edit.  By cross-referencing those edits with the database of IP addresses assigned to organizations, one can speculate with some certainty about who made the edit &#8212; or at least the organization responsible for the IP address of the person who made the edit.  
There is a list of interesting examples of wikiscanner results along the right side of the <a href="http://wikiscanner.virgil.gr/" title="List anonymous wikipedia edits from interesting organizations">wikiscanner homepage</a>, and Wired Magazine is inviting users to <a href="http://blog.wired.com/27bstroke6/wikiwatch/" title="List of interesting Wikipedia edits on Wired Blogs">submit interesting examples</a> as well.</p><p>This is an interesting project, but it is not without faults.  First, non-anonymous edits &#8212; that is, when a user signs into Wikipedia &#8212; are not tracked.  Since account registrations are free and not tied to a particular IP address, edits by an organization can be &#8220;masked&#8221; behind a slew of pseudo-anonymous accounts.  (For instance, the <span class="removed_link" title="http://wikiscanner.virgil.gr/f.php?ip1=192.153.30.0-255&amp;ip2=&amp;ip3=&amp;ip4=">list of edits for the range of IP addresses assigned to the OhioLINK central offices</span> does not include <a href="http://en.wikipedia.org/w/index.php?limit=50&#038;title=Special%3AContributions&#038;contribs=user&#038;target=DataGazetteer&#038;namespace=0&#038;year=&#038;month=-1" title="Wikipedia edits for Peter Murray">edits made when I was logged into my Wikipedia account</a>.)  Second, Virgil is presumably using a recent snapshot of the IP assignments database.  Since IP address assignments can change over time, a current assignee could be implicated in edits made by the previous owner.  Third, the whole system can be thwarted by <a href="http://tor.eff.org/" title="Tor: anonymity online (homepage)">systems</a> and services that mask the IP address of the machine being used.  
So the credibility of the anonymous edits database is about the same as that of Wikipedia itself &#8212; good enough for most uses, but not extremely high.<p style="padding:0;margin:0;font-style:italic;" class="removed_link">The text was modified to remove a link to http://wikiscanner.virgil.gr/f.php?ip1=192.153.30.0-255&#038;ip2=&#038;ip3=&#038;ip4= on January 19th, 2011.</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/wikipedia-credibility/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> </channel> </rss>
