<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"><channel><title>Disruptive Library Technology Jester &#187; Directory of Open Access Journals</title> <atom:link href="http://dltj.org/tag/doaj/feed/" rel="self" type="application/rss+xml" /><link>http://dltj.org</link> <description>We&#039;re Disrupted, We&#039;re Librarians, and We&#039;re Not Going to Take It Anymore</description> <lastBuildDate>Mon, 06 Feb 2012 20:04:22 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <cloud domain='dltj.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' /> <creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/3.0/us/</creativeCommons:license> <item><title>Analysis of Google Scholar and Google Books</title><link>http://dltj.org/article/google-scholar-and-books/</link> <comments>http://dltj.org/article/google-scholar-and-books/#comments</comments> <pubDate>Wed, 15 Aug 2007 20:34:29 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Raw Technology]]></category> <category><![CDATA[Directory of Open Access Journals]]></category> <category><![CDATA[ejournal]]></category> <category><![CDATA[Google]]></category> <category><![CDATA[Google Book Search]]></category> <category><![CDATA[Google Scholar]]></category> <category><![CDATA[publishing]]></category><guid isPermaLink="false">http://dltj.org/2007/08/google-scholar-and-books/</guid> <description><![CDATA[Two papers were published recently exploring the quality of Google Scholar and Google Books.
Philipp Mayr and Anne-Kathrin Walter, both of GESIS / Social Science Information Center in Bonn, Germany, uploaded an article to arXiv called &#8220;An exploratory study of &#8230; <a href="http://dltj.org/article/google-scholar-and-books/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/2007/08/google-scholar-and-books/"></abbr><p>Two papers were published recently exploring the quality of <a href="http://scholar.google.com/" title="Google Scholar homepage">Google Scholar</a> and <a href="http://books.google.com/" title="Google Book Search homepage">Google Books</a>.</p><p><br clear="all" /><h2>Google Scholar</h2><br />Philipp Mayr and Anne-Kathrin Walter, both of GESIS / Social Science Information Center in Bonn, Germany, uploaded an article to arXiv called &#8220;<a href="http://arxiv.org/abs/0707.3575" title="arXiv abstract page for &#039;An exploratory study of Google Scholar&#039;">An exploratory study of Google Scholar</a>.&#8221; <sup><a href="http://dltj.org/article/google-scholar-and-books/#footnote_0_275" id="identifier_0_275" class="footnote-link footnote-identifier-link" title="Judging from the citation listed on Philipp Mayr&#8217;s homepage, the article will appear in an upcoming issue of Online Information Review from Emerald Group Publishing.">1</a></sup> Originally created as a presentation for a 2005 conference, it was updated in January 2007 to reflect new findings and published as a paper.  Excerpts from the abstract include:<br /><blockquote><p>The study shows deficiencies in the coverage and up-to-dateness of the [Google Scholar] index. Furthermore, the study points up which web servers are the most important data providers for this search service and which information sources are highly represented. 
We can show that there is a relatively large gap in Google Scholar’s coverage of German literature as well as weaknesses in the accessibility of Open Access content. Major commercial academic publishers are currently the main data providers.</p><p>We conclude that Google Scholar has some interesting pros (such as citation analysis and free materials) but the service can not be seen as a substitute for the use of special abstracting and indexing databases and library catalogues due to various weaknesses (such as transparency, coverage and up-to-dateness).</p></blockquote><p>The authors performed a &#8220;brute force analysis&#8221; (their words) of Google Scholar&#8217;s coverage by searching it for the journal titles on five journal lists:  ISI Arts &#038; Humanities Citation Index, ISI Social Science Citation Index, ISI Science Citation Index, open access journals listed by <abbr title="Directory of Open Access Journals">DOAJ</abbr>, and journals from the SOLIS database (mainly German-language journals from sociological disciplines).  They queried Google Scholar using the &#8220;Return articles published in&#8230;&#8221; limiter on the advanced search screen, downloaded the first 100 records for each title, then parsed and analyzed each record.  In total, 621,000 records from Google Scholar search results were analyzed.</p><p><img src="http://cdn.dltj.org/wp-content/uploads/2007/08/IdentificationOfJournals.png" alt="Number of Articles Found in Google Scholar by Title List" title="Number of Articles Found in Google Scholar by Title List" align="right" width="431" height="291" border="0" style="padding: 0 0 1.5em 2em;" />The authors first determined how well the Google Scholar database covers the titles in each of the five journal lists.  They note surprise at the relatively poor coverage of open access titles listed in the DOAJ.  
I think this can be explained by the fact that many open access publishers do not use a systematic publishing application to put their content on the internet.  Of the 2,804 journals in the DOAJ directory, only 846 (roughly 30 percent) are searchable via DOAJ&#8217;s own article-level indexing service.<sup><a href="http://dltj.org/article/google-scholar-and-books/#footnote_1_275" id="identifier_1_275" class="footnote-link footnote-identifier-link" title="Numbers from the DOAJ home page, as of 15-Aug-2007.">2</a></sup> If the journals can&#8217;t be easily harvested at the article level, then Google can&#8217;t add them to the Scholar article index.</p><p><br clear="all" /><img src="http://cdn.dltj.org/wp-content/uploads/2007/08/DistributionOfDocumentTypes.png" alt="Distribution of Document Types Among the Lists Queried" title="Distribution of Document Types Among the Lists Queried" align="right" width="430" height="291" border="0" style="padding: 0 0 1.5em 2em;" />Based on the semantics provided in each record, the authors divided the results into three categories (referred to in the paper as &#8220;document types&#8221;):  links to complete descriptive records on an external (publisher&#8217;s or aggregator&#8217;s) site, citation-only records (no full text and no link to more complete information at an external site), and direct access links to full text.  The distribution of results is shown in the table to the right.</p><p>The paper also includes an analysis of the various publisher and portal sites that supply information to Google Scholar&#8217;s index.</p><p><h2>Google Books</h2><br />The August issue of First Monday contains an article by Paul Duguid called &#8220;<a href="http://www.firstmonday.org/issues/issue12_8/duguid/index.html" title="First Monday article: &#039;Inheritance and loss? A brief survey of Google Books&#039;">Inheritance and loss?  A brief survey of Google Books</a>.&#8221;  
The article is a somewhat contrived exploration of the Google Books Library Project through the lens of quality assurance achieved &#8220;through innovation or through &#8216;inheritance.&#8217;&#8221;  His thesis seems to be that users expect the reputations of the libraries participating in the project (Harvard, the University of Michigan, the New York Public Library, Stanford, and Oxford, along with the <a href="http://books.google.com/googlebooks/partners.html" title="Google Book Search Library Partners">other partners</a>, are arguably a reputable group) to convey a level of quality to the results of the digitization process.  Duguid then picks what is arguably the hardest book artifact to capture digitally (various editions of Laurence Sterne&#8217;s <a href="http://andromeda.rutgers.edu/~jlynch/Biblio/shandy.html" title="Tristram Shandy: An Annotated Bibliography by Jack Lynch"><i>The Life and Opinions of Tristram Shandy, Gentleman</i></a>) as an example of everything that is wrong with Google Books.</p><p>I don&#8217;t subscribe to that notion at all, but perhaps that is because I&#8217;ve been around enough technology and innovation to know that each new service needs to stand on its own. <i>Tristram Shandy</i> is, in part, the author&#8217;s experiment in typography and layout (as Duguid describes in detail in the article), and it is atypical in the extreme, so I think many of the characterizations of the Google Books project based on this one artifact are unfair and short-sighted.  
When you strip away the false dichotomy of innovative-or-inherited quality, the oddities surrounding the <i>Tristram Shandy</i> artifact, and various unnecessary pot-shots,<sup><a href="http://dltj.org/article/google-scholar-and-books/#footnote_2_275" id="identifier_2_275" class="footnote-link footnote-identifier-link" title="&#8220;A quick look at the online catalogue for Stanford&rsquo;s library shows that the Stanford volume presented as your second choice by Google Books is actually tucked away in the Stanford Auxiliary library along with &ldquo;infrequently&ndash;used&rdquo; texts.&#8221;">3</a></sup> Duguid&#8217;s analysis does point to some apparent problems with Google&#8217;s scheme for digitizing and indexing books.  The quality of some of the scans, in the <i>Tristram Shandy</i> volumes and elsewhere, is one source of concern.  Substandard metadata is another:<br /><blockquote><p>Not a word is mentioned about multiple volumes or volume number. Indeed, a quick survey of the Google Book Project suggests that Google doesn’t recognize volume numbers. Not only are the different editions (Harvard’s from 1896, Stanford’s from 1904) given exactly the same name, but also the different volumes of this Stanford’s multivolume edition are labeled identically. 
Consequently, whatever algorithm Google uses to find the book, it is quite likely, as in this case, to offer volume II first.</p></blockquote><p>Reservations aside, Duguid&#8217;s article is a good review of some of the problematic outcomes of the Google Books Library Project.</p><h2>Footnotes</h2><ol class="footnotes"><li id="footnote_0_275" class="footnote">Judging from the citation listed on <a href="http://www.gesis.org/IZ/Mayr/" title="" class="broken_link" rel="nofollow">Philipp Mayr&#8217;s homepage</a>, the article will appear in an upcoming issue of Online Information Review from Emerald Group Publishing.</li><li id="footnote_1_275" class="footnote">Numbers from the <a href="http://www.doaj.org/" title="Directory of Open Access Journals homepage">DOAJ home page</a>, as of 15-Aug-2007.</li><li id="footnote_2_275" class="footnote">&#8220;A quick look at the online catalogue for Stanford’s library shows that the Stanford volume presented as your second choice by Google Books is actually tucked away in the Stanford Auxiliary library along with “infrequently–used” texts.&#8221;</li></ol>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/google-scholar-and-books/feed/</wfw:commentRss> <slash:comments>5</slash:comments> </item> <item><title>Article-Level OAI-PMH Harvest Available from DOAJ</title><link>http://dltj.org/article/doaj-articles/</link> <comments>http://dltj.org/article/doaj-articles/#comments</comments> <pubDate>Wed, 11 Jul 2007 20:51:54 +0000</pubDate> <dc:creator>Peter Murray</dc:creator> <category><![CDATA[Raw Technology]]></category> <category><![CDATA[description]]></category> <category><![CDATA[Directory of Open Access Journals]]></category> <category><![CDATA[ejournal]]></category> <category><![CDATA[oai-pmh]]></category> <category><![CDATA[open access]]></category><guid isPermaLink="false">http://dltj.org/2007/07/doaj-articles/</guid> <description><![CDATA[Earlier this year the DOAJ began offering a new schema for registered articles that 
significantly improves the value of OAI-PMH harvested article content. Prior to this addition, the only schema available was Dublin Core, which as a metadata schema for &#8230; <a href="http://dltj.org/article/doaj-articles/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<abbr class="unapi-id ignore noPrint" title="http://dltj.org/2007/07/doaj-articles/"></abbr><p>Earlier this year the <a href="http://www.doaj.org/doaj?func=loadTempl&#038;templ=070509" title="Article-level OAI-PMH announcement on DOAJ website"><abbr title="Directory of Open Access Journals">DOAJ</abbr> began offering a new schema for registered articles</a> that significantly improves the value of OAI-PMH harvested article content.  Prior to this addition, the only schema available was Dublin Core, which is woefully inadequate as a metadata schema for describing article content.  (Dublin Core, of course, was never designed to handle the complexity of describing an average article.)  The <a href="http://www.doaj.org/schemas/doajArticles.xsd" title="doajArticles XML schema">new schema</a> (graphically represented here<br /><a href="http://cdn.dltj.org/wp-content/uploads/2007/07/doajArticles_schema_image1.png" rel="lightbox"><img src="http://cdn.dltj.org/wp-content/uploads/2007/07/.doajArticles_schema_image.png" alt="doajArticles schema image" title="doajArticles schema image" align="right" width="112" height="146" border="0" /></a>; select the thumbnail to see a larger version) includes elements for ISSN/eISSN, volume/issue, start/end page numbers, and author affiliation.  
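</p><p>Those fields are what make an article-level harvest useful. As a rough illustration only, the Python sketch below pulls them out of one page of an OAI-PMH <code>ListRecords</code> response. It assumes the standard OAI-PMH 2.0 namespace for the response envelope, and it assumes doajArticle element names such as <code>issn</code>, <code>volume</code>, and <code>startPage</code>, modeled on the DOAJ article schema (verify them against the XSD); <code>parse_list_records</code> is a hypothetical helper, not part of any DOAJ tool.</p>

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Assumed doajArticle element names, modeled on the DOAJ article schema;
# check them against doajArticles.xsd before relying on this sketch.
FIELDS = ("title", "issn", "eissn", "volume", "issue",
          "startPage", "endPage", "fullTextUrl")


def local_name(tag):
    """Drop any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def parse_list_records(xml_text):
    """Parse one OAI-PMH ListRecords page (hypothetical helper).

    Returns (articles, resumption_token). Each article dict maps the names
    in FIELDS to the text of the first matching element (None if absent)
    and carries the record's OAI identifier. The token is None on the last page.
    """
    root = ET.fromstring(xml_text)
    articles = []
    for rec in root.iter(OAI_NS + "record"):
        article = dict.fromkeys(FIELDS)
        article["identifier"] = rec.findtext(
            OAI_NS + "header/" + OAI_NS + "identifier")
        # Match on local names so the DOAJ metadata may or may not be namespaced.
        for elem in rec.iter():
            name = local_name(elem.tag)
            if name in FIELDS and article[name] is None:
                article[name] = (elem.text or "").strip()
        articles.append(article)
    # An empty or missing resumptionToken marks the final page of the list.
    token_elem = root.find(".//" + OAI_NS + "resumptionToken")
    token = (token_elem.text or "").strip() if token_elem is not None else ""
    return articles, token or None
```

<p>The function also returns the page&#8217;s resumption token, which a harvester would pass back in the next <code>ListRecords</code> request until no token remains; matching on local element names keeps the sketch indifferent to whether the DOAJ metadata carries its own namespace.</p><p>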
There is also a <code>&lt;fullTextUrl&gt;</code> element that links to the article content itself (not to the article&#8217;s splash page on the publisher&#8217;s site).</p><p>Article content in this schema can be harvested from the DOAJ OAI-PMH provider site (for instance, by issuing a <a href="http://www.doaj.org/oai.article?verb=ListRecords&#038;metadataPrefix=doajArticle" title="XML harvest of the latest articles added to the DOAJ article archive"><code>ListRecords</code> verb with a <code>doajArticle</code> metadata prefix</a> against the PMH URL).  This is, in fact, the same schema journal publishers use to submit article content to the DOAJ article database.  With these pieces in place, it is now feasible to harvest open access journal article content through the DOAJ and add it to a local journal article repository (such as the <a href="http://journals.ohiolink.edu/ejc/article.cgi?issn=14649055&#038;issue=v25i0002&#038;article=191_etoe" title="&#039;Journals: the OhioLINK experience&#039; article record in OhioLINK EJC">Electronic Journal Center</a> in the case of OhioLINK).</p><p>Thanks go out to the DOAJ folks for making this available!</p>]]></content:encoded> <wfw:commentRss>http://dltj.org/article/doaj-articles/feed/</wfw:commentRss> <slash:comments>7</slash:comments> </item> </channel> </rss>
