Thursday Threads: Open Source in Health Care, The Big Deal, Archives of Web Pages


We’re taking a break this week from the HarperCollins e-book story; although the commentary continues from librarians (and a few authors), there hasn’t been anything new (that I’ve seen) from HarperCollins itself. There is still plenty more to look at, though. First up is a report from the health care sector on the applicability of open source and open systems. Next is an interview with a financial analyst who sees the end of the “big deal” for library journal subscriptions. And lastly is a list of web archive services that you could use to find old copies of web pages.

Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. If you would like a more raw and immediate version of these types of stories, watch my FriendFeed stream (or subscribe to its feed in your feed reader). Comments and tips, as always, are welcome.

Open Source, Open Standards, and Health Care Information Systems

Recognition of the improvements in patient safety, quality of patient care, and efficiency that health care information systems have the potential to bring has led to significant investment. Globally the sale of health care information systems now represents a multibillion dollar industry. As policy makers, health care professionals, and patients, we have a responsibility to maximize the return on this investment. To this end we analyze alternative licensing and software development models, as well as the role of standards. We describe how licensing affects development. We argue for the superiority of open source licensing to promote safer, more effective health care information systems. We claim that open source licensing in health care information systems is essential to rational procurement strategy.

This might be a useful data point for libraries considering the adoption of open source for their mission-critical applications. Two U.K. authors have published a report that reviews the general benefits of open source and open standards, noting in one heading that “Open Standards Facilitate Competition Between Open Source Software and Proprietary Software”. They also compare open source software development practices with those of proprietary software development and look at barriers to the adoption of open source software. A great deal of the analysis is particular to health care information systems, but the report would be a useful template for applying the same analysis to core library systems. [Via ACM TechNews]

Reynolds, Carl J., & Wyatt, Jeremy C. (2011). Open Source, Open Standards, and Health Care Information Systems. Journal of Medical Internet Research, 13(1). DOI: 10.2196/jmir.1521

The Demise of the Big Deal?

Interview question: You, however, believe that publishers will simply have to accept that their revenues are going to fall, because there really is no more money?

Claudio Aspesi: I have no doubt that — over time — adjustments would be made. But it remains to be seen if they need all the 2,200/2,400 journals that each of the largest publishers maintains today.

You know, my job is not to pass judgement on how people run their business or to decry capitalism, only to advise investors whether they should buy or sell stocks.

I can observe, however, that there is something unhealthy about an industry which has managed to alienate its customers to the point their membership associations increasingly focus time and attention on how to overturn the industry structure. It is not a good thing to have your customers spend their time trying to put you out of business.

Richard Poynder interviews Claudio Aspesi, a financial analyst at the sell-side research firm Sanford Bernstein. Aspesi issued a report last year that was critical of Reed Elsevier’s financial outlook, and he has more recently downgraded the company’s stock to “underperform”. This interview gets into the reasoning behind Aspesi’s decision.

Archives of Dead Web Pages: Wayback, Cache, and More

The Web changes constantly, and sometimes that page that had just the information you needed yesterday (or last month or two years ago) is not available today. At other times you may want to see how a page’s content or design has changed. There are several sources for finding Web pages as they used to exist. While Google’s cache is probably the best known, the others are important alternatives that may have pages not available at Google or the Wayback Machine plus they may have an archived page from a different date. The table below notes the name of the service, the way to find the archived page, and some notes that should give some idea as to how old a page the archive may contain.

Although this list is over three years old, many of the services are still active. One addition of note is a beta test version of the Internet Archive’s Wayback machine; it includes an improved interface and a more up-to-date archive of pages.
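To make the addressing scheme of one of these services concrete, here is a small sketch of how the Wayback Machine combines a snapshot timestamp with the original page address. The page and timestamp below are invented for illustration.

    # A sketch of the Wayback Machine's addressing pattern: a 14-digit
    # timestamp (YYYYMMDDhhmmss) followed by the original URL. The page
    # and date below are made up for illustration.
    original_url = "http://example.org/some/page.html"
    snapshot_time = "20070815120000"
    print(f"https://web.archive.org/web/{snapshot_time}/{original_url}")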

Thursday Threads: Open Publishing Alternatives, Open Bibliographic Data, Earn an MBA in Facebook, Unconference Planning


The highlights of the past week are around publishing — first with a model proposed by Eric Hellman in which consumers can pool enough money to pay publishers to “set a book free” under a Creative Commons license, then with an announcement by the University of Pittsburgh offering free hosting of open access e-journals. Since we have to be able to describe and find this content, their bibliographic descriptions are important; John Wilkin proposes a model for open access to elements of bibliographic descriptions. Rounding out this week’s topics are a report of a master’s degree program in business using Facebook, and tips for planning an unconference meeting.

Paying Publishers to Set their Content Free

[Eric] Hellman’s new model is something he calls GlueJar. He proposes to “unglue” e-books from their publishers so that they can be available to the world, DRM-free and under Creative Commons license. Here’s the model: publishers sign on with works that they want to “unglue.” They determine what they are willing to be paid for ungluing each work. Users contribute money towards the ungluing. When the threshold amount is reached for a given title, that title is unglued: it appears in all contributors’ e-book reader libraries and in repositories used for online public library access. The publisher is paid, and GlueJar takes a commission.

In other words, publishers just need to determine a price for content being taken off their hands, and if the public is willing to pay that price, it happens. (Users aren’t charged until works they want to unglue are unglued.) No more transaction costs; anyone can distribute the content to anyone else. Publishers could possibly retain subsidiary rights to the content, such as print on demand or derivative work rights.

Bill Rosenblatt of the Copyright and Technology blog looks at the problem publishers have of finding good content creators and having a model that makes that content widely available. Towards the end of his post, he summarizes Eric Hellman’s proposed model for “ungluing ebooks” in a way that makes sense for creators, publishers, and consumers. So far as I know, no one has taken Eric up on a trial of his model, but I think it would be interesting to see if it was practical. [Found via OCLC Research’s Above the Fold.]

University of Pittsburgh Library System Offers Free E-Journal Publishing Service

Pitt’s University Library System (ULS) is now offering free e-journal publishing services to help academic journals make their content available to a global audience while eliminating the cost of print production. 
The E-journal Publishing Program—part of ULS’ D-Scribe Digital Publishing Program, which partners with the University of Pittsburgh Press—“is in keeping with the ULS’ commitment to free and immediate access to scholarly information and its mission to support researchers in the production and sharing of knowledge in a rapidly changing publishing industry,” said Rush G. Miller, Hillman University Librarian and director of the ULS. 
The ULS trains a journal’s editorial staff in the use of Open Journal Systems (OJS) software, which channels the flow of scholarly content from initial author submissions through peer review and final online publication and indexing. OJS provides the tools necessary for the layout, design, copy editing, proofreading, and archiving of journal articles. The platform provides a vast set of reading tools to extend the use of scholarly content through RSS feeds and postings to Facebook and Twitter. E-journal articles can be discovered via blogs, databases, search engines, library collections, and other means. 

The University of Pittsburgh announced that it is offering the infrastructure for managing and hosting electronic journals with an at-cost print-on-demand supplement. Since the cost of the digital publishing platform is absorbed by the University of Pittsburgh and since peer review is typically done at no cost, what’s left on the expense side of the balance sheet? Paying the editorial staff? Marketing and advertising the journal? Has the University of Pittsburgh tipped the equation enough to make this model viable?

Open Bibliographic Data: How Should the Ecosystem Work?

In the conversations about openness of bibliographic data, I often find myself in an odd position, vehemently in support of it but almost as vehemently alarmed at the sort of rhetoric that circulates about the ways that data should be shared.

The problem with both the arguments OCLC makes and many of the arguments for openness is that they seem to be predicated on the view that bibliographic data are largely inert, lifeless “records” and that these records are the units that should be distributed and consumed.

Nothing could be further from the truth.

The above quote is just one small piece of a posting by John Wilkin on the Open Knowledge Foundation blog. In it he plants a flag for the library profession to drive towards: bibliographic data that is published in a fine-grained, easily recombined manner. He suggests that, by being too focused on silos of “lifeless records” (WorldCat, local ILSs, Open Library, etc.), the profession is missing out on ways we (and our users!) can combine and enhance bibliographic data. John’s statement parallels a growing movement towards linked data, a movement that encompasses a reinvigoration of bibliographic description using FRBR and RDA (the current and progressive best thinking of the library community) with the foundational elements of the “semantic web” vision. For more on the latter, see the work of the W3C-supported Library Linked Data Incubator Group and the work of Karen Coyle and Diane Hillman, among others.

On a related note, the JISC community in the UK has also published the Open Bibliographic Data Guide. “It is about the business cases for Open Bibliographic Data – releasing some or all of a library’s catalogue records for open use and re-use by others.”

Poking, Tagging and Now Landing an M.B.A

But thanks to a pair of young British entrepreneurs, students who do want both a business education and the credential to prove it can now pursue their studies at the same time as they “poke” their friends, tag photos, update their relationship status or harvest their virtual crops on FarmVille.

The London School of Business and Finance Global M.B.A. bills itself as “the world’s first internationally recognized M.B.A. to be delivered through a Facebook application.”

Hmm — meet the students where they are? This story from the New York Times outlines an MBA program that is fully immersed in the Facebook environment. I wonder if the completion rate of a Facebook-based program will be higher than that of other online systems because users spend more time in the Facebook environment. [Via Steven Bell]

How I Planned a Successful Unconference in 6 hours – and You Can Too

Last Friday I ran WhereCamp5280 in Denver, which attracted over 70 people (many from out of state and a couple from Canada), used thousands of dollars from top-tier sponsors and was organized in probably less than six hours total. An unconference is a conference in the loosest of terms. People show up, we build our own agenda and then go for it. Here I’ll describe how it was run.

Steve Coast, a guest author for ReadWriteWeb, gives this how-to guide for planning an unconference. An unconference is a relatively new style of event where the content of the meeting is defined by the people who show up and participate. The common guidelines for such meetings1 are: 1) The people who come are the best people who could have come; 2) Whatever happens is the only thing that could have happened; 3) It starts when it starts; 4) It’s over when it’s over; and 5) Exercise the Law of Two Feet. The last might take some more explanation; it means: “If you are not learning or contributing to a talk or presentation or discussion it is your responsibility to find somewhere where you can contribute or learn.”

In my experience, the unconference format is great if you want a group to brainstorm around a central idea or if you want to promote professional networking connections among a group. If you are looking for a particular outcome or have a specific agenda, this format does not work well.


  1. These rules are common, but I found them most clearly expressed at the Scratchpad Wikia.

Proposals for NISO Work Items: Physical Delivery Best Practices and Standardized Markup for Journal Articles

NISO voting members are currently considering two new work items: a statement of best practices for the physical delivery of library resources and formalizing the NLM journal article DTD de facto standards. The Physical Delivery and Standardized Markup for Journal Articles proposal documents are openly available for download.

The first is a proposal submitted by Valerie Horton, Executive Director, Colorado Library Consortium (CLiC), on the Physical Delivery of Library Resources — and subsequently approved by NISO’s Discovery to Delivery Topic Committee — that aims to develop a statement of best practices. This proposed project would build on the efforts of three recent projects: Moving Mountains, Rethinking Resource Sharing’s Physical Delivery Committee, and the American Library Association’s ASCLA ICAN’s Physical Delivery Discussion Group. The document is proposed to include recommendations for: packaging, shipping codes, labeling, acceptable turn-around time, lost or damaged materials handling, package tracking, ergonomic considerations, statistics, sorting, a set of elements to be used for comparison purposes to determine costs, linking of regional and local library carriers, and international delivery.

The second proposal on Standardized Markup for Journal Articles was submitted by Jeff Beck, Technical Information Specialist, National Center for Biotechnology Information (NCBI) — and subsequently approved by NISO’s Content & Collection Management Topic Committee — and is based on the National Library of Medicine’s journal archiving and interchange tag suite. Three schemas for journal articles are included in the Suite and are maintained by NLM: the NLM Archiving and Interchange Tag Set, the NLM Journal Publishing Tag Set, and the NLM Article Authoring Tag Set. The goal of this work item is to take the currently existing Journal Archiving and Interchange Tag Suite version 3.0, the three journal article schemas, and the documentation and shepherd them through the NISO process to become an ANSI/NISO consensus standard.

For a proposed working group to get started, at least 10% of NISO’s Voting Members must express an interest in the work item. The Physical Delivery ballot ends on September 1 and Journal Article Markup ends on September 2. Should the work items be approved, you can express interest in joining the working groups by using the NISO Contact Form, even if you aren’t affiliated with a NISO Voting Member organization.

Analysis of PubGet — An Expedited Fulltext Service for Life Science Journal Articles

In June, a new service that speeds access to the life sciences literature reached a milestone. Called PubGet, it reduces the number of clicks to the full text of an article, and the milestone was the activation of the 50th institution using its service. Using its own proprietary “pathing engine”, it links directly to the full text on the publisher’s website. PubGet does this by understanding the link structure for each journal of each publisher and constructing the link to the full text based on information from the citation. The PubGet service focuses on the life sciences journals indexed in PubMed — hence the play on names: PubMed to PubGet.

How It Works

[Screenshots: the OLINKS link resolver screen for the Christensen article, the EBSCOhost screen for the same article, and a typical view of a PubGet article display.]

In a typical interaction, a user would start at a web page with a journal article citation that has a link to the user’s OpenURL resolver. Contained in that link is the citation metadata that identifies the specific article. Clicking on that link takes you to the OpenURL resolver web page for that specific article. That web page contains links to any online versions of the article, and might also include links to library catalog records for physical copies and options to search for similar articles. An example of one of these pages is this one from my place of work for an article by Clayton Christensen in the Harvard Business Review. (When you are coming from an OhioLINK member institution, it looks like the screen image above.) Clicking on the link that says “Full text of this article at EBSCO” takes you to yet another page — this time from EBSCOhost — that has the citation data again and the options for viewing or taking other actions on the article. Once there it is one more click to the HTML or PDF full text of the article. From the perspective of the creators of PubGet, that is two clicks and two screens too many. PubGet’s pathing engine knows about the structure of links on the publisher’s website, and so it creates a link directly from the citation in the search results list to the article PDF.
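To make the first step of that flow concrete, here is a minimal sketch of the kind of OpenURL link that carries citation metadata to a resolver. The resolver hostname is hypothetical and the citation values are illustrative; the keys follow the OpenURL 1.0 key/encoded-value (KEV) “journal” format.

    from urllib.parse import urlencode

    # Hypothetical base URL of an institution's OpenURL link resolver.
    BASE_RESOLVER = "https://resolver.example.edu/openurl"

    # Citation metadata expressed as OpenURL 1.0 KEV pairs; the article
    # details below are illustrative values only.
    citation = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.jtitle": "Harvard Business Review",
        "rft.aulast": "Christensen",
        "rft.volume": "73",
        "rft.issue": "1",
        "rft.spage": "43",
        "rft.date": "1995",
    }

    # The link embedded next to a citation is just the resolver URL plus
    # the encoded metadata; clicking it hands the citation to the resolver.
    print(BASE_RESOLVER + "?" + urlencode(citation))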

The pathing engine is one of three components that make up the service. The other two are a search engine and a personalization feature. The search engine indexes the citation and abstract fields; it is not nearly as sophisticated as the thesauri-driven search engine native to PubMed, but it does the job for cases when you have a known citation. The personalization feature allows you to tie your account on PubGet to an institution, and with that knowledge the PubGet service can know exactly what digital rights your institution has for each journal and can create links to the full-text article that go through your institution’s proxy server. The account system also enables you to have new articles matching your search criteria sent to you and to mark articles in the search results for later bulk downloading (via a Firefox plugin).1
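As a sketch of what those proxied links might look like, the snippet below uses the common “login?url=” prefix style that many library proxy servers accept. The hostnames are made up, and nothing here is specific to PubGet’s actual implementation.

    # Prefix a publisher's article link with an institution's proxy server,
    # in the "login?url=" style many library proxies use. Hostnames are
    # invented for illustration.
    PROXY_PREFIX = "https://proxy.example.edu/login?url="
    article_pdf = "https://publisher.example.com/content/123/4/567.full.pdf"
    print(PROXY_PREFIX + article_pdf)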

Thinking About PubGet in a Wider Information Ecosystem

One quandary I have with PubGet is that it bypasses OpenURL as the open standard for linking to full-text content. To take advantage of PubGet’s unique characteristic — the pathing engine that goes straight to the article text — you need to start at the PubGet site itself to get the direct URLs to the articles. This is a pretty significant downside to the service. You can’t get the pathing engine along with the powerful PubMed search engine.

It would be nice if PubGet could be set up as an OpenURL target that, when it receives a request, translates it to the direct link to the full text using its pathing engine. That way I could set up PubGet as the OpenURL resolver in my PubMed account, and the article links in PubMed would automatically go to the full text. I don’t know how this would work as a business model for PubGet, though, because acting as an OpenURL resolver in this manner makes the PubGet website invisible — through a series of browser redirects I’d go from PubMed to PubGet to the publisher site. (If the point is to make money selling advertising for related industries, a configuration that completely bypasses any visible signs of PubGet would cut into that revenue source.)

As I was sharing background on PubGet with Thomas Dowling, a colleague at OhioLINK, he pointed out something I didn’t know about OpenURL: it is within the standard to specify a “service type” in the OpenURL Context Object. Section 5.1 of the NISO standard for OpenURL says a service type is “The resource that defines the type of service (pertaining to the Referent) that is requested.” And there are indeed service types registered as part of the SAP2 Community Profile: abstract, citation, fulltext, holdings, ill, and any. So the recipient of an OpenURL request, using the “fulltext” Service Type, should be able to replicate the proprietary PubGet pathing engine using a standard OpenURL structure. In a brief bit of experimentation, though, I was not able to find an OpenURL resolver that a) knew how to handle a Service Type parameter, and/or b) knew how to honor that parameter by getting directly to the full text. As OpenURL undergoes its 5-year review this year, it might be worthwhile to emphasize this part of the standard with examples and descriptions of best practices so it is more widely adopted.
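Under that reading of the standard, asking for full text directly would just be a matter of adding the registered service-type identifier to the same kind of OpenURL. This is only a sketch (the resolver hostname is hypothetical, and as noted above I could not find a resolver that honors the parameter):

    from urllib.parse import urlencode

    # The same style of citation as above, with the registered SAP2
    # service-type identifier added; a resolver that honored svc_id could
    # answer with the article full text directly instead of a menu.
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.jtitle": "Harvard Business Review",
        "rft.volume": "73",
        "rft.spage": "43",
        "svc_id": "info:ofi/svc:fulltext",
    }
    print("https://resolver.example.edu/openurl?" + urlencode(params))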

Other Articles on PubGet


  1. Since the articles are not held within the PubGet service itself, the bulk article downloading function requires a Firefox plugin so that the article requests come from your browser to the publisher’s site.

More on Commercial Versus Not-For-Profit Open Access Publishing

DLTJ featured a discussion last month on what I saw as the outcomes of “clashing values” between the interests of businesses and those of not-for-profit higher education. The discussion started with “Educational Patents, Open Access Journals, and Clashing Values” and continued with a focus on open access publishing specifically in “What Is BioMed Central?” Here is an update on the topic in the form of an e-mail from Ray English and a press release from Marquette Books.

Ray English’s Perspective on Open Access Publisher Economics

Ray English, the Director of Libraries at Oberlin College (also notably the chair of the SPARC Steering Committee and the 2006 recipient of ACRL’s Academic/Research Librarian of the Year Award), sent an e-mail related to this topic to an internal OhioLINK mailing list. I’m grateful for his permission to reproduce it here:

Here’s some background on open access journals that I hope is helpful.

There are now three major open access journal publishers – BioMed Central, Hindawi, and PLOS.

BioMed Central and Hindawi are both commercial publishers that follow an author fee model. BioMed Central also has an arrangement that allows institutional memberships to cover all or a portion of the fees. I would characterize both publishers as having reasonable prices and being focused on access to the literature, rather than profit maximization. Hindawi charges relatively low author fees, in part because they’re based in Egypt and have a lower cost structure. BioMed Central author fees are higher, but below the per-article author fees for the various “open choice” plans that are in place at most commercial publishers. Hindawi reported that they became profitable last year, and BioMed Central has projected that they will be profitable in the coming year.

PLOS is a non-profit publisher. They follow an author fee model, with institutional memberships, and they also have had a lot of foundation support. They report that their various journals are viable financially except for PLOS Biology, their flagship journal. PLOS Biology’s cost structure, which includes a great deal of content and value adds beyond individual research articles, can’t be supported by their current model of author fees and institutional memberships.

Those of you who are curious about the financial status of these three publishers may want to check out the podcast and PowerPoints from the recent SPARC-ACRL forum at ALA in DC. High-level representatives from BioMed Central, Hindawi, and PLOS spoke about their publishing programs and their financial status. The title of the forum was: “Course check: A conversation with three open access publishers about the challenges of sustainability” The podcast and PowerPoints are not yet online, but should be up before too long at: [Jester’s Note: they are available at]

There are many open access journals beyond those produced by these three publishers. The Directory of Open Access Journals now lists just over 2800 titles. They operate under a variety of business models. The vast majority of them are non-profit. A study done a couple of years ago found that under half of all OA journals charge author fees.

Marquette Books’ Open Access Announcement

Right about the same time I got Ray’s message, there was a press release by Marquette Books about their plans to begin publishing eight open access journals. Portions are reproduced here. With information like this being published, it is no wonder the open access publishing marketplace is awash in contradictory statements.

Marquette Books Goes “Open Access” with Communication Journals

Eight new scientific journals that focus on communication processes and effects will be available free of charge to scholars and the public in 2008, Marquette Books LLC of Spokane, Washington, announced today.

MB Publisher David Demers said he believes this is the first time a privately owned publishing house has made all of its journals open access. According to the Directory of Open Access Journals, almost all open access journals are published by universities or nonprofit organizations, which in turn receive financial support from tax revenues or private donations.

“At a time when most for-profit publishers are increasing the costs of their journals,” said Demers, “we decided to go the opposite route and offer all of our journals free of charge. We want the scholarship in our journals to be read by as many people as possible.”


To compensate for some of the loss of subscription revenue, Demers said the online portal through which scholars and the public will access the PDF content of MB journals will contain some advertising for MB’s scholarly and trade books. But he doesn’t expect sales of those books or institutional subscriptions to the hard copy versions of MB journals (priced at $85 for one journal and $35 for each additional journal) to cover the costs of making the journals open access.

“This is a long-term strategy,” he said. “We believe open-access along with our policy of allowing scholars to keep the copyright to their submissions will enhance the quality of our journals as well as our brand name.”


Demers said many higher education librarians are upset with publishers who charge hundreds or even thousands of dollars a year for journal subscriptions. “The best kept secret in book publishing is that journal publishing is the most profitable arm of the industry,” Demers said. “There clearly isn’t enough competition in this market.”


Most open access journals are available only in electronic form. But Marquette Books also will publish hard copy versions of the journals for libraries and interested individuals ($35 to $85 for a single journal subscription).

The eight journals scheduled for publication in Winter 2008 are Journal of Media Sociology, Journal of Global Mass Communication, Russian Journal of Communication, Journal of Health & Mass Communication, Journal of Media Law & Ethics, American Journal of Media Psychology, Journal of Communication Studies and International Journal of Media and Foreign Affairs. More information about the journals can be found at


Am I reading this right? “[Marquette Books Publisher David Demers] doesn’t expect sales of books [advertised through the journal portal] or institutional subscriptions to the hard copy versions of MB journals to cover the costs of making the journals open access.” So Marquette is choosing to offer the open access material at a loss? Just because they believe in the philosophy of open access? Count me as skeptical.


Analysis of Google Scholar and Google Books

Two papers were published recently exploring the quality of Google Scholar and Google Books.

Google Scholar

Philipp Mayr and Anne-Kathrin Walter, both of GESIS / Social Science Information Center in Bonn, Germany, uploaded an article to arXiv called “An exploratory study of Google Scholar.” 1 Originally created as a presentation for a 2005 conference, it was updated in January 2007 to reflect new findings and published as a paper. Excerpts from the abstract include:
The study shows deficiencies in the coverage and up-to-dateness of the [Google Scholar] index. Furthermore, the study points up which web servers are the most important data providers for this search service and which information sources are highly represented. We can show that there is a relatively large gap in Google Scholar’s coverage of German literature as well as weaknesses in the accessibility of Open Access content. Major commercial academic publishers are currently the main data providers.

We conclude that Google Scholar has some interesting pros (such as citation analysis and free materials) but the service can not be seen as a substitute for the use of special abstracting and indexing databases and library catalogues due to various weaknesses (such as transparency, coverage and up-to-dateness).

The authors performed a “brute force analysis” (their words) of the coverage of Google Scholar by comparing search results by journal title with five journal lists: ISI Arts & Humanities Citation Index, ISI Social Science Citation Index, ISI Science Citation Index, open access journals listed by DOAJ, and journals from the SOLIS database (mainly German-language journals from sociological disciplines). They queried Google Scholar using the “Return articles published in…” limiter on the advanced search screen, downloaded the first 100 records for each title, then parsed and analyzed each of the records. In total, 621,000 records from Google Scholar search results were analyzed.

[Figure: Number of Articles Found in Google Scholar by Title List]

The authors first determined the coverage of titles in the five journal lists in the Google Scholar database. The authors note surprise at the relative lack of coverage for open access titles listed in the DOAJ. I think this can be explained by the fact that many open access publishers are not using a systematic application to put their content on the internet. Of the 2,804 journals in the DOAJ directory, only 846 are searchable via DOAJ’s own article-level indexing service.2 If the journals can’t be easily harvested at the article level, then Google can’t add them to the Scholar article index.

[Table: Distribution of Document Types Among the Lists Queried]

Based on the semantics provided in each record, the authors divided the results into three categories (referred to in the paper as “document types”): links to complete descriptive records on an external (publisher’s or aggregator’s) site, citation-only records (no full text and no link to more complete information at an external site), and direct access links to full text. The distribution of results is shown in the table above.

The paper also includes an analysis of the various publisher and portal sites that supply information to Google Scholar’s index.

Google Books

The August issue of First Monday contains an article by Paul Duguid called “Inheritance and loss? A brief survey of Google Books”. The article is a somewhat contrived exploration of the Google Books Library Project through his lens of quality assurance derived “through innovation or through ‘inheritance.'” His thesis seems to be that users expect the reputations of the libraries participating in the project (Harvard, University of Michigan, New York Public, Stanford, and Oxford, among the other partners, are arguably a reputable group) to convey a level of quality to the results of the digitization process in the Google Books Library Project. Duguid then goes on to pick what arguably has to be the hardest book artifact to capture digitally (various editions of Laurence Sterne’s “The Life and Opinions of Tristram Shandy, Gentleman”) as an example of everything that is wrong with Google Books.

I don’t subscribe to that notion at all, but that is perhaps because I’ve been around enough technology and innovation to know that each new service needs to stand on its own. Tristram Shandy is in part an experiment in typography and layout by the author, one that, as Duguid describes in detail in this article, is unusual and atypical in the extreme, so I think many of the characterizations of the Google Books project based on this one artifact are unfair and short-sighted. When you strip away the false dichotomy of innovative-or-inherited quality, the oddities surrounding the Tristram Shandy artifact, and various unnecessary pot-shots,3 Duguid’s analysis does point to some apparent problems with Google’s scheme for digitizing and indexing books. The quality of some of the scans, pointed out in the Tristram Shandy artifact and others, is a source of concern. Substandard metadata is another:

Not a word is mentioned about multiple volumes or volume number. Indeed, a quick survey of the Google Book Project suggests that Google doesn’t recognize volume numbers. Not only are the different editions (Harvard’s from 1896, Stanford’s from 1904) given exactly the same name, but also the different volumes of Stanford’s multivolume edition are labeled identically. Consequently, whatever algorithm Google uses to find the book, it is quite likely, as in this case, to offer volume II first.

Reservations aside, it is a good review of some of the problematic outcomes of the Google Books Library Project.


  1. Judging from the citation listed on Philipp Mayr’s homepage, the article will appear in an upcoming issue of Online Information Review from Emerald Group Publishing.
  2. Numbers from the DOAJ home page, as of 15-Aug-2007.
  3. “A quick look at the online catalogue for Stanford’s library shows that the Stanford volume presented as your second choice by Google Books is actually tucked away in the Stanford Auxiliary library along with “infrequently–used” texts.”

What Is BioMed Central?

My posting on Friday about the clashing values of academic institutions and businesses prompted a comment from Bill Hooker linking to his blog posting about the pricing structure at BioMed Central (BMC). His comment and the e-mail I received this morning from BMC (reproduced below) got me rethinking the nature of open access publishing.

What is BioMed Central?

It is a business. It has advertising (even a “Sales and Marketing Director” listed on its “In-House Team” page) and it generates revenue for services beyond the per-article charge at the time of publication. As the e-mail below says, one can purchase “direct emails and keyword search term sponsorship” from BMC. To the best of my understanding of U.K. tax law, it is not a registered non-profit organization. (Its contact page says BioMed Central Ltd is “a company registered in England and Wales with Company Number 3680030 […] and having VAT number GB 4662477 23.”) On its FAQ page “How does BioMed Central make money?” BMC describes revenue-generating possibilities:

As a publisher, BioMed Central obviously has to be profitable to survive as a service for the biomedical community. We believe that if we add value to raw data, we are entitled to charge for access to it. If a journal commissions topical or thematic reviews, and so helps individuals orientate themselves amidst the complexity of available research, it may have a subscription charge. Furthermore, if we construct and maintain community alerting services, allowing users to discover where, in the eyes of their peers, quality and significance lie, we will charge. An example of such product is Faculty of 1000. We are also carrying advertising on our site, and we will be creating other products and services for which a charge will be made.

I think it is safe to say that BMC is well within the realm of what can be characterized as a “business.”

Compared to PLoS

Another large player in the open access publishing arena is the Public Library of Science (PLoS), and it serves as a good point of comparison. Number six of its nine core principles is labeled “Financial fairness”:

As a nonprofit organization, PLoS charges authors a fair price that reflects the actual cost of publication. However, the ability of authors to pay publication charges will never be a consideration in the decision whether to publish.

In contrast to BMC, PLoS is a 501(c)(3) Public Charity under the U.S. IRS tax code. As a consequence it files an IRS form 990 that allows us to see the details of its operation and gauge whether there is an undue burden on revenues based on expenses. PLoS lists ways to contribute to its effort, including individual contributions and institutional membership/sponsorship.


In my own mind, I had equated “open access” with “not-for-profit” — and in the case of BMC this is not the case. I had thought that open access was universally like PLoS’ model. Just to be clear about this, I’m not saying that academic values are good and business values are bad. I am saying that we should not expect businesses to act based on the values that drive academic institutions and that we shouldn’t be surprised when businesses behave like businesses. I was surprised to learn that BMC is a business.

Perhaps BMC is ultimately a “better” model. Does the revenue from advertising and direct marketing to registered users ultimately drive down the cost of per-article publishing? Do the subscription (pay-for) services offered by BMC not interfere with access to the underlying article data? (Here’s a good question: can one take the open access article data from BMC and construct competing value-added services? Based on reading point #8 in BMC’s Terms and Conditions, it would appear not. Update 20070815T0907: The answer to this question is yes, one can. See the first comment by Matt Cockerill, followed by my reply and Matt’s announcement of changes to the Terms and Conditions page that, to my reading, makes points eight and nine of BioMed Central’s Terms and Conditions much clearer.) These are all questions to be answered as the open access model evolves.

I would be curious to learn in the comments if others thought that “open access” equated to “not-for-profit”. It is entirely possible that I’m the only one, in which case this jester just made a fool of himself.

The E-mail

This is the e-mail that arrived in my inbox overnight:

From: "BioMed Central Advertising" &>
Subject: BioMed Central August News and Offers
Date: Mon, 13 Aug 2007 11:24:29 +0100

News from BioMed Central

August Offer

15% discount on all direct emails and keyword search term sponsorship booked before 28th September 2007!
Did you know that with BioMed Central you can choose to target your direct emails by institution type, specialty, country, techniques used and job title? Did you also know that you will be guaranteed to get maximum exposure from your emails as well as getting good value for money?
10,000 email names with targeting selections would be $3750 but with a 15% discount this will only be $3187 saving you $563!
BioMed Central web ads offer you a bigger impact than advertising on search engines.
Why not try a keyword package?
Advertise on search results when your keyword is used.
Choosing 5 keywords or phrases would be $1000 for 6 months but with a 15% discount this will only be $850.
New Portal and Gateways!
New portal from BioMed Central highlights importance of open access to scientific and medical literature for the developing world.  Sponsor the new Open Access and Developing World portal and be part of increasing access to the scientific and medical literature for those in the developing world.
The new Global Health and Microbiology and Infectious Diseases Gateways are now available for sponsorship.
Contact BioMed Central today at or on +44(0)20 7631 9168 – or reply to this email for more information.

You have received this message in a belief that it would be of interest.
If you would not like to receive any further messages from BioMed Central, please reply to with the word “remove” in the subject line.

BioMed Central Ltd, Science Navigation Group, Middlesex House, 34-42 Cleveland Street,
London, W1T 4LB, United Kingdom



Article-Level OAI-PMH Harvest Available from DOAJ

Earlier this year the DOAJ began offering a new schema for registered articles that significantly improves the value of OAI-PMH harvested article content. Prior to this addition the only schema available was Dublin Core, which as a metadata schema for describing article content is woefully inadequate. (Dublin Core, of course, was never designed to handle the complexity of the description of an average article.) The new schema (graphically represented in the doajArticles schema diagram) includes elements for ISSN/eISSN, volume/issue, start/end page numbers, and author affiliation. There is also a <fullTextUrl> element that is a link to the article content itself (not the splash page of the article on the publisher’s site).

Article content using this schema is harvestable through the DOAJ OAI-PMH provider site (for instance, using a ListRecords verb with a doajArticle metadata prefix against the PMH URL). This is, in fact, the same schema journal publishers use to submit article content to the DOAJ article database. With these pieces in place, it is now conceivable to harvest open access journal article content through the DOAJ and add it to a local journal article repository (such as the Electronic Journal Center in the case of OhioLINK).
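As a sketch of what such a harvest could look like, the snippet below issues a plain ListRecords request and pulls out the <fullTextUrl> values. The endpoint URL is a placeholder (check the DOAJ documentation for the current address), and a real harvester would also follow OAI-PMH resumptionToken paging.

    import urllib.request
    import xml.etree.ElementTree as ET

    # Placeholder for DOAJ's article-level OAI-PMH endpoint.
    OAI_BASE = "https://www.doaj.org/oai.article"

    # A standard OAI-PMH ListRecords request asking for the richer
    # doajArticle format instead of plain Dublin Core.
    url = OAI_BASE + "?verb=ListRecords&metadataPrefix=doajArticle"

    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)

    # Tags carry XML namespaces, so a suffix match is used here rather
    # than declaring the namespace explicitly.
    for element in tree.iter():
        if element.tag.endswith("fullTextUrl"):
            print(element.text)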

Thanks go out to the DOAJ folks for making this available!

A Known Citation Discovery Tool in a Library2.0 World

When it comes to seeking a full-text copy of that known-item citation, are our users asking “what have you done for me lately?” OpenURL has taken us pretty far when one starts in an online environment — a link that sends the citation elements to our favorite link resolver — but it only works when the user starts online with an OpenURL-enabled database. (We also need to set aside for the moment the need for some sort of OpenURL base resolver URL discovery tool — how does an arbitrary service know which OpenURL base resolver I want to use!) What if a user has a citation on a printed paper or from some other non-online form? Could we make their lives easier, too? Here is one way. (Thanks go out to Celeste Feather and Thomas Dowling for helping me think through the possibilities and issues.)

Some sites have addressed this issue with a static HTML form that prompts for the citation information (for example, this sample from Ex Libris’ SFX demo server). That is so Web-1.0, though — you have to fill out the entire form before you get any response back from the server on the availability (or non-availability) of the article you are trying to find. What if we could meet the user where they are with an interactive dialog that would quickly connect the user with the article?

One of the underlying assumptions is that users are still going to the OPAC to do a known-item search by journal title to see if the journal is held by the library in some form. The scheme that follows, though, would work just as well in A-Z journal lists and other places where the ISSN of the desired journal is known. The process starts when the user clicks on a link encoded with the ISSN from the bibliographic record pointing into our new OpenURL link resolver. The link resolver returns to the browser a page that lists articles from the given ISSN in reverse chronological order. (The theory here is, of course, to give the user some results, even though they are not likely what the user wants!)

The page also has an HTML form with fields for citation elements. As the user keys information into the form fields, AJAX calls update the results area of the web page with relevant hits. For instance, if a user types the first few letters of the author’s last name, the results area of the web page shows articles by that author in the journal. (We could also help the user with form-field completion based on name authority records and other author tables so that even as the user types the first few letters of the last name he or she could then pick the full name out of a list.) With luck, the user might find the desired article without any additional data entry!

Another path into the citation results via the link resolver: if a user types the volume into the form field, the AJAX calls cause links to the issues of that volume to appear in addition to updating the results to a reverse chronological listing of articles. If a user then types the issue into the HTML form field or clicks the issue link, the results area displays articles from that issue in page number order. Selecting the link of an article would show the list of sources where the article can be found (as our OpenURL resolvers do now), and off the user goes.

Ideally, all of the form elements would be AJAX’d, so if a user types an author’s last name and a year, the appropriate citation(s) would show up. One might even be able to insert this activity in an IFRAME right on the bibliographic record display (if one has that much control over the HTML layouts of various systems).
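Nothing in our current link resolvers works this way, so the fragment below is only a sketch of the server-side lookup such an AJAX form might call, assuming a hypothetical local index of article citations keyed by ISSN (exactly the missing piece noted in the list below).

    from dataclasses import dataclass

    @dataclass
    class Citation:
        issn: str
        author_last: str
        volume: str
        issue: str
        start_page: str
        year: int
        title: str

    def match_citations(index, issn, author_prefix="", volume="", issue=""):
        """Narrow one journal's articles as the user types into the form.

        With no constraints the caller gets the journal's articles in
        reverse chronological order; each keystroke narrows the list."""
        hits = [c for c in index if c.issn == issn]
        if author_prefix:
            hits = [c for c in hits
                    if c.author_last.lower().startswith(author_prefix.lower())]
        if volume:
            hits = [c for c in hits if c.volume == volume]
        if issue:
            hits = [c for c in hits if c.issue == issue]
        return sorted(hits, key=lambda c: c.year, reverse=True)

Each AJAX call from the form would simply re-run this lookup with the current field values and refresh the results area with whatever comes back.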

So what is to prevent us from reaching this citation discovery nirvana now? Well, a couple of things:

  1. Our link resolvers don’t know anything about the actual citations of items in journals — they just know how to take the citation elements and construct a URL that points into some other system. In order to make this work, our link resolver would need to be paired with a metasearch engine or a comprehensive index of article citations (or both).
  2. It would be helpful if there was an ISSN disambiguator to find alternate ISSNs (print vs. electronic, etc.) in order to cast a wide net for possible results. (In other words, a counterpart to OCLC’s xISBN resolver service.)

Two pretty high hurdles. Still, it would be really useful if we could pull it off, wouldn’t it?

Integration announced for DPubS (e-journal publishing system) and FEDORA (digital object repository)

The August 2006 edition of “The DPubS Report,” produced by Cornell University Libraries for the DPubS community, announced work underway at Penn State to bridge the worlds of DPubS and FEDORA. Here is the relevant portion of the newsletter:

--------------------------------------------------------------------------
SOFTWARE DEVELOPMENT UPDATE
--------------------------------------------------------------------------
[...]
NEAR-TERM SCHEDULED WORK
[...]
* Penn State is working on Fedora interoperability. The plan is to have that capability in the September release, with a working version for testing in late August.

The newsletter goes on to say that the work will be made available under an open source license, so I for one can’t wait to see what it looks like and how we might apply it to our own needs.