Thursday Threads: Digital Reference Librarians, First Sale Danger, Open Access, Data Modeling

When I say "<blank> is a question answering system. A question can be posed in natural language and ... <blank> can come up with a very precise answer to that question" -- what comes to mind to fill in the <blank>? If you guessed a system developed by IBM to appear alongside human contestants on Jeopardy, you'd be right. That quote comes from a video posted by IBM earlier this year that is the topic of the first DLTJ Thursday Threads entry. This week's other entries look at possible erosions of the copyright first sale doctrine, the state of open access publishing, and a proposal for new definitions of terms of art in data modeling.

Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ's Thursday Threads, visit the sign-up page. If you would like a more raw and immediate version of these types of stories, follow me on Mastodon where I post the bookmarks I save. Comments and tips, as always, are welcome.

Reference Librarian of the Future? IBM Supercomputer ‘Watson’ to Challenge ‘Jeopardy’ Stars

IBM 'Watson' Video on YouTube

An I.B.M. supercomputer system named after the company’s founder, Thomas J. Watson Sr., is almost ready for a televised test: a bout of questioning on the quiz show “Jeopardy.”

I.B.M. and the producers of “Jeopardy” will announce on Tuesday [December 14, 2010] that the computer, “Watson,” will face the two most successful players in “Jeopardy” history, Ken Jennings and Brad Rutter, in three episodes that will be broadcast Feb. 14-16, 2011.

For I.B.M., “Watson” is an important test of artificial intelligence. Scientists there have been talking to “Jeopardy” about a man vs. machine match-up for the better part of two years. “If the program beats the humans, the field of artificial intelligence will have made a leap forward,” John Markoff of The New York Times wrote in April 2009.

"Reference Librarian of the Future?" is the question Bernie Sloan asks as he forwards news of this competition to the Next Generation Catalog for Libraries mailing list. The quote above comes from the New York Times Media Decoder blog post that describes the competition, and the four minute YouTube video posted by IBM in June 2010 describes and demonstrates "Watson" in action in a mock competition. Is "Watson" good enough to replace a reference librarian backed by a slew of resources? Maybe -- particularly if speed of an answer isn't valued as much as accuracy. (Some of Watson's "goofs" in the video are pretty funny, but may come as a result of trying to get to an answer too quickly.) Whether such technology is now affordable is an entirely different question.

Is the sky falling on library lending?

First sale is the rule that once a lawful copy of a work is sold, the exclusive right to control distribution of that copy is “exhausted.” Therefore libraries can lend books, consumers can resell CDs and NetFlix can rent DVDs through the mail. First sale is not a necessary or automatic part of a copyright law; many countries have different provisions on what is sometimes called “exhaustion,” such as a statutory fee for each library loan of an item. Most importantly, first sale does not apply when a work is licensed rather than sold. Many of the current threats to first sale are controversies over where the boundary between a sale and a licensing transaction really is.

Here is a catalog, quite long I’m afraid, of some of these controversies and, I think, threats to library lending.

Without a strong copyright "first sale" doctrine, the foundation for much of what we do as libraries could be eroded faster than patrons are leaving our library services for internet search engines. Okay... that statement is hyperbolic, but Kevin Smith's post at Duke University's Scholarly Communications blog can make one wonder how fast that end is coming. First on the list is the UCLA video streaming case mentioned in last week's DLTJ Thursday Threads. (By the way, no new information from the court on that case this week.) Kevin goes on to mention several other cases that may affect how the first sale doctrine is applied to libraries. [Via Barbara Fister at Inside Higher Ed]

On the State of Open Access Publishing

OA self-archiving has come to be called the “green” road to OA (or “Green OA”), to distinguish it from OA journal publishing, which is called the “gold” road to OA (“Gold OA”). The most frequent misconception about OA is that OA only means Gold OA (publishing).

Stevan Harnad posted an announcement of this pre-press version of an article of his that will appear in a future issue of Logos Journal. Although some of my colleagues on FriendFeed take issue with the premise stated in the article's title -- "Gold Open Access Publishing Must Not Be Allowed to Retard the Progress of Green Open Access Self-Archiving" -- I think the first half of the article gives a concise overview of the history of open access publishing and sound definitions of "Gold" open access publishing (where the publisher makes the articles freely available) and "Green" open access publishing (where authors deposit their own articles in public archives). [Via Celeste Feather]

Different Kinds of Data Models: History and a Suggestion

In this article, David C. Hay tries to redeem himself for some of his contributions to controversy in the data modeling world.

So, where does this leave our original problem with conceptual and logical models? I hereby modify my original organization described above. Harking back to the original ANSI ideas about the “External,” “Conceptual” and “Internal” schema, as updated by the upgraded Zachman Framework, I propose the following definitions:

Conceptual Model – Any model that describes the business. It may be one of the following:

Strategic Model – This may be a model of basic terms, linked with many-to-many relationships, if desired, but focusing on establishing basic categories.

Business Owner’s Model – This is about the semantics of the organization. If appropriate, entity/relationship models can be developed, but more useful is an SBVR analysis, and OWL descriptions. This is the “Semantic Model” (the “external schema” in the original ANSI view).

Architect’s Model – This is an entity/relationship model of fundamental entity classes, encompassing as much of the enterprise coherently as possible. This is the “Architect’s Model” (the “conceptual schema” in the original ANSI view).

Technology Model – Any model that reflects the technological environment being addressed. It may be one of the following:

Designer’s Model – In the data world, this is the model that accommodates the technology being used for data management. It may be in terms of tables and columns, object-oriented classes, dimensions, XML tags, or whatever. This is the “Designer’s Model” (the “logical” part of the “internal schema” in the original ANSI view).

Builder’s Model – This is the configuration of physical databases, tablespaces, or even cylinders and tracks of the physical database. The builder is the one who spreads the “People” table over three continents (the “physical” part of the “internal schema” in the original ANSI view).

The working system

For all of the developer geeks in the DLTJ community, here is an article by data modeling luminary David Hay. What is the physical model versus the logical model, and how are they related to the conceptual model? The answer varies depending on whom you ask (and whose writings you might be a disciple of). In this brief note, David proposes new definitions for these concepts in the hope of moving the conversation forward. [Via Ron Murray]
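
For those who think better in code, the taxonomy quoted above can be written down as a small lookup table. This is only a rough Python sketch: the names `ModelKind`, `HAY_MODELS`, `models_in_family` and `ansi_view_for` are invented here for illustration and appear nowhere in Hay's article. Only the model names, their two families, and the ANSI mappings come from the quote, and the Strategic Model is given no ANSI counterpart because the quote does not name one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelKind:
    """One kind of data model from the quoted definitions (hypothetical class)."""
    name: str
    family: str               # "Conceptual" or "Technology"
    ansi_view: Optional[str]  # matching part of the original ANSI view, if the quote names one


# The five kinds of model, in the order the quote lists them.
HAY_MODELS = [
    ModelKind("Strategic Model", "Conceptual", None),
    ModelKind("Business Owner's Model", "Conceptual", "external schema"),
    ModelKind("Architect's Model", "Conceptual", "conceptual schema"),
    ModelKind("Designer's Model", "Technology", "internal schema (logical part)"),
    ModelKind("Builder's Model", "Technology", "internal schema (physical part)"),
]


def models_in_family(family: str) -> List[str]:
    """Return the names of all models in the given family, in quoted order."""
    return [m.name for m in HAY_MODELS if m.family == family]


def ansi_view_for(name: str) -> Optional[str]:
    """Return the ANSI view mapped to a model, or None if the quote gives none."""
    for m in HAY_MODELS:
        if m.name == name:
            return m.ansi_view
    raise KeyError(name)


if __name__ == "__main__":
    for family in ("Conceptual", "Technology"):
        print(family, "->", models_in_family(family))
```

Seen this way, the familiar "logical" and "physical" labels both land in the Technology family as two halves of the ANSI internal schema, while everything that describes the business sits in the Conceptual family.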

The text was modified to update a link from http://dewey.library.nd.edu/mailing-lists/ngc4lib/ to https://listserv.nd.edu/cgi-bin/wa?A0=NGC4LIB on November 16th, 2012.

The text was modified to update a link from http://www.library.yale.edu/~llicense/ListArchives/1012/msg00064.html to http://liblicense.crl.edu/ListArchives/1012/msg00064.html on November 21st, 2012.