Beyond Federated Search Redux

It started with a post by Carl Grant on the Federated Search Blog: “Beyond Federated Search – Winning the Battle and Losing the War?” I bookmarked this in Delicious and copied this extended quote from the text into the bookmark:

I’ve long argued that librarianship on top of digital information is about the authority/authenticity/appropriateness of the information provided to the user, as opposed to the overwhelming amounts of information available via other search tools that don’t provide that differentiation. In order to meet those tests, one thing that is clear is that libraries and librarians should never cede control to other organizations over the content they offer to their end-users. It doesn’t matter if that happens because the content providers fail to provide access via federated search, or whether the library has allowed third party organizations to determine what content they can access via a local index discovery tool. Ceding this control cripples the ability of a library to build unique and precise informational offerings that target the needs of their end-users.

This in turn got pulled into my FriendFeed stream, and the ensuing discussion seemed too valuable to let sit there, so I’m creating this post with those replies and adding a few more of my own thoughts. (Since all of these were public comments, I believe it is good netiquette to reproduce them here with attribution. If not, please let me know…particularly if you are one of the people quoted!)

Dorothea Salo was the first to post a comment:

1) We HAVE ceded control. So what do we do about that? 2) Authority/authenticity doesn’t mean jack to the satisficing patron. Which is IMO most of them.

This was followed shortly by Deepak Singh:

That control is long gone. I think people do care about authority, but IMO, that will come from outside the library community, at least on the technology side.

Does everyone really think we have ceded control? I think we still have it; we just don’t market it as an asset to the user like we should/could. It is “the discovery layer problem” that we are all trying to tackle. My take on it is that we should put all of the information we can into a unified index with a user interface as simple as Google but with the added advantage of improving relevance of results via fielded data and librarian vetting for authority/authenticity/appropriateness. I subscribe to the notion that federated search can’t take us far enough … that there is benefit in bringing together metadata for our vetted resources and expanding/enhancing the metadata. This added-value metadata comes in computing relationships and relevancy between records, attempting to apply uniform headings on records based on machine heuristics, and other tricks that can’t be done in real time with small subsets of data that we get back through federated search interfaces.
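
One way to picture the “fielded data plus librarian vetting” idea is the minimal Python sketch below. Everything in it is an assumption made for illustration: the `Record` fields, the `FIELD_WEIGHTS` values, and the `VETTED_BOOST` multiplier are hypothetical, and a real discovery layer would sit on a proper search engine rather than counting tokens in memory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """One metadata record in a hypothetical unified index."""
    title: str
    subjects: List[str] = field(default_factory=list)
    abstract: str = ""
    vetted: bool = False  # assumed flag: a librarian has reviewed the source


# Assumed weights: a hit in the title counts more than one in the abstract.
FIELD_WEIGHTS = {"title": 3.0, "subjects": 2.0, "abstract": 1.0}
VETTED_BOOST = 1.5  # assumed multiplier for librarian-vetted records


def score(record: Record, query: str) -> float:
    """Sum weighted term counts per field, then boost vetted records."""
    terms = query.lower().split()
    tokens_by_field = {
        "title": record.title.lower().split(),
        "subjects": " ".join(record.subjects).lower().split(),
        "abstract": record.abstract.lower().split(),
    }
    total = 0.0
    for name, tokens in tokens_by_field.items():
        hits = sum(tokens.count(term) for term in terms)
        total += FIELD_WEIGHTS[name] * hits
    return total * (VETTED_BOOST if record.vetted else 1.0)


def search(index: List[Record], query: str) -> List[Record]:
    """Return matching records, highest score first."""
    scored = [(score(r, query), r) for r in index]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in hits]
```

Scoring like this needs the full fielded record on hand for every item, which is the case being made here for a unified index over the partial result sets a federated search returns.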

Richard Ackerman then jumped into the conversation with an excellent point about misplacing focus on the user interface itself:

I think to some extent it doesn’t matter if we’ve ceded control or not – we’ve been having this discussion/argument at my office – the tech architects’ side being that, if we want to add value at all, we have to build a discovery layer anyway – which we will expose in many different places, including browser extensions – but once you build it, the cost to also show it as a searchbox on your website is low. In other words, it doesn’t matter that “they won’t come” – the website is free anyway since you need to build the underlying infrastructure if you ever want to have a hope of delivering enhanced services around content and metadata. I also think “search in this box, and discover far more of the millions of dollars of content that we license for you than if you search in THAT box (e.g. google)” has got to be a compelling argument… surely? Researchers, what do you think?

In a FriendFeed comment, I thanked Richard for reminding me about how this concept is more than the user interface. I “know” that — it is the cornerstone of OhioLINK’s discovery layer strategy — but I haven’t internalized it in my thinking very well.

And a follow-up from Dorothea:

Yes, we have ceded control. We cannot insist that any given vendor support either an API or a data-provision protocol. Until we CAN, we have ceded control of discovery. Yet one more reason to dump the vendors in favor of OA. </radical>

I think Dorothea’s argument is stronger for supporting open source software than open access content. With an open source software solution, we can see the innards of the data and create the APIs we need to make extended use of that data. Richard also followed up on Dorothea’s comment:

Dorothea, I keep hearing that story internally too – “oh won’t it be great when all the publishers are gone and the library can be the Temple of OA”. The whole point of OA is that anyone can have it, anywhere. Considering we do a terrible job of helping our users find content that is licensed from a few huge publishers, we’re going to do better when content is scattered all over the place? Since it’s OA, what’s to stop a thousand startups from loading it all on their local harddrives? Doesn’t OA just take the problem from libraries (who pay millions of dollars to license content) doing a terrible job even though they have a perfect right to intermediate access, to libraries (who pay nothing for OA content) trying to outdo *every other web search engine on the entire web, a battle which we were never in and lost long ago*? OA doesn’t make things better, it makes them much, much worse for libraries. (and as always, I mean pure special/research libraries, not public libraries or unis)

William Gunn posted a comment:

It’s a compelling argument, but I haven’t seen an implementation that lives up to the promise. Pubmed’s “searching pubmed for X will give Y results” message that it shows people arriving there via google search is the closest thing I’ve seen that actually shows more value in searching using their interface than using Google. Most in-library search functions I’ve seen (admittedly not many) are woefully bad, but they are probably the third-party things Dorothea is railing about.

I didn’t post this comment to FriendFeed, but I agree with William’s assessment. In fact, most of the innovation in end-user interfaces is coming out of the libraries themselves, public and academic, and not from the traditional vendor community. I’m thinking of projects like VuFind, HathiTrust, the OLE Project, and others. There are some notable exceptions, such as the demonstration of Serials Solutions Summon at ALA Midwinter. But on the whole I think libraries are putting sweat equity into evolving or recreating their digital presence.

Dorothea followed up on Richard’s comment:

I guess it depends on where the money turns out to be. I’m actually not all that troubled about libraries getting pushed out of the discovery business; if it can be done better and cheaper elsewhere, fine and dandy. If games start being played, libraries can get back in and compete as long as everything’s still OA. We libraries suck enough at this that I think it’s something we should stop doing.

I replied to Dorothea that I think we need to stick with the discovery end of the trade until context-sensitive linking (that is, getting the user to the appropriate copy) is better. What I don’t want is to give up on the discovery layer to the point where users aren’t coming to content that we have paid for on their behalf. Perhaps that will be the day when everything is open access, but I can’t hold my breath that long.

The last word at the moment goes to Dorothea:

Understood and agreed, Peter. Though I have days where I wish we’d just tell ‘em “if I can’t get my patrons easily and quickly to your stuff, it is WORTHLESS, ergo I will no longer pay for it.” That doesn’t necessarily have to mean OA, of course.

That is the conversation so far. Do you have any thoughts? Please add them here or on the original FriendFeed post. I should note that the WordPress plug-in I was using to shuttle comments between DLTJ posts and FriendFeed isn’t working at the moment, so I may need to edit this post with interesting comments that come from FriendFeed (and vice versa).

9 Comments

  1. Jonathan Rochkind | April 1, 2009 at 6:34 pm | Permalink

    In fact, I think the Summon approach, which is what Carl originally wrote was a marker of our loss of control, _potentially_ provides the _technological_ means for us to regain control.

    Consider broadcast federated search. We are _stuck_ with the search packages that vendors give us. You can offer a federated search that combines a particular EBSCO db with a particular Wilson db. But there’s no good way to provide a search that searches only certain journals across both dbs — unless EBSCO or Wilson provide packages with those journals.

    With Summon, on the other hand, _technologically_ you could provide a search across only certain journals, perhaps organized in subject sets to YOUR liking.

    Of course, realistically, who the heck has time to create and maintain such sets of journals, across the tens of thousands of journals that we have? But I understand that SerSol hopes to create some themselves. You won’t have to search across everything in Summon — for instance, you can already limit to just things your institution has in full text. Something our users really want, and which is nearly impossible with broadcast search.

    I still don’t understand why Carl or Sol think that the Summon approach will lead to less control than we have now. We already have not that much control, at the whim of our vendors. That may not be a good thing, but what makes Summon a step backwards exactly? If we need to do meta-search somehow (and Carl already argued we did at http://federatedsearchblog.com/2008/10/27/we-don’t-really-need-metasearch…/; I agree) … what’s Carl’s suggestion of how to do it with more control? Current broadcast search technology sure isn’t doing it.

    [cross-commented to federated search blog]

  2. the Jester | April 3, 2009 at 12:07 pm | Permalink

    It is probable that my impressions of Carl’s post are colored by the fact that OhioLINK is in the process of building/acquiring its own unified index to content. We go into this with the weight of the consortium to demand that the unified index be representative and comprehensive of the underlying content that we’ve licensed. The only exception would come in cases where content providers won’t give up the data to put into a unified index; those will be searched via a metasearch engine. We’d be prepared to push back, though, and say that the content that is searched via the metasearch engine will inherently have a “second class” status in the user interface.

    You bring up good points about how Summon, in particular, doesn’t represent a step backwards. I wonder if Carl will formulate a response on the original blog post.

  3. Brian Despain | April 7, 2009 at 7:09 pm | Permalink

    A unified index meta data repository like Summon doesn’t solve the access issue; rather, it solves a user experience problem in the speed of broadcast searches. In the end I think it’s going to be difficult to get publishers to agree to provide their meta data for indexing. At Deep Web we have been developing hybrid solutions like Summon for some time – it really helps speed results and helps normalize data across publishers. I am not sure that treating sources that don’t want to put their meta data in the repository like second class citizens is the way to go. It seems like a harsh stick, but it goes with the carrot that putting your meta data in the repository allows you to scale your content without a huge investment of IT resources.

  4. the Jester | April 8, 2009 at 10:17 am | Permalink

    Just for clarity, I don’t think we can call Summon a hybrid solution. They have stated quite emphatically that they have no intention of binding a federated search solution with their metadata index. Personally, I think that is short-sighted because I anticipate there will always be cases where the metadata can’t be integrated into a unified index. If an institution desired, though, the Serials Solutions business model does seem to offer the opportunity for an institution to subscribe to the API version of Summon and build its own interface that integrates a federated search system.

  5. Jonathan Rochkind | April 8, 2009 at 10:27 am | Permalink

    True, Peter, but they’ve also stated clearly that the out-of-the-box interface will be an open source (PHP I think) app which uses an API to the actual Summon functionality — the same API which will be available to customers too.

    So there’s nothing to stop customers from writing a front-end that IS a hybrid solution, using an external broadcast search solution. I mean, you would have trouble merging the results from both, but you could provide the results in separate listings, which is the only realistic thing to do anyway, I think.

    I think it’s reasonable for Summon to limit the scope of their project to not include broadcast search, but provide the hooks for you to easily combine it with broadcast search in your own hybrid solution. The scope of Summon is big enough already!

    I’m actually not sure whether to be optimistic about _publishers_ sharing metadata with Summon. I would be pessimistic about an A&I db or aggregator sharing metadata with Summon, because Summon will be seen as a competitor. But if Summon isn’t actually providing access to human-readable fulltext (as they are not), do most publishers make much money off search of their content alone? Maybe the biggest ones like Elsevier, which are also aggregators. But for a publisher who isn’t also an aggregator, I’m somewhat optimistic that they’d see it as in their interests to share rather than withhold metadata. Having the technical infrastructure to easily and regularly deliver the metadata is another story though.

    (And, by the way, I consider plain text full text used for matching queries but NOT used to actually deliver a human-readable version — to be ‘metadata’. )

  6. the Jester | April 8, 2009 at 10:49 am | Permalink

    Good point about their intention to release the interface of Summon that we saw at ALA Midwinter into open source. Time will tell if it can be readily modified. I’ve also heard from good sources that there is an active effort to put Aquabrowser Library atop the Summon API.

    Here in Ohio, the intention of our project is to merge the unified index data with the federated search data. My gut tells me it will be possible, and I’ve seen evidence in some systems that it can happen. We’ll see if it happens at scale, though.

    As we chatted about on the code4lib IRC channel, I’m somewhat optimistic that A&I providers may see benefits from putting their data into Summon. This is based on the assumption that they are getting paid for their data, so it represents a new revenue stream for them. (I don’t know if this is actually the case.) Unified indexes like Summon aren’t necessarily a threat to their core users because the heavy duty researchers will likely need the advanced indexing and search capabilities that an aggregator like Summon won’t be able to provide. For OhioLINK’s project, we are operating under the assumption that the unified discovery layer represents a new interface to the data; it doesn’t replace existing interfaces where that advanced functionality will still reside.

  7. Brian Despain | April 8, 2009 at 11:00 am | Permalink

    I don’t see any reason why merging the unified index and the federated search data wouldn’t work. We have done that for multi-million document full text document sets and federated search results. It can be made fairly seamless to users and offers the up to date results of broadcast search and the speed & flexibility of an index. I think you are right that the unified discovery layer represents another way to search and explore underlying data, one that will most likely lead to increased database usage as users use the unified index as a discovery tool.

  8. Jonathan Rochkind | April 8, 2009 at 11:04 am | Permalink

    Brian, how can you present a merged result set without waiting for the broadcast search to complete, thus bringing everything down to the slowest common denominator and NOT offering the speed of an index?

    Oh, you’re presenting incremental results?

    That’s a whole different UI challenge that I’ve basically given up on. I can’t figure out any way to have incremental results that are not REALLY confusing to users.

  9. Brian Despain | April 8, 2009 at 11:10 am | Permalink

    We have also discussed giving users sliders to indicate their preference for newer results or more complete results. The point in the UI is to be seamless to users where possible and let users make their choice based on their research needs. If Summon offers a robust enough API, it might be possible to integrate outside federated search results. I think it’s a bit short sighted as well, since there is always going to be a need to bring in some data from outside the index.
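
The trade-off raised in comments 7 through 9 (one merged list versus incremental results) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `query_source` is a hypothetical stand-in that sleeps to simulate a slow remote database, and the source names and delays are made up. It is not how Summon or any vendor product works.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def query_source(name, delay, results):
    """Hypothetical search call: sleeps to mimic network latency."""
    time.sleep(delay)
    return name, results


# Assumed sources: a fast local index and two slower broadcast targets.
SOURCES = [
    ("local_index", 0.01, ["index hit 1", "index hit 2"]),
    ("remote_a", 0.05, ["remote A hit"]),
    ("remote_b", 0.10, ["remote B hit"]),
]


def merged_search(sources):
    """Build one combined list; nothing is returned until every source answers."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(query_source, *s) for s in sources]
        merged = []
        for future in futures:
            _, results = future.result()  # blocks until this source is done
            merged.extend(results)
    return merged


def incremental_search(sources):
    """Yield (source_name, results) as soon as each source finishes."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(query_source, *s) for s in sources]
        for future in as_completed(futures):
            yield future.result()


if __name__ == "__main__":
    print(merged_search(SOURCES))
    for name, results in incremental_search(SOURCES):
        print(name, results)
```

`merged_search` takes at least as long as the slowest source, which is the objection in comment 8. `incremental_search` can show the index results right away, but the interface then has to decide where later arrivals go, which is the UI problem described above.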

From the Disruptive Library Technology Jester (http://dltj.org/), printed on Saturday the 20th of March 2010 at 1:38:53 AM EDT (-0400). The URL to this page is http://dltj.org/article/beyond-federated-search-redux/

[Creative Commons Logo] This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/us/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.