
Disruptive Library Technology Jester

We're Disrupted, We're Librarians, and We're Not Going to Take It Anymore


Can Google be Out-Googled?

Posted on July 30, 2006 by Peter Murray
This entry was posted in Disruption in Libraries and tagged Amazon, disruptive innovation, Google, library 2.0, metadata by Peter Murray. Bookmark the permalink.

I have occasionally been heard to remark to other librarians something along the lines of “Don’t fear Google; don’t chase Google; let’s out-Google Google!” After letting the confused stare linger for a moment or the hysterical laughter die down, I explain my thesis: we have something Google doesn’t have. No, it isn’t the care with which we select “authoritative” material (the PageRank algorithm does a pretty good job at that); and no, it isn’t our warehouses of books (the Google Book Search project will pretty effectively capture those). We have faceted metadata. And lots of it.

Google is a big full-text search engine. It has various algorithms for parsing the content of a web page to determine which pieces are more important than others, other algorithms for examining a user’s natural-language query to guess what the user is really seeking (sound like a reference interview?), and yet more algorithms for relevance ranking that push certain pages to the top. And guess what: it (like its peer search engines) does a pretty good job. But at their core these are still full-text search engines plus a lot of educated guessing.
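As a rough illustration of what “full-text search plus relevance ranking” means, here is a toy inverted index scored with TF-IDF. This is a generic textbook technique chosen for illustration, not Google’s actual algorithm (which weighs many more signals, PageRank among them), and the page texts are invented. In this sketch the irrelevant filmography page ties with the party store, because the engine only sees matching words, not what they mean.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term count in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, num_docs, query):
    """Score documents by summed TF-IDF over the query terms, best first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears nowhere in the collection
        idf = math.log(num_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical pages: only word overlap with the query matters here.
pages = {
    "party-store": "Lion King party supplies and cupcake toppers",
    "blog": "Posted by Cupcake: my favorite recipes",
    "actor": "Filmography: The Lion King; Cupcake Wars",
    "news": "City council meeting notes",
}
index = build_index(pages)
results = search(index, len(pages), "lion king cupcake")
print(results)
```

Real engines add stemming, phrase proximity, link analysis, and much more; the sketch only shows the core keyword-matching step.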

Libraries, on the other hand, are awash with metadata. And, based on our obsession with it, we do a pretty good job. At OhioLINK, I’ve been privy to many a meeting where the merits of one vendor’s metadata structure and quality were pitted against another, always in pursuit of the best possible metadata surrogate for our users.

Who is Our Real Competition? Google or Amazon?

To bring my out-Google Google thesis into sharp relief, I offer: with respect to search engines, who is a library’s biggest competitor — Google or Amazon? Okay, let me ask it another way. Say you have a four-year-old who adores Disney’s The Lion King (not a big stretch for my imagination — how about you?) and she wants to have all sorts of related artifacts for her birthday party. You’d like to bake a cake or cupcakes or something like that for her. Where would you start your search? Google or Amazon?

If you started from Google with a search for “Lion King Cupcakes”:


Google Search Results for “Lion King Cupcakes”. Note: to save space in the screen-shot, the Google-supplied advertisements and irrelevant portions of the page have been removed.

Ironically (for this example), your first two hits are Amazon pages. The next two hits are to a party store, also a likely source of useful stuff, but you’d actually have to visit those pages to be sure. (Google’s algorithms are just guessing at what is on the party store pages, after all, right?) The last three hits are likely not relevant at all: a Blogspot page “posted by Cupcake”, an IMDB entry for one of the voice actors (“cupcake” shows up in the title of another one of the actor’s films), and an odd and likely irrelevant MySpace page titled “www.myspace.com/killsxthrills”. After looking through these pages, you’d then have to do another search for “Lion King Cake” and wade through those results as well.

If you started from Amazon with a search for “Lion King” (more general than the Google search, and you’ll see why that is better in a moment) you’d get the expected results of soundtracks, movies, etc. in the main search result area:


Amazon search for “Lion King”

But along the left side is a breakdown by, in Amazon’s case, product type. Look at the green-highlighted one — “Gourmet Food” … that sounds promising:


Amazon search results for “Lion King”, refined to “Gourmet Food”.

Ah! Here we go: one result I know I can throw out because I’m not interested in fruit snacks, and three that are clearly relevant: two decorating kits and one set of pre-made party-favor cookies (no baking on my part!). And just in case there were a flood of foodstuffs related to The Lion King, the faceted breakdown continues on the left side with the “Narrow by Category”, “Narrow by Cuisine”, “Narrow by Brand”, and “Narrow by Price” options.

That’s the power of metadata in a search context. As libraries, we pay a lot of money for that metadata but don’t make nearly as much use of it as we could in our search and browse interfaces.
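The Amazon walkthrough above boils down to two operations: count the values of a metadata field across the current result set (the breakdown on the left side), then narrow the result set to a chosen value. Here is a minimal sketch of that idea, assuming a handful of invented records and hypothetical field names (`type`, `brand`) that only loosely mirror Amazon’s product categories. It is not how Amazon actually implements faceting.

```python
from collections import Counter

# Hypothetical catalog records, each carrying descriptive metadata fields.
records = [
    {"title": "The Lion King (DVD)", "type": "Video", "brand": "Disney"},
    {"title": "The Lion King Soundtrack", "type": "Music", "brand": "Disney"},
    {"title": "Lion King Fruit Snacks", "type": "Gourmet Food", "brand": "SnackCo"},
    {"title": "Lion King Cake Decorating Kit", "type": "Gourmet Food", "brand": "BakeCo"},
    {"title": "Lion King Cupcake Decorating Kit", "type": "Gourmet Food", "brand": "BakeCo"},
    {"title": "Lion King Party Favor Cookies", "type": "Gourmet Food", "brand": "CookieCo"},
]


def keyword_match(records, query):
    """Return records whose title contains every query word (case-insensitive)."""
    words = query.lower().split()
    return [r for r in records if all(w in r["title"].lower() for w in words)]


def facet_counts(records, field):
    """Count how many records fall under each value of a metadata field."""
    return Counter(r[field] for r in records)


def refine(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]


hits = keyword_match(records, "lion king")    # broad search, like "Lion King"
print(facet_counts(hits, "type"))             # the breakdown by product type
food = refine(hits, type="Gourmet Food")      # click "Gourmet Food"
print(facet_counts(food, "brand"))            # the next facet, like "Narrow by Brand"
```

The broad query works because the metadata, not the query words, does the narrowing; that is why searching for “Lion King” rather than “Lion King Cupcakes” was the better starting point.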

Is Out-Googling Possible? A Brief History of the Networked Information World

With me so far? So faceted browse/search over our existing rich metadata is a good thing. But can we really take on the Google giant? Even in the absence of insider information or a business plan for taking on Google, you have to consider an emphatic “yes!” as the answer. Here’s why.

Remember Gopher? That was going to revolutionize the world of information retrieval, and “Veronica” (the Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) was right there helping us ferret out (sorry for the third rodent reference…) information from the thousands of Gopher servers around the world.[1]

But then this web thing came along, and we dropped Gopher/Veronica (and WAIS, for that matter) quicker than a chipmunk being chased by a fire-breathing dragon (okay, that really stretched the rodent-to-Firefox metaphor). Instead we picked up Mosaic and its successors, and a new search engine: AltaVista. How many of us (who were around at the time) set our browser home pages to AltaVista? Nothing could beat it, and we all thought we were seeing the end of libraries.

Then what … remember? Yahoo came onto the scene and we all set our home pages to that. Yahoo was indexing the internet our way (with labels and hierarchies), and was searchable too. Did we think that anything could stop Yahoo?

Has your home page (or at least your information discovery tool, a.k.a. “search engine”) changed since the AltaVista and Yahoo days? Did you have a passing fancy with NorthernLight (blue folders!)? Have you eyed other tools?

If you answered “yes” to any of those questions, can you really say that Google will remain supreme? And if not Google, then what? Could we (the library community writ large) build a knowledge discovery tool using as its source all of the faceted metadata we’ve produced in the last 30 years (MARC catalogs, indexing/abstracting, citations, etc.)? And if we did it more like Amazon than Google, could we out-Google Google?

Doubts To My Own Vision

Can we do it? As the lead (and perhaps only) cheerleader for the Out-Google Google thesis, I’m starting to have my doubts. First came what looked like automated faceted analysis in Google. That turned out to be Google Co-op Health — a human-driven effort to add metadata to selected websites that appear in the search engine results. (By the way…why didn’t we [the library profession] think of that? And now that it has been thought of why aren’t we doing more with enlisting the aid of experts from their own field in categorizing their segment of the world of information?)

The second, through an odd time warp, was an article in the Washington Post from last November called What Lurks in Its Soul? (I have to say “odd time warp” because although the article was published nine months ago, I only ran across it today after a Technorati search for “disruptive change AND (libraries OR library)” pulled up a blog entry from four months ago on the article.) Here is the lead paragraph:

What Lurks in Its Soul?

By David A. Vise

Sunday, November 13, 2005; Page B01

The soul of the Google machine is a passion for disruptive innovation.

Powered by brilliant engineers, mathematicians and technological visionaries, Google ferociously pushes the limits of everything it undertakes. The company’s DNA emanates from its youthful founders, Sergey Brin and Larry Page, who operate with “a healthy disregard for the impossible,” as Page likes to say. Their goal: to organize all of the world’s information and make it universally accessible, whatever the consequences.

So I’m not sure anymore. My faith in our ability to win a black-and-white, us-versus-them battle with Google for knowledge retrieval supremacy has been shaken. Do we cede that ground to Google? Can Google, in Innovator’s Dilemma disruptive fashion, be out-Googled? Are we, the library community, the ones to do it?

Your thoughts?

[Edited 20060731T0826 to fix HTML and text typos.]

The text was modified to remove a link to http://www.google.com/coop/topic?cx=health_devel on November 17th, 2010.

The text was modified to update a link from http://library.csun.edu/About_the_Library/asrs.html to http://library.csun.edu/About/ASRS on December 31st, 2010.

Footnotes

  1. In case the University of Oklahoma Libraries decide that the Veronica information on their website is no longer relevant, you can pull up this page from the Internet Archive.

(This post was updated on 30-Dec-2010.)


Related Posts on Disruptive Library Technology Jester

  • Automated Faceted Analysis In Google?
  • Google News Archive Search — Where Are the Links to Content from Libraries?
  • Analysis of Google Scholar and Google Books
  • Google Sets Its Sights On Hosting Knowledge
  • Google Book Search Settlement: Introduction, Public Announcements

13 Replies (11 comments, 2 pingbacks); last reply was November 17, 2010.
  1. Out-Google Google at ebyblog
    August 7, 2006

    [...] pmurray has an article on google, amazon and out-googling google. It’s worth a read if just to start a discussion. Can we do it? As the lead (and perhaps only) cheerleader for the Out-Google Google thesis, I’m starting to have my doubts. First came what looked like automated faceted analysis in Google. That turned out to be Google Co-op Health — a human-driven effort to add metadata to selected websites that appear in the search engine results. (By the way…why didn’t we [the library profession] think of that? And now that it has been thought of why aren’t we doing more with enlisting the aid of experts from their own field in categorizing their segment of the world of information?) [...]

  2. LibrarianInBlack: Out Google the Google Monster
    September 11, 2006

    [...] The Disruptive Library Technology Jester has an interesting discussion: "Can Google be Out-Googled?" Libraries have something Google doesn’t: oodles of faceted metadata. Can this save us, make our resources more useful, more findable, in more ways? Do we even want to compete with Google on this front? Are we also competing with Amazon’s database and search function? It’s a good read, well worth the few minutes it will take you to complete. [...]
  3. Sergio Berna
    October 4, 2006

    First of all, I would like to say that I enjoyed reading the article; it is very well written and summarizes the writer’s point of view on the problem very precisely.

    Nevertheless, it starts with a very interesting point that is not followed further. The main question is: do you compete at all? Or, stated more exactly: do you need to compete at all? With those two questions I want to suggest that maybe Google is no competition for you.

    Let’s look at the Google and Amazon search engine example from another point of view: the user’s.

    What the user wants or needs is a solution to his problem: find Lion King cupcakes. He doesn’t think about metadata at all. He just thinks about his problem, and so he goes to a massive text search engine like Google. There Google, excelling at its core business, redirects the user straight to the best place available to satisfy the need the user perceives: Amazon.

    There is no competition between Google and Amazon as such, and I don’t think either of them has competition in mind. Google is certainly not competing with Amazon, since Amazon is the first result to appear. And I don’t think Amazon sees Google as a competitor, since Google is driving customers right into its grasp.

    Let’s get back to the user. In his mind there are two processes. The first is oriented toward locating something that satisfies his need. The second is directed toward obtaining that very same thing in the (easiest? cheapest? most secure? just choose your adjective) way. There we see a collaboration in locating the item; after that, Google disappears and Amazon is alone in finally satisfying the customer. If Amazon doesn’t satisfy the customer, both Google and Amazon lose. On the other hand, if the customer leaves Amazon a happy buyer, both win.

    So the question is: why compete at all? Google is not a specialized, categorizing, need-oriented catalog. It is just a very smart search engine. In my opinion it is so good at what it does because it presents the simplest interface ever shown: a text box and a button. You can’t simplify further without showing a blank page. The message to the user is clear:

    “Hey, just tell me what you need in three or four words; I’ll do the first part of the road for you and will set you at the portal of the best specialist available.”

    Imagining a Google interface with combo boxes and such simply breaks the Google concept (have any of you ever used the advanced search interface?). But you can’t offer specialized advice without them. It is in that space that there is plenty of room to breathe, and where I see lots of opportunities for cooperative competition.

  4. the jester
    October 6, 2006

    I think you may have missed the main thrust of the posting. It is common rhetoric in the North American library community to make library services “appear more like Google.” As you indicated in your comment, a simple search box that encourages the user to put in just three or four words about the information being sought is, if usability studies are to be believed, a very appealing interface that works well in stark contrast to the prototypical library service interface.

    Rather, the main focus of the posting was how to make our complicated library service interfaces as appealing as Google yet deliver a better end result to users. My argument was that the answer lies in faceted metadata, as demonstrated by the Amazon interface and, to a limited extent, the NCSU Libraries Catalog interface. Users can be presented a rich set of exploratory and limiting functions after the initial three or four words if the descriptive metadata exists to drive the interface creation. Amazon has that rich metadata, Google does not, and you can see the effects of that in how they present search results.

    (The “limited extent” comment with regard to the NCSU Libraries Catalog interface, by the way, refers to the fact that its interface is only useful for finding printed and bound aggregate volumes of material, otherwise known as “books.” It is like the first Amazon interfaces, which were limited to books because that was the only thing in Amazon’s database. Through Amazon’s interface, however, you can now discover a wide range of items, and the next-generation discovery interface in libraries should find not only books but also articles, maps, pictures, datasets, websites, and other relevant material…all from the same simple search box.)

  5. Sergio Berna
    October 9, 2006

    I see your point. It is true that user experience is what finally draws the line between a successful and commonly used application and a well thought application that is simply not used.

    As far as my experience with library catalog search applications is concerned I have always found the interface hard to use and difficult to understand (IBM370 terminals and such). Maybe that is because it was mainly directed toward making the most of the metadata accessible through search. But as a user I always had two problems: first, understanding how to map the concepts in my mind to the metadata the application uses to represent those concepts; and second, understanding the search results the application returned. Luckily the librarian was always nearby to lend a helping hand.

    To be frank, that’s the main problem I see with metadata. It represents the world as the cataloguer understands it, using his own knowledge base. A different user may find it much harder to locate that very same book using concepts that are alien to him. So maybe a lot of metadata reaches a point where it is too much metadata and simply adds to the noise, making it more difficult to retrieve the desired result from a search query.

    Maybe that’s a strong point for Google. It has so many cataloguers available (web authors) that for every concept a user expresses with words or word combinations, it is able to locate several HTML pages where an author has expressed that very same concept using similar words. It will also return pages that are not closely related to the concepts the user had in mind. But a best-effort search over so much data (not metadata) is able to locate more results than would be possible through an exact-term metadata search.

    So maybe a good question is: when does a lot of metadata become too much metadata?

  6. the jester
    October 9, 2006

    Your perspective is very interesting and somewhat refreshing; thank you for continuing the conversation.

    “As far as my experience with library catalog search applications is concerned I have always found the interface hard to use and difficult to understand (IBM370 terminals and such).”

    Based on this comment and others in this dialog, I gather that you are neither a librarian nor a library professional. (That is what makes this conversation so refreshing!) Please correct me if I’m wrong.

    “So maybe a good question is, when does a lot of metadata become too much metadata?”

    This is a very keen observation, and I would offer the answer “when the metadata gets in the way.”

    If I may speak for the library profession as a whole, there is a debate going on about the role of metadata in providing access to information. It has been argued that we spend too much time on the description of “book” and “article” items when simply a search across their text is all that is required to pull them up in response to a user’s search request. It has also been argued that now is definitely not the time to abandon rigorous description of content by cataloguers — that it is now more urgently needed with the explosion of information.

    My own professional beliefs are somewhat scattered between these two extremes. On the one hand, this rich metadata has already been created and paid for so we might as well use it to its greatest extent. And “use it” does not mean the kind of in-your-face library catalog search applications that you rightly point out are hard to use and difficult to understand. Rather that metadata can be used in more subtle ways to guide the user’s discovery process as I hope is exemplified by the Amazon example.

    On the other hand, I believe that we can no longer afford to pay for the human effort tied up in describing textual materials (books and articles, mostly). Amazon also shows us that it is possible to run computer algorithms across a corpus of textual material (its Statistically Improbable Phrases and Capitalized Phrases) that can approximate subject cataloging to the point where it is arguably “good enough” for drawing together works on similar topics (which is what subject cataloging is all about anyway). Instead, the library profession’s efforts should go toward describing items that as yet defy an algorithmic approach: images, audio, and video, for instance.
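    Amazon has not published exactly how it computes Statistically Improbable Phrases, so what follows is only a hypothetical approximation of the general idea: score each two-word phrase by how much more often it occurs in one book than in a reference corpus. The function name, the add-one smoothing, and the `min_count` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter


def bigrams(text):
    """Return the two-word phrases (lowercased word pairs) in text."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:]))


def improbable_phrases(book, corpus, min_count=2, top_n=5):
    """Rank phrases that are frequent in `book` but rare in a reference `corpus`.

    Hypothetical scoring: relative frequency in the book divided by the
    add-one-smoothed relative frequency in the corpus.
    """
    book_counts = Counter(bigrams(book))
    corpus_counts = Counter(bigrams(corpus))
    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())
    vocab_size = len(set(book_counts) | set(corpus_counts))
    scored = []
    for phrase, count in book_counts.items():
        if count < min_count:
            continue  # ignore phrases that appear only once in the book
        book_freq = count / book_total
        # Smoothing keeps phrases absent from the corpus from dividing by zero.
        corpus_freq = (corpus_counts[phrase] + 1) / (corpus_total + vocab_size)
        scored.append((" ".join(phrase), book_freq / corpus_freq))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


book = "Faceted metadata helps users. The library uses faceted metadata. The library catalog improves."
corpus = "The library is open. The library has books. The library closes at noon."
phrases = improbable_phrases(book, corpus)
print(phrases)  # "faceted metadata" outranks the commonplace "the library"
```

    With a real corpus of thousands of books, phrases that rank highly for two different titles would be one crude signal that the works share a topic, which is the “good enough” subject grouping described above.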

    And on yet a third hand (if I may have that many) is the role of the user-as-cataloger through link analysis, social bookmarking, collective annotation, and lots of other useful “web 2.0” techniques.

    I’m still not sure Google is the best model for this, though, because it lacks one key ingredient: selection. Google’s web crawlers attempt to look at everything in the web, index it, and make the retrieval results somehow usable to the end user. In the next age of libraries — when perhaps we all have three hands — the selective application of computer algorithms coupled with user-driven annotation and professional description all over a targeted range of materials can “out-Google” the Google that we know today.

  7. Sergio Berna
    October 9, 2006

    First, in answer to your question: I’m neither a librarian nor a library professional. I come from the other side of the problem, the technical one. My expertise has always been in the analysis, design, and development of metadata-driven applications such as document and records management systems.

    Right now, for example, I’m in the middle of a project using Fedora (a digital object repository) to develop a records management system oriented mainly toward information preservation. By preservation I mean that the content of a document must remain accessible and usable in a distant future where technological innovation and evolution might render the original format or system unusable. So we must not only provide the means to update the document format as technology evolves, but also evolve the information used to catalogue and locate the document (its metadata), so that we also preserve the ability to locate that very same document.

    A good example would be converting every page held in the Library of Congress to TIFF and storing them all on a single hard drive, with sequential file names, in a single folder. We would preserve the information as such, but nobody would be able to locate anything.

    In this kind of application, then, metadata collection, cataloguing, and evolution are the key to success.

    I found your page while doing a state-of-the-art search and got a very pleasant surprise: not only is there someone on the other side of the problem worrying about things such as metadata and its technical implications, but they also know very well what they are talking about. No offense meant by that, but it is rare to find non-technical people really worried about the technical problems related to their area of expertise, just as it is very difficult to find technical people worrying about the real problem rather than about how to solve it. Maybe that’s why in most cases we come up with a very good solution to a problem that nobody has.

    Returning to the Google versus Amazon problem, and your point about data contained in the document versus metadata about the document, I would like to follow your example search further.

    To do so, I typed “lion king cupcakes” into the Google search interface and found Amazon in seventh position in the results. Then I changed the search to “Lion King cupboards” and was amazed to find Amazon in the first and second positions. What’s the difference?

    The main difference is that the product found by the “lion king cupboard” search had 5 user reviews, while the “lion king cupcakes” product had none. Not all the people who reviewed the first product had the same point of view. In fact, what makes Amazon the first result is that the page contains 5 different points of view that depict the product as those 5 users see it. And two of them see the product as fit for a cupboard.

    To summarize: it is not the metadata Amazon holds about the product that drives me directly to it. It is the comment of a user, in a Web 2.0 way, who sees the product much as I see it and who uses the very same words I used in my search. It is also true that once I have located the product, it is the metadata I know about the product that really makes it useful.

    Maybe that’s the point behind the faceted metadata you mentioned earlier. The more facets your metadata has, the closer it gets to an end-user community, and the more useful it becomes in a search.

    Another point is that maybe the process is divided into two parts. The first part is locating the object; the second part is making it usable. Metadata is the key to the second part, but when searching for and locating the object, maybe too much metadata simply gets in the way.

  8. the jester
    October 9, 2006

    If it helps explain my perspective, my university training is in systems analysis and I came into the library world purely by chance. So I can really appreciate much of what you are describing.

    Although I couldn’t reproduce your example using the North American Google search engine (based on your IP address I’m assuming you would be using the Spain edition of Google), I can agree with your assessment. In the case of looking for “lion king cupboard” your Google search picked up the most relevant hits — I wonder if even Amazon’s search engine would have picked them up in the comment fields of a product listing.

    Still, one has to wonder if at some point the raw keyword index across the entire web content is going to break down at some point. (We could probably find some who will argue that it already has.) In Amazon’s database, those comments are part of the product listing’s metadata (taking on a very liberal definition of the word “metadata” now). In Google’s database, it is most likely undifferentiated text. As a discovery facet, for instance, I think it would be useful to the end user to know whether ‘cupboard’ appeared in the “abstract” of the item or in a comment about the item. Well, not directly useful to the user, but the usefulness could be brought out in search results weighting, visual cues in the results listing, and post-search options (e.g. a checkbox to eliminate the occurrence of the word ‘cupboard’ in all item comments).
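    A minimal sketch of this field-aware idea, assuming items whose text is kept in separate fields. The field names, weights, and item data are hypothetical. Matches are weighted by where they occur, the list of matched fields could drive visual cues in a results listing, and `exclude_fields` stands in for the post-search checkbox.

```python
import re

# Hypothetical field weights: a match in the formal description counts
# for more than a match in an end-user comment.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 2.0, "comments": 0.5}


def count_term(text, term):
    """Count whole-word, case-insensitive occurrences of term in text."""
    return len(re.findall(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE))


def field_search(items, term, exclude_fields=()):
    """Score items by weighted matches per field and report which fields matched.

    exclude_fields plays the role of a post-search checkbox, e.g. ("comments",)
    to ignore matches that occur only in user comments.
    """
    results = []
    for item in items:
        score, matched = 0.0, []
        for field, weight in FIELD_WEIGHTS.items():
            if field in exclude_fields:
                continue
            value = item.get(field, "")
            texts = value if isinstance(value, list) else [value]
            hits = sum(count_term(text, term) for text in texts)
            if hits:
                score += weight * hits
                matched.append(field)
        if score > 0:
            results.append((item["id"], score, matched))
    results.sort(key=lambda result: result[1], reverse=True)
    return results


items = [
    {"id": "storage-bin", "title": "Lion King Toy Storage Bin",
     "abstract": "A fabric bin for toys.",
     "comments": ["Fits perfectly in our cupboard.", "Great cupboard organizer!"]},
    {"id": "cupboard-set", "title": "Lion King Cupboard Stickers",
     "abstract": "Decals for a kitchen cupboard door.", "comments": []},
]

for item_id, score, fields in field_search(items, "cupboard"):
    print(item_id, score, fields)  # matched fields could drive visual cues
print(field_search(items, "cupboard", exclude_fields=("comments",)))
```

    In an undifferentiated full-text index, both items would simply “contain cupboard”; keeping the fields apart is what makes the weighting and the checkbox possible.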

    In short, what I think we’re agreeing on is that information retrieval in a “web 2.0” world has three parts: the object itself, formal description or metadata, and annotations supplied by the end user.

    Glad to hear you are using FEDORA in your records management system. That strengthens my belief that we are using the right system here at OhioLINK for content preservation and delivery.

  9. Sergio Berna
    October 10, 2006

    “Still, one has to wonder if at some point the raw keyword index across the entire web content is going to break down at some point.”

    My personal opinion is that it will break at some point in the future. A good example is that our conversation now appears fifth in Google’s results for the term “lion king cupcakes.” And I would be very hard pressed to believe that our conversation is useful in any way to someone using those terms to locate a page.

    “I think it would be useful to the end user to know whether ‘cupboard’ appeared in the ‘abstract’ of the item or in a comment about the item. Well, not directly useful to the user, but the usefulness could be brought out in search results weighting, visual cues in the results listing, and post-search options (e.g. a checkbox to eliminate the occurrence of the word ‘cupboard’ in all item comments).”

    You have a good point there. Not all the words used in a textual object description have the same location weight. Google already knows that, but the only thing it is able to do is to assign weight depending on the page position, surrounding words and page references.

    The real question behind this is whether it can be done better without dramatically increasing costs, and whether that cost increase brings an adequate return in user perception.

    Imagine that while writing our opinions we had surrounded them with appropriate metadata such as <abstract>… <academicExample>Lion King Cupcakes</academicExample> … </abstract>; or, to use a better example, let’s imagine indexing a book’s content where we add metadata while indexing, such as:

    And then <mainCharacter>Jhon</mainCharacter>, <secondaryCharacter>Mary </secondaryCharacter> traveled to <location>New York</location>.

    In the first example, if Google had inferred that the searcher’s intent was to buy something, it could have filtered out our conversation on the grounds that those words were used there simply as an example.

    In the second example, trying to locate a particular book in which the words Jhon and New York appear could be a real nightmare with a simple search and indexing engine. But locating a book in which Jhon is the main character might be easier.
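    The difference between the two kinds of search can be sketched with Python’s standard `xml.etree.ElementTree`, assuming the inline tags from the example above (`mainCharacter`, `secondaryCharacter`, `location`) wrapped in a hypothetical `<text>` root element so each fragment parses as XML. The two books are invented. A plain text match cannot tell them apart; a query on the `mainCharacter` element can.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up book passages in the style of the example above.
books = {
    "book-a": "<text>And then <mainCharacter>Jhon</mainCharacter>, "
              "<secondaryCharacter>Mary</secondaryCharacter> traveled to "
              "<location>New York</location>.</text>",
    "book-b": "<text><mainCharacter>Mary</mainCharacter> met "
              "<secondaryCharacter>Jhon</secondaryCharacter> in "
              "<location>New York</location>.</text>",
}


def plain_text_search(books, *words):
    """Full-text style: every word appears somewhere, regardless of its role."""
    matches = []
    for book_id, markup in books.items():
        text = "".join(ET.fromstring(markup).itertext())  # drop the tags
        if all(word in text for word in words):
            matches.append(book_id)
    return matches


def role_search(books, role, value):
    """Structured query: value must be the content of a `role` element."""
    return [book_id for book_id, markup in books.items()
            if any(el.text == value for el in ET.fromstring(markup).iter(role))]


print(plain_text_search(books, "Jhon", "New York"))  # both books match
print(role_search(books, "mainCharacter", "Jhon"))   # only book-a
```

    The catch, as the discussion notes, is cost: someone (or some algorithm) has to add that markup before the structured query becomes possible.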

    And so we get to what I think you implied in your article. We have two totally different technologies. The first is text indexing. This technology ensures the best probability to locate a page related to the terms used while searching, provided that we have a rich and varied description of the subject. But it is the job of the user to get something useful out of it.

    On the other hand we have rich metadata cataloguing and truth inferencing engines where we can search in so many different ways that we can provide infinite customization to the query so that it reflects the exact user intent and locate only what the user wants / needs.

    On the first case the job is performed by the user after the query, on the second case the job is performed before the query.

    Which one is better? Well, the only thing I can say for sure is that the second is more expensive. My personal opinion is close to yours: a wise combination of both technologies might very well be the answer.

  10. the jester
    October 11, 2006

    [quote comment="5595"]My personal opinion is that [raw keyword indexing] will break at some point in the future. A good example of that is that our conversation now appears the fifth at Google while trying to locate pages using the “lion king cupcakes” term. And I would be very hard pressed to believe that for someone using that terms to locate a page, our conversation is useful in any way.[/quote]

    Yes, I noticed that myself, and was actually quite surprised that this one (dynamically generated) web page could leap into the Google top 10 for ‘lion king cupcakes’. At the very least, I’m going to have to find a better example the next time I demonstrate these concepts live!

    [quote]Not all the words used in a textual object description have the same location weight. Google already knows that, but the only thing it is able to do is to assign weight depending on the page position, surrounding words and page references.[/quote]
    Right! Google can only make a guess based on context. In some cases (feeds of vendor inventory for Froogle, perhaps raw marked-up versions of stories for Google News, etc.) Google may have access to the underlying structure, but its efforts to this point seem to use those structural semantics to tweak the relevance-ranking algorithm. (There are some facets, though, such as ‘price,’ used in the Froogle interface.)
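    As a rough illustration of the difference between using structure to tweak relevance ranking and exposing a structured field as a facet, here is a small Python sketch. The records, field names, weights and the helpers score and search are all hypothetical; this is not how Google or Froogle actually scores anything. Title matches are simply weighted more heavily than description matches, while ‘price’ acts as a hard filter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    title: str
    description: str
    price: float


# Hypothetical inventory records; values are illustrative only.
ITEMS = [
    Item("Lion King cupcake toppers", "Party decorations for cupcakes", 4.99),
    Item("Cupcake baking pan", "Bakes twelve cupcakes", 12.50),
    Item("Lion King DVD", "Animated film", 19.99),
]

# Assumed weights: a term in the title counts three times a term in the description.
FIELD_WEIGHTS = {"title": 3.0, "description": 1.0}


def score(item: Item, query: str) -> float:
    """Field-weighted keyword score: structure only nudges the ranking."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = getattr(item, field).lower().split()
        total += weight * sum(words.count(t) for t in terms)
    return total


def search(query: str, max_price: Optional[float] = None) -> List[str]:
    """Rank by weighted score; optionally apply `price` as a hard facet filter."""
    candidates = [i for i in ITEMS if max_price is None or i.price <= max_price]
    ranked = sorted(candidates, key=lambda i: score(i, query), reverse=True)
    return [i.title for i in ranked if score(i, query) > 0]


print(search("lion king cupcakes"))
# ['Lion King cupcake toppers', 'Lion King DVD', 'Cupcake baking pan']
print(search("lion king cupcakes", max_price=15))
# ['Lion King cupcake toppers', 'Cupcake baking pan']
```

    Note that the weighting only reorders results that a keyword match already found, whereas the price facet removes items outright; that is the distinction between tweaking relevance and letting the user query the structure directly.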

    [quote]We have two totally different technologies. The first is text indexing. This technology ensures the best probability to locate a page related to the terms used while searching, provided that we have a rich and varied description of the subject. But it is the job of the user to get something useful out of it.

    On the other hand, we have rich metadata cataloguing and truth inferencing engines, where we can search in so many different ways that we can customize the query almost infinitely, so that it reflects the exact user intent and locates only what the user wants or needs.

    In the first case the job is performed by the user after the query; in the second case it is performed before the query.[/quote]

    Ah, very clearly and succinctly stated. That is the crux of the matter, I believe. And I would agree that the second is somewhat expensive, particularly when human effort performs the metadata cataloging. Where I see promise is in the decreasing cost of computing capacity and in improved algorithmic approaches to automated description.

    To take a page from Clayton Christensen’s theory of disruptive innovations: is automated description of textual content good enough for some less-demanding users? The answer I think is yes. Is it good enough for high-demanding users as compared to human-driven description? No. Will it ever be? I think the answer here, too, is “yes.”

    “Good Enough” — in this context — is the user’s perceived performance of retrieval tools that use automated description versus those that use human-driven description. If/when the perceived performance of the retrieval tool based on automated description is ‘good enough,’ Christensen’s model then goes on to say that user choice is based on other factors such as ‘cost’. Assuming for the moment that the information technology solution will be cheaper than the human-driven solution, the users will use the information technology solution.

    Only time will tell, of course, how this will all pan out…

  11. Sergio Berna
    October 17, 2006

    [quote post="94"]To take a page from Clayton Christensen’s theory of disruptive innovations: is automated description of textual content good enough for some less-demanding users? The answer I think is yes. Is it good enough for high-demanding users as compared to human-driven description? No. Will it ever be? I think the answer here, too, is “yes.” [/quote]

    Very good point. The only thing I would add is that my impression as a technologist is: “Will it ever be? Yes,” and very soon.

    We have the tools, we have the knowledge and we have the chance.

    Among them are projects such as WordNet and EuroWordNet, text indexing initiatives such as Google, metadata cataloguing and triple-store search engines, and a lot of other technologies.

    And with the most important tool of all, Web 2.0 interfaces that help bring the user into the application, my impression is that we are on the verge of a huge change.

    My impression is that up to now the Internet has been considered only as a huge automatic vending machine where I can simply place an order and get a result. It is curious, but with all the technology available I can hardly see any portal where all the things it does can’t be traced back to the automatic vending machine example.

    Let’s examine two distinct and extreme situations that I think reflect the extent of the problem.

    Would you ever consider closing the library and installing an automatic book dispenser instead, as most video stores do? I think not.

    Would you, on the other hand, close the retail shop and place all your employees in a portal where they can replicate the in-store experience through web technologies? To the best of my knowledge, it has not been done.

    Let me follow the last example further so that it can be fully understood. Imagine all your employees working at a contact center, available to answer Internet telephony, video conferences, online chats, email requests and any other channel of communication and cooperative navigation. When a would-be customer enters the portal (the retail shop), he might look around a bit (search and navigation), but somewhere in the portal he suddenly reads a message such as “Welcome, can I help you?”, or he sees that there are 4 people ready to help him; on selecting one, a chat window or an audio/video conference starts, just as in a normal real-life buying experience.

    Between those two extremes sits the kind of technology we are talking about. What we are always trying to do is emulate the customer’s face-to-face, real-life buying experience through technology.

    In real life the customer comes into the shop. The librarian can see how the user is dressed and what books he is looking at. He can speak with him and, if he is a very good salesman, steer the user toward the thing he needs, using unstructured language. Ultimately, what distinguishes a good seller from a bad one is that he can know faster what the user wants through both visual and conversational skills. In short, he is able to “bring the customer into the deal.”

    I think this notion of “bringing the customer in” is what drives Web 2.0 initiatives as such, and in my opinion it is the real breakthrough.

  12. the jester
    October 19, 2006

    [quote comment="5727"]My impression is that up to now the Internet has been considered only as a huge automatic vending machine where I can simply place an order and get a result. It is curious, but with all the technology available I can hardly see any portal where all the things it does can’t be traced back to the automatic vending machine example.[/quote]

    As I think about it, I’m finding this to be a very useful analogy (not only for library sites but for service websites at large). Of course we wouldn’t replace the library with an automated vending machine, nor would we want to, yet our websites push the ‘customer’ that way.

    As you point out, there is a fine line between “stalking” the user (either through the website or in the physical world) and being readily available for help. Being ‘readily available’ is also an expense in human capital that needs to be used wisely, so the technology needs to make that human-to-human contact as effective as possible. As you also suggest, perhaps one of the value-adds that would let libraries out-Google Google is that human touch to know “faster what the user wants through both visual and conversational skills.”

  13. Ken Cooper
    November 17, 2010

    Can Google be Out-Googled? http://bit.ly/bnnVEx

Copyright

This work by Peter Murray is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States.

© 2013