<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"	>
<channel>
	<title>Comments on: Can Google be Out-Googled?</title>
	<atom:link href="http://dltj.org/article/can-google-be-out-googled/feed/" rel="self" type="application/rss+xml" />
	<link>http://dltj.org/article/can-google-be-out-googled/</link>
	<description>We&#039;re Disrupted, We&#039;re Librarians, and We&#039;re Not Going to Take It Anymore</description>
	<lastBuildDate>Thu, 11 Mar 2010 18:47:21 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: the jester</title>
		<link>http://dltj.org/article/can-google-be-out-googled/comment-page-1/#comment-5783</link>
		<dc:creator>the jester</dc:creator>
		<pubDate>Thu, 19 Oct 2006 14:48:49 +0000</pubDate>
		<guid isPermaLink="false">http://dltj.org/2006/07/can-google-be-out-googled/#comment-5783</guid>
		<description>[quote comment=&quot;5727&quot;]My impression is that up to now internet has been considered only as a huge automatic vending machine where I can simply place an order and get a result. It is curious but with all the technology available I can hardly see any portal where all the things they do cant be traced back to the automatic vending machine example.[/quote]

As I think about it, I&#039;m finding this to be a very useful analogy (not only for library sites but service websites at large).  Of course we wouldn&#039;t replace the library with an automated vending machine, nor would we want to, yet our websites push the &#039;customer&#039; that way.

As you point out, there is a fine line between &quot;stalking&quot; the user (either through the website or in the physical world) and being readily available for help.  Being &#039;readily available&#039; also is an expense in human capital that needs to be used wisely, so as much as possible the technology needs to make that human-to-human contact as effective as possible.  As you point out, perhaps one of the value-adds for libraries to out-Google Google is that human touch to know &quot;faster what the user wants through both visual and conversational skills.&quot;</description>
		<content:encoded><![CDATA[<p>[quote comment="5727"]My impression is that up to now internet has been considered only as a huge automatic vending machine where I can simply place an order and get a result. It is curious but with all the technology available I can hardly see any portal where all the things they do cant be traced back to the automatic vending machine example.[/quote]</p>
<p>As I think about it, I&#8217;m finding this to be a very useful analogy (not only for library sites but service websites at large).  Of course we wouldn&#8217;t replace the library with an automated vending machine, nor would we want to, yet our websites push the &#8216;customer&#8217; that way.</p>
<p>As you point out, there is a fine line between &#8220;stalking&#8221; the user (either through the website or in the physical world) and being readily available for help.  Being &#8216;readily available&#8217; also is an expense in human capital that needs to be used wisely, so as much as possible the technology needs to make that human-to-human contact as effective as possible.  As you point out, perhaps one of the value-adds for libraries to out-Google Google is that human touch to know &#8220;faster what the user wants through both visual and conversational skills.&#8221;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sergio Berna</title>
		<link>http://dltj.org/article/can-google-be-out-googled/comment-page-1/#comment-5727</link>
		<dc:creator>Sergio Berna</dc:creator>
		<pubDate>Tue, 17 Oct 2006 12:30:01 +0000</pubDate>
		<guid isPermaLink="false">http://dltj.org/2006/07/can-google-be-out-googled/#comment-5727</guid>
		<description>[quote post=&quot;94&quot;]To take a page from Clayton Christensen’s theory of disruptive innovations: is automated description of textual content good enough for some less-demanding users? The answer I think is yes. Is it good enough for high-demanding users as compared to human-driven description? No. Will it ever be? I think the answer here, too, is “yes.” [/quote]

Very good point. The only thing I would add is that my impression as a technologist is that the answer to &quot;will it ever be?&quot; is yes, and very soon.

We have the tools, we have the knowledge and we have the chance.

Those tools include projects such as &lt;a href=&quot;http://wordnet.princeton.edu/&quot; rel=&quot;nofollow&quot;&gt;WordNet&lt;/a&gt; or &lt;a href=&quot;http://www.illc.uva.nl/EuroWordNet/&quot; rel=&quot;nofollow&quot;&gt;EuroWordNet&lt;/a&gt;,
text indexing initiatives such as &lt;a href=&quot;http://www.google.com&quot; rel=&quot;nofollow&quot;&gt;Google&lt;/a&gt;, metadata cataloguing and triple-store search engines, and a lot of other technologies.

And with the most important tool of all, Web 2.0 interfaces that help to bring the user into the application, my impression is that we are on the verge of a huge change.

My impression is that up to now internet has been considered only as a huge automatic vending machine where I can simply place an order and get a result. It is curious but with all the technology available I can hardly see any portal where all the things they do cant be traced back to the automatic vending machine example.

Let’s examine two distinct and extreme situations that I think reflect the extent of the problem.

Would you ever consider closing the library and installing an automatic book dispenser in its place, as most video stores do? I think not.

Would you, on the other hand, close the retail shop and place all your employees in a portal where they can replicate the retail shop experience through web technologies? To the best of my knowledge, “it has not been done”.

Let me follow the last example further so that it is fully understood. Imagine all your employees working at a contact center, available to answer Internet telephony, video conferences, online chats, email requests and any other form of internal communication and cooperative navigation. That way, when a “would-be customer” enters the portal (retail shop) he might look around a bit (search and navigation), but once in the portal he suddenly reads a message such as “welcome, can I help you?”, or he sees that there are 4 people ready to help him, and upon selecting one a chat window or audio/video conference starts, as in a normal “real life” buying experience.

Between those two extremes comes the kind of technology we are talking about. What we are always trying to do is emulate the user’s face-to-face, real-life buying experience through technology.

In real life the customer comes into the shop. The librarian is able to see how the user is dressed and what books he is looking at. He is able to speak with him and, if he is a very good salesman, to steer the user towards the thing he needs, using non-structured language. Finally, what makes the difference between a good seller and a bad one is that the good one can know faster what the user wants through both visual and conversational skills. To summarize, he is able to “bring the customer into the deal”.

I think this notion of “bringing the customer in” is what drives web 2.0 initiatives as such, and it is the real breakthrough in my opinion.</description>
		<content:encoded><![CDATA[<p>[quote post="94"]To take a page from Clayton Christensen’s theory of disruptive innovations: is automated description of textual content good enough for some less-demanding users? The answer I think is yes. Is it good enough for high-demanding users as compared to human-driven description? No. Will it ever be? I think the answer here, too, is “yes.” [/quote]</p>
<p>Very good point. The only thing I would add is that my impression as a technologist is that the answer to &#8220;will it ever be?&#8221; is yes, and very soon.</p>
<p>We have the tools, we have the knowledge and we have the chance.</p>
<p>Those tools include projects such as <a href="http://wordnet.princeton.edu/" rel="nofollow">WordNet</a> or <a href="http://www.illc.uva.nl/EuroWordNet/" rel="nofollow">EuroWordNet</a>,<br />
text indexing initiatives such as <a href="http://www.google.com" rel="nofollow">Google</a>, metadata cataloguing and triple-store search engines, and a lot of other technologies.</p>
<p>And with the most important tool of all, Web 2.0 interfaces that help to bring the user into the application, my impression is that we are on the verge of a huge change.</p>
<p>My impression is that up to now internet has been considered only as a huge automatic vending machine where I can simply place an order and get a result. It is curious but with all the technology available I can hardly see any portal where all the things they do cant be traced back to the automatic vending machine example.</p>
<p>Let’s examine two distinct and extreme situations that I think reflect the extent of the problem.</p>
<p>Would you ever consider closing the library and installing an automatic book dispenser in its place, as most video stores do? I think not.</p>
<p>Would you, on the other hand, close the retail shop and place all your employees in a portal where they can replicate the retail shop experience through web technologies? To the best of my knowledge, “it has not been done”.</p>
<p>Let me follow the last example further so that it is fully understood. Imagine all your employees working at a contact center, available to answer Internet telephony, video conferences, online chats, email requests and any other form of internal communication and cooperative navigation. That way, when a “would-be customer” enters the portal (retail shop) he might look around a bit (search and navigation), but once in the portal he suddenly reads a message such as “welcome, can I help you?”, or he sees that there are 4 people ready to help him, and upon selecting one a chat window or audio/video conference starts, as in a normal “real life” buying experience.</p>
<p>Between those two extremes comes the kind of technology we are talking about. What we are always trying to do is emulate the user’s face-to-face, real-life buying experience through technology.</p>
<p>In real life the customer comes into the shop. The librarian is able to see how the user is dressed and what books he is looking at. He is able to speak with him and, if he is a very good salesman, to steer the user towards the thing he needs, using non-structured language. Finally, what makes the difference between a good seller and a bad one is that the good one can know faster what the user wants through both visual and conversational skills. To summarize, he is able to “bring the customer into the deal”.</p>
<p>I think this notion of “bringing the customer in” is what drives web 2.0 initiatives as such, and it is the real breakthrough in my opinion.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: the jester</title>
		<link>http://dltj.org/article/can-google-be-out-googled/comment-page-1/#comment-5644</link>
		<dc:creator>the jester</dc:creator>
		<pubDate>Wed, 11 Oct 2006 13:45:53 +0000</pubDate>
		<guid isPermaLink="false">http://dltj.org/2006/07/can-google-be-out-googled/#comment-5644</guid>
		<description>[quote comment=&quot;5595&quot;]My personal opinion is that [raw keyword indexing] will break at some point in the future. A good example of that is that our conversation now appears the fifth at Google while trying to locate pages using the “lion king cupcakes” term. And I would be very hard pressed to believe that for someone using that terms to locate a page, our conversation is useful in any way.[/quote]

Yes, I noticed that myself -- and was actually quite surprised how this one (dynamically generated) web page could leap up into the &lt;a href=&quot;http://www.google.com/search?q=lion+king+cupcakes&quot; rel=&quot;nofollow&quot;&gt; Google top 10 for &#039;lion king cupcakes&#039;&lt;/a&gt;.  At the very least I&#039;m going to have to find a better example the next time I demonstrate these concepts live!

[quote]Not all the words used in a textual object description have the same location weight. Google already knows that, but the only thing it is able to do is to assign weight depending on the page position, surrounding words and page references.[/quote]
Right!  Google can only make a guess based on context.  In some cases (feeds of vendor inventory for Froogle, perhaps raw marked-up versions of stories for Google News, etc.) Google may have access to the underlying structure, but their efforts to this point seem to be using those structural semantics to tweak the relevance ranking algorithm.  (Although there are some facets, such as &#039;price,&#039; used in the Froogle interface.)

[quote]We have two totally different technologies. The first is text indexing. This technology ensures the best probability to locate a page related to the terms used while searching, provided that we have a rich and varied description of the subject. But it is the job of the user to  get something useful out of it.

On the other hand we have rich metadata cataloguing and truth inferencing engines where we can search in so many different ways that we can provide infinite customization to the query so that it reflects the exact user intent and locate only what the user wants / needs.

On the first case the job is performed by the user after the query, on the second case the job is performed before the query.[/quote]

Ah, very clearly and succinctly stated.  That is the crux of the matter, I believe.  And I would agree that the second is somewhat expensive &#8212; particularly when it is human effort performing the metadata cataloging.  Where I see promise is in the decreasing cost of computing capacity and the improvement of algorithmic approaches to automated description.

To take a page from Clayton Christensen&#039;s theory of disruptive innovations:  is automated description of textual content good enough for some less-demanding users?  The answer I think is yes.  Is it good enough for high-demanding users as compared to human-driven description?  No.  Will it ever be?  I think the answer here, too, is &quot;yes.&quot;  

&quot;Good Enough&quot; &#8212; in this context &#8212; is the user&#039;s perceived performance of retrieval tools that use automated description versus those that use human-driven description.  If/when the perceived performance of the retrieval tool based on automated description is &#039;good enough,&#039;  Christensen&#039;s model then goes on to say that user choice is based on other factors such as &#039;cost&#039;.  Assuming for the moment that the information technology solution will be cheaper than the human-driven solution, the users will use the information technology solution.

Only time will tell, of course, how this will all pan out...</description>
		<content:encoded><![CDATA[<p>[quote comment="5595"]My personal opinion is that [raw keyword indexing] will break at some point in the future. A good example of that is that our conversation now appears the fifth at Google while trying to locate pages using the “lion king cupcakes” term. And I would be very hard pressed to believe that for someone using that terms to locate a page, our conversation is useful in any way.[/quote]</p>
<p>Yes, I noticed that myself &#8212; and was actually quite surprised how this one (dynamically generated) web page could leap up into the <a href="http://www.google.com/search?q=lion+king+cupcakes" rel="nofollow"> Google top 10 for &#8216;lion king cupcakes&#8217;</a>.  At the very least I&#8217;m going to have to find a better example the next time I demonstrate these concepts live!</p>
<p>[quote]Not all the words used in a textual object description have the same location weight. Google already knows that, but the only thing it is able to do is to assign weight depending on the page position, surrounding words and page references.[/quote]<br />
Right!  Google can only make a guess based on context.  In some cases (feeds of vendor inventory for Froogle, perhaps raw marked-up versions of stories for Google News, etc.) Google may have access to the underlying structure, but their efforts to this point seem to be using those structural semantics to tweak the relevance ranking algorithm.  (Although there are some facets, such as &#8216;price,&#8217; used in the Froogle interface.)</p>
<p>[quote]We have two totally different technologies. The first is text indexing. This technology ensures the best probability to locate a page related to the terms used while searching, provided that we have a rich and varied description of the subject. But it is the job of the user to  get something useful out of it.</p>
<p>On the other hand we have rich metadata cataloguing and truth inferencing engines where we can search in so many different ways that we can provide infinite customization to the query so that it reflects the exact user intent and locate only what the user wants / needs.</p>
<p>On the first case the job is performed by the user after the query, on the second case the job is performed before the query.[/quote]</p>
<p>Ah, very clearly and succinctly stated.  That is the crux of the matter, I believe.  And I would agree that the second is somewhat expensive &mdash; particularly when it is human effort performing the metadata cataloging.  Where I see promise is in the decreasing cost of computing capacity and the improvement of algorithmic approaches to automated description.</p>
<p>To take a page from Clayton Christensen&#8217;s theory of disruptive innovations:  is automated description of textual content good enough for some less-demanding users?  The answer I think is yes.  Is it good enough for high-demanding users as compared to human-driven description?  No.  Will it ever be?  I think the answer here, too, is &#8220;yes.&#8221;  </p>
<p>&#8220;Good Enough&#8221; &mdash; in this context &mdash; is the user&#8217;s perceived performance of retrieval tools that use automated description versus those that use human-driven description.  If/when the perceived performance of the retrieval tool based on automated description is &#8216;good enough,&#8217;  Christensen&#8217;s model then goes on to say that user choice is based on other factors such as &#8216;cost&#8217;.  Assuming for the moment that the information technology solution will be cheaper than the human-driven solution, the users will use the information technology solution.</p>
<p>Only time will tell, of course, how this will all pan out&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sergio Berna</title>
		<link>http://dltj.org/article/can-google-be-out-googled/comment-page-1/#comment-5595</link>
		<dc:creator>Sergio Berna</dc:creator>
		<pubDate>Tue, 10 Oct 2006 10:08:03 +0000</pubDate>
		<guid isPermaLink="false">http://dltj.org/2006/07/can-google-be-out-googled/#comment-5595</guid>
		<description>[quote post=&quot;94&quot;]Still, one has to wonder if at some point the raw keyword index across the entire web content is going to break down at some point.[/quote]

My personal opinion is that it will break at some point in the future. A good example of that is that our conversation now appears the fifth at Google while trying to locate pages using the “lion king cupcakes” term. And I would be very hard pressed to believe that for someone using that terms to locate a page, our conversation is useful in any way.

[quote post=&quot;94&quot;]I think it would be useful to the end user to know whether ‘cupboard’ appeared in the “abstract” of the item or in a comment about the item. Well, not directly useful to the user, but the usefulness could be brought out in search results weighting, visual cues in the results listing, and post-search options (e.g. a checkbox to eliminate the occurrence of the word ‘cupboard’ in all item comments[/quote]

You have a good point there. Not all the words used in a textual object description have the same location weight. Google already knows that, but the only thing it is able to do is to assign weight depending on the page position, surrounding words and page references.

The real question behind this is whether it can be done better without dramatically increasing the costs, and whether that cost increase has an adequate return in user perception.

Imagine that while writing our opinions we had surrounded them with appropriate metadata such as &lt;abstract&gt;…  &lt;academicExample&gt;Lion King Cupcakes&lt;/academicExample&gt; … &lt;/abstract&gt; or, to use a better example, let’s imagine indexing the content of a book and adding metadata as we go, such as:

And then &lt;mainCharacter&gt;Jhon&lt;/mainCharacter&gt;, &lt;secondaryCharacter&gt;Mary &lt;/secondaryCharacter&gt; traveled to &lt;location&gt;New York&lt;/location&gt;.

In the first example, if Google had inferred that the intent of the search is to buy something, it could have filtered out our conversation on the grounds that those words were simply used there as an example.

In the second example, trying to locate a specific book where the words Jhon and New York appear could be a real nightmare using a simple search and indexing engine. But finding a book where Jhon is the main character might be easier.

And so we get to what I think you implied in your article. We have two totally different technologies. The first is text indexing. This technology ensures the best probability to locate a page related to the terms used while searching, provided that we have a rich and varied description of the subject. But it is the job of the user to  get something useful out of it.

On the other hand we have rich metadata cataloguing and truth inferencing engines where we can search in so many different ways that we can provide infinite customization to the query so that it reflects the exact user intent and locate only what the user wants / needs.
On the first case the job is performed by the user after the query, on the second case the job is performed before the query.

Which one is better? Well, the only thing I can say for sure is that the second is more expensive. My personal opinion is somewhere near yours in that a wise combination of both technologies might very well be the answer.</description>
		<content:encoded><![CDATA[<p>[quote post="94"]Still, one has to wonder if at some point the raw keyword index across the entire web content is going to break down at some point.[/quote]</p>
<p>My personal opinion is that it will break at some point in the future. A good example of that is that our conversation now appears the fifth at Google while trying to locate pages using the “lion king cupcakes” term. And I would be very hard pressed to believe that for someone using that terms to locate a page, our conversation is useful in any way.</p>
<p>[quote post="94"]I think it would be useful to the end user to know whether ‘cupboard’ appeared in the “abstract” of the item or in a comment about the item. Well, not directly useful to the user, but the usefulness could be brought out in search results weighting, visual cues in the results listing, and post-search options (e.g. a checkbox to eliminate the occurrence of the word ‘cupboard’ in all item comments[/quote]</p>
<p>You have a good point there. Not all the words used in a textual object description have the same location weight. Google already knows that, but the only thing it is able to do is to assign weight depending on the page position, surrounding words and page references.</p>
<p>The real question behind this is whether it can be done better without dramatically increasing the costs, and whether that cost increase has an adequate return in user perception.</p>
<p>Imagine that while writing our opinions we had surrounded them with appropriate metadata such as &lt;abstract&gt;…  &lt;academicExample&gt;Lion King Cupcakes&lt;/academicExample&gt; … &lt;/abstract&gt; or, to use a better example, let’s imagine indexing the content of a book and adding metadata as we go, such as:</p>
<p>And then &lt;mainCharacter&gt;Jhon&lt;/mainCharacter&gt;, &lt;secondaryCharacter&gt;Mary &lt;/secondaryCharacter&gt; traveled to &lt;location&gt;New York&lt;/location&gt;.</p>
<p>In the first example, if Google had inferred that the intent of the search is to buy something, it could have filtered out our conversation on the grounds that those words were simply used there as an example.</p>
<p>In the second example, trying to locate a specific book where the words Jhon and New York appear could be a real nightmare using a simple search and indexing engine. But finding a book where Jhon is the main character might be easier.</p>
<p>And so we get to what I think you implied in your article. We have two totally different technologies. The first is text indexing. This technology ensures the best probability to locate a page related to the terms used while searching, provided that we have a rich and varied description of the subject. But it is the job of the user to  get something useful out of it.</p>
<p>On the other hand we have rich metadata cataloguing and truth inferencing engines where we can search in so many different ways that we can provide infinite customization to the query so that it reflects the exact user intent and locate only what the user wants / needs.<br />
On the first case the job is performed by the user after the query, on the second case the job is performed before the query.</p>
<p>Which one is better? Well, the only thing I can say for sure is that the second is more expensive. My personal opinion is somewhere near yours in that a wise combination of both technologies might very well be the answer.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: the jester</title>
		<link>http://dltj.org/article/can-google-be-out-googled/comment-page-1/#comment-5547</link>
		<dc:creator>the jester</dc:creator>
		<pubDate>Mon, 09 Oct 2006 17:25:43 +0000</pubDate>
		<guid isPermaLink="false">http://dltj.org/2006/07/can-google-be-out-googled/#comment-5547</guid>
		<description>If it helps explain my perspective, my university training is in systems analysis and I came into the library world purely by chance.  So I can really appreciate much of what you are describing.

Although I couldn&#039;t reproduce your example using the North American Google search engine (based on your IP address I&#039;m assuming you would be using the Spain edition of Google), I can agree with your assessment.  In the case of looking for “lion king cupboard” your Google search picked up the most relevant hits &#8212; I wonder if even Amazon&#039;s search engine would have picked them up in the comment fields of a product listing.

Still, one has to wonder if at some point the raw keyword index across the entire web content is going to break down at some point.  (We could probably find some who will argue that it already has.)  In Amazon&#039;s database, those comments are part of the product listing&#039;s metadata (taking on a very liberal definition of the word &quot;metadata&quot; now).  In Google&#039;s database, it is most likely undifferentiated text.  As a discovery facet, for instance, I think it would be useful to the end user to know whether &#039;cupboard&#039; appeared in the &quot;abstract&quot; of the item or in a comment about the item.  Well, not directly useful to the user, but the usefulness could be brought out in search results weighting, visual cues in the results listing, and post-search options (e.g. a checkbox to eliminate the occurrence of the word &#039;cupboard&#039; in all item comments).

In short, what I think we&#039;re agreeing on is that information retrieval in a &quot;web 2.0&quot; world is about three parts:  the object itself, formal description or metadata, and annotations supplied by the end user.

Glad to hear you are using FEDORA in your records management system.  That strengthens my belief that we are using the right system here at OhioLINK for content preservation and delivery.</description>
		<content:encoded><![CDATA[<p>If it helps explain my perspective, my university training is in systems analysis and I came into the library world purely by chance.  So I can really appreciate much of what you are describing.</p>
<p>Although I couldn&#8217;t reproduce your example using the North American Google search engine (based on your IP address I&#8217;m assuming you would be using the Spain edition of Google), I can agree with your assessment.  In the case of looking for “lion king cupboard” your Google search picked up the most relevant hits &mdash; I wonder if even Amazon&#8217;s search engine would have picked them up in the comment fields of a product listing.</p>
<p>Still, one has to wonder if at some point the raw keyword index across the entire web content is going to break down at some point.  (We could probably find some who will argue that it already has.)  In Amazon&#8217;s database, those comments are part of the product listing&#8217;s metadata (taking on a very liberal definition of the word &#8220;metadata&#8221; now).  In Google&#8217;s database, it is most likely undifferentiated text.  As a discovery facet, for instance, I think it would be useful to the end user to know whether &#8216;cupboard&#8217; appeared in the &#8220;abstract&#8221; of the item or in a comment about the item.  Well, not directly useful to the user, but the usefulness could be brought out in search results weighting, visual cues in the results listing, and post-search options (e.g. a checkbox to eliminate the occurrence of the word &#8216;cupboard&#8217; in all item comments).</p>
<p>In short, what I think we&#8217;re agreeing on is that information retrieval in a &#8220;web 2.0&#8221; world is about three parts:  the object itself, formal description or metadata, and annotations supplied by the end user.</p>
<p>Glad to hear you are using FEDORA in your records management system.  That strengthens my belief that we are using the right system here at OhioLINK for content preservation and delivery.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sergio Berna</title>
		<link>http://dltj.org/article/can-google-be-out-googled/comment-page-1/#comment-5526</link>
		<dc:creator>Sergio Berna</dc:creator>
		<pubDate>Mon, 09 Oct 2006 16:33:45 +0000</pubDate>
		<guid isPermaLink="false">http://dltj.org/2006/07/can-google-be-out-googled/#comment-5526</guid>
		<description>First, in answer to your question, I’m neither a librarian nor a library professional. I come from the other side of the problem, the technical one. My expertise has always been directed towards the analysis, design and development of metadata-driven applications such as document and records management systems.

Right now, for example, I’m in the middle of a project involving Fedora (Digital Object Repository) for the development of a records management system mainly oriented towards information preservation. By preservation I mean that the content of the document must remain accessible and usable in a far-off future where technological innovation and evolution might render the previous format or system unusable. As such, we must not only provide the means to update the document format in step with technical evolution, but also evolve the information used to catalogue and locate the document (metadata), so that we preserve the ability to locate that very same document.

A good example would be converting all the pages contained in the library of congress to tiff and store them all away in a single HDD with a sequential name file contained in a single folder. We preserve the information as such, but nobody is going to be able to locate anything.

As such, in this kind of application, metadata collection, cataloguing and evolution is the key to a successful application.

I located your page while performing a state of the art search and got a very pleasant surprise in finding that not only there is someone on the other side of the problem worrying about these things such as metadata and its technical implications, but also they know very well what they are talking about. No offense meant by the previous comment but it is rare to find non-technical people really worried about the technical problems related to their area of expertise as it is also very difficult to find technical people worrying about the real problem and not on how to solve it. Maybe that’s why on most of the cases we come up with a very good solution for a problem that nobody has.

Returning to the Google versus Amazon problem and your point of data contained in the document versus metadata about the document I would like to further follow your example search.

In order to do so I typed “lion king cupcakes” into the Google search interface and found Amazon in the seventh position of the search results. Then I changed the search to “Lion King cupboards” and was amazed to find Amazon in the first and second positions. What’s the difference?

The main difference is that the product located under the “lion king cupboard” search had 5 user reviews while the “lion king cupcakes” product had none. Not all the people who reviewed the first product had the same point of view. In fact, what makes Amazon the first option to be located is that it contains 5 different points of view that depict the product as these 5 users see it. And two of them see the product as fit for a cupboard.

To summarize, it is not the metadata that Amazon knows about the product that drives me directly to it. It is the comment of a user, in a Web 2.0 way, who sees the product much as I see it and who uses the very same words I used in my search. It is also true that once I have located the product, it is the metadata I know about the product that really makes it useful.

Maybe that’s the point behind the faceted metadata you mentioned earlier. The more facets your metadata has, the closer it is to the end-user community and the more useful it becomes in a search.

Another point is that maybe the process is divided in two parts. The first part is locating the object; the second part is making it usable. Metadata is the key to the second part, but in searching and locating the object maybe too much metadata simply gets in the way.</description>
		<content:encoded><![CDATA[<p>First, in answer to your question, I’m neither a librarian nor a library professional. I come from the other side of the problem, the technical one. My expertise has always been directed towards analysis, design and development of metadata-driven applications such as document and record management systems.</p>
<p>Right now, for example, I’m in the middle of a project involving Fedora (Digital Object Repository) for the development of a records management system mainly oriented towards information preservation. By preservation I mean that the content of the document must remain accessible and usable in a distant future where technological innovation and evolution might render the previous format or system unusable. As such, we must not only provide the means to update the document format in step with technical evolution, but also evolve the information used to catalogue and locate the document (metadata) so that we preserve the ability to locate that very same document.</p>
<p>A good example would be converting all the pages contained in the Library of Congress to TIFF and storing them all on a single HDD, with sequential file names, in a single folder. We preserve the information as such, but nobody is going to be able to locate anything.</p>
<p>As such, in this kind of application, metadata collection, cataloguing and evolution are the key to a successful application.</p>
<p>I located your page while performing a state-of-the-art search and got a very pleasant surprise in finding not only that there is someone on the other side of the problem worrying about things such as metadata and its technical implications, but also that they know very well what they are talking about. No offense meant by the previous comment, but it is rare to find non-technical people really worried about the technical problems related to their area of expertise, just as it is very difficult to find technical people worrying about the real problem and not about how to solve it. Maybe that’s why in most cases we come up with a very good solution for a problem that nobody has.</p>
<p>Returning to the Google versus Amazon problem and your point of data contained in the document versus metadata about the document I would like to further follow your example search.</p>
<p>In order to do so I typed “lion king cupcakes” into the Google search interface and found Amazon in the seventh position of the search results. Then I changed the search to “Lion King cupboards” and was amazed to find Amazon in the first and second positions. What’s the difference?</p>
<p>The main difference is that the product located under the “lion king cupboard” search had 5 user reviews while the “lion king cupcakes” product had none. Not all the people who reviewed the first product had the same point of view. In fact, what makes Amazon the first option to be located is that it contains 5 different points of view that depict the product as these 5 users see it. And two of them see the product as fit for a cupboard.</p>
<p>To summarize, it is not the metadata that Amazon knows about the product that drives me directly to it. It is the comment of a user, in a Web 2.0 way, who sees the product much as I see it and who uses the very same words I used in my search. It is also true that once I have located the product, it is the metadata I know about the product that really makes it useful.</p>
<p>Maybe that’s the point behind the faceted metadata you mentioned earlier. The more facets your metadata has, the closer it is to the end-user community and the more useful it becomes in a search.</p>
<p>Another point is that maybe the process is divided in two parts. The first part is locating the object; the second part is making it usable. Metadata is the key to the second part, but in searching and locating the object maybe too much metadata simply gets in the way.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: the jester</title>
		<link>http://dltj.org/article/can-google-be-out-googled/comment-page-1/#comment-5523</link>
		<dc:creator>the jester</dc:creator>
		<pubDate>Mon, 09 Oct 2006 13:42:51 +0000</pubDate>
		<guid isPermaLink="false">http://dltj.org/2006/07/can-google-be-out-googled/#comment-5523</guid>
		<description>Your perspective is very interesting and somewhat refreshing -- thank you for continuing the conversation.

[quote comment=&quot;5521&quot;]As far as my experience with library catalog search applications is concerned I have always found the interface hard to use and difficult to understand (IBM370 terminals and such).[/quote]

Based on this comment and others in the context of this dialog, I gather that you are neither a librarian nor a library professional.  (That is what is making this conversation so refreshing!)  Please correct me if I&#039;m wrong.

[quote comment=&quot;5521&quot;]So maybe a good question is, when does a lot of metadata become too much metadata?[/quote]

This is a very keen observation, and I would offer the answer &quot;when the metadata gets in the way.&quot;

If I may speak for the library profession as a whole, there is a debate going on about the role of metadata in providing access to information.  It has been argued that we spend too much time on the description of &quot;book&quot; and &quot;article&quot; items when simply a search across their text is all that is required to pull them up in response to a user&#039;s search request.  It has also been argued that now is definitely not the time to abandon rigorous description of content by cataloguers &#8212; that it is now more urgently needed with the explosion of information.

My own professional beliefs are somewhat scattered between these two extremes.  On the one hand, this rich metadata has already been created and paid for so we might as well use it to its greatest extent.  And &quot;use it&quot; does not mean the kind of in-your-face library catalog search applications that you rightly point out are hard to use and difficult to understand.  Rather, that metadata can be used in more subtle ways to guide the user&#039;s discovery process, as I hope is exemplified by the Amazon example.

On the other hand, I believe that we can no longer afford to pay for the human effort tied up in the description of textual materials (books and articles, mostly).  Amazon also shows us that it is possible to run computer algorithms across the corpus of textual material -- its Statistically Improbable Phrases and Capitalized Phrases -- that can approximate subject cataloguing to the point where it is arguably &quot;good enough&quot; for the purposes of drawing together works on similar topics (which is what subject cataloguing is all about anyway).  Instead, the efforts of the library profession should be put towards the textual description of items that as yet defy an algorithmic approach:  images, audio, and video, for instance.

And on yet a third hand -- if I may have that many -- is the role of the user-as-cataloguer through link analysis, social bookmarking, collective annotation, and lots of other useful &quot;web 2.0&quot; techniques.

I&#039;m still not sure Google is the best model for this, though, because it lacks one key ingredient:  selection.  Google&#039;s web crawlers attempt to look at everything in the web, index it, and make the retrieval results somehow usable to the end user.  In the next age of libraries &#8212; when perhaps we all have three hands &#8212; the selective application of computer algorithms coupled with user-driven annotation and professional description all over a targeted range of materials can &quot;out-Google&quot; the Google that we know today.</description>
		<content:encoded><![CDATA[<p>Your perspective is very interesting and somewhat refreshing &#8212; thank you for continuing the conversation.</p>
<p>[quote comment="5521"]As far as my experience with library catalog search applications is concerned I have always found the interface hard to use and difficult to understand (IBM370 terminals and such).[/quote]</p>
<p>Based on this comment and others in the context of this dialog, I gather that you are neither a librarian nor a library professional.  (That is what is making this conversation so refreshing!)  Please correct me if I&#8217;m wrong.</p>
<p>[quote comment="5521"]So maybe a good question is, when does a lot of metadata become too much metadata?[/quote]</p>
<p>This is a very keen observation, and I would offer the answer &#8220;when the metadata gets in the way.&#8221;</p>
<p>If I may speak for the library profession as a whole, there is a debate going on about the role of metadata in providing access to information.  It has been argued that we spend too much time on the description of &#8220;book&#8221; and &#8220;article&#8221; items when simply a search across their text is all that is required to pull them up in response to a user&#8217;s search request.  It has also been argued that now is definitely not the time to abandon rigorous description of content by cataloguers &mdash; that it is now more urgently needed with the explosion of information.</p>
<p>My own professional beliefs are somewhat scattered between these two extremes.  On the one hand, this rich metadata has already been created and paid for so we might as well use it to its greatest extent.  And &#8220;use it&#8221; does not mean the kind of in-your-face library catalog search applications that you rightly point out are hard to use and difficult to understand.  Rather, that metadata can be used in more subtle ways to guide the user&#8217;s discovery process, as I hope is exemplified by the Amazon example.</p>
<p>On the other hand, I believe that we can no longer afford to pay for the human effort tied up in the description of textual materials (books and articles, mostly).  Amazon also shows us that it is possible to run computer algorithms across the corpus of textual material &#8212; its Statistically Improbable Phrases and Capitalized Phrases &#8212; that can approximate subject cataloguing to the point where it is arguably &#8220;good enough&#8221; for the purposes of drawing together works on similar topics (which is what subject cataloguing is all about anyway).  Instead, the efforts of the library profession should be put towards the textual description of items that as yet defy an algorithmic approach:  images, audio, and video, for instance.</p>
<p>And on yet a third hand &#8212; if I may have that many &#8212; is the role of the user-as-cataloguer through link analysis, social bookmarking, collective annotation, and lots of other useful &#8220;web 2.0&#8221; techniques.</p>
<p>I&#8217;m still not sure Google is the best model for this, though, because it lacks one key ingredient:  selection.  Google&#8217;s web crawlers attempt to look at everything in the web, index it, and make the retrieval results somehow usable to the end user.  In the next age of libraries &mdash; when perhaps we all have three hands &mdash; the selective application of computer algorithms coupled with user-driven annotation and professional description all over a targeted range of materials can &#8220;out-Google&#8221; the Google that we know today.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sergio Berna</title>
		<link>http://dltj.org/article/can-google-be-out-googled/comment-page-1/#comment-5521</link>
		<dc:creator>Sergio Berna</dc:creator>
		<pubDate>Mon, 09 Oct 2006 10:11:28 +0000</pubDate>
		<guid isPermaLink="false">http://dltj.org/2006/07/can-google-be-out-googled/#comment-5521</guid>
		<description>I see your point. It is true that user experience is what finally draws the line between a successful and commonly used application and a well-thought-out application that is simply not used.

As far as my experience with library catalog search applications is concerned I have always found the interface hard to use and difficult to understand (IBM370 terminals and such). Maybe that is because it was mainly directed towards making the most of the metadata accessible through search. But as a user I always had the problem of first understanding how to map the concepts in my mind to the metadata the application used to represent those concepts, and second understanding the search results returned by the application. Luckily the librarian was always near to lend a helping hand.

To be frank, that’s the main problem I see with metadata. It represents the world as the cataloguer understands it using his own knowledge base. But a different user may find it much harder to locate that very same book using concepts that are alien to him. So maybe a lot of metadata gets to a point where it is too much metadata and simply adds to the noise, making it more difficult to retrieve the desired result from a search query.

Maybe that’s a strong point with Google. It has so many cataloguers available (web authors) that for every concept expressed by a user using words or word combinations, it is able to locate several HTML pages where an author has expressed those very same concepts using similar words. It will also return pages that are not closely related to the concepts the user had in mind. But a best-effort search over so much data (not metadata) is able to locate more results than would be possible through an exact-term metadata search.

So maybe a good question is, when does a lot of metadata become too much metadata?</description>
		<content:encoded><![CDATA[<p>I see your point. It is true that user experience is what finally draws the line between a successful and commonly used application and a well-thought-out application that is simply not used.</p>
<p>As far as my experience with library catalog search applications is concerned I have always found the interface hard to use and difficult to understand (IBM370 terminals and such). Maybe that is because it was mainly directed towards making the most of the metadata accessible through search. But as a user I always had the problem of first understanding how to map the concepts in my mind to the metadata the application used to represent those concepts, and second understanding the search results returned by the application. Luckily the librarian was always near to lend a helping hand.</p>
<p>To be frank, that’s the main problem I see with metadata. It represents the world as the cataloguer understands it using his own knowledge base. But a different user may find it much harder to locate that very same book using concepts that are alien to him. So maybe a lot of metadata gets to a point where it is too much metadata and simply adds to the noise, making it more difficult to retrieve the desired result from a search query.</p>
<p>Maybe that’s a strong point with Google. It has so many cataloguers available (web authors) that for every concept expressed by a user using words or word combinations, it is able to locate several HTML pages where an author has expressed those very same concepts using similar words. It will also return pages that are not closely related to the concepts the user had in mind. But a best-effort search over so much data (not metadata) is able to locate more results than would be possible through an exact-term metadata search.</p>
<p>So maybe a good question is, when does a lot of metadata become too much metadata?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: the jester</title>
		<link>http://dltj.org/article/can-google-be-out-googled/comment-page-1/#comment-5452</link>
		<dc:creator>the jester</dc:creator>
		<pubDate>Fri, 06 Oct 2006 19:57:53 +0000</pubDate>
		<guid isPermaLink="false">http://dltj.org/2006/07/can-google-be-out-googled/#comment-5452</guid>
		<description>I think you may have missed the main thrust of the posting.  It is common rhetoric in the North American library community to make library services &quot;appear more like Google.&quot;  As you indicated in your comment, a simple search box that encourages the user to put in just three or four words about the information being sought is, if usability studies are to be believed, a very appealing interface that works well in stark contrast to the prototypical library service interface.

Rather, the main focus of the posting was how to make our complicated library service interfaces as appealing as Google yet deliver a better end result to users.  My argument was that the answer lies in faceted metadata, as demonstrated by the Amazon interface and, to a limited extent, the &lt;a href=&quot;http://www.lib.ncsu.edu/catalog/&quot; rel=&quot;nofollow&quot;&gt;NCSU Libraries Catalog&lt;/a&gt; interface.  Users can be presented a rich set of exploratory and limiting functions after the initial three or four words &lt;em&gt;if&lt;/em&gt; the descriptive metadata exists to drive the interface creation.  Amazon has that rich metadata, Google does not, and you can see the effects of that in how they present search results.

(The &quot;limited extent&quot; comment with regard to the NCSU Libraries Catalog interface, by the way, refers to the fact that its interface is only useful for finding printed and bound aggregate volumes of material &#8212; otherwise known as &quot;books.&quot;  It is like the first Amazon interfaces that were limited to just books because that was the only thing in Amazon&#039;s database.  Through Amazon&#039;s interface, however, you can now discover a wide range of items, and the next generation discovery interface in libraries should find not only books, but articles, maps, pictures, datasets, websites, and other relevant material...all from the same simple search box.)</description>
		<content:encoded><![CDATA[<p>I think you may have missed the main thrust of the posting.  It is common rhetoric in the North American library community to make library services &#8220;appear more like Google.&#8221;  As you indicated in your comment, a simple search box that encourages the user to put in just three or four words about the information being sought is, if usability studies are to be believed, a very appealing interface that works well in stark contrast to the prototypical library service interface.</p>
<p>Rather, the main focus of the posting was how to make our complicated library service interfaces as appealing as Google yet deliver a better end result to users.  My argument was that the answer lies in faceted metadata, as demonstrated by the Amazon interface and, to a limited extent, the <a href="http://www.lib.ncsu.edu/catalog/" rel="nofollow">NCSU Libraries Catalog</a> interface.  Users can be presented a rich set of exploratory and limiting functions after the initial three or four words <em>if</em> the descriptive metadata exists to drive the interface creation.  Amazon has that rich metadata, Google does not, and you can see the effects of that in how they present search results.</p>
<p>(The &#8220;limited extent&#8221; comment with regard to the NCSU Libraries Catalog interface, by the way, refers to the fact that its interface is only useful for finding printed and bound aggregate volumes of material &mdash; otherwise known as &#8220;books.&#8221;  It is like the first Amazon interfaces that were limited to just books because that was the only thing in Amazon&#8217;s database.  Through Amazon&#8217;s interface, however, you can now discover a wide range of items, and the next generation discovery interface in libraries should find not only books, but articles, maps, pictures, datasets, websites, and other relevant material&#8230;all from the same simple search box.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sergio Berna</title>
		<link>http://dltj.org/article/can-google-be-out-googled/comment-page-1/#comment-5335</link>
		<dc:creator>Sergio Berna</dc:creator>
		<pubDate>Wed, 04 Oct 2006 11:43:59 +0000</pubDate>
		<guid isPermaLink="false">http://dltj.org/2006/07/can-google-be-out-googled/#comment-5335</guid>
		<description>First of all I would like to say that I enjoyed reading the article; it is very well written and summarizes very precisely the writer’s point of view about the problem.

Nevertheless, it starts with a very interesting point that is not followed up. The main question is, do you compete at all? Or, stated more exactly, do you need to compete at all? With those two questions I want to suggest that maybe Google is no competition for you.

Let’s have a look at the Google and Amazon search engines example from another point of view: the user’s.

What the user wants or needs is a solution to his problem: find lion king cupcakes. He doesn’t really think about metadata at all. He just thinks about his problem and as such goes to a massive text search engine like Google. There Google, excelling at its core business, directs the user to the best place available to satisfy the user’s perceived need: Amazon.

There is no competition between Google and Amazon as such, and I don’t think it is in the mind of either of them to compete. Google is certainly not competing with Amazon since it is the first reference to happen. And I don’t really think Amazon thinks Google is competing with it, since Google is driving clients right into its grasp.

Let’s get back to the user. In his mind there are two processes. The first process is oriented towards locating something that satisfies his need. The second process is directed towards obtaining that very same thing in the (easiest? cheapest? most secure? just choose your adjective) way. There we see a collaboration in locating the subject, and after that Google disappears and Amazon is all alone in finally satisfying the client. If Amazon doesn’t satisfy the client, both Google and Amazon lose. On the other hand, if the client leaves Amazon as a happy buyer, both win.

So the question is, why compete at all? Google is not a specialized, categorizing, need-oriented catalog. It is just a very smart search engine. In my opinion it is so good at doing what it does because it presents the simplest interface ever shown: a simple textbox and a button. You can’t simplify more without showing a blank page. The message is clear to the user:

“Hey, just tell me what you need in three or four words; I’ll do the first part of the road for you and will set you at the portal of the best specialist available.”

Thinking of a Google interface with combos and such simply breaks the Google concept (have any of you ever used the advanced search interface?). But you can’t have specialized advice without those. It is there that there is plenty of space to breathe and where I see lots of opportunities for cooperative competing.</description>
		<content:encoded><![CDATA[<p>First of all I would like to say that I enjoyed reading the article; it is very well written and summarizes very precisely the writer’s point of view about the problem.</p>
<p>Nevertheless, it starts with a very interesting point that is not followed up. The main question is, do you compete at all? Or, stated more exactly, do you need to compete at all? With those two questions I want to suggest that maybe Google is no competition for you.</p>
<p>Let’s have a look at the Google and Amazon search engines example from another point of view: the user’s.</p>
<p>What the user wants or needs is a solution to his problem: find lion king cupcakes. He doesn’t really think about metadata at all. He just thinks about his problem and as such goes to a massive text search engine like Google. There Google, excelling at its core business, directs the user to the best place available to satisfy the user’s perceived need: Amazon.</p>
<p>There is no competition between Google and Amazon as such, and I don’t think it is in the mind of either of them to compete. Google is certainly not competing with Amazon since it is the first reference to happen. And I don’t really think Amazon thinks Google is competing with it, since Google is driving clients right into its grasp.</p>
<p>Let’s get back to the user. In his mind there are two processes. The first process is oriented towards locating something that satisfies his need. The second process is directed towards obtaining that very same thing in the (easiest? cheapest? most secure? just choose your adjective) way. There we see a collaboration in locating the subject, and after that Google disappears and Amazon is all alone in finally satisfying the client. If Amazon doesn’t satisfy the client, both Google and Amazon lose. On the other hand, if the client leaves Amazon as a happy buyer, both win.</p>
<p>So the question is, why compete at all? Google is not a specialized, categorizing, need-oriented catalog. It is just a very smart search engine. In my opinion it is so good at doing what it does because it presents the simplest interface ever shown: a simple textbox and a button. You can’t simplify more without showing a blank page. The message is clear to the user:</p>
<p>“Hey, just tell me what you need in three or four words; I’ll do the first part of the road for you and will set you at the portal of the best specialist available.”</p>
<p>Thinking of a Google interface with combos and such simply breaks the Google concept (have any of you ever used the advanced search interface?). But you can’t have specialized advice without those. It is there that there is plenty of space to breathe and where I see lots of opportunities for cooperative competing.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
