Google Custom Search for Planet Code4Lib


I wanted to mess around with Google’s new Custom Search Engine feature, and in casting about for a list of URLs to feed it, I thought I’d try the list of blogs at Planet Code4Lib. As it turns out, this might be a modestly useful search if you remember reading something from one of the code4lib bloggers but can’t remember which one. The exercise was pretty fun and here is the result:

To build it, I started with the Planet Code4Lib OPML feed and ran some regular expression transformations against it, replacing these matches with empty strings (I used BBEdit on the Mac for this one-off, but it could probably be automated to a certain degree with a Perl script):
[code]
/feed/?(rss|atom)?/?$
(\?|\&|\&amp;)feed=(atom|rss2)$
(\?|\&(amp;)?)feed=(atom|rss2)(\&(amp;)?)
/?(wp-rss2.php|rss|index.*|atom.*|rdf)[^/\r]*$
[/code]
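For anyone who would rather script this step than run it in a text editor, here is a minimal Python sketch of the same idea: apply each pattern in order and replace any match with an empty string. The function name and the sample URLs are made up for illustration, and the second pattern is written with an `&amp;` alternative on the assumption that the duplicated `\&` in the listing stood for an HTML-escaped ampersand.

```python
import re

# The four patterns from the post, applied in order. The second one uses
# "\&amp;" as its third alternative (an assumption about what the original
# listing meant by its repeated "\&").
FEED_SUFFIX_PATTERNS = [
    r"/feed/?(rss|atom)?/?$",
    r"(\?|\&|\&amp;)feed=(atom|rss2)$",
    r"(\?|\&(amp;)?)feed=(atom|rss2)(\&(amp;)?)",
    r"/?(wp-rss2.php|rss|index.*|atom.*|rdf)[^/\r]*$",
]


def strip_feed_suffix(url: str) -> str:
    """Remove the feed-specific tail of a URL, leaving the blog's base address."""
    for pattern in FEED_SUFFIX_PATTERNS:
        url = re.sub(pattern, "", url)
    return url


if __name__ == "__main__":
    # Hypothetical feed URLs, shaped like the ones an OPML file might contain.
    samples = [
        "http://dltj.org/feed/",
        "http://catalogablog.blogspot.com/atom.xml",
        "http://www.kentongood.com/?cat=26&amp;feed=rss2",
    ]
    for feed_url in samples:
        print(feed_url, "->", strip_feed_suffix(feed_url))
```

A script like this only gets you most of the way there. The patterns are greedy in places (`index.*` and `atom.*` will happily eat the rest of the URL), so the output still deserves a manual look.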
After a minimal amount of manual cleanup, I ended up with this list:
[code]
catalogablog.blogspot.com/*
www.wallandbinkley.com/quaedam*
maisonbisson.com/blog*
www.blyberg.net/*
use.perl.org/~LTjake/journal*
foam.lib.muohio.edu/blog/*
schenizzle.wordpress.com/*
onebiglibrary.net/node*
weblog.kevinclarke.info/*
feeds.feedburner.com/DanCohen*
www.librarywebchic.net/wordpress*
www.ecorrado.us/*
beta.blogger.com/feeds/3338174527262061848/posts/full*
textsfornothing.com/blog/*
orweblog.oclc.org/*
feeds.feedburner.com/hublog*
blog.ryaneby.com/*
meredith.wolfwater.com/wordpress*
fawcett.blogspot.com/*
kados.org/cgi-bin/blosxom.cgi*
lisletters.blogspot.com/*
digitallibrarian.org/*
www.patronizing.org/*
www.lackoftalent.org/*
www.kentongood.com/?cat=26*
www.epistemographer.com/*
blogdriverswaltz.com/*
outgoing.typepad.com/outgoing*
shelter.nu/blog*
interoperating.info/mark/blog/1*
www.tomkeays.com/blog*
lxming.blogspot.com/*
john.mignault.net/blog*
infomotions.com/musings/musings*
dltj.org/*
benostrowsky.wordpress.com/*
www.daveyp.com/blog*
efoundations.typepad.com/efoundations*
oregonstate.edu/~reeset/blog*
librarycog.uwindsor.ca:8087/artblog/librarycog*
inquiringlibrarian.blogspot.com/*
www.ibiblio.org/bess/*
thedil.wordpress.com/*
cavlec.yarinareth.net/archives/category/computers*
cavlec.yarinareth.net/archives/category/librariana*
coffeecode.net/feeds/categories/16-Coding*
dilettantes.code4lib.org/*
umlaut.library.gatech.edu/blog/*
www.inkdroid.org/journal*
techessence.info/node*
techessence.info/blog/1*
roytennant.com/*
vielmetti.typepad.com/vacuum*
dystmesis.com:8081/*
weibel-lines.typepad.com/weibelines*
dataunbound.wordpress.com/*
q6.oclc.org/*
del.icio.us/rss/tag/code4lib*
www.frbr.org/*
libdev.plymouth.edu/*
makinglinks.uwindsor.ca:8087/mitas/sfxblog*
open-ils.org/blog/*
oss4lib.org/node*
blogs.talis.com/panlibus*
feeds.technorati.com/feed/posts/tag/code4lib*
unalog.com/group/code4lib*
[/code]
…and fed that into the Google Custom Search control panel.
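The steps on either side of the regular expressions can be scripted too. The sketch below pulls feed addresses out of an OPML document and turns a cleaned-up address into the host-plus-path form with a trailing `*` used in the list above. It assumes the OPML stores each feed in an `xmlUrl` attribute on an `outline` element, which is the usual convention; the function names and the sample OPML are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical, trimmed-down OPML; a real Planet feed list would be longer.
SAMPLE_OPML = """<opml version="1.1">
  <body>
    <outline text="DLTJ" xmlUrl="http://dltj.org/feed/"/>
    <outline text="A folder">
      <outline text="Catalogablog" xmlUrl="http://catalogablog.blogspot.com/atom.xml"/>
    </outline>
  </body>
</opml>"""


def feed_urls_from_opml(opml_text: str) -> list[str]:
    """Collect every xmlUrl attribute, including those on nested outlines."""
    root = ET.fromstring(opml_text)
    return [
        outline.attrib["xmlUrl"]
        for outline in root.iter("outline")
        if "xmlUrl" in outline.attrib
    ]


def to_url_pattern(url: str) -> str:
    """Drop the scheme, keep host (with port), path and query, and append '*'."""
    parts = urlsplit(url)
    pattern = parts.netloc + parts.path
    if parts.query:
        pattern += "?" + parts.query
    return pattern + "*"


if __name__ == "__main__":
    for feed_url in feed_urls_from_opml(SAMPLE_OPML):
        print(feed_url)
    print(to_url_pattern("http://www.kentongood.com/?cat=26"))
```

Between these two functions sits the feed-suffix stripping and the manual cleanup, so treat this as scaffolding rather than a one-shot converter.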

Items of note in the Terms of Service


Along the way I found some curious bits in the Google Custom Search Terms of Service. In particular:

1.5 Exclusivity. You agree that, during the Term, Google will be the exclusive provider of Internet search services on the Site. You further understand that Google will provide the Service on a nonexclusive basis, and that Google will continue to customize and provide its services to other parties for use in connection with a variety of applications, including search engine applications.

Section 1.1 defines Site this way:

For purposes of the Terms of Use, “Site” shall mean the Web site or sites on which You place JavaScript or similar programming (“Code”) which renders the Google search box (or other means used by users of the Site (“End Users”) to enter a search query (“Query”)) on the Site (“Search Box”).

One suspects what Google meant was that if you put a Custom Search box on your Site, then you must also use Google for any general internet search you offer there; you can’t have a Google Custom Search box and a Yahoo search box on the same Site, for instance. I imagine that this also effectively locks other internet search providers out of offering the same service. Since Google is first to market, if Yahoo were to come up with a similar service, you couldn’t put up a Google Custom Search and a Yahoo custom search, each pointing to its provider’s index with the same subset of URLs. Since we know that each index contains different content and ranks results with different algorithms, one might imagine that running the same custom search segment over multiple indexes could be a useful thing.

Ah, well — it is still useful. Just go in with your eyes open…

(This post was updated on 29-Oct-2006.)

2 Comments

  1. kevin | July 15, 2008 at 3:55 am | Permalink

    hi,
    Please help me how I can use regular expression in google search box. for example i want to find web sites which are like this: http://www.google*.com
    One of the result is http://www.googlechat.com

  2. the Jester | July 16, 2008 at 2:51 pm | Permalink

    Kevin –

    It sounds to me like what you want is to search the DNS records (see my recent post about DNS vulnerabilities for an overview of DNS). I used to have a link to a site that searched DNS entries, but it looks like it was taken over by a spam site. I can’t find another one — perhaps DLTJ readers will be able to help out.

4 Trackbacks

  1. [...] Earlier I mentioned creating a Google Custom Search for Planet Code4Lib. The Google-supplied markup puts a form on your web page that leads to Google’s server farm. (Alternatively, you can create a custom URL that points to an HTML page at Google which contains the form.) Well, that’s really neat, but not far enough. How about an OpenSearch plugin suitable for Firefox and MSIE7? (Link to install in Firefox and MSIE7) Here is the plugin markup: [...]

  2. Lorcan Dempsey's weblog | October 26, 2006 at 10:19 pm | Permalink

    links from Technorati across the contents of the repositories they list. As they point out, this search is based on whatever Google has indexed of the repository content, which in turn depends on local implementation/configuration details. Peter Murray put up a search of Dan Chudnov’s Planet Code4Lib blogs. Bill Drew has created a Google custom search across his collection of wireless resources: I have been working on my Google custom search engine for WLANs and Libraries. It is available at

  3. Kramer auto Pingback [...] It’s already been pointed out that one should consider the “Terms of Service” for this new service. Apparently you must stick to Google for all your search-box needs if you use this (no mixing and matching with some similar Yahoo type service). Also, there is Google branding required, the service may be pulled at any time, etc. As always, there really is no such thing as a free lunch. [...]

  4. [...] Earlier I mentioned creating a Google Custom Search for Planet Code4Lib. The Google-supplied markup puts a form on your web page that leads to Google’s server farm. (Alternatively, you can create a custom URL that points to an HTML page at Google which contains the form.) Well, that’s really neat, but not far enough. How about an OpenSearch plugin suitable for Firefox and MSIE7? Here is the plugin markup: [...]

From the Disruptive Library Technology Jester (http://dltj.org/), printed on Thursday the 13th of November 2008 at 8:01:52 AM EST (-0500). The URL to this page is http://dltj.org/article/google-custom-search-for-planet-code4lib/

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/us/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.