I wanted to mess around with Google’s new Custom Search Engine feature, and while casting about for a list of URLs to feed it, I thought I’d try the list of blogs at Planet Code4Lib. As it turns out, this might be a modestly useful search if you remember reading something from one of the code4lib bloggers but can’t remember which one. The exercise was pretty fun, and here is the result:
To build it, I started with the Planet Code4Lib OPML feed and ran some regular expression transformations against it, replacing each of the following matches with an empty string. (I used BBEdit on the Mac for this one-off, but it could probably be automated to a certain degree with a Perl script.)
/feed/?(rss|atom)?/?$
(\?|\&|\&)feed=(atom|rss2)$
(\?|\&(amp;)?)feed=(atom|rss2)(\&(amp;)?)
/?(wp-rss2.php|rss|index.*|atom.*|rdf)[^/\r]*$
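Since this step could be scripted, here is a minimal sketch in Python rather than Perl. It assumes the OPML lists each feed in the standard `xmlUrl` attribute of an `<outline>` element. The three sample feed URLs are illustrative guesses, not the actual Planet Code4Lib entries. The four patterns are copied as written and applied in order, and each match is replaced with an empty string, mirroring the BBEdit pass.

```python
import re
import xml.etree.ElementTree as ET

# The four patterns from the post, applied in order; each match is
# replaced with an empty string.
PATTERNS = [
    re.compile(r"/feed/?(rss|atom)?/?$"),
    re.compile(r"(\?|\&|\&)feed=(atom|rss2)$"),
    re.compile(r"(\?|\&(amp;)?)feed=(atom|rss2)(\&(amp;)?)"),
    re.compile(r"/?(wp-rss2.php|rss|index.*|atom.*|rdf)[^/\r]*$"),
]

# Hypothetical sample OPML; the real Planet Code4Lib file is not reproduced here.
SAMPLE_OPML = """<?xml version="1.0"?>
<opml version="1.1">
  <head><title>Sample</title></head>
  <body>
    <outline text="DLTJ" xmlUrl="http://dltj.org/feed/"/>
    <outline text="inkdroid" xmlUrl="http://www.inkdroid.org/journal/feed/"/>
    <outline text="Catalogablog" xmlUrl="http://catalogablog.blogspot.com/atom.xml"/>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Yield the xmlUrl attribute of every <outline> element, in document order."""
    root = ET.fromstring(opml_text)
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            yield url


def strip_feed_suffix(url):
    """Remove feed-specific path or query fragments so the URL points at the blog."""
    for pattern in PATTERNS:
        url = pattern.sub("", url)
    return url


if __name__ == "__main__":
    for url in feed_urls(SAMPLE_OPML):
        print(strip_feed_suffix(url))
```

Run against the sample, this prints `http://dltj.org`, `http://www.inkdroid.org/journal` and `http://catalogablog.blogspot.com`. Some feeds (for example, ones with extra query parameters) would still need the manual cleanup the post mentions.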
After a minimal amount of manual cleanup, I ended up with this list:
catalogablog.blogspot.com/*
www.wallandbinkley.com/quaedam*
maisonbisson.com/blog*
www.blyberg.net/*
use.perl.org/~LTjake/journal*
foam.lib.muohio.edu/blog/*
schenizzle.wordpress.com/*
onebiglibrary.net/node*
weblog.kevinclarke.info/*
feeds.feedburner.com/DanCohen*
www.librarywebchic.net/wordpress*
www.ecorrado.us/*
beta.blogger.com/feeds/3338174527262061848/posts/full*
textsfornothing.com/blog/*
orweblog.oclc.org/*
feeds.feedburner.com/hublog*
blog.ryaneby.com/*
meredith.wolfwater.com/wordpress*
fawcett.blogspot.com/*
kados.org/cgi-bin/blosxom.cgi*
lisletters.blogspot.com/*
digitallibrarian.org/*
www.patronizing.org/*
www.lackoftalent.org/*
www.kentongood.com/?cat=26*
www.epistemographer.com/*
blogdriverswaltz.com/*
outgoing.typepad.com/outgoing*
shelter.nu/blog*
interoperating.info/mark/blog/1*
www.tomkeays.com/blog*
lxming.blogspot.com/*
john.mignault.net/blog*
infomotions.com/musings/musings*
dltj.org/*
benostrowsky.wordpress.com/*
www.daveyp.com/blog*
efoundations.typepad.com/efoundations*
oregonstate.edu/~reeset/blog*
librarycog.uwindsor.ca:8087/artblog/librarycog*
inquiringlibrarian.blogspot.com/*
www.ibiblio.org/bess/*
thedil.wordpress.com/*
cavlec.yarinareth.net/archives/category/computers*
cavlec.yarinareth.net/archives/category/librariana*
coffeecode.net/feeds/categories/16-Coding*
dilettantes.code4lib.org/*
umlaut.library.gatech.edu/blog/*
www.inkdroid.org/journal*
techessence.info/node*
techessence.info/blog/1*
roytennant.com/*
vielmetti.typepad.com/vacuum*
dystmesis.com:8081/*
weibel-lines.typepad.com/weibelines*
dataunbound.wordpress.com/*
q6.oclc.org/*
del.icio.us/rss/tag/code4lib*
www.frbr.org/*
libdev.plymouth.edu/*
makinglinks.uwindsor.ca:8087/mitas/sfxblog*
open-ils.org/blog/*
oss4lib.org/node*
blogs.talis.com/panlibus*
feeds.technorati.com/feed/posts/tag/code4lib*
unalog.com/group/code4lib*
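Part of that manual cleanup can also be sketched. Judging from the list above, each entry is a blog URL with its scheme dropped and `*` appended, with a `/` supplied when there is no path. The hypothetical helpers below (`to_cse_pattern`, `to_cse_patterns`) reconstruct that rule and drop duplicates. They are an assumption inferred from the list, not the method actually used.

```python
from urllib.parse import urlsplit


def to_cse_pattern(url):
    """Hypothetical helper: turn a blog URL into a host/path* pattern."""
    parts = urlsplit(url)
    path = parts.path or "/"  # bare hosts become "host/*"
    query = "?" + parts.query if parts.query else ""  # keep e.g. "?cat=26"
    # netloc keeps any port, e.g. "dystmesis.com:8081"
    return f"{parts.netloc}{path}{query}*"


def to_cse_patterns(urls):
    """Convert URLs to patterns, dropping duplicates while keeping first-seen order."""
    seen = set()
    patterns = []
    for url in urls:
        pattern = to_cse_pattern(url)
        if pattern not in seen:
            seen.add(pattern)
            patterns.append(pattern)
    return patterns


if __name__ == "__main__":
    blogs = [
        "http://dltj.org",
        "http://dltj.org/",
        "http://www.inkdroid.org/journal",
        "http://www.kentongood.com/?cat=26",
    ]
    # One pattern per line, the same shape as the list above.
    print("\n".join(to_cse_patterns(blogs)))
```

This yields `dltj.org/*`, `www.inkdroid.org/journal*` and `www.kentongood.com/?cat=26*`, which match the corresponding entries above. Whether the control panel expects exactly one pattern per line is also an assumption.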
…and fed that into the Google Custom Search control panel.
Items of note in the Terms of Service
Along the way I found some curious bits in the Google Custom Search Terms of Service. In particular:
1.5 Exclusivity. You agree that, during the Term, Google will be the exclusive provider of Internet search services on the Site. You further understand that Google will provide the Service on a nonexclusive basis, and that Google will continue to customize and provide its services to other parties for use in connection with a variety of applications, including search engine applications.
Section 1.1 defines Site this way:
One suspects that what Google meant was this: if you put a Custom Search Box on your Site, then you must also use Google for any general Internet search you offer there. You can’t have a Google Custom Search Box and a Yahoo search box on the same Site, for instance. I imagine that this also effectively locks other Internet search providers out of offering the same service. Since Google is first to market, if Yahoo were to come up with a similar service, you couldn’t put a Google custom search and a Yahoo custom search on the same Site, each pointing at its own provider’s index with the same subset of URLs. Since we know that each index contains different content and ranks results with different algorithms, one might imagine that running the same custom search segment across multiple indexes could be useful.
Ah, well. It is still useful. Just go in with your eyes open…
The text was modified on January 13th, 2011, to update a link from http://www.google.com/coop/cse/overview to http://www.google.com/cse/. (This post was updated on 29-Jan-2013.)