Possible Resolution to Technorati Update Problem

This article was imported from this blog's previous content management system (WordPress), and may have errors in formatting and functionality. If you find these errors are a significant barrier to understanding the article, please let me know.

Up until about an hour ago, Technorati refused to update its database of postings to DLTJ, and having reached the 31-day point of no updates I was starting to wonder what to do about it. I came up with two theories and put in a fix for each in DLTJ's configuration and theme setup, but in the end I'm not sure whether either one is a definitive solution for anyone else in the same situation. In the spirit of helping out one's neighbors, though, here are the theories and fixes. DLTJ is a standalone (i.e., not hosted) WordPress 2.0.4 installation, so YMMV.

Theory #1: Technorati Doesn't Like Feedburner

I read some blog posts and messages in the FeedBurner forums suggesting that blogs using FeedBurner were causing Technorati to hiccup and not index content. My solution is to let Technorati see the raw feed rather than get redirected to FeedBurner. This is accomplished with additions to the Apache mod_rewrite rules in the .htaccess file.

<IfModule mod_rewrite.c>
RewriteEngine On
# These rules redirect all feed traffic, except that of Technorati and FeedBurner itself, to FeedBurner
RewriteBase /
# Query-string feeds, e.g. /?feed=rss2
RewriteCond %{HTTP_USER_AGENT} !^FeedBurner.*$
RewriteCond %{HTTP_USER_AGENT} !^Technorati.*$
RewriteCond %{QUERY_STRING} ^feed=(feed|rdf|rss|rss2|atom)$
RewriteRule ^(.*)$ http://feeds.dltj.org/DisruptiveLibraryTechnologyJester [R,L]
# Permalink-style feeds, e.g. /feed/ or /feed/atom/
RewriteCond %{HTTP_USER_AGENT} !^FeedBurner.*$
RewriteCond %{HTTP_USER_AGENT} !^Technorati.*$
RewriteRule ^(feed|rdf|rss|rss2|atom)/?(feed|rdf|rss|rss2|atom)?/?$ http://feeds.dltj.org/DisruptiveLibraryTechnologyJester [R,L]
# Legacy feed scripts, e.g. /wp-rss2.php (the dot is escaped so it matches only a literal ".")
RewriteCond %{HTTP_USER_AGENT} !^FeedBurner.*$
RewriteCond %{HTTP_USER_AGENT} !^Technorati.*$
RewriteRule ^wp-(feed|rdf|rss|rss2|atom)\.php http://feeds.dltj.org/DisruptiveLibraryTechnologyJester [R,L]
</IfModule>
If the HTTP User-Agent string begins with FeedBurner or Technorati, none of the rules match and the request falls through to the WordPress-supplied feed mechanisms. Every other client asking for a feed is redirected to FeedBurner.
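
To make it easier to see which requests get redirected, here is a rough PHP sketch that mirrors the decision the rewrite rules make. It is only an illustration: Apache does the real work, the function name should_redirect_to_feedburner is made up for this example, and the path is assumed to arrive without its leading slash, as it does in .htaccess context.

```php
<?php
// Hypothetical helper (not part of WordPress or Apache): returns true when
// the rewrite rules above would send the request to FeedBurner.
function should_redirect_to_feedburner($userAgent, $path, $query = '')
{
    // The two negated RewriteCond lines: agents whose string begins with
    // FeedBurner or Technorati are never redirected.
    if (preg_match('/^(FeedBurner|Technorati)/', $userAgent)) {
        return false;
    }
    // First rule: query-string feeds such as ?feed=rss2, on any path.
    if (preg_match('/^feed=(feed|rdf|rss|rss2|atom)$/', $query)) {
        return true;
    }
    // Second rule: permalink-style feeds such as feed/ or feed/atom/.
    if (preg_match('#^(feed|rdf|rss|rss2|atom)/?(feed|rdf|rss|rss2|atom)?/?$#', $path)) {
        return true;
    }
    // Third rule: legacy scripts such as wp-rss2.php.
    return (bool) preg_match('#^wp-(feed|rdf|rss|rss2|atom)\.php#', $path);
}
```

Calling it with a browser-style agent and the path feed/ returns true, while the same path with an agent string that starts with Technorati returns false, which is the whole point of the exercise. I don't know the exact User-Agent string Technorati sends, so the "begins with Technorati" test is itself an assumption carried over from the rules.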

Theory #2: A Too-Complicated Home Page Confuses Technorati

In addition to looking at the feeds themselves, Technorati will look at the home page of the blog to see if the items still appear as "current" (presumably). About a month ago I put in place a modestly complicated, hand-coded home page that puts an instance of Extended Live Archive front and center. Since this is JavaScript-driven and pushes the list of recent posts pretty far down the HTML page, I wondered if this could be screwing up the Technorati spider. The solution here was to sniff the User-Agent string in the theme's index.php file in order to strip away all of the cruft for Technorati's spider.

// getenv() returns false when the header is absent, so cast to a string
$useragent = (string) getenv("HTTP_USER_AGENT");
if (preg_match("/Technorati/i", $useragent)) {
   // do the un-fancy, barebones stuff for Technorati
} else {
   // do the really nice stuff for everyone else
}
(Remember to surround the appropriate parts of the PHP markup with <?php and ?>...)
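
If the check is needed in more than one template, it could be wrapped in a small function. This is a sketch under my own assumptions: the name is_technorati_agent and the $layout variable are invented for illustration, and the match is the same case-insensitive substring test as above.

```php
<?php
// Hypothetical helper: true when the User-Agent string mentions Technorati.
function is_technorati_agent($userAgent)
{
    // A missing header can show up as false, null, or an empty string;
    // treat all of those as "not Technorati".
    if (!is_string($userAgent) || $userAgent === '') {
        return false;
    }
    return preg_match('/Technorati/i', $userAgent) === 1;
}

// Pick a layout once, then branch on it wherever the theme needs to.
$useragent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$layout = is_technorati_agent($useragent) ? 'barebones' : 'full';
```

The theme's index.php would then test $layout instead of repeating the preg_match call in each place.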

Why I May Never Know If Either of These Is Needed

Shortly after I put these two changes in place (literally, after a week of sending e-mails to Technorati's tech support and two hours after coding these fixes) I got a note from Technorati saying:

Please accept my sincerest apologies for the delay in getting back to you. We've been experiencing a backlog in support and are working hard to address everyone. I've taken a look at the issue regarding picking up your pings for "dltj.org". After making a small adjustment, I've sent our spiders to revisit your page and your blog has been indexed with your most recent posts.

http://technorati.com/blogs/dltj.org/

Everything now appears to be working as it should, but please let us know if you experience any problems in the future. Do not hesitate to contact us if you have any other questions. We apologize for any inconvenience. Thank you for using Technorati!

...and indeed Technorati's index for DLTJ has been updated. So I may never know if it was a problem on my end or Technorati's end.

Figuring that these two changes can't do any harm and might actually be doing some good, I plan to leave them in place for a while and see if DLTJ keeps getting indexed. The first test comes with this posting, which will cause Technorati to be pinged and hopefully this post will be picked up and indexed. If not, it'll be back to the drawing board.