Last week I posted about a Yahoo Pipes construct that turns a Zotero website library into an RSS feed. As Dan Cohen noted in a Twitter response to the posting, the Zotero team is planning to add an RSS capability in a future release of the website, so this pipe will ultimately be superseded by that capability, but in the meantime it is a handy tool. It was my first full-scale foray into creating a Yahoo Pipes construct from scratch, so I thought it would be useful to document how it works (in case I need to do something similar again). You might find this useful, too, especially the part about how to put a pubDate element into the RSS feed.
The process starts with the two Private Text Input modules at the top right — one each for the Zotero Username and the Zotero User Number. The defaults are set to my values, and they are marked private — meaning that if someone clones this pipe, these values are not carried along.
Directly below is a URL Builder module. The base URL is http://www.zotero.org, and there are three path elements: the first is a connection from the Zotero Username Private Text Input, the second is a connection from the Zotero User Number Private Text Input, and the third is the literal “items”. This builds a URL that looks like http://www.zotero.org/dltj/683/items and that corresponds to the Zotero user’s library items page.
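For readers who think better in code, here is a rough Python equivalent of what the URL Builder does. It is only a sketch: the function name `build_items_url` and its parameters are my own invention for illustration, not anything Yahoo Pipes exposes.

```python
from urllib.parse import quote


def build_items_url(username, user_number, base="http://www.zotero.org"):
    """Join the base URL and the three path elements, as the URL Builder does."""
    # The third path element is the literal "items".
    segments = [username, user_number, "items"]
    # Percent-encode each segment so unusual usernames cannot break the path.
    return base.rstrip("/") + "/" + "/".join(quote(s, safe="") for s in segments)


print(build_items_url("dltj", "683"))
```

With the values from this post, the function returns the same library items URL shown above.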
Starting at the upper left of the diagram, the output of the URL Builder is connected to the URL field of a Fetch Data module. The “Path to Item List” parameter is set to body.div.1.div.div.1.table.tbody.tr, which is a pointer to the portion of the XHTML document that contains the library items. Because the Zotero website is outputting XML (as XHTML), we can use the Fetch Data module and parse the page as if it were an XML document. The Path to Item List is an XPath-like structure that points into the document structure (see note above). The result of this module is a list of items (the table rows in this case) that are processed by the remaining modules.
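One way to picture the dotted path is as a walk through nested dictionaries and lists, where a name selects a child element and a number selects one of several repeated elements. The sketch below works under that assumption; the `resolve` helper and the trimmed-down `page` structure are hypothetical stand-ins, and Yahoo Pipes may represent the parsed page differently internally.

```python
def resolve(node, path):
    """Follow a dot-separated path through nested dicts and lists."""
    for key in path.split("."):
        if isinstance(node, list):
            node = node[int(key)]  # numeric segment picks a repeated element
        else:
            node = node[key]       # named segment picks a child element
    return node


# Hypothetical, heavily trimmed stand-in for the parsed library page.
row_a = {"td": [{"a": {"content": "First saved item"}}]}
row_b = {"td": [{"a": {"content": "Second saved item"}}]}
page = {
    "body": {
        "div": [
            {},
            {"div": {"div": [{}, {"table": {"tbody": {"tr": [row_a, row_b]}}}]}},
        ]
    }
}

# The same path used in the Fetch Data module yields the list of table rows.
rows = resolve(page, "body.div.1.div.div.1.table.tbody.tr")
titles = [resolve(row, "td.0.a.content") for row in rows]
print(titles)
```

The second call uses the row-relative path td.0.a.content, which is the same path the later modules use with “item” added at the front.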
The next module down is a Rename module, where the value of the XPath-like path item.td.0.a.content is copied to the item’s title element. The equivalent XPath, from the root of the “Path to Item List” in the module above, is td/a; note the added “item” at the front and “content” at the end. Specifying the zeroth td element isn’t needed, but it brings symmetry with subsequent modules. The “content” part corresponds to the text node under the a element when viewing this as an XML document.
What follows is a series of Loop modules that act on different parts of the items in the list. The first builds the link element of the item using the String Builder module. The href in the XML is a relative path, so the String Builder prepends the literal “http://www.zotero.org” to the value found in item.td.0.a.href (the href attribute of the anchor element of the first td element). The resulting string is assigned to the link element of each item in the list.
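In code, this first Loop amounts to a small transformation applied to every item. The sketch below assumes each item is a dictionary shaped like the table row described above; the `add_links` name and the sample href value are hypothetical.

```python
def add_links(items, prefix="http://www.zotero.org"):
    """Give every item an absolute link built from its relative href."""
    for item in items:
        relative = item["td"][0]["a"]["href"]  # href of the anchor in the first td
        item["link"] = prefix + relative
    return items


# Hypothetical row; the real href values come from the Zotero page.
sample = [{"td": [{"a": {"href": "/dltj/683/items/1", "content": "Example"}}]}]
print(add_links(sample)[0]["link"])
```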
The second Loop module encapsulates a Date Builder module, and this is the inspiration for writing this post. It took a very long time to figure out how to get the pubDate element into each item of the resulting RSS feed. As it turns out, one cannot simply assign the pubDate element like we did the title element above. Instead, one assigns the timestamp to the y:published element and Yahoo Pipes takes it from there. And it isn’t enough to assign a text string to that element; it has to be a Date type, constructed using the Date Builder module. The Date Builder module is very flexible in what it accepts, and it creates a canonical timestamp form that can be used by other modules. In this case, the Date Builder module takes as a source input the string value found at item.td.2.a.content. Believe me, it took a long time to figure this out, and it was only done by piecing together various suggestions and examples; there doesn’t seem to be any clear documentation about this.
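The string-versus-date distinction is easier to see outside of Pipes. The sketch below parses a date string into a real date object and only then formats it the way an RSS pubDate is written (RFC 822 style). The input format and the choice of UTC are assumptions made for illustration; I have not recorded the exact format the Zotero page uses, and the Date Builder is far more forgiving than `strptime`.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def to_pub_date(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a date string (assumed format) and render it as an RSS pubDate."""
    # Step 1: turn the text into a real date object, as the Date Builder does.
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)  # UTC assumed
    # Step 2: only a real date can be rendered in the RFC 822 style RSS expects.
    return format_datetime(parsed)


print(to_pub_date("2009-06-25 14:30:00"))
```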
The third and fourth Loop modules go together. The third takes the value found in item.td.0.a.span.class and applies a String Regex module to it. The value of that class attribute contains the type of item in the Zotero library, and it takes the form of “img-book” or “img-journalArticle” or “img-conferencePaper”. There are two regular expressions defined: the first removes the “img-” prefix from each value and the second replaces every upper case letter with a space plus the lower case version of the letter. The latter rule turns “journalArticle” into “journal article” (note that the replacement field contains a space before the \L part). The result is assigned to the item.itemType element. This is used in the final Loop module to build an item.description element containing a string such as:
<p>This journal article was saved to my Zotero library.</p>
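The two regular expressions and the final String Builder translate almost directly into Python. This is a sketch of the same idea rather than the exact patterns entered in the pipe; Python has no \L in replacement strings, so a small function does the lower-casing instead. The function names are my own.

```python
import re


def item_type_label(css_class):
    """Turn a class such as 'img-journalArticle' into 'journal article'."""
    # First rule: drop the "img-" prefix.
    without_prefix = re.sub(r"^img-", "", css_class)
    # Second rule: replace each upper case letter with a space plus its lower case form.
    return re.sub(r"[A-Z]", lambda m: " " + m.group(0).lower(), without_prefix)


def describe(css_class):
    """Build the description string used for each feed item."""
    return f"<p>This {item_type_label(css_class)} was saved to my Zotero library.</p>"


for value in ("img-book", "img-journalArticle", "img-conferencePaper"):
    print(describe(value))
```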
That’s all there is to it. Yahoo Pipes applies all of these modules to each of the items in the list retrieved from the Zotero library page and generates the corresponding RSS feed. (This post was updated on 30-Jun-2009.)