Michael J. Giarlo wrote a very nice summary of my FEDORA trilogy (only three parts so far — I think there are more good things to say about FEDORA; and besides, I like Douglas Adams’ concept of what a trilogy should be), and added a piece that I hadn’t considered:
- Having one’s objects stored as XML on the filesystem also opens up opportunities to see how tools which act thereupon might be glued into the repository infrastructure. One such example might be for an XML-aware search engine (such as amberfish, Lucene, or Zebra). Since you’ve got low-level access to these files, it would be fairly simple to tack on a search & indexing system that is independent of your choice of repository.
Wow. Now that is a powerful concept. Not only do we not need FEDORA in the future to read our digital objects, we don't need FEDORA now to read our digital objects. It would take a little digging to confirm, but if Fedora really does serialize the XML back to the FOXML file on disk for every change made to it, then one really could use the FOXML files on disk as a surrogate for the FEDORA application itself. After having conversations with Dan Davis about what it means to live in a Service-Oriented Architecture, I yearn for a time when FEDORA and other OhioLINK applications can send messages to each other. But for now, simply having another application that looks at the modification timestamps of the files to see whether they have changed and should be processed in some way is a very interesting idea. It makes sense, for instance, as a way to feed new or modified objects into an indexing application or a notification application. Or to 'rsync' a backup hot-spare server with content from the live server.
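A timestamp-driven feed into an indexer might look something like the minimal Python sketch below. It is not how Fedora or any particular search engine works; it assumes the FOXML files sit under one directory tree (the directory you pass in is hypothetical), that the FOXML root element is `digitalObject` in the `info:fedora/fedora-system:def/foxml#` namespace with a `PID` attribute, and that titles appear as Dublin Core `dc:title` elements. The plain dictionary stands in for a real engine such as Lucene or Zebra.

```python
import os
import xml.etree.ElementTree as ET

# Assumed namespaces: FOXML (Fedora 2.x) and Dublin Core elements.
FOXML_NS = "info:fedora/fedora-system:def/foxml#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def changed_files(root_dir, last_run):
    """Yield paths of files under root_dir modified after last_run (epoch seconds)."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_run:
                yield path


def extract_record(path):
    """Return (PID, [dc:title texts]) for a FOXML file, or None if it is not FOXML."""
    root = ET.parse(path).getroot()
    if root.tag != f"{{{FOXML_NS}}}digitalObject":
        return None
    pid = root.get("PID")
    titles = [el.text.strip() for el in root.iter(f"{{{DC_NS}}}title") if el.text]
    return pid, titles


def update_index(index, root_dir, last_run):
    """Add title words of changed objects to a simple word -> set-of-PIDs index."""
    for path in changed_files(root_dir, last_run):
        try:
            record = extract_record(path)
        except (ET.ParseError, OSError):
            continue  # skip unreadable or non-well-formed files
        if record is None or record[0] is None:
            continue
        pid, titles = record
        for title in titles:
            for word in title.lower().split():
                index.setdefault(word, set()).add(pid)
    return index
```

A caller would note the time at the start of each scan and pass it as `last_run` on the next one, so files changed mid-scan are picked up later. The sketch only adds entries; a real indexer would also drop stale words when an object is modified or purged.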
You hit all of the points exactly right in your summary, Michael, and thanks for triggering a new line of thinking about how to exploit FEDORA to its fullest potential.





3 Comments
Thanks for the props, Peter.
In fact, Fedora does serialize the XML back to the FOXML file on disk for every change, including appropriate changes to the administrative metadata section for audit trails. This was a requirement for us at Rutgers as one of our goals for the repository was digital preservation.
A tiny caveat: this was the case, at least, under Fedora 1.2, when it used Fedora-METS. I'm 99% confident that it is still the case under Fedora 2.x with FOXML; it seems to be a core feature of Fedora that the serialized XML objects are complete. Sure, there's stuff in the database and the triplestore, but that's all extracted from the XML. We'd been using Fedora since about 0.9 or so, well before much of its advanced functionality appeared in official releases, so we had lots of experience going under the hood of Fedora (especially for searching and repository synchronization).
That's part of its beauty: the stuff under the hood is relatively approachable, and it doesn't have the black-box feel that other information systems do.
Hopefully we’ll start to see others blog about their Fedora experiences. Maybe this would be a good topic (for Sandy or Thorny?) at the Fedora Users Conference — building the Fedora community through the Wiki, the #fedora-users IRC channel, and through bloggers.
Since we keep copies of the XML files for our externally referenced datastreams in the file system, we have built our discovery interface at UVA using XPAT outside of Fedora, with Cocoon as the middleware. We have very rich metadata extracted from our objects programmatically, much richer than what's in Fedora; we also provide full-text searching of the TEI or EAD that way. For example, a user searches the XPAT index through the web interface and finds an object he wants to see. When he clicks on it, a parameterized URL is passed through Cocoon, which formats a call for Fedora to disseminate the object.
I should clarify that we don’t do this with the raw Fedora XML objects, but with the externally managed datastreams. I think the principle is much the same and could work with the raw objects.
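The hand-off from search hit to Fedora could be sketched roughly as below. This is a hypothetical illustration, not UVA's Cocoon pipeline: it assumes Fedora 2.x API-A-LITE style URLs of the form `/fedora/get/{PID}` and `/fedora/get/{PID}/{dsID}`, a made-up base URL, and a search hit represented as a dict with a `pid` key.

```python
from urllib.parse import quote

# Hypothetical base URL for a Fedora 2.x server; adjust for a real install.
FEDORA_BASE = "http://localhost:8080/fedora"


def dissemination_url(pid, ds_id=None, base=FEDORA_BASE):
    """Build an API-A-LITE style GET URL for an object or one of its datastreams.

    Assumes the /get/{pid} and /get/{pid}/{ds_id} URL patterns.
    """
    parts = [base.rstrip("/"), "get", quote(pid, safe=":")]
    if ds_id:
        parts.append(quote(ds_id, safe=""))
    return "/".join(parts)


def url_for_hit(hit, ds_id=None):
    """Map a search-engine hit (assumed: a dict with a 'pid' key) to a Fedora URL."""
    return dissemination_url(hit["pid"], ds_id)


print(url_for_hit({"pid": "demo:1"}, "DC"))
# prints http://localhost:8080/fedora/get/demo:1/DC
```

Keeping the search index and the URL builder separate mirrors the point of the comment: the index can live entirely outside Fedora, and Fedora is only consulted once the user picks an object.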