These list items are microformat entries and are hidden from view.
https://dltj.org/article/xtf-fedora-3/
Some questions and observations that have come in through mechanisms other than blog comments on the analysis of the XTF/FEDORA integration. I've reproduced those here for the sake of completeness, but also be sure to go back to the first two entries in this series to read the comments there as well.Indiana University's ObservationsAs it turns out, Indiana University is considering much the same path. They have an existing FEDORA-based repository and a number of XTF projects that have been in development for a while. They, too, are looking to put these two technologies together and have a page on their project website with Digital Repository Architecture > Search">IU's observations of an XTF plus FEDORA (plus more!) combination.Incremental IndexingHow will the XTF processor know when new documents are added? Or modified or deleted? From your description it sounds like only manual bulk-indexing of a collection would be supported. This will make it inefficient to update the index if, for instance only one document out of a collection is changed or deleted.A good observation. XTF does this now by querying the index at the start to see what documents it knows about (along with when they were last updated) and then scanning the source directory tree for new and revised objects as compared with that index. While doing that it marks the documents that were query results and at the end it deletes index entries for documents it doesn't find.We'll probably need to mimic the same kind of functionality with a FEDORA interface. Since FEDORA doesn't provide hooks for firing off messages/processes when items are added/modified/deleted from the repository, it'll have to be done in this kind of bulk update.Differentiated IndexingHow feasible is it to search within certain elements of XML documents with this model? For instance, with the EAD documents I end up storing things such as the title, elements with the attribute audience="internal", and subjects as separate Lucene fields in order to make searching for these elements a lot easier. (I just took a look at some of the XTF ocumentation -- I assume you can apply a pre-filter to separate your XML file into different sections for searching).The pre-filter can apply rules to the form of the document that is subsequently indexed in Lucene. As you saw in the documentation, it is normally used to add weight to specific fields or terms, but I suppose it is also useful for suppressing terms from the index entirely. I think if you wanted those suppressed terms to be searchable by internalusers, you would need to create a second index (or "index-name" as XTF calls them) with a different pre-filter.
2006-08-23T23:29:22+00:00
2018-01-03T03:34:57+00:00

XTF and FEDORA — Comments from the Community

Posted on August 23, 2006 2 minute read

× This article was imported from this blog's previous content management system (WordPress), and may have errors in formatting and functionality. If you find these errors are a significant barrier to understanding the article, please let me know.

Some questions and observations that have come in through mechanisms other than blog comments on the analysis of the XTF/FEDORA integration. I've reproduced those here for the sake of completeness, but also be sure to go back to the first two entries in this series to read the comments there as well.

Indiana University's Observations

As it turns out, Indiana University is considering much the same path. They have an existing FEDORA-based repository and a number of XTF projects that have been in development for a while. They, too, are looking to put these two technologies together and have a page on their project website with Digital Repository Architecture > Search">IU's observations of an XTF plus FEDORA (plus more!) combination.

Incremental Indexing

How will the XTF processor know when new documents are added? Or modified or deleted? From your description it sounds like only manual bulk-indexing of a collection would be supported. This will make it inefficient to update the index if, for instance only one document out of a collection is changed or deleted.

A good observation. XTF does this now by querying the index at the start to see what documents it knows about (along with when they were last updated) and then scanning the source directory tree for new and revised objects as compared with that index. While doing that it marks the documents that were query results and at the end it deletes index entries for documents it doesn't find.

We'll probably need to mimic the same kind of functionality with a FEDORA interface. Since FEDORA doesn't provide hooks for firing off messages/processes when items are added/modified/deleted from the repository, it'll have to be done in this kind of bulk update.

Differentiated Indexing

How feasible is it to search within certain elements of XML documents with this model? For instance, with the EAD documents I end up storing things such as the title, elements with the attribute audience="internal", and subjects as separate Lucene fields in order to make searching for these elements a lot easier. (I just took a look at some of the XTF ocumentation -- I assume you can apply a pre-filter to separate your XML file into different sections for searching).

The pre-filter can apply rules to the form of the document that is subsequently indexed in Lucene. As you saw in the documentation, it is normally used to add weight to specific fields or terms, but I suppose it is also useful for suppressing terms from the index entirely. I think if you wanted those suppressed terms to be searchable by internal
users, you would need to create a second index (or "index-name" as XTF calls them) with a different pre-filter.

Social Media Interactions

No reposts were found.

No likes were found.

No webmentions were found.

Share on

Mastodon/Fediverse Twitter Facebook LinkedIn

XTF and FEDORA — Comments from the Community

Indiana University's Observations

Incremental Indexing

Differentiated Indexing

Social Media Interactions

Share on

You may also enjoy

Learnings from the British Library Cybersecurity Report

One Year of Learning 2023

Restoring Obsidian Knowledgebase from MacOS Time Machine at the Command Line

Processing WOLFcon Conference Recordings with FFMPEG

Likes

Reposts

Discussion