Presentation Summary: “Cross-Repository Semantic Interoperability: the MIT SIMILE Project”



Richard Rodgers presented this talk, based on work by him and MacKenzie Smith in the Digital Library Research Group at MIT. The original abstract of the presentation was:

Many questions are raised as previously unreachable digital content is found in and among new repositories--is each repository an island or a separately searchable resource? SIMILE (Semantic Interoperability of Metadata and Information in Unlike Environments) has developed an extensive 'tool chain' for gathering and manipulating data assets. Richard Rodgers and MacKenzie Smith, MIT, will demonstrate how tools developed by the SIMILE project can be used as powerful instruments for the federation, discovery, exploration, and curation of metadata.

The mission of the SIMILE suite of projects is to build tools for data interoperability. Dealing with heterogeneous metadata in repository design and use is a complex challenge. SIMILE's position is that no matter which single metadata schema you select at the start of a repository project, you run into trouble as subsequent collections arrive with their own semantically rich, collection-specific schemas. This puts the repository designer between a rock and a hard place. The rock is semantic reduction: metadata crosswalks are "lossy", so mapping everything into one schema discards meaning. The hard place is scalability: if the repository embraces all of the uniqueness of the incoming metadata, does one have to construct separate queries for each metadata schema?
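To make the "lossy crosswalk" point concrete, here is a minimal sketch in Python. The field names and the mapping table are hypothetical, chosen only for illustration; they are not taken from the talk or from any SIMILE tool.

```python
# Hypothetical crosswalk from a collection-specific schema to a simpler
# common schema. Two distinct source fields map to the same target field,
# and one source field has no target at all.
CROSSWALK = {
    "title": "title",
    "composer": "creator",
    "librettist": "creator",
}


def crosswalk(record):
    """Map (field, value) pairs to the common schema.

    Fields without a mapping are silently dropped, which is one of the
    ways a crosswalk loses information.
    """
    out = {}
    for field, value in record:
        target = CROSSWALK.get(field)
        if target is None:
            continue  # no equivalent in the common schema
        out.setdefault(target, []).append(value)
    return out


record = [
    ("title", "The Magic Flute"),
    ("composer", "Mozart"),
    ("librettist", "Schikaneder"),
    ("key", "E-flat major"),
]
reduced = crosswalk(record)
```

After the mapping, both names sit under "creator" with no way to tell composer from librettist, and the "key" field is gone entirely. Keeping every source field instead avoids the loss, but then each schema needs its own queries.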

SIMILE uses RDF and other semantic web technologies as part of the solution to the heterogeneous metadata problem. Statements about documents are inherently more mixable than the documents themselves, and expressing those statements in RDF makes them easier to mix than trying to harmonize the metadata schemas. RDF represents data as a graph, not as a table (RDBMS) or a tree (XML). The tools created by SIMILE fall into four categories:

  • Convert: RDFizers (for converting structured data to RDF, such as MARC into RDF), Babel
  • Visualize: Gadget (a data graph viewer for XML; it constructs all of the XPaths in a document and projects them, along with their frequency of occurrence, as a way to look at XML documents at a structural level), Welkin (the same idea as Gadget, but for RDF)
  • Browse: Longwell (see below), Piggy Bank (Firefox plugin; RDFizes an HTML page by using JavaScript to scrape metadata from websites and putting it into your personal repository), Semantic Bank (a way to publish RDF and create communities of RDF content)
  • Lightweight UI: Timeline and Exhibit widgets (highly interactive faceted browse displays that divide the processing between client and server through the use of AJAX)
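The structural view described for Gadget (every path in an XML document, with how often it occurs) can be approximated in a few lines of Python. This is only a sketch of the idea using the standard library, not Gadget's actual implementation; the function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_frequencies(xml_text):
    """Count how often each element and attribute path occurs in a document."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        counts[path] += 1
        # Record attributes with XPath-style "@name" steps.
        for name in elem.attrib:
            counts[f"{path}/@{name}"] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return counts


# Hypothetical sample document.
SAMPLE = """<records>
  <record id="1"><title>A</title><author>X</author><author>Y</author></record>
  <record id="2"><title>B</title></record>
</records>"""

freq = path_frequencies(SAMPLE)
for path, count in sorted(freq.items()):
    print(count, path)
```

Even on this tiny sample the counts show the shape of the data: two records, each with a title, and authors that repeat within some records and are absent from others.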

Richard went into detail on Longwell, a faceted browser web application. Using an RDF triple-store backend (Sesame), Longwell presents data in a configurable, extensible user interface. One of the interesting technologies it uses is the W3C-defined Fresnel Display Vocabulary. The RDF world has had no layout and styling language equivalent to CSS. The W3C thought it could spur development of RDF tools if there were a way of expressing display instructions in RDF itself, hence the Fresnel Display Vocabulary. Longwell has been embedded into DSpace as an optional advanced search engine called "DWell".
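As a rough illustration of why statements mix well and how faceted browsing sits on top of them, the sketch below treats RDF-style statements as plain (subject, predicate, object) tuples in Python. It is not Longwell's code and does not use Sesame; the identifiers, the "course:audience" property, and the helper names are all hypothetical.

```python
from collections import Counter

# Hypothetical statements from two collections that use different vocabularies.
COLLECTION_A = {
    ("ex:doc1", "dc:type", "Thesis"),
    ("ex:doc1", "dc:language", "en"),
    ("ex:doc2", "dc:type", "Thesis"),
    ("ex:doc2", "dc:language", "de"),
}
COLLECTION_B = {
    ("ex:doc3", "dc:type", "Article"),
    ("ex:doc3", "dc:language", "en"),
    ("ex:doc3", "course:audience", "Student"),
}

# Merging two graphs of statements is a set union; no schema has to be
# reduced to fit the other.
GRAPH = COLLECTION_A | COLLECTION_B


def select(graph, restrictions):
    """Return the subjects that satisfy every (property, value) restriction."""
    subjects = {s for s, _, _ in graph}
    for prop, value in restrictions.items():
        subjects &= {s for s, p, o in graph if p == prop and o == value}
    return subjects


def facet_counts(graph, subjects, prop):
    """Count the values of one property across the currently selected subjects."""
    return Counter(o for s, p, o in graph if s in subjects and p == prop)


english = select(GRAPH, {"dc:language": "en"})
type_counts = facet_counts(GRAPH, english, "dc:type")
```

Here the collection-specific "course:audience" statement survives the merge untouched, and it can be offered as one more facet for the items that have it without affecting the ones that do not.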

Update (20070129T1646): see also Dorothea Salo's summary.

Source: Caveat Lector » SIMILE
Address : http://cavlec.yarinareth.net/archives/2007/01/28/simile/
Date Visited: Mon Jan 29 2007 16:43:35 GMT-0500 (EST)

The text was modified to update a link from http://cavlec.yarinareth.net/archives/2007/01/28/simile/ to http://web.archive.org/web/20081121161752/http://cavlec.yarinareth.net/archives/2007/01/28/simile/ on August 22nd, 2013.