One of the DRC developers had a question recently that sparked a discussion about what to do with collections of objects. In order to answer the question of how to represent the notion of a collection within the repository, we're going to have to get pretty heavy into RDF: the Resource Description Framework. RDF is a language created by the World Wide Web Consortium (W3C) "for representing information about resources in the World Wide Web." If you already know about RDF -- or just want to see what the proposed solution is -- you can skip down to the "RDF for Collections in FEDORA" heading.
By way of preface, I have to say that I'm increasingly uncomfortable with the word "collection" because it has become so overloaded in library usage, and, like Carl Lagoze, I prefer the term "aggregation" to describe in a general sense what we think a collection is and what it could become. I probably bounce back and forth between the terms here, but am aiming to use "aggregation" and "aggregation object" more often.
I'm going to be pulling a lot of examples and language from the "RDF Primer", which I would recommend reading at some point. It is a very long, dense document, but if you can get through it you'll have a very good understanding of what RDF is and what it does for us.
The Primer describes RDF this way: "It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource.... RDF is based on the idea that the things being described have properties which have values, and that resources can be described by making statements ... that specify those properties and values."
There are three parts to an RDF statement about an object. "[T]he part that identifies the thing the statement is about (the Web page in this example) is called the subject. The part that identifies the property or characteristic of the subject that the statement specifies (creator, creation-date, or language in these examples) is called the predicate, and the part that identifies the value of that property is called the object."
These components make up what is called an "RDF triple." In tabular form, an RDF triple is conventionally written in the order subject, predicate, object. To represent RDF statements in a machine-processable way, RDF uses the Extensible Markup Language [XML]. RDF defines a specific XML markup language, referred to as RDF/XML, for use in representing RDF information, and for exchanging it between machines.
For instance, imagine trying to state the (nominally, Dublin Core) descriptive metadata about a web page called http://www.example.org/index.html. In natural language, the descriptive elements could be:
In tabular form, this could look like:
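As a rough sketch of the tabular form, the statements about the page can be held as subject/predicate/object rows in Python. The property values are my own stand-ins, borrowed from the Primer's running example (a page created by John Smith on August 16, 1999, in English), and the `Triple` class and `as_table` helper are hypothetical names made up for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
PAGE = "http://www.example.org/index.html"

# Assumption: illustrative values only, not the post's original metadata.
triples = [
    Triple(PAGE, DC + "creator", "John Smith"),
    Triple(PAGE, DC + "date", "1999-08-16"),
    Triple(PAGE, DC + "language", "en"),
]


def as_table(rows):
    """Render triples as fixed-width rows in subject, predicate, object order."""
    all_rows = [Triple("Subject", "Predicate", "Object"), *rows]
    widths = [max(len(row[i]) for row in all_rows) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in all_rows
    )


print(as_table(triples))
```

Each row is one complete statement, so adding another property of the page just means adding another row with the same subject.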
In XML, this could look like:
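One way to produce RDF/XML for such a description is with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch, not a full RDF toolkit: the `describe` helper is a hypothetical name, and the property values are stand-ins taken from the Primer's example rather than real metadata.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe(subject, properties):
    """Build an rdf:RDF element holding one rdf:Description of `subject`."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject}
    )
    # Each (name, value) pair becomes one Dublin Core child element,
    # i.e. one predicate/object pair for the shared subject.
    for name, value in properties:
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return root


# Assumption: illustrative values only.
root = describe(
    "http://www.example.org/index.html",
    [("creator", "John Smith"), ("date", "1999-08-16"), ("language", "en")],
)
ET.indent(root)  # pretty-printing; requires Python 3.9 or later
rdf_xml = ET.tostring(root, encoding="unicode")
print(rdf_xml)
```

The subject appears once, in the `rdf:about` attribute, and each child element of `rdf:Description` carries one predicate and its value.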
Keep in mind, though, that we expressed the predicate here as Dublin Core; the predicate can be anything -- even something you make up!
RDF for Collections in FEDORA
RDF is used throughout FEDORA -- in fact, the Dublin Core properties can be (and in our FEDORA configuration, are) expressed as RDF triples in an internal database and can be searched as such. But RDF triples can express more than just attributes of an object -- they can also express /relationships/ between objects. There is a whole section of the FEDORA docs called "Fedora Digital Object Relationships" that goes into more detail. Quotations and examples in this section are drawn from that document.
"Fedora digital objects can be related to other Fedora objects in many ways. For example there may be a Fedora object that represents a collection and other objects that are members of that collection. Also, it may be the case that one object is considered a part of another object, a derivation of another object, a description of another object, or even equivalent to another object."
FEDORA comes with a list of common relationships between objects, and other community or user-defined relationships may also be asserted. These relationships can be expressed in RDF notation:
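As a hedged sketch of what such a statement can look like, the Python below builds the RDF/XML for a single membership relationship between the objects "drc:101" and "drc:100". I am assuming the `isMemberOf` relationship and the `info:fedora/fedora-system:def/relations-external#` namespace from the Fedora relationship ontology as I understand it; check both against the relationships document for your FEDORA version. The `relate` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: namespace of Fedora's built-in relationship ontology.
REL_NS = "info:fedora/fedora-system:def/relations-external#"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("rel", REL_NS)


def relate(subject_pid, relationship, object_pid):
    """Build RDF/XML asserting `subject_pid relationship object_pid`."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{subject_pid}"},
    )
    # The object of the statement is another repository object, so it is
    # referenced by URI in rdf:resource rather than written as text.
    ET.SubElement(
        description,
        f"{{{REL_NS}}}{relationship}",
        {f"{{{RDF_NS}}}resource": f"info:fedora/{object_pid}"},
    )
    return root


rels_ext = relate("drc:101", "isMemberOf", "drc:100")
ET.indent(rels_ext)  # pretty-printing; requires Python 3.9 or later
print(ET.tostring(rels_ext, encoding="unicode"))
```

Unlike the Dublin Core example, the object here is a reference to another object, which is what makes the triple a relationship and not just an attribute.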
"drc:100" is an aggregation object (otherwise known as a "collection object", but I've learned from others in the FEDORA community that "collection" is too loaded of a word) of which "drc:101" is a member. To put it in terms that we may be familiar with:
- drc:100 is the aggregation object for the "Charles E. Frohman Collection"
- drc:101 is a digital image of a photograph with the title "Work Crew" that is part of the Charles E. Frohman collection
- drc:101 is a digital image contributed by member institution "mu3ug"
So the task, I believe, is to compare the pre-loaded set of relationships against the existing relationships in the DMC, and then to define any unique relationships (such as "isFromInstitution") that we want to express about our objects.
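To make that concrete, here is a small sketch of how a built-in relationship and a locally defined one could sit side by side as triples and be queried the same way. The `http://example.org/drc/relations#` namespace is purely hypothetical, as is the choice to record the institution as a plain code instead of as an object of its own; the `subjects` helper is also made up for illustration.

```python
DC = "http://purl.org/dc/elements/1.1/"
# Assumption: namespace of Fedora's built-in relationship ontology.
REL = "info:fedora/fedora-system:def/relations-external#"
# Hypothetical namespace for relationships we would define ourselves.
DRC = "http://example.org/drc/relations#"

triples = {
    ("info:fedora/drc:100", DC + "title", "Charles E. Frohman Collection"),
    ("info:fedora/drc:101", DC + "title", "Work Crew"),
    ("info:fedora/drc:101", REL + "isMemberOf", "info:fedora/drc:100"),
    # Assumption: the institution is stored as a literal code.
    ("info:fedora/drc:101", DRC + "isFromInstitution", "mu3ug"),
}


def subjects(predicate, obj):
    """Return every subject that has the given predicate and object."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


members = subjects(REL + "isMemberOf", "info:fedora/drc:100")
from_mu3ug = subjects(DRC + "isFromInstitution", "mu3ug")
print(members)
print(from_mu3ug)
```

The point of the sketch is only that a custom predicate behaves like any other: once it has a namespace and a name, the same kind of lookup answers both "what is in this aggregation?" and "what did this institution contribute?"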