Defining Metadata and Making Metadata Accessible

In preparation for the last webinar of the three-part series "Using RDA: Moving into the Metadata Future", I'm reading again Karen Coyle's "Library Data in a Modern Context" -- the first chapter of Understanding the Semantic Web: Bibliographic Data and Metadata. Right at the start she has a clear and useful definition of this thing we call "metadata."

The most common definition of metadata is “data about data.” This short, catchy definition is worthy of a successful advertising campaign. Unfortunately, it doesn't really help us understand metadata, and is actually somewhat incorrect. A more useful definition is decidedly less snappy, but can help us understand the helpful role that metadata can play in facilitating information access. In fact, a functional definition gives us a viable roadmap for our own studies of metadata utility and quality.

So here it goes—metadata is constructed, constructive, and actionable:

Constructed: Metadata is not found in nature. It is entirely an invention; it is an artificiality.

Constructive: Metadata is constructed for some purpose, some activity, to solve some problem. The proliferation of metadata formats that seem similar on the surface is often evidence of different definitions of needs or of different contexts. We may dream of a universal set of metadata for some set of things, like biological entities, printed books, or a calendar of events, but are likely to be disappointed in practice.

Actionable: The point of metadata is to be useful in some way. This means that it is important that one can act on the metadata in a way that satisfies some needs. ((Coyle, Karen. “Library Data in a Modern Context.” Library Technology Reports 46.1 (2010): 5-13.))

A little further on, Karen focuses on the actionability of metadata. I have a heightened awareness of the need for other-than-visual access to information based on the last few months of activity with my previous employer, so I reread this section with "new eyes" (so to speak):

...today's metadata must be in a form that can be processed by computers, and the sense that it is “actionable” really needs to be interpreted as being “actionable by electronic machines.” Even when the final goal is to display the data to humans in an understandable form, the data will undergo some machine processing on the way to its destination on a screen [or] in printed form or when read aloud by a screen reader.

I added that last part. The transformation of the meaning of the metadata into a visual form is but one possible sensory input across the human-computer divide. It is important to also design interfaces that bring meaning to data by supplying labels to values in alternate ways. For the MARC 300 field, it is the difference between "ix, 74 p. : ill. ; 23 cm" and "9 pages of introductory material followed by 74 numbered pages. Includes illustrations. 23 centimeters high." If the only way to transmit this information were auditory, which of these would you like spoken to you? Or would it be: "eye-ex, seventy four pee. ill. twenty three cem"?

Now let's try to engineer that backwards. Is the auditory version easier to do with:

300    |aix, 74 p. :|bill. ;|c23 cm

or something like this made-up, MODS-like markup:

<physicaldescription>
  <extent>
    <pagination>
       <pages type="introductory">9</pages>
       <pages type="numbered">74</pages>
    </pagination>
    <illustration />
    <height unit="cm">23</height>
  </extent>
</physicaldescription>

With the second, we can produce the auditory version -- or even the abbreviated display version. But it is considerably more difficult to create the auditory version from the first, particularly with the wide variation in punctuation that ISBD allows. The MARC field just isn't machine actionable, which makes it difficult to transform, reuse, and translate that data in another context.
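
To make that concrete, here is a minimal Python sketch, assuming the made-up markup above (the element names are this post's invention, not real MODS). It uses only the standard library's xml.etree.ElementTree. The function names (read_fields, spoken, display) and the to_roman helper are hypothetical, and it assumes every element in the fragment is present except the optional illustration element.

```python
import xml.etree.ElementTree as ET

# The made-up, MODS-like fragment from the post.
FRAGMENT = """
<physicaldescription>
  <extent>
    <pagination>
       <pages type="introductory">9</pages>
       <pages type="numbered">74</pages>
    </pagination>
    <illustration />
    <height unit="cm">23</height>
  </extent>
</physicaldescription>
"""

# Spoken names for unit codes (only the one used here).
UNIT_NAMES = {"cm": "centimeters"}


def to_roman(n):
    """Return n as a lowercase Roman numeral (used for preliminary pages)."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
             (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
             (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, numeral in pairs:
        count, n = divmod(n, value)
        out.append(numeral * count)
    return "".join(out)


def read_fields(xml_text):
    """Pull the labeled values out of the fragment into a plain dict."""
    extent = ET.fromstring(xml_text).find("extent")
    pages = {p.get("type"): int(p.text) for p in extent.iter("pages")}
    height = extent.find("height")
    return {
        "intro": pages["introductory"],
        "numbered": pages["numbered"],
        # An empty element is falsy, so test against None explicitly.
        "illustrated": extent.find("illustration") is not None,
        "height": int(height.text),
        "unit": height.get("unit"),
    }


def spoken(f):
    """Full-sentence version, suitable for a screen reader."""
    parts = [f"{f['intro']} pages of introductory material "
             f"followed by {f['numbered']} numbered pages."]
    if f["illustrated"]:
        parts.append("Includes illustrations.")
    parts.append(f"{f['height']} {UNIT_NAMES.get(f['unit'], f['unit'])} high.")
    return " ".join(parts)


def display(f):
    """Abbreviated, ISBD-style version for visual display."""
    text = f"{to_roman(f['intro'])}, {f['numbered']} p."
    if f["illustrated"]:
        text += " : ill."
    return f"{text} ; {f['height']} {f['unit']}"


fields = read_fields(FRAGMENT)
print(spoken(fields))
print(display(fields))
```

Both outputs come from the same labeled values; neither function has to guess what a piece of punctuation means. Going the other way, from the 300 field to the sentences, would mean writing a parser for every punctuation pattern that turns up in practice.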

I'm reminded too of this recent quote from Jonathan Rochkind: "Of course, our legacy environment is even worse, with the ‘data model’ being supplied by an unholy combination of ISBD ... and MARC...." It would be good to stop doing our data entry in the language of the computer (e.g. MARC). Based on the chat from the first webinar in the series, we wouldn't expect catalogers to type out the XML fragment above. There should be computer-assisted workflows to capture the data and store it with all the required semantics. That XML would be used for machine-to-machine communication and transformation into the output desired by the user -- be it a short-hand visual display or an auditory reading of information.
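
As a rough illustration of such a workflow, the sketch below assumes a cataloging form has already captured four values (introductory pages, numbered pages, whether there are illustrations, and height in centimeters) and lets the software write the markup. The function name build_physical_description and its parameters are hypothetical, and the element names are again the made-up ones from this post.

```python
import xml.etree.ElementTree as ET


def build_physical_description(intro_pages, numbered_pages, illustrated, height_cm):
    """Turn values captured by a (hypothetical) form into the made-up markup."""
    root = ET.Element("physicaldescription")
    extent = ET.SubElement(root, "extent")
    pagination = ET.SubElement(extent, "pagination")
    ET.SubElement(pagination, "pages", type="introductory").text = str(intro_pages)
    ET.SubElement(pagination, "pages", type="numbered").text = str(numbered_pages)
    if illustrated:
        # Presence of the empty element is the whole signal.
        ET.SubElement(extent, "illustration")
    ET.SubElement(extent, "height", unit="cm").text = str(height_cm)
    return ET.tostring(root, encoding="unicode")


# The cataloger supplies values; the software supplies the semantics.
print(build_physical_description(9, 74, True, 23))
```

The cataloger never sees angle brackets; the labels travel with the values because the program attaches them at capture time.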