A review of metadata: a survey of current resource description formats
Work Package 3 of Telematics for Research project DESIRE (RE 1004)
DESIRE: Peer Review Report

Project Number: RE 1004 (RE)
Project Title: DESIRE - Development of a European Service for Information on Research and Education
Deliverable Number: D3.2
Deliverable Title: Specification for resource description methods: a review of metadata: a survey of current resource description formats
Review Method: Report Reading

Principal Reviewer:
Name: Tony Gill
Address: Surrey Institute of Art & Design, Farnham, Surrey GU9 7DS, UK
Email: tony@adam.ac.uk
Telephone: +44 (0)1252 722441
Fax: +44 (0)1252 712925
Credentials: Programme Leader: ADAM & VADS. The Art, Design, Architecture & Media Information Gateway (ADAM) is an Access to Network Resources project of the Electronic Libraries programme. The Visual Arts Data Service will store curated visual arts resources and resource descriptions.

Summary (1 = poor, 5 = excellent):
Relevant: 5
State-of-Art: 4
Meets Objectives: 4
Clarity: 3
Value to Users: 5

Specific Criticisms:
1. Small number of unsubstantiated assertions made
2. Small number of excessive generalisations made
3. Some terminology used without adequate definition
4. Terms associated with specific metadata formats used inappropriately

Developer Response: (developer's response given to general comments below)
(Within this section Developer responses are italicised)
The Survey document attempts to provide background information about the pertinent issues to consider when selecting a metadata format for implementation, and consistently structured outline descriptions of significant metadata standards initiatives to date.
The document is split into two main sections: Part I is a discursive overview of metadata and the general issues relating to the description of networked resources for a variety of purposes, whereas Part II provides a more structured, directory-style description of the key metadata initiatives worldwide.
Part I provides a generally coherent and accurate summary of the
issues, although it is somewhat terse in places, with certain
passages assuming a high degree of prior knowledge on the part
of the reader (see Clarity, below). There are also a small number
of generalisations and unsubstantiated assertions that, whilst
not necessarily disputed by this reviewer, possibly warranted
more detailed discussion. For example:
"It is unlikely that some monolithic metadata format will
be universally used. This is for a number of more or less well
known reasons." (page 5)
Some brief explanation of these reasons would be helpful.
(Explanation now included.)
"Newer approaches based on manual descriptions have initially
tended to focus on servers, and not describe particular information
objects on those servers [..]. The subject information gateways
fall into this category." (page 6)
The scope of the term 'subject information gateways' should be defined in this context before making this type of generalisation, since there undoubtedly exist subject-based information services that do not fall into the category as described.
(Definition now included.)
There is also some apparent inconsistency in the discussion of
the three-band model for classifying metadata formats. Band one,
a conceptual class of metadata format postulated in the review,
is described as being "relatively unstructured data, typically
automatically extracted from resources and indexed for searching."
The review asserts that "the data has little explicit semantics
and does not support searching by field.", yet also states
that "there are moves to develop a shared format for exchange,
perhaps based on SOIF." Since the Summary Object Interchange
Format "is based on simple attribute-value pair elements",
it should support searching by field.
(Wording has been changed in the text to aid clarity. The reviewer
has perhaps not taken into account the flexibility of SOIF, which
can be used for records with very little structure. In addition,
'searching by field' implies some level of delineation of the
semantic content of a record over and above the two or three
attribute-value pairs that would be typical of a Band One record.)
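The point at issue can be illustrated with a minimal Python sketch that parses an attribute-value record and searches it by field. It assumes a simplified SOIF-like syntax (a typed header carrying a URL, followed by `Name{byte-count}:` lines); the sample record, its URL and the helper names `parse_soif` and `search_by_field` are hypothetical, and multi-line values, which depend on the byte count, are not handled.

```python
import re

# Hypothetical record in a simplified SOIF-like syntax:
#   @TYPE { url
#   Attribute-Name{byte-count}:<TAB>value
#   }
SAMPLE = (
    "@FILE { http://www.example.org/report.html\n"
    "Title{15}:\tMetadata Review\n"
    "Author{9}:\tTony Gill\n"
    "}\n"
)

# One attribute line: name, declared byte count, tab, single-line value.
ATTRIBUTE = re.compile(r"^([\w-]+)\{(\d+)\}:\t(.*)$")


def parse_soif(text):
    """Return (url, attributes) for one record with single-line values."""
    lines = text.strip().splitlines()
    header = re.match(r"^@(\w+) \{ (\S+)$", lines[0])
    if header is None:
        raise ValueError("not a SOIF-style record")
    attributes = {}
    for line in lines[1:]:
        match = ATTRIBUTE.match(line)
        if match:
            # The byte count is ignored in this simplified sketch.
            name, _size, value = match.groups()
            attributes[name.lower()] = value
    return header.group(2), attributes


def search_by_field(records, field, term):
    """Return the URLs of records whose named field contains the term."""
    term = term.lower()
    return [url for url, attrs in records
            if term in attrs.get(field.lower(), "").lower()]


records = [parse_soif(SAMPLE)]
print(search_by_field(records, "author", "gill"))
```

A record carrying only two or three attributes parses equally well, so field searching of this kind is only as useful as the fields actually present, which is consistent with the developers' remark about sparsely structured Band One records.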
The three-band model creates additional difficulties, since some of the other formats do not conform well to the defining characteristics of their class; for example, Alta Vista, a popular web crawler using a metadata format of the type described in band one, supports limited searching by field using HTML tags inherent in the resource itself. Similarly, Dublin Core records do not fully conform to the description of band two metadata formats, since they offer a relatively straightforward mechanism for describing relationships between objects. Overall, the three-band model appears to be somewhat artificial, and does not appear to add much value or clarity to the discussion.
(Any one format may not have all the characteristics of the
band in which it is placed, and a note to this effect has been
added to the text. In a number of discussions this grouping has
proved beneficial in identifying the differences and similarities
between formats.)
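The kind of limited field searching attributed to Alta Vista above can be sketched as follows. The sketch only illustrates the general idea of deriving searchable fields from tags already present in an HTML resource; it makes no claim about how Alta Vista itself is implemented, and the class name `FieldExtractor`, the function `extract_fields` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the <title> text and <a href> targets of an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "link": []}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.fields["link"].append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept as a field value.
        if self._in_title:
            self.fields["title"] += data


def extract_fields(html):
    """Return a dict of fields derived from tags in the page itself."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


PAGE = ('<html><head><title>Metadata formats</title></head>'
        '<body><a href="http://www.example.org/dc">Dublin Core</a>'
        '</body></html>')

fields = extract_fields(PAGE)
# A "title" field search then reduces to a test on one extracted field.
print("metadata" in fields["title"].lower())
```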
Taken as a whole, however, the Overview is an accurate, concise and useful introduction to the pertinent issues.
The use of a consistent structure across each entry in Part II,
the review of metadata formats, enables comparisons between diverse
metadata formats to be made, and the structure itself provides
a sensible and clear description of each format in the context
of the broader issues of resource description as outlined in Part
I.
The descriptions of each metadata format generally provide a good synthesis between an analysis of the format, and discussion of the broader factors affecting the development of networked information discovery and retrieval initiatives. The Implementations section in particular is useful for ascertaining which formats are attracting interest from the influential web browser developer community.
Comments on individual sections are below:
Caution should be taken when equating OSI (a framework for describing communication protocol layers) and TCP/IP (a family of communication protocols).
(Ambiguity now removed from text.)
The CHIO demonstrator requires the use of an SGML browser such as Panorama, in addition to a generic web browser, in order to view the SGML-encoded documents.
(Information added to text.)
The Conversion to other formats section could be updated to include a reference to the DC/USMARC crosswalk exercise.
'Hand lists' (in the museum and archive sense at least) are not equivalent to detailed catalogues, but are more akin to inventory lists.
(Text changed.)
The descriptions of EELS and of EEVL both mention the absence of alternative formats for use by the engineering community, yet no cross-reference between the two entries is made.
(Cross-referencing now included.)
Describing mSQL as a search engine is potentially misleading; it is in fact a freely-available relational database management system.
(The text has been amended.)
The assertion that the inherent flexibility of the TEI Headers "might well lead to difficulties" could usefully be elaborated upon by examples of the type of difficulties that could be encountered as a result.
(The original comments on the implications for interoperability and distributed record creation have been elaborated.)
The style of writing throughout both parts of the document is necessarily technical in nature, with acronyms and often-obscure references scattered liberally amongst the prose; since no guidelines about the intended audience for the document were supplied, it has been assumed throughout this review that the document is aimed at a reasonably technical audience with some prior knowledge of the issues pertaining to information retrieval in the network environment.
The multiple authorship of the document occasionally results in noticeable changes in the prose style from section to section. This has a marginal impact on the clarity of the document as a whole.
The liberal inclusion of URLs throughout, while slightly detrimental to the clarity of the document in paper form, allows it to be employed as a useful starting point for more in-depth study, and reflects its dual role as both a traditional paper document and an electronic resource (the latter arguably more useful in view of its hyperlinking capability).
(HTML versions of the document have been made available as it has evolved.)
A more serious barrier to clarity is created by the occasional use of terminology associated with a particular metadata format to describe another format; the most common examples are the misleading use of the term Template to refer to records, a practice that has developed amongst the ROADS/IAFA/WHOIS++ community (pages 73, 75), and the phrase Document-Like Objects (pages 44, 45, 84), coined and only loosely defined by example in the Dublin Core initiative and not defined in the document under review.
(Different communities tend to use different terminology, and this is certainly the case with metadata. For example, 'templates', 'schemas' and 'formats' are all used by different communities to refer to the 'format' of a record. The reviewer refers to the SOIF section, where the Harvest User manual does indeed use the term 'template' to refer to both format and record. Wherever possible ambiguity has been removed from the text, but there will inevitably be some borrowing of terminology amongst authors who themselves come from different communities.)
Technical slang is also occasionally employed, for example vanilla ASCII (page 55), on-the-wire format (page 55). These should not, however, present much hindrance to understanding for a technical reader.
(Where it proves useful and enlivens the style, technical slang has been allowed to remain.)
It would also be helpful for the term use chain to be defined.
(The meaning of this term can be gleaned from the context. It is a phrase in current use in the field.)
A glossary of acronyms, and possibly some technical terminology, would greatly increase the clarity and potential audience of the document, should this be considered worthwhile.
(We will consider adding a glossary as part of further project work on resource description.)
A small number of typographical errors, listed as an appendix to this review, were spotted during the review process.
Documents of this nature are extremely difficult to compile and present clearly, since the requisite information, which must be collected from sources throughout the world, assimilated and reorganised, is almost immediately out of date in such a rapidly-evolving field.
Nonetheless, this Survey is a valuable and timely attempt to provide a coherent overview of the current state of the art of networked resource description, providing as it does a reasonably detailed and consistently structured account of the majority of the significant metadata initiatives taking place globally.
The Survey's usefulness is significantly enhanced by its publication as an electronic resource, allowing the user to carry out more in-depth research by following hyperlinks to detailed information about individual initiatives and formats.
This document is almost certainly the most comprehensive (and for the time being at least the most current) introduction to the diverse metadata formats currently in existence.