A review of metadata: a survey of current resource description formats

Work Package 3 of Telematics for Research project DESIRE (RE 1004)

Dublin Core

Note: see also the entry for Warwick Framework.

Environment of use

Documentation

'Dublin Core' is shorthand for the Dublin Metadata Core Element Set which is a core list of metadata elements agreed at the OCLC/NCSA Metadata Workshop in March 1995. The workshop report forms the documentation for the Dublin Core element set. (Stuart Weibel, Jean Miller, Ron Daniel. OCLC/NCSA metadata workshop report. OCLC, March 1995. <URL:http://www.oclc.org:5046/conferences/metadata/dublin_core_report.html>)

The workshop was organised by OCLC and the National Center for Supercomputing Applications (NCSA) to progress development of a metadata record to describe networked electronic information. It followed on from joint meetings and discussions of the American Library Association. The workshop brought together a range of interested parties from different professional backgrounds and subject disciplines, all of whom had been involved with metadata issues. The motivation for progressing Dublin Core has been to reach a consensus among stakeholders on a minimal resource description which can be used for the benefit of all involved in the creation, search and retrieval of electronic resources. There has been high commitment and involvement from a range of professions (publishers, computer specialists, librarians and information workers) and sectors (library utilities, software producers, service providers, libraries).

The Dublin Core is positioned as a simple information resource description. Importantly, however, it also aims to provide a basis for semantic interoperability between other, probably more complicated, formats. A third target use is to provide the basis for resource-embedded description, initially within HTML documents.

Ease of creation

The objective of Dublin Core is to define a simple set of data elements so that authors and publishers of Internet documents can create their own metadata records without extensive training. The Dublin Core approach is to set the level of bibliographic control midway between the detailed approaches of MARC and 'structured' TEI, and the automatic indexing of locator services such as Lycos. It is acknowledged that the Dublin Core is a minimal set, and that many 'publishers' or metadata producers may wish to augment this simple set with more specialised data.

Progress towards international standardisation

Initial attempts to include consideration of Dublin Core elements as part of an IETF working group were not taken forward, on the grounds that the content of metadata records is outside the scope of IETF standards. However, the Dublin Core elements have been considered by USMARC as central to their development of the USMARC record, so the impact has already been seen in the formation of other metadata.

Ambitions to put Dublin Core into practice were carried forward by a second international workshop, sponsored by UKOLN and OCLC, which took place in the UK at the University of Warwick in April 1996. This workshop looked at the implementation of Dublin Core and the requirements for extensibility, change control and dissemination. The need for a registration agency was discussed at this meeting.

Format issues

Designation and encoding

The Dublin Core is a set of elements that can be used to describe a resource, but there was initially no attempt to prescribe an encoding method or record structure. During the first Dublin Core workshop an explicit decision was taken not to define syntax at that stage.

However, certain principles were established for further development of the element set. Of particular relevance to encoding and designation are the principles of:

extensibility: the core set can be extended with further elements to describe intrinsic data of particular relevance to a particular community
optionality: all elements are optional
repeatability: all elements are repeatable
modifiability: any element can be modified by one or more qualifiers
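
The four principles can be read as constraints on a record structure. The following is a minimal sketch in Python, not part of the Dublin Core documentation: the class names Element and Record, the methods add and get, the element name LocalShelfmark and all data values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One occurrence of an element, e.g. Author or Subject."""
    name: str
    value: str
    # Modifiability: zero or more qualifiers, e.g. {"role": "editor"}.
    qualifiers: dict = field(default_factory=dict)


@dataclass
class Record:
    """A description of one resource, held as an ordered list of elements."""
    elements: list = field(default_factory=list)

    def add(self, name: str, value: str, **qualifiers: str) -> None:
        # Extensibility: any element name is accepted, not only core names.
        self.elements.append(Element(name, value, dict(qualifiers)))

    def get(self, name: str) -> list:
        # Optionality and repeatability: the result may hold 0, 1 or many items.
        return [e for e in self.elements if e.name == name]


# Placeholder data, invented for this sketch.
record = Record()
record.add("Title", "An example document")
record.add("Author", "Smith, A.")
record.add("Author", "Jones, B.")                    # repeated element
record.add("OtherAgent", "Brown, C.", role="editor")  # qualified element
record.add("LocalShelfmark", "X-123")                # community extension
```

Here record.get("Author") returns two elements and record.get("Publisher") returns an empty list, so no element is treated as mandatory or as single-valued.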

The sanctioning of qualifiers is of particular note, as it is an attempt to bridge the gap between casual and sophisticated use. Qualifiers are of two very different types. Some indicate an external scheme to be applied in processing, e.g. OtherAgent(scheme=TEI). Others give more precise information about the attribute, in effect sub-dividing the element name, e.g. OtherAgent(role=editor). If a scheme qualifier is used, the syntax of that scheme must be applied to the data in that element. So Author (scheme=USMARC) fields will contain data embedded with USMARC tags and sub-field markers, and OtherAgent (scheme=TEI) elements will contain data with embedded TEI mark-up tags. Widespread use of qualifiers could cause severe problems with interoperability.

At the Warwick workshop a decision was taken to develop a concrete syntax for the Dublin Core in the form of an SGML DTD. (The proposed syntax is described in: Lou Burnard, Eric Miller, Liam Quin, C.M. Sperberg-McQueen, A Syntax for Dublin Core Metadata: Recommendations from the Second Metadata Workshop <URL:http://users.ox.ac.uk/~lou/wip/metadata.syntax.html>).

Content

Basic descriptive elements

The core element set includes the following bibliographic data elements:

Title (name of the object)
Author (person(s) primarily responsible for intellectual content)
Publisher (agent or agency responsible for making the object available)
OtherAgent (person(s) such as editors or transcribers, who have made other significant intellectual contributions to the work)
Date (date of publication)
ObjectType (genre of the object such as novel, poem, dictionary)
Language (language of the intellectual content)

The Author element name does not distinguish the form of author (personal/corporate/meeting). Similarly the OtherAgent element name does not express the precise role of the other agent. It would be possible to use qualifiers to make these more precise distinctions, but the Dublin Core documentation does not attempt to make comprehensive recommendations. Suggested qualifiers are:

Author(scheme=USMARC)=100 1 Doyle, Conan $c Sir, $d 1859-1930

OtherAgent(role=editor)=Weibel,Stuart L.

As soon as such qualifiers are used the complexity of processing the data, and the difficulties for interoperability, will increase.
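
The extra processing can be made concrete with a small parser for the informal notation used in the two examples above. This is a sketch under stated assumptions: the notation is not a defined syntax, so the grammar below (element name, optional parenthesised list of comma-separated qualifiers, an equals sign, then the value) is inferred from the examples, and the function names parse_element and needs_scheme_processing are hypothetical.

```python
import re

# Element name, optional parenthesised qualifier list, '=', then the value.
# Whitespace before '(' is tolerated.
_LINE = re.compile(
    r"^\s*(?P<name>\w+)\s*(?:\((?P<quals>[^)]*)\))?\s*=(?P<value>.*)$"
)


def parse_element(line: str):
    """Split one line into (element name, qualifier dict, value)."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"not an element line: {line!r}")
    qualifiers = {}
    if match["quals"]:
        for pair in match["quals"].split(","):
            key, _, val = pair.partition("=")
            qualifiers[key.strip()] = val.strip()
    return match["name"], qualifiers, match["value"].strip()


def needs_scheme_processing(qualifiers: dict) -> bool:
    # A scheme qualifier means the value follows an external syntax
    # (USMARC, TEI, ...) that the receiving system must also understand.
    return "scheme" in qualifiers
```

An unqualified value can be indexed as plain text by any system. A value for which needs_scheme_processing returns True is usable only by a system that also implements the named scheme, which is where the interoperability cost arises.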

Subject description

The core element set includes the data elements:

Subject (topic addressed by the work)
Coverage (the spatial and temporal characteristics of the object)

The Subject element can be used for headings controlled by a known classification scheme indicated in the qualifier, or can contain free text. The Coverage element allows spatial or temporal characteristics to be recorded, for example for geospatial data. This data might be in unstructured form or in a format governed by a known scheme, e.g.

Coverage(type=spatial)=Atlantic ocean

Coverage(type=spatial,scheme=LATLONG)=West=180,East=180,North=90,South=90
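
The difference between the two Coverage values matters to software: the first can only be treated as text, while the second can be turned into numbers. The sketch below assumes that a LATLONG value is a comma-separated list of key=value pairs, which is inferred from the single example above and not from any definition of the scheme; the function names parse_latlong and coverage_value are hypothetical.

```python
def parse_latlong(value: str) -> dict:
    """Split 'West=..,East=..,North=..,South=..' into a dict of floats."""
    box = {}
    for pair in value.split(","):
        key, sep, number = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        box[key.strip().lower()] = float(number)
    missing = {"west", "east", "north", "south"} - box.keys()
    if missing:
        raise ValueError(f"missing bounds: {sorted(missing)}")
    return box


def coverage_value(qualifiers: dict, value: str):
    # Only a recognised scheme triggers structured parsing; anything else
    # is kept as free text.
    if qualifiers.get("scheme") == "LATLONG":
        return parse_latlong(value)
    return value
```

A value without a scheme qualifier, such as "Atlantic ocean", passes through unchanged, so a consumer that does not know the scheme can still display it.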

URIs

The core element set includes the data element:

Identifier (string or number used to uniquely identify the object)

The data in this element could be an identifier conforming to an internationally recognised scheme (e.g. URL, ISBN) or it could be a local, privately administered number (e.g. a university technical report number). A qualifier naming the scheme would be needed to make the identifier generally useful.

Resource format and technical characteristics

The core element set includes the data element:

Form (the data representation of the object such as PostScript file or Windows executable file)

A constraint on the design of the Dublin Core, accepted by the workshop participants, was that the aim of the element set is to describe 'document like objects' (DLOs).

Administrative metadata

No administrative data is included in the Dublin Core set. A principle of intrinsicality was established at the workshop, which constrained the set to include only elements describing the intrinsic properties of the object. It would seem essential for any implementation of Dublin Core to include in a record such information as the record identification, record creation date, etc.

Provenance/source

The core element set includes the data element:

Source (objects, either print or electronic, from which the resource is derived)

This element could be used to link different versions of an object which have the same intellectual content, whereas the Relation element would be used to link objects with a different intellectual content.

Host administrative details/Terms of availability/copyright

An agreed constraint on Dublin Core is that extrinsic data such as cost and details of access methods would be excluded from the element set. It was accepted that only elements for resource discovery would be included, not those elements specific to retrieval and request.

Other comments

At the Warwick Workshop it was decided that content-wise the Dublin Core should remain more or less as it was. It should not be indefinitely extended to encompass the variety of current and future metadata requirements.

Ability to represent relationships between objects

The core element set includes the data element:

Relation (relationship to other objects)

This element describes relationships to other objects with different intellectual content. It allows for a variety of relationships to be identified by use of the qualifier mechanism. Specification of a relationship would require use of at least two qualifiers, e.g.

Relation (type=ContainedIn) (identifier=URL) =http://www.ukoln.bath.ac.uk/metareview.html
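
A processor has to recover three things from such a line: the type of relationship, the scheme of the target identifier, and the target itself. The sketch below handles the form shown in the example, with each qualifier in its own pair of parentheses. The grammar is inferred from that one example, and the function name parse_relation and the result keys are hypothetical.

```python
import re

# Element name, any number of "(key=value)" groups, '=', then the target.
_RELATION = re.compile(r"^\s*(\w+)((?:\s*\([^)]*\))*)\s*=\s*(.*)$")
_GROUP = re.compile(r"\(\s*([^=)\s]+)\s*=\s*([^)]*?)\s*\)")


def parse_relation(line: str) -> dict:
    """Return the relationship type, identifier scheme and target of a line."""
    match = _RELATION.match(line)
    if match is None or match[1] != "Relation":
        raise ValueError(f"not a Relation line: {line!r}")
    qualifiers = dict(_GROUP.findall(match[2]))
    # Both qualifiers are needed before the link can be interpreted.
    if "type" not in qualifiers or "identifier" not in qualifiers:
        raise ValueError("a relation needs a type and an identifier qualifier")
    return {
        "type": qualifiers["type"],
        "identifier_scheme": qualifiers["identifier"],
        "target": match[3].strip(),
    }
```

Rejecting a line that lacks either qualifier reflects the point that a target string alone does not say what kind of link is meant or how the target should be resolved.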

Multi-lingual issues

The core element set includes the data element:

Language (language of the intellectual content)

The problems of use of non-ASCII characters within the record were deliberately not addressed.

Fullness

The fullness of Dublin Core is low, by design. However, the qualifier mechanism, introduced as a compromise to accommodate sophisticated use, could potentially lead to highly complex, much fuller records.

Conversion to other formats

MARBI Discussion Paper No 86 (Mapping the Dublin Core elements to USMARC, Library of Congress, May 5, 1995 <URL:gopher://marvel.loc.gov:70/00/.listarch/usmarc/dp86.doc>) looks at options and problems in matching Dublin Core to USMARC. Because Dublin Core elements are less specific than MARC, some fields cannot be sufficiently identified to tag them correctly. For example, the author field in MARC is identified as being a personal or a corporate name, whereas Dublin Core does not make this differentiation.
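
The author problem can be illustrated with a short sketch. The tags 100, 110 and 111 are the MARC main entry fields for personal, corporate and meeting names. Everything else is an assumption made for illustration and is not taken from Discussion Paper No 86: the qualifier name form, its three values and the function name marc_tag_for_author are hypothetical.

```python
from typing import Optional

# MARC main entry fields, keyed by a hypothetical "form" qualifier value.
MAIN_ENTRY_TAGS = {"personal": "100", "corporate": "110", "meeting": "111"}


def marc_tag_for_author(qualifiers: dict) -> Optional[str]:
    """Return a MARC tag for an Author element, or None if it is ambiguous."""
    form = qualifiers.get("form")  # hypothetical qualifier, not in the core set
    # Without a usable qualifier there is nothing to say which tag applies.
    return MAIN_ENTRY_TAGS.get(form)
```

An unqualified Author element yields None, so a converter must either guess, fall back on a default, or refer the record to a cataloguer. A qualified element resolves directly, but only if producer and converter agree on the qualifier.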

Other crosswalks are available:

From Dublin Core to USMARC by Rebecca Guenther / Network Development and MARC Standards Office (Library of Congress) <URL:http://lcweb.loc.gov/marc/dccross.html>

From Dublin Core to EAD/GILS/USMARC - by Eric Miller (OCLC). <URL:http://www.oclc.org:5046/~emiller/DC/crosswalk.html>

From Dublin Core to IAFA/ROADS templates - by Michael Day (UKOLN). <URL:http://www.ukoln.ac.uk/metadata/interoperability/dc_iafa.html>

From Dublin Core to Z39.50 tag set G - by Ray Denenberg (Library of Congress) - mail to Meta2 list, Feb. 1997. <URL:http://www.roads.lut.ac.uk/lists/meta2/0733.html>

Rules for construction of these elements

No rules have been formulated.

Protocol issues

Not yet applicable.

Implementations

There have been a few early implementations of Dublin Core.

National Document and Information Service <URL:http://www.nla.gov.au/2/NDIS/NDISintro.html>: this is a joint project between the National Libraries of Australia and New Zealand. Within this project the Dublin Core elements have been used as the core search attributes for their records, in effect the intersection between their various databases. There has been flexibility in the use of semantics with mapping of other 'search fields' to the Dublin Core set.

DSTC <URL:http://www.dstc.edu.au/> in Australia is using the Dublin Core for resource discovery in the Research Data Network Co-operative Research Centre project.

Page maintained by: UKOLN Metadata Group
Last update: 10-Jun-1998