Dublin Core Abstract Model


Title:	Dublin Core Abstract Model
Creator:	Andy Powell UKOLN, University of Bath
Date Issued:	2003-08-11
Identifier:	http://dublincore.org/documents/2003/08/11/abstract-model/
Replaces:
Is Replaced By:	Not applicable
Latest Version:	http://dublincore.org/documents/abstract-model/
Status of Document:	This is a DCMI Working Draft.
Description of Document:	This document describes an abstract model for Dublin Core metadata records.

1. Introduction

This document specifies an abstract model for Dublin Core (DC) metadata records [DCMI]. The primary purpose of this document is to provide a reference model against which particular DC encoding guidelines can be compared, in order to facilitate better mappings and translations between different syntaxes.

2. Terminology

This document uses the following terms:

resource

A resource is anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources.

property

A property is a specific aspect, characteristic, attribute, or relation used to describe resources.

element

An element is a property of a resource.

element refinement

An element refinement is a property of a resource that shares the meaning of a particular DCMI element but with narrower semantics. Since element refinements are properties of a resource (like elements), element refinements can be used in metadata records independently of the properties they refine. In DCMI practice, an element refinement refines just one parent DCMI property.

value

A value results when an element or element refinement is used to describe a resource.

record

A record is some structured metadata about a resource, comprising one or more properties and their associated values.

value URI

A value URI is a URI that identifies the value of an element or element refinement.

value string

A value string is a simple string that represents the value of an element or element refinement. In general, a value string does not contain any markup.

value string language

The value string language indicates the language of the value string.

encoding scheme

An encoding scheme provides contextual information or parsing rules that aid in the interpretation of a value string. Such contextual information may take the form of controlled vocabularies, formal notations, or parsing rules. If an encoding scheme is not understood by a client or agent, the value string may still be useful to a human reader. There are two types of encoding scheme: vocabulary encoding scheme and syntax encoding scheme.

encoding scheme URI

An encoding scheme URI is a URI that identifies an encoding scheme. For all DCMI recommended encoding schemes, the URI is constructed by appending the name of the encoding scheme to the DCTerms namespace URI.
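
As a minimal sketch of this construction, assuming that the DCTerms namespace URI is http://purl.org/dc/terms/ and that the scheme name is appended to it unchanged. The function name is illustrative only and is not defined by any DCMI recommendation.

```python
# Assumption: the DCTerms namespace URI is http://purl.org/dc/terms/
DCTERMS_NS = "http://purl.org/dc/terms/"


def encoding_scheme_uri(scheme_name: str) -> str:
    """Build an encoding scheme URI by appending the scheme name to the namespace URI."""
    return DCTERMS_NS + scheme_name


print(encoding_scheme_uri("W3CDTF"))  # http://purl.org/dc/terms/W3CDTF
```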

vocabulary encoding scheme

A vocabulary encoding scheme indicates that the value string is a term from a controlled vocabulary, such as the value "China - History" from the Library of Congress Subject Headings.

syntax encoding scheme

A syntax encoding scheme indicates that the value string is formatted in accordance with a formal notation, such as "2000-01-01" as the standard expression of a date.

marked-up text

A string that contains HTML, XML or other markup (for example TeX) and that is associated with the value of an element or element refinement.

rich value

Some marked-up text, an image, a video, some audio, etc. (or some combination thereof) that is associated with the value of an element or element refinement.

related metadata

A metadata record that describes a resource that is related to the resource described by a DC metadata record.

qualifier

A qualifier is the generic heading traditionally used for terms now usually referred to specifically as element refinements or encoding schemes.

structured value

A structured value is one of the following:

A value string that contains machine-parsable component parts (and which has an associated syntax encoding scheme that indicates how the component parts are encoded within the string).
Some marked-up text.
Some related metadata.

3. Qualified DC model

The abstract DC model for qualified DC metadata records is as follows:

A qualified DC record is made up of one or more properties and their associated values.
Each property is an attribute of the resource being described.
Each property must be one of the elements or element refinements recommended by the DCMI [DCTERMS].
Properties may be repeated.
Each value may be identified by a value URI.
Each value may have a value string.
Each value string may have an associated encoding scheme.
Each encoding scheme is identified by an encoding scheme URI.
Each value string may have an associated value string language (e.g. en-GB).
Each value may have an associated rich value (some marked-up text, an image, a video, some audio, etc. or some combination thereof).
Each value may have some associated related metadata.

The terms used in this description are defined in the terminology section above.

Note that, in general, the value string should be a string that does not contain any markup.

It is recognised that many real-world metadata applications will use additional properties beyond those indicated in the third bullet point above. While such usage does not fall strictly within the definition of 'qualified DC' provided here, such applications are strongly encouraged to conform to the DCMI abstract model in order to achieve maximum interoperability with other DC metadata records.
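
The qualified DC model described in this section can be sketched as a handful of Python data classes. This is an illustration under stated assumptions rather than a normative data structure: the class and field names (Record, Statement, Value, EncodingScheme) are invented here, a rich value is held as raw bytes purely for convenience, and the language tag on the title is an example value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EncodingScheme:
    uri: str  # the encoding scheme URI that identifies the scheme


@dataclass
class Value:
    value_uri: Optional[str] = None                   # each value may be identified by a value URI
    value_string: Optional[str] = None                # each value may have a value string
    encoding_scheme: Optional[EncodingScheme] = None  # applies to the value string
    language: Optional[str] = None                    # value string language, e.g. "en-GB"
    rich_value: Optional[bytes] = None                # marked-up text, image, audio, ...
    related_metadata: Optional["Record"] = None       # a record describing a related resource


@dataclass
class Statement:
    property_uri: str  # an element or element refinement
    value: Value


@dataclass
class Record:
    # A list, not a dict, because properties may be repeated.
    statements: List[Statement] = field(default_factory=list)


record = Record(statements=[
    Statement("http://purl.org/dc/elements/1.1/title",
              Value(value_string="Dublin Core Abstract Model", language="en-GB")),
    Statement("http://purl.org/dc/terms/issued",
              Value(value_string="2003-08-11",
                    encoding_scheme=EncodingScheme("http://purl.org/dc/terms/W3CDTF"))),
])
```

Holding the statements in a list keeps repeated properties distinct, which a mapping keyed on the property would not.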

4. Simple DC model

The abstract model for simple DC metadata records is a simplification of the qualified DC model defined above:

A simple DC record is made up of one or more properties and their associated values.
Each property is an attribute of the resource being described.
Each property must be one of the 15 DCMES elements [DCMES].
Properties may be repeated.
Each value has a value string.
Each value string may have an associated value string language (e.g. en-GB).
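
The constraints of the simple DC model listed above can be sketched as a small check in Python. The names SimpleStatement and is_simple_dc are hypothetical, and elements are identified by their local names for brevity; this is an illustration, not a conformance test.

```python
from typing import List, NamedTuple, Optional

# The 15 elements of the Dublin Core Metadata Element Set, Version 1.1.
DCMES_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


class SimpleStatement(NamedTuple):
    element: str                    # local name of a DCMES element, e.g. "title"
    value_string: str               # every value has a value string
    language: Optional[str] = None  # optional value string language, e.g. "en-GB"


def is_simple_dc(record: List[SimpleStatement]) -> bool:
    """Return True if the record has at least one statement and each one fits the simple model."""
    return len(record) >= 1 and all(
        s.element in DCMES_ELEMENTS and isinstance(s.value_string, str)
        for s in record
    )
```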

4.1 Dumb-down

The process of translating a qualified DC metadata record into a simple DC metadata record is normally referred to as 'dumbing-down'. In terms of the abstract model, the dumb-down algorithm is as follows:

1. Remove any related metadata and rich values.
2. Remove any encoding schemes (and encoding scheme URIs).
3. If a value string is not present and a value URI is present, use the value URI as the value string.
4. Repeatedly resolve any properties to their 'parent' property (as indicated by the 'Refines' attribute in the DCMI Metadata Terms recommendation [DCTERMS]).
5. Remove any resulting properties that are not one of the 15 DCMES elements [DCMES].

Note that software should make use of the DCMI term declarations represented in RDF schema language [DC-RDFS] and the DC XML namespaces [DC-NAMESPACES] to automate steps 4 and 5.
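
The dumb-down algorithm can be sketched in Python as follows. This is a hedged illustration, not a reference implementation: the QualifiedStatement class and the dumb_down function are hypothetical names, the REFINES table is a small hand-written subset of the 'Refines' relationships (real software would read them from the RDF schemas, as noted above), and dropping a statement that has neither a value string nor a value URI is an assumption made here because a simple DC value must have a value string.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

DCMES = frozenset("dc:" + name for name in (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
))

# Hand-written subset of 'Refines' relationships, for illustration only.
REFINES: Dict[str, str] = {
    "dcterms:created": "dc:date",
    "dcterms:abstract": "dc:description",
    "dcterms:isPartOf": "dc:relation",
    "dcterms:alternative": "dc:title",
}


@dataclass
class QualifiedStatement:
    prop: str
    value_uri: Optional[str] = None
    value_string: Optional[str] = None
    language: Optional[str] = None
    encoding_scheme_uri: Optional[str] = None
    rich_value: Optional[str] = None
    related_metadata: Optional[dict] = None


SimpleStatement = Tuple[str, str, Optional[str]]  # (element, value string, language)


def dumb_down(record: List[QualifiedStatement]) -> List[SimpleStatement]:
    simple = []
    for s in record:
        # Steps 1 and 2: related metadata, rich values and encoding schemes
        # are removed simply by not copying them to the output.
        # Step 3: fall back to the value URI when there is no value string.
        value_string = s.value_string if s.value_string is not None else s.value_uri
        # Step 4: repeatedly resolve the property to its parent.
        prop, seen = s.prop, set()
        while prop in REFINES and prop not in seen:
            seen.add(prop)
            prop = REFINES[prop]
        # Step 5: keep only the 15 DCMES elements.
        # Assumption: a statement left without any value string is dropped.
        if prop in DCMES and value_string is not None:
            simple.append((prop, value_string, s.language))
    return simple


record = [
    QualifiedStatement("dcterms:created", value_string="2003-08-11",
                       encoding_scheme_uri="http://purl.org/dc/terms/W3CDTF"),
    QualifiedStatement("dc:relation",
                       value_uri="http://dublincore.org/documents/dcmi-terms/"),
    QualifiedStatement("ex:reviewer", value_string="A. N. Other"),  # hypothetical non-DCMI property
]
print(dumb_down(record))
```

In this example the first statement becomes a dc:date, the second uses its value URI as the value string, and the third is removed because its property is not a DCMES element.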

5. Encoding guidelines

Particular encoding guidelines (HTML meta tags, XML, RDF/XML, etc.) [DCMI-ENCODINGS] do not need to encode all aspects of the abstract models described above. However, DCMI recommendations that provide encoding guidelines should refer to the models described in this document and should indicate which parts of the models are encoded and which are not. In particular, encoding guidelines should indicate whether any rich values or related metadata records associated with a value are embedded within a DC record or are encoded separately and linked to it using a URI.

References

DCMI: Dublin Core Metadata Initiative
<http://dublincore.org/>
DCTERMS: DCMI Metadata Terms
<http://dublincore.org/documents/dcmi-terms/>
DCMES: Dublin Core Metadata Element Set, Version 1.1: Reference Description
<http://dublincore.org/documents/dces/>
DCMI-ENCODINGS: DCMI Encoding Guidelines
<http://dublincore.org/resources/expressions/>
DC-RDFS: DCMI term declarations represented in RDF schema language
<http://dublincore.org/schemas/rdfs/>
DC-NAMESPACES: Namespace Policy for the Dublin Core Metadata Initiative (DCMI)
<http://dublincore.org/documents/dcmi-namespace/>

Acknowledgements

Thanks to Pete Johnston and the members of the DC Usage Board for their comments on previous versions of this document.

Appendix A - A note about structured values

This appendix discusses 'structured values', as they are used in DC metadata applications at the time of writing.

Many existing applications of DC metadata have attempted to encode relatively complex descriptions (i.e. descriptions that contain more than simply a property and its string value). These attempts have been loosely referred to as 'structured values'. It is possible to identify a number of different kinds of structured values. Four are enumerated below. The first two of these are recommended by the DCMI, in the sense that there are a number of existing encoding schemes that define values that conform to these definitions of structured values. The latter two are not currently recommended, but it is likely that they are in fairly common usage across metadata applications worldwide.

Labelled strings

These are value strings that contain explicitly labelled components within the string. Examples of this kind of structured value include:

DCSV

and the various DCMI syntax encoding schemes built on it - Period, Point and Box. An example of the use of DCSV in Period is:

<meta name="dcterms:temporal"
      scheme="dcterms:Period"
      content="start=Cambrian period; scheme=Geological timescale; name=Phanerozoic Eon;" />

vCard

for example:

<meta name="dc:creator"
      content="BEGIN:VCARD\nORG:University of Oxford\nEND:VCARD\n" />

Note that vCard is not currently a DCMI recommended encoding scheme.
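
To illustrate what "explicitly labelled components" means in practice, the following Python sketch splits the DCSV example above into its labelled parts. It is deliberately simplified: it assumes components are separated by ";" and labels by "=", and it ignores the escaping rules and unlabelled components that the DCSV specification also covers. The function name is hypothetical.

```python
from typing import Dict


def parse_labelled_string(value_string: str) -> Dict[str, str]:
    """Split a 'label=value; label=value;' string into a dictionary of components."""
    components = {}
    for part in value_string.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after a trailing ';'
        label, sep, value = part.partition("=")
        if sep:  # keep only components that carry an explicit label
            components[label.strip()] = value.strip()
    return components


period = parse_labelled_string(
    "start=Cambrian period; scheme=Geological timescale; name=Phanerozoic Eon;"
)
print(period)
# {'start': 'Cambrian period', 'scheme': 'Geological timescale', 'name': 'Phanerozoic Eon'}
```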

Unlabelled strings

These are value strings that contain implicit components within the string, i.e. the components are determined based solely on their position within the string. Examples of this kind of structured value include:

W3CDTF

the date-time format used within most DC metadata. For example:

<meta name="dc:date"
      scheme="dcterms:W3CDTF"
      content="2003-06-10" />
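
By contrast with labelled strings, the components of an unlabelled string are recovered from position alone. The Python sketch below shows this for the date-only forms of W3CDTF (YYYY, YYYY-MM and YYYY-MM-DD). It is a partial illustration: the forms that include a time of day are omitted, month and day ranges are not validated, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional

# Date-only W3CDTF forms: YYYY, YYYY-MM or YYYY-MM-DD.
W3CDTF_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_w3cdtf_date(value_string: str) -> Optional[Dict[str, int]]:
    """Return the positional components of a date-only W3CDTF string, or None."""
    match = W3CDTF_DATE.fullmatch(value_string)
    if match is None:
        return None
    year, month, day = match.groups()
    result = {"year": int(year)}
    if month is not None:
        result["month"] = int(month)
    if day is not None:
        result["day"] = int(day)
    return result


print(parse_w3cdtf_date("2003-06-10"))  # {'year': 2003, 'month': 6, 'day': 10}
```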

Marked-up text

These are strings containing 'presentational' or other markup, for example adding paragraph breaks, superscripts or chemical/mathematical markup to a dc:description. It is possible to characterise various kinds of markup as follows:

Markup based on a version of HTML.
Markup based on other XML-based languages such as CML and MathML.
Non-XML markup languages such as TeX.

Related resource description (related metadata)

These are metadata records that describe a second resource (i.e. not the resource being described by the DC record). For example, a related metadata record associated with a dc:creator property could contain a complete description of the resource author (including birthday, eye-colour and favourite beverage if desired!).

In the past, 'related resource descriptions' have tended to be encoded using XML, vCard (see above) or by inventing multiple 'refinements' of DCMES properties (for example DC.Creator.Address). The RDF/XML encoding of DC (see below) provides us with a more thorough modelling of related metadata records through the use of multiple linked nodes in an RDF graph.

In DC metadata records, the following elements (and their element refinements) are used to provide the name or identifier of a second resource that is related to the resource being described:

dc:creator
dc:contributor
dc:publisher
dc:relation
dc:source

In the case of the first three, this is typically done by providing the name (or in some cases the name and a small amount of additional information in order to better identify the person or organisation) of the related resource as the value string.

In the case of the last two, this is typically done by providing the URI (or some other identifier) of the related resource as the value URI. However, where no identifier is available, the name of the related resource can be provided instead (or as well) using the value string.

It should be noted that the value strings of these elements (and their element refinements) are not intended to be used to provide full descriptions of the related resource.

Summary

The categories outlined above are not watertight and there are certainly overlaps between them. For example, labelled strings can be viewed as a type of non-XML markup language. In addition, there will be cases where marked-up text (e.g. MathML) can be viewed as a related resource description.

Nevertheless, the purpose of the categorisation used here is to analyse existing usage of complex metadata structures within current DC metadata applications. In the context of the abstract model proposed here, all the types of structured values outlined above form part of the qualified DC model:

A labelled string should typically be treated as related metadata (though it should be noted that DCSV and the various DCMI syntax encoding schemes built on it - Period, Point and Box - are currently treated as value strings with an appropriate encoding scheme).
An unlabelled string should be treated as a value string with an appropriate encoding scheme.
Marked-up text should typically be treated as a rich value.
A related resource description should be treated as related metadata.

Appendix B - RDF and the abstract model

This appendix discusses the relationship between the DC abstract model and the Resource Description Framework (RDF).

RDF currently provides DCMI with the richest encoding environment of the available encoding syntaxes. It is therefore worth taking a brief look at how the abstract model described here compares with the RDF model.

Note that the intention here is not to provide a full and detailed description of how to encode DC metadata records in RDF. Instead, three simple examples of the use of DC in RDF are considered.

Example 1: dc:creator

Figure 1 shows a simple RDF graph (and the RDF/XML document that represents it). The graph shows a resource with a single property (dc:creator). The value of the property is a second (blank) node, representing the creator of the resource. This second blank node has several properties, used to describe the creator, and an rdfs:label property that is used to provide the value string for the dc:creator property.

Figure 1

Figure 2 shows the same information separated into two graphs. In this case the related metadata that describes the creator has been more clearly separated from the description of the resource by moving it into a separate RDF/XML document. In order to do this, the node representing the value has been assigned a value URI, allowing the two nodes in the two RDF/XML documents to be treated as representing the same thing.

The related metadata in the second RDF/XML document is linked to the first using the rdfs:seeAlso property and the URI of the RDF/XML document. Note that it is not strictly necessary to separate the two graphs in this way; it is perfectly valid to represent the second graph as a sub-graph of the first, as shown in figure 1. However, for the purposes of this document, the two graphs have been separated in order to more clearly differentiate the DC metadata record from the related metadata. In some cases it will be good practice to facilitate this separation anyway, for example in order to serve the second graph from a directory service of some kind.

Figure 2

Example 2: dc:subject

Figure 3 shows a second simple RDF graph (and the RDF/XML document that represents it). The graph shows a resource with a single property (dc:subject). The value of the property is a second (blank) node, representing the subject of the resource. This second blank node has an rdfs:label property that is used to provide the value string for the dc:subject property, an rdf:value property that is used to provide the classification scheme notation and an rdf:type property to provide the encoding scheme URI.

Figure 3
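
The graph described for this example can be approximated in Python as a list of triples, together with a helper that reads the abstract-model terms back out of it. This is a sketch under several assumptions: the resource URI, the subject heading and the notation are placeholder values and are not taken from the figure, literals and URIs are not distinguished, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"

# Placeholder data: "_:subject" stands for the blank node that is the value.
graph: List[Triple] = [
    ("http://example.org/resource", DC + "subject", "_:subject"),
    ("_:subject", RDFS + "label", "China - History"),                # value string
    ("_:subject", RDF + "value", "EXAMPLE-NOTATION"),                # scheme notation (placeholder)
    ("_:subject", RDF + "type", "http://purl.org/dc/terms/LCSH"),    # encoding scheme URI
]


def first_object(triples: List[Triple], subject: str, predicate: str) -> Optional[str]:
    """Return the object of the first matching triple, or None if there is none."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def describe_value(triples: List[Triple], resource: str, prop: str) -> Dict[str, Optional[str]]:
    """Map the node that is the value of `prop` onto abstract-model terms."""
    node = first_object(triples, resource, prop)
    if node is None:
        return {}
    return {
        "value_string": first_object(triples, node, RDFS + "label"),
        "notation": first_object(triples, node, RDF + "value"),
        "encoding_scheme_uri": first_object(triples, node, RDF + "type"),
    }


print(describe_value(graph, "http://example.org/resource", DC + "subject"))
```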

Figure 4 shows the same information separated into two graphs. In this case the related metadata that describes the subject has been more clearly separated from the description of the resource by moving it into a separate RDF/XML document. In order to do this, the node representing the value has been assigned a value URI, allowing the two nodes in the two RDF/XML documents to be treated as representing the same thing.

The related metadata in the second RDF/XML document is linked to the first using the rdfs:seeAlso property and the URI of the RDF/XML document. Note that it is not strictly necessary to separate the two graphs in this way; it is perfectly valid to represent the second graph as a sub-graph of the first, as shown in figure 3. However, for the purposes of this document, the two graphs have been separated in order to more clearly differentiate the DC metadata record from the related metadata. In some cases it will be good practice to facilitate this separation anyway, for example in order to serve the second graph from a terminology service of some kind.

Figure 4

Example 3: dc:description

Figure 5 shows a third simple RDF graph (and the RDF/XML document that represents it). The graph shows a resource with a single property (dc:description). The value of the property is a second (blank) node with an rdfs:label property that is used to provide the value string for the dc:description property. The second node also has an rdfs:seeAlso property that links to a rich value - in this case some HTML marked-up text that provides a richer representation of the description.

Note that it is possible to embed the marked-up text within a single RDF graph (using rdf:parseType="Literal"). However, this is not shown here.

Figure 5

Summary

By re-visiting the second figure from example 2 (figure 4) it is possible to layer the terminology used in the abstract models above over the RDF graph.

A similar exercise could be undertaken for other encoding syntaxes (e.g. XML and HTML meta tags). As mentioned above, it may be the case that not every encoding syntax can express the full richness of the abstract model.

Figure 6

Metadata associated with this resource: http://dublincore.org/documents/abstract-model/index.shtml.rdf