Stuart Weibel, OCLC Office of Research (Weibel@oclc.org)
(OCLC, http://purl.org/metadata/dublin_core)
The term metadata simply means data about data. It is the term most often used in the Internet community for what has been known in the library community as cataloguing data, or resource description. The Dublin Core is a 15-element metadata set intended to facilitate discovery of electronic resources and tabulated in an addendum to this chapter. Originally conceived for author-generated description of World Wide Web resources, it has also attracted the attention of formal resource description communities such as museums and libraries.
The Dublin Core Workshop Series has gathered experts from the library world, the networking and digital library research communities, and a variety of content specialities in a series of focused invitational meetings. The building of an interdisciplinary, international consensus around a core element set is the central feature of the three-year evolution of the Dublin Core. The progress represents the emergent wisdom and collective experience of many stakeholders in the resource description arena. An open mailing list supports ongoing work.
The characteristics of the Dublin Core that distinguish it as a prominent candidate for description of electronic resources fall into several categories.
The Dublin Core is intended to be usable by non-cataloguers. It is expected that authors or Web-site maintainers unschooled in the cataloguing arts will be able to use the Dublin Core for resource description, making their collections more visible to search engines and retrieval systems. Most of the 15 elements have a commonly understood semantics that represents what might be described as a lowest common denominator for resource description (roughly equivalent to a catalogue card). As such, the Dublin Core is not intended to replace richer description models such as AACR2/MARC (Library of Congress 1997b) cataloguing, but rather to provide a core set of description elements that can be used by cataloguers or non-cataloguers for simple resource description.
On the Internet commons, disparate description models interfere with the ability to search across discipline boundaries. For example, libraries, museums, and the geographic information systems community use different standards for resource description. This reflects the different description needs of these communities, and the fact that such standards have evolved independently. At the level of fine-grained description, element sets are different because they must describe different things. Most writers seldom associate a cloud-cover attribute with their documents, but if you are describing satellite images of farmland, this is a critical descriptor. Still, most resources share a core set of attributes that are similar from one discipline to the next, but have different names simply because they have evolved independently and at different times. Promoting a commonly understood set of core descriptors will improve the prospects for cross-disciplinary search by unifying related attributes. For example, an author and a creator can be sensibly thought of as the same attribute for the purposes of resource discovery.
The Dublin Core is intended to serve as this core element set.
Recognition of the international scope of resource discovery on the World Wide Web is critical to the development of effective discovery infrastructure. The Dublin Core has benefited from active participation and promotion in the United Kingdom, Australia, Sweden, Denmark, Norway, Finland, Germany, Thailand, Japan, Canada, and the United States.
Although initially motivated by the need for author-generated resource description, the Dublin Core has also attracted the attention of formal resource description communities. As the diversity and volume of Web resources increase, trusted intermediaries (such as museums and libraries) will achieve greater recognition as preferred sources of metadata for persistent resources. In the hands of cataloguing experts, the Dublin Core is expected to provide an economical alternative to more elaborate description models such as full MARC cataloguing (Library of Congress 1997b). It includes sufficient flexibility to encode the additional structure and more elaborate semantics appropriate to such applications.
The wide diversity of metadata needs on the World Wide Web requires an environment that supports the coexistence of many independently developed and maintained metadata packages. The Dublin Core is targeted specifically towards resource discovery, but one can imagine many functionally distinct packages that serve other goals (terms and conditions, archival management, administrative metadata, and many others). For example, a Terms and Conditions metadata package would include elements that describe rights holders, cost of acquiring a resource, restrictions on re-use of the resource, and related information. Recognition of the desirability of this sort of modularity has guided the evolution of the Dublin Core since the Warwick Workshop, and has been formalised as the Warwick Framework (Lagoze et al., 1996). The concepts articulated in this work have informed the ongoing development of a metadata architecture for the Web as well.
The World Wide Web Consortium (W3C) is the primary standards forum for the Web, and has recently begun to focus on implementing an architecture for metadata for the Web. The Resource Description Framework, or RDF, is evolving to support the many different metadata needs of vendors and information providers. Representatives of the Dublin Core effort are actively involved in the development of this architecture, bringing the digital library perspective to bear on this important component of the Web infrastructure (W3C 1997a). The evolving RDF metadata architecture will support a variety of resource description models, each with implications for functionality and management.
Currently the easiest way of deploying metadata on the World Wide Web is by embedding it in HTML documents (using the <META> tag). There are conventions that support inclusion of simple metadata in HTML versions 2.0 and above (cf. Miller and Gill 1997). The HTML 4.0 specification released in July of this year includes additional attributes for the <META> tag that allow the qualifiers necessary for more complex implementations (W3C 1997b). The advantage of embedded metadata is that no additional system must be in place to use it; the metadata is integral to the resource, and can be harvested by Web-indexing agents.
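As a minimal sketch of the embedded model, the Python below renders a few description elements as <META> tags and then recovers them in the way a simple indexing agent might. The `DC.` name prefix, the helper names `dc_meta_tags` and `DCHarvester`, and the sample values are assumptions made for illustration; they are not taken from the conventions cited above.

```python
from html import escape
from html.parser import HTMLParser


def dc_meta_tags(record):
    """Render (element, value) pairs as HTML <META> tags, one per line."""
    return "\n".join(
        '<META NAME="DC.{}" CONTENT="{}">'.format(
            escape(element, quote=True), escape(value, quote=True)
        )
        for element, value in record
    )


class DCHarvester(HTMLParser):
    """Collect (element, value) pairs from <META> tags whose name starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            self.found.append((name[3:], attributes.get("content") or ""))


# Illustrative values only. Elements may repeat, so the record is a list of pairs.
record = [
    ("Title", 'Metadata & "Discovery"'),
    ("Creator", "First Author"),
    ("Creator", "Second Author"),
]

page = "<HTML><HEAD>\n{}\n</HEAD><BODY></BODY></HTML>".format(dc_meta_tags(record))

harvester = DCHarvester()
harvester.feed(page)
print(page)
print(harvester.found)
```

Because the description travels inside the document, the harvesting step needs nothing beyond the page itself, which is the advantage the paragraph above describes.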
A model more familiar to the library community includes what is known in Web parlance as a third-party label bureau; that is, an entity that collects and manages metadata records that refer to resources but are not embedded in the resource (a library catalogue, for example). This model is important not only to libraries and museums. It also supports the development of agencies that might label resources according to age, appropriateness, or other acceptability criteria.
A third model also involves management of records by a distinct entity, but not necessarily Dublin Core records per se. Managing a wide variety of data stores often involves reconciling very different description models. One approach to achieving interoperability in such an environment involves mapping many description schemas into a common set such as the Dublin Core, giving users a single query model (Day 1997).
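The mapping approach can be sketched as a set of crosswalk tables, one per source schema, with queries expressed only in Dublin Core terms. Everything specific here is hypothetical: the two source schemas (`library`, `museum`), their field names, the sample records, and the function names `to_dublin_core` and `search` are invented for illustration and do not describe any real system.

```python
# Hypothetical crosswalks: source field name -> Dublin Core element name.
CROSSWALKS = {
    "library": {"author": "Creator", "title": "Title", "imprint": "Publisher"},
    "museum": {"artist": "Creator", "object_name": "Title", "medium": "Format"},
}


def to_dublin_core(schema, record):
    """Map one source record onto Dublin Core elements; unmapped fields are dropped."""
    mapping = CROSSWALKS[schema]
    mapped = {}
    for field_name, value in record.items():
        element = mapping.get(field_name)
        if element is not None:
            # Elements are repeatable, so each one holds a list of values.
            mapped.setdefault(element, []).append(value)
    return mapped


def search(stores, element, term):
    """Case-insensitive substring search over one Dublin Core element in every store."""
    hits = []
    for schema, records in stores.items():
        for record in records:
            values = to_dublin_core(schema, record).get(element, [])
            if any(term.lower() in value.lower() for value in values):
                hits.append((schema, record))
    return hits


# Invented sample data in two differently shaped stores.
stores = {
    "library": [{"author": "Smith, Jane", "title": "Example Monograph", "shelf": "A1"}],
    "museum": [{"artist": "John Smith", "object_name": "Example Print", "medium": "ink"}],
}

hits = search(stores, "Creator", "smith")
for schema, record in hits:
    print(schema, record)
```

The single `Creator` query reaches both the `author` and the `artist` field, at the cost of discarding fields (such as `shelf`) that have no counterpart in the common set.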
Much remains to be done to bring the Dublin Core to a state of sufficient maturity and stability to fulfil its promise as a foundation for resource discovery on the net. The main thrusts of continued development are enumerated below.
The Dublin Core elements emerged from the collective judgement and experience of the many participants in the process to date. As deployment spreads, the evolution of the Dublin Core will reflect experience with the ambiguities, conflicts, and deficiencies in the set. Standards of best practice will evolve in light of such experience.
The spread of a common set of resource description conventions depends in part upon the availability of clear user guidelines. Such guidelines must be developed in many languages but with a common purpose and orientation.
The Warwick Framework describes the characteristics of an architecture for metadata that will allow independently developed metadata element sets to co-exist. This implies that the 'consumers' of metadata (either people or software agents) will need formal online registries that describe the semantics, the structure, and the transport syntax of a metadata element set. Thus, an application finding Dublin Core metadata associated with a collection of resources might access the Dublin Core Metadata Registry to better understand the characteristics of the metadata. Work on metadata registries is still in an embryonic stage, but as the functional specifications evolve, they will become a central part of the infrastructure necessary to develop and manage change for a metadata set such as the Dublin Core.
Tools for creating and managing Web-based metadata are evolving now. As the infrastructure evolves and standards become stable, these tools will become commonplace in authoring, site management, and resource management applications.
The development of the Dublin Core has been a voluntary effort on the part of many disparate stakeholders in resource description. As it becomes more widely deployed, standards of best practice must be formalised.
Work reported here by the AHDS and UKOLN represents development on several of these fronts, but especially regarding the refinements of the Dublin Core element set and the formalisation of best practice. It is to those refinements, and to the formal evaluation process from which they emerged, that attention now turns.
Dublin Core elements may be optionally applied, extended with implementation-specific TYPE attributes, and repeated as necessary when describing any given resource.
Element Name | Element Description |
---|---|
Title | The name given to the resource by the CREATOR or PUBLISHER. |
Creator | The person(s) or organisation(s) primarily responsible for creating the intellectual content of the resource. |
Subject | The topic of the resource: keywords or phrases that describe the subject or content of the resource, including controlled vocabularies. |
Description | A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources. |
Publisher | The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity. |
Contributor | Person(s) or organisation(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the CREATOR element (for example, editors, transcribers, and illustrators). |
Date | The date the resource was made available in its present form. |
Type | The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. It is expected that TYPE will be chosen from an enumerated list of types. |
Format | The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. |
Identifier | String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names would also be candidates for this element. |
Source | The work, either print or electronic, from which this resource is derived, if applicable. |
Language | Language(s) of the intellectual content of the resource. |
Relation | Relationship to other resources. The intent of specifying this element is to provide a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves. |
Coverage | The spatial and temporal characteristic of the resource. Formal specification of COVERAGE is currently under development. |
Rights | The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a service that would provide such information dynamically. |
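
The rules stated above the table (every element is optional, any element may repeat, and an element may carry an implementation-specific TYPE attribute) can be sketched as a small data structure. The class names `Statement` and `Record` and the `type_attr` field are hypothetical stand-ins chosen for this example, and the sample values are placeholders; only the fifteen element names come from the table.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The fifteen element names, in the order tabulated above.
ELEMENTS = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)


@dataclass
class Statement:
    element: str
    value: str
    type_attr: Optional[str] = None  # hypothetical implementation-specific qualifier

    def __post_init__(self):
        if self.element not in ELEMENTS:
            raise ValueError("not a Dublin Core element: {!r}".format(self.element))


@dataclass
class Record:
    statements: List[Statement] = field(default_factory=list)

    def add(self, element, value, type_attr=None):
        """Append a statement; the same element may be added any number of times."""
        self.statements.append(Statement(element, value, type_attr))

    def values(self, element):
        """Return every value recorded for an element (an empty list if none)."""
        return [s.value for s in self.statements if s.element == element]


record = Record()
record.add("Creator", "First Author")
record.add("Creator", "Second Author")                      # repeated element
record.add("Identifier", "0-000-00000-0", type_attr="ISBN")  # placeholder value, qualified

print(record.values("Creator"))
print(record.values("Title"))  # no Title was supplied, which is permitted
```

Storing statements as a flat list rather than a dictionary keyed by element keeps repetition and per-statement qualifiers simple to represent.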
Last modified: Monday, 17-Nov-97 16:52:01 GMT by D. Greenstein
URL: http://www.ahds.ac.uk/public/arlist.html
This page was originally part of the Arts and Humanities Data Service (AHDS) Website: http://ahds.ac.uk/public/metadata/disc_03.html