Interoperability between metadata formats
Mapping Dublin Core to ROADS templates
Michael Day
November 1997

Dublin Core element | IAFA template |
---|---|
Title | Title |
Creator | Author-name (From Author (USER)* cluster) |
Subject | Keyword, Subject-Descriptor-Scheme, Subject-Descriptor |
Description | Description |
Publisher | Publisher-name (From Publisher (ORGANISATION)* cluster) |
Contributors | No direct equivalent |
Date | Creation-date |
Type | Category |
Format | Format-v*, Requirements |
Identifier | URI-v*, ISBN, ISSN |
Source | Source |
Language | Language |
Relation | No direct equivalent |
Coverage | No direct equivalent |
Rights | No direct equivalent |
The Dublin Metadata Core Element Set (Dublin Core for short) was devised as a simple set of data elements so that Internet publishers and authors would be able to create their own metadata records.
The Dublin Core elements were originally agreed at a workshop held in March 1995 at Dublin, Ohio (Weibel, et al. 1995). The workshop report commented that "automatically generated records often contain too little information to be useful, while manually generated records are too costly to create and maintain for the large number of electronic documents currently available on the Internet". Dublin Core elements were designed to mediate between these extremes. A reference description of the Dublin Core element set can be found at the URL:http://purl.org/metadata/dublin_core_elements.
ROADS templates are used by the subject services which use the ROADS software. The templates are a development of the Internet Anonymous FTP Archive (IAFA) templates outlined in an IETF Internet Draft in 1994 (Deutsch, et al., 1994). The mapping of Dublin Core to these templates should provide an interesting examination of Dublin Core's potential role as an interchange format between metadata types, in particular with relation to the ROADS project (Heery 1996; Knight and Hamilton 1996). More information on the ROADS project can be found at the URL:http://www.ukoln.ac.uk/roads/.
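As a rough illustration, the element-by-element table at the head of this document could be held by conversion software as a simple lookup. The sketch below is an assumption about how that might be done, not part of the ROADS software: the names `DC_TO_ROADS`, `roads_candidates` and `unmapped_elements` are hypothetical, and the attribute spellings are copied from the table as given.

```python
# Hypothetical lookup table transcribed from the mapping table in this
# document. An empty list stands for "No direct equivalent".
DC_TO_ROADS = {
    "Title": ["Title"],
    "Creator": ["Author-name"],
    "Subject": ["Keyword", "Subject-Descriptor-Scheme", "Subject-Descriptor"],
    "Description": ["Description"],
    "Publisher": ["Publisher-name"],
    "Contributors": [],
    "Date": ["Creation-date"],
    "Type": ["Category"],
    "Format": ["Format-v*", "Requirements"],
    "Identifier": ["URI-v*", "ISBN", "ISSN"],
    "Source": ["Source"],
    "Language": ["Language"],
    "Relation": [],
    "Coverage": [],
    "Rights": [],
}


def roads_candidates(dc_element):
    """Return the candidate ROADS attributes for a Dublin Core element."""
    return DC_TO_ROADS.get(dc_element, [])


def unmapped_elements():
    """List the Dublin Core elements with no direct ROADS equivalent."""
    return sorted(name for name, targets in DC_TO_ROADS.items() if not targets)
```

Where an element has more than one candidate attribute (Subject, Format, Identifier), the choice depends on the qualifiers discussed under each element below.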
Title
This should map neatly across to the ROADS template Title.
Creator
Dublin Core Author or Creator elements are defined as the "person(s) or organization(s) primarily responsible for the intellectual content of the resource". Dublin Core Creator elements could be mapped to the Author-Name part of the ROADS template Author-(USER)* cluster. Differences in format are not as crucial here as they would be when mapping to a more complex scheme like MARC. If a Dublin Core SCHEME is added, e.g.:
Author (type=USMARC): 100 1 Doyle, Arthur Conan $c Sir $d 1859-1930,
things get more complicated.
Subject
In Dublin Core a SCHEME sub-element can be used to note which controlled indexing terms are being used, or which classification system is in use. e.g.:
Subject (scheme=LCSH): UNIX (Computer system)
Subject (scheme=Dewey Decimal System): 004.251 Supercomputers--systems design
If the sub-element names a well-known indexing or classification system, this could be extracted and placed in the ROADS template "Subject-Descriptor-Scheme", and the data itself could be placed in a "Subject-Descriptor". Presumably, widely used indexing or classification schemes could be listed in an authority file so that conversion software could identify them accurately. Alternatively, the SCHEME sub-element could map directly to the ROADS "Subject-Descriptor-Scheme", and the attached data to "Subject-Descriptor". However, this would rely on abbreviations for the schemes being used in a consistent manner.
If no SCHEME sub-element is used, the subject terms could be assumed to be suitable for the ROADS template "Keywords". In Dublin Core, the Subject element can contain any "keywords or phrases that describe the subject or content of the resource".
In Dublin Core the data elements are repeatable. Subject elements containing one or more SCHEME sub-elements are possible. All will have to map to their relevant place in a ROADS template.
e.g.:
Keywords: Supercomputers
Keywords: UNIX
Subject-Descriptor-Scheme-v1: DDC
Subject-Descriptor-Scheme-v2: LCSH
Subject-Descriptor-v1: 004.251 Supercomputers--systems design
Subject-Descriptor-v2: UNIX (Computer system)
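The Subject mapping described above might be sketched as follows. This is only an illustration under stated assumptions: the function name `map_subjects`, the `SCHEME_ABBREVIATIONS` authority file and the choice to give every qualified subject its own variant number are all hypothetical, and real conversion software might group several descriptors under one scheme instead.

```python
# Hypothetical authority file: Dublin Core scheme label -> ROADS abbreviation.
SCHEME_ABBREVIATIONS = {
    "LCSH": "LCSH",
    "Dewey Decimal System": "DDC",
}


def map_subjects(subjects):
    """Map (scheme, value) pairs to ROADS (attribute, value) pairs.

    A scheme of None marks an unqualified subject, which is treated as a
    keyword. Each qualified subject gets its own variant number (an
    assumption made for this sketch).
    """
    lines = []
    variant = 0
    for scheme, value in subjects:
        if scheme is None:
            lines.append(("Keywords", value))
            continue
        variant += 1
        # Fall back to the label itself if it is not in the authority file.
        abbreviation = SCHEME_ABBREVIATIONS.get(scheme, scheme)
        lines.append((f"Subject-Descriptor-Scheme-v{variant}", abbreviation))
        lines.append((f"Subject-Descriptor-v{variant}", value))
    return lines
```

Falling back to the raw scheme label when it is missing from the authority file corresponds to the direct-mapping alternative, with the same reliance on schemes being abbreviated consistently.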
Description
This, in Dublin Core, refers to a textual description of the content of a resource. It will map fairly accurately to the ROADS template Description field.
Publisher
Dublin Core Publisher elements are defined as the "entity responsible for making the resource available in its present form". Dublin Core Publisher elements could be mapped to the Publisher-Name part of the ROADS template Publisher-(ORGANISATION)* cluster.
Contributors
The Other Contributors element is intended to describe roles such as editing, illustrating and compiling: in essence, intellectual contributions to the resource that are not covered by the Creator element. It can take the form of a free text string:
Contributors: Transcribed by the University of Maryland at College Park Libraries Humanities Electronic Text Center
or can be defined by a "type" or "role" sub-element:
Contributors: (role=Editor): Harnad, Stevan
Contributors: (role=Illustrator): Bailey, Sian
Whichever is used, there is no obvious place in ROADS template-based records to which this data could be accurately mapped. The closest is the Author-(USER)* cluster.
Date
The Dublin Core Date of publication element is intended to reflect "the date the resource was made available in its current form". Recommended practice is for an ANSI X3.30-1985 term (YYYYMMDD) to be used. Modified dates can be identified with a qualifier.
Date: May 6, 1995
Date: 19950506
Date (modified): 19970206
It should map to the ROADS template element Creation-Date. There are potential problems with compatibility between date formats. ROADS templates do not specify what form of date should be used. A conversion program might have to convert from ANSI X3.30-1985 terms to a more human-readable form for the ROADS template. Modified dates could map to Last-Revision-Date.
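A conversion of this kind might look like the sketch below. It is an illustration only: the function name `map_date` and the output style ("6 May 1995") are assumptions, since ROADS templates do not prescribe a date format. Values that are not eight-digit YYYYMMDD strings are passed through unchanged.

```python
from datetime import date

# English month names are spelled out here so that the output does not
# depend on the locale of the machine running the conversion.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def map_date(value, qualifier=None):
    """Map a Dublin Core Date to a ROADS (attribute, value) pair.

    Dates qualified as "modified" go to Last-Revision-Date; all others
    go to Creation-Date.
    """
    attribute = "Last-Revision-Date" if qualifier == "modified" else "Creation-Date"
    if len(value) == 8 and value.isdigit():
        try:
            parsed = date(int(value[:4]), int(value[4:6]), int(value[6:]))
        except ValueError:
            # Eight digits that do not form a real date: leave untouched.
            return attribute, value
        return attribute, f"{parsed.day} {MONTHS[parsed.month - 1]} {parsed.year}"
    # Free text or partial dates (e.g. a bare year) are kept as given.
    return attribute, value
```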
Type
The Resource Type element defines the genre or category of the object, and would probably best map to the ROADS template "Category". There are no problems with semantics, although an authority list of the most widely used terms might be desirable.
Format
The Dublin Core Format element is intended to provide information about the hardware and software requirements to display or operate the object. To this extent, examples like Windows 3.1 executable file, HTML file or ASCII file, would best map to the ROADS template "Format-v*". If there is more than one format given in a Dublin Core record, ROADS templates would have to automatically generate additional Format-v* elements. If, however, Dublin Core Format elements are free text descriptions of how the object can be displayed or operated, they would map better to the ROADS template "Requirements".
Identifier
The Resource Identifier in Dublin Core is the string or number used to uniquely identify an object. This includes things like ISBNs and ISSNs, as well as URLs. The type of identifier could be identified by a scheme.
Identifier (scheme=ISBN) = 0-19-097636-X
Identifier (scheme=URL) = http://www.ukoln.ac.uk/metadata/home.html
ROADS Templates include attributes for ISBN, ISSN and URI-v*. With schemes present, URLs, ISBNs and ISSNs could be adequately mapped to ROADS. Other, non-standard, identifiers would not, however, necessarily fit into the ROADS templates.
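The identifier handling just described could be sketched as below. The function name `map_identifiers` is hypothetical, and treating both "URL" and "URI" scheme labels as URI-v* variants is an assumption. Identifiers with any other scheme, or none, are returned separately rather than forced into a ROADS attribute.

```python
def map_identifiers(identifiers):
    """Split (scheme, value) pairs into mapped ROADS attributes and leftovers.

    Returns (mapped, unmapped): mapped holds (attribute, value) pairs for
    URI-v*, ISBN and ISSN; unmapped holds the original pairs that have no
    obvious place in a ROADS template.
    """
    mapped, unmapped = [], []
    uri_count = 0
    for scheme, value in identifiers:
        key = (scheme or "").upper()
        if key in ("URL", "URI"):
            uri_count += 1
            mapped.append((f"URI-v{uri_count}", value))
        elif key in ("ISBN", "ISSN"):
            mapped.append((key, value))
        else:
            unmapped.append((scheme, value))
    return mapped, unmapped
```

Keeping the unmapped identifiers in a separate list leaves the decision about non-standard identifiers to whoever runs the conversion.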
Source
The Dublin Core source element refers to the object from which the object being catalogued is derived, e.g. the previous version of a document.
The Source element in ROADS templates is designed to give information about the source of the object. It is not used in the SERVICE template, but can be included in the DOCUMENT template. It is not necessarily related to an identifier as defined by Dublin Core, but is usually a short piece of text. A short prefix such as "Derived from:" could presumably be inserted if necessary.
Language
Language in Dublin Core specifies the language of the intellectual content of the object. Where practical, the guidelines note that the content of this field should coincide with the Z39.53 three character codes for written languages.
Again, abbreviations can be used and the source can be included as a scheme:
Language (scheme=USMARC) = spa
In ROADS templates, the Language-v* template is used for the language in which the object is written. Note that it can also be used for the programming language in a SOFTWARE template.
Relation
The Dublin Core relation element gives the relationship of the object to other objects. This could be to other documents in a hierarchy, or maybe to the parent electronic journal, although other relationships are possible. The use of this element is currently under discussion.
ROADS templates do not currently contain relation elements, so the Dublin Core Relation element will not map to ROADS templates. However, discussion is currently taking place within the ROADS project to ensure that basic relationships (e.g. parent and child relationships) can be identified in some way.
Coverage
The Dublin Core coverage element describes spatial and temporal characteristics of an object. It would be used for GIS or geospatial data, or something requiring time elements. It has a possible "type" qualifier.
Coverage (type = spatial) = The Atlantic Ocean
Coverage (type = spatial, scheme = LATLONG) = {West - 180, East = 180, North = 90, South = 90}
Coverage (type = temporal, scheme = ANSI X3.30-1985) = {Begin = 19910101, End = 19930601}
There is no ROADS template equivalent of this element, although it could provide part of the Description.
Rights
The Rights Management field is intended to provide a link to a rights-management statement or copyright notice, so that these conditions can be linked to the record.
There is no ROADS/IAFA equivalent to this.
The 1995 OCLC Dublin Core metadata workshop report gave some examples of records encoded using the Dublin Core. The first was created by a subject specialist without specific library cataloguing experience. The Dublin Core elements have been amended to reflect current practice.
Dublin Core record:
Title: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web.
Title: (Subtitle) Universal Resource Identifiers in WWW
Creator: Berners-Lee, T.
Subject: IETF, URI, Uniform Resource Identifiers
Publisher: CERN
Date: 1994
Type: Internet RFC
Format (scheme=IMT): text/plain
Identifier (scheme=URL): gopher://gopher.es.net:70/0R0-57601-/pub/rfcs/rfc1630.txt
Relation (type=child)(identifier=URL): http://ds.internic.net/ds/dspg1intdoc.html
Relation (type=sibling)(identifier=URL): http://ds.internic.net/rfc/rfc1738.txt

IAFA / ROADS template record:
Author-Name: Berners-Lee, T.
Category: Internet RFC
Creation-Date: 1994
Format: text/plain
Keyword: IETF, URI, Uniform Resource Identifiers
Publisher-Name: CERN
Title: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web.
Title: Universal Resource Identifiers in WWW
Template-Type: DOCUMENT
URI-v1: gopher://gopher.es.net:70/0R0-57601-/pub/rfcs/rfc1630.txt
Notes:
Most of the record maps quite easily onto the ROADS template. The conversion software will somehow have to work out whether a DOCUMENT, SERVICE or other Template-Type is required.
The use of both Title and Title (subtitle) in Dublin Core is potentially confusing, and would result in two Title elements in ROADS templates. If, however, conversion software could recognise the (subtitle) qualifier, then it could conceivably add the relevant syntax:
Title: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web: Universal Resource Identifiers in WWW
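The subtitle handling suggested above could be sketched as follows. This is a hypothetical helper, not part of any existing conversion software: `merge_titles` assumes the titles have already been parsed into (qualifier, text) pairs, and it joins a subtitle to the preceding title with a colon after removing any trailing full stop.

```python
def merge_titles(titles):
    """Fold subtitles into the preceding title.

    titles is a list of (qualifier, text) pairs, where qualifier is
    "subtitle" or None. A subtitle with no preceding title is kept as a
    title in its own right.
    """
    merged = []
    for qualifier, text in titles:
        if qualifier == "subtitle" and merged:
            # Drop a trailing full stop or space before adding the colon.
            merged[-1] = f"{merged[-1].rstrip('. ')}: {text}"
        else:
            merged.append(text)
    return merged
```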
The Relation elements in Dublin Core are completely ignored.
Dublin Core record:
Title: On the Pulse of Morning
Author: Maya Angelou
Publisher: University of Virginia Library Electronic Text Center
Contributors: Transcribed by the University of Virginia Electronic Text Center
Date: 1993
Type: Poem
Format: 1 ASCII file
Source: Newspaper stories and oral performance of text at the presidential inauguration of Bill Clinton
Language: English

ROADS template record:
Author-Name: Maya Angelou
Category: Poem
Creation-Date: 1993
Format-v1: 1 ASCII file
Language-v1: English
Publisher-Name: University of Virginia Library Electronic Text Center
Source: Newspaper stories and oral performance of text at the presidential inauguration of Bill Clinton
Template-Type: DOCUMENT
Title: On the Pulse of Morning
Notes:
The DC "Contributors" element, without a further qualifier (e.g. publisher, compiler) does not map onto a ROADS attribute. In this case, however, this is not a major problem as the transcriber is also the publisher.
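For a simple record like this one, the whole conversion might be sketched as below. The sketch rests on several assumptions: the names `SIMPLE_MAP` and `convert_record` are hypothetical, only unqualified and unrepeated elements are handled, the Template-Type is supplied by the caller rather than worked out, and the output is sorted by attribute name to match the layout of the records shown here.

```python
# Hypothetical one-to-one attribute names, following the worked examples
# in this document. Elements not listed here are reported as dropped.
SIMPLE_MAP = {
    "Title": "Title",
    "Author": "Author-Name",
    "Creator": "Author-Name",
    "Publisher": "Publisher-Name",
    "Date": "Creation-Date",
    "Type": "Category",
    "Format": "Format-v1",
    "Source": "Source",
    "Language": "Language-v1",
}


def convert_record(dc_pairs, template_type="DOCUMENT"):
    """Convert (element, value) pairs into sorted ROADS pairs.

    Returns (roads, dropped), where dropped lists the Dublin Core pairs
    that have no entry in SIMPLE_MAP.
    """
    roads = [("Template-Type", template_type)]
    dropped = []
    for element, value in dc_pairs:
        target = SIMPLE_MAP.get(element)
        if target is None:
            dropped.append((element, value))
        else:
            roads.append((target, value))
    return sorted(roads), dropped
```

Returning the dropped pairs alongside the converted record makes the loss of the Contributors element visible, so that a cataloguer can decide whether it matters.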
A more complex record
Dublin Core record:
Title: Assessing Information on the Internet: Toward Providing Library Services for Computer Mediated Communication
Creator: Martin Dillon
Creator: Erik Jul
Creator: Mark Burge
Creator: Carol Hickey
Subject: scheme=LCSH:
  Internet (Computer network)
  Cataloging of computer files
  Information networks
  Computer networks
  Libraries--Communication systems
  Information storage and retrieval systems
Publisher: OCLC
Date: 1994
Type: ResearchPaper
Format: 7 postscript files
  1 Unix tar file
Identifier: Scheme=OCLC: 155653163X
Source: Martin Dillon, Erik Jul, Mark Burge and Carol Hickey. Assessing Information on the Internet: Toward Providing Library Services for Computer Mediated Communication. OCLC Technical Report Number, 1234567. Dublin, OH.:OCLC, 1993.
Language: English
Relation: For a Web page listing Internet accessible OCLC research publications go to: http://www.oclc.org/oclc/menu/reschdoc.htm

ROADS template record:
Author-Name: Carol Hickey
Author-Name: Erik Jul
Author-Name: Mark Burge
Author-Name: Martin Dillon
Category: monograph
Creation-Date: 1994
Format-v1: 7 postscript files, 1 Unix tar file
Language-v1: English
Publisher-Name: OCLC
Source: Martin Dillon, Erik Jul, Mark Burge and Carol Hickey. Assessing Information on the Internet: Toward Providing Library Services for Computer Mediated Communication. OCLC Technical Report, 1234567. Dublin, OH.:OCLC, 1993.
Subject-Descriptor-Scheme-v1: LCSH
Subject-Descriptor-v1: Cataloging of computer files
Subject-Descriptor-v1: Computer networks
Subject-Descriptor-v1: Information networks
Subject-Descriptor-v1: Information storage and retrieval systems
Subject-Descriptor-v1: Internet (Computer network)
Subject-Descriptor-v1: Libraries--Communication systems
Template-Type: DOCUMENT
Title: Assessing Information on the Internet: Toward Providing Library Services for Computer Mediated Communication
Notes:
The Relation element is missing from the ROADS template.
Caplan, P. and Guenther, R., 1996, Metadata for Internet resources: the Dublin Core Metadata Elements Set and its mapping to USMARC. Cataloging & Classification Quarterly, Vol. 22, nos. 3/4.
Day, M., 1997, Mapping between metadata formats.
URL:http://www.ukoln.ac.uk/metadata/interoperability/
Deutsch, P., Emtage, A., Koster, M. and Stumpf, M., 1994, Publishing information on the Internet with Anonymous FTP. IETF Internet Draft, September.
URL:http://info.webcrawler.com/mak/projects/iafa/iafa.txt
Heery, R., 1996, ROADS: Resource Organisation and Discovery in Subject-based Services. Ariadne, No. 3.
URL:http://ukoln.bath.ac.uk/ariadne/issue3/roads/
Knight, J.P. and Hamilton, M.T., 1996, Overview of the ROADS software. (LUT CS-TR 1010). Loughborough: Loughborough University of Technology, Department of Computer Studies, March.
URL:http://www.roads.lut.ac.uk/Reports/arch/arch.html
Kunze, J.A., 1996, Guide to Creating Core Descriptive Metadata. Draft 3, 18 September.
URL:http://www.ckm.ucsf.edu/meta/mguide3.html
Library of Congress, Network Development and MARC Standards Office, 1997, Dublin Core/MARC Crosswalk.
URL:http://lcweb.loc.gov/marc/dccross.html
MARBI, 1995, Mapping the Dublin Core Metadata Elements to USMARC. Discussion Paper, no. 86.
URL:gopher://marvel.loc.gov:70/00/.listarch/usmarc/dp86.doc
MARBI, 1997, Metadata, Dublin Core, and USMARC: a review of current efforts. Discussion Paper No. 99.
URL:gopher://marvel.loc.gov:70/00/.listarch/usmarc/dp99.doc
ROADS, 1995, Field descriptions for DOCUMENT, SOFTWARE, IMAGE, SOUND, VIDEO, MAILARCHIVE, USENET and FAQ IAFA Template types. From ROADS Manual.
URL:http://www.roads.lut.ac.uk/v1/IAFA-help/document.html
Weibel, S., Godby, J., Miller, E. and Daniel, R., 1995, OCLC/NCSA Metadata Workshop report.
URL:http://www.oclc.org:5046/conferences/metadata/dublin_core_report.html
Weider, C., 1994, The Internet anonymous FTP archive templates: towards an Internet resource location system. Journal of Information Networking, Vol. 1, no. 3, pp. 256-260.
UKOLN is funded by the British Library Research and Innovation Centre, the Joint Information Systems Committee of the UK Higher Education Funding Councils, as well as by project funding from JISC's eLib Programme and the European Union. UKOLN also receives support from the University of Bath, where it is based.
This document is a revision of an earlier draft (November 1996) dealing with mapping from the original proposed Dublin Core elements (1995) to ROADS/IAFA templates. Anyone interested in the earlier form of this document should look at: <URL:http://www.ukoln.ac.uk/metadata/interoperability/dcv1_iafa.html>.
For mappings from ROADS/IAFA to Dublin Core, see <URL:http://www.ukoln.ac.uk/metadata/interoperability/iafa_dc.html>.
This work was carried out for the Resource Organisation And Discovery in Subject-based services (ROADS) project funded by the Electronic Libraries (eLib) Programme. More information on ROADS can be found on the project's Web pages: <URL:http://www.ilrt.bris.ac.uk/roads/>
Maintained by: Michael Day of UKOLN The UK Office for Library and Information Networking, University of Bath.
Document created: 3-Nov-1997
Last updated: 12-Aug-1998