A review of metadata: a survey of current resource description formats
Work Package 3 of Telematics for Research project DESIRE (RE 1004)
Note: See also the entry for MARC.
The record structure of USMARC, as for all MARC formats, adheres to ISO 2709: 1981. This appears also as a national standard in the US, ANSI Z39.2.
The content of the USMARC record is not governed by an international standard, but by a cataloguing manual, the USMARC manual, issued by the Library of Congress. Additions and amendments are controlled by the Library of Congress on the advice of the US MARC Advisory Group. This Group is made up of MARBI (the American Library Association's Machine-Readable Bibliographic Information Committee) and representatives from the US National Libraries, the National Library of Canada, the National Library of Australia, large bibliographic utilities (OCLC and RLIN), special library associations and library system vendors. The Library of Congress regularly publishes discussion documents and proposals for comment. These are considered at the twice-yearly MARC Advisory Group meetings and, if agreed, are published as occasional updates to the MARC format.
Traditionally USMARC has been used by the library community in the US. Over recent years the national libraries of Canada and Australia have also adopted USMARC rather than maintaining separate formats. In addition the British Library have agreed on a timescale for a convergence programme with USMARC (see UKMARC section for details). Within the US the dominant bibliographic utility, OCLC, uses USMARC with some added variants. Widespread use of USMARC in the US enables sharing of cataloguing effort between the majority of academic and public libraries.
Over the last five years the USMARC community has been considering and adopting changes to the format to allow for cataloguing of electronic networked resources. At the start this was seen as an extension of the work done in the 1980s to describe computer files. The formats had become increasingly inadequate as a means of describing networked resources which needed to include details of access methods and addresses. After considerable debate within the USMARC community a new field, 856, has been adopted for the location of electronic resources. Guidelines for its use have been issued by the Library of Congress. Changes for describing online services are still under consideration.
The Intercat project has been instrumental in progressing format development in this area. This project for cataloguing Internet resources started in July 1995, and its timescale has recently been extended. It is led by OCLC with partial funding from the US Department of Education. The project involves 200 participating libraries, more than 60% of which are academic libraries; almost all active participants are in the US.
The Intercat project has catalogued a total of approximately 6000 resources to date. This relatively small number (by comparison, during a similar period OCLC's NetFirst database added 50,000 records, and webcrawler global search services many millions) reflects the co-operating libraries' selection criteria. Libraries select only those resources they wish to integrate with their MARC library catalogues, and only if the resources are of sufficient quality and stability to warrant the effort of cataloguing. The project is by no means an attempt to 'catalogue the Internet'. There have been extended discussions on the Intercat mailing list about the criteria for selection of material. Many libraries intend to include in their catalogue only locally available resources, i.e. those electronic resources held on a local server which are 'owned' or 'managed' by their own institution.
The encoding of USMARC adheres to ISO 2709 in the same way as other MARC formats. The specific USMARC format is governed by a 'de facto' standard in the form of the USMARC manual produced by the Library of Congress. There is no move to formalise this as an international standard.
USMARC is an implementation of ISO 2709. USMARC records are written in the extended ASCII character set. Each record consists of a leader, a directory and the data content fields. The leader consists of fixed fields containing coded data that define parameters for processing the record (such as the length of the directory entry). The directory contains entries listing the tag, starting location and length of each field in the record. The data content of the record is contained in fields of two types: variable control fields (fixed fields) and variable data fields.
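The leader, directory and field layout described above can be illustrated with a minimal Python sketch. It is not taken from the USMARC manual: the function names, the sample field values and the filler characters in the leader are assumptions made for illustration, and the control characters (hex 1E, 1D and 1F) and the 24-character leader with 12-character directory entries reflect the usual reading of ISO 2709. The sketch handles ASCII data only, so character counts equal byte counts.

```python
FIELD_TERMINATOR = "\x1e"
RECORD_TERMINATOR = "\x1d"
SUBFIELD_DELIMITER = "\x1f"
LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12  # tag (3) + field length (4) + starting position (5)


def build_record(fields):
    """Assemble a record from (tag, data) pairs; data excludes the terminator."""
    directory = ""
    body = ""
    for tag, data in fields:
        field = data + FIELD_TERMINATOR
        # Each directory entry lists the tag, the field length and its start.
        directory += f"{tag}{len(field):04d}{len(body):05d}"
        body += field
    directory += FIELD_TERMINATOR
    base_address = LEADER_LENGTH + len(directory)
    record_length = base_address + len(body) + 1  # +1 for the record terminator
    # Leader: record length, status/type codes (illustrative), indicator and
    # subfield code counts ("22"), base address of data, and entry map "4500".
    leader = f"{record_length:05d}nam  22{base_address:05d}   4500"
    return leader + directory + body + RECORD_TERMINATOR


def parse_record(record):
    """Split a record back into its leader and a list of (tag, data) pairs."""
    leader = record[:LEADER_LENGTH]
    base_address = int(leader[12:17])
    # The directory ends with a field terminator just before the base address.
    directory = record[LEADER_LENGTH:base_address - 1]
    fields = []
    for i in range(0, len(directory), DIRECTORY_ENTRY_LENGTH):
        entry = directory[i:i + DIRECTORY_ENTRY_LENGTH]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        begin = base_address + start
        fields.append((tag, record[begin:begin + length - 1]))
    return leader, fields


# Hypothetical sample: one control field and one data field with
# two indicator characters followed by a single subfield $a.
sample_fields = [
    ("001", "demo0001"),
    ("245", "10" + SUBFIELD_DELIMITER + "aA sample title"),
]
record = build_record(sample_fields)
leader, parsed = parse_record(record)
```

The directory is what lets a program locate any field without scanning the data: the parser reads only the leader and the 12-character entries before slicing out each field.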
USMARC formats are defined for three data types: bibliographic, holdings and authority records. This report will deal with the bibliographic record only. The USMARC Format for Bibliographic Data is designed for the description of different forms of bibliographic material: books, archives and manuscripts, computer files, maps, music, visual materials, serials. At one stage separate formats existed for each material type, but these have now been integrated.
Data in the record is contained in fields identified by a three-digit tag. Fields containing data with a similar function are organised into groups identified by the first digit of the tag:
· 0XX control numbers, provenance
· 1XX main entry
· 2XX titles and related information
· 3XX physical description
· 4XX series statements
· 5XX notes
· 6XX subject access
· 7XX added entries; linking fields
· 8XX series added entries
· 9XX reserved for local fields
The remaining numbers in the tags indicate further sub-division of content, and in general parallel content designation is preserved across the groups e.g.
· X00 personal author
· X10 corporate name
· X11 meeting name
· X30 uniform title
Further content designation is identified by a two-character indicator following the tag, and by two-character sub-field markers. Within this scheme the digit 9 is used to indicate a local implementation. Most fields, and some sub-fields, can be repeated.
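As a rough illustration of the tag scheme, the following Python sketch looks up the group from the first digit of a tag and the parallel content designation from the remaining two digits. The function and table names are hypothetical. Because the parallel designation holds only 'in general', the sketch assumes it applies to the 1XX, 6XX, 7XX and 8XX groups; that restriction is an assumption of this example, not a rule stated here.

```python
TAG_GROUPS = {
    "0": "control numbers, provenance",
    "1": "main entry",
    "2": "titles and related information",
    "3": "physical description",
    "4": "series statements",
    "5": "notes",
    "6": "subject access",
    "7": "added entries; linking fields",
    "8": "series added entries",
    "9": "reserved for local fields",
}

PARALLEL_DESIGNATION = {
    "00": "personal author",
    "10": "corporate name",
    "11": "meeting name",
    "30": "uniform title",
}

# Assumption for this sketch: only these groups carry the X00/X10/X11/X30 pattern.
HEADING_GROUPS = {"1", "6", "7", "8"}


def describe_tag(tag):
    """Return (group, designation) for a three-digit tag; designation may be None."""
    if len(tag) != 3 or not tag.isdigit():
        raise ValueError(f"not a three-digit tag: {tag!r}")
    group = TAG_GROUPS[tag[0]]
    designation = None
    if tag[0] in HEADING_GROUPS:
        designation = PARALLEL_DESIGNATION.get(tag[1:])
    return group, designation
```

For instance, `describe_tag("100")` and `describe_tag("700")` both report a personal author, once as a main entry and once as an added entry, which is the parallelism the scheme is designed to preserve.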
The 0XX fields in the USMARC record contain fixed length data for information such as material type, date of publication, form, language.
The MARC record is highly developed for bibliographic and bibliographic-like data.
USMARC developed in the context of library cataloguing. It therefore deals with the various bibliographic data elements in a detailed way. However, it is important to note that the content of fields is governed by cataloguing rules. USMARC is designed to provide a formatted display of 'a catalogue card' giving a description of the resource, as well as to provide access for the purposes of information retrieval. The rules for the content within fields are governed by the Anglo-American Cataloguing Rules and the ISBD. So the content of the 1XX and 7XX tag ranges is defined in terms of the cataloguing concepts of main and added entries and, depending on the relationship with the work, an author might appear in one range or the other. Similarly, an author could be defined as a personal author, corporate author or meeting name, and this will affect the indicator value.
In addition USMARC format has implications for the authority control of the data content. Normally the data in fields 1XX, 4XX, 6XX, 7XX, 8XX will be subject to authority control.
In order to integrate cataloguing of network resources into existing legacy USMARC databases it is necessary to create bibliographic data elements according to AACR2. As the Intercat project has discovered, this requires further extension of the cataloguing rules and extended guidelines if it is to be done in a standardised way.
Although library cataloguing data is still by far the most predominant use for USMARC, there are possibilities for use as a 'vehicle' for metadata created to other standards such as GILS or Dublin Core. MARBI Discussion Paper No. 88 raises some of the problems in such attempts, and in particular looks at the problem of defining a generic author field in USMARC.
Specific 6XX tags are used for different controlled subject heading schemes e.g. Library of Congress subject headings, MeSH etc.
The 856 field has been approved for URIs. This field is designed to contain location and access information to make a connection, locate and retrieve an electronic document. Guidelines for the use of the 856 field have been issued by MARBI. This field may be repeated, and more than one access method may be used.
The 856 field is a structured field with subfields describing the method of access, and it can be repeated to allow for different access methods. The 856 indicator gives the mode of access over the network (e-mail, FTP, Telnet, dial-up); if the mode is none of these (e.g. HTTP, gopher, WAIS, Prospero), it can be defined in a subfield. Within the 856 field, descriptions of access methods other than those taken from the indicator follow the controlled vocabulary for Internet media types (also known as MIME types).
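The choice between indicator and subfield can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the specific codes used (first indicator 0 for e-mail, 1 for FTP, 2 for Telnet, 3 for dial-up, 7 for a method named in subfield $2, with the address in subfield $u) are assumptions based on the commonly documented 856 layout and should be checked against the Library of Congress guidelines. The '#' in the display form stands for a blank second indicator.

```python
# Assumed first-indicator codes for the four access methods named in the text.
INDICATOR_BY_METHOD = {
    "e-mail": "0",
    "ftp": "1",
    "telnet": "2",
    "dial-up": "3",
}
METHOD_IN_SUBFIELD = "7"  # assumed code meaning "method given in a subfield"


def build_856(method, address):
    """Return (first_indicator, subfields) for one 856 occurrence."""
    method = method.lower()
    indicator = INDICATOR_BY_METHOD.get(method, METHOD_IN_SUBFIELD)
    subfields = [("u", address)]
    if indicator == METHOD_IN_SUBFIELD:
        # Methods such as HTTP or gopher are named in a subfield instead.
        subfields.append(("2", method))
    return indicator, subfields


def display_856(indicator, subfields):
    """Render the field in a simple mnemonic form for inspection."""
    parts = " ".join(f"${code} {value}" for code, value in subfields)
    return f"856 {indicator}# {parts}"


# The field is repeatable, so one resource can list several access methods.
occurrences = [
    build_856("FTP", "ftp://ftp.example.org/pub/report.txt"),
    build_856("HTTP", "http://www.example.org/report.html"),
]
lines = [display_856(ind, subs) for ind, subs in occurrences]
```

Keeping each access method in its own occurrence of the field, rather than packing several addresses into one, matches the repeatability described above.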
Non-bibliographic data is unstructured and tends to be placed in notes fields.
The MARBI Discussion Paper No. 49 presented a preliminary list of data elements to describe network information resources. This is developed in MARBI Discussion Paper No 54 (Providing Access to Online Information Resources). These papers map the required data elements onto USMARC fields and subfields.
For example there are proposals for:
· Type of resource: 256 $a File characteristics
· Frequency of update: 310 $a Current frequency
· Other providers of a database: 582 $a Related Computer File Note
The Intercat project has also involved discussion on its mailing list of the use of various other fields in this context. For example:
· Detailed contents, e.g. list of web links: 505 $a Contents note
· record review date no
· record creation date (in record label?)
Not relevant.
There are proposals to use the following fields:
· Access restriction notes: 506 $a Restrictions on access note
· Mode of connection and resource address: 538 $a Technical Details note
· Host administrative details contact: 856 $m
USMARC uses tagged links to indicate relationships between parts of a collected work. Tags can also be used to specify other relationships e.g. continued as; replaced by.
Allows for rich descriptions and detailed structure. See the entry for MARC.
Widely implemented and deployed. USMARC also influences other MARC formats.