Metadata: an overview of current resource description practice. Work Package 3 of Telematics for Research project DESIRE (no. 1004)
The Inter-university Consortium for Political and Social Research (ICPSR) established a committee in May 1995 to develop a structured standard to describe social science data sets. The committee was a response to a perceived need amongst the social science archive community for an international codebook standard (a codebook generally contains information on the structure, contents, and layout of a datafile or data set).
Information documenting the proposed DTD (Document Type Definition) and content for the codebook standard can be found at <URL: http://www.lib.umich.edu/codebook.html>.
The standard is still being formulated. The committee will meet in October 1996 to agree on a final draft, with the intention that implementations will begin before the end of the year.
The ICPSR is an international organisation with membership from 325 colleges and universities in North America and several hundred institutional members in Australia, Denmark, France, Germany, Great Britain, Hungary, Israel, the Netherlands, Norway, South Africa and Sweden. The codebook committee was established to be representative of all the archives and includes a representative from CESSDA (Council of European Social Science Data Archives), as well as representatives from Canada, Denmark, Norway and Germany. The elements for the codebook were chosen by reviewing a series of guidelines and standards in use by the social science survey, research, archive, and technical communities. The lists below include some of the materials that were examined:
Guidelines that prescribe what the codebook itself should contain (content standards):
Roistacher: 1980, A Style Manual for Machine-Readable Data Files
Geda: 1980, Data Preparation Manual (ICPSR)
Collins, Patrick and Jane Powers, 1991, The preparation of data standards for machine-readable data.
National Data Archive on Child Abuse and Neglect (Cornell University)
US Bureau of the Census, Statistical Research Division, Statistical Design and Methods Extension to Cultural and Demographic Data Metadata: CDDM draft standard 1995.
Federal Geographic Data Committee content standards for digital geospatial metadata
Standards that define how to describe the study:
Standard Study Description: developed by and for data archives, Council of European Social Science Data Archives.
ICPSR Study Description "Template" Manual
Essex Study Description outline (based on the Standard Study Description)
Standards that establish rules for producing records for cataloguing:
MARC
ISBD-CF: The International Standard Bibliographic Description for Computer Files
GILS: Government Information Locator Service
ISO: International Organization for Standardization: ISO 690-2
Dublin Core: OCLC/NCSA Metadata Workshop recommendations
Descriptions of codebook elements produced as a by-product of computerised interviewing software:
Health and Welfare Canada
Computer Assisted Survey Methods, University of California, Berkeley
Standards that establish rules for tagging the contents of the codebook text:
OSIRIS
TEI: Text Encoding Initiative DTD for SGML
EAD: Encoded Archival Description DTD for SGML
The standard is still in the development phase, but the indications are that the initiative has wide support amongst the social science data archives. The ICPSR also hopes that data producers and granting agencies will adopt the standard.
There are 5 main sections in the proposed structure:
Codebook header
Study description
Data files description
Record and variable description
Other study-related materials
Each of the 5 main sections contains further sub-sections and elements.
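As a rough illustration of the five-part structure listed above, the sections could be modelled as a simple container. This is a minimal sketch only: the class name `Codebook`, the field names and the helper `empty_sections` are assumptions made for the example and are not taken from the proposed DTD.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Codebook:
    """Hypothetical container mirroring the five main sections."""
    # Field names are illustrative; the DTD's own element names may differ.
    codebook_header: dict = field(default_factory=dict)
    study_description: dict = field(default_factory=dict)
    data_files_description: dict = field(default_factory=dict)
    record_and_variable_description: dict = field(default_factory=dict)
    other_study_related_materials: dict = field(default_factory=dict)


def empty_sections(codebook):
    """Return the names of the main sections that hold no elements yet."""
    return [f.name for f in fields(codebook) if not getattr(codebook, f.name)]


# Example: a codebook for which only part of the study description is filled in.
example = Codebook(study_description={"title": "Sample study"})
remaining = empty_sections(example)
```

Here `remaining` lists the four sections that still have to be completed, which is the kind of check an archive might run before accepting a codebook.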
The basic bibliographic elements of the data set are described in section 2 Study description under the sub-section Citation:
Title statement of data set
title
subtitle
parallel title
common abbreviation
study number - producer
study number - archive
The description of subject is dealt with in section 2 - Study description under the sub-section Study scope:
Subject information
keywords
topic classification
None
The format of the data set is dealt with in section 3 Data files description:
Type of file - text, numerical, graphic, program source, etc.
These are provided for in section 2 - Study description under the sub-section Citation:
Distributor statement for data set
documentation distributor
contact persons
depositor
date of deposit
date of distribution
All administrative information is provided in section 1 - Codebook header. Sub-sections here include:
Title statement for documentation
Responsibility statement for documentation
Production statement for documentation
Distributor statement for documentation
Series statement for documentation
Version statement for documentation
Bibliographic citation of documentation
The source of the data set is provided in section 2 Study description under sub-section Citation, elements include:
Production statement for data set
producer
date of production
place of production
This information is provided in Section 2 - Study description under sub-section Data access:
Data set availability
original archive where study stored
collection note
extent of collection
completeness of study stored
number of files
Data use statement
restrictions
access authority
citation requirement
disclaimer
analysis conditions
other reanalysis conditions note
An SGML DTD has been proposed. Codebooks encoded into SGML could also be used for the production of data definition statements for use by statistical analysis software such as SAS or SPSS. There is also a proposal to produce a TEI compliant base tag set.
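To give an idea of how data definition statements might be derived from a codebook, the sketch below turns a few variable descriptions into SPSS `DATA LIST` and `VARIABLE LABELS` commands. It assumes that each variable description supplies a name, a column range in a fixed-format file and a label. The `Variable` class, the function `spss_data_definition` and the file name `survey.dat` are hypothetical and do not come from the proposed standard, and no SGML parsing is attempted.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    """Hypothetical variable description as it might be read from a codebook."""
    name: str
    start_col: int
    end_col: int
    label: str


def _columns(variable):
    """Format a column position as 'n' or 'n-m' for a fixed-format DATA LIST."""
    if variable.start_col == variable.end_col:
        return str(variable.start_col)
    return f"{variable.start_col}-{variable.end_col}"


def spss_data_definition(data_file, variables):
    """Build SPSS DATA LIST and VARIABLE LABELS commands as one string."""
    column_spec = " ".join(f"{v.name} {_columns(v)}" for v in variables)
    # Apostrophes inside a quoted SPSS label are written twice.
    label_spec = "\n  /".join(
        "{} '{}'".format(v.name, v.label.replace("'", "''")) for v in variables
    )
    return (
        f"DATA LIST FILE='{data_file}' FIXED\n"
        f"  /{column_spec}.\n"
        f"VARIABLE LABELS\n"
        f"  {label_spec}."
    )


variables = [
    Variable("AGE", 1, 2, "Age of respondent"),
    Variable("SEX", 3, 3, "Respondent's sex"),
]
syntax = spss_data_definition("survey.dat", variables)
print(syntax)
```

A comparable function could emit SAS `INPUT` and `LABEL` statements from the same variable descriptions, which is the attraction of keeping the codebook in a structured, software-neutral form.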
Details of language can be found in Section 2 Study description:
Documentation statement
Language(s) of written materials
There are fields for citing bibliographic information about and/or links to related materials and studies.
Full - provides a very rich and comprehensive description of data sets.
No protocols have yet been specified for this format, but the committee is looking at the possibility of using Z39.50.
This is a proposed standard. The developers have applied the DTD to some sample codebooks, but it is not yet in use.