Metadata: an overview of current resource description practice
Work Package 3 of Telematics for Research project DESIRE (no. 1004)
Much of the driving force behind the development of the templates came from private companies, in particular from Bunyip as part of their development of internet navigational tools and directory services, and from Martijn Koster at Nexor as a personal initiative. The aim of the IAFA template designers was to construct a record format which ftp archive administrators could use to describe the various resources available from their own archives.
The original intention was that each ftp site administrator would be responsible for ensuring that IAFA templates were available for each file on their archive. This information would be available to individuals visiting the archive and, if ftp archive sites followed a common set of indexing and cataloguing guidelines, software (such as Harvest) could pick up the records automatically. This is in fact happening in some implementations of the IAFA/whois++ templates, although in others records are being created centrally. The recently developed directory service software, whois++, allows search and retrieval of databases created in this way, and also offers the possibility of searching across multiple databases.3 Experimental work is being done using the Common Indexing Protocol (CIP), which gathers a 'centroid' or summary from each of a number of databases to form an 'index server'. The index server contains an index of all unique attribute values contributed by the centroids, and searches can be referred from one index server to another by interlinking the servers in a mesh.4
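The centroid and index server idea can be illustrated with a minimal sketch. It is not the CIP or whois++ wire protocol: the record layout (lists of attribute/value pairs), the server names and the function and class names are all assumptions made for illustration, and whole attribute values are indexed rather than individual words.

```python
from collections import defaultdict


def build_centroid(records):
    """Summarise a database as the set of unique values seen for each attribute."""
    centroid = defaultdict(set)
    for record in records:
        for attribute, value in record:
            centroid[attribute.lower()].add(value.lower())
    return centroid


class IndexServer:
    """Hypothetical index server: maps (attribute, value) to contributing servers."""

    def __init__(self):
        self.index = defaultdict(set)

    def add_centroid(self, server_name, centroid):
        # Record which server contributed each unique attribute value.
        for attribute, values in centroid.items():
            for value in values:
                self.index[(attribute, value)].add(server_name)

    def refer(self, attribute, value):
        # Return the servers to which a search on this value should be referred.
        key = (attribute.lower(), value.lower())
        return sorted(self.index.get(key, set()))


# Two illustrative databases, each a list of records.
economics_db = [[("Template-Type", "DOCUMENT"), ("Keywords", "econometrics")]]
computing_db = [[("Template-Type", "DOCUMENT"), ("Keywords", "parallel")]]

index = IndexServer()
index.add_centroid("economics", build_centroid(economics_db))
index.add_centroid("computing", build_centroid(computing_db))

print(index.refer("Keywords", "parallel"))
print(index.refer("Template-Type", "DOCUMENT"))
```

The index server holds only the summaries, so a search is answered with a referral to the databases that can satisfy it rather than with the records themselves.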
Supporters of IAFA templates have widened the original aim, and the intention now is to devise a record format simple enough to be generated by the wide variety of individuals and organisations involved with creating resources on the Internet, whether on web servers or ftp archives. The underlying philosophy is that it must be the information providers who create metadata records if indexing of the Internet is to be a viable proposition. Given the instability of network resources, the alternative of creating records centrally would be a high cost option.
Each record can have only one template type, but any of the other data elements can be repeated. It is intended that template types and data elements should be extensible, although extensions would not be inter-operable unless agreed between implementations.
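As a rough illustration of the one-template-type rule, the sketch below reads a record written as plain 'Name: value' lines. The record syntax, the attribute names in the example and the function name parse_record are assumptions for illustration; continuation lines are not handled.

```python
def parse_record(text):
    """Split a plain-text record into its template type and other attributes.

    Attributes are returned as (name, value) pairs so that repeated
    elements are kept in order of appearance rather than overwritten.
    """
    template_type = None
    attributes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name.lower() == "template-type":
            if template_type is not None:
                raise ValueError("a record can have only one template type")
            template_type = value
        else:
            attributes.append((name, value))
    if template_type is None:
        raise ValueError("record has no template type")
    return template_type, attributes


# Illustrative record; the attribute names and values are made up.
record = """Template-Type: DOCUMENT
Title: An example report
URI: http://example.org/report.txt
"""

template_type, attributes = parse_record(record)
print(template_type)
print(attributes)
```

Splitting on the first colon only means that values which themselves contain colons, such as URLs, are kept whole.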
Every time an individual or organisation occurs in a record, a number of common data elements are required to describe them, e.g. name, address, telephone number, e-mail address. These logically grouped data elements are termed clusters in the guidelines, and can be used to save indexing time by creating the details once and then referring to them by a unique handle. The IAFA guidelines define the content of clusters for both individuals and organisations. How the cluster information is incorporated into the record depends on the implementation. Further proposals to extend the use of clusters have been circulated by Bunyip as part of the development of more detailed White Pages templates for use with the whois++ protocol. These proposals suggest definitions of further clusters at a lower level for names, phone numbers and addresses. In addition it is proposed that all clusters would include record management details.
Each record and cluster within the database is identified by a string of characters and/or digits unique within the system on which it resides.
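One way an implementation might incorporate cluster details into a record is sketched below. The guidelines leave this to the implementation, so everything here is an assumption: the 'Role-Handle' attribute naming, the cluster field names, the handle value and the function name expand_clusters.

```python
def expand_clusters(record, clusters):
    """Replace each '<Role>-Handle' attribute with the referenced cluster's details."""
    expanded = {}
    for name, value in record.items():
        role, separator, field = name.rpartition("-")
        if separator and field == "Handle":
            if value not in clusters:
                raise KeyError(f"unknown cluster handle: {value}")
            # Prefix every cluster field with the role, e.g. Author-Name.
            for cluster_field, cluster_value in clusters[value].items():
                expanded[f"{role}-{cluster_field}"] = cluster_value
        else:
            expanded[name] = value
    return expanded


# Details for a person are entered once and stored under a unique handle.
clusters = {
    "jsmith01": {"Name": "J. Smith", "Email": "j.smith@example.org"},
}

# The record refers to the person by handle instead of repeating the details.
record = {
    "Template-Type": "DOCUMENT",
    "Title": "An example report",
    "Author-Handle": "jsmith01",
}

print(expand_clusters(record, clusters))
```

Because the details live in one place, a change of address or telephone number needs to be made only in the cluster, not in every record that refers to it.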
Certain attributes can be repeated. Within the IAFA definition this is done through the mechanism of variants. The first occurrence of an attribute is variant-1, the second variant-2. Related groups of attributes that are repeated are linked by the variant number, for example:
class-v1
class-scheme-v1
class-v2
class-scheme-v2
Within the whois++ schema the order in which attributes are stored is significant and links are maintained in this way.
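The variant numbering lends itself to simple mechanical grouping. The sketch below collects attributes that share a variant number, using the class and class-scheme names from the example above; the attribute values, the '-vN' suffix pattern as written in the regular expression and the function name group_variants are assumptions for illustration.

```python
import re
from collections import defaultdict

# Matches names such as 'class-v1' or 'class-scheme-v2'.
VARIANT_SUFFIX = re.compile(r"^(?P<name>.+)-v(?P<number>\d+)$")


def group_variants(attributes):
    """Separate shared attributes from those linked by a variant number."""
    shared = {}
    variants = defaultdict(dict)
    for name, value in attributes:
        match = VARIANT_SUFFIX.match(name)
        if match:
            variants[int(match["number"])][match["name"]] = value
        else:
            shared[name] = value
    return shared, dict(sorted(variants.items()))


# Illustrative values only; the classification numbers are made up.
attributes = [
    ("Title", "An example report"),
    ("class-v1", "330"),
    ("class-scheme-v1", "DDC"),
    ("class-v2", "HB"),
    ("class-scheme-v2", "LCC"),
]

shared, variants = group_variants(attributes)
print(shared)
print(variants)
```

Each classification number stays paired with its own scheme because both carry the same variant number, regardless of the order in which the attributes are stored.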
Effort has been made to ensure the templates are 'human readable' which means less processing is required to make the data understandable. This helps to ensure there is a low entry cost to implement the templates. Attribute names are therefore written in full.
The IAFA templates distinguish persons and organisations by prefixing the name of the user (person) or organization cluster with the role, for example:
Author-(USER*)
Publisher-(ORGANIZATION*)
Template types for describing resources include:
Document
Dataset
Mailing list archive
Usenet archive
Software package
Image
Other template types are designed for use in the context of ftp archives to provide information about a particular ftp site:
Site configuration information
Logical archives configuration
Service (e.g. on-line catalogues, information servers)
Mirror (details of sites which mirror files including information on frequency of update from the source)
The configuration files would be relevant for the automatic collection of records, and in a broader context, the service template would be used to describe free-standing resources.
Templates for 'document like objects' include attributes for the size, format, character set and method of access. The guidelines set down that different versions of the same resource are described as variants. If a resource has 'the same intellectual content' it is taken to be the same resource regardless of language or text format (ASCII, Adobe, PostScript, etc.).
Rules for the content of attribute values are drawn from several sources:
e-mail addresses: RFC 822
host names: RFC 1034
host IP addresses: defined in guidelines
numeric values: defined in guidelines
dates/times: RFC 822 amended by RFC 1123
telephone numbers: defined in guidelines
latitude/longitude: defined in guidelines
personal names: BibTeX (see separate entry for BibTeX)
formats of resource: RFC 1521
The diverse locations of these rules, and the relative lack of detail compared to traditional cataloguing manuals, will inevitably lead to inconsistencies in practice. It remains to be seen whether the indexing and retrieval software can ameliorate the inconsistencies or whether 'simplified cataloguing rules' will need to be drawn up.
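Indexing software could catch some of these inconsistencies by checking values against the cited rules. The sketch below checks dates with the Python standard library's RFC 822 style date parser and host names against a strict reading of the RFC 1034 label syntax. The attribute names, the sample record and the function names are hypothetical, and the checks are deliberately simple rather than complete implementations of the RFCs.

```python
import re
from email.utils import parsedate_to_datetime

# One host name label: starts with a letter, ends with a letter or digit,
# letters, digits and hyphens in between, at most 63 characters.
LABEL = re.compile(r"[A-Za-z]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_hostname(value):
    """Check a host name against a strict reading of the RFC 1034 syntax."""
    if len(value) > 255:
        return False
    labels = value.rstrip(".").split(".")
    return all(LABEL.fullmatch(label) for label in labels)


def is_date(value):
    """Check that a value parses as an RFC 822 style date and time."""
    try:
        parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return False
    return True


def failed_attributes(record, validators):
    """List the attributes whose values do not satisfy their format rule."""
    return [
        name
        for name, value in record.items()
        if name in validators and not validators[name](value)
    ]


# Hypothetical attribute names mapped to the rule each should follow.
validators = {"Last-Revision-Date": is_date, "Admin-Host": is_hostname}

record = {
    "Title": "An example report",
    "Last-Revision-Date": "Wed, 01 May 1996 12:00:00 GMT",
    "Admin-Host": "archive_1.example.org",
}

print(failed_attributes(record, validators))
```

Checks of this kind only catch malformed values; they cannot tell whether a well-formed date or host name is the right one, which is where cataloguing rules would still be needed.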
As yet there is no agreed mechanism for controlling amendments and additions to the template structure. Establishing a means to communicate and control changes to the templates would be an essential step in the move towards a standard. Until then the tendency is for attributes to proliferate and for the overall structure to remain unstable.
Bunyip are now leading development of a whois++ White Pages directory system. Within the eLib framework, three projects are so far using the ROADS software: SOSIG (Social Science Information Gateway) <URL:http://sosig.ac.uk>, OMNI (Medical Information Gateway) <URL:http://omni.ac.uk> and ADAM (Art, Design, Architecture and Media) <URL:http://adam.ac.uk>. ROADS uses IAFA templates for description of resources, and the current release (version 1, in beta test May 1996) incorporates the whois++ protocol.
Within the UK there are also other implementations. The Internet Parallel Computing Archive (IPCA) at the University of Kent uses IAFA templates for a database containing information on parallel computing.5 At the University of Manchester, a volunteer effort, NetEc, provides a database of resources in economics using the IAFA template as the basis for the record structure <URL:http://cs6400.mcc.ac.uk/NetEc.html>.
2. Bunyip have issued a draft outline of the whois++ template. URL: ??check article in Ariadne
3. RFC 1835. P. Deutsch et al. Architecture of the whois++ service. IETF, August 1995 http://
4. RFC 1914. P. Faltstrom, R. Schoultz & C. Weider. How to Interact with a Whois++ Mesh. IETF Proposed standard protocol, February 1996
5. David Beckett. IAFA templates in use as Internet metadata. <URL:http://www.w3.org/pub/Conferences/WWW4/Papers/52/>