Discovering Online Resources Across the Humanities | Reports From the Front: Domain-Specific Perspectives on Cross-Domain Discovery [1997]

Discovering Online Resources. Reports From the Front:
Domain-Specific Perspectives on Cross-Domain Discovery

Introduction
UKOLN MODELS 4: evaluation of cross-domain resource discovery, Rosemary Russell, UK Office for Library and Information Networking
Archaeology Data Service: evaluation of resource discovery metadata for archaeological resources, Paul Miller, Archaeology Data Service
History Data Service: evaluation of resource discovery metadata for historical data sets, Cressida Chappell, History Data Service
Oxford Text Archive: evaluation of resource discovery metadata for electronic texts and linguistic corpora, Michael Popham, Alan Morrison, Jakob Fix, Oxford Text Archive
Performing Arts Data Service: evaluation of resource discovery metadata for moving image resources, Celia Duffy, Performing Arts Data Service
Performing Arts Data Service: evaluation of resource discovery metadata for sound resources, Celia Duffy, Performing Arts Data Service
Visual Arts Data Service: evaluation of resource discovery metadata for the visual arts, museums, and cultural heritage communities, Catherine Grout, Visual Arts Data Service

1 Introduction

The Dublin Core evaluation process undertaken by the AHDS and UKOLN involved a series of workshops held between December 1996 and May 1997. The first of these involved representatives from a wide range of academic, computing, and curatorial communities who explored whether and how access might be provided to networked scholarly information resources irrespective of their intellectual content and of where, how, and by whom they are managed. Addressing these general issues, it established a framework which helped to focus six further and more specialist workshops, each of which formally evaluated the Dublin Core against the resource discovery requirements of a particular arts and humanities community and the information resources of interest to that community. Between them, the specialist workshops represented most of the subject, media, and curatorial perspectives of interest to the arts and humanities. In addition, their use of the Dublin Core as a touchstone in the evaluation process ensured that a common formalism was used to express different resource discovery requirements. Accordingly, the survey produced compatible results and enabled the AHDS and UKOLN to identify both consensus and conflict in the resource discovery needs of very large and substantial communities. A final meeting of workshop convenors was held to resolve conflicts, and thus to develop a unifying approach to resource discovery. That approach, based upon the Dublin Core, is reported in Chapter 3. The present chapter summarises the more focused evaluations of resource discovery requirements upon which it is built. The full text of the workshop reports is available online via the AHDS Web site at http://ahds.ac.uk.

2 UKOLN MODELS 4: evaluation of cross-domain resource discovery

Rosemary Russell, UK Office for Library and Information Networking

2.1 The Models 4 workshop: integrating access to resources across domains

This first workshop in the AHDS/UKOLN series involved nearly 50 participants representing a range of curatorial professions (e.g. libraries, museums, archives, and data archives), systems experts, and scholarly users in a discussion of whether and how to integrate access to information resources across domains. The topic reflected the hypothesis that people want to search for and locate relevant information resources, irrespective of their format and of whether they are held in libraries, museums, data archives, or any other organisational structure. The workshop set out to test the hypothesis, and if found to be true, to explore how resources and systems might be structured in order to realise it in information service environments.

The workshop was the fourth in a series of MODELS workshops (MOving to Distributed Environments for Library Services) conducted by UKOLN and sponsored by the Electronic Libraries Programme (more information about MODELS may be found at <URL: http://www.ukoln.ac.uk/models/>). MODELS was ideally situated to launch the AHDS's and UKOLN's work on metadata for resource discovery. As an ongoing concern, it aims to develop a framework for managing distributed library and information services by facilitating informed and focused discussion of salient strategic and practical issues, and by articulating the technical service models which emerge from those discussions. Even prior to this workshop, MODELS had established a track record for involving key stakeholders in discussions which ultimately initiated work of national significance.

2.2 Significant findings

2.2.1 Cross-domain resource discovery

It was agreed at an early stage in the workshop that cross-domain searching is highly desired by both scholarly and curatorial communities, making technical and service models worth pursuing. That is, the group agreed about the desirability of being able to search across a range of potentially complementary online information resources in a manner which, from the user's point of view, obscured any differences that existed in the underlying resources' hardware, software, record structures, and query languages. What constituted cross-domain discovery, however, was seen to differ. For some it involved searching across the holdings of several institutions operating within a single curatorial tradition. In the archival community, for example, finding aids differ so considerably in both their structure and content that cross-domain discovery could be seen as searching across the holdings of two or more archives. From other perspectives, cross-domain discovery could entail searching across the different databases maintained by a single institution for different aspects of its collection - for example, by a library about its book, manuscript, and print collections. For still others, cross-domain resource discovery might entail searching for information across a range of library catalogues, museum databases, archival finding aids, data archive catalogues, and subject-based catalogues of World Wide Web resources. The different definitions of cross-domain discovery encouraged the group to focus upon...

2.2.2 Defining domains

Flexibility in approach was deemed desirable in defining the domains which surrounded discrete collections of information resources. Thus, domains could reflect the curatorial traditions which shaped the way in which those resources were managed (libraries, museums, archives, etc.), the academic disciplines within which they were principally created or used (archaeology, film studies, geology), or the regional settings where they were stored (north Wales, south-east). Crucially, by facilitating cross-domain discovery the group felt it possible to break down the institutional, disciplinary, geographical, and other barriers which may impede access to and use of information.

2.2.3 A search model and its implication for metadata

With regard to cross-domain discovery itself, a reiterative or staged approach was identified. At the first stage a user requires rudimentary information about relevant information resources in order simply to be made aware of their existence. To sustain this stage, generic metadata such as the Dublin Core is required. At a second stage, the user, having found a potentially interesting information resource, might need richer metadata to determine whether to acquire, browse, or analyse it. At a third stage, the same user, having acquired a resource, might need still further descriptive information in order to use the resource effectively. At both of these later stages, metadata in more specialist formats (e.g. MARC records for books, ISAD(G) records for archival material, TEI headers for electronic texts) would likely be required. It was here that the group articulated a search model which enabled the user, in a single search environment, to 'drill down' or move progressively through a hierarchy of increasingly rich and specialist metadata as they moved through a continuum from resource discovery to resource evaluation, access, and use.
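
The staged model described above can be illustrated with a minimal sketch in Python. The record layout, the stage names, and the sample values are assumptions made purely for illustration; none of them is drawn from the workshop report or from any particular metadata standard.

```python
from dataclasses import dataclass, field

# Hypothetical stage names, ordered from generic to specialist metadata.
STAGES = ("discovery", "evaluation", "use")


@dataclass
class Resource:
    identifier: str
    # Maps a stage name to the metadata that becomes available at that stage.
    metadata: dict = field(default_factory=dict)


def drill_down(resource, stage):
    """Return all metadata visible to a user who has reached `stage`.

    Metadata accumulates: a user at the 'use' stage still sees the generic
    discovery record alongside the richer specialist records.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    visible = {}
    for name in STAGES[: STAGES.index(stage) + 1]:
        visible.update(resource.metadata.get(name, {}))
    return visible


# Illustrative sample values only.
example = Resource(
    identifier="example-001",
    metadata={
        "discovery": {"title": "Example excavation archive", "subject": "archaeology"},
        "evaluation": {"format": "database", "extent": "1200 records"},
        "use": {"codebook": "field definitions"},
    },
)
```

Calling drill_down(example, "discovery") returns only the generic record, while later stages add the richer fields on top of it. A real service would presumably fetch the specialist records from their home systems rather than hold them in one structure.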

2.3 Areas for further investigation

2.3.1 Metadata for resource discovery

Domain-specific approaches to resource description, though fundamentally different, need not impede cross-domain resource discovery, particularly if discovery is seen as part of the continuum described above. Some consensus is required across domains, however, about the minimum level of metadata that needs to be associated with an information resource if it is to be located meaningfully by those who might wish to gain access. As a first step towards that consensus, domain specialists should, in light of the documentation standards and best practice that are current within their domains, formally identify their resource discovery requirements and express them using a common formalism for comparative purposes. Within this process the Dublin Core could provide both the common formalism and a starting point for discussions about domain-specific resource discovery requirements. This recommendation was particularly formative for the six more specialist workshops convened by the AHDS and UKOLN.

2.3.2 Collections description

The issue of collections description first arose at the third MODELS workshop (Dempsey and Russell 1996) and focused on mechanisms for providing users, and also resource discovery tools, with some forward knowledge about the contents of a particular collection catalogue. In the context of cross-domain resource discovery, collections description was seen as a higher level of resource discovery metadata; that is, as metadata to help users select from a range of online catalogues, finding aids, Web-based gateways, etc. those worth including in a particular search. Discussion touched on the possibilities for using Centroids (Knight and Hamilton 1997). More importantly, it called for further investigation into communities' collections description practices and into the collection description requirements of users involved in real cross-domain discovery. Whereas the former investigation requires traditional research (an eLib supporting study of collections description is currently being co-ordinated by UKOLN and is due for completion in autumn 1997), the latter requires the development of cross-domain discovery services such as the one being built by the AHDS and reported in Chapter 4 of this volume.

2.3.3 The Z39.50 protocol for search and retrieve

The Z39.50 protocol seemed potentially capable of permitting users to implement the reiterative cross-domain search and retrieve model outlined above. Although this was a relatively unexplored area, some investigative work was at the time of the MODELS workshop either underway (for example, by a group of UK-based archivists) or intended (for example, by the AHDS). The Z39.50 Digital Collections Profile appears particularly promising (Library of Congress, 1996). It provides a generic or over-arching framework capable of accommodating more specialist profiles, each of which is designed to navigate heterogeneous collections databases developed by particular communities of, for example, museums, libraries, and curators of geospatial information.

2.3.4 Controlled vocabularies in a cross-domain environment

Data description standards ensure that information resources are described with reference to a common range of attributes. Controlled vocabularies ensure a degree of consistency in the use of attribute values, and thus a degree of consistency in search and retrieval. In a cross-domain discovery scenario, the user is likely to encounter different, possibly competing controlled vocabularies which reflect the underlying domains' different resource discovery and description requirements. Resolving these conflicts is essential if users are to be assisted in meaningfully searching a wide range of information resources. Mapping between controlled vocabularies may be one option although here too, it was felt that experience of user behaviour in testbed cross-domain discovery environments was an essential first step in addressing this issue.
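
The mapping option mentioned above can be sketched as a simple crosswalk. This is a minimal illustration in Python under stated assumptions: the domain names, local terms, and shared terms are all invented for the example and do not come from any real controlled vocabulary.

```python
# Hypothetical crosswalk: (domain, local term) -> shared search term.
# Every entry is invented for illustration.
CROSSWALK = {
    ("museum", "painting"): "image",
    ("library", "print"): "image",
    ("archive", "photograph"): "image",
    ("library", "sound recording"): "sound",
}


def expand_query(shared_term):
    """Return the (domain, local term) pairs that map to a shared term.

    A cross-domain search tool could use the result to issue one query per
    domain, each phrased in that domain's own vocabulary.
    """
    return sorted(pair for pair, target in CROSSWALK.items() if target == shared_term)
```

A flat table of this kind assumes that terms correspond one-to-one across domains. Competing vocabularies rarely align so neatly, which is one reason the workshop regarded observation of real user behaviour as a necessary first step.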

2.4 Conclusion

The MODELS 4 workshop brought together a wide range of representatives from relevant communities, and initiated discussion on the hitherto little explored area of cross-domain resource discovery. It confirmed the desirability of this approach to resource discovery and debated possible strategies and implementations. It was particularly successful in identifying the metadata requirements and how these might build upon work already conducted under the Dublin Core initiative. As such it laid an important foundation for the more specialist AHDS/UKOLN workshops which followed.

3 Archaeology Data Service: evaluation of resource discovery metadata for archaeological resources

Paul Miller, Archaeology Data Service

3.1 Introduction

As with the other workshops in this series, that held for archaeology was interested primarily in those elements of metadata essential to facilitating effective discovery of resources (Miller and Wise 1997). Undeniably important questions about metadata for other purposes such as management or re-use of information were declared outside the scope of the meeting, and are being explored through other fora with which the Archaeology Data Service is involved.

Those participating in the workshop and its follow-up consultation process represented the major national heritage agencies, local government, the museum and research communities, and others, ensuring an exhaustive consideration of issues from a number of archaeological interests with very different priorities and experiences. Despite not necessarily being versed in the language of metadata per se, most participants found that their everyday interaction with diverse resources had given them at least an unconscious grasp of the key issues.

A number of problematic issues were highlighted for further exploration, but the workshop found the proposed Dublin Core Metadata Element Set to be essentially fit for the purpose of facilitating the online discovery of archaeological resources.

3.2 Significant problems and potential solutions

3.2.1 Notions of 'the resource'

As also highlighted by the Visual Arts Data Service, this workshop encountered some confusion in defining what metadata should relate to. Debate focused upon the question of whether resource discovery metadata such as that proposed should describe the archaeological resource itself (an archaeological site, perhaps) or the manifestation of that resource as data (a digital excavation archive). It was felt that metadata for the latter was more appropriate, but the division between metadata for archaeology and metadata for data continues to blur occasionally, leading to potential confusion and misuse.

3.2.2 Collection versus item level description

Most archaeological resources are grouped together to form collections of related information, such as an excavation archive which comprises plans, photographs, databases, artefacts, artefact reports, environmental evidence, etc. These collections may themselves be grouped into larger collections formed on regional, national, or content criteria.

In terms of creating metadata, it is undoubtedly easiest to do this at the level of the most encompassing collection, with a single record created for each of the National Monuments Records, for example. In facilitating access, of course, it is more useful for comprehensive metadata to be searchable at the lowest possible level with, ultimately, complete metadata records for each individual object or computer file. The most practical and effective implementation obviously lies somewhere between these extremes, and is determined by the nature of the resources themselves and the ease of user access to those resources shown to be worthy of further exploration by a metadata-based search.

The manner in which collections are defined, the ways in which they interrelate, and the level to which Dublin Core-style metadata should be provided were all identified as important areas for further exploration such as that now underway.

3.2.3 Notions of authorship

As with other workshops in this series, Archaeology found the Dublin Core's notions of primary intellectual responsibility as expressed in the DC.creator element difficult to align with the realities of digital surrogates of physical archaeological resources.

3.2.4 Overloading the Core

In exploring effective use of the Dublin Core, the Archaeology workshop encountered two important and related problems associated with overloading the Dublin Core model.

The first of these was element overload, where a very few elements (principally DC.subject and DC.coverage) were employed as 'data buckets' into which large quantities of information might be placed, normally within a complex system of Dublin Core TYPE sub-qualifiers.

The second was overload of the entire Dublin Core itself, where a structure designed exclusively for resource discovery was employed to provide more detailed metadata better disseminated by means of more specialised structures. A large number of standards exist for describing the detail of archaeological resources (Miller and Wise 1997) and the Warwick Framework model (Lagoze et al. 1996) should be explored in search of a means by which this detail might be linked to the more general Dublin Core record.

3.3 Recommendations regarding the Dublin Core

3.3.1 Registering SCHEMEs and TYPEs

It was felt that implementations of Dublin Core such as that proposed by the AHDS required a central registry for recommended Dublin Core SCHEMEs and TYPEs, and that such registries should be kept closely linked to less formal developments in the wider Dublin Core community.

3.3.2 Disciplinary definitions for Dublin Core elements

The current definitions for Dublin Core elements (Weibel, this volume) remain largely rooted in a documentary paradigm, although the meaning behind the definitions is more widely applicable. It is suggested that definitions suitable for individual subject communities are developed, with careful validation to ensure that the meaning remains the same, regardless of the words used.

3.3.3 Greater attention to DC.coverage

The Dublin Core's coverage element is essential to the archaeological community, both spatially and temporally. This element requires greater attention and a coherent development programme in order to ensure fitness for purpose. This element, perhaps more than any other, is in danger of overload and its role needs careful assessment alongside more detailed schema such as those under development within CEN TC287 (CEN 1997) and ISO TC211 (ISO 1997).

4 History Data Service: evaluation of resource discovery metadata for historical data sets

Cressida Chappell, History Data Service

4.1 Introduction

This workshop assessed historians' resource discovery requirements with regard to digital resources (Chappell and Anderson 1997). It focused in particular on historical databases and implicitly on social science databases, with which historical databases have so much in common structurally and semantically. The two documentation standards which are most widely used to record and catalogue information about such data sets were reviewed - ISAD(G), or the General International Standard Archival Description (ICA 1997), developed by an ad hoc commission of the International Council on Archives during the early 1990s for archival materials in general rather than necessarily for digital ones; and the Standard Study Description, developed by a consortium of Social Science Data Archives in the 1970s specifically for machine-readable files (DDI 1997). The group benefited substantially from members' experience of extant online catalogues, notably the Data Archive's catalogue, BIRON (Data Archive 1997a), and the Council of European Social Science Data Archives' Integrated Data Catalogue, which acts as a gateway to permit searching in parallel across the holdings of eleven data archives (Data Archive 1997b).

4.2 Significant problems and potential solutions

4.2.1 Defining resource discovery

The workshop accepted in principle that resource discovery could be based on a minimum set of descriptive elements provided that fuller information is supplied, possibly in Warwick Framework-style packages. As in other workshops, however, there was some difficulty in identifying the boundary between resource discovery and the fuller assessment of a resource which might be required before deciding actually to use it, and thus, precisely how much metadata was necessary for resource discovery.

After discussion, members agreed that six categories of information were absolutely essential: the source(s) on which a resource was based; the geographical area (e.g. the British Isles, Madison County), chronological period (e.g. 1900-1945), and subject (e.g. labour history, urbanisation) to which it referred; the data set's title; and the person(s) or organisation(s) responsible for its creation.

More pressing than the categories of essential metadata was the depth of information supplied in any one category for any particular data resource. In general, members preferred comprehensive information supplied to a fine level of granularity, particularly with regard to a resource's geographic and chronological coverage, and to the source(s) on which it was based. They wanted to be able to search for all data sets spanning a given year or range of years, and then retrieve information about the periodicity of the underlying data (e.g. annually collected data, a decennial census). Far from merely being able to locate data sets based upon their geographic extent, members also wished to be able to further refine such a selection based upon the level of detail offered by each data set. Thus a search for the English county of Essex might not only recover all data sets indexed by the term 'Essex', but would also recover all data sets from a higher level (e.g. England, the United Kingdom, or Europe) which are known to include county-level data for Essex. The search might also extend to data sets covering a smaller area identifiable as a subset of the county, such as parish-level records.
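
The Essex example above can be expressed as a minimal sketch in Python. The gazetteer, the notion of 'detail depth', the data set names, and the date ranges are all assumptions introduced for illustration; the workshop did not specify any data model.

```python
from dataclasses import dataclass

# Hypothetical gazetteer: each place maps to the broader area containing it.
PARENT = {
    "Essex parish A": "Essex",
    "Essex": "England",
    "England": "United Kingdom",
    "United Kingdom": "Europe",
}


def ancestors(place):
    """Return the chain of broader areas containing `place`, nearest first."""
    chain = []
    while place in PARENT:
        place = PARENT[place]
        chain.append(place)
    return chain


def depth(place):
    """Depth in the gazetteer: 0 for the broadest area, larger for finer units."""
    return len(ancestors(place))


@dataclass
class DataSet:
    name: str
    area: str          # overall geographic extent
    detail_depth: int  # gazetteer depth of the smallest unit reported
    start: int         # first year covered
    end: int           # last year covered


def search(data_sets, place, year):
    """Names of data sets relevant to `place` that span `year`."""
    hits = []
    for ds in data_sets:
        if not ds.start <= year <= ds.end:
            continue
        exact = ds.area == place
        # A broader data set qualifies only if it reports units at least as fine.
        broader = ds.area in ancestors(place) and ds.detail_depth >= depth(place)
        # A data set for a smaller area within the place also qualifies.
        narrower = place in ancestors(ds.area)
        if exact or broader or narrower:
            hits.append(ds.name)
    return hits


# Illustrative sample data only.
SAMPLE = [
    DataSet("county returns", "Essex", 3, 1851, 1901),
    DataSet("national census", "England", 3, 1801, 1991),
    DataSet("national summary", "England", 2, 1801, 1991),
    DataSet("parish register", "Essex parish A", 4, 1700, 1850),
]
```

In this sketch a search for Essex excludes the national summary because it reports nothing finer than country level. Periodicity could be carried as a further field on each record and returned with the hits.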

4.2.2 Collection versus item level description

Users' resource discovery requirements would in some cases be based on the content of data set tables, records, or fields rather than on the information contained in a resource's metadata. Thus, an individual searching through a range of data resources based on censuses or census-like lists might wish to locate only those resources which made explicit reference to a particular named individual, thus blurring the boundary between resource discovery on the one hand, and resource browsing or analysis on the other. Though this functionality would be useful, cost and technical considerations are likely to militate against its being supplied across the holdings of a relatively large data service.

4.3 Recommendations regarding the Dublin Core

Given the essential requirements for initial resource discovery documented above, the group agreed that the Dublin Core made a useful starting point, but it identified the following problem areas.

4.3.1 DC.creator and DC.contributor

The relationship between the elements DC.creator and DC.contributor was seen as hopelessly confused, and members preferred to eliminate DC.contributor altogether and use DC.creator with an appropriate list of controlled responsibility statements.

4.3.2 DC.type

The preliminary list of object types proposed by Knight and Hamilton (Knight and Hamilton 1997) was felt to be inadequate for the needs of historians and historical data. The provisional hierarchy developed since this workshop (Tennant 1997) would seem to better encapsulate the data forms encountered by historians. It is, however, likely that more history-specific information about such details as data structure will either need to be added to the current model or represented within a SCHEME specifically for historical data sets.

4.3.3 DC.coverage

The coverage element was seen as potentially problematic because historically crucial information relating to both spatial location and temporal duration was forced into it. It was felt preferable for information relating to space and time to be separated into two elements or, failing this, for a rigorous system of TYPEs to be employed within the element to clearly distinguish between different forms of the two. As well as providing locational or durational data, this element will be required to provide significant amounts of contextual information on such details as data granularity, locational precision, and temporal periodicity.
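
The fallback suggested above, a rigorous system of TYPEs within a single coverage element, might look something like the following sketch in Python. The TYPE names, the dotted sub-qualifier convention, and the sample values are all invented for this illustration and should not be read as registered Dublin Core qualifiers.

```python
from collections import defaultdict

# Hypothetical qualified values for one DC.coverage element.
# Each entry is (TYPE, value); the TYPE names are invented for this sketch.
COVERAGE = [
    ("spatial", "Essex"),
    ("spatial.granularity", "parish"),
    ("temporal", "1851/1901"),
    ("temporal.periodicity", "decennial"),
]


def split_coverage(values):
    """Group qualified coverage values into spatial and temporal parts.

    The text before the first dot names the dimension; anything after it
    names a contextual detail such as granularity or periodicity.
    """
    grouped = defaultdict(dict)
    for qualifier, value in values:
        dimension, _, detail = qualifier.partition(".")
        grouped[dimension][detail or "value"] = value
    return dict(grouped)
```

Splitting the values in this way recovers the two separate elements the group would have preferred, while the contextual details travel with the dimension they describe.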

5 Oxford Text Archive: evaluation of resource discovery metadata for electronic texts and linguistic corpora

Michael Popham, Alan Morrison, Jakob Fix, Oxford Text Archive

5.1 Introduction

This workshop focused on identifying the metadata essential to finding electronic texts of interest to those working in the fields of literary and linguistic studies, and encompassed texts of every type and period (Popham et al. 1997). It worked with a broad definition of what might constitute a 'text' in order to consider various forms of text collection (e.g. collected works, anthologies), linguistic corpora, and other works (e.g. dictionaries, reference works).

Arguably, this workshop should have encountered the fewest challenges when evaluating the Dublin Core against the communities' resource discovery needs. The Dublin Core was initially envisaged as metadata for document-like objects, and there has been substantial work 'mapping' between the Dublin Core and the two text documentation standards which focused the group's attention: MARC (Library of Congress 1997b), and the Text Encoding Initiative's Header (Giordano 1996). Despite this, two significant challenges were identified which tempered the consensus that emerged regarding the Dublin Core's suitability for resource discovery.

5.2 Significant problems and potential solutions

5.2.1 Defining resource discovery

This crucial issue bears directly on how much information (metadata) is actually required for any given resource. The consensus was that the more information that could be fed back to a user in response to an enquiry, the easier it would be for that user to identify the resources likely to be of interest.

5.2.2 Variety of users' resource discovery requirements

The workshop focused initially on the needs of literary and linguistic scholars, but rejected early on the possibility of considering the disciplines in any uniform way. The problem was further compounded given that texts (whether electronic or not) are frequently of interest to scholars working across the range of humanities and other disciplines, and who therefore represent an extremely broad range of resource discovery requirements.

The group did feel that Warwick Framework-style packaging of more detailed and specialist documentation offered a reasonable mechanism for satisfying the resource discovery requirements of diverse user communities. Currently, such a model is employed by academics working with conventional library catalogues to discover paper-based texts. The catalogue provides basic search facilities for author, title, keyword, and subject. The initial inquiry can then be followed either by browsing the complete library catalogue record (if available, e.g. online), and/or by consulting a copy of the work itself. With this in mind, it was felt that the basic information necessary for the successful discovery of non-electronic resources in literary and linguistic studies would also appear to be sufficient for discovering their electronic counterparts, and that the Dublin Core made a good starting point for satisfying these basic information requirements.

5.2.3 Scope: collection versus item level description

The problem is easily stated though not easily addressed. In an anthology of verse or the collected works of an individual playwright, should the metadata relate only to description at the collection level, or should each individual work (or even section - e.g. chapter, verse, act, scene) within a collection also have its own descriptive metadata? If the latter, then in certain circumstances (e.g. a collection of works by the same author), perhaps certain metadata could be inherited from the collection-level description by each of the works constituting the collection. Similarly, the collection-level metadata description should perhaps be sufficient to convey basic information about each of the individual works within the collection (but would this be feasible in the case of, say, an anthology of 500 poems produced by different authors?). These issues are of even greater concern when considering large-scale literary or linguistic corpora which may contain many thousands of individual texts. The concept of scope also raised a number of related issues, such as the possible requirement to identify discrete resources (e.g. a number of specific texts within a corpus, a specific act within a play), and the need to know whether or not a resource was static or dynamic (i.e. liable to change), as knowing such information might aid initial resource discovery when searching across large volumes of material.

Here the problems seemed less surmountable and it was later agreed at a meeting of workshop convenors that they were likely to be addressed by individual service or information providers who would weigh up their users' resource discovery needs against the size of their collections and the costs and redundancy entailed in item-level description.
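
The idea raised above, that individual works might inherit certain metadata from a collection-level description, can be sketched briefly in Python. The field names and sample values are hypothetical, and the choice of which fields an item may override is an assumption of this example rather than a workshop recommendation.

```python
from collections import ChainMap


def item_record(collection, item):
    """Build an item-level record that inherits from its collection.

    Values given at item level take precedence; any field the item does
    not supply is inherited from the collection-level description.
    """
    return dict(ChainMap(item, collection))


# Illustrative sample values only.
collected_works = {
    "title": "Collected Works",
    "creator": "Author Z",
    "publisher": "Example Press",
}
single_play = {"title": "Play One"}
```

Here item_record(collected_works, single_play) keeps the play's own title and inherits creator and publisher. Inheritance of this kind suits a single-author collection; for an anthology by many hands, creator would need to be supplied for each item, which is where the cost of item-level description begins to mount.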

5.3 Recommendations regarding the Dublin Core

5.3.1 Problematic elements and element usage

The elements DC.subject and DC.description presented difficulties with purely literary texts (for example, there are many potential keywords for Shakespeare's play Hamlet related to notions of love, betrayal, insanity, etc., but a text about the play might require only a handful of subject keywords), though none were envisaged for linguistic resources.

The relationship between DC.source and DC.relation was considered to be confused, and the group felt unsure about where best to express the relations familiar to those studying literary materials (e.g. an adaptation by X of Y's translation of a work by Z).

DC.type was considered useful but not essential and presented problems as the group was sceptical about the usefulness of the proposed Dublin Core object types (at the time represented by the work of Knight and Hamilton 1997). It recommended instead the use of one of the many existing controlled vocabulary lists, such as those used by conventional library cataloguing staff to describe genres of literary resources.

5.3.2 Element qualifiers

The group argued that these were necessary for the Dublin Core elements DC.title, DC.creator, DC.contributor, DC.date, and DC.identifier. With regard to DC.date it argued for a controlled list of types allowing for date of original creation of a work, the publication date of the relevant printed edition of that work, and the release date of the electronic version of the printed edition.

5.3.3 Implementation issues

The group's discussions pinpointed three key implementation issues:
the importance of controlled vocabularies, particularly for DC.creator;
mechanisms for coping with date ranges in DC.date;
the desirability of more rather than less comprehensive information in DC.source, possibly including pointers to metadata for the source(s).

6 Performing Arts Data Service: evaluation of resource discovery metadata for moving image resources

Celia Duffy, Performing Arts Data Service

6.1 Introduction

Moving image resources are, of course, of interest to a great many more discipline areas than those of the performing arts. This workshop (Duffy and Owen 1997a), however, focused on the discipline areas of film, TV, and theatre studies and considered resource discovery issues relating chiefly to movies, TV, drama, and recordings of staged performances. Participants represented a cross-section of expertise and interest from both service providers and user groups.

There is a marked difference between the specialised, individualistic cataloguing practices at film archives (which often adopt their own in-house procedures and systems) and those of general libraries. The difficulty for general libraries is that moving image resources are generally not amenable to descriptive methods designed for text-based materials. The Dublin Core, with its origins in describing document-like objects, shares these difficulties. The workshop examined its potential use for describing moving image resources, tested it against a variety of examples, and critically reviewed its application. It concluded that the Dublin Core model could be used to describe moving image resources with some provisos as noted below.

6.2 Significant problems and potential solutions

6.2.1 Dublin Core terminology

If one of the aims of the Dublin Core is for non-library-trained researchers to supply metadata records with their data, the language used and definitions given have to be meaningful to those researchers. At least for those working with moving images, this was not felt to be the case at present.

DC.coverage was sufficiently problematic that it is not recommended for use at all in conjunction with moving image resources. Similarly difficult was DC.publisher, with notions of 'publication' far less straightforward than for text-based resources. The provision of clear guidelines for both general and subject-specific users should alleviate this problem and help to ensure that the core set is used consistently. The definition of other elements has been clarified for use in an AHDS context but there are still likely to be a large number of SCHEME and TYPE qualifiers. It is questionable whether these are intuitive enough for non-specialists to use.

6.2.2 Qualifiers to Dublin Core elements

Whilst keeping in mind the fact that the Dublin Core is intended for core description and not to replace more precise and specialised cataloguing methods, the workshop concluded that the Dublin Core would only be useful for moving image resources with ample provision of qualifying statements. The number of roles (director, producer, performer, etc.) attributed to individuals under DC.creator and DC.publisher (the difficult distinction between primary and secondary contributors in DC.creator and DC.contributor having been jettisoned) will be the most difficult to restrict.

Moving image resources will often need qualifiers which make a clear distinction between original works, their various manifestations in production, and their digital surrogates with respect to DC.creator, DC.publisher, DC.date, and others. As it remains difficult to recommend use of DC.coverage in its present form, important information relating to place will have to be moved to other elements, again with appropriate qualification.

A definitive list of qualifiers for each element is still to be determined.

6.2.3 Specialist versus inter-disciplinary users

Throughout its discussions of the Dublin Core the workshop tried to maintain a balance between the needs and expectations of a non-specialist, inter-disciplinary searcher (which are difficult to predict) and those of a specialist user. Many of the problems encountered arise out of knowledge and experience of the needs of searchers within the disciplines of film, television, or theatre studies. Although it is generally agreed that interdisciplinary searching is a positive goal, the more precise needs of the interdisciplinary searcher have still to be determined. There is a case for more research in the area of interdisciplinary searchers' behaviour.

6.3 Recommendations regarding the Dublin Core

6.3.1 The problem of authorship

It is often neither possible nor desirable to assign a principal 'author' to a moving image resource; even if the convention of using the director is adhered to for movie resources, it cannot be consistently applied for television and other recorded performances. The option of listing those of 'secondary' importance within DC.contributor implies a hierarchy of artistic effort which is problematic in the extreme. The contents of DC.creator and DC.contributor should therefore be combined within DC.creator, with the roles of each named individual clearly specified.
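
The recommendation to combine the two elements can be sketched as follows. This is an illustration under stated assumptions, not an AHDS implementation: the record layout (a dictionary of lists of role and name pairs), the function names, and the placeholder names are all hypothetical.

```python
def merge_into_creator(record):
    """Return a copy of record with DC.contributor folded into DC.creator.

    Each entry is a (role, name) pair, so nobody is ranked as 'primary'
    or 'secondary'; the role carries the distinction instead.
    """
    merged = dict(record)
    creators = list(merged.get("DC.creator", []))
    for entry in merged.pop("DC.contributor", []):
        if entry not in creators:  # avoid listing the same pair twice
            creators.append(entry)
    merged["DC.creator"] = creators
    return merged


def names_with_role(record, role):
    """List the names recorded under DC.creator with the given role."""
    return [name for r, name in record.get("DC.creator", []) if r == role]


# Placeholder names only.
sample = {
    "DC.creator": [("director", "Director A")],
    "DC.contributor": [("performer", "Performer B"), ("producer", "Producer C")],
}
merged = merge_into_creator(sample)
```

After the merge, names_with_role(merged, "performer") finds the performer in the same element as the director, and the original record is left unchanged.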

6.3.2 Coverage

This element, as defined, cannot be used consistently for moving image resources. However, the concepts of place and duration (teased out by implication from the current definition's "spatial locations and temporal duration") are extremely important for moving image resources and can be included under other elements.

Place might be accommodated within DC.subject (tagged for provenance, country of production, etc.) and DC.publisher (tagged with place of release/broadcast/production). This leads, however, to potential overload, particularly of the subject element. Running time (duration) fits more naturally along with playback information in DC.format than within DC.coverage.
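
The redistribution described above might look like the sketch below. The qualifier labels ("duration", "place-of-release", and so on), the record layout, and the function name are assumptions made for illustration; the workshop did not define them.

```python
def relocate_coverage(record):
    """Move place and running time out of DC.coverage into other elements.

    record maps an element name to a list of (qualifier, value) pairs.
    A new record is returned; the input is not modified.
    """
    out = {k: list(v) for k, v in record.items() if k != "DC.coverage"}
    for kind, value in record.get("DC.coverage", []):
        if kind == "duration":
            # Running time sits with playback information.
            out.setdefault("DC.format", []).append(("duration", value))
        elif kind == "place-of-release":
            out.setdefault("DC.publisher", []).append(("place", value))
        else:
            # Provenance, country of production, and any other place note.
            out.setdefault("DC.subject", []).append((kind, value))
    return out


# Placeholder values only.
sample = {
    "DC.title": [("", "Example Production")],
    "DC.coverage": [
        ("country-of-production", "Country A"),
        ("place-of-release", "City B"),
        ("duration", "95 min"),
    ],
}
relocated = relocate_coverage(sample)
```

Because every unrecognised place note falls through to DC.subject, the sketch also makes visible the risk of overloading that element.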

6.3.3 Element qualifiers

The only element for which some kind of SCHEME or TYPE qualifier is not potentially useful is the free-text DC.description.

7 Performing Arts Data Service: evaluation of resource discovery metadata for sound resources

Celia Duffy, Performing Arts Data Service

7.1 Introduction

This workshop (Duffy and Owen 1997b) focused on the discipline of music, considering both sound and printed music resources and the application of the Dublin Core metadata element set to them.

In the discipline of music, it can be difficult to divorce the needs for information retrieval in sound recordings from those of the printed versions of the same music; particularly in the field of Western art music, users are likely to be searching for both. It is a characteristic of music resources that the same work can manifest itself in many different ways, e.g. as a sound recording, a score, an arrangement, a manuscript, or a MIDI file. This multiplicity of representations of a work (a good selection of which were covered in the workshop) is not as prominent an issue in book cataloguing but is one which causes conflicts between music librarianship and traditional book-based approaches.

UKMARC is a widely used standard in music libraries, although, as it was developed primarily for book-based media, its use for music resources can be problematic. Further discussion of MARC and the special issues faced in sound and printed music cataloguing, with a detailed consideration of current standards, appears in Malcolm Jones' briefing paper (1997) and the workshop report itself (Duffy and Owen 1997b).

7.2 Significant problems and potential solutions

7.2.1 Mapping existing conventions and standards to the Dublin Core

The workshop found that there was a reasonably good correspondence between descriptions necessary for sound and printed music resources and the Dublin Core, in the sense that a home could be found somewhere within the Dublin Core structure for the most important categories. There were cases where important search terms had to be shoehorned in to unexpected places (such as DC.coverage), or scattered over various Dublin Core elements in a way that separated information that would usually be linked together in one statement (for example, separating place from date of recording or date of publication from publisher). DC.coverage gave most cause for concern here. The provision of clear guidelines for both general and subject-specific users should alleviate this problem and help to ensure that the core set is used consistently.

7.2.2 Overloading the core

The workshop found that most of the Dublin Core elements needed to contain not one, but several pieces of information to make sense for music resources. An expert group tends inevitably to construct a 'wish list' for particular cataloguing requirements in its discipline, and some requirements noted in the workshop report are likely to be too detailed for a core level of description. Even so, the amount of qualification still necessary for music resources may prove problematic in an interdisciplinary searching environment. Many of the proposed qualifications attempt to separate information about an original work from its recreation in the form of a recording; for example, users need to be able to distinguish a composer from a performer in the DC.creator element and the date of original composition from the date of recording in DC.date.

Many elements are in danger of being overloaded. The recommendation to combine DC.creator and DC.contributor into a single element for all individuals who have a creative input to the resource will necessitate a large number of tags to explain roles. Other elements in danger of being overloaded are DC.subject (including information on genre, medium, and associated names and places), DC.publisher (which is not so straightforward for sound resources as it may be for text-based resources, and for which many qualifications are necessary), and DC.date (where several dates need to be disentangled: of recording, of original composition, and of release).

A definitive list of qualifiers and a mechanism for providing more detailed and specialist documentation within the core are still to be determined.

7.3 Recommendations regarding the Dublin Core

7.3.1 The problem of authorship

As in the area of moving image resources, it is often neither possible nor desirable to assign one 'author' to a recorded sound resource; the convention of citing the composer as the main author is arguable for Western art music, but cannot be consistently applied to other types of music. The option of listing those of 'secondary' importance implies a hierarchy of artistic effort which is itself problematic. It is therefore recommended that DC.creator and DC.contributor are combined, with appropriate tagging schemes to explain roles.

7.3.2 Coverage

This element, as defined, cannot be used consistently for sound or printed music resources. However, the concepts of place and duration (teased out by implication from the current definition's "spatial locations and temporal duration") are individually extremely important and can be included under other elements. The workshop felt that place of recording (or provenance/origination) was such a significant element for sound resources that it should ideally have a field of its own. Place does not fit naturally with duration. Normal practice would be to combine place and date of recording, and duration with format.

As a result of further discussion, and in line with the moving image workshop, it is recommended that this element is not used for sound resources. Duration fits naturally with DC.format. Place of publication, or release, should go with DC.publisher. Place of origination, or recording, and any associative place name will be included in DC.subject.

7.3.3 Difficulties of definition

Usage has been clarified for DC.type, which should no longer contain genre statements and thus not overlap with DC.subject. Usage has also been clarified for DC.format (relating to playback or handling information), which should also include duration. The definition of DC.publisher needs clearly to state its inclusiveness: a publisher in the case of recorded music may be interpreted as a record company, distributor, agent, or broadcasting organisation. This definition of publisher is wider than usual.

The potential confusion between DC.source and DC.relation should be resolved by restricting the use of source to describing a (usually analogue) original from which a (usually digital) copy has been made. DC.relation can then be used for hierarchical relationships (for example, tracks on an album, individual songs relating to a song cycle) as outlined in the AHDS guidelines in Chapter 3.

7.3.4 Element qualifiers

The only element for which some kind of SCHEME or TYPE qualification is not potentially useful is the free-text DC.description.

8 Visual Arts Data Service: evaluation of resource discovery metadata for the visual arts, museums, and cultural heritage communities

Catherine Grout, Visual Arts Data Service

8.1 Introduction

This workshop aimed to examine the descriptive information needed to enable the discovery of visual arts, museums, and cultural heritage resources on the Internet, particularly in the form of digital images. It aimed to decide which of these were of 'core' significance, to indicate relevant specialist standards, terminology resources, syntaxes etc., and to consider the effectiveness of Dublin Core as a basis for resource discovery metadata in this domain. This exercise was in some respects a complex one, as a very significant corpus of information description standards already exists for use by members of the three communities represented. A review of these standards was provided in a document circulated in advance for the participants (Gill et al. 1997). The workshop was followed by an extensive process of reporting and consultation in order both to recommend solutions to the problems identified at the workshop and to subject these recommendations to a process of rigorous review by members of the relevant communities. The reports which detail this process are available on the Visual Arts Data Service site on the World Wide Web (Gill and Grout 1997).

8.2 Significant problems and potential solutions

8.2.1 Identification of the source of intellectual content

One of the most significant problems which the workshop began to address was the need to identify the source of intellectual content when creating and using resource discovery metadata. In essence, the discussions at the workshop led to the need to find an effective answer to the following question: how can a clear distinction between originals, surrogates, and online resources be made using Dublin Core?

The process of creating metadata information about digital networked resources will often involve the description of a number of different entities, since the investment of intellectual content can occur at many stages. Since information in visual arts, museums, and cultural heritage is often derived from physical, tangible original objects such as works of art, objects in a collection, or sites of historic interest, the ability to optionally specify what exactly is being described by the metadata becomes more significant than for less object-focused research areas which tend to centre around the retrieval of bibliographic resources, and where information about the physical manifestation(s) of the work is usually inconsequential compared to information about the work itself.

The principal solution proposed to this problem by the Visual Arts Data Service was the application of optional 'intellectual content source' qualifiers. These were original, surrogate, and resource, which could be further refined by the use of the optional sub-qualifiers analogue or digital. These qualifiers could be applied to any of the Dublin Core's elements. However, despite the merits of a logical system such as this, it does seem likely that it will prove to be syntactically too complex for widespread implementation and could also represent a barrier to cross-domain searching.
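
A minimal sketch of the proposed qualifiers is given below. The three source values and the two sub-qualifiers come from the proposal described above; everything else (the Statement structure, the check and describing functions, and the placeholder values) is hypothetical and does not represent the syntax considered by VADS.

```python
from typing import NamedTuple, Optional

CONTENT_SOURCES = ("original", "surrogate", "resource")
MEDIA = ("analogue", "digital")


class Statement(NamedTuple):
    element: str                  # e.g. "DC.creator"
    value: str
    source: Optional[str] = None  # which entity the value describes
    medium: Optional[str] = None  # optional refinement of source


def check(statement: Statement) -> Statement:
    """Validate the optional qualifiers and return the statement unchanged."""
    if statement.source is not None and statement.source not in CONTENT_SOURCES:
        raise ValueError(f"unknown source qualifier: {statement.source!r}")
    if statement.medium is not None:
        if statement.source is None:
            raise ValueError("a sub-qualifier needs a source qualifier to refine")
        if statement.medium not in MEDIA:
            raise ValueError(f"unknown sub-qualifier: {statement.medium!r}")
    return statement


def describing(statements, source):
    """Select the statements that describe the given kind of entity."""
    return [s for s in statements if s.source == source]


# Placeholder values: a painting, a slide of it, and a scan of the slide.
statements = [
    check(Statement("DC.creator", "Painter A", "original")),
    check(Statement("DC.creator", "Photographer B", "surrogate", "analogue")),
    check(Statement("DC.creator", "Digitising Unit C", "resource", "digital")),
    check(Statement("DC.title", "Example Work")),
]
```

Even in this small example each statement can carry up to two extra pieces of information, which gives some sense of the complexity referred to above.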

8.2.2 Granularity: items and collections

The Dublin Core originated from the library community, and was originally intended to provide a simple means of describing document-like objects which were defined by example. Over the course of the Dublin Core workshop series, however, the element set was refined and the notion of a document-like object extended to include any networked resource that appears to be identical to diverse users. This means that the Dublin Core can now be used to describe a much wider range of networked resources.

This also paves the way for the application of the Dublin Core to descriptions at varying levels of granularity; it can still be used to describe a discrete individual item such as a Web page or a digital image, but it can also now be applied to more general resources, such as a collection of Web pages forming a site, or multiple digital images arranged as a collection.

This tension between item and collection level descriptions is particularly pertinent for this domain, as both the original works and the digital resources based upon them will tend to exist both as individual items and as parts of larger collections; descriptions of both the items and the collections will inevitably be used for retrieval by users, depending upon their search goals.

The problem is further exacerbated by the fact that collections can be contained within larger collections. For example, a collection of objects donated by an individual may form part of the larger collection of a museum or gallery, but will still need to retain its unique identity and provenance.

To address this issue, the VADS recommended the use of qualifiers for DC.relation, as suggested by members of the wider Dublin Core community (Guenther 1997). The most useful qualifier for describing relationships between items and collections was felt to be DC.relation.isMemberOf.
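
How a chain of DC.relation.isMemberOf statements could express collections nested within collections is sketched below. The sketch assumes, for simplicity, that each record names at most one enclosing collection; the identifiers, the record layout, and the function name are hypothetical.

```python
def collections_containing(identifier, records):
    """Follow DC.relation.isMemberOf links upward from one record.

    records maps an identifier to a dictionary of elements. The enclosing
    collections are returned from nearest to outermost. The walk stops
    when a record has no isMemberOf link or when a cycle is detected.
    """
    chain = []
    seen = {identifier}
    current = records.get(identifier, {}).get("DC.relation.isMemberOf")
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = records.get(current, {}).get("DC.relation.isMemberOf")
    return chain


# Hypothetical identifiers: an image within a bequest within a museum.
records = {
    "image-17": {"DC.relation.isMemberOf": "donor-bequest"},
    "donor-bequest": {"DC.relation.isMemberOf": "museum-collection"},
    "museum-collection": {},
}
```

Here collections_containing("image-17", records) yields the bequest first and the museum collection second, so the donated collection keeps its own identity while remaining discoverable as part of the larger one.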

8.3 Recommendations regarding the Dublin Core

8.3.1 Need for user documentation and implementation guidelines

It was recommended during the course of the workshop that more information was needed to allow consistent interpretations and implementations of the Dublin Core. This was borne out by the experience of members of an editorial group elected at the workshop, who used the VADS's Edinburgh Recommendations as a basis for the construction of sample metadata on items from their collections. Although a template was supplied, implementations still differed substantially between the authors, suggesting that domain-specific as well as general guidelines will be needed in future to allow for consistent resource description and discovery.

8.3.2 Need for wider awareness of the high-granularity resource discovery needs of this domain

Essentially, the workshop highlighted an issue at the heart of resource discovery requirements for visual arts, museums, and cultural heritage material. While the Dublin Core was invented to describe document-like objects, it is anticipated that members of these communities will wish to use the Core to describe and retrieve information about more complex and multi-level entities such as an electronic exhibition catalogue which could, for example, contain descriptions of the life and work of several artists, each accompanied by several digital images. Given the diverse and complex electronic resources which exist in this domain, it is therefore particularly important when using Dublin Core to define where the basis of intellectual content lies and what exactly the metadata is setting out to describe.

Conclusion

Together these statements of requirement reflect the needs of a wide range of scholarly communities, curatorial domains, and humanities information resources. Their expression in terms of a common formalism readily highlights both convergent and divergent requirements and the outlines of a conceptual map of metadata for cross-domain resource discovery. At a final meeting of workshop convenors and AHDS and UKOLN representatives, that map was more fully charted, particularly by paying attention to apparently conflicting domain-specific requirements. Where possible, such conflicts were resolved on the day and reported back to the communities represented at each of the domain-specific workshops for further review and comment. A few issues were referred to small groups of workshop convenors or referred back to Dublin Core discussion lists for input. The result - an implementation of the Dublin Core appropriate to cross-domain discovery of humanities resources - is reported in the following chapter.

Send comments or questions to info@ahds.ac.uk
Last modified: Monday, 17-Nov-97 16:52:01 GMT by D. Greenstein
URL: http://www.ahds.ac.uk/public/arlist.html

This page was originally part of the Arts and Humanities Data Service (AHDS) Website: http://ahds.ac.uk/public/metadata/disc_04.html
Rescued (courtesy of the Internet Archive) and migrated to the UKOLN Website: 08-Apr-2011; Last updated: 06-May-2011.
The content is identical, but changes have been made to the HTML in an attempt to make it validate, and some links have been updated or deactivated.
