RSLP 1/99
Collection Description
Study, Recommendation, Specification
A PROPOSAL SUBMITTED TO THE RESEARCH SUPPORT LIBRARIES PROGRAMME BY THE
UK OFFICE FOR LIBRARY AND INFORMATION NETWORKING, UNIVERSITY OF BATH.
18 JUNE 1999
Introduction
This proposal is submitted by UKOLN, University of Bath in response to RSLP 1/99. It
aims to build on UKOLN’s existing work in the area of collection description by working with
key libraries within the RSLP programme to refine, validate and promote a consistent human-
and machine-readable approach to collection description.
It does not respond directly to the themes identified; rather, it proposes a development and
consensus-making activity to support each theme. We hope that this approach will be judged
appropriate. We understand that this is a managed programme and suggest that there are benefits
to be gained from some horizontal activities in areas where consistency is important. Accordingly,
we intend this submission to be indicative of a programme of work that supports a shared
approach to collection description.
The dates given in this proposal are shown in months, relative to the start of the project.
Assuming that it starts in July 1999, we anticipate completion of the initial deliverables of the
project by the end of December 1999. These will include a collection description schema and
concrete syntax, some ‘data entry guidelines’ describing its use, a simple Web-based tool enabling
the creation of collection descriptions and a concertation day for RSLP project team members.
This is an extremely short and challenging time frame in which to work but one that we consider
to be viable, given a pragmatic approach to the issues involved.
Purpose of the Project
The description of collections will become increasingly important in the context of networked
library services and an important underpinning for developing a collective resource. This view
has emerged clearly through our MODELS project, where it has influenced the course of the
clumps and hybrid library projects that are working with collection and service descriptions, and in our
current work on retrospective conversion for LIC, LINC and BLRIC. In the latter case, a strong
view is emerging that libraries need to complement item-based description with description at a
higher level. A particular feature of this discussion is that this would complement current work in
the archives community and that descriptions at this shared level of granularity would facilitate
cross-domain working (while acknowledging that ‘collections’ may mean different things in the
different library and archive content models).
The creation of collection descriptions allows the owners or curators of collections to
disclose information about their existence and availability to interested parties. Although
collection descriptions may take the form of unstructured textual documents, for example a set
of Web pages describing a collection, there are significant advantages in describing collections
using structured, open and standardised formats. Such descriptions would enable:
- users to discover and locate collections of interest
- users to perform searches across multiple collections in a controlled way
- software to perform such tasks on behalf of users, based on known user
preferences.
There are additional advantages where catalogues do not exist for collections, as a collection
description may provide some indication to the remote user of content and coverage.
While the value of collection description is recognised, there is no standardised way of
doing it, and this is a potential danger. A project-by-project approach to the content and structure of
such descriptions could damage the overall service ambitions of the programme, as it
adds effort for users and managers.
In fact, we believe that the costs of not adopting a consistent, machine-readable description
at an early stage may be significant. This cost will fall on users and managers of collections alike:
- For users, there will be the burden of having to find and navigate individual
Web sites and interpret differently formatted descriptions, with limited
opportunity for consistent, search-based approaches.
- For managers, there will be the burden of having to design their own local
approaches, and at some future date of having to redo this work to conform
with a consistent approach. There is also the cost of having to adapt collection
descriptions for particular services. For example, the eLib ‘clump’ projects are
looking at incorporating collection description services to facilitate navigation
and selection of target catalogues. A consistent approach to collection
description would mean that they could add descriptions to their services more
easily, with less need to convert textual descriptions to the
appropriate format.
Work to be Attempted
UKOLN has developed a preliminary approach to collection description [1], it has
experimented with this approach in describing the JISC Current Collections [2], and it has
prepared a report that examines collection description in library, archive and museum domains
[3]. The clumps projects are exploring how they will support collection description requirements
using the approach developed here, and we are in discussion with several other initiatives
(including some which have submitted EOIs to RSLP) about implementing and refining the
approach. The work proposed here will:
- Refine our current approach based on a more thorough modelling of collections and
their catalogues (issues here are further discussed below). This work will be carried out
by UKOLN in association with Mike Heaney, Associate Director (Service Assessment,
Planning and Provision), University Library Services Directorate, University of Oxford.
The approach will be consistent with the emerging Dublin Core Version 2. Dublin Core
[4] is an internationally agreed approach to resource description. Within Version 2,
Dublin Core and current work in the rights management metadata area will be aligned,
and DC will be based on a sounder content model. We intend that by associating this
work with the Dublin Core initiative, we can improve the chances of widespread
adoption, that we can benefit from wider input, and that we have the chance to influence
standards in an increasingly important area. (Mike Heaney has done influential work on
content models for bibliographic data that is similar to DC Version 2 discussions, and is
familiar with the descriptive needs of large research collections.) This work will focus
primarily on the needs of libraries in describing their collections but will also take into
account the requirements of other sectors. We have some support from OCLC to carry
out this work, so it will not be charged to the project.
- Specify a concrete syntax for the collection description schema. This is expected to be
based on the Resource Description Framework [5][6], in line with current work on the
Dublin Core. This work will also include the development of a simple, Web-based tool
for creating collection descriptions. While it is hoped that this will not be the only tool
available to projects, it is intended to be a useful, early, baseline mechanism for creating
their descriptions.
- Validate the approach by working with RSLP projects and others to describe their
collections. This activity would involve liaison, requirements analysis, and
consensus-making activities.
- Develop a prototype service based on the approach taken. This would involve working
with libraries to develop descriptions of their collections and creating a searchable
resource to provide access. We are well placed to assess ways in which this resource
might interact with hybrid library and clump approaches (we work with these projects
and are a member of Agora) and with the subject gateways (together with King’s, we are
responsible for the Resource Discovery Network Centre, and we work with several
projects in the area of resource discovery). We are in discussion about similar initiatives
elsewhere. We would see this as a proof-of-concept, demonstrating the value of such an
approach.
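The second item above proposes a concrete syntax based on RDF. That syntax has yet to be specified, so the following is only a minimal Python sketch, assuming RDF/XML serialisation and using unqualified Dublin Core elements (title, subject, rights) as stand-ins for the collection-specific properties the schema is meant to define. The function name `collection_to_rdf`, the collection URI and all values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and the Dublin Core element set (version 1.1).
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when serialising.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def collection_to_rdf(uri: str, fields: dict[str, list[str]]) -> str:
    """Serialise a (hypothetical) collection description as RDF/XML.

    `fields` maps Dublin Core element names such as "title" or "subject"
    to one or more values; each value becomes its own element.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    for name, values in fields.items():
        for value in values:
            ET.SubElement(desc, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only; not a real collection.
    print(collection_to_rdf(
        "http://example.org/collections/maps",
        {
            "title": ["Example Map Collection"],
            "subject": ["Maps", "Cartography"],
            "rights": ["Access by appointment"],
        },
    ))
```

Keeping the description in RDF means that properties from Dublin Core and from a collection-specific schema could, in principle, be mixed in the same record.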
A Discussion of Collections and Descriptions
Our work suggests that requirements for collection description fall into three broad
informational categories. The first is descriptive information about the collection, which may include
the subject area, ownership, strengths and weaknesses, and sources of items within the collection.
We are keen to develop a fuller understanding of requirements in this area in association with the
RSLP initiative, and our early discussions suggest that there would be advantage in working towards
consensus here. The second is information about how to access the collection: physical access
in the case of library, museum or archival collections, for example, or networked access in the
case of digital collections. The third is the terms and conditions associated with access
to the collection and to individual items within it.
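One way to picture these three categories is as groups of fields within a single record. This Python sketch is illustrative only: the class `CollectionDescription` and all of its field names are assumptions, not the schema the project will deliver.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CollectionDescription:
    """Hypothetical record grouping the three categories of information."""

    # 1. Descriptive information about the collection.
    title: str
    subjects: list[str] = field(default_factory=list)
    owner: str = ""
    strengths_and_weaknesses: str = ""
    item_sources: list[str] = field(default_factory=list)
    # 2. Access: physical (e.g. a reading room) or networked (e.g. a URL).
    physical_access: Optional[str] = None
    network_access: Optional[str] = None
    # 3. Terms and conditions for the collection and the items within it.
    terms_and_conditions: str = ""

    def is_networked(self) -> bool:
        """True if the description records a networked access point."""
        return self.network_access is not None


if __name__ == "__main__":
    rare_books = CollectionDescription(
        title="Example Rare Books Collection",
        subjects=["Early printed books"],
        physical_access="Reading room, by appointment",
        terms_and_conditions="Reference use only",
    )
    print(rare_books.is_networked())  # False
```

Keeping the access information separate from the descriptive fields lets the same record serve a physical collection, a digital one, or both.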
The term ‘collection’ can be applied to any aggregation of individual items. It is typically
used to refer to collections of physical items, collections of digital surrogates of physical items,
collections of ‘born-digital’ items and catalogues of such collections. Collections are exemplified
in the following, non-exhaustive, list:
- library collections
- museum collections
- archives
- library, museum and archival catalogues
- digital archives
- Internet directories (e.g. Yahoo)
- Internet subject gateways (e.g. SOSIG, OMNI, ADAM, EEVL)
- Web indexes (e.g. AltaVista)
- collections of text, images, sounds, datasets, software, other material or combinations of these
(this includes databases, CD-ROMs and collections of Web resources)
- collections of events (e.g. the Follett Lecture Series)
- other collections of physical items.
This is a broad list of overlapping categories. However, it suggests the need for a planned
approach, both so that the techniques adopted fit in well with broader resource discovery directions
and so that they are flexible enough to cope with the many collection types that libraries
will develop and to indicate the relationships between them. It is worth noting that the list includes
collections of physical items and collections of digital items. In some cases, the digital items are
surrogates of physical items, in others the digital items are the primary (only) manifestation of the
item. It is also worth noting that some collections are actually catalogues (metadata) for other
collections. For example, a library catalogue typically describes the items in one or more
collections within a library. Finally, it is worth noting that collections are often composed of
other collections.
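The relationships noted above, collections nested within other collections and catalogues that describe other collections, can be sketched as a simple recursive structure. The `Collection` class, its `parts` and `describes` fields, and the `walk` helper are hypothetical names chosen for this Python illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Collection:
    """Hypothetical node in a hierarchy of collections."""

    name: str
    digital: bool = False  # False for physical items, True for digital ones
    # Sub-collections of which this collection is composed.
    parts: list["Collection"] = field(default_factory=list)
    # Non-empty when this collection is a catalogue of other collections.
    describes: list["Collection"] = field(default_factory=list)


def walk(collection: Collection) -> Iterator[Collection]:
    """Yield a collection and then, depth first, all of its sub-collections."""
    yield collection
    for part in collection.parts:
        yield from walk(part)


if __name__ == "__main__":
    maps = Collection("Maps")
    rare = Collection("Rare Books", parts=[Collection("Incunabula")])
    library = Collection("Example Library", parts=[maps, rare])
    # A catalogue is itself a (digital) collection that describes others.
    catalogue = Collection("Library Catalogue", digital=True, describes=[library])
    print([c.name for c in walk(library)])
    # ['Example Library', 'Maps', 'Rare Books', 'Incunabula']
```

Treating a catalogue as just another collection with a `describes` link reflects the point that some collections are metadata for other collections.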
Schedule of Work
This work will be carried out in four overlapping phases (corresponding to the four areas of
work outlined above) lasting a total of 15 months.
- Phase 1 - the development of a collection description schema based on a more
thorough modelling of collections and their catalogues, will be done during the first
6 months of the project. This phase will include some initial gathering of
requirements from other RSLP projects. One of the key deliverables of the project -
the collection description schema - will be delivered by the end of this phase.
- Phase 2 - the development of a concrete syntax for the collection description
schema, a Web-based tool for its creation and a set of ‘data entry’ guidelines, will be
carried out during months four to six of the project (i.e. this phase will run
concurrently with the last three months of phase 1).
- Phase 3 - the validation of the schema developed in phase 1 and other consensus
making activities, will be carried out during months four to nine of the project. It
will also overlap phase 1 by three months, allowing initial validation to feed back
into the final stages of the schema development. During month five of the project,
UKOLN will organise a collection description concertation day for RSLP project
teams and relevant library staff.
- Phase 4 - the development of a demonstrator search service, will run from month
ten until the end of the project. This phase will also include collaboration with other
RSLP projects and relevant library staff to develop descriptions of their collections.
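The overlaps in this schedule can be checked with a little month arithmetic. The ranges below are read from the phase descriptions, treating each range as inclusive and taking month 15 as the end of the project from the stated total length; the helper `overlap` is a hypothetical name used only for this Python sketch.

```python
# Phases as inclusive (start_month, end_month) ranges, read from the schedule.
PROJECT_LENGTH = 15
PHASES = {
    1: (1, 6),
    2: (4, 6),
    3: (4, 9),
    4: (10, PROJECT_LENGTH),
}


def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of months shared by two inclusive month ranges."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return max(0, end - start + 1)


if __name__ == "__main__":
    print(overlap(PHASES[1], PHASES[2]))  # 3
    print(overlap(PHASES[1], PHASES[3]))  # 3
    print(overlap(PHASES[3], PHASES[4]))  # 0
    print(max(end for _, end in PHASES.values()))  # 15
```

Under these ranges, phases 2 and 3 each share three months with phase 1, the concertation day in month five falls within phase 3, and phases 3 and 4 run back to back.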
Deliverables
The following deliverables will be produced during the project:
- Project Plan - a document providing details of the project’s external
deliverables, delivery dates and work schedules. The project plan will include
details of the balance of effort between UKOLN and Mike Heaney during the
first phase of the project (note, however, that this effort is not being charged to
the project). This deliverable will be internal to the project and will be
completed during the first month.
- Draft Collection Description Schema - a document providing a draft version
of the collection description schema (for comment by various parties) and the
content model upon which it is based. This will be delivered by the end of the
fourth month.
- Collection Description Schema - the final version of the above document,
delivered by the end of the sixth month. The schema is seen as the key
deliverable of the project.
- Collection Description Editor - a simple Web-based tool. This will be based
on DC-dot [7], a Web-based Dublin Core generator and editor, and will be
delivered by the end of the sixth month. (A version of the tool will be available
for demonstration during the collection description concertation day.)
- Collection Description Concertation Day - a one-day workshop for RSLP
project team members and relevant library staff. This will be organised during
the fifth month of the project.
- Data Entry Guidelines - a document providing guidance for those using the
collection description schema. This document will be aimed at RSLP project
team members, library staff and other interested parties. A draft version of this
document will be available for the collection description concertation day, the
final version being delivered at the end of the sixth month.
- Prototype Search Service - a Web-based demonstrator, enabling searches to
be made across a range of collection descriptions gathered from RSLP projects.
This will be developed and run throughout the fourth phase of the project.
During this phase, UKOLN will also provide advice and assistance to RSLP
projects wishing to describe collections.
- Final Report - a document describing the findings of the project, including a
final version of the collection description schema, based on experience gained
during the demonstrator phase.
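At its simplest, the prototype search service could match a query term against the collection descriptions gathered from projects. The Python sketch below assumes a hypothetical `Record` type holding a contributing project, a title and subject terms, and uses plain case-insensitive substring matching; a working service would need proper indexing and a search interface, which this ignores.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical gathered collection description."""

    source: str  # contributing project (illustrative)
    title: str
    subjects: list[str]


def search(records: list[Record], term: str) -> list[Record]:
    """Return records whose title or any subject contains `term` (case-insensitive)."""
    term = term.lower()
    return [
        r for r in records
        if term in r.title.lower() or any(term in s.lower() for s in r.subjects)
    ]


if __name__ == "__main__":
    records = [
        Record("Project A", "Example Map Collection", ["Cartography"]),
        Record("Project B", "Example Music Manuscripts", ["Music", "Manuscripts"]),
        Record("Project C", "Example Estate Papers", ["Manuscripts", "Land tenure"]),
    ]
    print([r.source for r in search(records, "manuscripts")])
    # ['Project B', 'Project C']
```

Because every description shares the same structure, one query can run across collections from different projects, which is the consistency argument made earlier in the proposal.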
Dissemination
Dissemination is integral to the success of this project. Clearly, the primary focus for this
work will be within the RSLP programme. However, UKOLN will also disseminate information
about the project more widely, maximising the benefits of the deliverables to the communities
that have an interest in collection description and opening the developing collection description
schema to more widespread scrutiny. Dissemination will largely be carried out by the
Interoperability Focus [8], taking advantage of relationships with key communities including
CIMI, the cultural heritage community, the LIC, libraries and library systems suppliers. UKOLN
has close links with the Dublin Core Metadata Initiative, including staff membership of both
the DC Technical Advisory Committee and the DC Policy Advisory Committee. Project
outcomes will be disseminated in the normal course of our work; in addition, we propose
the following specific activities:
- A project Web-site and mailing list.
- Promotion of the results of the work through the UK Interoperability Focus
Advisory Committee, CIMI, Dublin Core and elsewhere.
- A concertation day during Phase 3 of the project.
- Articles in appropriate journals or newsletters such as Ariadne and D-Lib
Magazine.
References
[1] Collection Description Working Group – a report on work in progress
<http://www.ukoln.ac.uk/metadata/cld/wg-report/>
[2] JISC Current Content Collection – demonstration ROADS database
<http://roads.ukoln.ac.uk/jisc-ccc/cgi-bin/search.pl>
[3] Collection Level Description – A Review of Existing Practice
<http://www.ukoln.ac.uk/metadata/cld/study/toc/>
[4] Dublin Core Metadata Initiative
<http://purl.org/dc/>
[5] Resource Description Framework (RDF) Model and Syntax Specification
<http://www.w3.org/TR/REC-rdf-syntax/>
[6] Resource Description Framework (RDF) Schema Specification
<http://www.w3.org/TR/PR-rdf-schema/>
[7] DC-dot
<http://www.ukoln.ac.uk/metadata/dcdot/>
[8] UK Interoperability Focus
<http://www.ukoln.ac.uk/interop-focus/>
Maintained by: Andy Powell
Last modified: 3-September-1999