The interpretation of the Dublin Core discussed previously allows a degree of common description to be applied to information resources that are otherwise documented according to different (domain-specific) standards and practices. Implementation issues remain to be addressed, however, if its full benefits are to be realised and access to disparate scholarly collections is to be integrated. The implementation described here, to be developed for the Arts and Humanities Data Service, goes some way towards addressing these issues. Using the Dublin Core-styled metadata outlined in Chapter 3, it will trial the reiterative search and retrieve model for cross-domain discovery that emerged from the MODELS 4 workshop (Russell, this volume). In particular, it will enable users to:
The solution uses software based upon the Z39.50 network applications protocol (Library of Congress 1997a). That software acts as a mediating layer between, on the one hand, a World Wide Web interface from which users query a range of different catalogue databases and to which merged result sets are returned, and, on the other, the underlying catalogue databases themselves. From the user's point of view, this 'middleware' irons out any differences that may exist in the underlying databases (e.g. in their native record structure, query language, and record syntax). Put very briefly, systems based upon the Z39.50 network applications protocol will typically comprise Z39.50 clients and Z39.50 targets. Clients issue user-supplied queries to targets and integrate the result sets that are returned. Targets are associated with underlying databases. They receive the queries issued from the Z39.50 client, translate them into queries comprehensible to the native database, issue the queries to the native database, retrieve a result from the native database, and translate the result into a format that can be passed back to the client. Z39.50 clients can themselves behave as Z39.50 targets. The AHDS's Z39.50 client, for example, is designed to search in parallel across any one or several of the AHDS Service Providers' Z39.50-enabled catalogue databases. When queried remotely from a third-party Z39.50 client, it will act as a Z39.50 target, enabling the remote client to query the Service Providers' catalogue databases as a virtual uniform database.
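The division of labour between clients and targets described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not of the Z39.50 protocol itself or of the AHDS software: the class names (Query, NativeTarget, Client), the two in-memory "native databases", and their sample records are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Record = Dict[str, str]


@dataclass(frozen=True)
class Query:
    field: str  # common element name, e.g. "creator" or "title"
    term: str


class NativeTarget:
    """Plays the target role for one native database (hypothetical interface)."""

    def __init__(self, translate: Callable, run_native: Callable, to_common: Callable):
        self.translate = translate    # common query -> native query
        self.run_native = run_native  # native query -> native results
        self.to_common = to_common    # native result -> common record

    def search(self, query: Query) -> List[Record]:
        native_query = self.translate(query)
        native_results = self.run_native(native_query)
        return [self.to_common(result) for result in native_results]


class Client:
    """Issues one query to several targets and merges the results.

    Because it exposes the same search() method as a target, a Client can
    itself be queried as a target by another Client.
    """

    def __init__(self, targets: Sequence):
        self.targets = list(targets)

    def search(self, query: Query) -> List[Record]:
        merged: List[Record] = []
        for target in self.targets:
            merged.extend(target.search(query))
        return merged


# Two invented native databases with different record structures.
TEXTS = [{"author": "Shakespeare, William", "name": "Hamlet"}]
SITES = [("Example excavation archive", "Example Unit")]

texts_target = NativeTarget(
    translate=lambda q: ({"creator": "author", "title": "name"}[q.field], q.term.lower()),
    run_native=lambda nq: [r for r in TEXTS if nq[1] in r[nq[0]].lower()],
    to_common=lambda r: {"creator": r["author"], "title": r["name"]},
)
sites_target = NativeTarget(
    translate=lambda q: ({"title": 0, "creator": 1}[q.field], q.term.lower()),
    run_native=lambda nq: [r for r in SITES if nq[1] in r[nq[0]].lower()],
    to_common=lambda r: {"creator": r[1], "title": r[0]},
)

gateway = Client([texts_target, sites_target])  # searches both databases
remote_client = Client([gateway])               # treats the gateway as one target
```

A call such as remote_client.search(Query("title", "hamlet")) reaches both native databases through the gateway, although the remote client sees only a single uniform target.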
Until now, Z39.50-based systems have been used to integrate access to distributed online databases whose native record structures share a great deal in common. Systems have been developed, for example, for libraries, archives, museums, and social science data archives where databases conform more or less closely to a particular data description standard: MARC for libraries (Library of Congress 1997b), the Encoded Archival Description for archives (EAD 1997), the data standard recommended by the Consortium for the Computer Interchange of Museum Information for museums (CIMI 1997), and the standard study description for social science data archives (DDI 1997). In these cases, the Z39.50 systems act principally to iron out differences which stem from the underlying database hardware and software and the implications those differences have for record syntax, query language, etc. As yet, Z39.50 systems are not normally used to iron out the differences that arise between catalogue databases whose records are formatted according to fundamentally different data description standards. That is, Z39.50 systems do not as yet enable users to search across the catalogue databases that are maintained in different scholarly, curatorial, or other domains.
Representing a microcosm of this more generic cross-domain discovery problem, the AHDS is positioned to trial potentially generalisable applications of Z39.50. Its collections are geographically distributed amongst five Service Providers and intrinsically interdisciplinary: Service Providers collect, manage, and re-distribute digital resources of interest to specific academic communities. The Service Providers will develop their own on-line catalogue databases, which will differ by necessity. In cataloguing its electronic texts, for example, the Oxford Text Archive conforms to the Guidelines recommended by the Text Encoding Initiative (TEI 1995) for the use of SGML (Standard Generalized Markup Language, a standard formalism for encoding electronic texts). For its database software, the Text Archive will use an SGML-aware search and retrieval engine which will enable users to carry out sophisticated searches of its holdings, either within individual texts, across selections of texts, or across the entire collection. The Archaeology Data Service, on the other hand, catalogues its holdings according to standards more suitable to archaeological information resources, including that recommended by the National Geospatial Data Framework (NGDF 1997). Rather than storing its records as SGML-encoded texts, the Archaeology Data Service will store them in a tabular database which can be queried with SQL (Structured Query Language). Across the AHDS as a whole, we expect each Service Provider to adopt the record structure which most adequately describes its holdings given their particular structure, provenance, and intellectual contents.
We further expect the development of at least two, and possibly three, generically different kinds of online database implementation: two SQL-based (at the Archaeology Data Service and at the Visual Arts Data Service), two SGML-based (at the Oxford Text Archive and the History Data Service), and one possibly based on object-oriented database software (at the Performing Arts Data Service).
The Service Providers' adoption of different catalogue record structures is essential if they are adequately to describe their holdings: historical databases are necessarily described differently from music recordings. The diverse range of database software and hardware platforms reflects Service Providers' responsiveness to their users' particular search and retrieval needs and to the infrastructure and expertise which exists in their host institutions. Owing to the intrinsic interdisciplinarity of humanities research, however, the AHDS also needs to allow users to search simultaneously across its distributed, interdisciplinary, and differently catalogued holdings. Hypothetically, it must enable a user interested in Shakespeare to discover an electronic copy of a Shakespearean play (at the Oxford Text Archive), a digitised film clip of Olivier's performance as Hamlet (at the Performing Arts Data Service), or a database with information about 300 years' worth of British Shakespearean performances (at the History Data Service). The interpretation of the Dublin Core outlined in Chapter 3 provides a mechanism for expressing elements which are used commonly to describe the information resources in our distributed collections. The Z39.50-based tools described below will enable users to benefit from that commonality and to search for and locate information resources across the domains occupied by our interdisciplinary and mixed-media holdings.
The Gateway will enable users to query the AHDS's distributed holdings in an integrated fashion. A high-level schematic drawing showing a possible implementation of the Gateway and its relation to internal and external systems is provided in Figure 4.1. It shows the Gateway as a single point of entry to the online catalogue databases developed by the five AHDS Service Providers. It can also search other Z39.50-enabled databases, and is accessed by end users through a World Wide Web interface. As well as having a direct user interface through the World Wide Web, the Gateway may also provide its own Z39.50 target, which would allow remote Z39.50 clients to view the databases of all the AHDS Service Providers as a virtual uniform database. 'User Profiles' and 'Database Profiles' make up the Gateway's knowledge bank about approved users and known databases respectively. 'Database Centroids' are experimental tools which may assist users in selecting which of the several databases known to the Gateway they should usefully include in any particular query; they need not detain us here (Knight and Hamilton 1997; Panotzki 1996).
Figure 4.1 Schematic of the AHDS Gateway and its relation to internal and external systems.
Figure 4.2 AHDS Gateway architecture.
A web gateway allows users to access the Gateway from any standard web browser.
An information landscape definition provides users with a contextualised map of the information and services available from the Gateway.
Dynamic interface definition ensures that the user interface changes to reflect the services available to users as they move through the landscape.
This is especially important where query forms and returned result sets are concerned. If the user elects to search a number of underlying databases, the elements or fields (e.g. creator, title, subject) which are presented to the user for searching and which are returned in result sets will be those that are commonly supported by the selected underlying databases. As database selections change, so may the elements or fields that are presented to the user for searching and for formatting uniform result sets. Given the AHDS Service Providers' widespread adoption of a common Dublin Core-styled element set for resource description, any search involving one or several of the AHDS Service Provider catalogue databases, and any results returned from such a search, will present elements from that set. Since the Gateway will in time enable users to search catalogue databases and other online information resources that are not maintained by the AHDS, and so are not necessarily conversant with the AHDS's common element set, dynamic interface definition is vital to the Gateway's function.
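The rule described above, that a query form offers only those elements supported by every selected database, amounts to a set intersection. The sketch below illustrates it in Python. The database keys and the field lists attached to them are assumptions made for the example and do not record what any Service Provider actually supports.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical element sets; neither the keys nor the field lists are real.
SUPPORTED_FIELDS: Dict[str, FrozenSet[str]] = {
    "text_archive": frozenset({"creator", "title", "subject", "date", "language"}),
    "archaeology": frozenset({"creator", "title", "subject", "date", "coverage"}),
    "external_library": frozenset({"creator", "title", "subject", "publisher"}),
}


def common_fields(selected: Iterable[str]) -> List[str]:
    """Return the fields supported by every selected database, sorted by name."""
    field_sets = [SUPPORTED_FIELDS[name] for name in selected]
    if not field_sets:
        return []  # nothing selected, so nothing to offer on the form
    return sorted(frozenset.intersection(*field_sets))
```

With these invented profiles, selecting the two internal databases yields creator, date, subject, and title. Adding the external database removes date from the form, which is the behaviour the interface has to track as selections change.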
Authentication may be implemented, if required, to ensure that the Gateway is accessed only by bona fide users.
The Explain proxy service may store profiles of databases that are known to and thus searchable by the Gateway. A database's profile will include information about its contents and record structure, the kinds of queries it supports, and the format in which results are returned. The Z39.50 Explain function is meant to generate such information about a database whenever it is queried by a Z39.50 client. To do this, both the Z39.50 clients and targets involved in the query need to support Z39.50 Explain. Presently, however, Z39.50 Explain is still in a developmental stage. Its capabilities are not entirely understood, nor is it implemented universally with Z39.50-aware systems. The Explain proxy service is an interim measure which will ensure that the Gateway has appropriate knowledge of the systems it is intended to interact with until such time as Explain is more fully developed and universally implemented.
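One way to picture such a store of profiles is as a small lookup table keyed by database name. The Python sketch below is an assumption-laden illustration, not the Gateway's design: the class names, the attribute names, the example host, and the values in the sample profile are all invented, and the real Z39.50 Explain facility is considerably richer than this.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class DatabaseProfile:
    """What the Gateway needs to know about one database (hypothetical fields)."""
    name: str
    host: str
    port: int
    description: str
    searchable_fields: FrozenSet[str]
    record_syntax: str


class ExplainProxy:
    """Holds locally maintained profiles in place of answers from Explain."""

    def __init__(self) -> None:
        self._profiles: Dict[str, DatabaseProfile] = {}

    def register(self, profile: DatabaseProfile) -> None:
        self._profiles[profile.name] = profile

    def describe(self, name: str) -> Optional[DatabaseProfile]:
        return self._profiles.get(name)

    def supports(self, name: str, field: str) -> bool:
        profile = self._profiles.get(name)
        return profile is not None and field in profile.searchable_fields


proxy = ExplainProxy()
proxy.register(DatabaseProfile(
    name="example_texts",
    host="z3950.example.org",  # placeholder address
    port=210,                  # the port registered for Z39.50
    description="Invented catalogue of electronic texts",
    searchable_fields=frozenset({"creator", "title", "subject"}),
    record_syntax="SUTRS",     # illustrative choice of record syntax
))
```

Once targets can answer Explain requests directly, a table of this kind could be filled from their responses rather than maintained by hand.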
Record syntax conversion. Even within the AHDS, Service Providers will use different catalogue databases and these will return records to the Gateway in multiple record formats. The record syntax conversion facility will obscure this heterogeneity from the user's point of view by converting incoming record structures into some standard internal format which will display single uniform result sets to users.
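A minimal sketch of such a conversion, assuming that each incoming record format can be described by a simple table mapping its native field names onto the common element names, might look as follows in Python. The format labels and native field names are invented for the example.

```python
from typing import Dict

# Hypothetical syntax schema: native field name -> common element name.
SYNTAX_SCHEMA: Dict[str, Dict[str, str]] = {
    "sgml_header": {"author": "creator", "title": "title", "keywords": "subject"},
    "sql_row": {"originator": "creator", "project_name": "title", "period": "subject"},
}


def to_internal(record: Dict[str, str], syntax: str) -> Dict[str, str]:
    """Convert one incoming record into the standard internal format.

    Native fields with no entry in the schema are dropped; common elements
    for which the record has no value are simply absent from the result.
    """
    mapping = SYNTAX_SCHEMA[syntax]
    return {
        common: record[native]
        for native, common in mapping.items()
        if native in record
    }


text_record = {"author": "Shakespeare, William", "title": "Hamlet", "encoding": "TEI"}
site_record = {"originator": "Example Unit", "project_name": "Example excavation archive"}
uniform = [to_internal(text_record, "sgml_header"), to_internal(site_record, "sql_row")]
```

Both converted records now carry the same keys (creator and title), so they can be displayed side by side in a single result set.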
A parallel query manager propagates a user's query to multiple remote databases by spawning Z39.50 clients. Results from each client are collated and passed to the upper layers of the Gateway (and so to the user) as a single result set.
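The fan-out and collation step can be sketched with Python's standard concurrent.futures module. Here each "client" is reduced to a plain function that takes a search term and returns a list of records, which is an assumption made to keep the example short. The function names, the sample data, and the added "source" key are likewise invented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]
SearchFn = Callable[[str], List[Record]]


def parallel_search(targets: Dict[str, SearchFn], term: str) -> List[Record]:
    """Send one term to every target concurrently and merge the results."""
    if not targets:
        return []
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        # One worker per target stands in for one spawned client.
        futures = {name: pool.submit(search, term) for name, search in targets.items()}
        merged: List[Record] = []
        # Collate in the order the targets were listed so output is stable.
        for name, future in futures.items():
            for record in future.result():
                merged.append({**record, "source": name})
    return merged


def search_texts(term: str) -> List[Record]:
    data = [{"title": "Hamlet"}, {"title": "Macbeth"}]
    return [r for r in data if term.lower() in r["title"].lower()]


def search_films(term: str) -> List[Record]:
    data = [{"title": "Hamlet (film clip)"}]
    return [r for r in data if term.lower() in r["title"].lower()]


results = parallel_search({"texts": search_texts, "films": search_films}, "hamlet")
```

In this sketch a failure in any one target propagates out of future.result() and aborts the whole search. A production query manager would more plausibly report the failing database and return whatever the others supplied.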
Z39.50 client(s) independently manage query sessions with a remote database.
Database profiles describe the addresses and capabilities of the remote databases which can be searched by the Gateway. Together they form the Explain proxy database described above.
Syntax schema describe the record structures that may be returned by remote targets and inform record syntax conversion as described above.
Though relatively complex, the system described above is intended to realise a simple vision: to enable scholars to find information resources which are appropriate to their needs irrespective of where, by whom, or in what format they are stored. Initially, the system will provide access to a small number of scholarly humanities collections, but it will be extensible, and in time provide access to a wider range of information resources. Indeed, the system's extensibility is required if scholarly communities are to take full advantage of network technologies and the proliferation of online information resources. Integrating access to such resources will require more than metadata and middleware, however. There are other challenges, some of which are only just coming to light as we gain some limited insight into resource discovery in extensively distributed and cross-domain environments. Some of these challenges are addressed in the next chapter, which sets AHDS/UKOLN work on Metadata in a wider context.
Last modified: Monday, 17-Nov-97 16:52:01 GMT by D. Greenstein
URL: http://www.ahds.ac.uk/public/arlist.html
This page was originally part of the Arts and Humanities Data Service (AHDS) Website: http://ahds.ac.uk/public/metadata/disc_06.html