The library, the catalogue, the broker

brokering access to information in the hybrid library

Lorcan Dempsey, UK Office for Library and Information Networking, University of Bath

This is a preprint version of an article which appeared in:
S. Criddle, L. Dempsey, R. Heseltine (eds). Information Landscapes for a Learning Society. London: Library Association, 1999.
A slightly revised version appeared in The New Review of Information Networking, Vol. 5, 1999. p 3-26.

Please refer to the print version in any citation.

Introduction

In his chapter on libraries and librarians in A history of reading, [1] Alberto Manguel calls librarians `ordainers of the universe', an epithet used, he tells us, by the Sumerians. He discusses the efforts of Callimachus to ordain the order of books at the Library of Alexandria and notes that `With Callimachus, the library became an organized reading-space'.

These phrases are useful handles on which to hang a view of what it is libraries do. Libraries may no longer aim to collect and classify all documented knowledge, but their selection and acquisition policies have `ordained' the view of knowledge and learning their readers have had, as well as which materials have become a part of the intellectual record libraries jointly create with archives, museums and others. The library has further organized these materials in ways that are useful to their users; it is not merely an unordered aggregation. In this paper I want to explore some aspects of this organization in a new environment. What `organized reading-spaces' will libraries create in a network society?

Organization, libraries and catalogues

Organization is central to what libraries do, and is a large part of the value they add. As such, it is not surprising that the `organization of knowledge' has been central to the disciplinary claims of librarianship and that an elaborate apparatus of cataloguing, classification and bibliography has developed. Indeed, Callimachus stands near the head of this tradition. He was responsible for the production of a work in 240 B.C. which Norris describes in her History of cataloguing as `a classified catalogue, a bibliography and a biographical dictionary all in one'. [2] The subsequent history of the catalogue provides interesting examples of discussion and argument about the value, and the cost, of such organizational skills and labour. Norris traces the development from systematic-classed lists of the ancient world (`ordainers' of the universe as they sought to capture all knowledge in their classifications), through the inventories of the medieval period, to the realization that the catalogue had to be more than a mere listing: it `was a key to the library'. The construction of such a key required a specialist art and governing rules. Of central interest here is the debate surrounding the catalogue of the British Museum, and the opposition between Anthony Panizzi, who developed his seminal 91 rules for the construction of the British Museum Catalogue, and various lay proponents of a simpler -- cheaper and quicker -- list-based approach.

Why is a simple list not enough? Because neither the effective disclosure of library materials nor the user's best interests are well served by such a list. Much discussion of the nature and function of the catalogue goes back to the celebrated `objects' described by Cutter. [3] Simply stated, these are that the catalogue supports the finding function (by known author, title or subject), and the gathering function (by author, or subject or kind of literature), and the selecting function (as to edition or as to character). An inventory or list will not support these functions very well. Authors may have different names. Works appear in many manifestations (Hamlet is a work; there are many manifestations of this work in different editions, compilations, formats, and so on). Works may not have titles or authors, or their titles or authors may not be clear. In a world where the description and collocation of works, manifestations, authors and subjects may be quite complicated, the organizational investment represented in the catalogue, and the skills and rules which enable it, are required to support the realization of Cutter's objects. Going beyond Cutter, some argue that the role of the catalogue is to surface through such organizational devices the knowledge that is represented in the collection, by allowing the user to navigate by subjects, works and authors.
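The difference between a simple list and a catalogue can be made concrete with a small sketch. The Python below is illustrative only: the class names, identifiers and sample records are all invented for the purpose (the years are placeholders, not real editions), and it models just enough structure to show the finding, gathering and selecting functions at work where a flat list falls short.

```python
from dataclasses import dataclass, field


@dataclass
class Manifestation:
    """One published form of a work: an edition, compilation or format."""
    title_as_printed: str
    name_as_printed: str
    year: int
    work_id: str    # the work this item embodies
    author_id: str  # the author, whatever name appears on the item


@dataclass
class Catalogue:
    name_forms: dict = field(default_factory=dict)  # author_id -> known name forms
    items: list = field(default_factory=list)

    def by_author(self, name):
        """Find and gather: any known form of a name retrieves all items."""
        ids = {a for a, forms in self.name_forms.items() if name in forms}
        return [m for m in self.items if m.author_id in ids]

    def editions_of(self, work_id):
        """Select: every manifestation of one work, earliest first."""
        hits = [m for m in self.items if m.work_id == work_id]
        return sorted(hits, key=lambda m: m.year)


def flat_list_search(items, name):
    """A simple inventory matches only the name exactly as printed."""
    return [m for m in items if m.name_as_printed == name]


# Invented sample records; the years are placeholders, not real editions.
catalogue = Catalogue(
    name_forms={"shakespeare": ["Shakespeare, William", "Shakspere, William"]},
    items=[
        Manifestation("Hamlet", "Shakespeare, William", 1980, "hamlet", "shakespeare"),
        Manifestation("The Tragedy of Hamlet", "Shakspere, William", 1890, "hamlet", "shakespeare"),
        Manifestation("Complete Works", "Shakespeare, William", 1950, "works", "shakespeare"),
    ],
)
```

Searching the flat list for the variant spelling returns only the one item printed that way, whereas `catalogue.by_author` gathers all three items under either form of the name, and `catalogue.editions_of("hamlet")` collocates the two manifestations of the one work.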

Libraries have also organized the materials themselves, and in open access libraries this typically follows a classified approach. The reader can browse in `ordained' ways. The format of the materials -- books and journals -- is well known and has co-evolved with the development of libraries and the reading patterns of their users. These materials present further organizational devices -- contents pages, indexes, bibliographies -- which provide other avenues into the literature. [4] However, alongside the book collections are others -- reports, slides, journals, special collections, archives -- which are typically not well integrated into the catalogue itself or, physically, into the classified sequence on the shelves. They may be internally organized islands, where such organization is not consistent across collections. Each island may involve a different pattern of organization and use. They may have separate catalogues or other finding tools, and may be arranged in various ways, slides one way, for example, EU documentation another.

Until recently, these collections have been physically co-located in library buildings. In the largely print-based world, readers are accustomed to the labour of interacting with the apparatus of different collection types. The special collections, or the archives, or the slide collection, or the standards collection, might be in different places, with different levels of catalogue. Readers recognise that the catalogue deals with books, and that to discover whether a particular journal article is available may involve several steps in different tools. They move between the collections themselves, finding tools, conversation with colleagues, and advice from the librarian -- that `living catalogue only waiting to be consulted'. [5] A large part of the sustained experience of any library comes from the relationship between the various finding tools, the collections, and the places they collectively occupy. [6] These exist in complex relation, and complex practices and behaviours have developed around them, often specific to particular libraries. As in any complicated system, much use is directed by tacit understanding developed through custom and experiment. In fact, we know surprisingly little about such behaviour in the round: research tends to focus on the use of the catalogue, or on browsing behaviour, or on some other individual component; similarly, the progress of automation has been piecemeal, task by task. This is one reason why, despite several years of attention, we do not have very well developed views of what digital or hybrid libraries will be like. It is also a reason for occasional misunderstanding between reader and library over a particular change: where the latter might see a specific improvement or saving, the former might see disruption in a pattern of behaviour or expectation.

As digital resources multiply, organizational and behavioural practices are being modified, typically, at present, in ad hoc ways. Such resources introduce new `islands': they are usually not part of the traditional organizational apparatus, whether realized in the catalogue or through physical arrangement. There may be an electronic reserve collection; a collection of CD-ROMs; document discovery and delivery services (FirstSearch, BIDS, etc.); access to collections of specialist data sets (through the Arts and Humanities Data Service, or MIDAS (Manchester Information Datasets and associated Services), or the Data Archive, for example). The multiplication of such islands has several drawbacks. For users, it means that the use of resources becomes complicated. There may be diversity of organization of resources (different collection types, different catalogues); diversity of user interface and interaction pattern; there may be diversity of logins and conditions of use. Such diversity is a barrier to use: it erects fences, wastes time and dampens demand. For libraries, it means that there are additional demands in terms of training, support and collection management: as diversity increases, scale economies diminish.

How do these new developments relate to the traditional organizational techniques, the catalogue and the physical arrangement of materials?

The catalogue has only ever provided access to a part of the collection. Typically, the non-book `islands' have not been represented in it. Of particular significance here is the absence of the journal literature. It may contain title information, but typically not article details. As the volume and variety of non-book material increases, the catalogue increasingly becomes one resource among many. It is less and less the complete `key to the library'. Terry Hanson has discussed the motivation for developing an `access catalogue' which describes a range of resources to users, [7] and the LASER (Library Access to Selected Electronic Resources) system at Leeds is an interesting example of this. [8] However, the user now also has access to other resource description databases: for journals, for Internet resources, for electronic texts, and so on.

Clearly, these new resources are not physically integrated as part of a physical classified sequence. The experience may be partly replicated by providing web pages which present resources organized in simple ways. Increasingly, as suggested above, there may be databases of resource descriptions which may be used to present views of what is available. In this way, the intellectual and the physical arrangements come together. But there are some interesting ways in which the arrangement of digital resources differs from the physical. I have suggested that electronic resources do not currently have the uniformity of treatment which has developed with print resources. Each may have to be `learned' or handled in special ways. Furthermore, materials in the print world arrive in discrete packages, which are managed and used individually. Because of their `physicality', the user or the library has to make the connections or links. In the new network space, resources may communicate, connections may be made, resources may change shape or be reconfigured. So, for example, documents, bibliographic references, or scientific data may be imported into a user workspace. An intermediate system may interact with several resources on behalf of a user, as happens with the `clump' projects, where several catalogues are queried in parallel. [9] Data may be automatically collected and indexed, as happens with the web robots. Processes may be automated through communication between applications, as where data is passed between a search service, a document delivery service, and an accounting system, without the need for manual intervention.
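The idea of an intermediate system querying several catalogues in parallel can be sketched briefly. This is a minimal illustration in Python, not a description of how any clump project was built: the catalogues here are in-memory stand-ins with invented names and holdings, where a real service would speak a protocol such as Z39.50 to remote servers.

```python
from concurrent.futures import ThreadPoolExecutor


def make_catalogue(name, titles):
    """Build a stand-in search function for one library catalogue."""
    def search(term):
        needle = term.lower()
        return [{"source": name, "title": t} for t in titles if needle in t.lower()]
    return search


def cross_search(catalogues, term):
    """Send the same query to every catalogue at once and pool the hits."""
    if not catalogues:
        return []
    with ThreadPoolExecutor(max_workers=len(catalogues)) as pool:
        result_sets = list(pool.map(lambda search: search(term), catalogues))
    return [hit for hits in result_sets for hit in hits]


# Invented catalogues and holdings, for illustration only.
catalogues = [
    make_catalogue("Library A", ["Hamlet", "A history of reading"]),
    make_catalogue("Library B", ["Hamlet: a casebook", "History of cataloguing"]),
]
hits = cross_search(catalogues, "hamlet")
```

The user issues one query and receives one pooled result list; which catalogues were consulted, and how, is the intermediate system's concern.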

We are only in early stages of such developments, but this variety of resource, and of organizational approach, is characteristic of what is coming to be known as the `hybrid library'. The hybrid library can be understood as an organized attempt to come to terms with the multiple islands that library services are increasingly becoming and to reduce the difference in patterns of access and management between those islands.

To finish this part of the discussion, I want to pick up two points. The first is to do with the implications of some of this change for the `organization of knowledge' tradition of libraries. The second takes forward the remark made earlier about value and cost.

I began by talking about the goals of the catalogue, as elaborated by Cutter. An apparatus of codes and practices has developed to realize these goals: so that the assignment of headings, and perhaps, further authorities work, bring together names, works and subjects; descriptions allow retrieval of manifestations; and so on. Of course, these goals may be imperfectly and variably realized in today's catalogues. Furthermore, cataloguing theory has been developed in an environment where the catalogue typically exposed the content of a particular collection under a single organization's control, where that collection typically contained books, and where the catalogue was realized by manual means. The structure of the catalogue may not be apparent in some automated catalogues, and is not very effective across collections. In fact, the information retrieval approach, and its extension in network protocols such as Z39.50, involves a `flattening' of structure: the model is one of individual, unrelated records which describe information objects. So, the catalogue may not be well equipped to satisfy these theoretical cataloguing objectives. Increasingly, readers may be allowed to search across several catalogues, individually or as a federated resource, meaning that differences in practice between libraries and catalogues may be apparent. In a new development, the catalogue may be looked at alongside resources created within different curatorial traditions (archival finding aids) or from different sectors (abstracting and indexing services, online book sellers), where the same structuring apparatus may not be used. These issues raise serious questions for libraries and their practices -- both in terms of their theoretical basis and their practical application -- but further exploration is beyond my scope here.

Whatever the current state of the catalogue, a more general form of the argument advanced above in its defence remains relevant. The hybrid library cannot be a mere collocation of services, a listing on a web page. Where is the added value the library provides? The value of the library is that it saves users' time, that it releases the value of the resources it manages, that it effectively brings together users and resources over time. Organization continues to be central, but the techniques used need to develop with the needs of the user and the characteristics of the materials. In particular it is becoming clear that it is not enough for the library to provide access to a part of its collection through a catalogue; another through a set of annotated web links or resource database; another indirectly through abstracting and indexing services; and so on; with no relation between them. It is likely that the collections and services will be brought together at some higher level. Furthermore, it will be increasingly difficult to consider these services separately from the wider fabric of resources that is available. And, it is becoming more important to support other services than just searching or discovery: network resources can communicate with each other, and this involves further thinking about how to present and support user services. I consider some of these issues further in later sections. The purpose of the comparison with the catalogue is to highlight that the `added value' may not be obvious, nor may it emerge directly from user requirements. Indeed, its value may need to be promoted and defended.

Organization, libraries and networked information

I want to consider some of the characteristics of network resources, and of the environment in which they are used. Let us take first, as an example, the resources that might be available to the reader who is interested in books and journals: catalogues; abstracting and indexing services; document delivery services; and so on. Some of the characteristics of available resources are:

They are heterogeneous, and are growing in volume and variety. Services have different access characteristics, may require individual login, use diverse data schema and exchange formats, and so on. The experience of using a subject gateway like EEVL (Edinburgh Engineering Virtual Library), for example, is different from using an abstracting service on CD-ROM, which is different from using a network abstracting service through Edina. Services require the user to have significant advance knowledge of what is available, and some persistence if they wish to use several. Services have different terms and conditions attached to their use.
They are autonomously managed; they have developed independently, responsive to different service and business goals. This means that within any information process, it may be necessary to interact with several services which do not coordinate their activities. Until recently, these services have been conceived and designed as standalone systems, rather than as parts of a fabric of information resources on a network. So, for example, there are network services which accept document requests (such as that provided by the British Library), there are packages which can format requests for dispatch to such services, there are services which allow people to discover the documents of use to them. These may not be linked up in such a way that an end-to-end process can be automated. Data may not cross boundaries, or may have to be re-keyed or transcribed by user or by staff.
They are individually controlled. Information providers wish to protect the value of resources they make available. There may be a need to confirm the identity of users or the integrity of resources. A framework for commerce needs to be in place. At present these are provided on a service by service basis. When these issues of security, control and commerce achieve widely deployed common solutions we can expect another surge in the availability of networked information.
They represent different aggregations of function. For example, a union catalogue allows people to discover (and locate) journal articles. An integrated service from the British Library or BIDS (Bath Information Data Service) allows a user to discover, locate, and request documents and have them delivered. An abstracting and indexing service allows users to discover the existence of documents. Some services may be offered as `one stop shops'. `One shop stop' might be a more accurate characterization. Although some organizations now offer services which include discover, locate, request and deliver facilities, they are still just components within this potentially distributed document supply service since no server will meet all coverage or quality of service criteria. This is not to say that `one stop shops' are not locally useful, but they provide inevitably partial solutions.

This lack of organized access has some implications:

Network resources do not release their full value. There is some evidence, summarized by Bell for example, [10] that variety inhibits effective use for casual users who do not have the time or inclination to become familiar with more than a few different approaches, and who do not stray from familiar paths. This variety also presents a management issue for libraries: different technical characteristics may require different approaches, training needs are duplicated, the technology is very much on the surface rather than disappearing into a coherent environment of use. Similar issues can be raised in relation to other information providers, where variety of technical or service approach may reduce demand or raise the cost of supply.
Users are not well served. Relevant resources may not be visible, or will be ignored. The learning or social opportunity is diminished.
Missed opportunity of networking. The opportunity now posed by the developing technologies is not just to automate particular tasks but to automate end-to-end processes in ways that support effective use, and deliver integrated services into working environments.

This is the situation within the area traditionally looked after by libraries. However, library users might expect organized access to a wider range of materials. Libraries now operate in a shared network space which brings together users, service providers and resources in new combinations and balances. So, there are new divisions of labour in the learning and information domains. Take the example of document supply, where publishers, libraries and aggregators are realigning the pattern of delivery; or the creation of new learning environments which may bring together learning technologists, library and information people, and so on, in the creation of a new type of service. There are new forms of user behaviour and expectation, as for example, where communication technologies are reaching into writing and learning environments. There are also cross domain convergences, where previously distinct activities may be brought into new relation. For example, we are likely to see much greater collaboration in relation to cultural heritage and local history where libraries, archives and museums recognise shared access and preservation concerns. Or again, greater link up between the library and booksellers, publishers, or other suppliers of materials, as they recognize a shared interest in providing services which meet the various needs of users. Some libraries have bookshops on their premises; why not link their online services also?

These influences again point to an environment in which heterogeneous, autonomously managed information resources continue to increase in volume and variety. We can suggest some characteristics of this wider environment and its relationship to library services:

The user workspace. I used the expression `network space' earlier, and metaphors of landscape and ecology have threaded discussion at this conference. What these phrases suggest is an emerging awareness that it is reasonable to see such spaces as living and working places. We can see a progressive shift in perception from a computer, to a workspace, to an environment. This will continue as the environment becomes richer in services we wish to see, as computational devices and the means of connecting them become more pervasive, and as technologies for security and commerce mature. Increasingly, the user of services will expect them to be available within their own communications space, to be able to reuse, combine, and link, without barriers, fences or arbitrary difference. In an interesting discussion of the future of computing, Terry Winograd talks of the historical shift from `computing machinery to interaction design' as the focus of computing. He suggests that this requires a shift from seeing the machinery to `seeing the lives of the people using it'. [11] Viable digital information spaces will depend on the ability to allow the user richer interactional abilities. This in turn depends on presenting opportunity effectively and creating the ability for components of the network space to communicate. It also suggests the need for greater personalization services.
The role of the library. Libraries need to consider their response to these changes within the network space. Unlike some other players, libraries have a dual role. They manage their own collections and make them available, but they also have a role in guiding their users to resources and services out of their direct control. In many cases this distinction becomes less clear as a model based on the purchase of rights in the use of materials, with particular terms and conditions, replaces a model based on the direct purchase and circulation of materials themselves. Nor should it be forgotten that libraries are part of larger institutions, whose mission they support. Libraries co-evolve with these wider institutions which in turn are being transformed. There are important and far-reaching discussions about the management of learning and research, the provision of digital information services for the citizen, the place of cultural heritage services, and the `informationalization' of business and industrial processes. Again, this suggests greater variety of practice and partnership.
Technical change. A difficulty for those currently developing network information services is accelerated technical change, and the risk associated with investment in a volatile environment. The current web environment is document-centric, with limited support for interaction and structured applications which exploit the richness of the resources that are made available. Alongside this, structured approaches exist which have developed within particular domains (Z39.50, SGML, EDI, and so on). There is limited support for some of the technologies required to build new long-term institutions within the network space (distributed authentication, security, commerce). However, we are seeing the rapid development of a more `structured web' within which it is likely that existing structured approaches to work will be reengineered or integrated. The `web' will support richer applications, distributed objects, security and commerce services. The goal should be for an applications framework to be able to support rich enough services that the technology disappears into the environment. We are still some way from that.

I suggested above that the goal of the hybrid library was somehow to overcome the fragmentation of the library service into multiple islands. We can now recognize that there are also multiple other islands in the network space the user occupies, and that the library needs to work to provide effective access to some of these also. This is in the midst of rapid organizational, service and technical change which makes taking decisions difficult. As other communities are facing some of these issues, and as libraries may have to try to provide access to resources outside of their control, common approaches at various levels would be useful. However, at the moment, as discussed by Ray Lester elsewhere in this volume, there is limited opportunity to seek consensus across these communities.

How might we move forward? What does integration mean? Some examples of what it might cover are:

Information landscape. For the moment, I introduce this term to suggest the shift in emphasis which sees information resources organized around user interests rather than around particular technical or media characteristics. What this might mean in practice is discussed further below.
Collection description. Libraries do not typically describe resources at the collection level. I have loosely spoken about collections already. I mean such things as physical collections of documents, slides, tapes, and so on. However, they might also include such things as databases (the catalogue for example), electronic journals, web sites. The notion is vague until decisions are made in some service context. Describing which collections are available and under what conditions would provide some higher level integration which is currently lacking.
Authentication. The library as a place has certain controls built in: membership cards, single points of entry and exit, supervision. Open distributed control in the digital information space is still a research and development challenge. Authentication and authorization information needs to be exchanged at various stages. Support for commercial activity will have to be provided. The integrity and privacy of exchanged information may have to be assured. The usefulness of an information landscape will be severely limited without distributed authentication services which mean that the client need only `prove' themselves once. Multiple challenges (passwords) erect fences in the landscape which inhibit use. They are potentially relevant at various points in information exchanges. This issue is not discussed further.
System interworking. We might identify two related aspects. The first could be called intra-function integration, where there is working across different systems which provide the same function. So for example, one might be able to cross-search heterogeneous resources, or have a message dispatched to heterogeneous request services. The oft-mentioned desire to be able to look at several services through a similar interface is an example of this. The second could be called inter-function integration, where there is working between systems which provide different functions. Typically, this might involve handing data between functions, as where, for example, an identifier or citation is passed from a search function to a locate function. (A locate function might resolve the identifier to discover locations, or match a citation against a holdings file, or use some other technique.) Inter-function communication is required if end-to-end processes are to be automated, reducing expensive manual effort.
Semantic interworking. Different islands may employ different subject vocabularies and schema, different data schema, different conventions with regard to identification, and so on. A variety of mappings may be required, and it is likely that some issues here will remain intractable.
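Two of the integration types listed above, semantic interworking and inter-function integration, lend themselves to a short sketch. The Python below assumes two hypothetical search services with different subject vocabularies and a single holdings file; every service name, subject term and identifier is invented. It translates the user's term into each service's vocabulary, then hands the identifiers found by the search function on to a locate function.

```python
# All service names, subject terms and identifiers below are invented.

# Semantic interworking: map the user's term into each service's vocabulary.
CROSSWALK = {
    "gateway": {"bridges": "bridge engineering"},
    "abstracts": {"bridges": "BRIDGES (STRUCTURES)"},
}

# Each search service indexes documents under its own subject terms.
SEARCH_INDEX = {
    "gateway": {"bridge engineering": ["doc:101"]},
    "abstracts": {"BRIDGES (STRUCTURES)": ["doc:101", "doc:202"]},
}

# A holdings file for the locate function: identifier -> holding libraries.
HOLDINGS = {"doc:101": ["Library A"], "doc:202": ["Library A", "Library B"]}


def translate(service, subject):
    """Return the service's own term, or the user's term if none is mapped."""
    return CROSSWALK.get(service, {}).get(subject, subject)


def search(service, subject):
    """Discover identifiers in one service using its own vocabulary."""
    return SEARCH_INDEX[service].get(translate(service, subject), [])


def locate(identifier):
    """Resolve an identifier against the holdings file."""
    return HOLDINGS.get(identifier, [])


def discover_and_locate(subject):
    """Inter-function hand-off: pass each identifier from search to locate."""
    locations = {}
    for service in SEARCH_INDEX:
        for identifier in search(service, subject):
            locations[identifier] = locate(identifier)
    return locations
```

The fallback in `translate` (using the unmapped term as it stands) is one simple choice among several; it also hints at why some mapping problems remain intractable, since an unmapped term may silently retrieve nothing.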

In recent years, `broker' services have emerged which aim to provide some or all of these integration services. In the next section I explore broker systems, with particular reference to the library issues. Typically a broker will provide support for projecting a unified service across some part of the independently produced, distributed resources we have been discussing.

Organization, libraries and brokers

Building blocks

In his discussion of networked heritage information, Mike Stapleton provides a useful list of building blocks. I repeat this here with my own gloss. Mike's text is in quotation marks.

Databases. `For creating and managing online catalogues and content repositories.' We could add other services here, a document requesting service for example.
Gateways. `Provide interfaces to the repositories and deliver content in different forms.' The use of `gateway' here shows how unsettled our vocabulary is. In this use, it refers to how the resource is made network accessible. In this discussion I have used `interface' or `service' rather than `gateway' for the sense proposed here. So, a document requesting service may have an interface that implements the Inter Library Loan protocol. A catalogue may be made available through Z39.50, telnet and web interfaces, and so on. Historically, library resources have been made network aware in one of the following ways:
Terminal access to remote services. Users login to a monolithic application with its own user interface and request commands. Data may be downloaded using special, separate procedures or by some form of screen capture.
World Wide Web access. The service provider presents its own user interface and request commands, this time through a web form. The gain is that there is some consistency of presentation in a consistently navigable environment. Although the web is becoming a richer environment, dominant current use is for transfer of static information in the form of web pages, or for the unstructured output from databases. As with terminal access, the client just responds to user interface directives such as `display this text in bold italics' etc. These services are largely oriented around providing services to human users, who then have to process the results.
Client/Server access. This architecture is significantly different in that the client has some understanding of the data it is handling. Proprietary approaches may be in use within certain systems. The use of open standards such as Z39.50 is growing but not widespread. When the client understands the data it is handling, the data can be reused in various ways and the results can be processed. The client is responsible for the re-presentation of that data to the user, and is capable of shielding the user from differences between servers. The data can be reformatted for display alongside records from other resources, can be passed to other applications, and so on.
For completeness, I note here significant developments in the web community which will provide support for the exchange of structured data, for distributed objects, and for a range of other services which will significantly enhance the development of web-based information processing applications. Resources may be dynamic and distributed.
Brokers. `Provide a common point of access to a range of repositories and data sources.' These will be discussed in greater detail below.
Delivery platforms. `Provide users with an interface appropriate for the task at hand.' The current dominant delivery platform is the web browser.
Protocols. `Connect it all together on the networks.'

Brokers

Some examples of brokers from the library and related domains are listed in this section. In each case, an application, or `external mediator', facilitates access to diverse resources and supports data flow. The broker will be designed to support a particular business need, which determines what types of integration it supports. The term `broker' may be familiar from a distributed object environment, but I intend no specialised meaning here. A broker may be a set of annotated web links; it may deploy a more sophisticated apparatus which supports a richer business model or quality of service. The examples given here do rather more than provide a set of web links or resource database, even if they are still rather early examples.

LIDDAS (Local Interlending and Document Delivery Administration System) is an Australian project looking at managing interlending and document delivery operations. It will broker access to databases for search and location of desired items, services for request and delivery of items, directory services for environment and business information, and authentication services, passing data between applications as appropriate. The model is very much one of `single service - multiple systems'. [12]
The eLib `hybrid library' and `clump' projects are looking at broker-type services in their particular application areas. Clumps provide virtual union catalogue services across different underlying OPACs. [13] The hybrid library projects are variously conceived, looking at providing integration services across particular sets of resources.
Aquarelle is an EU project which provides a `mediator' or broker service for museum repositories. It searches across heterogeneous cultural resources providing some central services to support that operation. Mike Stapleton describes Aquarelle in another paper in this volume.
The Arts and Humanities Data Service (AHDS) Gateway projects a unified picture of the AHDS based on a federation of five underlying, autonomously managed service providers. The Gateway provides a service which hides the different access mechanisms of the heterogeneous systems in use at service provider sites, and provides authenticated document requesting services. Daniel Greenstein describes the AHDS gateway in another paper in this volume.
The ROADS cross-searching service. ROADS (Resource Organisation And Discovery in Subject-based services) is an eLib-funded project which is providing a set of tools for the UK subject gateways, databases of descriptions of Internet resources. The cross-searching service provides a query routing and referral service between the autonomously managed subject gateways. [14] In a related initiative, the CHIC (Cooperative Hierarchical Indexing Coordination) project has developed a service which searches across Z39.50, WHOIS++, and Harvest resources. [15]

The focus of some of these developments is discovery: to varying degrees, they hide differences and collate the results from several different underlying discovery systems. Some go beyond this to address several functional areas and allow data to flow between them. For example, the hybrid library project Agora aims to pass data about selected articles from a discovery to a `locate' function where it may be matched against some holdings data; data may then be passed to a request function, where it forms the basis of a request message. It hides the difference between different discovery systems and different request systems.
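
The flow of data between functions described here can be sketched roughly as follows. This is a minimal illustration in Python, not an account of how Agora or any other project is implemented: the names (`Citation`, `discover`, `locate`, `build_request`), the record shape and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Citation:
    """Structured record returned by a discovery service (hypothetical shape)."""
    title: str
    issn: str
    year: int


# Hypothetical discovery results and holdings data, for illustration only.
DISCOVERY_DB: List[Citation] = [
    Citation("Article on clumps", "1111-1111", 1997),
    Citation("Article on metadata", "2222-2222", 1998),
]
HOLDINGS: Dict[str, List[str]] = {"1111-1111": ["Main Library"]}  # ISSN -> locations


def discover(term: str) -> List[Citation]:
    """Discovery: return structured records whose title contains the term."""
    return [c for c in DISCOVERY_DB if term.lower() in c.title.lower()]


def locate(citation: Citation) -> List[str]:
    """Locate: match the discovered record against holdings data."""
    return HOLDINGS.get(citation.issn, [])


def build_request(citation: Citation, user_id: str) -> dict:
    """Request: the same record forms the basis of a request message."""
    locations = locate(citation)
    return {
        "user": user_id,
        "title": citation.title,
        "issn": citation.issn,
        "year": citation.year,
        # Held locally: supply locally; otherwise fall back to interlending.
        "route": "local" if locations else "interlending",
        "locations": locations,
    }
```

The point of the sketch is only that the record is structured: because `discover` returns data rather than a screen of text, the user does not have to re-key anything between the discovery, locate and request steps.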

These examples prompt some comments, before discussing brokers in more general terms:

Human and machine access. Services will be provided which may be accessible to human users through the web, or to intermediate or broker systems through some machine protocol. In the latter case, machine-readable structured data will be returned for reuse in some context; in the former human-readable results will be returned for reading. For example, an OPAC may have a web interface for human access, and a Z39.50 interface for `clump' access or hybrid library access. Human access is the dominant model.
Lack of standard interfaces. What characterizes these initiatives is that they support standard machine interfaces which allow them to interact with arbitrary services which support the same interfaces. Of course not all services or databases of interest will be available through such standard interfaces. In some cases it may be necessary to write applications which will talk to particular service interfaces, and there is a trend for services to make APIs (application programming interfaces) public to facilitate this. In many cases only human interfaces may be available, so the broker will not be able to retrieve structured data, or may have to try and capture and analyse screen-based data.
While these examples are largely `bespoke' applications with standard interfaces to the outside world, we may gradually see more fully distributed applications emerge based on distributed components specialised by function which communicate within one of the distributed object approaches.

MODELS and brokers

This discussion draws on work still developing with our MODELS (Moving to Distributed Environments for Library Services) project which is describing the MODELS Information Architecture (MIA). [16] MODELS started from the view that the development of a high-level architecture was a useful support to discussions of interworking in the library area. It has moved forward through a series of invitational workshops and desk research, and it has influenced the development of several systems and services in the UK and beyond. The MIA provides a common frame of reference and vocabulary which has achieved some currency. The current state of the MIA is described elsewhere and I do not propose to discuss it in detail here. [17] The initial work described a broad, high-level framework (see Figure 1.1); this is now being refined with a view to implementation, and will be further reported elsewhere.

The broker provides infrastructure for managed access to physically distributed resources. MODELS has generalized the services provided by such `brokers' in the following way:

User access -- managing interaction with the user. `Information landscape' is a term that was introduced within the MODELS context to describe the presentation of information environments to users. It might be seen as a potential beneficiary of `interaction design'. The landscape will project the underlying business processes in a user-oriented way. The `information landscape' will describe the resources available, provide navigation and selection support, and may be configured to reflect particular styles, policies and collections. A minimal landscape may have links to resources; richer landscapes will gradually emerge which are constructed on the basis of forward knowledge of available services, user profiles, and other data. For example, there has been some discussion of matching user profiles (which record preferences and privileges) against collection descriptions to personalize the landscape. In a hybrid environment, it will also be important to provide routes into both the print and the digital environments here. The landscape also merges what was previously intellectual and physical access to collections. The California Digital Library provides an interesting example of a landscape (though it does not use the term). [18] Collections (databases, electronic journals, finding aids) and services are described, and can be examined in search and classified browse mode. Individuals and institutions can customize their views. Raising `landscape' in these terms focuses some questions on how it might be provided: it is an area of active research and development, rather than an answer. The landscape is dependent on the other services the broker can support. We have discussed landscape presentation, user profiling, client access (in theory, a broker might be accessible to a client as well as to a human user) as part of a user access layer.
Applications framework -- programs and data to support business logic. There needs to be a framework which orchestrates the service components the broker provides in support of business functions. Operations we might envisage here are: bringing together the user requests with appropriate services; merging and manipulating data for presentation to a user or to pass between functions; implementing the particular business processes needed (e.g. distributed document delivery, clump, cross-domain resource discovery, etc.). To do this it requires knowledge about data schemas and formats, and services and collections to which it brokers access. This might be locally configured or provided to it through directory or registry services. The richness of the services here will influence the quality of service provided by the broker.
Distributed service interfaces -- managing interaction with services and resources. Various resources and services require different modes of interaction. Ideally, the broker will abstract the mechanics of these interactions. A reader may request an item to be delivered; the system will decide whether this needs an HTTP `get', an ILL request, or a note saying to go to the reference collection. A reader may wish to locate some items; the system will present some options for searching, will open up a Z39.50 session, or a web browser, or whatever is required. The system will deal appropriately with delivered items. The variety of services and resources means that a variety of modes of interaction will be possible. The broker will support interfaces appropriate to its application area. As noted above, in a real-world environment, the absence of widespread deployment of standard machine interfaces means that a broker may have customized or specialized interfaces to particular services. There are clearly advantages in minimizing the number of interfaces which have to be supported.
Control of access -- authentication, commerce. These are included here for completeness. Distributed, open solutions are needed, which the broker can call on as services.

These are logical functional groups which have worked well when measured against a range of emerging developments. The advantage of such an approach is that it separates different aspects. So if a service is offered through a different protocol, or if a new service is added, it should not be necessary to change the user access level. The appropriate transformations will be effected in the middle layer. Similarly, users may see available resources through different landscapes without having to alter the way in which those resources are organized. We begin to see how new resources might be routinely `shelved' by being added to the lower layer. We can also see how the flexibility introduced in the user access layer makes it plausible to consider a variety of customized approaches into the available resources.
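
The separation of layers can be illustrated with a small sketch. It is offered under stated assumptions rather than as part of the MIA itself: the class names (`ServiceInterface`, `Z3950Stub`, `WebFormStub`, `Broker`) and the function `landscape` are hypothetical, and the two stubs merely stand in for real protocol clients; no Z39.50 or HTTP traffic takes place.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ServiceInterface(ABC):
    """Lower layer: hides one mode of interaction with a resource."""

    @abstractmethod
    def search(self, query: str) -> List[Dict[str, str]]:
        """Return records in a common form: {'title': ..., 'source': ...}."""


class Z3950Stub(ServiceInterface):
    """Stand-in for a Z39.50 target (hypothetical; holds data in memory)."""

    def __init__(self, name: str, records: List[tuple]):
        self.name = name
        self._records = records  # native form: (title, author) tuples

    def search(self, query: str) -> List[Dict[str, str]]:
        return [{"title": title, "source": self.name}
                for title, _author in self._records
                if query.lower() in title.lower()]


class WebFormStub(ServiceInterface):
    """Stand-in for a resource reachable only through a web form."""

    def __init__(self, name: str, rows: List[Dict[str, str]]):
        self.name = name
        self._rows = rows  # native form: dicts keyed by 'Title'

    def search(self, query: str) -> List[Dict[str, str]]:
        return [{"title": row["Title"], "source": self.name}
                for row in self._rows
                if query.lower() in row["Title"].lower()]


class Broker:
    """Middle layer: orchestrates services and merges their results."""

    def __init__(self) -> None:
        self._services: List[ServiceInterface] = []

    def shelve(self, service: ServiceInterface) -> None:
        """Add a new resource to the lower layer."""
        self._services.append(service)

    def search(self, query: str) -> List[Dict[str, str]]:
        merged: List[Dict[str, str]] = []
        for service in self._services:
            merged.extend(service.search(query))
        return sorted(merged, key=lambda r: (r["title"], r["source"]))


def landscape(broker: Broker, query: str) -> str:
    """User access layer: presentation only, with no knowledge of protocols."""
    return "\n".join(f"{r['title']} [{r['source']}]" for r in broker.search(query))
```

Adding a resource with a different mode of interaction means writing one new `ServiceInterface` subclass and calling `shelve`; `landscape` is untouched, which is the property the layering is meant to secure.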

We can also see the advantages of increasing modularization. The broker may be able to call service components from various places to provide its service. For example, terminology services (thesauri) might be used in query expansion.
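
As a rough indication of what query expansion through a terminology service might involve, consider the following sketch. The thesaurus fragment and the function `expand_query` are hypothetical; a real terminology service would be a separate component called over the network and would distinguish between kinds of term relationship.

```python
from typing import Dict, List

# Hypothetical thesaurus fragment: term -> equivalent or related terms.
THESAURUS: Dict[str, List[str]] = {
    "cataloguing": ["cataloging", "bibliographic description"],
    "interlending": ["interlibrary loan", "ILL"],
}


def expand_query(terms: List[str], thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return the original terms plus thesaurus entries for each of them.

    Order is preserved and duplicates (compared case-insensitively) are dropped.
    """
    expanded: List[str] = []
    seen = set()
    for term in terms:
        for candidate in [term] + thesaurus.get(term.lower(), []):
            key = candidate.lower()
            if key not in seen:
                seen.add(key)
                expanded.append(candidate)
    return expanded
```

Terms with no thesaurus entry, such as `metadata` in this fragment, pass through unchanged, so the broker can apply expansion to every query without special cases.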

This is a somewhat idealised presentation, and current brokers in the areas under discussion are rather less well developed than this sketch suggests. The purpose of MIA has been to provide some common ground for discussion of these issues. It is also clear that in implementation many difficulties will be experienced. These brokers are typically working with large, diverse collections of legacy data.

A note on metadata

In the library domain, discussion has tended to focus on so-called `item' level metadata (descriptions of individual books, articles, and so on). The new environment has new requirements. The broker needs to have access to various types of metadata to support its operation. This is data about its environment and the resources in it. It should be clear that metadata is of central importance. Agility in a distributed environment depends on the ability to identify and use components, and this increasingly relies on metadata. Metadata will be pervasive in distributed information environments. [19] Metadata will be associated with information objects, with applications, with people, with organizations. It will support operations by people and by programs, providing them with advance knowledge of the characteristics of objects of interest and supporting sensible behaviour. In relation to brokers we can note:

Collection descriptions. Typically information objects exist in collections, where a collection comprises similar information objects. These collections might be databases, web sites, document supply centres or libraries. They may be particular collections within a library, or the catalogue for such collections. Such collections are also, of course, information objects, and collections may contain other collections. Collections will also have different terms and conditions associated with their use. They are embedded in particular organizational and business practices, which may impose additional technical requirements on any networked solution, for charging for example. Typically collections will be managed by organizations. Information objects may be `data' or `metadata'. How to characterise collections is a poorly defined area at the moment, where a variety of approaches exist and for which consensual approaches are urgently required. This is especially so in the service environment we have sketched above where a range of broker services are emerging. Each of these will have to provide ways of describing the collections and services which are available to users of their systems. `Collection descriptions' provide forward knowledge about the resources that are available to a user, allowing sensible discrimination between them and their effective use. These may be intellectually created. There is also some interest in automatically generated representations. For example, centroids are inverted index style representations of database content used to support query routing. They are defined by the Common Indexing Protocol [20], and are used to support cross searching of the UK subject gateways.
Application (or service, or interface) description. Collections will be made available through some machine interface which needs to be described. Such descriptions will permit brokers and clients to connect to arbitrary resources. Several approaches exist, which cover such attributes as host name, port number, etc. Other services may also need to be described in this way.
Schema descriptions. If a system is to broker access to heterogeneous metadata collections, to render heterogeneous content in some way, or to map between interchange formats, it needs to have access to schema data which supports this activity.
User profiles. We have discussed these above.
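
The idea of a centroid mentioned under collection descriptions can be made concrete with a simplified sketch. It assumes that a centroid is, for each field, the set of distinct tokens occurring in a database, and that a query is routed only to those servers whose centroid contains every query token. The function names and sample data are hypothetical, and the formats actually defined by the Common Indexing Protocol are not reproduced here.

```python
from typing import Dict, List, Set

Record = Dict[str, str]
Centroid = Dict[str, Set[str]]


def build_centroid(records: List[Record]) -> Centroid:
    """For each field, collect the set of distinct lower-cased tokens."""
    centroid: Centroid = {}
    for record in records:
        for field, value in record.items():
            centroid.setdefault(field, set()).update(value.lower().split())
    return centroid


def route_query(field: str, query: str, centroids: Dict[str, Centroid]) -> List[str]:
    """Return the names of servers whose centroid contains every query token.

    In this simplified model a hit means the server may hold matching
    records; a miss means no record on that server contains the token,
    so the query need not be sent there.
    """
    tokens = set(query.lower().split())
    return sorted(
        name for name, centroid in centroids.items()
        if tokens <= centroid.get(field, set())
    )
```

A centroid is much smaller than the database it summarizes, which is what makes it practical for a broker to hold centroids for many servers and consult them before forwarding a query.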

In the current environment it is likely that brokers may be configured with this type of data. In due course it will be stored in directory and registry services which the broker queries.
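
A locally configured broker of this kind might hold its descriptions in a form something like the following sketch. Every name here is hypothetical, as are the attribute choices and the `example.org` hosts; no existing description scheme is implied. The sketch also shows the matching of a user profile against collection descriptions that was mentioned earlier.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ServiceDescription:
    """How to connect to a resource (hypothetical attribute set)."""
    host: str
    port: int
    protocol: str


@dataclass
class CollectionDescription:
    """Forward knowledge about a collection (hypothetical attribute set)."""
    name: str
    subjects: Set[str]
    access: str  # "open" or "licensed"
    service: ServiceDescription


@dataclass
class UserProfile:
    """Preferences and privileges of a user."""
    interests: Set[str]
    licensed: bool = False


# Local configuration; in due course this might come from a registry query.
LOCAL_CONFIG: List[CollectionDescription] = [
    CollectionDescription("Library catalogue", {"history", "art"}, "open",
                          ServiceDescription("opac.example.org", 210, "z39.50")),
    CollectionDescription("Image archive", {"art"}, "licensed",
                          ServiceDescription("images.example.org", 80, "http")),
    CollectionDescription("Journal index", {"history"}, "licensed",
                          ServiceDescription("index.example.org", 210, "z39.50")),
]


def select_collections(profile: UserProfile,
                       descriptions: List[CollectionDescription]
                       ) -> List[CollectionDescription]:
    """Keep collections that match the user's interests and privileges."""
    return [
        d for d in descriptions
        if d.subjects & profile.interests
        and (d.access == "open" or profile.licensed)
    ]
```

Because `select_collections` takes the descriptions as an argument, replacing the local table with the result of a directory or registry query would not change the selection logic.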

There are various ways to create machine- and human-readable descriptions of collections, applications and user profiles at the moment. None commands universal assent. Approaches may be embedded in particular application and/or professional domains. A review of some current approaches to collection and service descriptions has been prepared as part of the MODELS project. [21]

Brokers, islands and hybrid libraries

I have suggested that one way of looking at the hybrid library is to see it as an organized attempt to come to terms with the multiple islands that library services are increasingly becoming and to reduce the difference in patterns of access and management between those islands. Islands may be collections which have their own catalogues, organizational patterns or access mechanisms. They may be network or CD-ROM databases or repositories, or print or other material collections. And so on. There may also be service islands with different procedures and forms of interaction (Inter Library Loan, for example).

Such islands will be material -- a part of the physical library collections -- or digital. Many, but not all, material collections will have digital catalogues. Digital resources -- image and document repositories, catalogues, and so on -- will be interfaced to the network in various ways, allowing different levels of interaction and data exchange.

Initial approaches to organizing such environments have focused on developing organized sets of links. Resource databases take this a step further, providing search and browse access to descriptions of collections and services. These might be seen as simple brokers which provide discovery services. However, they deliver us to the door of resources, rather than delivering the content of resources to us. Broker services are emerging which support a complex of services. These operate in particular domains (library, museum) and so far have limited production use. Other domains are developing similar approaches. Brokers may have different business aims. A common requirement is to allow cross searching of heterogeneous metadata resources (e.g. library catalogues). Another is to automate end-to-end processes (e.g. Agora will attempt to automate the whole chain of document discovery, locate, request, deliver). Brokers may integrate access to other service components, terminology services, authentication services, user directories, and so on. A developing pattern seems to be where an `information landscape' is presented which allows navigation, discovery and selection of collections and services, followed by resource specific interaction. Standards-based interaction between brokers and resources confers some benefits, but many resources are not made available through standard interfaces. The level of interaction possible between a broker and resources will vary. For this and other reasons the level of abstraction away from underlying resources the broker provides may not be very high in some cases. The ultimate ambition is to present a unified service over independently developed resources. Early experiences suggest that the construction of brokers is a complex undertaking, and it will be interesting to see how far they are developed. It should be noted that broker access to a resource is not incompatible with continued access direct to the resource itself. 
A fuller treatment of some of the problem issues that might be encountered in trying to build a system which brokers access to document discovery and delivery services is given elsewhere. [22]

Some final notes on brokers:

Services and standards. As in other digital library areas, standards work lags behind service requirements. This complicates development, as implementors may be unwilling to develop proprietary approaches in the knowledge that standards may shortly emerge. This is as true of general network information management issues as it is within the more specific library domain. The emergence of the structured web is a major factor here: `will XML change everything?'
Chicken and egg. The value of brokers increases where resources are available through standard interfaces. Without brokers there may be little value to an information provider in providing services through standard interfaces. This sets up a chicken and egg situation which is a further inhibitor to development. It is interesting to consider the widespread potential but little actual use of Z39.50 within the library management systems environment as an example here. Absence of a market means that there are few if any off the shelf products.
Business models. A further inhibitor is the absence of clear business models for funding broker activity.
Brokering access to heterogeneous legacy data is inherently complex. There is sometimes a culture clash between those developing new resource discovery systems in an Internet environment, and those looking to broker access to library, cultural and other legacy systems. There are a number of reasons for this. However, it is important to remember that it is just more difficult to broker access to data which is more richly structured, with different data schema, with different semantics for subject and name control, and with a history of development with accumulated inconsistency. The difficulties of working across different subject vocabularies, for example, are well known and unresolved.
The broker will often provide less rich services than resources themselves do.

Conclusion

I began with some discussion of the catalogue, and moved on to a general discussion of brokers. They are different types of thing, and, indeed, one of the challenges for the library is to broker access to different catalogues, or to the catalogue alongside other resources. The association is for a different reason. Libraries add value by saving users' time and effort, by ensuring they are united with the resources most useful to them, by releasing the value of resources over time. To continue to provide these services, libraries must move to a new level of activity which involves integration, not merely collocation, of services. In the current environment, how to do this is not straightforward and we are witnessing a range of interesting experiment and exploration. Users may not expect libraries to `ordain the universe', but they do look to them to help them do useful things in a complicated network space. Thinking about how to do this brings us to the centre of what libraries are about.

References

1. Manguel, A. A history of reading. London: HarperCollins, 1996.

2. Norris, D. M. A history of cataloguing and cataloguing methods 1100-1850: with an introductory survey of ancient times. London: Grafton & Co, 1939.

3. Cutter, C. Rules for a dictionary catalogue. 4th ed. Washington: Government Printing Office, 1904.

4. This complementarity is discussed in: Oddy, P. Future libraries: future catalogues. London: Library Association Publishing, 1996.

5. These words are attributed to Sir Henry Ellis, Director of the British Museum in Pannizzi's time, in Norris, op cit. 204.

6. This relationship is further briefly discussed in: Dempsey, L. Afterword: places and spaces. In: Towards the digital library: the British Library's Initiatives for Access programme. London: British Library, 1998.

7. Hanson, T. The access catalogue gateway to resources. Ariadne, 15, 1998. Available at <URL:http://www.ariadne.ac.uk/issue15/main/>.

8. <URL:http://www.leeds.ac.uk/library/laser/>

9. Dempsey, L and Russell, R. Clumps -- or organized access to the scholarly record. Program, 31(3), 1997, 239-249.

10. Bell, A. The impact of electronic information on the academic research community. The New Review of Academic Librarianship, 3, 1997, 1-24.

11. Winograd, T. From computing machinery to interaction design. In: Denning, P. and Metcalfe, R. (eds), Beyond calculation: the next fifty years of computing, Springer-Verlag, 1997, 149-162. Also available at <URL:http://hci.stanford.edu/~winograd/acm97.html>.

12. Blinco, K. LIDDAS -- an Australian document delivery project. Presentation at Information Ecologies: the impact of new information 'species'. A conference organized by the Electronic Libraries Programme and coordinated by UKOLN. 2-4 December 1998, Viking Moat House Hotel, York. Presentation is linked from conference report at: <URL:http://www.ukoln.ac.uk/services/elib/events/information-ecologies/>.

13. Dempsey, L. and Russell, R. `Clumps' -- or distributed access to scholarly material. Program, 31(3), July 1997, 239-249.

14. The ROADS cross searching service is available at <URL:http://www.ukoln.ac.uk/metadata/roads/crossroads/>.

15. Valkenburg, P. (ed). Standards in a distributed indexing architecture, draft version 1. 24 February 1998. <URL:http://www.terena.nl/projects/chic-pilot/standards_v1.html>

16. Further information about MODELS can be found at <URL:http://www.ukoln.ac.uk/dlis/models/>.

17. The discussion here and elsewhere throughout this paper draws on: Dempsey, L; Russell, R; Murray, R. A utopian place of criticism? Brokering access to network information. Journal of Documentation, 55(1), 1999, 33-70.

18. <URL:http://www.cdl.org>

19. Dempsey, L. and Heery, R. Metadata: a current view of practice and issues. Journal of Documentation, 54(2), 1998, 145-172.

20. Allen, J. and Mealling, M. The architecture of the Common Indexing Protocol (CIP). Request for Comments draft version 1. 1997. Available at <URL:ftp://ftp.ietf.org/internet-drafts/draft-ietf-find-cip-arch-01.txt>.

21. The MODELS collection description study is available at: <URL:http://www.ukoln.ac.uk/dlis/models/studies/>.

22. Dempsey, L; Russell, R; Murray, R. A utopian place of criticism? Brokering access to network information. Journal of Documentation, 55(1), 1999, 33-70.