This is a 'manuscript' version of the following paper:
Dempsey, Lorcan; Mumford, Anne; Tuck, Bill.
Standards of relevance to networked library services. In:
Libraries and IT: working papers of the Information
Technology Sub-committee of the HEFCs' Libraries Review.
Bath: UKOLN, 1993. pp. 131-155
In any citation please refer to the printed version.
Copyright to this work is retained by the authors. Permission is granted for non-commercial reproduction of this work provided full acknowledgement of authorship and copyright is given.
0. Overview
This brief report will describe 'technologies and standards' which we consider to be important for emerging networked library services, and discuss some issues which need to be addressed.
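In Section 1, we present one possible view of the virtual library, and outline the services that will be required to bring it about. Subsequent sections describe these services. These sections are:
Communications services (Section 3)
Application services (Section 4)
Data interchange services (Section 5)
Meta-information services (Section 6)
Authentication, charging and related services (Section 7)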
Section 2 briefly introduces some broad policy issues, and recommendations are made in Section 8.
1. The virtual library
We feel it is useful to describe a context for further discussion and present here a hypothetical virtual library scenario. This view has oriented our selection of important technologies and standards.
A library user connects to the Information Server (we use this phrase for convenience to cover the organised presentation of services, and do not suggest how it is constructed), and is authenticated. She is a lecturer preparing a course on the image of the city in modernist literature, and is looking for information on Joyce and Dublin. She selects library services and can choose from several items, say the following:
Books
She has a number of requirements, say the following:
This scenario involves communications between systems over a wide area, but the end-user sees services as an extension to her desktop, and uses them by simple on-screen selections and manipulations. Facilities for identification of resources, searching and browsing, locating, requesting and using information resources are brought into a single context of use, which hides the underlying complexity of the interacting systems of service provision. The technical and organisational challenges of realising such a scenario on a widespread basis are considerable. Existing library services are poorly integrated with the network, and there are few real distributed applications. The network is used to connect terminals (people) to remote resources; information is downloaded and uploaded rather than being shared between applications. The above scenario (or others like it) will depend on a move to a more distributed environment, based on client-server operations and computer to computer communication between applications.
Services are required which allow the lecturer to search a CD-ROM and an OPAC without switching between terminals, user interfaces and command languages. Services will be required which build on flows of data. For example, a structured record is returned in response to our lecturer's search for a book. The system could take this record, and use it to search other systems (consortium libraries, BLDSC files) to ascertain location, and could then use the record as the basis for the ILL request message. The BL receiving system could use the record to search its system and notify the requester of availability. In this way the need for manual intervention, and multiple transcriptions and rekeyings will gradually be reduced. Services will be built on the basis of the routine exchange of page image and structured documents between libraries, between libraries and publishers, and between libraries, publishers and document suppliers.
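The record flow described above can be sketched roughly as follows. This is a minimal illustration only: the record fields, the catalogue names, the placeholder ISBN and the function names are all hypothetical, and a real service would carry the record over a retrieval protocol and an ILL message format rather than in-memory structures.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BibRecord:
    """A structured record as returned by the lecturer's initial search."""
    title: str
    author: str
    isbn: str


def locate(record, catalogues):
    """Reuse the record to search other systems; return the names of holders.

    `catalogues` maps a (hypothetical) system name to the set of ISBNs it holds.
    """
    return [name for name, holdings in catalogues.items() if record.isbn in holdings]


def build_ill_request(record, requester, supplier):
    """Use the same record as the basis of the request message (no rekeying)."""
    return {"type": "ILL-REQUEST", "requester": requester,
            "supplier": supplier, "item": asdict(record)}


# Placeholder data; the ISBN is not a real identifier.
record = BibRecord(title="Ulysses", author="Joyce, James", isbn="0-00-000000-0")
catalogues = {
    "consortium-library": set(),
    "document-supplier": {"0-00-000000-0"},
}
holders = locate(record, catalogues)
request = build_ill_request(record, requester="home-library", supplier=holders[0])
```

The point of the sketch is that the record is keyed once and then passed unchanged from search, to location, to request.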
These requirements are now being addressed; a variety of customised solutions are being put in place. But this will limit the flexibility and sharability of future solutions. An open architecture with standard communications, applications and data interchange services will facilitate the interaction between system components, and the extension and addition of services, minimising the need for extensive preliminary agreements or contrivance. It is important to ensure that current technical choices do not unnecessarily constrain future cooperative or information service activities, and that these services can be put together in a variety of ways to create a variety of future scenarios.
The following services will be required to construct the virtual library:
Communications services (Section 3)
Application services (Section 4)
Data interchange services (Section 5)
Meta-information services (Section 6)
Authentication, charging and related services (Section 7)
2. Some policy issues
The scenario presented above is rather easier to describe than it will be to implement. The reasons are partly to do with the technology itself and partly to do with what might be called 'political' issues. Of these, the technology is likely to be the easier one to solve.
Central to the problem is the need for different networks to be able to communicate. Our scenario might be taken to assume that the various information services interrogated by the lecturer are on the same network, but this is unlikely to be the case. The respective networks must therefore be able to interconnect. This may be difficult if they use different protocols to communicate, different forms of email to transmit document requests or other different approaches. Nor is there any realistic prospect of this changing in the short term. For this reason, 'relays', 'gateways' and 'translators' are likely to be of considerable importance in ensuring the seamless interconnection of future information services.
At a different level are the political problems. These largely reduce to the question of "who pays?". Services on JANET are restricted to those within the academic community because that is the group for which the funding was designated. Acceptable use policies permit non-academic (e.g. British Library) and even commercial organisations (e.g. Glaxo Research) to link to the network only if there is significant benefit to the academic community. Few at present do so -- at least at the direct network level rather than via an electronic mail gateway or relay.
In the United States, the commercial Internet has shown explosive growth. Because it is essentially all part of the same network, even if funded in a different way, connectivity between commercial and academic sectors is much more complete and there are fewer technical problems to intercommunication. In the UK, JANET has no equivalent commercial counterpart, which would give commercial providers unrestricted network services as well as full connectivity to the academic community.
The two main policy issues are therefore how to enable the unrestricted flow of information between different technology domains, and how to enable it between different economic or resource centres.
3. Communications services
3.1 Connectivity and protocol issues
3.1.1 Introduction
In its most basic sense the network provides the connection between the customer and the information service. What kind of network is most appropriate will depend on who the customer is and what kind of service they require -- students working from home will have different requirements from laboratory-based researchers.
Our present academic networks, including JANET, do not necessarily form the ideal model for how future networks should be organised. For one thing, access is very uneven. It is much easier to gain access if you are a laboratory-based research scientist than an arts student working from home.
Nor is the physical form of networks immutable. We have become accustomed to the idea that our institutions should provide connectivity either by departmental LANs and desktop workstations, or rooms filled with personal computers for student use. However, the rapid growth of the laptop computer threatens to make both models obsolete. What users may want in future is simply a point into which they can plug their personal laptop, whether in the library, at home, in the laboratory or in the office.
At the same time, it is not clear that information providers will necessarily want or be able to deliver services to customers through our present networks. Some providers may not want to invest in a connection which allows them to communicate with only a part of their customer base. Uncertainty about future network funding and acceptable use policies is also an inhibitor. Despite BIDS and other important initiatives, the network is still relatively 'information poor' -- a library with a good collection of CD-ROMs may well be able to offer a greater resource of information than the whole JANET network. It remains to be seen what information services develop on JANET. Various 'host' or 'brokerage' services may emerge which offer services to the academic community, other commercial providers may connect, and services will be provided from within the academic community. However, the network may become more important as a gateway rather than a provider of information services. The full resources of the global Internet are now accessible via JIPS (JANET IP Service). This more than anything has opened the network to a much wider range of information resources.
3.1.2 Protocol issues
At the lowest layers, network technology for LANs has settled to a very limited number of different options, primarily ethernet and token ring. Above this level, however, choices are more diverse though within the UK academic community the dominant ones would be TCP/IP, OSI and Novell's IPX (plus Appletalk and DEC's LAT). In principle, all three protocols may be run over the same network; in practice, however, this may prove difficult and will certainly increase the problems of network management. Individual departments will then make a choice as to which network type is most appropriate to their own needs -- science/engineering generally opting for TCP/IP and arts/law for Novell, with a minority choosing OSI, Appletalk, or LAT. One factor influencing the choice, particularly in libraries, has been the growth of CD-ROM usage and the requirement to make network access to these available -- the majority of these are designed for PCs on Novell networks.
The present strategy for providing wide area access is to link departmental LANs into a campus hub. These in turn will connect to one of the regional X.25 switches of the JANET backbone. In this way, higher level services such as email and file transfer can be set up between remote sites.
An alternative would be to link the LANs using the TCP/IP protocol itself. This is broadly the basis on which the Internet is organised (i.e. it does not use X.25 and is therefore non-OSI in the strict sense). Novell's own protocol (IPX) is not suitable for wide area connections, and in this case a relay facility to convert to TCP/IP or X.25 might be included in the network server.
3.1.3 Current UK developments
Installation of the academic network JANET predated publication of the OSI standards. For this reason it was forced to develop its own version of OSI -- the so-called 'coloured book' protocols. Although these are still widely used on JANET, there is virtually no use outside this community. It was always intended that these would be a transitionary device prior to full implementation of OSI. A number of factors have conspired against this, however, with the result that the transition is still far from complete.
Perhaps chief among these inhibiting factors has been the demand from users for access to (and from) TCP/IP networks such as the Internet. This led the JNT to implement JIPS (JANET IP Service). Direct demand for OSI, on the other hand, has probably been minimal.
More recently, a different group of users (frequently arts, libraries or administration based) have adopted PC LANs using Novell's Netware network operating system. This has further eroded support at the LAN level for OSI.
The result of these divergent tendencies has been to distract effort and resources away from full conversion to OSI. At present the most significant initiatives have been directed at implementing X.400 email and X.500 directory services, though even here the level of adoption has been patchy.
Away from the academic sector, the growth of commercial Internet is perhaps the most significant event in the inter-networking area. In the UK, two companies are providing commercial access (Pipex and UKnet), with links into JANET, US Internet and EBONE (the European Internet). TCP/IP is again the protocol used for commercial Internet.
In the commercial world, Novell now has something like 75% of the PC LAN market. Product availability, easy access to support services and low cost give this area considerable momentum. In the short term, competition between Microsoft and Novell is likely to drive this market.
As network technology becomes mature, it no longer makes much sense for the academic sector to attempt to lead the market in the way that it did (or tried to do) ten years ago. Networks based on current technology are now commodity items rather than high-value research tools. This is reflected in the much lower prices now operating, particularly for PC LANs.
3.1.4 Future directions
Among the principal future directions is the trend towards much faster networks, both local (100M ethernet and FDDI) and wide area (Frame Relay, SMDS and ATM). Emerging products in this area are tending to adopt the TCP/IP protocol (or its extensions) as a basic standard, though the situation is still somewhat fluid.
At the same time, the growing complexity of the current generation of networks (particularly the need to run a multiplicity of protocols and relays) is creating a significant problem in network management.
In the light of both these trends there may be a need to re-evaluate our strategies for network specification, and particularly the choice of which protocols to support. The cost of supporting multiple protocols can be considerable. There is as yet no sign of fundamental change on access policies. At a technology level the assumption is still the conventional model of LAN plus workstation. Future users, however, will want to plug in their laptops or connect in through ISDN, but facilities to enable this are as yet poorly developed. At the same time, the range of information providers with access to the network will need to increase. In both cases, the best strategy may be by provision of adequate gateways or relays, rather than by direct connection.
3.2 Gateway and relay services
3.2.1 Introduction
Interconnecting between different networks requires gateways and relays. The demand for wider access has created a market for specialist services providing interconnection facilities. In the last few years, commercial companies such as Pipex, or cooperative organisations such as RARE and COSINE have emerged to fill this need.
Even if the networks employ the same technology (X.25 or TCP/IP, say) then some mechanism is needed for them to interconnect. Higher up the protocol chain, ways must be found for file transfer across networks, or for the interchange of email. More complex still is the provision of some way by which users of one network might be informed of resources (including other users) available on another. Directory services fill this role.
3.2.2 Current UK activities
One of the earliest forms of interconnection between networks was by means of the email relay. These, for example, enabled messages to be exchanged between the three different domains of JANET, Internet and Usenet (the Unix network). Each domain employed very different protocol and organisational structures, making data transfer between them very difficult. The initial method was to dedicate a particular machine to act as a convertor. Sitting on the two networks it would translate address formats or other data and forward the message on. In the UK, such converters still form an important link in the communications chain.
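As one illustration of the address translation such a relay performs: JANET mail addresses conventionally wrote the domain with the most significant part first (for example uk.ac.bath), whereas Internet addresses put it last (bath.ac.uk). The sketch below simply reverses the domain ordering. The function name and sample address are illustrative, and real relays also had to deal with routing, header rewriting and ambiguous names, none of which is shown.

```python
def swap_domain_order(address):
    """Reverse the order of the domain components of a user@domain address."""
    local_part, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"not a user@domain address: {address!r}")
    reversed_domain = ".".join(reversed(domain.split(".")))
    return f"{local_part}@{reversed_domain}"


# Illustrative address only; 'example' is a placeholder site name.
janet_style = "user@uk.ac.example"
internet_style = swap_domain_order(janet_style)
```

Because the operation is its own inverse, the same function serves for traffic in either direction.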
More recent developments have concentrated on the use of international standards as common conversion protocols. X.400, for example, is emerging in this role. Email from proprietary systems (such as cc:Mail) or from non-standard 'open' systems (such as JANET's 'Grey Book' or Internet's 'SMTP') may be exchanged by each converting first to an X.400 format. A similar approach may be taken with file transfer, using the OSI FTAM protocol to convert between JANET 'Blue Book' and Internet 'ftp'. Initiatives to develop such gateways have been mounted by the JNT and early versions are now in operation on the network. In spite of this, it is likely that demand for direct access to 'ftp' will undermine this approach. It is generally reckoned that in the near term over 75% of all traffic on JANET will be IP-based, for which ftp is the normal file transfer protocol.
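The appeal of a common conversion format can be shown with a small sketch. Each mail system supplies one converter into the common form and one out of it, so adding a system does not require a new converter for every existing one. The format names, the dictionary-based 'common' message and the converters are all hypothetical stand-ins; they do not reproduce X.400, Grey Book or SMTP encodings.

```python
# Hypothetical per-format converters into and out of a common representation.
# The "common" form here is just a dict; it stands in for the shared standard.
TO_COMMON = {
    "format_a": lambda text: dict(zip(("sender", "body"), text.split("|", 1))),
    "format_b": lambda text: dict(zip(("body", "sender"), text.split("\n", 1))),
}
FROM_COMMON = {
    "format_a": lambda msg: f"{msg['sender']}|{msg['body']}",
    "format_b": lambda msg: f"{msg['body']}\n{msg['sender']}",
}


def convert(message, source, target):
    """Translate by way of the common form rather than pairwise."""
    return FROM_COMMON[target](TO_COMMON[source](message))


def converters_needed(n_formats):
    """Return (pairwise, via_common) converter counts for n non-standard formats."""
    return n_formats * (n_formats - 1), 2 * n_formats


relayed = convert("alice|hello", "format_a", "format_b")
```

With five non-standard formats, direct pairwise conversion would need twenty one-way converters, against ten when everything passes through the common form.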
This is the fundamental predicament: whether to provide conversion facilities or whether to simply adopt the whole protocol. It is not clear how things will go in the long term, though it is likely that the need for both approaches will always remain. The case for email conversion is perhaps stronger, possibly because of its 'store-and-forward' nature, which means that the user is only aware of it as a local operation (unlike file transfer or remote access).
The interconnection of different organisational or administrative domains might also be thought of as a function of general gateway services. In the UK, for example, Pipex Ltd. provides a gateway between the commercial Internet and JANET through its facility in Cambridge. A similar service (for both Usenet and Internet) is provided by UKnet, based at the University of Kent.
Pipex, with its UK backbone network and independent links to the US, operates a fully commercial service. Its rapidly growing customer base includes many of the UK's leading technology companies, as well as organisations like the British Library who are interested in marketing networked services to such groups. There is no usage charge, but an annual fee of around £10K for a 64K bps link (roughly half what JANET charges for an equivalent connection).
Yet another form of gateway is currently provided by dial-up access through an intermediary organisation. Compuserve is a typical example, but there are many others, such as OCLC, which are increasingly seeing one of their roles as being a gateway to other networks and services. The Internet itself can be connected to in this way through a dial-up service offered by Demon Ltd (who, in fact, resell access to the Pipex network). Access is typically at 9.6K bps via a modem and the monthly charge is around £10 (which includes an Internet email connection).
3.2.3 Future directions
The increasing trend towards global connectivity can only mean that the importance of gateways and relays will grow. It is unrealistic to see the world (or even the UK academic community) ever converting to just one protocol format -- whether TCP/IP, OSI or any other.
At the same time, the need to link across administrative boundaries -- commercial, government or academic -- will ensure the continuing importance of commercial internetworking service providers. In particular, the role of commercial Internet is likely to grow. In the US, it is already the fastest growing part of the Internet and threatens to become its dominant sector within a very few years.
Enhanced dial-up access, perhaps based on ISDN, is another likely growth area that is of particular importance to the education sector. Giving full access to the Internet (and indirectly to JANET), it would be a powerful tool for independent students or small organisations. Such services are already under discussion between Pipex and Demon. JANET itself has begun to experiment with ISDN access, but no official gateways between the two networks exist at present.
3.3 Communications infrastructure
3.3.1 Introduction
Network technology is driven by the applications it may carry. With time, the level of application moves upwards. For example, at the present time email is generally considered an application. In future, however, it may simply be an infrastructure facility on which real applications might be developed (such as EDI or ILL). With the emergence of such 'email-enabled' applications the level of application moves up.
A similar movement is happening at the lowest level of the network. Here it is the requirement to manage the network itself as a resource, much in the way that an earlier generation sought to manage the computer as a shared resource. The development of so-called 'network operating systems' is an important move in this direction. Current examples of this are Sun's NFS and Novell's Netware. Others will no doubt emerge, particularly within the orbit of Microsoft's Windows NT.
Much of this goes under the name of 'client-server architecture'. The basic idea is to view the network as a set of servers providing different functions (database, printers, communications gateways, etc.), together with a set of clients requiring those functions (generally identified with the workstation through which the user communicates). This model has become very influential in the development of the new network services, even though there is still considerable argument over its precise interpretation.
At the same time there has been a revolution in the way the user is expected to interact with the computer (or with the shadowy 'servers' in the background). The manipulation of data through 'windows' -- whether X, Microsoft, or Apple -- has become the de facto standard for best practice. In a sense it marks as great a shift as that from the teletype to the VDU twenty years ago.
These basic elements -- 'windows' plus 'client-server' -- provide the framework through which individual services are delivered, and the objective is 'integration' and 'transparency'. Many things can be done through a common interface while the source of the services -- the network servers -- is transparent to the user who no longer sees a difference between local and remote.
3.3.2 Current UK activities
There is considerable activity in the UK directed at the use of standard communication services (X.400, X.500, ODA, SGML) for the provision of high-level applications. Among the most important are:
One primary goal of such experiments is to integrate the many functions associated with the application through a single uniform interface, to make it appear as if it were all happening locally. For example, in the case of a document supply service, it must seek to tie together online search, document request and request monitoring, as well as document transmission and output, all under control of the end-user from the one workstation.
3.3.3 Future directions
There is likely to be a general growth in the use of email-enabled applications. In the context of information and library services, these will include interlibrary loan and document delivery, but may also extend to remote database search and general information retrieval. Wide area applications of this kind can be built on the basis of X.400 (though Internet may well develop its own versions based on the Internet mail protocol SMTP or the new MIME protocol, which has facilities for multimedia body parts). Local area applications will come from the commercial LAN market and are likely to be built around proprietary email protocols such as cc:Mail. In all cases, however, the availability of standard APIs (Application Programming Interfaces) will make development easier and ensure a reasonable degree of portability and compatibility.
A logical extension of email-based applications is to work-group communications, or 'computer supported collaborative work' (CSCW). This is generally viewed as one of the major new directions in which networks will develop. CSCW includes areas such as collaborative authoring and multimedia conferencing. Exactly how these might tie in with information services is as yet poorly understood.
In all these future developments the increasing impact of Internet will be a major influencing factor. As with the Z39.50 protocol, it is likely that initial implementations will be available first on TCP/IP. Maintaining compatibility with an OSI environment will be difficult. One hope is that the availability of X.400 over IP (and its increasing use as a common conversion protocol) might encourage its use in the development of email-enabled applications.
Another factor will be an increasing demand for access to and from non-academic sectors. Information suppliers, as well as users, will wish to be able to communicate with the academic community. While much of this can be achieved by means of gateways and relays, the additional costs involved, coupled with the relatively much lower cost of access to commercial Internet, could lead to a bypass of the 'official' academic network.
4. Application services
4.1 Information retrieval services
4.1.1 Information Retrieval and SR/Z39.50
Search and Retrieve (ISO 10162/3) and Z39.50 are emerging as protocols of choice for the construction of distributed information resources on server systems. (Z39.50 is a NISO standard which is a superset of SR). These resources may be online catalogues, other bibliographic services, or, in theory, a range of other resources. The protocol has facilities for managing the queries and returning results. It also includes a mechanism for switching between query languages, allowing a single user interface to access multiple servers, and, similarly, a single server to be accessed by multiple interfaces. This technology is strategically important for at least two reasons:
4.1.2 Current activities
There is a vigorous group of implementors in North America who cooperate informally in the ZIG (Z39.50 Implementors Group). Developers include the Library of Congress, the National Library of Canada, the Universities of California and Pennsylvania, Carnegie Mellon, OCLC, RLG, Mead Data Central, NOTIS, DRA and others. There is also a group of libraries which use BRS/Search which is working on implementation. In Europe the principal implementors are Pica and LASER within Project ION, and the Nordic academic union catalogue organisations (LIBRIS, BIBSYS, ALBA, LINNEA) who are participating in Nordic SR-Net to link their systems. There are other initiatives in Germany and Denmark.
The UK academic community has lagged behind the US and the Nordic countries, and has little input into SR or Z39.50 standardisation activities. Arising from a UKOLN initiative there is now a UK SR/Z39.50 Pre-implementors Group which may cooperate in the production of a standalone PC client. IME is involved in a CEC sponsored project with Danish partners to develop SR client software. Apart from LASER activities, the most significant development is a Telematique funded project to link university systems in Ireland. Requests for proposals for server systems were sent in February 1993 to vendors of the involved systems (BLCMP, DYNIX and URICA), and a standalone client system is being offered for competitive tender. The project is due for completion by the end of 1993. It is interesting because it provides a concrete incentive for significant vendors in the UK market to develop Z39.50 servers (the project will implement Z39.50-1992 over TCP/IP on the Irish academic network, HEAnet).
4.1.3 Future directions
The future of this protocol seems assured by the adherence it has from so many important players. There are three main problems. The first is the lack of experience. Although used to support the client-server operation of particular systems (OCLC's FirstSearch and DRA), there are as yet no routinely interoperating production services. The second is related to this: there are very few products available which incorporate the protocol. OCLC and NOTIS market systems; the Irish project mentioned above and ION may lead to further system offerings. It is anticipated that the next generation of WAIS systems will incorporate Z39.50-1992, providing a significant boost to usage. The final problem is the most serious, and relates to protocol differences at various levels. SR will be a compatible subset of Z39.50, though interoperating systems have yet to be developed. The protocols will be implemented over different communications services. Typical implementations are Z39.50 over TCP/IP, SR over ISODE and TCP/IP, and SR over OSI. Interoperation between these three will rely on gateway or relay services. This complicates implementation choices.
4.1.4 Other approaches
There seem to be two other candidates for this type of application: X.500 Directory Services, and Remote Database Access (RDA). BLRDD is funding work at Brunel and UCL into the use of X.500 for bibliographic applications. X.500 is not now being widely investigated for such services in the library community. RDA is discussed further below.
4.2 Request services
4.2.1 Requests and the ILL standard
Items will be requested from several sources: other libraries, document suppliers, publishers, and other emerging providers. It would already be useful if there were a standard way of communicating requests, however they originate.
However, on inspection, it is clear that the request is only one part of a whole process, which will require a range of transactions. Examples are notification of ability to satisfy the request, notification of conditions under which it can be satisfied, referral of request to other suppliers with backward notification, notification of despatch, notification of receipt, cancellation of request, overdue and recall messages, status queries, and so on.
The ILL protocol (ISO 10160/1) was developed in this context. It is conceptually similar to the EDI agreements discussed below and includes provision for: definition of required data elements, definition of a set of messages and their relationships, and a syntax for structuring the messages. (This syntax is defined using ASN.1, but there is also provision for encoding using EDIFACT).
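The idea of a defined set of messages and their relationships can be illustrated as a small state machine. The states, message names and transitions below are a deliberately simplified, hypothetical subset chosen for illustration; they are not the state tables of ISO 10160/1.

```python
# (current state, message) -> next state. Simplified and hypothetical.
TRANSITIONS = {
    ("idle", "request"): "pending",
    ("pending", "conditional-reply"): "conditional",
    ("conditional", "accept-conditions"): "pending",
    ("pending", "shipped"): "shipped",
    ("pending", "cancel"): "cancelled",
    ("shipped", "received"): "received",
    ("received", "overdue"): "overdue",
    ("received", "recall"): "recalled",
    ("received", "returned"): "returned",
    ("overdue", "returned"): "returned",
    ("recalled", "returned"): "returned",
}


class IllTransaction:
    """Tracks one request and rejects messages that are out of sequence."""

    def __init__(self):
        self.state = "idle"
        self.history = []

    def apply(self, message):
        key = (self.state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"{message!r} not valid in state {self.state!r}")
        self.state = TRANSITIONS[key]
        self.history.append(message)  # audit trail of accepted messages
        return self.state


txn = IllTransaction()
for msg in ("request", "shipped", "received", "returned"):
    txn.apply(msg)
```

Because both parties follow the same table, a message that arrives out of sequence (for example 'shipped' before any 'request') can be rejected automatically rather than resolved by hand.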
It is anticipated that the protocol will be implemented in two modes: connection-oriented (real-time interactive) and store-and-forward (electronic mail). Development has so far focused on the latter mode, as the emphasis has been on interlibrary loan. However, implementation in a real-time environment will be required if it is to support online requesting of materials with instant feedback.
4.2.2 Current UK activities
There is currently no standard way of requesting materials. The British Library operates the proprietary ARTTEL system. Many automated interlibrary loan systems interface to this, and BIDS is also developing an order system which will send requests to the BL. Automated request management is underdeveloped.
There are of course a range of 'request' services developed on the academic networks. These may be based on Listserv, file server or other technologies. There is no standard approach, and request management has not been a major issue.
The British Library and LASER have been involved in the development of the ILL protocol, and LASER has implemented it as part of Project ION.
4.2.3 Future directions
The ILL protocol has much to offer in ILL operations, especially as ILL becomes more distributed. The system to system communication of structured messages allows a greater range of ILL operations to be automated, including procedures for tracking, overdues, recalls and so on which are at present manual or mixed. There is also discussion of extending the protocol to allow the inclusion of requested documents in the 'shipped' message.
Its use in interactive services for the request of documents requires further investigation. It does seem to offer many required features, including facilities for auditing transactions.
4.3 Electronic Data Interchange
4.3.1 EDI and EDI standards
EDI is used to refer to the computer to computer exchange of processable structured business messages. Particular 'islands of EDI' activity need to agree the elements to be included in messages, a set of messages and the business rules which govern their relationships, and a syntax which defines the structure of the messages.
There are three main syntaxes of interest here, and each has associated library activity. The ISO standard is EDIFACT, and a CEC project, EDILIBE I, has produced an EDIFACT recommendation for library oriented business message types; EDILIBE II is now implementing these agreements. X12 is a US standard, and booktrade formats have been developed by BISAC. X12 is widely used in North America, and in the serials industry. In the UK, the BEDIS formats have been implemented in a Tradacoms environment. It is anticipated that these approaches will converge.
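To make the role of a syntax concrete, the following minimal sketch splits an EDIFACT-style string into segments, data elements and components, assuming the default EDIFACT separators (`'` ends a segment, `+` separates elements, `:` separates components). It is a simplification: the release (escape) character and the service segments that envelope a real interchange are ignored, and the sample order fragment is invented and has not been checked against any agreed message definition.

```python
def parse_segments(text, segment_end="'", element_sep="+", component_sep=":"):
    """Split an EDIFACT-style string into (tag, elements) pairs.

    Each element is returned as a list of its components.
    Simplified: the release character is not handled.
    """
    segments = []
    for raw in text.split(segment_end):
        raw = raw.strip()
        if not raw:
            continue  # skip the empty piece after the final terminator
        tag, *elements = raw.split(element_sep)
        segments.append((tag, [e.split(component_sep) for e in elements]))
    return segments


# Invented fragment, for illustration only.
SAMPLE = "BGM+220+ORDER001'LIN+1++1234567890:IB'QTY+21:2'"

for tag, elements in parse_segments(SAMPLE):
    print(tag, elements)
```

The syntax only defines how a message is cut into pieces; what each element means, and which messages may follow which, still has to be agreed by the trading partners.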
A variety of transmission mechanisms is used for the delivery of EDI messages. An important future transport medium will be X.400, and a specialised X.400 service for EDI is being developed.
4.3.2 Current UK activities
The development of the BEDIS formats in the Tradacoms environment is coordinated by Book Industry Communication (BIC). These formats are beginning to be used in pilot activities involving libraries, booksellers and publishers. BLCMP, Blackwells and John Rylands University Library are collaborating in the EDILIBE II project. X12 formats are also supported.
In a recent initiative, BIC assumed secretariat responsibilities for EDITEUR, a pan-European group responsible for developing EDIFACT formats under the umbrella of the EC's TEDIS programme. The EDILIBE work will be an important component of this activity.
In an interim period the BEDIS and EDILIBE formats will be in use. BLCMP and LASER are proposing services which will accept messages from libraries, and reformat them for transmission in BEDIS or EDILIBE streams as appropriate. They will provide batching, auditing and other services.
4.3.3 Future directions
It seems likely that future EDI activity will be based on the formats that emerge from EDITEUR/EDILIBE activities and that any new initiatives should be based on their work. Communications services will be provided by X.400 and EDI VAN services (e.g. First Edition EDI services). Libraries will connect to these directly, or through other organisations as suggested above.
4.4 Library housekeeping
4.4.1 Some possible requirements
Library housekeeping systems have several potential interfacing requirements:
One can imagine several approaches:
RDA standardises client-server operations for database applications, and an SQL specialisation has been defined. It is not yet widely implemented and it does not include the switching facility of SR/Z39.50. The client needs to know how data is structured on server systems. It has been developed in the context of allowing existing relational databases to interoperate. If SR/Z39.50 is to be used there will need to be some work done on 'attribute sets', in which queries are expressed.
This is an area with clear potential overlap with the MAC initiative. It appears that there is no consistent approach between MAC families in terms of data definition. It also seems to be assumed that interfaces between different campus systems are to be treated as institutional issues in the first instance. A need for interoperability and sharing of information between systems at different campuses has been recognised, but no standard approach has yet been investigated.
We do not feel we can make any recommendations in this area without a better understanding of the organisational contexts in which library housekeeping systems will communicate with each other and with other systems. Nevertheless, once such contexts are clearer, a coordinated and consistent approach is desirable.
5. Data interchange services
5.1 Introduction
We need to address the question: what do we mean by an electronic document? Is it a reflection of what we can see on the printed page or does it contain something more? Do we have access to the original information, i.e. the words and the pictures?
We can recognise at least three options for storing documents which contain more than just text, for example text, graphics, layout:
We will look at each of these in turn.
5.1.1. Page Image
This is where we store the image of the page as it would be read. There are two main ways that this is carried out in practice.
The first way is to store a raster image, often using one of the formats defined for fax machines by CCITT. The main advantage of this is its simplicity, and the fact that the image can be sent directly to a fax machine for output. Another common approach is to use a de facto standard such as TIFF. The disadvantage is that the page consists of a series of dots (and only black and white ones) at a fixed, and fairly low, resolution, so the text and graphics information in the document has been lost. However, we do have a page which can be attractively laid out and include pictures, symbols and tables.
The second option is to store the page using a page description language where the text and graphics are stored together with output information such as layout, font, linestyle, etc. The most popular format is the PostScript language which is output from very many packages and is included in the firmware of output devices such as laser printers. This is not as inflexible as the raster storage in that the scale can be changed without loss of information. It is only a small advance on the fax image, offering the advantages of potentially high resolution colour output - that is, it is close to being as good as our printed paper copy.
The advantage of the page image solution is that there is a lot of software around which can support this as an option. This cannot be ignored.
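The cost of fixing the resolution of a raster page can be made concrete with a little arithmetic. The sketch below computes the uncompressed size of a black and white page image; the A4 page size and the resolutions tried are assumptions chosen for illustration, and compression (which the fax formats provide) is ignored.

```python
def bilevel_page_bytes(width_in, height_in, dpi):
    """Uncompressed size in bytes of a 1-bit-per-dot page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px + 7) // 8  # each row padded to a whole byte
    return row_bytes * height_px


A4_INCHES = (8.27, 11.69)  # assumed page size

for dpi in (100, 200, 300):
    size = bilevel_page_bytes(*A4_INCHES, dpi)
    print(f"{dpi} dpi: {size} bytes uncompressed")
```

Storage grows roughly with the square of the resolution, and the detail discarded at scanning time cannot be recovered later, which is why the choice of resolution matters for symbols and fine graphics.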
5.1.2 Layout and Content
The next possibility is that of storing the layout of a document and the contents of that document separately. This is the approach taken by the Open Document Architecture (ODA) standard. In ODA the layout of the document is stored. This might include pages, title areas, places for pictures, etc. The standard also allows the logical structure of the document to be defined. This may be chapters, paragraphs, etc. which are then linked to the layout of the document. Alongside this the standard allows various content architectures to be positioned into places on the 'page' (this page could be a piece of paper or a screen and the layout may vary depending on the output medium). ODA standardises a number of content architectures. One of these is the Computer Graphics Metafile standard. Another is a raster format based on the CCITT fax standard.
5.1.3 Structured Information
The Standard Generalized Markup Language (SGML) provides a meta-language (syntax) for writing rigorous, descriptive definitions of documents. It is independent of any system, device, language or application and allows individuals or groups of people in user communities to write their own types of documents within a standard framework. The information may include information beyond text and this may be image data stored in fax format or may be a CGM file. This standard separates document definition from subsequent access and viewing, and allows information to be accessed in ways that could not be predicted at the time of markup.
Each SGML document contains 3 parts. The first is an SGML declaration which describes the environment in which the document needs to be processed and may include information about which character sets are to be used. The second part is the document type definition (DTD) which describes the logical model for the document and defines references to entities which may be referenced, such as a fax image or CGM file. The third part is the document stream itself.
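The three parts can be pictured with a small sketch which holds each part separately and lists the external entities declared in the DTD. The `report` document type, its element names and the file `fig1.cgm` are invented for illustration, and the SGML declaration is a placeholder with its body omitted, so this is not a complete, valid SGML document.

```python
import re
from dataclasses import dataclass

# Matches declarations of the form: <!ENTITY name SYSTEM "file" NDATA notation>
ENTITY_RE = re.compile(r'<!ENTITY\s+(\S+)\s+SYSTEM\s+"([^"]*)"\s+NDATA\s+(\S+?)\s*>')


@dataclass
class SgmlDocument:
    sgml_declaration: str  # part 1: processing environment, character sets
    dtd: str               # part 2: logical model and entity declarations
    instance: str          # part 3: the document stream itself

    def serialise(self) -> str:
        """Join the three parts in the order in which they appear."""
        return "\n".join([self.sgml_declaration, self.dtd, self.instance])

    def external_entities(self):
        """Return (name, file, notation) for each external data entity."""
        return ENTITY_RE.findall(self.dtd)


doc = SgmlDocument(
    # Placeholder only: a real declaration has a substantial body.
    sgml_declaration='<!SGML "ISO 8879:1986" -- body omitted in this sketch -- >',
    dtd="""<!DOCTYPE report [
<!ELEMENT report - - (title, para+, figure?)>
<!ELEMENT title  - - (#PCDATA)>
<!ELEMENT para   - - (#PCDATA)>
<!ELEMENT figure - O EMPTY>
<!ATTLIST figure file ENTITY #REQUIRED>
<!NOTATION cgm SYSTEM "cgm">
<!ENTITY fig1 SYSTEM "fig1.cgm" NDATA cgm>
]>""",
    instance="""<report>
<title>Joyce and Dublin</title>
<para>Text of the report.</para>
<figure file="fig1">
</report>""",
)

print(doc.external_entities())
```

The document stream refers to the graphic only by its entity name; the DTD records where the CGM file lives and in what notation it is held.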
The SGML standard has an associated standard called the Document Style Semantics and Specification Language (DSSSL) which gives rules of presentation and style for the logical document components, for example headers, footers, chapter headings, etc., which are defined in the document. The document may then be output via a page description language such as PostScript.
SGML gives the most flexibility of our options. There is little restriction - too little many would argue - on the markup used. The standard is beginning to be taken on and used in publishing applications, by the US Department of Defense and by providers of some experimental online information services. It seems to be gaining momentum in the marketplace, especially as HyTime, a multimedia standard which is an SGML application, appears to be attracting a number of key players in the market.
5.1.4 Beyond ASCII Text
There is a temptation to think of online documents as simple text written in English (or American!) using only those characters which we can type on a QWERTY keyboard. If we are to seriously look at providing a range of online texts then we need to move further than this. There is a need to represent the different character sets used throughout the world. Some of these have a very large number of characters in the set. These can be stored on a computer using extended character sets but software and file formats need to be able to handle them. Most of the ISO standards have had to address these issues and allow the selection of extended character sets.
We also need to address the need for symbols, for example chemical and mathematical symbols. The detail of some of these means that the storage method is relevant - for example a raster image of a symbol at a medium resolution may not give sufficient detail to show the symbol accurately.
5.1.5 Viewing and Printing
The way we view documents and perhaps print them out needs to be considered. Standard interfaces for viewing (e.g. X Windows) and standard output formats (e.g. PostScript) need to be established and direction given to the community. There are directions which groups such as IUSC and AGOCG have adopted which would be relevant.
5.2 Current Activities
5.2.1 SGML related
The SGML project at Exeter (contact Michael Popham or Paul Ellison) is a starting point for information on that standard and associated standards (e.g. HyTime) and activities. They have started an IUSC working party to evaluate SGML products. They are going to carry out an assessment of the current DTDs in conjunction with the Text Encoding Initiative (contact Lou Burnard at Oxford) and look at requirements. The Institute Of Physics Publishing are using SGML and need to have high resolution graphics in their online journal.
5.2.2 Page Description Languages
The Advisory Group On Computer Graphics has commissioned a report on PostScript previewers from Alan Francis who has done some consultancy work on documents, graphics and CGMs. NISS are using indexed PostScript files in their online information about subject areas (commences with training initiative related projects).
5.2.3 Graphics and on into Multimedia Contents
The Advisory Group On Computer Graphics has a lot of experience through members, contacts in the community and the Coordinator, Anne Mumford, who chairs the ISO committee of graphical file formats. The JNT are looking at multimedia contents in the SuperJANET projects (contact John Dyer).
5.2.4 Standards Activities
Name a standard and someone in the community is likely to be involved. Standards and portability issues have been a real concern for some time. The ISC currently subscribes to DISC and there are occasional meetings of people in the community who represent the ISC on various panels. An email list is held by Les Clyne of the JNT.
5.2.5 Scanned Documents
Scanning documents currently held on paper is being done in a number of experimental services. There needs to be agreement on the file format adopted. Fax formats are popular. TIFF is widely used. TIFF using fax compression is used by the GEDI specification which needs to be looked at if agreements on formats are sought. CARL, CORE and other projects use various formats, some with associated indexing. Current surveys suggest that second and subsequent requests for journal articles are not common. This, together with the fact that copyright laws prevent many articles being stored, makes this a potentially uneconomic activity.
5.3 Future Directions
We will almost certainly always have information on paper. Where justified by use and legally permissible, some of this will be digitised. There is a need to have agreement on file formats and some, e.g. the GEDI specification, are emerging. This is not claimed to be economic and that needs to be addressed seriously.
The main area where there is potential for online information is where it is originally created online. Theses, current journal articles, and campus information are examples. If these are to be stored online then we need to address how they are created to make their subsequent processing and access easy/easier.
Specifying appropriate software will probably come from publishers who wish to take in complex documents using, say, SGML. The publishers are likely to come up with common DTDs but progress on this is slow.
One other feature which is bound to emerge (at least based on recent history of graphical file formats) is the need for translation tools from everyone's preferred formats to the interchange format. Work at UCL on SGML and ODA conversion tools is an example. The various graphical file format translator tools can also play a part.
6. Meta-information services
New automated services will provide information about information resources.
6.1 Resource information
There are at least two levels of information required:
The former will be of assistance to users or user agents in identifying suitable resources; the latter to client systems who will be able to traverse links included in the description, where possible.
Resource information will be made available in a variety of ways:
Clearly, it would be of benefit if resources are described in standard ways, and if these descriptions can be accessed in standard ways, to facilitate their integration into automated services. This is now an area of much diffuse activity, and we feel that the nature of the problem needs to be better understood before any preferred solutions emerge. One significant area of activity which should be monitored is the CNI's TopNode project. Another is the work being carried out under the auspices of the IETF on URIs (Uniform Resource Identifiers - though their name is still to be finally decided), which aims to develop a persistent identifier for network resources (analogous to the ISBN). They would specify an access method and a location for objects, as well as information on content. This work is strongly related to initiatives discussed in 6.2.
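The idea of an identifier which carries both an access method and a location can be sketched as follows. The IETF work described above was still in progress, so this example borrows the URL syntax that later became common purely as a stand-in; the host name and path are invented, and the field names in the result are our own labels.

```python
from urllib.parse import urlsplit


def describe(identifier):
    """Split an identifier into access method, location and object path."""
    parts = urlsplit(identifier)
    return {
        "access_method": parts.scheme,  # e.g. which protocol a client should use
        "location": parts.netloc,       # which host holds the object
        "object": parts.path,           # which object on that host
    }


# Hypothetical identifier, for illustration only.
print(describe("gopher://gopher.example.org/11/library"))
```

An identifier of this kind tells a client how and where to fetch an object, but it is tied to a location; a persistent identifier analogous to the ISBN would need a further level of indirection.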
6.2 Network discovery and access tools
These tools can be categorised as follows:
(This draws on a taxonomy proposed by Peter Deutsch, a developer of archie and whois++)
We understand another report is being prepared in this area and do not elaborate here. However, it may be useful to note some trends and suggest how libraries might become a more integral part of the "virtual spaces" these tools create. Some trends are:
Library services are poorly integrated into this world. For example Gopher provides very low level 'access' to OPACs: it has to drop into a telnet session and pass control over to the library system. It is anticipated that Z39.50 will be the underlying 'plumbing' which allows Gopher and other systems to interface with library services more satisfactorily. WAIS clients which are compatible with Z39.50-1992 will appear later this year, and the developers of Gopher have indicated interest in developing a Gopher to Z39.50 interface when more Z39.50 servers appear on the Internet. Some libraries are experimenting with WAIS and WWW.
7. Authentication and related services
7.1 Authentication, charging, security and accounting services
Facilities for distributed authentication, charging, accounting and auditing are underdeveloped. A well-understood protocol framework for these operations will be critical for the development of significant information services on the networks. We are aware of some work in this area, but feel that it is sufficiently complex and wide-ranging to require a separate investigation, and recommend that the Libraries Review commission a report of existing activity and trends.
8. Conclusion and recommendations
The strategic importance of networking in support of new information services and resource sharing is recognised in the library community. However, there is no consensual view of the future of the communications-related activities of libraries, of which applications will support which services, or of which standards should be used to implement particular applications. We have tried to present a framework in which to discuss these requirements and have pointed to the standards and issues we feel are important. We make the following recommendations:
Some general recommendations
1. Much current activity is compartmentalised within particular sectors of the library and information world. There is a lack of cross-sectoral structures to promote awareness and coordination. Opportunities to influence national policy, to develop consensus, to share experience and effort and to work towards standards and technical solutions which will enhance overall service provision are therefore reduced. The Libraries Review is well-placed to consider what is required and to support and promote such structures, and should do so.
More specifically, initiatives in particular areas should establish appropriate links and coordination with other interests within those areas (e.g. with the British Library, BIDS, LASER, the consortium of publishers developing the SuperJANET journal testbed).
2. Support should be made available for relevant library and technical representation on standards and profiling bodies, for example the EWOS Expert Group on Libraries. Such support should be linked to a mechanism for reporting back issues and development to interested parts of the community.
Policy issues (Section 2)
3. There is a need for a policy study on interworking with commercial and other non-HE partners. The growth of the commercial Internet will create a high level of demand for communication between that and the academic networks.
4. There is a serious need for a policy study on strategies for future protocol development, particularly in relationship to new generation networks such as SuperJANET. For example, what is the position of OSI applications if SuperJANET is initially IP-based? More generally, there is a need for policies that are independent of significant shifts in the technology base.
5. The ACN should be encouraged to develop, and publicise, a clear policy framework for connection to and use of the network by commercial information providers.
Communications services (Section 3)
6. A study is needed of gateway requirements for interworking with the Internet, e.g. X.400, Z39.50, ILL, etc. How, for example, could a link be set up between Z39.50 over OSI and Z39.50 over IP? Of even greater importance is communication between X.400 over IP and X.400 over OSI.
7. Support is needed for the development of X.400 email enabled applications. These include document delivery, EDI, ILL and others.
Application services (Section 4)
8. The services offered by SR/Z39.50 are potentially of strategic importance for future library and information services. The Libraries Review should encourage its wider development. Specifically we recommend that the following be supported/funded:
9. We recommend the commissioning of a feasibility study which investigates the potential usefulness of the ILL protocol (ISO 10160/1) for distributed request and loan management in the UK in
10. Any EDI initiatives should make use of the X.400 infrastructure and draw on the work of EDITEUR and EDILIBE.
11. A coordinated approach to the sharing of data between library housekeeping systems, and between other campus systems and library systems, is desirable, and projects should take note of any relevant MAC initiative developments.
Data interchange services (Section 5)
12. A range of file formats and tight specifications of them should be adopted. 'Flavours' of file formats need to be avoided; we also need to choose industry and international solutions. These should include: raster page image; page description language; document standard; vector graphics standard; terminal protocol for viewing. Suitable choices (in that order) might be: GEDI specification; PostScript; SGML; CGM; X with Motif Graphical User Interface.
13. SGML is likely to be the way ahead for information storage. We should look to promote its use and develop (in conjunction with publishers) a set of DTDs which can be used in the community.
14. We need tools and exchange formats for bringing information which is not ASCII text (graphics, symbols, mathematics, etc.) into SGML documents. These need to be investigated and agreed.
15. We need to develop specifications for a file format for scanned documents; the GEDI specification is an appropriate initial consideration.
Meta-information services (Section 6)
16. The development of appropriate resource discovery solutions should be encouraged. In particular, the Review should consider what contribution the library community should be making to these developments. Specifically, potential synergies between BUBL, UKOLN and NISS in the development of resources and experience should be explored and supported.
Authentication and other services (Section 7)
17. A protocol framework for authentication, charging and related functions is urgently required if information services are to develop on the networks.