[Prev Abstract][Next Abstract][Contents]

Report on Digital Libraries '94

COMPRESSING THE DIGITAL LIBRARY

Timothy C. Bell[1], Alistair Moffat[2], and Ian H. Witten[3,1]

[1] Department of Computer Science, University of Canterbury, New Zealand, tim@cosc.canterbury.ac.nz

[2] Department of Computer Science, University of Melbourne, Australia, alistair@cs.mu.oz.au

[3] Departments of Computer Science, University of Calgary, Canada, and University of Waikato, New Zealand, ian@cpsc.ucalgary.ca

Abstract

The prospect of digital libraries presents the challenge of storing vast amounts of information efficiently and in a way that facilitates rapid search and retrieval. Storage space can be reduced by appropriate compression techniques, and searching can be enabled by constructing a full-text index. But these two requirements are in conflict: the need for decompression increases access time, and the need for an index increases space requirements. This paper resolves the conflict by showing how (a) large bodies of text can be compressed and indexed into less than half the space required by the original text alone, (b) full-text queries (Boolean or ranked) can be answered in small fractions of a second, and (c) documents can be decoded at the rate of approximately one megabyte a second. Moreover, a document database can be compressed and indexed at the rate of several hundred megabytes an hour.

Keywords: Compression, indexing, full-text retrieval, inverted files, query processing.


[Prev Abstract][Next Abstract][Contents]