Report on Digital Libraries '94

INTELLIGENT DATA RETRIEVAL FROM RASTER IMAGES OF DOCUMENTS

Sargur N. Srihari, Stephen W. Lam, Jonathan J. Hull, Rohini K. Srihari, and Venugopal Govindaraju

Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo, UB Commons, 520 Lee Entrance, Suite 202, Amherst, NY 14228-2567

Email: srihari@cedar.buffalo.edu, Voice: (716) 645-6164, Fax: (716) 645-6176

Abstract

Documents on conventional media, such as books, newspapers and microfiche, can be converted by scanners into digital raster images. A digital library is a server on a computer network that responds to user requests by retrieving relevant data contained in raster image documents as well as in formatted ASCII text documents. The task is to automate the analysis of the data contained in raster image documents for the purpose of intelligent information retrieval in a digital library.

The aim is to develop computational theories and algorithms that contribute to the fields of document understanding and intelligent interactive information retrieval. The limitations of optical character recognition (OCR), the technology needed to convert books into text-searchable form, will be addressed. Specific research agenda items are adaptive document understanding, robust page layout analysis, table understanding, intelligent text recognition, graphics analysis, topic categorization, content-based retrieval of captioned pictures, and query processing for information retrieval.

Keywords: Document image understanding, OCR, pattern recognition, artificial intelligence, page layout analysis, document scanning.

