Creating a Search from Scan Results

By Janifer Gatenby, Geac Computers
Version 1
July 9th 1999

The requirement

Scan returns results that consist of brief records representing rows from an ordered list. The results can be presented to an end user, enabling him or her to browse forward and optionally backwards, then select a line for further information or processing. When a line is selected, the system will format a search request in order to request an actual record. Typical examples are scans (browses) on AUTHOR, SUBJECT and TITLE.

Database Models

The method of constructing a search from scan results needs to accommodate various different database models. There are the following possible models:

1. The scan index is derived from a database that is totally separate from the bibliographic database.
   Example: Authority database on LC and bibliographic database somewhere else.

2. The scan index is derived from a database that is separate from the bibliographic database but it contains pointers to the linked bibliographic records. The same data occurs in both databases.
   Example: an Integrated Library Management System with an authority database linking to a bibliographic database of full MARC records that contains authorised data, including authors and subjects.

3. The scan index is derived from a database, e.g. an authority database that is inter-linked with the bibliographic databases with the records in the bibliographic database containing links to the associated database and vice versa, with no repetition of data.
   Example: an Integrated Library Management System with an authority database linking to a bibliographic database. To construct a full record, it is necessary to integrate data from both databases.

4. The scan index is derived from the bibliographic database.
   Example: a title index.

The way that Z39.50 scan "formally works" at the moment only really suits the first model.
To do a follow-on search, the origin uses the same USE attributes for the search as it used for the scan and uses data from the TERM as the search data.

The problem with using TERM for the follow-on search is that the resulting search may not be precise enough. There are a number of reasons for this. Firstly, the term may have been truncated and may lack significant words that are important for precision. Secondly, the target may not support position attributes such as FIRST IN FIELD or the structure attribute PHRASE, so the search is constructed imprecisely and can retrieve unexpected records even when a single, seemingly unique line has been selected from a SCAN. This is exacerbated when the TERM itself has multiple occurrences, e.g. for a title that has only common words such as 'Psychology'.

What is required is a means of using database links, where they exist, to assist in the precision of the follow-on search.

Scan elements

Which data elements of the scan results can an origin use in order to construct a follow-on search?

The scan results contain TERM, which represents the data that was matched against the scan attributes and is normally the data that is used by the target for sequencing the scan results. The scan results also include DISPLAY TERM, which gives the display version of the term, e.g. data in upper and lower case, including diacritical marks and initial articles. The other element that could carry significant information is otherTermInfo. When database models 2, 3 or 4 are in use, it is desirable to send some retrieval information in the SCAN results.

The Proposal

The proposal is to include this retrieval information in otherTermInfo in the form of a Z39.50 url. The urls given should relate to each term occurrence. This means that the "docid" to be given in otherTermInfo relates to the term occurrence and not to database records related to the term, such as bibliographic occurrences of an authority record.
The identifier also needs to contain an identifier type to indicate whether it is identifying an authority, bibliographic or holding record*. (Personally, I would rather see the docid definition within the Z39.50 url broken into three elements, namely attribute set, attribute and identifier, rather than just being defined as attribute set Bib1, Use attribute 12. However, this requires a change to the url that has already been registered as an RFC.)

Structure of the Z39.50 url, indicating contents in the context of data to be returned in otherTermInfo:

zscheme = "z39.50r" (always)
Database = name of database to search at the server host site
Docid = the bibliographic or authority or holding identifier preceded by identifier type
Elementset = Blank - unspecified, therefore origin's choice
Recordsyntax = Blank - unspecified, therefore origin's choice

Examples:

1. A title scan performed on an index of a bibliographic database (bib.file) produces a scan entry with a term occurrence of 5. In otherTermInfo, there are 5 urls containing identifiers of 5 bibliographic records (111, 222, 333, 444 and 555), preceded by something (BREF) indicating that they are bibliographic identifiers. This could be a USE attribute.

   z39.50r://bib.file/BREF111
   z39.50r://bib.file/BREF222
   etc.

2. An authority scan, e.g. author or subject, performed on an index of an authority database (auth.file) produces a scan entry with a term occurrence of 1. There are actually 3 bibliographic records associated with this authority record. In otherTermInfo of the scan response, there is one url containing the identifier of the authority record (4544), preceded by something (AREF) indicating that it is an authority identifier.

   z39.50r://auth.file/AREF4544
   z39.50r://bib.file/AREF4544

Where an authority file is linked to a bibliographic file as per database models 2 and 3, it is possible to:

* Scan the authority file, then search the authority file, e.g. to retrieve a MARC authority record (first url used)
* Scan the authority file, then search the bibliographic file, e.g. to retrieve the MARC bibliographic record or records associated with the authority record (second url used)

Upon receiving such a URL, the client initiates a session at the specified host, port 210 (if necessary). It follows with a Search Request with the following parameters:

element set name = as desired
preferredRecordSyntax = as desired
Query:
   attribute set = Bib-1
   use attribute = 1032 (docId)
   structure attr. = 104 (urx)
   term = "BREF111" or "BREF222" or "AREF4544" etc.
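
The step from a url in otherTermInfo to the Search Request parameters listed above can be sketched in Python. This is only an illustration under stated assumptions, not part of the proposal. The names ScanUrl, parse_scan_url, split_docid and build_search_request are hypothetical, as is the dictionary layout of the request. The sketch handles only the simplified url form used in the examples (scheme, database, docid), accepts the scheme spelled either "z39.50r" or "z3950r", and assumes that the identifier type is the leading run of letters in the docid (BREF, AREF). It does not open a Z39.50 session or encode a real protocol message.

```python
from dataclasses import dataclass

# Values quoted in the text above.
USE_DOCID = 1032       # Bib-1 use attribute: docId
STRUCTURE_URX = 104    # Bib-1 structure attribute: urx
DEFAULT_PORT = 210

# Assumption: both spellings of the retrieval scheme are tolerated.
ACCEPTED_SCHEMES = ("z39.50r", "z3950r")


@dataclass
class ScanUrl:
    database: str  # database to search, e.g. "bib.file"
    docid: str     # identifier including its type prefix, e.g. "BREF111"


def parse_scan_url(url: str) -> ScanUrl:
    """Split a url of the simplified form scheme://database/docid."""
    scheme, sep, rest = url.strip().partition("://")
    if not sep or scheme.lower() not in ACCEPTED_SCHEMES:
        raise ValueError(f"not a Z39.50 retrieval url: {url!r}")
    database, sep, docid = rest.partition("/")
    if not sep or not database or not docid:
        raise ValueError(f"missing database or docid: {url!r}")
    return ScanUrl(database=database, docid=docid)


def split_docid(docid: str) -> tuple:
    """Separate the identifier type from the identifier.

    Assumption: the type is the leading run of letters ("BREF", "AREF").
    """
    i = 0
    while i < len(docid) and docid[i].isalpha():
        i += 1
    return docid[:i], docid[i:]


def build_search_request(url: ScanUrl) -> dict:
    """Collect the Search Request parameters for a follow-on search."""
    return {
        "port": DEFAULT_PORT,
        "database": url.database,
        "elementSetName": None,         # blank in the url: origin's choice
        "preferredRecordSyntax": None,  # blank in the url: origin's choice
        "query": {
            "attributeSet": "Bib-1",
            "use": USE_DOCID,
            "structure": STRUCTURE_URX,
            "term": url.docid,          # the whole docid, prefix included
        },
    }


if __name__ == "__main__":
    for raw in ("z39.50r://bib.file/BREF111", "z39.50r://bib.file/AREF4544"):
        print(build_search_request(parse_scan_url(raw)))
```

Note that the search term is the complete docid, including the identifier type, as in the parameter list above. The same authority identifier (AREF4544) can therefore be sent to either auth.file or bib.file, and the target uses the prefix to decide how to resolve it.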