Search Catalogue Documentation
This document describes the creation and maintenance of
MS SiteServer Search Catalogues. We expect readers to be
generally familiar with the features provided by SiteServer Search, with how to
create new catalogue instances from the Microsoft Wizards, and with other basic
functionality. The information provided here is supplementary to the Microsoft
documentation. It concentrates on aspects of Search that we feel
benefit from extra attention.
In this document we cover the use of Dublin Core metadata tags in a catalogue
and the configuration of Category Search.
Dublin Core metadata tags take the form DC.Name: DC is followed by a dot and then
by the name of the tag. Search does not support dots in tag names. In fact it does not
support most punctuation characters. To overcome this difficulty, the following
steps need to be carried out.
NOTE: If the catalogue definition already contains meta tags with "invalid"
characters then the following procedure must be carried out before new tags can
be entered, or old tags modified. The Catalog Schema interface will not allow
any operations to be performed on it when "invalid" characters are found in meta
tags. Hence, before new tags can be entered the old tags must be deleted.
- Find the file Schema.txt in the directory
C:\Microsoft Site Server\Data\Search\Projects\xxxx\Build\, replacing xxxx with
the name of the catalogue (e.g. ExploitCatalogue), and open it in a text editor.
(Note: we assume that a Custom Schema is used for the creation of the catalog.)
- Find the blocks of code containing the tags with "invalid" characters and
cut them out of the file. Save and exit the editor.
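The clean-up step above could also be scripted. The following is a minimal sketch only: it assumes that each tag in Schema.txt is defined by a `<COLUMN ... </COLUMN>` block with a `name` attribute (as in the block shown later in this document), and that a tag name is safe when it contains only letters, digits and underscores. The function name `removeInvalidColumns` and the exact set of "valid" characters are our own assumptions, not something Site Server defines.

```javascript
// Assumption: each tag is defined by one <COLUMN ... </COLUMN> block.
const COLUMN_BLOCK = /<COLUMN\b[\s\S]*?<\/COLUMN>[ \t]*\r?\n?/gi;
// Assumption: the tag name is held in the name attribute, quoted or not.
const NAME_ATTR = /\bname\s*=\s*"?([^"\s>]+)"?/i;
// Assumption: only letters, digits and underscores are "valid" in a name.
const VALID_NAME = /^[A-Za-z0-9_]+$/;

// Returns the schema text without the offending blocks, plus the names
// of the tags that were cut out, so they can be re-entered afterwards.
function removeInvalidColumns(schemaText) {
  const removed = [];
  const cleaned = schemaText.replace(COLUMN_BLOCK, (block) => {
    const match = NAME_ATTR.exec(block);
    if (match && !VALID_NAME.test(match[1])) {
      removed.push(match[1]);
      return ""; // cut the whole block out of the file
    }
    return block; // leave valid blocks untouched
  });
  return { cleaned, removed };
}
```

Keep a backup copy of Schema.txt before writing the cleaned text back, since the block format assumed here may not match every schema file.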
To enter new tags follow these steps:
- In the SS3.0 MMC, open the Search branch and find the
search catalogue that you wish to modify. In the Catalogue Build Server branch
open the Schema and add the desired tag to the catalog, replacing each full stop with
an underscore or some other neutral and acceptable character. Select HTML Meta
for Property Set. For Property ID give a unique name, e.g. DC$, where $ is a
number. Set both the Retrieve and Index radio buttons to Yes.
- Repeat the previous step for each Dublin Core tag. Save this as
a Custom Schema. When done, close the MMC.
- Find the file Schema.txt in the directory
C:\Microsoft Site Server\Data\Search\Projects\xxxx\Build\, where xxxx
stands for the name of the catalogue. Open this file in a text editor. Find
the lines with the tags that you have just entered and replace the neutral
character with a full stop in each of the DC tags. Now save the schema file and
exit.
- Rebuild the catalog. To search for tag content in the catalog, type a query
of the form @"META_DC.Creator" Jake Brown. You will need to modify the search
scripts to accept the extra form argument.
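The renaming in the steps above, and the query syntax in the last step, can be illustrated with a few small helpers. This is a sketch under assumptions: the function names are hypothetical, the neutral character is taken to be an underscore, and `restoreDots` assumes the tag names appear in Schema.txt as the prefix DC followed directly by the neutral character.

```javascript
// Escape a string so it can be used literally inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Name to type into the Catalog Schema interface: full stops replaced
// by a neutral character (assumed here to be an underscore).
function toNeutralName(tagName, neutral = "_") {
  return tagName.split(".").join(neutral);
}

// Put the full stop back in the saved schema text, e.g. DC_Creator
// becomes DC.Creator. Only the prefix + neutral pair is touched.
function restoreDots(schemaText, prefix = "DC", neutral = "_") {
  const pattern = new RegExp(
    "(?<![A-Za-z0-9])" + escapeRegExp(prefix) + escapeRegExp(neutral) + "(?=[A-Za-z])",
    "g"
  );
  return schemaText.replace(pattern, () => prefix + ".");
}

// Build a query in the form given above: @"META_<tag>" <text>.
function buildTagQuery(tagName, text) {
  return '@"META_' + tagName + '" ' + text.trim();
}
```

For example, `buildTagQuery("DC.Creator", "Jake Brown")` produces the query string quoted in the last step. How the search scripts pass that string on is not covered by this sketch.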
Categorisation of documents is used to restrict the scope of the searched
files. In practice this means tagging files with metadata. A different
configuration from the one above is needed for Category-based searching:
- Open the file DEFINECOLUMNS.TXT in the C:\Microsoft Site
Server\Data\Search\Config directory.
- Modify it to contain the following line for each meta tag you want in
the catalog:
myTag (DBTYPE_WSTR | DBTYPE_BYREF) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 "myTag"
(NOTE: replace myTag with the name of the tag in both places. Only the second
entry should be in quotes.)
- Modify the Schema.txt file in the same directory to contain
the following block for each meta tag you want to catalog:
<COLUMN
name=myTag
description="Some Descriptive Text"
type="VT_LPWSTR"
propid="d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1/myTag"
index="yes"
retrieve="yes"
</COLUMN>
NOTE: replace myTag with the appropriate string in both places, and put some
descriptive text in the "description" attribute.
We have used the Master Schema to facilitate Category Search.
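When several meta tags are needed, the two entries described above can be generated rather than typed by hand. The sketch below simply reproduces the DEFINECOLUMNS.TXT line and the Schema.txt block exactly as they are shown in this document. The function names are hypothetical, and the output is meant to be pasted into the files manually.

```javascript
// GUID copied from the examples above.
const HTML_META_GUID = "d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1";

// One line for DEFINECOLUMNS.TXT. Only the second tag name is quoted.
function defineColumnsLine(tag) {
  return tag + " (DBTYPE_WSTR | DBTYPE_BYREF) = " + HTML_META_GUID + ' "' + tag + '"';
}

// One block for Schema.txt, mirroring the layout shown above.
function schemaColumnBlock(tag, description) {
  return [
    "<COLUMN",
    "name=" + tag,
    'description="' + description + '"',
    'type="VT_LPWSTR"',
    'propid="' + HTML_META_GUID + "/" + tag + '"',
    'index="yes"',
    'retrieve="yes"',
    "</COLUMN>",
  ].join("\n");
}

// Example with the IssueNumber tag used later in this document.
console.log(defineColumnsLine("IssueNumber"));
console.log(schemaColumnBlock("IssueNumber", "Issue the document belongs to"));
```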
The Category Search interface is implemented using
JavaScript. The interface is split across two frames. The left frame is used to
select a category from a hierarchical list. The right frame is used to enter a
text query for searching. When a category is selected in the left frame, it is
immediately echoed in the Search Category field in the right frame. The category
hierarchy is created from a configuration file that is pulled in using a
Server Side Include by Default.asp in the /cat_search directory. To pull in a
different configuration file, edit line 52 of Default.asp.
The format of the category configuration file is as
follows. Each category and subcategory needs to have the following form:
categoryTree[index] = ["level",
"label", "tag name", "tag content search string"]
where:
- index is a number from 0 up to a maximum of 49. Categories need to
be numbered, the first being numbered with index 0, second with index 1,
third with index 2, and so on.
- level is a number which is either 0 or 1. Level 0 entries are top-level
categories. Level 1 entries are subcategories. At present only 2 levels are supported.
- label is the descriptive text for the category as it appears in the
cat_search interface.
- tag name is the name of the META tag that documents of this category are
tagged with.
- tag content search string is meant
to match the content field of the META tag. For level 1 (subcategory) entries
this is usually a complete match. For example, suppose we want to search in
issue 4 only. All issue 4 documents are tagged with
<meta name="IssueNumber" content="issue4">, hence the tag content search string
should be "issue4". For level 0 (top-level) entries
we have categories such as "All Issues", which cover issues 1, 2, 3,
..., X. All documents in these issues are tagged with
<meta name="IssueNumber" content="issue1">
through to <meta name="IssueNumber" content="issueX">.
We use the * wildcard character for this and set
the tag content search string to "issue*".
This will search every document whose IssueNumber META tag content starts with
"issue". Look at the category search configuration file for examples.
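As an illustration of the format described above, here is a minimal sketch of a two-entry configuration together with a small checker for the stated rules (consecutive indices from 0, at most index 49, level 0 or 1, four fields per entry). The labels and the `validateCategoryTree` function are hypothetical and are not part of the shipped cat_search scripts.

```javascript
// Hypothetical entries in the format described above.
const categoryTree = [];
categoryTree[0] = ["0", "All Issues", "IssueNumber", "issue*"];
categoryTree[1] = ["1", "Issue 4", "IssueNumber", "issue4"];

// Returns a list of problems found. An empty list means the
// configuration follows the rules stated in this document.
function validateCategoryTree(tree) {
  const problems = [];
  if (tree.length > 50) {
    problems.push("more than 50 entries (the highest index allowed is 49)");
  }
  for (let i = 0; i < tree.length; i++) {
    const entry = tree[i];
    // A gap in the numbering shows up as a missing entry.
    if (!Array.isArray(entry) || entry.length !== 4) {
      problems.push("entry " + i + " is missing or does not have 4 fields");
      continue;
    }
    const level = String(entry[0]);
    if (level !== "0" && level !== "1") {
      problems.push("entry " + i + " has unsupported level " + level);
    }
  }
  return problems;
}

console.log(validateCategoryTree(categoryTree));
```

Running a check like this before editing the include file makes numbering gaps and unsupported levels easier to spot than loading the frames and finding a broken hierarchy.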
The final configuration parameter associated with Category
Search is in the file /global_defaults.ssi, which contains the definition of a
variable holding the name of the catalogue used.
Written by Adam Batenin
March 2000