What do you need?
It is essential that you start off the exercise with a clear idea of what you are looking for. Things to think about may include the following:
- What manpower and/or money is available for the project, and will this be seen as a one-off cost?
- Do I want to (and am I able to) run this on the web server or on a separate machine, or should someone else host it?
- What platform do I want to use (do we have the expertise or facilities to use a different platform)?
- How many servers do I want to index (a ballpark figure for the number of pages to be indexed is useful here too)?
- Is the data to be indexed subject to frequent change and, if so, does it change in part or as a whole?
- What types of files do I want indexed (just HTML, or also PDF, Office files, etc.)?
- What type of search facilities do I want to offer (keyword, phrase, natural language, constrained searches)?
- Do I want to offer separate searches for external and internal users?
- Can I solve the problem with a search engine alone? For an alternative approach, see http://vivisimo.com/solutions/universities.html on 'clustering engines'.