Terminology Services and Technology
JISC state of the art review
September 2006
Authors:
Douglas Tudhope, University of Glamorgan
Traugott Koch, UKOLN, University of Bath
Rachel Heery, UKOLN, University of Bath
EXECUTIVE SUMMARY
Purpose
Over the next two years, as part of its Capital Funding Programme, the Joint
Information Systems Committee (JISC) is supporting further work to realize a
rich information environment within the learning and research communities. This
review is intended to inform JISC’s planning for future work related to Terminology
Services and Technology, as well as to provide useful background information
for participants in future calls, whether specifically featuring terminology
or where terminology can be used to underpin other services.
Overview of report contents
This report reviews vocabularies of different types, best practice guidelines,
research on terminology services and related projects. It discusses possibilities
for terminology services within the JISC Information Environment and eFramework.
Terminology Services (TS) are a set of services that present and apply vocabularies,
both controlled and uncontrolled, including their member terms, concepts and
relationships. This is done for purposes of searching, browsing, discovery,
translation, mapping, semantic reasoning, subject indexing and classification,
harvesting, alerting etc. Indicative use cases are discussed.
One type of TS attempts to increase consistency and improve access to digital
collections and Web navigation systems via vocabulary control. Vocabulary control
aims to reduce the ambiguity of natural language when describing and retrieving
items for purposes of information searching. Another type of TS is not concerned
with consistency but with making it easier for end-users to describe information
items and to have access to other users’ descriptions. This results in vocabularies
(folksonomies) that may not be controlled, at least initially. The report reviews
different kinds of vocabularies, according to their structure and their intended
purpose. Potential benefits and return on investment are discussed. Named entity
authority and social tagging services are discussed in some detail. Pointers
are given on best practice guidelines and networked access to vocabularies,
including key issues for future terminology registries.
The wider context of TS is considered. Relevant literature on user studies
is reviewed. TS are located within an information lifecycle and within the JISC
IE. Suggestions are made towards a more specific definition of Terminology Web
Services within the JISC IE. Current work on Terminology Web Services is reviewed,
along with work on mapping, automatic classification/indexing and repositories.
Current projects that involve TS activity (JISC, UK, and international) are
briefly reviewed.
Relevant standards are discussed, particularly for vocabulary representation; identification
of concepts, terms and vocabularies; protocols and APIs.
Key points
TS can be machine-to-machine (m2m) or interactive, user-facing services and can be applied at all
stages of the search process. Services include resolving search terms to controlled
vocabulary, disambiguation services, offering browsing access, offering mapping
between vocabularies, query expansion, query reformulation, combined search
and browsing. These can be applied as immediate elements of the end-user interface
or can underpin services behind the scenes, according to context. The appropriate
balance between interactive and automatic service components requires careful
attention.
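As an illustration of two of the services listed above (resolving search terms to a controlled vocabulary, and query expansion), the following is a minimal Python sketch. The vocabulary data, the `Concept` structure and the function names `resolve` and `expand` are all assumptions made for this example; they are not drawn from any existing TS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept in a small, hypothetical controlled vocabulary."""
    preferred: str
    synonyms: set = field(default_factory=set)   # non-preferred entry terms
    narrower: set = field(default_factory=set)   # preferred labels of narrower concepts


# Illustrative data only; not taken from any real vocabulary.
VOCAB = {
    c.preferred: c
    for c in [
        Concept("vehicles", {"conveyances"}, {"cars", "bicycles"}),
        Concept("cars", {"automobiles", "motor cars"}),
        Concept("bicycles", {"bikes", "cycles"}),
    ]
}


def resolve(term):
    """Map a user's search term to a preferred term, or None if unknown."""
    key = term.strip().lower()
    for concept in VOCAB.values():
        if key == concept.preferred or key in concept.synonyms:
            return concept.preferred
    return None


def expand(term, include_narrower=True):
    """Return the set of terms to search on for a given user term."""
    preferred = resolve(term)
    if preferred is None:
        # Unknown term: fall back to plain free-text searching.
        return {term.strip().lower()}
    concept = VOCAB[preferred]
    terms = {preferred} | concept.synonyms
    if include_narrower:
        for label in concept.narrower:
            terms |= expand(label, include_narrower=True)
    return terms
```

In this sketch an unrecognised term is passed through unchanged, which reflects the point that TS can augment rather than replace free text searching. Whether expansion to narrower concepts happens automatically or is offered to the user as a choice is the kind of interactive versus automatic balance noted above.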
Return on investment should be considered in any service provision. There are
various types of vocabularies serving different purposes, with different degrees
of vocabulary control, richness of semantic relationships, formality and editorial
control. There is a range of TS options, both interactive and automatic. There
is potential for piloting TS to augment existing JISC programmes and projects.
TS are sometimes contrasted with free text searching, assisted by statistical
Information Retrieval techniques in automatic indexing and ranking. These are
not, however, exclusive options and there are opportunities in exploring different
combinations of the two approaches. It should be noted that Web search engines
have introduced elements of TS, by offering synonym and lexical expansion options.
Thus TS should not be seen as antithetical to free text searching and can augment
it.
There are many existing vocabularies. Different arrangements regarding ownership,
maintenance and licensing of vocabularies can be found. The issue of who will
maintain a vocabulary and the basis on which it can be described or made available
in a registry needs investigation since this underpins systematic use of vocabularies
in the JISC IE. This involves establishing business models for access to and
maintenance of vocabularies.
Mapping is a key requirement for semantic interoperability in heterogeneous
environments. Although schemas, frameworks and tools can help, detailed mapping
work at the concept level is necessary, requiring a combination of intellectual
work and automated assistance. The impact on retrieval is a key consideration.
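A concept-level mapping can be pictured as a table of typed links between two vocabularies. The Python sketch below assumes two invented vocabularies ("A" and "B") and three relation labels ("exact", "broader", "narrower") chosen for illustration only; it is not a proposed mapping standard.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    source: str      # concept in the source vocabulary
    target: str      # concept in the target vocabulary
    relation: str    # "exact", "broader" or "narrower" (labels assumed here)


# Hypothetical concept-level mappings between two invented vocabularies.
MAPPINGS = [
    Mapping("A:felines", "B:cats", "exact"),
    Mapping("A:lynxes", "B:wild cats", "broader"),
    Mapping("A:pets", "B:cats", "narrower"),
    Mapping("A:pets", "B:dogs", "narrower"),
]


def translate(source_concept, allowed=("exact",)):
    """Return target concepts mapped from source_concept via allowed relations."""
    return sorted(
        m.target
        for m in MAPPINGS
        if m.source == source_concept and m.relation in allowed
    )
```

With the default setting only exact equivalences are followed, so `translate("A:lynxes")` finds nothing. Allowing the "broader" relation retrieves "B:wild cats", at the likely cost of less precise results. This is one simple way in which the choice of mapping relationships can affect retrieval.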
Automatic classification and indexing tools are important for addressing the
potential resource overheads in applying TS to indexed collections and repositories.
Some tools are emerging that should be investigated for JISC purposes. Many
argue for a combination of intellectual and automatic methods.
It is important to consider how people search for information when designing
and evaluating TS, in order to reduce the scope for design errors and increase
the possibility that services will actually be used. User studies should be
conducted where feasible in ongoing project work.
TS should not be seen as an isolated, free-standing component. TS need to be
considered within the wider context of the JISC IE, and need to be integrated
with other components of the eFramework. They should be seen as forming a set
of services that can be combined with a wide range of other services. There
is a need for specifications of TS and their workflow, as part of the JISC IE.
Interoperability requires commonly agreed standards and protocols. Standards
exist for different levels and types of interoperability. The prospect is emerging
for a broad set of standards across different aspects of terminology services:
persistent identifiers, representation of vocabularies, protocols for programmatic
access and vocabulary-level metadata in repositories. Such standards are an infrastructure
upon which future TS will rest but it is not feasible to wait for international
agreements; international consensus will be influenced by operational experience.
Pilot TS projects should orient to existing potential standards (in persistent
identifiers, representations, protocols for programmatic access) and help to
evaluate and evolve them.
RECOMMENDATIONS
The review was asked to include: “recommendations for further activities needed
in this field, and the extent to which JISC should be involved in the work (both
short and longer term), including collaboration with other organizations as
a possible form of involvement”. The following recommendations are listed according
to the relevant section of the review, where further context may be found.
1. Introduction
1.1 Purpose of this review
- Terminology services can support various stages of the information lifecycle
- JISC should highlight subject access and terminology services in all relevant
JISC programmes, whether as extensions to existing projects or as new projects
1.2 Terminology Services overview
- Demonstrate integration of Terminology Services with other components of
the JISC Information Environment. (See also Recommendation 4.3)
1.2.3 Combination of terminology tools and techniques
- Encourage inter-disciplinary collaboration in the development of terminology
services and co-operation with memory institutions and archives
- Investigate different combinations of TS and uncontrolled (non-TS) search
1.3.2 Return on investment
- Investigate methods to make vocabularies available to the education sector
through a Registry, initially for experimentation purposes but ultimately
in a sustainable, maintained, licensed manner. (See also Recommendation 3.7)
2 Use cases - scenarios
- Use cases should be developed and refined on an ongoing basis, along with
case studies of TS in practice, user session logging, observation, etc.
3 Types of vocabularies
- Provide access to a range of different vocabularies according to context
- It is important to consider the broader context and return on investment
3.1 Vocabularies by structure
- Consider faceted approaches when developing vocabularies and TS
3.2 Vocabularies by purpose
- Descriptions of intended purposes of a vocabulary would be a useful element
of a vocabulary registry (see also Recommendation 3.7).
3.2.4 eLearning purposes
- Increased cross-fertilisation between eLearning and Digital Library fields
- User studies of behaviour by indexers (cataloguers), students, teachers.
Investigate how to support effective practice with a variety of indexing and
retrieval tools
- Investigate conversion between VDEX and SKOS Core representations for compatible
vocabularies (see also Recommendation 6.2).
3.2.5 eScience purposes
- Studies of user practice with vocabularies describing research data
3.3 Named entity authority and disambiguation services
- Investigate lists of institutional names and academic affiliations (IESR
Agents etc.)
- Study the coverage of available name authorities in OPACs and academic web
publishing (LEAF, CiteSeer and similar)
- Engage in international cooperation (e.g. LEAF, OCLC, SURF DARE)
- Prototype a demonstrator UK Name Authority File, possibly involving BL and
universities (authentication, staff, institution databases) and evaluate its
use in a limited application
- Address the treatment of place and geographical names in UK services and
activities, and the development of standards and authorities, in cooperation
with related projects and terminology efforts.
- Support active participation of UK institutions in international naming
standardisation efforts in scientific disciplines and, via project support,
assist their implementation in the UK
- Apply methods of name extraction and investigate their benefits compared
to and in combination with traditional authority systems. Build and evaluate
different name disambiguation demonstrators
- Experiment with a Name Authority Web Service, e.g. to be built into metadata
creation tools
- Develop or support metadata enhancement services for correction and enrichment:
vocabularies, schemes, mapping, names
3.4 Social tagging and folksonomies
- Experiment with combination of KOS-based controlled indexing with an established
vocabulary and free (social) tagging for research purposes in a specific discipline,
optimised for discovery and retrieval
- Experiment with potential for automatic linking of tags to facets, controlled
vocabularies and authorities
- Integrate tagging with existing services such as repositories, OPACs, (RDN/Intute)
subject gateways, Digital Libraries, KOS creation and management systems,
museum exhibitions and catalogues, metadata enhancement services etc.
- Comparison study between different types of user participation: annotation,
recommendation, personalization, restructuring of information, categorization,
concept space, concept maps, topic map tools. This could inform a prototype
integrating different types of user participation with social tagging
3.7 Terminology Registries
Demonstrate the use of a terminologies registry within JISC IE testbed to include:
- Investigating inclusion of terminologies into IESR, potentially describing
vocabularies as collections
- Developing a marketing proposition for a UK terminology registry (include
use scenarios, IPR issues, business models, cost benefit)
- Evaluating use of the draft metadata description profile proposed by NKOS
- Maintaining collaboration between various UK initiatives (with eScience e.g.
GRIMOIRES and learning communities e.g. Becta Vocabulary Tool) and internationally
(e.g. NSDL)
4 Activities with TS
4.1 Studies and models of information seeking behaviour
- User studies of TS in context of JISC IE, illuminating the search process
(for work flow of services) and the appropriate balance between interactive
and automatic TS
4.3 Types of Terminology Web Services
- Develop more precise definitions of TS, as part of the JISC IE and eFramework
- Define search process workflow of TS within JISC IE eFramework
- Within the context of eFramework, develop a hierarchical layered set of
protocols for TS and standard bindings to (various) APIs
- Develop open source, reference terminology web service implementations
4.3.4 Terminology Web Services review
- Collaborate with international efforts in terminology web services
- Develop a range of TS-based search and browsing tools
4.4 Mapping
- Investigate/compare different mapping approaches and granularities in pilot
projects
- Develop a range of TS-based tools to assist in creating mappings
- Investigate the potential for standard mapping relationships and a mapping
protocol
- Collaborate with international efforts in mapping services
4.5 Automatic classification and indexing
- Investigate semi-automatic solutions to indexing and classification in pilot
projects
- Investigate currently available tools for automatic indexing and classification
4.6 Text mining and information extraction
- Investigate relationship between KOS and text mining:
- Demonstrate how KOS can support text mining
- Demonstrate how text mining can be used to update and enhance KOS
5 Review of current terminology service activity
- JISC should negotiate Dewey licenses for JISC services and projects
5.5 Repositories
- Pilot different approaches to subject based access to repository content
via different types of vocabulary and TS, taking cost benefit issues into
account and various levels of aggregation of content:
- use of subject classification
- use of specialised KOS vocabularies
- use of author assigned keywords
- full text indexing
- Consider use of mainstream classification (such as DDC) in combination with
assigning specialised vocabulary terms (as in use within RDN)
5.6 Augmenting existing programmes and projects
- JISC should support a range of pilot demonstrators with end-users and evaluation
- Investigate different TS approaches to (e.g.) indexing, mapping, search/browsing, query expansion, disambiguation
- Consider subject access and terminology service adjuncts to appropriate JISC programmes and projects, including TS support for Intute; connection of TS (and subject access) to collection level metadata (e.g. topical composition, correlation); TS support for repositories; project-specific examples
- Harvesting
- Investigate possibilities for extending harvesting tools with more subject metadata
- Investigate relationship of TS and OAI etc
- Evaluate benefits of vocabulary-oriented metadata normalising and enhancement service, e.g. aggregator harvesting relevant metadata, enhancing it and then offering harvesting of the improved metadata
- Develop vocabulary visualisation tools supported by TS
- Flexible display and tailoring of segments from vocabularies
- Flexible display and tailoring of results
- Combined search/browsing
6 Standards
- JISC should encourage participation in international standardisation activities
6.1 Design
- Relevant standards should be included in JISC Standards Catalogue. All new initiatives should take account of relevant design standards
6.2 Representations
- Strongly recommended to use XML-based representations
- Recommended that vocabulary providers consider using SKOS Core if appropriate and contribute to further extensions and customising of SKOS Core
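To make the two representation recommendations above concrete, the following Python sketch serialises a single concept as RDF/XML using SKOS Core properties (`skos:Concept`, `skos:prefLabel`, `skos:altLabel`, `skos:broader`). The RDF and SKOS namespace URIs are the published ones; the concept URIs under `http://example.org/` and the function name `concept_to_rdfxml` are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Ask ElementTree to emit the conventional rdf: and skos: prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def concept_to_rdfxml(uri, pref_label, alt_labels=(), broader_uris=(), lang="en"):
    """Serialise one concept as an RDF/XML string using SKOS Core properties."""
    root = ET.Element(f"{{{RDF}}}RDF")
    concept = ET.SubElement(root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri})

    pref = ET.SubElement(concept, f"{{{SKOS}}}prefLabel", {XML_LANG: lang})
    pref.text = pref_label

    for label in alt_labels:
        alt = ET.SubElement(concept, f"{{{SKOS}}}altLabel", {XML_LANG: lang})
        alt.text = label

    for broader_uri in broader_uris:
        # Link to the broader concept by its identifier, not by its label.
        ET.SubElement(concept, f"{{{SKOS}}}broader",
                      {f"{{{RDF}}}resource": broader_uri})

    return ET.tostring(root, encoding="unicode")


# Example with hypothetical identifiers.
example = concept_to_rdfxml(
    "http://example.org/vocab/concept/0042",
    "cars",
    alt_labels=["automobiles", "motor cars"],
    broader_uris=["http://example.org/vocab/concept/0007"],
)
```

Only the Python standard library is used here to keep the sketch self-contained; a production service would more plausibly use a dedicated RDF toolkit.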
6.3 Identification of concepts, terms and vocabularies
- A global identifier mechanism for referring to vocabularies and their components underpins interoperable TS
- Recommended to consider building upon existing work with the http URI approach for concept identifiers
- Investigate the addition of identifiers to a widely used freely available vocabulary in a pilot study
- Educational work with vocabulary providers on need to supply identifiers and discussions on practical issues should be undertaken
6.4 Protocols, profiles and APIs
- Need for standard m2m protocols for networked access to vocabularies (and their constituent concepts, relations and terms) with common bindings (APIs) building on web services and other low-level standards
- Recommended to consider using SKOS or ZThes API for TS (with a view to contributing to further development). Investigate possibilities of unifying SKOS and ZThes APIs
- Investigate possible standard m2m protocols for mapping access to vocabularies, perhaps by expanding SKOS or ZThes APIs
- Investigate the combination/integration of TS with existing query APIs (SRU/SRW, CQL) or possibly develop new TS-based query APIs
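One possible shape for the combination of TS with existing query APIs mentioned in the last recommendation is sketched below in Python: a set of terms, such as a terminology service might return after query expansion, is turned into a CQL query and wrapped in an SRU searchRetrieve request URL. The base URL, the choice of the `dc.subject` index and the function names are assumptions for illustration; a given SRU server may support different indexes.

```python
from urllib.parse import urlencode


def cql_term(term):
    """Quote a term for CQL, escaping backslashes and double quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_cql(terms, index="dc.subject"):
    """Combine terms into a single CQL query joined by boolean 'or'."""
    if not terms:
        raise ValueError("at least one search term is required")
    clauses = [f"{index} = {cql_term(t)}" for t in sorted(terms)]
    return " or ".join(clauses)


def sru_url(base_url, cql, max_records=10):
    """Build an SRU 1.1 searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "maximumRecords": max_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server address and an illustrative expanded term set.
query = build_cql({"cars", "motor cars"})
url = sru_url("http://example.org/sru", query)
```

Here the expansion happens before the query reaches the search service, so the search service itself needs no knowledge of the vocabulary. The alternative noted in the recommendation, a TS-aware query API, would move that step behind the service interface.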
Web page by: Shirley Keane
File last modified: Thursday, 11-Jan-2007 13:19:14 UTC