UKOLN
Raising Awareness

"A centre of excellence in digital information management, providing advice and services to the library, information and cultural heritage communities."

UKOLN is based at the University of Bath.

EASTER (Evaluating Automated Subject Tools for Enhancing Retrieval)

Demonstration

The project is concerned with the creation and enrichment of subject metadata using existing automated tools, which will be tested with Intute in a live environment. Two processes and types of subject metadata will be explored:

1) The creation of subject metadata, using controlled terms from thesauri; and
2) The enrichment of metadata records with non-controlled subject keyphrases.

As an example, let us take a Web page such as that of the Division of Solid Mechanics in Luleå University's Department of Mechanical Engineering: http://www.luth.se/depts/mt/hallf/index2.html. It contains the following text:

Words from title: solid mechanics hållfasthetslära luleå univ of technology Sweden
Words from metadata: /
Words from headings: /
Words from the main text (excerpt): staff research areas current projects equipment what is solid mechanis under graduate education… material is called solid rather than fluid if it can also fracture mechanics computational computer numercal simulations upport a substantial shearing force over the time scale of some natural process or technological application of interest shearing forces are directed parallel rather than perpendicular to the material surface on which they act the force per unit of area is called shear stress for example consider a vertical metal rod that is fixed to a support at its upper end…
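
The four-part breakdown above (title, metadata, headings, main text) could be produced along the following lines. This is only a minimal sketch using Python's standard html.parser module; it is not how KnowLib's tool or TerMine actually process a page. The class name PartExtractor, the function extract_parts, the choice of "keywords" and "description" as the metadata fields, and the sample HTML are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class PartExtractor(HTMLParser):
    """Collects lowercase words from four parts of an HTML page (hypothetical helper)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIPPED = {"script", "style"}  # content that is not readable text

    def __init__(self):
        super().__init__()
        self.parts = {"title": [], "metadata": [], "headings": [], "text": []}
        self._stack = []  # open tags that change where the text is filed

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            # Assumption: only keywords and description count as metadata.
            if (attributes.get("name") or "").lower() in ("keywords", "description"):
                self._add("metadata", attributes.get("content") or "")
        elif tag == "title" or tag in self.HEADINGS or tag in self.SKIPPED:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        current = self._stack[-1] if self._stack else None
        if current in self.SKIPPED:
            return
        if current == "title":
            self._add("title", data)
        elif current in self.HEADINGS:
            self._add("headings", data)
        else:
            self._add("text", data)

    def _add(self, part, text):
        # \w+ keeps non-ASCII letters, so a word such as "luleå" survives intact.
        self.parts[part].extend(re.findall(r"\w+", text.lower()))


def extract_parts(html):
    """Return a dict with the words found in each of the four parts."""
    parser = PartExtractor()
    parser.feed(html)
    parser.close()
    return parser.parts


# Made-up page for illustration only.
sample = (
    "<html><head><title>Solid Mechanics</title>"
    '<meta name="keywords" content="fracture, simulation"></head>'
    "<body><h1>Staff</h1><p>Shear stress.</p>"
    "<script>var x = 1;</script></body></html>"
)
parts = extract_parts(sample)
```

A part with no words stays an empty list, which corresponds to the "/" shown for metadata and headings in the example page.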

1) Assigning controlled terms
Using KnowLib's tool as an example, the software takes this text and assigns the following controlled terms from the Engineering Index classification scheme:

Rank  Term                            Score
1     931.1 ‘Mechanics’               3795
2     901.2 ‘Education’               1935
2     901.3 ‘Engineering Research’    1935
4     901 ‘Engineering Profession’    1935

The selected classes are those whose score is at least 5% of the sum of all the scores assigned; the sum of the scores of all the automatically assigned classes was 11775. In total, 38 different classes were automatically derived and ranked for the document (scores are given in brackets):
931.1 (3795), 901.2 (1935), 901.3 (1935), 901 (1935), 421 (525), 933.2 (150), 481.1.2 (120), 933.1 (120), 804.2 (105), 481.1 (105), 933 (90), 604.1 (90), 641.1 (90), 657.2 (75), 931.3 (60), 535.1 (60), 804 (60), 931.2 (60), 818.1 (60), 741.1 (60), 412 (45), 444 (45), 422 (45), 657 (45), 408.1 (45), 802.3 (45), 414 (45), 545.3 (30), 483.1 (30), 461.2 (30), 812.3 (30), 531.1 (15), 903.2 (15), 801.4 (15), 932.2 (15), 482.2 (15), 505 (15), 631 (15).
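
The 5% cut-off described above can be sketched as follows. This is an illustration only, not KnowLib's actual code: the function name select_classes is hypothetical, only the first six classes from the list are included, and the total is the figure stated in the text rather than one computed here.

```python
def select_classes(scores, total, threshold=0.05):
    """Keep (class, score) pairs whose score is at least `threshold` of `total`."""
    cutoff = threshold * total
    return [(cls, score) for cls, score in scores if score >= cutoff]


# First six classes from the list above (class code, score).
scores = [
    ("931.1", 3795),
    ("901.2", 1935),
    ("901.3", 1935),
    ("901", 1935),
    ("421", 525),
    ("933.2", 150),
]

# Sum of all scores as stated in the text (an input here, not recomputed).
STATED_TOTAL = 11775

# 5% of 11775 is 588.75, so class 421 (score 525) falls just below the cut-off.
selected = select_classes(scores, STATED_TOTAL)
```

With these inputs the four classes shown in the table are kept, and every class from 421 downwards is dropped.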

2) Assigning non-controlled subject phrases
Using NaCTeM's TerMine tool as an example, the software takes this text and extracts the following free-text phrases:

Rank  Term                           Score
1     solid mechanics                2
2     solid mechanics departments    1.584962
2     telefonnummer och fax          1.584962
4     division homepage              1
4     typ division                   1
4     graduate education             1
4     avdelningens adress            1
4     solid mechanic                 1
4     lulea university               1
4     mechanical engineering         1
4     related journals               1
4     master theses                  1
4     departments homepage           1
4     undergraduate education        1
4     department home                1
4     egna rubriker                  1
4     avdelningens namn              1
4     replaced-dns f                 1
4     lule university                1
4     offer undergraduate            1
4     search resources               1
4     current projects               1
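
The Rank column above gives tied scores the same rank and then skips ahead (1, 2, 2, 4, ...), a pattern often called standard competition ranking. The sketch below reproduces that numbering for the first few phrases. It assumes this is how the ranks were derived, which the output itself does not state; the function name competition_rank is hypothetical.

```python
def competition_rank(scored_terms):
    """Rank (term, score) pairs by descending score; tied scores share a rank."""
    # sorted() is stable, so tied terms keep their original relative order.
    ordered = sorted(scored_terms, key=lambda pair: pair[1], reverse=True)
    ranked = []
    for position, (term, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # same score as the previous term: reuse its rank
        else:
            rank = position       # new score: rank is the 1-based position
        ranked.append((rank, term, score))
    return ranked


# First five phrases and scores from the table above.
terms = [
    ("solid mechanics", 2.0),
    ("solid mechanics departments", 1.584962),
    ("telefonnummer och fax", 1.584962),
    ("division homepage", 1.0),
    ("typ division", 1.0),
]
ranked = competition_rank(terms)
for rank, term, score in ranked:
    print(rank, term, score)
```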