EASTER (Evaluating Automated Subject Tools for Enhancing Retrieval)
Demonstration
The project is concerned with the creation and enrichment of subject
metadata using existing automated tools, which will be tested with Intute
in a live environment. Two processes, each producing its own type of
subject metadata, will be explored:
1) The creation of subject metadata, using controlled
terms from thesauri; and
2) The enrichment of metadata records with non-controlled
subject keyphrases.
As an example, take a Web page such as that of Luleå University's Department of Mechanical Engineering, Division of Solid Mechanics: http://www.luth.se/depts/mt/hallf/index2.html. It contains the following text:
Words from title: solid mechanics hållfasthetslära luleå univ of technology Sweden
Words from metadata: /
Words from headings: /
Words from the main text (excerpt): staff research areas current projects equipment what is solid mechanis under graduate education… material is called solid rather than fluid if it can also fracture mechanics computational computer numercal simulations upport a substantial shearing force over the time scale of some natural process or technological application of interest shearing forces are directed parallel rather than perpendicular to the material surface on which they act the force per unit of area is called shear stress for example consider a vertical metal rod that is fixed to a support at its upper end…
1) Assigning controlled terms
Taking KnowLib's tool as an example, the software takes this text
and assigns the following controlled terms from the Engineering Index
classification scheme:
Rank | Term | Score |
---|---|---|
1 | 931.1 ‘Mechanics’ | 3795 |
2 | 901.2 ‘Education’ | 1935 |
2 | 901.3 ‘Engineering Research’ | 1935 |
2 | 901 ‘Engineering Profession’ | 1935 |
The selected classes are those whose score amounts to at least 5% of
the sum of all the scores assigned; as given below, the sum of the
scores of all the automatically assigned classes was 11775. In total,
38 different classes were automatically derived for the document and
ranked (scores in brackets):
931.1 (3795), 901.2 (1935), 901.3 (1935), 901 (1935), 421 (525), 933.2
(150), 481.1.2 (120), 933.1 (120), 804.2 (105), 481.1 (105), 933 (90),
604.1 (90), 641.1 (90), 657.2 (75), 931.3 (60), 535.1 (60), 804 (60),
931.2 (60), 818.1 (60), 741.1 (60), 412 (45), 444 (45), 422 (45), 657
(45), 408.1 (45), 802.3 (45), 414 (45), 545.3 (30), 483.1 (30), 461.2
(30), 812.3 (30), 531.1 (15), 903.2 (15), 801.4 (15), 932.2 (15), 482.2
(15), 505 (15), 631 (15).
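
The 5% selection rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not KnowLib's actual code: the function name `select_classes`, its parameters, and the choice to pass the total in explicitly are all hypothetical. Only the six highest-scoring classes are included, and the total is the figure of 11775 reported above.

```python
def select_classes(scores, total, cutoff=0.05):
    """Return (class, score) pairs whose score is at least cutoff * total.

    Hypothetical helper for illustration; not part of any real tool.
    """
    threshold = cutoff * total
    # Sort by descending score; Python's sort is stable, so ties keep input order.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(cls, score) for cls, score in ranked if score >= threshold]


# The six highest-scoring classes from the list above.
top_scores = {
    "931.1": 3795,
    "901.2": 1935,
    "901.3": 1935,
    "901": 1935,
    "421": 525,
    "933.2": 150,
}

REPORTED_TOTAL = 11775  # sum of all assigned scores, as reported in the text

selected = select_classes(top_scores, REPORTED_TOTAL)
for cls, score in selected:
    print(cls, score)
```

With a total of 11775 the threshold is 588.75, so class 421 (score 525) falls just below it and only the four classes shown in the table are kept.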
2) Assigning non-controlled subject phrases
Taking NaCTeM's TerMine tool as an example, the software takes this
text and extracts the following free-text phrases:
Rank | Term | Score |
---|---|---|
1 | solid mechanics | 2 |
2 | solid mechanics departments | 1.584962 |
2 | telefonnummer och fax | 1.584962 |
4 | division homepage | 1 |
4 | typ division | 1 |
4 | graduate education | 1 |
4 | avdelningens adress | 1 |
4 | solid mechanic | 1 |
4 | lulea university | 1 |
4 | mechanical engineering | 1 |
4 | related journals | 1 |
4 | master theses | 1 |
4 | departments homepage | 1 |
4 | undergraduate education | 1 |
4 | department home | 1 |
4 | egna rubriker | 1 |
4 | avdelningens namn | 1 |
4 | replaced-dns f | 1 |
4 | lule university | 1 |
4 | offer undergraduate | 1 |
4 | search resources | 1 |
4 | current projects | 1 |
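
The Rank column above gives tied terms the same rank and then skips ahead (1, 2, 2, 4, ...). A minimal sketch of that tie handling, often called standard competition ranking, follows. The function name `competition_ranks` is hypothetical, and this is not TerMine's code; it only illustrates how such ranks can be derived from a list of scores, using the first five rows of the table as input.

```python
def competition_ranks(scored_terms):
    """Assign ranks so that equal scores share a rank and later ranks skip ahead.

    Hypothetical helper for illustration; takes (term, score) pairs.
    """
    ordered = sorted(scored_terms, key=lambda pair: pair[1], reverse=True)
    ranked = []
    for position, (term, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # same score as the previous term: reuse its rank
        else:
            rank = position  # new score: rank equals 1-based position
        ranked.append((rank, term, score))
    return ranked


# First five rows of the table above.
sample = [
    ("solid mechanics", 2),
    ("solid mechanics departments", 1.584962),
    ("telefonnummer och fax", 1.584962),
    ("division homepage", 1),
    ("typ division", 1),
]

ranks = competition_ranks(sample)
for rank, term, score in ranks:
    print(rank, term, score)
```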