Issue 1: April 2004
Each issue of the Newsletter will carry the most recent news items from the DELOS website. The full listing will grow over time.
First International Workshop of the EU Network of Excellence DELOS on Digital Library Architectures
S. Margherita di Pula (Cagliari), Italy, 24-25 June, 2004
DELOS is a new interdisciplinary EU FP6 Network of Excellence with a broad vision: Future digital libraries (DLs) should enable any citizen to access human knowledge any time and anywhere, in a user-friendly, multi-modal, efficient and effective way. The main objective of the DELOS network is thus to define and conduct a joint programme of activities in order to integrate and coordinate the ongoing research activities of the research teams in the field of digital libraries for the purpose of developing the next generation DL technologies. This first workshop is devoted to the architectural infrastructure of future DLs. The objective of the workshop is to bring together researchers interested in the architecture and related basic services that allow us to build and operate DLs and to identify the direction ongoing research will take. Ideally the infrastructure combines concepts and techniques from the following fields:
Peer-to-peer (P2P) architectures allow for loosely coupled integration of information services and sharing of information such as recommendations and annotations. Different aspects of peer-to-peer systems (e.g., indexes and P2P application platforms) must be combined.
Grid computing middleware is needed because certain services within digital libraries are complex and computationally intensive (e.g., extraction of features in multimedia documents to support content-based similarity search, or information mining in bio-medical data).
The service-oriented architecture (SoA) provides mechanisms to describe the semantics and usage of information services. Moreover, a SoA offers mechanisms to combine services into workflow processes for sophisticated search and maintenance of dependencies.
It is obvious that elements of all three directions should be combined in a synthesis for future DL architectures. Therefore, a main goal of this workshop is to provide a forum for discussing the development and integration of building blocks and services for DL infrastructures from these and related areas. In the spirit of a workshop we ask for extended abstracts describing ongoing research and development.
Submissions in the form of extended abstracts, not exceeding two pages based on Springer's LNCS style, should be sent in PDF to tuerker@inf.ethz.ch before April 24, 2004. Accepted papers will be published in the DELOS workshop proceedings.
Abstract Submission Deadline: April 24, 2004
Notification of Acceptance: May 10, 2004
Camera-Ready Full Version: June 14, 2004
This workshop is co-located with SEBD 2004 - the 12th Italian Symposium on Advanced Database Systems.
Maristella Agosti, Chair
University of Padua
Department of Information Engineering
Via Gradenigo 6/a
I-35131 Padova, Italy
Email: maristella.agosti@unipd.it
Hans-Jörg Schek, PC Chair
ETH Zurich and UMIT Innsbruck
Email: hans.joerg.schek@umit.at
Can Türker, Co-Chair
ETH Zurich
Email: tuerker@inf.ethz.ch
The DELOS Network of Excellence for Digital Libraries invites participation in an evaluation initiative for XML document retrieval.
The widespread use of the Extensible Markup Language (XML), especially the increasing use of XML in scientific data repositories, Digital Libraries and on the Web, has brought about an explosion in the development of XML tools, including systems to store and access XML content. The aim of such retrieval systems is to exploit the logical structure of documents, which is explicitly represented by the XML markup, and to retrieve document components, instead of whole documents, in response to a user query. Implementing this more focused retrieval paradigm means that an XML retrieval system needs not only to find relevant information in the XML documents, but also to determine the appropriate level of granularity to return to the user. In addition, the relevance of a retrieved component depends on meeting both content and structural conditions.
Evaluating the effectiveness of XML retrieval systems therefore requires a test collection in which the relevance assessments are provided according to a relevance criterion that takes the imposed structural aspects into account. Such a test collection has been built as a result of two rounds of the Initiative for the Evaluation of XML Retrieval (INEX 2002 and INEX 2003). This initiative provides an opportunity for participants to evaluate their XML retrieval methods using uniform scoring procedures and a forum for participating organizations to compare their results. As part of a large-scale effort to improve the efficiency of research in information retrieval and digital libraries, this project initiated an international, coordinated effort to promote evaluation procedures for content-oriented XML retrieval.
In INEX 2004, participating organizations will be able to compare the retrieval effectiveness of their XML document retrieval systems and will contribute to the continuous construction of a large XML test collection. The test collection will also provide participants with a means for future comparative and quantitative experiments. Due to copyright issues, only participating organizations will have access to the constructed test collection.
The test collection consists of a set of XML documents, topics and relevance assessments. The topics and the relevance judgments are obtained through a collaborative effort from the participants. Detailed guidelines on the on-line topic submission, retrieval result submission, relevance assessment task, and evaluation metrics will be provided by INEX.
The INEX document collection is so far made up of the full texts, marked up in XML, of 12,107 articles of the IEEE Computer Society's publications from 12 magazines and 6 transactions, covering the period 1995-2002, and totalling 494 megabytes in size. The collection has a suitably complex XML structure (192 different content models in DTD) and contains scientific articles of varying length. On average an article contains 1,532 XML nodes, where the average depth of a node is 6.9.
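The node count and average depth quoted above are per-document statistics that can be recomputed from the XML itself. The following is a minimal sketch, assuming that only element nodes are counted and that the root element sits at depth 1; the announcement does not state either convention, and the tag names in the sample are purely illustrative.

```python
import xml.etree.ElementTree as ET


def node_stats(xml_text):
    """Return (number of element nodes, average element depth) for one document.

    Assumptions: only element nodes are counted (no text or attribute nodes),
    and the root element is at depth 1.
    """
    root = ET.fromstring(xml_text)
    depths = []
    stack = [(root, 1)]
    while stack:
        element, depth = stack.pop()
        depths.append(depth)
        # Children of an element are one level deeper than the element itself.
        stack.extend((child, depth + 1) for child in element)
    return len(depths), sum(depths) / len(depths)


# Illustrative document; the tag names are hypothetical.
sample = (
    "<article><fm><ti>Title</ti></fm>"
    "<bdy><sec><p>Text</p><p>More</p></sec></bdy></article>"
)
count, avg_depth = node_stats(sample)
print(count, round(avg_depth, 2))
```

Averaging these two values over all articles would give collection-level figures comparable to those reported, provided the same counting conventions are used.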
Each participating group will be asked to create a set of candidate topics, which are representative of the range of real user needs over the XML collection. The queries may be content-only (CO) or content-and-structure (CAS) queries, and broad or narrow topic queries. CO queries are free-text queries, like those used in TREC, for which the retrieval system should retrieve relevant XML elements of varying granularity, while CAS queries contain explicit structural constraints, such as containment conditions. From the pooled set of candidate topics INEX will select a final set of topics to form part of the INEX test collection.
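The difference between the two query types can be illustrated with a toy example. This is only a sketch under simplifying assumptions: matching is plain substring containment rather than a real ranking function, the function names `co_query` and `cas_query` are hypothetical, and the sample document and its tag names are invented for illustration. It does not reproduce the INEX topic format.

```python
import xml.etree.ElementTree as ET

# Invented sample document; not taken from the INEX collection.
DOC = """<article>
  <fm><ti>XML retrieval</ti></fm>
  <bdy>
    <sec><st>Indexing</st><p>Element indexing for XML retrieval.</p></sec>
    <sec><st>Results</st><p>Timing figures only.</p></sec>
  </bdy>
</article>"""


def text_of(element):
    """Lower-cased, whitespace-normalised text of an element and its descendants."""
    return " ".join("".join(element.itertext()).lower().split())


def co_query(root, terms):
    """Content-only: every element whose text contains all terms is a candidate."""
    return [el.tag for el in root.iter() if all(t in text_of(el) for t in terms)]


def cas_query(root, target_path, terms):
    """Content-and-structure: only elements on target_path are candidates."""
    return [
        el for el in root.findall(target_path)
        if all(t in text_of(el) for t in terms)
    ]


root = ET.fromstring(DOC)
co_hits = co_query(root, ["indexing"])
cas_hits = cas_query(root, "./bdy/sec", ["indexing"])
print(co_hits)
print([el.find("st").text for el in cas_hits])
```

The CO query matches the paragraph, its section, the body and the whole article at once, which is why a system must decide which level of granularity to return. The CAS query restricts candidates to sections inside the body through a containment condition.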
The general task, to be performed with the data and the final set of topics, will be the ad-hoc retrieval of XML documents. Similarly to information retrieval, we regard ad-hoc retrieval as a simulation of how a library might be used, where a static set of documents is searched using a new set of queries (topics). The main differences are that, in INEX, the library consists of XML documents, the queries may contain both content and structural conditions and, in response to a query, arbitrary XML elements may be retrieved from the library. Participants will be able to submit up to a fixed number of runs, each containing the top 1500 retrieval results for each of the selected topics.
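As a rough illustration of the submission limit, a run can be assembled by keeping only the 1500 highest-scoring elements for each topic. This is a sketch, not the INEX submission format: the function name `build_run`, the topic identifier and the element paths are hypothetical.

```python
import heapq

MAX_RESULTS_PER_TOPIC = 1500  # limit stated in the call for participation


def build_run(scored_results):
    """Keep the top-scoring elements per topic, best first.

    scored_results maps a topic id to a list of (element_path, score) pairs.
    """
    return {
        topic: heapq.nlargest(
            MAX_RESULTS_PER_TOPIC, results, key=lambda pair: pair[1]
        )
        for topic, results in scored_results.items()
    }


# Hypothetical scores for a single topic with 2000 candidate elements.
scored = {
    "topic-1": [
        ("/article[1]/bdy[1]/sec[%d]" % i, float(i)) for i in range(2000)
    ]
}
run = build_run(scored)
print(len(run["topic-1"]))
```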
INEX will have four tracks this year:
Relevance assessments will be provided by the participating groups using INEX's on-line assessment system. Assessors will judge 1-2 topics: either the topics that they originally created or, if these were removed from the final set of topics, topics similar to their original queries. Please note that assessments will take about one person-week per topic. Participating groups will gain access to the completed INEX test collection only after they have completed their assessment task.
The evaluation of the retrieval effectiveness of the XML retrieval engines used by the participants will be based on the constructed INEX test collection and uniform scoring techniques, including recall/precision measures, which will take into account the structural nature of XML documents, and overlap of answers.
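To make the notions of recall/precision and overlap concrete, the sketch below computes set-based precision and recall at a rank cutoff, plus the fraction of retrieved elements that are nested inside, or contain, another retrieved element. These are simplified stand-ins chosen for illustration and are not the official INEX metrics; the path notation and function names are assumptions.

```python
def precision_recall(ranked, relevant, cutoff):
    """Set-based precision and recall over the first `cutoff` results."""
    top = ranked[:cutoff]
    hits = sum(1 for path in top if path in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def is_ancestor(a, b):
    """True if element path a is a proper ancestor of element path b."""
    return b.startswith(a + "/")


def overlap_fraction(ranked):
    """Fraction of results that contain, or are contained in, another result."""
    overlapping = 0
    for path in ranked:
        if any(
            is_ancestor(other, path) or is_ancestor(path, other)
            for other in ranked
            if other != path
        ):
            overlapping += 1
    return overlapping / len(ranked) if ranked else 0.0


# Hypothetical ranked result list and relevance assessments for one topic.
ranked = [
    "/article[1]/bdy[1]/sec[1]",
    "/article[1]/bdy[1]/sec[1]/p[1]",
    "/article[1]/bdy[1]/sec[2]",
    "/article[1]/fm[1]",
]
relevant = {"/article[1]/bdy[1]/sec[1]", "/article[1]/bdy[1]/sec[3]"}

precision, recall = precision_recall(ranked, relevant, cutoff=2)
overlap = overlap_fraction(ranked)
print(precision, recall, overlap)
```

In this example the section and its first paragraph overlap, so a system returning both gains little new content from the second result. This is the effect that an overlap-aware measure is meant to capture.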
Participants will be able to present their approaches and final results at the INEX 2004 workshop, to be held in December 2004 in Dagstuhl. All results will be published in the INEX workshop proceedings and on the Web.
In order to have access to the data designated as the IEEE Computer Society XML Retrieval Research Collection, organizations (which did not sign the agreement in 2003) must first fill in a data release Application Form (to be obtained from the INEX 2004 website).
April 2: Deadline for the submission of "Application for Participation".
April 02 - 16: The collection of XML documents will be distributed to all participants on the receipt of their signed data handling agreement. Participants will also be provided with detailed instructions and formatting criteria for candidate topics/queries.
May 03: Submission deadline for candidate topics.
May 24: Distribution of final set of topics/queries to participants along with detailed information on the formatting requirements of the search results.
August 09: Submission deadline of search results.
August 23: Distribution of merged results to participants for relevance assessments.
October 08: Submission deadline for relevance assessments.
November 01: Distribution of XML test collection and evaluation scores to participants.
December 1 (tbc): Submission of papers for the workshop pre-proceedings.
December 13-15 (tbc): Workshop in Schloss Dagstuhl (http://www.dagstuhl.de/).
Norbert Fuhr
University of Duisburg-Essen
Email: fuhr@uni-duisburg.de
Mounia Lalmas
Queen Mary University of London
Email: mounia@dcs.qmul.ac.uk
Contact person
Saadia Malik
University of Duisburg-Essen
Email: malik@is.informatik.uni-duisburg.de
Topic format specification
Börkur Sigurbjörnsson
University of Amsterdam
Email: borkur@science.uva.nl
Andrew Trotman
University of Otago
Email: andrew@cs.otago.ac.nz
Online relevance assessment tool
Benjamin Piwowarski
Université Paris 6, France
Email: Benjamin.Piwowarski@lip6.fr
Metrics
Gabriella Kazai
Queen Mary University of London
Email: gabs@dcs.qmul.ac.uk
Arjen P. de Vries
CWI, The Netherlands
Email: Arjen.de.Vries@cwi.nl