Issue 1 : April 2004 |
Each issue of the DELOS Newsletter will carry a report from each of the clusters working within the DELOS Network of Excellence. In this issue clusters are reporting initial objectives and decisions made about their work.
Can Türker and Hans-Jörg Schek provide us with an overview of the aims and objectives of their cluster activity and explain the outcomes of their kick-off meeting in terms of their workpackage's first steps.
The DELOS Project commenced in January 2004, so at this early stage we intend to use our first Newsletter report to highlight briefly the main objectives of the workpackage and to report on the activities fixed at its kick-off meeting.
Citizens of the future should be able, through the medium of better designed digital libraries, to gain access to a myriad of forms of knowledge from anywhere, at any time, and in an efficient and user-friendly fashion. But for this to happen those digital libraries will need to arrive at a common infrastructure which is highly scalable, customizable and adaptive. Ideally, such an infrastructure would combine concepts and techniques from peer-to-peer data management, grid computing middleware, and service-oriented architectures.
Peer-to-peer architectures allow for loosely coupled integration of information services and sharing of information such as recommendations and annotations. Different aspects of peer-to-peer systems (e.g. indexes and P2P application platforms) will need to be combined. Grid computing middleware is needed because certain services within digital libraries are complex and computationally intensive (for example, the extraction of features from multimedia documents to support content-based similarity search, or information mining in bio-medical data). The service-oriented architecture provides mechanisms to describe the semantics and usage of information services. Moreover, it supports mechanisms to combine services into workflow processes for sophisticated search and maintenance of dependencies.
The main objective of this workpackage therefore is the conceptual and experimental evaluation of the impact of these three main directions on a digital library architecture. A thorough evaluation of existing approaches will reveal the advantages and disadvantages of each approach. Moreover, in order to make that evaluation quantifiable, we will have to develop a robust set of benchmarks.
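To make the notion of content-based similarity search a little more concrete, here is a minimal sketch that ranks items by the cosine similarity of their feature vectors. It is an illustration only and not part of the workpackage: the function names, the tiny collection and the two-dimensional feature vectors are all hypothetical, and a real system would use far larger vectors and an index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, collection, k=2):
    """Return the ids of the k items whose vectors are closest to the query.

    `collection` maps a (hypothetical) item id to its feature vector.
    Ties are broken by item id so that the ordering is deterministic.
    """
    scored = [(cosine_similarity(query, vec), item_id)
              for item_id, vec in collection.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:k]]


# Hypothetical feature vectors for three multimedia documents.
collection = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nearest = most_similar([1.0, 0.0], collection, k=2)
```

The expensive part in practice is producing the feature vectors in the first place, which is the computationally intensive step for which grid middleware is suggested above.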
A joint kick-off meeting was held in Zurich over 5-6 February 2004 with colleagues working in Information Access and Personalization, which meant some 24 staff in all. We felt that it was very important to take the opportunity to bring together members of both workpackages and discuss the first steps of the various tasks. After general guidelines on procedures and an overview of the project as a whole and our workpackages in particular, each colleague spoke on the work of their group and how they intended to contribute to the work of their cluster. This was the principal aim of the kick-off meeting and we felt we had largely achieved it.
We then proceeded to put some flesh on the bones of a number of activities that had been outlined in the project proposal. The three main task areas within the DLA Cluster emerged refined and scheduled as follows:
Outlined briefly below, the following activities are planned for the first 18 months of the workpackage:
With respect to the first item, it was agreed to hold the first workshop on digital library architectures over 23-24 June 2004. This workshop will take place at Cagliari (Italy) in conjunction with the Italian symposium on advanced database systems. Maristella Agosti has taken on the general and local organization of the workshop. Hans-Jörg Schek and Can Türker are co-chairing the programme committee. Meanwhile, a call for papers has been distributed through the various email channels.
This first workshop will be devoted to the architectural infrastructure of future digital libraries. The primary objectives of the workshop are to bring together European researchers interested in the architecture and related basic services that make it possible to build and operate digital libraries, and secondly, to identify the directions in which further research should go. This workshop will provide a forum for discussing the development and integration of building blocks and services for digital library infrastructures particularly in the following areas:
More details about this workshop are available at http://www.dbs.ethz.ch/delos which will also hold the presentations of this workshop on its completion.
A second workshop, again a joint one between DLA and IAP, has already been fixed for 29 March - 1 April 2005 in Dagstuhl, Germany. This workshop will be organized by Gerhard Weikum, MPI Saarbrücken.
DELOS Network of Excellence
Name of Cluster: Digital Library Architecture
Name of Leader: Hans-Jörg Schek
email: schek@inf.ethz.ch
phone: +30 210 727-5224
fax: +30 210 727-5214
url: http://www.dbs.ethz.ch/
Georgia Koutrika provides a general description of the Information Access and Personalization cluster, its objectives and the current activities towards the completion of these objectives.
Information stored in digital libraries needs to be accessed, integrated and individualized for any user, anytime and anywhere, possibly in multiple ways that are both comprehensive and efficient. Within DELOS, Information Access in Digital Libraries is studied from three different aspects:
Information stored in a source comes in different types and formats, each one with its own characteristics and peculiarities. Organization of data within an individual source and efficient and effective search performance are the key issues and are actually very closely related to each other. Different approaches exist but there is a general trend towards richer representations and languages both at the structural and at the semantic level.
Integrated access of different sources presents particular problems due to information heterogeneity, redundancy etc. Issues such as source selection and results fusion must be considered in a range of different settings. Data provenance is often crucial to the trust that is placed in data; hence it has to be managed on a very sound basis.
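As an illustration of what results fusion can involve, the sketch below merges scored result lists from several sources using CombSUM with min-max normalization. This is one well-known fusion scheme chosen for the example, not a method the cluster has committed to; the document ids and scores are invented.

```python
def min_max_normalize(scores):
    """Rescale one source's scores to [0, 1] so that sources are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: give every document the same full weight.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(result_lists):
    """Fuse result lists by summing each document's normalized scores.

    `result_lists` is a list of dicts mapping document id to raw score.
    Returns (document id, fused score) pairs, best first; ties are
    broken by document id to keep the ordering deterministic.
    """
    fused = {}
    for scores in result_lists:
        for doc, s in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


# Two hypothetical sources that score on very different scales.
source_a = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
source_b = {"d2": 0.9, "d4": 0.1}
ranking = comb_sum([source_a, source_b])
```

Normalizing before summing matters because heterogeneous sources rarely score on the same scale. A document returned by several sources (here `d2`) is also rewarded, which is one simple way of turning redundancy between sources into evidence of relevance.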
Different users have different characteristics and preferences concerning the information in which they are interested when they access a digital library. Even users with a common information need can expect to receive different results, different functionality or a different interface. Moreover, the relevant contents and interface of a digital library may be dependent on other factors as well, for example there may be device- or network-specific issues to consider.
The cluster's objectives with respect to the aforementioned aspects are the following:
The cluster has begun work towards the establishment of a common foundation for European researchers in all the aforementioned areas. Current activities are organized into the following tasks:
Task 1: Creation of Common Foundation for Information Access. Leader: CNR
This task seeks to create a common conceptual and infrastructural foundation in respect of Information Access, and is subdivided into three sub-tasks.
Task 2: Creation of Common Foundation for Information Integration. Leader: FORTH-ISL
Here the aim is to achieve a common conceptual and infrastructural foundation in respect of Information Integration which deals with multiple, heterogeneous DLs that need to be treated in a cohesive fashion. It is subdivided into three sub-tasks.
Task 3: Creation of Common Foundation for Personalization. Leader: UOA
This task aims to create a common conceptual and infrastructural foundation with regard to Personalisation and Customisation of the behaviour of a DL system. It is subdivided into two sub-tasks.
Important activities within the above tasks include the organisation of two thematic workshops. The first one will be on Personalization (dates and location to be determined), while the second one will be on Information Access and Integration and will be organised in cooperation with the Digital Library Architecture Cluster in Dagstuhl over 29 March to 1 April 2005.
DELOS Network of Excellence
Name of Cluster: Information Access and Personalization
Name of Leader: Yannis Ioannidis
email: yannis@di.uoa.gr
phone: +30 210 727-5224
fax: +30 210 727-5214
url: http://www.di.uoa.gr/~yannis
Stavros Christodoulakis describes the principal aims and activities for this cluster.
Digital libraries will capture, organize, store and manage the access to large amounts of digital information regarding human knowledge, culture, and history in various, possibly interconnected, presentation forms like video, audio, images, etc. These types of non-traditional content will often be highly structured into segments and/or semantic units (objects) which will be indexed and interconnected with other objects in a variety of ways allowing flexible access, transcoding, browsing, semantic integration, presentation, and personalization according to the application functionality, the domain of knowledge described, the presentation device and the user's preferences and goals. The WP3 cluster entitled "Audio-visual and non-traditional objects" will focus on metadata capture for audio-visual content, universal access and interaction with audio-visual libraries, together with the management of audio-visual content in digital libraries.
The overall objectives of this cluster are to establish a common ground of knowledge for European researchers about the state of the art, to identify the direction that research will take and realise important new applications for digital libraries with audio-visual and non-traditional objects. In particular, the cluster aims at:
For the first 18-month period of the project, the foreseen activities are organized in 6 distinct tasks:
Work in the cluster was launched by the kick-off meeting held in Paris at INRIA on 16 January 2004. During this meeting the work scheduled was specified in detail and the cluster steering committee was nominated. The cluster also nominated colleagues to be responsible for liaison with the other DELOS II clusters in order to facilitate integration within the network. We also decided that the first cluster workshop would be held in Chania, Crete during September 2004.
The A/V-NTO Cluster website (http://delos.music.tuc.gr/) and the Cluster mailing list (delos-wp3@ced.tuc.gr) have been established to facilitate communication between the partners and the dissemination of cluster results.
The complete list of A/V-NTO partners is also available.
The cluster is currently preparing three state-of-the-art reports: on metadata extraction, on information access and interaction, and on the management of audio-visual content.
DELOS Network of Excellence
Name of Cluster: Audio/Visual and Non-traditional Objects
Name of Leaders: Stavros Christodoulakis, Alberto del Bimbo
Stavros Christodoulakis
email: stavros@ced.tuc.gr
phone: +30-2821-037399
fax: +30-2821-037399
url: http://www.music.tuc.gr
Alberto del Bimbo
email: delbimbo@dsi.unifi.it
phone: +39-055-479 6262
fax: +39-055-479 6363
url: http://viplab.dsi.unifi.it/
Tiziana Catarci describes the aims and objectives of this cluster and puts us in the picture about work already underway in the area of user requirements relating to different groups of stakeholders.
Current discussions of what a digital library is generally associate it with the technological drive to build, use, and maintain large collections of electronic documents. However, it could also be considered a linchpin in the construction of an information-enriched environment. This higher-level vision of a digital library, however perceptive, brings with it a range of problems which will require solutions if we are to guarantee the usability and accessibility of this environment for the different classes of users with varying needs and capabilities. The ultimate goal of the User-Interface and Visualization cluster is to develop methodologies, techniques and tools to enable future digital library (DL) designers and developers to meet not only the technological, but also the user-oriented requirements in a balanced way. The UIV cluster will address this goal by pursuing several activities:
User requirements will be studied systematically. The different perspectives on a digital library will be analyzed to relate them to the requirements and technical implementation options that emerge from the ongoing development projects being undertaken by the partners of the Network of Excellence (NoE).
It is our intention to analyze methodically all the aspects and phases pertaining to the development and the usage of a DL system. The analysis will focus not only on the DL end-user, for it will also need to take into account other DL stakeholders such as librarians, content providers and maintainers. The DL life cycle will be related to both functional and non-functional requirements.
It is also part of our plan to develop a set of character profiles of DL users and we need to bear in mind that the user interface should afford accessibility to all categories of users, including those with special needs. In addition, this cluster will also explore how users can make use of a multi-modal DL-user interface which will meet their particular needs.
For further details see the update page
UIV Ongoing Activities: Collection of User Requirements
- Questionnaire Formulation.
We will explore the impact and corresponding DL opportunities with respect to contextual information. The cluster will carry out studies towards developing a taxonomy of relevant context models. A language specification will then be proposed which will encompass the pertinent characteristics and requirements of context models identified during the development of the taxonomy. The cluster will then model context-dependent DL aspects.
As a consequence of taking the usage situation/context into account in our work, we find ourselves rethinking the basic assumptions underlying most of the current approaches to information filtering and retrieval. This should lead to more realistic definitions of 'relevance'. It will then be possible to move to the development of a comprehensive model for relevance criteria.
The UIV cluster will investigate the exploitation of existing visualizations as well as consider novel visualizations and how they present DL results/views; it also expects to examine certain aspects of the DL life cycle. Furthermore, it will give thought to the possibility of extending current visualizations through the suitable application of both text and multimedia data. We will therefore seek to gauge the efficacy, clarity and degree of interactivity of visualization in the digital library context. The cluster will build a theoretical framework from which user interface designers and developers can design interfaces for DL-users. With the framework in place, it should be possible for developers to gather together the various resources provided by the theoretical framework (for example tools and methodologies) and design a DL-user interface specific to a particular application domain.
Furthermore we need to bear in mind that digital library solutions of the future will have to offer components that are both integrated and customizable and which are able to provide the necessary functionality required by the environment they serve. Ultimately our goal is to begin by developing a generic user interface which will allow us to produce a design methodology, and associated guidelines, which will help us to define appropriate technical solutions. Not only should it be possible to implement the latter in a given scenario but also to ensure that the needs of users have been met as a first priority. Therefore we aim to create an integrated DL architecture which combines both user- and application-oriented functions, for example query and navigation features, and which will adapt to the needs of its users.
Therefore, for the UIV cluster, the chief outcomes, i.e. the final integrated results of this work, should be the theoretical framework for user interface developers together with the design methodology and supporting documentation which will allow designers to create specific technical solutions. In addition, we are looking to provide profiles of the different classes of users, together with analyses of the DL life cycle, of functional and non-functional requirements and of the efficacy of visualization, and to ensure the development of both a relevance model and a context model.
A temporary website for this cluster is available at: http://www.dis.uniroma1.it/~delos/
DELOS Network of Excellence
Name of Cluster: User Interfaces and Visualization
Name of Leader: Tiziana Catarci
email: catarci@dis.uniroma1.it
phone: +39-06-4991 8331
fax: +39-06-4991 8331
url: http://www.dis.uniroma1.it/~catarci/
Liz Lyon describes the range of research activities being planned by the partners working in this challenging area.
The thematic area of Semantic Interoperability is growing in importance in digital library (DL) research (taking the interpretation of "digital library" at its broadest). It concerns the use of different vocabularies and terminology in descriptions of digital objects for both learning and research, collections of those objects, collections of datasets and resources used in the wider cultural heritage sector and in e-research. Indeed, cross-sectoral and cross-domain shared understanding of semantic descriptions is one of the goals of the Semantic Web as envisaged by Tim Berners-Lee in his "roadmap" published in 1998 (for further details, see http://www.w3.org/DesignIssues/Semantic.html ). This vision has more recently (2001) been applied to "Grid computing" and e-science / e-research initiatives in the Semantic Grid approach (see http://www.semanticgrid.org/ ).
In addition, the application of algorithms for the mining and analysis of digital resources (text, data, complex objects), offers exciting opportunities for the extraction of new knowledge and the re-use of data and information in new ways.
Today, we are beginning to address some of the issues and challenges in this complex area and the Delos Network of Excellence has the opportunity to carry out some important research to move the Semantic Web/Grid vision forwards towards implementation.
The Knowledge Extraction & Semantic Interoperability research cluster has two key strategic goals:
We can examine some of the themes underlying this area in more detail.
The development of digital repositories for the support of research and learning is at a critical stage. There has been a concerted effort to promote open access to the research literature with the success of the Open Archives Initiative, the development of the ePrints software from the University of Southampton, UK, the establishment of the European-focused Open Archives Forum and national initiatives such as DARE (Digital Academic Repositories). There has also been a drive to promote institutional repositories as the location for e-print deposit, e.g. the DSpace project at MIT. These developments have all been made possible through the implementation of the OAI Protocol for Metadata Harvesting (OAI-PMH) within the information architectures. Digital resources published in this way may also include primary research data, experimental data generated by Grid-enabled applications, gene and protein structure data, statistical data, satellite data, census data and environmental modelling data. The current increase in Grid-enabled applications is resulting in large volumes of data being collected in data libraries and this trend is likely to continue in the future. These large datasets need to be managed, curated and made accessible to the research community.
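For readers unfamiliar with the harvesting protocol, the following minimal sketch shows the shape of an OAI-PMH `ListRecords` request and how record identifiers and Dublin Core titles might be pulled from a response. The `verb` and `metadataPrefix` parameters, the `oai_dc` prefix and the two XML namespaces come from the OAI-PMH 2.0 specification. The base URL, the sample response and the helper names are invented for illustration, and the sketch performs no network access and ignores resumption tokens and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL of a ListRecords request for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional selective harvesting by set
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iter("{%s}record" % OAI_NS):
        identifier = record.findtext(
            "{%s}header/{%s}identifier" % (OAI_NS, OAI_NS))
        title = record.find(".//{%s}title" % DC_NS)
        pairs.append((identifier, title.text if title is not None else None))
    return pairs


# A hand-written, heavily abbreviated response used only for this example.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample e-print</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

url = list_records_url("http://example.org/oai")
harvested = extract_titles(SAMPLE_RESPONSE)
```

A harvester built along these lines would fetch `url`, process each page of records, and continue while the repository supplies a resumption token.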
In parallel to the development of repositories of research data and derived information, many institutions are creating learning objects for manipulation and inclusion in learning programmes and curriculum-based activities. Learning Management Systems are being deployed as vehicles for the development and distribution of online courses as part of e-learning initiatives. Repositories of learning objects are being developed, both at national and institutional level, to enable the access to and deposit of discrete learning objects for wider use by the community.
The integrity, authenticity and value of the mass of information and knowledge derived from original data are dependent on a number of critical factors. For example in science, the provenance or origin of a particular set of data is essential to determining the likely accuracy, currency and validity of derived information and any assumptions, hypotheses or further work based on that information. Significant research has been carried out on describing the provenance of scientific data in molecular genetics databases and the topic has been explored in the Global Grid Forum (GGF6) in relation to Grid data. The Open Archives Initiative has carried out work to describe the provenance of harvested metadata records and the concept is included as an element in the administrative metadata which is part of the METS metadata standard. The critical factors include the definition and acceptance of appropriate frameworks for metadata description, a shared understanding of the concept of provenance, the widespread use of unique identifiers, appropriate linking technology and the application of common ontologies for discrete domains. These concepts are relatively new but have the potential for significant impact on the way in which research and learning are conducted in the future and on the ability to integrate and re-use digital resources in a variety of ways.
In order to achieve semantic interoperability between descriptions of services, collections and items, there needs to be a shared understanding of the meanings of subject terms and descriptors. Frequently, discrete subject domains have their own shared vocabulary; however, specific terms may have different meanings within another subject domain. Additionally, one particular domain may have multiple vocabularies which are used by the different communities of interest. The myriad of existing vocabularies, both at domain and high level, is a major challenge to implementers and users of digital libraries who are trying to locate resources and services.
There is now an increasing number of developments in the broad area of Semantic Web/Grid technologies, ranging from the development of Semantic Web-enabled Web Services to the scoping of terminology servers to provide services to distributed digital libraries. There is also a growing body of work on registries and their use in the publication and validation of metadata schemas.
Finally, the increasing richness of both data and the descriptive metadata contained in digital libraries offers great potential for the application of a variety of tools to extract additional information to contribute to knowledge. The research community has a growing requirement for data manipulation tools to facilitate spatial change (federation, aggregation, dis-aggregation, replication, manipulation, linking, annotation, editing/versioning, transformation) and for knowledge extraction which can include analysis (textual, musical, statistical, mathematical, visual, chemical, gene), mining (text, data, structures), and modelling (economic, mathematical, biological). Taking an example, text mining techniques have been applied to resources in various domains and in particular to biomedical materials. Similarly, data mining techniques have been applied to domain datasets such as biomedical and physical data and this form of analysis is becoming increasingly important in the understanding of outputs from Grid-enabled projects and associated data repositories.
Together these themes form a rich contextual background to the research programme of this cluster.
A number of organisations and institutions are currently involved in this Work Package:
We have identified a number of activities to initiate a programme of work which is currently being explored in terms of the definitions and scope of the various themes.
A Forum is being created to provide a physical and virtual arena for the exchange of experience and research in all the areas/themes of this cluster. The first meeting of the Forum is planned to coincide with the European Conference on Digital Libraries (ECDL) to be held in September in Bath, UK. It will provide an opportunity to integrate systematically other relevant groups into the cluster and will take the format of a one-day state-of-the-art workshop. This development is being supported by a moderated virtual forum or discussion list for the expansion of discussion on selected topics. It is also intended to maximise opportunities to harmonise with other relevant initiatives such as CIDOC and FRBR. The activity will culminate with an evaluative report and a second Forum workshop to disseminate the findings of the Report.
In the area of Knowledge Extraction, a study will initially be produced to determine the requirements for and usage of extracted knowledge for bibliometrics, domain analysis, issue tracking and community modelling.
Semantic Interoperability is being addressed initially by scoping the area with the aim of producing a state-of-the-art overview of DL semantic issues including the application of standards, thesauri, ontologies, Knowledge Organisation Systems and the implementation of metadata schema registries.
It is intended that the discussions and various reports produced will inform the future research programme for the cluster.
Further information will shortly be available on the cluster Web site.
DELOS Network of Excellence
Name of Cluster: Knowledge Extraction and Semantic Interoperability
Name of Leader: Elizabeth Lyon
email: e.lyon@ukoln.ac.uk
phone: +44 1225 386580
fax: +44 1225 386838
url: http://www.ukoln.ac.uk
Digital Libraries depend on preservation of the digital materials they contain and the ability to build successful digital libraries depends upon methodological and technical solutions. Two years ago, an international workgroup brought together by DELOS and the National Science Foundation defined a research agenda for Digital Preservation and Archiving in broad terms. Beginning 2004 the DELOS Preservation Cluster will thus aim to:
DELOS Network of Excellence
Name of Cluster: Preservation
Name of Leader: Seamus Ross
email: s.ross@hatii.arts.gla.ac.uk
phone: +44-141-330-3635
fax: +44-141-330-2793
url: http://www.hatii.arts.gla.ac.uk
Norbert Fuhr gives a survey of current activity in the Evaluation cluster.
While supporting ongoing evaluation initiatives like the INEX and CLEF campaigns, the evaluation cluster is also working towards the development of new evaluation models and methods.
CLEF (Cross-Language Evaluation Forum, http://www.clef-campaign.org/) provides test-beds for the evaluation of cross-language information retrieval. For the 2004 campaign, CLEF will consist of 6 different evaluation tracks: Mono-, Bi- and Multi-lingual Information Retrieval, Mono- and Cross-Language Information Retrieval on Structured Scientific Data, Interactive Cross-Language Information Retrieval, Multiple Language Question Answering, Cross-Language Retrieval in Image Collections, and Cross-Language Spoken Document Retrieval. Supported languages will be Dutch, English, Finnish, French, German, Italian, Portuguese, Spanish, Swedish, Russian, Japanese and Chinese. At the moment, data and topics have been distributed to the participants, who will submit their runs by 15 May. Results will be presented at the CLEF workshop during ECDL over 16-17 September 2004 in Bath, UK.
INEX (Initiative for the Evaluation of XML Retrieval, http://inex.is.informatik.uni-duisburg.de:2004/) deals with the evaluation of information retrieval methods for XML documents. The major tasks involved have been identified as retrieval for content-only queries and retrieval for queries referring both to content and structure of the target elements. During 2004 INEX will undertake 4 additional tracks dealing with relevance feedback, natural language queries, heterogeneous collections and interactive retrieval. 49 participating groups have registered for this year's INEX campaign and are currently involved in topic creation. The results for 2004 will be presented at a workshop in Schloss Dagstuhl, Germany, over 6-8 December.
This task focuses on the specification of standard evaluation methods for digital libraries, starting with a comparison and evaluation of existing evaluation methodology, and then developing new techniques, methods and measures. The first step will be a workshop on DL evaluation over 4-5 October 2004 in Padova, Italy. The workshop will concentrate upon a survey of the state of the art in digital library evaluation and will identify major issues for further research in the area of DL evaluation.
Most of the current activities in this cluster are targeted towards providing the necessary infrastructure for the INEX and CLEF campaigns. As a general infrastructure for DL evaluation, an evaluation forum is under development which will support communication between DL researchers and evaluation specialists. In the future, research on evaluation models and methods will be intensified in DELOS, along with the development of appropriate evaluation toolkits and test-beds.
DELOS Network of Excellence
Name of Cluster: Evaluation
Name of Leader: Norbert Fuhr
email: fuhr@uni-duisburg.de
phone: +49-203-379-2524
fax: +49-203-379-2549
url: http://www.is.informatik.uni-duisburg.de/staff/index.html