User requirements
The eBank project advocates a 'publication at source' philosophy based on open access principles. The idea is that research teams could routinely deposit datasets in institutional repositories as part of the data creation and processing workflow. This would make it easier to link datasets later to peer-reviewed papers or to results data published in specialised databases.
For the development of a demonstrator system, the eBank project decided to focus on the sub-discipline of crystallography, as this has a well-defined data creation workflow and a tradition of sharing results data in an internationally accepted standard, the Crystallographic Information File (CIF) adopted by the International Union of Crystallography. A typical crystallography workflow can be shown diagrammatically.
For the eBank UK demonstrator, a test 'data provider' repository of crystallography datasets was created at the University of Southampton and populated with test metadata (http://ecrystals.chem.soton.ac.uk/). This report sets out the requirements for an eBank UK aggregator (service provider) demonstrator. The aggregator is based on an OAI architecture that enables metadata to be harvested from multiple databases of crystallographic data (data providers). For the pilot, metadata about datasets was harvested from the University of Southampton's Crystal Structure Report Archive and linked with metadata about research papers harvested from elsewhere. The aim of the pilot aggregator is to provide an integrated search and browse interface. The project will also explore the possibility of including versions of this aggregator service in third-party services, specifically PSIgate (http://www.psigate.ac.uk/), part of the UK's Resource Discovery Network (RDN).
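The harvesting step can be illustrated with a minimal sketch. It assumes the data provider exposes a standard OAI Protocol for Metadata Harvesting (OAI-PMH) 2.0 endpoint serving unqualified Dublin Core (`oai_dc`). The base URL, the helper names `list_records_url` and `parse_records`, and the sample response are all hypothetical, and the sketch does not describe how the eBank UK aggregator was actually built.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and the Dublin Core element set.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response.

    The token is None when the list is complete, i.e. when the
    resumptionToken element is absent or empty.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "oai_id": header.findtext("oai:identifier", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "identifiers": [e.text for e in rec.iterfind(".//dc:identifier", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Hand-made response for illustration only (not real repository data).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:dataset-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure dataset</dc:title>
          <dc:identifier>http://example.org/dataset-1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken></resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    records, token = parse_records(SAMPLE)
    print(records, token)
```

A real harvester would fetch `list_records_url(...)` repeatedly, passing each returned token back in, until the token is None. It would then index the collected records so that the aggregator can search and browse them.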
The following report uses user scenarios and examples from the pilot demonstrator to explore and describe the search functionality and other user requirements of such an aggregator service.
eBank UK project scenarios and user requirements
Scenarios and user requirements for the eBank UK project are available at:
http://www.ukoln.ac.uk/projects/ebank-uk/requirements/scenarios.html
User Requirements and workflows described by other projects
Requirements related to provenance and eScience (but not specifically to linking publications and datasets)
These links give a useful view of the 'bigger picture' of how scientists interact with online resources. They also provide examples of the requirements capture processes used by other projects in the eScience community.
- EBI provenance speculations recorded by the myGrid project http://twiki.mygrid.org.uk/twiki/bin/view/Mygrid/EbiProvenanceSpeculations
- BIOMOBY, an international research project involving biological data hosts, biological data service providers and coders, whose aim is to explore various methodologies for biological data representation, distribution and discovery.
BIOMOBY scenarios and use cases
- Interoperable Informatics Infrastructure Consortium (I3C)
In 2003, I3C released the Life Science Identifier (LSID), a scheme that provides a consistent way to access any life science data source programmatically over the Internet. Examples of data sources include compound registries, assay results management systems, laboratory information management systems (LIMS), inventory systems, sequence and protein databases, and public resources such as PDB, NCBI, CAS and PubMed. Consumer applications of LSID include electronic notebooks, discovery portals, informatics "pipelines" and LIMS.
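An LSID is written as a URN of the form `urn:lsid:<authority>:<namespace>:<object>`, optionally followed by `:<revision>`. The sketch below splits such a string into its parts and rebuilds it. It is a minimal illustration that assumes this simple colon-separated syntax and does not perform LSID resolution. The names `LSID`, `parse_lsid` and `format_lsid`, and the example identifier, are hypothetical.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Components of a Life Science Identifier (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.strip().split(":")
    # The 'urn:lsid' prefix is matched case-insensitively.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component in: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


def format_lsid(lsid: LSID) -> str:
    """Rebuild the canonical lower-case 'urn:lsid:...' string."""
    fields = ["urn", "lsid", lsid.authority, lsid.namespace, lsid.object_id]
    if lsid.revision is not None:
        fields.append(lsid.revision)
    return ":".join(fields)


if __name__ == "__main__":
    example = parse_lsid("urn:lsid:example.org:structures:12345")
    print(example)
    print(format_lsid(example))
```

Because every component has a fixed position, a consumer application such as a discovery portal can tell which authority to ask about an object without knowing anything about that authority's internal database.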
Requirements derived from Research Council Guidance
A joint statement by the Director General of the Research Councils and the Chief Executives of the UK Research Councils, Safeguarding good scientific practice, was issued on 18 December 1998. The document focuses primarily on preventing scientific misconduct, both the fabrication of results and the misappropriation of others' work. It outlines a number of elements to ensure sound conduct, including:
2.6 The Central Role of Data
Primary data as the basis for publications should be securely stored for an appropriate time in a durable form under the control of the institution of their origin.
Requirements in the learning community
- Identifiers for Learning Objects use cases, available on the CETIS web page
Sites that make data available for browsing and downloading
- The Nesstar site http://www.nesstar.com/
Nesstar is a suite of products for locating and using socio-economic and 'similarly structured' data