Functional Requirements

From DigiRepWiki

Scope

This document offers a functional requirements specification for the Scholarly Works Application Profile (SWAP), also referred to below as the Eprints Application Profile. It also includes an analysis of the community served by the profile and an indication of the methodologies used.

From the JISC specification for the work:

  • Metadata:
    • In scope: DC elements plus any additional elements necessary
    • Out of scope: other metadata formats
  • Identifiers:
    • In scope: Use of identifiers to link from a description to related files (e.g. full-text files such as PDF or HTML); also use of identifiers for the description itself, for related resources etc.
    • Out of scope: Other uses of identifiers.
  • Controlled vocabularies (subject classification, name authority, etc):
    • In scope: Ensuring the application profile is hospitable to the use of a variety of subject access solutions e.g. classification schemes, controlled vocabularies, name authority lists
    • Out of scope: decisions on terminology solutions
  • Complex objects:
    • In scope: Establishing an understanding of existing work in this area and prioritising requirements
    • Out of scope: decisions on how to model complex objects
  • Additional search entry points e.g. Repository of origin
    • In scope: inclusion of properties required to fulfil other search requirements such as institution of origin, research funder, national and regional views. These requirements will be provided by RDN.

In addition:

  • Citations and references
    • In scope: Bibliographic citations for eprints and document references citing other works
    • Out of scope: Citation analysis solutions

Stakeholders and designated community

Designated Community

Stakeholder community

The following have a wider stake in the work and need to be engaged in order to ensure that the Designated Community is targeted.

  • Repositories search service
    • Implementers of this service
    • Users of the search service
  • JISC Digital Repositories Programme
  • JISC Capital Programme - Repositories and Preservation
  • Eprint repositories community in the UK
    • Repository managers, administrators and technical staff
    • Software developers of repository software (e.g. eprints.org, DSpace, Fedora)
  • JISC
  • DCMI
  • Other aggregators and search services, e.g. IRIScotland, PerX, ARROW
  • Other funding bodies

Requirements gathering

Methodology

  • Review conclusions from Eprints UK
  • Identify issues with current use of simple DC
  • Review existing practice
  • Review existing or proposed application profiles
  • Discussion and input from the working group, feedback group, wider community
  • Gather/write scenarios and use cases

Conclusions from Eprints UK

The final report from the Eprints UK project contained a number of conclusions relevant to the current work (Final Report).

These included the following:

  • Technical barriers to successful aggregation of metadata from institutional repositories
    • issues with the quality of metadata
    • the consistency of metadata
    • the handling of complex objects
    • the lack of a common approach to linking to full text

"The project addressed these to some extent by proposing a Dublin Core eprints application profile. However, adoption of this profile could not be a priority for the FAIR programme as most projects of necessity concentrated on establishing and populating their archives."

  • Issues with the simple Dublin Core profile:
    • encoding the location and type of the full-text file
    • the meaning of each field can be interpreted in different ways

"Simple DC is not targeted at describing eprints specifically so there is more to the description of an eprint than simple DC will allow. To get round these limitations of simple DC, some repositories try to put more information than necessary into the Dublin Core fields. This varying use of metadata can lead to difficulties for end-users who are trying to discover eprints across multiple repositories".

Recommendations from Eprints UK

  • There should be further investigation into the user requirements for resource discovery services built on institutional repositories. In particular, this work should explore how an aggregation of metadata from UK repositories would interoperate with other international collections.
  • More effort should be made to achieve widespread agreement with and adoption of the recommendations for using simple DC to describe eprints.
  • Repository software suppliers and the administrators of eprint archives should be encouraged to adopt the simple recommendations for linking from the 'jump-off' page to the full text of the eprint.
  • Work should be funded at an international level to agree how best to model eprints as 'complex objects' (e.g. as works and manifestations) and how to encode such complex objects in XML (e.g. by using METS or MPEG-21 DIDL).
  • There should be more investigation into the issues associated with name authority control for eprints and in particular into how best to maintain and expose authoritative name-based services and how best to integrate such services into the eprint workflow.

Existing practice

Local practices can be seen by searching repositories, examples:

Existing or proposed application profiles

Vocabularies

OpenURL:

SWRC (Semantic Web for Research Communities)

Scenarios and use cases

Wherever possible, usage scenarios exist to support the requirements in the Functional Requirements Specification, as identified below.

Functional Requirements Specification

Based on the requirements gathering activities indicated above, the Eprints Application Profile must support, or make recommendations towards supporting, the following requirements:

Richer metadata set

  • Requirement: Provide a richer set of metadata than is currently possible with simple DC, see Issues with current use of simple DC
  • Usage scenario: In current practice, services that harvest or cross-search multiple repositories face a number of metadata-related issues. One major issue is that the 15 simple Dublin Core metadata elements do not offer the level of detail required to describe eprints. A richer set of metadata would enable aggregators to offer services built upon repository metadata and content. (Richer metadata set)
  • Proposed solution: The Eprints Application Profile proposes a richer metadata set.

Consistent metadata

  • Requirement: Facilitate the creation and sharing of consistent metadata.
  • Usage scenario: An aggregator is setting up a search service through which it can provide cross-search, browse and filter capabilities across metadata harvested using OAI-PMH from a wide variety of repositories implemented on different software platforms. The aggregator wants to ensure that the data being harvested into the service is consistent, both in terms of the metadata elements used and the contents of those elements. At present, the use of simple Dublin Core has led to a situation where different repositories have used the loosely defined Dublin Core elements in different ways. (Consistent metadata for aggregator search service)
  • Proposed solution: The Eprints Application Profile is tailored to these requirements and will offer usage guidance.

Preservation metadata approaches

  • Requirement: Be compatible with preservation metadata approaches.
  • Proposed solution: A representative from the AHDS (Arts and Humanities Data Service) is included in the Working Group.

Library cataloguing approaches

  • Requirement: Be compatible with library cataloguing approaches
  • Proposed solution: A representative from the DCMI-Libraries Working Group is included in the Working Group.

Extensibility

  • Requirement: Support extensibility of the application profile for other types of material.
  • Usage scenario: The current scope of the Eprints Application Profile work is narrowly defined. If the Profile gains community acceptance, its users may wish to use the Profile for a wider range of material types that fall under the broader remit of research output (e.g. raw research data, images, multimedia etc.). Other application profiles may exist for other data types, and it would also be beneficial to the community if the approaches taken by this and other application profiles were mutually supportive and could be successfully mapped. (Extensibility of application profile)
  • Proposed solution: Model could be extended through the use of the dc:type element.

Added-value services

  • Requirement: The Eprints Application Profile should be sustainable, extensible and robust enough to support future added-value services.
  • Usage scenario: The UK repositories search service would like to support a range of value-added services in the future. These might include citation analysis of work and/or expressions. (Added-value services)
  • Proposed solution: Model could be extended; recommendations for future developments will be proposed.

Identifying the full-text

  • Requirement: Implement an unambiguous method of identifying the full-text(s). For further details, see the conclusions from the Eprints UK project (quoted above).
  • Usage scenario: An aggregator harvests metadata records for eprints from a wide variety of repositories. In so doing, it finds that there is no common approach to linking to full-text(s) and therefore any automated means of providing links to the full-text(s) are unreliable. The aggregator needs an unambiguous means of identifying the locations of full-text(s). (Unambiguous identification of the full-text(s) of an eprint)
  • Proposed solution: Manifestation isAvailableAs element.
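As a rough illustration of how an aggregator might follow such links, the Python sketch below collects every Copy URL reachable from an Expression's Manifestations. It assumes a hypothetical in-memory record in which each Manifestation lists its Copy URLs under an `is_available_as` key; these field names and the example URLs are illustrative only and are not taken from the Profile's XML format.

```python
from typing import Any

# Hypothetical harvested Expression: a dict with a list of Manifestations,
# each of which lists the URLs of its Copies under "is_available_as".
Record = dict[str, Any]


def full_text_urls(expression: Record) -> list[str]:
    """Return every Copy URL reachable through isAvailableAs links."""
    urls: list[str] = []
    for manifestation in expression.get("manifestations", []):
        for url in manifestation.get("is_available_as", []):
            if url not in urls:  # keep the first occurrence, drop duplicates
                urls.append(url)
    return urls


record = {
    "title": "An example eprint",
    "manifestations": [
        {"format": "application/pdf",
         "is_available_as": ["http://repository.example.ac.uk/1/paper.pdf"]},
        {"format": "text/html",
         "is_available_as": ["http://repository.example.ac.uk/1/paper.html"]},
    ],
}
print(full_text_urls(record))
```

Because the link is an explicit relationship rather than a URL buried in a free-text identifier field, the aggregator does not need per-repository heuristics to guess which URL points at the full text.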

Identifying metadata-only records

  • Requirement: Enable identification of metadata-only records.
  • Usage scenario: A user of a repository or aggregation service finds an Expression of an Eprint that is of interest but can see clearly that there is only a metadata record for that Expression, i.e. there is no available copy. Alternative Expressions, for which Copies are available, are easily located and the user can navigate to those other Expressions easily. (Availability and alternatives)
  • Proposed solution: An empty Manifestation isAvailableAs element.
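A minimal sketch of how a service might use an empty isAvailableAs relationship is shown below: an Expression counts as metadata-only when none of its Manifestations links to a Copy, and other Expressions of the same Scholarly Work that do have Copies are offered as alternatives. The record layout (`id`, `work_id`, `manifestations`, `is_available_as`) is a hypothetical assumption for illustration.

```python
from typing import Any

# Hypothetical Expression layout: "work_id" names the Scholarly Work it
# belongs to; each Manifestation lists Copy URLs under "is_available_as".
Expression = dict[str, Any]


def is_metadata_only(expression: Expression) -> bool:
    """True when no Manifestation of the Expression links to any Copy."""
    return not any(m.get("is_available_as")
                   for m in expression.get("manifestations", []))


def alternatives_with_copies(expression: Expression,
                             all_expressions: list[Expression]) -> list[str]:
    """IDs of other Expressions of the same Work that do have Copies."""
    return [e["id"] for e in all_expressions
            if e["work_id"] == expression["work_id"]
            and e["id"] != expression["id"]
            and not is_metadata_only(e)]


expressions = [
    {"id": "exp-1", "work_id": "work-A",
     "manifestations": [{"format": "application/pdf", "is_available_as": []}]},
    {"id": "exp-2", "work_id": "work-A",
     "manifestations": [{"format": "application/pdf",
                         "is_available_as": ["http://repo.example.org/2.pdf"]}]},
    {"id": "exp-3", "work_id": "work-B", "manifestations": []},
]
print(is_metadata_only(expressions[0]), alternatives_with_copies(expressions[0], expressions))
```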

Version identification

  • Requirement: Offer a preliminary recommendation on version identification issues, including different revisions, statuses, translations and multiple formats.
  • Usage scenario: Repository x contains a range of 'versions' of a particular eprint, including several numbered drafts, some significantly different to each other and some bearing only minor revisions. It also contains the unrefereed author's manuscript as submitted to the journal and the refereed publisher's proof. In addition, the published version has been translated into Swedish and French, and these two translations are also available in the repository. An aggregator would like to be able to pick which 'versions' it harvests metadata for (e.g. drafts might be considered out of scope). An aggregator search service would like to make clear, within its search results or browse tree, that these versions are associated with a single 'work'. It also wants to be able to present clear information about the differences between them. (The versions question)
  • Proposed solution: The Model groups different 'versions' together; translations and revisions are identified by the Has Version and Has Translation elements; additional versioning elements are included (Version Number or String, Description); the Profile contains a space for additional Status values to identify differences between versions. The JISC VERSIONS project is represented on the Working Group.
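The grouping described above can be sketched in Python as follows: Expressions are collected under their Scholarly Work, and an aggregator can skip statuses it considers out of scope (such as drafts) before building its results or browse tree. The `work_id` and `status` fields and the status strings are hypothetical placeholders, not values defined by the Profile.

```python
from collections import defaultdict
from typing import Any

Expression = dict[str, Any]

# Hypothetical status label that this aggregator treats as out of scope.
DEFAULT_EXCLUDED = frozenset({"draft"})


def group_by_work(expressions: list[Expression],
                  excluded: frozenset[str] = DEFAULT_EXCLUDED,
                  ) -> dict[str, list[Expression]]:
    """Group Expressions under their Scholarly Work, skipping excluded statuses."""
    groups: dict[str, list[Expression]] = defaultdict(list)
    for expression in expressions:
        if expression.get("status") in excluded:
            continue  # e.g. drafts the aggregator chooses not to harvest
        groups[expression["work_id"]].append(expression)
    return dict(groups)


expressions = [
    {"id": "draft-1", "work_id": "w1", "status": "draft", "language": "en"},
    {"id": "submitted", "work_id": "w1", "status": "submitted", "language": "en"},
    {"id": "proof", "work_id": "w1", "status": "peer reviewed", "language": "en"},
    {"id": "published-sv", "work_id": "w1", "status": "peer reviewed",
     "language": "sv"},
    {"id": "other", "work_id": "w2", "status": "submitted", "language": "en"},
]

for work_id, members in group_by_work(expressions).items():
    print(work_id, [m["id"] for m in members])
```

Keeping the exclusion list as a parameter reflects the scenario's point that which 'versions' are in scope is the aggregator's choice, not something fixed by the Profile.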

Navigation between versions

  • Requirement: Support navigation between different 'versions' of the same eprint
  • Usage scenario: A user of a repository or aggregation service finds an Expression of an Eprint that is of interest but can see clearly that there is only a metadata record for that Expression, i.e. there is no available copy. Alternative Expressions, for which Copies are available, are easily located and the user can navigate to those other Expressions easily. (Availability and alternatives)
  • Proposed solution: The Model facilitates this by grouping different 'versions' together.

Most appropriate copy

  • Requirement: Support identification of the most appropriate or latest Copy of a discovered version
  • Usage scenario: When harvesting from a repository, the UK repositories search service wants to ensure that it harvests the metadata record for the latest and most appropriate 'version(s)' of a particular eprint and to ensure that its metadata always points to the latest or most appropriate version(s). (The latest version)
  • Proposed solution: Date Modified element.
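The following Python sketch shows one way a harvester might use the Date Modified (dcterms:modified) value to pick the latest Copy. It assumes, hypothetically, that dates are recorded as full YYYY-MM-DD strings under a `modified` key; undated Copies are ignored. Note that it only addresses the "latest" half of the requirement: choosing the "most appropriate" Copy would need further criteria.

```python
from datetime import date
from typing import Any, Optional

# Hypothetical Copy layout: a URL plus an optional "modified" date string.
Copy = dict[str, Any]


def latest_copy(copies: list[Copy]) -> Optional[Copy]:
    """Return the Copy with the most recent dcterms:modified date, or None."""
    dated = [c for c in copies if c.get("modified")]
    if not dated:
        return None
    return max(dated, key=lambda c: date.fromisoformat(c["modified"]))


copies = [
    {"url": "http://repo.example.org/v1.pdf", "modified": "2006-03-01"},
    {"url": "http://repo.example.org/v2.pdf", "modified": "2006-09-15"},
    {"url": "http://repo.example.org/v0.pdf"},  # no modification date recorded
]
print(latest_copy(copies)["url"])  # http://repo.example.org/v2.pdf
```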

Fielded searching

  • Requirement: Support search of any, or all, elements, particularly of title, author, description, keyword.
  • Usage scenario: Simple DC does not contain an element for journal, conference or publication name. The UK repositories search service wants to offer its users the facility to search, browse or filter by this element. In addition, they also want a citation that can be used in the context of OpenURLs and OpenURL resolvers. (Search or browse by journal, conference or publication title)
  • Usage scenario: An aggregator wants to offer advanced search, browse and filtering capabilities for a wide range of metadata elements. It has found that the different users of its service have different requirements, some needing only the simplest of searches, but others requiring much more refined search, browse and filtering capabilities. Machine-to-machine cross-searches also need to interrogate specific metadata elements. For journal publication, peer review is an established process which bestows certain assurances about the authority of a piece of work. Filtering by peer review status is a particular requirement for some searchers. (Search, browse and filter by any element)
  • Proposed solution: Richer element set facilitates this.

Browse by any element

  • Requirement: Support browse by any element, as required. This does not include browsing of description or identifier elements, but may include browse by keyword, author, date, publisher, journal, publication, conference, book, series name and originating repository / institution.
  • Usage scenario: Simple DC does not contain an element for journal, conference or publication name. The UK repositories search service wants to offer its users the facility to search, browse or filter by this element. In addition, they also want a citation that can be used in the context of OpenURLs and OpenURL resolvers. (Search or browse by journal, conference or publication title)
  • Usage scenario: An aggregator wants to offer advanced search, browse and filtering capabilities for a wide range of metadata elements. It has found that the different users of its service have different requirements, some needing only the simplest of searches, but others requiring much more refined search, browse and filtering capabilities. Machine-to-machine cross-searches also need to interrogate specific metadata elements. For journal publication, peer review is an established process which bestows certain assurances about the authority of a piece of work. Filtering by peer review status is a particular requirement for some searchers. (Search, browse and filter by any element)
  • Proposed solution: Richer element set facilitates this.

Controlled vocabularies

  • Requirement: Support subject browse based on knowledge of controlled vocabulary
  • Usage scenario: Repository X classifies its resources using the Library of Congress Subject Headings. Repository Y uses MeSH terms for its biomedical resources. An aggregator would like to offer a subject browse facility based on the different vocabularies used. In order to do this it needs to be able to establish which terms come from which vocabularies. (Browse by subject, using controlled vocabularies)
  • Proposed solution: Subject element allows for use of vocabulary encoding schemes.
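In DC XML, a vocabulary encoding scheme is commonly signalled with an `xsi:type` attribute such as `dcterms:LCSH` or `dcterms:MESH`. The Python sketch below, using only the standard library, groups harvested subject values by that scheme so that terms can be browsed per vocabulary. The `<record>` wrapper and the subject values are illustrative assumptions; subjects without a scheme are grouped under a hypothetical "uncontrolled" key.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical harvested fragment; the <record> wrapper is illustrative only.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:dcterms="http://purl.org/dc/terms/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:subject xsi:type="dcterms:LCSH">Cancer</dc:subject>
  <dc:subject xsi:type="dcterms:MESH">Neoplasms</dc:subject>
  <dc:subject>cancer research</dc:subject>
</record>"""


def subjects_by_scheme(xml_text: str) -> dict[str, list[str]]:
    """Group dc:subject values by their xsi:type encoding scheme."""
    root = ET.fromstring(xml_text)
    groups: dict[str, list[str]] = defaultdict(list)
    for subject in root.iter(f"{{{DC}}}subject"):
        # The xsi:type value is kept as the literal QName string, e.g. "dcterms:LCSH".
        scheme = subject.get(f"{{{XSI}}}type", "uncontrolled")
        groups[scheme].append((subject.text or "").strip())
    return dict(groups)


print(subjects_by_scheme(RECORD))
```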

Filtering of search results and browse tree

  • Requirement: Support filtering of search results and browse tree. For example, by type, publisher, date range, status and version
  • Usage scenario: An aggregator wants to offer advanced search, browse and filtering capabilities for a wide range of metadata elements. It has found that the different users of its service have different requirements, some needing only the simplest of searches, but others requiring much more refined search, browse and filtering capabilities. Machine-to-machine cross-searches also need to interrogate specific metadata elements. For journal publication, peer review is an established process which bestows certain assurances about the authority of a piece of work. Filtering by peer review status is a particular requirement for some searchers. (Search, browse and filter by any element)
  • Proposed solution: Richer element set facilitates this.
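To make the filtering requirement concrete, the Python sketch below applies optional type, status and date-range filters to a list of search results. The field names (`type`, `status`, `issued`) and the example values are hypothetical; the point is that each filter is only possible because the richer element set supplies a distinct, consistently populated field for it. Results without an issue date are excluded whenever a date range is requested, since they cannot be placed within it.

```python
from datetime import date
from typing import Any, Optional

# Hypothetical search result layout with "type", "status" and "issued" fields.
Result = dict[str, Any]


def filter_results(results: list[Result], *,
                   eprint_type: Optional[str] = None,
                   status: Optional[str] = None,
                   issued_from: Optional[date] = None,
                   issued_to: Optional[date] = None) -> list[Result]:
    """Keep results matching every filter supplied; None means 'no filter'."""
    kept = []
    for r in results:
        if eprint_type is not None and r.get("type") != eprint_type:
            continue
        if status is not None and r.get("status") != status:
            continue
        issued = date.fromisoformat(r["issued"]) if r.get("issued") else None
        if (issued_from is not None or issued_to is not None) and issued is None:
            continue  # an undated result cannot be placed inside a date range
        if issued_from is not None and issued < issued_from:
            continue
        if issued_to is not None and issued > issued_to:
            continue
        kept.append(r)
    return kept


results = [
    {"id": "a", "type": "JournalArticle", "status": "peer reviewed", "issued": "2005-06-01"},
    {"id": "b", "type": "JournalArticle", "status": "non peer reviewed", "issued": "2006-02-10"},
    {"id": "c", "type": "ConferencePaper", "status": "peer reviewed", "issued": "2006-07-20"},
    {"id": "d", "type": "JournalArticle", "status": "peer reviewed"},
]
print([r["id"] for r in filter_results(results, status="peer reviewed",
                                       issued_from=date(2006, 1, 1))])
```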

Identifying available copies

  • Requirement: Enable movement from search results and browse tree to available copies
  • Usage scenario: A user has completed searching or browsing and has identified which items in the list are of interest. The results are displayed in such a way that they can move easily from a basic listing to find out precisely what full-text(s) are available. (Easy identification of all available copies)
  • Proposed solution: The Profile captures all the information necessary to facilitate this.

Filter by format

  • Requirement: Support filtering of available copies by format.
  • Proposed solution: Format element.
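A small Python sketch of format filtering follows. It assumes, as a hypothetical convention, that the Format element holds an Internet media type (e.g. `application/pdf`), possibly with parameters such as a charset. Since media type names are case-insensitive, the comparison is done on lower-cased values with any parameters stripped.

```python
from typing import Any

# Hypothetical Copy layout: a URL plus a "format" media type string.
Copy = dict[str, Any]


def copies_in_formats(copies: list[Copy], wanted: set[str]) -> list[Copy]:
    """Keep Copies whose Format (assumed to be a media type) is in `wanted`."""
    wanted_lower = {w.lower() for w in wanted}
    kept = []
    for c in copies:
        # Drop parameters such as "; charset=UTF-8" before comparing.
        media_type = c.get("format", "").split(";")[0].strip().lower()
        if media_type in wanted_lower:
            kept.append(c)
    return kept


copies = [
    {"url": "p.pdf", "format": "application/pdf"},
    {"url": "p.html", "format": "text/html; charset=UTF-8"},
    {"url": "p.doc", "format": "application/msword"},
]
print([c["url"] for c in copies_in_formats(copies, {"text/html", "application/PDF"})])
```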

OpenURL

  • Requirement: Enable movement from search results and browse tree to OpenURL link server. The Profile should be suitable for use in the context of OpenURLs and OpenURL resolvers i.e. support navigation/discovery of particular version of an eprint (e.g. most recent version of the Author's Original) and navigation/discovery of most appropriate copy of discovered 'version'.
  • Usage scenario: The UK repositories search service would like to support a range of value-added services in the future. These might include citation analysis of work and/or expressions. (Added-value services)
  • Proposed solution: Bibliographic Citation element.

Citation analysis

  • Requirement: Support citation analysis between expressions
  • Proposed solution: Bibliographic Citation element for each Expression.

Dublin Core Citations WG

  • Requirement: Be compatible with dc-citation WG recommendations
  • Action: DCMI Citation Working Group representative on Working Group.

Names and name authority

  • Requirement: Provide for an authoritative form of Agent names, to include personal names (authors) and corporate names (publishers, funders).
  • Usage scenario: Having harvested a number of repositories, an aggregator is faced with a number of authors with similar names. It is difficult to disambiguate these names without some additional analysis or information from other sources. If an authority name form was used, the author would be unambiguously identified. (Authority names)
  • Proposed solution: Agent entity.
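The benefit of an authoritative Agent identifier over a name string can be sketched in Python as below: creators are grouped by a hypothetical `agent_id` (an authority URI), so two different name forms for one person merge, while one name form shared by two people stays split. Creators without an identifier are keyed by their name string so they can be flagged for manual disambiguation. The field names and URIs are illustrative assumptions.

```python
from collections import defaultdict
from typing import Any

# Hypothetical creator layout: the name as it appears, plus an optional
# authority identifier for the Agent.
Creator = dict[str, Any]


def group_by_agent(creators: list[Creator]) -> dict[str, list[str]]:
    """Map each authority identifier to the name forms seen for it.

    Creators without an identifier are keyed by their name string, so
    they stay separate and can be flagged for manual disambiguation.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for c in creators:
        key = c.get("agent_id") or f"unidentified:{c['name']}"
        if c["name"] not in groups[key]:
            groups[key].append(c["name"])
    return dict(groups)


creators = [
    {"name": "Smith, J.", "agent_id": "http://authority.example.org/agent/123"},
    {"name": "Smith, John", "agent_id": "http://authority.example.org/agent/123"},
    {"name": "Smith, J.", "agent_id": "http://authority.example.org/agent/456"},
    {"name": "Smith, Jane"},
]
for key, names in group_by_agent(creators).items():
    print(key, names)
```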

Statement of responsibility

  • Requirement: Enable the author name, as it appears on an eprint, to be captured.
  • Proposed solution: Creator element.


Provenance

Research funder and project code

  • Requirement: Enable identification of the research funder and project code
  • Usage scenario: A research funder has mandated deposit of materials into repositories. In order to check this using automated means, a repository must offer details of the funder and project code for works associated with a particular funded piece of research. (Identify funder and project code)
  • Proposed solution: Funder and Grant Number elements.
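An automated deposit check of the kind described in the scenario might look like the Python sketch below. It assumes a hypothetical record layout in which each eprint carries a list of funder/grant-number pairs. Keeping each grant number paired with its funder matters: the same grant code issued by two different funders must not be confused.

```python
from typing import Any

# Hypothetical record layout: "funding" is a list of funder/grant pairs.
Record = dict[str, Any]


def deposits_for_grant(records: list[Record], funder: str,
                       grant_number: str) -> list[str]:
    """Return IDs of records crediting this grant number from this funder."""
    return [r["id"] for r in records
            if any(f.get("funder") == funder
                   and f.get("grant_number") == grant_number
                   for f in r.get("funding", []))]


records = [
    {"id": "eprint-1",
     "funding": [{"funder": "Example Research Council", "grant_number": "ABC/123"}]},
    {"id": "eprint-2",
     "funding": [{"funder": "Example Research Council", "grant_number": "XYZ/999"},
                 {"funder": "Another Trust", "grant_number": "ABC/123"}]},
    {"id": "eprint-3", "funding": []},
]
print(deposits_for_grant(records, "Example Research Council", "ABC/123"))
```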

Affiliation

  • Requirement: Enable identification of affiliation of an eprint
  • Usage scenario: For its RAE return, or other purposes, a department/institution wants to be able to easily browse a list of eprints submitted by authors affiliated with that department/institution. To do this, it would ideally use a service that could aggregate this information from many sources. This list should contain both eprints deposited in the department's/institution's own repository and those deposited in other repositories. Deposit in other repositories might happen because an author has decided to use a subject repository, or because the eprint was deposited by another of the co-authors into their departmental/institutional repository. If an author moves department, eprints created whilst in the employ of a particular department should maintain information identifying that department/institution, rather than the department/institution to which an author has moved. (Identifying institutional affiliation and Identifying departmental affiliation)
  • Proposed solution: Affiliated Institution element.
  • Note: The above usage scenarios, and the Profile, do not allow the association between an author and an institution/department to be maintained over time. For example, if an eprint has multiple authors (x, y and z) from different institutions (a, b and c) and author x moves from institution a to institution g, the metadata specified by the application profile records x's current institution g (in the Agent entity) and records that the eprint is affiliated with institution a (and not with g), but it can no longer tie author x to institution a.
  • Note: The Profile currently includes Institution only and does not go down to School/Department level; department/school out of current scope.

Copy is made available by

  • Requirement: Enable identification of the repository or other service making available the copy of an eprint.
  • Usage scenario: A user, or service, wishes to discover what other copies, or other services, a repository offers. (Copy is made available by)
  • Proposed solution: Is Part Of element.

Metadata is made available by

  • Requirement: Enable identification of the repository or other service making available the metadata about an eprint.
  • Usage scenario: A user, or service, wishes to discover what other metadata records, or other services, a repository offers. (Metadata is made available by)
  • Proposed solution: Out of scope. This information is considered administrative and would be included in the OAI-PMH response.

Intermediate hosts of a metadata record

  • Requirement: Enable identification of the intermediate hosts of a metadata record.
  • Usage scenario: In most cases, when an aggregator harvests from a repository, there is an implicit trust in the metadata being harvested. In cases where records are transmitted across intermediate hosts, each host in the chain must be trusted to accurately reproduce the record. An aggregator at one end of the chain wants to be able to identify the hosts through which the record has passed. If these intermediate hosts have altered the metadata record, this should be made apparent. (Intermediate hosts)
  • Proposed solution: Outside current scope.

Human- and machine-generated metadata

  • Requirement: Enable the distinction between human-generated and machine-generated metadata to be maintained, particularly of keywords.
  • Usage scenario: [supplied by Emma Tonkin] Machine-generated metadata will have characteristics that reflect its origin (as of course will human-generated values). Providing a marker would allow potential issues to be sidestepped as and when they occur. As an example, say someone implements a keyword extraction mechanism that performs badly on certain categories of document, pulling out irrelevant keywords. The record as a whole remains useful, but it may be that the keywords should be weighted appropriately according to the level of confidence that the metadata consumer has in the accuracy of the information in the current context, which implies providing sufficient information to inform that decision. (Human- and machine- generated metadata)
  • Action: Outside current scope.

Publication title

  • Requirement: Support disambiguation of publication title
  • Usage scenario: Simple DC does not contain an element for journal, conference or publication name. The UK repositories search service wants to offer its users the facility to search, browse or filter by this element. In addition, they also want a citation that can be used in the context of OpenURLs and OpenURL resolvers. (Search or browse by journal, conference or publication title)
  • Proposed solution: Bibliographic Citation element contains Journal title.

Multiple titles

  • Requirement: Support title changes between expressions and the main Eprint (Scholarly Work)
  • Usage scenario: Several Expressions of the same work exist, with slight changes to the title. In addition, a translation is available, with a translated title. Multiple titles cause confusion to aggregating services and their users. A single 'main title' should be provided for use in search results and browse trees. Alternative titles should be included with the metadata record. (Multiple titles)
  • Proposed solution: Profile allows for different titles.

Rights

Access restrictions

  • Requirement: Facilitate identification of open access materials
  • Usage scenario: A user of a repository aggregation service wishes to know whether a particular Copy is open access, or subject to some restrictions on access rights. The aggregator can provide this information easily and unambiguously. (Open access, or not)
  • Proposed solution: Access Rights and Licence elements.

Copyright holders

  • Requirement: Enable identification of copyright holders of different expressions
  • Usage scenario: The copyright holder of different Expressions may differ, particularly in cases where an author has signed over copyright to the publisher of a particular Expression. Different publishers have different policies on deposit into repositories. A service exists that holds information about these policies. Having established who owns the copyright, an aggregator may wish to use information about copyright holder to interrogate such a service and provide additional information about deposit policies. (Identifying copyright holders)
  • Proposed solution: Copyright Holder element.

Dates

  • Requirements:
    • Date available - necessary to establish when a piece of work, or a particular copy, was/will be made publicly available (dcterms:available)
    • Date of modification of a copy - necessary to identify the latest version (dcterms:modified)
    • Date of formal publication (dcterms:issued) - forms part of the bibliographic citation (dcterms:bibliographicCitation)
  • Proposed solution: Elements included for the above dates.
  • Consider requirements:
    • Date copyrighted - can supplement the date available information (dcterms:dateCopyrighted)
    • Date created - can help identify older material which has only recently been made publicly available (dcterms:created)
  • Not Required:
    • Validity periods (dcterms:valid)
    • Dates of submission to / acceptance by publisher/conference etc. (dcterms:dateSubmitted, dcterms:dateAccepted)
    • Dates of submission of theses/dissertations (dcterms:dateSubmitted) [out of scope for current work]
    • Date captured - could be used for digitized versions (dcterms:dateCaptured)
    • Date of deposit [out of scope - administrative metadata]
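As an illustration of the "date available" requirement, the Python sketch below checks whether a Copy is publicly available on a given day using its dcterms:available value, which is also how a future availability date (for example, at the end of an embargo) could be honoured. It assumes, hypothetically, a full YYYY-MM-DD string under an `available` key. Treating a missing date as "available" is a design choice made here for brevity; a stricter service might report such Copies as unknown instead.

```python
from datetime import date
from typing import Any, Optional

# Hypothetical Copy layout with an optional "available" date string.
Copy = dict[str, Any]


def is_publicly_available(copy: Copy, on: date) -> bool:
    """True if the Copy's dcterms:available date is on or before `on`.

    A Copy with no available date is treated as available (an assumption).
    """
    available: Optional[str] = copy.get("available")
    if available is None:
        return True
    return date.fromisoformat(available) <= on


embargoed = {"url": "http://repo.example.org/a.pdf", "available": "2007-01-01"}
print(is_publicly_available(embargoed, date(2006, 12, 31)))  # False
print(is_publicly_available(embargoed, date(2007, 1, 1)))    # True
```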

Translated abstract

  • Requirement: Support the capture of multiple language versions of an abstract, for translations.
  • Usage scenario: An eprint and its abstract are deposited into a repository. At a later point a French translation is added to the repository, with a French abstract. (Translated abstract)
  • Proposed solution: Use of a language attribute for abstract.
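In XML, a language attribute on an abstract would typically be `xml:lang`. The Python sketch below, using only the standard library, picks the abstract in the first of a user's preferred languages that is present. The `<record>` wrapper and the abstract texts are illustrative assumptions rather than the Profile's actual XML format.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DCTERMS = "http://purl.org/dc/terms/"
# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment; the <record> wrapper is illustrative only.
RECORD = """<record xmlns:dcterms="http://purl.org/dc/terms/">
  <dcterms:abstract xml:lang="en">An English abstract.</dcterms:abstract>
  <dcterms:abstract xml:lang="fr">Un résumé en français.</dcterms:abstract>
</record>"""


def abstract_for(xml_text: str, preferred: list[str]) -> Optional[str]:
    """Return the abstract in the first preferred language that is present."""
    root = ET.fromstring(xml_text)
    by_lang = {a.get(XML_LANG): (a.text or "").strip()
               for a in root.iter(f"{{{DCTERMS}}}abstract")}
    for lang in preferred:
        if lang in by_lang:
            return by_lang[lang]
    return None


print(abstract_for(RECORD, ["fr", "en"]))
```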

Embargo periods

A use case has been prepared outlining some of the issues from the scenarios listed: Application profile for eprints used by UK repositories search service