FAQs

From DigiRepWiki

This is a list of frequently asked questions on digital repositories and related issues. It is intended to be a work in progress, so please add further information to the answers, or pose new questions at the end.


Are there any lists, directories or registries of digital repositories in the UK, or worldwide?

Based on questions posed to jiscmail mailing lists by Howard Noble and Alesia Zuccala, October 2005.

There are a number of directories and registries of repositories, including:

  • Registry of Open Access Repositories (ROAR) at the University of Southampton (http://archives.eprints.org/) The registry has two functions: "(1) to monitor overall growth in the number of eprint archives and (2) to maintain a list of GNU EPrints sites (the software Southampton University has designed to facilitate self-archiving)". The registry can be filtered by country, software (e.g. GNU eprints, DSpace, Bepress etc.) or content type. It is possible to register an archive. The registry uses OAI-PMH and is maintained by Tim Brody.
  • Directory of Open Access Repositories (OpenDOAR) (http://www.opendoar.org/) "A new service is starting development to support the rapidly emerging movement towards Open Access to research information. The new service, called OpenDOAR, will categorise and list the wide variety of Open Access research archives that have grown up around the world." Funded by OSI, JISC, CURL and SPARC Europe.
  • The Open Archives Initiative (OAI) (http://www.openarchives.org/community/index.html) The OAI site maintains lists of data providers ("these participants administer systems that support the OAI protocol as a means of exposing metadata about the content in their repository") and service providers ("these participants issue OAI protocol requests to the systems of data providers and use the returned metadata as a basis for building value-added services"). Visitors to the site can register for inclusion in either list.
  • Experimental OAI Registry at UIUC (http://gita.grainger.uiuc.edu/registry/) The Grainger Engineering Library Information Center at the University of Illinois at Urbana-Champaign has an experimental OAI registry. It collects Identify, ListSets, ListMetadataFormats and sample records from OAI-compliant repositories and makes this information searchable. An RSS feed lists new and modified repositories. To be added to the registry, email Tom Habing.
  • Metalist of Open Access Eprint Archives (http://opcit.eprints.org/explorearchives.shtml) This list is maintained by Steve Hitchcock. It contains links to lists of open access eprint archives, OAI archives and institutional repositories, plus links to gateways, open access journal archives and subject-discipline repositories. It was last updated in June 2003; the author is aware that the list is rather out of date and would welcome updates.
  • SPARC Institutional Repository List (http://www.arl.org/sparc/repos/ir.html) A select list, "organized by country, includes repositories that are institutional in scope and that contain multiple document types. It excludes discipline-specific e-print servers and university repositories that contain only theses and dissertations".
  • Information Environment Service Registry (IESR) (http://www.iesr.ac.uk/) "A machine readable registry of electronic resources. The aim of the IESR is to make it easier for other applications to discover and use materials which will help their users' learning, teaching and research. The IESR describes electronic resources within the JISC Information Environment." The IESR holds information about: "collections of information resources, the associated services that provide access to the collections, and the parties (aka agents) that own the collections and/or administer the services" and "transactional services, ie. those that provide functionality other than access to a collection, and the parties that administer them." It may include information about digital repositories as well as other types of collections and services. The IESR will work in collaboration with the OpenDOAR project (see above). To enable machine access to its data, the IESR provides interfaces according to several standard protocols: Web, Z39.50, OAI-PMH and OpenURL; an SRW interface and investigation of UDDI are also planned.
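
Several of the registries above gather their information by sending OAI-PMH requests to each repository. As a rough illustration only, the following Python sketch builds the request URLs for the Identify, ListSets and ListMetadataFormats verbs mentioned in the UIUC entry, plus a ListRecords request for sample records. The base URL and the function names are made up for this example, and no registry's actual harvesting code is being described.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the repository's real OAI-PMH base URL.
BASE_URL = "http://repository.example.org/oai"


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


def registry_probe_urls(base_url):
    """Return the descriptive requests a registry might issue (an assumption)."""
    return [
        oai_request_url(base_url, "Identify"),
        oai_request_url(base_url, "ListSets"),
        oai_request_url(base_url, "ListMetadataFormats"),
        # Sample records in unqualified Dublin Core (metadataPrefix oai_dc).
        oai_request_url(base_url, "ListRecords", metadataPrefix="oai_dc"),
    ]


if __name__ == "__main__":
    for url in registry_probe_urls(BASE_URL):
        print(url)
```

Pasting any of the printed URLs into a browser, with a real base URL in place of the example one, should return the repository's XML response for that verb.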

A further list of sources is provided in the Internet Resources Newsletter issue 142.

--JulieAllinson 12:32, 28 October 2005 (BST)


Where can I find metadata mappings and crosswalks for different metadata standards?

There is a range of metadata standards, for different types of resources, relevant to digital repositories. These include, but are not limited to:

Content packaging specifications:


A variety of mappings and crosswalk documents are available online. The following pages link to a range of these:

  • Mapping between metadata formats A useful starting point; this page is maintained by Michael Day at UKOLN. It contains links to a range of crosswalks between different metadata standards. Last updated 2002.
  • Understanding metadata (pdf) Published by NISO in 2004, this introductory guide to metadata covers Dublin Core, Text Encoding Initiative (TEI), METS, Metadata Object Description Schema (MODS), Encoded Archival Description (EAD), LOM, ONIX, Categories for the Description of Works of Art (CDWA), VRA Core Categories, MPEG, Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) and Data Documentation Initiative (DDI). It also includes links to a selection of metadata crosswalks.
  • All about crosswalks from OCLC covers the following metadata standards, other than Dublin Core and MARC: CanCore, GEM, LOM, ONIX and SCORM. It offers crosswalks for CanCore to SCORM; ONIX to MARC 21; LOM to Dublin Core; and GEM to MARC.
  • Metadata mappings (crosswalks) from the Getty covering museum-related metadata standards: CDWA, CCO, VRA 3.0 CoreCategories, MARC, Dublin Core, Object ID, FDA Guide, CIMI and EAD.

--JulieAllinson 09:41, 1 November 2005 (GMT)


How to facilitate Google crawling

Peter Suber (http://www.earlham.edu/~peters/hometoc.htm) and Google have put together a set of tips to help configure open-access scholarly repositories for full-text Google crawling.

--JulieAllinson 13:30, 7 December 2005 (GMT)


Can you provide any advice about unique identifiers for teaching, learning and research materials?

Unique identifiers

Based on a question posed to a jiscmail mailing list by Howard Noble, December 2005; incorporating information supplied by Howard, Phil Barker, Lorna Campbell, Rachel Heery, Andy Powell and Amber Thomas.


Answer

When developing repository services, assigning unique identifiers is a fundamental issue. There are various things to 'identify' in the context of repositories: the digital object, datastreams, information package etc. There may be different criteria for the choice of identifier scheme for different things: metadata records, resource or 'work', different representations or 'manifestations' of the work (datastreams), content package (complex objects) etc.

For example, DOIs may be appropriate for published scholarly journal articles, whereas info URIs, or other schemes, may be appropriate for repository content packages.

Therefore, it is important from the outset to ask: 'what needs to be identified?'

The decision on the choice of identifier scheme might depend on:

  • what you are identifying (certain sorts of content are traditionally identified by certain schemes, various constituent items are assigned identifiers by the repository)
  • whether the item being identified is part of a 'data flow' where other agencies, apart from the repository itself, will be involved (other agencies having legacy commitments to certain identifier schemes)
  • what level of persistence is required for the identifier


The JISC Information Environment Technical Standards Version 1.1 provide guidelines on identifiers and their resolution:

"Every significant item that is made available through a JISC IE network service should be assigned a URI [1] that is reasonably persistent. This means that item URIs should not be expected to break for a period of 10-15 years after they have first been used. For this reason, JISC IE service components should not hardcode file format, server technology, service organisational structure or other information that is likely to change over a 10-15 year period into item URIs. If items become unavailable during that period, then the URI should resolve to a Web page that explains why the item is no longer available and what actions the end-user can take to obtain a copy of the item or similar resources. Furthermore, item URIs should not contain end-user-specific information, i.e. all item URIs should work for all end-users (albeit allowing for appropriate authentication challenges to be inserted into the process by which the URI is resolved)."
"Resources that comprise a collection of items that are packaged together for management or exchange purposes should be packaged using the IMS Content Packaging Specification [2] if they are 'learning objects' (i.e. resources are primarily intended for use in a learning and teaching context and that have a specific pedagogic aim) or the Metadata Encoding & Transmission Standard (METS) [3]." (Andy Powell, JISC Information Environment Technical Standards Version 1.1 http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/standards/)
1. Naming and Addressing: URIs, URLs (http://www.w3.org/Addressing/)
2. IMS Content Packaging Specification (http://www.imsglobal.org/content/packaging/)
3. Metadata Encoding & Transmission Standard (METS) (http://www.loc.gov/standards/mets/)
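
To make the guidance on item URIs more concrete, here is a small Python sketch that flags URI features the quoted text warns against: file formats, server technology and end-user-specific information. The lists of extensions, path segments and parameter names are assumptions chosen for illustration; they do not come from the JISC standards document, and a real check would need local judgement.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit

# Illustrative lists only (assumptions, not part of the JISC IE standards).
VOLATILE_EXTENSIONS = {".html", ".htm", ".pdf", ".doc", ".php", ".asp", ".jsp", ".cgi"}
TECHNOLOGY_SEGMENTS = {"cgi-bin", "servlet"}
USER_SPECIFIC_PARAMS = {"sessionid", "sid", "user", "username"}


def persistence_warnings(uri):
    """Return a list of reasons why a URI may not stay stable for 10-15 years."""
    parts = urlsplit(uri)
    warnings = []

    # A file extension ties the URI to a file format or server technology.
    extension = posixpath.splitext(parts.path)[1].lower()
    if extension in VOLATILE_EXTENSIONS:
        warnings.append("path ends in '%s'" % extension)

    # Path segments that name the server technology.
    segments = {segment.lower() for segment in parts.path.split("/") if segment}
    for segment in sorted(segments & TECHNOLOGY_SEGMENTS):
        warnings.append("path contains '%s'" % segment)

    # Query parameters that carry end-user-specific information.
    names = {name.lower() for name in parse_qs(parts.query)}
    for name in sorted(names & USER_SPECIFIC_PARAMS):
        warnings.append("query contains '%s'" % name)

    return warnings


if __name__ == "__main__":
    print(persistence_warnings("http://repository.example.org/cgi-bin/view.php?sid=42"))
    print(persistence_warnings("http://repository.example.org/id/1234"))
```

An empty list does not prove that a URI is persistent; persistence depends on the commitment of the organisation behind it rather than on the form of the URI alone.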


Identifiers for learning objects

For learning objects, the UK LOM Core recommends the use of URI identifiers. From their guidelines:

  • "implementers may choose from a range of persistent, globally unique identifier schemata which include, but are not restricted to, URI, URN, PURL, Handle, DOI, POI, ISSN, ISBN, XRI.
  • In order to facilitate interoperability within distributed environments it is recommended that the chosen scheme is encoded in the form of a URI." (UK Learning Object Metadata Core. Draft 0.2, May 2004 http://www.cetis.ac.uk/profiles/uklomcore)
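
As a rough illustration of encoding a chosen scheme in the form of a URI, the Python sketch below maps a few of the schemes named above onto URI forms. The prefixes used (urn:isbn:, urn:issn:, info:doi/ and info:hdl/) are one possible choice made for this example and are not prescribed by the UK LOM Core; the function name is likewise hypothetical.

```python
from urllib.parse import quote

# One possible set of URI forms (an assumption made for this sketch).
URI_PREFIXES = {
    "isbn": "urn:isbn:",
    "issn": "urn:issn:",
    "doi": "info:doi/",
    "handle": "info:hdl/",
}

# Schemes whose identifiers are already URIs and need no re-encoding.
ALREADY_URIS = {"uri", "url", "purl"}


def identifier_to_uri(scheme, value):
    """Express an identifier from a named scheme as a URI string."""
    scheme = scheme.lower()
    value = value.strip()
    if scheme in ALREADY_URIS:
        return value
    if scheme not in URI_PREFIXES:
        raise ValueError("no URI form known for scheme: " + scheme)
    # Percent-encode characters that are not safe in a URI, keeping '/'.
    return URI_PREFIXES[scheme] + quote(value, safe="/")


if __name__ == "__main__":
    print(identifier_to_uri("DOI", "10.1000/182"))
    print(identifier_to_uri("ISBN", "0-395-36341-1"))
```

A repository that prefers HTTP URIs could swap the prefixes for resolver addresses instead; the section on HTTP vs. non-HTTP URIs covers that debate.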

This document also recommends the following resources:

This set of guidelines looks at ARK, DOI, Handle, ISBN, ISSN, PURL, POI, SICI. They "do not reflect consensus within the elearning and IEEE LOM community".
A revised version of the above is also available, providing guidelines for Dublin Core metadata only (http://www.ukoln.ac.uk/metadata/dcmi/identifiers/).


CETIS (http://www.cetis.ac.uk) held an international meeting on identifiers in 2003:

Article about the meeting, with link to the final report.
Additional resources for the meeting, including
This discussion paper looks at issues and requirements for the identification of learning objects.


HTTP vs. non-HTTP URIs as identifiers

See:

Slides (http://indico.cern.ch/materialDisplay.py?contribId=7&sessionId=1&materialId=slides&confId=0514)

Persistent URIs

including the following presentation:


Schemes

Uniform Resource Identifier (URI) SCHEMES (http://www.iana.org/assignments/uri-schemes)
The official IANA Registry of URI Schemes, including the well-known ftp, http etc. and the new ‘info’ scheme.


INFO

info URI scheme (http://info-uri.info/)
"INFO URI solves problems with identifying information assets, including documents and terms from classification schemes. The scheme is a consistent and reliable way to represent and reference such standard identifiers as Library of Congress Control Numbers on the Web so that these identifiers can be "read" and understood by Web applications." (NISO press release http://www.niso.org/news/releases/pr-InfoURI-11-05.html)
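
An info URI consists of the scheme name, a namespace and an identifier within that namespace, as in info:lccn/2002022641 for a Library of Congress Control Number. The Python sketch below splits an info URI into those two parts; it is a minimal illustration under that assumption, not a full validator, and the function name is made up for this example.

```python
def parse_info_uri(uri):
    """Split an info URI into (namespace, identifier).

    Minimal sketch: checks only the overall 'info:namespace/identifier'
    shape and does not validate the namespace against the info registry.
    """
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "info":
        raise ValueError("not an info URI: " + uri)
    namespace, slash, identifier = rest.partition("/")
    if not slash or not namespace or not identifier:
        raise ValueError("expected info:namespace/identifier, got: " + uri)
    return namespace, identifier


if __name__ == "__main__":
    print(parse_info_uri("info:lccn/2002022641"))
```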


PURL/POI

A proposal for PURL-based resource identifiers, developed to provide a relatively persistent identifier for resources described by metadata in OAI-compliant repositories. POI takes advantage of the requirement to assign OAI identifiers to OAI items disclosed through OAI-PMH, using this oai-identifier as the basis for a POI.

A practical implementation of this can be seen in the POI – URL lookup tool developed for the RDN/LTSN interoperability project (http://www.rdn.ac.uk/poi/).
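
The derivation of a POI from an oai-identifier can be sketched in a few lines of Python. This assumes that an oai-identifier has the form oai:namespace-identifier:local-identifier and that the corresponding POI is http://purl.org/poi/namespace-identifier/local-identifier; please check both assumptions against the POI proposal before relying on them. The function name and the example identifier are hypothetical.

```python
# Assumed PURL prefix for POIs (check against the POI proposal).
POI_PREFIX = "http://purl.org/poi/"


def oai_identifier_to_poi(oai_identifier):
    """Derive a POI from an identifier of the form oai:namespace:local."""
    # Split on the first two colons only; the local part may contain colons.
    parts = oai_identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError("not an oai-identifier: " + oai_identifier)
    _, namespace, local = parts
    return POI_PREFIX + namespace + "/" + local


if __name__ == "__main__":
    print(oai_identifier_to_poi("oai:repository.example.org:12345"))
```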


DOI


Handle


USAGE


aDORe

Content Identifiers in the aDORe repository architecture are expressed as URIs. Digital Objects, or their constituent datastreams, may have identifiers associated with them before they are ingested into aDORe; for example, DOIs are often assigned to journal articles. For further information, see:

  • Herbert Van de Sompel, Jeroen Bekaert, Xiaoming Liu, Luda Balakireva and Thorsten Schwander. aDORe: A Modular, Standards-Based Digital Object Repository. The Computer Journal Advance Access published, 24 June 2005 (http://comjnl.oxfordjournals.org/cgi/rapidpdf/bxh114v1.pdf)

"This paper describes the aDORe repository architecture designed and implemented for ingesting, storing, and accessing a vast collection of Digital Objects at the Research Library of the Los Alamos National Laboratory. The aDORe architecture is highly modular and standards-based." (Abstract)


CORDRA

The CORDRA (Content Object Repository Discovery and Registration/Resolution Architecture) (http://cordra.lsal.cmu.edu/) uses the Handle system. CORDRA™ documents (http://cordra.lsal.cmu.edu/cordra/docs/) include the following:

  • Daniel R. Rehak. The Appropriate Version Problem: Separating Learning Designs and Course Structures from Learning Object Versions, Variants and Copies. Draft, Version V1.00.20050205
"This report outlines requirements for representing and specifying content versions and variants in learning design and course representations. Adapting the FRBR model (Functional Requirements for Bibliographic References), it presents a model used to represent content versions."
  • CORDRA™ Identifiers. Draft specification, Version: V1.00.20050101
Describes the representation of CORDRA identifiers as Handles.
  • Encoding CORDRA™ Identifiers in URI Syntax. Draft Specification/Profile, Version: V1.00.20050101
Describes the mechanism used in CORDRA to encode a Handle within URI syntax.

The documents page notes that the above documents will be deposited into the CORDRA document repository, at which time they will be assigned persistent identifiers.


eBank

eBank UK (http://www.ukoln.ac.uk/projects/ebank-uk/) is a project to investigate the issues surrounding provenance and the use and re-use of original data for research and learning purposes, and will result in the development of an eBank UK pilot service for the benefit of the HE and FE communities. eBank UK has selected DOIs.

Links to a selection of identifier schemes and identifier resolution services.


For further information about the functional entities for 'work', 'expression' and 'manifestation', see:

FRBR is used in the document by Daniel R. Rehak listed above (see the CORDRA information).

--JulieAllinson 13:19, 22 December 2005 (GMT)


Can you recommend workflow mapping software?

Workflows

Based on a question posed to the jisc-repositories@jiscmail.ac.uk list, January 2006.
