The Open Archives Initiative (OAI)
Recap of the main ideas of OAI
Basic functioning of OAI-PMH
OAI: general assumptions
There are two groups of 'participants': Data Providers and Service Providers.
Data Providers (open archives, repositories) provide free access to metadata, and may, but do not necessarily, offer free access to full texts or other resources. OAI-PMH provides an easy-to-implement, low-barrier solution for Data Providers.
Service Providers use the OAI interfaces of the Data Providers to harvest and store metadata. Note that this means that there are no live search requests to the Data Providers; rather, services are based on the data harvested via OAI-PMH. Service Providers may select certain subsets from Data Providers (e.g., by set hierarchy or date stamp). Service Providers offer (value-added) services on the basis of the metadata harvested, and they may enrich the harvested metadata in order to do so.
OAI-PMH: overview and structure model
The OAI-PMH protocol is based on HTTP. Request arguments are issued as GET or POST parameters. OAI-PMH supports six request types (known as "verbs"), e.g., http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2002-11-01.
Responses are encoded in XML syntax. OAI-PMH supports any metadata format encoded in XML. Dublin Core is the minimal format specified for basic interoperability.
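Protocol errors are reported as error elements within the XML response; HTTP status codes are used only for transport-level conditions such as redirects or an unavailable service.
Data Providers may define a logical set hierarchy to support levels of granularity for harvesting by Service Providers. Date stamps flag the last change of a metadata record, and thus provide further support for granularity of harvesting.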
OAI-PMH supports flow control.
Seven key definitions
Harvester:
client application issuing OAI-PMH requests
Repository:
network accessible server, able to process OAI-PMH requests correctly
Resource:
object the metadata is "about"; the nature of resources is not defined in the OAI-PMH; resources may be digital or non-digital
Item:
component of a repository from which metadata about a resource can be disseminated; has a unique identifier
Record:
metadata in a specific metadata format
Identifier:
unique key for an item in a repository
Set:
optional construct for grouping items in a repository
Protocol details
A record is the metadata of a resource in a specific format. A record has three parts: a header and metadata, both of which are mandatory, and an optional about statement. Each of these is made up of various components as set out below.
header (mandatory)
    identifier (mandatory: 1 only)
    datestamp (mandatory: 1 only)
    setSpec elements (optional: 0, 1 or more)
    status attribute for deleted item
metadata (mandatory)
    XML encoded metadata with root tag, namespace
    repositories must support Dublin Core, may support other formats
about (optional)
    rights statements
    provenance statements
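The record structure above can be illustrated with a minimal Python sketch that splits a record into its header fields, metadata root and about elements. The sample record, its identifier and the function name parse_record are invented for this illustration and do not come from a real repository.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical sample record, written for this illustration only.
SAMPLE_RECORD = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:1234</identifier>
    <datestamp>2002-11-01</datestamp>
    <setSpec>biology</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example title</dc:title>
    </oai_dc:dc>
  </metadata>
</record>
"""


def parse_record(xml_text):
    """Split a record into header fields, metadata root and about elements."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", OAI_NS)
    metadata = record.find("oai:metadata", OAI_NS)
    has_metadata = metadata is not None and len(metadata) > 0
    return {
        "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
        "set_specs": [e.text for e in header.findall("oai:setSpec", OAI_NS)],
        # A header with status="deleted" marks a deleted item.
        "deleted": header.get("status") == "deleted",
        # The first child of <metadata> is the root of the format-specific XML.
        "metadata_root": metadata[0] if has_metadata else None,
        "about": record.findall("oai:about", OAI_NS),
    }


record = parse_record(SAMPLE_RECORD)
print(record["identifier"], record["datestamp"], record["set_specs"])
```

The sketch treats the metadata part as opaque XML, since its content depends on the metadata format requested.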
A datestamp is the date of last modification of a metadata record. A datestamp is a mandatory characteristic of every record. It has two possible levels of granularity: YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ.
The function of the datestamp is to provide information on metadata that enables selective harvesting using from and until arguments. Its applications are in incremental update mechanisms. It gives either the date of creation, last modification, or deletion. Deletion is covered with three support levels: no, persistent, transient.
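As a rough illustration of the two granularities, the following Python sketch checks which form a datestamp uses and formats a time for use in from and until arguments. The function names granularity_of and format_datestamp are invented for this example.

```python
import re
from datetime import datetime, timezone

DAY = re.compile(r"\d{4}-\d{2}-\d{2}")
SECONDS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def granularity_of(stamp):
    """Return 'day' or 'seconds' for a valid datestamp, else raise ValueError."""
    if DAY.fullmatch(stamp):
        datetime.strptime(stamp, "%Y-%m-%d")  # rejects e.g. month 13
        return "day"
    if SECONDS.fullmatch(stamp):
        datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        return "seconds"
    raise ValueError(f"not an OAI-PMH datestamp: {stamp!r}")


def format_datestamp(moment, granularity="day"):
    """Format a timezone-aware datetime as a UTC datestamp."""
    utc = moment.astimezone(timezone.utc)
    fmt = "%Y-%m-%d" if granularity == "day" else "%Y-%m-%dT%H:%M:%SZ"
    return utc.strftime(fmt)


last_harvest = datetime(2002, 12, 1, 13, 45, 0, tzinfo=timezone.utc)
print(format_datestamp(last_harvest))             # day granularity
print(format_datestamp(last_harvest, "seconds"))  # seconds granularity
```

A harvester doing incremental updates would typically remember the time of its last harvest and pass it, at a granularity the repository supports, as the from argument of its next request.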
OAI-PMH supports dissemination of multiple metadata formats from a repository.
The properties of metadata formats are:
id string to specify the format (metadataPrefix)
metadata schema URL (XML schema to test validity)
XML namespace URI (global identifier for metadata format)
Repositories must be able to disseminate unqualified Dublin Core. Further arbitrary metadata formats can be defined and transported via the OAI-PMH. Any returned metadata must comply with an XML namespace specification. The Dublin Core Metadata Element Set contains 15 elements. All elements are optional, and all elements may be repeated.
The Dublin Core Metadata Element Set:
Title | Contributor | Source |
Creator | Date | Language |
Subject | Type | Relation |
Description | Format | Coverage |
Publisher | Identifier | Rights |
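The three properties of a metadata format, and the rule that every Dublin Core element is optional and repeatable, might be modelled in Python as sketched below. The class and function names are invented for this illustration; the oai_dc values shown are those of the standard unqualified Dublin Core format in OAI-PMH 2.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataFormat:
    metadata_prefix: str      # id string used in requests
    schema: str               # URL of the XML schema used to test validity
    metadata_namespace: str   # XML namespace URI identifying the format


OAI_DC = MetadataFormat(
    metadata_prefix="oai_dc",
    schema="http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
    metadata_namespace="http://www.openarchives.org/OAI/2.0/oai_dc/",
)

DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def unknown_dc_elements(fields):
    """Return the keys of `fields` that are not Dublin Core elements.

    `fields` maps element names to lists of values: every element is
    optional, and any element may be repeated.
    """
    return sorted(set(fields) - set(DC_ELEMENTS))


description = {"title": ["A title", "An alternative title"], "author": ["A. N. Other"]}
print(unknown_dc_elements(description))
```

Here "author" is flagged because the corresponding Dublin Core element is creator.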
Sets enable a logical partitioning of repositories. They are optional: archives do not have to define Sets. There are no recommendations for the implementation of Sets. Sets are not necessarily exhaustive of the content of a repository. They are not necessarily strictly hierarchical. It is important and necessary to have negotiated agreements within communities defining useful sets for the communities.
Requests must be submitted using the GET or POST methods of HTTP, and repositories must support both methods. At least one key=value pair: verb=RequestType (where RequestType is some type of request such as ListRecords) must be provided. Additional key=value pairs depend on the request type.
Example of a GET request: http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc
The encoding of special characters must be supported; for example, ":" (host port separator) becomes "%3A"
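A minimal Python sketch of building such a GET request is given below. It relies on urlencode from the standard library to encode special characters; the helper name build_request_url is invented for this illustration, and the base URL is the one used in the examples of this tutorial.

```python
from urllib.parse import urlencode


def build_request_url(base_url, verb, **arguments):
    """Build a GET request URL from a verb and further key=value pairs."""
    params = {"verb": verb}
    params.update({key: value for key, value in arguments.items() if value is not None})
    # urlencode percent-encodes special characters, e.g. ":" becomes "%3A".
    return f"{base_url}?{urlencode(params)}"


get_record_url = build_request_url(
    "http://archive.org/oai",
    "GetRecord",
    identifier="oai:HUBerlin.de:3000218",
    metadataPrefix="oai_dc",
)
print(get_record_url)

# "from" is a reserved word in Python, so it is passed via a dictionary.
list_records_url = build_request_url(
    "http://archive.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
    **{"from": "2002-11-01"},
)
print(list_records_url)
```

For a POST request the same key=value pairs would be sent in the request body rather than appended to the URL.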
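Responses are formatted as HTTP responses. The content type must be text/xml. HTTP status codes, as distinguished from OAI-PMH errors, such as 302 (redirect) and 503 (service not available), may be returned. Compression encodings are optional in OAI-PMH; only identity encoding is mandatory. The response format must be well-formed XML with markup as follows:
XML declaration
root element OAI-PMH, with namespace and schema location attributes
responseDate (the time of the response, in UTC)
request (the request that generated the response)
either one or more error elements, or an element named after the verb of the request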
Four of the request types return a list of entries. Three of them may reply with 'large' lists.
OAI-PMH supports partitioning. Those managing a repository make the decisions on partitioning: whether to partition and how.
The response to a request includes:
    incomplete list
    resumption token, optionally with expiration date, size of complete list, and cursor
For a new request with the same request type:
    resumption token as parameter
    all other parameters omitted
The response includes the next (which may be the last) section of the list and a resumption token. That resumption token is empty if the last section of the list is enclosed.
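The flow control described above can be sketched in Python as a loop that follows resumption tokens until an empty one is returned. The fetch argument is a hypothetical callable that sends a request and returns the XML response as text; here it is replaced by a stand-in that serves two invented partitions, so the sketch runs without a network connection.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_identifiers(fetch, **first_request_args):
    """Yield identifiers from every partition of a ListIdentifiers response."""
    args = dict(first_request_args)
    while True:
        root = ET.fromstring(fetch("ListIdentifiers", args))
        for identifier in root.iterfind(".//oai:header/oai:identifier", NS):
            yield identifier.text
        token = root.find(".//oai:resumptionToken", NS)
        # No token, or an empty token, means the list is complete.
        if token is None or not (token.text or "").strip():
            return
        # The follow-up request carries the token and no other parameters.
        args = {"resumptionToken": token.text.strip()}


def _page(identifiers, token):
    """Build a stripped-down, hypothetical response for the stand-in below."""
    headers = "".join(
        f"<header><identifier>{i}</identifier></header>" for i in identifiers
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        f"{headers}<resumptionToken>{token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    )


requests_seen = []


def fake_fetch(verb, args):
    """Stand-in for an HTTP request: two partitions, the second one final."""
    requests_seen.append(dict(args))
    if "resumptionToken" in args:
        return _page(["oai:example.org:3"], "")
    return _page(["oai:example.org:1", "oai:example.org:2"], "token-1")


harvested = list(harvest_identifiers(fake_fetch, metadataPrefix="oai_dc"))
print(harvested)
```

A real harvester would also need to handle expired tokens (badResumptionToken) and HTTP 503 responses, which this sketch leaves out.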
Repositories must indicate OAI-PMH errors by the inclusion of one or more error elements. The defined error identifiers are:
badArgument
badResumptionToken
badVerb
cannotDisseminateFormat
idDoesNotExist
noRecordsMatch
noMetadataFormats
noSetHierarchy
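A harvester has to look for error elements in the XML rather than rely on the HTTP status alone. The Python sketch below extracts them from a response; the sample response and the function name find_errors are invented for this illustration.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

ERROR_CODES = frozenset({
    "badArgument", "badResumptionToken", "badVerb", "cannotDisseminateFormat",
    "idDoesNotExist", "noRecordsMatch", "noMetadataFormats", "noSetHierarchy",
})

# Hypothetical error response, written for this illustration only.
SAMPLE_ERROR_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-12-01T13:45:00Z</responseDate>
  <request>http://archive.org/oai</request>
  <error code="badVerb">Illegal OAI verb</error>
</OAI-PMH>
"""


def find_errors(xml_text):
    """Return a list of (code, message) pairs, empty if there are no errors."""
    root = ET.fromstring(xml_text)
    return [
        (error.get("code"), (error.text or "").strip())
        for error in root.findall("oai:error", NS)
    ]


errors = find_errors(SAMPLE_ERROR_RESPONSE)
for code, message in errors:
    known = "defined" if code in ERROR_CODES else "unknown"
    print(f"{code} ({known}): {message}")
```

Because a response may carry more than one error element, the function returns a list rather than a single code.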
Request types
There are six different request types:
Identify
ListMetadataFormats
ListSets
ListIdentifiers
ListRecords
GetRecord
A harvester is not required to use all types. However, a repository must implement all types. There are required and optional arguments, depending on request types. Each request type will now be described.
Identify
function
description of an archive
example
archive.org/oai-script?verb=Identify
parameters
none
errors / exceptions
badArgument (e.g. archive.org/oai-script?verb=Identify&set=biology)
response format
Element | Example | Ordinality |
repositoryName | My Archive | 1 |
baseURL | http://archive.org/oai | 1 |
protocolVersion | 2.0 | 1 |
earliestDatestamp | 1999-01-01 | 1 |
deletedRecord | no, transient, persistent | 1 |
granularity | YYYY-MM-DD, YYYY-MM-DDThh:mm:ssZ | 1 |
adminEmail | oai-admin@archive.org | + |
compression | deflate, compress | * |
description | oai-identifier, eprints, friends | * |
Ordinality: 1 = mandatory, 1 only; + = mandatory, 1 or more; * = optional, 0 or more
ListMetadataFormats
function
retrieve available metadata formats from archive
example
archive.org/oai-script?verb=ListMetadataFormats&identifier=oai:HUBerlin.de:3000218
parameters
identifier (optional)
errors / exceptions
badArgument
idDoesNotExist
e.g. archive.org/oai-script?verb=ListMetadataFormats&identifier=really-wrong-identifier
noMetadataFormats
ListSets
function
retrieve set structure of a repository
example
archive.org/oai-script?verb=ListSets
parameters
resumptionToken (exclusive)
errors / exceptions
badArgument
badResumptionToken
e.g. archive.org/oai-script?verb=ListSets&resumptionToken=any-wrong-token
noSetHierarchy
ListIdentifiers
function
abbreviated form of ListRecords, retrieving only headers
example
archive.org/oai-script?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2002-12-01
parameters
from (optional)
until (optional)
metadataPrefix (required)
set (optional)
resumptionToken (exclusive)
errors / exceptions
badArgument (e.g. ?&from=2002-12-01-13:45:00)
badResumptionToken
cannotDisseminateFormat
noRecordsMatch
noSetHierarchy
ListRecords
function
harvest records from a repository
example
archive.org/oai-script?verb=ListRecords&metadataPrefix=oai_dc&set=biology
parameters
from (optional)
until (optional)
metadataPrefix (required)
set (optional)
resumptionToken (exclusive)
errors / exceptions
badArgument
badResumptionToken
cannotDisseminateFormat
noRecordsMatch
noSetHierarchy
GetRecord
function
retrieve individual metadata record from a repository
example
archive.org/oai-script?verb=GetRecord&identifier=oai:HUBerlin.de:3000218&metadataPrefix=oai_dc
parameters
identifier (required)
metadataPrefix (required)
errors / exceptions
badArgument
cannotDisseminateFormat
idDoesNotExist
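The argument rules for the six request types can be collected into a single table. The Python sketch below shows how a repository might check an incoming request against it; the table layout and the function name check_request are invented for this illustration, and only the badVerb and badArgument conditions are covered.

```python
# verb -> (required arguments, optional arguments, accepts resumptionToken)
VERB_ARGUMENTS = {
    "Identify": (set(), set(), False),
    "ListMetadataFormats": (set(), {"identifier"}, False),
    "ListSets": (set(), set(), True),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}, True),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}, True),
    "GetRecord": ({"identifier", "metadataPrefix"}, set(), False),
}


def check_request(verb, args):
    """Return None if the arguments are legal, else an error identifier."""
    if verb not in VERB_ARGUMENTS:
        return "badVerb"
    required, optional, resumable = VERB_ARGUMENTS[verb]
    names = set(args)
    if "resumptionToken" in names:
        # Exclusive argument: it must be the only one besides the verb.
        if resumable and names == {"resumptionToken"}:
            return None
        return "badArgument"
    if not required <= names:
        return "badArgument"  # a required argument is missing
    if names - required - optional:
        return "badArgument"  # an argument the verb does not accept
    return None


print(check_request("Identify", {"set": "biology"}))
print(check_request("ListRecords", {"metadataPrefix": "oai_dc", "set": "biology"}))
```

The first call reproduces the badArgument example given for Identify above; the second is the legal ListRecords example. Checks on argument values, such as malformed dates or unknown identifiers, would come after this step.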
Example 1: response to ListIdentifiers request
This example shows the response to a ListIdentifiers request that specifies a date range, a metadata format, a set, and a Data Provider.
Example 2: response to GetRecord request
This example shows a response to a GetRecord request for an individual record specified by identifier.
Sources of further information
-- Web sites and email lists --
Open Archives Initiative (OAI) official site
http://www.openarchives.org/
OAI-PMH protocol specification
http://www.openarchives.org/OAI/openarchivesprotocol.html
OAI general mailing list
http://www.openarchives.org/mailman/listinfo/OAI-general/
OA-Forum expert reports and reviews of organisational and technical issues
Links from http://www.oaforum.org/documents/
Dublin Core
http://dublincore.org/
Copyright © 2003 University of Bath. All rights reserved.
Author: Leona Carpenter (co-ordinating author) for OA-Forum and UKOLN
Last modified: 14 Oct 2003 16:36. Authored in CALnet.