Basics of XML schemas for OAI-PMH
OAI-PMH uses XML Schemas to define record formats. You can exchange any metadata you like using OAI-PMH as long as you can encode it as XML and define an XML Schema for it. OAI-PMH mandates the oai_dc schema as a minimum standard for interoperability.
OAI-PMH documentation also describes the use of XML schema for other formats, and provides additional XML schemas for:
Closer look at oai_dc, the mandated XML schema for OAI-PMH
oai_dc is the simple metadata schema (based on unqualified Dublin Core) used as the mandatory "Lowest Common Denominator" metadata record format in OAI-PMH. It defines a container schema that is OAI-specific, and is hosted on the OAI Web site. It imports a generic DCMES (DC Metadata Element Set) schema. The generic DCMES schema is hosted on the DCMI (Dublin Core Metadata Initiative) Web site.
The same model could be used for a qualified Dublin Core schema; that is, a container schema hosted by OAI and referencing the generic schema hosted by DCMI.
oai_dc: an example from a record
This is an example oai_dc record, as viewed via the Repository Explorer, showing the beginning of a full GetRecord response.
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2003-03-15T16:16:51+01:00</responseDate>
<request verb="GetRecord" metadataPrefix="oai_dc"
identifier="oai:HUBerlin.de:3000476">http://edoc.hu-berlin.de/OAI-2.0</request>
<GetRecord>
<record>
<header>
<identifier>oai:HUBerlin.de:3000476</identifier>
<datestamp>1997-07-18</datestamp>
<setSpec>pub-type</setSpec>
</header>
<metadata>
<oai_dc:dc
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Melanchthon in seiner Zeit. In: Philipp Melanchthon
1497-1997</dc:title>
<dc:creator>Selge, Kurt-Victor</dc:creator>
...
Three important things to notice in the record above:
The namespace for the oai_dc format:
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
The namespace for DCMES elements:
xmlns:dc="http://purl.org/dc/elements/1.1/"
The container schema associated with the oai_dc namespace:
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
Thus, the oai_dc container schema for the http://www.openarchives.org/OAI/2.0/oai_dc/ namespace imports the DCMES schema from http://dublincore.org/schemas/xmls/simpledc20021212.xsd. It also defines a container element called 'dc' and lists the elements (from the DCMES namespace / schema) that are allowed inside that container in oai_dc.
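To see how the two namespaces work together in practice, here is a minimal Python sketch that reads DCMES elements out of an oai_dc container. It uses only the standard library. The sample is a trimmed and closed copy of the metadata part of the example record, and the helper name dc_values is our own invention for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the example record above.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed copy of the <metadata> content, closed here so that it parses.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Melanchthon in seiner Zeit. In: Philipp Melanchthon 1497-1997</dc:title>
  <dc:creator>Selge, Kurt-Victor</dc:creator>
</oai_dc:dc>"""


def dc_values(xml_text, element):
    """Return the text of every dc:<element> child of an oai_dc:dc container."""
    root = ET.fromstring(xml_text)
    # The container element lives in the oai_dc namespace ...
    if root.tag != "{%s}dc" % NS["oai_dc"]:
        raise ValueError("not an oai_dc container: %s" % root.tag)
    # ... while its children live in the DCMES namespace.
    return [child.text for child in root.findall("dc:" + element, NS)]


titles = dc_values(SAMPLE, "title")
creators = dc_values(SAMPLE, "creator")
print(titles)
print(creators)
```

Note that the lookup depends on the namespace URIs, not on the prefixes: a record that bound the same URIs to different prefixes would be read in exactly the same way.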
Other metadata schemas may be used
oai_dc is a simple format providing baseline interoperability. There are a number of reasons why it may not be suitable for your repository, service or community to share only oai_dc.
Adding new elements when oai_dc is not enough
Creating a new schema by extending the oai_dc schema to add new elements involves the following tasks:
Next, we'll use a simple scenario to demonstrate these eight tasks step-by-step. Suppose we have a test repository containing some photos:
http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/oai/nph-oai2.cgi
Currently the repository has metadata using oai_dc. We want to add an "Equipment Used" element, as this is not part of the DCMES.
The new metadata format needs a name. In this case, we'll choose the name "wp_dc", following OAI's naming of "oai_dc" as a convention. (The two-letter code, 'wp', is short for 'workshop photos'.) However, the name could be anything you like. Alternative possibilities here would be, for example, wpdc or WP.
We need two namespaces:
Namespaces are declared as URIs. We will use:
Note that the use of PURL for the elements namespace follows DCMI usage, but is not mandatory. However, both these namespace URIs should be under your control to ensure uniqueness and prevent re-use in the future. Namespace URIs do not need to resolve to anything.
Next, we must create an XML schema for the new term. We will do this at:
http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/20030818/wpterms.xsd
Notice the datestamp built in to the directory structure. This makes it easier to enhance the schema without breaking things that use the old one.
The schema for the new term defines the new element "equipmentUsed" and adds it to the dc:any group. It also defines a new container type "wpterms:elementContainer".
We must also create a container schema for the wp_dc record format. In this case the schema is available at:
http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/20030818/wp_dc.xsd
(Note again the use of a date stamp incorporated in the directory structure.) This simply imports the wpterms schema and then defines a container element 'wp_dc' of type wpterms:elementContainer.
In order to validate the records using our new schema, we next create some test records (or modify our existing ones) including all the elements we want to use. For ease of managing our validation process, we put these in a datestamped directory and use a meaningful file naming convention, such as
http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/oai/wp_dc/20030818/test.xml
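As a rough illustration, a test record could be generated with a few lines of Python. This is only a sketch under stated assumptions: the container namespace and schema location are the ones used elsewhere in this tutorial, but WPTERMS_NS is a hypothetical placeholder for the namespace of the new term, and the function name and sample values are invented. Whether the output validates depends on the actual schemas.

```python
import xml.etree.ElementTree as ET

# Container namespace and schema location as used elsewhere in this tutorial.
WP_DC_NS = "http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/"
WP_DC_SCHEMA = ("http://www.ukoln.ac.uk/metadata/oa-forum/"
                "workshop-photos/wp_dc/20030818/wp_dc.xsd")
# ASSUMPTION: hypothetical placeholder for the namespace of the new term.
WPTERMS_NS = "http://example.org/wpterms/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Ask ElementTree to use readable prefixes when serialising.
for prefix, uri in (("wp_dc", WP_DC_NS), ("wpterms", WPTERMS_NS),
                    ("dc", DC_NS), ("xsi", XSI_NS)):
    ET.register_namespace(prefix, uri)


def build_test_record(title, equipment):
    """Build a wp_dc container holding one DCMES element and the new element."""
    root = ET.Element("{%s}wp_dc" % WP_DC_NS)
    # Point validators at the container schema for the wp_dc namespace.
    root.set("{%s}schemaLocation" % XSI_NS, WP_DC_NS + " " + WP_DC_SCHEMA)
    ET.SubElement(root, "{%s}title" % DC_NS).text = title
    ET.SubElement(root, "{%s}equipmentUsed" % WPTERMS_NS).text = equipment
    return ET.tostring(root, encoding="unicode")


# Invented sample values, for illustration only.
record_xml = build_test_record("Workshop group photo", "Digital camera")
print(record_xml)
```

The string returned can be saved as a test file and submitted to a schema validator.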
Now we can validate the records and schema with the XML schema validator at
http://www.w3.org/2001/03/webdata/xsv/
The OAI-PMH verb ListMetadataFormats needs to be aware of the new format. Therefore, we need to modify our repository software (source code and/or configuration files) to support the new metadata format. We do this by adding information about the new format to our repository's response to the 'ListMetadataFormats' request. For example:
...
<metadataFormat>
<metadataPrefix>wp_dc</metadataPrefix>
<schema>http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/20030818/wp_dc.xsd</schema>
<metadataNamespace>http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/</metadataNamespace>
</metadataFormat>
...
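Once the response has been changed, a quick check that the new format really is advertised can be scripted. The following Python sketch parses a ListMetadataFormats response and collects the formats it lists. The sample response is abbreviated (responseDate and request are omitted), and the function name advertised_formats is our own.

```python
import xml.etree.ElementTree as ET

# All OAI-PMH response elements are in this namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Abbreviated sample response, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>wp_dc</metadataPrefix>
      <schema>http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/20030818/wp_dc.xsd</schema>
      <metadataNamespace>http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def advertised_formats(response_xml):
    """Map each metadataPrefix to its (schema, metadataNamespace) pair."""
    root = ET.fromstring(response_xml)
    formats = {}
    for fmt in root.iterfind(".//oai:metadataFormat", OAI_NS):
        prefix = fmt.findtext("oai:metadataPrefix", namespaces=OAI_NS)
        formats[prefix] = (
            fmt.findtext("oai:schema", namespaces=OAI_NS),
            fmt.findtext("oai:metadataNamespace", namespaces=OAI_NS),
        )
    return formats


formats = advertised_formats(SAMPLE_RESPONSE)
print(sorted(formats))
```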
We also need to ensure that the "wp_dc" format is available through the verbs that take a 'metadataPrefix' argument (ListIdentifiers, ListRecords and GetRecord). To do this, we must modify our repository's response to these verbs so that they accept 'metadataPrefix' set to the new format name "wp_dc". The repository will then return records formatted according to the new schema whenever a Service Provider requests that format.
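How this is done depends entirely on your repository software. As a sketch only, the check on the 'metadataPrefix' argument might look like the Python below. The function and variable names are hypothetical; the error codes badArgument and cannotDisseminateFormat are the ones defined by OAI-PMH 2.0.

```python
# Formats this hypothetical repository can disseminate.
SUPPORTED_PREFIXES = {"oai_dc", "wp_dc"}

# OAI-PMH verbs that take a metadataPrefix argument.
VERBS_WITH_PREFIX = {"GetRecord", "ListIdentifiers", "ListRecords"}


def check_metadata_prefix(verb, args):
    """Return None if the request is acceptable, else (error code, message)."""
    if verb not in VERBS_WITH_PREFIX:
        return None
    # A list request resumed with a resumptionToken carries no other arguments.
    if verb != "GetRecord" and "resumptionToken" in args:
        return None
    prefix = args.get("metadataPrefix")
    if prefix is None:
        return ("badArgument", "metadataPrefix is required for " + verb)
    if prefix not in SUPPORTED_PREFIXES:
        return ("cannotDisseminateFormat",
                "format '%s' is not supported by this repository" % prefix)
    return None


print(check_metadata_prefix("ListRecords", {"metadataPrefix": "wp_dc"}))
print(check_metadata_prefix("ListRecords", {"metadataPrefix": "marc"}))
```

Adding a format then amounts to extending the set of supported prefixes and supplying code that serialises records in that format.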
Finally, we use the Repository Explorer to test the new format. To do so, enter the following URL of the repository's OAI interface:
http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/oai/nph-oai2.cgi
We must test to ensure that:
Once all these conditions are met, we have a new format!
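If you prefer to issue the test requests yourself rather than through the Repository Explorer, the request URLs are easy to construct. The Python sketch below builds them for the repository above; the helper name oai_request_url is our own, and no request is actually sent. A GetRecord request would also need an 'identifier' argument taken from your own repository.

```python
from urllib.parse import urlencode

# Base URL of the test repository used in this tutorial.
BASE_URL = ("http://www.ukoln.ac.uk/metadata/oa-forum/"
            "workshop-photos/oai/nph-oai2.cgi")


def oai_request_url(verb, **args):
    """Build an OAI-PMH GET request URL: the verb first, then other arguments."""
    params = [("verb", verb)] + sorted(args.items())
    return BASE_URL + "?" + urlencode(params)


urls = [
    oai_request_url("ListMetadataFormats"),
    oai_request_url("ListIdentifiers", metadataPrefix="wp_dc"),
    oai_request_url("ListRecords", metadataPrefix="wp_dc"),
]
for url in urls:
    print(url)
```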
When you want to use another metadata format
You can take a similar approach with other metadata record formats. In the case of IMS/IEEE LOM and ODRL, XML schemas and namespaces have already been agreed. Deployment of these formats should be easier because you don't need to define your own schemas. However, at the time of preparing this tutorial the XML Schema specifications are still being revised, so it is sometimes difficult for applications such as IMS to keep up with the changes.
Implementing an existing format
To implement an existing metadata format, modify the 'ListMetadataFormats' response to include the format you wish to support. For example, for IMS:
...
<metadataFormat>
<metadataPrefix>ims</metadataPrefix>
<schema>http://www.imsglobal.org/xsd/imsmd_v1p2p2.xsd</schema>
<metadataNamespace>http://www.imsglobal.org/xsd/imsmd_v1p2</metadataNamespace>
</metadataFormat>
...
Extend the other verbs (ListIdentifiers, ListRecords and GetRecord requests) to accept the 'metadataPrefix' set to 'ims' and return records formatted appropriately.
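One way to keep the ListMetadataFormats response in step with the formats a repository supports is to generate it from a single table of formats. The Python sketch below illustrates the idea and is not taken from any particular repository implementation; the names METADATA_FORMATS and list_metadata_formats_body are hypothetical, while the URIs are those given in this tutorial.

```python
from xml.sax.saxutils import escape

# (metadataPrefix, schema URL, namespace URI) for each supported format.
METADATA_FORMATS = [
    ("oai_dc",
     "http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
     "http://www.openarchives.org/OAI/2.0/oai_dc/"),
    ("ims",
     "http://www.imsglobal.org/xsd/imsmd_v1p2p2.xsd",
     "http://www.imsglobal.org/xsd/imsmd_v1p2"),
]


def list_metadata_formats_body(formats):
    """Render the <ListMetadataFormats> part of a response as a string."""
    lines = ["<ListMetadataFormats>"]
    for prefix, schema, namespace in formats:
        lines += [
            " <metadataFormat>",
            "  <metadataPrefix>%s</metadataPrefix>" % escape(prefix),
            "  <schema>%s</schema>" % escape(schema),
            "  <metadataNamespace>%s</metadataNamespace>" % escape(namespace),
            " </metadataFormat>",
        ]
    lines.append("</ListMetadataFormats>")
    return "\n".join(lines)


body = list_metadata_formats_body(METADATA_FORMATS)
print(body)
```

The same table can then drive the check on 'metadataPrefix' in the other verbs, so that a format is never advertised without being accepted, or accepted without being advertised.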
Summary
OAI-PMH allows for any metadata format, so long as it is encoded in XML with an XML Schema. All repositories must support oai_dc for a minimum level of interoperability. If oai_dc does not have enough elements, you can extend it. If oai_dc is not precise enough, a qualified Dublin Core schema can be used. If oai_dc is not the right schema for your community or purpose, then use something else as well.
Seven key definitions
PURL
A PURL is a Persistent Uniform Resource Locator. Functionally a PURL is a URL.
However, instead of pointing directly to the location of an Internet resource,
a PURL points to an intermediate resolution service. The PURL resolution service
associates the PURL with the actual URL and returns that URL to the client.
The client can then complete the URL transaction in the normal fashion. In Web
parlance, this is a standard HTTP redirect.
(Definition quoted from PURL
at http://purl.org)
URI
URI is the acronym for Uniform Resource Identifier. URIs are strings that
identify things on the Web. URIs are sometimes informally called URLs (Uniform
Resource Locators), although URLs are more limited than URIs. URIs are used
in a number of schemes, including the HTTP and FTP URI schemes.
XML namespace
An XML namespace is a collection of names, identified by a URI reference [RFC2396],
which are used in XML documents as element types and attribute names. XML namespaces
differ from the "namespaces" conventionally used in computing disciplines in
that the XML version has internal structure and is not, mathematically speaking,
a set.
(Definition quoted from W3C, Namespaces
in XML, at http://www.w3.org/TR/REC-xml-names/)
XML schemas
XML Schemas express shared vocabularies and allow machines to carry out rules
made by people. They provide a means for defining the structure, content and
semantics of XML documents.
(Definition quoted from W3C
Architecture Domain, XML Schema, at http://www.w3.org/XML/Schema)
container
Containers are places in OAI-PMH responses where XML complying with any external
schema may be supplied. Containers are provided for extensibility and for community
specific enhancements. The OAI Implementation Guidelines lists the existing
optional containers and provides links to existing schemas.
DCMI (Dublin Core Metadata Initiative)
The Dublin Core Metadata Initiative is an open forum engaged in the development
of interoperable online metadata standards that support a broad range of purposes
and business models. DCMI's activities include consensus-driven working groups,
global workshops, conferences, standards liaison, and educational efforts to
promote widespread acceptance of metadata standards and practices.
(Definition quoted from Dublin Core Metadata
Initiative at http://dublincore.org/)
DCMES (Dublin Core Metadata Element Set)
The Dublin Core metadata element set is a standard for cross-domain information
resource description. Here an information resource is defined to be "anything
that has identity". This is the definition used in Internet RFC 2396, "Uniform
Resource Identifiers (URI): Generic Syntax", by Tim Berners-Lee et al. There
are no fundamental restrictions to the types of resources to which Dublin Core
metadata can be assigned.
(Definition quoted from Dublin
Core Metadata Initiative, Dublin Core Metadata Element Set, Version 1.1:
Reference Description, at http://dublincore.org/documents/dces/)
Sources of further information
Dublin Core official site
http://dublincore.org/
DCMI term declarations represented in XML schema language
http://dublincore.org/schemas/xmls/
Guidelines for implementing Dublin Core in XML
http://dublincore.org/documents/dc-xml-guidelines/
W3 Schools XML tutorials include, among others, the following:
W3 Schools XML tutorial
http://www.w3schools.com/xml/
W3 Schools XML Schema Tutorial
http://www.w3schools.com/schema/
OAI official site
http://www.openarchives.org/
OAI-PMH protocol specification
http://www.openarchives.org/OAI/openarchivesprotocol.html
OAI general mailing list
http://www.openarchives.org/mailman/listinfo/OAI-general/
OAI implementers mailing list
http://www.openarchives.org/mailman/listinfo/OAI-implementers/
Copyright © 2003 University of Bath. All rights reserved.
Author: Leona Carpenter (co-ordinating author) for OA-Forum and UKOLN
Last modified: 14 Oct 2003 16:36. Authored in CALnet.