ROADS for Web Site Metadata Management
[This document is a part of DC-ROADS: ROADS as a (Dublin Core) Metadata Management Environment]
ROADS is typically used to store third-party metadata maintained by subject gateways. This document describes a technique that enables information providers to use a ROADS database as the primary source of metadata for their resources.
Metadata is stored in a ROADS database. As each web page is requested, a server side include (SSI) directive retrieves the appropriate record from the ROADS database and converts it to RDF/XML (RDF metadata encoded in XML), which is then embedded in the HEAD of the document.
Storing the metadata for a web site in a ROADS database has the following advantages:
Ease of Maintenance - Creating and maintaining embedded metadata is an error-prone and time-consuming task. Using a ROADS database to store metadata allows the ROADS template editor (or other ROADS-compliant tools) to be used to generate metadata.
Metadata for non-HTML Resources - The traditional approach of embedding metadata in web pages does not support the storage of metadata for non-HTML resources such as Microsoft Word and PDF documents and images. A ROADS database allows metadata to be associated with any resource.
Flexibility of Presentation - Rather than storing metadata in a single fixed format, a (Perl) script can be used to retrieve the metadata and convert it to a format suitable for a particular purpose. For example, it can produce RDF/XML or META tags for embedding in web pages, or HTML for a human-readable view.
Search Engine - If metadata is already stored in a ROADS database then a search engine can be added to a web site with relatively little effort using the ROADS toolkit.
Subject Gateway Interoperability - Subject gateways (or simply other related sites) using ROADS or other compatible technologies can cross-search your site with little effort.
Storing metadata within a ROADS database is straightforward; the difficulty lies in associating the metadata with web pages and in dynamically extracting, converting and embedding it. This document describes a technique that uses a server side include (SSI) directive embedded in each web page to call a Perl script, which dynamically retrieves the appropriate record from the ROADS database and converts it to RDF/XML (or other formats if requested).
The eLib web site hosted by UKOLN provides a demonstration of the approach described in this document.
The approach described here requires an installation of the ROADS software. See [1] and [2] for details.
You may choose to use the standard ROADS document template to store your metadata. In the demonstration system (the eLib web site) a Dublin Core template type was used. Creating a new template type is not recommended unless you have a very good reason for doing so.
If you wish to convert an existing web site to this approach, you will need to populate a ROADS database with the appropriate metadata. There are a number of approaches for doing this, including:
Automatically convert existing metadata. Obviously this will depend on the format of the existing data. It is likely to be embedded in web pages; if so, it should be possible to extract it and add it to a ROADS database automatically. Discussion of such approaches is beyond the scope of this document.
Automatically generate metadata. If there is no preexisting metadata for your web site you may consider using or writing tools to extract it automatically. Metadata harvesters exist for this purpose.
Manually enter metadata. This can be done using the ROADS template editor.
It may be that the best approach is a combination of all three techniques, using existing metadata where it exists, generating metadata where it does not, and finally checking and enhancing metadata manually. This three-pronged approach was used to generate metadata for the eLib web site.
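For the first technique, where existing metadata is embedded in web pages as Dublin Core META tags, extraction can be quite simple. The following is a minimal Python sketch (the ROADS tools themselves are Perl), assuming tags of the form `<meta name="DC.Title" content="...">`; the class and function names are hypothetical. It collects the values into a dictionary mapping each element name to a list of values, which could then be checked and written out as a ROADS template.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collect <meta name="DC.x" content="..."> pairs into {element: [values]}."""

    def __init__(self, prefix="DC."):
        super().__init__()
        self.prefix = prefix.lower()  # compared case-insensitively
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and unescapes values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        content = attrs.get("content")
        if content is None or not name.lower().startswith(self.prefix):
            return
        element = name[len(self.prefix):]  # e.g. "DC.Title" -> "Title"
        self.metadata.setdefault(element, []).append(content)


def extract_dc_meta(html_text):
    """Return the Dublin Core META tag values found in an HTML document."""
    parser = MetaTagExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.metadata
```

Self-closing tags such as `<meta ... />` are also handled, because `HTMLParser` routes them through `handle_starttag` by default.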
DC-dot [3] will extract metadata from existing resources where it exists and will also generate metadata; the resulting metadata can then be edited and saved in a number of formats including as a ROADS DOCUMENT template. DC-dot could be used to interactively generate and edit metadata for each web page on an existing site.
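Whatever tool produces the metadata, the end product is a ROADS template. A minimal sketch of writing one follows, assuming the IAFA-style `Attribute: value` layout with `Template-Type` and `Handle` first, and assuming repeated values are numbered with `-v1`, `-v2` variant suffixes, as in ROADS `URI-v1` fields. The function name is hypothetical, and the sketch collapses internal whitespace rather than emitting continuation lines.

```python
def to_template(template_type, handle, metadata):
    """Serialise {attribute: [values]} as an IAFA-style ROADS template string."""
    lines = ["Template-Type: %s" % template_type, "Handle: %s" % handle]
    for attr, values in metadata.items():
        # Collapse newlines and runs of spaces: this sketch never emits continuation lines.
        cleaned = [" ".join(v.split()) for v in values]
        if len(cleaned) == 1:
            lines.append("%s: %s" % (attr, cleaned[0]))
        else:
            # Assumed ROADS convention: number repeated values as Attr-v1, Attr-v2, ...
            for i, value in enumerate(cleaned, start=1):
                lines.append("%s-v%d: %s" % (attr, i, value))
    return "\n".join(lines) + "\n"
```

For example, `to_template("DOCUMENT", "123", {"Title": ["Example page"]})` yields a three-line template ending in `Title: Example page`.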
Obviously metadata will need to be maintained and new metadata will need to be added as the site evolves; a strategy for this process should also be put in place.
Next it is necessary to decide how to associate the appropriate ROADS record with each web page; the script called to embed the metadata (see below) must have enough information to be able to extract the correct metadata from the ROADS database.
Note that, for technical reasons, it is not possible to retrieve ROADS records by URI in current versions of ROADS.
The initial solution chosen for the eLib web site was to embed ROADS handles into web pages using a server side include variable. This value could then be passed to the script that extracts the metadata from the ROADS database.
Although this solution works it does require ROADS handles to be inserted into every web page. This can be automated but it is still not ideal.
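Automating the handle insertion can be as simple as adding an Apache-style `<!--#set var="..." value="..." -->` directive just after each page's opening `<head>` tag. The sketch below assumes a hypothetical variable name, `ROADS_HANDLE`, and that the handle for each file is already known (for example from a filename-to-handle mapping); it is not part of the ROADS distribution.

```python
import re

# Matches <head> or <head ...> in any case, but not <header>.
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def insert_handle_directive(html_text, handle, var="ROADS_HANDLE"):
    """Insert an SSI #set directive carrying the ROADS handle after <head>."""
    directive = '<!--#set var="%s" value="%s" -->' % (var, handle)
    if directive in html_text:
        return html_text  # this exact directive is already present; do nothing
    match = HEAD_OPEN.search(html_text)
    if match is None:
        raise ValueError("no <head> element found")
    pos = match.end()
    return html_text[:pos] + "\n" + directive + html_text[pos:]
```

Because repeated runs leave an already-tagged page unchanged, the script can be re-run safely across a whole site.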
A separate document [4] describes how to extend ROADS so that it is possible to retrieve records by URI. With this mechanism in place it is possible to extract the URI of the current document from the environment and pass this to the script that extracts metadata. There is no need to embed any additional information in each web page.
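With URI lookup available, the metadata script only has to work out which document it is describing. A sketch of that decision, assuming a CGI-style environment: an explicit `handle` query parameter (a hypothetical name) is preferred; otherwise a URI is rebuilt from `SERVER_NAME`, `SERVER_PORT` and `DOCUMENT_URI`. Whether the server passes the including page's `DOCUMENT_URI` to an included script depends on its configuration, so treat that as an assumption to check.

```python
from urllib.parse import parse_qs


def resolve_identifier(environ):
    """Return ("handle", h) or ("uri", u) identifying the page being described."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    if "handle" in query:
        # Hypothetical parameter name for the handle-based approach.
        return ("handle", query["handle"][0])
    path = environ.get("DOCUMENT_URI")
    host = environ.get("SERVER_NAME")
    if path and host:
        port = environ.get("SERVER_PORT", "80")
        netloc = host if port == "80" else "%s:%s" % (host, port)
        return ("uri", "http://%s%s" % (netloc, path))
    raise LookupError("no handle or document URI available")
```

In a CGI script this would typically be called as `resolve_identifier(os.environ)`.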
The mechanism for extracting metadata depends on the identifier (handle or URI) made available in the previous step. Since there are alternatives, a common goal was defined: whichever mechanism is used, the metadata is turned into a Metadata::Base object, which is then manipulated in the next step.
If the identifier is a handle and the ROADS database containing the metadata has a correspondence between filenames and handles, the simplest approach is to read the metadata directly from the file. This can be done using the Metadata::IAFA Perl module [5] to create a metadata object.
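Metadata::IAFA is a Perl module and its API is not reproduced here. As a rough Python analogue, the sketch below parses a single IAFA-style template into its type, handle and a dictionary of attribute values, with the dictionary standing in for a Metadata::Base object. It assumes continuation lines begin with whitespace and that `-vN` variant suffixes can be stripped so that `URI-v1` and `URI-v2` both become values of `URI`; both are simplifying assumptions, and the function name is hypothetical.

```python
import re

VARIANT_SUFFIX = re.compile(r"-v\d+$")


def parse_template(text):
    """Parse one IAFA-style template into (template_type, handle, {attr: [values]})."""
    fields = {}
    current = None  # (values list, index) of the value a continuation line extends
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            # Assumed continuation line: append to the most recent value.
            values, i = current
            values[i] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("malformed template line: %r" % line)
        key = VARIANT_SUFFIX.sub("", name.strip())  # "URI-v1" -> "URI"
        values = fields.setdefault(key, [])
        values.append(value.strip())
        current = (values, len(values) - 1)
    template_type = fields.pop("Template-Type", [None])[0]
    handle = fields.pop("Handle", [None])[0]
    return template_type, handle, fields
```

Splitting on the first colon only keeps values such as URIs intact.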
With either a handle or a URI, the ROADS database can be accessed via WHOIS++. A script has been developed to turn the results of a ROADS wppc request (using the WPPC ROADS module) into a Metadata::Base object.
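The WPPC module's interface is not shown here. Instead, the following sketch parses the raw text of a WHOIS++ FULL-format response, assuming the RFC 1835 layout: records open with `# FULL <template> <server-handle> <local-handle>`, attribute lines start with a space, continuation lines start with `+` or `-`, records close with `# END`, and lines starting with `%` are status messages to be skipped. Treating `-` as a line break and `+` as direct concatenation is an assumption worth checking against the specification; the function name is hypothetical.

```python
import re

VARIANT_SUFFIX = re.compile(r"-v\d+$")


def parse_whoispp_full(response):
    """Parse FULL-format records into [(template, local_handle, {attr: [values]})]."""
    records = []
    fields = None  # attribute dict of the record being read, or None outside a record
    last = None    # (values list, index) that a continuation line extends
    for line in response.splitlines():
        if line.startswith("# FULL "):
            parts = line.split()  # ["#", "FULL", template, server_handle, local_handle]
            template = parts[2] if len(parts) > 2 else None
            local_handle = parts[4] if len(parts) > 4 else None
            fields, last = {}, None
            records.append((template, local_handle, fields))
        elif line.startswith("# END"):
            fields, last = None, None
        elif fields is None:
            continue  # "%" status lines and anything outside a record
        elif line.startswith(" "):
            name, sep, value = line[1:].partition(":")
            if not sep:
                continue
            key = VARIANT_SUFFIX.sub("", name.strip())
            values = fields.setdefault(key, [])
            values.append(value.strip())
            last = (values, len(values) - 1)
        elif line[:1] in ("+", "-") and last is not None:
            values, i = last
            joiner = "\n" if line[0] == "-" else ""  # assumed continuation semantics
            values[i] += joiner + line[1:]
    return records
```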
Whichever approach is taken the result is an object that can be manipulated via the Metadata::Base interface.
The next step is to get from metadata in a Metadata::Base object to metadata in a format that can be embedded in web pages. In practice this means either META tags or HTML-compliant RDF/XML (a representation of RDF metadata in XML that can be embedded in an HTML page).
The latter approach was taken for the eLib web site. Since the metadata used was Dublin Core this meant representing Dublin Core metadata in RDF/XML. Since we only use the simple version of Dublin Core we did not need to deal with the representation of qualifiers; the scripts may be enhanced to support Dublin Core v2 in the future.
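For simple (unqualified) Dublin Core the RDF/XML is small. The sketch below is illustrative only: it uses the DC 1.1 element namespace and `rdf:about`, which may differ from what the eLib scripts emitted, and it assumes the dictionary keys are DC element names such as `Title` or `Creator`. With `as_attributes=True`, single-valued elements become attributes of `rdf:Description`, an abbreviated form whose values a browser does not display; multi-valued elements always stay as child elements, since XML forbids repeated attributes. The function name is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_to_rdfxml(about, metadata, as_attributes=False):
    """Render simple Dublin Core {Element: [values]} as an RDF/XML fragment."""
    attrs = ["rdf:about=%s" % quoteattr(about)]
    children = []
    for name, values in metadata.items():
        prop = "dc:" + name.lower()  # assumes keys are DC element names
        if as_attributes and len(values) == 1:
            attrs.append("%s=%s" % (prop, quoteattr(values[0])))
        else:
            for value in values:
                children.append("    <%s>%s</%s>" % (prop, escape(value), prop))
    open_tag = "  <rdf:Description %s" % " ".join(attrs)
    if children:
        desc = open_tag + ">\n" + "\n".join(children) + "\n  </rdf:Description>"
    else:
        desc = open_tag + "/>"
    return '<rdf:RDF xmlns:rdf="%s" xmlns:dc="%s">\n%s\n</rdf:RDF>' % (RDF_NS, DC_NS, desc)
```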
A script - formatmetadata - was developed to output a Metadata::Base object in an embeddable format. Some of the conversion is specific to Dublin Core, e.g. treatment of the Identifier attribute(s). Such specialisations were factored out to create a script that could be configured for other template types such as ROADS document. The result is a script that takes a ROADS handle and generates RDF/XML suitable for embedding in a web page. Actually, multiple output formats are supported: RDF/XML (both with and without attributes), HTML META tags and viewable HTML. Again, the script can be configured to support other formats.
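The idea of configurable output formats can be sketched as a table mapping a format name to a formatter function. The two formatters below (HTML META tags and a viewable HTML table) and all the names are hypothetical, not the formatmetadata script's actual interface; an RDF/XML formatter would be registered in the same way.

```python
import html


def meta_tags(metadata, prefix="DC."):
    """One <meta> tag per value, e.g. <meta name="DC.Title" content="...">."""
    return "\n".join(
        '<meta name="%s%s" content="%s">' % (prefix, name, html.escape(value, quote=True))
        for name, values in metadata.items()
        for value in values
    )


def viewable_html(metadata):
    """A simple human-readable table, one row per value."""
    rows = "\n".join(
        "<tr><th>%s</th><td>%s</td></tr>" % (html.escape(name), html.escape(value))
        for name, values in metadata.items()
        for value in values
    )
    return "<table>\n%s\n</table>" % rows


# Format name -> formatter; adding a format means adding an entry here.
FORMATTERS = {"meta": meta_tags, "html": viewable_html}


def format_metadata(metadata, fmt="meta"):
    """Render a metadata dictionary in the requested output format."""
    try:
        formatter = FORMATTERS[fmt]
    except KeyError:
        raise ValueError("unsupported format: %s" % fmt) from None
    return formatter(metadata)
```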
Note that an alternative to embedding metadata directly into web pages would be to link to a document containing RDF/XML.
The example discussed in this document is the eLib web site. To see the embedded metadata, go to a page on the eLib web site and view the HTML source.
A more human-friendly version of the metadata, also based on metadata dynamically extracted from the ROADS database, can be seen by clicking the 'DC Metadata' link in the footer of each page.
Scripts to extract metadata from a ROADS database into a metadata object and to convert a metadata object to RDF/XML (and other formats) will be available here shortly.