NOF-digitise Technical Advisory Service 2001 - 2004 archive
Frequently Asked Questions

The questions and answers on this page have been asked by nof-digitise applicants. This page will be updated frequently.

Metadata

1. Is there a glossary or simplified version of the various metadata standards?

UKOLN has produced a metadata glossary at http://www.ukoln.ac.uk/metadata/glossary/

2. As we are developing sites for lifelong learners, do you have any views on whether we should use metadata appropriate for learning packages, e.g. the IMS Learning Resource Metadata Model or LOM (Learning Object Metadata)?

Although the IMS Learning Resource Metadata Model or IEEE Learning Object Metadata (LOM) would be relevant, both of these place a significant overhead on the metadata creator; in extreme cases, for example, a LOM record could take an hour or more to complete. We feel that LOM/IMS is too big an overhead for what these projects are meant to be doing (although a LOM/IMS description of each project might be worth considering). An alternative might be to use Dublin Core with the extensions
proposed by the Education Working Group (DCEd) of the DCMI. They
have proposed an "Audience" element, and suggest adopting
"InteractivityType", "InteractivityLevel", and
"TypicalLearningTime" elements from the IEEE LOM standard. More
information is available at:

More general information on educational metadata is available
in the two recent SCHEMAS Metadata Watch reports:

Also see the UK's Metadata for Education Group at http://www.ukoln.ac.uk/metadata/education/

3. Are there recommended standards for the core and extended metadata attributes that should be created for digitised resources, especially images? Dublin Core provides one simple model but is very general; other possible approaches would presumably include MARC and CIMI, but some shared approach to this is presumably seen as valuable.

There are, in fact, quite a few relevant standards. For resource discovery, the nof-digitise guidelines (5.2.1) suggest that "item-level descriptions should be based on the Dublin Core and should be in line with developing e-government and UfI metadata standards."

In a Dublin Core context, the specifics of using DCMES for images were discussed at DC-3, the Image Metadata Workshop held in Dublin, Ohio in September 1996. This workshop resulted in the addition of two new elements to the original thirteen and made some changes to element descriptions.

There is some useful information on DC and other image metadata formats in section 4 of the VADS/TASI guide to creating digital resources in the AHDS Guides to Good Practice series:

Catherine Grout, et al., Creating digital resources for the
visual arts: standards and good practice. Oxford: Oxbow Books,
2000 (forthcoming).

This mentions things like the CIMI DTD, MARC, the CIDOC standards, etc., as well as more specialised schemes such as the Visual Resources Association (VRA) Core Record.

There is information on more specialised administrative and structural metadata in the Making of America II project's final report:

Bernard J. Hurley, John Price-Wilkin, Merrilee Proffitt and
Howard Besser, The Making of America II Testbed Project: a
digital library service model. Washington, D.C.: Council on
Library and Information Resources, 1999.

A shorter list of elements with a primary focus on preservation is available in:

RLG Working Group on Preservation Issues of Metadata, Final
report. Mountain View, Calif.: Research Libraries Group,
1998.

There may be some useful background information in the
following conference paper: Michael Day, Metadata for images:
emerging practice and standards (1999).

Also see: The Application of Metadata Standards to Video
Indexing. Jane Hunter, CITEC (jane@dstc.edu.au) and Renato
Iannella, DSTC Pty Ltd.

Abstract: This paper first outlines a multi-level video indexing approach based on Dublin Core extensions and the Resource Description Framework (RDF). The advantages and disadvantages of this approach are discussed in the context of the requirements of the proposed MPEG-7 ("Multimedia Content Description Interface") standard. The related work on SMIL (Synchronized Multimedia Integration Language) by the W3C SYMM working group is then described. Suggestions for how this work can be applied to video metadata are made. Finally a hybrid approach is proposed based on the combined use of Dublin Core and the currently undefined MPEG-7 standard within the RDF which will provide a solution to the problem of satisfying widely differing user requirements.

http://archive.dstc.edu.au/RDU/staff/jane-hunter.html
http://www.pads.ahds.ac.uk/padsDepositorsGuide.html
BBC: CIMI Guide to Good Practice (DC)

4. Can you advise on approaches to, or chosen standards for, metadata for sound files? Are there any recently developed models of good practice?

The following resources may be useful:

An evaluation of resource discovery metadata for sound
resources by the Performing Arts Data Service:

Outcomes of the Harmonica project might also be helpful:

As a more practical example, Jon Maslin (J.Maslin@surrey.ac.uk) describes the approach taken to creating metadata for music recordings, scores and video in the performing arts at the University of Surrey:

---cut---

We have adopted the Dublin Core as a basis for our metadata because we needed a clearly defined structure and wanted, if possible, to adopt a standard. It was adopted while it was still unclear in some respects, but we knew what we had to achieve, so we selected only the relevant elements, expanded some and extended DC with new elements needed for the application. So, while it was convenient to use it, we had to extend it, but did not use all the elements.

We are using the same schema for music recordings, scores and video in the performing arts. A copy of the schema (DTD) and some samples are attached.

Comments:

Dublin Core elements are preceded by DC.

The title conforms to a standard agreed with the music department.

Creators and contributors: The roles of these are defined with their names. There can be an unlimited number. We have not adopted a dictionary of defined roles, as the performing arts has a potentially unlimited number, but have taken the view that different applications will act upon the metadata and that retrieval software will be sufficiently intelligent to take care of interpreting different roles (hence an informal convention of adopting the terminology on the source and defining the instrument rather than the role, largely to avoid contortions such as "guitarist"). It is debatable, as is the difference between creator and contributor in some instances. We have tended to class producers and recording engineers as contributors. One of the benefits of defining a role is that the importance may not be terribly significant.

A similar approach has been adopted for other elements, such as the place and time of recording.
We have limited this to a few attributes for our own convenience. There is no reason why this should not be expanded in the way that the creators element is used.

The location element is structured as a URL. In the example you will see it pointing to the patronserver. It can point to any other web server or give direct file access.

In addition, a number of Patron elements have been added which relate to courses. Another element has been added to define a title uniquely, e.g. all scores and recordings of a piece have an id.

The most extensive addition has been to define the contents of a piece in a standard way regardless of type or medium. In effect this gives a multi-level table of contents. It has been designed to provide an objective series of access points which can be created without extensive subject knowledge. Typically a classical piece of music will list the movements with references to starting and stopping times. Scores have access points to movements and page numbers, and repeats if required. There is no limit to the granularity (beyond time and patience).

Processing and exchange

It is important to remember that this is entirely independent of the application. The advantage of the XML implementation is that variations in application are relatively simple: in Patron the application displays these in cascading hierarchies. One of the objectives has been to include sufficient data and structure to allow the metadata to be exchanged and processed for the current implementation of Patron and possible enhancements, and also to be developed in line with more universal standards.

Metadata creation

We have created the metadata from an MS Access database which also holds rights information. We have also developed a form builder which automatically creates an input form from the metadata schema. This enables metadata to be created and tested rapidly, and allows inputters to reuse previously entered data to save time and ensure accuracy.
So, the answer is yes and it works, but it has been application-driven: other applications would need to add to it.

---cut---

5. Are there any software tools for handling various metadata formats?

Tools that may be of some assistance in describing Web sites
may be found at:

6. Is use of Dublin Core mandatory?

Metadata should be capable of supporting the delivery of item-level DC descriptions of all project resources.

7. Is simple Dublin Core metadata sufficient or are qualifiers needed? If they are, which ones should be used and how will interoperability between different domains be handled?

The 15 Dublin Core metadata elements form a fairly basic cross-domain core that ensures a degree of commonality across domains and applications. To express richer or more structured information less ambiguously than is possible with the 15 elements alone, the Dublin Core community supports the notion of qualification, using element refinements and encoding schemes. An initial set of these is defined by the Dublin Core community in the Dublin Core Qualifiers, and these are a good place to start. Where the agreed qualifiers do not meet your needs, it is possible to define others, either within your project or as part of a broader domain-based interest group. In defining new qualifiers it is important to ensure
that:

As an illustration, the DC-Government Working Group recently proposed 'previousAccessMarkingChangeDate' as a refinement of DC.Rights. This was rejected because the definition of DC.Rights is: 'Information about rights held in and over the resource.' A value of the proposed 'previousAccessMarkingChangeDate' element refinement would have been a simple date, which, on its own, does not constitute 'information about rights held in and over the resource'.

8. We are planning to digitise 20,000 photographs and make them accessible through a database. We are collecting enough detail at item level to create Dublin Core. Please can you give some examples of dynamic, database-driven sites which use this?

The AHDS gateway http://www.ahds.ac.uk visibly
displays DC metadata.

9. We are cataloguing video clips and each item has approximately 20 metadata fields that need to be incorporated in the site, offering advanced search options. How would I incorporate a metadata structure that conforms to e-Government standards? What steps do I need to take to achieve this?

The Dublin Core (DC) metadata scheme is based on a set of 15 core elements that are generic enough to describe individual digital objects, however and wherever they have been created. The elements include 'title', 'creator', 'date', etc. A full list of these elements is available from http://dublincore.org/documents/dces/.

In many cases, however, these 15 elements are not sufficient to describe the objects in question accurately. The elements are then extended or qualified to describe the resource further. For one type of digital resource, an HTML page, one often sees the date element extended to include fields called 'date.created' and 'date.lastmodified', i.e. the metadata includes two dates, one recording when the page was first created and a second recording when it was last updated (the corresponding refinements agreed by DCMI are 'created' and 'modified'). For a video collection the rights element may well need to be extended to record the various copyright issues involved. Sometimes DC elements can be qualified following examples set by others describing similar digital objects; in other cases, projects need to develop their own qualifying terms.

For the criteria mentioned in the query, it would probably be best to have multiple qualifications of the creator and contributor elements to record details of interviewers, interviewees, gender etc. "Which tape" and "absolute address" could probably be slotted under the 'title', 'identifier' or 'source' elements. It's important to note that there is no perfect metadata scheme for any one collection. How you qualify your DC metadata can depend on how your resources are being digitised or what software and hardware you are using.
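To make the idea of qualified elements more concrete for an HTML page, the Python sketch below writes one meta element per statement and names qualified elements with the 'DC.element.refinement' pattern seen in 'date.created'. The helper name `dc_meta`, the example dates and the 'W3CDTF' scheme label are illustrative assumptions, not a format prescribed by NOF or DCMI.

```python
from html import escape


def dc_meta(element, value, refinement=None, scheme=None):
    """Render one (possibly qualified) Dublin Core statement as an HTML meta element.

    Qualified names follow the assumed 'DC.element.refinement' pattern
    (e.g. DC.date.created); attribute values are HTML-escaped.
    """
    name = f"DC.{element}" if refinement is None else f"DC.{element}.{refinement}"
    attributes = [("name", name)]
    if scheme is not None:
        attributes.append(("scheme", scheme))
    attributes.append(("content", value))
    rendered = " ".join(f'{key}="{escape(val)}"' for key, val in attributes)
    return f"<meta {rendered}>"


# Hypothetical creation and last-modified dates for one HTML page
for refinement, date in [("created", "2001-03-14"), ("modified", "2002-11-02")]:
    print(dc_meta("date", date, refinement=refinement, scheme="W3CDTF"))
```

Because every DC element is repeatable, the same helper can simply be called once per value when, say, several contributors need to be recorded for one video clip.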
Perhaps most importantly, any metadata scheme depends on who will be searching for your resources. A metadata scheme has to be set up to allow users to find the information they need, so, in an ideal world, the creation of a metadata scheme will follow a period of research on user needs. Users must be thought of in the broadest terms, including not only the general public, for example, but also future custodians of the collection. While members of the general public may want metadata fields which let them do advanced searches, future custodians may need to find detailed information on the copyright holders of the videos in question. This could be recorded in the 'rights' element.

There is a Dublin Core user group especially devoted to metadata issues surrounding moving images, although it is not particularly well developed at the moment. The user group is housed at http://dublincore.org/groups/moving-pictures/

One case study (at http://ahds.ac.uk/shakespeare.htm) gives an indication of how one digital project recording theatrical performances went about creating its metadata.

Dublin Core is recommended by the NOF-digi technical standards because its widespread take-up should allow digital collections around the country to be interoperable with one another, i.e. to allow users to search through more than one collection at the same time.

We would also point you to a (rather technical) paper looking
at video metadata representation (mainly MPEG-7) at:

In addition, as this metadata seems to describe individuals, there may also be important data protection issues that need to be resolved.

10. What should go in the Identifier element in Dublin Core?

The Identifier element has to be an unambiguous reference because it identifies the actual item/resource being described. For the kind of resources you will be creating within your NOF project, the Identifier element will most likely need to include the project's image number and any reference numbers used by the host institutions (e.g. accession numbers). It could be a URI (Uniform Resource Identifier) but should not be just the URL of the resource, though the URL could be included. A few examples of the type of thing you should be putting in the Identifier element are listed below (these are taken from DC-assist):
<meta name="DC.Identifier" scheme="URI" content="http://foo.bar.org/zaf/">

11. And what about the Coverage element in Dublin Core?

Just to clarify: temporal coverage is a "date range" (1939/1945). You could look at some examples from DC-assist: http://www.ukoln.ac.uk/metadata/dcassist/
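A date range like this can be exposed with the DCMI Period encoding scheme discussed below, whose value is a list of label=value components (name, start, end and, optionally, scheme) separated by semicolons. The Python sketch below assumes the start date, end date and period name are held in separate database fields; the function name `dcmi_period` and the sample record are hypothetical, and values containing ';' or '=' are simply refused rather than escaped.

```python
def dcmi_period(start=None, end=None, name=None, scheme=None):
    """Build a DCMI Period value from separate database fields (a sketch).

    Components are written as label=value pairs separated by semicolons,
    e.g. "name=...; start=...; end=...;". Empty fields are skipped.
    """
    components = [("name", name), ("start", start), ("end", end), ("scheme", scheme)]
    parts = []
    for label, value in components:
        if not value:
            continue
        # Simplification: reject characters that would break the syntax instead of escaping them.
        if ";" in value or "=" in value:
            raise ValueError(f"{label} value contains ';' or '=': {value!r}")
        parts.append(f"{label}={value}")
    if not parts:
        raise ValueError("a period needs at least one component")
    return "; ".join(parts) + ";"


# Hypothetical database record for the 1939/1945 range mentioned above
record = {"period_name": "Second World War", "start_date": "1939", "end_date": "1945"}
value = dcmi_period(start=record["start_date"], end=record["end_date"], name=record["period_name"])
print(value)  # name=Second World War; start=1939; end=1945;
```

Keeping the parts in separate fields and only assembling the string on export means the same record can also be searched by date in your own database.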
For spatial coverage, values might be:

For temporal coverage, values might be:

Question: How does one use the Period encoding scheme for the element Coverage, Time? Can I just simply list the Period in a field called Coverage, Period? I found the explanation on the DCMI site difficult to understand.

Again, how you manage it in your database is up to you, but it probably makes sense to have separate fields for the start date, end date and name of the Period (I'd suggest you probably don't need to store the name of the date scheme in your database, as that should be constant). You might need to make the group repeatable if you envisage multiple ranges for temporal coverage, but that does seem quite complex. When you expose/export your metadata, the start date, end date, scheme
and name of a range all form part of the value of an occurrence of the
temporal coverage property. N.B. this is still a temporal coverage
property: "DCMI Period" is the name of an encoding scheme. You might
want to check the distinction DC makes between "element refinements"
(like "spatial" and "temporal") and "encoding schemes" (like
DCMI-Period, or a subject scheme). See the start of:

Anyway, in the database you might have a record with fields like:

But when you expose/export the DC metadata record, the value of the
temporal coverage property would be encoded as:

12. Can NOF recommend or suggest any models for preservation metadata that we might use for our own projects?

The RLG Working Group suggests using 16 elements to capture crucial information about a digital file. Their elements are fairly 'lightweight' and would probably be adequate for a digitisation project, assuming that some descriptive metadata (e.g. DCMES) is also available. The RLG set is a bit old now, so it might be worth looking at METS http://www.ukoln.ac.uk/metadata/resources/mets/ or at the more detailed set of elements in the draft NISO Technical Metadata for Digital Still Images standard. This can be found (in PDF) at http://www.niso.org/committees/committee_au.html and is also mentioned in the NOF guidelines. Other guidance would be available in:

You could also have a look at the OCLC/RLG Preservation Metadata Working Group, which has published an overview (chiefly of the OAIS model, and the specifications developed by Cedars, NEDLIB and NLA) and recommendations for 'Content Information', with a forthcoming one on 'Preservation Description Information' (these are OAIS terms): http://www.oclc.org/research/pmwg/

Alternatively, there is OCLC's own preservation metadata set at http://www.oclc.org/digitalpreservation/archiving/metadataset.pdf

13. Are any projects' metadata strategies available to look at?

The "Gathering the Jewels" NOF digitisation project in Wales has settled on the metadata elements and digitisation guidelines it is going to adopt. In the interests of sharing this information as widely as possible, they have put it up on their Web site: please see http://www.gtj.org.uk/technical_logo.html and scroll down the page.

14. How useful is embedding metadata in HTML meta elements?

It can be useful to embed metadata in the HTML meta elements on a Web page; however, when doing so keep the points below in mind:
(a) it depends on a service provider (i) finding the document (search engines still have issues when harvesting dynamically created pages; for more information see Search Engine Watch and the NOF dissemination section of the programme manual) and (ii) extracting and using the metadata; and
(b) HTML meta elements are not the only way of exposing metadata. Further information on OAI and other ways of making your metadata available will follow.

15. How is the dc:relation element in Dublin Core used?

The dc:relation element is used to encode a reference to a resource which is related to the resource being described. The value of the dc:relation element should be an identifier for the related resource. In any DC metadata record, there may be multiple occurrences of the dc:relation element, expressing relationships between the current resource and a number of other resources.

In simple/unqualified Dublin Core, dc:relation allows you to express the fact that a relationship exists between the current resource and a related resource, but it does not permit you to say anything more about the nature of that relationship. Qualified Dublin Core introduces a number of element refinements to dc:relation (for example, isPartOf and isVersionOf), which allow you to express the nature of the relationship between the current resource and the related resource. In both simple/unqualified DC and in qualified DC, the value of the dc:relation element (or the value of any of its element refinements) should be an identifier for the related resource. e.g.

16. How would you define the language of a bilingual item using Dublin Core?

Hopefully you are storing this metadata somehow in your database for your own use. If so, it should be fairly straightforward to define the language of your pamphlet using Dublin Core. Although DC has no separate element for a second language, you can have multiple occurrences of an element (in fact all of the 15 elements allow unlimited occurrence).
So, for example, in an HTML page this would appear as below (depending on which encoding scheme you choose to use, ISO 639 or RFC 1766, and whether you use two- or three-letter codes).

<meta name="DC.Language" scheme="ISO639-2" content="eng">

No priority is given to either of the occurrences of the element.

17. NOF require that I submit a sample of my project's metadata. Could you show me an example of how to do this?

All projects are required to submit samples of their item-level metadata and indicate which fields are being used for Dublin Core metadata. This is a fictional sample taken from the digitisation project of Sandfordshire Council. It is of a digitised image of an etching done by the artist John Shade. It gives an indication of the format that should be used when forwarding metadata samples to case managers. Many of the fields here are loosely based on the JIDI Metadata Guidelines.

The example shows what categories the project is using for its metadata, the actual descriptions used for one item and how the fields relate to the core element set of Dublin Core. Note that not every Dublin Core element needs to be mapped to; DC.RELATION and DC.SOURCE, for instance, were omitted from the example below. Other Dublin Core fields, in this case DC.COVERAGE, can be qualified to add extra descriptive richness. The notes on the right indicate what controlled vocabularies can be used, but there is no need for projects to indicate which ones they are utilising.

Projects will have developed different metadata schemas according to their collection and content; some will have more detail in certain areas and some will have less. There is no need to replicate the schema shown here. What is important with the sample is to give a sense of how each of your metadata categories is being interpreted and how it is being mapped to Dublin Core. For further help on metadata and Dublin Core visit the Technical Advisory Service and the UKOLN metadata pages.
The latter includes useful tools such as DC-dot and DC-assist.
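A metadata sample of this kind is essentially a mapping from a project's own fields to Dublin Core elements. The Python sketch below records such a mapping for a hypothetical version of the fictional etching example; the local field names, the placeholder values and the 'DC.Coverage.spatial' qualifier are illustrative assumptions, and `None` marks a field kept for internal use but not mapped to Dublin Core.

```python
# Hypothetical local fields for the fictional etching example, mapped to DC.
# None marks a field that is kept locally but not mapped to Dublin Core.
FIELD_TO_DC = {
    "Title": "DC.Title",
    "Artist": "DC.Creator",
    "Date of etching": "DC.Date",
    "Description": "DC.Description",
    "Place depicted": "DC.Coverage.spatial",
    "Image number": "DC.Identifier",
    "Copyright holder": "DC.Rights",
    "Storage location": None,
}


def map_to_dc(record):
    """Return (local field, value, DC element) rows for a metadata sample."""
    rows = []
    for field, value in record.items():
        if field not in FIELD_TO_DC:
            # An unknown field usually means a typo or a new, unmapped category.
            raise KeyError(f"no mapping defined for field {field!r}")
        rows.append((field, value, FIELD_TO_DC[field] or "(not mapped)"))
    return rows


# Hypothetical item record; all values are placeholders
item = {
    "Title": "[title of etching]",
    "Artist": "Shade, John",
    "Image number": "[project image number]",
    "Storage location": "[shelf mark]",
}
for field, value, element in map_to_dc(item):
    print(f"{field:<18} {value:<24} {element}")
```

Printing the rows gives the three things a case manager is looking for: the project's category, the value used for one item, and the Dublin Core element it maps to.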
UKOLN is funded by MLA, the Museums, Libraries and Archives Council, the Joint Information Systems Committee (JISC) of the Higher and Further Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.
TAS: 2001-2004: ARCHIVE

This page is part of the NOF-digi technical support pages http://www.ukoln.ac.uk/nof/support/. The Web site is no longer maintained and is hosted as an archive by UKOLN. Page last updated on Monday, May 09, 2005