QA Techniques For The Storage Of Image Metadata

Background

Archiving digital images requires careful consideration of the most effective method of storing technical and life-cycle information. Metadata is a common means of describing digital resources; however, the different approaches to storing it may confuse many users.

This paper describes QA techniques for choosing a suitable method of metadata storage that takes into account the need for interoperability and retrieval.

Choosing a Suitable Metadata Association Model

Metadata may be associated with an image in three ways:

Internal Model:: Metadata is stored within the image file itself, either in a metadata structure that the file format already defines or appended to the end of the file in an ad hoc manner. This makes it simple to transfer metadata alongside image data without special requirements or considerations. However, support for metadata structures differs between file formats, and assigning the same metadata record to multiple images causes inefficient duplication compared with a single metadata record associated with a group of images.
External Model:: A unique identifier is used to associate external metadata with an image file; for example, an image may be stored on a local machine while its metadata is stored on a server. This model is better suited to a repository and is more efficient when the same information applies to a large number of objects. However, broken links may occur if the metadata record is not updated when an image is moved, or vice versa, and intellectual property (IP) data and other information may be lost as a result.
Hybrid Model:: Uses both internally and externally associated metadata. Some metadata (file headers/tags) is stored directly in the image file, while additional workflow metadata is stored in an external database. A deliberately designed external record can provide a common application profile across file formats, while format-specific metadata is still incorporated into the image file itself. However, the hybrid model shares the disadvantages of both the internal and external models in terms of duplication and broken links.
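The broken-link risk of the external model can be checked automatically. The sketch below assumes the unique identifier is a SHA-256 digest of the image bytes (the paper does not prescribe an identifier scheme) and keeps the external records in an in-memory dictionary standing in for a server-side database; `register`, `find_broken_links` and `relink` are hypothetical helper names. A content digest lets a moved file be found again, but it also changes whenever the image bytes are edited, so a real repository may prefer an assigned persistent identifier.

```python
import hashlib
import json
from pathlib import Path


def image_id(path: Path) -> str:
    """Assumed identifier scheme: SHA-256 digest of the image bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def register(registry: dict, path: Path, metadata: dict) -> str:
    """Store an external metadata record keyed by the image identifier."""
    ident = image_id(path)
    registry[ident] = {"location": str(path), "metadata": metadata}
    return ident


def find_broken_links(registry: dict) -> list:
    """Return identifiers whose recorded location no longer holds that image."""
    broken = []
    for ident, record in registry.items():
        location = Path(record["location"])
        if not location.is_file() or image_id(location) != ident:
            broken.append(ident)
    return broken


def relink(registry: dict, search_dir: Path) -> int:
    """Repair broken records by locating files with a matching identifier."""
    found = {image_id(p): p for p in search_dir.rglob("*") if p.is_file()}
    repaired = 0
    for ident in find_broken_links(registry):
        if ident in found:
            registry[ident]["location"] = str(found[ident])
            repaired += 1
    return repaired


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        image = root / "photo.tif"
        image.write_bytes(b"placeholder image bytes")
        registry = {}
        ident = register(registry, image, {"creator": "A. Photographer"})
        image.rename(root / "archive-0001.tif")  # image moved, record not updated
        print(find_broken_links(registry) == [ident])  # True
        print(relink(registry, root))                  # 1
        print(json.dumps(registry, indent=2))
```

Running `find_broken_links` on a schedule turns the "broken links" disadvantage into a routine QA report rather than a silent loss of IP data.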

When planning how to store image metadata, the designer should answer three questions:

What type of metadata do you wish to store?
Is the file format capable of storing metadata?
In what environment is the metadata intended to be stored and used?

The answers to these questions should guide the choice of metadata storage model. Some file formats are not designed to store metadata and will need to be supplemented through the external model; other formats may not store data in sufficient detail for your requirements (e.g. life-cycle data). Alternatively, you may require IP (intellectual property) data to be stored internally, which will require a file format that supports these elements.
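The three questions can be read as a simple decision rule. The function below is one possible encoding of the guidance in the preceding paragraph, not a procedure defined in this paper: the `Answers` fields, the category labels such as "ip" and "lifecycle", the example format capabilities and the returned model names are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Answers:
    metadata_types: set[str]    # Q1: kinds of metadata to store, e.g. {"technical", "ip"}
    format_fields: set[str]     # Q2: kinds the chosen file format can hold internally
    require_internal: set[str]  # kinds that must travel inside the file, e.g. {"ip"}
    repository: bool            # Q3: will records be managed in a shared repository?


def choose_model(a: Answers) -> str:
    """Suggest a metadata association model (illustrative heuristic only)."""
    if a.require_internal - a.format_fields:
        # Internal storage is mandatory but the format cannot hold it.
        return "convert to a format that supports the required elements"
    if not (a.metadata_types & a.format_fields):
        # Nothing can be stored internally, so supplement externally.
        return "external"
    if (a.metadata_types - a.format_fields) or a.repository:
        # Some metadata fits in the file; the rest, or shared records, go outside.
        return "hybrid"
    return "internal"


if __name__ == "__main__":
    # Illustrative capabilities only, not a survey of real formats.
    no_placeholders = Answers({"technical", "ip"}, set(), set(), repository=True)
    tagged_format = Answers({"technical", "ip", "lifecycle"}, {"technical", "ip"}, {"ip"}, False)
    print(choose_model(no_placeholders))  # external
    print(choose_model(tagged_format))    # hybrid
```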

Ensuring Interoperability

Metadata is intended for the storage and retrieval of essential information regarding the image. In many circumstances, it is not possible to store internal metadata in a format that may be read by different applications. This may be for a number of reasons:

The file format does not define metadata placeholders (e.g. BMP), or uses a metadata profile that the application does not support.
No standard image metadata definition and interchange model exists for the format (e.g. JPEG). As a result, the storage mechanism and metadata structure must be defined by each application.
The metadata is stored in a proprietary file format that is not publicly defined.
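To see how JPEG metadata depends on the writing application, it helps to list the application (APPn) marker segments a file contains: JFIF data conventionally sits in APP0, while Exif and XMP packets use APP1, each identified by a leading signature string. The sketch below walks the marker segments of a well-formed JPEG up to the start-of-scan marker and reports each APPn segment with its signature. Run on a file before and after editing, it offers a quick check that an application has not silently dropped a metadata segment. The function name is hypothetical, the sample bytes are synthetic, and the parser deliberately ignores corner cases such as malformed lengths.

```python
from __future__ import annotations

import struct


def list_app_segments(data: bytes) -> list[tuple[str, bytes]]:
    """Return (segment name, signature) for each APPn segment of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker; skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # SOS or EOI: metadata segments come before these
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if 0xE0 <= marker <= 0xEF:
            signature = payload.split(b"\x00", 1)[0]  # e.g. b"JFIF", b"Exif"
            segments.append((f"APP{marker - 0xE0}", signature))
        pos += 2 + length
    return segments


if __name__ == "__main__":
    def segment(marker: int, payload: bytes) -> bytes:
        return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload

    sample = (b"\xff\xd8"
              + segment(0xE0, b"JFIF\x00\x01\x01")
              + segment(0xE1, b"Exif\x00\x00MM\x00*")
              + b"\xff\xda\x00\x08")
    print(list_app_segments(sample))  # [('APP0', b'JFIF'), ('APP1', b'Exif')]
```

A BMP file, by contrast, has no comparable segment structure to inspect, which is why its metadata has to live outside the file.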

Before choosing a specific image format, you should ensure that the repository software can extract its metadata and that editing software will not corrupt the metadata if changes are made at a later date. To increase the likelihood of this, you should consider the following approaches:

Convert image data to a file format that supports a known metadata structure (e.g. Exif, TIFF, SPIFF and Flashpix).
Use a vendor-neutral, technology-independent and well-documented metadata standard, preferably one written in XML (e.g. DIG35, Z39.87 and MIX).
Investigate the solutions offered by the DIG35 [1] and the FILTER [2] projects, which are developing a set of templates for consistent description of images.

Although these measures will not guarantee interoperability, they will increase the likelihood of achieving it.
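As a minimal illustration of why an XML-based external record travels well between applications, the sketch below writes and reads back a small record using only Python's standard `xml.etree.ElementTree` module. The namespace URI, the element names and the `build_record`/`read_record` helpers are hypothetical and are not taken from DIG35, Z39.87 or MIX, which define their own schemas; a project adopting one of those standards should use its published element set instead. The round trip at the end doubles as a simple QA check that no field is lost.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical namespace; a real project would use its chosen standard's schema.
NS = "urn:example:image-metadata"
ET.register_namespace("img", NS)


def build_record(image_id: str, fields: dict[str, str]) -> str:
    """Serialise an external metadata record as namespaced XML."""
    root = ET.Element(f"{{{NS}}}imageRecord", {"id": image_id})
    for name, value in fields.items():
        ET.SubElement(root, f"{{{NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def read_record(xml_text: str) -> tuple[str, dict[str, str]]:
    """Parse a record produced by build_record back into (id, fields)."""
    root = ET.fromstring(xml_text)
    fields = {child.tag.split("}", 1)[1]: child.text or "" for child in root}
    return root.get("id"), fields


if __name__ == "__main__":
    fields = {"creator": "A. Photographer", "rights": "All rights reserved",
              "captureDevice": "Flatbed scanner"}
    xml_text = build_record("img-0001", fields)
    print(xml_text)
    assert read_record(xml_text) == ("img-0001", fields)  # round trip loses nothing
```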

Structuring Your Image Collection

To organise your image collection into a defined structure, it is advisable to develop a controlled vocabulary. If you are providing an online resource, it is useful to identify your potential users, the academic disciplines from which they originate, and the terms they will use to locate images. Many repositories have a well-defined user community (e.g. archaeology, physics or sociology) that shares a common language and similar goals. In a multi-disciplinary collection it is much more difficult to predict the terms users will search with. The US Library of Congress [3], the New Zealand Time Frames [4] and the International Press Telecommunications Council (IPTC) [5] provide online examples of how a controlled vocabulary hierarchy may be used to catalogue images.
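A controlled vocabulary hierarchy can be modelled with two relationships common in thesauri: broader/narrower links between preferred terms, and "use for" entries that map a searcher's own words to a preferred term. The toy vocabulary, image index and function names below are hypothetical and are not drawn from the Library of Congress, Time Frames or IPTC vocabularies; the sketch only shows how such a structure lets a search for a general or non-preferred term retrieve images catalogued under more specific preferred terms.

```python
from __future__ import annotations

# Hypothetical vocabulary: each preferred term lists its narrower terms.
NARROWER = {
    "Buildings": ["Churches", "Houses"],
    "Churches": ["Cathedrals"],
    "Cathedrals": [],
    "Houses": [],
}
# Hypothetical "use for" entries: user's word (lower case) -> preferred term.
USE_FOR = {"church": "Churches", "dwelling": "Houses", "home": "Houses"}


def preferred(term: str) -> str | None:
    """Resolve a user's term to a preferred term, if the vocabulary knows it."""
    for pref in NARROWER:
        if pref.lower() == term.lower():
            return pref
    return USE_FOR.get(term.lower())


def expand(term: str) -> list[str]:
    """Return the preferred term and all its narrower terms, depth first."""
    start = preferred(term)
    if start is None:
        return []
    result, stack = [], [start]
    while stack:
        current = stack.pop()
        if current not in result:
            result.append(current)
            stack.extend(reversed(NARROWER.get(current, [])))
    return result


def search(index: dict[str, set[str]], term: str) -> set[str]:
    """Find images catalogued under the term or any narrower term."""
    hits = set()
    for t in expand(term):
        hits |= index.get(t, set())
    return hits


if __name__ == "__main__":
    index = {"Cathedrals": {"img-017"}, "Houses": {"img-002", "img-031"}}
    print(expand("Buildings"))              # ['Buildings', 'Churches', 'Cathedrals', 'Houses']
    print(sorted(search(index, "church")))  # ['img-017']
```

In a single-discipline collection the `USE_FOR` table can be small; a multi-disciplinary collection needs far more entry terms, which is where published vocabularies save effort.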

References

[1] DIG35 Specification: Metadata for Digital Images, Version 1.0, August 30, 2000,
<http://xml.coverpages.org/FU-Berlin-DIG35-v10-Sept00.pdf>
[2] FILTER,
<http://www.filter.ac.uk/>
[3] Library of Congress Thesauri,
<http://www.loc.gov/lexico/servlet/lexico/>
[4] New Zealand Time Frames, New Zealand National Library,
<http://timeframes1.natlib.govt.nz/nlnz-browse>
[5] International Press Telecommunications Council,
<http://www.iptc.org/>