Background
Archiving digital images requires choosing an effective method of storing
technical and life cycle information. Metadata is a common means of describing
digital resources; however, the range of available approaches may confuse many
users.
This paper describes QA techniques for choosing a suitable method of metadata
storage that takes into account the need for interoperability and retrieval.
Choosing a Suitable Metadata Association Model
Metadata may be associated with an image in three ways:
- Internal Model:
- Metadata is stored within the image file
itself, either through an existing metadata mapping or attached to the end of
the image file in an ad hoc manner. This makes it simple to transfer metadata
alongside image data without special requirements or considerations. However,
support for a metadata structure differs between file formats, and assigning
the same metadata record to multiple images causes inefficient duplication
compared with a single metadata record associated with a group of images.
- External Model:
- A unique identifier is used to associate external metadata with an image file,
e.g. an image may be stored on a local machine while the metadata is stored on a
server. This model is better suited to a repository and is more efficient when
the same information applies to a large number of objects. However, broken links
may occur if the metadata record is not updated when an image is moved, or vice
versa. Intellectual Property data and other information may be lost as a result.
- Hybrid Model:
- Uses both internally and externally associated
metadata. Some metadata (file headers/tags) is stored directly in the image file
while additional workflow metadata is stored in an external database. The
deliberate design of the external record offers a common application profile
across file formats and provides a method of incorporating format-specific
metadata into the image file itself. However, it shares the disadvantages of the
internal and external models in terms of duplication and broken links.
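The external model and its broken-link risk can be illustrated with a minimal
sketch. The ExternalStore class, the record identifier "rec-1" and the file paths
are all hypothetical, and a real repository would use a database or server rather
than in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalStore:
    """Hypothetical in-memory stand-in for a metadata server (external model)."""
    records: dict = field(default_factory=dict)  # record_id -> metadata fields
    links: dict = field(default_factory=dict)    # image path -> record_id

    def link(self, image_path, record_id):
        """Associate an image with an existing record via its identifier."""
        if record_id not in self.records:
            raise KeyError(f"unknown record: {record_id}")
        self.links[image_path] = record_id

    def lookup(self, image_path):
        """Return the metadata for an image, or None if no link resolves."""
        record_id = self.links.get(image_path)
        return self.records.get(record_id)

    def move_image(self, old_path, new_path, update_link=True):
        """Record that an image moved; skipping the update breaks the link."""
        if update_link:
            self.links[new_path] = self.links.pop(old_path)


store = ExternalStore()
store.records["rec-1"] = {"rights": "Example Archive", "scanner": "flatbed"}
# One record serves two images, so nothing is duplicated.
store.link("scans/a.tif", "rec-1")
store.link("scans/b.tif", "rec-1")

store.move_image("scans/a.tif", "archive/a.tif", update_link=False)
store.move_image("scans/b.tif", "archive/b.tif")

print(store.lookup("archive/a.tif"))  # None: the link was not updated
print(store.lookup("archive/b.tif"))  # the shared record
```

In a hybrid arrangement, the same store would hold the workflow metadata while
format-specific fields remain in the image file.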
When considering the storage of image metadata, the designer should consider
three questions:
- What type of metadata do you wish to store?
- Is the file format capable of storing metadata?
- In what environment is the metadata intended to be stored and used?
The answers to these questions should guide the choice of metadata storage model.
Some file formats are not designed to store metadata and will require
supplementation through the external model; other formats may not store data in
sufficient detail for your requirements (e.g. life cycle data). Alternatively,
you may require Intellectual Property (IP) data to be stored internally, which
will require a file format that supports these elements.
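As a rough illustration, the guidance above could be reduced to a small decision
helper. The function name, its three boolean parameters and the exact rules are
assumptions made for this sketch; a real decision would also weigh the working
environment and the type of metadata involved.

```python
def choose_model(format_stores_metadata, format_covers_all_fields,
                 needs_internal_fields):
    """Suggest a metadata association model (simplified, hypothetical rules)."""
    if not format_stores_metadata:
        if needs_internal_fields:
            # e.g. IP data must travel with the file but the format cannot hold it
            raise ValueError("convert to a format that supports embedded metadata")
        return "external"
    if format_covers_all_fields:
        return "internal"
    # The format holds some fields; the rest is supplemented externally.
    return "hybrid"


print(choose_model(False, False, False))  # external
print(choose_model(True, True, True))     # internal
print(choose_model(True, False, True))    # hybrid
```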
Ensuring Interoperability
Metadata is intended for the storage and retrieval of essential information
regarding the image. In many circumstances, it is not possible to store internal
metadata in a format that may be read by different applications. This may be for
a number of reasons:
- The file format does not define metadata placeholders (e.g. BMP), or uses a
metadata profile that the application does not support.
- A standard image metadata definition and interchange format model does not
exist (e.g. JPEG). As a result, the storage mechanism and metadata structure must
be defined by each application.
- The metadata is stored in a proprietary file format that is not publicly defined.
Before choosing a specific image format, you should ensure the repository
software is able to extract metadata and that editing software does not corrupt
the data if changes are made at a later date. To increase the likelihood of this,
you should take one of the following approaches:
- Convert image data to a file format that supports a known metadata structure
(e.g. Exif, TIFF, SPIFF and Flashpix).
- Use a vendor-neutral, technology-independent and well-documented metadata
standard, preferably one written in XML (e.g. DIG35, Z39.87 and MIX).
- Investigate the solutions offered by the DIG35 [1] and the FILTER [2] projects,
which are developing a set of templates for consistent description of images.
Although these measures will not guarantee interoperability, they will increase
the likelihood that it can be achieved.
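To show why an XML-based record helps, the following minimal sketch writes a
metadata record and reads it back using only the Python standard library. The
element names (imageMetadata, format, width) are invented for illustration and do
not follow the DIG35, Z39.87 or MIX schemas; a production record should use the
element set defined by the chosen standard.

```python
import xml.etree.ElementTree as ET


def build_record(identifier, fields):
    """Serialise metadata fields as XML; element names here are hypothetical."""
    root = ET.Element("imageMetadata", attrib={"id": identifier})
    for name, value in fields.items():
        # Each key must be a valid XML element name.
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_record(xml_text):
    """Parse a record back into its identifier and a dict of text values."""
    root = ET.fromstring(xml_text)
    return root.get("id"), {child.tag: child.text for child in root}


xml_text = build_record("img-0001", {"format": "image/tiff", "width": 2048})
identifier, fields = read_record(xml_text)
print(identifier, fields)
```

Because the record is plain XML, any application with an XML parser can read it
without knowledge of the image file format, although values come back as text and
must be converted by the reader.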
Structuring Your Image Collection
To organise your image collection into a defined structure, it is advisable to
develop a controlled vocabulary. If providing an online resource, it is useful
to identify your potential users, the academic discipline from which they originate,
and the language they will use to locate images. Many repositories have a
well-defined user community (e.g. archaeology, physics or sociology) that shares
a common language and similar goals. In a multi-discipline collection, it is much
more difficult to predict the terms a user will use to locate images. The US
Library of Congress [3], the New Zealand Time Frames [4] and the International
Press Telecommunications Council (IPTC) [5] provide online examples of how a
controlled vocabulary hierarchy may be used to catalogue images.
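A controlled vocabulary hierarchy can be sketched as a nested structure in which
every permitted term has a single place. The terms below are invented for
illustration and are not taken from any of the vocabularies cited above; the
find_path function is likewise a hypothetical helper.

```python
# Hypothetical vocabulary: each key is a permitted term, each value its
# narrower terms.
VOCABULARY = {
    "architecture": {"bridges": {}, "churches": {"cathedrals": {}}},
    "landscapes": {"coastlines": {}, "mountains": {}},
}


def find_path(term, tree=VOCABULARY, trail=()):
    """Return the terms from the top level down to `term`, or None if absent."""
    for name, children in tree.items():
        path = trail + (name,)
        if name == term:
            return path
        found = find_path(term, children, path)
        if found:
            return found
    return None


print(find_path("cathedrals"))  # ('architecture', 'churches', 'cathedrals')
print(find_path("castles"))     # None: not a permitted term
```

Rejecting terms that return None keeps cataloguing consistent, and the returned
path allows an image tagged with a narrow term to be retrieved under its broader
terms as well.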
References