4. Findings - issues for consideration at the meeting

4.1 Background

The draft policy statement from the Australian National Preservation Office is a clear and concise statement of the current situation (in Australia and elsewhere):

" As the twentieth century draws to a close, an ever-increasing quantity of information is created, stored, disseminated and networked in digital form. Digital objects, many of which are dynamic in nature, are created by a variety of creators for a number of purposes. Digital objects include data stored in digital form and accessed using electronic equipment. Examples are databases, images, sound, video, documents, etc...

The organisations charged with the responsibility of preserving and making available Australia's cultural and intellectual heritage will need to develop a range of strategies to address the preservation of and access to various categories of digital objects. Custodial and non-custodial arrangements will need to be considered both from a preservation and an access perspective and will need to be considered prior to creation if possible and throughout the life of the object."

(National Library of Australia. National Preservation Office. Draft statement of principles on the preservation of and long-term access to Australian digital objects, 1996. p.1-2.)

4.2 Critical analysis of the CPA/RLG report 'Preserving Digital Information'

This summary highlights the key points made in the CPA/RLG report which we felt to be of relevance for consideration in the UK. The report is covered section by section below.

A digital object is defined as a named collection of bytes, which may recursively include digital objects. Terminology generally in the field of digital preservation is inchoate and susceptible to misinterpretation.
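The recursive definition above can be pictured as a small data structure. What follows is only an illustrative sketch in Python: the class name `DigitalObject`, its fields and the example file names are hypothetical and do not come from the CPA/RLG report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DigitalObject:
    """A named collection of bytes that may itself contain digital objects."""
    name: str
    data: bytes = b""
    parts: List["DigitalObject"] = field(default_factory=list)

    def total_size(self) -> int:
        """Count the bytes held by this object and, recursively, by its parts."""
        return len(self.data) + sum(part.total_size() for part in self.parts)


# A compound object: a report that embeds a text file and an image.
report = DigitalObject(
    name="report",
    parts=[
        DigitalObject(name="body.txt", data=b"plain ASCII text"),
        DigitalObject(name="figure.img", data=bytes(8)),
    ],
)
```

Here `report.total_size()` walks the nested parts, which is the sense in which a digital object "recursively includes" others.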

Introduction

The short-term benefits of digital objects (manipulation, distribution, duplication, linking) are immense. The long-term storage of digital objects, however, is nearly impossible because of the ever-changing technology needed for their use and storage. This contradiction is the impetus for digital preservation.

The challenge of archiving digital information

Preserving digital objects on physical media alone does not preserve the technology needed for them to be usable. Refreshing, by recopying digital objects, only lengthens lifespan if the storage format is independent of any particular technology. ASCII text is about the only format that can currently be considered to be independent of technology. Not all digital objects, however, can be reduced to ASCII text.
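Refreshing, as described above, amounts to recopying an object onto fresh media and confirming that nothing was lost in the copy. A minimal sketch follows, assuming the objects are ordinary files; the function names are hypothetical, and the use of a SHA-256 checksum to confirm the copy is our assumption rather than anything prescribed by the report.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> bool:
    """Recopy `source` to `target` and report whether the copy is bit-identical."""
    shutil.copyfile(source, target)
    return sha256_of(source) == sha256_of(target)
```

Note that this preserves only the bytes. It does nothing to keep the format readable, which is why refreshing alone helps only for technology-independent formats such as ASCII text.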

Migration is the storage of digital objects in a format (or a series of formats over time) which is independent of particular technologies. Migration is a much more complex task than refreshing. While refreshing has been attempted for some years and is understood, migration is new and mostly unknown.

The long-term future of the intellectual property rights held for digital objects is uncertain. The enormous variety of agencies publishing and distributing digital objects, and the numerous and ill-defined roles of the creators of digital objects, make legal responsibility unclear. There is as yet no legal deposit for digital objects, so there is no impartial source for such rights.

The widespread usage of digital objects means that their preservation is a general problem throughout society. Failure to preserve them adequately will surely damage future scholarship and weaken the cultural heritage.

Information objects in the digital landscape

The Integrity of Digital Information
Digital objects allow the common storage of hitherto disparate media (text, images, sound, video etc). At the same time, digital objects can be mixed and linked in novel ways. This intermingling of media within and across digital objects, which is unique to them, needs preserving. However there are no precedents or precursors for preserving entities of this nature.

Content
While content in terms of bytes is common to all digital objects, the storage and interpretation of those bytes are hardware and software dependent. Migration is the preservation of the intellectual content of digital objects, but not necessarily their physical form according to hardware and software dependencies. Preservation of digital objects cannot merely be about format, as content is unrelated to the storage medium.

Fixity
Digital objects must have their content fixed before preservation can take place. This fixing must be done in a manner independent of hardware and software dependencies, as these will not be preserved over time by migration. Certain digital objects whose content changes rapidly are impossible to fix permanently and can only be fixed as versions linked to date/time stamps.
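Fixing a rapidly changing object as a series of date/time-stamped versions might look like the following. This is a sketch under stated assumptions: the record type `FixedVersion` and both function names are hypothetical, and recording a SHA-256 digest as the hardware- and software-independent "fix" is our illustration, not a mechanism named in the report.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FixedVersion:
    """One fixed snapshot of an object's content (hypothetical record)."""
    digest: str         # SHA-256 of the content bytes
    fixed_at: datetime  # date/time stamp identifying this version


def fix_version(content: bytes, when: datetime) -> FixedVersion:
    """Record a snapshot of `content` as a version stamped with `when`."""
    return FixedVersion(hashlib.sha256(content).hexdigest(), when)


def is_unchanged(content: bytes, version: FixedVersion) -> bool:
    """Check whether `content` still matches a previously fixed version."""
    return hashlib.sha256(content).hexdigest() == version.digest


# Example: fix the state of a changing page at a given moment.
stamp = datetime(1996, 11, 26, 12, 0, tzinfo=timezone.utc)
version = fix_version(b"page as it stood at noon", stamp)
```

Any later copy of the content can then be tested against `version` with `is_unchanged`, whatever medium it is held on.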

Reference
Each digital object needs a full description of its nature and content. Such a description could be incorporated as part of the digital object itself. The accepted term for a description integral to the item it describes is ‘metadata’. There are metadata standards under development, but none are currently geared towards the needs of digital preservation.

Provenance
Metadata for a digital object must include its provenance in terms of the individuals who have used it, its function(s) in owning organisations and the source of any data it contains.

Context
Metadata for a digital object must contain its hardware and software dependencies, its links with other digital objects and the social context in which it was created and used.

Stakeholder Interests
A digital archive must hold descriptions of the content, fixity, reference, provenance and context of each of the digital objects it stores and any associated stakeholders (persons) involved in its life cycle. These stakeholders may be involved with publication or distribution or may be part of a team responsible for creation. Stakeholders may come from a large range of organisations, reflecting the ubiquity of digital objects.
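The descriptive categories listed above (reference, provenance, context and stakeholders) could be gathered into a single record per object. The sketch below is one hypothetical layout; every class and field name is an illustrative assumption and does not follow any existing metadata standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provenance:
    """Who used the object, what it was for, and where its data came from."""
    users: List[str] = field(default_factory=list)
    functions: List[str] = field(default_factory=list)
    data_sources: List[str] = field(default_factory=list)


@dataclass
class Context:
    """Technical dependencies, links to other objects, and social setting."""
    hardware: List[str] = field(default_factory=list)
    software: List[str] = field(default_factory=list)
    linked_objects: List[str] = field(default_factory=list)
    social_context: str = ""


@dataclass
class ArchiveRecord:
    """Hypothetical archive-side description of one digital object."""
    reference: str = ""  # description of the object's nature and content
    provenance: Provenance = field(default_factory=Provenance)
    context: Context = field(default_factory=Context)
    stakeholders: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """List the descriptive categories that are still empty."""
        gaps = []
        if not self.reference:
            gaps.append("reference")
        p = self.provenance
        if not (p.users or p.functions or p.data_sources):
            gaps.append("provenance")
        if not (self.context.hardware or self.context.software):
            gaps.append("context")
        if not self.stakeholders:
            gaps.append("stakeholders")
        return gaps
```

A check such as `missing()` is one way an archive might flag incomplete descriptions at accession.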

Archival roles and responsibilities

General Principles
Digital archives should form a distributed, linked structure. A single, monolithic archive would be unsuited to digital objects for a number of reasons. Their range and ever-evolving nature militate against one archive, as does the ease with which digital objects can be managed over a global network. Although a distributed set of archives would be a suitable infrastructure for digital archiving, this networked infrastructure will raise new problems of management not faced by previous archiving services.

The initial creators of digital objects carry a responsibility for their preservation. However, if the initial creator reneges on this responsibility, then a digital archive might have to step in. How responsibility would be transferred, both legally and administratively, is far from clear. Legislation is needed.

Legal deposit would aid in removing legal uncertainty. However, the legal deposit of digital objects would seem to need a standard for the format of deposited objects, which is unlikely for technological reasons. The legal definition of a digital object would be difficult.

The Operating Environment of Digital Archives
Standards for the operation of, and services provided by, digital archives are needed. As yet, there are no such commonly agreed standards, but rather a diverse collection of local ad-hoc standards. What is lacking is the long-term experience of digital archiving necessary for the encoding of standards.

Appraisal and Selection
Since the volume of digital objects precludes universal preservation, selection must take place. As with general guidelines for digital archives, there are no common selection procedures. Existing digital archives tend to cover specific types of digital objects or digital objects created at a common time and place.

The management of selected digital objects requires knowledge of their current storage location and retention value, so that weeding can be done. Again, weeding of digital objects is a new practice with little tradition to guide decisions.

Accession
Accession to a digital archive involves giving digital objects descriptions to enable their storage, retrieval and management.

Digital objects, because of their fluid nature, require special authentication mechanisms. There is an application here for cryptographic signatures and watermarks.
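As a rough illustration of cryptographic authentication, the sketch below tags an object's content with a keyed hash (HMAC) and later verifies it. HMAC is a simpler stand-in for the public-key signatures the text has in mind, used here only because it needs nothing beyond the Python standard library; the function names and the example key are hypothetical.

```python
import hashlib
import hmac


def sign(content: bytes, key: bytes) -> str:
    """Produce an authentication tag for `content` under a secret `key`."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify(content: bytes, key: bytes, tag: str) -> bool:
    """Check that `content` has not been altered since it was tagged."""
    return hmac.compare_digest(sign(content, key), tag)


# Example: the archive tags an object at accession (key is illustrative only).
archive_key = b"example-key-held-by-the-archive"
tag = sign(b"accessioned content", archive_key)
```

Unlike a true digital signature, an HMAC tag can be checked only by a party holding the same secret key, so it authenticates content within an archive rather than to the public at large.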

Storage
Digital archives should store digital objects according to their expected frequency of use. In a properly organised hierarchical storage, speed of access will be directly linked to expected frequency of use, with levels of access being provided by different storage topologies.
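The link between expected frequency of use and storage level can be expressed as a simple rule. The following sketch is hypothetical throughout: the tier names and the thresholds are illustrative assumptions, not figures from the report.

```python
def choose_tier(expected_accesses_per_year: float) -> str:
    """Map expected frequency of use to a storage tier (illustrative thresholds)."""
    if expected_accesses_per_year >= 100:
        return "online"    # fastest access, for heavily used objects
    if expected_accesses_per_year >= 1:
        return "nearline"  # slower access, for occasional use
    return "offline"       # slowest access, for rarely used objects
```

In practice an archive would set such thresholds from its own usage data and revise them as patterns of use change.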

Digital objects may need to be duplicated to ensure security of storage. Duplication may be but one of the measures a digital archive would need to ensure the security of its collection. One thing a digital archive could never do is allow technical failure to harm or destroy any of its collection.

Access
Digital archives and their holdings of digital objects should be accessible via a network. Since digital objects are inherently networkable, this seems only practical. However the possible security risks of network access would have to be considered and guarded against.

Migration Strategies
Migration strategies for simple digital objects exist, but those for more complex digital objects are still to be developed. For migration purposes, the content of a digital object may be preserved whole, or in part, or as a surrogate (summary). The different levels of preservation relate in part to the complexity of the digital objects being preserved. Migration of the whole digital object might not be practical.

Change Media
Digital objects which exist as ASCII files or as files with simple, regular structures, can be copied between different storage media, but more complex digital objects cannot be.

Change Format
One way of attempting to migrate complex digital objects is to break them down into a collection of simpler digital objects, which are more suited for migration and which can be recombined later to form the complex original.

Incorporate Standards
Standards can be expected to develop in business where there is a need to migrate between generations of hardware and software systems. However there is a danger that such standards may not completely meet the needs of digital preservation in digital archives. Dialogue between the business and digital archive communities is essential.

Build Migration Paths
Ideally, standards from the business community need to be compatible with any standards developed by the preservation community for use in digital archives. Any standard must allow the current generation of hardware and software to access data in digital objects created by an earlier generation.

Use Processing Centres
Specialist centres will be needed to cope with the enormous range of formats that digital objects can be stored in, especially those formats from obsolete or unusual technologies, and prepare those formats for migration. Such specialist centres will support the work of digital archives dealing with digital objects produced by current technologies.

Managing Costs and Finances
The cost of operating digital archives will come from selecting, accessing and describing digital objects, managing intellectual property rights and building systems to allow migration. These costs will vary over time and according to the type of digital object and its usage. It is currently difficult to forecast with any high degree of certainty what these costs will actually be. A prime weakness of digital archiving is that it is a cost-unknown venture.

Migration costs will be linked to the complexity of digital objects, their description, authentication and possible compensation to their intellectual property holders. Again, there is little experience on which to base judgements of cost against the characteristics of digital objects being archived.

Just as there is scant knowledge of costs of the procedures for archiving digital objects, there is also no experience of how digital archives can achieve cost efficiency of operation.

The Yale Cost Model
The ‘Yale Cost Model’ is one of a few possible precursors for the digital archive cost model. Project Open Book, at Yale University Library, which digitised over 2,000 texts, concluded that technological innovations did not save money, but rather organisational innovations might do so. There was a tendency for costs of the preservation of digital materials to rise over time, unlike costs for paper materials.

Just as funding for digital archives is uncertain, so it is not known how digital archives might generate income through their operation.

Summary and recommendations

NB Recommendations from the CPA/RLG report are numbered and quoted in full emboldened text, with page numbers following.

Pilot Projects
1. Solicit proposals from existing and potential digital archives around the country and provide coordinating services for selected participants in a cooperative project designed to place information objects from the early digital age into trust for use by future generations. (p.41)

While rescue projects for old digital objects are valuable, they have little impact outside of the limits of the material that they rescue. They are solving problems raised by obsolescent technology. It is more valuable to try and perfect digital preservation for existing material, on current technology, as this has wider application.

2. Secure funding and sponsor an open competition for proposals to advance digital archives, particularly with respect to removing legal and economic barriers to preservation. (p.41)

This proposal arises from examining the Digital Libraries programme. However digital libraries are a different beast from digital archives. In digital library research the aim is to find a new role for a traditional organisation. For digital archives the nature of the research is more fundamental, to define the role. While research is certainly needed into legal and economic preservation issues, an ‘open competition’ might not be the best way to achieve the fundamental research needed.

3. Foster practical experiments or demonstration projects in the archival application of technologies and services, such as hardware and software emulation algorithms, transaction systems for property rights and authentication mechanisms, which promise to facilitate the preservation of the cultural record in digital form. (p.41)

Emulation is a preservation option which runs against the main thrust of the Report, which is towards migration. Research into transaction systems for copyright and authentication, though, is certainly needed.

Support Structures
4. Engage actively in national policy efforts to design and develop the national information infrastructure to ensure that longevity of information is an explicit goal. (p.42)

A digital preservation policy should be part and parcel of any national information policy. There is little point investing in digital libraries and information superhighways without digital preservation.

5. Sponsor the preparation of a white paper on the legal and institutional foundations needed for the development of effective fail-safe mechanisms to support the aggressive rescue of endangered digital information. (p.42)

While rescue of endangered digital information is a good thing, it is difficult to see at this early stage how ‘aggressive’ such a rescue could be. This presupposes that digital archives know how to select endangered material and how to preserve it once they have rescued it.

6. Organize representatives of professional societies from a variety of disciplines in a series of forums designed to elicit creative thinking about the means of creating and financing digital archives of specific bodies of information. (p.42)

Sources of funding for digital archives do need consideration. However any such sources ought to reflect the deep and serious nature of the problem, and not be ad hoc.

7. Institute a dialogue among the appropriate organizations and individuals on the standards, criteria and mechanisms needed to certify repositories of digital information as archives. (p.42-43)

Certification of digital archives is certainly a necessary future condition of their appearance and operation. However, it is too early to know how a digital archive should function to be certified and, indeed, to find individuals or organisations with the knowledge and experience to do the certification.

8. Identify an administrative point of contact for coordinating digital preservation initiatives in the United States with similar efforts abroad. (p.43)

A vital proposal which applies as much to the UK as to the USA.

Best Practices
9. Commission follow-on case studies of digital archiving to identify current best practices and to benchmark costs in the following areas:

a. The design of systems that facilitate archiving at the creation stage.

b. Storage of massive quantities of culturally valuable digital information.

c. Requirements and standards for describing and managing digital information.

d. Migration paths for digital preservation of culturally valuable digital information (p.43-44)

None of the above recommendations for study can realistically be faulted. They are all crucial. Especially valuable is the intent to discover best current practice and disseminate it.

4.3 Comment on the CPA/RLG Task Force Report

Comment in this section is summarised briefly through the use of headings, which reflect key content. All documents referred to (except the CURL response, below) can be accessed electronically - URLs are given.

Australia

National Library of Australia
National Library of Australia. Guidelines for the Management of Electronic Records, Documents and Publications: a Towards Federation 2001 (TF2001) Progress Report. 1996. (http://www.nla.gov.au/3/npo/conf/npo95kp.html)

Points made cover:

Actions include:

PADI: Preserving Access to Digital Information
(http://www.nla.gov.au/dnc/tf2001/padi/padi.html)

Working group made up of the Australian Archives, the Australian Council of Libraries, the National Preservation Office, and the National Film and Sound Archive. Established in 1993.

Main goals:

  1. Information site
  2. Discussion forum
  3. Strategies

National Preservation Office
The Australian National Preservation Office issued a Draft statement of Principles on the Preservation of and Long-Term Access to Australian Digital Objects. 1996
(http://www.nla.gov.au/3/npo/natco/draft.html)

In it they set out various principles including:

Note: the draft statement has now been replaced by a full statement of principles: National Library of Australia National Preservation Office. Statement of Principles: Preservation of and Long-Term Access to Australian Digital Objects. 1997
(http://www.nla.gov.au/3/npo/natco/princ.html)

Canada

National Library of Canada
Electronic Publications Pilot Project. Summary of the final report. 1996
(http://www.nlc-bnc.ca/eppp/ereport.htm)

Issues addressed:

Standard Formats

Access and Copyright

Storage

Legal Deposit

Technology

Personnel

UK

CURL (Consortium of University Research Libraries)
CURL’s response to the draft report (Response by the Consortium of University Research Libraries (CURL) to the CPA/RLG Draft Report “Preserving Digital Information”, 1996):

Warwick Workshop
Long term preservation of electronic materials. A JISC/British Library Workshop as part of the Electronic Libraries Programme (eLib). Organised by UKOLN 27th and 28th November 1995 at the University of Warwick. Report prepared by the Mark Fresko Consultancy. The British Library, 1996. (British Library R&D Report 6238).
(http://ukoln.bath.ac.uk/fresko/warwick/intro.html)

Key features include:

Strategy

Collection

Suggestions for action:

Preservation Policy

Practical Implications

Issues raised:

Further actions:

Other General Comments

These were included as they illustrate and reinforce points made elsewhere.

"I do not have confidence in the ability of the publishers to take this on. They are much more subject to takeover and closure than research libraries are. In addition, they have never before shouldered this responsibility and I don't see them suddenly getting an interest in it now. Sooner or later, they would have to say: where is the commercial benefit? ...The main qualification I would make is that learned society publishers may find themselves able to take on archiving since they have always accepted a greater responsibility for meeting the needs of the intellectual community they represent."

(Bernard Naylor, email to arl-ejournal mailing list, December 4th 1996)

“These stories were easy to gather using CNN’s keyword search function. I e-mailed for permission and got a quick response saying that links to the CNN site are Ok, although use of their logo is restricted. Of more concern is their response on how long the stories are available:-

‘Our stories are archived, but some stories, depending on their source, are deleted after a period of time. We are not at liberty to release which stories must be deleted’."

(John Kupersmith, email to web4lib mailing list, November 26th 1996)

"This [migration] to me represents the true cost of archiving, more than the cost for hardware/memory, etc. In accepting the burden of archiving, the archiving agency is also accepting an obligation to refresh data and formats as required-maybe as often as every five years. For materials with relatively little economic payback, this is a daunting obligation."

(Sandra Whisler, email to arl-ejournal mailing list, November 21st 1996)

"Long-term digital archiving is...very expensive, and it requires special skills and technical infrastructures which most libraries will not be able to acquire."

(John Mackenzie Owen, NBBI Ltd, email to arl-ejournal mailing list, November 27th 1996)

"Although the cost of digital archiving is high, it need not be a problem if we chose a different organisational model for the digital archive. The lower cost of print archiving has to be multiplied by the number of libraries/archives world-wide that include a publication in their collection. That number, and therefore the global cost, is many times higher than the cost of storing a digital publication in one location."

(John Mackenzie Owen, NBBI Ltd, email to arl-ejournal mailing list, November 27th 1996)

4.4 Issues for Consideration

The following issues, categorised by broad heading, arising out of analysis of the CPA/RLG Report (see 4.2 above), comment on the Report (see 4.3 above) and other documents, should be considered with relevance to the UK.

Legal

Intellectual property: Protection of creator's rights. Will copyright become unmanageable with the increase of digital information available? Investigate attitudes to intellectual property and responsibility for archiving.

Legal deposit. Is it viable? Present legislation is outdated and therefore unclear on what is covered. Tax incentives are not enough to ensure survival of information; there must be legislation. Replace existing legal deposit with electronic deposit? The British Library has recently submitted a report to government on the legal deposit of electronic publications (Proposal for the legal deposit of non-print publications: to the Department of National Heritage from the British Library. London: British Library Board, 1996). Note: in response, the government has produced a consultation paper, Department of National Heritage, Scottish Office, Welsh Office, Department of Education Northern Ireland. Legal deposit of publications: a consultation paper: February 1997. Department of National Heritage, 1997.

Legally binding responsibility: CPA/RLG report: "first line of defence lies with the creators, providers and owners". How can this be legally binding?

Creating Standards

Standards are needed for:

Migration and/or refreshing procedures.

Descriptive information (similar to cataloguing details), to include details such as provenance, ownership, change in format or structure, etc. (metadata). "Some producers are already taking some actions, for example publishing materials already fit for preservation. Consequently there is a need to select and encourage the use of a set of common standards" (Warwick Workshop, p.55).

Certification of digital archives/archivists. By whom?

Education and Further Research

Achieve overview of current and forthcoming digital projects (both archival specific and access led). "There is a need to survey activities already in progress, in order to provide a baseline for future efforts". (Warwick Workshop, p.55).

Identify what policy statements exist to facilitate the production of guidelines.

Identify further areas of research/pilots, for example, how to deal with non-static resources such as bulletin boards, databases, selected Internet resources, like the Human Genome Project.

Identify training needs in libraries and archives concerning digital preservation. Identify library and information studies and other educational courses relevant to digital preservation with a view to developing teaching materials.

Policy and Organisational

Selection: Can/Should ALL information be ‘saved’? How can qualitative assessment of the value of information be carried out? Issues such as censorship, misjudgement of individuals or organisations responsible: standardisation? Is random selection an option?

Policy Statement: Organisations need clear policy statements (e.g. on selection for preservation).

Which organisations will ‘hold’ material? Would either one major repository or a network of digital archives (as the CPA/RLG report suggests) be better?

Awareness Raising (Warwick Workshop, p.56).

Training/Education: Are there suitably qualified people available at present?

Creating standards: Archival process, certification schemes (e.g. for migration procedures, digital archives/archivists).

"Fail safe mechanism": How would this be implemented? For example how would digital archives know if information was in jeopardy, especially if the danger was neglect?

Collaboration

Collaboration should be both national and international and public and private sector. Specific issues include:

Involve IT suppliers: "Actions should be initiated to "sensitise" producers of data to the need for, and the issues concerning, preservation." (Warwick Workshop, p.53)

Access issues: such as are preservation and access indistinguishable for digital materials? Is the Internet the best mechanism of access? (Warwick Workshop, p.55)

Costs

Digital archives (e.g. 'up-front', implementation, running)

Costs of research/pilots

Training.

4.5 Initial Prioritised Actions

Based on the issues for consideration given above, the following recommendations, intended to generate activity on digital preservation in the UK, are suggested.

N.b. Emboldened numbers in brackets refer to the numbered recommendations in the CPA/RLG report, which are given in section 4.2 above.

  1. Appoint a National Digital Preservation Officer (based at the National Preservation Office?) (8).

  2. Establish a representative National Digital Preservation body to work with and support the activities of the National Digital Preservation Officer (8).

  3. Facilitate public and private sector representation and involvement (6).

    Technical: Identify stakeholders and develop links between them. Establish professional and industrial standards by consent (7).

    Legal: lobby for legislation. Seek appropriate legal representation on national body (5).

    Financial: Identify funders (2 & 6).

    Organisational: Coordination of storage and access (1 & 9).

  4. Raise awareness. Promote education, training and discussion. Create a UK-based Web site.

  5. Establish and maintain a UK (-based) discussion list, "Digpres" (could be used for notification to and dissemination by National Digital Preservation Officer).

  6. Investigate current and proposed digital archival practice and policy nationally and internationally. Achieve overview (1 & 3).

  7. Identify through research/pilots good practice, problems, e.g. selection, metadata, standards and gaps in knowledge (9).

  8. Devise guidelines on practice and a digital preservation policy within the national preservation policy and national information initiatives and within an international context (4).

  9. Liaise with Library and Information Commission concerning its 20/20 Vision work programme (e.g. with regard to input to development of National Information policy) (4).

  10. Create and maintain a database of 'who's doing what?' (Current directory of UK digital archives) as a source of information and expertise.


Web version of this report by Alan Poulter