4. Findings - issues for consideration at the meeting
4.1 Background
The draft policy statement from the Australian National Preservation Office is a clear and
concise statement of the current situation (in Australia and elsewhere):
"As the twentieth century draws to a close, an ever-increasing quantity of information is
created, stored, disseminated and networked in digital form. Digital objects, many of which
are dynamic in nature, are created by a variety of creators for a number of purposes. Digital
objects include data stored in digital form and accessed using electronic equipment.
Examples are databases, images, sound, video, documents, etc...
The organisations charged with the responsibility of preserving and making available
Australia's cultural and intellectual heritage will need to develop a range of strategies to
address the preservation of and access to various categories of digital objects. Custodial and
non-custodial arrangements will need to be considered both from a preservation and an
access perspective and will need to be considered prior to creation if possible and throughout
the life of the object."
(National Library of Australia. National Preservation Office. Draft statement of principles on
the preservation of and long-term access to Australian digital objects, 1996. p.1-2.)
4.2 Critical analysis of the CPA/RLG report 'Preserving Digital Information'
This summary highlights the key points made in the CPA/RLG report which we felt to be of
relevance for consideration in the UK. The report is covered section by section below.
A digital object is defined as a named collection of bytes, which may recursively include
digital objects. Terminology generally in the field of digital preservation is inchoate and
susceptible to misinterpretation.
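The recursive definition above can be illustrated with a minimal sketch. The class name, its fields and the example objects are assumptions made for illustration only; they do not come from the CPA/RLG report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DigitalObject:
    """A named collection of bytes that may contain further digital objects."""
    name: str
    data: bytes = b""
    children: List["DigitalObject"] = field(default_factory=list)

    def total_size(self) -> int:
        """Count the bytes held by this object and everything nested inside it."""
        return len(self.data) + sum(child.total_size() for child in self.children)

    def names(self) -> List[str]:
        """List this object's name followed by the names of all nested objects."""
        found = [self.name]
        for child in self.children:
            found.extend(child.names())
        return found


# Hypothetical example: a report that embeds an image and a data table.
report = DigitalObject(
    name="report",
    data=b"Main text",
    children=[
        DigitalObject(name="figure-1", data=b"\x89PNG"),
        DigitalObject(name="table-1", data=b"a,b\n1,2\n"),
    ],
)
```

Any operation on such an object, whether description, copying or migration, has to walk the whole nested structure and not only the top-level bytes.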
Introduction
The short-term benefits of digital objects (manipulation, distribution, duplication, linking) are
immense. The long-term storage of digital objects, however, is near impossible because of the
ever-changing technology needed for their use and storage. This contradiction is the impetus
for digital preservation.
The challenge of archiving digital information
Preserving digital objects on physical media alone does not preserve the technology needed for
them to be usable. Refreshing, by recopying digital objects, only lengthens lifespan if the
storage format is independent of any particular technology. ASCII text is about the only format
that can currently be considered to be independent of technology. Not all digital objects,
however, can be reduced to ASCII text.
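Refreshing, as described above, might be sketched as a copy followed by a check that the new copy is identical to the old one. The use of a SHA-256 checksum for that check, the function names and the file names are assumptions for illustration; the report does not prescribe a verification method.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy source to target and confirm the copy is bit-for-bit identical."""
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"refresh of {source} failed verification")
    return after


# Hypothetical demonstration: two files stand in for an old and a new medium.
with tempfile.TemporaryDirectory() as workdir:
    old_copy = Path(workdir) / "old_medium.txt"
    new_copy = Path(workdir) / "new_medium.txt"
    old_copy.write_bytes(b"plain ASCII text survives recopying")
    checksum = refresh(old_copy, new_copy)
    refreshed_bytes = new_copy.read_bytes()
```

The sketch preserves the bytes only. Whether those bytes remain usable still depends on the format being independent of any particular technology.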
Migration is the storage of digital objects in a format (or series of formats over time) which is
independent of particular technologies. Migration is a much more complex task than
refreshing. While refreshing has been practised for some years and is understood, migration is
new and mostly unknown.
The long term future of the intellectual property rights held for digital objects is uncertain. The
enormous variety of agencies publishing and distributing digital objects, and the numerous and
ill-defined roles of the creators of digital objects make legal responsibility unclear. There is as
yet no legal deposit for digital objects, so there is no impartial source for such rights.
The widespread usage of digital objects means that their preservation is a general problem
throughout society. Failure to preserve them adequately will surely damage future scholarship
and weaken the cultural heritage.
Information objects in the digital landscape
The Integrity of Digital Information
Digital objects allow the common storage of hitherto disparate media (text, images, sound,
video etc). At the same time, digital objects can be mixed and linked in novel ways. This
intermingling of media within and across digital objects, which is unique to them, needs
preserving. However there are no precedents or precursors for preserving entities of this
nature.
Content
While content in terms of bytes is common to all digital objects, the storage and interpretation
of those bytes are hardware and software dependent. Migration is the preservation of the
intellectual content of digital objects, but not necessarily their physical form according to
hardware and software dependencies. Preservation of digital objects cannot merely be about
format, as content is unrelated to the storage medium.
Fixity
Digital objects must have their content fixed before preservation can take place. This fixing
must be done in a manner independent of hardware and software dependencies, as these will
not be preserved over time by migration. Certain digital objects whose content changes rapidly
are impossible to fix permanently and can only be fixed as versions linked to date/time stamps.
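Fixing a rapidly changing object as a series of versions linked to date/time stamps might look like the following sketch. The class names, the added checksum and the example postings are assumptions for illustration and are not taken from the report.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class FixedVersion:
    """One immutable snapshot of an object's content at a known moment."""
    captured_at: datetime
    content: bytes
    digest: str


class VersionedObject:
    """A changing object held as a list of fixed, time-stamped versions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.versions: List[FixedVersion] = []

    def fix(self, content: bytes, captured_at: Optional[datetime] = None) -> FixedVersion:
        """Record the current content as a new fixed version."""
        stamp = captured_at or datetime.now(timezone.utc)
        version = FixedVersion(stamp, content, hashlib.sha256(content).hexdigest())
        self.versions.append(version)
        return version

    def as_of(self, moment: datetime) -> Optional[FixedVersion]:
        """Return the latest version captured at or before the given moment."""
        earlier = [v for v in self.versions if v.captured_at <= moment]
        return max(earlier, key=lambda v: v.captured_at) if earlier else None


# Hypothetical example: two snapshots of a bulletin board.
board = VersionedObject("bulletin-board")
board.fix(b"first posting", datetime(1996, 11, 1, tzinfo=timezone.utc))
board.fix(b"first posting\nsecond posting", datetime(1996, 12, 1, tzinfo=timezone.utc))
snapshot = board.as_of(datetime(1996, 11, 15, tzinfo=timezone.utc))
```

A request for the object "as it stood" on a given date then resolves to the most recent snapshot before that date, or to nothing if no snapshot had yet been taken.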
Reference
Each digital object needs a full description of its nature and content. Such a description could
be incorporated as part of the digital object itself. The accepted term for a description integral
to the item it describes is ‘metadata’. There are metadata standards under development, but
none are currently geared towards the needs of digital preservation.
Provenance
Metadata for a digital object must include its provenance in terms of the individuals who have
used it, its function(s) in owning organisations and the source of any data it contains.
Context
Metadata for a digital object must contain its hardware and software dependencies, its links
with other digital objects and the social context in which it was created and used.
Stakeholder Interests
A digital archive must hold descriptions of the content, fixity, reference, provenance and
context of each of the digital objects it stores and any associated stakeholders (persons)
involved in its life cycle. These stakeholders may be involved with publication or distribution
or may be part of a team responsible for creation. Stakeholders may come from a large range
of organisations, reflecting the ubiquity of digital objects.
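The descriptions a digital archive must hold could be gathered into a single record per object, as in the sketch below. The field names follow the headings above, but the record layout, the JSON serialisation and all example values are assumptions for illustration; no existing metadata standard is implied.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ArchiveRecord:
    """Descriptions held by an archive for one digital object."""
    content: str
    fixity: str
    reference: str
    provenance: List[str] = field(default_factory=list)
    context: Dict[str, str] = field(default_factory=dict)
    stakeholders: List[str] = field(default_factory=list)

    def missing_elements(self) -> List[str]:
        """Name every element that has been left empty."""
        return [name for name, value in asdict(self).items() if not value]

    def to_text(self) -> str:
        """Serialise the record as plain ASCII text."""
        return json.dumps(asdict(self), indent=2, sort_keys=True, ensure_ascii=True)


# Hypothetical record with no stakeholders identified yet.
record = ArchiveRecord(
    content="Survey responses, 1995",
    fixity="Version fixed 1996-01-31",
    reference="Hypothetical identifier: survey-1995",
    provenance=["Created by a hypothetical research unit"],
    context={"software": "hypothetical spreadsheet package"},
)
```

Here `record.missing_elements()` reports that the stakeholders have still to be described, which is the kind of check an archive might run at accession.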
Archival roles and responsibilities
General Principles
Digital archives should form a distributed, linked structure. A single, monolithic archive
would be unsuited to digital objects for a number of reasons. Their range and ever-evolving
nature militate against one archive, as does the ease with which digital objects can be
managed over a global network. Although a distributed set of archives would be a suitable
infrastructure for digital archiving, this networked infrastructure will raise new problems of
management not faced by previous archiving services.
The initial creators of digital objects carry a responsibility for their preservation. However, if
the initial creator reneges on this responsibility, then a digital archive might have to step in.
How responsibility would be transferred, both legally and administratively, is far from clear.
Legislation is needed.
Legal deposit would aid in removing legal uncertainty. However the legal deposit of digital
objects would seem to need a standard for the format for deposited objects, which is unlikely
for technological reasons. The legal definition of a digital object would be difficult.
The Operating Environment of Digital Archives
Standards for the operation of, and services provided by, digital archives, are needed. As yet,
there are no such commonly agreed standards, but rather a diverse collection of local ad-hoc
standards. What is lacking is the long-term experience of digital archiving necessary for the
encoding of standards.
Appraisal and Selection
Since the volume of digital objects precludes universal preservation, selection must take place.
As with general guidelines for digital archives, there are no common selection procedures.
Existing digital archives tend to cover specific types of digital objects or digital objects created
at a common time and place.
The management of selected digital objects requires knowledge of their current storage
location and retention value, so that weeding can be done. Again, weeding of digital objects is
a new practice with little tradition to guide decisions.
Accession
Accession to a digital archive involves giving digital objects descriptions to enable their
storage, retrieval and management.
Digital objects, because of their fluid nature, require special authentication mechanisms.
There is an application here for cryptographic signatures and watermarks.
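One simple authentication mechanism is sketched below. It uses a keyed hash (HMAC) in place of a true public-key signature, because HMAC is available in the Python standard library; this substitution, the function names and the key are all assumptions for illustration.

```python
import hashlib
import hmac


def seal(content: bytes, key: bytes) -> str:
    """Compute an authentication tag for the content under a secret key."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def is_authentic(content: bytes, key: bytes, tag: str) -> bool:
    """Check that the content still matches the tag issued at accession."""
    return hmac.compare_digest(seal(content, key), tag)


# Hypothetical key and object; a real archive would have to manage its keys securely.
archive_key = b"hypothetical-archive-secret"
original = b"Deposited text of a digital object"
tag = seal(original, archive_key)
```

Any later change to the content, however small, causes `is_authentic` to return `False`. Unlike a public-key signature, a keyed hash can only be checked by a party holding the secret key, so it suits checks inside an archive more than verification by outside users.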
Storage
Digital archives should store digital objects according to their expected frequency of use. In a
properly organised hierarchical storage, speed of access will be directly linked to expected
frequency of use, with levels of access being provided by different storage topologies.
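The link between expected frequency of use and speed of access can be sketched as a simple tier assignment. The tier names, the thresholds and the collection names are assumptions for illustration and do not come from the report.

```python
from typing import Dict, List, Tuple

# Hypothetical tiers, ordered from fastest to slowest access, each with the
# minimum expected uses per year needed to qualify for it.
TIERS: List[Tuple[str, int]] = [
    ("online", 100),
    ("nearline", 10),
    ("offline", 0),
]


def choose_tier(expected_uses_per_year: int) -> str:
    """Pick the fastest tier whose threshold the expected use meets."""
    for tier, minimum_uses in TIERS:
        if expected_uses_per_year >= minimum_uses:
            return tier
    return TIERS[-1][0]  # fall back to the slowest tier


def plan_storage(expected_use: Dict[str, int]) -> Dict[str, str]:
    """Assign every named object to a storage tier."""
    return {name: choose_tier(uses) for name, uses in expected_use.items()}


# Hypothetical collection with very different levels of demand.
plan = plan_storage({"census-1991": 500, "theses": 25, "raw-logs": 1})
```

In practice the expected frequency of use would itself have to be estimated and revised, so that objects move between tiers as demand changes.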
Digital objects may need to be duplicated to ensure security of storage. Duplication may be
but one of the measures a digital archive would need to ensure the security of its collection.
One thing a digital archive could never do is allow technical failure to harm or destroy any of
its collection.
Access
Digital archives and their holdings of digital objects should be accessible via a network. Since
digital objects are inherently networkable, this seems only practical. However the possible
security risks of network access would have to be considered and guarded against.
Migration Strategies
Migration strategies for simple digital objects exist, but those for more complex digital objects
are still to be developed. For migration purposes, the content of a digital object may be
preserved whole, or in part, or as a surrogate (summary). The different levels of preservation
relate in part to the complexity of the digital objects being preserved. Migration of the whole
digital object might not be practical.
Change Media
Digital objects which exist as ASCII files or as files with simple, regular structures, can be
copied between different storage media, but more complex digital objects cannot be.
Change Format
One way of attempting to migrate complex digital objects is to break them down into a
collection of simpler digital objects, which are more suited for migration and which can be
recombined later to form the complex original.
Incorporate Standards
Standards can be expected to develop in business where there is a need to migrate between
generations of hardware and software systems. However there is a danger that such standards
may not completely meet the needs of digital preservation in digital archives. Dialogue
between the business and digital archive communities is essential.
Build Migration Paths
Ideally, standards from the business community need to be compatible with any standards
developed by the preservation community for use in digital archives. Any standard must allow
the current generation of hardware and software to access data in digital objects created by an
earlier generation.
Use Processing Centres
Specialist centres will be needed to cope with the enormous range of formats that digital
objects can be stored in, especially those formats from obsolete or unusual technologies, and
prepare those formats for migration. Such specialist centres will support the work of digital
archives dealing with digital objects produced by current technologies.
Managing Costs and Finances
The cost of operating digital archives will come from selecting, accessing and describing
digital objects, managing intellectual property rights and building systems to allow migration.
These costs will vary with the type of digital object, its usage and the passage of time. It is
currently difficult to forecast with any high degree of certainty what these costs will actually
be. A prime weakness of digital archiving is that it is a cost-unknown venture.
Migration costs will be linked to the complexity of digital objects, their description,
authentication and possible compensation to their intellectual property holders. Again, there is
little experience on which to base judgements of cost against the characteristics of digital
objects being archived.
Just as there is scant knowledge of costs of the procedures for archiving digital objects, there is
also no experience of how digital archives can achieve cost efficiency of operation.
The Yale Cost Model
The ‘Yale Cost Model’ is one of a few possible precursors for the digital archive cost model.
Project Open Book, at Yale University Library, which digitised over 2,000 texts, concluded
that technological innovations did not save money, but rather organisational innovations might
do so. There was a tendency for costs of the preservation of digital materials to rise over time,
unlike costs for paper materials.
Funding for digital archives is uncertain, and it is not known how digital archives might
generate income through their operation.
Summary and recommendations
NB Recommendations from the CPA/RLG report are numbered and quoted in full
emboldened text, with page numbers following.
Pilot Projects
1. Solicit proposals from existing and potential digital archives around the country and
provide coordinating services for selected participants in a cooperative project designed
to place information objects from the early digital age into trust for use by future
generations. (p.41)
While rescue projects for old digital objects are valuable, they have little impact outside of the
limits of the material that they rescue. They are solving problems raised by obsolescent
technology. It is more valuable to try and perfect digital preservation for existing material, on
current technology, as this has wider application.
2. Secure funding and sponsor an open competition for proposals to advance digital
archives, particularly with respect to removing legal and economic barriers to
preservation. (p.41)
This proposal arises from examining the Digital Libraries programme. However digital
libraries are a different beast from digital archives. In digital library research the aim is to find
a new role for a traditional organisation. For digital archives the nature of the research is more
fundamental, to define the role. While research is certainly needed into legal and economic
preservation issues, an ‘open competition’ might not be the best way to achieve the
fundamental research needed.
3. Foster practical experiments or demonstration projects in the archival application of
technologies and services, such as hardware and software emulation algorithms,
transaction systems for property rights and authentication mechanisms, which promise
to facilitate the preservation of the cultural record in digital form. (p.41)
Emulation is a preservation option which runs against the main thrust of the Report, which is
towards migration. Research into transaction systems for copyright and authentication is,
however, certainly needed.
Support Structures
4. Engage actively in national policy efforts to design and develop the national
information infrastructure to ensure that longevity of information is an explicit goal.
(p.42)
A digital preservation policy should be part and parcel of any national information policy.
There is little point investing in digital libraries and information superhighways without digital
preservation.
5. Sponsor the preparation of a white paper on the legal and institutional foundations
needed for the development of effective fail-safe mechanisms to support the aggressive
rescue of endangered digital information. (p.42)
While rescue of endangered digital information is a good thing, it is difficult to see at this early
stage how ‘aggressive’ such a rescue could be. This presupposes that digital archives know
how to select endangered material and how to preserve it once they have rescued it.
6. Organize representatives of professional societies from a variety of disciplines in a
series of forums designed to elicit creative thinking about the means of creating and
financing digital archives of specific bodies of information. (p.42)
Sources of funding for digital archives do need consideration. However any such sources
ought to reflect the deep and serious nature of the problem, and not be ad hoc.
7. Institute a dialogue among the appropriate organizations and individuals on the
standards, criteria and mechanisms needed to certify repositories of digital information
as archives. (p.42-43)
Certification of digital archives is certainly a necessary future condition of their appearance
and operation. However it is too early to know how a digital archive should function to be
certified and indeed, to find individuals or organisations with the knowledge and experience to
do the certification.
8. Identify an administrative point of contact for coordinating digital preservation
initiatives in the United States with similar efforts abroad. (p.43)
A vital proposal which applies as much to the UK as to the USA.
Best Practices
9. Commission follow-on case studies of digital archiving to identify current best
practices and to benchmark costs in the following areas:
a. The design of systems that facilitate archiving at the creation stage.
b. Storage of massive quantities of culturally valuable digital information.
c. Requirements and standards for describing and managing digital information.
d. Migration paths for digital preservation of culturally valuable digital information
(p.43-44)
None of the above recommendations for study can be realistically condemned. They are all
crucial. Especially valuable is the intent to discover best current practice and disseminate it.
4.3 Comment on the CPA/RLG Task Force Report
Comment in this section is summarised briefly through the use of headings, which reflect key
content. All documents referred to (except the CURL response, below) can be accessed
electronically - URLs are given.
Australia
National Library of Australia
National Library of Australia. Guidelines for the Management of Electronic Records,
Documents and Publications: a Towards Federation 2001 (TF2001) Progress Report. 1996.
(http://www.nla.gov.au/3/npo/conf/npo95kp.html)
Points made cover:
- Access
- Preservation responsibility
- Collaboration
- 'Failsafe' mechanism
- Best practices
- Selection
- Integrity of information
- 'Anyone can be a publisher'
- Different management system
- Research
Actions include:
- Current projects
- Analyse CPA/RLG report
- Develop model
- Publish and promote findings.
PADI: Preserving Access to Digital Information
(http://www.nla.gov.au/dnc/tf2001/padi/padi.html)
Working group made up of the Australian Archives, the Australian Council of Libraries, the
National Preservation Office, and the National Film and Sound Archive. Established in 1993.
Main goals:
- Information site
- Discussion forum
- Strategies
National Preservation Office
The Australian National Preservation Office issued a Draft statement of Principles on the
Preservation of and Long-Term Access to Australian Digital Objects. 1996
(http://www.nla.gov.au/3/npo/natco/draft.html)
In it they set out various principles including:
- Cooperation
- Primary responsibility
- Value/significance
- Stakeholder's rights
- Standards
- Research
- Legislation
Note: the draft statement has now been replaced by a full statement of principles: National
Library of Australia National Preservation Office. Statement of Principles: Preservation of
and Long-Term Access to Australian Digital Objects. 1997
(http://www.nla.gov.au/3/npo/natco/princ.html)
Canada
National Library of Canada
Electronic Publications Pilot Project. Summary of the final report. 1996
(http://www.nlc-bnc.ca/eppp/ereport.htm)
Issues addressed:
- Standard formats
- Access and copyright
- Hypertext links
- Storage
- Legal deposit
Standard Formats
- Electronic signatures
- Encryption
- Standard mark-up languages
- Multiple versions
Access and Copyright
- Copyright statements
- "Big Dreams - copyright Dare to Dream Enterprises. Permission is hereby given to
print out and distribute unlimited exact copies as long as no fee is charged."
- "All articles are copyrighted and copying for other than personal reference use without
express permission is prohibited."
- Browsing
- Hypertext Mark-Up Language (HTML) Links
- Where does a publication start and end?
- Web-directories
Storage
- Security
- Long-term preservation
- Storage responsibilities
- Compressing files
Legal Deposit
- Definition of 'electronic publication'?
- Directory of electronic publications
Technology
- Software, eg
- Word-processing,
- Web browsers,
- Compression software,
- Hardware specific software.
- Hardware
Personnel
- Staff Training/Resource Allocation
- Assigning new responsibilities
UK
CURL (Consortium of University Research Libraries)
CURL’s response to the draft report (Response by the Consortium of University Research
Libraries (CURL) to the CPA/RLG Draft Report “Preserving Digital Information”, 1996):
- Definition of term 'digital archive'
- Existing principles
- Specific agenda
- Non-document-like objects
- Research
- ‘Periodic snapshots’
- Use is not the best insurance
- Broad representation
- Primary responsibility
- Legal deposit
- Migration
- Legislation
Warwick Workshop
Long term preservation of electronic materials. A JISC/British Library Workshop as part of
the Electronic Libraries Programme (eLib). Organised by UKOLN 27th and 28th November
1995 at the University of Warwick. Report prepared by the Mark Fresko Consultancy. The
British Library, 1996. (British Library R&D Report 6238).
(http://ukoln.bath.ac.uk/fresko/warwick/intro.html)
Key features include:
Strategy
- Momentum
- Preservation responsibility
- Access
- ‘Sensitise’ data producers
Collection
Suggestions for action:
- Sampling techniques.
- Data deposition mechanism
- National level committee
- Policy
- Internet sampling study
Preservation Policy
- Current activities
- Standards
- Communication
- 'Watchdog'
- Preservation is access.
Practical Implications
Issues raised:
- Cost models
- Preservation=access?
Further actions:
- Research
- Awareness
- Training
Other General Comments
These were included as they illustrate and reinforce points made elsewhere.
"I do not have confidence in the ability of the publishers to take this on. They are much more
subject to takeover and closure than research libraries are. In addition, they have never before
shouldered this responsibility and I don't see them suddenly getting an interest in it now.
Sooner or later, they would have to say: where is the commercial benefit? ...The main
qualification I would make is that learned society publishers may find themselves able to take
on archiving since they have always accepted a greater responsibility for meeting the needs of
the intellectual community they represent."
(Bernard Naylor, email to arl-ejournal mailing list, December 4th 1996)
“These stories were easy to gather using CNN’s keyword search function. I e-mailed for
permission and got a quick response saying that links to the CNN site are Ok, although use of
their logo is restricted. Of more concern is their response on how long the stories are
available:-
‘Our stories are archived, but some stories, depending on their source, are deleted after a
period of time. We are not at liberty to release which stories must be deleted’."
(John Kupersmith, email to web4lib mailing list, November 26th 1996)
"This [migration] to me represents the true cost of archiving, more than the cost for
hardware/memory, etc. In accepting the burden of archiving, the archiving agency is also
accepting an obligation to refresh data and formats as required-maybe as often as every five
years. For materials with relatively little economic payback, this is a daunting obligation."
(Sandra Whisler, email to arl-ejournal mailing list, November 21st 1996)
"Long-term digital archiving is...very expensive, and it requires special skills and technical
infrastructures which most libraries will not be able to acquire."
(John Mackenzie Owen, NBBI Ltd, email to arl-ejournal mailing list, November 27th 1996)
"Although the cost of digital archiving is high, it need not be a problem if we chose a different
organisational model for the digital archive. The lower cost of print archiving has to be
multiplied by the number of libraries/archives world-wide that include a publication in their
collection. That number, and therefore the global cost, is many times higher than the cost of
storing a digital publication in one location."
(John Mackenzie Owen, NBBI Ltd, email to arl-ejournal mailing list, November 27th 1996)
4.4 Issues for Consideration
The following issues, categorised by broad heading, arising out of analysis of the CPA/RLG
Report (see 4.2 above), comment on the Report (see 4.3 above) and other documents, should
be considered with relevance to the UK.
Legal
Intellectual property: Protection of creator's rights. Will copyright become unmanageable with
the increase of digital information available? Investigate attitudes to intellectual property and
responsibility for archiving.
Legal deposit. Is it viable? Present legislation is outdated and therefore unclear on what is
covered. Tax incentives are not enough to ensure survival of information, there must be
legislation. Replace existing legal deposit with electronic deposit? The British Library has
recently submitted a report to government on the legal deposit of electronic publications
(Proposal for the legal deposit of non-print publications: to the Department of National
Heritage from the British Library. London: British Library Board, 1996). Note: in response,
the government has produced a consultation paper, Department of National Heritage, Scottish
Office, Welsh Office, Department of Education Northern Ireland. Legal deposit of
publications: a consultation paper: February 1997. Department of National Heritage, 1997.
Legally binding responsibility: CPA/RLG report: "first line of defence lies with the creators,
providers and owners". How can this be legally binding?
Creating Standards
Standards are needed for:
Migration and/or refreshing procedures.
Descriptive information (similar to cataloguing details), to include details such as provenance,
ownership, change in format or structure, etc. (metadata). "Some producers are already taking
some actions, for example publishing materials already fit for preservation. Consequently there
is a need to select and encourage the use of a set of common standards" (Warwick Workshop,
p.55).
Certification of digital archives/archivists. By whom?
Education and Further Research
Achieve overview of current and forthcoming digital projects (both archival specific and
access led). "There is a need to survey activities already in progress, in order to provide a
baseline for future efforts" (Warwick Workshop, p.55).
Identify what policy statements exist to facilitate the production of guidelines.
Identify further areas of research/pilots, for example, how to deal with non-static resources
such as bulletin boards, databases, selected Internet resources, like the Human Genome
Project.
Identify training needs in libraries and archives concerning digital preservation. Identify library
and information studies and other educational courses relevant to digital preservation with a
view to developing teaching materials.
Policy and Organisational
Selection: Can/Should ALL information be ‘saved’? How can qualitative assessment of the
value of information be carried out? Issues such as censorship, misjudgement of individuals or
organisations responsible: standardisation? Is random selection an option?
Policy Statement: Organisations need clear policy statements (e.g. on selection for
preservation).
Which organisations will ‘hold’ material? Would either one major repository or a network of
digital archives (as the CPA/RLG report suggests) be better?
Awareness Raising (Warwick Workshop, p.56).
Training/Education: Are there suitably qualified people available at present?
Creating standards: Archival process, certification schemes (e.g. for migration procedures,
digital archives/archivists).
"Fail safe mechanism": How would this be implemented? For example how would digital
archives know if information was in jeopardy, especially if the danger was neglect?
Collaboration
Collaboration should be both national and international and public and private sector. Specific
issues include:
Involve IT suppliers: "Actions should be initiated to "sensitise" producers of data to the need
for, and the issues concerning, preservation." (Warwick Workshop, p.53)
Access issues: for example, are preservation and access indistinguishable for digital materials?
Is the Internet the best mechanism of access? (Warwick Workshop, p.55)
Costs
Digital archives (e.g. 'up-front', implementation, running)
Costs of research/pilots
Training.
4.5 Initial Prioritised Actions
Based on the issues for consideration given above, the following recommendations, intended to
generate activity on digital preservation in the UK, are suggested.
NB Emboldened numbers in brackets refer to the numbered recommendations in the CPA/RLG
report, which are given in section 4.2 above.
- Appoint National Digital Preservation Officer (based at the National Preservation
Office?)(8).
- Establish a representative National Digital Preservation body to work with and support
the activities of the National Digital Preservation Officer (8).
- Facilitate public and private sector representation and involvement (6).
Technical: Identify stakeholders and develop links between them. Establish
professional and industrial standards by consent (7).
Legal: lobby for legislation. Seek appropriate legal representation on national body (5).
Financial: Identify funders (2 & 6).
Organisational: Coordination of storage and access (1 & 9).
- Raise awareness. Promote education, training and discussion. Create a UK-based Web
site.
- Establish and maintain a UK (-based) discussion list, "Digpres" (could be used for
notification to and dissemination by National Digital Preservation Officer).
- Investigate current and proposed digital archival practice and policy nationally and
internationally. Achieve overview (1 & 3).
- Identify through research/pilots good practice, problems, e.g. selection, metadata,
standards and gaps in knowledge (9).
- Devise guidelines on practice and a digital preservation policy within the national
preservation policy and national information initiatives and within an International
context (4).
- Liaise with Library and Information Commission concerning its 20/20 Vision work
programme (eg with regard to input to development of National Information policy)
(4).
- Create and maintain a database of 'who's doing what?' (Current directory of UK digital
archives) as a source of information and expertise.
Web version of this report by
Alan Poulter