The Investigation into the Digital Preservation Needs of Universities and Research Funders was commissioned by the Digital Archiving Working Group and carried out by staff of the UK Data Archive. The purpose of the study was to explore the extent to which electronic materials being generated within the university sector, as well as those created with funding from the main research councils and other research funding bodies, were subject to preservation policies; to identify variations in both existing and planned preservation policies; and to discover the views of key people in universities and funding agencies on their future needs with respect to digital preservation.
The impetus for the study was an increasing concern that as more digital information is produced - much of it held only in digital form - it is essential to develop strategies for the selection and preservation of such material.
Anecdotal information indicated that many of those creating digital information confuse routine back-ups with preservation and incorrectly believe that old material can be retrieved and read on current hardware and using current software provided that it has been 'backed up'. We were keen therefore to explore the knowledge and understanding of preservation of those who fund, create or otherwise have responsibility for electronic material within the broad university sector, before going on to talk to them about their future plans and concerns. Unfortunately this is difficult to do since the very process of interviewing about a topic as specialised as this informs and conditions the respondents to reply in specific ways. Thus the examination of knowledge was more successful in the face to face and telephone interviews than in the self completion questionnaires, since with the former two methods we had control over the order of the questions and the respondents' ability to consult others.
Given that the research had to be completed within a few months and limited financial resources were available it was necessary to focus upon some specified categories of respondent. Since one of the most important dimensions of the problem concerns ownership of the digital materials (and the extent to which ownership brings obligations or responsibilities) it was essential that we spoke to both representatives of the funding agencies and of universities. We also wanted to obtain the views of researchers 'at the coal face' who have to make day to day decisions about the preservation of the electronic material they create.
The list of potential funding agencies is very large and it was essential to concentrate on those funders who provide significant support for academic research in order to make optimum use of our and the respondents' time. These were selected by consulting experienced researchers in different disciplines and using the expertise of the University of Essex research office. Commercial and governmental bodies were excluded even though they support large amounts of research in some areas such as engineering because it was felt that the issues of ownership and the rights over data would be significantly different in these circumstances. It should be recognised however that some of the academics who participated in the research may have received funding from commercial and governmental bodies.
One of the methodological decisions was who to contact within the universities. We made a deliberate decision that this should not be the Head of the Computing Services since we wished to emphasise that preservation is not simply a technical issue. Since we were keen to find out about policy at the highest possible levels within the universities it seemed appropriate to contact vice-chancellors directly. We discussed this decision with some vice-chancellors in advance of the study and they confirmed that it was an issue which might be considered at the most senior levels within universities. Our own vice-chancellor at the University of Essex, Professor Crewe, was extremely helpful in advising us on how to pare down the questionnaire to an appropriate length with relevant content. It is important to realise that some vice-chancellors may have passed the questionnaire to someone else to complete on their behalf, possibly a pro-vice-chancellor or dean of research. We encouraged this where it seemed appropriate, say, because someone else had responsibility for this policy area within the institution.
The potential population of researchers is very large and resources would have been wasted in trying to identify those for whom this is a current issue. We therefore decided to focus on representatives within projects and centres located in universities which were likely to be creating significant amounts of electronic information. It should be recognised that 'lone' or isolated researchers may be less able, through lack of resources, expertise or recognition of the problem, to develop or implement preservation strategies. Thus our sample is not representative of the academic research community as a whole. But if projects and centres can be encouraged to take digital preservation seriously, most data of value to secondary analysis will probably be covered, and the good practice may extend to cover individual researchers.
This study concentrated on a topic of a somewhat esoteric nature. Although some researchers, funding agencies and senior university staff have given it attention it became clear at the early stages of our research that the majority had not really thought through their obligations with respect to digital preservation. Our questionnaire was not always welcome therefore especially since it was not easy to complete but required respondents to discuss a number of quite complex issues which may be quite new to them. It also raised uncomfortable issues about juggling priorities within limited budgets. Add to this the fact that all three of our sample groups feel under increasing work pressure and resent the fact that more of their time is taken up with form filling and bureaucracy and one can understand why our response rates were relatively low. Indeed in some respects it is gratifying that 109 people gave up significant time to respond. The quality of the responses was good and the effort respondents were prepared to make in the face to face interviews was remarkable - several of them took two or more hours.
Of more concern than the response rate is the bias which has probably been introduced by the fact that the participants are different from those who were unable or unwilling to participate within our tight time schedule. It is inevitable that such a survey will over-represent those with an interest in the topic and in particular those who already have a digital preservation policy or are planning to introduce one. Without an extensive and expensive follow-up study of the non-respondents it is impossible to measure this bias, but we know from our attempts to persuade more people to respond that it is significant. We have tapped a very particular vein with respect to digital preservation and the results must be interpreted in the light of this. There is no evidence however that the issue of digital preservation is only of relevance to our respondents. It is likely that the non-respondents will in the future face these issues - perhaps in the near future - but that they do not feel prepared to discuss them yet.
Partly explainable by the response bias but interesting nevertheless is the fact that almost all of the respondents recognised the importance of digital preservation and understood the distinction between this and backing up data. When asked about the reasons why digital information should continue to be available many strong arguments were produced including the need for scientific advances to build on what had gone before, principles of openness and replicability, making greater use of limited and expensive resources, and reducing respondent burden. In those few cases where digital preservation policies had been established or were under discussion these arguments had been clearly articulated.
An overwhelming theme throughout all the interviews, and one which had an especial importance in the interviews with funding agencies was that of ownership of the material. In those situations where ownership was unclear or where there were disagreements about the ownership of research material it had not been possible to develop preservation strategies. It seems that clarity over data ownership is a prerequisite for the implementation of policies in this area. It was recognised by all three sample groups that ownership brings responsibilities - not just in terms of preservation but also in relation to ensuring that the rights of research subjects and researchers are respected. Many respondents spoke about the uneasy balance between on the one hand ensuring that data remain usable and are used and on the other hand maintaining data confidentiality and protecting the original researcher's intellectual property. Since there are few incentives to preserve data, almost no requirements imposed by universities or funders to do so and since digital preservation has long term financial implications it is no surprise that most researchers have taken the easy option of not tackling this difficult conundrum.
It was interesting that some respondents in funding agencies identified the fact that they did not claim ownership of the research material as a reason for absolving themselves from the responsibility for preservation. Others felt that they nevertheless had the right to specify that the grant recipients must ensure that material is preserved and remains accessible.
The sample was not large enough to enable us to disaggregate it according to discipline but nevertheless it is very clear that there are big variations across the disciplines with cultures of data sharing encouraged in the quantitative social, economic and natural environmental areas and with resultant preservation policies. In other areas - most notably the arts, humanities and medicine - the promotion of data sharing is much newer and as a result systems of data preservation are in their infancy. Several of the respondents were anxious to point out that there are unresolved issues with respect to the preservation of some types of digital material. Foremost among these issues is how to protect confidentiality of sensitive medical information or how to define what constitutes 'results' (does it include field notes, working papers, sample listings etc?) in the case of anthropological research. These respondents also felt that preservation and access policies could not be imposed on researchers but that there had to be broad consensus in the academic community first, and that preservation policies have implications for the way research is conducted and the promises made to respondents.
Almost all respondents in all three sample groups mentioned limited resources as a major obstacle to introducing digital preservation systems. Considerable concern was expressed about the on-going costs of preservation. A number of respondents confessed to the fact that they had no clear idea of what scale of costs we might be talking about but were worried that it might mean less money being available for the collection and creation of new materials. Not surprisingly there was no consensus on how digital preservation might be financed. A small but significant number of respondents pointed out that the costs of digital preservation meant that it was essential to be selective about what should be preserved and that the research community should be involved in the selection process.
When asked about the future and whether there was a need for national initiatives with respect to digital preservation (training schemes, more research, awareness campaigns etc) there was almost universal approval for 'more to be done'. There was very little complacency and even representatives of bodies which have well functioning digital preservation policies expressed the need for further expansion or improvements. It was particularly marked that representatives of smaller funding agencies and some of the smaller projects together with university vice-chancellors were of the view that they had limited resources and expertise and were keen for national or collaborative developments. This was not necessarily in order to shift responsibility - in fact some of the representatives of the smaller funding agencies in particular are anxious to play a full part in digital preservation and are willing to endorse centralised policies but do not feel able to take the lead. The need to have shared and agreed policies rather than a preponderance of many slightly varying policies was mentioned several times.
Despite the endorsement in principle of national or central initiatives most respondents had difficulty in outlining what such initiatives might comprise, and who might take responsibility for them. In general we found our ideas were endorsed but respondents came up with few of their own. This is not surprising since for many respondents this is not a topic with which they have much familiarity, and the proposals are fairly complex and undoubtedly have both advantages and pitfalls. Many respondents demonstrated a lack of confidence in their own knowledge with respect to digital preservation, not knowing for example whether standards exist, or not being sure whether all types of digital material can be preserved.
It is encouraging that most of the 109 respondents to this survey were keen to obtain more information about digital preservation practice and policy, and would welcome an opportunity to learn more in order to enable them to make informed decisions about the preservation of the electronic material they fund or create.
The recommendations fall into three broad areas:
* the development of national guidelines covering the key areas of concern
* the development of standards to allow kitemarking of centres where research data can be managed and preserved in the long term
* the development of a national policy on research data and dissemination of information about this national policy.
This study aims to determine:
* the extent of electronic resource creation currently undertaken by UK universities and funding agencies
* what current provision is made for the long-term preservation of data produced
* the future needs of universities and funding agencies with regard to digital preservation.
This research is divided into three parts. The first addresses the policies and needs of research funding agencies; the second, university research projects with particular attention to eLib, CTI, JTAP and TLTP projects; and the third, vice-chancellors of higher education institutes. Because very few of the organisations surveyed had formal preservation policies for electronic materials the views expressed here are normally those of individuals within the organisations rather than an expression of the corporate position.
A list of 200 potentially relevant agencies was drawn up from our own knowledge and in consultation with the Essex University research liaison office. Contact with these organisations was made initially by telephone. However we experienced great difficulties in finding individuals with responsibility for this area of policy, usually because the organisation had not previously considered the preservation of electronic materials.
Once a suitable contact within each funding agency was identified, research proceeded in one of three ways: a postal questionnaire was sent, a telephone interview was conducted, or a date for a personal visit was arranged.
Eleven visits to funding agencies were carried out: to BBSRC, ESRC, EPSRC, NERC and MRC (PPARC was not visited as EPSRC is responsible for its preservation policy), The Nuffield Foundation, The Leverhulme Trust, The Imperial Cancer Research Fund (ICRF), The British Academy, The Wellcome Trust and the Rowntree Foundation.
Seven telephone interviews were conducted. Most respondents opted to be sent a postal questionnaire rather than discuss the matter over the telephone. Sixty questionnaires were sent out, but the response rate was low with only eleven being returned, despite reminders being issued.
A frame of units within universities which create or gather electronic data was compiled. This included all eLib projects, the JTAP projects, Computers in Teaching Initiatives across the various disciplines and the Research Council research centres. An email was sent to each of the Data Archive's organisational representatives, providing them with this list and asking them to check the information for their institution and to suggest additional centres or projects which may be creating digital material.
Questionnaires were distributed by email to each of the project directors. Approximately 400 were sent. Fifty-two responses were returned.
A letter was sent to the vice-chancellors and principals of all higher education institutes in the UK informing them of the research and seeking information about university-wide preservation policies and future needs. Thirty-three questionnaires were returned in time for inclusion in this report.
The low response is believed to be partly attributable to the low priority assigned to the preservation of electronic materials by all the agencies surveyed. These groups are only just beginning to consider the implications of preserving electronic materials. The responses received are evidently biased towards those who had already considered electronic preservation, or were particularly inspired to consider the problems on receipt of a questionnaire. The findings below must be viewed in the light of this response bias.
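The response rates implied by the figures above can be worked out directly. The following is a minimal sketch in Python that uses only the counts stated in the text; the function name response_rate is a hypothetical helper introduced for illustration. The project figure rests on the approximate count of 400 questionnaires, so that rate is itself approximate, and no rate is computed for vice-chancellors because the number of letters sent is not stated.

```python
# Response rates computed only from counts stated in the report.
# response_rate is a hypothetical helper, not part of the original study.

def response_rate(returned: int, sent: int) -> float:
    """Return the percentage of distributed questionnaires that came back."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * returned / sent

# Funding agencies: 60 postal questionnaires sent, 11 returned.
funder_postal = response_rate(returned=11, sent=60)

# Projects: approximately 400 emailed questionnaires, 52 returned.
project_email = response_rate(returned=52, sent=400)

# Funding agency respondents across all three contact methods:
# 11 visits + 7 telephone interviews + 11 postal returns.
funder_total = 11 + 7 + 11

print(f"Funding agency postal questionnaires: {funder_postal:.1f}%")
print(f"Project questionnaires (approximate): {project_email:.1f}%")
print(f"Funding agencies surveyed: {funder_total}")
```

On these figures roughly 18% of the postal questionnaires sent to funding agencies and about 13% of the project questionnaires were returned, and the three contact methods together account for the 29 funding agencies surveyed.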
Apart from two funding agencies whose respondents were unsure whether their projects were creating electronic materials, all of the organisations which participated in this part of the research were funding research which involved the creation of electronic materials. The two respondents who were unsure thought it likely that their projects were creating electronic materials. All of the funding agencies had noted (or believed that there was) an increase in the number of projects creating electronic materials, and expected this trend to continue.
All the funding agencies surveyed supported researchers within the university sector. In some cases, individual researchers were supported, in others departments or centres were funded. The funding councils and some of the larger funding agencies funded in-house research at their institutions' research centres. The Wellcome Trust sponsors research centres which, although bearing the Trust's name, are not formally part of the trust. Funding agencies do not all limit their support to the university sector: organisations such as the Nuffield Foundation and the Millennium Commission also fund local authorities and school teachers for example. In these cases, the environment and the purpose of the research can be very variable, necessitating policies that accommodate the diversity of grant recipients. Funding agencies often support research in collaboration with other research agencies and with commercial companies. For example, medical research charities often collaborate with one another and with pharmaceutical companies. To further complicate the picture of research funding these collaborations often go beyond national borders.
The grant recipients supported by the funding agencies are from such a wide range of disciplines that it is impossible to produce a comprehensive list of the materials being produced. Examples of electronic data produced by research include: scientific experimental data, epidemiological data, social surveys, clinical trial data, metadata providing a description of data files, electronic teaching materials, musical and choreography notation, audio-visual files and multi-media databases.
'Electronic paperwork' has increased the amount of electronic material available for preservation. This category of material combines two distinct types which funding bodies' representatives discussed separately because they have different purposes and different ownership: electronic paperwork created by the funding agency to administer and monitor its grant scheme and electronic paperwork created by researchers as part of the research process. These may include administrative files and electronic research diaries. Electronic mail is an important example of this type as it has become one of the main means of communication between academics. Bulletin boards and mailing lists are also important and often communicate valuable information. There are moves amongst funding agencies to design systems to enable the on-line completion of grant applications. This will increase the amount of electronic paperwork pertaining to research projects.
Increases in volume are also due to the availability of hardware and the development of software to meet an ever widening range of tasks. In laboratory research many of the instruments used now produce an automatic electronic output, while at the same time researchers tend to note observations and musings during experiments in paper lab books, although as electronic notebooks become more readily available this may change. So despite the noted increase in the creation of electronic materials, research is far from being paperless. Whether electronic or paper materials are created is largely determined by the skills and background of individual researchers and the availability of technology.
There was agreement across the respondents in all of the funding agencies that the electronic materials being produced by researchers constitute a valuable resource and steps should be taken to preserve them. The rationale given for preservation included making the best of resources through secondary analysis, for evaluation and replication of past research, and the investigation of scientific misconduct. In summary, the reasons given in favour of preservation were:
* the expense of data collection makes the re-use of resources important
* uncompleted or unpublished research should be available to be built upon and extended or completed
* methods and results should be capable of being replicated to ensure scientific accountability
* the output of research units which are closed or whose projects come to an end should not be lost
* data should be available for historical research in due course.
However, this is not to say that the preservation of data is without its difficulties. Respondents from funding agencies expressed some concerns about the potential negative effects that preservation of electronic materials could bring. Concerns expressed included the following:
* the independence and impartiality of the funding body could be compromised by secondary analysis
* quality control over secondary analysis could not be guaranteed
* respondent confidentiality and respect for the sensitivity of data could be placed in jeopardy
* the security of preservation technologies (for electronic data) was suspect
* sensitive or commercially valuable data might fall into the hands of unsuitable individuals or bodies.
Of the 29 research funding agencies surveyed, only five had established policies and/or guidelines regarding the preservation of electronic materials. Two research funders were in the process of investigating policy needs.
The firmest of the policies were those of the ESRC and NERC. The ESRC supports The Data Archive and NERC supports seven data centres covering different disciplines within NERC's remit. ESRC researchers are obliged to discuss data access at an early stage and to offer any electronic materials to The Data Archive. NERC's policy is that grant recipients must satisfy the funder that they have the willingness, expertise and commitment to preserve data themselves or that they have made arrangements with an appropriate data centre to take responsibility for their data at a specified time point.
The British Academy encourages grant recipients to offer any electronic materials they produce to the Arts and Humanities Data Service (AHDS). Grant recipients are given AHDS literature and guidelines for creating and documenting datasets.
The Rowntree Foundation encourages grant recipients to contact the Data Archive or AHDS, but no follow up takes place.
The Leverhulme Trust does not have any guidelines or policies regarding the preservation of electronic materials but disseminates information about the AHDS and The Data Archive's services to grant recipients. However, this literature is not viewed as a recommendation from the Leverhulme Trust.
BBSRC and ICRF have both formed committees to investigate their requirements for preservation. At the time the visits were made no plans had been outlined. Currently, the BBSRC requires information from grant-holders of all grants that produce data so that they can confirm that data are logged in public access databases. The ICRF backs up all electronic material, and paper documentation is stored in a number of off-site locations. The focus of current preservation is on the published material. Once research is published the raw data are not always preserved. Preservation is left to the decision of individual research staff.
ESRC policy actively encourages secondary analysis over primary data collection. The data policy asks all applicants whether new data collection could be avoided and requires them to check if secondary data are available. However, the ESRC were alone in formally prioritising secondary analysis. Other research funders recognise the value of secondary analysis, but do not officially encourage or prioritise it.
The most difficult problems to overcome in a preservation policy, as reported by respondents in funding agencies, are those of selection, confidentiality, access and ownership.
None of the AHDS, the Data Archive or the NERC Data Centres is obliged to take any of the materials offered to them. Limitations of resources mean that no archive can accept all material that it is offered. One respondent believed that a great deal of valuable data are not being preserved. However, not all material is worth preserving and there are difficulties in identifying which materials are worth this effort.
There are also difficulties in determining what constitutes data and what the researcher is obliged to deposit. Materials preserved must include contextual information by which the materials can be understood and used correctly. ICRF has begun to ask researchers and laboratory staff to categorise their data as 'vital or important' in order to aid selection of materials for preservation. They are asked to judge which data are vital to the organisation, which could be re-created if necessary, how long re-creation would take and how much damage the loss of the materials would do to the organisation's research. These are subjective decisions resting on the individual researcher who is best placed to know the true value of the material.
The problem of where to draw the line is important when considering 'electronic paperwork' as so much of this is created. As this problem has only been addressed on a local level by individuals attempting to identify key documentation, there are no guidelines in existence and so researchers must make subjective judgements. There is thought to be a danger that preservation considerations will distract organisations and researchers from their main aim of 'doing science'. The respondents' views on selection may be summarised as follows:
* prediction of what will be useful in the future needs to be improved in order to avoid the accumulation of material of low value
* decisions on when to acquire data for preservation and access are difficult because the researcher may claim still to be working on them beyond a reasonable period
* some datasets require constant updating in order to maintain their value and it is unclear who has responsibility for ensuring the integrity of the updated material.
Ownership of electronic materials is rarely established clearly. Nine funding agencies were reported as retaining the IPR in normal circumstances, although this is sometimes surrendered in exchange for a royalty fee. It was more common for ownership and IPR to rest with the researcher or their institution. In cases where research is thought to have commercial value, or where a patent is to be registered, a clause may be inserted into funding contracts. However, this is rare and only evaluated on a project by project basis.
There were concerns about ownership. These were mainly prompted by problems which other organisations were perceived to experience. Funding agencies felt that it might be necessary to clarify ownership within contracts. ICRF has established a separate company to handle these legal issues. NERC and the MRC have both recently clarified issues of IPR and ownership. Three categories of ownership emerged:
* Data created by research council staff is owned by the research council and the research council retains the IPR.
* Funded data collected by universities is the property of those universities. The research councils do not have any official authority over what happens to data owned by grant holders although they do have some unofficial leverage over universities. This is important in policing abuse.
* A grant could be awarded with the clause that the IPR is given back to the research council. This would constitute a contract not a grant and VAT might be payable.
It should be noted, however, that under some funding agency procedures research staff in external centres are classified as the agency's staff, thus giving the agency rights over the research data.
Ownership of data brings responsibilities in terms of access and preservation. However, not all of those research funders claiming ownership and IPR accepted the preservation of materials as their responsibility.
The research councils clearly have a wider motivation than other funding agencies for the furtherance of a particular branch of science. The research council respondents believed that the preservation of electronic materials is the responsibility of the research council, although this responsibility is shared with the researcher. Five other funding agency respondents shared this view. The remaining funding agency respondents believed that the creators of electronic materials should be responsible for their preservation. It was reported that these organisations felt it was their responsibility to provide researchers with financial support, but not to dictate procedures.
Many of the respondents in funding agencies felt it was too early to comment on outcomes as they knew too little about electronic preservation, while others felt it was not their place to comment as they had already stated that this was not their responsibility. However, other agency respondents suggested a number of outcomes.
It was suggested that what is needed is the creation of a culture of data sharing. This is best achieved through a campaign promoting awareness of data preservation and its value. There was disagreement over whether a specialised advisory service and a programme of education would be useful. This is probably due to the varying level of expertise in funding agencies. One respondent felt that a statement of national policy and guidelines would enable agencies to be more confident in their requirements of researchers and would help to allay fears about the safety of materials once preserved, and the ease with which they can be located and accessed.
The benefits of a centralised repository over a dispersed network were debated with widely differing views being expressed. One respondent felt that centralisation might be important in relation to preservation but not in relation to dissemination. Others felt that a central body might threaten their independence. Those who favoured co-operation along the lines of AHDS felt that they could further the aims of their discipline without fear of conflicting with the aims of other specialist bodies. Other agencies expressed a desire to work independently of other organisations.
Those funding agencies which did not accept responsibility for the preservation of electronic materials obviously did not wish to contribute funds to this aim. Representatives of the smaller charitable trusts did not feel that they had the means to contribute. Other funding agency respondents believed they would be able to make a contribution, although this would create an imbalance of funding if others remained reluctant. Other suggestions included top slicing research council funding.
Overall, research agency respondents felt there should be national funding for the preservation of electronic resources.
This survey suggests that there is extensive creation of digital materials within universities. All of the 52 projects whose representatives completed questionnaires were collecting or creating electronic materials as part of their research. Of these the majority were creating electronic materials as a major part of their research.
Within these projects the nature of the digital materials produced was wide ranging. Numerical data derived from experiments and surveys was the most common type described. Papers and reports in the form of text files were also reported. Qualitative research data held in electronic form including interview transcripts and field notes and materials created in the increasingly popular CAQDAS (Computer Assisted Qualitative Data Analysis Software) programmes were described. Several historical projects were producing electronic documents and databases from historical sources. Projects were utilising new multimedia technologies to produce electronic images, audio-visual materials and World Wide Web pages. Electronic tutorials and teaching materials featured highly in the responses. This may be attributed to the sampling frame and the high representation of CTI projects in this research. The questionnaires for this part of the research were all distributed and returned over electronic mail. Access to email is readily available within the university sector and the responses demonstrate that this facility is being utilised by a large number of researchers. The production of emails was recorded by some of the projects and no doubt generated by the majority. Project administration files, agendas and minutes of meetings are also being generated in electronic form.
The sharing of electronic materials appears to be common within the academic sector. Few projects anticipated use of materials solely by people within their organisation. The majority assigned a high priority to providing others with access to their materials, both within and without their organisations. More than half of the respondents reported projects re-using materials created by other researchers. All had made plans for the dissemination of project results. In most cases distribution is in the form of CD-ROM or over the Internet, particularly the World Wide Web, often through the production of specific gateways. Only four projects mentioned deposit within an archive as a means of making material available. Projects gained access to others' materials via the same channels (CD-ROM, the World Wide Web and archives). Access to each other's materials is most often achieved through informal agreements with the creators and copyright holders.
The questionnaire sought views about the preservation of electronic materials and the value of secondary analysis. The attitudes expressed by the respondents demonstrated that on the whole they were in favour of preserving electronic materials, and were aware of the benefits that can be gained from the reuse of these materials.
Respondents agreed that the re-analysis of data was a central principle of scientific scholarship and that cost benefits could also be achieved from secondary analysis. It was also important in social research not to alienate potential respondents with too frequent surveys.
Predicting which digital materials are likely to be useful in the future was not considered to pose a serious problem or to risk wasting time and money preserving the wrong materials, although some felt that it did. Most respondents disagreed that materials could only be fully understood by the original researcher. However, there was also recognition of the difficulties involved in understanding other people's work.
Projects received funding from a variety of sources. Projects funded by the higher education funding councils were well represented in the research presented here. Other sources of funding included the project's own institution, charitable trusts, the European Union, commercial companies, central government departments, local government and the NHS. It was common for projects to receive money from several organisations, with up to five separate funders. A project's source of funding can vary over time.
Most of those answering the questionnaire did not know whether their funder had a policy regarding the preservation of the electronic materials they were creating.
Of the respondents who knew about their funding agencies' policies few problems were reported. The problems of multi-funded projects and materials and the possibility of having conflicting policies imposed were raised. In another case, it was felt that the policy did not represent a full understanding of the nature of the material being generated and that the specified archive was not the best place of deposit. No respondents representing projects reported a difficulty in understanding their funder's preservation policy, and none felt that the policy was inflexible, although two felt their funder's policy on this matter was too vague.
Seventeen of the 22 respondents whose funders did not have a policy (or were not aware of any policy) were strongly in favour of a policy being formulated. They felt that such a policy should cover an extensive range of issues. The funding of preservation and Intellectual Property Rights were the most frequently mentioned issues. Other issues raised were:
* technical issues and storage standards
* length of time material should be stored
* administrative support
* the purpose of preservation
* responsibility for preservation
* cataloguing.
Some concerns were raised about the potential difficulties which new preservation policies could bring. There were fears that the policies would not be based on a real understanding of the needs of researchers and the nature of the material being generated. Researchers were keen to ensure that their own access to materials was not affected and the IPR and publication rights remained with the researcher. There was also a concern that such a policy would place additional pressure on time and resources and would make multi-funding more difficult.
Six of the projects had their own internal preservation policy and one was drafting a policy. Four respondents said that they would offer material to an archive (the Data Archive or an AHDS centre) while two other projects held materials at their own centre. One of these two had a policy for back-ups and updates while the other did not specify whether a policy was in place.
Other project respondents reported little discussion of a preservation policy and only one reported an attempt to establish one. Insufficient resources to allocate to the task was given as a frequent reason why projects had not attempted or succeeded in establishing a preservation policy. This is compounded by the short term nature of project funding (the third most frequent response). Respondents also reported that too little was known about preservation, and that conflicting priorities led to disagreements about policy content.
In summary the problems experienced when attempting to establish a preservation policy were:
* lack of national guidance
* lack of interest in out of date materials
* uncertainty about technical standards
* unavailability of resources to carry out preservation.
Eleven respondents reported that their projects had deposited materials in an archive. The following archives were listed:
* Data Archive (listed five times)
* Oxford Text Archive (twice)
* Oxford English Dictionary
* ADAM
* VADS
* CogPrints.
Other projects 'archived' materials on institutional systems, but these were back-up rather than preservation systems. The remaining projects who had not archived materials were asked whether they had experienced any problems which had prevented them from depositing materials with an archive.
It was found that the reasons for not archiving materials were a lack of awareness about archives, and the absence of archives suitable for the materials being created. Two projects had had materials turned down by archives. Two other problems raised were the need for adequate metadata and concerns about the legitimacy of lodging material in an external archive when the intellectual property rights are not clearly defined.
The decision to deposit materials was dictated in most cases by the policy of the funding agency. The selection of materials for deposit appears to be fairly arbitrary and subjective, the responsibility of individual researchers with few set guidelines. In some cases decisions are made on the basis of quality of the materials, as in the reported case where poor images are rejected.
It was clear that respondents within projects believed the preservation of digital materials to be a joint responsibility of the researchers generating the materials and the agencies funding the generation of materials. Over half of the project respondents held this view. The remainder of those who answered this question were divided between those who believed that funders should be responsible and slightly fewer who believed that researchers should be responsible.
Of those who believed that 'other organisations' should be responsible for preservation, five felt that a dedicated central body should be responsible for preservation. This could either be a new creation or an extension of an existing body such as the British Library. One reason given for the need for a central body was that "preservation is a long term function that cannot be met by creators or funders, who will typically have limited existence". Three respondents believed that the existing archives should be given additional support to enable them to take responsibility for the preservation of digital materials, and to cover new types of material not currently archived. Two respondents believed that the role of the British Library should be expanded to take responsibility for electronic materials.
The questionnaire sought views on whether respondents felt that ownership affected the preservation of digital materials. The responses made it clear that they feel ownership is a central concern in the preservation of materials. Their views may be summarised as follows:
* ownership conveys the right and the responsibility to preserve the material
* copyright is bound up with ownership
* ownership confers responsibility for ethical issues
* ownership confers responsibility for accuracy and usability.
Preservation of digital materials requires funding and 14 of the respondents felt that the money should come from the public purse. Lottery money was suggested as a source of funds. Other responses suggested that the likely beneficiaries of preservation should pay, the beneficiaries being the users of secondary data and the copyright holders. Most respondents were uncertain whether their own organisations would be prepared to contribute funds towards the preservation of digital materials. Most respondents felt that fewer funds for primary research in order to fund preservation would be unacceptable. Only two stated the view that this would be acceptable because of the cost benefits of secondary analysis in the long run.
The respondents representing projects were given a number of possible solutions to the problem of preservation and asked to select which of these would be useful. The options were: a national body, a campaign to raise awareness, more research into the problem of preservation, a programme of education.
There is not much to separate the popularity of the outcomes. Most thought that an awareness campaign, a national body and more research would be useful and slightly fewer felt that a programme of education would be useful.
The national body should be organised by the National Preservation Office or the British Library. It should advise on preservation policy, and monitor all relevant developments in standards of best practice. It should be concerned with non-UK materials only if they have been used for research in the UK, created by UK Nationals (e.g. as British Council projects) or acquired, together with the IPR, by a British institution.
With regard to the nature of the preservation body it was felt that a distributed network would be desirable for the following reasons:
* it would allow greater security and reliability and distribute work and storage capacity
* it would avoid a monopoly situation
* it would allow for regional or discipline specific differences.
However, some doubts were expressed:
* advice on policy could only be issued by a central authority
* technical standards should be set by a central authority.
Several unresolved research issues were raised, again indicating the obstacles that must be overcome for successful preservation. These issues were concerned with:
* technical standards covering refreshing, migrating and preservation of hardware/software
* selection and retention policy
* storage media stability
* funding
* complex ownership
* confidentiality
* ongoing management of the materials.
In order to ascertain whether the preservation of electronic materials is being addressed by universities at an institutional level the third part of this research targeted the vice-chancellors of higher education institutes.
Twenty-five responses from vice-chancellors' questionnaires were returned in time to be included within this report. The responses confirmed the findings from the projects that universities are creating large quantities of electronic materials and this is expected to increase. Twenty-one of the vice-chancellors said that they had noticed an increase in the number of projects producing electronic materials at their institutions and twenty-two anticipated a large increase in this sort of project in the future.
None of the respondents in the universities in the survey had established any procedures, policies or guidelines covering the preservation of electronic materials at their institutions. Seven reported that their universities had considered, or were considering establishing such a policy.
The questionnaire asked about the difficulties experienced, or anticipated, with establishing a policy on preservation. 'Insufficient resources to allocate to the task' was the most frequent difficulty encountered. Sixteen replies indicated that this problem had been encountered, 14 said that too little was known about electronic preservation, while 12 felt that the problem of data preservation had simply not been recognised. Five said that conflicting priorities meant that a policy could not be agreed upon.
Replies from the vice-chancellors or their representatives highlighted the following difficulties in formulating a preservation policy:
* resource constraints
* relationship to preservation policies for non-electronic material
* technical difficulties relating to obsolete or immature media and hardware
* rapid change of standards
* a belief that researchers would be unwilling to deposit data for preservation.
Respondents were asked their views on the relationship between ownership and preservation of materials. University respondents felt that ownership of electronic materials is often unclear and "digitisation exacerbates the problems of what constitutes intellectual property". It was viewed as a priority that preservation policies should address this. A number of replies stated that ownership should provide the incentive to preserve. However, the converse of this is that fears over access rights and conditions of use could provide a disincentive for preservation. This concern was also expressed by the project representatives.
Most vice-chancellors or their representatives agreed with the project respondents that the preservation of electronic materials should be a joint responsibility of researchers and the research funding agencies. Five believed that researchers should be responsible, and one stated that it should be a condition of grant funding for those who create or collect digital materials. Only one vice-chancellor felt that the preservation of electronic materials should be the responsibility of research funders. Four vice-chancellors felt that this responsibility should lie with 'other organisations' - centralised organisations such as JISC and the British Library with a long term remit and funding were favoured. It was felt that individual academics and publishers are not likely to take on this long-term, costly task. However another theme to emerge was the importance of flexibility, with the solution dependent on the nature of electronic material. One respondent stressed the need to recognise the diversity of materials to be preserved.
The question of funding the preservation of electronic resources generated a mixed response from vice-chancellors. Eleven believed that the preservation of electronic materials should be funded 'nationally' or 'centrally' from the public purse; four of these felt that this money should come from HEFCE funds. Six respondents believed that research funding agencies should foot the bill for the preservation of electronic materials, although we have already seen that not all funders are willing to contribute resources. Other sources to be explored included the private sector, funds generated from commercial partnerships and funds generated from the end users of materials. Just three vice-chancellors believed that universities should pay for preservation directly. While 12 universities envisaged their institution contributing resources to the preservation of data, a further eight did not. In response to the suggestion that provision of resources for data preservation may lead to fewer funds for primary research, 12 vice-chancellors believed this was inevitable, four felt that this would be desirable in the long run because it would ensure access to resources in the future, and three felt that this outcome was unacceptable.
Vice-chancellors were strongly in favour of a national body responsible for advising on preservation policy, twenty responding that such a body would be useful. Most felt that expanding the role of an existing institution was preferable to the establishment of a brand new body. The most suitable candidates for this role were thought to be the British Library, National Preservation Office or The Consortium of University Research Libraries. Vice-chancellors felt that this body should have more than an advisory role and should lead and co-ordinate a national strategy of preservation.
Vice-chancellors were divided as to whether this body should be concerned with data generated outside of the United Kingdom. Nine vice-chancellors believed that non-UK data should be within the remit of this body when the materials were perceived to be of value to the UK and where other countries did not have a policy of preservation. However, an equal number of vice-chancellors disagreed and believed that international co-operation with similar organisations abroad would make additional materials available to researchers and that this could possibly be framed within an EU-wide initiative.
Ten vice-chancellors believed that central funding for a dispersed network of preservation activities would be useful. It was felt that the Arts and Humanities Data Service provided a useful model that could be applied to all disciplines. One vice-chancellor asserted that it is not necessary to create one central electronic archive, "As Internet technologies become increasingly pervasive it should become less important where, and on what server, research data is held. Therefore, the creation of central archives should not be essential. What needs to be co-ordinated centrally is the cataloguing of and access to, such materials". This network should be co-ordinated by the new national preservation body, or through JISC.
A campaign to raise awareness of the problem of preserving electronic materials and a programme of education regarding preservation techniques were both viewed favourably, 14 vice-chancellors believing that they would be useful. Both the awareness campaign and the programme of education should be run by the new national body or by the HEFCE, the British Library or JISC. They should target a wide range of groups including the creators of digital materials, universities, publishers, database hosts, government departments, commercial organisations with a high level of involvement in research, central support services and research funders. Again vice-chancellors hoped that JISC or HEFCE would fund these programmes, but there were concerns that this would lead to a situation of 'robbing Peter to pay Paul'. The lottery was suggested a number of times as a potential source of funds along with private sector investment. It was suggested that the education programme should be funded through subscription.
Half of all the vice-chancellors felt there should be a wider programme of research into the preservation of digital materials. The diversity of research areas identified demonstrates the range of political, legal and technical issues that remain unresolved. Vice-chancellors suggested that research was needed into how the necessary funds could be generated, long term media and standards for storage that are sensitive to technology change, ownership of materials and intellectual property rights. A particularly useful suggestion was to conduct an in-depth study of the creation of electronic materials in a cross-section of higher education institutions in order to identify the best steps to preserve materials.
It is clear that electronic resources are being funded and created at an ever-increasing rate, both in terms of data and of 'electronic paperwork'. Increases are due to the availability of hardware, software and the electronic infrastructure available to researchers. All but two of the funding agencies were funding projects which produced electronic materials. All of the projects surveyed were creating electronic materials as a major part of their research. The responses from the universities, 33 of which were returned in time to be included in this report, confirm that large quantities of electronic materials are being created in UK universities and that this is expected to increase. Also emerging from this study are three predominant issues:
* a lack of awareness of the need for preservation policies for electronic research materials
* a need for advice, standards and national policy to plan for preservation
* the need for a centre, or a distributed network of centres to provide preservation facilities for those without the facilities and resources to provide their own long-term preservation.
The first of these issues is difficult to quantify but is implicit in the low response rate and the difficulty experienced in finding a person with responsibility in this area to interview in many of the organisations selected. The study is biased towards those who had thought about preservation issues and these almost uniformly expressed a need for guidance and policy. A centre or centres with the facilities and expertise to carry out preservation services for others is another clear requirement which emerged both explicitly and implicitly from the investigation.
Only five of the 28 funding agencies had established policies and guidelines for preservation. Researchers within the projects surveyed whose funders did not have a preservation policy were sometimes concerned about preservation policies set out by funders. Many were multi-funded and raised the possibility of conflicting policies imposed by funders, while others were concerned that policies might not be based on a real understanding of the material being generated. Six of the 52 projects surveyed had their own preservation policies and one was drafting a policy. Amongst the rest there had been little discussion of preservation, with the lack of resources and the short-term nature of funding given as the reasons. There was also a worry that too little was known about preservation. On the whole the projects felt that guidance and resources were needed from funding bodies, but that they needed a strong say in the policy and a retention of intellectual property rights. Materials had been deposited in established archives by nearly a quarter of the projects. None of the universities surveyed had any procedures, policies or guidelines covering the preservation of electronic materials at their institutions, although nine of the universities had considered or were considering such a policy.
All of the groups surveyed felt that a campaign promoting awareness of data preservation and a culture of data sharing was required. There was concern that resources were being lost and that more would be lost in the future unless action was taken. There was agreement too on the need for a national policy and guidelines covering preservation of electronic materials. It was felt too that a central national body should lead on preservation policy and monitor all relevant developments in standards and best practice.
It is clear that the form which centres of preservation should take and the responsibility for undertaking this work cannot be addressed until some serious issues have been debated and clarified. These issues centre on the link between ownership of the materials and the rights and responsibilities to preserve. A very complex picture of the funding of research in the United Kingdom emerged, making it impossible to draw a simple line between ownership and responsibility for preservation. Preservation is by its very nature a complex, expensive and above all, a long-term commitment, while many research projects are short-term and shifting in funding, location and staffing. The potential value of the material they produce is inestimable, however, and in many cases its loss will be a loss to the nation's cultural and scientific heritage.
Both dispersed and central models were discussed and the vice-chancellors in particular were clear that Internet technology meant that dispersed models were no barrier to access provided that central information (cataloguing) was available.
All groups regarded national funding for preservation of electronic resources as essential. The bodies regarded as most suitable to lead were the British Library, the National Preservation Office, The Data Archive, HEFCE (JISC) and the Consortium of University Research Libraries. There was, however, concern that such funding would result in "robbing Peter to pay Paul". The vice-chancellors in particular thought that new sources of funding including the Lottery Fund and the possibility of a subscription service to provide an education and advice programme should be considered.
In addition to the central issue of ownership and intellectual property rights, other issues which need to be addressed include:
* impartiality and quality control over secondary analysis
* ethical considerations in secondary analysis: the improper use of data
* respondent confidentiality
* better selection procedures to be able to predict future needs
* security of preservation techniques
* standards for preservation formats and media.
The case for the retention and preservation of research data which are of continuing use as a shared resource was clearly endorsed by the communities surveyed in the course of preparing this report. The following recommendations address the issues which were seen to stand in the way of retention and preservation. In summary, these recommendations suggest that the development of a national policy on research data would assist research councils, universities and research centres to formulate their own data policies, thus ensuring that data which constitute a national resource are preserved. The recommendations fall into three broad areas:
* the development of national guidelines covering the key areas of concern
* the development of standards to allow kitemarking of centres where research data can be managed and preserved in the long term
* the development of a national policy on research data and dissemination of information about this national policy.
Although the European Union has led the way in commissioning guidelines for the handling of electronic data and records[1] this report shows an urgent need for guidelines which are developed specifically to cover the selection and preservation of research data for sharing and long term preservation. Such guidelines should be generic in nature but should reflect the research structure in the UK and should allow for adaptation and development by particular research funding and research organisations to meet their particular needs. The guidelines should provide a basis for funders of research to formulate data policies which will cascade through their research contracts to practices adopted by researchers producing or re-analysing research data.
Recommended: the commissioning of a series of guidelines to cover the following areas:
* selection and appraisal of electronic research data
* copyright and intellectual property as they affect electronic research data[2]
* confidentiality and ethical considerations in making research data available for secondary research
* technical and management standards for the long term preservation of research data and metadata.
After guidelines the most pressing need emerging from this study was to establish standards which would enable a network of accredited centres for the preservation of research data to be set up. This report recommends that the National Preservation Office or an agency appointed by them should undertake the "kitemarking" of centres which adhere to standards to be drawn up as well as providing training, documentation and auditing to assist them to reach this standard. The funding of preservation was a topic which came up very frequently in the course of this investigation. Adherence to standards allows a case to be made by centres for their preservation services to be recognised and properly resourced.
Recommended: the development of a document analogous to the Public Record Office's Beyond the PRO[3] setting out minimum standards for suitability as a centre for long term preservation of research data. Once minimum standards are established, a system of training and auditing is required to sustain them. This system will enable research funders and research organisations to choose those centres which they judge, against the standards set out, most suitable for the long term preservation of their research resources.
It is believed that the combination of the two broad recommendations made will lead to a series of practical outcomes which will meet a majority of the concerns felt by the communities surveyed in this study. However, neither recommendation will be effective unless its outcomes are widely explained and publicised. The final recommendation therefore concerns the need for discussion, education and publicity at the highest level, which should both influence the development of the guidelines and accreditation standards and ensure that they are widely known.
Recommended: the development of a national data policy for research data which takes into account, draws upon and influences the guidelines and standards recommended above. The National Preservation Office should call a national forum to include the research councils, other major research funders, representatives of the Committee of Vice-Chancellors and Principals, the major data centres, the British Library and JISC to deliberate upon national policy, discuss drafts of the documents recommended above and ensure that the results are widely and fully disseminated and acted upon.
The implementation of these recommendations will fulfil the needs for guidelines, accredited centres, and education and publicity, which will ensure the effective and economical preservation of research data as a national resource.
ADAM An eLib centre for Art, Architecture, Design and Media resources
AHDS Arts and Humanities Data Service
BBSRC Biotechnology and Biological Sciences Research Council
BHF British Heart Foundation
CogPrints An eLib centre providing a cognitive sciences archive
CTI Computers in Teaching Initiative
eLib Electronic Libraries Programme
EPSRC Engineering and Physical Sciences Research Council
ESRC Economic and Social Research Council
HEFC Higher Education Funding Councils
HEFCE Higher Education Funding Council for England
ICRF Imperial Cancer Research Fund
JISC Joint Information Systems Committee (of HEFC)
JTAP JISC Technology Applications Programme
Leverhulme The Leverhulme Trust
Lister The Lister Institute of Preventive Medicine
MRC Medical Research Council
NERC Natural Environment Research Council
Nuffield The Nuffield Foundation
Rowntree The Joseph Rowntree Foundation
TLTP Teaching and Learning Technology Programme
VADS Visual Arts Data Service
Wellcome The Wellcome Trust
[1] The DLM Forum on Electronic Records, which met in December 1996 in Brussels under the auspices of the European Union, brought together experts from industry, research, administration and archives to discuss the "memory of the information society". An outcome of this meeting was Guidelines on best practices for using electronic information: how to deal with machine-readable data and electronic documents (Luxembourg: Office for Official Publications of the European Communities, 1997).
[2] A recent report commissioned by the Economic and Social Research Council, Copyright and confidentiality: final report to the Economic and Social Research Council by Allen & Overy, 20 March 1998, found that ownership of copyright in old questionnaires re-used, in individual questions, in answers provided by respondents, in the database of names and addresses contacted, in the database of analysed data and in the publication arising from the study might all be different.
[3] Beyond the PRO: Public Records in Places of Deposit (The Public Record Office, 1994)