Peter S. Graham, Associate University Librarian
Rutgers, the State University of New Jersey
Introduced by Timothy Hobbs, Librarian, University of Leicester
Leicester University, March 19, 1997
Recently we were informed by a helpful local that Leicester was "not a tourist mecca." After the past 24 hours including a thoroughly enjoyable evening at the Haymarket Theatre, a noontime wind quintet at the Leicester Cathedral, a visit to the Roman Baths and a sampling of the delights of the Asian restaurants on the London Road, we can only say that those of you who are here for the first time should plan on returning and those of you who are here for a long time should invite more friends.
As a personal matter I want to note the passing of Paul Evan Peters, an early speaker in this series. He was a former colleague and a good friend of mine and, as those of you understand who knew him, he was a good friend of many people. He died in December at the age of 48. As someone who was an inspiration in the world of digital information, in part as the Executive Director of the Coalition for Networked Information, he was important to all of us and not least to me.
The distinctive role of research libraries remains in many ways the same in the electronic environment as in the print environment. In the United States we learn in library schools a paradigm about libraries: we acquire information, we organise information, we make it available and we preserve it. For research libraries digital information imposes two significant changes on the way we carry out the research library role. First is in collection development, about which I'm not going to say very much. There are new relations to be worked out between selection and acquisition and the continuing responsibility of research libraries. They can be entirely different actions in the print world: we select a book, we acquire it and we preserve it each as decisions separated in time. In the electronic world these actions need to be thought of as immediately associated: that is, the act of selection of a digital resource may become the research library's statement of responsibility for it even though it may not be owned. The many implications of the binding together of selection and preservation are not however our topic today.
The preservation role is the distinguishing activity of the research library, because it makes information available in the future as well as the present. And the second major change in the digital research library role is in the preservation activities themselves. But what is it that we in research libraries preserve? Let us answer this question by noting the origins of the bookish culture that we have inherited. Then we can discuss some of the cultural challenges to this bookish culture, which with the technical problems create significant barriers to our ability to preserve it.
In 1993 Ivan Illich published In the Vineyard of the Text. It's a somewhat self-indulgent book, for Illich is a bit in love with his own footnotes, but it is still a useful summary of the movement in Medieval Europe from the word to the text. Illich describes the origins of the bookish period of Western European culture from about 1250 to the present and he claims, as do many others, that it is now ending. He apparently was stimulated into writing his book by fear of losing bookish culture to the computer screen, a fear he shares with Steiner and many other critics though not with me as you will hear. "Bookishness" by the way appears to be a coinage of George Steiner's, and very broadly speaking means the way we have used and read books from the medieval period to the present, when the screen and its usages become prominent.
Illich is trying to describe how western bookish culture came into being in the golden age of the Schoolmen, that is, about the twelfth to thirteenth century. Monastic reading and learning of the early middle ages, directed at attaining wisdom and illumination through repeated study of canonical works, was superseded by Scholastic reading and learning. Monastic reading was characterised by vocalisation, and the hum of monks reading aloud would have been a familiar sound. Monastic reading was also characterised by the use of works as received authorities, that is as the Word. These words were read straight through and memorised and were considered a primary source of wisdom and illumination. In the figures on the illuminated manuscripts of this earlier time, the illumination of figures proceeds from the gilded saintly figures themselves, not from some external source. Illumination to an early monk meant the inward elevation of the soul toward God, away from the darkness experienced by Adam and Eve when they were cast out of the Garden of Eden; it did not mean the sense of elucidation that we mean when gaining understanding of an important new idea.
For the purpose of one's becoming illuminated, the book was seen as a continuous text, rather like a landscape or garden or a vineyard to be traversed. The manuscript books of this time physically reflected this understanding as they were written on the page with little or no word spacing, no paragraphing, no highlighting and no internal guides. The incipit and the explicit at the beginning and the end were all one had for identification of a work, rather like the beginning and ending notes of a long musical piece. Indeed the work was to be read through and memorised as an entire piece, rather as we now listen to a complete symphony or sonata.
The movement towards Scholasticism occurs in about the 12th or 13th centuries. The Scholastics moved from treating works simply as received authorities, the Word, to considering them as texts to be studied. Disputatio was added to lectio. It's the Scholastic approach to texts that leads George Steiner to speak of the bookishness we still practice today. The earliest Schoolmen included Peter Abelard, who compiled the Sic et Non, and Gratian of Bologna, who compiled the Decretum, or The Concordance of Discordant Canons. From the titles alone it is evident that these authors are now comparing texts rather than simply accepting them. They have begun the collation of seeming contradictories, the first step toward their rationalisation rather than acceptance.
I won't tease you further with discussion of Scholastic argument, in spite of the eager faces I see upturned before me. But note that the forms manuscripts took from about the twelfth century became very exciting from a bibliographical point of view. Concordances appeared, along with alphabetic indexes, tables of contents, lining in red for emphasis, underlining, paragraphing, titling of chapters, and different letter sizes. Marginal references now begin to appear in quantity. All are aids to study and tools for treating the work as a text rather than as an object. In addition, the monks began reading silently to themselves. As Illich says, "the schoolmen no longer approach the book as a vineyard, a garden or the landscape[;] the book connoted for them [instead] the treasury, the mine, the storage room."
In 1973, the German scholar Rolf Engelsing developed the idea of the Leserevolution -- the reading revolution. Engelsing described a movement in 18th century Germany from what he called intensive reading to extensive reading. From the intensive reading of a few standard texts such as the Bible, readers moved to extensive reading of large quantities of material such as newspapers, broadsides, journals and sermons. David Hall has found much the same thing happening in 17th century America. Other scholars are sceptical that the change was so marked, but the descriptors of "intensive" and "extensive" have become useful.
From what Illich is saying, it seems arguable that there was also in the 12th and 13th centuries a movement from intensive reading of relatively few texts to more extensive critical reading of multiple texts. This may be a constant feature of cultural change. Often today we hear that computerised reading is fragmentary and that hypertexts prevent thoughtful narrative reading. In a recent issue of Daedalus, the journal of the American Academy of Arts and Sciences in the United States, devoted to libraries, an article by a law student described his transition into law school and work with electronic texts. Jamie Metzl writes, "the computer-facilitated ability to search so quickly and directly for so precise a piece of information seems inherently threatening to the idea of the book as an integrated whole.... I feel myself being conditioned to think of articles and books less as integrated narratives and more as groupings of small bits of information that can be accessed independently." Jamie Metzl may be right and this may be true, but having a historical perspective on reading revolutions of different times and social classes helps keep the complaint in perspective. It is likely that once again we are adding an arrow to our quiver rather than removing one.
In 1140, Hugh of St. Victor, in his pre-Scholastic treatise on reading, insists on patience and leisurely tasting of what can be found on the page. At almost the same time the Scholastic Peter Lombard wants to give his pupils all the help he can to locate with ease and speed what they want to read in the book. Lombard's approach (helping the Jamie Metzls of the 12th century) is the bookishness that Steiner talks about, and our bookish culture is what we are used to today, I suspect even in much of the electronic environment.
"The new medium", he says, "...will change scholarly discourse and ... we will retrace our steps to the intellectual culture of the Middle Ages.... Works of scholarship produced in and through the electronic medium will have the same fluidity -- the same seamless growth and alteration and the same de-emphasis of authorship -- as medieval works had.... A work of scholarship mounted on the Internet will belong to the field it serves and will be improved by many of its users. Scholar-users will add to the work, annotate it and correct it and share it with those with whom they are working."
He goes on: "In the fluid world of the electron the body of scholarship in a field may become a continuous stream, the later work modifying the older and all of it available to the reader in a single data base or series of linked data bases. The prospect is exciting...."
Well, some of us are more reserved. He reminds me of Professor Harvey Wheeler, at the University of Southern California, whom I first ran into at a library automation conference in the United States in 1988. He gave a speech there on what he called "the dynamic document". He advocated a document which could reflect one's changes of mind and view over time. One could publish a document electronically, then change one's mind, then go change the document and make it available in the revised form. The implications of this for the continuity of scholarly argument didn't seem to disturb him at all.
Chodorow's predictions of new scholarly forms give us both insight and pause. We can forgive him overstating the case a bit in the interests of making some very serious points, but he does overstate the case, for example in making a monad out of scholarship's present and future variety.
He raises a question of considerable importance to us as librarians when he suggests that networked intellectual discussions will be lost when the discussion takes a break. He says, "so long as the discourse is lively, scholars and librarians who serve them will port it from system to system." But, he says, "who will use up space and effort keeping a data base alive during periods of intellectual downtime?" - the concept of 'intellectual downtime' is one that I really like. The first thing that comes to my mind is Irish monks but let's set that aside for the moment. Who will? Librarians will (or should, for that's our job), using principles of collection development and preservation as we have for the past century or two. Who after all is still keeping Migne's Patrologia and the church fathers alive, at great expense from Chadwyck-Healey, during what can only be called the intellectual downtime of Dr. Chodorow's field, canon law?
Chodorow and Harvey Wheeler argue that scholarly discourse will become more fluid and less identifiable. But they both leave out of their argument what the Schoolmen taught us in the later Middle Ages: the continuing value of fixed text in scholarly discourse, the usefulness of chapter and verse, the importance of a solid foundation of reference points on which to ground a continuing discussion -- and the ability of a professional caste to aid scholars in confirming elements of their debate, that is, the work of librarians.
Most of us have at least a popular understanding of the post-modernist claim of the irrelevance of the author of a work, but there is of course a good deal more to it than that. A few years ago the American author Annie Dillard put forward a readable account of some of the characteristics of post-modern fiction when she wrote:
"Post-modern fiction, technically as well as thematically has taught us to admire the surfacing of structure and device.... It prizes subtlety more than drama, concision more than expansion, parody more than earnestness, artfulness more than verisimilitude, intellection more than entertainment. It concerns itself less with social classes than with individuals and structurally less with individual growth than with pattern of idea. Instead of social, moral or religious piety or certainty and emotional depth, post-modern fiction offers humour, irony, intellectual complexity, technical beauty and a catalogue of the forms of unknowing."Linda Hutchin in 1995 published "A Poetics of Post-Modernism." In it she describes several other distinguishing characteristics, in particular the notion that the concept of process is at the heart of post-modernism. It's this concept of process that is potentially the most worrisome to us as librarians; we heard this concept of process both from Stanley Chodorow and from Harvey Wheeler. If all is process and there is no final product, if in fact there is no text in this class, just what is our role supposed to be? If the author has been dismissed and reception theory teaches us that the work itself is of far less importance than our individual performance of it, just what is it that libraries do?
A number of years ago Philippe Sollers, editor of the well-known critical journal Tel Quel , said in a similar context: "it is thus within language, now grasped somehow mathematically as our milieu of transformation, that we must pose the problems that concern us. And outside the notion of a product, for to the degree to which you valorise the product you posit the existence of the museum and sooner or later of the academy. You favour a collection of things, arrested and frozen in the pseudo-eternity of value in contradistinction to the way in which what we are looking for ought to lead us on beyond all values."
Again, this is a notion hostile to the museum and the product as opposed to the process. It is hostile to the notion of the archive, the artifact and the stable text. Linda Hutcheon, in her more recent book in 1995, goes on to make a specific comment on the archive (the archive is a term of some interest in the critical theory community). She notes "Derrida's famous contention that there is nothing preceding and nothing outside the text and Foucault's general unwillingness to accept language as referring...to any first order reference, anything that is that would ground it in any foundational 'truth.' ... This kind of post-structuralist thinking ... has obvious implications.... It radically questions the nature of the archive, the document, evidence. It separates the (meaning-granted) facts of history-writing from the brute events of the past."
Let's look at what Derrida himself has recently written about our own activities. It is not encouraging.
In 1996 Derrida published a book called Archive Fever: a Freudian impression, translated by Eric Prenowitz and published by the University of Chicago Press; it apparently reflects a lecture given in 1994 in London. His book focuses on psycho-analytical issues involving the archive of the mind in very Freudian terms. Yet in numerous asides he suggests that his topic is concrete as well. For example, he refers to technology as it would have affected the development of psychoanalysis. I'm quoting a paragraph here:
"One can dream or speculate about the geo-techno-logical shocks which would have made the landscape of the psychoanalytic archive unrecognizable for the past century if...Freud, his contemporaries, collaborators and immediate disciples, instead of writing thousands of letters by hand, had had access to MCI or AT&T telephonic credit cards ... computers ... faxes ... teleconferences, and above all E-mail."
Here he is talking about real records of a real discourse. He goes on and here perhaps you may share with me a certain schadenfreude as an important mind discovers the stunningly obvious.
"[E]lectronic mail today, even more than the fax, is on the way to transforming the entire public and private space of humanity, and first of all the limit between the private, the secret (private or public), and the public or the phenomenal. It [e-mail] is not only a technique...: at an unprecedented rhythm, in quasi-instantaneous fashion, this instrumental possibility of production, of printing, of conservation, and of destruction of the archive must inevitably be accompanied by juridical and thus political transformations. These affect nothing less than property rights, publishing and reproduction rights." M. Derrida wakes up and smells the coffee. Jacques Derrida, meet Ann Okerson of Yale; Bernard Naylor of Southampton, meet Jacques Derrida.
But let's follow his argument about the nature of archives. He notes the derivation of archives: "Let us...begin...at the word 'archive'....Arkhé, we recall, names at once the commencement and the commandment." He notes the derivation from archon, the Greek magistrate; he goes on: "This name apparently coordinates two principles in one: the principle according to nature or history, there where things commence ...but also the principle according to the law, there where men and gods command, there where authority, social order are exercised...." Derrida thus relates the archive fundamentally to issues of power and control, in my view entirely appropriately. The library or the archive does represent a set of views about what culture should be preserved, inevitably reflecting a cultural stance at a point in time affected by matters of social power, funding, librarian ideology and accessibility of materials to be preserved.
Derrida then defines archive fever, the title of his book. First he makes reference to a Freudian concept of the "mystic pad," a means by which memory is conserved in the psyche. He goes on, and I don't promise this will be lucid:
"The model of this singular 'mystic pad' also incorporates what may seem, in the form of a destruction drive, to contradict even the conservation drive, what we could call here the archive drive . It is what I call ... archive fever . ... [T]here is no archive fever without the threat of this death drive, this aggression and destruction drive.... There is not one archive fever, one limit or one suffering of memory among others: enlisting the in-finite, archive fever verges on radical evil."
Most of Derrida's book is about the psychoanalytic meanings of archives, of memory and of meanings. With Derrida's concrete references to real archiving situations in the same text, however, this reader's perception is that we should be worried about his views of real archives and archiving. And I am likewise worried about the views of works and texts that are trickling down from post-modern critics to a less careful scholarly public. It can become too easy to dismiss the importance of foundation texts and of accurate citation, or to lose sight of the importance of the bookish approach to texts. The bookish culture, I say it again, recognises the importance of fixed texts and the importance of a solid foundation of reference points on which to ground a continuing discussion.
The post-modern emphasis on process rather than product can allow the integrity of the text to be excessively undervalued. Let me provide a counter-example however in the work of Jerome McGann, the textual editor and romantic and Victorian scholar at the University of Virginia. In his recent essay, "The Rationale of Hypertext," available on the Web, he's described the need for a hyperarchive to fully realise the potential of scholarly editing. This is a scholar who values the importance of the texts but wants to approach them in their full multiplicity. As he says, we no longer have to use books to analyse and study other books or texts. "Editing in codex forms", he says, "generates an archive of books and related materials." Presently the print archives are becoming too voluminous, as we know. What is needed is the hyperarchive, a hypermedia archive.
What McGann calls for in editing texts is not simply a critical edition but a critical archive, as that is the only tool that can provide for the full study of multimedia works. For McGann the hypertext archive must provide for multimedia from the start, as texts always have visual and sonic qualities as well as intellectual or textual ones. William Blake may be an obvious case, but so are the 19th century poets of pictures, such as Letitia Elizabeth Landon, and so, says McGann, are the modernist poets, influenced by the type design of William Morris and his 19th and 20th century followers.
The hypermedia archive McGann postulates is not founded on the fluidity of the individual texts as one might expect from Sollers or Hutcheon or Derrida, but on the fluidity of the links between the texts and on the extensibility and expandability of the texts. McGann edited the ground-breaking New Oxford Book of Romantic Period Verse. He's also the creator on the World Wide Web of the Rossetti Project, which he describes as "an archive rather than an edition." It is a collection of texts, critical works, digitised images of paintings and manuscript pages and commentaries. One example of a "bookish horizon" it has been able to escape is the idea of the "definitive text," a concept of classical scholarship that McGann claims makes no sense in our more recent experience of multiple fractured texts such as those of King Lear, The Prelude and The Waste Land. His own project shows multiple texts of Rossetti's works and links them at many points. McGann notes that in the end we can only read one text at a time, that in the end we read in normal space-time, not in any virtual world. That is because our minds are embodied, like the minds of those who created the text. This is a refreshing practicality from someone who, I suspect, finds more of use in the works of Derrida and Hutcheon than in the works of, say, George Steiner. As he says, our electronic tools now allow a manipulation of the mind's creation in such a way as to establish links between texts that we never could do before when they were in printed books. In the end we integrate the multiple texts and works into our embodied minds in a narrative mode. This tension of attempting the paradise of perfect multitextual comprehension using our fallen embodied mind is a subject worthy of Milton, who shouldst be living at this hour.
The artifact or medium can itself decay. Medium preservation is the concern for preserving the medium on which information is stored such as tapes, disks, optical disks, CD-ROMs and the like, or the book. Copying to other devices of the same kind in the electronic environment is a technique which we know of as refreshing: we refresh a tape by copying from one old decaying one to another.
More problematic than medium decay are the rapid changes in the means of recording, in the storage formats and in the software that allows electronic information to be of use. We need to be aware of technology obsolescence as even more of a problem than medium decay and undertake steps of technology preservation. Rather than simply refreshing we also need to speak of migration, of migrating information forward through technology stages as they become available and as the old technology ceases being supported by vendors in the user community. Major problems arise in thinking about this. Do we automatically migrate everything forward, which is an enormously expensive proposition? Imagine xeroxing the book collection once every ten years. Or do we migrate information forward only when it is called for, after several decades in which the intervening technologies may have disappeared?
There is a third preservation requirement, intellectual preservation, which addresses the integrity and authenticity of the information as originally recorded. Preservation of the medium and of the software technologies will serve only part of the need if the information content has been corrupted from its original form, whether by accident or design. The need for intellectual preservation arises because the great asset of digital information is also its great liability. The ease with which an identical copy can be quickly and flawlessly made is paralleled by the ease with which a change may undetectably be made.
Here are some of the questions that arise for a researcher using electronic information. How can I be sure that what I am viewing is what I want to see? How do I know that the document I have found is the same one which you used and made reference to in your footnote? How can I be sure that the document I now view has not been changed since the last time I looked at it? Note that I am not talking about back-up; the question, rather, is how we know which version we have or don't have.
In the print world we sometimes speak of editions, printings or states, and there is not normally a problem in the period since the hand press. We properly take for granted the fixity of text in the print world: the printed article I examine because of the footnote you gave is beyond question the same text that you read, and therefore we have confidence that our discussion is based upon a common foundation. The present state of electronic text is such that we no longer can have that confidence. This is the challenge of the dynamic document presented to us by Stanley Chodorow and Harvey Wheeler.
There are at least three kinds of possible changes: accidental change, intended change that is well meant, and intended change that is not well meant, that is, fraud.
Accidental change: a document can sometimes be damaged accidentally, perhaps by data loss during transfer or through inadvertent mistakes in manipulation. For example data may be corrupted in being sent over a network, or between disks and memory on a computer; this happens seldom, but it does happen. More likely is the loss of sections of a document or a whole version of a document due to accidents in updating; I know it hasn't happened to anyone in this room, but it does happen.
Intended change that is well meant: there are at least two possibilities. The first is that the change results in a specific new version of the work. New versions and drafts are familiar to us from dealing with authorial texts, for example, or from working with legislative bills, authors' manuscripts or revisions of working papers. It is desirable to keep track bibliographically of the distinction between one version and another. We're accustomed to drafts being numbered and edition statements being explicit. Analytical bibliographers expend great effort on the descriptions of important works in order to make the distinctions clear.
The second possibility is the structural update: changes that are inherent in the document and that also alter its information content. A dynamic data base is by its nature frequently updated; Books in Print, for example, or a university directory. In each of these cases it is appropriate and expected for the information to change constantly, yet it is also appropriate for the information to be shared and analysed at a given point in time. In print form, for example, Books in Print gives us a historical record of printing, and the directory tells us who is a member of the university in a given year. In electronic form there is no historical record unless a snapshot is taken at a given point in time. How do we identify that snapshot and authenticate it at a later time?
Intended change that is fraud: the change might be to one's own work, to cover one's tracks or to change evidence for a variety of reasons, or it might be to damage the work of another. In an electronic future the opportunities for a Stalinist revision of history will be multiplied. An unscrupulous researcher could change experimental data without a trace. A financial dealer might wish to cover the tracks of improper business, or a political figure might wish to hide or modify inconvenient earlier views.
Imagine if you will that in the United States the only evidence of the Reagan Iran-Contra scandal was in electronic mail, or that the only record of Bill Clinton's fund-raising activities was in electronic form. Consider the political benefit that might derive if each of these parties could modify their own past correspondence without detection; then consider the case if each of them could modify the other's correspondence without detection. Not only they but the public need protection from such possibilities.
The solution is to fix a text or document in some way so that a user can be sure of the original text when it is needed. This solution we can call authentication and it is very important in the business, political and espionage communities; once again, as so often, libraries can take advantage of technology developed for other purposes. There are three important electronic techniques that can be used for authentication: hashing, cryptography and time stamping. Let me note in passing that digital signatures are not likely to do the job; they rely upon secrecy and upon individuals, neither of which can be depended upon if we're preserving information for the long term.
Hashing is a shorthand means by which we can establish the uniqueness of a document. Hashing depends upon the assignment of arbitrary values to each portion of the document and thence upon the resulting computation of specific but contentless values called hash totals or hashes. They are contentless because the specific computed hash totals have no value other than themselves. In particular it's impossible or infeasible to compute backward from a hash to the original document. The hash may be a number of a hundred digits or so but it is much shorter than the document it was computed from. Thus a hash has several virtues: it is much smaller than the original document; it can (if you wish) preserve the privacy of the original document, and it uniquely describes the original document.
Figure 1 allows me to give a simplified description of how a hash is created. If each letter is assigned a value from 1 to 26 then a word will have a numeric total if its letters are summed. In the first example EAT has the value of 26, if A=1 and E=5 and so forth. The problem is that the word TEA, composed of the same letters, has the same value in this scheme. The scheme can be made more complicated, as shown in the second pair of examples, where the letter values are also multiplied by a place value. In this scheme the two words made of the same letters end up with different totals: A is 1, but it is also in the second position, so it is multiplied by 2 to give 2. For the sake of illustration the numbers at the right are shown as summing to the value 52 at the bottom: we just add them all up (in fact the full total is 152 in this case, but the leftmost digit can be discarded without materially affecting the fact that a specific hash total for a document consisting of these two words has been found). These totals are contentless, private and, in this simple example, reasonably descriptive of the particular words in the document. This is a very simplified description of a process that can be made far too complicated for human computation. It is quite easy to compute quite complex hashes for any kind of document using cryptographic techniques which still preserve the public nature of the document. Paradoxically these cryptographic hashes are beyond the reach of super computers to phony up or break for the foreseeable future.
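For those who like to see such things concretely, here is a minimal sketch in Python, purely for illustration, of the toy positional scheme I have just described, followed by a genuine cryptographic hash computed with a standard library routine; the particular function shown (SHA-256) is simply a modern stand-in, not necessarily the algorithm any real archiving service would choose.

```python
import hashlib

def toy_hash(word: str) -> int:
    """Toy scheme from Figure 1: letter value (A=1 ... Z=26) times its position."""
    return sum((ord(c) - ord('A') + 1) * i for i, c in enumerate(word.upper(), start=1))

print(toy_hash("EAT"))   # 5*1 + 1*2 + 20*3 = 67
print(toy_hash("TEA"))   # 20*1 + 5*2 + 1*3 = 33 -- same letters, different total

# A real cryptographic hash: any change to the document changes the digest,
# and the digest cannot feasibly be computed backward to recover the document.
document = b"The printed article I examine is beyond question the same text that you read."
print(hashlib.sha256(document).hexdigest())
```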
Electronic time stamping takes the process a step further. Time stamping is a means not only of authenticating a document but of authenticating its existence. It's analogous to the rubber stamping of incoming mail with the date and time it was received. An electronic technique has been developed by two researchers at Bellcore in New Jersey, Stuart Haber and Scott Stornetta. They've recently spun off a small start-up company (Surety, Inc.) which has reached agreement with the Research Libraries Group to experiment with their procedure. Their technique depends on a mathematical procedure involving the entire specific contents of the document, which means they have provided a tool for determining change as well as for fixing the date of the document. A great advantage of their procedure is that it is entirely public.
It's useful for the library community, which wishes to keep documents available rather than hide them and which needs to do so over periods of time beyond those that we in libraries can immediately control. Time stamping depends upon hashing as the first step (Figure 2). The user creates a hash of the document and sends it over the network to a time stamping server. The time stamping server uses standard, publicly available software to combine this hash with two other numbers: a hash from the just previous document that it has authenticated, very likely randomly received, and a hash derived from the current time and date. The resulting number is called a certificate and the server returns this certificate to the author. The time stamping server performs one other important function: it combines the certificate hash with others, obviously with the next document to come in, and each week a summary hash is published in the Sunday "Personals" column of The New York Times, mostly as a proof of concept (Figure 3).
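Purely as an illustration of the linking idea, and not of Surety's actual protocol, here is a minimal Python sketch of a hypothetical time stamping server: each certificate binds the submitted document hash to the previous certificate and to the current time, so that no stamp can later be altered without disturbing the whole chain. The weekly number in the newspaper would simply be the head of this chain at the end of the week.

```python
import hashlib
from datetime import datetime, timezone

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class TimeStampServer:
    """Hypothetical linking server, a sketch only: each certificate chains
    the submitted document hash to the previous certificate and the time."""
    def __init__(self):
        self.previous = h(b"genesis")           # seed value for the first link

    def stamp(self, document_hash: str) -> dict:
        now = datetime.now(timezone.utc).isoformat()
        previous = self.previous
        certificate = h((previous + document_hash + h(now.encode())).encode())
        self.previous = certificate             # the next request links to this one
        return {"document_hash": document_hash, "time": now,
                "previous": previous, "certificate": certificate}

server = TimeStampServer()
cert = server.stamp(h(b"My article, exactly as submitted."))
print(cert["certificate"])
```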
The public nature of this number assures that it cannot be tampered with. They speak of their time stamping technique as depending on two qualities: the encryption technology and a "widely witnessed event." A common widely witnessed event occurs, for example, when at a lottery the person drawing the balls is televised so that the actual numbers are seen not to be rigged. (I'm told this has been gotten around in the United States, I'm sorry to say. Everything gets gotten around in the United States.)
The widely witnessed event and the encryption techniques assure that the time-stamp cannot be tampered with. There are in fact electronic techniques just as good as the technique of publishing in the Times .
Now let's consider a reader who wishes to determine the authenticity of the electronic document before her. Perhaps it's an electronic press release from a political campaign or an electronic funds transfer, or perhaps it's the year 2097 and the document is an electronic text by, say, John Major. The reader has available the certificate for the document. If she can then validate the number from the document she can be sure she has the authenticated contents. Using the standard software she recreates the hash for the document and sends the hash over the network with the certificate to the time stamping server; the server reports back on the validity of the certificate for that document.
But let's suppose it is the year 2097 and the server is nowhere to be found. Our reader then searches out the microfilm of the New York Times for the week of the document in question (or the electronic equivalent, which can be published and which libraries can maintain -- that's one of the experiments RLG will do). Then she determines the published hash number for that time period. Using that number in the standard software she tests the authenticity of her document just as the server would.
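Again only as a sketch, and assuming a certificate shaped like the one in the toy example above, the reader's check amounts to recomputing two hashes and comparing; nothing secret and no live server is required.

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify(document: bytes, certificate: dict) -> bool:
    """Toy verification: does this document reproduce this certificate?"""
    doc_hash = h(document)
    if doc_hash != certificate["document_hash"]:
        return False                            # the text has been altered
    rebuilt = h((certificate["previous"] + doc_hash
                 + h(certificate["time"].encode())).encode())
    return rebuilt == certificate["certificate"]

# e.g. verify(b"My article, exactly as submitted.", cert) -- using the cert
# from the sketch above. In 2097 the reader would also compare the chain
# value against the summary hash published for that week, the "widely
# witnessed event".
```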
What I've described are simplified forms of methods for identifying a unique document and for authenticating a document as created at a specific point in time with a specific content. Whether or not these specific tools of hashing or time stamping are those the library community will use in future is still open to question. However, they demonstrate that librarians now have available authentication tools that provide generality, flexibility, ease of use, openness, low cost and functionality over long periods of time on the human scale. Using such tools a user can have confidence that a document being read is the one desired or intended and that it has not been altered without the reader being aware of it.
We are not yet out of the woods. You'll recall that I spoke of Jerome McGann's postulating the need for a critical archive, not simply a critical edition. The hypermedia archive that McGann requires would be organic, growing, flexible, modular and unlimited by current technology. He's given us a good example of what he means with his Rossetti Archive now on the Web at Virginia. What McGann postulates must, by its nature, be dynamic from its origins. This requirement presents immediate and continuing problems of integrity and authenticity. We have begun solving the matter of assuring authenticity for static files with hashing and encryption methods: we may have tools, and they may cost money, but there may be a way. Assuring the authenticity of a dynamic set of files will call on further imagination and more flexible tools.
What is at issue is not only version control, a relatively simple matter for static file assurance, but also core data assurance. Assuring the core data means that different formats of a text are also assured: versions that have links supplied; versions that are marked up; or versions that exist as manipulated or prepared by different engines, for example different word processors, PageMaker 6.5 or HTML. We even want -- let's not limit our requirements -- assurance that a bitmapped page image matches the formatted text extracted from it. We don't have these tools in sight yet, but they are needed.
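As a small, hedged illustration (the renditions and names here are invented), one can already record a hash for each format of the same work in a manifest; but the manifest only fixes each rendition separately, and it says nothing about whether they carry the same core text, which is precisely the tool we lack.

```python
import hashlib, json

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical renditions of one and the same work.
renditions = {
    "plain_text": b"The blessed damozel leaned out / From the gold bar of Heaven;",
    "marked_up":  b"<l>The blessed damozel leaned out</l><l>From the gold bar of Heaven;</l>",
    # a bitmapped page image would be hashed the same way, as raw bytes
}

manifest = {name: digest(data) for name, data in renditions.items()}
print(json.dumps(manifest, indent=2))

# Each rendition is now individually verifiable; asserting that they all
# embody the same core text is a separate, and much harder, problem.
```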
At one point McGann describes the hypertext archive as a multiplicity of editions supporting the authority of the work as understood by their analysis. He compares this to the "fabulous circle whose centre is everywhere and whose circumference is nowhere." This circle, of course, as described by Vaughan and Traherne and Sir Thomas Browne, is God. If we liken the cluster of editions around the work to the fabulous circle the question arises: how do we preserve God? This is a larger task even than I wish to set for research libraries. But perhaps it indicates something of the magnitude of the task before us and the necessary hubris required to take it on.
Then Tom Stoppard, in his recent play Arcadia, lets his leading man blithely forego the possibility that we can preserve our culture. There are two leading figures in the play, Thomasina and Septimus Hodge. Thomasina is a child prodigy of about 14, a young woman who is a mathematical genius, and Septimus Hodge is her tutor. Thomasina has just discovered the loss of the library of Alexandria; she's a sensitive person and she bewails this loss. "The great library of Alexandria was burned.... All the lost plays of the Athenians....", she says "How can we sleep for grief?"
Septimus Hodge responds:
By counting our stock. Seven plays from Aeschylus, seven from Sophocles, nineteen from Euripides, my lady! You should no more grieve for the rest than for a buckle lost from your first shoe, or for your lesson book which will be lost when you are old. We shed as we pick up, like travellers who must carry everything in their arms, and what we let fall will be picked up by those behind. ... The missing plays of Sophocles will turn up piece by piece, or be written again in another language. Ancient cures for diseases will reveal themselves once more. Mathematical discoveries glimpsed and lost to view will have their time again. You do not suppose, my lady, that if all of Archimedes had been hiding in the great library of Alexandria, we would be at a loss for a corkscrew?
How do we respond to such trivialising quietism? In part by drawing on our own professional sense that this problem is specifically what we are to deal with. The distinction of the research library and of the research librarian is that we are concerned for the long term. Other libraries and other social functions, such as bookstores and newspapers, join with us in the access and distribution function. But no other agent in society has the responsibility we do for making sure that what is produced remains available in the future.
I will describe the kind of consortial activity I mean by using the Research Libraries Group, RLG, as a case study both in what is needed and what is no longer most relevant. The Research Libraries Group, as most of you know, is based in the US. It has developed an increasingly international membership in the last few years, particularly in the United Kingdom but also on the Continent and on the Pacific Rim.
For my first job in librarianship I had the good fortune to work at the Research Libraries Group in its earliest formative period, in 1975-1978. It was founded by only four research libraries, those of Harvard, Yale and Columbia Universities and the New York Public Library. At the time it was seen as a radically new kind of consortium. I recently sought out some of the public RLG statements from those early days.
There is a notable consistency of RLG's rhetoric in this period. It is striking how relevant these statements are today, almost word for word, and also how distant they are from the current RLG operational mode.
We have a distinctive mission. Not only do we preserve information for future use, we collect materials no one else does and we provide services needed by a relatively small constituency of scholars and researchers. Cost recovery is not possible for the research library except at the margin. The market place as such will not support fundamental research library activities of acquisition, organisation, access and preservation.
These facts hold true for services provided consortially as well as individually. While bibliographic utilities must demonstrate their usefulness through the willingness of users and libraries to support them financially, few such utilities can or should be expected to be as effective as needed based solely on per-unit cost recovery. What research libraries need to do will cost more than can be provided by selling services. In the near term this means that a consortium thinking solely in terms of revenue streams will lead itself away from the fundamental mission of research libraries. It remains true, of course, that it is essential to require adequate income streams for activities that have been understood, solved and operationalized. It is also true that where a consortium can economically provide services which its membership needs, it has the right and duty to charge for them. But to think only in terms of the operational revenue stream is to miss the point of the consortium.
The early and lasting RLG success was in establishing the bibliographical system, RLIN, as the underpinning of all its other program activities. But the establishment of RLIN was not itself achieved on a cost recovery basis: it took investment by universities, by some more than others, and by foundations, each of which saw value to the scholarly community if the task could be achieved. The research library community must now emphasise that our mission is to preserve and provide the human record in print and electronic environments, both now and in the future. We must then argue for the initial funding necessary to achieve this mission. The funding must, once again, come from our own institutions and from thoughtful, farseeing external funding agencies, whether private foundations or government offices. It will not come primarily from fee for service products. Research libraries need a partnership more than a vendor. The partnership must provide useful services at reasonable costs but its principal task will be its own transformation.
Some librarians may draw back from the apparent complexity of the technologies that support electronic information, but these technologies should present no difficulty to minds that presently cope with internet access, corporate authorship and duodecimal collations in half-sheet imposition. The mind capable of describing just how compositor E affected the text of the First Folio is adequate to the task of setting up standards for electronic preservation.
Many of us work with e-mail, list servers, data bases and the World Wide Web and are very aware of the technologies and increasingly aware of the need to manage them. And it is managing that is necessary. There are technical people aplenty who can grapple with the bits and bytes of these issues if librarians give them proper direction. We've seen this happen recently as leaders of the Internet Engineering Task Force have turned to the library community for assistance in building network discovery and retrieval tools. The need is for leaders to articulate the requirements for the electronic preservation of the human record and to see that our profession makes it happen. That is the professional requirement and it is the people in this audience, you, who are the most capable of assuring that it does happen.
There is a kind of back to basics quality to our now confronting the electronic environment. To grapple with the ephemerality of electronic information is to answer the abstract question of why we are librarians. We must continue to emphasise our professional obligation to preserve and make available the human record regardless of its form. To do so will be to lay claim to being part of the very current affairs of our society and of our universities. We can then lay very effective claim to the resources we need to carry out this obligation. Finding ways to get these resources is also our professional requirement.
We will do our best in the United States. Good luck in the United Kingdom and thank you very much for your attention.
Peter, thank you very much for a thoroughly stimulating and worrying lecture. I can just about cope with half-page imposition; I'm glad there are technocrats around who can cope with all these bits and bytes. I must say I find that very worrying indeed.
Peter's willing to take questions for half an hour or so before we move over to the University-provided reception. Are there any questions?
You're quite right there is an irony involved. On the other hand, as a librarian, that irony isn't too strong for me. Print has its purpose as has microfilm as well.
I was more worried, since you said you were a mathematician, that you were going to question the idea of cryptographic hash totals not being broken in the foreseeable future. Of course, they may be, but it takes an enormous amount of energy and power to do so. In fact Haber and Stornetta have come up with a theoretical, and presumably ultimately practical, way of getting around that. They propose to envelope hashes at some future time, when supercomputers make breaking present codes trivial, by enveloping the existing validation in a then-unbreakable validation time stamp; the envelopes might get successively deeper over the centuries, but unwrapping them remains a trivial matter.
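To make the enveloping idea concrete -- and this is only a sketch of the principle, not of Haber and Stornetta's actual renewal scheme -- one wraps the old certificate and the document hash in a new time stamp computed with a stronger function before the old one becomes breakable:

```python
import hashlib
from datetime import datetime, timezone

def renew(old_certificate: str, document_hash: str) -> dict:
    """Wrap an existing validation in a new, stronger time-stamped envelope.
    The choice of SHA3-512 here is illustrative only."""
    now = datetime.now(timezone.utc).isoformat()
    envelope = hashlib.sha3_512(
        (old_certificate + document_hash + now).encode()).hexdigest()
    return {"renewed_at": now, "envelope": envelope}

# e.g. renew(cert["certificate"], cert["document_hash"]) -- and the new
# envelope would itself be registered with a (stronger) time stamping chain.
```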
It's complicated and I alluded to this at the very beginning by noting the importance of the decision to make available on behalf of the libraries' patrons a resource that might not be either physically resident in the library or technologically under the control of the library. In my mind, and here I'm getting into the other talk that I didn't give, the decision of a research library to make electronic information available means that the preservation decision is entered upon from the moment of selection, rather than as now a decision that is put off for some time after the point of selection and acquisition. If a library chooses to make available resource X on server Z in some other country and it recognises this as a resource that should be valuable for the long term then the library has an interest in acquiring some rights over that information so that it can take the necessary steps to preserve it. These may simply be consortial rights in cooperation with the supplying agency.
I've thought myself of terms like trusteeship rights, in dealing with intellectual property matters. Perhaps a library may just buy the information. Then again, some information is available at no cost, so in transferring it to local control it becomes possible to take effective stewardship of it. But that decision at the point of deciding to make the resource available is no longer (as in print) simply a matter of buying it and putting it in the building here. It is a matter of making something permanently accessible, and that changes the way in which the selection decision is made as far as preservation is concerned.
The decision about whether it's a worthy resource: I'm not sure this changes. There's a lot of ephemera out there, and we don't always collect ephemera (whether the John Johnson collection should collect electronic ephemera or not is not for me to say; I suspect not, for that's not where their skills lie). But there may be some equivalent -- Brewster Kahle is now proposing to archive the Web. He claims now to have over a terabyte of information archiving the Web at a certain point in time. I wish him luck.
Libraries don't do that; we don't collect everything. We make lots of good decisions and some poor decisions. Our libraries as they exist are the result of conscious decisions about what to winnow and what to keep, augmented by individual collectors who 50 or 200 years later give us their weird collections, which nobody felt were worthwhile when they were collected but which turn out to be enormously valuable; the Pepys collection is an obvious one. Tim, I'm sure you could name others from your special collections experience. Whether this is going to happen in the electronic environment I don't know. What is it that everybody thinks is junk on the Web today that we're all going to want to see in forty years? I don't know.
But the concept of taking responsibility has to be acted upon at some point. The casual conversation that you hear in the world of newspapers and too many computer people, though not all, is "What do we need libraries for? The information's out there on the Net." Well we know it's out there, and it's gone too; here today and gone tomorrow. The act of taking responsibility is a library role that may play out as a consortium taking responsibility. In the near term what we see happening is a number of commercial electronic resources are being bought by consortia of varying kinds. At the moment in the US there are a number of statewide consortia developing, for example in the states of Virginia, Illinois, and California. One of the results is that one library can support this on behalf of the others. We're seeing quite an interlocking set of consortia beginning to develop. I doubt that there will be a single consortium handling all kinds of information.
The outstanding example of the success of electronic publishing that I know of is the Los Alamos Preprint Data Base, which has apparently transformed high energy physics. The real scholarly communication goes on through that data base, to which non-peer-reviewed materials are contributed at a constantly high rate; you're not part of the community if you're not contributing to that data base, and if you want to know what's going on you check that data base. Now that process has to transform itself at some point into the mechanics of promotion and tenure, which is the fundamental behavioural issue on the American academic side.
I've been fortunate to serve on a committee at my own university, Rutgers, on "The Role of Electronic Communication in the Tenure Process." In our report we spend about ten or twelve pages just defining all the e-publishing possibilities, just to make clear that we've done some work and looked at the field, and we've informed the community what the possibilities of publication are, but basically we say that peer review is the issue and the format is not the issue. This is a faculty committee.
There are going to be incremental activities all over the US of this kind and in time, I think, a transformation. Stevan Harnad has been the editor of Psycoloquy for 5 or 7 years; it is a peer reviewed electronic journal. It is all simply ASCII text but even in a textual environment it does things that are difficult for a print publication to do, such as publishing immediate responses along with articles. Then you get publications like the Human Genome project. Somehow contributions have to be evaluated for that kind of project; but there's nothing you can hold in your hand, there's no object you can submit to the editor. You're right about it being a behavioural matter. I can't tell you about the timing but all the instruments agree: things are moving forward on a variety of fronts and there will be a change.
This is not the kind of librarianship, with all respect, that Philip Larkin went into in the 1940's; he just wanted to stoke the furnace and then go upstairs and read while people looked at the newspapers on the main floor. He became, I understand, a very fine librarian but that's because something caught his fancy and he decided he wanted to keep up with some kinds of change. If you're willing to do that, I think it's just a matter of jumping in; I know it's a little frightening. I'm lucky enough to have been around technology for a while and am still discovering things that are absolutely new to me. Sometimes I ask myself, "What is going on here?" I don't know, and one just has to fool with it and play with it and do it. Library administrators like Tim here have to allow for play time. (Laughter) You have to understand you have to sit in front of a Web terminal looking at things and saying what's going on here, What's a URL? What's a Java script? Gee, this is very interesting. What's the Web?
Just over three years ago the graphical version of the Web appeared. I'd been hearing about the World Wide Web for about a year or two before that and frankly I didn't know what it was; I couldn't figure it out from the textual descriptions and I didn't have any way of using the Web in the non-graphical mode. Then Mosaic was announced and I downloaded it, started using it for about ten minutes and the scales fell from my eyes. It was astonishing the difference it made in how we all began to look at the intercommunication of information. And what was equally astonishing was that it had been so well designed that within an hour and a half of elapsed time after I'd downloaded the browser I'd created a simple Web page. It was easy, just no problem. It may take you three hours instead of an hour and a half, okay, but the fact is it's not that hard. Any increment is not that hard. Jumping in right now, if you don't have any electronic experience, is very daunting -- I understand that -- but it's time to do it.
For the same reason that Tim asked his question I don't think it makes sense in the electronic environment for a single library to be the repository of technological information -- it's simply dangerous. Fire can destroy a book stack, but a catastrophe can destroy an electronic repository far more easily. One of the things to be worked out in practice over the next few years is: what is redundancy and what does it mean? Is two locations enough? Is forty too many? How many?
And I know that geopolitical distinctions will come into this as well. The United States will have to have certain collections in spite of there being ten of them in Europe; even electronically we just know that. There will be the state consortia I talked about. They are growing up both for regional reasons and in response to commercial opportunity, the opportunity to get lower prices. But they'll take on a life of their own and thirty years from now we'll look back and see consortial structures which have their origins in these local state institutions. The West Coast won't tolerate the East Coast having all the good stuff and vice versa.