February 9, 1996

Mark Greengrass sees great opportunities but also a distinct threat in the development of electronically stored and revised humanities texts.

How much humanities scholars still rely on the great text editions of the past! Many of them were produced more than 50 years ago, relics of an age of scholarship where the work of a lifetime was unconstrained by the five-year accountability of the research assessment exercise and the economics of printing were very different.

For every completed modern monument to the edited text (the Gladstone Diaries, for example) there are as many, if not more (the Papers of Jeremy Bentham, the Papers of Charles Darwin) which are struggling towards fulfilment, like salmon swimming upstream, against the odds of a world where monograph publication has a swifter maturation, a greater kudos, and the weight of economic logic behind it.

The advent of the electronic text has changed things for good - and to some extent for ill. The greatest quantity of electronic text generally available is, of course, now on the Internet. Yet for the humanities this consists, typically, not of online commercial databases but of teaching materials, private research collections and oddments selected (without the gaps always indicated) for disparate purposes (not always evident) from sources (not always stated) from editions (often out of date) on editorial principles that have simply not been thought through. So it is not surprising that, for humanities text purposes, the CD-Rom continues to be a very significant player within a rapidly growing domain.

It may not come free across the ether but the text which it contains has, or should have, added value. This value may exist in terms of the ability to compare and collate different manuscript or printed versions of a text. It may come in the form of a greater degree of sophisticated searchability than Internet software provides. Or it may be provided in the form of new editions of a corpus previously unavailable in print or unsatisfactorily provided for in the past. With the spread of CD drives into offices and homes and the advantageous economics of production of the discs, a renaissance in edited text is just around the corner.

Already, the first electronic editions of texts which have long been available in good scholarly editions are commonly available on CD. Scholars can now search the corpus of Greek and Latin texts (PHI 5.3 and PHI 6 from the Packard Humanities Institute), the early Christian Fathers (the Patrologia Latina), the ancient Hebrew texts of law and literature (the Judaic Classics Library) and the corpus of English poetry up to 1900 (the English Poetry database) in most academic libraries. Along with their choice of eight recordings to take to Sue Lawley's desert island (when will it acquire a CD player?), they can take the Bible in 13 different editions (the Bible Library) and the complete works of Shakespeare (the Shakespeare Study Guide), both with accompanying commentaries and reference works. In many respects, these CDs take their place alongside the library catalogues and bibliographic indices which are beginning to transform humanities research.

Now, also, we are beginning to see the appearance of texts which have not been available in print, or presented in ways which printed editions could never undertake. It is a development in which the United Kingdom has a narrow lead but in which projects tend to be multidisciplinary and sometimes multinational. So the first publication of all 58 of the pre-1500 manuscripts and printed editions of The Wife of Bath's Prologue is expected from Cambridge University Press in the next few months. This will be the first time that all the various manuscript versions of this problematic and important section of the Canterbury Tales will become available to scholars together. It will offer remarkable facilities for searching and collating the transcriptions and for consulting digitised images of every folio transcribed. In due course, separate discs will make separate manuscripts of the Canterbury Tales available as well as other collections of all the available witnesses to sections of the work.

In the early modern period, the manuscripts of Samuel Hartlib, a 17th-century man of science, were published for the first time a few months ago in a text and image edition from UMI of Ann Arbor, Michigan. This provides a full transcription of more than 25,000 pages of the manuscripts deposited at the library of the University of Sheffield. There are also accompanying facsimile images for each folio, and software appropriate to searching the archive. The software matters because the orthography and linguistic usage of early modern English differ from those anticipated by a user unfamiliar with the variety of subjects (from astronomy to zoology through logic and medicine) touched on in the Hartlib collection.

Also flagged for publication this year (from CUP on one disc) are the complete published works of John Ruskin - all 39 volumes of the original edition with its beautiful plates, now difficult to obtain. Perhaps most remarkable in technical terms will be the forthcoming integral publication, on several discs from Tel Aviv University, of the issues of the only English-language daily newspaper in the Middle East, the Palestine Post, from 1932 through to 1988. Although various newspapers have made recent years available on CD, this will be the first historic run of a newspaper to appear. The software takes the user from a search term to highlighted usages of the term on digitised images of the original newspaper. Even photographs in the newspaper can be readily searched and displayed by this software.

But are the right texts being chosen for the new medium? There has been no strategic decision-making within humanities disciplines and no mechanism even to arrive at such decisions. The results have emerged from the serendipitous chemistry of individual interests, library and archive involvement and research council or public-funding outcomes. Individuals have proposed pilot projects on sources which have presented a manageable mix of interesting technical and scholarly challenges.

Libraries and archives have responded on an individual basis, as have research funders and publishers. It is doubtless encouraging to scholars working on the political history of modern Britain to know that part of the Heritage Fund's multi-million pound acquisition of the Churchill archive for the nation includes a budget to digitise it. But aside from the other complex aspects of that acquisition, would this archive have been at the top of their priority list for such treatment? It would be comforting to imagine that the ultimate virtual library of textual resources would be available in electronic form, as the distinguished French historian and former director of the Bibliothèque Nationale, Emmanuel Le Roy Ladurie, dreamt in the 1980s.

In reality, even at our most optimistic, we must expect progress to be patchy and unsystematic. All the more reason, therefore, to ensure some strategic thinking, discipline by discipline, on priorities for digitising text resources. Although the new arts and humanities data service at King's College, London will play a role in stimulating such a process, the process itself needs to be more broad-based.

The British archive establishment needs to think in terms of a new "Rolls Series" for the public sector archives. Modern linguists must work with colleagues in Europe and with an eye on European Union research budgets to see these developments, which have a tendency to be Anglo-Saxon based, spread more widely through the European linguistic register. Historians of political thought and science need to establish their own corpus texts on CD-Rom.

The past is littered with well-meaning and lavishly prospected editions which have been spoiled by inferior editorial standards. It would be easy to repeat this mistake in electronic editions. Although superficially it may make sense to transfer a printed edition into electronic form, too often those which are out of copyright represent the editorial standards of 50 years ago. There is little point in replicating the scholarship of two generations back when to do so would be to sacrifice the chance of a genuine new edition.

Many of the texts rapidly transferred into electronic form for undergraduate use by hard-pressed academics are presented without much concern for the textual complexities which they have necessarily elided for student use. And, despite the preoccupation with "skills-based" postgraduate teaching, how many humanities postgraduates in the UK are editing a text as part of their master's or PhD programmes? There are potential resources here to deploy towards the renaissance of the edited text, which can provide the renewing foundations for humanities scholarship in the next century. A failure to seize the opportunity will make it more difficult to find younger scholars capable of creating fully reliable and authentic electronic texts using the text-encoding standards (typically SGML in the humanities) to ensure a reasonable degree of future intercompatibility of input text.

The use of facsimile images has revolutionised the edited text in ways which we are only beginning to appreciate. If we can store and search published and typewritten materials from the modern period, this provides an entirely different editorial platform for them. There is a danger of "facsimile mystique" in the pre-modern period, however. It is not an answer to the old-fashioned problems of orthographic representation of a text merely to provide a digitised image of the original and hope that someone else will undertake the delicate tasks of collation and/or interpretation. And there is a danger of being artificially constrained by the current limitations of screen technology, as to the size and nature of the digital images which we are capturing. These limitations should be a thing of the past by the year 2000.

Where will the renaissance of edited text have taken us by the year 2000? Although there will be a greater use of the Internet as a delivery mechanism, this will make more urgent the need for a sense of common strategy and standards. Publication by CD at least engages the well-intentioned, if not always far-sighted, disciplines of a publisher and the coarse-grained strategies of a market-place. Both would be able to work better if there were an acknowledged place of review for the products of electronic editions. (What better place than The THES?) The pessimist will prophesy that at best we shall have some useful bibliographical, cataloguing and calendaring aids to hand for research purposes. The optimist will proclaim that the electronic text enables humanities research to answer questions which it has never thought to ask before. Despite the unpredictability which that implies, however, we should not lose the opportunity to choose rationally what materials in electronic form are most likely to prove constructive for the humanities into the next century.

* Details of most of the electronic text databases mentioned are contained in the latest edition of the Gale Directory of Databases (vol ii).

Mark Greengrass is reader in history at the University of Sheffield and a director of the Hartlib Papers Project.

