Frank Booty investigates how Europe's heritage is being digitised in Asia and the Caribbean.
What have Euromoney magazine, HMSO publications, The Bible in English, the Diploma of Achievement student book and teacher's guide for the Oxford and Cambridge Schools Examination Board, Editions and Adaptations of Shakespeare, Goethes Werke (Weimarer Ausgabe), and Justis Family Law in common?
Answer: they are all available on CD-Rom. Most of these publications will soon also be available online.
Euromoney's readership of 150,000 bankers and international financiers can now access all issues published since January 1990.
With 50,000 titles in print, HMSO is one of the largest publishers. It produces some 9,000 new items annually. Late last year it launched Food Safety+, a full text database with graphics on CD-Rom containing key texts of official information on food.
Chadwyck-Healey lists more than 85 titles in its electronic publications catalogue. Costing Pounds 995, The Bible in English contains versions from the Anglo-Saxon to the 20th century, comprising the complete text of 12 editions, seven texts of the New Testament, and two texts of the Gospels.
At Pounds 2,500, Chadwyck-Healey's Shakespeare disc holds 11 major editions of his work, 24 separate contemporary printings of individual plays, selected apocrypha and related works, and more than 100 adaptations, sequels and burlesques from the 17th, 18th and 19th centuries.
Leading edge stuff. But how does the information get on to a CD-Rom in the first place? With Chadwyck-Healey, HMSO and Euromoney the common denominator is the company Innodata. Justis relies on Context, an electronic publisher of legal and official information, to key in its documents. For the Diploma of Achievement this service is performed by Quorum Technical Services and its subcontractor TDS.
According to Julia Bridge, marketing director of Westkey in Cornwall, "The first part can be very expensive. If a publisher is looking at putting 750,000 pages on to CD-Rom, the sums could reach Pounds 1 million. Not a lot of people know where to go. Many publishers choose to go offshore because the keying of data is cheaper." Westkey has some 40 per cent of its 50-strong workforce working from home, keyboarding and checking.
Innodata is an international company with head offices in New York and European operations based in Bristol. From here a leased line links up with production facilities in Manila in the Philippines. Taking advantage of the lower cost of labour in offshore production facilities is said to save clients 50 per cent, compared to equivalent work done in the United States and Europe.
"Our Fastkey service offers a 12 hour turnaround time for projects that have to be completed overnight," says Kenneth Helps, Innodata's vice president for Europe. Competition? According to Mr Helps it comes from India, China, Vietnam and Sri Lanka.
Innodata's optic fibre line carries voice, text, and scanned images. Data can be returned in a range of formats including plain ASCII text, SGML (suitable for highly structured documents such as reference works), and Adobe Acrobat (which substantially preserves the visual appearance of a document when viewed on different kinds of computer). The company's other clients include British Gas and publishers Elsevier, Harrap and Derwent.
Mark Hudson, Euromoney's electronic publisher, says: "Publishers looking at the CD-Rom route have to consider whether to sell or give away their products, what quality is wanted, and look for a selling point - like a search vehicle which will identify phrases, etc.
"Getting the data on is the first stage and this will depend on several factors too. Straightforward black and white ASCII text would cost from 50p per 1,000 characters or keystrokes for large-scale work to about Pounds 1 per 1,000. Capturing images in colour will depend on how the images are to be used, and the quality, which could derive from eight to 256-colour palettes. For fully accurate text the price would be from Pounds 2 to Pounds 4 per 1,000 strokes, depending on the application."
Then the publisher must decide whether the users will want to search the text, or just browse. The search software could be Acrobat (which is free to readers but not to publishers), Topic, Conquest or Dataware.
"We used Innodata for text and image capture, and Verity's Topic as the search engine, and customised that and the interface using a software database company," says Hudson. Reliability was a key issue in choosing the company Charles Hazell, sales manager at Offshore Keyboarding, says: "We rekeyed all copies of Nature published over the past three years for Macmillan last year. We're also rekeying all medical and surgery journals for the publisher Ovid of New York, for online access by doctors worldwide.
"Rekeying work is labour intensive, dictating the need for a low labour cost site. Most of our work is done in the Caribbean. We've just handled our one millionth page for a US university - we're converting complete libraries into electronic libraries with images and full text for online access."
"All reference libraries will have to be electronic within ten years," says Hazell. "There's a lot of money at stake in the market and the big publishers like Elsevier and Macmillan have to move fast or be eclipsed by new start-ups who spot the opportunities."
Cheltenham-based Quorum started up some 13 years ago when the printer Linotype Paul ceased operating in the town. Now its business includes desktop publishing, typesetting, graphic design, printing and database publishing - where massive databases of reference information are transformed into printed books. Database publishing is a useful option for publishers who already have their data in digital form; but many are more interested in the opposite process, moving their assets from printed to digital media.
Managing director Brian Hayward says: "We got into CD-Rom with the student book and teacher's guide through a recommendation from a printing company. It's small business currently, and complementary to typesetting. This first CD replaces a fat manual of over 600 pages. We're now expecting to do catalogues for the electrical and motor industries on CD-Roms."
The Diploma of Achievement disc contains the relevant printed material in Portable Document Format (PDF) plus an Adobe Acrobat Reader for use on a PC.
Quorum uses TDS of Tewkesbury to "stuff" its discs. TDS has up to 30 women double keying at home. This home-based keying is probably more expensive than going offshore, but managing director Barry Townsend claims to give a faster turnaround than most, with "more control".
The production and quality control methods for most CD-Rom productions yield an accuracy of 99.995 per cent or better, which translates into no more than one error in every 20,000 characters. All companies quote similar statistics.
One of Innodata's European clients needs to have medical journals indexed and online before the competition. Thanks to electronic communications and Innodata's expert staff - the company has specialist indexers in medical, scientific, financial, and technical fields - the lead is maintained.
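The 99.995 per cent accuracy figure quoted above can be checked with a few lines of arithmetic. This Python sketch converts an accuracy percentage into an error interval and an expected error count; the function names are illustrative, and the one-million-character job in the usage example is an arbitrary assumption rather than a real project.

```python
from fractions import Fraction


def error_rate(accuracy_percent):
    """Return the fraction of characters expected to be wrong.

    The accuracy is passed as a string (e.g. "99.995") so that it is
    converted to an exact fraction with no floating-point rounding.
    """
    return (100 - Fraction(accuracy_percent)) / 100


def chars_per_error(accuracy_percent):
    """Return the average number of characters keyed per error.

    Raises ZeroDivisionError for an accuracy of exactly 100 per cent.
    """
    return 1 / error_rate(accuracy_percent)


def expected_errors(total_chars, accuracy_percent):
    """Return the expected number of errors in a job of total_chars."""
    return total_chars * error_rate(accuracy_percent)


if __name__ == "__main__":
    print(chars_per_error("99.995"))             # one error per this many characters
    print(expected_errors(1_000_000, "99.995"))  # errors in a million characters
```

At 99.995 per cent the interval works out to exactly 20,000 characters per error, so a job of one million characters would still be expected to contain 50 mistakes.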
On some CD-Roms, Chadwyck-Healey's Shakespeare product for example, SGML coding is used to identify different structural elements in the text. In a play, elements distinguished by the encoding scheme include scene, act, speaker, stage instructions, and list of characters. Each of these elements can be searched or manipulated separately. SGML provides a standard method of text capture and description, enabling the encoded data to be interchanged or combined with other SGML coded texts.
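To show what searching each element separately can mean in practice, here is a minimal Python sketch. The tag names (act, scene, stage, speech, speaker, line) and the sample markup are assumptions made for illustration and are not the encoding scheme actually used on the Chadwyck-Healey disc. Python's standard library has no full SGML parser, so the sketch borrows html.parser, which tolerates unknown tags; a real SGML system would validate the text against a document type definition.

```python
from html.parser import HTMLParser

# Hypothetical markup in the spirit of the scheme described above.
SAMPLE = """
<act n="1"><scene n="1">
<stage>Enter Barnardo and Francisco, two sentinels.</stage>
<speech><speaker>Barnardo</speaker><line>Who's there?</line></speech>
<speech><speaker>Francisco</speaker><line>Nay, answer me.</line></speech>
</scene></act>
"""


class ElementCollector(HTMLParser):
    """Record each run of text together with the element that encloses it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of the elements currently open
        self.found = []   # (element name, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.found.append((self.stack[-1], text))


def search(markup, element, word):
    """Return the texts of the given element type that contain word."""
    collector = ElementCollector()
    collector.feed(markup)
    collector.close()
    return [text for name, text in collector.found
            if name == element and word.lower() in text.lower()]


if __name__ == "__main__":
    print(search(SAMPLE, "speaker", "barnardo"))  # matches in speaker names only
    print(search(SAMPLE, "stage", "barnardo"))    # matches in stage instructions only
```

The same word gives different results depending on the element searched, which is the practical benefit of structural coding over plain ASCII text.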
Users can merge texts from the Shakespeare database with other SGML coded texts for analysis. These could come from other databases such as English Verse Drama. Texts can be handled not only with the software provided on the disc but with any SGML-compatible software.
The full text of 130 years of The Law Reports is now being published on CD-Rom by Context. In printed form The Law Reports, which have been published since 1865, comprise 753 volumes, amounting to some 480,000 pages, 200 million words or 1.5 gigabytes of data.
"The data quality for this job is very high," says Context's marketing director Michelle Green. "We don't use a go-between, but have a personal contact with the company doing the data capture. We've experimented with scanning and optical character recognition (OCR), but you can only do this with good paper. For example the paper during the war years was very thin and can't be scanned."
Problems also arise in checking quality, as errors in one scan could be repeated. So all scans were checked against the same text keyed in by hand. Hypertext links were checked by hand in the UK.
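Checking one capture against an independently produced one, as described above, amounts to comparing two versions of the same text and flagging every place where they disagree. The Python sketch below does this word by word with the standard difflib module. It is an illustration of the principle only, not a description of Context's actual process, and the sample sentences are invented.

```python
import difflib


def discrepancies(scanned, keyed):
    """Return (scanned words, keyed words) pairs where the versions differ.

    A word missing from one version appears as an empty string on that side.
    """
    a, b = scanned.split(), keyed.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


if __name__ == "__main__":
    # Invented example: the scan has misread the letter o as a zero.
    scanned = "The appeal was dismissed with c0sts by the court"
    keyed = "The appeal was dismissed with costs by the court"
    for from_scan, from_keying in discrepancies(scanned, keyed):
        print(f"scan: {from_scan!r}  keyed: {from_keying!r}")
```

A comparison of this kind cannot say which version is right, only where they differ, so each flagged passage still has to be resolved by a person looking at the printed page. The same idea underlies double keying, where two operators type the same text independently.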
Context uses search software from Fulcrum Technologies, which has been licensed by such companies as Novell and Corel for incorporation in their office software. Fulcrum's application programming interface allows the CD publisher to add its own features.
Cases are searchable by date range, case name, reference or court. The publisher has added hypertext links which allow users to retrieve pertinent cases, both previous and subsequent to the current case. Printed law reports, of course, do not usually contain references to cases tried at later dates.
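Links to later cases can be derived from the references the reports already contain: if every case lists the earlier cases it cites, turning that list around shows which later cases cite each one. The Python sketch below illustrates the idea alongside a simple search by date range, court and name. The case names and years are invented placeholders, and the field and function names are assumptions; the sketch does not use or describe the Fulcrum software.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Case:
    name: str
    year: int
    court: str
    cites: list = field(default_factory=list)  # names of earlier cases cited


# Invented placeholder records, for illustration only.
CASES = [
    Case("Alpha v Beta", 1901, "Court of Appeal"),
    Case("Gamma v Delta", 1950, "Court of Appeal", ["Alpha v Beta"]),
    Case("Epsilon v Zeta", 1990, "House of Lords",
         ["Alpha v Beta", "Gamma v Delta"]),
]


def later_cases(cases):
    """Map each cited case name to the names of the cases that cite it."""
    cited_by = defaultdict(list)
    for case in cases:
        for earlier in case.cites:
            cited_by[earlier].append(case.name)
    return dict(cited_by)


def find_cases(cases, start_year=None, end_year=None, court=None,
               name_contains=None):
    """Return the cases matching every criterion that was supplied."""
    results = []
    for case in cases:
        if start_year is not None and case.year < start_year:
            continue
        if end_year is not None and case.year > end_year:
            continue
        if court is not None and case.court != court:
            continue
        if (name_contains is not None
                and name_contains.lower() not in case.name.lower()):
            continue
        results.append(case)
    return results


if __name__ == "__main__":
    print(later_cases(CASES).get("Alpha v Beta", []))
    print([c.name for c in find_cases(CASES, start_year=1940, end_year=1995)])
```

The backward links come straight from the printed text; the forward links exist only because the whole collection is available in one place to be inverted.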
The key decision for many publishers is whether to go for CD-Rom, online, or both - and in which order. Chadwyck-Healey chose the CD-Rom route first. Now the company is moving the software for many of its larger databases to a World Wide Web delivery architecture. Its Periodicals Contents Index (PCI) is also to be launched as an online service by the end of 1996. PCI will comprise more than nine million article records, equivalent to 12 CD-Roms, which makes a practical case for online delivery.
For academics and other researchers, the digitisation of the world's cultural assets by armies of keyboard operators is good news. Going to the library to research biblical allusions will be quicker and more accurate. Literary and stylistic analysis, language investigations, and concordance compilations can all be achieved in a fraction of previous times.
Is this cultural wealth founded on exploitation? Questioned on the ethics of employing offshore keyboard workers, companies argue that the people doing this work in low-wage economies are highly paid compared to their compatriots.