For XML files, truth is out there on the internet

Applied XML - The XML Handbook - XML Specification Guide
February 4, 2000

"XML is the universal format for data on the web," says Microsoft, which this week releases the first full XML/XSLT browser, six years after the Mosaic browser for HTML. HTML has been extraordinarily successful in human-to-human communication as it was designed to support just those areas (text, graphics, and interaction) in which humans excel. But it was never designed to support robust publishing applications, commercial and legal transactions, data exchange and many of the other areas daily hyped as the e-revolution.

Any webmaster supporting a site or a lecturer producing material knows how rapidly information decays and how difficult it is to maintain. In the past few years, all of the main companies have come together within the World Wide Web Consortium (W3C). With a series of recommendations (standards), these companies are defining the way information will be transmitted over networks and, in many cases, how documents and data will be defined. An important example is XML, the eXtensible Markup Language and its associated protocols.

XML complements (not supersedes) HTML in providing protocols for encoding almost any information. It is a meta-language, or set of rules, in which languages such as HTML can be defined. HTML was created using XML's predecessor, SGML, an impressive creation but too complex for most people and applications. XML is in essence a simple subset of SGML that has arrived at just the time to underpin the coming revolution in consumer e-commerce and online business-to-business (b2b) dealing.

Although XML springs from commercial roots, it will enter every aspect of the e-world, and academia needs to play a central role in teaching, developing and using it. Publishing, mathematics, chemistry, student records, tutorial questions, prescriptions, medical records and hundreds of other applications are already being implemented. Universities may soon require theses to be submitted in XML. XML can even be seen as the first steps in the machine encoding of much of our culture - music, multimedia, organisations and their practice. But "What is a good book on XML?" is as broad a question as "How do I learn to program?"

The rudiments of XML are relatively easy to learn and any introductory book will cover them (www.xmlbooks.com lists 67 titles). But "Why should I learn it?" and "What do I need?" are not easy questions to answer. Readers' reviews on www.amazon.com show rather varied expectations and criticisms. Moreover, XML is not a static subject, and there is a rapidly expanding family of essential related protocols covering style (XSL/XSLT), schemas, meta-data (RDF), hypermedia (linked documents) using XLink, and programming interfaces (DOM and SAX). Add the current hype from vendors of e-commerce "solutions", and many books are out of date before publication. What sustains XML is the commitment of many vendors to the W3C vision of inter-operability and a very wide range of freely contributed software. The top-quality websites of W3C (www.w3c.org) and OASIS (the non-profit association for "open XML" at www.oasis-open.org) provide much of what you could ever need to know about XML and link to key sites for the rest. Useful books must complement, not duplicate, these sites.

The speed of progress in e-commerce - one consortium to use XML in finance was set up in days - strains conventional learning processes. Traditional paper-based publishing cannot support the rapid change or the breadth of the discipline. The XML developers' mailing list (XML-DEV), with 10,000 high-quality postings a year and responsible for important application programming interfaces and protocols, is a primary publication medium. The medium for publishing about XML has to be XML. Paper books are not obsolete, but must be read in conjunction with the XML websites.

A typical "how to" book is Applied XML from two Microsoft interns. Written hurriedly and full of errors, it describes how to use XML with Microsoft's tools in 1999. The racy colloquial style will offend many, but may give an insight into the 100-hour weeks mandatory for XML developers. XML has benefited greatly from Microsoft's commitment, and where standards have been finalised they are adhered to. However, the programming environment for XML is fluid and this book assumes full commitment to Microsoft tools. Unfortunately for portability, XML software will tend to split between Microsoft and the open source movement (such as www.apache.org). It will change rapidly, and programming books will need to be renewed annually.

The XML Handbook is not a handbook, even though Charles Goldfarb's SGML Handbook was. It is two books in one. The first contains an adequate overview of XML syntax and protocols with useful comments from two leaders in the field. The authors have authority and insight, and by itself this sub-book could be useful if not outstanding. But it is combined with chapters provided by commercial sponsors about their products and experiences in 1999. The authors attempt to illustrate aspects of XML within these chapters but the navigation and continuity do not work.

This second sub-book irked me, but could be a useful overview of the sort of e-commerce and b2b applications now in use. Those caught in the hyper-driven need to implement XML may well find useful business cases within it, but there is little for academia. It therefore does not work as the reference book that the title suggests and will date very quickly. Given the apparent sales success and prominence, there will have to be a new edition every year.

The XML Specification Guide is an excellent example of a handbook for those parts of XML that have solidified. Version 1.0 of the XML recommendation is deliberately terse (20 pages against SGML's 600) and is difficult for newcomers. The language is of parliamentary precision: "If the entity is external, and the processor is not attempting to validate the XML document, the processor may, but need not, include the entity's replacement text. If a non-validating parser does not include the replacement text, it must inform the application that it recognised, but did not read, the entity." The correct interpretation matters - the "external entity" could be an essential contractual document or a student's essay.
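The rule quoted above can be made concrete with a small sketch. It uses the expat parser from Python's standard library as a stand-in for a non-validating processor: the handler is told about the external entity and chooses not to read it. The document, the entity name `terms`, the file name `terms.xml` and the function name are all hypothetical, invented for illustration, and this is one possible reading of the requirement rather than a reference implementation.

```python
import xml.parsers.expat

# Hypothetical document: the entity "terms" points at an external file
# that could be, say, the text of a contract.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE contract [
  <!ENTITY terms SYSTEM "terms.xml">
]>
<contract><clause>&terms;</clause></contract>"""


def parse_without_reading_external_entities(data):
    """Parse data, recording external entities that were seen but not read."""
    skipped = []   # system identifiers of entities recognised but not read
    text = []      # character data that was actually delivered

    parser = xml.parsers.expat.ParserCreate()

    def on_external_entity(context, base, system_id, public_id):
        # The parser has recognised the reference. We deliberately do not
        # fetch or parse the external file; we only note that it exists.
        skipped.append(system_id)
        return 1  # a non-zero return value lets parsing continue

    parser.ExternalEntityRefHandler = on_external_entity
    parser.CharacterDataHandler = text.append
    parser.Parse(data, True)
    return skipped, "".join(text)


if __name__ == "__main__":
    skipped, text = parse_without_reading_external_entities(DOC)
    print("recognised but not read:", skipped)
    print("character data delivered:", repr(text))
```

An application that receives this report can then decide whether the missing replacement text matters. For a contract or an essay it almost certainly does.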

It is critical that software implementers take a consensus view of such requirements, and Ian Graham and Liam Quin have provided much of what is required to help. Their book contains a formal overview of XML, then a closely annotated version of the recommendation, and then much valuable reference information. A large glossary, tables of character encodings, support for internationalisation, and much else provide the stable part of what the XML Handbook should have included. I am engaged in writing an online XML course and will constantly refer to this book for authority.

What is missing in all these books is any real sense of transition to the hyper-books of the future. Graham and Quin's glossary should be in electronic form for rapid searching and easy reference. The XML Handbook does not even index OASIS, the most important non-W3C XML resource in the world. It and Applied XML include the usual collection of free software but no sense of hypermedia as a learning resource. The next generation of XML learning resources must surely be in XML.

Peter Murray-Rust is director, virtual school of molecular sciences, University of Nottingham.

Applied XML: A Toolkit for Programmers

Author - Alex Ceponkus and Faraz Hoodbhoy
ISBN - 0 471 34402 8
Publisher - Wiley
Price - £32.50
Pages - 496
