Making the most of markup

Markup Languages

November 24, 2000

Markup Languages is "a peer-reviewed technical journal, publishing papers on research, development and practical applications of text markup for computer processing, management, manipulation, and/or display". Although generally presented in an academic format, the journal's papers are readable, so non-academics will be able to grasp all the information.

The journal proposes to cover the following: "Design and refinement of systems for text markup and document processing; specific text markup languages; theory of markup design and use; applications of text markup; and languages for the manipulation of marked up text". Each issue aims to review markup languages across a variety of categories: theoretical and practical aspects of markup; announcements of events and activities; commentary and opinion in the form of essays; practice notes discussing common practice; project and application reports; reviews of books, software and websites of likely interest; and standards reports.

In the first issue, the editors set the scene well for readers who are not familiar with the background to markup languages and, at the same time, offer plenty of new information for the experienced reader. An overview of SGML, and of the way it gave birth to HTML and finally XML, is given in "Programming marked-up documents" by Lauren Wood. She also surveys the DOM and its relationship to Dynamic HTML and JavaScript/ECMAScript, and describes the differences between the structure model, the data model and the object model.

A history lesson on the original hypertext systems from the 1960s onwards, plus up-to-date uses of the same principles, is given by Steven J. DeRose and Andries van Dam in "Document structure and markup in the FRESS hypertext system". They describe FRESS in detail, showing how it was used to separate structure from formatting and hypertext semantics, thus providing a system to handle "dynamic document assembly, structured information retrieval and on the fly customisation of even very large documents for the user". Other highlights in the first issue include "Structure rules!" by Chet Ensign, a look at why DTDs are an important and powerful tool when used correctly, and "A new generation of tools for SGML" by R. W. Matzen, in which a specific model is proposed to reduce and, finally, eliminate exceptions within DTDs by converting them into an expression grammar that can be analysed.

Later issues continue the themes of the first with an interesting article/case study entitled "Using SGML for linguistic analysis: the case of the BNC" by Lou Burnard. He discusses the British National Corpus and how a very large SGML corpus is tagged and carries "automatically generated linguistic analysis" of the text. Burnard discusses types of query, display and manipulation of queries, and the corpus query language (CQL) that serves as a Boolean-style retrieval system but is designed for machine processing rather than human use. Pekka Kilpeläinen describes the differences, advantages and disadvantages of SGML and XML content models in "SGML & XML content models". Kilpeläinen looks at how content models are "restricted by a requirement of determinism" and presents methods for eliminating "and" groups from SGML, as they are not present in XML, and "state formally the circumstances where they can be applied".
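The determinism requirement Kilpeläinen discusses can be illustrated with a small DTD fragment (a textbook example, not one taken from his article): a validator reading an initial a element cannot decide between two branches that both begin with a, so such models must be rewritten with the common prefix factored out.

```dtd
<!-- Non-deterministic content model: on seeing <a>, the parser
     cannot tell which branch applies without looking ahead.
     Both SGML and XML validators reject this form. -->
<!ELEMENT e ((a, b) | (a, c))>

<!-- Deterministic rewrite: factor out the shared prefix. -->
<!ELEMENT e (a, (b | c))>
```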

Joshua Lubell’s article "Structured markup on the web" compares two website designs that use SGML documents, using STEP to "define the ontology for the exchange of product data throughout a product’s life cycle". Robert D. Cameron, in "REX: XML shallow parsing with regular expressions", discusses using shallow parsing to help construct lightweight XML processing tools, and provides complete shallow parser implementations in Perl, JavaScript and Lex.
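The idea behind REX-style shallow parsing is that a single regular expression can split a document into markup and text tokens without building a tree. The sketch below is a much-simplified illustration of that approach in JavaScript (the pattern and function name are illustrative; Cameron's actual REX grammar is far more complete and handles CDATA sections, DOCTYPE declarations and error cases).

```javascript
// One regex, four alternatives: comments, processing instructions,
// tags, and runs of character data. Matching it globally tokenises
// the whole document in document order.
const shallowToken = /<!--[\s\S]*?-->|<\?[\s\S]*?\?>|<[^>]*>|[^<]+/g;

function shallowTokens(xml) {
  // String.prototype.match with a global regex returns all matches,
  // or null when the input is empty.
  return xml.match(shallowToken) || [];
}

// Each markup item and each text run becomes one token:
const tokens = shallowTokens('<p>Hello <b>world</b></p>');
// tokens: ['<p>', 'Hello ', '<b>', 'world', '</b>', '</p>']
```

Because the tokeniser never nests, it runs in a single pass and is easy to embed in scripts that only need to find, count or rewrite particular tags.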

There is also excellent coverage of Unicode by Tony Graham. This includes its design goals and detailed design principles, a breakdown of all of the Unicode sub-types and a comparison with the ISO/IEC 10646 standard. A list of programs that use Unicode is included, along with appendices listing character blocks and new scripts proposed for inclusion. The accompanying book reviews are useful in that they give the table of contents of each book, annotated with additional notes, to help readers decide whether the book is relevant to them.
 
David Mortimer is a software developer at Pharos Datacom Ltd.

Markup Languages (four times a year)

Editors - C. M. Sperberg-McQueen and B. Tommie Usdin
ISSN - 1099-6621
Publisher - MIT Press
Price - print and electronic: $155 (institutions), $50 (individuals)
