This document is intended for people with some basic understanding of SGML.
Annex E.1 of the SGML standard (ISO 8879) presents a document type definition (DTD) "as an illustration of a practical document type definition", and says: "It is primarily intended to illustrate the correct use of markup declarations, but it follows good design practices as well".
The DTD is very interesting, especially since it is in several ways much more advanced structurally than any HTML specifications. I think that some study of the DTD and its underlying ideas would be beneficial for the development of markup systems, including HTML-like systems for the WWW. It was part of the inspiration that made me write A proposal: Universal Text Data format (UTD).
For such purposes, I have reformulated the most essential part of the DTD in a manner that is, in my opinion, easier to understand. I have not changed the language described by the declarations, just the presentation of the formal syntax. (I resisted the temptation to use more readable names for elements, like titlepart for titlep or frontmatter for frontm.) In addition to partly reordering the element declarations, I have renamed some entities whose names looked too cryptic to me (like p.zz.ph) and eliminated some use of entities. I have also omitted attribute declarations, since they are of lesser importance in this context. Moreover, some shorthand notations are presented verbally here.
<!-- Entities for phrase level -->
<!ENTITY % emphasized "hp1|hp2|hp3|hp0|cit">
<!ENTITY % refphrase "hdref|figref">
<!ENTITY % reference "fnref|liref">
<!ENTITY % phrase "q|(%emphasized;)|(%refphrase;)|(%reference;)">
<!ENTITY % phrasecontent "(#PCDATA|(%phrase;))*">

<!-- Entities for block level -->
<!ENTITY % paragraph "p|note">
<!ENTITY % itemlist "ol|sl|ul|nl">
<!ENTITY % list "(%itemlist;)|dl|gl">
<!ENTITY % otherblock "xmp|lq|lines|tbl|address|artwork|(%list;)">
<!ENTITY % basicblock "(%paragraph;)|(%topic;)|(%otherblock;)">
<!ENTITY % paragraphcontent "(#PCDATA|(%phrase;)|(%otherblock;))*">
<!ENTITY % paragraphsequence "(p, ((%paragraph;)|(%otherblock;))*)">
<!ENTITY % floating "fig|fn">

<!-- Top-level structure of a document -->
<!ELEMENT general - - (frontm?, body, appendix?, backm?) +(ix|%floating;)>
<!ELEMENT frontm - O (titlep, (abstract|preface|h1)*, toc?, figlist?)>
<!ELEMENT body - O (h0+|h1+)>
<!ELEMENT appendix - O (h1+)>
<!ELEMENT backm - O ((glossary|bibliog|h1)*, index?)>
<!ELEMENT (toc|figlist|index) - O EMPTY -- generated content -->

<!-- "Title page" (title part) -->
<!ELEMENT titlep - O (title & docnum? & date? & abstract? & (author|address|%basicblock;)*)>
<!ELEMENT (docnum|date|author) - O (#PCDATA)>
<!ELEMENT title - O (tline+)>
<!ELEMENT tline O O %phrasecontent;>

<!-- Headed sections -->
<!ELEMENT h0 - O (h0t, (%basicblock;)*, h1+) -- Part -->
<!ELEMENT (h1|glossary|bibliog|abstract|preface) - O (h1t, (%basicblock;)*, h2*) -- Chapter -->
<!ELEMENT h2 - O (h2t, (%basicblock;)*, h3*) -- Section -->
<!ELEMENT h3 - O (h3t, (%basicblock;)*, h4*) -- Subsection -->
<!ELEMENT h4 - O (h4t, (%basicblock;)*) -- Subsubsection -->
<!ELEMENT (h0t|h1t|h2t|h3t|h4t) O O %phrasecontent; -- Headed section title -->

<!-- Topics (captioned subsections) -->
<!ENTITY % topic "top1|top2|top3|top4">
<!ENTITY % topiccontent "(th?, p, (%basicblock;)*)">
<!ELEMENT top1 - O %topiccontent; -(top1) -- Topic 1 -->
<!ELEMENT top2 - O %topiccontent; -(top2) -- Topic 2 -->
<!ELEMENT top3 - O %topiccontent; -(top3) -- Topic 3 -->
<!ELEMENT top4 - O %topiccontent; -(top4) -- Topic 4 -->
<!ELEMENT th - O %phrasecontent; -- Topic heading -->

<!-- Elements in sections or paragraphs -->
<!ELEMENT address - O (aline+)>
<!ELEMENT aline O O %phrasecontent; -- Address line -->
<!ELEMENT artwork - O EMPTY>
<!ELEMENT dl - - ((dthd+, ddhd)?, (dt+, dd)*)>
<!ELEMENT dt - O %phrasecontent; -- Definition term -->
<!ELEMENT (dthd|ddhd) - O (#PCDATA) -- Headings for dt and dd -->
<!ELEMENT dd - O %paragraphsequence; -- Definition description -->
<!ELEMENT gl - - (gt, (gd|gdg))* -- Glossary list -->
<!ELEMENT gt - O (#PCDATA) -- Glossary term -->
<!ELEMENT gdg - O (gd+) -- Glossary def. group -->
<!ELEMENT gd - O %paragraphsequence; -- Glossary definition -->
<!ELEMENT (%itemlist;) - - (li*)>
<!ELEMENT li - O %paragraphsequence; -- List item -->
<!ELEMENT lines - O %paragraphsequence; -- Line elements -->
<!ELEMENT (lq|xmp) - - %paragraphsequence; -(%floating;) -- Long quote -->
<!ELEMENT (%paragraph;) O O %paragraphcontent;>

<!-- Table -->
<!ELEMENT tbl - - (hr*, fr*, r+)>
<!ELEMENT hr - O (h+) -- Heading row -->
<!ELEMENT fr - O (f+) -- Footing row -->
<!ELEMENT r O O (c+) -- Row (in body of table) -->
<!ELEMENT c O O %paragraphsequence; -- Cell in body row -->
<!ELEMENT (f|h) O O (#PCDATA) -- Cell in fr or hr -->

<!-- Phrases -->
<!ELEMENT (%emphasized;) - - %phrasecontent; -- Emphasized phrases -->
<!ELEMENT q - - %phrasecontent; -- Quotation -->
<!ELEMENT (%refphrase;) - - %phrasecontent; -- Reference phrases -->
<!ELEMENT (%reference;) - O EMPTY -- Generated references -->

<!-- Includable subelements -->
<!ELEMENT fig - - (figbody, (figcap, figdesc?)?) -(%floating;)>
<!ELEMENT figbody O O %paragraphsequence; -- Figure body -->
<!ELEMENT figcap - O %paragraphcontent; -- Figure caption -->
<!ELEMENT figdesc - O %paragraphsequence; -- Figure description -->
<!ELEMENT fn - - %paragraphsequence; -(%floating;) -- Footnote -->
<!ELEMENT ix - O (#PCDATA) -- Index entry -->
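To make the content models above more concrete, here is a small document instance that should conform to the declarations as reformulated here. It is only a sketch under some assumptions: the system identifier general.dtd is a hypothetical file name for the declarations above, tag omission is assumed to be enabled (OMITTAG YES, as in the reference SGML declaration), and the attribute declarations omitted from this presentation are assumed not to require any attributes on the elements used.

```sgml
<!-- "general.dtd" is a hypothetical file name, not defined by ISO 8879 -->
<!DOCTYPE general SYSTEM "general.dtd">
<general>
<frontm>
<titlep>
<title><tline>A Sample Report
<author>A. N. Author
<body>
<h1><h1t>Introduction
<p>This chapter consists of a title (h1t) followed by block-level content.
<ul>
<li><p>A list item starts with a paragraph.
<li><p>Most end tags are omitted, as the declarations allow.
</ul>
<h2><h2t>Background
<p>This section is nested inside the chapter and contains a
<hp1>highlighted phrase</hp1>.
</general>
```

Note that only the elements declared with "- -" (here general, ul and hp1) need explicit end tags; the rest are closed implicitly when an element that cannot occur inside them starts.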
The shorthand notations are as follows:

- A blank line is equivalent to <p>, i.e. start of paragraph.
- A quotation mark (") is equivalent to <q>, i.e. start of quote, except when a <q> element is open, in which case it is equivalent to </q>, i.e. end of quote.
- When an <ix> element is open, a record end (i.e., end of line) is equivalent to </>, which is then short for </ix>, i.e. end of index entry.

Comparing the DTD with HTML DTDs, we note several resemblances, even in somewhat cryptic element names like h1, li, dt. But it needs to be noted, in particular, that

- h1, h2, etc. are not heading elements but elements for headed sections (which contain headings, or titles, as h1t, h2t, etc. elements)
- the dl (definition list) element is more complicated and more structured than in HTML.
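The two differences can be illustrated with a fragment of a document instance. This is a sketch only, written against the declarations as reformulated in this document; it is a fragment of a body, not a complete document, and the sample text is invented for illustration.

```sgml
<h1><h1t>Terminology
<p>The terms below are used throughout.
<dl>
<dthd>Term
<ddhd>Meaning
<dt>DTD
<dd><p>Document type definition.
<dt>tag
<dt>markup tag
<dd><p>Two terms may share one description.
</dl>
<h2><h2t>Details
<p>This paragraph belongs to the h2 section, which is part of the h1 chapter.
```

Here h1 contains the whole chapter, including the nested h2 section, while the heading text itself is in h1t. The dl element has optional column headings (dthd, ddhd), and each description (dd) is preceded by one or more terms (dt). In HTML, by contrast, h1 is just the heading, and dl has no heading elements.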
See also section Document Types in gf User's Manual.
Along with overall clarity and simplicity, the DTD has some essential problems. The element and entity names have been briefly discussed above, but more importantly, there are some structural deficiencies that need to be considered. These include the following:
With such problems fixed, and with some carefully chosen additional markup for covering the most common generic structures in different types of documents, the DTD could form a basis for a universal generic document format. Examples of such needs include the simplest mathematical notations, basic poetry constructs (verse structure), and hyperlink-like references. Naturally, semantic definitions would need to be given in sufficient detail, with some hints on how the semantic information included in the markup could be used for different purposes, such as display of documents, indexing of document content for searching purposes, automatic conversions between data formats, and automatic or computer-assisted translation.
The DTD discussed here was presumably based largely on similar ideas within the GML (Generalized Markup Language) framework, especially the GML Starter Set. The following Web pages describe such ideas in some detail, including notes on the semantics of different elements:
Since part of this document can be regarded as a modified version of the DTD, here's the copyright notice of the original:
(C) International Organization for Standardization 1986 Permission to copy in any form is granted for use with conforming SGML systems and applications as defined in ISO 8879, provided this notice is included in all copies