What is the intended usage of SUB and SUP in HTML?

Ambiguity in specifications

The HTML 3.2 specification classifies SUB and SUP as "font style elements". It states that "SUB places text in subscript style" and "SUP places text in superscript style". This suggests that they are intended for stylistic presentation only, without affecting the meaning of a text.

The internationalization RFC 2070 (which partially conflicted with HTML 3.2) seemed to describe SUP and SUB as strictly stylistic (in clause 4.2.2):

Many languages require superscript text for proper rendering: as an example, the French "Mlle Dupont" should have "lle" in superscript. The SUP element, and its sibling SUB for subscript text, are introduced to allow proper markup of such text. SUP and SUB contents are restricted to PCDATA to avoid nesting problems.

On the other hand, the description of SUP and SUB in the HTML 4.0 specification contains the example

E = mc<sup>2</sup>
and such usage seems to be rather common in actual practice as well as in various HTML tutorials. Note that the specification apparently imitates the wording of RFC 2070 on this point but oddly refers to "scripts" instead of "languages" (still preserving "French" as the example!). More importantly, it allows any text-level markup inside these elements, as opposed to PCDATA (i.e. plain character data, in practical terms). This means that nesting really becomes a problem in presentation. Note that PCDATA would be quite sufficient for stylistic purposes.

Exponents: stylistic only?

When SUP is used for exponents, it has, of course, a definite meaning instead of being just stylistic presentation. Usage like <SUP>lle</SUP> is clearly stylistic only. Usage like H<SUB>2</SUB>O or x<SUB>1</SUB> (meaning a subscripted variable) might be regarded as stylistic only, although it might also be regarded as essential to the content of the message. Exponents, however, are a different matter. This should become evident from a slightly different but very simple example: a<SUP>b</SUP>, which reads as ab, a different expression, if the markup is ignored.

In mathematical and other notational systems, superscripts are used for a wide range of meanings, not limited to exponentiation. For example, in some notations for regular expressions, a superscript + could mean iteration, e.g. A<SUP>+</SUP> allows repetition of A any number of times. It depends on the details of the notation system whether it is admissible to present A<SUP>+</SUP> as A+ without superscripting or whether that would have a completely different meaning. Similarly, if the subscript in a notation like A<SUB>+</SUB> is "raised" to become a normally presented + sign, the whole meaning of an expression might change: a subscript could become an operator that follows an unsubscripted variable.

Between stylistics and structure

There are of course intermediate cases between purely stylistic usage and structural usage of SUB and SUP. Using SUB in chemical formulas or to denote subscripted variables is in a sense structural markup but "degrades gracefully" if a browser ignores SUB or otherwise presents SUB elements as normal text. People are accustomed to seeing things like H2O or x1, x2, ... in linearized notation. The same applies to some special cases of using SUP for exponents, such as denoting square meter by m2. (Please notice that superscripted 2 and 3 appear as separate characters in the character set which can be used in HTML, so you can write expressions like m² and m³.) However, in general, using SUP for exponents or mathematical superscripts may cause serious confusion. If a<SUP>b</SUP> is intended to denote "a to the power b", then a browser which cannot use genuine superscripts should probably present it as e.g. a^b or a**b instead of ab, if only it could know that here SUP denotes exponentiation.
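
The difference between the two fallback presentations can be illustrated with a small sketch. It is only an illustration, not how any actual browser works: the function names strip_sup and linearize_exponent are hypothetical, and the pattern assumes SUP elements that contain plain character data only.

```python
import re

# Matches a SUP element whose content is plain text (no nested markup),
# in either upper or lower case.
SUP_RE = re.compile(r"<sup>([^<]*)</sup>", re.IGNORECASE)

def strip_sup(html):
    """Ignore SUP markup entirely and keep only its content."""
    return SUP_RE.sub(r"\1", html)

def linearize_exponent(html):
    """Hypothetical fallback: present SUP content as an explicit exponent."""
    return SUP_RE.sub(lambda m: "^" + m.group(1), html)

print(strip_sup("a<SUP>b</SUP>"))                    # ab
print(linearize_exponent("a<SUP>b</SUP>"))           # a^b
print(linearize_exponent("M<SUP>lle</SUP> Dupont"))  # M^lle Dupont
```

The last line shows why the fallback cannot be applied blindly: the markup alone does not tell whether a SUP element is an exponent or a purely stylistic superscript.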

Superscripts for references

Superscripts are often used for references to footnotes or external documents<SUP>42</SUP>, perhaps in an attempt to make them less disturbing for normal reading than parenthesized references like (42) or [42]. Such superscripting is not purely presentational, any more than parentheses or brackets are.

Note, in particular, that a search engine might, and probably should, ignore SUP markup, and this would make "documents<SUP>42</SUP>" identical with "documents42". This may cause problems in searches, since "documents42" could be taken as a single word, in a technical sense.
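
As a rough sketch of the effect, assume a very simple indexer that removes all tags and then splits the text into words. The function index_terms is hypothetical and not a description of any real search engine.

```python
import re

# Matches any tag, e.g. <SUP> or </SUP>.
TAG_RE = re.compile(r"<[^>]+>")

def index_terms(html):
    """Strip all markup, then split the remaining text into lowercase words."""
    text = TAG_RE.sub("", html)
    return re.findall(r"\w+", text.lower())

print(index_terms("external documents<SUP>42</SUP>, perhaps"))
# ['external', 'documents42', 'perhaps']
```

With such an indexer, a search for "documents" as a whole word would not match this text, since the reference number has become part of the word.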

Conclusion

Thus, I think it should be made clear (by W3C) whether SUB and SUP are intended to be used for stylistic presentation only. And if they are, it should be explicitly stated that they should not be used for exponents in mathematical expressions or in other contexts where it might change the meaning of a piece of text if SUB and SUP elements were presented as normal text.


This document was originally written for the context of Learning HTML 3.2 by Examples.