SUB and SUP in HTML?

The
HTML
3.2 specification
classifies SUB
and SUP
as "font style elements".
It states that "SUB
places text in subscript style" and
"SUP
places text in superscript style". This suggests that they are
intended for stylistic presentation only, not affecting the
meaning of a text.
The
internationalization RFC 2070
(which partially conflicted with HTML 3.2)
seemed to describe SUP
and
SUB
as strictly stylistic (in
clause 4.2.2):
Many languages require superscript text for proper rendering: as an example, the French "Mlle Dupont" should have "lle" in superscript. The SUP element, and its sibling SUB for subscript text, are introduced to allow proper markup of such text. SUP and SUB contents are restricted to PCDATA to avoid nesting problems.
On the other hand, the
description of SUP
and SUB
in the
HTML 4.0 specification
contains the example
E = mc<sup>2</sup>
and such usage seems to be rather common in actual practice as well
as in various HTML tutorials. Note that the specification
apparently imitates the wording of
RFC 2070 on this point but oddly refers to "scripts" instead
of "languages" (still preserving "French" as the example!).
More importantly, it allows any text-level markup inside these elements,
as opposed to PCDATA (i.e. plain character data, in practical terms).
This means that nesting really becomes a problem in presentation.
Note that PCDATA would be quite sufficient for the stylistic
purposes.
When SUP
is used for
exponents, it has, of course, a definite
meaning instead of being just stylistic presentation. Usage like
<SUP>lle</SUP>
is clearly stylistic only. H<SUB>2</SUB>O
or
x<SUB>1</SUB>
(meaning a subscripted variable) might be regarded
as stylistic only, although they might also be regarded as essential
to the content of the message. Exponents, however, are a different thing.
This should become evident if we take a slightly different but
very simple example: a<SUP>b</SUP>.
In mathematical and other notational systems, superscripts are used for a wide range of meanings, not limited to exponentiation. For example, in some notations for regular expressions, a superscript + could mean iteration, e.g. A⁺ allows repetition of A any number of times. It depends on the details of the notation system whether it is admissible to present A⁺ as A+ without superscripting, or whether that would have a completely different meaning. Similarly, if the superscript in a notation like A⁺ is "lowered" to become a normally presented + sign, the whole meaning of an expression might change: a superscript could become an operator that follows an unsuperscripted variable.
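The change of meaning can be illustrated with a small Python sketch. It rests on an assumption that is not part of the specifications discussed here: a notation in which a superscript + means "one or more repetitions" while a plain + between two symbols means "either one or the other", as in some textbook notations for regular expressions. The variable names are hypothetical, and Python's own `re` syntax is used only to express the two readings.

```python
import re

# Reading 1: the + is a superscript on A, i.e. "one or more A, then B".
# In Python's re syntax this iteration happens to be written A+B.
iteration = re.compile(r"A+B")

# Reading 2 (assumed notation): the + is an ordinary infix operator,
# i.e. "A or B". In Python's re syntax this alternation is written A|B.
alternation = re.compile(r"A|B")

for text in ("AAB", "A"):
    print(text,
          bool(iteration.fullmatch(text)),
          bool(alternation.fullmatch(text)))
# AAB True False
# A False True
```

The two readings accept different strings, so losing the superscript is not a harmless change of style in such a notation.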
There are of course intermediate cases between purely stylistic usage
and structural usage of SUB
and SUP.
Using SUB
in chemical formulas or to denote subscripted variables
is in a sense structural markup but "degrades gracefully" if
a browser ignores SUB
or otherwise presents SUB
elements as normal text.
People are accustomed to seeing things like H2O or x1, x2, ... in
linearized notation.
The same applies to some special cases of using SUP
for
exponents, such as denoting square meter by m2.
(Please notice that superscripted 2 and 3 appear as separate
characters in
the character
set which can be used in HTML, so you can write
expressions like m² and m³.)
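As a minimal sketch of that choice, the following Python function prefers the separate superscript characters when they exist and falls back to SUP markup otherwise. The function name `unit_with_power` and the mapping table are hypothetical; the only facts relied on are that superscript two and three are the characters U+00B2 and U+00B3.

```python
# Superscript digits available as separate characters (U+00B2, U+00B3).
SUPERSCRIPT_CHARS = {"2": "\u00b2", "3": "\u00b3"}

def unit_with_power(unit: str, power: int) -> str:
    """Return a unit raised to a power, avoiding SUP markup when possible."""
    digit = str(power)
    if digit in SUPERSCRIPT_CHARS:
        # A plain character survives even if a browser ignores SUP.
        return unit + SUPERSCRIPT_CHARS[digit]
    # No dedicated character in the table: fall back to markup.
    return f"{unit}<SUP>{digit}</SUP>"

print(unit_with_power("m", 2))  # m²
print(unit_with_power("m", 4))  # m<SUP>4</SUP>
```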
However, in general using SUP
for exponents or mathematical
superscripts may cause serious confusion. If
a<SUP>b</SUP>
is intended to denote "a to the power b",
then a browser which cannot use genuine superscripts should probably
present it as e.g.
a^b or
a**b instead of ab, if only it could know
that here SUP
denotes exponentiation.
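The two fallbacks can be compared with a short Python sketch. It is an illustration only: the function names `linearize_naive` and `linearize_as_power` are hypothetical, and the regular expression assumes simple SUP elements without attributes or nesting.

```python
import re

# Matches a simple SUP element (assumption: no attributes, no nesting).
SUP_RE = re.compile(r"<sup>(.*?)</sup>", re.IGNORECASE | re.DOTALL)

def linearize_naive(html: str) -> str:
    """Drop the SUP tags and keep their content, as if SUP were ignored."""
    return SUP_RE.sub(r"\1", html)

def linearize_as_power(html: str) -> str:
    """Put a caret before the SUP content, treating every SUP as an exponent."""
    return SUP_RE.sub(r"^\1", html)

print(linearize_naive("a<SUP>b</SUP>"))       # ab
print(linearize_as_power("a<SUP>b</SUP>"))    # a^b
print(linearize_as_power("M<SUP>lle</SUP>"))  # M^lle
```

The last line shows why the second fallback cannot be applied blindly: the markup itself does not say whether a SUP element is an exponent or a stylistic superscript.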
Superscripts are often used for references to footnotes or external documents⁴², perhaps in an attempt to make them less disturbing for normal reading than parenthesized references like (42) or [42]. Such superscripting is not purely presentational, any more than parentheses or brackets are.
Note, in particular, that a search engine might, and probably should, ignore
SUP
markup, and this would make
"documents<SUP>42</SUP>" identical
with "documents42". This may cause problems in searches, since
"documents42" could be taken as a single word, in a technical sense.
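The effect is easy to reproduce with a minimal Python sketch of an indexer that discards all tags before splitting text into words. The class `TagStripper` and the function `index_words` are hypothetical names, and no claim is made that any actual search engine works this way.

```python
import re
from html.parser import HTMLParser
from typing import List

class TagStripper(HTMLParser):
    """Collects character data and silently discards every tag."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

def index_words(html: str) -> List[str]:
    """Return the words an indexer would see once markup is ignored."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    # Join without a separator: the tags left no whitespace behind.
    return re.findall(r"\w+", "".join(stripper.parts))

print(index_words("external documents<SUP>42</SUP>, perhaps"))
# ['external', 'documents42', 'perhaps']
```

A search for "documents" as a whole word would then miss this occurrence.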
Thus, I think
it should be made clear
(by W3C)
whether SUB
and SUP
are
intended to be used for stylistic presentation only. And if they
are, it should be explicitly stated
that they should not be used for exponents in
mathematical expressions or in other contexts where it might change
the meaning of a piece of text if SUB
and SUP
elements were presented
as normal text.