Learning HTML 3.2 by Examples, section 4 Fundamental structures in HTML 3.2, with examples:

Text markup - emphasis, citations, code, etc

Logical vs physical markup

There are two major classes of text markup: logical and physical. Logical markup indicates the role of a text segment, such as being more important than normal text or being a citation. Physical markup is an instruction to present text in a particular manner, such as using a font of some specific kind or underlining.

Logical markup should be preferred. Use physical markup only if it really matters that part of a text is displayed in a particular physical way (to the extent that this is possible). The need for physical markup may arise when referring to information in fixed presentation form, such as text in a book or in an image. Such situations occur rarely.

For instance, use the STRONG element for strong emphasis, letting each Web browser express the emphasis in the way that works best in the environment where it is used. Do not use the B element (indicating bolding), except on the rare occasions when you are writing about text that appears in boldface somewhere, or when writing about, for example, mathematical vectors, for which no adequate markup exists in current HTML.
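
As a minimal sketch of this distinction (the sentences are made up for illustration), the first paragraph below uses STRONG because the text is to be emphasized, whereas the second uses B only because boldface is a conventional notation for vectors:

  <P><STRONG>Do not switch off the computer</STRONG> while the
  disk is being formatted.</P>
  <P>The velocity <B>v</B> is the time derivative of the position <B>r</B>.</P>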

When style sheets become generally usable, both authors and readers will be able to affect the rendering (e.g. font, color, and background) of elements. For instance, someone might wish to have all program code extracts presented with a yellow background and a larger than normal font, whereas someone else might prefer quite different methods of distinguishing them from normal text. Such operations will be much easier if logical markup has been used consistently.

In addition to being more flexible with respect to various browsers and rendering environments, logical markup has the following advantage over physical markup: computer programs are increasingly used for extracting information from HTML documents for purposes such as indexing. For this to work, it is much better to have logical markup indicating, for example, that some text is more important than the rest or is a quotation of computer printout, rather than designations of physical fonts.

Both logical and physical markup are expressed using HTML elements with start and end tags. It follows from the nature of the HTML language that elements must not overlap. For instance, the following is in error:

  This has some <B>bold and <I></B>italic text</I>.

On the other hand, markup elements can be nested. Browsers should do their best when rendering structures like the following:

Example nest.html:

This is <I>italic text which contains <U>underlined text</U>
within it,</I> whereas <U>this is normal underlined text</U>.

Obviously, browsers with a limited font repertoire can have difficulties in presenting text markup.
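
The overlapping markup shown earlier as an error can be repaired in more than one way, depending on what was meant. The following sketch gives two plausible readings; the intent is an assumption, since the faulty fragment does not reveal it. In the first line the bold and italic parts are separate elements. In the second line the italic part is also bold, so the I element is nested wholly inside the B element:

  This has some <B>bold and</B> <I>italic text</I>.
  This has some <B>bold and <I>italic text</I></B>.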

Phrase elements (logical text markup)

There are two phrase elements for emphasis, EM and STRONG; naturally, STRONG is used for stronger emphasis. The HTML 2.0 specification requires that these elements be rendered as distinct from plain text and from each other; most browsers (excluding Lynx) seem to obey this.

Avoid emphasizing too much, since emphasizing everything is tantamount to saying everything with the same emphasis, i.e. not emphasizing anything! (The proverbial student who underlines everything in his textbook has not grasped the idea of emphasizing.)

Unfortunately there is no phrase element for "de-emphasis", i.e. for indicating segments of text as less important. If you really need that, you may consider using the SMALL element. But especially if the less important text is relatively long, it might often be a better idea to put it "behind hyperlinks", into separate documents to which there are links in the main document. A person who follows such a link is probably interested in the text, so he probably prefers seeing it as normal text, and there is no need for any de-emphasis.
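
A rough sketch of the two alternatives follows. The wording and the file name details.html are hypothetical. The first paragraph de-emphasizes a short remark with SMALL; the second moves the less important material into a separate document and merely links to it:

  <P>The meeting starts at 9 a.m. <SMALL>(Coffee is served from
  8.30.)</SMALL></P>

  <P>The meeting starts at 9 a.m. See the
  <A HREF="details.html">practical details</A> if you need them.</P>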

The DFN element can be regarded as a special kind of emphasis, too, but logically it indicates that a term is used in a context where it is defined. This is a very useful element in principle but unfortunately many browsers, including Netscape, do not effectively support it.

The VAR element indicates that a piece of text (typically, a word) is a variable, i.e. a generic notation to be replaced by different actual expressions.
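
The following sketch shows how DFN and VAR might be used; the sentences are made up for illustration. In the first paragraph, DFN marks the term at the point where it is defined. In the second, the VAR elements mark words which stand for whatever actual file names the reader uses:

  <P>A <DFN>phrase element</DFN> is an element which indicates the
  logical role of a segment of text.</P>
  <P>The general form of the command is
  cp <VAR>source</VAR> <VAR>destination</VAR>.</P>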

The other phrase elements involve different kinds of citations or quotations:

CITE  citation (title of a book or article or equivalent)
CODE  program code or equivalent (e.g. HTML code)
SAMP  sample output from programs, scripts, commands etc
KBD   text to be typed from a keyboard by a user; typically used when giving instructions
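
As a small sketch (the sentence is made up for illustration), CITE and CODE could appear together as follows. Notice that the characters < and > must be written as &lt; and &gt; when HTML code is quoted inside a document:

  <P>According to the <CITE>HTML 3.2 Reference Specification</CITE>,
  a forced line break is written as <CODE>&lt;BR&gt;</CODE>.</P>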

Please do not equate, for example, the concept of emphasis with its physical representation in your browser (or even with its typical representation in several browsers). See below for notes and examples on rendering markup.

Font elements (physical text markup)

The available font elements - to be used very sparingly! - are:

TT      "teletype" text, i.e. monospaced text
I       italics
B       bold
U       underlined
STRIKE  strike-through text
BIG     large font
SMALL   small font
SUB     subscript
SUP     superscript
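
SUB and SUP are probably among the font elements that are hardest to avoid, since subscripts and superscripts often carry meaning for which there is no logical markup. A minimal sketch with a made-up sentence:

  <P>The formula of water is H<SUB>2</SUB>O, and Einstein's equation is
  E = mc<SUP>2</SUP>.</P>

A browser which cannot lower or raise text will most likely show this as H2O and mc2, so it is worth considering whether the text remains understandable in that case.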

The HTML 2.0 specification says about the B, I and TT elements that where bold or italic typography or teletype font, respectively, is unavailable, "an alternative representation may be used". There is no explicit description of what this might mean, but there seems to be a general tendency to treat B as comparable to STRONG and I as comparable to EM.

The FONT (and BASEFONT) element offers more possibilities to control font sizes than BIG and SMALL. However, all use of font size control in HTML should be avoided.

Rendering of markup

You may wish to view a separate file to see the visual appearance of the different markup elements on your browser. But please do not assume that the rendering which you see is universal or the correct one.

For example, some browsers (e.g. Internet Explorer) render TT (and CODE) in a font that is significantly smaller than the normal text font, and this disproportion is preserved when the font size setting is changed. Moreover, Internet Explorer 3.0 renders VAR in a monospaced font, whereas most graphical browsers use (much more naturally) italics. In Netscape, on the other hand, the two font sizes are separately settable and by default the same size is used for both, but "the same" means the technical size in points; in practice a monospaced font looks bigger than a normal proportional font!

Thus, avoid messing with font sizes; use phrase markup and other structural elements and let the users, if they dislike the font sizes, define fonts in their browser settings the best they can.

The following table is intended to give an idea of the variation. It describes verbally how markup elements are rendered in Netscape Navigator, Microsoft Internet Explorer, and Lynx. Notice that there is variation even within each of these programs, depending on version, platform, and system-wide or user-specific configuration, so this is just a typical situation. Thus, treat the table as an illustration of the different things that might happen rather than as a description of what actually happens in some particular program.

element  Netscape                   Internet Explorer             Lynx
EM       italics                    italics                       underlined
STRONG   bold                       bold                          underlined
DFN      normal text                italics                       normal (monospaced)
CODE     monospaced                 monospaced small              normal (monospaced)
SAMP     monospaced                 monospaced small              normal (monospaced)
KBD      monospaced                 monospaced small              normal (monospaced)
VAR      italics                    monospaced small              normal (monospaced)
CITE     italics                    italics                       underlined
TT       monospaced                 monospaced small              normal (monospaced)
I        italics                    italics                       underlined
B        bold                       bold                          underlined
U        normal text                underlined                    underlined
STRIKE   strike-through             strike-through                text between [DEL: and :DEL]
BIG      larger than normal         larger than normal            normal text
SMALL    smaller than normal        slightly smaller than normal  normal text
SUB      lowered, slightly smaller  lowered                       normal text
SUP      raised, slightly smaller   raised                        normal text

These descriptions apply to unnested elements. Nesting of text elements may affect the rendering.

Presenting interaction with computer

In order to present text-based interaction between a human being and a computer, or similar situations, the following approach can be used: mark up program code with CODE, text typed by the user with KBD, output from the computer with SAMP, and generic notations to be replaced by actual expressions with VAR. In all cases, the principles on division into lines and the use of blanks and tabs must be taken into account, and this may require the insertion of BR elements or the use of PRE elements. Notice that logical markup is allowed within a PRE element (although it is possibly not implemented in a quite satisfactory way).

The following example illustrates the approach in the context of an introduction to the Perl programming language.

Example interact.html:

<P>The following Perl script prints out its input so that each line begins with
a running line number:</P>
<PRE><CODE>
#!/usr/bin/perl
$line = 1;
while (&lt;&gt;) {
  print $line++, " ", $_; }
</CODE></PRE>
<P>The scalar variable <CODE>$line</CODE> is of course the line counter.</P>
<P>The loop construct is of the form<BR>
<CODE>while (&lt;&gt;) {</CODE><BR>
<VAR>process one line of input</VAR> <CODE>}</CODE><BR>
</P>
<P>Assuming that you have written this script (the simpler version of it) into a
file named <KBD>lines</KBD>, you could test it using a command of the form<BR>
<KBD>./lines</KBD> <VAR>datafile</VAR><BR>
In particular, using the script as input to itself, you would do as follows
(the details of system output vary from one system to another):
</P>
<PRE>
<SAMP>lk-hp-23 perl 251 % </SAMP><KBD>./lines lines</KBD>
<SAMP>1 #!/usr/bin/perl
2 $line = 1;
3 while (&lt;&gt;) {
4   print $line++, " ", $_; }
lk-hp-23 perl 252 % </SAMP>
</PRE>

Notes on the example:
Date of last update: 2010-12-16.
This page belongs to the free information site IT and communication, section Web authoring and surfing, by Jukka "Yucca" Korpela.