Learning HTML 3.2 by Examples, section 4 Fundamental structures in HTML 3.2, with examples:

Text markup - emphasis, citations, code, etc

Logical vs physical markup

There are two major classes of text markup: logical and physical. Logical markup indicates the role of a text segment, such as being more important than normal text or being a citation. Physical markup is an instruction to present text in a particular manner, such as using a font of some specific kind or underlining.

Logical markup should be preferred. Use physical markup only if it really matters that part of a text is displayed in a particular physical way (where that is possible). The need for physical markup may arise when referring to information in fixed presentation form, such as text in a book or in an image. Such situations occur rarely.

For instance, use the STRONG element for strong emphasis, letting the various Web browsers express the emphasis in the way that works best in the environment where they are used. Do not use the B element (indicating bolding), except on the rare occasions where you are writing about some text appearing in boldface somewhere or e.g. writing about mathematical vectors, for which no adequate markup exists in current HTML.
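
As a small illustration of the difference (the sentences are invented for this sketch), the first paragraph below marks a warning as strongly emphasized, whereas the second uses B only because the boldface itself is what is being described:

```html
<!-- Logical: the browser decides how to express strong emphasis -->
<P><STRONG>Do not switch off the computer</STRONG> while the disk
is being formatted.</P>

<!-- Physical: the boldface itself is the information being reported -->
<P>In the printed manual, the word <B>Warning</B> appears in boldface.</P>
```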

When style sheets become generally usable, both authors and readers will be able to affect the rendering (e.g. font, color, and background) of elements. For instance, someone might wish to have all program code extracts presented with yellow background and larger than normal font, whereas someone else might prefer quite different methods of distinguishing them from normal text. Such operations will be much easier if logical markup has been used consistently.

In addition to being more flexible with respect to various browsers and rendering environments, logical markup has the following advantage over physical markup: computer programs are increasingly used for extracting information from HTML documents for purposes such as indexing. For this to work, it is much better to have logical markup indicating e.g. that some text is more important than the rest, or that it is a quotation of computer printout, than to have designations of physical fonts.

Both logical and physical markup are expressed using HTML elements with start and end tags. It follows from the nature of the HTML language that elements must not overlap. For instance, the following is in error:

  This has some <B>bold and <I></B>italic text</I>.
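
Assuming the intent was bold for "bold and" and italics for "italic text" (a guess, since the erroneous fragment cannot say what was meant), one way to write it without overlap is to close B before I is opened:

```html
This has some <B>bold and </B><I>italic text</I>.
```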

On the other hand, markup elements can be nested. Browsers should do their best when rendering structures like the following:

Example nest.html:

This is <I>italic text which contains <U>underlined text</U>
within it,</I> whereas <U>this is normal underlined text</U>.

Obviously, browsers with a limited font repertoire can have difficulties in presenting text markup.

Phrase elements (logical text markup)

There are two phrase elements for emphasis: EM and STRONG, and naturally STRONG is used for stronger emphasis. The HTML 2.0 specification requires that these elements be rendered as distinct from plain text and from each other; most browsers (excluding Lynx) seem to obey this.

Avoid emphasizing too much, since emphasizing everything is tantamount to saying everything with the same emphasis, i.e. not emphasizing anything! (The proverbial student who underlines everything in his textbook has not grasped the idea of emphasizing.)
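
A sparing use of the two elements might look like the following sketch (the text is invented for illustration): one word carries normal emphasis, and only the sentence that really matters gets strong emphasis.

```html
<P>Save your work <EM>before</EM> you run the installer.
<STRONG>Running it twice will erase the previous settings.</STRONG></P>
```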

Unfortunately there is no phrase element for "de-emphasis", i.e. for indicating segments of text as less important. If you really need that, you may consider using the SMALL element. But especially if the less important text is relatively long, it might often be a better idea to put it "behind hyperlinks", into separate documents to which there are links in the main document. A person who follows such a link is probably interested in the text, so he probably prefers seeing it as normal text, and there is no need for any de-emphasis.
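
The two alternatives could be sketched as follows. The sentences are invented, and patches.html is a hypothetical file name used only for this illustration:

```html
<!-- Short aside: SMALL used as a makeshift de-emphasis -->
<P>The program runs on most Unix systems.
<SMALL>(Some older versions may need a patch.)</SMALL></P>

<!-- Longer aside: moved into a separate document behind a link -->
<P>The program runs on most Unix systems; see the
<A HREF="patches.html">notes on older versions</A> for details.</P>
```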

The DFN element can be regarded as a special kind of emphasis, too, but logically it indicates that a term is used in a context where it is defined. This is a very useful element in principle but unfortunately many browsers, including Netscape, do not effectively support it.

The VAR element indicates that a piece of text (typically, a word) is a variable, i.e. a generic notation to be replaced by different actual expressions.
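
As a rough sketch of how DFN and VAR might be used together (the wording is invented and deliberately describes only the simplest form of a start tag), the term being defined is within DFN and the placeholder to be replaced is within VAR:

```html
<P>An <DFN>element</DFN> consists of a start tag, content, and usually
an end tag. In its simplest form a start tag is written as
<CODE>&lt;</CODE><VAR>name</VAR><CODE>&gt;</CODE>, where
<VAR>name</VAR> is the name of the element.</P>
```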

The other phrase elements involve different kinds of citations or quotations:

CITE	citation (title of a book or article or equivalent)
CODE	program code or equivalent (e.g. HTML code)
SAMP	sample output from programs, scripts, commands etc
KBD	text to be typed from a keyboard by a user; typically used when giving instructions
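
The four elements listed above could appear together as in the following sketch. The book title and the file name hello.pl are made up for illustration only:

```html
<P>The first chapter of <CITE>A Little Perl Primer</CITE> begins with
the statement <CODE>print "Hello\n";</CODE>.
To try it, type <KBD>perl hello.pl</KBD>; the program
should respond with <SAMP>Hello</SAMP>.</P>
```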

Please do not identify e.g. the concept of emphasis with its physical representation on your browser (or even its typical representation on several browsers). See below for notes and examples on rendering markup.

Font elements (physical text markup)

The available font elements - to be used very sparingly! - are:

TT	"teletype" text, i.e. monospaced text
I	italics
B	bold
U	underlined
STRIKE	strike-through text
BIG	large font
SMALL	small font
SUB	subscript
SUP	superscript
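
A few cases where the font elements above may be defensible are sketched below (the sentences are invented for illustration): subscripts and superscripts that are part of a notation, and boldface for mathematical vectors, for which no logical markup exists.

```html
<P>The formula of water is H<SUB>2</SUB>O, and the area of a
square with side <VAR>a</VAR> is <VAR>a</VAR><SUP>2</SUP>.</P>
<P>The vector <B>v</B> is the sum of <B>a</B> and <B>b</B>.</P>
```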

The HTML 2.0 specification says about the B, I and TT elements that where bold or italic typography or teletype font, respectively, is unavailable, "an alternative representation may be used". There is no explicit description of what this might mean, but there seems to be a general tendency to compare B to STRONG and I to EM.

The FONT (and BASEFONT) element offers more possibilities to control font sizes than BIG and SMALL. However, all use of font size control in HTML should be avoided.

Rendering of markup

You may wish to view a separate file to see the visual appearance of the different markup elements on your browser. But please do not assume that the rendering which you see is universal or the correct one.

For example, some browsers (e.g. Internet Explorer) render TT (and CODE) so that the font is significantly smaller than the normal text font, and this disproportion is preserved when the setting for font size is changed; moreover, Internet Explorer 3.0 renders VAR with monospaced font whereas most graphical browsers use (much more naturally) italics. On the other hand, in Netscape these font sizes are separately settable and by default the same font size is used for both, but "the same" is the technical size in points - in practice monospaced font looks bigger than normal proportional font!

Thus, avoid messing with font sizes; use phrase markup and other structural elements and let the users, if they dislike the font sizes, define fonts in their browser settings the best they can.

The following table is intended to give an idea of the variation. It (verbally) presents the rendering of markup elements in Netscape Navigator, Microsoft Internet Explorer, and Lynx. Notice that there is variation even within each of these programs, depending on version, platform, and system-wide or user's own configuration, so this is just a typical situation. Thus, treat this as an illustration of the different things that might happen rather than as a description of what actually happens in some particular program.

element	Netscape	Internet Explorer	Lynx
EM	italics	italics	underlined
STRONG	bold	bold	underlined
DFN	normal text	italics	normal (monospaced)
CODE	monospaced	monospaced small	normal (monospaced)
SAMP	monospaced	monospaced small	normal (monospaced)
KBD	monospaced	monospaced small	normal (monospaced)
VAR	italics	monospaced small	normal (monospaced)
CITE	italics	italics	underlined
TT	monospaced	monospaced small	normal (monospaced)
I	italics	italics	underlined
B	bold	bold	underlined
U	normal text	underlined	underlined
STRIKE	strike-through	strike-through	text between `[DEL:` and `:DEL]`
BIG	larger than normal	larger than normal	normal text
SMALL	smaller than normal	slightly smaller than normal	normal text
SUB	lowered, slightly smaller	lowered	normal text
SUP	raised, slightly smaller	raised	normal text

These relate to unnested elements. Nesting of text elements may affect the rendering.

Presenting interaction with computer

In order to present text-based interaction between a human being and a computer, or similar situations, the following approach can be used:

computer output (whether it is prompts, normal output, or error messages) is within SAMP elements
generic terms describing user input are within VAR elements
actual user input is within KBD elements
if computer program (source) code is quoted, it is within CODE elements.

In all cases, the principles on division into lines and the use of blanks and tabs must be taken into account, and this may require the insertion of BR elements or the use of PRE elements. Notice that logical markup is allowed within a PRE element (although possibly not implemented in a quite satisfactory way).

The following example illustrates the approach in the context of an introduction to the Perl programming language.

Example interact.html:

<P>The following Perl script prints out its input so that each line begins with
a running line number:</P>
<PRE><CODE>
#!/usr/bin/perl
$line = 1;
while (&lt;&gt;) {
  print $line++, " ", $_; }
</CODE></PRE>
<P>The scalar variable <CODE>$line</CODE> is of course the line counter.</P>
<P>The loop construct is of the form<BR>
<CODE>while (&lt;&gt;) {</CODE><BR>
<VAR>process one line of input</VAR> <CODE>}</CODE><BR>
</P>
<P>Assuming that you have written this script (the simpler version of it) into a
file named <KBD>lines</KBD>, you could test it using a command of the form<BR>
<KBD>./lines</KBD> <VAR>datafile</VAR><BR>
In particular, using the script as input to itself, you would do as follows
(the details of system output vary from one system to another):
</P>
<PRE>
<SAMP>lk-hp-23 perl 251 % </SAMP><KBD>./lines lines</KBD>
<SAMP>1 #!/usr/bin/perl
2 $line = 1;
3 while (&lt;&gt;) {
4   print $line++, " ", $_; }
lk-hp-23 perl 252 % </SAMP>
</PRE>

Notes on the example:

nesting of text markup has been avoided
although having the program code within a CODE element may seem unnecessary when it is within a PRE element, it is logical to do so, it should cause no harm, and it might one day prove useful (in a browser which uses different monospaced fonts for different purposes).
similarly, using SAMP and KBD within the sample run might cause user input to be presented differently from computer output; using style sheets, you might even be able to specify the font, color, background and other properties differently for these logically different elements.


Date of last update: 2010-12-16.

This page belongs to the free information site IT and communication, section Web authoring and surfing, by Jukka "Yucca" Korpela.