Emphasis in context versus overall highlighting

People have often wondered why HTML has two elements for emphasis: the em element for “normal” emphasis and the strong element for strong emphasis. The XHTML 2.0 draft asks about the strong element:

Leave in, deprecate or remove? No consensus.

I think the main problem is that the em and strong elements were, in fact, created as an afterthought. They were introduced as “logical” counterparts of “physical” markup, i (italics) and b (bold) elements. But is there any real logic behind this?

My answer is: There are two essentially different kinds of emphasis, and they need different markup elements. The difference is not in the strength of emphasis but between emphasizing in local context and highlighting key words.

Some emphasis isn’t meant to play a role when reading the text, only to draw attention to the text on the basis of key words or phrases, or sometimes sentences.

Local vs. global emphasis

The best approach is probably to treat em as “local” emphasis, which should be indicated when the text is read but need not jump out at the reader, whereas strong is for “global” emphasis, which is roughly the opposite: text marked up with strong should be prominently highlighted, but usually not emphasized when the text is actually read. In fact, for such use it might be a good idea to make strong elements clickable so that they turn into normal text when clicked on, just as you might like to click on blinking or marquee text after it has caught your attention. (Ideally, this would be a browser feature; it can be clumsily implemented on a per-page and per-element basis using CSS and scripting.)
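The clumsy per-page implementation mentioned above might look roughly like the following sketch. The class name highlight-dismissed, the helper dismissHighlight, and the style rule are all assumptions made for illustration; nothing in HTML or CSS defines them.

```javascript
// Hypothetical class name; nothing in HTML or CSS defines it.
const DISMISSED_CLASS = "highlight-dismissed";

// Mark a strong element as dismissed. `target` only needs `tagName` and
// `classList`, so it can be a real DOM element or a plain stand-in object.
// Returns true only when a highlight was actually dismissed by this call.
function dismissHighlight(target) {
  if (!target || String(target.tagName).toLowerCase() !== "strong") {
    return false;
  }
  if (target.classList.contains(DISMISSED_CLASS)) {
    return false;
  }
  target.classList.add(DISMISSED_CLASS);
  return true;
}

// Wire things up only when a DOM is available, i.e. in a browser.
if (typeof document !== "undefined") {
  // Assumed style rule: a dismissed strong element looks like normal text.
  const style = document.createElement("style");
  style.textContent =
    "strong." + DISMISSED_CLASS +
    " { font-weight: inherit; color: inherit; background: none; }";
  document.head.appendChild(style);

  // One delegated listener covers every strong element on the page.
  document.addEventListener("click", (event) => {
    const strong = event.target.closest ? event.target.closest("strong") : null;
    dismissHighlight(strong);
  });
}
```

A single listener on the document is used here so that authors would not need to touch each strong element separately; a real browser feature would of course not need any of this.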

This isn’t the official position, which only says that em is emphasis and strong is strong emphasis, and that doesn’t say much. One may wonder why strong is needed if it’s just a strong version of em; wouldn’t ... or <em level="2">...</em> be more natural?

In a Usenet discussion on emphasis markup, Harlan Messinger wrote:

In programmerese, one might say that italics enhance sequential access, while boldface enhances random (direct) access.

Typographic notes

For serif fonts, bolding is rarely used in typography. Instead, italics is used for all kinds of inline emphasis, though primitive methods like underlining or increased letter spacing have been used too.

As sans-serif fonts have become common, bolding has become popular as a method of emphasis. Bolding works better for sans-serif fonts than for serif fonts, since sans-serif characters usually have little or no internal variation in thickness.

In optimal rendering of local and global emphasis, the nature of the presentation medium and the font would be taken into account. Thus, for example, in screen rendering colors would often work much better than variation in letter shapes or thickness. On paper, on the other hand, even underlining might be better than bolding, if a serif font is used.

Just physical markup in logical clothes?

In reality, em and strong were apparently created in a moral hangover after having too much i and b. It is symptomatic that em and strong are generally described after i and b in specifications and tutorials. They are, in effect, little more than alias names for i and b, to satisfy the purists.

Italics is the usual way to indicate “local” emphasis in print matter, whereas bolding is one of the ways to highlight, and the normal way in black and white print matter these days. But as the typographic notes above indicate, the connection between em and i as well as the connection between strong and b should be broken.

Thus, it would be logical to define that em emphasizes the enclosed text with respect to the text in the enclosing element and should be rendered in a manner that reflects this; whereas strong (renamed to e.g. key or highlight, if we give up continuity, as planned for XHTML 2.0) would indicate its content as key word or phrase in the context of the entire document, to appear as highlighted when possible, and to gain special weight in indexing.

But many speech browsers ignore all of em, strong, i, and b, at least when reading fast (as is usual in heavy use of such browsers). This could be a problem if the emphasis is semantically significant, i.e. if the meaning of the text changes when it is omitted.

If em and strong were defined and consistently used in the way described above, one could safely say that speech browsers may ignore strong. They might still offer a special mode where only strong elements are read, or where they are read along with some other special elements like headings and table captions, to give a quick overview, an aural counterpart of glancing at a page. But they should always honor em in some way or another, perhaps in a simple but effective way like saying the word “emphatically” before the content (and “end of emphasis” after it, if it is long).
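As a rough illustration of the speech behavior just described, here is a minimal sketch. It assumes the text has already been split into segments of the form { kind, text }, where kind is "text", "em", or "strong"; that shape, the function names, and the five-word cutoff for “long” emphasis are all hypothetical choices, not anything a real speech browser is known to use.

```javascript
// Assumed cutoff: an em span with more words than this counts as "long".
const LONG_EMPHASIS_WORDS = 5;

// Normal reading mode: em is announced, strong is read like ordinary text.
function speakSegments(segments) {
  return segments
    .map((seg) => {
      if (seg.kind !== "em") {
        return seg.text;
      }
      const wordCount = seg.text.trim().split(/\s+/).length;
      const closing = wordCount > LONG_EMPHASIS_WORDS ? " end of emphasis" : "";
      return "emphatically " + seg.text + closing;
    })
    .join(" ");
}

// Overview mode: only the strong (key word) segments are read.
function overviewSegments(segments) {
  return segments
    .filter((seg) => seg.kind === "strong")
    .map((seg) => seg.text);
}

const sample = [
  { kind: "text", text: "Do" },
  { kind: "em", text: "not" },
  { kind: "text", text: "press the" },
  { kind: "strong", text: "reset button" },
];
console.log(speakSegments(sample));    // Do emphatically not press the reset button
console.log(overviewSegments(sample)); // [ 'reset button' ]
```

The overview mode could be extended to include headings and table captions in the same way, by adding further segment kinds to the filter.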

Similar considerations would apply to character cell browsers. (The version of Lynx I’m using ignores em, strong, i, and b completely, which is rather bad. A character cell browser that cannot use colors should probably do something to indicate emphasis, e.g. rendering <em>foo</em> as /foo/ and <strong>foo</strong> as **foo**.)
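The suggested plain-text rendering can be sketched in a few lines. This is only an illustration, with a hypothetical function name, and it assumes bare em and strong tags without attributes; it is not how Lynx or any other browser works.

```javascript
// Render em and strong for a display with no italics, bold, or color:
// em content is wrapped in slashes, strong content in double asterisks.
// Only bare <em> and <strong> tags (no attributes) are handled.
function renderForCharacterCells(html) {
  return html
    .replace(/<\/?em>/gi, "/")
    .replace(/<\/?strong>/gi, "**");
}

console.log(renderForCharacterCells("This is <em>foo</em> and <strong>bar</strong>."));
// This is /foo/ and **bar**.
```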

Would headings be better than strong?

There is much to be said in favor of using different headings to emphasize key phrases. Emphasized keywords inside text make the visual appearance somewhat confused, and they are distracting when you actually read the text.

But authors may still wish to use highlighting for various reasons. Headings cannot be very short without looking strange, and often we have lots of keywords to emphasize. Hence, although headings should be the primary approach, authors should have the option of using inline highlighting.

Emphasized statements

When entire statements are to be emphasized, the purpose is to make them stand out both when actually reading the text and when glancing over it. In a sense, they should be emphasized both locally and globally. Such statements might be conclusions, warnings, or otherwise important.

There are different ways of indicating such emphasis visually. Inside text, italics or a distinctive background color might be used. When an entire block of text is emphasized, indentation, increase of font size, distinctive font face, and many other methods could be used. In speech, the rendering should differ from simple emphasis of individual words; for example, a clear pause before the statement could be used, and the statement could be read in a completely different tone of voice (say, a male voice as opposed to a female voice for normal text).

Tentatively, I would suggest that “statement emphasis” be indicated using a separate attribute in block-level elements, say em="1" (allowing extensions that add new levels of emphasis). Normally a block as a whole is the natural unit for such emphasis, and the rendering of the block should be affected. It is thus more suitable to use an attribute than an enclosing element.

This would not exclude the possibility of emphasizing entire statements using em or strong. It would just give a feasible opportunity to deal with very common situations like summary and abstract paragraphs or lists, important warnings, etc.

De-emphasis

It has often been said that HTML should have an element for indicating some text as less important than normal text. Typically, font size reduction, using the small element or otherwise, has been used for the purpose. We might even say that small normally means de-emphasis.

However, small and other methods of reducing font size could be used for a multitude of purposes, including attempts to make content fit into a smaller area. Some writing systems even use font size reduction for emphasis!

Thus, there is clearly a need for markup that means that the content is less important than the enclosing text. Due to characteristics of visible or audible presentation of text, the physical methods used for de-emphasis work best when applied to blocks. For example, reducing font size inside a paragraph tends to look esthetically poor.

The em attribute suggested above could be used to address this issue too, if the value -1 (and perhaps other negative values as well) were permitted. It would probably be best to limit its use to block elements, since reasonable rendering methods generally apply best to blocks.

Summary of proposals

Define em as indicating its content as more important than the content of the enclosing element.
Require that user agents render em element content as different from the surrounding text, at least when em elements are not nested.
Define strong as indicating its content as a key word or phrase that is descriptive of the content of the document. User agents would be encouraged but not required to highlight such content.
Define the em attribute for block-level elements, indicating level of emphasis so that "0" (the default) indicates lack of any particular emphasis, positive values indicate importance, and negative values indicate that the content is less important than normal text.
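To make the last proposal concrete, a user agent might interpret the em attribute along the following lines. This is only a sketch: the attribute itself is a proposal, the function names and category labels are hypothetical, and treating a missing or non-numeric value as 0 is an assumption that goes beyond what is stated above.

```javascript
// Interpret the proposed (not standardized) em attribute of a block element.
// A missing or non-integer value falls back to 0 (an assumption).
function emphasisLevel(attrValue) {
  if (attrValue === null || attrValue === undefined) {
    return 0;
  }
  const text = String(attrValue).trim();
  return /^[+-]?\d+$/.test(text) ? parseInt(text, 10) : 0;
}

// Map the level to a coarse rendering category; the labels are hypothetical.
function classifyBlock(attrValue) {
  const level = emphasisLevel(attrValue);
  if (level > 0) {
    return "emphasized";
  }
  if (level < 0) {
    return "de-emphasized";
  }
  return "normal";
}

console.log(classifyBlock("1"));  // emphasized
console.log(classifyBlock("-1")); // de-emphasized
console.log(classifyBlock(null)); // normal
```

How each category is then presented (indentation, font size, a pause or a different voice in speech) would be left to the user agent and style sheets.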