Learning HTML 3.2 by Examples, section 3 General remarks on the syntax of HTML:

Division into lines and the use of blanks and tabs

With the exception of text enclosed in PRE tags (preformatted text) or TEXTAREA tags, blanks and newlines are not preserved when displaying the document. More technically, any sequence of blanks, tabs, and newlines is equivalent to a single blank in an HTML file. On the other hand, a blank in an HTML file may be rendered using several space characters or replaced by newline(s).
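As a small illustration of the rule above, consider the following fragment. It is only a sketch with made-up sample text. The two P elements should be displayed identically, since every run of blanks and newlines counts as a single blank, whereas the PRE element keeps its spacing and line division.

```html
<!-- These two paragraphs are equivalent: whitespace runs collapse to one blank. -->
<P>Roses   are
      red,
violets are blue.</P>

<P>Roses are red, violets are blue.</P>

<!-- Inside PRE, blanks and newlines are preserved as written. -->
<PRE>
Roses   are red,
violets are blue.
</PRE>
```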

The term newline is used to denote an end of line designation. Theoretically, the SGML declaration for HTML specifies that line feed (LF, ASCII code 10 in decimal) acts as a record (line) start character and carriage return (CR, ASCII code 13 in decimal) as a record end character. In practice, HTML documents are stored and transmitted using the newline convention of the computer system used. Therefore, HTML browsers are encouraged to accept any of the three common representations, namely CR LF sequence, CR only, and LF only, as line separators and to infer the missing record end and start characters.

Thus, it does not matter how you divide the text into lines, since a newline is equivalent to a blank. Notice, however, that you must not divide a word across two lines in HTML. If, for example, you divide the word international into two lines as follows:

inter-
national
it will be interpreted as equivalent to
inter- national
and the result is not what you want.

Since newlines in the source do not produce line breaks on display, you must use HTML tags such as P or BR to force line breaks, if they are necessary for the logical representation of your document.
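For instance, a postal address needs its line division preserved. The following is a minimal sketch with an invented address. P starts a new paragraph and each BR forces a line break within it, while the newlines in the source are there only for readability.

```html
<!-- P starts a paragraph; BR forces a line break inside it. -->
<P>Example Company<BR>
12 Sample Street<BR>
Springfield</P>
```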

Browsers usually do not divide words into two lines, except possibly when a word contains a hyphen. The HTML 3.2 Reference Specification is not very explicit in this matter; it just says, in the discussion of tables, the following:

For some browsers it may be necessary or desirable to break text lines within words. In such cases a visual indication that this has occurred is advised.

Beware that the line length is outside your control. It depends on the browser, device, and settings used by the people who look at your document. You can force line breaks but not prevent line breaks between words, in general. (You can try to prevent line breaks by using non-breaking spaces.)
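A sketch of the non-breaking space approach, with invented sample text: the entity &nbsp; stands for a no-break space, so a browser that supports it should keep the number and the word following it on the same line. As noted above, this is only an attempt, since the outcome depends on the browser.

```html
<!-- &nbsp; is a no-break space: the line should not be broken at it. -->
<P>The file is about 10&nbsp;MB and was updated on 16&nbsp;December.</P>
```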

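As regards newlines in conjunction with HTML tags, there are special rules: a newline immediately following a start tag is to be ignored, and so is a newline immediately preceding an end tag.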

However, popular browsers (such as Netscape and Internet Explorer) are known to violate these official rules. For example, if you write an A element as follows:
<A HREF="foo.html">bar
</A>
then many browsers incorrectly display it as if the link text had a blank appended. Since browsers often indicate links with underlining, there could be an extra underlined space. Thus, in some cases removing a newline before an end tag may help in improving the presentation on popular but buggy browsers. See the document White Space Bugs in Browsers for a more detailed explanation with examples.
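The workaround can be sketched as follows, using the same A element as above. The second form is just one possible way to keep source lines short when the link text is long: the line is divided inside the start tag, where whitespace between the tag name and an attribute is allowed and is not part of the link text.

```html
<!-- End tag right after the link text: no stray underlined space. -->
<A HREF="foo.html">bar</A>

<!-- For long link text, divide the line inside the start tag instead. -->
<A
 HREF="foo.html">a longer link text</A>
```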

The horizontal tab character (HT) can appear in the HTML source. Within PRE elements, tabs have a special interpretation. Otherwise a tab is equivalent to a space. Thus, it does not imply tabulation of any kind. (In order to present tabular data, use the TABLE element.) It is best to avoid tabs in HTML code and to use a suitable number of spaces instead, if one wants to format the HTML source code into tabular form.
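As an illustration, here is a minimal sketch of tabular data presented with a TABLE element, where the source is aligned with spaces rather than tabs. The table content is chosen only for the example. The extra spaces between cells are there for source readability.

```html
<!-- Tabular data marked up with TABLE; the source is aligned with spaces only. -->
<TABLE BORDER=1>
  <TR><TH>Character</TH>       <TH>ASCII code</TH></TR>
  <TR><TD>Horizontal tab</TD>  <TD>9</TD></TR>
  <TR><TD>Line feed</TD>       <TD>10</TD></TR>
  <TR><TD>Carriage return</TD> <TD>13</TD></TR>
</TABLE>
```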


Date of last update: 2010-12-16.
This page belongs to the free information site IT and communication, section Web authoring and surfing, by Jukka "Yucca" Korpela.