Publishing on the Web Is Different

Summary

Publishing on the Web is very different from older methods of publication. A Web publication is inherently a general, device-independent and program-independent document with structural markup. The presentation of a document may vary greatly, and it must vary, to allow viewing (or hearing) the same document on a wide variety of devices, ranging from wristwatch monitors to full-size movie screens.

There are strong forces, most importantly the conservatism of most people, which try to impose author-controlled layout on Web documents. However, the switch to presentation control by readers is inevitable - and a positive thing.

HTML = content + structure

The HTML language was designed to promote worldwide distribution of documents in a device-independent form. It is far from perfect for that purpose, but it has served well and is suitable for a wide range of documents. It is easy to learn and easy to use.

For example, in the HTML way of thinking the author marks a piece of text as a heading; technically, the author just puts e.g. the tags <H2> and </H2> around the text to make it a second-level heading. It is the duty of each Web browser to present headings in a manner suitable for each particular browsing environment. The presentation may vary greatly, depending on screen and window size (or paper size), resolution, availability of colors and fonts, and so on. A user can affect these things. For instance, if he has difficulty seeing normal-sized text, he may select a large font of his choice. And after some browsing with the selected browser and settings, the user can easily recognize different basic structures such as headings, quotations, lists and code samples in all documents written in good HTML.

Admittedly, popular Web browsers are often defective in presenting basic HTML structures properly, and they are far from providing high-quality presentation by default or being well customizable through user options. However, attempts to improve the situation by tuning HTML files according to "known" characteristics of popular browsers, or of the popular browser, lead to confusion, and the end result might be worse presentation on other browsers, which might be more popular next year, and even on previous or future versions of the browser for which the document is "enhanced". (Author's style sheets are a marginally better approach, but they are at best the right way of doing the wrong thing.)

"HTML programming" is a bluff

Unfortunately HTML, especially as implemented with nonstandard extensions, also contains tools for affecting colors, font sizes, font faces, positioning of text, etc. Even more unfortunately, people use such features, and so-called "HTML editors" (such as FrontPage and Netscape Gold) and "converters" make heavy use of them. If a document contains - instead of e.g. simple heading tags - tags which instruct that some text be presented in 12-point Palatino, red on a green background, perhaps blinking, it is evident that the entire idea of HTML has been lost.

Consider, for example, the use of television as a Web browsing device. People who think that Web authors should decide the physical appearance of documents have proposed that at least on WebTV one should use large fonts. They have suggested setting the base font size higher than normal. Now, of course, the designers of WebTV have done their best to guarantee that normal text is legible, presenting it in a font suitable for the TV screen. This means that if the default font size for a document has been set larger than normal, the text will be so large that the document becomes practically unreadable. The lesson is that font selection should be made to suit a device or the browsing situation in general, not on a per-document basis. But still many people who call themselves "Web authors" think that font, color, layout, and other presentation features are crucial. (They are crucial - for browser designers and for browser settings.)

The absurdity of the situation can hardly be exaggerated. When "HTML authoring skills" - or worse, "HTML programming skills" - are advertised, they are typically technical skills for breaking the whole idea of HTML. It really puzzles me why companies pay people a lot to spend weeks obscuring data in Web pages; any secretary could put the documents onto the Web in a few hours, after a day or two of initial training.

Authors and readers - who controls the presentation?

The traditional system of publishing, applied in newspapers, books, TV programs, etc., involves the preparation of a publication down to the smallest detail, including layout, by the authors (in the broad sense, which in many cases includes a lot of technical and artistic staff). Once prepared, the publication is reproduced in some sort of mass production, i.e. identical copies are issued. The role of the consumer is to take it or leave it. This paradigm has already been challenged by factors other than the Internet, e.g. the feasibility of print on demand. Originally the paradigm was adopted for obvious economic reasons.

On the Web, each access to a document by a user involves customized formatting according to the user's screen, browser, option settings, etc. Moreover, publishing takes place every day and everywhere, with no definite publication dates. The difference between a simple personal home page and some popular document is not technical - both could be single HTML files accessible to anyone on the Internet. There are no strict borders - a document can slowly move from one extreme to the other as it becomes more interesting and better known. Copies of the publication can be made with ease, typically one at a time, and in greatly varying forms. In addition, the contents of a publication may vary, since the author may update it or the user may even customize the contents, e.g. by requesting some HTML elements to be ignored. In the future, publications themselves may support customizability.

The common idea of designing Web pages in a newspaper-like format is therefore inherently archaic: it binds itself to a primitive system of mass production of copies, and that system doesn't even work, since people do not use the same kind of hardware, software, and settings as the designers. (It would be impossible for them to do so, because different designers work on different systems.) Far from being progressive, it essentially carries the limitations and features dictated by the old medium over to a new one. It is, in a deeper sense, similar to putting typewriter-style text with fixed layout onto the Web. (It can be done.) The products of ambitious "visual design" on the Web usually don't even pass the simple mouse test: take the mouse and resize the browser window horizontally and vertically. Well-written HTML documents typically behave well in that test; in fact, it takes special skills to write documents which don't.

Towards control by readers

De gustibus et coloribus non est disputandum. (There is no disputing about tastes and colors.)

The traditional paradigm of publishing will soon be regarded as a necessary evil which is no longer necessary. For a few centuries, it was necessary for economic reasons to issue publications in thousands or even millions of identical copies. But it was contrary to the basic nature of human communication; human beings are different, they live in different environments, and they need and prefer different methods of presentation.

People employed or otherwise involved in traditional publishing presumably feel uneasy, and not without reason. They are accustomed to seeing their job and professional competence as important and essential in publishing. Now they are faced with the situation where the careful and detailed design of layout is not only unnecessary but also harmful.

Layout will not lose its importance, but it will take place on users' systems (PCs, workstations, or something else). A user, that is, a customer, will have his own layout preferences and styles, with colors and margins and fonts and so on, perhaps designed by a professional, but in any case selected by him to suit his personal preferences. Any layout by the author or by the publishing side will simply not get through; an attempt to enforce it will fail miserably, since if a document cannot be formatted according to the user's specification, it will look like a mess and the user will therefore discard it.

What is left to the author is the production of contents and structure. This has always been the author's most essential job, and it should be regarded as a relief that the author neither can nor needs to care about presentation issues.

Let's have a dream. Suppose that I'm starting to read a real newsdoc (that's my neologism for a structural, device-independent counterpart of a newspaper) using my Web browser on the screen (or as printed out on my printer, if I prefer that). The browser will obey my presentation preferences and format the entire newsdoc according to them. My personal preferences are irrelevant here, but they might include the following: display first a table of contents in pure text form, with the most important items (as designated by the editor, using structural markup) first but otherwise all headings in thematic order; text in a single column, 130 mm wide; images aligned to the left, never alongside text; video clips in a separate window, to be started on my command only; background music, if suggested by the author, to be started automatically but at low volume. Alternatively, when tired or doing some cooking, I might ask my browser to read the headlines to me, and I'd whistle when something is so interesting that I want to hear the entire text of an article. You might have entirely different preferences - and get them obeyed.

Exactly because visual presentation and other external expressions of the contents of a document are so important, they should be left to the customer (who ultimately pays the bill, by the way, whether as a paying customer, a taxpayer, or a target of advertising).

Notice that I have not discussed customization of contents, which might involve, in the simplest case, the exclusion or inclusion of material marked as "technical details" according to the user's preferences. Such features will become very important, and they involve interesting questions about the roles of authors and readers, but my theme here has been just the customization of visible (or audible) presentation.

What are "contents" and "structure"?

Thus far I have intentionally formulated my propositions in a way that has probably caused me to be seriously misunderstood. Now it is time to clear things up.

"Contents and structure" does not mean plain boring text with a few different types of headings. Contents includes figures, images, photographs, formulas, music, videos, and lots of other forms of communication, most of which have not been invented yet. Structure is not only a simple hierarchical division into sections and subsection; it includes the relationships of various kinds and pieces of contents with each other and with the contents and structure of other documents. And what looks like an ordinary page of text would, in this new mode of thinking, actually be structured as a rather complicated entity; it might contain quotes from other texts, mathematical and scientific symbols, emphasized parts, less important parts, unordered and ordered lists of things, tables, maps, and so on. Depending on the user and his environment, the various elements might each have its own style of presentation, or the entire text might be just plain text in one font, or something between the extremes.

This way of thinking does not mean that you should, as a general rule, just divide the document into parts according to the internal logical structure of the subject matter and fill this hierarchical, analytical structure with text contents. Even less does it mean that you take a long piece of plain text and put a few headings (like Introduction and Test arrangements and References) here and there. Such a paradigm of document authoring seems to be dominant in the production of research reports and many other papers, and it may have a restricted area of applicability. But the normal way should be to pay a lot of attention to

Thus, journalists may still have jobs. In fact, they will be desperately needed - but only if they are able to adopt entirely new forms of expression. They will have to learn to speak fluent HTML. They will lack fonts, colors and layout; they will have to concentrate on the structure and contents. This means, inevitably, that they will have to learn to think and communicate more abstractly. Whether they use some version of HTML or some other notation for structural relations is less relevant. The important thing is that their language will be a lot richer in expressions for structural relations, and it will lack expressions for presentation issues.

Some possible counterarguments

Naturally, some authors say that they simply do not want readers to have control over presentation. Authors may regard their works as pieces of art produced by them, not by readers or even jointly by author and readers. For graphic art, this sounds quite plausible. The point is that there are a large number of formats and ways to distribute such art exactly as designed by the author. The author can even use the Internet and the Web for distributing such works as images, using HTML as "hyperglue" only (e.g. providing an HTML file which only contains a simple list of names of the works, each being a link to an image). But using HTML for creating graphic art means getting the worst of both worlds: layout control which is lousy for any true art, yet "powerful" enough to confuse several browsers - and people.

Most people who advocate the use of layout control in HTML or as attached to HTML (such as author's style sheets) do not seem to think in terms of ambitious artistic design. Rather, they are worried about getting minor presentation conventions obeyed, such as indenting the first line of a paragraph, having two spaces after a full stop which ends a sentence, or presenting cited book titles in a small-caps font. Such requirements arise from a lack of understanding of the global nature of the Web.

There are various styles with regard to such presentation conventions, each seeming natural, perhaps the only right one, to people accustomed to it. Some people really recognize paragraphs by one blank line between them and no indent. And although most people will probably recognize paragraphs in either style, after some initial confusion, the mental process may distract the reader's attention from the form and content. It is just as people can understand someone speaking English with a distinctly French accent, yet feel slightly disturbed. Being slightly disturbed all the time while reading or listening can make the difference between understanding and not understanding in borderline cases, or the difference between being motivated enough or not in many cases. Thus, it is in the interests of the author not to disturb the reader by insisting on the author's presentational preferences.

Couldn't a Web author at least suggest presentation features? Wouldn't it sometimes be useful to suggest that a particular font face or color or multi-column format be used, for example?

As a matter of principle, notice that by doing so you support and encourage the idea of the author's dominance over presentation. Other people may learn from you, and they might be less careful in using the features.

In rare cases, it might make sense to give some presentation suggestions. They should be features that are unlikely to cause any harm on browsers which do not support them, such as an ALIGN=CENTER attribute in the main heading of a document or an author's style sheet suggesting that VAR elements be rendered in italics (a suggestion which exists in the official HTML specification but is not obeyed by default by all popular browsers). But you should feel reluctant, not proud. You are using them for lack of anything better, i.e. for lack of appropriate structural markup in standard HTML and common support for it.

As an example, consider the use of different background colors for various elements. You might have some mathematical equations, some of which should be memorized by the reader while others exist as reference or background material only. (Assume for simplicity that the equations themselves are so simple that they can be presented using current standard HTML, something like a² + b² = c².) It may sound like a good idea to use different background colors; currently you might use style sheets or nonstandard but commonly supported HTML features. The problem is that if you declare, say, a white background for normal formulas and a yellow background for the important ones, you lose everything on any medium which does not support those colors, such as monochrome monitors, speech-based user agents, and rendering to Braille. You might also irritate some readers whose taste in colors differs from yours. Thus, when using such features please bear in mind that what you really need is a general way of marking some elements as important, and good browsers which support it cleverly.

A common motivation for setting up some particular layout for Web pages is the need for uniformity, "company look". (The word company should be understood generically here; it could equally well refer to an institution or even a private person.) There is a good point behind this idea. When I visit a lot of interlinked Web documents, I would very often like to see, at a glance, where I am in terms of page owner and officiality.

But "company look" is not the right answer. It makes the recognizability of pages dependent on presentation, which in any case varies a lot. And a "company look" says nothing to a casual visitor who sees it for the first time. The HTML language should be made more structured, and this includes requiring the presence of indispensable metainformation like source, author, officiality, language, abstract etc. The key question is not how the author could provide a company look but how the reader can check what company the information comes from and have this displayed in a manner the reader likes. Some users might prefer seeing a company logo on each page (and for this purpose the address of a suitable logo might be one part of the metainformation); some others might prefer just the company name since they think logos are usually awful; occasionally one might even wish to have the company main page displayed in a separate window alongside each document from that company; and still others might let such information stay in the background, to be popped up when desired.

"Newspaper look" is something which people often mention as a motivation for presentation control by authors. Some people argue that by using only simple structural markup you limit yourself to dull, antiquated, visually poor style, going backwards in time and ignoring journalistic layout design.

Now, obviously, if nice presentation is nice, it is nice to have it for all documents, so we are back to the need of having all documents presented nicely on users' screens or other media. Journalists may say that presentation issues cannot be separated from structure and content, so presentation must be designed for each concrete publication and issue separately. This might be true for some publications; consequently, such publications should not be published as HTML documents but by other methods. (They might still be published on the Web, for example as PDF or PostScript files.) On the other hand, for an enormous number of publications such a distinction can and should be made.

Newspapers use a multicolumn format and a mosaic-like layout, with distinct stories scattered around a page. There are people who try to imitate this without understanding that newspapers have their style because of their physical format, which has its own reasons - reasons of newspaper production.

It is true - and an important thing indeed - that columns should not be too wide for good readability. Different people prefer different widths, and the optimal width depends on the font size. (Thus, a good Web browser should provide a simple user option for setting text width, without deriving it automatically from the total window width.) On the other hand, multicolumn format is neither optimal nor an aim in itself (except in rare cases where one really wants to present texts in parallel for comparison) but a consequence of the physical format. If you have large pages, you have to fill them.

Admittedly, a newspaper page layout can be useful for getting an overview of what there is for me to read - I can quickly glimpse the headings and perhaps read a few emphasized words here and there, then select what I am going to read in detail.

But on the Web, there are two fundamental reasons for not attempting a newspaper look:

  1. a screen (or, more accurately, a browser window on a screen) is not a newspaper page; its dimensions vary and are unknown to the author, and it is typically so much smaller than a newspaper page that a newspaper look looks silly there
  2. the hypertext concept (with links), together with the logical markup for headings, emphasis, etc., is the Web way of achieving the same things which a newspaper look achieves (when properly used in applicable media).

Finally, one may argue that if document layout is left to browsers and users only, each browser will have different aspects of presentation that can be controlled by the user and different methods of doing it. In a sense, each browser would have its own syntax and semantics for (users') style sheets. My reply is that this need not be more disastrous than differences between browsers in general. Moreover, why shouldn't it be possible to standardize this? The style sheet model would be essentially simpler than the Cascading Style Sheets concept, so it should at least be technically easier to standardize.

Well, that wasn't quite the end of my story. On May 25, 1997, Liam Quinn made a very interesting remark in a discussion in the comp.infosystems.www.authoring.html newsgroup. We were discussing various styles of paragraph presentation, such as having the first line indented vs. one blank line and no indent. I wrote:

- - although most people will probably recognize paragraphs using either of style, after some initial confusion, the mental process may distract the readers attention from the form and content. Just as people can understand someone speaking English with a distinctly Frenc[h] accent, yet feel slightly disturbed.
and Liam responded:
... or slightly attracted. Some people find some foreign accents alluring and pleasant to listen to. Hence we have author style sheets, so that the author can suggest an accent for those in the mood to hear something different. And of course the reader who has problems understanding foreign accents, or who simply prefers "unaccented" English, there is the option of overriding the author's suggestion.

I have to think about that. Could it be that, used in a careful and disciplined way, an author's presentational suggestions might assist communication by creating an "atmosphere"? It need not be just attraction. The presentational suggestions could also convey important non-verbal messages about the personality of the author, about the way he sees the subject area and feels about it, etc. (I compare this to some uses of images described in my essay on images.) Needless to say, it would be very difficult to make this work.


Jukka Korpela
Originally written 1997-05-30. Only minor technical updates since then, the last one 2002-12-10.

This document is essentially an "expanded extract" from the document Why style sheets are harmful. The reason is that when writing the critical essay on style sheets I decided to discuss these more general themes as well, and later I realized that the general discussion is probably useful as an independent document, too.