Customized browsing: a proof of a concept

A common Web browser such as Internet Explorer can easily be turned into a customized "content extractor" that displays just selected parts of documents. In fact, one can say that such customizability has been built into IE at least since version 3, via support for Cascading Style Sheets (CSS).

A simple customization: grab a few elements, ignore the rest

Consider the following example, illustrated by the screenshot image above: A user wants to tune his browser to show documents so that 1st, 2nd, 3rd, and 4th level headings and tables are shown normally, and information marked as address is shown in reduced-size font, but everything else is ignored.

Although the actual usefulness is not the point here, it should not be too difficult to imagine situations where "compactification" of this kind might turn out to be useful. For example, if you intend to view a large set of documents returned as a response to a Google search, you might prefer a fast way of looking at the most important information only. And in studying a particular topic, you might decide that something like the above is a good way of achieving that. It might happen that not all documents have adequate markup for headings, and some of them might use tables for non-tabular data; but the point here is not perfection but usefulness.

You could write a user style sheet, which is a plain text file in a simple format, like the following:

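/* Hide block-level elements that are not wanted in the overview. */
blockquote, center, del, dir, dl, form, frame,
  h5, h6, hr, ins, isindex, menu, noscript, ol, p, pre, ul
  { display: none !important; }
/* Show headings of levels 1 to 4 and tables normally. */
h1, h2, h3, h4, table { display: block !important; }
/* Show address information in a reduced-size font. */
address { display: block !important; font-size: 80%; }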

This instructs the browser to behave as described above. The content of the style sheet can hopefully be understood rather intuitively if you know some basics of HTML, mainly what the listed element names roughly stand for. The first list of elements basically consists of all block-level elements except those that should actually be shown. (Technically, this could be made easier if IE had slightly better CSS support.)

To make IE use the style sheet, you would just select (on IE 5) Tools, then Internet Options, then Accessibility, check the relevant checkbox, and specify the style sheet file, either by browsing for it or by typing its name. Voilà!

You can later switch back to the "normal" state in the obvious way, i.e. by unchecking the checkbox. On more advanced browsers, such as Opera, it is possible to toggle between "author preferences mode" and "user preferences mode" by simply clicking on an icon. On Netscape 6, the user can select between different styles, and an author could provide, for example, the "overview style" discussed here as an alternate style sheet.

The future: more specific customization

The possibilities of customizing the presentation using CSS depend heavily on the expressive power of the version of CSS in use (as implemented in a browser). The current version, CSS2, is fairly large but not that powerful for purposes of content extraction. The main limiting factor is the set of selectors available. A selector is an expression that specifies the set of page elements that a rule (like display:none or font-size:80%) applies to.
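
As a rough illustration of what selectors look like, the following sketch pairs a few CSS2-level selectors with simple rules. The class name note is a made-up example, not something taken from any particular page.

```css
/* Type selectors, grouped: applies to every h1 and h2 element */
h1, h2 { font-size: 120%; }

/* Descendant selector: caption elements anywhere inside a table */
table caption { font-weight: bold; }

/* Class selector: p elements with class="note" (hypothetical class name) */
p.note { display: none; }

/* Child selector: li elements that are direct children of an ol */
ol > li { font-size: 80%; }
```

In each line, the part before the braces is the selector, and the part inside the braces says what is applied to the selected elements.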

In CSS3, currently under construction, there will probably be several extensions to the selector syntax. They will let a person writing a style sheet identify elements in more elaborate ways, corresponding to notions like "the first line of each paragraph" or "items 3 and 5 in a list".
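
For instance, the two notions mentioned above might be written roughly as follows. This is only a sketch: it assumes the selector syntax of the CSS3 Selectors module (the :nth-child() pseudo-class and the double-colon ::first-line notation), and browser support for it cannot be taken for granted.

```css
/* Items 3 and 5 in a list: li elements that are the third or the
   fifth child of their parent */
li:nth-child(3), li:nth-child(5) { font-weight: bold; }

/* The first formatted line of each paragraph */
p::first-line { font-variant: small-caps; }
```

The declarations (bold, small caps) are arbitrary choices made for illustration; the selectors are the point.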

For individual pages, such as a page that is often viewed by the user and that has rapidly changing content that interests him, quite specialized "customized viewing" might be possible at present too, and especially in the future. The user might look at the HTML markup, figure out how its basic structure remains the same, and then write a special user style sheet for viewing it so that only the interesting (to this user) parts are shown, perhaps highlighting some content that is not highlighted in the original, etc.
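
A page-specific user style sheet of this kind might look something like the sketch below. All id and class names in it (banner, navigation, footer, scores, changed) are hypothetical; a real style sheet would have to use whatever names the markup of the page in question actually contains.

```css
/* All id and class names here are hypothetical; a real style sheet would
   use whatever names the page's markup actually contains. */

/* Hide the parts this user never reads */
#banner, #navigation, #footer { display: none !important; }

/* Highlight table cells that the page marks as changed */
#scores td.changed {
  background: yellow !important;
  font-weight: bold !important;
}
```

Since a user style sheet normally applies to every page viewed, selectors this specific reduce the risk of affecting other pages but do not eliminate it.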

Perhaps such a mode of viewing will never become very popular, since it requires some technical understanding of both HTML and CSS. But tools can be developed for "customized viewing" via simple graphical user interfaces, so that the user just points at parts of a page and specifies how they are to be treated.

The Malibu Anti-Portal (MAP) system illustrates this, in a context that is more demanding than just browser customization: MAP delivers the content of user-selected portions of a page to a mobile phone as SMS text messages, either as timed "push" or by user-initiated "pull". (MAP actually operates at a more advanced level and can heuristically identify document structures beyond what HTML markup indicates. The point here is that selection of page content, as locations rather than just current actual content, can be made using a simple user interface that hides the technical details fairly well.)

Some implications: freedom of browsing

One of the inherent characteristics of digital data transmission is that when delivered to the recipient, the data is processable as the user sees fit. Using sufficiently complicated (proprietary) data formats, the processing task can be made more difficult, perhaps infeasibly difficult.

The Web has largely been based on technologies that do not create obstacles to customization. They might instead actively favor customization. Content providers may wish to use technologies that lack such properties, creating more rigid documents that are intended to be viewed "as is" only. But simpler, more customization-friendly technologies will probably remain the mainstream, in part because of their simplicity.

There's nothing really fancy about customizability and selectability. When you open a newspaper, you don't, as a rule, intend to read or even view all of it. Admittedly, you can't change the font face and size, as you could on the Web. But the selection process works, though only "manually", unless you can afford to pay someone to read newspapers for you and pick out the interesting stuff according to your instructions. Automated tools for content selection in browsing pages mean, in a sense, that anyone can afford a servant who delivers some selected content to the master, as instructed by the master.

Notes on the role of CSS

Style sheets have conceptually been part of Web technology since its very early days. Detailed specifications of style sheet technologies, and especially implementations in browsers, are newer, but despite the serious problems in implementations, style sheets are actually used widely. User style sheets are less widely known, but they are an integral part of all CSS specifications, and increased use is to be expected.

The precedence between user style sheets and author style sheets has varied in CSS specifications, but CSS2 set the balance so that for normal rules, a declaration in an author style sheet overrides a conflicting declaration in a user style sheet. But CSS has a mechanism for making a rule in a style sheet "important", in a technical sense, and for "important" rules, the user style sheet takes precedence. This boils down to the principle that a user is expected to have the final word on the appearance (visual presentation) of a document, down to the finest detail, if he really wants to.
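
The CSS2 precedence rules described above can be sketched with a pair of conflicting style sheets. The file names author.css and user.css are just labels for this illustration, and the two parts are meant to be separate files, not one style sheet.

```css
/* --- author.css: the style sheet supplied with the page --- */
p { color: gray; font-size: 10px; }

/* --- user.css: the user style sheet set in the browser --- */

/* Normal declaration: under CSS2 it loses to the author's color: gray */
p { color: black; }

/* Important declaration: under CSS2 it wins over the author's
   font-size, even if the author had marked that one !important too */
p { font-size: 16px !important; }
```

With these two files, paragraphs would be gray (the author's choice) but set in 16px type (the user's choice).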

Technical notes on the sample style sheet used

The original version of the sample style sheet discussed here contained the div element among those to which display:none is to be applied. However, as Ben Meadowcroft kindly pointed out to me, the CSS2 specification says:

9.2.5 The 'display' property

[...]
none: This value causes an element to generate no boxes in the formatting structure (i.e., the element has no effect on layout). Descendant elements do not generate any boxes either; this behavior cannot be overridden by setting the 'display' property on the descendants.

Thus, my original style sheet would, for example, turn off the display of headings if they are enclosed in div elements. And using div to divide the document into logical sections is often regarded as a good authoring principle, or a nice tool for styling, or both.
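
The problem can be reproduced with a reduced sketch of such a style sheet; the markup mentioned in the comment is an invented example.

```css
/* Hiding div also suppresses everything inside it... */
div { display: none !important; }

/* ...so this rule has no effect on headings nested in a div.
   Given markup like <div><h2>News</h2></div>, the h2 generates
   no box, because display: none on an ancestor cannot be
   overridden by setting display on a descendant. */
h1, h2, h3, h4 { display: block !important; }
```

Leaving div out of the display:none list, as the current sample style sheet does, avoids this.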

On the other hand, the sample style sheet does not prevent the display of "loose" text inside a document, i.e. text that is directly inside the body element, without any intervening markup such as a p. Thus, the approach works best for well-structured documents.