On multilingual Web sites and CSS

This document describes some problems and ideas that are relevant when a Web site (or just a Web page) is made available in several languages with a common style sheet. This is a fairly advanced topic, so I refer to my notes on creating multilingual sites and to my list of style sheet (CSS) resources for background.

The issue is how to design a style sheet that works with the different language versions. It is possible to tune the style by CSS rules specific to a particular version, but usually we would like to avoid that.
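For reference, version-specific tuning could look roughly like the following sketch. It assumes that each page declares its language with a lang attribute on the html element and that the menu has the id nav; both the id and the widths are hypothetical.

```css
/* Shared rule, used by every language version. */
#nav {
  width: 10em;
}

/* Override for the Finnish version only.
   Relies on the page being marked up as <html lang="fi">. */
#nav:lang(fi) {
  width: 14em;
}
```

Each such override is one more rule to maintain per language, which is why a common rule that works for all versions is usually preferable.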

Lengths of texts may vary a lot

Often a style sheet sets a specific width for a column of text or links. Even if the logical em unit (i.e., the size of the font) is used, problems may arise. In particular, a very brief expression (say, “FAQ”) may need to be expanded to a long one (say, “Usein kysyttyä”), perhaps because an abbreviation needs to be spelled out or replaced by a description, since the abbreviation or its equivalent is not known to readers of the translation. Even if translation of words and phrases is straightforward, differences in length can be significant.

As a rule of thumb, you should expect that the length of any string may double in translation. If you can’t afford that, due to space limitations, translators should be told about it.

Translators should be informed about the requirements of the style sheets, insofar as they have essential implications like the above. The information can be given fairly exactly, e.g. as a note that items in a menu should be shorter than 42 characters if at all possible.

Roughly speaking, if the width of an element is set to N em units, it has room for about 2×N characters. This assumes that normal mixed case is used in the text.
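As a worked instance of this estimate, a menu whose items should stay under about 42 characters would need a width of about 42 / 2 = 21 em. The selector below is hypothetical, and the figure is only as good as the rule of thumb itself.

```css
/* Hypothetical menu column.
   Target: items of up to about 42 characters on one line.
   Estimate: 2 characters per em, so 42 / 2 = 21em. */
#nav {
  width: 21em;
}
```

Read the other way, a column that can only be 10em wide leaves room for about 20 characters, which is the kind of limit translators should be told about.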

Ordering of phrases may change

At the low level of markup, phrase level, various inversions are possible. In English, we can say “Treasure Island by Stevenson” or “Stevenson’s Treasure Island”, but in Finnish, for example, there is no preposition corresponding to “by”, and a genitive expression must be used, unless you write ungrammatically or rephrase the text thoroughly. Thus, if the texts “Treasure Island” and “Stevenson” are marked up as elements, e.g. as links, the order of the elements will be different in the translation. This might matter if you use advanced CSS 2 selectors, for example.
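A sketch of how this can go wrong, assuming hypothetical markup in which both the title and the author are links inside a paragraph with class ref. The Finnish wording in the comment is only illustrative.

```css
/* Assumed markup, English version:
     <p class="ref"><a>Treasure Island</a> by <a>Stevenson</a></p>
   Assumed markup, Finnish version (genitive first):
     <p class="ref"><a>Stevensonin</a> <a>Aarresaari</a></p> */

/* Fragile: meant to italicize the book title, but it selects
   whichever link comes first. In the Finnish version that is
   the author's name. */
.ref a:first-child {
  font-style: italic;
}

/* More robust: mark the title with a class (hypothetical name)
   and select on that, independently of element order. */
.ref a.title {
  font-style: italic;
}
```

The same concern applies to the adjacent sibling combinator (a + a), which also depends on the order of elements.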

Word length affects paragraph formatting

Web browsers normally don’t break words across lines. This means that when the available width is small, the visual appearance can be rather poor in languages that use long words.

This is particularly relevant if the text is justified on both sides. Consider the following examples, a short text in three languages, formatted in an 8.5em wide column and justified. For English, the spacing between words is not quite nice in all cases; for German or Finnish it is often outright maddening.

Littering a dark and dreary road lay the past relics of browser-specific tags, incompatible DOMs, and broken CSS support.

Wir blicken zurück auf den dunklen Weg vergangener Relikte wie browserspezifischen Tags, inkompatiblen DOMs und einer brüchigen CSS Unterstützung.

Pimeän ja kolkon tien varrella lojuu roskia – selainriippuvien tägien, yhteensopimattomien DOMien ja rikkinäisen CSS-tuen jäännöksiä.
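The formatting described for these samples can be sketched as follows. The class names are hypothetical, and the second rule is just one possible mitigation, not something taken from the original examples.

```css
/* The narrow justified column described above. */
.sample {
  width: 8.5em;
  text-align: justify;
}

/* One possible mitigation (an assumption): align left instead,
   so that long words leave a ragged right edge rather than
   wide gaps between words. */
.sample.ragged {
  text-align: left;
}
```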

Text in images means a lot of work

It has been common on the Web to include text in image format, so that its exact appearance can be specified. Sometimes even background images are used to present textual information.

In multilingual authoring, such approaches mean that images need to be generated for each and every version. And the process must be repeated after each modification of the text.

CSS currently has fairly good tools for specifying the visual appearance of texts, and this would save a lot of work as compared to using images. The tools are probably not satisfactory for logos and other texts for which a particular visual form is essential, but this is not a serious problem since such texts are typically proper names or abbreviations that usually remain invariant in translations, at least in logo-like usage.
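As a minimal sketch of what this can mean in practice, a heading that might otherwise have been made into an image could be styled along these lines. The class name, fonts, and colors are arbitrary choices for illustration.

```css
/* Hypothetical banner heading styled as text rather than as an image.
   The text stays in the HTML, so only it needs to be translated. */
h1.banner {
  font-family: Georgia, "Times New Roman", serif;
  font-size: 2em;
  font-variant: small-caps;
  letter-spacing: 0.1em;
  color: #630;
  background: #fed;
  padding: 0.3em 0.6em;
}
```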

When an image contains texts, like a map contains names of places, we could sometimes use one base image without the texts and have the texts as textual content, to be positioned over the image. This takes some work, but once done, the translations can be prepared by translating just the texts. The following demo illustrates this. It has the same treasure map in two languages, using the same image and the same markup, just with texts replaced in the HTML file.

[Treasure map]
English version: The map has the following targets indicated in a manner that works on CSS-enabled browsers only:
Safe entry
Dangerous reef

Finnish version: Karttaan on merkitty seuraavat kohteet tavalla, joka toimii vain CSS:ää tukevissa selaimissa:
Vaarallinen riutta
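The overlay technique described above might be sketched as follows. This is not the style sheet of the demo itself: the markup, class and id names, image file name, dimensions, and coordinates are all assumptions made for illustration.

```css
/* Assumed markup:
     <div class="map">
       <span class="label" id="entry">Safe entry</span>
       <span class="label" id="reef">Dangerous reef</span>
     </div>
   Only the text inside the spans changes between language versions. */

/* Container sized to the text-free base image (hypothetical file). */
.map {
  position: relative;
  width: 400px;
  height: 300px;
  background: url(map.png) no-repeat;
}

/* Labels are taken out of the normal flow and placed over the image. */
.map .label {
  position: absolute;
  font-size: 0.8em;
}

/* Positions are relative to the top left corner of the container. */
#entry { left: 40px;  top: 220px; }
#reef  { left: 250px; top: 60px; }
```

Since each label is anchored at its top left corner, a longer translated text grows to the right and downwards, so positions should leave room for that.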

Stay tuned to cultural differences

Language and culture are not the same thing, but translations typically need to be written for people who live in a particular cultural environment. This may involve differences in views on the meanings of symbols and colors and other presentational features. There’s a review of some of the differences in the CEN Workshop Agreement European Culturally Specific ICT Requirements (CWA 14094; in PDF format).

This document is largely based on the author’s reflections while preparing the Finnish translation of the text of css Zen Garden, namely css Zen Garden: CSS-muotoilun kauneutta.