Techniques for multilingual Web sites:
Language negotiation and settings in a server (with Apache as an example)

Whether and how an author can make versions of a page in different languages available through the language negotiation mechanism depends on the server software and its settings. Here we discuss only the methods available in one widely used server, Apache, and mainly just one of its two alternative methods. As regards other servers, see WebServer Directory by WebServer Compare for links to the original documentation of various server software.

The alternatives on Apache

The Apache documentation contains the section Content Negotiation, which describes two basic methods:

Multiviews
The alternative versions are in the same directory and are named in a uniform way. The author specifies a general rule by which a generic URL is mapped to the file names of the different versions.
type-map
For each generic URL there is a separate file that lists the corresponding language-specific file names, possibly with some associated properties (e.g. the encoding of the file).

For a somewhat more detailed description of these methods, see the CERN page Language Negotiation.

Using Multiviews

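If Multiviews is enabled for a directory on Apache (it is not among Apache's built-in default options, so it has to be switched on in the server configuration or, where permitted, in .htaccess with Options +MultiViews), then you can use language negotiation there in the following, though somewhat limited, manner: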
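As a rough illustration of the general idea (not of Apache's own code), the following Python sketch models how Multiviews maps a generic name to language-specific files. It assumes the naming convention index.html.en, index.html.fi and so on, and the function name find_multiviews_variants is hypothetical. In Apache, Multiviews is consulted only when the requested file does not exist; the server then looks for files named after it plus further extensions, and relies on AddLanguage settings to know which extensions denote languages. The sketch replaces that with a crude "alphabetic tag" test.

```python
from typing import Iterable


def find_multiviews_variants(filenames: Iterable[str], generic_name: str) -> dict[str, str]:
    """Map language tags to files named '<generic_name>.<tag>'.

    Simplified model: real Apache learns which extensions are languages
    from AddLanguage directives instead of guessing from their spelling.
    """
    names = set(filenames)
    if generic_name in names:
        return {}  # the file exists, so it is served without negotiation
    prefix = generic_name + "."
    variants: dict[str, str] = {}
    for name in sorted(names):
        if name.startswith(prefix):
            tag = name[len(prefix):]
            # Keep only plausible language tags such as 'en', 'fi' or 'en-gb'.
            if tag and all(part.isalpha() for part in tag.split("-")):
                variants[tag.lower()] = name
    return variants


print(find_multiviews_variants(["index.html.en", "index.html.fi", "notes.txt"], "index.html"))
```

A request for .../index.html would then be answered with one of the files found, chosen according to the browser's language preferences.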

A simple example of using type-map

A simple example of applying the latter method:
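As an illustration of the file format (a sketch under assumptions, not the author's own example), this Python snippet generates a small type-map. The variant names index-en.htm and index-fi.htm and the names Variant and build_type_map are hypothetical. The record layout follows the type-map format described in Apache's mod_negotiation documentation: URI:, Content-Type: and Content-Language: lines, with records separated by blank lines. Apache is told which files are type-maps with a directive such as AddHandler type-map .var.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    uri: str                        # variant file, relative to the type-map
    language: Optional[str] = None  # None: no Content-Language line
    content_type: str = "text/html"


def build_type_map(variants: list[Variant]) -> str:
    """Return the text of a type-map file with one record per variant."""
    records = []
    for v in variants:
        lines = [f"URI: {v.uri}", f"Content-Type: {v.content_type}"]
        if v.language is not None:
            lines.append(f"Content-Language: {v.language}")
        records.append("\n".join(lines))
    # Records in a type-map are separated by blank lines.
    return "\n\n".join(records) + "\n"


print(build_type_map([Variant("index-en.htm", "en"), Variant("index-fi.htm", "fi")]))
```

The generated text would be saved as the file that the generic URL refers to, in the same directory as the variant files.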

Explanations of the versions

If a browser sends language preferences under which none of the versions is acceptable, Apache sends back the HTTP status code 406 Not Acceptable. This in itself can be somewhat confusing, and the terseness of the accompanying message causes further problems: the text sent along with the error message does contain links to the alternative versions, but each link shows only the relative URL and the language, given as a two-letter language code.
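To make the 406 case concrete, here is a simplified Python model of language selection. It parses an Accept-Language header with q-values and returns the best matching variant, or None in the situation where Apache would answer 406 Not Acceptable. The function names and file names are hypothetical, and this is only the principle: Apache's actual algorithm in mod_negotiation also weighs media type, charset and encoding, and has its own rules for language-tag prefixes and for variants without a language.

```python
from typing import Optional


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse e.g. 'fi, en;q=0.5' into [('fi', 1.0), ('en', 0.5)]."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # malformed q-value: treat as not acceptable
        ranges.append((tag, q))
    return ranges


def language_quality(lang: str, ranges: list[tuple[str, float]]) -> float:
    """q-value of the most specific range matching lang (0.0 if none)."""
    lang = lang.lower()
    best_len, best_q = -1, 0.0
    for tag, q in ranges:
        # 'en' matches 'en' and 'en-gb'; '*' matches any language.
        if tag == "*" or tag == lang or lang.startswith(tag + "-"):
            specificity = 0 if tag == "*" else len(tag)
            if specificity > best_len:
                best_len, best_q = specificity, q
    return best_q


def choose_variant(variants: dict[str, str], header: str) -> Optional[str]:
    """Return the file for the best language, or None (-> 406 Not Acceptable)."""
    # No usable header: every language is acceptable.
    ranges = parse_accept_language(header) or [("*", 1.0)]
    best_file, best_q = None, 0.0
    for lang, filename in variants.items():
        q = language_quality(lang, ranges)
        if q > best_q:
            best_file, best_q = filename, q
    return best_file


pages = {"en": "index-en.htm", "fi": "index-fi.htm"}
print(choose_variant(pages, "fi, en;q=0.5"))
print(choose_variant(pages, "de"))  # None: the 406 case
```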

The situation can be improved to some extent by adding to the .var file, for each alternative (below its Content-Language line), a line consisting of the keyword Description: followed by a description of the alternative, e.g. the name of the page in its own language. For example, for the English version of the main page of this documentation I have written:
Description: Techniques for multilingual Web sites
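Adding such lines by hand to every record is tedious, so here is a small Python sketch that inserts a Description: line below each Content-Language line of an existing type-map. The function name add_descriptions and the variant file names are hypothetical; only the English description text comes from the example above. The sketch matches the Content-Language value exactly, so a record listing several languages (e.g. "fr, de") would need its own dictionary key.

```python
def add_descriptions(type_map: str, descriptions: dict[str, str]) -> str:
    """Insert 'Description: ...' below each matching Content-Language line.

    descriptions maps a Content-Language value, exactly as written in the
    type-map, to the page's name in that language.
    """
    out = []
    for line in type_map.splitlines():
        out.append(line)
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-language":
            lang = value.strip()
            if lang in descriptions:
                out.append(f"Description: {descriptions[lang]}")
    return "\n".join(out) + "\n"


type_map = (
    "URI: index-en.htm\nContent-Type: text/html\nContent-Language: en\n\n"
    "URI: index-fi.htm\nContent-Type: text/html\nContent-Language: fi\n"
)
print(add_descriptions(type_map, {"en": "Techniques for multilingual Web sites"}))
```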
By adding such descriptions, you can make the server response look somewhat understandable:

Not Acceptable

An appropriate representation of the requested resource /~jkorpela/multi/index.html could not be found on this server.

Available variants:

You might be able to improve the situation further by creating a specific error page for the status code 406 and using the ErrorDocument directive to make Apache use that customized error page.
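One possible way to produce such a page is sketched below in Python: it renders a static HTML page that lists the alternatives, labelling each link with the page's name in its own language and marking it with hreflang and lang attributes. The function name, the wording, and where the page is stored are assumptions; Apache would then be pointed at it with a directive of the form ErrorDocument 406 /errors/406.html (path hypothetical). As a static file, the page shows the same list whatever was requested, so it suits a directory with a small, fixed set of pages.

```python
from html import escape


def render_406_page(variants: list[tuple[str, str, str]]) -> str:
    """Build a static 406 page.

    variants: (relative URL, language code, page name in its own language).
    """
    items = "\n".join(
        f'<li><a href="{escape(url)}" hreflang="{escape(lang)}" lang="{escape(lang)}">'
        f"{escape(name)}</a></li>"
        for url, lang, name in variants
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n<head><meta charset="utf-8">'
        "<title>No version in your preferred language</title></head>\n"
        "<body>\n<h1>No version in your preferred language</h1>\n"
        "<p>This page is available in the following languages:</p>\n"
        f"<ul>\n{items}\n</ul>\n</body>\n</html>\n"
    )


print(render_406_page([("index-en.htm", "en", "Techniques for multilingual Web sites")]))
```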

The best option, however, is probably to append a generic alternative to the list: an alternative with no Content-Language specified. The server will send such an alternative in response to a request that cannot be satisfied by any other alternative. That alternative could be a page that explains the available alternatives in English, giving their names in their own languages. The page could also, for the user's general benefit, give some advice on setting the browser's language preferences, at least by adding English to them.
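The fallback behaviour described here can be modelled in a few lines of Python. In this sketch, a variant whose language is None stands for the generic alternative (in the type-map, a record with URI: and Content-Type: but no Content-Language: line). The file name index-generic.htm and the function name are hypothetical, and preferences are given as a simple ordered list rather than q-values, so this is not Apache's actual scoring.

```python
from typing import Optional, Sequence


def select_with_fallback(
    variants: Sequence[tuple[str, Optional[str]]],
    preferred: Sequence[str],
) -> Optional[str]:
    """Pick a file for the user's languages, most preferred first.

    A variant with language None is the generic alternative; it is
    used only when no language-specific variant matches.
    """
    by_lang = {lang.lower(): uri for uri, lang in variants if lang is not None}
    for lang in preferred:
        uri = by_lang.get(lang.lower())
        if uri is not None:
            return uri
    generic = [uri for uri, lang in variants if lang is None]
    return generic[0] if generic else None  # None: 406 Not Acceptable


pages = [("index-en.htm", "en"), ("index-fi.htm", "fi"), ("index-generic.htm", None)]
print(select_with_fallback(pages, ["de"]))  # falls back to the generic page
```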

An example of such an alternative: my "generic" 404 error message page. That example is somewhat special, since error pages have specific practical requirements for their content.

Another example: language selection for this set of pages

Language negotiation for this documentation, Techniques for multilingual Web sites, has been implemented using the type-map method. (The server that was originally used by the author had been configured not to allow the use of Multiviews.) In detail, the method has been used as follows:

Because of the server's settings (in this case the default settings of Apache), when a browser sends a request for http://www.cs.tut.fi/~jkorpela/multi/ (with a trailing slash), the server first expands it to the URL http://www.cs.tut.fi/~jkorpela/multi/index.html and then begins language selection. Note that the "URL" or "Location" box in the browser still displays the original URL, since the server made the expansion internally without informing the browser; the browser has simply received the content of the document that the server selected. This does not mean that URLs like http://www.cs.tut.fi/~jkorpela/multi/index-fi.htm stop working; they just refer to specific alternatives in a fixed way, bypassing the language selection mechanism.
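The sequence just described can be modelled roughly as follows. This Python sketch assumes, as in the setup described in this section, that the DirectoryIndex file is index.html and that files whose names end in .html are handled as type-maps. It reduces Apache's request processing to two string checks, and the function name handle_request is hypothetical.

```python
def handle_request(path: str,
                   directory_index: str = "index.html",
                   type_map_suffix: str = ".html") -> tuple[str, str]:
    """Return (action, file) for a request path.

    'negotiate' means the file is a type-map and language selection
    starts; 'serve' means the file is sent as it is.
    """
    if path.endswith("/"):
        # Internal expansion to the DirectoryIndex file: the browser is
        # not told, so its address box keeps showing the URL it requested.
        path += directory_index
    if path.endswith(type_map_suffix):
        return ("negotiate", path)
    # A direct link to a variant such as index-fi.htm bypasses negotiation.
    return ("serve", path)


print(handle_request("/~jkorpela/multi/"))
print(handle_request("/~jkorpela/multi/index-fi.htm"))
```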

The Apache documentation uses the expression ".var file", but this does not mean that the file names must end with .var. The approach described above, using .html instead of .var, is a bit esoteric but handy. Note, however, that this approach cannot be applied (at least not conveniently) if the directory already contains normal HTML documents in files ending with .html whose URLs should keep working. The reason is that the server applies the type-map method to all files whose names end with the string specified in the AddHandler directive, so all such files must be type-map files. In particular, in our example the language-specific file names must not end with .html, but .htm will do just fine.
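Such a conflict can be checked mechanically before switching the handler on. The text states the rule in terms of names ending with the AddHandler string; Apache's mod_mime in fact looks at every dot-separated extension of a file name, so a file such as index.html.en would be claimed as well. The Python sketch below models that behaviour for a directory listing; the listing and the function names are hypothetical.

```python
def extensions(filename: str) -> list[str]:
    """Dot-separated extensions after the base name, lower-cased.

    Simplified model of mod_mime, which considers every extension,
    not only the last one: 'index.html.en' -> ['html', 'en'].
    """
    return [ext.lower() for ext in filename.split(".")[1:] if ext]


def type_map_files(filenames: list[str], handler_ext: str = "html") -> list[str]:
    """Files that 'AddHandler type-map .<handler_ext>' would claim."""
    return [name for name in filenames if handler_ext in extensions(name)]


listing = ["index.html", "index-en.htm", "index-fi.htm", "notes.html"]
print(type_map_files(listing))
```

Here notes.html, presumably an ordinary HTML document, would also be handed to the type-map handler, which is the situation the paragraph above warns about; the .htm variants are left alone.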

Next section: Language selection in browsers.


2003-09-08 Jukka K. Korpela