Translation-friendly authoring
Automatic translation and HTML
As an example of how modifications to a document can improve translatability, I have taken a short page that presents some numerical and other facts about the university where I work, HUT. Such simple fact pages might be expected to be relatively easy to translate, since they contain no grammatically complex structures. Moreover, automatic translatability matters here, since such pages can interest people who speak different languages, and one hardly wants to allocate resources to maintaining such pages in many languages by hand.
Note: The example document, its modified form and their translations are not embedded in this document; links to them are provided instead. In a typical graphical browser, such as Internet Explorer or Netscape on Windows, you can click a link with the right mouse button (instead of the usual left button) and then select Open in New Window from the pop-up menu that appears. You can then move the new window to another position on the screen and resize it suitably, e.g. so that you can view different versions side by side.
The original page is a short fact sheet, Helsinki University of Technology in a Nutshell. In its French translation by Babelfish, there are several obvious failings (most of which you probably notice even if you don't know French):
In other translations, there are similar failings but also some different problems. For example:
ä circumvents the problem.
In order to solve some of the problems detected, I constructed an experimental modified page by applying the methods described in the first section (guidelines on natural language usage and guidelines on HTML markup). Its French translation (by Babelfish) is considerably better than that of the original. The remaining flaws (such as "professeurs d'associé" instead of "professeurs associés") are probably things that can be fixed only by improving the translation program.
Notes on the changes:
ADDRESS element, which Babelfish treats in "don't translate this" mode. The illogical use of ADDRESS for something that isn't really a normal address thus causes unwanted effects in automatic translation. In the first of the two ADDRESS elements, the tags were simply removed. In the latter, they were replaced by SMALL tags; it seems natural to suggest that technical information about the maintenance of a document should appear in a smaller font than normal.
ADDRESS element, namely the abbreviation HUT) need to be protected from any attempt to translate them. This was done using the "SAMP hack"; its drawback is that words so marked are presented in a monospaced ("typewriter") font by default on many browsers. Style sheets are used to suggest another rendering, small-caps.
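The markup changes in these notes are mechanical enough to be applied by a script. The text does not say how the modified page was produced, so the following Python sketch is only a hypothetical illustration: it replaces ADDRESS tags with SMALL tags, wraps protected terms in SAMP (relying, as the "SAMP hack" above does, on Babelfish leaving SAMP content untranslated), and inserts a style rule suggesting small-caps. The function names, the PROTECTED_TERMS list and the exact CSS rule (including font-family: inherit to undo the monospaced default) are assumptions, not part of the original page.

```python
import re
from typing import Iterable

# Hypothetical list of terms to keep untranslated; the text names the abbreviation HUT.
PROTECTED_TERMS = ["HUT"]

# Assumed style rule: undo the default monospaced SAMP font and suggest small-caps.
SAMP_STYLE = ('<style type="text/css">'
              'samp { font-family: inherit; font-variant: small-caps; }'
              '</style>\n')

# Elements whose content is plain text, so no markup may be inserted into it.
RAW_TEXT_ELEMENTS = {"title", "script", "style"}

# Splits a page into alternating text and tag pieces (assumes no ">" inside attributes).
TAG_SPLIT = re.compile(r"(<[^>]*>)")
TAG_NAME = re.compile(r"<(/?)([A-Za-z0-9]+)")


def address_to_small(html: str) -> str:
    """Replace every <address> and </address> tag with <small> and </small>."""
    return re.sub(r"<(/?)address\b[^>]*>", r"<\1small>", html, flags=re.IGNORECASE)


def protect_terms(html: str, terms: Iterable[str]) -> str:
    """Wrap whole-word occurrences of the terms in <samp>, in text content only."""
    terms = list(terms)
    if not terms:  # an empty alternation would match everywhere
        return html
    word = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    parts = TAG_SPLIT.split(html)
    in_raw_text = False
    for i, part in enumerate(parts):
        if i % 2 == 1:  # odd indices hold tags, comments and declarations
            m = TAG_NAME.match(part)
            if m and m.group(2).lower() in RAW_TEXT_ELEMENTS:
                in_raw_text = m.group(1) == ""  # start tag enters, end tag leaves
        elif not in_raw_text:  # even indices hold text between tags
            parts[i] = word.sub(r"<samp>\1</samp>", part)
    return "".join(parts)


def add_samp_style(html: str) -> str:
    """Insert SAMP_STYLE just before the first </head> tag, if there is one."""
    return re.sub(r"</head>", lambda m: SAMP_STYLE + m.group(0), html,
                  count=1, flags=re.IGNORECASE)


def prepare_for_translation(html: str) -> str:
    """Apply the three hypothetical preprocessing steps in sequence."""
    return add_samp_style(protect_terms(address_to_small(html), PROTECTED_TERMS))


if __name__ == "__main__":
    page = ("<html><head><title>HUT facts</title></head><body>"
            "<p>This page describes HUT.</p>"
            "<address>Maintained by the webmaster</address></body></html>")
    print(prepare_for_translation(page))
```

Splitting on tags with a regular expression keeps attribute values (such as a link to HUT.html) and the TITLE text untouched without a full HTML parser, which is adequate for a simple fact page; for pages containing scripts or ">" inside attribute values, a real parser such as Python's html.parser would be safer. Note also that the sketch converts every ADDRESS element to SMALL, whereas the modified page simply dropped the tags of the first one.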
You may wish to compare the presentation of the modified document (in English) in your browser with a screenshot of how it looks in one browsing situation: Internet Explorer 4.0 with style sheet support enabled. (Even there it isn't quite what it should be, due to deficiencies in the browser's style sheet support.)
You may wish to look at the other translations of the modified document:
German translation | Italian translation | Portuguese translation | Spanish translation
The Portuguese translation is the most problematic. In addition to the "nutshell" problem mentioned above, changing the English spelling "vicerector" to "vice-rector" introduced a new problem: the word is now translated as "vice-vice-rector"!