Proposed improvements to Babelfish
The following list indicates some deficiencies and problems
that have occurred to me when using Babelfish.
The list is by no means exhaustive, nor even systematic.
- Instead of simply translating from one language to another,
the languages being specified by the user,
a translation program should operate on a
multilingual basis, not a bilingual one.
That is, it should accept data in which several languages
may appear, and it should produce a result in which several
languages may be used, depending on the user's language
preferences. For example, a user might request
a translation into Finnish with texts in Swedish and
English passed through as such; and for other languages, if
direct translation into Finnish is not available,
the user might prefer an English translation to a Finnish
translation produced from the original via English.
- Babelfish seems to ignore the LANG attribute
entirely. In addition to using the LANG attribute of the
HTML element in order to determine
the basic language of the document,
a translation program should check the LANG attributes
in contained elements
and either leave texts written in languages other than the
basic source language untranslated, or translate them using
algorithms and lexica for the language specified.
- No attempt is made to translate texts in attributes like
ALT. It would be
quite essential to have them translated, too. Notice that
ALT is crucial for accessibility.
- Any text within
SAMP elements should be left untranslated by default.
- PRE elements are messed up, since
the translation does not preserve line breaks.
- Babelfish often translates words assuming a
specific technical meaning even for
words which are much more often used in another meaning.
For instance, the word "reader" gets translated into
"program de lectura" in Spanish!
- Babelfish converts notations like 3.2 in English assuming
that they are decimal numbers, making it 3,2 in French
for example. This is incorrect when the notation is actually
something else, such as a program version number.
(An addition to the HTML language might be needed to distinguish
between the cases. As an interim solution, translation
programs should refrain from attempting conversion between
different notations for decimal numbers.)
- Some ISO 8859-1 characters are not handled correctly
in some cases.
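
Several of the points above (honoring language markup on contained elements, passing SAMP text through, keeping the line breaks of PRE, translating ALT texts, and choosing a target language from the user's preferences) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not a description of how Babelfish or any real translation program works: the names SegmentCollector, choose_action, KEEP_VERBATIM, LINE_PRESERVING and TEXT_ATTRS are hypothetical, the language attribute is assumed to be LANG, and no actual translation is performed. The code merely decides what should happen to each piece of text.

```python
from html.parser import HTMLParser

# Hypothetical policy tables, chosen for illustration only.
KEEP_VERBATIM = {"samp"}    # text passed through untranslated
LINE_PRESERVING = {"pre"}   # text to be translated line by line
TEXT_ATTRS = ("alt",)       # attribute values that are readable text
VOID = {"img", "br", "hr", "input", "meta", "link", "area", "base", "col"}


class SegmentCollector(HTMLParser):
    """Collect (language, mode, text) triples from an HTML document.

    Simplification: optional end tags (e.g. a missing </p>) are not
    modelled, so language inheritance can be wrong in such markup.
    """

    def __init__(self, default_lang="en"):
        super().__init__()
        self.default_lang = default_lang
        self.stack = []      # (tag, lang, mode) for each open element
        self.segments = []

    def _context(self):
        if self.stack:
            return self.stack[-1][1], self.stack[-1][2]
        return self.default_lang, "translate"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        lang, mode = self._context()
        lang = attrs.get("lang") or lang   # a LANG attribute overrides
        if tag in KEEP_VERBATIM:
            mode = "keep"
        elif tag in LINE_PRESERVING and mode != "keep":
            mode = "translate-lines"
        for name in TEXT_ATTRS:
            if attrs.get(name):
                self.segments.append((lang, "translate", attrs[name]))
        if tag not in VOID:
            self.stack.append((tag, lang, mode))

    def handle_endtag(self, tag):
        # Pop back to the matching open element, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            lang, mode = self._context()
            self.segments.append((lang, mode, data))


def choose_action(source, target, passthrough, direct_pairs, fallback="en"):
    """Pick an action for text in `source` under the user's preferences.

    `direct_pairs` is a set of (from, to) pairs for which direct
    translation is assumed to be available.
    """
    if source == target or source in passthrough:
        return ("keep", source)
    if (source, target) in direct_pairs:
        return ("translate", target)
    if (source, fallback) in direct_pairs:
        # Prefer a direct translation into the fallback language over
        # a translation into the target made via the fallback.
        return ("translate", fallback)
    return ("keep", source)
```

For the Finnish example above, a call such as choose_action("fr", "fi", {"sv", "en"}, {("de", "fi"), ("fr", "en")}) yields an English translation, since no direct French-to-Finnish pair is listed, whereas Swedish text is kept as such. A real program would feed each collected segment, together with the chosen action, to its translation engine, and for "translate-lines" segments it would translate each line separately and rejoin the lines with the original line breaks.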
To be continued...