Proposed improvements to HTML and translation techniques
Proposed improvements to the HTML language
This is a very preliminary "wish list".
Most probably some of the problems discussed here should be
solved by introducing a more general construct than the one
proposed here, or solved outside HTML, e.g. by improving the
Sorry, you probably don't understand very much of this unless
you know the HTML language rather well.
- It is very important in translations that the translator
knows the context. A human translator or
an advanced automatic translator could first process the
entire text and deduce what topics it discusses, then
use this information to select suitable alternatives among
dictionary entries. It might still be useful to assist
translators by using markup which specifies the context,
for an entire document, or for a part of it, or even for
a single word or phrase. Some general classification (e.g. UDC, the Universal Decimal Classification)
might be used. When a translator knows that the text discusses
mathematics, for example, it (or he) is in a much better position
to translate terms like "field" or "product" appropriately.
In other languages, these words as mathematical terms may
have equivalents quite different from the words that correspond
to "field" or "product" in everyday language.
On the other
hand, a mathematical text might use "field" in its everyday sense.
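As a rough illustration of how a translation program might use a declared context when choosing among dictionary entries, here is a minimal Python sketch. The dictionary contents, the context names and the function `translate_term` are all hypothetical; the Finnish equivalents are included only as an example of a language where the mathematical and everyday terms differ.

```python
# Hypothetical mini-dictionary: English term -> {context: Finnish equivalent}.
# The context names ("mathematics", "general") are assumptions for this sketch,
# standing in for whatever classification the markup would carry.
DICTIONARY = {
    "field": {"mathematics": "kunta", "general": "pelto"},
    "product": {"mathematics": "tulo", "general": "tuote"},
}


def translate_term(term, context="general"):
    """Pick the equivalent for the given context, falling back to the general one."""
    entries = DICTIONARY.get(term.lower())
    if entries is None:
        return term  # unknown term: leave it untranslated
    return entries.get(context, entries["general"])


print(translate_term("field", "mathematics"))
print(translate_term("field"))
```

Because the context is passed per call, the same lookup works whether the context was declared for the whole document, for a part of it, or for a single word.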
- Markup for indicating "mode" or "style" such
as sarcasm, jocularity, or poetry might be added.
On Usenet, people sometimes use pseudo-HTML like
"<SARCASM>...</SARCASM>". Perhaps translation programs
might one day benefit from such markup. (It might also be
used e.g. by speech-based user agents in order to select
an appropriate tone of voice.)
- Introduce markup for indicating uncertainty
of enclosed text. Additional information indicating the
numerical degree of uncertainty and an explanation of the
reason for uncertainty might be introduced.
Translators - both
automatic and human - might use such markup for indicating
that the translation might be incorrect.
The explanation generated might include the corresponding fragment
in the original text.
(Notice that in our example of Finnish to English translation,
the translator uses italics to denote one kind of uncertainty.
Naturally, logical markup would be needed instead of physical
markup, but current HTML has no suitable element.)
The markup could also
be used for other purposes, such as in an HTML document containing
the text of an old manuscript where some words are less
certain than others, e.g. interpolated.
- In conjunction with uncertainties, markup with
variants could be used. That is, at the HTML level
one could specify different variants for a piece of text.
One of the variants could be marked as the "normal" one, to be
presented to the user by default. It could be distinguished in a
manner similar to links, letting the user access the other variants
and their explanations. Naturally, translators should generate
such constructs sparingly.
- Introduce "joiner" markup which means that
two or more words are to be treated as a whole. This would be
especially useful for languages like English which use
phrases consisting of several words rather than compound words.
Translation might be greatly assisted by indicating how
words belong together. (Notice that the existing
SPAN element and the no-break space
do not logically mean the same as a "joiner" markup would.)
- Introduce ORIGINAL into the set of standardized
values for the
REL attribute. It would indicate
a link to the original version from which the current version
was translated. Translator programs should leave such links
intact instead of converting them to links through a translator
(as Babelfish now seems to do to all links).
Naturally, a translator program, when asked to translate a document
to language X, should check whether the document
itself refers to its original which is written in X.
This could be important when following links in a manner which goes
through translations; it might prevent the situation where
the user gets a translation of a document
from language Y to X
instead of getting an existing original in X!
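The check described above could be sketched roughly as follows in Python. This is only an illustration under assumptions: REL=ORIGINAL is the value proposed here, not part of any standard; the sketch further assumes that the language of the original is given with the existing HREFLANG attribute; and the names `OriginalLinkFinder` and `find_original_in` as well as the example URL are made up for the example.

```python
from html.parser import HTMLParser


class OriginalLinkFinder(HTMLParser):
    """Collect links whose REL value includes the proposed keyword "original"."""

    def __init__(self):
        super().__init__()
        self.originals = []  # list of (href, hreflang) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # REL holds a space-separated list of link types; compare them
        # case-insensitively.
        rel_values = (attr.get("rel") or "").lower().split()
        if "original" in rel_values and attr.get("href"):
            lang = (attr.get("hreflang") or "").lower()
            self.originals.append((attr["href"], lang))


def find_original_in(html_text, target_lang):
    """Return the URL of an original written in target_lang, or None."""
    finder = OriginalLinkFinder()
    finder.feed(html_text)
    finder.close()
    target = target_lang.lower()
    for href, lang in finder.originals:
        # Accept "fi" as well as more specific tags such as "fi-FI".
        if lang == target or lang.startswith(target + "-"):
            return href
    return None


page = """
<html lang="en">
<head>
<title>Translated page</title>
<link rel="original" hreflang="fi" href="http://example.org/alkuperainen.html">
</head>
<body><p>Some translated text.</p></body>
</html>
"""

print(find_original_in(page, "fi"))  # the Finnish original exists: use it
print(find_original_in(page, "de"))  # no German original: translate as usual
```

A translator program would run such a check before translating, and fetch the returned URL instead of producing a translation when the result is not `None`.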
- Include markup for presenting
the author's estimate of translatability.
This should be in the HTML markup, not in metadata outside it.
For example, a prose document might contain a quotation of
a poem, and the author might wish to denote that it has
low translatability. Depending on software
and user preferences, this might imply that parts of a document
are left untranslated, or that their translations are indicated as being
less reliable than usual. At the HTML level, this might involve the use
of the "uncertainty element" mentioned above.
To be continued...
Date of last update: 1998-08-24