Translation-friendly authoring
Practical guidelines for authors
These guidelines apply to the textual content of documents,
irrespective of the presence of HTML markup or some other markup.
Most of the guidelines apply to all forms of translation: human,
automatic, or combined.
- Make your material available, at least as one option,
as a set of small pages, each consisting of a logical unit
such as a section, a subsection, or perhaps a large table.
(As a concrete practical point, Babelfish says in its help file
that it translates "a maximum of 5k of text in an html page".)
When using HTML, the pages should of course be interlinked.
"Small" means,
speaking very roughly, at most two pages when printed
on paper with typical settings. This matters mainly
for practical reasons, such as size restrictions in
freely accessible translation services and in evaluation versions
of programs, and also for efficiency: translating a short text
on its own is naturally faster than translating it as part of
a large text. (In principle, translating as part of a large text
may produce better quality, since the translation program can
make use of the context.)
- Use normal language,
avoiding idiomatic expressions,
dialect and slang words, and
technical terms outside their normal scope.
Figurative expressions are risky, unless the metaphor is
widespread among languages.
Fixed idiomatic expressions as such pose no fundamental
challenge to translation software; it is quite easy to
make a program check a large list of fixed phrases and use predefined
translations for them, instead of translating "word by word".
But current translation programs are not very good at idioms,
and even in the long run idioms will cause problems in cases
where it is context-dependent whether a phrase is to be interpreted
literally or idiomatically.
Naturally, you should not impoverish your language to make your
documents more suitable for simple translators, only to notice
somewhat later that newer software could have handled richer language
better. The point is that you should think about
your language and avoid needless idiomatic and figurative
expressions, keeping in mind both human readers
whose native language might differ from yours and
automatic translators, which cannot read anything
between the lines.
- In particular, say things directly instead
of using hidden humour, sarcasm, or implicit language.
On the Internet, Wiio's law, "if a message can be understood
in different ways, it will be understood in just that way which
does the most harm", applies particularly widely.
Attempts to be sarcastic will fail even more often when read through
automatic translation.
- Write simple sentences which are reasonably
short. The longer and the more complex the sentence, the more
probable it is that an automatic translator (or a human reader!)
parses it wrongly.
- Write words and phrases in full form,
avoiding abbreviations, except very common ones
like "etc". (If you really need to use abbreviations in HTML
authoring, you may consider using the ABBR element
to specify the expansion of an abbreviation.)
- Prefer words with specific meaning to words
which have a large set of different meanings. For example,
instead of using a word like "issue" in the meaning
'subject of a discourse', consider using "topic", since
"issue" has several meanings and a
translation program would have great difficulty
selecting the correct one.
- Formulate sentences so that ambiguous words have
suitable local context to give a clue
to translation software (and people). For example, the
English word "type" has many meanings,
both as a verb and as a noun, and this might
confuse a translator. If your intention is to give an instruction
to type something, please begin it with something like "please type".
- Keep phrases together if feasible. For example,
don't use a list header like
"The goal of the university is to:" but instead put the
word "to" before each verb in the list, making it clear
(even to an intellectually challenged translator program)
that each verb is an infinitive.
- As an interim solution, prefer spellings
like "DejaNews" and "AltaVista" to "Deja News" and "Alta Vista",
since a simple translator might well translate the latter
alternatives word by word (producing, for example,
"Nouvelles De Deja"!).
- Use spelling checkers.
A spelling error is very often
corrected (perhaps unconsciously) by a human reader but
may cause serious trouble to translation programs
(as well as indexers and other software).
For HTML documents in English, you could use e.g.
a simple online spelling checker named
WebSter's Dictionary.
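The advice to publish small pages made of logical units can be partly automated. The following Python sketch groups paragraphs, assumed to be separated by blank lines, into pages of at most 5000 characters. Treating Babelfish's "5k" as 5000 characters is an assumption, and the function name split_into_pages is hypothetical. The sketch breaks only between paragraphs, so a single overlong paragraph stays whole on its own page and may exceed the limit.

```python
def split_into_pages(text: str, max_chars: int = 5000) -> list[str]:
    """Group blank-line-separated paragraphs into pages of at most max_chars.

    Hypothetical helper; max_chars=5000 assumes "5k" means 5000 characters.
    A paragraph longer than max_chars is kept whole on a page of its own.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    pages: list[str] = []
    current: list[str] = []
    size = 0  # always equals len("\n\n".join(current))
    for para in paragraphs:
        # Adding a paragraph to a non-empty page also adds a 2-character separator.
        added = len(para) + (2 if current else 0)
        if current and size + added > max_chars:
            # Page is full: close it and start a new page with this paragraph.
            pages.append("\n\n".join(current))
            current, size = [], 0
            added = len(para)
        current.append(para)
        size += added
    if current:
        pages.append("\n\n".join(current))
    return pages


if __name__ == "__main__":
    article = "First section text.\n\nSecond section text.\n\nA large table."
    for number, page in enumerate(split_into_pages(article, max_chars=40), start=1):
        print(number, repr(page))
```

Splitting at paragraph boundaries rather than at a fixed character count keeps each page readable on its own. Where the markup makes them easy to find, splitting at section headings would follow the guideline even more closely.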
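Sentence length and words with many meanings can also be checked mechanically before publishing. This Python sketch flags sentences longer than a word limit and words listed in a small substitution table. The 25-word limit, the table (holding only the "issue"/"topic" example from the guidelines), and the name check_sentences are hypothetical choices for illustration. The sentence splitter is deliberately naive and will, for instance, split after "e.g.".

```python
import re

# Hypothetical settings, chosen for illustration only.
MAX_WORDS = 25
VAGUE_WORDS = {"issue": "topic"}  # word with many meanings -> more specific alternative


def check_sentences(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return warnings about long sentences and words with many meanings."""
    warnings: list[str] = []
    # Naive split after ".", "!" or "?" followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for number, sentence in enumerate(sentences, start=1):
        words = re.findall(r"[A-Za-z']+", sentence)
        if len(words) > max_words:
            warnings.append(f"sentence {number}: {len(words)} words, consider splitting")
        for word in words:
            suggestion = VAGUE_WORDS.get(word.lower())
            if suggestion:
                warnings.append(
                    f"sentence {number}: '{word}' has many meanings, consider '{suggestion}'"
                )
    return warnings


if __name__ == "__main__":
    for warning in check_sentences("This issue is simple. Short one."):
        print(warning)
```

A checker like this only points at candidates; deciding whether a long sentence is really hard to parse remains the author's job.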
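For authors who do use abbreviations in HTML, a small helper can wrap them in ABBR elements with title attributes giving the expansions. In this hypothetical Python sketch, mark_abbreviations takes plain text and an expansion table supplied by the author. It assumes that abbreviations consist of letters and digits only, so forms containing periods, such as "e.g.", are not matched. The surrounding text is HTML-escaped so that characters like "<" remain safe.

```python
import html
import re


def mark_abbreviations(text: str, expansions: dict[str, str]) -> str:
    """Return HTML in which each known abbreviation is wrapped in <abbr title="...">.

    Hypothetical helper: `text` is plain text, and abbreviations are assumed
    to be purely alphanumeric so that \\b word boundaries delimit them.
    """
    if not expansions:
        return html.escape(text, quote=False)
    # Longest first, so matching stays predictable when one abbreviation
    # is a prefix of another.
    alternatives = sorted(expansions, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")\b")
    # With one capturing group, re.split alternates: text, abbreviation, text, ...
    parts = pattern.split(text)
    out: list[str] = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            title = html.escape(expansions[part], quote=True)
            out.append(f'<abbr title="{title}">{html.escape(part)}</abbr>')
        else:
            out.append(html.escape(part, quote=False))
    return "".join(out)


if __name__ == "__main__":
    table = {"W3C": "World Wide Web Consortium", "HTML": "HyperText Markup Language"}
    print(mark_abbreviations("W3C publishes HTML specifications.", table))
```

Escaping the plain-text segments separately, rather than escaping the whole text first, avoids accidentally matching an abbreviation inside an inserted entity such as "&lt;".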