Finnish language and localization: an executive summary

This information has been compiled mainly for people who make decisions on localization of software or on translations of texts. It also helps people who implement such decisions. This presentation deals with those features of Finnish that impose requirements on software design or translation processes. No previous knowledge of Finnish is assumed here.

Character repertoire

In addition to the common Latin letters, the letters ä and ö (in uppercase Ä and Ö) are necessary for Finnish texts. There is no accepted way to replace them. The letters š and ž are desirable, as they are part of official orthography, but in practice (though not officially) they are often replaced by sh and zh.

Basic units of texts

A Finnish word may be a compound word and it may contain several suffixes. A compound word often corresponds to two or more words in another language. For example, keskushermosto is “central nervous system”. This means that an English word, such as “central” or “nervous”, often cannot be translated into Finnish without knowing at least some of the context.

The suffixes often correspond to prepositions or other small words in other languages. For example, taloissammekin consists of the base word talo and four suffixes and means “in our houses, too”. Thus, e.g. in translation from English to Finnish, it is usually necessary to have at least a few consecutive words to work on, and it is very unrealistic to require word-for-word translations.

As a rule, a complete clause (with subject, verb, etc.) is the smallest feasible unit of translation. When individual words and phrases, such as menu item texts or button texts, need to be translated, they should be presented grouped by context and, if possible, with suitable explanations.

Word order

In general, word order cannot be preserved when translating into (or from) Finnish. The normal order of parts of a clause is often different from the order in English. For example, even a simple clause like “A new proposal was made” must be translated using a different order: Tehtiin uusi esitys, putting the verb (tehtiin, “was made”) at the start. The reason is that Finnish lacks articles, and the distinction that English makes by using “a” or “the” must be made using other means, such as word order.

Word inflection

Finnish has a large number of inflected forms for nouns, adjectives, numerals, pronouns, and verbs. In general, not all the forms can be derived from the basic form alone. Two words may well have the same basic form but different inflection. Therefore, when storing a word as a vocabulary entry, inflection information should be stored as part of it.

When translating a word into Finnish, the sentence context is needed for the selection of a proper form. For example, it is impossible to give a single translation for the English word form “hats”, since it should be translated as hattuja when used in an advertisement text like “new hats for sale”, as hattua when occurring in “I have five hats”, as hatut when used as a label in a product catalog, etc.; and the phrase “in my hats” should be translated as a whole as hatuissani.
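One way to reflect this in a translation glossary is to key each entry by context rather than by the English word alone. The following Python sketch is only an illustration: the context labels and the function name are assumptions, while the Finnish forms are the ones given above.

```python
# Hypothetical glossary entry: one English word form maps to several
# Finnish forms, selected by a context label supplied by the caller.
HATS = {
    "for_sale_text": "hattuja",   # "new hats for sale"
    "after_numeral": "hattua",    # "I have five hats"
    "catalog_label": "hatut",     # label in a product catalog
}


def translate_hats(context: str) -> str:
    """Return the Finnish form of "hats" for the given context.

    An unknown context raises KeyError, so a missing context is noticed
    instead of silently producing a wrong form.
    """
    return HATS[context]


print(translate_hats("after_numeral"))  # hattua
```

The point of the design is that the lookup cannot be called without a context, which mirrors the requirement that the translator sees the sentence and not just the word.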

Word inflection is applied to proper names (including foreign names) and abbreviations, too. In abbreviations, the colon “:” appears before the suffix, e.g. EU:ssa “in the EU”. Word division after the colon (as applied by some software) is not acceptable.

Sometimes companies impose a requirement that a company name or a product name be used in one form only. This is impossible in Finnish, as impossible as it would be to write about a product in English without ever using any preposition before the product name. When a trademark symbol is appended to a name, it is written after the inflected form, e.g. Aspirin® (basic form), Aspirinin® (genitive). The only way to avoid all inflection of a name is to use a hyphenated compound word with the name as the first part and a generic noun as the second part, so that the second part is inflected, e.g. Aspirin-lääke (lääke means “medicine”), genitive Aspirin-lääkkeen. Such texts look clumsy and artificial.

When patterns such as “from … to …” need to be translated, the process should deal with each pattern as a whole rather than translate just “from” and “to”. Those prepositions simply have no translations as such in Finnish; they need to be translated by attaching a suitable suffix to the next word. The suffix depends on the context and on the word, and there may be a change in the word stem involved. For example, “from Helsinki to Vantaa” should be translated as Helsingistä Vantaalle and “from Tampere to London” as Tampereelta Lontooseen.
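Translating the pattern as a whole can be sketched in Python as follows. The table and the function name are hypothetical; the table contains only the four inflected forms mentioned above, stored explicitly because the suffix and the stem change depend on the word.

```python
# Hypothetical table of inflected place names, keyed by English name.
# Forms are stored, not computed: both suffix and stem vary by word.
PLACE_FORMS = {
    "Helsinki": {"from": "Helsingistä"},
    "Vantaa": {"to": "Vantaalle"},
    "Tampere": {"from": "Tampereelta"},
    "London": {"to": "Lontooseen"},
}


def translate_from_to(origin: str, destination: str) -> str:
    """Translate the whole pattern "from X to Y" as one unit.

    Raises KeyError if a required form is missing from the table.
    """
    return f'{PLACE_FORMS[origin]["from"]} {PLACE_FORMS[destination]["to"]}'


print(translate_from_to("Helsinki", "Vantaa"))  # Helsingistä Vantaalle
print(translate_from_to("Tampere", "London"))   # Tampereelta Lontooseen
```

Note that the output contains no separate words for “from” and “to”; the whole meaning is carried by the stored forms.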

The importance of word inflection also means that search routines that simply operate on whole words are of very limited usefulness for Finnish. A word may have dozens (even hundreds) of inflected forms. Search engines like Google can deal with this in a limited manner. In most situations, it is more or less sufficient to have the ability to search with wildcards at the end of a string. For example, “Helsin*”, where “*” is a wildcard, would find Helsinki, Helsinkiin, Helsingissä, and all the other inflected forms.
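A trailing-wildcard search of this kind might look as follows in Python, using the standard fnmatch module. The function name and the word list are made up for illustration.

```python
from fnmatch import fnmatchcase


def wildcard_search(pattern: str, words: list[str]) -> list[str]:
    """Return the words that match a shell-style pattern such as "Helsin*"."""
    return [word for word in words if fnmatchcase(word, pattern)]


words = ["Helsinki", "Helsinkiin", "Helsingissä", "Tampere"]
print(wildcard_search("Helsin*", words))
# ['Helsinki', 'Helsinkiin', 'Helsingissä']
```

The pattern stops at “Helsin” rather than “Helsinki” because the stem changes in some forms (Helsingissä), so the search string must be cut before the point where the forms diverge.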

Length of words and expressions

Long words are common in Finnish due to many suffixes and compound words. Words longer than 20 characters appear often in business texts. For such reasons, hyphenation is desirable. Without hyphenation, lines tend to be of different lengths, causing either a very ragged right margin or (in a justified column) very wide gaps between words.

As a rule, the length of a piece of text should be expected to vary greatly when translated into another language; it may even double or more. For this reason, fixed width settings on texts should be avoided or set rather liberally. For example, in user interfaces, a menu item like “Save As” is usually (and properly) translated into Finnish as Tallenna nimellä, and the button text “Undo” would best be translated as Peruuta muutos (since mere Peruuta could also mean “cancel”, for example).
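The growth in these two examples can be checked with a few lines of Python. This is a rough sketch: it counts characters, which is only an approximation of rendered width, and the function name is an assumption.

```python
def expansion_ratio(source: str, translation: str) -> float:
    """Return the translation length divided by the source length, in characters."""
    return len(translation) / len(source)


UI_TEXTS = [("Save As", "Tallenna nimellä"), ("Undo", "Peruuta muutos")]
for english, finnish in UI_TEXTS:
    print(f"{english!r} -> {finnish!r}: {expansion_ratio(english, finnish):.2f}x")
# 'Save As' -> 'Tallenna nimellä': 2.29x
# 'Undo' -> 'Peruuta muutos': 3.50x
```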

Hyphenation

The basic hyphenation rules are simple and easy to implement in software. However, compound words require special attention. Good hyphenation requires either software that knows how to recognize the components of compound words or manual checking. Hyphenating Finnish texts with English hyphenation rules produces unacceptable results.

Capitalization

In the use of capital letters, Finnish generally follows continental European (e.g. French) tradition rather than English practice. This means that normally only the first letter of a sentence (or a sentence-like separate expression) and the first letter of each proper noun are in upper case. Derivations of proper names, such as englanti (English language) and englantilainen (English or Englishman or Englishwoman), are not treated as proper names.

Capitalizing almost every word in a title of a work, which is common in English (e.g., “On the Origin of Species”), is definitely incorrect in Finnish.

If text is written in all upper case, care should be taken to make sure that ä and ö are capitalized, too.
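The typical failure comes from conversion routines that only know the letters a to z. The Python sketch below contrasts such a routine (written here only to demonstrate the problem; the function name is hypothetical) with Python's built-in Unicode-aware str.upper().

```python
def ascii_only_upper(text: str) -> str:
    """Naive conversion that handles only a-z, as some legacy routines do."""
    return "".join(
        chr(ord(ch) - 32) if "a" <= ch <= "z" else ch
        for ch in text
    )


text = "Tallenna nimellä"
print(ascii_only_upper(text))  # TALLENNA NIMELLä  (ä left in lower case)
print(text.upper())            # TALLENNA NIMELLÄ
```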

In business documents, it is often required that some words be written in all upper case. Typically, the words are company or product names or terms used in a contract, such as COMPANY and CUSTOMER. Such style has traditionally not been used in Finnish, and language authorities recommend against it, but it has become increasingly common.

Collation and sorting

The standard alphabetic order in Finnish is A B C D E F G H I J K L M N O P Q R S (Š) T U V (W) X Y Z (Ž) Å Ä Ö. Letters in parentheses are treated as equivalent to the preceding letter. However, it is increasingly common to treat W as a letter of its own, placed after V.

Sorting algorithms designed for English do not sort Finnish words correctly, since they treat Å, Ä, and Ö as variants of A and O, rather than as separate letters at the end of the alphabet. On the other hand, sorting tailored for Finnish often treats W as a variant of V, instead of applying the modern approach.
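The alphabetic order described above can be sketched as a Python sort key. This is a simplified illustration, not a complete collation implementation: equivalent letters are treated as fully equal with no tie-breaking, characters outside the alphabet are simply placed last, and the function name, the parameter name, and the sample words are assumptions. Production code would normally rely on a locale-aware collation library instead.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {letter: index for index, letter in enumerate(ALPHABET)}


def finnish_key(word: str, w_is_separate: bool = True) -> list[int]:
    """Build a sort key in which å, ä, and ö follow z.

    š and ž are treated as s and z. When w_is_separate is False, w is
    treated as v (the traditional rule). Characters not in ALPHABET
    sort after ö (a simplification).
    """
    folded = word.lower().replace("š", "s").replace("ž", "z")
    if not w_is_separate:
        folded = folded.replace("w", "v")
    return [RANK.get(ch, len(ALPHABET)) for ch in folded]


# Arbitrary sample words.
names = ["Öljy", "Oulu", "Ähtäri", "Aalto", "Zeppelin", "Åbo"]
print(sorted(names, key=finnish_key))
# ['Aalto', 'Oulu', 'Zeppelin', 'Åbo', 'Ähtäri', 'Öljy']

v_and_w = ["Wasa", "Vaasa", "Virta"]
print(sorted(v_and_w, key=finnish_key))
# ['Vaasa', 'Virta', 'Wasa']   (W as a letter of its own)
print(sorted(v_and_w, key=lambda w: finnish_key(w, w_is_separate=False)))
# ['Vaasa', 'Wasa', 'Virta']   (W treated as V)
```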

Punctuation

Finnish uses symmetric quotation marks: ”tekstiä” and (within a quotation) ’tekstiä’. The opening and closing mark are identical and correspond to the closing mark as used in English, e.g. “text” or ‘text’.
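When text already contains English-style curly quotation marks, converting them to the Finnish style amounts to replacing each opening mark with the corresponding closing mark. A minimal Python sketch, assuming the input uses the characters U+201C and U+2018 for opening marks (the function name is hypothetical, and straight ASCII quotes are left untouched):

```python
def to_finnish_quotes(text: str) -> str:
    """Replace English opening quotation marks with the closing marks.

    U+201C (opening double) becomes U+201D, and U+2018 (opening single)
    becomes U+2019. Closing marks are already the same in both styles.
    """
    return text.replace("\u201c", "\u201d").replace("\u2018", "\u2019")


print(to_finnish_quotes("\u201ctext\u201d"))  # ”text”
print(to_finnish_quotes("\u2018text\u2019"))  # ’text’
```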