Finnish language and localization: an executive summary

This information has been compiled mainly for people who make decisions on localization of software or on translations of texts. It also helps people who implement such decisions. This presentation deals with those features of Finnish that impose requirements on software design or translation processes. No previous knowledge of Finnish is assumed.

Character repertoire

In addition to the common Latin letters, the letters ä and ö (in uppercase Ä and Ö) are necessary for Finnish texts. There is no accepted way to replace them. The letters š and ž are desirable, as they are part of official orthography, but in practice (though not officially) they are often replaced by sh and zh.
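As an illustration, the following Python sketch shows how an export routine might apply this when a legacy 8-bit target encoding cannot represent š and ž. The function name to_legacy and the Latin-1 default are assumptions made for the example. The fallback is limited to š and ž; for ä and ö the sketch raises an error instead of guessing, since there is no accepted replacement for them.

```python
# Hypothetical helper: encode Finnish text for a legacy 8-bit encoding.
# Only š/ž have an accepted (unofficial) fallback; ä/ö must never be replaced.
FALLBACKS = {"š": "sh", "Š": "Sh", "ž": "zh", "Ž": "Zh"}


def to_legacy(text: str, encoding: str = "latin-1") -> bytes:
    parts = []
    for ch in text:
        try:
            ch.encode(encoding)
            parts.append(ch)
        except UnicodeEncodeError:
            if ch not in FALLBACKS:
                # e.g. ä or ö when the target is ASCII: no acceptable substitute
                raise
            parts.append(FALLBACKS[ch])
    return "".join(parts).encode(encoding)


print(to_legacy("šakki ja pää"))  # b'shakki ja p\xe4\xe4'
# ISO-8859-15 does contain š, so no fallback is applied there.
print(to_legacy("šakki", "iso8859_15") == "šakki".encode("iso8859_15"))  # True
```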

Basic units of texts

A Finnish word may be a compound word and it may contain several suffixes. A compound word often corresponds to two or more words in another language. For example, keskushermosto is “central nervous system”. This means that an English word, such as “central” or “nervous”, often cannot be translated into Finnish without knowing at least some of the context.

The suffixes often correspond to prepositions or other small words in other languages. For example, taloissammekin consists of the base word talo and four suffixes and means “in our houses, too”. Thus, e.g. in translation from English to Finnish, it is usually necessary to have at least a few consecutive words to work on, and it is very unrealistic to require “word to word” translations.

As a rule, a complete clause (with subject, verb, etc.) is the smallest feasible unit of translation. When individual words and phrases, such as menu item texts or button texts, need to be translated, they should be presented grouped by context and, if possible, with suitable explanations.
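One way to keep this context attached to short strings is to store it with each message and to hand strings to translators in batches per context. The sketch below is hypothetical (Message and batches_for_translation are illustrative names); gettext PO files carry comparable information in the msgctxt field and in translator comments.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Message:
    context: str    # where the string appears, e.g. "File menu"
    source: str     # the English text
    note: str = ""  # explanation for the translator


def batches_for_translation(messages):
    """Group messages by context so related strings are translated together."""
    ordered = sorted(messages, key=lambda m: m.context)  # groupby needs sorted input
    return {ctx: list(group) for ctx, group in groupby(ordered, key=lambda m: m.context)}


catalog = [
    Message("File menu", "Save", "command: save the current document"),
    Message("Toolbar", "Open", "button: opens a file chooser"),
    Message("File menu", "Save As", "command: save under a new name"),
]
for ctx, msgs in batches_for_translation(catalog).items():
    print(ctx, [m.source for m in msgs])
```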

Numeric expressions

Expressions like “five apples” or “5 apples” pose special problems when generated programmatically. For English, you can mostly use simple code that just appends “s” to the noun if the number is not one (1). In Finnish, the noun must be in a special case form, the partitive, e.g. 5 omenaa versus 1 omena or 5 hevosta versus 1 hevonen. This means that you either need to store the partitive forms of all nouns that may appear, in addition to the basic form, or to have a rather complicated algorithm that constructs the partitive forms. If you only store the partitive forms and use them even when the number is one (e.g., 1 omenaa, 1 hevosta), the result is understandable but odd-looking and ungrammatical, comparable to a presentation that uses “1 apples” and “1 horses” in English.
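A minimal sketch of the storage approach, assuming a hypothetical table that holds the basic form and the partitive singular of each noun, and assuming non-negative integer counts. It covers only the phrase in its basic syntactic role; in other positions both the numeral and the noun are inflected (e.g. viidellä omenalla, “with five apples”), which needs further stored forms.

```python
# Hypothetical noun table: English key -> (basic form, partitive singular).
NOUNS = {
    "apple": ("omena", "omenaa"),
    "horse": ("hevonen", "hevosta"),
}


def count_phrase(n: int, noun: str) -> str:
    basic, partitive = NOUNS[noun]
    # Exactly one takes the basic form; other counts (such as 0 or 5) take the partitive.
    return f"{n} {basic if n == 1 else partitive}"


print(count_phrase(1, "horse"))  # 1 hevonen
print(count_phrase(5, "apple"))  # 5 omenaa
```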

Word order

In general, word order cannot be preserved when translating into (or from) Finnish. The normal order of parts of a clause is often different from the order in English. For example, even a simple clause like “A new proposal was made” must be translated using a different order: Tehtiin uusi esitys, putting the verb (tehtiin, “was made”) at the start. The reason is that Finnish lacks articles, and the distinction that English makes by using “a” or “the” must be made using other means, such as word order.

To take another example, a sentence like “There is a rat in the house” cannot be reasonably translated so that the order of the words for rat (rotta) and house (talo) is preserved. The natural Finnish expression is Talossa on rotta.

Although Finnish is often said to have “free word order”, the order is significant. It just often expresses different things than word order in English. Thus, a requirement that a specific order of words or expressions be preserved in translation is generally unrealistic.
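In software, this argues against building messages by concatenating fragments in English order. A common remedy is a template with named placeholders, so that the translator can put each value where Finnish word order needs it. The templates below are hypothetical. The Finnish one mirrors the Talossa on rotta pattern by starting with the location, and it uses the generic noun kansio (“folder”) to carry the inflection so that the folder name itself stays in its basic form. It also assumes a count of at least two; a count of one needs a different noun form, as described under numeric expressions.

```python
# Hypothetical message templates. Named placeholders let the translator
# move each value to wherever Finnish word order requires it.
TEMPLATES = {
    "en": "There are {count} new messages in folder {folder}.",
    "fi": "Kansiossa {folder} on {count} uutta viestiä.",
}


def render(lang: str, **values) -> str:
    return TEMPLATES[lang].format(**values)


print(render("en", count=3, folder="Inbox"))
print(render("fi", count=3, folder="Saapuneet"))
```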

Word inflection

Finnish has a large number of inflected forms for nouns, adjectives, numerals, pronouns, and verbs. In general, all the forms cannot be derived from the basic form alone. Two words may well have the same basic form but different inflection. Therefore, when storing a word as a vocabulary entry, inflection information should be stored as part of it.

When translating a word into Finnish, the sentence context is needed for the selection of a proper form. For example, it is impossible to give a single translation for the English word form “hats”, since it should be translated as hattuja when used in an advertisement text like “new hats for sale”, as hattua when occurring in “I have five hats”, as hatut when used as a label in a product catalog, etc.; and the phrase “in my hats” should be translated as a whole as hatuissani.
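A vocabulary entry that keeps the needed forms explicitly might look like the following sketch. The Lexeme class, the case names, and the (case, number) keys are assumptions for illustration; a real lexicon might instead store an inflection class from which the forms are generated. The structure does not model possessive suffixes such as the -ni in hatuissani.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    lemma: str
    # Forms keyed by (case, number), stored explicitly rather than derived.
    forms: dict = field(default_factory=dict)

    def form(self, case: str, number: str = "sg") -> str:
        return self.forms[(case, number)]


HATTU = Lexeme("hattu", {
    ("nominative", "sg"): "hattu",
    ("nominative", "pl"): "hatut",    # label in a product catalog
    ("partitive", "sg"): "hattua",    # "I have five hats": viisi hattua
    ("partitive", "pl"): "hattuja",   # "new hats for sale"
    ("inessive", "pl"): "hatuissa",   # "in (the) hats"
})

print(HATTU.form("partitive", "pl"))  # hattuja
```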

Word inflection is applied to proper names (including foreign names) and abbreviations, too. In abbreviations, the colon “:” appears before the suffix, e.g. EU:ssa “in the EU”. Word division after the colon (as applied by some software) is not acceptable.

Sometimes companies require that a company name or a product name be used in one form only. This is impossible in Finnish, as impossible as it would be to write about a product in English without ever using any preposition before the product name. When a trademark symbol is appended to a name, it is written after the inflected form, e.g. Aspirin® (basic form), Aspirinin® (genitive). The only way to avoid all inflection of a name is to use a hyphenated compound word with the name as the first part and a generic noun as the second part, so that only the second part is inflected, e.g. Aspirin-lääke (lääke means “medicine”), genitive Aspirin-lääkkeen. Such texts look clumsy and artificial.
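The two written conventions described above (a colon before the suffix of an abbreviation, and the trademark symbol after the whole inflected form) are mechanical and can be sketched as follows. The function inflect_name is hypothetical, and it deliberately does not choose the suffix: the suffix depends on the word and, for abbreviations, on how they are pronounced, so the translator supplies it.

```python
def inflect_name(name: str, suffix: str, abbreviation: bool = False,
                 mark: str = "") -> str:
    """Attach a translator-chosen case suffix to a name.

    Abbreviations take a colon before the suffix (EU:ssa); a trademark
    symbol follows the complete inflected form (Aspirinin®).
    """
    joiner = ":" if abbreviation and suffix else ""
    return f"{name}{joiner}{suffix}{mark}"


print(inflect_name("EU", "ssa", abbreviation=True))  # EU:ssa
print(inflect_name("Aspirin", "", mark="®"))         # Aspirin®
print(inflect_name("Aspirin", "in", mark="®"))       # Aspirinin®
```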

When patterns such as “from … to …” need to be translated, the process should deal with each pattern as a whole rather than translate just “from” and “to”. Those prepositions simply have no translations as such in Finnish; they need to be translated by attaching a suitable suffix to the next word. The suffix depends on the context and on the word, and there may be a change in the word stem involved. For example, “from Helsinki to Vantaa” should be translated as Helsingistä Vantaalle and “from Tampere to London” as Tampereelta Lontooseen.
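A sketch of this approach: each place name is stored with its own “from” and “to” forms, and the whole pattern is rendered at once, so that the English prepositions are never translated in isolation. The PLACES table and route_fi function are hypothetical; the table uses the forms given above and also lists the opposite direction for each place.

```python
# Hypothetical place table: each name stored with its "from" and "to" forms.
# Which case is used (-sta/-lta, -en/-lle, ...) depends on the individual name.
PLACES = {
    "Helsinki": {"from": "Helsingistä", "to": "Helsinkiin"},
    "Vantaa": {"from": "Vantaalta", "to": "Vantaalle"},
    "Tampere": {"from": "Tampereelta", "to": "Tampereelle"},
    "London": {"from": "Lontoosta", "to": "Lontooseen"},
}


def route_fi(origin: str, destination: str) -> str:
    """Render the whole pattern "from X to Y"; the prepositions become case endings."""
    return f"{PLACES[origin]['from']} {PLACES[destination]['to']}"


print(route_fi("Helsinki", "Vantaa"))  # Helsingistä Vantaalle
print(route_fi("Tampere", "London"))   # Tampereelta Lontooseen
```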

The importance of word inflection also means that search routines that simply match whole words are of very limited usefulness for Finnish. A word may have dozens (even hundreds) of inflected forms. Search engines like Google can deal with this only to a limited extent. In most situations, it is more or less sufficient to be able to search with a wildcard at the end of a string. For example, “Helsin*”, where “*” is a wildcard, would find Helsinki, Helsinkiin, Helsingissä, and all the other inflected forms.
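The trailing-wildcard idea can be tried with Python's standard fnmatch module on an in-memory word list (this illustrates the matching only, not how a search engine works internally). The sketch also shows why the prefix has to stop before the part of the stem that changes: Helsinki becomes Helsingi- in many forms, so Helsink* misses them.

```python
from fnmatch import fnmatchcase

words = ["Helsinki", "Helsinkiin", "Helsingissä", "Helsingin", "Vantaa"]

# The stem changes (Helsinki -> Helsingi-), so the prefix must stop before
# the changing part: "Helsink*" misses Helsingissä and Helsingin.
print([w for w in words if fnmatchcase(w, "Helsin*")])
print([w for w in words if fnmatchcase(w, "Helsink*")])
```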

For similar reasons, automatic checks for consistent use of terms generally fail if they do not recognize inflected forms. Although a Finnish noun has dozens of forms (when all possible suffixes are counted), typically only a handful of them occur in normal text when the noun is a term. This means that recognition of inflected forms can even be handled in a simplistic manner by listing the most common forms in the term glossary.
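A minimal sketch of such a glossary-based check, flagging inflected forms of a term that a style guide is assumed to discourage. The glossary content is illustrative (the informal meili in place of sähköposti, “e-mail”), and find_deprecated is a hypothetical name. Forms not in the list, such as meilisi, are not caught, which is the accepted trade-off of the simplistic approach.

```python
import re

# Hypothetical glossary: the most common forms of a discouraged term,
# each mapped to the recommended term (basic form).
DEPRECATED = {
    form: "sähköposti"
    for form in ("meili", "meilin", "meiliä", "meilissä", "meilit", "meilejä")
}


def find_deprecated(text: str):
    """Return (found form, recommended term) for each listed form in the text."""
    tokens = re.findall(r"\w+", text.lower())  # \w matches ä and ö in str patterns
    return [(t, DEPRECATED[t]) for t in tokens if t in DEPRECATED]


print(find_deprecated("Lähetin meilin eilen, mutta meiliä ei tullut."))
```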

Length of words and expressions

Long words are common in Finnish due to many suffixes and compound words. Words longer than 20 characters appear often in business texts. For such reasons, hyphenation is desirable. Without hyphenation, lines tend to be of different lengths, causing either very ragged right margin or (in a justified column) very wide gaps between words.

As a rule, the length of a piece of text should be expected to vary greatly when translated into another language, even doubled or more. For this reason, fixed width settings on texts should be avoided or set rather liberally. For example, in user interfaces, a menu item like “Save As” is usually (and properly) translated into Finnish as Tallenna nimellä.
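Where some fixed limit cannot be avoided, a simple check during localization can flag translations that exceed it. The sketch below counts characters against an arbitrary 12-character budget, which is an assumption; rendered width depends on the font, so a real check would better measure the text in the actual user interface.

```python
def flag_overflows(translations: dict, max_chars: int) -> dict:
    """Return labels whose translation would not fit a fixed-width slot."""
    return {src: fi for src, fi in translations.items() if len(fi) > max_chars}


menu_fi = {"Save": "Tallenna", "Save As": "Tallenna nimellä", "Open": "Avaa"}
print(flag_overflows(menu_fi, max_chars=12))  # {'Save As': 'Tallenna nimellä'}
```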

Hyphenation

The basic hyphenation rules are simple and easy to implement in software. However, compound words require special attention. Good hyphenation requires either software that knows how to recognize the components of compound words or manual checking. Hyphenating Finnish texts with English hyphenation rules produces unacceptable results.
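Both the basic rule and the compound problem can be seen in the following sketch. It implements only the consonant rule (a break goes before a consonant that is followed by a vowel) and omits the rule for vowel sequences, so e.g. lukea comes out as lu-kea rather than lu-ke-a. Compound components are passed in by hand, standing in for a hypothetical lexicon lookup. In real output, soft hyphens (U+00AD) would be inserted instead of visible hyphens.

```python
VOWELS = set("aeiouyäöå")


def _split_simple(part: str) -> list:
    """Basic rule only: break before a consonant that is followed by a vowel."""
    pieces, start = [], 0
    for i in range(1, len(part) - 1):
        ch, nxt = part[i].lower(), part[i + 1].lower()
        # Require a vowel earlier in the current syllable (avoids "st-ra...").
        has_vowel_before = any(c.lower() in VOWELS for c in part[start:i])
        if ch not in VOWELS and nxt in VOWELS and has_vowel_before:
            pieces.append(part[start:i])
            start = i
    pieces.append(part[start:])
    return pieces


def hyphenate(word: str, components=None) -> str:
    """components: compound parts from a (hypothetical) lexicon, if known."""
    parts = components or [word]
    return "-".join(p for part in parts for p in _split_simple(part))


print(hyphenate("keskushermosto"))                    # kes-kus-her-mos-to
print(hyphenate("kaivosaukko"))                       # kai-vo-sauk-ko (wrong)
print(hyphenate("kaivosaukko", ["kaivos", "aukko"]))  # kai-vos-auk-ko
```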

Capitalization

In the use of capital letters, Finnish generally follows continental European (e.g. French) tradition rather than English practice. This means that normally only the first letter of a sentence (or a sentence-like separate expression) and the first letter of each proper noun are in upper case. Derivatives of proper names, such as englanti (the English language) and englantilainen (English, or an Englishman or Englishwoman), are not treated as proper names.

Capitalizing almost every word in a title of a work, which is common in English (e.g., “On the Origin of Species”), is definitely incorrect in Finnish. Capitalizing words for emphasis, as in “Very Important” (Hyvin Tärkeää), is not normal in Finnish and may make a very childish impression.
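When strings arrive in English-style title case, a conversion to Finnish-style capitalization can be sketched as below. Proper nouns cannot be detected automatically here, so the caller passes them, in the inflected forms in which they appear; sentence_case is a hypothetical helper. Derivatives such as englantilaiseen are lowercased, as described above.

```python
def sentence_case(title: str, proper_nouns=frozenset()) -> str:
    """Capitalize only the first word; keep listed proper nouns as written."""
    out = []
    for i, w in enumerate(title.split()):
        if w in proper_nouns:
            out.append(w)
        elif i == 0:
            out.append(w[:1].upper() + w[1:].lower())
        else:
            out.append(w.lower())
    return " ".join(out)


print(sentence_case("Opas Englantilaiseen Puutarhaan"))
print(sentence_case("Matka Helsingistä Lontooseen Ja Takaisin",
                    {"Helsingistä", "Lontooseen"}))
```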

If text is written in all upper case, care should be taken to make sure that ä and ö are capitalized, too.
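In Python, str.upper() is Unicode-aware and handles ä and ö. The risk lies in routines that only know ASCII; below, bytes.upper() applied to Latin-1 data stands in for such a routine (C's toupper in the default C locale behaves similarly).

```python
text = "pää ja öljy"

# Unicode-aware uppercasing handles ä and ö.
print(text.upper())  # PÄÄ JA ÖLJY

# Byte-level uppercasing of legacy 8-bit data changes only ASCII a-z,
# leaving ä and ö in lower case.
legacy = text.encode("latin-1").upper().decode("latin-1")
print(legacy)  # Pää JA öLJY
```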

For business documents, a requirement on writing some words in all upper case is often made. Typically, the words are company or product names or terms used in a contract, such as COMPANY and CUSTOMER. Such style has traditionally not been used in Finnish, and language authorities recommend against it, but it has become increasingly common.

Collation and sorting

The standard alphabetic order in Finnish is A B C D E F G H I J K L M N O P Q R S (Š) T U V (W) X Y Z (Ž) Å Ä Ö. Letters in parentheses are treated as equivalent to the preceding letter. However, it is increasingly common and now standard to treat W as a letter of its own, placed after V.

Sorting algorithms designed for English do not sort Finnish words correctly, since they treat Å, Ä, and Ö as variants of A and O, rather than as separate letters at the end of the alphabet. On the other hand, sorting tailored for Finnish often treats W as a variant of V, instead of applying the modern approach.
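A hand-rolled sort key illustrating the standard order is sketched below, with W as a separate letter by default and an option for the older practice. It is a simplification (it treats only Š and Ž as equivalents and does no case or accent tailoring beyond lowercasing); production code would more likely use a collation library such as ICU with a Finnish locale. finnish_key is a hypothetical name.

```python
from functools import partial

# Finnish alphabet order; Å, Ä, Ö are separate letters at the end.
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
EQUIVALENTS = {"š": "s", "ž": "z"}


def finnish_key(word: str, w_as_v: bool = False):
    ranks = []
    for ch in word.lower():
        ch = EQUIVALENTS.get(ch, ch)
        if w_as_v and ch == "w":
            ch = "v"  # older practice: W sorted as a variant of V
        pos = ALPHABET.find(ch)
        # Characters outside the alphabet sort after it, by code point.
        ranks.append(pos if pos >= 0 else len(ALPHABET) + ord(ch))
    return ranks, word  # the original word breaks ties deterministically


names = ["Öljy", "Åbo", "Ahvenanmaa", "Äänekoski", "Turku", "Oulu"]
print(sorted(names))                   # code-point order puts Ä before Å
print(sorted(names, key=finnish_key))  # ..., Turku, Åbo, Äänekoski, Öljy

surnames = ["Wahlroos", "Virtanen", "Vuori"]
print(sorted(surnames, key=finnish_key))                       # W after V
print(sorted(surnames, key=partial(finnish_key, w_as_v=True)))  # W = V
```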

Punctuation

Finnish uses symmetric quotation marks: ”tekstiä” and (within a quotation) ’tekstiä’. The opening and closing marks are identical and correspond to the closing mark as used in English, e.g. “text” or ‘text’.
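Converting English-style marks to Finnish ones only requires changing the opening marks, since the closing marks are already the same. A minimal sketch using str.translate; it assumes typographic input and does not handle straight ASCII quotes.

```python
# Map English-style opening marks to the Finnish symmetric (closing-shaped) ones.
TO_FINNISH = str.maketrans({"“": "”", "‘": "’"})


def finnish_quotes(text: str) -> str:
    return text.translate(TO_FINNISH)


print(finnish_quotes("“tekstiä” ja ‘tekstiä’"))  # ”tekstiä” ja ’tekstiä’
```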

Lack of he/she distinction

Finnish has no separate masculine and feminine pronouns. The same pronoun hän is used for both sexes. This may cause unintended ambiguity in translations. A common technique to avoid it is to use people’s names instead of pronouns where needed.