Handbook of Finnish, 2nd edition, section 21 Language technology and Finnish:

Finnish language and localization: a summary

This information has been compiled mainly for people who make decisions on localization of software or on translations of texts. It also helps people who implement such decisions. This presentation deals with those features of Finnish that impose requirements on software design or translation processes. No previous knowledge about Finnish is assumed here. This presentation is self-contained and can be read independently. Most items in this summary are described in more detail elsewhere in this book.

Character repertoire

In addition to the common Latin letters, the letters ä and ö (in uppercase Ä and Ö) are necessary for Finnish texts. There is no accepted way to replace them. In situations where they cannot be used, they are usually replaced by a and o, less often by ae and oe.

The letters š and ž are desirable, as they are part of the official orthography, but in practice (though not officially) they are often replaced by sh and zh.

Basic units of texts

A Finnish word may be a compound word and it may contain several suffixes. A compound word often corresponds to two or more words in another language. For example, keskushermosto is “central nervous system”. This means that an English word, such as “central” or “nervous”, generally cannot be translated into Finnish without knowing at least some of the context.

The suffixes often correspond to prepositions or other small words in other languages. For example, taloissammekin consists of the base word talo and four suffixes (i, ssa, mme, kin) and means “in our houses, too”. Thus, e.g. in translation from English to Finnish, it is usually necessary to have at least a few consecutive words to work on, and it is very unrealistic to require “word to word” translations.

A complete clause (with subject, predicate, etc.) is usually the smallest feasible unit of translation. When individual words and phrases, such as menu item texts or button texts, need to be translated, they should be presented as grouped by context and with suitable explanations.

Word inflection

Finnish has a large number of inflected forms for nouns, adjectives, numerals, pronouns, and verbs. In general, all the forms cannot be derived from the basic form alone. Two words may well have the same basic form but different inflection. Therefore, when storing a word as a vocabulary entry, inflection information should be stored as part of it.

When translating a word into Finnish, the sentence context is needed for the selection of a proper form. For example, it is impossible to give a single translation for the English word form “hats”, since it should be translated as hattuja when used in an advertisement text like “new hats for sale”, as hattua when occurring in “I have five hats”, as hatut when used as a label in a product catalog, etc.; and the expression “my hats” should be translated as hattuni.

Word inflection is applied to proper names (including foreign names) and abbreviations, too, though with some exceptions to normal rules. In abbreviations, the colon “:” appears before the suffix, e.g. EU:ssa “in the EU”. Word division after the colon (as applied by some software) is not acceptable. If a trade mark symbol is appended to a name, it is placed after the inflected form, e.g. Coca-Colalla®, Windowsissa™.

Word inflection in software

The inflection of a word is indicated in dictionaries of the Finnish language such as Kielitoimiston sanakirja. There is an online, machine-readable (XML format) presentation of inflection, Nykysuomen sanalista, covering 94,100 word entries. It is not official, but it has been provided by the Institute for the Languages of Finland, Kotus.

The use of inflection information requires a lot of code. There are 78 inflection types (counting both noun and verb inflection), and e.g. the number of different inflection forms for a noun is about 140. In addition, there are separate rules for modifying the word stem (consonant gradation) and choice of suffix variant (vowel harmony). Thus, if an application needs to generate or recognize just a few inflected forms of a limited set of words, it can be best to use a table containing just the forms needed for those words, rather than trying to set up a general inflection algorithm.

There is software available for generating and recognizing inflected forms, but it can be difficult to integrate it into other software. There is Joukahainen, which is a free online vocabulary database, with the ability to generate inflected forms. Voikko is a package of free linguistic software and data for Finnish, containing a spelling and grammar checker and a hyphenator, with underlying inflection capabilities. FINTWOL is a morphological analyzer by Lingsoft.

When patterns such as “from … to …” need to be translated, the translation process should deal with each pattern as a whole rather than translate just “from” and “to”. Those prepositions simply have no translations as such in Finnish; they need to be translated by attaching a suitable suffix to the next word. The suffix depends on the context and on the word, and there may be a change in the word stem involved. For example, “from Helsinki to Vantaa” should be translated as Helsingistä Vantaalle and “from Tampere to London” as Tampereelta Lontooseen.
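One way to handle such a pattern as a whole is to store the needed case forms of each place name and fill a template with them. The following is a minimal sketch in Python, not a general solution: the names FROM_FORMS, TO_FORMS, and from_to are hypothetical, and the tables hold only the four forms quoted above.

```python
# Ready-made case forms, keyed by the English name of the place.
# Only the forms quoted in the text are included; a real table would be
# filled in by a translator for every place name the application uses.
FROM_FORMS = {"Helsinki": "Helsingistä", "Tampere": "Tampereelta"}
TO_FORMS = {"Vantaa": "Vantaalle", "London": "Lontooseen"}


def from_to(origin: str, destination: str) -> str:
    """Translate the whole pattern "from X to Y" using stored forms."""
    try:
        return f"{FROM_FORMS[origin]} {TO_FORMS[destination]}"
    except KeyError as missing:
        # Refuse to guess: the suffix and any stem change depend on the word.
        raise LookupError(f"no stored Finnish form for {missing}") from None


print(from_to("Helsinki", "Vantaa"))
print(from_to("Tampere", "London"))
```

The function deliberately fails for a name that has no stored form instead of attaching a default suffix, since a guessed suffix would often be wrong.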

Spelling, grammar, and style checks

Good spelling and grammar checkers are available for Finnish. However, word inflection and compound words make simple checking based on word lists clearly inadequate. Hunspell, the widely used open source spelling checker, cannot handle Finnish properly either, even though it can deal with some of the problems. Yet many programs that are advertised as providing spelling checking for several languages use Hunspell.

The language pack for Finnish in Microsoft Word has an advanced spelling and grammar checker, with some optional style checks, too. The grammar and style checks can be configured; this is little known and poorly documented, but this book contains a description of the available settings.

For word processors like LibreOffice and StarOffice, the free Voikko package can be used. It is of good quality and under continuous development. There is also a web version of the package, Oikofix.com, which can be used to check texts via direct input (copy and paste) and to check web pages.

If a spelling checker lets you add words to a custom dictionary, you probably need to add each inflected form of a word separately. This is not as bad as it sounds, since typically a word occurs in a text in a few inflected forms only, and you can add just the forms that are actually used in your texts.

Readability measurements

Simple readability measurement indexes for Finnish were developed by professor Osmo A. Wiio in the 1960s. They have been heavily criticized for being too technical. However, the main reason why they have not been commonly used is that code for computing them has not been included in popular software.

Nowadays, however, the Oikofix service computes a Wiio index that predicts the grade level, i.e. number of years of education needed to read the text. It is based solely on the lengths of words, measured in syllables: the index is 2.7 + 0.3X, where X is the percentage of long words, with “long” defined as consisting of four or more syllables. Thus, if a sequence of 50 words has just one long word (2% of words), the index is 2.7 + 0.3 × 2 = 3.3. Generally, grade level less than 7 means simple text, 7 to 10 is average text, and greater than 10 is difficult.
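The formula is easy to compute once the number of syllables in each word is known. The following Python sketch assumes that the syllable counts are supplied by the caller (counting Finnish syllables is a separate problem); the function names wiio_grade_level and describe are hypothetical, and this is not the code used by Oikofix.

```python
def wiio_grade_level(syllable_counts: list[int]) -> float:
    """Grade level 2.7 + 0.3X, where X is the percentage of long words."""
    if not syllable_counts:
        raise ValueError("at least one word is needed")
    # A word is "long" if it has four or more syllables.
    long_words = sum(1 for count in syllable_counts if count >= 4)
    percent_long = 100 * long_words / len(syllable_counts)
    return 2.7 + 0.3 * percent_long


def describe(level: float) -> str:
    """Rough classification using the limits given in the text."""
    if level < 7:
        return "simple"
    if level <= 10:
        return "average"
    return "difficult"


# 50 words, one of which has four syllables: X = 2, index about 3.3.
sample = [2] * 49 + [4]
level = wiio_grade_level(sample)
print(round(level, 1), describe(level))
```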

Of course, actual readability is a very complicated issue. Wiio’s simple grade level index is just a useful tool for checking that word length does not make the text excessively difficult. The limits depend on judgment and on the purpose and nature of texts. Since the compulsory school education in Finland is 9 years, we can say that newspaper texts, information given to general audience on practical matters, and similar texts should be written so that the index is less than 10.

Hyphenation

Since Finnish words are long on average and may be very long, hyphenation is essential to good text formatting. Basic hyphenation is very simple in Finnish and can be handled algorithmically, without a hyphenation dictionary. However, compound words and new loanwords cause considerable extra work and in practice often require manual checking for perfect results.

Since texts are usually hyphenated fully automatically, incorrect hyphenations of compound words are common. For typographically acceptable results, texts to be printed should be proofread at least to check against such errors, which is relatively fast. For good results, proofreading needs to pay attention to avoiding incorrect or inferior hyphenation between vowels (e.g. dividing “kauan” into “kau-” and “an”), since automatic hyphenators produce such hyphenations. This requires more work and a proofreader who knows the rules well.

Impact on searching

The importance of word inflection also means that search routines that simply operate on words as strings are of very limited usefulness for Finnish. A word may have dozens (even hundreds) of inflected forms. In many situations, it is more or less sufficient to have the ability to search with wildcards at the end of a string. For example, “Helsin*”, where “*” is a wildcard, would find Helsinki, Helsinkiin, Helsingissä, and all the other inflected forms.
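A search with a wildcard at the end of a string amounts to a prefix match. The following Python sketch shows the idea on a small word list; the function name prefix_search and the sample list are hypothetical. Note that a prefix match also finds any unrelated word that happens to start with the same letters.

```python
def prefix_search(pattern: str, words: list[str]) -> list[str]:
    """Return the words matching a pattern with an optional trailing '*'."""
    if pattern.endswith("*"):
        prefix = pattern[:-1].casefold()
        return [w for w in words if w.casefold().startswith(prefix)]
    # Without a wildcard, only the exact string matches.
    return [w for w in words if w.casefold() == pattern.casefold()]


words = ["Helsinki", "Helsinkiin", "Helsingissä", "Tampere"]
print(prefix_search("Helsin*", words))
print(prefix_search("Helsinki", words))
```

The prefix has to be short enough to survive the stem change: "Helsinki*" would miss Helsingissä, where the stem ends in g instead of k.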

Google does not support wildcards for referring to words that start with a given string. Instead, Google makes its own analysis based on the recognition of some inflected forms, so that if you e.g. search for “Helsinki”, it will also find pages where the name appears in other forms only. The details of this have not been disclosed, but it probably recognizes only relatively frequently used inflected forms.

Numeric expressions

Expressions like “five apples” or “5 apples” pose special problems when generated programmatically. For English, you can mostly use simple code that just appends “s” to the noun if the number is not one (1). In Finnish, the noun must be in a special case form, the partitive, e.g. 5 omenaa versus 1 omena or 5 hevosta versus 1 hevonen. This means that you need a) to store the partitive forms of all nouns that may appear, in addition to the basic form, or b) to have a rather complicated algorithm that constructs the partitive forms.

If you only store the partitive forms and use them even when the number is one (e.g., 1 omenaa, 1 hevosta), the result is understandable but odd-looking and ungrammatical, comparable to a presentation that uses “1 apples” and “1 horses” in English.
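Alternative a) can be sketched in Python as follows. The table PARTITIVE and the function count_phrase are hypothetical names, and the table contains only the two nouns used as examples above. The sketch uses the basic form for the number one and the stored partitive form for every other number.

```python
# Basic form -> partitive singular, stored for each noun that may appear.
PARTITIVE = {"omena": "omenaa", "hevonen": "hevosta"}


def count_phrase(number: int, noun: str) -> str:
    """Build an expression like "5 omenaa" or "1 omena"."""
    if noun not in PARTITIVE:
        # No stored form: fail rather than guess the partitive.
        raise LookupError(f"no partitive form stored for {noun!r}")
    form = noun if number == 1 else PARTITIVE[noun]
    return f"{number} {form}"


print(count_phrase(1, "omena"))
print(count_phrase(5, "omena"))
print(count_phrase(5, "hevonen"))
```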

Word order

In general, word order cannot be preserved when translating into or from Finnish. The normal order of parts of a clause is often different from the order in English. For example, even a simple clause like “A new proposal was made” must be translated using a different order: Tehtiin uusi esitys, putting the verb (tehtiin, “was made”) at the start. The reason is that Finnish lacks articles, and the distinction that English makes by using “a” or “the” must be made using other means, such as word order.

To take another example, a sentence like “There is a rat in the house” cannot be reasonably translated so that the order of the words for rat (rotta) and house (talo) is preserved. The natural Finnish expression is Talossa on rotta.

Although Finnish is often said to have “free word order”, the order is significant. It just expresses different things than word order in English. Thus, if a specific order is imposed, the meaning or the style may change.

Lengths of expressions

As a rule, the length of a piece of text should be expected to vary greatly when translated into another language; it may even double or more. For this reason, fixed width settings on texts should be avoided or set rather liberally. For example, in user interfaces, a menu item like “Save As” (7 characters) is usually (and properly) translated into Finnish as Tallenna nimellä (16 characters).

Abbreviations

When a word needs to be abbreviated, it is cut between a consonant and a vowel, and a period “.” is appended. For example, the possible abbreviations of kirjoittanut are k., kirj., kirjoitt., and kirjoittan., though the last one does not abbreviate much and the first one is hardly understandable without an explanation.
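The rule can be illustrated with a few lines of Python that list every cut point between a consonant and a following vowel. This is only a sketch of the mechanical rule: the function name possible_abbreviations is hypothetical, the vowel set is an assumption covering the vowel letters used in Finnish words, and the code says nothing about which candidates are understandable in practice.

```python
VOWELS = set("aeiouyäöå")


def possible_abbreviations(word: str) -> list[str]:
    """List abbreviations made by cutting between a consonant and a vowel."""
    lower = word.lower()
    candidates = []
    for i in range(1, len(lower)):
        before, after = lower[i - 1], lower[i]
        # Cut only where a consonant letter is followed by a vowel.
        if before.isalpha() and before not in VOWELS and after in VOWELS:
            candidates.append(word[:i] + ".")
    return candidates


print(possible_abbreviations("kirjoittanut"))
```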

In contrast, international identifiers of units, quantities, etc., are written without periods, e.g. min = minuutti, h = tunti (hour). Following this principle, abbreviations of units (based on Finnish words, not on international conventions) are also written without periods, e.g. t = tunti, tlk = tölkki (can).

It has often been proposed that periods be omitted from abbreviations, or most of them. This has not been accepted by language authorities, but it is an established practice in the military, where e.g. kenr is used for kenraali (general), instead of the standard abbreviation kenr. with a period.

An abbreviation normally represents only the stem of a word, and if the context requires an inflected form, the suffix is appended so that the period is replaced by a colon “:”. Thus, for example, the abbreviation s. is commonly used for sivu (page), but the genitive sivun, when abbreviated, must be written as s:n.
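In software that generates such forms, the notation itself is simple to produce once the correct suffix is known. The following Python sketch (the function name inflect_abbreviation is hypothetical) only handles the notation; choosing the suffix still requires knowledge of the word that the abbreviation stands for.

```python
def inflect_abbreviation(abbreviation: str, suffix: str) -> str:
    """Attach an inflection suffix to an abbreviation using a colon."""
    # A trailing period, as in "s.", is replaced by the colon, not kept.
    return abbreviation.rstrip(".") + ":" + suffix


print(inflect_abbreviation("s.", "n"))
print(inflect_abbreviation("EU", "ssa"))
```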

There is also a set of common abbreviations that differ from the simple principle. Among them, a few abbreviations of sequences of two or more words are important, such as jne. = ja niin edelleen (and so on), ym. = ynnä muut/muita (and others), and mm. = muun muassa (among other things). When reading texts aloud informally, and sometimes in normal speech, these abbreviations are often spoken by letters, e.g. jii änn ee and yy äm.

Many modern concepts are denoted by initialisms, i.e. by expressions that take the first letters of two or more words and use them, often without periods. In this context, “word” often means a component of a compound word, too. However, notations vary, and e.g. ALV, alv, and alv. are all used and accepted for arvonlisävero (value-added tax, VAT). The trend is to use all lowercase, without any periods (e.g. alv).

There is a large and authoritative list of abbreviations and identifiers used in Finnish, Lyhenneluettelo by Kotus. However, it cannot cover all abbreviations, and lists like this tend to be partly outdated (they contain abbreviations that are no longer in use or are used in specialized texts only).

As in other languages, abbreviations are avoided in formal writing. Newspapers and magazines may apply the same principle, mainly for readability, except in technical and scientific texts, where identifiers and abbreviations are often unavoidable.

Capitalization

In the use of capital letters, Finnish generally follows continental European (e.g. French) tradition rather than English practice. This means that normally only the first letter of a sentence (or a sentence-like separate expression) and the first letter of each proper noun is in upper case. Derivations of proper names, such as englanti (English language) and englantilainen (English or Englishman or Englishwoman), are not treated as proper names.

Capitalizing almost every word in a title of a work, which is common in English (e.g., “On the Origin of Species”), is definitely incorrect in Finnish. Capitalizing words for emphasis, as in “Very Important” (Hyvin Tärkeää) is not normal in Finnish and may make a very childish impression.

If text is written in all upper case, care should be taken to make sure that ä and ö are capitalized, too.

For business documents, it is a common requirement that some words be written in all upper case. Typically, the words are company or product names or terms used in a contract, such as COMPANY and CUSTOMER. Such style has traditionally not been used in Finnish, and language authorities recommend against it, but it has become increasingly common.

Collation and sorting

The standard alphabetic order in Finnish is A B C D E F G H I J K L M N O P Q R S (Š) T U V (W) X Y Z (Ž) Å Ä Ö. Letters in parentheses are treated as equivalent to the preceding letter. However, it is increasingly common and now standard to treat W as a letter of its own, placed after V.

Sorting algorithms designed for English do not sort Finnish words correctly, since they treat Å, Ä, and Ö as variants of A and O, rather than as separate letters at the end of the alphabet. This may require careful manual corrections. On the other hand, sorting tailored for Finnish often treats W as a variant of V, instead of applying the modern approach.
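When no collation tailored for Finnish is available, the order described above can be approximated with a custom sort key. The following Python sketch is a simplification under stated assumptions: the names ALPHABET, EQUIVALENT, and finnish_sort_key are hypothetical, W is treated as a letter of its own (the modern approach), Š and Ž are treated as equivalent to S and Z, and any other character (digits, spaces, other accented letters) is simply placed after Ö.

```python
import unicodedata

# Finnish alphabetic order, with W as a separate letter after V.
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {letter: position for position, letter in enumerate(ALPHABET)}
# Letters treated as equivalent to the preceding letter.
EQUIVALENT = {"š": "s", "ž": "z"}


def finnish_sort_key(word: str) -> list[int]:
    """Return a key that orders words by the Finnish alphabet."""
    key = []
    # NFC makes sure that ä, ö, and å are single characters.
    for char in unicodedata.normalize("NFC", word).lower():
        char = EQUIVALENT.get(char, char)
        # Characters outside the alphabet sort after ö (a simplification).
        key.append(RANK.get(char, len(ALPHABET) + ord(char)))
    return key


names = ["Öljy", "Ahven", "Åbo", "Zorro", "Äiti", "Wien", "Vaasa", "Olut"]
print(sorted(names, key=finnish_sort_key))
```

A production system would normally rely on a collation library with Finnish tailoring instead; the sketch only makes the intended order concrete.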

Punctuation

Finnish uses symmetric quotation marks: ”tekstiä” and (within a quotation) ’tekstiä’. The opening and closing mark are identical and correspond to the closing mark as used in English, e.g. “text” or ‘text’.

Lack of he/she distinction

Finnish has no separate male or female pronoun. The same pronoun hän is used for both sexes. This may cause unintended ambiguity in translations. A common technique to avoid it is to use people’s names instead of pronouns when needed.

Titles of people

In titles used before names, herra corresponds to “Mr.”, rouva corresponds to “Mrs.”, and neiti corresponds to “Miss”. There is no word corresponding to “Ms.”, but the use of rouva and neiti is regarded as outdated by many. This is somewhat awkward, and it might be safest to use rouva for all adult women. However, such titles are often avoided in Finnish; a name might be used without a title. Alternatively, a title describing occupation, education, or position can be used, e.g. johtaja Virtanen (johtaja means director).

When prompting for personal information, if a title is needed, it is thus best read as free text input if possible. A menu with alternatives corresponding to Mr./Mrs./Miss may be regarded as old-fashioned, and the alternative “Ms.” is untranslatable.

Localization data

Nowadays the software industry makes extensive use of the localization data compiled in CLDR, the Common Locale Data Repository, cldr.unicode.org. It is based on a joint effort of interested parties and is directed by the Unicode Consortium.

To illustrate the idea, consider the localization of a computer program that contains a menu for selecting a country, among all countries of the world. Using CLDR, this can often be fully automated so that an extract of CLDR data is made available to the program, and it can then display any country name in any language included in CLDR. Even when this is not possible, it is surely simpler and more reliable to manually copy country names from CLDR into the program than to have someone try and find the names from various sources.
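The country menu idea can be sketched in Python as follows. The table COUNTRY_NAMES is a small stand-in typed by hand for illustration, not data read from CLDR files, and the function name country_menu is hypothetical; a real program would load the names from a CLDR extract or through a library that packages CLDR data.

```python
# Stand-in for an extract of localized country names, keyed by language
# code and then by two-letter country code.
COUNTRY_NAMES = {
    "en": {"DE": "Germany", "EE": "Estonia", "FI": "Finland", "SE": "Sweden"},
    "fi": {"DE": "Saksa", "EE": "Viro", "FI": "Suomi", "SE": "Ruotsi"},
}


def country_menu(language: str) -> list[tuple[str, str]]:
    """Return (country code, localized name) pairs ordered by name."""
    names = COUNTRY_NAMES[language]
    # Plain string order is enough for these sample names only; a real
    # menu needs collation that follows the rules of the language.
    return sorted(names.items(), key=lambda item: item[1])


for code, name in country_menu("fi"):
    print(code, name)
```

The program stores only the country code; the visible name is looked up per language, so the same selection logic serves every localized version.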

CLDR can help to handle many of the data presentation issues mentioned in section Notational conventions in Finnish of this book, among other things.

The extent and reliability of data in CLDR vary considerably by language. For Finnish, the data is extensive and generally reliable, and it has been compiled by the national Kotoistus activity funded by the Ministry of Education. In addition to the data available in CLDR in the defined database format, there are also some prose documents that describe relevant data, in Finnish, at kotoistus.fi/suositukset. They currently include recommended names for languages, writing systems, countries and geographic areas, and currencies.


© 2015, 2025, 2026 Jukka K. Korpela, jukkakk@gmail.com. This book was last updated February 18, 2026.