Finland Sets an Example in Localization

This article was published as the article of the month at the e.finland.fi site in March 2005.

In a worldwide effort to make IT systems more user-friendly through localization, the Finnish organizational model has been presented as exemplary. The goal is to reach a national consensus on how computers should present data such as dates and times, currency amounts, and country names in each user’s language. Major software companies participate in the activity, and they will implement the results rather soon.

Localized Software Makes Things Cosy

Information technology tends to use the English language and notational conventions of the English-speaking world, especially the United States. This has become an increasingly serious problem, as IT penetrates all areas of life and everyone has to face it, directly or indirectly.

This means, among other things, that we see computer-generated notations that deviate from our cultural heritage. We see “1.005” and have to find out, or even guess, whether it stands for a little more than 1, as in English notation, or one thousand five, as it traditionally means in Finnish. Even on official Web pages in Finnish we may encounter date notations like “3/4 2005” and need to guess whether it means 3rd of April or 4th of March.
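The ambiguity can be made concrete with a small Python sketch. The helper functions parse_number and parse_date are hypothetical names invented for this illustration and belong to no library; they simply read the same text under two different conventions.

```python
from datetime import date
from decimal import Decimal


def parse_number(text, decimal_sep, group_sep):
    """Read a number under one notation convention (hypothetical helper)."""
    # Drop the grouping separator, then normalize the decimal separator.
    return Decimal(text.replace(group_sep, "").replace(decimal_sep, "."))


def parse_date(text, day_first):
    """Read 'a/b yyyy' as day/month or as month/day (hypothetical helper)."""
    numbers, year = text.split()
    first, second = (int(part) for part in numbers.split("/"))
    day, month = (first, second) if day_first else (second, first)
    return date(int(year), month, day)


# The same characters, read under two conventions.
english_reading = parse_number("1.005", decimal_sep=".", group_sep=",")
finnish_reading = parse_number("1.005", decimal_sep=",", group_sep=".")

third_of_april = parse_date("3/4 2005", day_first=True)
fourth_of_march = parse_date("3/4 2005", day_first=False)
```

Nothing in the text itself tells a program, or a reader, which reading was intended. That information has to come from knowing which convention the writer used.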

Nowadays, both commercial and free software is often available in different language versions, which is often crucial to its popularity.

In some cases, the language of a program’s interface can even be changed “on the fly”, so that a simple command changes all texts from Finnish to Swedish, for example. This is important in a bilingual country with a large number of computers in public institutions, Internet cafés, schools, etc.

Online services, including search engines and e-commerce, need to be available in different languages as well. This is not just a matter of translating their fixed texts but also about presenting dates, prices, and other data extracted from databases, in the user’s language.

However, the quality of the translations has often been criticized, for good reasons. Part of this problem is that although the basic menus, instructions, and error messages have been translated, the computer-generated parts of the texts often use original English notations, e.g. for dates, or use incorrectly localized notations.

There’s often a simple explanation: existing subprograms for printing things out have mostly been written for an English-language environment, and programmers use these subprograms instead of writing a lot of code of their own. Moreover, different vendors have often used different translations for the same texts.

This is what the Common Locale Data Repository (CLDR) activity, by the worldwide Unicode Consortium, tries to address. It aims at agreed conventions for presenting certain types of data in a manner that corresponds to the user’s language preference, or cultural preferences in general. Ultimately, such data is to be built into basic computer software (such as widely used subprogram libraries) so that programmers will find it easy to create localized software.
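The idea of building locale data into shared subprograms can be sketched in Python as follows. This is only an illustration under stated assumptions: the LOCALE_DATA table is hand-written for this example and is not copied from CLDR, whose actual data format is different, and format_number and format_date are hypothetical names.

```python
from datetime import date

# Hand-written illustrative data (an assumption of this sketch, not CLDR data).
LOCALE_DATA = {
    "en_US": {"decimal": ".", "group": ",", "date": "{month}/{day}/{year}"},
    "fi": {"decimal": ",", "group": "\u00a0", "date": "{day}.{month}.{year}"},
}


def format_number(value, locale_id):
    """Format a number with two decimals using the locale's separators."""
    data = LOCALE_DATA[locale_id]
    text = f"{value:,.2f}"  # English-style notation, e.g. 1,005.50
    # Swap separators through a placeholder so the two replacements
    # cannot interfere with each other.
    text = text.replace(",", "\x00").replace(".", data["decimal"])
    return text.replace("\x00", data["group"])


def format_date(value, locale_id):
    """Format a date using the locale's field order and punctuation."""
    pattern = LOCALE_DATA[locale_id]["date"]
    return pattern.format(day=value.day, month=value.month, year=value.year)


finnish_price = format_number(1005.5, "fi")
american_price = format_number(1005.5, "en_US")
finnish_date = format_date(date(2005, 4, 3), "fi")
american_date = format_date(date(2005, 4, 3), "en_US")
```

The point of the design is that the calling program names only a locale. The separators and the field order live in shared data, so they can be agreed upon once and corrected in one place.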

This idea of localization is not new, but what’s new is that major software vendors and many other important organizations are seriously involved in the business. After all, if you are a software company that wants to compete in the international market by offering your products in different language versions, you don’t want to decide on names of countries or time zones in different languages, or how quotation marks are used in them, or similar things. You want to be able to use some existing data for it, prepared and agreed upon as widely as possible.

Major software vendors have allowed the use of their own localization data for comparison tables. This is an indication of their commitment, since it allows the public monitoring of the progress that the vendors make in adopting the data.

From the ordinary computer user’s point of view, this means that we can expect software to use notations and names that are more familiar and acceptable to us. Moreover, the user may get more control over such presentation issues. In Finnish, localization is increasingly called “kotoistus” to express this idea. This interesting word is derived from “kotoinen” ‘cosy, homely, familiar’ and means ‘making things (more) homely’. In a sense, this is about the domestication of computers.

Localization Needs to Be Agreed Upon

In the work conducted by the Unicode Consortium, the crucial question is how to produce localization definitions for each language so that they are widely accepted by people who speak that language. After all, localization is about making things familiar to people.

The definitions being prepared are mainly intended for use in computer systems, e.g. in selection menus, or in computer-generated texts such as error messages or database search results. However, as a side effect, the results will be useful in other areas as well. In particular, people who translate or edit texts will find lists of names for languages and territories handy, especially when they deal with lesser-known languages and areas. The very existence of a large database on matters such as varying number and currency notation formats will raise awareness of cultural differences and the need to deal with them in information technology.

In Finland, this work has been organized as a process that is open to all but not loose. Several key organizations have been especially invited to join the work and to create a coordinating group. Moreover, this work is carried out under the umbrella of the Research Institute for the Languages of Finland (RILF, Kotus), with special funding from the Ministry of Education. Yet, the working groups are open to all interested people, and the proposals created by them are published on the WWW for public review and comments. The goal is to get interested experts involved, to create a national consensus that everyone can be happy with, and to do this in reasonable time.

The first proposals were published for review in early February. They contain Finnish names for all countries and territories that have an assigned international code, as well as for all human languages that have such a code. Later, other proposals have been added, e.g. on date and time notations.

The Finnish approach appears to work well, and the Unicode Consortium has held it up as a model in its presentations of the worldwide activity.

Jukka K. Korpela
IT Generalist and Specialist
jkorpela@cs.tut.fi