The need for a unified approach to different forms of verbal communication becomes increasingly important as the low-level technical facilities, such as communication line speeds and coverage, are continuously improved. On the other hand, an IAL must be designed so that the needs of different modes of communication are taken into account. This means, in particular, that it must be possible to transform speech to text and vice versa with fast algorithms. In general, the tractability of a language by computer programs is usually compatible with the ease of learning it by human beings, and the differences between the human and the computer point of view are mainly esthetic.
This article discusses the impact of computers and computer-aided communication on international artificial language design. Ultimately, it aims at a truly unified language to be used between us humans, between us and computers, and between computers. There is nothing inherently unrealistic in this approach. On the contrary, the multitude of languages is becoming a serious restriction on the advancement of technology. If each new computer application is controlled using a different language, often clumsily designed, the advancement of technology and economics is seriously hampered.
Language is a thing which is intimately connected with the deepest feelings and motives of human beings. Real international languages, used routinely in communication, such as Greek, Latin, French, Russian, and English, each in their own time and area, have gained their position through some form of imperialism or at least economic or political dominance. On the other hand, the language of previous colonialists might be regarded both as "natural", due to historical reasons, and as neutral, due to its not being the language of any of the competing tribes.
In addition to the varying political and social constellations, the position of a widely used language is strengthened by the human tendency to use whatever language "everybody else" uses. A language needs some sort of critical mass to survive, and a still larger critical mass to conquer the world. In short, people study languages because they expect other people to use them. This fact favors commonly used languages, and a designed language is initially not used at all.
It is well-known that the expanding use of computers and networking tends to reinforce the position of the English language. And one of the well-known problems related to this is that people with English as their native language use it naturally, with all the richness of idioms and phrases and delicate semantic differences. On the other hand, other people may unconsciously use the descriptiveness of their own native language by just changing the words into English, resulting in something which hardly anyone can understand correctly. This causes a lot of problems, but there is very little we can do about it. We can hardly expect suggestions like Basic English to gain much acceptance. (On the other hand, in an artificial language subsetting would meet much less resistance.)
Thus there seems to be very little space left for an IAL. The picture changes, however, when we consider how human communication is changing and how it becomes intimately connected with communication with computers.
Therefore, languages in a very broad sense are gaining importance. This applies to programming languages and command languages, used by humans to control computers, and to the various protocols designed by humans for communication between computers or computer software elements.
Consider, for example, the World Wide Web. Beneath the level of the various human languages used in Web documents - a very problematic area, by the way - several other languages are involved:
HTML markup:      <H1 ALIGN=CENTER>
HTTP headers:     Content-Type: application/octet-stream
CSS style rules:  H1 { color: blue }
Some specialists may say that HTTP and in particular TCP/IP are not languages but protocols. But any such protocol must be implemented using a language, and it is mostly just a theoretical question whether there is a difference between a protocol and a language. Admittedly, TCP/IP is not human-readable in the normal sense, and there are good efficiency reasons for this. On the other hand, we could define a canonical human-readable presentation of TCP/IP if needed. And to an increasing extent, the languages (protocols) used in communication between computer programs resemble simple human languages, partly because this makes occasional human monitoring easier.
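As a minimal sketch of what such a canonical presentation might look like - written here in Python, covering only the fixed 20-byte TCP header and using freely chosen field names - consider:

    import struct

    def render_tcp_header(raw):
        """Render the fixed 20-byte TCP header in a human-readable form."""
        src, dst, seq, ack, off_flags, window, checksum, urgent = \
            struct.unpack("!HHIIHHHH", raw[:20])
        # The six standard flag bits occupy the low bits of the offset/flags field.
        flag_names = ["URG", "ACK", "PSH", "RST", "SYN", "FIN"]
        flags = [name for i, name in enumerate(flag_names)
                 if off_flags & (1 << (5 - i))]
        return (f"source-port {src}; destination-port {dst}; sequence {seq}; "
                f"acknowledgment {ack}; flags {' '.join(flags) or 'none'}; "
                f"window {window}")

    # Example: a SYN segment from port 1025 to port 80.
    header = struct.pack("!HHIIHHHH", 1025, 80, 1, 0, (5 << 12) | 0x02, 8192, 0, 0)
    print(render_tcp_header(header))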
The languages used for communication between programs or between programs and users have typically been designed on an ad hoc basis, to suit the particular application or task. Consequently, each of them has to be learned separately. To take a trivial example, the very basic and simple expression for ending the use of a program might be exit in one program and bye, end, stop, finish, fin, monitor, mon, system, sys, or quit in some others.
If a language is used infrequently, the user tends to make a lot of mistakes even if he has once learned the language well. In particular, one tends to use expressions of a more frequently used language. This is the phenomenon which makes the small differences between languages so harmful. A programmer who is used to writing programs in a language where the equals sign (=) denotes comparison for equality can make serious errors when he occasionally uses a language where it means assignment.
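A small illustration of this particular slip, sketched in Python for concreteness (the variable name and messages are arbitrary):

    # In Python, "=" assigns and "==" compares; in Pascal, "=" compares and
    # assignment is written ":=".
    balance = 0               # assignment
    if balance == 0:          # comparison for equality
        print("the account is empty")

    # A programmer used to "=" as comparison might write the C-style slip
    #     if (balance = 0) ...
    # which in C compiles and silently assigns. Python happens to reject it
    # as a syntax error, but not every language protects the user this way.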
Some languages are, by their very purpose, used rather infrequently. For instance, the increase of junk messages on the Internet has made many people use E-mail filters. This typically means using programs like procmail with a powerful, compact control language. Probably a user does not change his E-mail filter very often. So when he decides to modify it, he has to switch to a cryptic language, perhaps making a trivial mistake which automatically deletes some messages instead of processing them as very important and urgent. It should be clear that a language used for operations like E-mail filtering should be natural-looking and redundant and well-known to the user. The only way to guarantee the familiarity is to have a language which is used for many other purposes as well, preferably daily. (Whether the filtering instructions are internally converted into a more compact notation for efficiency is a different thing.)
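To make the contrast concrete, here is a Python sketch using an entirely hypothetical, deliberately verbose rule format (this is not procmail syntax), together with the kind of internal conversion hinted at above:

    import re

    # A sentence-like, redundant rule format (hypothetical).
    RULE_PATTERN = re.compile(r"if the header (\w+) contains '([^']*)' then (.+)")

    RULES = [
        "if the header Subject contains 'MAKE MONEY FAST' then file it under junk",
        "if the header From contains 'mailing-list' then file it under lists",
    ]

    def apply_rules(headers):
        """Return the action of the first matching rule, or a safe default."""
        for rule in RULES:
            field, needle, action = RULE_PATTERN.match(rule).groups()
            if needle in headers.get(field, ""):
                return action
        return "leave it in the inbox"   # safe default: never delete silently

    print(apply_rules({"Subject": "MAKE MONEY FAST!!!", "From": "someone@example.com"}))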
Evidently, it would increase the productivity of people and ease the design of new software if there were a common base language. Defining a new protocol could then be done simply by selecting a suitable (perhaps very limited) subset of that language and giving some specific semantic rules for interpreting some of its elements.
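A toy Python sketch of this idea; the base vocabulary, the selected protocol verbs, and the attached semantic rules are all hypothetical:

    # The protocol designer selects permitted verbs from a common base
    # vocabulary and attaches a semantic rule (here a function) to each.
    BASE_VOCABULARY = {"open", "close", "send", "receive", "status", "connection"}

    PROTOCOL_SUBSET = {
        "open":   lambda arg: f"opening {arg}",
        "close":  lambda arg: f"closing {arg}",
        "status": lambda arg: f"status of {arg}: ok",
    }

    def interpret(message):
        verb, _, rest = message.partition(" ")
        if verb not in BASE_VOCABULARY:
            return f"'{verb}' is not even a word of the base language"
        rule = PROTOCOL_SUBSET.get(verb)
        if rule is None:
            return f"'{verb}' is valid in the base language but not in this protocol"
        return rule(rest)

    print(interpret("open connection"))    # opening connection
    print(interpret("send connection"))    # valid base word, not in this protocol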
Just as there is a very large number of artificial languages suggested for human communication, there is a large number of programming languages, and only a small minority of them have gained significant popularity. Since programming languages may have very different areas of application and might have fundamentally different design criteria (e.g. simple interpreted languages versus languages to be processed by highly optimizing and parallelizing compilers), it is understandable that there are so many of them. They might be compared to the protocol languages mentioned above. But programming languages also have irritating differences in details like the style of declaring variables, the various symbols used for an assignment operator, and different lexical rules for identifiers. By removing unnecessary notational variation we could make it easier to define and learn new languages and especially to switch between different languages as needed. This might perhaps be done in the framework of a common base language.
Thus, if any IAL is to gain enough popularity even in the circles of IAL enthusiasts, there must be some force which is strong enough to dictate a solution to the problem of criteria. The solution in sight is the idea of a language for both machines and human beings; this seems to be the only way of reaching wide adoption of an IAL.
The regularity principle would not be so important in "high-end" applications involving programs which handle the full language, since in them grammatical irregularities would be a small problem compared with others. But it is important in "low-end" applications involving small restricted languages which must be processed using small resources only.
Thus, a universal language should have some basic formal languages embedded. Such formalisms would normally appear in written form only. On the other hand, it would be important for automatic processing (and useful to human readers, too) to indicate a switch from normal language to a formalism and vice versa. More generally, the language should incorporate a metalanguage for expressing a switch from one language to another. Thus, for example, the beginning of a quotation from a natural language would imply an explicit specification of that language and an indication of the way in which the end of the quotation is marked. Such notations would be very useful for relatively simple tools for language processing, too, such as hyphenation software and spelling checkers.
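As a sketch of how such markings could be exploited, assume a purely hypothetical inline notation in which a quotation opens with {quote lang=XX end=MARK} and runs until MARK appears; a language-aware tool could then split a text as follows (Python, no nesting handled):

    import re

    OPEN = re.compile(r"\{quote lang=(\w+) end=(\S+)\}")

    def split_languages(text, default_lang="ial"):
        """Yield (language, fragment) pairs for a text with marked switches."""
        pos = 0
        for m in OPEN.finditer(text):
            if m.start() > pos:
                yield default_lang, text[pos:m.start()]
            lang, end_mark = m.groups()
            end = text.index(end_mark, m.end())
            yield lang, text[m.end():end]
            pos = end + len(end_mark)
        if pos < len(text):
            yield default_lang, text[pos:]

    sample = "He greeted us with {quote lang=la end=##}carpe diem## and went on."
    for lang, fragment in split_languages(sample):
        print(lang, repr(fragment))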
It should be clear from the form of a statement, without deep grammatical analysis, what its modality is. To take a trivial but practically important example, computer users very often get upset by messages from computer programs because they cannot distinguish severe error messages from purely informational notes. Some systems have their own conventions (like preceding all error messages by the ? character and warnings by the % character), but such private "standards" are not very useful to occasional users, and even experienced users have difficulties in analyzing which program has issued the message.
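A sketch of the alternative: messages that carry their modality explicitly as the first word rather than through private prefix conventions. The modality names and sample messages below are made up (Python):

    MODALITIES = {"error", "warning", "note", "question", "request"}

    def modality_of(message):
        """Classify a message by its explicit leading modality marker."""
        first_word = message.split(maxsplit=1)[0].rstrip(":").lower()
        return first_word if first_word in MODALITIES else "unknown"

    for msg in ("Error: disk full, the file was not saved",
                "Note: 3 messages were filed under junk",
                "?SYNTAX ERROR"):          # old-style private convention
        print(modality_of(msg), "->", msg)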
An explicit indication of modality is very useful in human communication, too, especially when there are cultural differences involved. Even if you understand a foreign language, you may not know the delicate ways in which it is used e.g. to express requests using sentences which look purely indicative. In international contexts, it would be very useful to have a language in which modalities can and must always be clearly expressed. (This does not exclude the possibility of having a way of specifying a "global default" for modality, so that one can present e.g. a sequence of indicative statements without including a modality marker in each sentence.)
In human communication, such "protocol level" requests very often fail, partly because they do not belong to the system of the language. In communication between computers, there are techniques for negotiating a protocol and sending and processing protocol level messages. To take a very simple example, in some communications protocols a slow device may send an X-OFF character to request suspension of sending and an X-ON character to tell that sending can be resumed. Similarly, there are protocols for requesting resending in the case of transmission errors - something that we should always be prepared for in any communication. The existence of such methods does not make redundancy unnecessary, of course, since they can basically deal with detected errors, not detect errors. Moreover, a robust language should have well-defined error recovery points which allow processing to continue in some meaningful way in spite of previous errors which cannot be resolved. (For example, a compiler for the Pascal programming language is written so that if serious errors are detected, input is skipped e.g. up to the next semicolon, at which point processing is resumed. This allows most of the program to be checked syntactically.)
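A minimal Python sketch of such a recovery point, with a deliberately toy "grammar" standing in for a real parser:

    def check_statements(source):
        """Check semicolon-separated statements, recovering after each error."""
        for raw in source.split(";"):
            statement = raw.strip()
            if not statement:
                continue
            try:
                parse(statement)
                print("ok:   ", statement)
            except SyntaxError as err:
                print("error:", statement, "--", err)
                # recovery point: simply continue with the next statement

    def parse(statement):
        # Toy grammar: every statement must have the form "<name> := <value>".
        if ":=" not in statement:
            raise SyntaxError("expected ':='")

    check_statements("x := 1; y = 2; z := 3")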
In real-time communication between people but using networked computers as tools, special indications and abbreviations are often used to denote e.g. the end of one person's statements for the moment ("over") and suggesting or accepting end of entire communication ("over and out"). Such indications would be extremely helpful in all communication, especially in international contexts where things like delicate choice of expressions or tones of voice cannot be used reliably to deduce such things.
Ideally, a protocol level statement should be easily distinguishable from normal statements by its form, to allow adequate and fast processing of protocol level requests. Normally the first word (or morpheme) of a message should indicate its role in this sense, but for human communication something even more distinctive might be needed, such as the appearance of a sound which does not occur in the language otherwise.
Extension mechanisms are also needed for defining special abbreviations and phrases for use in some restricted area of communication. Such special glossaries would normally be made publicly available, and normal communication would begin with "headings" which explicitly refer to such glossaries. For example, an article would begin with "headings" (in the protocol sense) specifying the glossaries assumed, in a specific order. This would solve the frequent problem of abbreviations (and other terms) being ambiguous or practically undefined, since the user may not have any idea of where to look for definitions.
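A Python sketch of how such headings could be used to resolve a term; the glossary names and entries are invented, though the clash itself is real (TTL means different things in networking and in electronics):

    # The document declares, in order, the public glossaries it assumes;
    # an abbreviation is resolved against them in that order.
    GLOSSARIES = {
        "networking-terms-v1":  {"IP": "Internet Protocol", "TTL": "time to live"},
        "electronics-terms-v2": {"TTL": "transistor-transistor logic"},
    }

    def resolve(abbrev, declared_glossaries):
        for name in declared_glossaries:          # first declared glossary wins
            entry = GLOSSARIES.get(name, {}).get(abbrev)
            if entry:
                return f"{abbrev} = {entry} (per {name})"
        return f"{abbrev} is undefined in the declared glossaries"

    headings = ["networking-terms-v1", "electronics-terms-v2"]
    print(resolve("TTL", headings))    # time to live, because of the declared order
    print(resolve("CPU", headings))    # undefined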
As a simple special case of an extension mechanism, the language should have a method for literal borrowing of names and other words from other languages. Such borrowings should be reserved for casual use, and the language of origin should be indicated. For more permanent use, such as for commonly used terms, extension mechanisms internal to the language should be used.
Jukka Korpela
June 17th, 1997