The need for a unified approach to different forms of verbal communication becomes increasingly important as the low-level technical facilities, such as communication line speeds and coverage, are continuously improved. On the other hand, an IAL must be designed so that the needs of different modes of communication are taken into account. This means, in particular, that it must be possible to transform speech to text and vice versa with fast algorithms. In general, the tractability of a language by computer programs is usually compatible with the ease of learning it by human beings, and the differences between the human and the computer point of view are mainly esthetic.
This article discusses the impact of computers and computer-aided communication on international artificial language design. Ultimately, it aims at a truly unified language to be used between us humans, between us and computers, and between computers. There is nothing inherently unrealistic in this approach. On the contrary, the multitude of languages is becoming a serious restriction on the advancement of technology. If each new computer application is controlled using a different language, often clumsily designed, the advancement of technology and economics is seriously hampered.
Language is a thing which is intimately connected with the deepest feelings and motives of human beings. Real international languages, used routinely in communication, such as Greek, Latin, French, Russian, and English, each in their own time and area, have gained their position through some form of imperialism or at least economic or political dominance. On the other hand, the language of previous colonialists might be regarded both as "natural", due to historical reasons, and as neutral, due to its not being the language of any of the competing tribes.
In addition to the varying political and social constellations, the position of a widely used language is strengthened by the human tendency to use whatever language "everybody else" uses. A language needs some sort of critical mass to survive, and a still larger critical mass to conquer the world. In short, people study languages because they expect other people to use them. This fact favors commonly used languages, and a designed language is initially not used at all.
It is well-known that the expanding use of computers and networking tends to reinforce the position of the English language. And one of the well-known problems related to this is that people with English as their native language use it naturally, with all the richness of idioms and phrases and delicate semantic differences. On the other hand, other people may unconsciously use the descriptiveness of their own native language by just changing the words into English, resulting in something which hardly anyone can understand correctly. This causes a lot of problems, but there is very little we can do about it. We can hardly expect suggestions like Basic English to gain much acceptance. (On the other hand, in an artificial language subsetting would meet much less resistance.)
Thus there seems to be very little space left for an IAL. The picture changes, however, when we consider how human communication is changing and how it becomes intimately connected with communication with computers.
Therefore, languages in a very broad sense are gaining importance. This applies to programming languages and command languages, used by humans to control computers, and to the various protocols designed by humans for communication between computers or computer software elements.
Consider, for example, the World Wide Web. Beneath the level of the various human languages used in Web documents - a very problematic area, by the way - several other languages are involved:
HTML markup:      <H1 ALIGN=CENTER>
HTTP headers:     Content-Type: application/octet-stream
CSS style rules:  H1 { color: blue }
Some specialists may say that HTTP and in particular TCP/IP are not languages but protocols. But any such protocol must be implemented using a language, and it is mostly just a theoretical question whether there is a difference between a protocol and a language. Admittedly, TCP/IP is not human-readable in the normal sense, and there are good efficiency reasons for this. On the other hand, we could define a canonical human-readable presentation of TCP/IP if needed. And to an increasing extent, the languages (protocols) used in communication between computer programs resemble simple human languages, partly because this makes occasional human monitoring easier.
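As a minimal sketch of what such a canonical presentation might look like - written here in Python, covering only the fixed 20-byte TCP header and using freely chosen field names - consider:

    import struct

    def render_tcp_header(raw):
        """Render the fixed 20-byte TCP header in a human-readable form."""
        src, dst, seq, ack, off_flags, window, checksum, urgent = \
            struct.unpack("!HHIIHHHH", raw[:20])
        # The six standard flag bits occupy the low bits of the offset/flags field.
        flag_names = ["URG", "ACK", "PSH", "RST", "SYN", "FIN"]
        flags = [name for i, name in enumerate(flag_names)
                 if off_flags & (1 << (5 - i))]
        return (f"source-port {src}; destination-port {dst}; sequence {seq}; "
                f"acknowledgment {ack}; flags {' '.join(flags) or 'none'}; "
                f"window {window}")

    # Example: a SYN segment from port 1025 to port 80.
    header = struct.pack("!HHIIHHHH", 1025, 80, 1, 0, (5 << 12) | 0x02, 8192, 0, 0)
    print(render_tcp_header(header))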
The languages used for communication between programs or between programs and users have typically been designed on an ad hoc basis, to suit the particular application or task. Consequently, each of them has to be learned separately. To take a trivial example, the very basic and simple expression for ending the use of a program might be exit in one program and bye, end, stop, finish, fin, monitor, mon, system, sys, or quit in some others.
If a language is used infrequently, the user tends to make a lot of mistakes even if he has once learned the language well. In particular, one tends to use expressions of a more frequently used language. This is the phenomenon which makes the small differences between languages so harmful. A programmer who is used to writing programs in a language where the equals sign (=) denotes comparison for equality can make serious errors when he occasionally uses a language where it means assignment.
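A small illustration of this particular slip, sketched in Python for concreteness (the variable name and messages are arbitrary):

    # In Python, "=" assigns and "==" compares; in Pascal, "=" compares and
    # assignment is written ":=".
    balance = 0               # assignment
    if balance == 0:          # comparison for equality
        print("the account is empty")

    # A programmer used to "=" as comparison might write the C-style slip
    #     if (balance = 0) ...
    # which in C compiles and silently assigns. Python happens to reject it
    # as a syntax error, but not every language protects the user this way.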
Some languages are, by their very purpose, used rather infrequently. For instance, the increase of junk messages on the Internet has made many people use E-mail filters. This typically means using programs like procmail with a powerful, compact control language. Probably a user does not change his E-mail filter very often. So when he decides to modify it, he has to switch to a cryptic language, perhaps making a trivial mistake which automatically deletes some messages instead of processing them as very important and urgent. It should be clear that a language used for operations like E-mail filtering should be natural-looking and redundant and well-known to the user. The only way to guarantee the familiarity is to have a language which is used for many other purposes as well, preferably daily. (Whether the filtering instructions are internally converted into a more compact notation for efficiency is a different thing.)
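To make the contrast concrete, here is a Python sketch using an entirely hypothetical, deliberately verbose rule format (this is not procmail syntax), together with the kind of internal conversion hinted at above:

    import re

    # A sentence-like, redundant rule format (hypothetical).
    RULE_PATTERN = re.compile(r"if the header (\w+) contains '([^']*)' then (.+)")

    RULES = [
        "if the header Subject contains 'MAKE MONEY FAST' then file it under junk",
        "if the header From contains 'mailing-list' then file it under lists",
    ]

    def apply_rules(headers):
        """Return the action of the first matching rule, or a safe default."""
        for rule in RULES:
            field, needle, action = RULE_PATTERN.match(rule).groups()
            if needle in headers.get(field, ""):
                return action
        return "leave it in the inbox"   # safe default: never delete silently

    print(apply_rules({"Subject": "MAKE MONEY FAST!!!", "From": "someone@example.com"}))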
Evidently, it would increase the productivity of people and ease the design of new software if there were a common base language. Defining a new protocol could then be done simply by selecting a suitable (perhaps very limited) subset of that language and giving some specific semantic rules for interpreting some of its elements.
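A toy Python sketch of this idea; the base vocabulary, the selected protocol verbs, and the attached semantic rules are all hypothetical:

    # The protocol designer selects permitted verbs from a common base
    # vocabulary and attaches a semantic rule (here a function) to each.
    BASE_VOCABULARY = {"open", "close", "send", "receive", "status", "connection"}

    PROTOCOL_SUBSET = {
        "open":   lambda arg: f"opening {arg}",
        "close":  lambda arg: f"closing {arg}",
        "status": lambda arg: f"status of {arg}: ok",
    }

    def interpret(message):
        verb, _, rest = message.partition(" ")
        if verb not in BASE_VOCABULARY:
            return f"'{verb}' is not even a word of the base language"
        rule = PROTOCOL_SUBSET.get(verb)
        if rule is None:
            return f"'{verb}' is valid in the base language but not in this protocol"
        return rule(rest)

    print(interpret("open connection"))    # opening connection
    print(interpret("send connection"))    # valid base word, not in this protocol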
Just as there is a very large number of artificial languages suggested for human communication, there is a large number of programming languages, and only a small minority of them have gained significant popularity. Since programming languages may have very different areas of application and might have fundamentally different design criteria (e.g. simple interpreted languages versus languages to be processed by highly optimizing and parallelizing compilers), it is understandable that there are so many of them. They might be compared to the protocol languages mentioned above. But programming languages also have irritating differences in details like the style of declaring variables, the various symbols used for an assignment operator, and different lexical rules for identifiers. By removing unnecessary notational variation we could make it easier to define and learn new languages and especially to switch between different languages as needed. This might perhaps be done in the framework of a common base language.
Thus, if any IAL is to gain enough popularity even in the circles of IAL enthusiasts, there must be some force which is strong enough to dictate a solution to the problem of criteria. The solution in sight is the idea of a language for both machines and human beings; this seems to be the only way of reaching wide adoption of an IAL.
The regularity principle would not be so important in "high-end" applications involving programs which handle the full language, since in them grammatical irregularities would be a small problem compared with others. But it is important in "low-end" applications involving small restricted languages which must be processed using small resources only.
Thus, a universal language should have some basic formal languages embedded. Such formalisms would normally appear in written form only. On the other hand, it would be important for automatic processing (and useful to human readers, too) to indicate a switch from normal language to a formalism and vice versa. More generally, the language should incorporate a metalanguage for expressing a switch from one language to another. Thus, for example, the beginning of a quotation from a natural language would imply an explicit specification of that language and an indication of the way in which the end of the quotation is marked. Such notations would be very useful for relatively simple tools for language processing, too, such as hyphenation software and spelling checkers.
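As a sketch of how such markings could be exploited, assume a purely hypothetical inline notation in which a quotation opens with {quote lang=XX end=MARK} and runs until MARK appears; a language-aware tool could then split a text as follows (Python, no nesting handled):

    import re

    OPEN = re.compile(r"\{quote lang=(\w+) end=(\S+)\}")

    def split_languages(text, default_lang="ial"):
        """Yield (language, fragment) pairs for a text with marked switches."""
        pos = 0
        for m in OPEN.finditer(text):
            if m.start() > pos:
                yield default_lang, text[pos:m.start()]
            lang, end_mark = m.groups()
            end = text.index(end_mark, m.end())
            yield lang, text[m.end():end]
            pos = end + len(end_mark)
        if pos < len(text):
            yield default_lang, text[pos:]

    sample = "He greeted us with {quote lang=la end=##}carpe diem## and went on."
    for lang, fragment in split_languages(sample):
        print(lang, repr(fragment))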
It should be clear from the form of a statement, without deep grammatical analysis, what its modality is. To take a trivial but practically important example, computer users very often get upset by messages from computer programs because they cannot distinguish severe error messages from purely informational notes. Some systems have their own conventions (like preceding all error messages by the ? character and warnings by the % character), but such private "standards" are not very useful to occasional users, and even experienced users have difficulties in analyzing which program has issued the message.
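A sketch of the alternative: messages that carry their modality explicitly as the first word rather than through private prefix conventions. The modality names and sample messages below are made up (Python):

    MODALITIES = {"error", "warning", "note", "question", "request"}

    def modality_of(message):
        """Classify a message by its explicit leading modality marker."""
        first_word = message.split(maxsplit=1)[0].rstrip(":").lower()
        return first_word if first_word in MODALITIES else "unknown"

    for msg in ("Error: disk full, the file was not saved",
                "Note: 3 messages were filed under junk",
                "?SYNTAX ERROR"):          # old-style private convention
        print(modality_of(msg), "->", msg)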
An explicit indication of modality is very useful in human communication, too, especially when there are cultural differences involved. Even if you understand a foreign language, you may not know the delicate ways in which it is used e.g. to express requests using sentences which look purely indicative. In international contexts, it would be very useful to have a language in which modalities can and must always be clearly expressed. (This does not exclude the possibility of having a way of specifying a "global default" for modality, so that one can present e.g. a sequence of indicative statements without including a modality marker in each sentence.)
In human communication, such "protocol level" requests very often fail, partly because they do not belong to the system of the language. In communication between computers, there are techniques for negotiating a protocol and sending and processing protocol level messages. To take a very simple example, in some communications protocols a slow device may send an X-OFF character to request suspension of sending and an X-ON character to tell that sending can be resumed. Similarly, there are protocols for requesting resending in the case of transmission errors - something that we should always be prepared for in any communication. The existence of such methods does not make redundancy unnecessary, of course, since they can basically deal with detected errors, not detect errors. Moreover, a robust language should have well-defined error recovery points which allow processing to continue in some meaningful way in spite of previous errors which cannot be resolved. (For example, a compiler for the Pascal programming language is written so that if serious errors are detected, input is skipped e.g. up to the next semicolon, at which point processing is resumed. This allows most of the program to be checked syntactically.)
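A minimal Python sketch of such a recovery point, with a deliberately toy "grammar" standing in for a real parser:

    def check_statements(source):
        """Check semicolon-separated statements, recovering after each error."""
        for raw in source.split(";"):
            statement = raw.strip()
            if not statement:
                continue
            try:
                parse(statement)
                print("ok:   ", statement)
            except SyntaxError as err:
                print("error:", statement, "--", err)
                # recovery point: simply continue with the next statement

    def parse(statement):
        # Toy grammar: every statement must have the form "<name> := <value>".
        if ":=" not in statement:
            raise SyntaxError("expected ':='")

    check_statements("x := 1; y = 2; z := 3")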
In real-time communication between people but using networked computers as tools, special indications and abbreviations are often used to denote e.g. the end of one person's statements for the moment ("over") and suggesting or accepting end of entire communication ("over and out"). Such indications would be extremely helpful in all communication, especially in international contexts where things like delicate choice of expressions or tones of voice cannot be used reliably to deduce such things.
Ideally, a protocol level statement should be easily distinguishable from normal statements by its form, to allow adequate and fast processing of protocol level requests. Normally the first word (or morpheme) of a message should indicate its role in this sense, but for human communication something even more distinctive might be needed, such as the appearance of a sound which does not occur in the language otherwise.
Extension mechanisms are also needed for defining special abbreviations and phrases for use in some restricted area of communication. Such special glossaries would normally be made publicly available, and normal communication would begin with "headings" which explicitly refer to such glossaries. For example, an article would begin with "headings" (in the protocol sense) specifying the glossaries assumed, in a specific order. This would solve the frequent problem of abbreviations (and other terms) being ambiguous or practically undefined, since the user may not have any idea of where to look for definitions.
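A Python sketch of how such headings could be used to resolve a term; the glossary names and entries are invented, though the clash itself is real (TTL means different things in networking and in electronics):

    # The document declares, in order, the public glossaries it assumes;
    # an abbreviation is resolved against them in that order.
    GLOSSARIES = {
        "networking-terms-v1":  {"IP": "Internet Protocol", "TTL": "time to live"},
        "electronics-terms-v2": {"TTL": "transistor-transistor logic"},
    }

    def resolve(abbrev, declared_glossaries):
        for name in declared_glossaries:          # first declared glossary wins
            entry = GLOSSARIES.get(name, {}).get(abbrev)
            if entry:
                return f"{abbrev} = {entry} (per {name})"
        return f"{abbrev} is undefined in the declared glossaries"

    headings = ["networking-terms-v1", "electronics-terms-v2"]
    print(resolve("TTL", headings))    # time to live, because of the declared order
    print(resolve("CPU", headings))    # undefined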
As a simple special case of an extension mechanism, the language should have a method for literal borrowing of names and other words from other languages. Such borrowings should be reserved for casual use, and the language of origin should be indicated. For more permanent use, such as for commonly used terms, extension mechanisms internal to the language should be used.
Jukka Korpela
June 17th, 1997