Will Computers and the Internet make an IAL Necessary?
Abstract
The development and use of computers and the Internet, for
communication between people, between computers, and between
human beings and computers, have finally made it realistic
to create an international artificial language for common use.
The situation calls for a language which can serve a wide variety
of purposes, which often involve non-human agents as well.
The need for a unified approach to different forms of verbal communication
becomes more and more important as the low-level technical facilities,
such as communication line speeds and coverage,
are continuously improved. On the other hand, an IAL must be
designed so that the needs of different modes of communication
are taken into account. This means, in particular, that it must
be possible to transform speech to text and vice versa with
fast algorithms. In general, the tractability of a language to
computer programs is compatible with the ease with which human
beings can learn it; the differences between the human and the
computer point of view are mainly esthetic.
Preface
Traditionally, international artificial languages have been designed
for human communication, but they have gained little popularity.
The most popular of them, Esperanto, can count at most a few
million speakers - less than a thousandth of mankind. On the other
hand, there are several artificial languages which are actually used very
widely, often exclusively in some areas of communication. I am
referring to languages of man-machine communication, such as
command languages of computer operating systems, and languages
of machine-machine communication, such as the so-called protocols
used in the Internet.
This article discusses the impact of computers and computer-aided
communication on international artificial language design.
Ultimately, it aims at a truly unified language to be used between
us humans, between us and computers, and between computers. There is
nothing inherently unrealistic in this approach. On the contrary,
the multitude of languages is becoming a serious restriction on the
advancement of technology. If each new computer application is
controlled using a different, often clumsily designed language,
technological and economic progress is seriously hampered.
On men and machines
Why the "IAL movement" has failed
It is very easy to find good reasons for all mankind adopting a
neutral international language for communication between peoples.
Advocates of particular IALs often present such reasons excellently.
However, it is irrational to assume that people are rational, at
least in issues like this.
Language is intimately connected with the deepest
feelings and motives of human beings. Real international languages,
used routinely in communication, such as Greek, Latin, French, Russian,
and English, each in their own time and area, have gained their position
through some form of imperialism or at least economic or political dominance.
On the other hand, the language of previous colonialists
might be regarded both as "natural", due to
historical reasons, and as neutral, due to its not being the language
of any of the competing tribes.
In addition to the varying political and social constellations, the
position of a widely used language is strengthened by the human
tendency to use the language that "everybody else" uses. A language
needs a critical mass of some sort to survive, and a still larger
critical mass to conquer the world.
In short, people study languages because they expect other people
to use them. This fact favors commonly used languages, and a
designed language is initially not used at all.
It is well-known that the expanding use of computers and networking
tends to reinforce the position of the English language. And one of
the well-known problems related to this is that people with English
as their native language use it naturally, with all the richness of
idioms and phrases and delicate semantic differences. On the other
hand, other people may unconsciously follow the patterns of their
own native language, just changing the words into English,
resulting in something which hardly anyone can understand correctly.
This causes a lot of problems, but there is very little we can do about it.
We can hardly expect suggestions like Basic English to gain much acceptance.
(On the other hand, in an artificial language subsetting would meet
much less resistance.)
Thus there seems to be very little space left for an IAL.
The picture changes, however, when we consider how human
communication is changing and how it becomes intimately
connected with communication with computers.
The importance of taking computers into account
The efficient use of computers, from the human point of view,
obviously requires that human beings be able
to command computers to do things and that computers be able to
report problems and other significant events when carrying out such tasks.
Less obviously, but certainly significantly, computers must be
able to communicate with each other, if computers are to cooperate
to serve us or to provide us with a means of communication.
Therefore, languages in a very broad sense are gaining importance.
This applies to programming languages and command languages, used by
humans to control computers, and to the various protocols designed
by humans for communication between computers or computer software
elements.
Consider, for example, the World Wide Web. Beneath the level of the various
human languages used in Web documents - a very problematic area, by the
way - there are several languages involved:
- The document description language, HTML, with expressions like
<title>An example document</title>
- The HTTP language used between Web servers and Web browsers when
requesting and transmitting documents. It contains expressions like
Content-Type: application/octet-stream
- The CSS1 language used to control the visible presentation of
documents, with expressions like
H1 { color: blue }
- The various user interfaces of Web browsers, with commands,
options, error messages etc.
- The TCP/IP language used in the low-level communication between
computers.
One may find it difficult to accept that
user interfaces are languages, but this
is based on a misunderstanding. Although user interfaces have become
more graphical, they involve verbal expressions as well, and these
expressions can be quite complicated; moreover, icons and other
visual symbols can form a language, too.
Some specialists may say that HTTP and
in particular TCP/IP are not languages but protocols. But any
such protocol must be implemented using a language, and it is mostly
just a theoretical question whether there is a difference between a
protocol and a language. Admittedly, TCP/IP is not human-readable
in the normal sense, and there are good efficiency reasons for this.
On the other hand, we could define a canonical human-readable
presentation of TCP/IP if needed. And to an increasing extent,
languages (protocols) used in communication between computer programs
resemble simple human languages, partly because this makes occasional
human monitoring easier.
The languages used for communication between programs or between
programs and users have typically been designed on an ad hoc basis,
to suit the particular application or task. Consequently, each of
them has to be learned separately. To take a trivial example, the
very basic and simple expression for ending the use of a program
might be exit in one program, bye, end, stop, finish, fin, monitor,
mon, system, sys, or quit in some others.
If a language is used infrequently, the user tends to make a lot
of mistakes even if he has once learned the language well.
In particular, one tends to use expressions of a more frequently
used language. This is a phenomenon which makes the small differences
between languages so harmful. A programmer who is used to writing
programs in a language where the equals sign (=) denotes comparison
for equality can make serious errors when he occasionally uses a
language where it means assignment.
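As a minimal sketch in C (the variable and messages are invented for
illustration): in C the equals sign assigns and == compares, so a habit
formed in a language where = denotes comparison can silently turn a test
into an assignment:

  #include <stdio.h>

  int main(void)
  {
      int x = 5;

      if (x = 0)          /* assigns 0 to x; the condition is always
                             false, so the branch never runs */
          printf("x is zero\n");

      if (x == 0)         /* the comparison that was intended */
          printf("x is zero\n");

      return 0;
  }

In Pascal, where = denotes comparison, the first form would be rejected
outright; C accepts it silently, which is exactly what makes switching
between such languages error-prone.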
Some languages are, by their very purpose, used rather infrequently.
For instance, the increase of junk messages on the Internet has made
many people use E-mail filters. This typically means using programs
like procmail with a powerful, compact control language. Probably
a user does not change his E-mail filter very often. So when he
decides to modify it, he has to switch to a cryptic language,
perhaps making a trivial mistake which automatically deletes some
messages instead of processing them as very important and urgent.
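For illustration, a procmail recipe is terse to the point of being
cryptic. A sketch along the following lines (the pattern is invented)
files matching messages into a separate folder:

  :0:
  * ^Subject:.*project-x
  project-mail

Replacing the last line with /dev/null would silently discard the same
messages; the step from filing to deleting is a single cryptic line.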
It should be clear that a language used for operations like E-mail
filtering should be natural-looking and redundant and well-known to
the user. The only way to guarantee the familiarity is to have a
language which is used for many other purposes as well, preferably
daily. (Whether the filtering instructions are internally converted
into a more compact notation for efficiency is a different thing.)
Evidently, it would increase the productivity of people and ease the
design of new software if there were a common base language.
Defining a new protocol could then be done simply by selecting a suitable
(perhaps very limited) subset of that language and giving some
specific semantic rules for interpreting some elements of it.
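Purely as an invented sketch, such a protocol definition might be little
more than a restriction of the base language together with a few added
semantic rules:

  protocol: mail-filtering
    modalities allowed:  directive, assertion
    vocabulary allowed:  message, sender, subject, folder,
                         file, discard, forward
    added rule: "discard" means irrevocable removal of a message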
Just as there is a very large number of artificial languages suggested
for human communication, there is a large number of programming languages,
and only a small minority of them have gained significant popularity.
Since programming languages may have very different areas of application
and might have fundamentally different design criteria
(e.g. simple interpreted languages versus languages to be processed by
highly optimizing and parallelizing compilers),
it is understandable that there are so many of them. They might be
compared to protocol languages mentioned above. But programming languages
also have irritating differences in details like the style of declaring
variables, the various symbols used for an assignment operator, and
different lexical rules for identifiers. By removing unnecessary
notational variation we could make it easier to define and learn
new languages and especially to switch between different languages
as needed. This might perhaps be done in the framework of a common
base language.
The design criteria
The need for criteria on criteria
A very large number of design criteria for artificial languages
have been proposed. They include principles which have contradictory
implications, and this is one of the reasons for the failure of the
IAL movement. For example, resemblance to existing languages (typically
Romance and other European languages) is largely incompatible with
two other commonly suggested principles: regularity and cultural
neutrality. Since advocates of IAL are typically enthusiasts, they
are seldom willing to make compromises. They think they have very
good grounds for their own design criteria, and indeed they do,
but they fail to understand why others have their own convictions.
Thus, if any IAL is to gain enough popularity even in the circles of
IAL enthusiasts, there must be some force which is strong enough to
dictate a solution to the problem of criteria. The solution in sight
is that only a language designed for both machines and human beings
can gain wide adoption as an IAL.
Regularity
One classical problem in IAL design must be solved in favor of
regularity. This means in particular that resemblance to the large
Latin-based vocabulary in many natural languages cannot be achieved
by any means which would imply the adoption of the irregularities
in Latin word declension and derivation.
The regularity principle would not be so important in "high-end"
applications involving programs which handle the full language,
since in them grammatical irregularities would be a small problem
compared with others. But it is important in "low-end" applications
involving small restricted languages which must be processed using
small resources only.
Embedding of formalisms
The common base language should contain, in addition to verbal phrases,
different formalisms needed in computing, formal logic, and mathematics.
Some of these could exist in several presentations, such as normal
mathematical notation and its linearized variant, but they should be
algorithmically convertible to each other. Currently even very simple
languages like regular expressions exist in a multitude of notations,
so that a student of computing may have to learn regexps a dozen times,
each time on a different course using a different notation, and then
the notations actually used in computer programs.
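For example, the simple pattern "one or more decimal digits" is written
differently in several common notations:

  [0-9][0-9]*      POSIX basic regular expressions (as in grep)
  [0-9]+           POSIX extended regular expressions (as in egrep)
  [[:digit:]]+     POSIX character class notation
  \d+              Perl-style regexps, adopted by many program libraries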
Thus, a universal language should have some basic formal languages
embedded. Such formalisms would normally appear in written form only.
On the other hand, it would be important for automatic processing
(and useful to human readers too) to indicate a switch from normal
language to a formalism and vice versa. More generally, the language
should incorporate a metalanguage for expressing a switch from one
language to another. Thus, for example, the beginning of a quotation
from a natural language would imply an explicit specification of that
language and an indication of the way in which the end of quotation
is marked. Such notations would be very useful for relatively simple
tools for language processing, too, such as hyphenating software and
spelling checkers.
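HTML already contains a rudimentary mechanism of this kind: the lang
attribute marks a switch of natural language, so that e.g. a hyphenator
or a speech synthesizer can adapt to it. For example:

  <p lang="en">As the French say,
  <q lang="fr">l'exception confirme la règle</q>.</p>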
Modalities
Considering the nature of statements in communication protocols,
programming languages, etc, it is evident that a common base language
should have modality as a very essential category. By modality I refer
here to the roles of statements as imperative, descriptive, narrative,
declarative, etc. Using the moods of verbs is one way of expressing modalities
in natural languages, but it is usually a very coarse way. For instance,
the indicative mood of verbs is typically used for both factual claims
and moral evaluations as well as predictions and postulates.
It should be clear from the form of a statement,
without deep grammatical analysis, what its modality is. To take a
trivial but practically important example, computer users very often
get upset by messages from computer programs because they cannot
distinguish severe error messages from purely informational notes.
Some systems have their own conventions (like prefixing all error
messages with the ? character and warnings with the % character), but
such private "standards" are not very useful to occasional users,
and even experienced users have difficulties in analyzing which
program has issued the message.
An explicit indication of modality is very useful in human communication,
too, especially when there are cultural differences involved. Even if
you understand a foreign language, you may not know the delicate ways
in which it is used e.g. to express requests using sentences which look
purely indicative. In international contexts, it would be very useful
to have a language in which modalities can and must always be clearly
expressed. (This does not exclude the possibility of having a way of
specifying a "global default" for modality, so that one can present
e.g. a sequence of indicative statements without including a modality
marker into each sentence.)
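Purely as an invented illustration, explicitly marked modalities might
look like this:

  FACT      The file report.txt has been deleted.
  WARNING   The disk is almost full.
  REQUEST   Please send a copy of the report.
  QUESTION  Does a backup copy exist?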
The overall communications protocol
Verbal communication consists of "statements" in the broad sense.
There are, however, expressions relating to the communication process
rather than individual statements and their modalities and meanings.
For instance, in spoken communication between two people, the one who
is listening may throw in some attempt to interrupt the speaker for
some reason or another, such as to request repeating a statement
which was not heard, to request "changing the direction" of communication
or to request temporary suspension of communication (e.g. on telephone
when some event requires immediate attention). On the other hand, the
sender may need similar tools - in the simplest case, to "delete" a
word or a sentence after having made a mistake.
In human communication, such "protocol level" requests very often fail,
partly because they do not belong to the system of the language.
In communication between computers, there are techniques for
negotiating a protocol and sending and processing protocol level messages.
To take a very simple example, in some communications protocols a slow
device may send an X-OFF character to request suspension of sending and
an X-ON character to tell that sending can be resumed. Similarly, there
are protocols for requesting resending in the case of transmission errors -
something that we should always be prepared for in any communication. The
existence of such methods does not make redundancy unnecessary, of
course, since they can basically deal with detected errors, not detect
errors. Moreover, a robust language should have well-defined error
recovery points which allow processing to continue in some meaningful
way in spite of previous errors which cannot be resolved. (For example,
a compiler for the Pascal programming language may be written so that
if serious errors are detected, input is skipped e.g. up to the next
semicolon, at which point processing is resumed. This allows most of the program to
be checked syntactically.)
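A minimal sketch of such recovery in C; the token type and the
next_token interface are invented for illustration, not taken from any
actual compiler:

  enum token { TOK_OTHER, TOK_SEMICOLON, TOK_EOF };

  enum token next_token(void);        /* supplied by the lexer */

  /* Panic-mode recovery: discard input up to the next semicolon and
     resume parsing just after it, so that the rest of the program
     can still be checked syntactically. */
  enum token recover_from_error(enum token current)
  {
      while (current != TOK_SEMICOLON && current != TOK_EOF)
          current = next_token();
      if (current == TOK_SEMICOLON)
          current = next_token();     /* step past the ";" */
      return current;
  }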
In real-time communication between people using networked computers
as tools, special indications and abbreviations are often used to denote
e.g. the end of one person's statements for the moment ("over") and
suggesting or accepting end of entire communication ("over and out").
Such indications would be extremely helpful in all communication,
especially in international contexts where things like delicate choice
of expressions or tones of voice cannot be used reliably to deduce
such things.
Ideally, a protocol level statement should be easily distinguishable from
normal statements by its form, to allow adequate and fast processing of
protocol level requests. Normally the first word (or morpheme) of a
message should indicate its role in this sense, but for human communication
something even more distinctive might be needed, such as the appearance
of a sound which does not occur in the language otherwise.
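For instance, with an invented reserved prefix, protocol level messages
might look like this:

  !repeat      please repeat your previous statement
  !wait        please suspend communication for a moment
  !over        end of my turn; please respond
  !out         I suggest ending this communication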
Extension mechanisms
Since a common base language would be used for an extremely wide range
of purposes, it is important that it is flexible and adjustable. This
of course conflicts with the very idea of universality. The solution
is to embed language extension tools into the language itself.
For instance, a language might include a method of defining new
derivational suffixes so that the rules are given in the language
itself. Similarly it could contain tools for subsetting, e.g. tools
for defining a restricted set of words. (Such subsets could be very
useful in normal human communication, too. For example, at a bridge
board one could and perhaps should live with a very small vocabulary,
containing a few dozen normal words and a few dozen bridge
terms.) Phrases could be defined, too: the basic definition of a
language could assign meanings to words in "normal" contexts only,
giving freedom to define various symbolic meanings for various purposes.
Extension mechanisms are also needed for defining special abbreviations
and phrases for use in some restricted area of communication. Such
special glossaries would normally be made publicly available, and
normal communication would begin with "headings" which explicitly
refer to such glossaries. For example, an article would begin with
"headings" (in the protocol sense) specifying the glossaries assumed,
in a specific order. This would solve the frequent problem of abbreviations
(and other terms) being ambiguous or practically undefined, since the user
may not have any idea of where to look for definitions.
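Hypothetically, such headings might look like the following (both the
notation and the glossary names are invented):

  use-glossary: contract-bridge-terms, version 2
  use-glossary: general-computing-terms, version 1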
As a simple special case of extension mechanism, the language should
have a method for literal borrowing of names and other words from
other languages. Such borrowings should be reserved for casual use,
and the language of origin should be indicated. For more permanent use, such as for
commonly used terms, extension mechanisms internal to the language
should be used.
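As an invented illustration, a casual borrowing might carry an explicit
origin marker, here using the ISO 639 code fi for Finnish:

  She was born in [fi: Jyväskylä].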