Definition: a definition and an analysis


Definitions and terms are essential for any systematic knowledge. Their rigor and other properties vary a lot, from vague terms and implicit definitions to specialized terms with formalized definitions. After a basic survey of this field, this document discusses the significance of definitions and terms in information systems. In particular, as the amount of data on the Internet grows, it becomes crucial to be able to search for definitions only (or, in some cases, to exclude definitions from searches) and to extract definitions from large documents.

For this purpose, markup for terms and definitions is needed. Although HTML has some elements for this purpose, a detailed analysis shows that they are of very limited use. This document outlines a markup system for definitions which allows great variety in how a definition is given, yet lets the author of a document clearly indicate which parts are definitions, what the general kind of each definition is, and what the definiendum and its alternate names are.

Definition: a simple definition

According to WWWebster, the word definition has the following meanings (examples omitted from this quotation):

  1. a. an act of determining
     b. (1) a statement expressing the essential nature of something
        (2) a statement of the meaning of a word or word group or a sign or symbol
     c. a product of defining
  2. the action or process of defining
  3. a. the action or the power of describing, explaining, or making definite and clear
     b. (1) clarity of visual presentation : distinctness of outline or detail
        (2) clarity especially of musical sound in reproduction
     c. sharp demarcation of outlines or limits

We are here basically interested in the meaning 'a statement of the meaning of a word or word group or a sign or symbol'. Let us interpret it broadly, so that the word statement is not limited to a verbal expression in a natural language; it could also be some formalized notation, such as a mathematical formula, or even a graph or an image, if it conveys a message about the meaning. Similarly, the expression being defined, often called the definiendum, need not be verbal, though it usually is; it could be a sound, for example, to which some meaning (e.g., 'alarm signal') is assigned by convention, and a definition explicitly describes such a convention.

So a simple definition might be: a definition is a statement or other presentation that indicates what a word or other expression refers to or otherwise means. The expression being defined is typically a noun or a noun-like expression. But even a preposition could have a definition; then the definition would specify its grammatical role and meaning. The essential point is that a definition indicates what an expression is used for, e.g. to state (or claim) a fact, to express a feeling, or to make a promise, instead of actually using the expression for such purposes.

As an example, let us quote a WWWebster definition for the word term in a particular meaning relevant to our topic:

term
a word or expression that has a precise meaning in some uses or is peculiar to a science, art, profession, or subject

This is a typical "dictionary definition": it defines a word using a compact prose description. Note that in terminology work, we might prefer restricting the meaning to just 'a word or expression that has a precise meaning in a specific context'. And, for example, Min-Yen Kan's Multilingual and Monolingual Term Identification and Applications gives a more specific definition (which cites some sources):

What is a term?

Thus, different definitions can be given for different purposes, even for the same basic meaning. (Note that e.g. the word term has several essentially different meanings, too, such as in expressions like "in the short term" or "on good terms" or "the first term of the polynomial".)

Looking for definitions (or avoiding them)

Consider a printed book, a Web page, or an oral presentation where some definitions are given. Usually there is some reason for giving definitions. If you always defined all the words you use, you would hardly get anywhere; in fact, you would find yourself in a circle, defining the words you used in the definitions! In any communication, some words (in fact, most words) must be assumed to be understood without definitions. When you do define something, you normally do it because you assume that the word or symbol is not well enough known to the audience, or might have a different meaning to different people. And if you actually include a definition, instead of just referring to one (e.g. by saying "see The Encyclopedia of Hypermystics for the definitions of terms used here"), it means that you regard it as essential for your presentation. (Admittedly, you might sometimes include definitions just for the readers' convenience or because you did not find any suitable definition to refer to.)

Thus, a definition is generally assumed to be particularly important to at least some people. If a reader skips a definition, he might misunderstand, or fail to understand, the rest of the text. And logically, definitions are different from normal text. There are three basic ways of using a word or other symbol:

  1. as referring to something, which is the normal use of words
  2. as a definiendum in a definition, for establishing a meaning which will be implied in future normal uses of the word
  3. as a linguistic or other object per se, e.g. when discussing the phonetic structure or etymology of a word or the graphic properties of an icon.

Due to the nature of our topic, the discussion above is simplified. In a natural language, not all words refer to things, of course. A word such as a preposition or interjection does not have a referential meaning, adjectives typically characterize things instead of referring to anything, etc.

Examples:

  1. There are four tigers in the zoo.
  2. The tiger, Panthera tigris, is a large, striped Asian felid.
  3. The word "tiger" is probably of Iranian origin and related to the Avestan "tighra" 'pointed' and "tighri" 'arrow', perhaps so named from its quickness.

There is an obvious need to distinguish the three different uses from each other visually (or aurally). Bolding, underlining, or other typographic methods of emphasis are often used for a definiendum, and either quotation marks or italics are commonly used to indicate words as linguistic objects. And as the third example above illustrates, single quotes (apostrophes) are often used when giving the meaning of a (foreign) word. Apparently, different typographic or other methods are needed in different environments; for example, bolding is not always available, or bolding might be in other use (such as for highlighting important words in general). This calls for generic markup which indicates the logical roles of expressions and which can be mapped to various physical manifestations as needed. We will discuss this more in a separate section on markup for definitions.

Visual distinctions make it easier for a human reader to find the definitions when needed. One might also wish to skip definitions, e.g. when the reader is very familiar with the topic area and thinks he knows the definitions. Then again, the author might have some "private" definitions, perhaps for coining new terms. This emphasizes the need to distinguish between different kinds of definitions.

But any document is more than what it looks like. Documents are processed using automated tools, such as search engines on the Web. Currently these tools process documents in relatively simple ways, mainly just extracting textual content and indexing it. This means that if you are looking for documents which define a word or abbreviation, instead of just using it, you're in trouble, especially if the word is a common one. For example, a simple search on Google for documents containing the word term gave about 15 million hits. How could I find those that might give a definition for it?

And sometimes one might wish to exclude documents that contain a definition for a word. For example, if you wish to study how common a word has become, perhaps for inclusion in a dictionary, you could specifically look for documents that just use it, without a definition, since such usage indicates that the author regards the word as known to readers. And, in general, one might wish to exclude dictionaries from searches, unless one is specifically looking for definitions. The same applies to other documents which mostly contain just definitions of some kind; and a document could be automatically classified as such, if definitions can be recognized.

If definitions were always presented using some markup that minimally indicates what the definiendum is and which part of a document constitutes the definition, quite a few useful things could be done. For example, you could simply ask a program to find all Web documents containing a definition for a word you are interested in and to extract those definitions. In fact, the growth of the Internet and the bulk of computer-readable documents will make efficient searching, extraction, etc., a necessity rather than just a nice thing.
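A minimal sketch of such a search, assuming documents marked up with the definition and definiendum elements outlined in section Markup for definitions. The sample documents and the function names `defines`, `uses`, and `search` are hypothetical, and a real search engine would work on an index rather than on raw documents:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample documents using the proposed definition markup.
DOCS = {
    "a.xml": "<doc><definition>A <definiendum>term</definiendum> is a word "
             "with a precise meaning in some context.</definition></doc>",
    "b.xml": "<doc><p>Each term in this glossary is explained below.</p></doc>",
    "c.xml": "<doc><p>Nothing relevant here.</p></doc>",
}

def defines(xml_text: str, word: str) -> list[str]:
    """Return the text of every definition in the document whose definiendum is `word`."""
    root = ET.fromstring(xml_text)
    found = []
    for d in root.iter("definition"):
        if any((t.text or "") == word for t in d.iter("definiendum")):
            found.append("".join(d.itertext()))  # flatten nested markup to plain text
    return found

def uses(xml_text: str, word: str) -> bool:
    """True if `word` occurs as a whole word anywhere in the document text."""
    text = "".join(ET.fromstring(xml_text).itertext())
    return re.search(rf"\b{re.escape(word)}\b", text) is not None

def search(docs: dict[str, str], word: str, want_definitions: bool = True) -> list[str]:
    """Documents that define `word`, or (if want_definitions is False) use it without defining it."""
    if want_definitions:
        return [name for name, text in docs.items() if defines(text, word)]
    return [name for name, text in docs.items()
            if uses(text, word) and not defines(text, word)]

print(search(DOCS, "term"))                          # ['a.xml']
print(search(DOCS, "term", want_definitions=False))  # ['b.xml']
```

The exclusion mode requires that the word actually occurs, so that a document such as c.xml, which neither uses nor defines the word, is not returned as an example of plain usage.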

Even within a single document or set of documents, definition markup could be very useful. A program (such as a text processing program or a Web browser) could be written so that it recognizes the definitions and displays them to the reader upon request. Technically, the program could recognize all occurrences of the expressions that have definitions in the document (or in an associated glossary), display them with mild highlighting (say, a slightly but noticeably different background color), and display the definition when the user clicks on such a word. If there is a simple way of writing the definitions, then general reading tools can be developed, and the same tool could be used in conjunction with different texts. It is crucial here that we have a concept of definition which is general enough, yet not arbitrary. If a definition defines a word by giving its equivalent in another language, such a tool would act as a simple but very easy-to-use reading aid in language studies and in reading texts in foreign languages. If a definition explains the meaning in simpler terms, the tool would make it possible to read relatively fluently even if the vocabulary of the text exceeds the skill level of the reader. And if the definitions are technical or scientific definitions, the tool would turn the text into hypertext without any explicit individual links (from each occurrence of a word to its definition). For greater usefulness, such a tool should be able to handle different kinds of definitions, so that the user, when clicking on a word, could request a translation or a technical definition, for example. This requires some standardized way of indicating the kind of a definition.
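The core of such a reading tool can be sketched in a few lines, assuming a glossary already collected from definition markup. The glossary content, the `[[...]]` convention standing in for highlighting, and the kind names "technical" and "translation" are all hypothetical:

```python
import re
from typing import Optional

# Hypothetical glossary: expression -> {kind of definition -> definition text}.
GLOSSARY = {
    "liger": {"technical": "a hybrid (cross) between a tiger and a lion",
              "translation": "liikeri"},
    "felid": {"technical": "a member of the cat family, Felidae"},
}

def highlight(text: str, glossary: dict) -> str:
    """Mark each occurrence of a defined expression; [[...]] stands in for mild highlighting."""
    names = sorted(glossary, key=len, reverse=True)  # prefer longer expressions
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: "[[" + m.group(0) + "]]", text)

def on_click(word: str, kind: str, glossary: dict) -> Optional[str]:
    """The definition of the requested kind for a clicked word, or None if there is none."""
    return glossary.get(word.lower(), {}).get(kind)

print(highlight("A liger is larger than either parent felid.", GLOSSARY))
# A [[liger]] is larger than either parent [[felid]].
print(on_click("Liger", "translation", GLOSSARY))  # liikeri
print(on_click("felid", "translation", GLOSSARY))  # None
```

Keying each entry by the kind of definition is what lets the same tool serve as a translation aid or as a technical glossary, depending on what the user asks for.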

Definition = definiendum + definiens?

An example above illustrates a very compact form of a definition: "tighra" 'pointed'. A foreign word is given in quotes, and a definition is given as a translation into a language which is assumed to be more familiar to the reader. Here the definition simply consists of a definiendum followed by a definiens, i.e. an expression that gives the meaning of the word, with some punctuation.

The words definiendum and definiens are Latin participle forms of the verb definire 'to define', and they mean 'the one being defined, the one to be defined' and 'defining, the one that defines', respectively. By the way, the meaning of that verb was originally related to limiting, then to designating by limiting; the verb is derived from the noun finis 'boundary, limit, border, end'.

Similarly, a formula used for defining something usually has the form definiendum = definiens, e.g. F = ma (force equals mass times acceleration). (Most formulas are not definitions, of course, and it depends on the system of physics or other realm of knowledge which formulas are taken as definitions. A formula which is a definition in one system might be a derived one in another.)

Even a definition written as a normal sentence often follows the definiendum = definiens pattern, just with a word like is or means instead of an equals sign. Example: A liger is a hybrid (cross) between a tiger and a lion.

But a simple definition could be written in a different form, and for stylistic reasons this is quite common. It is normal to write a new word after its definiens: A hybrid (cross) between a tiger and a lion is called a liger.

To complicate things further, the grammar of a language may require that the definiendum appear in a declined form, not in its base form. For example, in Finnish the latter definition would be: Tiikerin ja leijonan risteymää (hybridiä) sanotaan liikeriksi. (Here the definiendum appears in a form with a case suffix, liikeriksi, not in the base form liikeri.) This is an essential problem in computerized searches, of course, since simple search algorithms look for exact or partial matches only, not for a match between a base form (used in a search clause) and a declined form. Note that declension is not necessarily agglutinative (so that one just takes a base form and appends a suffix or adds a prefix) but may involve changes in the base too, as sometimes even in English: woman - women, mouse - mice.
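The following sketch illustrates the problem, assuming a naive whole-word search. The `BASE_FORMS` table is a hypothetical, hand-made stand-in for real morphological analysis, which is far harder, especially for a language like Finnish:

```python
import re

def exact_word_search(text: str, word: str) -> bool:
    """Naive whole-word match, as a simple search engine might do."""
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None

# Hypothetical hand-made table mapping declined forms to base forms.
BASE_FORMS = {"liikeriksi": "liikeri", "mice": "mouse", "women": "woman"}

def base_form_search(text: str, word: str) -> bool:
    """Match if any word in the text has `word` as its (tabulated) base form."""
    for token in re.findall(r"\w+", text.lower()):  # \w is Unicode-aware, so "ä" counts
        if BASE_FORMS.get(token, token) == word.lower():
            return True
    return False

FI = "Tiikerin ja leijonan risteymää (hybridiä) sanotaan liikeriksi."
print(exact_word_search(FI, "liikeri"), base_form_search(FI, "liikeri"))  # False True
print(exact_word_search("Two mice.", "mouse"), base_form_search("Two mice.", "mouse"))  # False True
```

A partial (substring) match would happen to find liikeri inside liikeriksi, but no string matching finds mouse in mice; only knowledge of the base form does.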

A definition is hardly imaginable without some definiendum. But whether there is an (explicit) definiens depends on how the definition is written or otherwise presented. We could define that a definition is separable if it can be divided into a definiendum and a definiens so that the rest of the definition is just some syntactic sugaring like "is", "=", "means", or "is called". For example, a typical dictionary entry, when viewed as a definition of a word, contains an explanation or a translation which can be regarded as the definiens, but it may well also contain usage examples which illustrate the meaning but can hardly be regarded as part of the definiens; such an entry is not separable. On the other hand, an entry in a simple word list, with just a word in one language and its equivalent(s) in another language, is a separable definition. A later section, Synonyms for the definiendum, will discuss a generalization of the separability concept.

A definition where the definiendum is followed by a genuine verb (as opposed to a copula like "is" or "means") should probably not be treated as separable.
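A crude sketch of testing a one-sentence definition for separability, assuming that only the connectives "is", "means", "=" and "is called" count as syntactic sugar. The connective list and the function name are hypothetical, and a realistic implementation would need grammatical analysis:

```python
import re
from typing import Optional, Tuple

# Hypothetical connectives treated as mere "syntactic sugar".
FORWARD = re.compile(r"\s+(?:is|means|=)\s+")  # definiendum first: "X is Y"
REVERSED = re.compile(r"\s+is called\s+")     # definiens first: "Y is called X"

def split_definition(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (definiendum, definiens) if the sentence looks separable, else None."""
    s = sentence.strip().rstrip(".")
    parts = REVERSED.split(s, maxsplit=1)  # check the longer connective first
    if len(parts) == 2:
        return parts[1], parts[0]
    parts = FORWARD.split(s, maxsplit=1)
    if len(parts) == 2:
        return parts[0], parts[1]
    return None  # e.g. a genuine verb instead of a copula

print(split_definition("A liger is a hybrid (cross) between a tiger and a lion."))
# ('A liger', 'a hybrid (cross) between a tiger and a lion')
print(split_definition("A hybrid (cross) between a tiger and a lion is called a liger."))
# ('a liger', 'A hybrid (cross) between a tiger and a lion')
print(split_definition("F = ma"))                   # ('F', 'ma')
print(split_definition("A tiger hunts at night."))  # None
```

The heuristic is deliberately naive: it keeps articles such as "A" in the definiendum, and for "The tiger, Panthera tigris, is a large, striped Asian felid." it would treat everything before "is" as the definiendum.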

Several meanings, or a combined definiens?

Can a definition contain several definienses? This is largely a matter of definition (of "definiens"), but I propose a negative answer. It is best to regard e.g. a list of translations for a word as the definiens, rather than regard each of them as a definiens. For example, no single-word Finnish translation of the English word "run" really constitutes a definition for it as a whole; on the other hand, in a particular context a word might have a fixed meaning that is accurately given by a single translation. To summarize, it is best to think that a definition has (at most) one definiens, but the definiens itself can be structured so that it contains alternatives.

The practical benefit is that the definiens can then be associated with the definiendum in a simple way. For example, definiendum/definiens pairs can be extracted from definitions and tabulated. And, under some conditions, the definiens can be substituted for the definiendum. Consider, for example, a translator (a human being or a program) processing some text where a word occurs so that its meaning in that context cannot be resolved. The word could then be replaced by its definiens from a general dictionary, suitably marked as not being part of the translated text but as reflecting the translation problem. This could then be handled by other translators and by specialists that will be consulted, or it could even be left in the result as a last resort. The point here is that such processing needs to take a list of alternatives as a unit, as a "block", i.e. as a single definiens.
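A minimal sketch of that substitution, assuming a word-by-word process and a dictionary whose entries store each definiens as a single list of alternatives. The function name, the `[?...]` marking, and the sample entry are hypothetical; the Finnish renderings of "run" are illustrative, not a complete entry:

```python
# Hypothetical dictionary: the definiens of "run" is one block of alternatives.
DICTIONARY = {"run": ["juosta", "johtaa", "ajaa"]}

def translate(tokens: list[str], resolved: dict[str, str],
              dictionary: dict[str, list[str]]) -> str:
    """Word-by-word translation; an unresolved word is replaced by its whole definiens, marked."""
    out = []
    for tok in tokens:
        if tok in resolved:            # meaning known in this context
            out.append(resolved[tok])
        elif tok in dictionary:        # ambiguous: keep all alternatives together as one unit
            out.append("[?" + tok + ": " + " / ".join(dictionary[tok]) + "]")
        else:                          # unknown word: leave it untouched
            out.append(tok)
    return " ".join(out)

print(translate(["I", "run", "it", "daily"],
                {"I": "minä", "it": "sitä", "daily": "päivittäin"}, DICTIONARY))
# minä [?run: juosta / johtaa / ajaa] sitä päivittäin
```

Because the alternatives travel as one block, a later reviewer sees the whole range of candidate meanings in one place rather than a single, possibly wrong, choice.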

Thus, a definiens may have an internal structure. It could be just a set of alternative meanings, probably to be regarded as an ordered set, although the criteria for ordering may vary. The ordering would depend on the purpose and nature of the document; the meanings could be listed from oldest to newest, or from common to less common, or grouped together topically, or arbitrarily. The possibility of hierarchic ordering, which is common in large general dictionaries, suggests that the internal structure should basically be an ordered list of items which may themselves contain ordered lists, i.e. something that more or less corresponds to the ol element in HTML.

In such an internal structure - as well as in the wording of definitions! - care must be taken to distinguish between multiple meanings and multiple explanations (for one meaning). For example, a definition could first explain a meaning, then rephrase it, or perhaps contain a plain English explanation, then a formal description.

Synonyms for the definiendum

What would be the natural interpretation of a definition with two or more definiendums (definienda)? Apparently that they are synonyms by definition, i.e. synonymous at least in the context where the definition applies.

Typical situations where it would be suitable to use several definiendums are:

In quite a few cases, it is questionable whether an "expansion" of an abbreviation should be regarded as a definiendum at all. We could define "Active Server Pages" and "ASP" as synonyms, but it's questionable whether "HTML" really has "Hypertext Markup Language" as its expansion and whether that expansion is something to be included in a definition as a definiendum. And hardly anyone claims that "Beginner's All-purpose Symbolic Instruction Code" is or ever was a name for anything. It's just the string of words from which the language name BASIC was formed; or perhaps the "abbreviation" was actually invented first, then the "expansion". So when writing definitions for things that are commonly denoted by "abbreviations", we should consider whether the abbreviation is actually the name, and the only name, and an "expansion" is just an explanation of how the name was formed. In such a case, the "expansion", if included in a definition, should not be marked as a definiendum.

The definition of separability then needs to be generalized so that a separable definition consists of one or more definiendums and a definiens, so that everything else inside the definition is just grammatical or notational "sugar".
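A minimal sketch of a glossary index that treats several definiendums as synonyms, assuming each definition records its definiendums as a tuple. The class and function names are hypothetical, and the short definiens texts are only illustrative:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Definition:
    definienda: Tuple[str, ...]  # one or more names, synonymous by definition
    definiens: str

def build_index(defs):
    """Map every definiendum to its definition, so that synonyms share one entry."""
    index = {}
    for d in defs:
        for name in d.definienda:
            index[name] = d
    return index

def synonyms(index, name):
    """The other definienda of the definition that defines `name`."""
    d = index.get(name)
    return [n for n in d.definienda if n != name] if d is not None else []

asp = Definition(("Active Server Pages", "ASP"),
                 "a server-side scripting environment from Microsoft")
# The "expansion" of BASIC is explanation only, so it is not a definiendum.
basic = Definition(("BASIC",), "a programming language; the name was formed from "
                   "Beginner's All-purpose Symbolic Instruction Code")

index = build_index([asp, basic])
print(synonyms(index, "ASP"))    # ['Active Server Pages']
print(synonyms(index, "BASIC"))  # []
```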

Different kinds of definitions

There are various ways to classify definitions. See, for example, Meaning and Its Representation: An Introduction to Semantics by Shen Shanshan, especially section Definitional Techniques. Here we discuss some classifications and properties, in no particular order, as a preliminary consideration before a more systematic (and more technical, in a sense) approach in the next section.

The Aristotelian concept of real definitions as opposed to nominal definitions, outlined and commented on in some detail in Posterior Analytics by S. Mark Cohen, postulates the existence of some language-independent content behind some definitions. Briefly, a real definition states the "essence" of something, whereas a nominal definition just describes a language rule. Operationally, this might be seen as meaning that a real definition is translatable; a statement like "man is a rational animal" could be translated into another language in the normal way, whereas a compact definition like "man 'Mensch'" (which gives one meaning of "man" in another language) is a translation, in a sense. But it is probably best to limit the meaning of "definition" to "nominal definition" only. An Aristotelian "real definition" does not serve the purpose of indicating what an expression points to; rather, it tries to say something about the thing pointed to.

A stipulative definition is what we will later classify as having status "proposed": it suggests a new meaning to a word, or even a new word.

A lexical definition describes the actual use of words in a language. It is aimed at just documenting the meaning.

A precising definition tries to reduce the vagueness of a word, by restricting the meaning (in a particular context) to a subset of a general meaning. Quite often term definitions are of that kind: a common word is taken into technical use with a precise or at least specialized meaning. Sometimes a precising definition just draws a more or less arbitrary line; for example, the word "adult" as such is too vague for legislation purposes where a specific age limit (or other specific criterion) probably needs to be set.

A notational definition simply assigns some meaning to a symbol or a word, often in an ad hoc manner, e.g. for the purposes of one particular presentation.

A persuasive definition "is to engender a favorable or unfavorable attitude toward what i[s] denoted by the definiendum", to quote the Definitional Techniques document, which gives two examples:

"Abortion" means the ruthless murdering of innocent human beings.

"Abortion" means a safe and established surgical procedure whereby a woman is relieved of an unwanted burden.

Such "definitions" are very common, but in fact they are just propositions about things, presented in a somewhat definitional format. In the first one of the examples above, this is rather apparent. The second one mixes a genuine definition with a proposition. Note the ambiguous meanings of the word "means": it can denote a semantic relationship between an expression and the thing it denotes, but it can also express someone's idea of what that thing "really means", i.e. implies, causes, or matters to someone. (Similar considerations apply to the word "is".)

Here we might stop and ask whether the distinction between definitions of expressions and propositions about things can and should be made explicit and absolute. At least for the purposes of science, education, and intelligent discussion, I think it can and should be made. But in actual usage, which is what logical analysis of and structured markup for documents should cover, too, we need to accept that a definition is not always a "pure" definition. However, for our purposes, a statement must have some of the essential characteristics of a definition in order to be considered as one. Returning to our examples, the first one should not be categorized as a definition whereas the second one should, though it is far from optimal as a definition. (It is, among other things, not a good lexical definition, since "abortion" need not involve surgery; it can be chemical too.) Naturally, this categorization takes no position on the abortion issue itself.

An extensional definition generally indicates the set of things that an expression refers to, i.e. a class that constitutes its extension. As a simple special case, an enumerative definition lists the alternatives; generally, the alternatives could refer to individuals or to sets. Another special case is a subset definition: it defines the extension as a subset of the extension of another expression, by imposing some restriction. Even a simple "genus and difference definition", like "ice means frozen water", can be viewed as a subset definition: among the observations of water, those which are in a frozen state are called "ice".

An intensional definition assigns a meaning to a word by indicating the properties that the expression implies. The distinction between extensional and intensional is far from absolute. An intensional definition implicitly defines an extension: the set of things which have the property.
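The connection between the two approaches can be illustrated in code under obvious simplifications: a finite, hypothetical universe of observations stands in for the world, and Python predicates stand in for properties. The names below are invented for illustration:

```python
# Hypothetical toy universe of observations.
OBSERVATIONS = [
    {"substance": "water", "state": "frozen"},
    {"substance": "water", "state": "liquid"},
    {"substance": "iron", "state": "solid"},
]

# Enumerative (extensional) definition: the members are simply listed.
NORDIC_COUNTRIES = {"Denmark", "Finland", "Iceland", "Norway", "Sweden"}

# Intensional definitions: properties that a thing must have.
def is_water(obs):
    return obs["substance"] == "water"

def is_ice(obs):
    # genus ("water") plus difference ("frozen"): a subset definition
    return is_water(obs) and obs["state"] == "frozen"

# Each property implicitly defines an extension within the universe.
water = [o for o in OBSERVATIONS if is_water(o)]
ice = [o for o in OBSERVATIONS if is_ice(o)]

assert all(o in water for o in ice)  # the extension of "ice" is a subset of that of "water"
print(len(water), len(ice))  # 2 1
```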

An operational definition specifies a method for resolving whether the expression applies to something or not. The method could be an algorithmic calculation but usually it is some experimental procedure. A well-known and controversial definition of this kind is the definition that a human being is alive when brain activities can be measured in certain types of tests.

A recursive definition uses the definiendum in the definiens (direct recursion) or refers to a definition which refers back to this definition, perhaps through a long chain of references (indirect recursion). In definitions, direct recursion is rather rare, except in mathematics and related fields, whereas indirect recursion is very common; for example, a dictionary in one language defines most words just through other words.
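Both kinds of recursion can be sketched briefly. The factorial function is a standard mathematical example of direct recursion; the mini-dictionary below is hypothetical, and the cycle search shows how indirect recursion through a chain of definitions could be detected:

```python
def factorial(n: int) -> int:
    """Direct recursion: n! is defined in terms of (n-1)!, with 0! = 1 as the base case."""
    return 1 if n == 0 else n * factorial(n - 1)

# Hypothetical mini-dictionary: each word maps to the words used in its definition.
USES = {
    "big": ["large"],
    "large": ["great", "size"],
    "great": ["big"],
    "size": ["extent"],
    "extent": [],
}

def find_cycle(graph, start):
    """Return a chain of definitions leading from `start` back to itself, or None."""
    stack = [(start, [start])]
    seen = set()
    while stack:
        word, path = stack.pop()
        for nxt in graph.get(word, []):
            if nxt == start:
                return path + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None

print(factorial(5))             # 120
print(find_cycle(USES, "big"))  # ['big', 'large', 'great', 'big']
print(find_cycle(USES, "size")) # None
```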

An ostensive definition "defines by pointing". To answer a question of the form "what is X?" you could simply point at something. And we might, and should, generalize the concept of pointing here; anything that directly demonstrates something is an ostensive definition. A direct reference to an image, or sound, or movement, could be a simple definition; instead of describing verbally what a triangle is, you might draw one. This is an old and important, though only superficially simple, way of defining things.

But it isn't as simple as you might think. An ostensive definition alone leaves it to the recipient to figure out what the pointing really means. By saying "Kissa!" and pointing at an animal, do I mean that the word is the name of this individual, or of its species, or the color of her fur, or perhaps a word that denotes an animal in general? Thus, an ostensive definition generally requires some additional information, either as a separate verbal explanation or by the context (as e.g. in an illustrated dictionary).

Markup for definitions

Although the following discussion deals with markup, the principles can also be applied to other methods of expressing the structure of information. For example, they could be used as a basis for defining the structure of records in a database of definitions. Instead of being indicated by markup tags in the midst of text, elements would appear directly as fields in records, and attributes as subfields.

In section Looking for definitions (or avoiding them) we discussed some reasons why some markup would be needed to distinguish definitions from other content. We also mentioned that minimal markup would delimit the definition as a whole and the definiendum. This could be achieved using SGML-format (or, if you like, XML-format) markup like the following:

<definition>The <definiendum>tiger</definiendum>, <taxon>Panthera tigris</taxon>, is a large, striped Asian felid.</definition>

This hypothetical markup uses the taxon element, which has nothing to do with our basic topic. It's there partly to illustrate that definition markup would be used in conjunction with other markup. In hypertext, a good definition would contain hyperlinks for cross-references to other definitions and perhaps links to additional information such as encyclopedia articles and illustrative examples.

Should we also have markup for the definiens? Perhaps, but what is the definiens in each case? As we have noted, not all definitions are describable as simply consisting of a definiendum and a definiens, with nothing else but punctuation and grammar words like "is" added to it. But it might be useful to have the possibility of designating part of a definition as definiens. In our example, we might regard the scientific name as being the definiens (for some purposes) and the rest as just additional explanation:

<definition>The <definiendum>tiger</definiendum>, <definiens><taxon>Panthera tigris</taxon></definiens>, is a large, striped Asian felid.</definition>
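A minimal sketch of how a program might extract the parts of such a definition, using Python's standard xml.etree.ElementTree module and assuming well-formed markup like the example above; the function names are hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import Optional

SAMPLE = ("<definition>The <definiendum>tiger</definiendum>, "
          "<definiens><taxon>Panthera tigris</taxon></definiens>, "
          "is a large, striped Asian felid.</definition>")

def text_of(elem: Optional[ET.Element]) -> Optional[str]:
    """Plain text of an element, including text inside nested elements."""
    return "".join(elem.itertext()) if elem is not None else None

def parse_definition(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    return {
        "definienda": [text_of(e) for e in root.iter("definiendum")],
        "definiens": text_of(root.find(".//definiens")),  # None if not marked up
        "full": text_of(root),
    }

parts = parse_definition(SAMPLE)
print(parts["definienda"])  # ['tiger']
print(parts["definiens"])   # Panthera tigris
print(parts["full"])        # The tiger, Panthera tigris, is a large, striped Asian felid.
```

A reading tool could show `parts["definiens"]` as a short tooltip and `parts["full"]` on request; when no definiens is marked up, the value is None and the full definition would be used instead.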

This would make it possible to use the definiens as a "tooltip" text, for example: A program for reading a document on screen could collect all the definitions from the document and from auxiliary documents indicated as applicable glossaries. And it could let the user ask for definitions, e.g. by clicking on a word, and show them in a small popup window or in some specific area of the screen. The definiens, when given, could be given as a response, unless the user explicitly requests a full definition. If the glossaries include multilingual dictionaries, this would create a very useful reading aid, too. The basic idea applies to speech-based user interfaces too: a program that reads a document might be interruptible, so that the user can ask the program to read a definition for a word before proceeding.

The discussion of multiple meanings suggested that a definiens could consist of an ordered list, possibly with nested lists inside, that indicates alternate meanings. In a markup language, the definiens element could be defined as having rather arbitrary content - say, any block or inline element, if the markup were added to a language basically similar to HTML. We would just need to add a semantic rule that says: if the definiens consists of an element which is a list, then the items of the list are interpreted as alternate meanings, and if a list item is a list, its items are alternates within an alternate, etc. (This idea might need reconsideration. Perhaps multiple definitions aren't organizable in such a simple way.)
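Under the semantic rule just described, a processor might turn a list-valued definiens into a nested structure of alternatives. A minimal sketch, assuming HTML-like ol and li elements inside the hypothetical definiens element; the sample entry for "run" is only illustrative:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<definition><definiendum>run</definiendum>
<definiens><ol>
  <li>to move swiftly on foot</li>
  <li>to manage or operate
    <ol><li>a business</li><li>a machine</li></ol>
  </li>
</ol></definiens></definition>"""

def alternatives(ol: ET.Element) -> list:
    """Each list item becomes (its own text, its nested alternatives)."""
    result = []
    for li in ol.findall("li"):  # direct children only
        sub = li.find("ol")
        nested = alternatives(sub) if sub is not None else []
        result.append(((li.text or "").strip(), nested))
    return result

def definiens_alternatives(xml_text: str):
    """Alternatives of a list-valued definiens, or None if the definiens is not a list."""
    root = ET.fromstring(xml_text)
    ol = root.find("definiens/ol")
    return alternatives(ol) if ol is not None else None

print(definiens_alternatives(SAMPLE))
# [('to move swiftly on foot', []), ('to manage or operate', [('a business', []), ('a machine', [])])]
```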

It is largely a matter of judgement what part of a definition is regarded as the definiens. In the example above, the definiens could be made to extend to the end of the definition, making the definition separable. But it might be better to minimize the definiens. Automatic processing might thus create two different associations (e.g. via "tooltips") for an expression: a short compact definiens and the full definition, which might contain e.g. an illustrative picture.

Markup for definitions could also specify various general characteristics of each definition, to make it possible to process documents automatically in more advanced ways, e.g. ignoring some kinds of definitions. Technically, the characteristics could be specified using attributes (e.g., <definition status="common" kind="translation">), probably so that each attribute has a fixed set of possible values. Possible properties and values:

It might be useful to add a topic attribute that indicates the topic realm where the definition applies, such as general, biology, history etc. This would make it possible to search for definitions of a common word as used as a term in some specialty and to create topical glossaries. However, it would be very difficult to define a standardized and widely useful categorization, and without a categorization (i.e. with free text values) a topic attribute would not help much. So this idea needs to be elaborated separately. (It might be possible to define a generic method of specifying a topic hierarchically and to start with defining a very coarse standardized top-level categorization, based e.g. on a study of categories used in dictionaries and grouping them into larger units. Even such a basic classification might turn out to be useful, e.g. when limiting searches.)

Perhaps a formalism attribute should be present too, indicating which formal notations, such as SGML or BNF or regular expressions or mathematical formulas, are used in the definition. Perhaps this could be integrated into the language attribute, defining a range of possible values that covers both human languages and some set of formal notations. Or perhaps not.

Probably all the attributes should be optional in a general format of definitions. This would let authors decide how detailed their markup is, and existing documents could be gradually converted to a more structured format. Apparently, every attribute should have a special value like unspecified as its default value.

The kind attribute should be taken as specifying the nature of the primary definition given. For example, a multilingual dictionary entry would have kind="translation" even if it contains usage examples, illustrations, and other material as part of a definition.
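A minimal sketch of reading such optional attributes, assuming ElementTree-parsed markup. Since the full value sets are not fixed here, the `ALLOWED` sets below are hypothetical and contain only values mentioned in this document (common, proposed, translation) plus the default unspecified:

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately incomplete value sets.
ALLOWED = {
    "status": {"unspecified", "common", "proposed"},
    "kind": {"unspecified", "translation"},
}

def attributes(definition: ET.Element) -> dict[str, str]:
    """Attribute values of a definition element, each defaulting to 'unspecified'."""
    values = {}
    for name, allowed in ALLOWED.items():
        value = definition.get(name, "unspecified")
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
        values[name] = value
    return values

def without_proposed(definitions):
    """Example filter: skip definitions that only propose a new meaning."""
    return [d for d in definitions if attributes(d)["status"] != "proposed"]

doc = ET.fromstring('<doc><definition status="common" kind="translation"/>'
                    '<definition status="proposed"/><definition/></doc>')
defs = list(doc.iter("definition"))
print([attributes(d) for d in defs])
print(len(without_proposed(defs)))  # 2
```

Defaulting every attribute to unspecified lets unannotated or partially annotated documents pass through the same code path as fully annotated ones.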

Perhaps the definiendum should have an optional base attribute that specifies the base form of the word or phrase, in cases where the element content itself is in a declined form (e.g., <definiendum base="liikeri">liikeriksi</definiendum>). Alternatively, it might be left to automatic linguistic analysis to deduce the base form. But since such an analysis would be rather complex (it could not be just a simple morphological analysis but would need to consider the grammatical context too), it would appear to be highly desirable to be able to indicate the base form in a simple manner. Then software for making use of definitions would not need to be "language-aware" in any specific way.
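A minimal sketch of how glossary software could use such an attribute without any linguistic knowledge, assuming the markup above; the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

def index_definienda(xml_text: str) -> dict[str, list[str]]:
    """Index definitions by base form: the base attribute if given, else the element text."""
    root = ET.fromstring(xml_text)
    index: dict[str, list[str]] = {}
    for d in root.iter("definition"):
        for term in d.iter("definiendum"):
            key = term.get("base", term.text or "")
            index.setdefault(key, []).append("".join(d.itertext()))
    return index

DOC = ('<doc><definition>Tiikerin ja leijonan risteymää (hybridiä) sanotaan '
       '<definiendum base="liikeri">liikeriksi</definiendum>.</definition>'
       '<definition>A <definiendum>liger</definiendum> is a hybrid (cross) '
       'between a tiger and a lion.</definition></doc>')

index = index_definienda(DOC)
print(sorted(index))  # ['liger', 'liikeri']
```

A search for the base form liikeri now finds the Finnish definition directly, even though that form never occurs in its text.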

On the implementation of the markup system

XML?

The markup system outlined above could easily be described in XML. However, contrary to widespread disinformation, XML is just a syntactic metanotation and does not solve the essential problems of universal markup. Anyone is free to use the notation system outlined here, in XML or otherwise, but what is really needed is the incorporation of such a system into a markup language with generic semantics - something that HTML was once meant to be.

A review of markup for definitions in HTML

Comparing the markup system outlined above with the HTML markup that could be used for definitions shows that HTML was not particularly well designed in this respect.

The dl (definition list) element

In the first HTML specification, HTML 2.0, the only element related to definitions was the dl element. Although it is said to stand for "definition list", authors never honored this much. Rather, they applied the more liberal interpretation "description list", generally because dl elements looked suitable in popular browsers! The definition of dl in HTML 2.0 more or less invited that, since it stated:

A definition list is a list of terms and corresponding definitions. Definition lists are typically formatted with the term flush-left and the definition, formatted paragraph style, indented after the term.

No matter how good the intentions were, this (or, rather, popularized versions of it in tutorials) was read as saying that dl means a list formatted in a particular way. For example, lists of links with annotations below them are often presented that way, with the title of each linked resource as dt ("definition term") and an explanation or annotation as dd ("definition data"). I even used such a presentation in an old version of my document Resources on the English language. It's handy. And it's fundamentally and completely wrong when the original definition of dl is taken seriously. (Besides, the new version that does not use dl markup looks better, too.)

For example, the W3C page on XML contains a "definition list" (dl) of translations, which begins as follows (the list is marked up as dl):

Chinese (Simplified)
Chinese (Traditional)
French

Surely the list does not define the terms "Chinese (Simplified)", "Chinese (Traditional)", "French", etc.! Any automated analysis based on the assumption that dl is really used in its defined meaning, even by the organization that has issued the definition, would give nonsensical results.

And the HTML 4.0 definition of dl waters things down in a manner that effectively makes dl purely presentational, without actually admitting this. It first uses words like "definition list", then switches to "a term and a description", then gives a few examples where definitions are given, and finally adds:

Another application of DL, for example, is for marking up dialogues, with each DT naming a speaker, and each DD containing his or her words.

So it's about "application" now: about using markup which is actually presentational, to get a particular visual effect, with no relation to definitions.

Thus, dl as definition markup is a lost battle. It is so widely used merely for "effects" that any software that analyzes the existing bulk of HTML documents by extracting "definitions" from dl elements will not produce much of use. This isn't that bad, after all, since the dl element wasn't particularly well designed initially. It is seriously restrictive in two ways: it applies to separable definitions only, and it requires definitions to be grouped into lists. (A lone definition could technically be made a one-item list, but it would still be impossible to use the markup for inline definitions, such as very short notes on the meanings of words inside running text.)
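To see why naive extraction fails, consider a minimal sketch of software that takes the name "definition list" at face value. It assumes Python's standard `html.parser` module, and the class `NaiveDlExtractor` is a hypothetical illustration. It treats every dt/dd pair as a term and its definition, and it tolerates omitted end tags.

```python
from html.parser import HTMLParser


class NaiveDlExtractor(HTMLParser):
    """Treat every dt/dd pair as (term, definition), as dl's name suggests."""

    def __init__(self):
        super().__init__()
        self.pairs = []       # collected (dt text, dd text) pairs
        self._current = None  # "dt", "dd" or None
        self._term = ""
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._flush()  # an omitted end tag is implied by the next item
            self._current = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd", "dl"):
            self._flush()

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def _flush(self):
        text = " ".join("".join(self._buffer).split())
        if self._current == "dt":
            self._term = text
        elif self._current == "dd":
            # Every dd is recorded as a "definition" of the latest dt.
            self.pairs.append((self._term, text))
        self._current, self._buffer = None, []
```

Feed it a dialogue marked up as HTML 4.0 suggests, with each dt naming a speaker, and it dutifully "defines" each speaker's name as whatever that speaker said. The markup gives it no way to tell.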

The definiendum markup: dfn

In HTML 4.0, the dfn markup was introduced, with a fairly explicit semantic definition: "Indicates that this is the defining instance of the enclosed term".

The dfn element has hardly been abused, so it might be considered a good starting point for definition markup. However, it is useful for definiendum markup only, and there is no way to indicate what constitutes the definition.
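The limitation can be made concrete with a small sketch. It again assumes Python's standard `html.parser`, and `DfnCollector` is a hypothetical name. Because HTML offers no element for the definition itself, the sketch can only guess that the paragraph enclosing a dfn element is its definition. That guess is a heuristic, not something the markup states.

```python
from html.parser import HTMLParser


class DfnCollector(HTMLParser):
    """Collect each dfn term with the text of its enclosing p element.

    HTML gives no way to mark the extent of the definition itself, so the
    surrounding paragraph is only a heuristic guess.
    """

    def __init__(self):
        super().__init__()
        self.entries = []   # (term, enclosing paragraph text)
        self._in_p = False
        self._in_dfn = False
        self._para = []     # text of the current paragraph
        self._terms = []    # dfn terms seen in the current paragraph
        self._term = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._finish_paragraph()  # an open p is implicitly closed
            self._in_p = True
        elif tag == "dfn":
            self._in_dfn, self._term = True, []

    def handle_endtag(self, tag):
        if tag == "dfn" and self._in_dfn:
            self._in_dfn = False
            self._terms.append("".join(self._term).strip())
        elif tag == "p":
            self._finish_paragraph()

    def handle_data(self, data):
        if self._in_dfn:
            self._term.append(data)
        if self._in_p:
            self._para.append(data)

    def _finish_paragraph(self):
        text = " ".join("".join(self._para).split())
        self.entries.extend((term, text) for term in self._terms)
        self._in_p, self._para, self._terms = False, [], []

    def close(self):
        super().close()
        self._finish_paragraph()
```

The definiendum is found reliably. The "definition", however, is only whatever paragraph happens to surround it, which may contain much more than the definition (or less).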

But dfn has been used very little. Part of the reason is that Netscape 4 does not recognize dfn markup at all, and Internet Explorer uses a rather poor default presentation for it: italics, which makes it look less prominent than strong. In fact, the human eye cannot tell from the visual appearance alone what italics stands for in each case. Using style sheets, the situation can be improved under some conditions. For this document, and for my documents in general, I have written a style sheet that tries to make dfn elements "stand out" in a manner that distinguishes them from normal emphasis and strong emphasis. The style sheet rule I use is simply

dfn { font-weight: bold;
      color: #066;
      background: transparent none; }

To summarize, dfn might be adopted as the name for a definiendum element if compatibility with HTML is desired. Otherwise it is not of much use in practice, though you might use dfn in your documents in order to do all that can presently be done to promote markup for definitions. Just remember that you cannot rely on anything being highlighted that way.

Implications on HTML authoring

In Web authoring at present, dfn markup achieves little in practice, but it is still logical to use it for any definiendum in a definition. It is probably useful to have some style sheet rules that make it look more prominent, as outlined above.

Definition lists could be written using dl markup, but it is probably more practical to use other approaches. A simple one for short definitions is to make each definition a paragraph (p) element, with the definiendum marked up with dfn. To emphasize the structure of a sequence of definitions, as well as to get potentially better appearance, one could use a table, if the definitions are separable, i.e. the definiendums can be presented separately, in a column. Apparently, a cell containing a definiendum in such a context should be marked up as a header cell (th element); this has the desirable side-effect of making it look more prominent than normal text even on Netscape 4.

For terms with multiple definitions, you would just use several adjacent table rows, using the rowspan attribute in the first column to indicate that they are associated with the same term.
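The table approach can also be generated mechanically from a list of terms and their definitions. The sketch below uses the hypothetical function name `glossary_table`. It puts each definiendum in a th cell wrapped in dfn and emits one row per definition, with rowspan on the term's cell when there are several. It adds border="1" for the bordered cells suggested below, and it escapes text with the standard `html.escape`.

```python
from html import escape


def glossary_table(entries, caption=None):
    """Render (term, [definitions]) pairs as an HTML table.

    Each term goes in a th cell (marked up with dfn); a term with several
    definitions gets one row per definition, with rowspan on its th cell.
    Terms with an empty definition list produce no rows.
    """
    rows = []
    if caption:
        rows.append(f"<caption>{escape(caption)}</caption>")
    rows.append("<tr><th>term</th><th>definition</th></tr>")
    for term, definitions in entries:
        for i, definition in enumerate(definitions):
            cells = ""
            if i == 0:
                # Only the first row carries the header cell for the term.
                span = f' rowspan="{len(definitions)}"' if len(definitions) > 1 else ""
                cells += f"<th{span}><dfn>{escape(term)}</dfn></th>"
            cells += f"<td>{escape(definition)}</td>"
            rows.append(f"<tr>{cells}</tr>")
    return '<table border="1">\n' + "\n".join(rows) + "\n</table>"
```

For instance, an entry for ASN.1 with two definitions yields two rows that share one `<th rowspan="2">` cell. That matches the multi-definition layout used for Abstract Syntax Notation One in the glossary example.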

The visual presentation is probably more readable if the table has borders around cells. Alternatively one could put some vertical spacing between rows, but this is more difficult to achieve at present.

The following example, which presents the first few items in the Internet Security Glossary (RFC 2828), illustrates this approach:

Internet Security Glossary
term | definition
3DES | See: triple DES.
*-property | (N) (Pronounced "star property".) See: "confinement property" under Bell-LaPadula Model.
ABA Guidelines | (N) "American Bar Association (ABA) Digital Signature Guidelines" [ABA], a framework of legal principles for using digital signatures and digital certificates in electronic commerce.
Abstract Syntax Notation One (ASN.1) | (N) A standard for describing data objects. [X680]
 | (C) OSI standards use ASN.1 to specify data formats for protocols. OSI defines functionality in layers. Information objects at higher layers are abstractly defined to be implemented with objects at lower layers. A higher layer may define transfers of abstract objects between computers, and a lower layer may define transfers concretely as strings of bits. Syntax is needed to define abstract objects, and encoding rules are needed to transform between abstract objects and bit strings. (See: Basic Encoding Rules.)
 | (C) In ASN.1, formal names are written without spaces, and separate words in a name are indicated by capitalizing the first letter of each word except the first word. For example, the name of a CRL is "certificateRevocationList".
ACC | See: access control center.

The long way to go

The "modularization" of HTML would make it possible to specify a "definitions module". Although this might be useful for some special applications, it would be an exercise in futility for the fundamental purpose outlined above. I believe in a Renaissance of HTML, with the original design goals formulated more maturely, and probably under a different name. I don't know whether it is a matter of years, decades, or centuries. This document was written basically to contribute to such development in a particular area; for my ideas on the big picture, see HTML in retrospect - what can we learn from the great success, and the great failure?

Since definition markup is something new, we can expect authors to have difficulties in writing it consistently. Good tutorials, perhaps even systematic training, both general and specialized, would be needed. But first a good basic system needs to be developed. This will require some experimental implementations, so that a balance can be found between ease of writing the markup and richness of the information content.