Experiences with Britannica Online

This document has been preserved for historical reasons only. It describes my experiences with the paid version of Britannica Online in 1995, when I worked at Helsinki University of Technology. Quite a lot has changed since then, and most of the links in this document probably no longer work.

What is Britannica Online

Since this document is aimed not only at Encyclopædia Britannica, Inc. as feedback but also at various institutions considering a subscription to Britannica Online, a short introduction to Britannica Online is provided.

Britannica Online is a network-accessible hypertext version of the well-known Encyclopædia Britannica. It also includes Merriam-Webster's Collegiate Dictionary and the Britannica Book of the Year. Britannica Online is copyrighted by Encyclopædia Britannica, Inc.

The idea of having a hypertext version of a high-quality encyclopædia on the net is very exciting. Consulting a large encyclopædia is something one would really like to do on a computer in a hypertext fashion, with fast access to information, computer-based searching tools, and hypertext links to follow. Such possibilities have existed for a few years in the form of encyclopædias on CD, but such an approach requires special equipment for the user and, more importantly, does not allow the information to be really up-to-date. In a network version, there is practically no delay between the time of updating the information by the information provider and the time of accessing the updated information by the user. It's all instantaneous. Moreover, a networked version can easily be linked to other sources of information on the network.

Technically Britannica Online, hereafter abbreviated BO, is a collection of files on the World Wide Web (WWW) system, accessible with any WWW browser such as Mosaic, Lynx, or Netscape. The files reside on a single server, www.eb.com. This of course implies capacity limitations, which may become a problem if (or when) the use of BO becomes extensive. It is not known whether BO can easily be scaled up by creating mirror servers.

However, access is restricted by arrangements related to the commercial nature of BO. There is a freely accessible description of BO (with demos) at URL http://www.eb.com/ but access to BO itself requires a paid contract (subscription).

Subscription

The subscription information on the freely accessible pages states that BO is now available by subscription to colleges and universities. Information about pricing policy, terms of subscription etc. was obtained in a reasonable time by sending electronic mail to the announced address. The concepts used (such as 'full-time student equivalent') were not clear enough, so some exchange of mail messages was required.

However, it is quite odd that the pricing information is not on the publicly accessible pages. A potential customer would certainly like to have some idea of what a subscription costs.

The demos on the public pages only give a rough idea of what BO really is. For a genuine evaluation, more realistic use of BO is needed. Fortunately the company allowed a test (preview) period for Helsinki University of Technology, where there was considerable interest in BO in the Computing Centre and the Library as well as elsewhere. This document is mostly based on experiences from the test period. Most of the testing was done using a graphical WWW browser (X-Mosaic), but the service seems to work quite well with text browsers (such as Lynx), too.

Getting started

Getting started with BO is very simple, provided that the user has some prior acquaintance with WWW. One can get to BO itself via the public page, by selecting the link named Search Britannica Online there. It is also possible to enter the corresponding URL directly,
http://www.eb.com:180/cgi-bin/g/Articles/HTML/0/ebonline/http/eb.html?Mode=F
but the length of the URL makes this a bit awkward. (Moreover, one cannot give the URL directly as an argument to the command for starting a WWW browser under Unix, since the question mark is a special character in normal Unix shells; the argument must be quoted.)
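A minimal sketch of the quoting problem, using only the Python standard library: `urlsplit` shows that the `?` introduces the query part `Mode=F`, `fnmatch` shows that in shell-style patterns `?` is a wildcard for any single character (so a shell may try to expand the URL as a filename pattern), and `shlex.quote` produces a safely quoted argument. The browser command `lynx` is only an example.

```python
from fnmatch import fnmatch
from shlex import quote
from urllib.parse import urlsplit

URL = "http://www.eb.com:180/cgi-bin/g/Articles/HTML/0/ebonline/http/eb.html?Mode=F"

parts = urlsplit(URL)
print(parts.port)   # 180: the nonstandard port in the URL
print(parts.query)  # Mode=F: the query string after the '?'

# In shell patterns '?' matches any single character, so an unquoted
# URL may be treated as a filename pattern by the shell.
print(fnmatch("eb.htmlXMode=F", "eb.html?Mode=F"))  # True

# Single-quoting the URL makes the shell pass it through unchanged.
print("lynx " + quote(URL))
```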

Having entered BO itself one can of course put the document into one's hotlist, to make it easier to access it in later sessions.

The document mentioned above is a fill-in form offering two search methods: index search and text search.

The index search works excellently in general. If the user gives a word which one expects to find in an encyclopædia, then almost always a relevant item is returned. Working with BO is faster than with traditional encyclopædia lookup, and the information returned is often in a more suitable form than in a printed book, since one can follow the hyperlinks. However, the lack of figures and pictures is a serious limitation in many cases.

If one gives, for example, the word Finland, one gets a list of items relevant to Finland, and the first of them is a link to rather extensive information about the country. The result is a bit confusing, however, since it does not look like a normal encyclopædia entry. Basically the page contains a list of contents, from which one can select the items one is interested in. However, the first items on the page are titled

MICROPAEDIA
MACROPAEDIA
BRITANNICA BOOK OF THE YEAR 94
STATISTICAL INFORMATION: see BRITANNICA BOOK OF THE YEAR
and at this point the user probably wonders what Micropaedia and Macropaedia are. Thus, the user interface is not quite intuitive. The interface should at least be improved by explicit statements which recommend that the user consult the Micropaedia entry for a short description of the topic and the Macropaedia entry for a long description, and suggest that the user can pick up an interesting subtopic from the list that follows.

The text search example suggests that natural-language (English) questions can be answered, which probably promises a bit too much. Some testing shows that the search is based on the individual words in the search string, not on full grammatical and semantic analysis.

An annoying feature is that if a search yields just one match, the match is not displayed directly. Instead, the user sees a message saying that there was one match, with a link to it. Thus a search which is, in a sense, as successful as a search can be (exactly one match) has the frustrating effect of forcing the user to do something that could easily be done by the computer. Usually just a few seconds of the user's time are wasted, but the psychological effect might be important.
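One common way to avoid the extra step is for the search program to answer a single-hit search with an HTTP redirect (302 Found) to the document itself. The sketch below is hypothetical: the `(title, url)` data model and the function name are illustrative, not BO's actual interface.

```python
import html
from http import HTTPStatus

def search_response(hits):
    """Build (status, headers, body) for a search result page.

    `hits` is a list of (title, url) pairs; this data model is hypothetical.
    """
    if len(hits) == 1:
        # Exactly one match: send the browser straight to the document
        # instead of showing a page containing a single link.
        _, url = hits[0]
        return HTTPStatus.FOUND, {"Location": url}, ""
    items = "\n".join(
        f'<li><a href="{html.escape(url)}">{html.escape(title)}</a></li>'
        for title, url in hits
    )
    body = f"<p>{len(hits)} items found.</p>\n<ul>\n{items}\n</ul>"
    return HTTPStatus.OK, {"Content-Type": "text/html"}, body

status, headers, _ = search_response([("Saarinen, Eero", "/eb/saarinen.html")])
print(int(status), headers["Location"])  # 302 /eb/saarinen.html
```

With zero or several hits the function still returns an ordinary list page, so only the one-match case changes.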

More generally, there is insufficient information about the search methods. The search reports are rather technical, and they do not describe the logic of searches. An advanced user with difficult questions, in particular, would appreciate a good explanation of the search logic, both in order to understand its limitations and in order to formulate search expressions better. For instance, is the order of words in a search string significant?

As regards search reports, they emphasize internal technical issues rather than the user's view. In particular, for each word in the query they report the technical processing of that word, instead of simply listing the relevant words, ie those words that were actually used, as opposed to words that were effectively treated as grammatical noise.
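A report in the user's terms could be as simple as two lists: the words searched for and the words ignored. A minimal sketch, assuming a small hypothetical stop list (BO's actual list of noise words is not documented):

```python
import re

# Hypothetical stop list of "grammatical noise" words.
STOP_WORDS = {"a", "an", "and", "the", "how", "can", "from", "is", "of", "what", "i"}

def query_report(query):
    """Split a query into the words actually searched for and the words ignored."""
    words = re.findall(r"\w+", query.lower())
    used = [w for w in words if w not in STOP_WORDS]
    ignored = [w for w in words if w in STOP_WORDS]
    return used, ignored

used, ignored = query_report("How can I make vinegar from wine?")
print("Searched for:", ", ".join(used))  # Searched for: make, vinegar, wine
print("Ignored:", ", ".join(ignored))    # Ignored: how, can, i, from
```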

Performance

The overall performance seems reasonable, but it is difficult to say how things will change as the use of BO grows. For high-quality service, some distributed system will be necessary. People often wish to do some quick checking, just picking up an encyclopædia entry, and this should happen fast, faster than taking a look at an encyclopædia on one's bookshelf. On the other hand, heavy use of BO, such as hunting for a large amount of information, places even higher demands on performance.

Typically it seems to take 5 - 10 seconds to get a page from BO. Occasionally there are longer delays. Just following links is not substantially faster than getting the result of a search, which suggests that the search methods have been implemented efficiently.

Sometimes a search fails with a message reporting that the server could not be accessed. This means that the WWW client could not connect to the server fast enough, ie a timeout occurred. This problem concerns WWW as a whole and is normally caused by overloaded communication lines or an overloaded server. However, it emphasizes the need for a distributed implementation of BO.

Getting deeper

It is far from obvious what exactly the information content of BO is. The About Britannica Online page says:
Britannica Online is a fully searchable and browsable collection of authoritative references, including Britannica's full encyclopædic database, Merriam-Webster's Collegiate Dictionary (Tenth Edition), the Britannica Book of the Year, and more.
The phrase 'and more' suggests that the list is not exhaustive. Moreover, a normal user probably does not know what 'Britannica's full encyclopædic database' is, nor what the terms Propaedia, Micropaedia and Macropaedia, which occur in documents in BO, mean. Some conceptual model, an orientation basis, should be provided to help people understand what there is in BO. Such a model actually exists, but under the somewhat misleading name Databases in Britannica Online, and it should be extended with more pragmatic issues such as how the databases relate to each other and which of them are suggested for various purposes.

By the way, the Book of the Year is the 1994 edition, describing events of 1993. In March 1995, one might expect to find (also) the Book of the Year 1995, or at least information about the date of its availability.

The basic search page does not explicitly state which collection of information (database) is searched. Each user has to figure out by himself that probably just the encyclopædic databases are searched, but exactly which databases? To make a dictionary search, for example, one has to select the dictionary explicitly. This is possibly a good idea, since a user who really wants encyclopædic information may prefer getting a failure message to getting a dictionary entry containing no useful information.

However, it would be useful to provide, as an option, a search through all available databases, either sequentially or in parallel (in a manner similar to a multithreaded query gateway).
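A parallel search across databases can be sketched with a thread pool, keeping the results grouped by the database they came from. In this sketch the databases are hypothetical in-memory dictionaries and their contents are invented for illustration; in a real gateway each `search_one` call would be a network request.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for BO's databases: search term -> list of entry titles.
DATABASES = {
    "Encyclopaedia": {"lactose": ["lactose (chemistry)"]},
    "Dictionary": {"lactose": ["lactose (noun)"]},
    "Book of the Year": {},
}

def search_one(name, index, term):
    # Placeholder for a query to a single database.
    return name, index.get(term.lower(), [])

def search_all(term):
    """Query every database concurrently; results stay grouped by database."""
    with ThreadPoolExecutor(max_workers=len(DATABASES)) as pool:
        futures = [pool.submit(search_one, name, index, term)
                   for name, index in DATABASES.items()]
        # Collect in submission order so the output order is predictable.
        return dict(f.result() for f in futures)

for database, hits in search_all("Lactose").items():
    print(f"{database}: {hits}")
```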

There is a fundamental flaw in the manner in which search reports are expressed. If one searches, for example, for the word preface from the basic search form (with index search), the report says

Britannica Online contains 1 item relevant to 'preface'.
and if the search is made from the dictionary, the report says
Britannica Online contains 2 items relevant to 'preface'. 
although the three items are all disjoint. This also implies that the system may report failure (0 items) even if BO does contain relevant items, in another database. Thus, the reports should explicitly refer to a particular database (or set of databases), not to Britannica Online as a whole. The information returned to the user should always indicate the database from which the information has been extracted. In addition to assisting the user, this would make it much easier to report errors in the information content. (For instance, such reports in this document are not necessarily accurate enough, since I do not always know the database.)
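A report that names the database is a small change. In the sketch below the counts for 'preface' follow the example above (1 item and 2 items); the item names are placeholders and the empty Book of the Year result is invented to show how a zero count reads once it is tied to a single database.

```python
def search_report(term, results):
    """Phrase result counts per database, not for 'Britannica Online' as a whole.

    `results` maps a database name to the list of items found in it.
    """
    lines = []
    for database, items in results.items():
        noun = "item" if len(items) == 1 else "items"
        lines.append(f"{database}: {len(items)} {noun} relevant to '{term}'.")
    return "\n".join(lines)

print(search_report("preface", {
    "Encyclopaedia": ["entry A"],
    "Dictionary": ["entry B", "entry C"],
    "Book of the Year": [],  # hypothetical, for illustration only
}))
```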

The implementation of references (links) looks silly. For example:

WWW: see World Weather Watch [Cross ref]
Here the words in brackets act as WWW links, instead of the much more natural and WWW-like style of making the words or expressions themselves into links. Presumably there have been some technical difficulties in converting the book form of Encyclopædia Britannica into hypertext.

Anyway, it is not comfortable to read something like

The order [Index] Anseriformes includes the well-known [Index] ducks, [Index] 
geese, and [Index] swans (family [Index] Anatidae) and the little-known [Index]
screamers (family Anhimidae).
A reader who does not, for the moment, care about links but just about the text itself finds it difficult to read. And when one wants to follow links, one often does not know what they relate to (eg does the first link above relate to the biological concept order or to the particular order Anseriformes?). Worse still, the URLs are usually complicated and give little indication of what the document is about.
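Once it is known which phrase each link belongs to, producing the more natural style is straightforward. The sketch below assumes exactly that mapping (phrase to URL), which, as noted above, the [Index] markers do not reliably provide; the URLs are hypothetical. Each linked phrase becomes an HTML anchor around the words themselves, and the rest of the text is HTML-escaped.

```python
import html
import re

def linkify(text, links):
    """Turn each linked phrase into an anchor around the phrase itself.

    `links` maps a phrase to its URL (hypothetical URLs in the example).
    """
    if not links:
        return html.escape(text)
    # Longest phrases first, so a longer phrase wins over a shorter one it contains.
    phrases = sorted(links, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b")
    parts = pattern.split(text)  # with one capturing group, odd indices are matches
    out = []
    for i, part in enumerate(parts):
        if i % 2:
            out.append(f'<a href="{html.escape(links[part])}">{html.escape(part)}</a>')
        else:
            out.append(html.escape(part))
    return "".join(out)

TEXT = ("The order Anseriformes includes the well-known ducks, geese, "
        "and swans (family Anatidae).")
LINKS = {"Anseriformes": "/index/anseriformes", "ducks": "/index/duck",
         "geese": "/index/goose", "swans": "/index/swan", "Anatidae": "/index/anatidae"}
print(linkify(TEXT, LINKS))
```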

Missing pictures

Although the Britannica Online FAQ in the public pages says that "There are currently a large number of hotlinked illustrations", there seem to be no pictures, figures, maps or any other non-textual information on BO. (There might be some, but I haven't found any, not even for topics that obviously lend themselves to visualization.) This is a very serious limitation.

Better user documentation needed

There is quite a lot of information about the use of BO, but it is split into several documents in a confusing way. Hypertext is fine, but in order to become a user of BO quickly one needs a self-contained document of instructions. It should explain what information (and what databases) there is on BO, the principle of accessing it (ie using one's favourite WWW browser), the basic methods of searching and navigating, and any special features of BO as part of the WWW world, such as the unusual way of presenting links, if it is preserved.

Such a document should of course contain pointers (links) to more detailed information, but it should in itself be sufficient for the normal user to get started. It should also be in a form suitable for printing on paper (ie preferably as a single HTML document) and declared public domain, so that subscriber organizations could make paper copies of it (or translations of it into various languages) for their members, to promote the use of BO.

Organization of information

There are also some other basic flaws in the organization of information into hypertext, in addition to the above-mentioned problems of unnecessary indirections and stylistically odd implementation of references. A user easily gets lost in the jungle of links on WWW in general, but there are some additional traps on BO in particular. They are probably best described by an example.

I am looking for information about Eero Saarinen, the architect. The simple, obvious method of index search with the string Eero Saarinen was successful. However, I encountered the annoying feature (explained later in some more detail) that BO also offers me information about Eero Erkko and Eliel Saarinen.

Following the link is the obvious thing to do, but I think I should not have been compelled to do it, since even a computer should be able to figure out that only one of the hits was a true hit. Now I see the following:

Saarinen, Eero (Am. arch.) 
   collaboration with 
      Eames 
      Roche 
   contribution to modern architecture 
   Gateway Arch 
If I pretend to be an inexperienced user of BO, I cannot quite see that the first line is a link to a (good) biography of Eero Saarinen, with links to the topics mentioned above, among other things.

Perhaps this is just a matter of taste, but I would prefer getting directly to the biography and following the links there if I like. The current approach suggests that the only information about Eero Saarinen on BO is about his collaboration with Eames and Roche, his contribution to modern architecture, and Gateway Arch (which logically is part of the contribution, by the way).

Problems do not end here. In fact I was interested in the Gateway Arch. (It is a natural thing to expect that there is some picture of it, but that is a different issue.) It is mentioned at the end of the biography, and there is even a link to more information - at least that is what it looks like. In reality, following the link gives me

Jefferson National Expansion Memorial (St. Louis, Mo., U.S.) 
   design by Saarinen 
which is something I had just read. Being an optimist, I assume the link "design by Saarinen" leads me to information about the design. In reality it gets me back to the end of Saarinen's biography.

Oh, but now I notice the symbol containing the words Next section. So the biography actually continues (but with no new information about the Arch). Perhaps I am stupid, but I really did not realize at first that the biography was divided into sections, linked together. Whether this is a good approach is debatable, especially because the division does not seem logical, and it is not even practical since the sections are longer than one page, so that the user must anyway know how to scroll within a section. It would be better to organize information like this into a single WWW document. (Larger documents should of course be organized hierarchically, with tables of contents, which actually seems to be the approach adopted.)

Now, I still want to know more about the Arch, so I return to the page I originally got in response to the search, and I pick the link to Gateway Arch there. I find myself at the end of some document. Knowing something about my WWW browser, I scroll upwards to see that the document is about St. Louis. There really is some additional information about the Arch in the document. Wanting still more, I look carefully at the text

stainless-steel [Index] Gateway Arch, designed by Eero [Index] Saarinen
which promises me two links. The latter is probably to something I have already read, since I am able to guess that [Index] refers to an entry about Eero Saarinen, although it is oddly placed between his first and last names. The first link is intuitively less clear. I had learned that [Index] is often a link to information about something that is mentioned before it, and at the moment I am not so enthusiastic about stainless steel. But it turns out, as I had hoped, that this time the link is to information about something mentioned after the code [Index]. What I get is
Gateway Arch (mon., St. Louis, Mo., U.S.) 
   Saint Louis [Ref 1]; [Ref 2] 
This seems to lead me back to the article about St. Louis, which presumably mentions the Arch in two contexts. Fine, in a sense. But in those contexts there are links which are not natural cross-references within the same document but links to the above-mentioned page, which in turn contains links to the two contexts. That is, yet another frustrating indirection. This approach is probably good when there is a large number of contexts to refer to, and perhaps it would take too much work to handle simpler cases in a simpler, more user-friendly way.

Sample searches

This section describes in some detail sample searches from BO and the contents of the information found.

How sweet is lactose?

As a good example of a successful text search, the question How sweet is lactose? was answered by providing a list of links related to the question. The first of them contained a direct answer to the question: On a scale of sweetness on which sugar is 100, lactose is only about 20.

However, another link (a link to the entry labelled sweeteners) pointed to a document with numerically different information: If sucrose is taken as a standard of 1, the sweetness of - - lactose is 0.27. This discrepancy is of course caused by an inconsistency in Encyclopædia Britannica itself, not by the search methods of BO. In fact, BO is a valuable tool for improving the contents of Encyclopædia Britannica, since with BO one can relatively easily access different Encyclopædia articles related to the same topic and find inconsistencies.
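Putting both figures on the same scale shows the size of the discrepancy. A small worked check, assuming that 'sugar' in the first article means sucrose:

```latex
\frac{20}{100} = 0.20 \quad\text{versus}\quad 0.27,
\qquad \frac{0.27}{0.20} = 1.35
```

That is, the sweeteners entry rates lactose about 35 % sweeter than the other article does.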

Finland

As mentioned above, an index search with the search string Finland leads to information about Finland in a nice (but not optimal) way. It is easy to navigate to those issues about Finland which one is interested in.

However, the information content can be criticized. There are misspellings such as Helsingen Sanomat for Helsingin Sanomat. (This misspelling occurs not only in the title but also in the article itself, and the article also misspells the earlier name of the paper, Päivälehti, as Paivalehti.) The selection of topics is not very balanced. It looks odd to a Finn that the only subtitle under the title communications is Helsingen Sanomat, or that the only social issues in Finland seem to be alcohol consumption and prohibition. It is difficult to say how the situation could be improved, but obviously the idea is to present links to special articles about some topics in a classified manner. Care should be taken to avoid the impression that the list of articles covers everything there is on BO about the whole topic.

There are even some plain errors. For instance, the Macropaedia entry for Finland says: independence was formally recognized by the Soviet Union in 1920. In fact it was recognized at the end of 1917 (and reconfirmed in the Tartu peace treaty in 1920), not by the Soviet Union which did not exist at that time but by Russia. (As the BO entry for the Soviet Union correctly describes, the Union was established in 1922. Whether the Union was a nation, as the explanation (hist. nation, Eurasia) suggests, is debatable.)

These remarks of course apply to Encyclopædia Britannica itself, not to BO in particular, but a user of BO inevitably judges the product by the correctness of its information content. Moreover, the nature of BO would give an excellent opportunity to get feedback from readers, for instance in the form of suggested corrections submitted via WWW forms.

Kokkola

Since I spent my school years in a Finnish city called Kokkola, I checked what information BO has about it. Astonishingly, this relatively small city has an entry covering the basic facts.

However, there is a serious error in the information contents: The statement Almost half of the inhabitants are Swedish-speaking is utterly false. The percentage of Swedish-speaking inhabitants has been around 20 % for decades.

Olli Lounasmaa

A text search with the search string Olli Lounasmaa (the name of a famous Finnish contemporary scientist) gives one hit, a link named Joensuu. Following that link, the user sees a text in which the string Olli is highlighted. The information (about the town of Joensuu) is correct, but it is totally misleading that the search returns a link to it. 'Olli' is a common first name in Finland, and returning a link to a document which happens to mention some Olli when the search string was a name containing that first name is quite unacceptable behaviour.

In general, the searches seem to be much too liberal, in the sense that, eg with a two-word search string, a match is reported when only one of the words matches. Notice, in particular, that the first response to the query says:

Britannica Online contains 1 item relevant to 'Olli Lounasmaa'.
without mentioning at all that a match was found for Olli only. (The detailed search report does describe the situation, although not very clearly, but users normally don't consult such technical reports unless they see an obvious reason to do so.)

Canary Islands

An index search with the search string Canary Islands results in a long list of hits, obviously just because the word islands appears in them. The good point is that the real hit, for the Canary Islands, appears near the top: not first but second, the first one being 'Canary Islands chaffinch (bird)'! However, returning irrelevant information in addition to relevant information is annoying when there is an obvious technical method to avoid it (strongly prefer full matches to partial matches), and it is probably a symptom of the same flaw as the bad behaviour described above.
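Strongly preferring full matches can be expressed as a sort key. A minimal sketch: titles are ordered first by an exact title match, then by whether all query words occur, then by how many do. Apart from the two titles quoted above, the titles are hypothetical, and this is an illustrative ranking rule, not BO's.

```python
import re

def rank(query, titles):
    """Order titles so that full matches come before partial ones.

    Sort key, most important first: exact title match, all query words
    present, number of query words present. Python's sort is stable
    (also with reverse=True), so ties keep their original order.
    """
    terms = re.findall(r"\w+", query.lower())

    def key(title):
        words = set(re.findall(r"\w+", title.lower()))
        matched = sum(t in words for t in terms)
        return (title.lower() == query.lower(), matched == len(terms), matched)

    return sorted(titles, key=key, reverse=True)

titles = ["Canary Islands chaffinch (bird)", "Canary Islands",
          "Faroe Islands", "Aleutian Islands"]
print(rank("Canary Islands", titles))
```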

What is the capital of Burkina Faso?

Using text search with the simple question What is the capital of Burkina Faso? results in a very large number of hits, obviously because there are so many texts referring to Burkina Faso. However, one would expect to get a single document answering the question, or just the answer: the name of the capital. The behaviour is not essentially different from BO's reaction to a simpler search with just the search string Burkina Faso.

Admittedly the first hit in the list returned does contain the answer, and in the displayed excerpt from the document the word capital is in bold face.

Thus, text search seems to work reasonably well for simple questions, but the format of giving answers is far from optimal.

How can I make vinegar from wine?

To solve the practical problem of producing vinegar from my own wine, I asked the question How can I make vinegar from wine? Quite obviously the question was ill-posed, relative to the search strategies of BO, since the first entries in its answer were about
  1. wine regions and varieties
  2. North Caucasian languages
  3. Assyrian culture
  4. Existentialism.
The list continues with other astonishing entries. It does contain an entry for vinegar, among other, very exotic entries.

Studying the answers and the search report closely, I found out that BO had used the words I, make, vinegar, and wine in the search, ignoring other words. Paying attention to the pronoun I explains the strange search results.

In fact, in the light of my previous experiences with BO I had anticipated this problem, and actually I first used index search with the string vinegar. The entry for this topic was returned, and it indeed gives a good tutorial introduction to the process of making wine vinegar. I will try it.

The lesson is that index search can be much more efficient than text search, at least at the current stage of the development of text search strategies within BO.

WWW, Internet

An index search with the search string WWW gives a link to a short note which gives a link to World Weather Watch. A naive user might expect that BO, being implemented with the aid of World Wide Web (WWW), would contain some information about it.

On the other hand, there is a good (although short) article about the Internet. But in this context I noticed that some articles contain links to information about their authors. In this case the link is named B.Ka. However, following that link I get a page with information about several writers, and I have to scan through it in order to find the information about B.Ka. This is something that would better be performed by a computer program, ie an author link should point directly to information about that person.

More about searches

In addition to the previous remarks about searches, the most important note is probably the following: when BO returns a list of documents, it does not specify why each document was returned. Compare this with the user interface of Lycos (which is, according to a very common user opinion, currently the best general search engine on WWW). When returning a list of documents, Lycos specifies for each document the number of keyword hits and also a hit score (a number between 0 and 1). This can be extremely valuable, since very often a single keyword hit is irrelevant.

Admittedly, BO provides a method for restricting the search to those articles containing all the terms in the query. Such a method is useful but insufficient, since for complicated searches a hit for all terms is often very improbable.
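A middle ground between "any term" and "all terms" is to annotate each returned document with its keyword hits and a score in the range 0 to 1. In the sketch below the score is simply the fraction of query terms that occur in the document; this is an illustrative measure, not Lycos's actual formula, and the documents are hypothetical. Note how two documents with the same number of hits get different scores.

```python
import re

def annotate(terms, documents):
    """Return (title, keyword hits, score) for each document, best first.

    `score` is the fraction of query terms occurring at least once (0..1).
    """
    terms = [t.lower() for t in terms]
    rows = []
    for title, text in documents.items():
        words = re.findall(r"\w+", text.lower())
        hits = sum(words.count(t) for t in terms)
        score = sum(t in words for t in terms) / len(terms)
        rows.append((title, hits, round(score, 2)))
    return sorted(rows, key=lambda r: (r[2], r[1]), reverse=True)

docs = {  # hypothetical document texts
    "Document A": "Olli and Lounasmaa both occur here; Lounasmaa twice.",
    "Document B": "Only Olli occurs here, three times: Olli, Olli.",
}
for row in annotate(["Olli", "Lounasmaa"], docs):
    print(row)
```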

Character code problems

The Britannica Online FAQ addresses, in response to questions 4 and 5, problems with mathematical and chemical notations as well as letters with accent marks and other special characters.

In general, the approach adopted is a good compromise between readable, natural notation on one hand and manageability by WWW browsers on the other. However, as indicated above, there are some problems in correctly presenting, e.g., the Scandinavian letter ä.

Moreover, there are deficiencies in presenting mathematical notations and deviations from the principles presented in the FAQ. For instance, the entry for ampere contains the notation

2 {times} 10{sup -7} newton
which is almost unreadable, or at least requires imagination. A better approach for such notations would be to present them using the clumsy but rather generally understood linearized notation like
2*10**(-7) newton
or, in this particular context, 200 nanonewton or 0.2 micronewton.

Notice that the SI system recommends that when using an exponent notation, the exponent of 10 should be evenly divisible by 3. This principle is logically compatible with the use of prefixes like k, m, M, (indicating multiplication by a power of ten with an exponent divisible by 3), but it is not generally applied in the representation on BO.

A similar problem is the notation

4{degree} C
in the Macropaedia entry for Metric system. If one wishes to avoid using the notation 4° C (because it may be displayed incorrectly by browsers), it is much better to write it without abbreviations, ie 4 degrees Celsius, than to use an ad hoc notation which indicates an abbreviation.
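The recommendation can be applied automatically: use the degree sign when the output character set can represent it, and spell the unit out otherwise. A minimal sketch; the character set names are examples, with an ASCII-only target standing in for a browser that might display the sign incorrectly. Where the sign is available, the sketch writes the SI style 4 °C, with a space between the number and °C.

```python
DEGREE_SIGN = "\N{DEGREE SIGN}"

def celsius(value, charset="ascii"):
    """Write a Celsius temperature, spelling the unit out when the degree
    sign cannot be represented in the target character set."""
    try:
        DEGREE_SIGN.encode(charset)
    except UnicodeEncodeError:
        return f"{value} degrees Celsius"
    return f"{value} {DEGREE_SIGN}C"

print(celsius(4))                     # 4 degrees Celsius
print(celsius(4, charset="latin-1"))  # 4 °C
```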

Copyright issues

The copyright notice says:
By law, no part of this work may be reproduced or utilized in any form, except for copying of brief excerpts as permitted under U.S. copyright law.
This is a tricky legal issue. There are different copyright laws in different countries, and normally an act is to be judged according to the laws of the country where it took place. For instance, the Finnish copyright law does not limit the right to quote published works to "brief excerpts" but limits it on other grounds.

It can be argued that the existence of the copyright notice constitutes an agreement, ie that the user commits himself to obeying U.S. legislation in this context and a violation could be treated as violation of that agreement. The tricky point is that signed agreements are not made by end users but by an institution, which cannot control the acts of its members in this respect.

The pragmatic side of the issue is that information searches are made for various purposes, including purposes for which it is essential to be able to quote parts of the information. The question arises which institution decides the extent of allowable quotations and on what grounds.

Mysterious errors

Once, when trying to access the Subscribers only area after the arrangements for the test period had been made, and after having accessed that area successfully before, I got (when using X-Mosaic under Unix) the following error message:
Britannica Online Home Page 

Unauthorized Access

You have accessed a hypertext link to the Encyclopædia Britannica proprietary
database from bastion.eb.com (198.242.219.5). This host has not been enabled
for access. 
This error was intermittent; subsequent accesses were successful.

Testing the beta version

The experiences described above were obtained by using the current "normal" version of BO. (There is no version number one could refer to, unfortunately.) That version also contains a link to a newer version in beta test phase, beta version 1.1.

The beta version 1.1 looks quite different from the current "normal" version. It is difficult to say to what extent the changes are improvements. In general, I think that frequently used user interfaces should be kept stable, with changes made only when they are definite and considerable improvements. However, the customer base of BO is probably not very wide yet, so some experimentation is understandable at this stage.

The basic difference seems to be that instead of two search methods there is, superficially, only one. The difference between index search and text search is probably now implemented in the selection of reference in the unified search scheme. I am inclined to think that this is a better approach, but once more I emphasize the need for better user documentation. A beginning user should be assisted so that he gets an early idea of the difference between index search and text (or free) search.

Conclusions

Britannica Online is a very promising product, both because of the generally acknowledged good quality of Encyclopędia Britannica and because of the powerful tools for accessing information via WWW.

There is quite a lot of development work to be done, and apparently it is being done, but even in its present state Britannica Online can significantly improve the productivity of teachers, research workers, other staff, and students at universities.