HTML authoring in French

This document discusses some basic problems with accented letters, guillemets, etc., when authoring HTML documents in French.

In French, the characters listed in the table under “Entering characters” below may be needed in addition to the ASCII repertoire (which shouldn’t cause problems).

French typography rules for spacing need to be taken into account, too. For example, a space is required before a colon (:). The space should be narrower than a normal space, and it should be non-breaking. These requirements pose some challenges in HTML authoring, and so does the French practice of using superscript characters, as in “1er.”

Entering characters

The following table suggests some ways to insert the characters into an HTML document:

Glyph  Common name                Windows  Reference  Unicode

Vowels with diacritic marks
à      a with grave               Alt0224  &agrave;   00E0
À      capital A with grave       Alt0192  &Agrave;   00C0
â      a with circumflex          Alt0226  &acirc;    00E2
Â      capital A with circumflex  Alt0194  &Acirc;    00C2
è      e with grave               Alt0232  &egrave;   00E8
È      capital E with grave       Alt0200  &Egrave;   00C8
é      e with acute               Alt0233  &eacute;   00E9
É      capital E with acute       Alt0201  &Eacute;   00C9
ê      e with circumflex          Alt0234  &ecirc;    00EA
Ê      capital E with circumflex  Alt0202  &Ecirc;    00CA
ë      e with dieresis            Alt0235  &euml;     00EB
Ë      capital E with dieresis    Alt0203  &Euml;     00CB
î      i with circumflex          Alt0238  &icirc;    00EE
Î      capital I with circumflex  Alt0206  &Icirc;    00CE
ï      i with dieresis            Alt0239  &iuml;     00EF
Ï      capital I with dieresis    Alt0207  &Iuml;     00CF
ô      o with circumflex          Alt0244  &ocirc;    00F4
Ô      capital O with circumflex  Alt0212  &Ocirc;    00D4
ù      u with grave               Alt0249  &ugrave;   00F9
Ù      capital U with grave       Alt0217  &Ugrave;   00D9
û      u with circumflex          Alt0251  &ucirc;    00FB
Û      capital U with circumflex  Alt0219  &Ucirc;    00DB
ü      u with dieresis            Alt0252  &uuml;     00FC
Ü      capital U with dieresis    Alt0220  &Uuml;     00DC
ÿ      y with dieresis            Alt0255  &yuml;     00FF
Ÿ      capital Y with dieresis    Alt0159  &Yuml;     0178

Other letters
ç      c with cedilla             Alt0231  &ccedil;   00E7
Ç      capital C with cedilla     Alt0199  &Ccedil;   00C7
œ      oe ligature                Alt0156  &oelig;    0153
Œ      capital OE ligature        Alt0140  &OElig;    0152

Punctuation
«      left guillemet             Alt0171  &laquo;    00AB
»      right guillemet            Alt0187  &raquo;    00BB
‹      left single guillemet      Alt0139  &lsaquo;   2039
›      right single guillemet     Alt0155  &rsaquo;   203A
“      left double quote          Alt0147  &ldquo;    201C
”      right double quote         Alt0148  &rdquo;    201D
‘      left single quote          Alt0145  &lsquo;    2018
’      apostrophe                 Alt0146  &rsquo;    2019
—      em dash (cadratin)         Alt0151  &mdash;    2014
–      en dash (demi-cadratin)    Alt0150  &ndash;    2013

Other characters
€      euro sign                  Alt0128  &euro;     20AC
       no-break space             Alt0160  &nbsp;     00A0

Example: To produce « ici », you could enter, on Windows,
Alt0171Alt0160iciAlt0160Alt0187
and see the text as « ici » in some font. Or you could type, on any system,
&laquo;&nbsp;ici&nbsp;&raquo;
and have it displayed properly in a browser but shown as the codes in an editor. Yes, both ways are clumsy. That’s why a simplified style like "ici" is used so much. There are programs that convert e.g. "ici" to the correct notation.
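
The two input methods in the example, and the kind of conversion program just mentioned, can be sketched in Python. This is only an illustration under stated assumptions: the Alt codes are taken to be byte values in the Windows-1252 code page, the function names and the small entity table are made up for this sketch, and the quote conversion handles only simple, properly paired straight double quotes.

```python
import re

NBSP = "\u00a0"  # no-break space

def alt_codes_to_text(codes):
    """Decode Windows Alt+0nnn codes, assuming the Windows-1252 code page."""
    return bytes(codes).decode("cp1252")

# Entity names for the characters used in this sketch (a small subset only).
ENTITY_NAMES = {"«": "laquo", "»": "raquo", NBSP: "nbsp"}

def to_references(text):
    """Replace non-ASCII characters with entity or numeric character references."""
    out = []
    for ch in text:
        if ch in ENTITY_NAMES:
            out.append(f"&{ENTITY_NAMES[ch]};")
        elif ord(ch) > 127:
            out.append(f"&#x{ord(ch):04X};")
        else:
            out.append(ch)
    return "".join(out)

def frenchify_quotes(text):
    """Turn "ici" into guillemets with no-break spaces (paired quotes only)."""
    return re.sub(r'"([^"]*)"', f"«{NBSP}\\1{NBSP}»", text)

# The Alt-code sequence from the example and the converted "ici" give the same string.
typed = alt_codes_to_text([171, 160]) + "ici" + alt_codes_to_text([160, 187])
print(typed == frenchify_quotes('"ici"'))
print(to_references(typed))
```

A real converter would also have to deal with nested quotations and with quotes used in markup attributes, which this sketch ignores.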

Spacing

The French language uses special spacing in connection with several punctuation characters, for example before an exclamation mark. The following example shows a sentence first in an “Internet style”, then in proper French style:

Il disait: "L'État, c'est moi!".
Il disait : « L’État, c’est moi ! ».

Non-breakability

Normally web browsers (and many other programs) treat every space as an allowed line breaking point. This is not acceptable for the spaces discussed here; we do not want the previous example to be rendered as follows:

Il disait : « L’État, c’est moi
! ».

This can be prevented by using the no-break space character, often referred to as NBSP. It can be used as such, but most keyboard layouts offer no direct way to type it, and it is usually not directly distinguishable from a normal space in an editor or other authoring tool. This is why the entity reference &nbsp; is used so often. Example:
Il disait&nbsp;: «&nbsp;L’État, c’est moi&nbsp;!&nbsp;».
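
Typing the references by hand is tedious, so the replacement could be automated. The following Python sketch is one possible way, not a complete typographic tool: the function names are hypothetical, and it only converts ordinary spaces that already exist before ; : ! ? and inside guillemets. It does not add missing spaces.

```python
import re

NBSP = "\u00a0"  # no-break space

def protect_spaces(text):
    """Replace ordinary spaces next to French punctuation with no-break spaces."""
    # A space before ; : ! ? or before a closing guillemet.
    text = re.sub(r" (?=[;:!?»])", NBSP, text)
    # A space after an opening guillemet.
    text = re.sub(r"(?<=«) ", NBSP, text)
    return text

def nbsp_to_entity(text):
    """Write each no-break space as the entity reference, for visibility in an editor."""
    return text.replace(NBSP, "&nbsp;")

print(nbsp_to_entity(protect_spaces("Il disait : « L’État, c’est moi ! ».")))
```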

However, the no-break space is usually as wide as a normal space and is therefore too wide. This raises two questions: what should the width be, and how can it be implemented in HTML documents?

Width of spaces around punctuation

The terms and rules for spacing in French orthography are somewhat confusing and mixed. For example, Microsoft’s Character design standards - Punctuation 1 says:

Traditionally in French typography the left pointing guillemets are followed by a non-breaking word space or thin space of 1/8 the em and the right [pointing guillemets] pr[ec]eded by a non-breaking word space or thin space of 1/8 the em.

This is strange, since a no-break space is normally too wide, and in Unicode the thin space character is defined as "1/5 em (or sometimes 1/6 em)". It seems that the description is meant to justify the actual behavior of Microsoft software:

In Microsoft Word 97 the non-breaking space U+00A0 is automatically inserted when the French language is selected and a guillemet is typed. Some French typographers prefer to use a non-breaking thin space (espace fine insécables) with the guillemets.

It seems that espace mots insécable “no-break inter-word space” is often confused with espace fine insécable “fine (narrow) no-break space.”

Probably the special spacing around punctuation marks is best implemented as 1/8 of the em unit, i.e. 0.125em to put it in CSS terms. Using 1/6 em, or 0.167em, is possible, too, but 1/5 em (0.2em) is too near to the width of a normal space (about 0.25em on average).

Setting the width of a space in an HTML document

The best way to create a fixed-width no-break space in HTML is probably to use the no-break space and style its width. Since, according to the CSS specifications, the width of an inline element cannot be set, we make the element an inline block. This means markup like the following (where &nbsp; can be replaced by the no-break space character itself):

<span class=fine>&nbsp;</span>

The style sheet could be:

.fine {
  display: inline-block;
  width: 0.125em;
}

The following demonstrates the effect. The first version has just unstyled no-break space, and the second has the styling described above.

Il disait : « L’État, c’est moi ! ».
Il disait : « L’État, c’est moi ! ».

In this approach, the width can easily be modified by changing a single value in the stylesheet (0.125em).
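
If the text already contains no-break spaces around the punctuation, the markup above could be generated rather than typed. The Python sketch below assumes exactly that. The function names are hypothetical; the class name fine and the default width 0.125em are taken from the example above.

```python
import re

NBSP = "\u00a0"  # no-break space

def fine_space_css(width_em=0.125, class_name="fine"):
    """Build the style rule; width_em is the single value to tune."""
    return (
        f".{class_name} {{\n"
        f"  display: inline-block;\n"
        f"  width: {width_em}em;\n"
        f"}}\n"
    )

def wrap_fine_spaces(text, class_name="fine"):
    """Wrap each no-break space next to ; : ! ? or a guillemet in a styled span."""
    span = f"<span class={class_name}>&nbsp;</span>"
    # No-break space before ; : ! ? or before a closing guillemet.
    text = re.sub(f"{NBSP}(?=[;:!?»])", span, text)
    # No-break space after an opening guillemet.
    text = re.sub(f"(?<=«){NBSP}", span, text)
    return text

print(fine_space_css())
print(wrap_fine_spaces(f"c’est moi{NBSP}!{NBSP}»."))
```

Calling fine_space_css(0.167) instead would give the 1/6 em variant.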

Alternative methods for setting the width of spacing

In theory, there is a large number of different space characters; see the Unicode block General Punctuation, or a summary of space characters in Unicode. You could consider using the Unicode character U+2009 THIN SPACE, which according to the Unicode standard has the width of “a fifth of an em (or sometimes a sixth).” On the other hand, it’s just a variant of the normal space, so it is breakable, and you surely don’t want that. There’s also U+202F NARROW NO-BREAK SPACE. But this character is much less widely supported than the no-break space. What’s worse, browsers display various things, like a small box or a question mark, when encountering a character they don’t support.
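
The difference between these space characters can be inspected with Python’s standard unicodedata module. This is only a rough sketch: it treats the noBreak tag in the character’s decomposition data as a hint that the character is a no-break variant of the normal space, which is an assumption of this example. Actual line breaking depends on the browser’s line breaking rules, and the check says nothing about font support.

```python
import unicodedata

# Normal space, no-break space, thin space, narrow no-break space.
CANDIDATES = ["\u0020", "\u00a0", "\u2009", "\u202f"]

def describe(ch):
    """Return (code point, Unicode name, True if marked as a no-break variant)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.decomposition(ch).startswith("<noBreak>"),
    )

for ch in CANDIDATES:
    print(describe(ch))
```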

One might try to use a no-break space in a smaller font: <small>&nbsp;</small>. But this does not seem to have much effect. Another approach is to use style sheets in a fairly complicated way: put the last letter and the exclamation mark (or other punctuation, as the case may be) within a span element, and in a style sheet suggest added spacing between characters:
... mar<span style="letter-spacing:0.1em">k!</span>
Or you could use a no-break space and suggest reduced (negative) spacing between words:
... <span style="word-spacing:-0.13em">mark&nbsp;!</span>

Yet another approach, perhaps the most natural, is to wrap each of the guillemets inside a span element and set margins or paddings for them to create the desired spacing:
<span class="Pi">«</span>foo<span class="Pf">»</span>
with a style sheet like
span.Pi { margin-right: 0.1em; }
span.Pf { margin-left: 0.1em; }

The following demonstrates the effect of various approaches on your browser (rest assured it’s different on other browsers!):

Superscripts

It is customary to use superscript style for some endings in French, as in “1re.” The best way to use such style in HTML is to use the span element (or the a element for brevity) with a class, e.g.
1<span class=sup>re</span>
with a style sheet like the following:
sub, sup, .sub, .sup {
  position: relative;
  bottom: 1ex;
  font-size: 80%;
}
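
Adding the span markup by hand to every ordinal is laborious. A small Python sketch of how it could be automated follows. The function name and the list of endings are assumptions made for this example, and the list is not exhaustive. The class name sup matches the style sheet above.

```python
import re

# Ordinal endings handled by this sketch (an assumed, non-exhaustive list).
# Longer endings come first so that "ers" is tried before "er" and "e".
ENDINGS = "ers|res|er|re|es|e"

def mark_ordinals(text, class_name="sup"):
    """Wrap French ordinal endings that follow digits in a span with a class."""
    pattern = rf"\b(\d+)({ENDINGS})\b"
    return re.sub(pattern, rf"\1<span class={class_name}>\2</span>", text)

print(mark_ordinals("la 1re fois, le 2e jour"))
```

The word boundary after the ending keeps the function from touching notations such as 0.125em, where the digits are followed by a longer word.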

In the notations 1o, 2o, etc., which stand for Latin words (primo, secundo, etc.), the superscript is small letter o, not the digit zero or the degree sign. Instead of the technique described above, the notations could conceivably be written using the masculine ordinal indicator character (U+00BA), e.g. 1º. However, this is best avoided, since the appearance would differ from the style of other superscript letters and would contain an underline in many fonts.

It is customary to use the sup markup for superscripts, e.g. 1<sup>re</sup>. However, the use of sup may cause uneven line spacing on some browsers. The following screenshot illustrates this; it shows a text with a superscript suffix, first when marked up as sup, then when marked up as span and styled in CSS as described above.

(An image where the same text appears twice, first with a gap
between the first two lines.)
