This document discusses some basic problems with accented letters, guillemets, etc., when authoring HTML documents in French.
In French, the following characters may be needed in addition to the ASCII repertoire (which shouldn’t cause problems):
French typography rules for spacing need to be taken into account, too. For example, a space is required before a colon (:). The space should be narrower than a normal space, and it should be non-breaking. These requirements pose some challenges in HTML authoring, and so does the French practice of using superscript characters, as in “1er.”
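The table suggests some ways to insert the characters into an HTML document. The Unicode column gives the code number of the character in hexadecimal, and it can be used as follows:
In HTML, prefix the number with &#x and append a semicolon (e.g., &#x153; stands for the character with number 0153 in hexadecimal, i.e. œ).
In JavaScript, prefix the number with \u and write it in exactly four digits (sample code: var oe = '\u0153';).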
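The two notations described above can be generated mechanically from a code number. The following is a minimal sketch in JavaScript; the function names toHtmlReference and toJsEscape are made up for this illustration and do not come from any library.

```javascript
// Hypothetical helpers for illustration only.

// Build an HTML hexadecimal character reference from a code point.
function toHtmlReference(codePoint) {
  return "&#x" + codePoint.toString(16).toUpperCase() + ";";
}

// Build a JavaScript \uXXXX escape. The four-digit form only covers
// code points up to U+FFFF, so anything larger is rejected.
function toJsEscape(codePoint) {
  if (codePoint > 0xffff) {
    throw new RangeError("\\uXXXX covers only U+0000 to U+FFFF");
  }
  return "\\u" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

console.log(toHtmlReference(0x153)); // &#x153;
console.log(toJsEscape(0x153));      // \u0153
```

Leading zeros are optional in the HTML reference but required in the JavaScript escape, which is why only the second function pads the number.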
Glyph | Common name | Windows | Reference | Unicode |
---|---|---|---|---|
Vowels with diacritic marks | | | | |
à | a with grave | 0224 | &agrave; | 00E0 |
À | capital A with grave | 0192 | &Agrave; | 00C0 |
â | a with circumflex | 0226 | &acirc; | 00E2 |
 | capital A with circumflex | 0194 | &Acirc; | 00C2 |
è | e with grave | 0232 | &egrave; | 00E8 |
È | capital E with grave | 0200 | &Egrave; | 00C8 |
é | e with acute | 0233 | &eacute; | 00E9 |
É | capital E with acute | 0201 | &Eacute; | 00C9 |
ê | e with circumflex | 0234 | &ecirc; | 00EA |
Ê | capital E with circumflex | 0202 | &Ecirc; | 00CA |
ë | e with dieresis | 0235 | &euml; | 00EB |
Ë | capital E with dieresis | 0203 | &Euml; | 00CB |
î | i with circumflex | 0238 | &icirc; | 00EE |
Î | capital I with circumflex | 0206 | &Icirc; | 00CE |
ï | i with dieresis | 0239 | &iuml; | 00EF |
Ï | capital I with dieresis | 0207 | &Iuml; | 00CF |
ô | o with circumflex | 0244 | &ocirc; | 00F4 |
Ô | capital O with circumflex | 0212 | &Ocirc; | 00D4 |
ù | u with grave | 0249 | &ugrave; | 00F9 |
Ù | capital U with grave | 0217 | &Ugrave; | 00D9 |
û | u with circumflex | 0251 | &ucirc; | 00FB |
Û | capital U with circumflex | 0219 | &Ucirc; | 00DB |
ü | u with dieresis | 0252 | &uuml; | 00FC |
Ü | capital U with dieresis | 0220 | &Uuml; | 00DC |
ÿ | y with dieresis | 0255 | &yuml; | 00FF |
Ÿ | capital Y with dieresis | 0159 | &Yuml; | 0178 |
Other letters | | | | |
ç | c with cedilla | 0231 | &ccedil; | 00E7 |
Ç | capital C with cedilla | 0199 | &Ccedil; | 00C7 |
œ | oe ligature | 0156 | &oelig; | 0153 |
Œ | capital OE ligature | 0140 | &OElig; | 0152 |
Punctuation | | | | |
« | left guillemet | 0171 | &laquo; | 00AB |
» | right guillemet | 0187 | &raquo; | 00BB |
‹ | left single guillemet | 0139 | &lsaquo; | 2039 |
› | right single guillemet | 0155 | &rsaquo; | 203A |
“ | left double quote | 0147 | &ldquo; | 201C |
” | right double quote | 0148 | &rdquo; | 201D |
‘ | left single quote | 0145 | &lsquo; | 2018 |
’ | apostrophe | 0146 | &rsquo; | 2019 |
— | em dash (cadratin) | 0151 | &mdash; | 2014 |
– | en dash (demi-cadratin) | 0150 | &ndash; | 2013 |
Other characters | | | | |
€ | euro sign | 0128 | &euro; | 20AC |
 | no-break space | 0160 | &nbsp; | 00A0 |
Example: To produce « ici », you could enter, on Windows, Alt+0171 Alt+0160 ici Alt+0160 Alt+0187 and see the text really as « ici » in some font. Or you could type, on any system, &laquo;&nbsp;ici&nbsp;&raquo; and have it displayed properly in a browser but shown as the codes in an editor. Yes, both ways are clumsy. That’s why a simplified style like "ici" is used so much. There are programs that convert e.g. "ici" to the correct notation.
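As an illustration of what such a conversion program might do, here is a minimal sketch in JavaScript. The function name toGuillemets is hypothetical, and the sketch assumes plain text with balanced ASCII double quotes; it must not be run over HTML source, where the quotes around attribute values would be converted too.

```javascript
const NBSP = "\u00A0"; // NO-BREAK SPACE

// Replace each pair of ASCII double quotes with guillemets,
// padding the quoted text with no-break spaces on the inside.
// An unpaired quote is left untouched.
function toGuillemets(text) {
  return text.replace(/"([^"]*)"/g, (match, inner) => "«" + NBSP + inner + NBSP + "»");
}

console.log(toGuillemets('"ici"')); // « ici » (with no-break spaces)
```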
Notes:
The apostrophe (’) can be written using the reference &rsquo;. The notation &apos; denotes the ASCII apostrophe (') and should never be used in text.
The French language uses special spacing in connection with several punctuation characters, for example before an exclamation mark. The following example shows a sentence first in an “Internet style”, then in proper French style:
Il disait: "L'État, c'est moi!".
Il disait : « L’État, c’est moi ! ».
Normally web browsers (and many other programs) treat every space as an allowed line breaking point. This is not acceptable for the spaces discussed here; we do not want the previous example to be rendered as follows:
This can be prevented by using the no-break space character, often referred to as NBSP. It can be used as such, but on most keyboard layouts there is no direct way to type it, and it is usually not directly distinguishable from a normal space in an editor or other authoring tool. This is why the entity reference &nbsp; is used so often. Example:
Il disait&nbsp;: «&nbsp;L’État, c’est moi&nbsp;!&nbsp;».
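Replacing the ordinary spaces by hand is tedious, so the step can be scripted. The following JavaScript sketch is one possible way, under the assumption that the text already has an ordinary space in each of the positions discussed here; the function name protectFrenchSpaces is invented for this example.

```javascript
const NO_BREAK_SPACE = "\u00A0";

// Turn the ordinary space before : ; ! ? and » and the one
// after « into a no-break space, so no line break can occur there.
function protectFrenchSpaces(text) {
  return text
    .replace(/ ([:;!?»])/g, NO_BREAK_SPACE + "$1")
    .replace(/« /g, "«" + NO_BREAK_SPACE);
}

const fixed = protectFrenchSpaces("Il disait : « L’État, c’est moi ! ».");
// For HTML source, the character can then be written as an entity reference.
console.log(fixed.replace(/\u00A0/g, "&nbsp;"));
// Il disait&nbsp;: «&nbsp;L’État, c’est moi&nbsp;!&nbsp;».
```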
However, the no-break space is usually as wide as a normal space, therefore too wide. There are two issues here: what should the width be, and how to implement it in HTML documents?
The terms and rules for spacing in French orthography are somewhat confusing and mixed. For example, Microsoft’s Character design standards - Punctuation 1 says:
Traditionally in French typography the left pointing guillemets are followed by a non-breaking word space or thin space of 1/8 the em and the right [pointing guillemets] pr[ec]eded by a non-breaking word space or thin space of 1/8 the em.
This is strange, since a no-break space is normally too wide, and in Unicode the thin space character is defined as "1/5 em (or sometimes 1/6 em)". It seems that the description is meant to justify the actual behavior of Microsoft software:
In Microsoft Word 97 the non-breaking space U+00A0 is automatically inserted when the French language is selected and a guillemet is typed. Some French typographers prefer to use a non-breaking thin space (espace fine insécables) with the guillemets.
It seems that espace mots insécable “no-break inter-word space” is often confused with espace fine insécable “fine (narrow) no-break space.”
Probably the special spacing around punctuation marks is best implemented as 1/8 of the em unit, i.e. 0.125em to put it in CSS terms. Using 1/6 em, or 0.167em, is possible, too, but 1/5 em (0.2em) is too close to the width of a normal space (about 0.25em on average).
The best way to create a fixed-width no-break space in HTML is probably to use the no-break space and style its width. Since the width of an inline element cannot be set in CSS, according to the specifications, we make the element an inline block. This means markup like the following (where &nbsp; can be replaced by the no-break space character itself):
<span class=fine>&nbsp;</span>
The style sheet could be:
.fine {
display: inline-block;
width: 0.125em;
}
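Writing the span markup by hand for every punctuation mark is laborious. As a rough sketch, and assuming the text already contains no-break spaces in the right places and has no other markup that could be disturbed, a JavaScript function could generate the markup. The name fineSpacesToMarkup is hypothetical; the class name fine is the one used in the style sheet above.

```javascript
// Wrap each no-break space that precedes : ; ! ? » or follows «
// in a span with class "fine", so the style sheet can set its width.
function fineSpacesToMarkup(text) {
  return text.replace(
    /\u00A0(?=[:;!?»])|(?<=«)\u00A0/g,
    '<span class="fine">&nbsp;</span>'
  );
}

console.log(fineSpacesToMarkup("moi\u00A0!"));
// moi<span class="fine">&nbsp;</span>!
```

No-break spaces in other positions, for example between a number and a unit, are left alone, since they are meant to keep their normal width.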
The following demonstrates the effect. The first version has just unstyled no-break space, and the second has the styling described above.
In this approach, the width can easily be modified by changing a single value in the stylesheet (0.125em).
In theory, there is a large number of different space characters; see the Unicode block General Punctuation, or a summary of space characters in Unicode. You could consider using the Unicode character U+2009 THIN SPACE, which according to the Unicode standard has the width of “a fifth of an em (or sometimes a sixth).” On the other hand, it’s just a variant of the normal space, so it is breakable, and you surely don’t want that.
There’s also U+202F NARROW NO-BREAK SPACE. But this character is much less widely supported than the no-break space. What’s worse, browsers display various things, like a small box or a question mark, when encountering a character they don’t support.
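For completeness, here is how the U+202F alternative might look if you decide to try it despite the support problems just mentioned. This is only a sketch; the function name useNarrowNoBreakSpace is made up, and the set of punctuation marks handled is an assumption based on the examples in this document.

```javascript
const NNBSP = "\u202F"; // NARROW NO-BREAK SPACE

// Replace an ordinary or no-break space before : ; ! ?
// with the narrow no-break space.
function useNarrowNoBreakSpace(text) {
  return text.replace(/[ \u00A0](?=[:;!?])/g, NNBSP);
}

console.log(useNarrowNoBreakSpace("c’est moi !") === "c’est moi\u202F!"); // true
```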
One might try to use a no-break space in a smaller font: <small>&nbsp;</small>. But this does not seem to have much effect.
Another approach is to use style sheets in a fairly complicated way: put the last letter and the exclamation mark (or other punctuation, as the case may be) within a span element, and in a style sheet suggest added spacing between characters:
... mar<span style="letter-spacing:0.1em">k!</span>
Or you could use a no-break space and suggest reduced (negative) spacing between words:
... <span style="word-spacing:-0.13em">mark&nbsp;!</span>
Yet another approach, perhaps the most natural, is to wrap each of the guillemets inside a span element and set margins or paddings for them to create the desired spacing:
<span class="Pi">«</span>foo<span class="Pf">»</span>
with a style sheet like
span.Pi { margin-right: 0.1em; }
span.Pf { margin-left: 0.1em; }
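The wrapping can be automated. The following JavaScript sketch produces the markup shown above from plain text; the function name wrapGuillemets is invented for the illustration, and the class names Pi and Pf are those of the style sheet above. It assumes that the guillemets are written directly next to the quoted text, with no space characters in between, since the margins provide the spacing.

```javascript
// Wrap each guillemet in a span so that a style sheet can
// add a margin on its inner side.
function wrapGuillemets(text) {
  return text
    .replace(/«/g, '<span class="Pi">«</span>')
    .replace(/»/g, '<span class="Pf">»</span>');
}

console.log(wrapGuillemets("«foo»"));
// <span class="Pi">«</span>foo<span class="Pf">»</span>
```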
[Demonstration: the effect of the various approaches, each applied to the space before the exclamation mark; the rendering varies by browser.]
It is customary to use superscript style for some endings in French, as in “1re.” The best way to use such style in HTML is to use the span element (or the a element for brevity) with a class, e.g.
1<span class=sup>re</span>
with a style sheet like the following:
sub, sup, .sub, .sup {
position: relative;
bottom: 1ex;
font-size: 80%;
}
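Ordinal endings can be marked up automatically to some extent. The following JavaScript sketch handles only the endings er, re, e and their plurals after a number; the function name markOrdinals is hypothetical, and the list of endings is an assumption that would need extending for other abbreviations. The class name sup matches the style sheet above.

```javascript
// Put the ordinal ending that follows a number into a span with
// class "sup". The ending must not be followed by a letter or a
// digit, so that ordinary words starting after a number are skipped.
function markOrdinals(text) {
  return text.replace(
    /\b(\d+)(ers?|res?|es?)(?![\p{L}\d])/gu,
    '$1<span class="sup">$2</span>'
  );
}

console.log(markOrdinals("la 1re fois"));
// la 1<span class="sup">re</span> fois
```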
In the notations 1o, 2o, etc., which stand for Latin words (primo, secundo, etc.), the superscript is small letter o, not the digit zero or the degree sign. Instead of the technique described above, the notations could conceivably be written using the masculine ordinal indicator character (U+00BA), e.g. 1º. However, this is best avoided, since the appearance would differ from the style of other superscript letters and would contain an underline in many fonts.
It is customary to use the sup markup for superscripts, e.g. 1<sup>re</sup>. However, the use of sup may cause uneven line spacing on some browsers. The following screenshot illustrates this; it shows a text with a superscript suffix, first when marked up as sup, then when marked up as span and styled in CSS as described above.
Resources on writing French (partly conflicting with each other):