In the HTML language itself, there is nothing special about the reverse solidus character: it is an ordinary data character with no markup significance. But Web authoring also involves other languages and notations, such as JavaScript and URLs, and they are often combined with HTML and with each other. This may cause confusion, since these languages and notations may use the reverse solidus for different purposes. This document tries to clarify the concepts and to suggest practical measures for avoiding confusion.
The reverse solidus character "\" is a normal data character as far as HTML itself is concerned: there is nothing special about it, and there is no need to "escape" it in any manner.
You can, however, use the numeric character reference &#92; to denote "\" in an HTML document (e.g. if the reverse solidus key on your keyboard is broken, or if you cannot find a key or any other way to enter the reverse solidus on an unfamiliar keyboard).
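To illustrate that &#92; (or its hexadecimal form &#x5C;) stands for exactly the same character as a literal "\", here is a minimal JavaScript sketch of how numeric character references are decoded. The function name decodeNumericRefs is hypothetical; the sketch assumes every reference ends with ";" and ignores named entities like &amp;, whereas a real HTML parser handles more cases.

```javascript
// Minimal sketch: decode decimal (&#92;) and hexadecimal (&#x5C;) numeric
// character references. Assumes each reference ends with ";" and leaves
// named entities such as &amp; untouched.
function decodeNumericRefs(text) {
  return text.replace(/&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Out-of-range values are left as they are instead of throwing.
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericRefs("C:&#92;temp"));      // C:\temp
console.log(decodeNumericRefs("&#x5C;") === "\\"); // true
```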
The reverse solidus is rather often confused with the solidus (slash) character "/". The two look similar, differing only in the direction of the slant, but they are distinct characters with different uses. In particular...
It is a common error to use "\" instead of "/" in URLs, as in href="..\index.html" instead of the correct href="../index.html".
Further confusion arises because some browsers accept the incorrect form too, so people complain that "my links work on IE but not on Netscape".
This is not really about HTML, however; it is about misunderstanding the relationship between URLs and filenames and their syntax. See my attempt to describe the URL syntax and semantics.
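A simple check can catch the "\" versus "/" mistake before it reaches a browser. The following JavaScript sketch is hypothetical (checkHrefs is not an existing tool), uses a regular expression rather than a real HTML parser, and looks only at double-quoted href attributes. Its suggestion assumes the reverse solidus was meant as a path separator; if a literal reverse solidus really belongs in the URL, it should be percent-encoded as %5C instead.

```javascript
// Hypothetical authoring check: list href values that contain a reverse
// solidus and suggest the same value with "/" instead. A regular expression
// is used instead of an HTML parser, so only double-quoted href attributes
// are examined.
function checkHrefs(html) {
  const problems = [];
  for (const [, href] of html.matchAll(/href="([^"]*)"/gi)) {
    if (href.includes("\\")) {
      problems.push({ found: href, suggested: href.replace(/\\/g, "/") });
    }
  }
  return problems;
}

const page = '<a href="..\\index.html">Home</a> <a href="../faq.html">FAQ</a>';
for (const p of checkHrefs(page)) {
  console.log(`${p.found} -> ${p.suggested}`); // ..\index.html -> ../index.html
}
```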
Rather often people write script elements that contain JavaScript code like
document.write('</table>');
In HTML terms, the problem is that an HTML parser should recognize </table> as an end tag when processing the <script> element, even though it is not syntactically allowed in that context. After all, the parser must look for </script>, and under the HTML 4 rules for script content (which is declared as CDATA), it should treat any "</" followed by a letter, i.e. the start of any end tag, as the end of the element's content. In practice, this matters mainly for validation; see section B.3.2 Specifying non-HTML data in the HTML 4 specification.
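The rule can be modeled in a few lines. This JavaScript sketch assumes the HTML 4 reading described above, where script content ends at the first "</" followed by a letter; the function name cdataEndIndex is hypothetical, and, as noted, the rule matters chiefly to validators.

```javascript
// Sketch of the HTML 4 rule for script content (CDATA): the content ends at
// the first "</" that is followed by a letter, i.e. at anything that looks
// like the start of an end tag, whether or not it is </script>.
// Returns the index where the content would end, or -1 if it never ends early.
function cdataEndIndex(scriptText) {
  const match = /<\/[A-Za-z]/.exec(scriptText);
  return match ? match.index : -1;
}

const naive = "document.write('</table>');";
const escaped = "document.write('<\\/table>');"; // source text: '<\/table>'

console.log(cdataEndIndex(naive));   // 16: content is cut inside the string literal
console.log(cdataEndIndex(escaped)); // -1: "<" and "/" are not consecutive
```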
The use of "\" to solve the problem, as in document.write('<\/table>'); is not really an HTML issue, although the problem was, in a sense, caused by HTML parsing rules.
It is a matter of JavaScript rules for string literals that the character pair \/ is a valid notation for the "/" character. So what happens is that the HTML parser (typically, a routine in a browser) takes document.write('<\/table>'); as such (note that it does not see any end tag there; the essential point is that "<" and "/" are not consecutive here) and passes it to a JavaScript interpreter (typically, a set of routines in a browser). By JavaScript rules, the interpreter treats it as equivalent to document.write('</table>'); so what gets written is just </table>.
Confusing, isn't it? To avoid the confusion, consider putting JavaScript code into an external file and referring to it via <script src="URL"></script>.
Naturally, in such an external file "</" in a string causes no problems, since the file is not processed by an HTML parser, only by a JavaScript interpreter. (The use of external JavaScript files caused some problems on very old browsers like IE 3, but it is in many ways better, particularly for bulky code.)
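When a script has to stay inline, the "<\/" technique can be applied mechanically by whatever program produces the page. The JavaScript sketch below first confirms that '<\/table>' and '</table>' are the same string, then defines a hypothetical helper, toInlineScriptLiteral, that turns arbitrary text into a single-quoted string literal containing no "</". It is meant only for building string literals, not for rewriting arbitrary script code.

```javascript
// In a JavaScript string literal, "\/" simply means "/": the backslash escapes
// a character that needs no escaping, so these two literals are equal.
console.log('<\/table>' === '</table>'); // true

// Hypothetical helper for programs that generate inline scripts: turn any
// string into a single-quoted JavaScript string literal that never contains
// "</", so an HTML parser cannot mistake part of it for an end tag.
function toInlineScriptLiteral(value) {
  const body = value
    .replace(/\\/g, "\\\\")         // escape existing backslashes first
    .replace(/'/g, "\\'")           // escape the delimiting quote character
    .replace(/\r/g, "\\r")          // raw CR/LF are not allowed in a string literal
    .replace(/\n/g, "\\n")
    .replace(/\u2028/g, "\\u2028")  // older engines also reject raw U+2028/U+2029
    .replace(/\u2029/g, "\\u2029")
    .replace(/<\//g, "<\\/");       // break up "</" exactly as in '<\/table>'
  return "'" + body + "'";
}

console.log(toInlineScriptLiteral("</table>")); // '<\/table>'
```

Escaping the backslashes first matters: otherwise the backslash inserted by the last step would itself be doubled.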
When an HTML document is generated by a Perl (or C or some other) program, the reverse solidus frequently appears in the program source, since it has special meanings in Perl and in other programming languages.
For example, the program might contain
print "<h1>Hello</h1>\n";
There's nothing special about this. The notation \n is the Perl (and C) way of specifying an end-of-line character; what gets written into the HTML document is the string <h1>Hello</h1> followed by an end of line.
But confusion can arise when people present their problems with HTML by quoting the Perl code that generates it, rather than the actual HTML document!
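The same principle holds in JavaScript, whose string literals use the same \n notation. This Node.js sketch is an illustrative counterpart of the Perl statement, not taken from any real program; it shows that the generated string contains a newline character but no reverse solidus.

```javascript
// Illustrative JavaScript counterpart of the Perl statement (Node.js).
// The "\n" is resolved when the program source is parsed; the generated
// HTML contains a newline character, not a backslash.
const line = "<h1>Hello</h1>\n";

console.log(line.length);         // 15: 14 visible characters plus one newline
console.log(line.includes("\\")); // false: no reverse solidus in the output
process.stdout.write(line);       // writes <h1>Hello</h1> and an end of line
```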
Some Web browsers may incorrectly split lines between characters. In particular, IE may even split "a-b" as "a-" and "b". Such problems include a possible line split between the solidus and the reverse solidus in "/\"; the workaround is to put the nonstandard nobr markup around them: <nobr>/\</nobr>.
It's a rare combination of characters, of course. But I suspect that IE might split a line before or after a reverse solidus in some other contexts too.
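If the workaround is needed in many places, a small filter can apply it. The JavaScript function below is a hypothetical sketch: it assumes its input is plain text that has already been HTML-escaped, since it knows nothing about tags or attributes, and it simply wraps each "/\" in the nonstandard nobr markup.

```javascript
// Hypothetical helper: wrap every "/\" in <nobr> markup so that a browser
// that might break a line between the two characters keeps them together.
// Intended for already-escaped text content only, not for markup.
function protectSlashBackslash(text) {
  return text.split("/\\").join("<nobr>/\\</nobr>");
}

console.log(protectSlashBackslash("A /\\ B")); // A <nobr>/\</nobr> B
```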
Date of creation: 2000-08-22. Last modification: 2006-04-18.
Jukka Korpela