The program is mainly intended for printing HTML documents on paper with good layout in a user-controllable way. You can convert an HTML file to a LaTeX file, process it with LaTeX, convert the resulting DVI file to a PostScript file, and print it. For this you need the LaTeX software and must know how to run LaTeX on a file, but in principle you need not know more about LaTeX. Naturally, it helps if you know LaTeX well enough to be able, for example, to fix hyphenation errors. You can also make the DVI or PostScript file available via WWW so that people can decide whether they wish to access your HTML file (letting their browser do the formatting) or the file you have formatted.
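The conversion chain described above can be sketched as a small shell script. This is only an illustration under stated assumptions: it assumes that h2l, latex, dvips, and lpr are installed under those names, and the helper functions run and html_to_paper are hypothetical names, not part of h2l.

```shell
#!/bin/sh
# Sketch: HTML -> LaTeX -> DVI -> PostScript -> printer.
# Assumption: h2l, latex, dvips and lpr are on PATH under these names.

# run CMD...: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# html_to_paper BASE: take BASE.html all the way to the printer,
# stopping at the first step that fails.
html_to_paper() {
    base=$1
    run h2l "$base.html" || return                 # writes BASE.tex
    run latex "$base.tex" || return                # writes BASE.dvi
    run dvips -o "$base.ps" "$base.dvi" || return  # writes BASE.ps
    run lpr "$base.ps"                             # sends BASE.ps to the printer
}
```

With these definitions loaded, html_to_paper report would process report.html (a placeholder name). Setting DRY_RUN=1 first only lists the commands, which is a convenient way to check the chain before anything is printed.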
Essentially, HTML is a language which specifies the structure, not the layout, of a document, whereas LaTeX is a powerful tool for formatting documents. Thus, when one wants to produce a nicely formatted paper copy of an HTML document, going via LaTeX is a natural choice.
The h2l program provides some basic options for defining the document layout, such as setting the document class. If you want to do something else, you can of course edit the LaTeX file produced by h2l before processing it further.
Source code for h2l (in C) consists of config.h, h2l.c, scanHTML.c, scanHTML.h, and makefile.
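h2l [opt ...] [file ...]
A file argument of - makes h2l read standard input and, as the first example below shows, write to standard output. Otherwise, output will go to a similarly named file with a .tex extension (h2l recognises .html extensions).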
Options modify the action of h2l. The options are:
An example is
h2l -n - < file.html | less
This converts file.html to LaTeX and pages through the output. The sections (corresponding to heading tags in the HTML source) will be numbered.
Another example is
h2l -t 'Introduction to HTML' -a gnat -p -c html-intro
This takes input from the file html-intro, if existent, or from html-intro.html, writing to html-intro.tex, and adds a title page (with title Introduction to HTML and author gnat) and table of contents with page-breaks after both. The sections of the document are not numbered.
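The file-name handling in the example above can be sketched in shell. This illustrates the rule as described here and is not h2l's actual code; the function names resolve_input and output_name are hypothetical.

```shell
#!/bin/sh
# Sketch of the naming rule described above (not taken from h2l's source).

# resolve_input NAME: print NAME if such a file exists, otherwise NAME.html.
resolve_input() {
    if [ -e "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$1.html"
    fi
}

# output_name INPUT: strip a trailing .html, if any, and append .tex.
output_name() {
    printf '%s\n' "${1%.html}.tex"
}
```

Under this sketch, both html-intro and html-intro.html map to the output name html-intro.tex, which matches the behaviour described in the example.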
The rules for converting HTML to LaTeX are mainly specified in the process_HTML function in h2l.c, and it should be relatively straightforward to modify the behaviour in some simple manner, such as changing the way HTML elements are mapped to fonts.
In particular, all HTML elements are recognized within the scope of <LISTING>, <PLAINTEXT>, <PRE>, and <XMP>.
Last update: September 11th, 1996