Hlins: Hyper-Link Insertions in HTML documents
Version 0.39
May 1, 2003
1 An Introductory Example
Hlins inserts into an HTML document the URLs (uniform resource locators)
for certain names (normally the names of people), according to a data
base that associates URLs with names.
First you have to create a data base that associates URLs with names;
let's call it addresses:
Donald Knuth = http://www-cs-staff.stanford.edu/...
Leslie Lamport = http://www.research.digital.com/...
Suppose that you have an HTML document mytext.html that
contains text such as
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor Donald
Knuth, which was used by L. Lamport as a base to build the more user-friendly
(but less powerful) LaTeX system.
Calling hlins -db addresses -o newtext.html mytext.html will
generate a file newtext.html that now contains the text
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor <a
href="http://www-cs-staff.stanford.edu/...">Donald Knuth</a>, which
was used by <a
href="http://www.research.digital.com/...">L. Lamport</a> as a base to
build the more user-friendly (but less powerful) LaTeX system.
which will eventually be rendered by a browser as something like
A milestone in the development of digital typesetting was the
TeX system developed by Stanford computer science professor
Donald Knuth, which was used by L. Lamport as a base to build the
more user-friendly (but less powerful) LaTeX system.
Note that the url insertion knows about abbreviating first names (as
for Leslie Lamport) and works over line breaks (as for Donald Knuth).
2 Usage
hlins [options] [inputfile]
Hlins can be used in three different modes (see below).
The following general options exist:
- -h, --help
  Show summary of options and exit.
- -v, --version
  Show version of program and exit.
- -q, --quiet
  Suppress diagnostic output.
- -db, --data-bases files ...
  Use files as address data bases.
  The string files is a blank-separated list of data base
  files, which means that you have to protect the blanks from your shell
  when using several data base files. Multiple -db options are accepted.
Examples of usage strings in the csh shell are
hlins -db myaddresses
hlins -db "friends groupmembers"
hlins -db friends -db groupmembers
The last two invocations are equivalent.
2.1 Usage in filter mode
In filter mode, hlins reads HTML from one source and writes to a
different target. Input is taken from the inputfile argument
if one is given, otherwise from stdin. By default, output goes to
stdout.
- -o, --output-file file
  Write to file instead of standard output.
2.2 Usage in modify mode
In modify mode, hlins modifies HTML files in place.
- -m, --modify-files files ...
  Modify the files in place.
- -R, --recursive
  Recursively descend into directories and operate on all files with
  names ending in .html. Only effective together with the
  --modify option.
  For instance, ``hlins -db ... -m /WWW -R'' makes hlins
  operate on your complete WWW tree.
- -td, --tempdir dir
  When doing in-place modifications of files, use the directory
  dir to create temporary files. The default is the value of the
  TMPDIR environment variable, or /tmp if
  TMPDIR is not set.
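The interplay of -m, -R and -td can be pictured with the following Python sketch. It is only an illustration under stated assumptions, not hlins' own code (hlins is written in Objective Caml): the names html_files and modify_in_place are hypothetical, transform stands in for the actual link insertion, and the exact way hlins uses its temporary files is not documented here.

```python
import os
import shutil
import tempfile


def html_files(paths, recursive=False):
    """Yield the files to modify; directories are only entered when recursive."""
    for path in paths:
        if os.path.isdir(path):
            if recursive:
                for dirpath, _dirnames, filenames in os.walk(path):
                    for filename in sorted(filenames):
                        if filename.endswith(".html"):
                            yield os.path.join(dirpath, filename)
        else:
            yield path


def modify_in_place(paths, transform, recursive=False, tempdir=None):
    """Rewrite each file via a temporary file created in tempdir."""
    # Default as described above: TMPDIR if set, /tmp otherwise.
    tempdir = tempdir or os.environ.get("TMPDIR") or "/tmp"
    for path in list(html_files(paths, recursive)):
        with open(path, encoding="iso-8859-1") as source:
            new_text = transform(source.read())
        with tempfile.NamedTemporaryFile(
            "w", dir=tempdir, delete=False, encoding="iso-8859-1"
        ) as tmp:
            tmp.write(new_text)
        # Copy the content back so the original file keeps its permissions.
        shutil.copyfile(tmp.name, path)
        os.unlink(tmp.name)
```

Writing the complete result to a temporary file first means that a failure inside transform leaves the original file untouched.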
2.3 Usage in database list mode
- --db-to-html
  List the contents of the data bases in HTML on standard output. This
  can be handy for creating an address book.
3 Secondary Effects on the HTML Text
Hlins replaces HTML special characters (such as &eacute; or &#233;)
by the corresponding ISO-8859-1 character, which in this case is é.
Hence, you can use Hlins without any data base
argument to replace HTML special characters in an HTML document.
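As a rough illustration of this normalisation, the following Python sketch replaces entities that denote characters in the upper half of ISO-8859-1. It is not hlins' own code; the function name normalise_entities is hypothetical, and leaving markup-significant entities such as &lt; and &amp; untouched is an assumption of the sketch rather than documented behaviour.

```python
import html
import re

# Named, decimal and hexadecimal character references, each ending in ";".
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def normalise_entities(text):
    """Replace entities for ISO-8859-1 characters 0xA0..0xFF by the character."""
    def replace(match):
        char = html.unescape(match.group(0))
        if len(char) == 1 and 0xA0 <= ord(char) <= 0xFF:
            return char
        # Assumption: everything else (e.g. &lt;, &amp;) is left as written.
        return match.group(0)

    return ENTITY.sub(replace, text)
```

For example, normalise_entities("March&eacute; &amp; Filli&#226;tre") yields "Marché &amp; Filliâtre".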
In some cases, non-empty sequences of white space characters may be
replaced by a single space. However, this happens only when the white space
is part of a prefix of some name in the data base. In any case, this
replacement does not affect the rendering of HTML documents.
4 Address Data Bases
Every line of the file must be either a comment line or an address
specification. A comment line is a line that either consists only of
white space or starts with the comment symbol # (possibly
preceded by white space).
An address specification consists of a name and a URL that are
separated by the character =. Leading white space on the line
is ignored. In the name, the character = must be written as ==.
Special characters in the name can be written either in HTML or as 8-bit
characters. The number of spaces separating the words of a name is not
relevant.
The syntax of the URL is not checked.
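The line format can be sketched in Python as follows. This is a minimal reading of the rules above, not hlins' actual parser: the function name parse_address_line is hypothetical, and stripping white space around the URL and raising an error for a line without a separator are assumptions.

```python
def parse_address_line(line):
    """Return (name, url) for an address specification, None for a comment line."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    name_chars = []
    i = 0
    while i < len(stripped):
        if stripped.startswith("==", i):
            # "==" inside the name stands for a literal "=".
            name_chars.append("=")
            i += 2
        elif stripped[i] == "=":
            # The first single "=" separates the name from the URL.
            # The number of spaces between the words of a name is irrelevant.
            name = " ".join("".join(name_chars).split())
            return name, stripped[i + 1:].strip()
        else:
            name_chars.append(stripped[i])
            i += 1
    raise ValueError("no '=' separator in address specification: %r" % line)
```

With this sketch, the line "Donald   Knuth = http://www-cs-staff.stanford.edu/..." is read as the name Donald Knuth with the given URL, and a line starting with # yields None.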
5 Variants of Names
Several variants of the names in the data base are recognized as
well. To find the variants of a name, we first split it at white space
into components.
- If a name consists of just one component, then it has no variant
  other than itself.
- Otherwise, the variants of the name are obtained by considering
  all possible combinations of variants of the components. The last
  component is treated differently from the other components:
  - If the last component contains the symbol -, then
    the name without this - and everything after it is also
    recognized. Hence, if you have an entry for Egon Müller-Meier
    then Egon Müller is also recognized.
  - A component that is not the last component may be abbreviated,
    unless it consists of only one letter or ends with a
    dot. The abbreviation of a first name is its first letter followed by
    a dot. A word starting with St has the further
    abbreviation St followed by a dot, and a word starting with
    Ch has the additional abbreviation Ch followed by a dot.
    Composite first names are abbreviated in both components; hence
    Marc-Stephane becomes M.-St. (but not, for instance,
    M.-Stephane).
  - In any case, the generation of variants is suppressed if you write
    the component in angle brackets, like <Marc-Stephane>.
    This mechanism is used in the data base for producing this
    document, in order to match Objective Caml but to avoid
    matching O. Caml.
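The rules above can be sketched in Python as follows. This is one possible reading, not hlins' implementation: the function names are hypothetical, cutting the last component at its first - is an assumption, and so is the guess that a composite first name also yields the variant with plain initials (such as M.-S.).

```python
from itertools import product


def abbreviations(word):
    """Abbreviations of one part of a first name: initial, plus St. or Ch."""
    result = [word[0] + "."]
    for prefix in ("St", "Ch"):
        if word.startswith(prefix):
            result.append(prefix + ".")
    return result


def component_variants(component, is_last):
    """Variants of a single component, the component itself coming first."""
    if component.startswith("<") and component.endswith(">"):
        return [component[1:-1]]  # angle brackets suppress all variants
    if is_last:
        head = component.split("-", 1)[0]  # assumption: cut at the first "-"
        return [component, head] if "-" in component and head else [component]
    if len(component) == 1 or component.endswith("."):
        return [component]  # already a single letter or an abbreviation
    parts = component.split("-")
    if not all(parts):
        return [component]
    # Composite first names are abbreviated in all of their parts at once.
    short = ["-".join(combo) for combo in product(*map(abbreviations, parts))]
    return [component] + short


def name_variants(name):
    """All recognized variants of a data base name, the full name first."""
    components = name.split()
    if len(components) == 1:
        # A one-component name has no variant other than itself.
        return [component_variants(components[0], is_last=False)[0]]
    last = len(components) - 1
    choices = [component_variants(c, i == last) for i, c in enumerate(components)]
    return [" ".join(combo) for combo in product(*choices)]
```

For instance, name_variants("Leslie Lamport") gives Leslie Lamport and L. Lamport, and an entry written as <Objective> Caml (an assumed spelling of the entry mentioned above) yields only Objective Caml.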
6 The Exact Rules of Searching Names
Names are searched starting from the beginning of the text. If there
are overlapping matches, then the match starting at the earlier
position wins. For example, if the data base contains entries for
Egon Meier and for Hans Egon Meier-Müller, then the second
one matches on input Hans Egon Meier-Müller.
A match is extended to a longer match if possible. That is, if the
data base contains entries for Hans Egon and for
Hans Egon Meier, then the second one matches on input
Hans Egon Meier.
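These two rules amount to a leftmost-longest search, which can be sketched in Python as follows. The function find_names is hypothetical and much simpler than hlins itself: it takes the names literally (no variants), allows any white space between words, and ignores word boundaries and HTML markup.

```python
import re


def find_names(text, names):
    """Return (start, end, name) for leftmost-longest matches of names in text."""
    patterns = [
        (name, re.compile(r"\s+".join(re.escape(word) for word in name.split())))
        for name in names
    ]
    matches = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in patterns:
            m = pattern.match(text, pos)
            # Among the matches starting at this position, keep the longest.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best:
            matches.append((pos, best[1].end(), best[0]))
            pos = best[1].end()  # continue after the match: no overlaps
        else:
            pos += 1
    return matches
```

Scanning positions from left to right makes the earlier match win, and comparing end positions at a fixed start extends a match to the longest name available there.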
7 The Exact Rules of URL Insertion
Hlins does not touch any text between <a ... href= ...> and </a>.
Note that this applies only if the <a> tag contains the href
attribute; that is, hlins does look at text between <a name=...>
and </a>. As a consequence, hlins is idempotent: if you
apply hlins twice to a file (for instance using the --modify option),
you get the same result as with just one application. Hence, when
you extend your data base, you can safely rerun hlins on your HTML
files.
The replacement mechanism (including the normalisation of HTML special
characters) is bypassed for any text inside the following tags:
- <head> ... </head>
- <samp> ... </samp>
- <kbd> ... </kbd>
- <pre> ... </pre>
- <div nohlins> ... </div>
The rationale is that the first four tags of this list are intended to
mark some kind of verbatim text (see the HTML 4.01 specification). The
last one is an escape mechanism in case you have to override hlins'
mechanism. Text from the beginning of one of the start
tags to the first occurrence of the corresponding end tag is ignored.
As a consequence, nested tags of the same kind from the above list
are not treated correctly.
Furthermore, text inside angle brackets < and > is not
treated by hlins.
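The regions that are left alone can be pictured with the following Python sketch, which splits a document into protected and unprotected pieces. It is a regular-expression approximation under stated assumptions, not hlins' own scanner: the name split_protected is hypothetical, tag names are matched case-insensitively, and attribute values containing > are not handled.

```python
import re

PROTECTED = re.compile(
    r"<a\b[^>]*\bhref\b[^>]*>.*?</a>"          # anchors that carry href
    r"|<(head|samp|kbd|pre)\b[^>]*>.*?</\1>"   # verbatim-like elements
    r"|<div\s+nohlins\b[^>]*>.*?</div>"        # explicit escape mechanism
    r"|<[^>]*>",                               # any other tag
    re.IGNORECASE | re.DOTALL,
)


def split_protected(html_text):
    """Return (is_protected, text) pieces that together cover the whole input."""
    pieces = []
    pos = 0
    for m in PROTECTED.finditer(html_text):
        if m.start() > pos:
            pieces.append((False, html_text[pos:m.start()]))
        pieces.append((True, m.group(0)))
        pos = m.end()
    if pos < len(html_text):
        pieces.append((False, html_text[pos:]))
    return pieces
```

Only the pieces flagged False would be searched for names. Because every inserted link is itself an <a ... href= ...> element, a second run finds nothing new in it, which is the idempotence property described above. The non-greedy .*? stops at the first end tag, so nested tags of the same kind are mishandled in the same way as noted above.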
If there are several different URLs for a string foundname,
then the following rules determine which URL is inserted:
- An address specification ``name = url'' where
  name matches foundname exactly (modulo white space and HTML special
  characters) has priority over an address specification
  ``name = url'' where foundname is an
  abbreviation of name.
- In the list obtained from the above priority rule, the first
  match is taken.
A warning is issued in case of a conflict, unless the --quiet
option has been given.
For instance, your data base might contain something like
Hans Meyer = http://address.for.full.name
H. Meyer = http://address.for.abbreviated.name
On input H. Meyer, the second address specification is selected
(and a warning is issued).
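The priority rule can be sketched in Python as follows. The function select_url is hypothetical; it assumes the caller has already collected, in data base order, all entries whose name or one of its variants matches foundname, and it compares names modulo white space only (not modulo HTML special characters).

```python
def normalise(name):
    """Collapse runs of white space so that names compare modulo spacing."""
    return " ".join(name.split())


def select_url(found_name, candidates):
    """Pick a URL from (name, url) pairs and report whether a conflict exists."""
    if not candidates:
        raise ValueError("no candidate entries for %r" % found_name)
    wanted = normalise(found_name)
    exact = [(n, u) for n, u in candidates if normalise(n) == wanted]
    # Exact matches take priority; within the chosen list the first entry wins.
    ranked = exact or candidates
    conflict = len({u for _n, u in candidates}) > 1
    return ranked[0][1], conflict
```

Applied to the two entries of the example with foundname H. Meyer, the sketch returns http://address.for.abbreviated.name together with a conflict flag of True, which corresponds to the warning.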
8 Implementation
Hlins is written in Objective Caml.
9 License and Installation
Hlins is covered by the GNU General Public License.
See the Hlins home page for
binary and source distributions.
10 Credits
Thanks to Claude Marché and Jean-Christophe Filliâtre for their
remarks and suggestions.
This document was translated from LaTeX by HEVEA.