Hlins: Hyper-Link Insertions in HTML documents Version 0.39

Ralf Treinen

May 1, 2003

1  An Introductory Example

Hlins inserts into an HTML document the URLs (uniform resource locators) for certain names (normally names of people), according to a data base that associates URLs with names.

First, create a data base that associates URLs with names; let's call it addresses:
Donald Knuth        =  http://www-cs-staff.stanford.edu/...
Leslie Lamport      =  http://www.research.digital.com/...
Suppose that you have an HTML document mytext.html that contains the following text:
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor Donald
Knuth, which was used by L. Lamport as a base to build the more user-friendly
(but less powerful) LaTeX system.
Calling hlins -db addresses -o newtext.html mytext.html generates a file newtext.html that now contains the following piece of text:
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor <a
href="http://www-cs-staff.stanford.edu/...">Donald Knuth</a>, which
was used by <a
href="http://www.research.digital.com/...">L. Lamport</a> as a base to
build the more user-friendly (but less powerful) LaTeX system.
which a browser will render as something like:
A milestone in the development of digital typesetting was the TeX system developed by Stanford computer science professor Donald Knuth, which was used by L. Lamport as a base to build the more user-friendly (but less powerful) LaTeX system.
Note that URL insertion recognizes abbreviated first names (as for Leslie Lamport) and works across line breaks (as for Donald Knuth).

2  Usage

hlins [options] [inputfile]
Hlins can be used in three different modes, described below. The following general options are available:
-h, --help
Show summary of options and exit.
-v, --version
Show the program version and exit.
-q, --quiet
Suppress diagnostic output.
-db, --data-bases files ...
Use files as address data bases. The string files is a blank-separated list of data base files, so when you give several data base files in one option you have to protect the blanks from your shell. Multiple -db options are accepted. Examples of usage in the csh shell are:
hlins -db myaddresses
hlins -db "friends groupmembers"
hlins -db friends -db groupmembers
The last two invocations are equivalent.
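To illustrate how the two forms of -db combine, here is a minimal OCaml sketch. The function names split_blanks and db_files are hypothetical and not taken from the Hlins sources; the sketch assumes that each -db value is split at blanks (spaces or tabs) and that the values of repeated -db options are concatenated in order.

```ocaml
(* Hypothetical sketch, not Hlins' actual option parser. *)

(* Split one -db value at blanks (spaces or tabs), dropping empty words. *)
let split_blanks (s : string) : string list =
  String.map (fun c -> if c = '\t' then ' ' else c) s
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")

(* Flatten the values of all -db options, in the order given. *)
let db_files (db_values : string list) : string list =
  List.concat (List.map split_blanks db_values)
```

Under this reading, db_files ["friends groupmembers"] and db_files ["friends"; "groupmembers"] both yield ["friends"; "groupmembers"], matching the equivalence stated above.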

2.1  Usage in filter mode

In filter mode, hlins reads HTML from one source and writes the result to a different target. Input is read from the inputfile argument if one is given, otherwise from standard input. By default, output goes to standard output.
-o, --output-file file
Write to file instead of standard output.
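Filter mode amounts to choosing an input and an output channel. In the following OCaml sketch, filter and read_all are illustrative names, and transform is a hypothetical stand-in for the actual URL insertion; none of them is part of Hlins' documented interface.

```ocaml
(* Hypothetical sketch of filter mode; [transform] stands in for the
   URL insertion performed by hlins. *)

(* Read a channel up to end of file. *)
let read_all (ic : in_channel) : string =
  let buf = Buffer.create 4096 in
  let chunk = Bytes.create 4096 in
  let rec loop () =
    let n = input ic chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf

(* Input: the input file if given, else stdin.
   Output: the -o file if given, else stdout. *)
let filter ?input_file ?output_file (transform : string -> string) : unit =
  let ic = match input_file with Some f -> open_in_bin f | None -> stdin in
  let text = read_all ic in
  if ic != stdin then close_in ic;
  let oc = match output_file with Some f -> open_out_bin f | None -> stdout in
  output_string oc (transform text);
  if oc != stdout then close_out oc else flush oc
```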

2.2  Usage in modify mode

In modify mode, hlins modifies html files in place.
-m, --modify-files files ...
Modify the given files in place.
-R, --recursive
Recursively descend into directories and operate on all files whose names end in .html. Only effective together with the -m (--modify-files) option.

For instance, "hlins -db ... -m /WWW -R" makes hlins operate on your complete WWW tree.
-td, --tempdir dir
When modifying files in place, use the directory dir for temporary files. The default is the value of the TMPDIR environment variable, or /tmp if TMPDIR is not set.
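The modify-mode options can be pictured with the following OCaml sketch. Everything in it is hypothetical: html_files, resolve_tempdir and modify_in_place are illustrative names, and writing the result to a temporary file before copying it back is an assumption about how a temporary directory is used, not a description of Hlins' code.

```ocaml
(* Hypothetical sketch of modify mode; not Hlins' actual code. *)

(* -R: all files below [path] whose names end in .html, in sorted order. *)
let rec html_files (path : string) : string list =
  if Sys.is_directory path then
    Sys.readdir path
    |> Array.to_list
    |> List.sort compare
    |> List.map (fun name -> html_files (Filename.concat path name))
    |> List.concat
  else if Filename.check_suffix path ".html" then [ path ]
  else []

(* --tempdir if given, else $TMPDIR, else /tmp. *)
let resolve_tempdir (cli_dir : string option) : string =
  match cli_dir with
  | Some dir -> dir
  | None ->
    (match Sys.getenv_opt "TMPDIR" with Some dir -> dir | None -> "/tmp")

let read_file (path : string) : string =
  let ic = open_in_bin path in
  let s = really_input_string ic (in_channel_length ic) in
  close_in ic;
  s

let write_file (path : string) (contents : string) : unit =
  let oc = open_out_bin path in
  output_string oc contents;
  close_out oc

(* Write the transformed text to a temporary file in [tempdir] first,
   then copy it over [path]. Copying instead of Sys.rename avoids a
   failure when [tempdir] is on another file system than [path]. *)
let modify_in_place ~(tempdir : string) (transform : string -> string)
    (path : string) : unit =
  let tmp = Filename.temp_file ~temp_dir:tempdir "hlins" ".html" in
  write_file tmp (transform (read_file path));
  write_file path (read_file tmp);
  Sys.remove tmp
```

A run over a whole tree would then look like List.iter (modify_in_place ~tempdir:(resolve_tempdir None) insert_urls) (html_files root), where insert_urls and root are placeholders.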

2.3  Usage in database list mode

--db-to-html
List the contents of the data bases as HTML on standard output. This can be handy for creating an address book.
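The exact HTML produced by --db-to-html is not specified here; the following OCaml sketch only shows one plausible rendering, as a bulleted list of links. The names db_to_html and escape_attr are hypothetical, and names are assumed to be valid HTML text already (the data base may contain HTML special characters), so only the URL is escaped.

```ocaml
(* Hypothetical sketch of --db-to-html; the output format is an assumption. *)

(* Escape a URL for use inside a double-quoted HTML attribute. *)
let escape_attr (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '&' -> Buffer.add_string b "&amp;"
      | '"' -> Buffer.add_string b "&quot;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Render (name, url) pairs, in data base order, as an HTML list. *)
let db_to_html (entries : (string * string) list) : string =
  let item (name, url) =
    Printf.sprintf "<li><a href=\"%s\">%s</a></li>\n" (escape_attr url) name
  in
  "<ul>\n" ^ String.concat "" (List.map item entries) ^ "</ul>\n"
```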

3  Secondary Effects on the HTML Text

Hlins replaces HTML special characters (such as &eacute; or &#233;) by the corresponding ISO-8859-1 character, in this case é. Hence, you can run Hlins without any data base argument simply to replace HTML special characters in an HTML document.

In some cases, a non-empty sequence of white space characters may be replaced by a single space. This happens only when the white space is part of a prefix of some name in the data base. In any case, this replacement does not affect how HTML documents are rendered.
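The replacement of HTML special characters described at the start of this section can be sketched as follows. This is an illustration under stated assumptions, not Hlins' code: decode_entities knows only a handful of named entities plus decimal references such as &#233;, requires the terminating semicolon, and in this sketch deliberately leaves &amp;, &lt; and &gt; alone, because turning those into raw characters would change the markup.

```ocaml
(* Hypothetical sketch; not Hlins' actual code. *)

(* Illustrative subset of named entities with their ISO-8859-1 codes.
   &amp;, &lt; and &gt; are left out on purpose. *)
let named = [ ("eacute", 233); ("egrave", 232); ("agrave", 224);
              ("ccedil", 231); ("uuml", 252); ("nbsp", 160) ]

(* Parse a short, non-empty string of decimal digits. *)
let decimal (s : string) : int option =
  let ok = ref (s <> "" && String.length s <= 6) in
  String.iter (fun c -> if c < '0' || c > '9' then ok := false) s;
  if !ok then Some (int_of_string s) else None

(* Replace &name; and &#NNN; by the corresponding ISO-8859-1 byte when
   the code is below 256; everything else is copied unchanged. *)
let decode_entities (s : string) : string =
  let n = String.length s in
  let b = Buffer.create n in
  let rec go i =
    if i < n then
      if s.[i] <> '&' then begin
        Buffer.add_char b s.[i];
        go (i + 1)
      end else begin
        let decoded =
          match String.index_from_opt s i ';' with
          | None -> None
          | Some j ->
            let body = String.sub s (i + 1) (j - i - 1) in
            let code =
              if String.length body > 1 && body.[0] = '#' then
                decimal (String.sub body 1 (String.length body - 1))
              else List.assoc_opt body named
            in
            (match code with
             | Some c when c < 256 -> Some (Char.chr c, j + 1)
             | _ -> None)
        in
        match decoded with
        | Some (ch, next) -> Buffer.add_char b ch; go next
        | None -> Buffer.add_char b '&'; go (i + 1)
      end
  in
  go 0;
  Buffer.contents b
```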

4  Address Data Bases

Every line of a data base file must be either a comment line or an address specification. A comment line is a line that either consists only of white space or starts with the comment symbol # (possibly preceded by white space).

An address specification consists of a name and a URL, separated by the character =. Leading white space on the line is ignored. Within the name, the character = must be written as ==.

Special characters in the name can be written either in HTML or as 8-bit characters. The number of spaces separating the words of a name is not significant.

The syntax of the URL is not checked.
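The data base syntax is simple enough to parse line by line. In the following OCaml sketch (parse_line, normalize_name and the line type are hypothetical names), a line yields Some Comment, Some (Address (name, url)), or None if no separating = is found; the name is normalised so that the number of spaces between its words does not matter, and the URL is taken as-is apart from surrounding white space, since its syntax is not checked.

```ocaml
(* Hypothetical sketch of the data base line syntax; not Hlins' code. *)
type line = Comment | Address of string * string

(* Collapse runs of blanks to one space and trim the ends. *)
let normalize_name (s : string) : string =
  String.map (fun c -> if c = '\t' then ' ' else c) s
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")
  |> String.concat " "

let parse_line (raw : string) : line option =
  let s = String.trim raw in
  if s = "" || s.[0] = '#' then Some Comment
  else begin
    let n = String.length s in
    let name = Buffer.create n in
    (* Scan the name: "==" stands for a literal '=', a single '=' ends it. *)
    let rec scan i =
      if i >= n then None
      else if s.[i] = '=' && i + 1 < n && s.[i + 1] = '=' then begin
        Buffer.add_char name '=';
        scan (i + 2)
      end
      else if s.[i] = '=' then begin
        let url = String.trim (String.sub s (i + 1) (n - i - 1)) in
        Some (Address (normalize_name (Buffer.contents name), url))
      end
      else begin
        Buffer.add_char name s.[i];
        scan (i + 1)
      end
    in
    scan 0
  end
```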

5  Variants of Names

Several variants of each name in the data base are recognized as well. To find the variants of a name, we first split it into components at white space.
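As an illustration only, the sketch below assumes the rule suggested by the introductory example, where L. Lamport is recognized for Leslie Lamport: every component except the last may independently be replaced by its initial followed by a period. This rule is an assumption, not Hlins' documented behaviour. The function variants is hypothetical and works on single bytes, so it assumes names are stored as ISO-8859-1 text.

```ocaml
(* Hypothetical sketch: assumed abbreviation rule, not Hlins' code. *)
let variants (name : string) : string list =
  let components =
    List.filter (fun w -> w <> "") (String.split_on_char ' ' name)
  in
  (* "Leslie" -> "L." (first byte plus a period) *)
  let abbreviate w = String.make 1 w.[0] ^ "." in
  (* Each component except the last is either kept or abbreviated. *)
  let rec expand = function
    | [] -> [ [] ]
    | [ last ] -> [ [ last ] ]
    | w :: rest ->
      let tails = expand rest in
      List.map (fun t -> w :: t) tails
      @ List.map (fun t -> abbreviate w :: t) tails
  in
  List.sort_uniq compare (List.map (String.concat " ") (expand components))
```

For example, a three-component name such as Hans Egon Meier gets four variants under this assumption.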

6  The Exact Rules of Searching Names

Names are searched starting from the beginning of the text. If matches overlap, the match starting at the earlier position wins. For example, if the data base contains entries for Egon Meier and for Hans Egon Meier-Müller, then the second one matches on input Hans Egon Meier-Müller.

A match is extended to a longer match whenever possible. That is, if the data base contains entries for Hans Egon and for Hans Egon Meier, then the second one matches on input Hans Egon Meier.
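The two rules (the earliest start wins, then the longest match at that start) can be sketched as a left-to-right scan. In this hypothetical OCaml sketch (match_at and find_all are illustrative names), a space in a name matches any non-empty run of white space in the text, which is how a name broken across lines, such as Donald Knuth in the introductory example, can still be found. Word boundaries and HTML markup are ignored for brevity.

```ocaml
(* Hypothetical sketch of leftmost-longest name search; not Hlins' code. *)
let is_space c = c = ' ' || c = '\t' || c = '\n' || c = '\r'

(* If [name] occurs in [text] starting at [i], return its end position.
   A space in [name] matches a non-empty run of white space in [text]. *)
let match_at (text : string) (i : int) (name : string) : int option =
  let n = String.length text and m = String.length name in
  let rec skip t = if t < n && is_space text.[t] then skip (t + 1) else t in
  let rec go ti ni =
    if ni = m then Some ti
    else if ti >= n then None
    else if name.[ni] = ' ' then
      (if is_space text.[ti] then go (skip ti) (ni + 1) else None)
    else if text.[ti] = name.[ni] then go (ti + 1) (ni + 1)
    else None
  in
  go i 0

(* Earliest start wins; among matches at that start, the longest wins.
   Returns (start, end, name) triples in text order. *)
let find_all (names : string list) (text : string) : (int * int * string) list =
  let n = String.length text in
  let rec scan i acc =
    if i >= n then List.rev acc
    else
      let best =
        List.fold_left
          (fun best name ->
            match match_at text i name with
            | Some e when e > i ->
              (match best with
               | Some (e', _) when e' >= e -> best
               | _ -> Some (e, name))
            | _ -> best)
          None names
      in
      match best with
      | Some (e, name) -> scan e ((i, e, name) :: acc)
      | None -> scan (i + 1) acc
  in
  scan 0 []
```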

7  The Exact Rules of URL Insertion

Hlins does not touch any text between <a ... href= ...> and </a>. Note that this applies only if the <a> tag contains the href attribute; hlins does look at text between <a name=...> and </a>. As a consequence, hlins is idempotent: if you apply hlins twice to a file (for instance using the -m (--modify-files) option), you get the same result as with a single application. Hence, when you extend your data base, you can safely rerun hlins on your HTML files.

The replacement mechanism (including the normalisation of HTML special characters) is bypassed for any text inside the following tags: The rationale is that the first four tags of this list are intended to mark some kind of verbatim text (see the HTML 4.01 specification). The last one is an escape mechanism in case you have to overrule hlins' mechanism. Text from the beginning of one of these start tags to the first occurrence of the corresponding end tag is ignored. As a consequence, nested tags of the same kind from the above list are not handled correctly.

Furthermore, hlins does not process text inside angle brackets < and >.
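The regions that hlins leaves untouched can be illustrated by splitting the input into segments. The OCaml sketch below is hypothetical (segment, find_from, is_href_anchor and segments are illustrative names): it treats every tag, and every span from an <a ...> tag containing href up to the next </a>, as untouched. It uses a rough substring test for href and omits the special tags discussed above.

```ocaml
(* Hypothetical sketch: split HTML into text hlins would process (Text)
   and parts it leaves alone (Keep). Not Hlins' actual code. *)
type segment = Text of string | Keep of string

(* Index of [sub] in [s] at or after [i], if any. *)
let rec find_from (s : string) (i : int) (sub : string) : int option =
  let m = String.length sub in
  if i + m > String.length s then None
  else if String.sub s i m = sub then Some i
  else find_from s (i + 1) sub

(* Rough test on a lowercased tag: "<a" plus white space, and "href". *)
let is_href_anchor (tag : string) : bool =
  String.length tag > 2 && tag.[0] = '<' && tag.[1] = 'a'
  && String.contains " \t\n\r" tag.[2]
  && find_from tag 0 "href" <> None

let segments (html : string) : segment list =
  let n = String.length html in
  let lower = String.lowercase_ascii html in
  let flush start stop acc =
    if stop > start then Text (String.sub html start (stop - start)) :: acc
    else acc
  in
  let rec go start i acc =
    if i >= n then List.rev (flush start n acc)
    else if html.[i] <> '<' then go start (i + 1) acc
    else begin
      (* A tag runs to the next '>'; an href anchor runs to the next </a>. *)
      let tag_end =
        match String.index_from_opt html i '>' with
        | Some j -> j + 1
        | None -> n
      in
      let stop =
        if is_href_anchor (String.sub lower i (tag_end - i)) then
          (match find_from lower tag_end "</a>" with
           | Some j -> j + 4
           | None -> n)
        else tag_end
      in
      let acc = Keep (String.sub html i (stop - i)) :: flush start i acc in
      go stop stop acc
    end
  in
  go 0 0 []
```

Only the Text segments would then be searched for names, which is also why rerunning hlins on its own output changes nothing: every inserted link ends up inside a Keep segment.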

If there are several different URLs for a string foundname, the following rules determine which URL is inserted:
  1. An address specification "name = url" where name matches foundname exactly (modulo white space and HTML special characters) has priority over an address specification "name = url" where foundname is an abbreviation for name.
  2. Among the candidates that remain after applying rule 1, the first match is taken.
A warning is issued in case of a conflict, unless the --quiet option has been given.

For instance, your data base might contain something like
Hans Meyer   =  http://address.for.full.name
H. Meyer     =  http://address.for.abbreviated.name
On input H. Meyer, the second address specification is selected (and a warning is issued).
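The two selection rules and the conflict warning can be expressed as a small function. In this hypothetical OCaml sketch, each candidate records whether foundname matched its name exactly or as an abbreviation, together with the entry's name and URL, listed in data base order; select returns the chosen URL and whether a warning is due. Treating "conflict" as "the candidates do not all agree on the URL" is an assumption.

```ocaml
(* Hypothetical sketch of the URL selection rules; not Hlins' code. *)
type how = Exact | Abbreviation

(* Candidates are (how, entry name, entry url), in data base order.
   Returns the chosen URL and whether a conflict warning is due. *)
let select (candidates : (how * string * string) list) : (string * bool) option =
  (* Rule 1: exact matches take priority over abbreviations. *)
  let exact = List.filter (fun (h, _, _) -> h = Exact) candidates in
  let pool = if exact <> [] then exact else candidates in
  match pool with
  | [] -> None
  | (_, _, url) :: _ ->
    (* Rule 2: take the first remaining candidate. A conflict exists
       if the candidates do not all agree on the URL. *)
    let conflict = List.exists (fun (_, _, u) -> u <> url) candidates in
    Some (url, conflict)
```

On the example above, the Hans Meyer entry matches H. Meyer only as an abbreviation, so the exact H. Meyer entry is chosen and a warning is reported.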

8  Implementation

Hlins is written in Objective Caml.

9  License and Installation

Hlins is covered by the GNU General Public License. See the Hlins home page for binary and source distributions.

10  Credits

Thanks to Claude Marché and Jean-Christophe Filliâtre for their remarks and suggestions.


This document was translated from LaTeX by HEVEA.