where variable_name is a member of one of the types described in the remaining sections, and the argument(s) are specific to each variable name. For variable_name, case is irrelevant.
where pattern is a shell pattern (a glob-style wildcard pattern such as *.gz, not a general regular expression) and command is the command line to execute the filter.
Within a command, there are a few % substitutions that are done at run-time:
That is: the % and one character immediately after it are substituted as described in the above table. Substituted filenames are skipped past and not rescanned for more substitutions, but the remainder of the command is. To use a literal % or @, simply double it. (For more on filter variables, see FILTERS below.)
Variables of this type are: FilterAttachment and FilterFile.
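The substitution rule described above (a % and the one character after it are replaced, replaced text is not rescanned, and a doubled % or @ yields a literal character) can be sketched in Python as follows. This is only an illustration, not the SWISH++ implementation: the function name expand, the caller-supplied table, and the placeholder code f used in the usage note are assumptions, and the meaning of a single @ is deliberately not modeled.

```python
def expand(command, table):
    """Expand %x sequences in command.

    table maps the character following % to its replacement text;
    which characters are valid is up to the caller (hypothetical here).
    """
    out = []
    i = 0
    while i < len(command):
        ch = command[i]
        nxt = command[i + 1] if i + 1 < len(command) else ""
        if ch in "%@" and nxt == ch:
            out.append(ch)            # doubled %% or @@ gives one literal character
            i += 2
        elif ch == "%" and nxt in table:
            out.append(table[nxt])    # inserted as-is and never rescanned
            i += 2
        else:
            out.append(ch)            # ordinary character (a single @ is left alone)
            i += 1
    return "".join(out)
```

With a hypothetical table {"f": "100%f.txt"}, the call expand("cat %f > out%%", table) yields "cat 100%f.txt > out%": the % inside the substituted filename is left untouched, while the doubled %% in the rest of the command collapses to one.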
For WordThreshold, only the super-user can specify a value larger than the compiled-in default.
An IncludeFile configuration file line is of the form:
where module_name is the name of the module (case is irrelevant) to handle the indexing of the filename patterns that follow. Module names are: text (plain text), HTML (HTML and XHTML), ID3 (ID3 tags), LaTeX (LaTeX source), Mail (mail and news messages), Man (Unix manual pages), and RTF (Rich Text Format).
An IncludeMeta configuration file line is of the form:
It is like a set variable except arguments may optionally be followed by reassignments. For example, a value of:
says to include and index the words associated with the meta name adr, but to store the name as address in the generated index file so that queries would use address rather than adr.
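The effect of a reassignment can be sketched in Python as a mapping from each meta name found in a document to the name stored in the index. This is a rough illustration only: it assumes a reassignment is written as name=new_name and that arguments are separated by white space, and the function name parse_include_meta and the meta name author are hypothetical.

```python
def parse_include_meta(arguments):
    """Map each meta name to the name under which it is stored in the index."""
    stored = {}
    for arg in arguments.split():
        # Assumed syntax: "name" or "name=new_name".
        name, sep, new_name = arg.partition("=")
        stored[name] = new_name if sep else name
    return stored
```

For the arguments "adr=address author", the result maps adr to address and author to itself, so queries would use address and author.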
A SocketAddress configuration file line is of the form:
that is: an optional host and colon followed by a port number. The host may be one of a host name, an IPv4 address (in dot-decimal notation), an IPv6 address (in colon notation) if supported by the operating system, or the * character meaning ``any IP address.'' Omitting the host and colon also means ``any IP address.''
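The form described above can be sketched as a small Python parser. It is a sketch under stated assumptions, not the parser SWISH++ uses: the function name parse_socket_address is hypothetical, the last colon is assumed to separate the host from the port (so that an IPv6 address containing colons is kept whole), and None is used to stand for ``any IP address.''

```python
def parse_socket_address(value):
    """Split a SocketAddress value into (host, port); host None means any address."""
    host, colon, port_text = value.rpartition(":")
    if not colon or host == "*":
        host = None                   # host omitted or "*": any IP address
    elif host == "":
        raise ValueError("colon given without a host: %r" % value)
    port = int(port_text)             # raises ValueError if not a number
    if not 0 < port < 65536:
        raise ValueError("port out of range: %r" % value)
    return host, port
```

Under these assumptions, "1234" and "*:1234" both give (None, 1234), while "localhost:1234" gives ("localhost", 1234) and "::1:1234" gives ("::1", 1234).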
Given that, a filename such as foo.txt.gz would become foo.txt. If files having a txt extension should be indexed, then foo.txt will be. Note that the command on the FilterFile line must not simply be:
because gunzip will replace the compressed file with the uncompressed one.
Here's an example to convert PDF to plain text for indexing using the xpdf(1) package's pdftotext command:
A file can be filtered more than once prior to indexing or extraction, i.e., filters can be ``chained'' together. For example, if the uncompression and PDF examples shown above are used together, compressed PDF files (those whose filenames end with one of the double extensions .pdf.bz2, .pdf.gz, or .pdf.Z) will also be indexed or extracted.
Note, however, that a filename having an extension for which a filter has been specified is not necessarily filtered and subsequently indexed or extracted. When index++ or extract++ encounters such a file, it first performs the filename substitution(s) on it to determine what the target filename would be. The filter(s) are executed to create that file only if its extension should be indexed or extracted, that is, only if the extension is among the set specified with either the -e or --pattern options or the IncludeFile variable, or is not among the set specified with either the -E or --no-pattern options or the ExcludeFile variable.
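The decision described above can be sketched in Python by following the filter chain on the filename alone and then testing the resulting target name. This is a simplified illustration, not the actual index++ logic: the FILTERS table, the function names, and the assumption that the PDF filter's target ends in .txt are all hypothetical, only inclusion patterns are modeled, and no filter command is run.

```python
import fnmatch

# Hypothetical filter table: shell pattern -> function giving the target filename.
FILTERS = [
    ("*.gz", lambda name: name[:-len(".gz")]),             # foo.txt.gz -> foo.txt
    ("*.pdf", lambda name: name[:-len(".pdf")] + ".txt"),  # foo.pdf    -> foo.txt
]

def final_target(filename, filters=FILTERS):
    """Apply filename substitutions repeatedly until no filter pattern matches."""
    matched = True
    while matched:
        matched = False
        for pattern, target_of in filters:
            if fnmatch.fnmatchcase(filename, pattern):
                filename = target_of(filename)
                matched = True
                break
    return filename

def would_run_filters(filename, include_patterns, filters=FILTERS):
    """True if the filters would be executed because the target should be indexed."""
    target = final_target(filename, filters)
    if target == filename:
        return False                  # no filter applies to this file
    return any(fnmatch.fnmatchcase(target, p) for p in include_patterns)
```

With include patterns of ["*.txt"], foo.pdf.gz has the target foo.txt, so its filters would run; foo.tar.gz has the target foo.tar, so they would not.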
For example, to convert a PDF attachment to plain text so it can be indexed, the FilterAttachment variable line in a configuration file would be:
MIME types must be specified entirely in lower case. Patterns can be useful for MIME types. For example:
can be used regardless of whether the MIME type is application/msword (the official MIME type for Microsoft Word documents) or application/vnd.ms-word (an older version).
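How one pattern can cover both MIME types can be sketched with shell-style matching from Python's fnmatch module. This is only an illustration: the pattern application/*word and the function name mime_matches are assumptions, as is lower-casing the MIME type found in a message before comparing it with the lower-case pattern.

```python
import fnmatch

def mime_matches(mime_type, pattern):
    """Compare a MIME type against a lower-case shell pattern."""
    # Assumption: the MIME type taken from a message is folded to lower case first.
    return fnmatch.fnmatchcase(mime_type.lower(), pattern)
```

Under these assumptions, both application/msword and application/vnd.ms-word match application/*word, while application/pdf does not.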
The MIME types that are built into index++(1) are: text/plain, text/enriched (but only if the RTF module is compiled in), text/html (but only if the HTML module is compiled in), text/*vcard, message/rfc822, and multipart/something (where something is one of: alternative, mixed, or parallel). FilterAttachment variable lines can override the handling of the built-in MIME types.
Unlike file filters, attachment filters must convert directly to plain text and cannot be ``chained'' together. (This restriction exists because there is no way to know what any intermediate MIME types would be in order to apply more filters.)
Nathaniel S. Borenstein. ``The text/enriched MIME Content-type,'' Request for Comments 1563, Network Working Group of the Internet Engineering Task Force, January 1994.
David H. Crocker. ``Standard for the Format of ARPA Internet Text Messages,'' Request for Comments 822, Department of Electrical Engineering, University of Delaware, August 1982.
Frank Dawson and Tim Howes. ``vCard MIME Directory Profile,'' Request for Comments 2426, Network Working Group of the Internet Engineering Task Force, September 1998.
Ned Freed and Nathaniel S. Borenstein. ``Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies,'' Request for Comments 2045, RFC 822 Extensions Working Group of the Internet Engineering Task Force, November 1996.
International Organization for Standardization. ``ISO/IEC 9945-2: Information Technology -- Portable Operating System Interface (POSIX) -- Part 2: Shell and Utilities,'' 1993.
Steven Pemberton, et al. ``XHTML 1.0: The Extensible HyperText Markup Language,'' World Wide Web Consortium, January 2000.