CATDVI(1)                   General Commands Manual                  CATDVI(1)

NAME
       catdvi - a DVI to plain text converter

SYNOPSIS
       catdvi [-d debuglevel, --debug=debuglevel]
              [-e outenc, --output-encoding=outenc]
              [-p pagespec, --first-page=pagespec]
              [-l pagespec, --last-page=pagespec]
              [-N, --list-page-numbers] [-s, --sequential]
              [-U, --show-unknown-glyphs] [-h, --help]
              [--version] [--copyright] [dvi-file]

DESCRIPTION
       This manual page documents catdvi version 0.14.

       catdvi reads the DVI (typesetter DeVice Independent) file dvi-file and
       dumps a plain text approximation of the document it describes to
       stdout.  If the argument dvi-file is omitted or is a dash (`-'), catdvi
       reads from stdin.  Several output encodings (different character sets
       of the plain text output) are supported, most notably UTF-8.

       The current version of catdvi is a work in progress; it may not be
       robust enough for production use, but already works fine with linear
       English text.  Many mathematical symbols (e.g. the uppercase Greek
       letters) and moderately complex formulae also come out right.

       The program needs to read the TFM (TeX Font Metric) files corresponding
       to the fonts used in the DVI file.  These are searched for (and, if
       necessary and possible, created on the fly) through the Kpathsea
       library.

       In order to correctly translate a DVI file to text, the input  encoding
       of  the  fonts  used in it (i.e. a meaning-preserving mapping from font
       code points to Unicode) must be known. There are  a  lot  of  different
       font  encodings  in use. At the time of writing, catdvi understands the
       following input encodings:

       `TEX TEXT'
              Knuth's original font encoding, also known as OT1.

       `TEX TEXT WITHOUT F-LIGATURES'
              A variant of the above.

       `EXTENDED TEX FONT ENCODING - LATIN'
              The Cork encoding, also known as T1.

       `TEX MATH ITALIC'
              The encoding of Knuth's math italic fonts, also known as OML.

       `TEX MATH SYMBOLS'
              The encoding of Knuth's math symbol fonts, also known as OMS.

       `TEX MATH EXTENSION' (most of it)
              The encoding of Knuth's math  extension  fonts  (big  operators,
              brackets, etc.), also known as OMX.

       `TEX TYPEWRITER TEXT'
              The encoding of Knuth's typewriter type fonts.

       `LATEX SYMBOLS'
              The encoding of the lasy fonts.

       Henrik Theiling's European currency symbol (`eurosym') font.

       `TEX TEXT COMPANION SYMBOLS 1---TS1' (almost everything)
              The encoding of the text companion fonts.

       Martin Vogel's symbol (`MarVoSym') font.
              Both  the 1998 and the 2000 version are supported as far as pos-
              sible -- about half of the symbols are not representable in Uni-
              code.

       `BLACKBOARD'
              The encoding of the blackboard bold math (`bbm') fonts.

       All AMS fonts except the Cyrillic ones.
              This  includes  the  AMS math symbols group A and group B, Euler
              fraktur, Euler cursive, Euler script and Euler compatible exten-
              sion fonts.

       It is impossible to translate unmarked-up DVI to plain text perfectly:
       DVI describes only the layout of a page, whereas a translator such as
       this really needs to know where words and paragraphs end and, more
       importantly, which glyphs should be aligned vertically and which
       should not.  The current alignment algorithm tries to preserve the
       relative horizontal positions of word beginnings; this works well in
       most cases.  Word breaks are detected using simple heuristics;
       paragraphs are not detected at all (and no paragraph fill is
       attempted).

       The price of alignment is that the output will likely be more than 80
       columns wide, even though catdvi tries very hard not to use more
       columns than strictly necessary.  Output is usually less than 120
       columns wide and almost always less than 132.  It may be a good idea
       to switch your terminal to a 120- or 132-column mode if possible.

OPTIONS
       The  program  follows  the usual GNU command line syntax, with long op-
       tions starting with two dashes.

       -d debuglevel, --debug=debuglevel
              Set the debug output level to debuglevel (default is 10).  Large
              values  will  result  in lots of debug output, 0 in none at all.
              The maximal debug output level currently used is 150.

       -e outenc, --output-encoding=outenc
              Specify the encoding of the output character set.  outenc can be
              one  of  the  numbers  or names from the table below.  Names are
              case insensitive.  The  following  output  encodings  should  be
              available:

              0: UTF-8
              1: US-ASCII
              2: ISO-8859-1
              3: ISO-8859-15

              The  command  catdvi  --help (see below) will give a more up-to-
              date list of all compiled-in output encodings. The  default  en-
              coding is 1.
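
              The lookup rule above (a number or a case-insensitive name)
              can be sketched as follows.  This is only an illustration in
              Python, not catdvi's own code; the names OUTPUT_ENCODINGS and
              resolve_outenc are hypothetical, and the table is copied from
              the list above, which may differ from what a given build has
              compiled in.

```python
# Table copied from the manual page; a real build may list other encodings.
OUTPUT_ENCODINGS = ["UTF-8", "US-ASCII", "ISO-8859-1", "ISO-8859-15"]


def resolve_outenc(outenc):
    """Return the table index for a number or a case-insensitive name."""
    try:
        index = int(outenc)
    except ValueError:
        index = None
    if index is not None:
        # Numeric form: must be a valid row of the table.
        if 0 <= index < len(OUTPUT_ENCODINGS):
            return index
        raise ValueError(f"unknown output encoding: {outenc}")
    # Name form: compare without regard to case.
    for index, name in enumerate(OUTPUT_ENCODINGS):
        if name.lower() == outenc.lower():
            return index
    raise ValueError(f"unknown output encoding: {outenc}")


print(resolve_outenc("utf-8"))
print(resolve_outenc("3"))
```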

       -p pagespec, --first-page=pagespec
              Do  not  output pages before page pagespec.  Pages can be speci-
              fied in three different ways; the first two are exactly the same
              as for dvips(1).

              A  (possibly  negative)  number num specifies a TeX page number,
              which is stored as the so-called count0 value in  the  DVI  file
              for every page.  Plain TeX uses negative page numbers for roman-
              numbered frontmatter (title page, preface,  TOC,  etc.)  so  the
              count0 values compare as
                     -1 < -2 < -3 < ... < 1 < 2 < 3 < ...
              There  may be several pages with the same count0 value in a sin-
              gle DVI file. This usually happens in documents with a per-chap-
              ter page numbering scheme.
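
              As an illustration, the ordering above can be expressed as a
              sort key.  This is a minimal Python sketch, not taken from the
              catdvi sources; the name count0_key is hypothetical, and
              placing 0 with the positive numbers is an assumption, since
              the ordering above does not mention it.

```python
def count0_key(count0):
    """Sort key for the ordering -1 < -2 < -3 < ... < 1 < 2 < 3 < ..."""
    if count0 < 0:
        # Front matter: sorts before everything else, by absolute value.
        return (0, -count0)
    # Assumption: 0 is grouped with the positive page numbers.
    return (1, count0)


pages = [2, -1, 1, -3, -2, 3]
print(sorted(pages, key=count0_key))  # [-1, -2, -3, 1, 2, 3]
```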

              A  number prefixed by an equals sign (`=num') specifies a physi-
              cal page, i.e. the num-th page appearing in the DVI  file.  Num-
              bering  starts  with 1.  Note that with the long form of the op-
              tion you actually need two equals signs, one as part of the long
              option and one as part of the page specification. Example:
                     catdvi --first-page==5 foo.dvi

              The third form of a page specification, two numbers separated by
              a colon (`num1:num2'), is useful for documents with  separately-
              numbered  parts,  e.g.  chapters.   It  refers  to the page with
              count0 value equal to num2 that catdvi believes to  be  in  part
              num1.   Since those part numbers are not stored in the DVI file,
              the program has to guess them: an internal  chapter  counter  is
              increased by one every time the count0 value of the current page
              is not greater (in the above ordering) than that of the previous
              page.   The  counter  is  initialized to 1 if the first page has
              negative count0 value and to 0 otherwise. (A document with sepa-
              rately  numbered  parts  will  probably have separately numbered
              frontmatter as well, and  then  this  rule  keeps  the  internal
              counter equal to real world part numbers.)
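
              The guessing rule can be sketched in Python as below.  It
              follows the description above literally and is not derived
              from the catdvi sources; guess_chapters and count0_key are
              hypothetical names, and treating a count0 value of 0 like a
              positive number is an assumption.

```python
def count0_key(count0):
    """Ordering -1 < -2 < ... < 1 < 2 < ... (0 grouped with positives)."""
    return (0, -count0) if count0 < 0 else (1, count0)


def guess_chapters(count0_values):
    """Return (physical page, count0, chapter counter) for every page."""
    result = []
    chapter = 0
    previous = None
    for physical, count0 in enumerate(count0_values, start=1):
        if previous is None:
            # First page: start at 1 for negative count0, else at 0.
            chapter = 1 if count0 < 0 else 0
        elif not count0_key(count0) > count0_key(previous):
            # count0 did not increase, so a new part is assumed to start.
            chapter += 1
        result.append((physical, count0, chapter))
        previous = count0
    return result


# Front matter (-1, -2), then two parts whose numbering restarts at 1.
for row in guess_chapters([-1, -2, 1, 2, 1, 2, 3]):
    print(row)
```

              With this input the front matter and the first part share
              counter value 1, and the second part gets 2.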

       -l pagespec, --last-page=pagespec
              Do  not  output  pages after page pagespec.  Pages are specified
              exactly as for the --first-page option above.
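
              The three page specification forms (`num', `=num' and
              `num1:num2') can be told apart as in the following Python
              sketch.  The function name parse_pagespec and the tags it
              returns are hypothetical and only serve the illustration;
              catdvi's own parser may behave differently on malformed
              input.

```python
def parse_pagespec(spec):
    """Classify a page specification string into one of three forms."""
    if spec.startswith("="):
        # `=num': the num-th physical page, counted from 1.
        return ("physical", int(spec[1:]))
    if ":" in spec:
        # `num1:num2': count0 value num2 within guessed part num1.
        part, count0 = spec.split(":", 1)
        return ("chapter", int(part), int(count0))
    # `num': a (possibly negative) TeX page number, i.e. a count0 value.
    return ("count0", int(spec))


for spec in ("-3", "=5", "2:7"):
    print(spec, parse_pagespec(spec))
```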

       -N, --list-page-numbers
              Instead of the contents of pages,  output  their  physical  page
              count,  count0 value and chapter count (see the --first-page op-
              tion above for a definition of these).

       -s, --sequential
              Do not attempt to reproduce the page layout;  output  glyphs  in
              the  order  they appear in the DVI file. This may be useful with
              e.g. multi-column page layouts.

       -U, --show-unknown-glyphs
              Show the Unicode number of unknown glyphs instead of `?'.

       -h, --help
              Show usage information and a list of available output encodings,
              then exit.

       --version
              Show version information and exit.

       --copyright
              Show copyright information and exit.
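
       As a usage illustration, the following Python sketch assembles a
       catdvi command line from the long options documented above.  The
       helper name catdvi_argv and its parameters are hypothetical; only
       the option names come from this manual page.  Note how a physical
       page specification yields two equals signs.

```python
def catdvi_argv(dvi_file, first_page=None, last_page=None,
                output_encoding=None, sequential=False):
    """Build an argument list for catdvi using documented long options."""
    argv = ["catdvi"]
    if output_encoding is not None:
        argv.append(f"--output-encoding={output_encoding}")
    if first_page is not None:
        argv.append(f"--first-page={first_page}")
    if last_page is not None:
        argv.append(f"--last-page={last_page}")
    if sequential:
        argv.append("--sequential")
    argv.append(dvi_file)
    return argv


print(catdvi_argv("foo.dvi", first_page="=5", output_encoding="UTF-8"))
# ['catdvi', '--output-encoding=UTF-8', '--first-page==5', 'foo.dvi']
```

       On a system where catdvi is installed, the resulting list could be
       passed to subprocess.run().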

ENVIRONMENT
       The  usual  environment variables TFMFONTS, TEXFONTS, etc. for Kpathsea
       font search and creation apply.  Refer to  the  Kpathsea  documentation
       for details.

SEE ALSO
       xdvi(1), dvips(1), tex(1), mktextfm(1), the Kpathsea texinfo documenta-
       tion, utf-8(7).

BUGS
       These things do not work (yet):

       •      No rules are converted.

       •      Extensible recipes (very large brackets, braces, etc. built  out
              of several smaller pieces) are not properly handled.

       •      Complicated  math  formulae are sometimes misaligned (mostly due
              to lack of appropriate word break heuristics).

       •      Some fonts and font encodings are not recognised yet.

       •      Most mathematical symbols have no representation in  the  avail-
              able  output character sets except Unicode, and hence show up as
              `?' unless UTF-8 output encoding is selected.  A  textual  tran-
              scription would be desirable.
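
       The effect described in the last item can be reproduced by analogy
       with Python's own codecs, which also substitute `?' for characters
       the target character set cannot represent.  This is not catdvi
       code; the function name approximate is hypothetical.

```python
def approximate(text, codec):
    """Encode text, replacing unrepresentable characters with '?'."""
    return text.encode(codec, errors="replace")


sample = "\u0393 + \u2211 x"  # uppercase Gamma, plus, summation sign, x
for codec in ("ascii", "iso-8859-1", "utf-8"):
    print(codec, approximate(sample, codec))
```

       Only the UTF-8 result keeps the two mathematical symbols; the other
       two encodings yield `? + ? x'.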

       Watch out for these:

       •      If  there  is a space where it does not belong or if there is no
              space where there should be one, report this as a bug (send  the
              DVI file to the catdvi maintainer, stating where in the file the
              bug is seen).

AUTHORS
       catdvi was written by Antti-Juhani Kaijanaho <gaia@iki.fi>, based on a
       skeletal version by J.H.M. Dassen (Ray).  Bjoern Brill
       <brill@fs.math.uni-frankfurt.de> made further improvements and
       currently maintains the program.

       The manual page was compiled by Bjoern Brill, using material written by
       the first two program authors.

                                8 November 2002                      CATDVI(1)
