
3. The Input module
*******************

This module provides a set of packages with a common interface to
access the characters contained in a stream. Various implementations
are provided, for example to read from files and from standard Ada
strings.

A top-level tagged type is provided that must be extended for the
various streams. It is assumed that the pointer to the current
character in the stream can only go forward, and never backward. As a
result, it is possible to implement this package for sockets or other
streams where going backward is not possible at all. It also means
that such readers do not need to buffer their input, which makes
memory-efficient implementations possible.
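To illustrate the forward-only design, the following sketch declares an abstract root type with Next_Char and Eof and then extends it for an in-memory string. All names here (Readers, Reader, String_Readers, String_Reader, Drain) are hypothetical and are not the Input module's actual declarations. To keep the example self-contained, characters are plain Ada Character values rather than decoded Unicode characters.

```ada
with Ada.Text_IO;

procedure Forward_Only_Demo is

   --  Hypothetical root type: an abstract tagged type whose cursor
   --  can only move forward (not the real Input module declaration).
   package Readers is
      type Reader is abstract tagged limited null record;

      --  Consumes one character; there is no operation to step back.
      procedure Next_Char (R : in out Reader; C : out Character) is abstract;

      --  True once the last character has been returned.
      function Eof (R : Reader) return Boolean is abstract;
   end Readers;

   --  Hypothetical extension over an in-memory string. Apart from the
   --  data itself it keeps a single index and no look-back buffer.
   package String_Readers is
      type String_Reader (Length : Natural) is new Readers.Reader with record
         Data : String (1 .. Length);
         Pos  : Positive := 1;
      end record;

      overriding procedure Next_Char
        (R : in out String_Reader; C : out Character);
      overriding function Eof (R : String_Reader) return Boolean;
   end String_Readers;

   package body String_Readers is
      overriding procedure Next_Char
        (R : in out String_Reader; C : out Character) is
      begin
         C := R.Data (R.Pos);
         R.Pos := R.Pos + 1;  --  the cursor only ever advances
      end Next_Char;

      overriding function Eof (R : String_Reader) return Boolean is
      begin
         return R.Pos > R.Length;
      end Eof;
   end String_Readers;

   --  Class-wide consumer: it dispatches to whatever reader it is given
   --  and reads strictly front to back, so it never needs to rewind.
   procedure Drain (R : in out Readers.Reader'Class; Count : out Natural) is
      C : Character;
   begin
      Count := 0;
      while not R.Eof loop
         R.Next_Char (C);
         Ada.Text_IO.Put (C);
         Count := Count + 1;
      end loop;
      Ada.Text_IO.New_Line;
   end Drain;

   R : String_Readers.String_Reader :=
     (Length => 5, Data => "hello", Pos => 1);
   N : Natural;
begin
   Drain (R, N);
   Ada.Text_IO.Put_Line ("characters read:" & Natural'Image (N));
end Forward_Only_Demo;
```

Because Drain sees only the class-wide interface, a socket-backed extension could be passed to it unchanged.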

Four predefined readers are available, namely *String_Input* to read
characters from a standard Ada string, *File_Input* to read characters
from a standard text file, *Http_Input* to read from an HTTP location,
and *Socket_Input* to read from a streaming socket.

All readers share the same limit on the total length of their input:
files larger than 2 GB are not supported.

They all provide the following primitive operations:

*Open*
   This operation is not strictly a primitive that each reader
   overrides, since its parameters depend on the type of stream you
   want to read from. Each reader nevertheless uses this standard name
   for its constructor.

*Close*
   This terminates the stream reader and frees any associated memory.
   It is no longer possible to read from the stream afterwards.

*Next_Char*
   Return the next Unicode character in the stream. Note that this
   character does not necessarily correspond to a single byte; the
   number of bytes it occupies depends on the encoding chosen for the
   stream (see the unicode module documentation for more information).

   The next time this function is called, it returns the following
   character from the stream.

*Eof*
   This function should return True when the reader has already
   returned the last character from the stream. Note that it is not
   guaranteed that a second call to Eof will also return True.
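A reader's typical life cycle is Open, then Next_Char until Eof returns True, then Close. The sketch below uses a hypothetical Heap_Reader that owns a heap copy of its input, so that Close has memory to free. The subprogram names follow the list above, but the signatures are illustrative and are not the library's.

```ada
with Ada.Text_IO;
with Ada.Unchecked_Deallocation;

procedure Reader_Lifecycle is

   --  Hypothetical reader (not the XML/Ada type) owning a heap copy
   --  of its input, so that Close has memory to release.
   type String_Access is access String;
   procedure Free is new Ada.Unchecked_Deallocation (String, String_Access);

   type Heap_Reader is limited record
      Data : String_Access;
      Pos  : Positive := 1;
   end record;

   --  Open: the constructor; its parameters are specific to this kind
   --  of stream (here, a string to copy).
   procedure Open (Text : String; R : out Heap_Reader) is
   begin
      R.Data := new String'(Text);
      R.Pos := R.Data'First;
   end Open;

   --  Close: releases the memory and sets R.Data to null.
   procedure Close (R : in out Heap_Reader) is
   begin
      Free (R.Data);
   end Close;

   --  True after the last character, and also once the reader is closed.
   function Eof (R : Heap_Reader) return Boolean is
   begin
      return R.Data = null or else R.Pos > R.Data'Last;
   end Eof;

   --  Call only while Eof is False; each call advances by one character.
   procedure Next_Char (R : in out Heap_Reader; C : out Character) is
   begin
      C := R.Data (R.Pos);
      R.Pos := R.Pos + 1;
   end Next_Char;

   R : Heap_Reader;
   C : Character;
   N : Natural := 0;
begin
   Open ("abc", R);
   --  Test Eof before every Next_Char rather than caching its result.
   while not Eof (R) loop
      Next_Char (R, C);
      Ada.Text_IO.Put (C);
      N := N + 1;
   end loop;
   Ada.Text_IO.New_Line;
   Close (R);
   Ada.Text_IO.Put_Line ("read" & Natural'Image (N) & " characters");
end Reader_Lifecycle;
```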

It is the responsibility of the stream reader to call the decoding
functions of the unicode module correctly, so that it returns a single
valid Unicode character. No further processing is done on the result
of *Next_Char*. Note that the standard *File_Input* and *String_Input*
streams can automatically detect the encoding to use for a file, based
on a header read directly from the file.
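To make this responsibility concrete, here is a sketch of a Next_Char that decodes UTF-8 by hand. A single call may consume one to four bytes, but it always yields exactly one code point. The Code_Point and Utf8_Reader types are hypothetical; a real reader would delegate this work to the decoding functions of the unicode module. The sketch also does not reject overlong encodings or surrogates.

```ada
with Ada.Text_IO;

procedure Utf8_Next_Char is

   --  Hypothetical code point type covering the Unicode range.
   type Code_Point is range 0 .. 16#10FFFF#;

   --  Hypothetical reader: raw UTF-8 bytes plus a forward-only index.
   type Utf8_Reader (Length : Natural) is record
      Bytes : String (1 .. Length);
      Pos   : Positive := 1;
   end record;

   function Eof (R : Utf8_Reader) return Boolean is
   begin
      return R.Pos > R.Length;
   end Eof;

   --  Decodes one UTF-8 sequence into a single code point. Invalid or
   --  truncated input raises Constraint_Error (a simplification).
   procedure Next_Char (R : in out Utf8_Reader; C : out Code_Point) is
      Lead  : constant Natural := Character'Pos (R.Bytes (R.Pos));
      Count : Natural;  --  number of continuation bytes after Lead
      Value : Natural;
   begin
      if Lead < 16#80# then                --  0xxxxxxx
         Count := 0; Value := Lead;
      elsif Lead / 16#20# = 2#110# then    --  110xxxxx
         Count := 1; Value := Lead mod 16#20#;
      elsif Lead / 16#10# = 2#1110# then   --  1110xxxx
         Count := 2; Value := Lead mod 16#10#;
      elsif Lead / 16#08# = 2#11110# then  --  11110xxx
         Count := 3; Value := Lead mod 16#08#;
      else
         raise Constraint_Error with "invalid UTF-8 lead byte";
      end if;

      for I in 1 .. Count loop
         declare
            B : constant Natural := Character'Pos (R.Bytes (R.Pos + I));
         begin
            if B / 16#40# /= 2#10# then    --  must be 10xxxxxx
               raise Constraint_Error with "invalid continuation byte";
            end if;
            Value := Value * 16#40# + B mod 16#40#;
         end;
      end loop;

      R.Pos := R.Pos + Count + 1;
      C := Code_Point (Value);  --  out-of-range values raise here
   end Next_Char;

   Input : constant String :=
     "h"
     & Character'Val (16#C3#) & Character'Val (16#A9#)      --  U+00E9
     & Character'Val (16#E2#) & Character'Val (16#82#)
     & Character'Val (16#AC#)                               --  U+20AC
     & Character'Val (16#F0#) & Character'Val (16#9F#)
     & Character'Val (16#98#) & Character'Val (16#80#);     --  U+1F600
   R : Utf8_Reader := (Length => Input'Length, Bytes => Input, Pos => 1);
   C : Code_Point;
   N : Natural := 0;
begin
   while not Eof (R) loop
      Next_Char (R, C);
      N := N + 1;
      Ada.Text_IO.Put_Line ("code point" & Code_Point'Image (C));
   end loop;
   Ada.Text_IO.Put_Line
     (Natural'Image (Input'Length) & " bytes," & Natural'Image (N)
      & " characters");
end Utf8_Next_Char;
```

The ten input bytes yield four characters, printed as code points 104, 233, 8364 and 128512.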

Based on the first four bytes of the stream (assuming this is valid
XML), they will automatically detect whether the file was encoded as
Utf8, Utf16… If you are writing your own input streams, consider
adding this automatic detection as well.
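For your own input streams, such detection could be sketched with the heuristics from Appendix F of the XML 1.0 specification. These heuristics look either at a byte order mark or at the byte pattern of a leading "<?". This is an assumption about how the detection may work, not the Input module's own code. The Encoding type and the Detect and Bytes functions are hypothetical, and only a subset of the Appendix F cases is covered: EBCDIC and the unusual UCS-4 byte orders are omitted.

```ada
with Ada.Text_IO;

procedure Detect_Encoding_Demo is

   --  Hypothetical result type for this sketch.
   type Encoding is (Utf8, Utf16_BE, Utf16_LE, Ucs4_BE, Ucs4_LE);

   --  Guess the encoding from the first four bytes, after a subset of
   --  the heuristics in Appendix F of the XML 1.0 specification.
   function Detect (S : String) return Encoding is

      function B (I : Positive) return Natural is
      begin
         return Character'Pos (S (S'First + I - 1));
      end B;

      function Starts (B1, B2, B3, B4 : Natural) return Boolean is
      begin
         return B (1) = B1 and then B (2) = B2
           and then B (3) = B3 and then B (4) = B4;
      end Starts;

   begin
      if S'Length < 4 then
         return Utf8;                      --  too short to tell
      elsif Starts (16#00#, 16#00#, 16#FE#, 16#FF#)
        or else Starts (16#00#, 16#00#, 16#00#, 16#3C#)
      then
         return Ucs4_BE;                   --  BOM, or "<" without BOM
      elsif Starts (16#FF#, 16#FE#, 16#00#, 16#00#)
        or else Starts (16#3C#, 16#00#, 16#00#, 16#00#)
      then
         return Ucs4_LE;                   --  tested before UTF-16 LE
      elsif (B (1) = 16#FE# and then B (2) = 16#FF#)
        or else Starts (16#00#, 16#3C#, 16#00#, 16#3F#)
      then
         return Utf16_BE;                  --  BOM, or "<?" without BOM
      elsif (B (1) = 16#FF# and then B (2) = 16#FE#)
        or else Starts (16#3C#, 16#00#, 16#3F#, 16#00#)
      then
         return Utf16_LE;
      else
         return Utf8;  --  UTF-8 BOM (EF BB BF), "<?xm", or no clue
      end if;
   end Detect;

   --  Hypothetical helper building a four-byte test input.
   function Bytes (A, B, C, D : Natural) return String is
   begin
      return (Character'Val (A), Character'Val (B),
              Character'Val (C), Character'Val (D));
   end Bytes;

begin
   Ada.Text_IO.Put_Line
     (Encoding'Image (Detect ("<?xml version=""1.0""?>")));
   Ada.Text_IO.Put_Line
     (Encoding'Image (Detect (Bytes (16#FF#, 16#FE#, 16#3C#, 16#00#))));
   Ada.Text_IO.Put_Line
     (Encoding'Image (Detect (Bytes (16#00#, 16#3C#, 16#00#, 16#3F#))));
end Detect_Encoding_Demo;
```

Anything unrecognized falls back to UTF-8. This is also what XML assumes for an entity that has neither a byte order mark nor an encoding declaration.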

However, it is always possible to override the default through a call
to *Set_Encoding*. This allows you to specify both the character set
(Latin1…) and the character encoding scheme (Utf8…).
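Overriding the default matters because the same bytes decode differently under different schemes. In the sketch below, the two bytes C3 A9 form one character (U+00E9) under UTF-8 but two characters (U+00C3, U+00A9) under Latin-1. The Scheme type, the Reader record and this Set_Encoding signature are hypothetical stand-ins. The real call specifies a character set and an encoding scheme, as described above.

```ada
with Ada.Text_IO;

procedure Override_Encoding_Demo is

   --  Hypothetical stand-in for what is passed to Set_Encoding.
   type Scheme is (Latin1, Utf8);

   --  Hypothetical reader: Open would store the auto-detected default,
   --  and Set_Encoding replaces it.
   type Reader is record
      Enc : Scheme := Utf8;
   end record;

   procedure Set_Encoding (R : in out Reader; Enc : Scheme) is
   begin
      R.Enc := Enc;
   end Set_Encoding;

   --  Number of characters the given bytes decode to under R's scheme.
   function Char_Count (R : Reader; Bytes : String) return Natural is
      N : Natural := 0;
   begin
      case R.Enc is
         when Latin1 =>
            return Bytes'Length;  --  one byte is always one character
         when Utf8 =>
            for I in Bytes'Range loop
               --  Continuation bytes (10xxxxxx) never start a character.
               if Character'Pos (Bytes (I)) / 16#40# /= 2#10# then
                  N := N + 1;
               end if;
            end loop;
            return N;
      end case;
   end Char_Count;

   --  The bytes C3 A9: U+00E9 in UTF-8, two characters in Latin-1.
   Bytes : constant String := Character'Val (16#C3#) & Character'Val (16#A9#);
   R     : Reader;
begin
   Ada.Text_IO.Put_Line ("UTF-8:  " & Natural'Image (Char_Count (R, Bytes)));
   Set_Encoding (R, Latin1);
   Ada.Text_IO.Put_Line ("Latin-1:" & Natural'Image (Char_Count (R, Bytes)));
end Override_Encoding_Demo;
```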

The user is also encouraged to set the identifiers for the stream they
are parsing, through calls to *Set_System_Id* and *Set_Public_Id*.
These are used when reporting error messages.
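Here is a sketch of how such identifiers might appear in diagnostics. A hypothetical Error_Message function prefixes the message with the system identifier and a line number, and appends the public identifier when one is set. The record layout, the message format and the sample identifiers are all assumptions for illustration.

```ada
with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Error_Location_Demo is

   --  Hypothetical reader state: only the parts used in diagnostics.
   type Reader is record
      System_Id : Unbounded_String;
      Public_Id : Unbounded_String;
      Line      : Positive := 1;
   end record;

   procedure Set_System_Id (R : in out Reader; Id : String) is
   begin
      R.System_Id := To_Unbounded_String (Id);
   end Set_System_Id;

   procedure Set_Public_Id (R : in out Reader; Id : String) is
   begin
      R.Public_Id := To_Unbounded_String (Id);
   end Set_Public_Id;

   --  Hypothetical format "<system id>:<line>: <text> [<public id>]",
   --  with a placeholder when no system identifier has been set.
   function Error_Message (R : Reader; Text : String) return String is
      Img : constant String := Positive'Image (R.Line);
      Msg : Unbounded_String;
   begin
      if Length (R.System_Id) > 0 then
         Msg := R.System_Id;
      else
         Msg := To_Unbounded_String ("<unknown>");
      end if;
      --  Drop the leading space that 'Image adds.
      Msg := Msg & ":" & Img (Img'First + 1 .. Img'Last) & ": " & Text;
      if Length (R.Public_Id) > 0 then
         Msg := Msg & " [" & R.Public_Id & "]";
      end if;
      return To_String (Msg);
   end Error_Message;

   R : Reader;
begin
   Ada.Text_IO.Put_Line (Error_Message (R, "unexpected end of file"));
   Set_System_Id (R, "catalog.xml");
   Set_Public_Id (R, "-//Example//DTD Catalog//EN");
   R.Line := 12;
   Ada.Text_IO.Put_Line (Error_Message (R, "unexpected end of file"));
end Error_Location_Demo;
```

Without the identifiers, the first message can only say `<unknown>`. After the two calls, the message names the file the error came from.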

Generated by dwww version 1.15 on Tue Jul 2 00:27:29 CEST 2024.