5. The DOM module
*****************

DOM is another standard associated with XML, in which the XML stream
is represented as a tree in memory. This tree can be manipulated at
will: nodes can be added or removed, attributes changed, and so on.

Since it contains the whole XML information, it can then in turn be
dumped to a stream.
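
As a minimal sketch of that cycle (build a tree, edit it, dump it),
the program below creates a small document in memory and writes it to
standard output. The element and attribute names ("preferences",
"pref", "obsolete", "name") are made up for illustration, and the
sketch assumes the default UTF-8 internal encoding.

```ada
with Ada.Text_IO;             use Ada.Text_IO;
with Ada.Text_IO.Text_Streams;
with DOM.Core;                use DOM.Core;
with DOM.Core.Documents;      use DOM.Core.Documents;
with DOM.Core.Elements;       use DOM.Core.Elements;
with DOM.Core.Nodes;          use DOM.Core.Nodes;

procedure Build_Tree_Sketch is
   Impl  : DOM_Implementation;
   Doc   : Document;
   Root  : Element;
   Pref  : Element;
   Extra : Node;
   Dummy : Node;
begin
   --  Create an empty document and give it a root element
   --  (all names below are illustrative only)
   Doc  := Create_Document (Impl);
   Root := Append_Child (Doc, Create_Element (Doc, "preferences"));

   --  Add a new node with one attribute and a text child
   Pref := Append_Child (Root, Create_Element (Doc, "pref"));
   Set_Attribute (Pref, "name", "pref1");
   Dummy := Append_Child (Pref, Create_Text_Node (Doc, "Value1"));

   --  Add a second node, then remove it again. A removed node is no
   --  longer owned by the tree, so it is freed explicitly here.
   Extra := Append_Child (Root, Create_Element (Doc, "obsolete"));
   Extra := Remove_Child (Root, Extra);
   Free (Extra);

   --  Dump the whole tree to a stream (here, standard output)
   Write
     (Stream => Ada.Text_IO.Text_Streams.Stream (Standard_Output),
      N      => Doc);
   New_Line;

   Free (Doc);
end Build_Tree_Sketch;
```

The same *Write* call works with any Ada stream, so the target could
just as well be a file or a socket.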

As an example, most modern web browsers provide a DOM interface to the
document currently loaded in the browser. Using JavaScript, one can
thus dynamically modify the document. The calls to do so are similar
to the ones provided by XML/Ada for manipulating a DOM tree, and all
are defined in the DOM standard.

The W3C committee (http://www.w3c.org) has defined several versions of
the DOM, each building on the previous one and adding several
enhancements.

XML/Ada currently supports the second revision of DOM (DOM 2.0), which
mostly adds namespaces over the first revision. The third revision,
which adds support for loading and saving XML streams in a
standardized fashion, is not supported at this point.

Although it doesn’t support DOM 3.0, XML/Ada provides subprograms for
doing similar things.

Only the Core module of the DOM standard is currently implemented;
other modules will follow.

Note that the "sax-encodings.ads" file specifies the encoding used to
store the tree in memory. Full compatibility with the XML standard
requires that this be UTF-16; however, UTF-8 is generally much more
memory-efficient for European languages. You can freely change this
and recompile.


5.1. Using DOM
==============

In XML/Ada, the DOM tree is built through a special implementation of
a SAX parser, provided in the *DOM.Readers* package.

Using DOM to read an XML document is similar to using SAX: one must
set up an input stream, then parse the document and get the tree. This
is done with code similar to the following:

   --
   --  Copyright (C) 2017, AdaCore
   --

   with Input_Sources.File; use Input_Sources.File;
   with Sax.Readers;        use Sax.Readers;
   with DOM.Readers;        use DOM.Readers;
   with DOM.Core;           use DOM.Core;

   procedure DomExample is
      Input  : File_Input;
      Reader : Tree_Reader;
      Doc    : Document;
   begin
      Set_Public_Id (Input, "Preferences file");
      Open ("pref.xml", Input);

      Set_Feature (Reader, Validation_Feature, False);
      Set_Feature (Reader, Namespace_Feature, False);

      Parse (Reader, Input);
      Close (Input);

      Doc := Get_Tree (Reader); 

      Free (Reader);
   end DomExample;

This code is almost exactly the same as the code that was used when
demonstrating the use of SAX (see Using SAX).

The two main differences are:

* We no longer need to define our own XML reader, and we simply use
  the one provided in *DOM.Readers*.

* We therefore do not add our own callbacks to react to the XML
  events. Instead, the call to *Get_Tree* near the end of the program
  gets a handle on the tree that was created in memory.

The tree can now be manipulated to get access to the values stored. If
we want to implement the same thing we did for SAX, the code would
look like:

   --
   --  Copyright (C) 2017, AdaCore
   --

   with Input_Sources.File; use Input_Sources.File;
   with Sax.Readers;        use Sax.Readers;
   with DOM.Readers;        use DOM.Readers;
   with DOM.Core;           use DOM.Core;
   with DOM.Core.Documents; use DOM.Core.Documents;
   with DOM.Core.Nodes;     use DOM.Core.Nodes;
   with DOM.Core.Attrs;     use DOM.Core.Attrs;
   with Ada.Text_IO;        use Ada.Text_IO;

   procedure DomExample2 is
      Input  : File_Input;
      Reader : Tree_Reader;
      Doc    : Document;
      List   : Node_List;
      N      : Node;
      A      : Attr;
   begin
      Set_Public_Id (Input, "Preferences file");
      Open ("pref.xml", Input);

      Set_Feature (Reader, Validation_Feature, False);
      Set_Feature (Reader, Namespace_Feature, False);

      Parse (Reader, Input);
      Close (Input);

      Doc := Get_Tree (Reader); 

      List := Get_Elements_By_Tag_Name (Doc, "pref");

      for Index in 1 .. Length (List) loop
         N := Item (List, Index - 1);
         A := Get_Named_Item (Attributes (N), "name");
         Put_Line ("Value of """ & Value (A) & """ is "
                   & Node_Value (First_Child (N)));
      end loop;

      Free (List);

      Free (Reader);
   end DomExample2;

The code is much simpler than with SAX, since most of the work is done
internally by XML/Ada. In particular, for SAX we had to take into
account the fact that the textual contents of a node could be reported
in several events. For DOM, the tree is initially normalized, i.e. all
text nodes are collapsed together when possible.

This added simplicity has one drawback, which is the amount of memory
required to represent even a simple tree.

XML/Ada optimizes the memory necessary to represent a tree by sharing
the strings as much as possible (this is under control of constants at
the beginning of "dom-core.ads"). Still, DOM requires a significant
amount of information to be kept for each node.

For really big XML streams, it might prove impossible to keep the
whole tree in memory, in which case ad hoc storage might be
implemented through the use of a SAX parser. The implementation of
*dom-readers.adb* will prove helpful in creating such a parser.
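
As a rough illustration of the SAX alternative, the sketch below
derives a reader that only counts elements, so no tree is ever built
in memory. The names *Count_Elements_Sketch*, *Counters* and
*Counting_Reader* are hypothetical. The *Start_Element* profile is
assumed to match the one used by *DOM.Readers*, and the parent type is
assumed to be *Sax.Readers.Sax_Reader*.

```ada
with Ada.Text_IO;        use Ada.Text_IO;
with Input_Sources.File; use Input_Sources.File;
with Sax.Readers;        use Sax.Readers;
with Sax.Symbols;
with Sax.Utils;

procedure Count_Elements_Sketch is

   package Counters is
      --  Hypothetical reader: keeps a single counter instead of a tree
      type Counting_Reader is new Sax_Reader with record
         Elements : Natural := 0;
      end record;

      overriding procedure Start_Element
        (Handler    : in out Counting_Reader;
         NS         : Sax.Utils.XML_NS;
         Local_Name : Sax.Symbols.Symbol;
         Atts       : Sax_Attribute_List);
   end Counters;

   package body Counters is
      overriding procedure Start_Element
        (Handler    : in out Counting_Reader;
         NS         : Sax.Utils.XML_NS;
         Local_Name : Sax.Symbols.Symbol;
         Atts       : Sax_Attribute_List)
      is
         pragma Unreferenced (NS, Local_Name, Atts);
      begin
         --  Store only what the application needs from each event
         Handler.Elements := Handler.Elements + 1;
      end Start_Element;
   end Counters;

   use Counters;  --  makes the inherited Parse and Set_Feature visible

   Input  : File_Input;
   Reader : Counting_Reader;
begin
   Open ("pref.xml", Input);
   Set_Feature (Reader, Validation_Feature, False);
   Parse (Reader, Input);
   Close (Input);

   Put_Line ("Elements seen:" & Natural'Image (Reader.Elements));
end Count_Elements_Sketch;
```

A real application would replace the counter with whatever compact
storage it needs, for instance writing selected values to a database
as the events arrive.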


5.2. Editing DOM trees
======================

Once in memory, DOM trees can be manipulated through subprograms
provided by the DOM API.

Each of these subprograms is fully documented both in the Ada specs
(the "*.ads" files) and in the DOM standard itself, which XML/Ada
follows fully.

One important note concerns the use of strings. Various subprograms
allow you to set the textual content of a node, modify its attributes,
and so on. Such subprograms take a Byte_Sequence as a parameter.

This Byte_Sequence must always be encoded in the encoding defined in
the package *Sax.Encodings* (as described earlier, changing this
package requires recompiling XML/Ada). By default, this is UTF-8.

Therefore, if you need to set an attribute to a string encoded for
instance in iso-8859-15, you should use the subprogram
*Unicode.Encodings.Convert* to convert it appropriately. The code
would thus look as follows (the attribute name "name" is only an
example):

   Set_Attribute
     (N, "name", Convert ("å", From => Get_By_Name ("iso-8859-15")));


5.3. Printing DOM trees
=======================

The DOM 2.0 standard does not define a common way to read DOM trees
from input sources, nor how to write them back to output sources. This
was added in a later revision of the standard (DOM 3.0), which is not
yet supported by XML/Ada.

However, the package "DOM.Core.Nodes" provides a *Write* procedure
that can be used for that purpose. It outputs a given DOM tree to an
Ada stream. This stream can then be connected to a standard file on
the disk, to a socket, or be used to transform the tree into a string
in memory.

An example called "tests/dom/tostring.adb" is provided in the XML/Ada
distribution. It shows how you can create a stream to convert the tree
to a string in memory, without going through a file on the disk.
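
The general idea can be sketched as follows. This is not the code of
"tests/dom/tostring.adb", only one possible approach: a hypothetical
*String_Stream* type derived from *Ada.Streams.Root_Stream_Type* that
appends everything written to it to an unbounded string. It assumes
that *Write* accepts an access to any stream, as described above.

```ada
with Ada.Streams;            use Ada.Streams;
with Ada.Strings.Unbounded;  use Ada.Strings.Unbounded;
with Ada.Text_IO;
with DOM.Core;               use DOM.Core;
with DOM.Core.Documents;     use DOM.Core.Documents;
with DOM.Core.Nodes;         use DOM.Core.Nodes;

procedure Tree_To_String_Sketch is

   package String_Streams is
      --  Hypothetical write-only stream that accumulates its input
      type String_Stream is new Root_Stream_Type with record
         Buffer : Unbounded_String;
      end record;

      overriding procedure Read
        (Stream : in out String_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset);

      overriding procedure Write
        (Stream : in out String_Stream;
         Item   : Stream_Element_Array);
   end String_Streams;

   package body String_Streams is
      overriding procedure Read
        (Stream : in out String_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset)
      is
         pragma Unreferenced (Stream);
      begin
         --  Nothing can be read back: report that no element was read
         Last := Item'First - 1;
      end Read;

      overriding procedure Write
        (Stream : in out String_Stream;
         Item   : Stream_Element_Array) is
      begin
         --  Append each byte to the buffer as one character
         for J in Item'Range loop
            Append (Stream.Buffer, Character'Val (Item (J)));
         end loop;
      end Write;
   end String_Streams;

   Impl   : DOM_Implementation;
   Doc    : Document;
   Root   : Node;
   Output : aliased String_Streams.String_Stream;
begin
   --  Build a tiny tree; "preferences" is an illustrative name
   Doc  := Create_Document (Impl);
   Root := Append_Child (Doc, Create_Element (Doc, "preferences"));

   --  Dump the tree into the in-memory stream, then use the string
   DOM.Core.Nodes.Write (Output'Access, Doc);
   Ada.Text_IO.Put_Line (To_String (Output.Buffer));

   Free (Doc);
end Tree_To_String_Sketch;
```

Appending one character at a time keeps the sketch short. For large
trees, converting each *Item* array to a string in one step would
likely be faster.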


5.4. Adding information to the tree
===================================

The DOM standard does not require each node to carry a pointer to the
location it was read from (for instance *file:line:column*). In fact,
storing that for each node would significantly increase the size of
the DOM tree (not small by any means already).

But depending on your application, this might be useful information
to have, for instance if you want to report error messages with a
correct location.

Fortunately, this can be done relatively easily by extending the type
*DOM.Readers.Tree_Reader* and overriding *Start_Element*. You would
then add a custom attribute to every node, containing the location of
that node. Here is an example.

   --
   --  Copyright (C) 2017, AdaCore
   --

   with DOM.Readers;       use DOM.Readers;
   with Sax.Utils;         use Sax.Utils;
   with Sax.Readers;       use Sax.Readers;
   with Sax.Symbols;       use Sax.Symbols;

   package DOM_With_Location is

      type Tree_Reader_With_Location is new Tree_Reader with null record;
      overriding procedure Start_Element
         (Handler     : in out Tree_Reader_With_Location;
          NS          : Sax.Utils.XML_NS;
          Local_Name  : Sax.Symbols.Symbol;
          Atts        : Sax.Readers.Sax_Attribute_List);

   end DOM_With_Location;

   --
   --  Copyright (C) 2017, AdaCore
   --

   with DOM.Core;            use DOM.Core;
   with DOM.Core.Attrs;      use DOM.Core.Attrs;
   with DOM.Core.Documents;  use DOM.Core.Documents;
   with DOM.Core.Elements;   use DOM.Core.Elements;
   with Sax.Locators;        use Sax.Locators;

   package body DOM_With_Location is

      overriding procedure Start_Element
         (Handler     : in out Tree_Reader_With_Location;
          NS          : Sax.Utils.XML_NS;
          Local_Name  : Sax.Symbols.Symbol;
          Atts        : Sax_Attribute_List)
      is
         Att, Att2 : Attr;
      begin
         --  First create the node as usual
         Start_Element (Tree_Reader (Handler), NS, Local_Name, Atts);

         --  Then add the new attribute
         Att := Create_Attribute_NS
            (Get_Tree (Handler),
             Namespace_URI  => "http://mydomain.com",
             Qualified_Name => "mydomain:location");
         Set_Value (Att, To_String (Current_Location (Handler)));

         Att2 := Set_Attribute_Node (Handler.Current_Node, Att);
      end Start_Element;

   end DOM_With_Location;
