NAME

domutil -- XML::DOM helpers: search for nodes, scan the tree, get text content.


SYNOPSIS

  use domutil;
  use XML::DOM;

  my $parser  =  XML::DOM::Parser->new;
  my $doc     =  $parser->parsefile ("test.xml");

  # Dump title and first paragraph of each (sub)section of article.
  my $sects   =  search_elem_nodes ($doc->getDocumentElement (), ['section','subsection']);
  foreach my $sect (@$sects) {
    my $title_node = child ($sect, 'title');
    my $para_nodes = search_elem_nodes ($sect, 'para');
    my ($title, $para);
    $title = content ($title_node) if $title_node;
    $title = 'No title.' unless $title;
    $para  = content ($para_nodes->[0]) if @$para_nodes;
    $para  = 'No text.'  unless $para;
    print "----- $title\n$para\n...\n";
  }
  $doc->dispose;    # break XML::DOM's circular references when done


DESCRIPTION

Currently (end of 2000), Perl is not well suited to XML processing: the set of modules is not stable, and functionality is spread over incompatible modules. This module is a collection of functions for working with XML::DOM.

The first task of the module is advanced search for elements. The XML::XPath module is perfect, but it can't be used with XML::DOM. I implemented these selectors: node name (or list of node names), attribute name and attribute value. Each selector may be undef, in which case it is not used.

The second task is scanning the tree. The application registers event handlers, which can stop the scan or prevent scanning of a subtree. You probably don't need this function unless you wish to extend the functionality of this module.

The third task is getting the text content of a tree. This is a very useful function, but I did not find it anywhere in the standard modules.

You are welcome to extend this module, and bug fixes are especially appreciated. The module was written under time pressure and was not well tested, so I am not sure of its quality. Anyway, it seems to work.

I hope that I will have time for further development of the module. I think, however, that it will become a set of modules for a new model of Perl XML processing. This model will use the best features of Balise, a powerful SGML/XML programming language. See http://balise.xoasis.com/index.html for more details.


FUNCTIONS

hasAttr ($node, $aname)

Checks whether XML::DOM::Node $node has attribute $aname. Returns true or false.

getAttr ($node, $aname)

Returns the value of attribute $aname of XML::DOM::Node $node, usually a string. If the attribute does not exist, returns undef.
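The difference between "no attribute" and "empty attribute" is easy to lose with plain XML::DOM, because XML::DOM::Element's getAttribute returns an empty string when the attribute is missing. Below is a minimal sketch of how hasAttr and getAttr could be built on getAttributeNode instead. The names has_attr_sketch and get_attr_sketch are hypothetical (this is not the module's own code), and the sketch assumes that non-element nodes simply have no attributes.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical hasAttr: true only for an element node carrying $aname.
sub has_attr_sketch {
    my ($node, $aname) = @_;
    return 0 unless $node->getNodeType == ELEMENT_NODE;
    return defined $node->getAttributeNode($aname) ? 1 : 0;
}

# Hypothetical getAttr: the value as a string, or undef when the attribute
# is missing (getAttribute alone would return "" in that case).
sub get_attr_sketch {
    my ($node, $aname) = @_;
    return undef unless has_attr_sketch($node, $aname);
    return $node->getAttribute($aname);
}

my $doc  = XML::DOM::Parser->new->parse('<sect id="s1" role=""/>');
my $sect = $doc->getDocumentElement;
my $role = get_attr_sketch($sect, 'role');   # defined, but empty
my $lang = get_attr_sketch($sect, 'lang');   # undef: no such attribute
print get_attr_sketch($sect, 'id'), "\n";    # prints s1
print defined $role ? "role='$role'\n" : "no role\n";   # prints role=''
print defined $lang ? "lang='$lang'\n" : "no lang\n";   # prints no lang
$doc->dispose;
```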

ancestor ($root [, $gi [, $level]])

Finds the $level-th ancestor of XML::DOM::Node $root with name $gi. If $gi is undef, the name of the node is not significant. The default value of $level is 1. Returns an XML::DOM::Node or undef.
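One possible reading of ancestor is sketched below, under two assumptions that the description above does not state: only ancestors whose name matches $gi are counted toward $level, and the walk stops at the document element, so the XML::DOM::Document node itself is never returned. ancestor_sketch is a hypothetical name, not the module's function.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical ancestor(): the $level-th ancestor element named $gi
# (any name when $gi is undef). Assumptions: only matching ancestors are
# counted, and the walk stops at the document element.
sub ancestor_sketch {
    my ($root, $gi, $level) = @_;
    $level = 1 unless defined $level;
    my $node = $root->getParentNode;
    while (defined $node && $node->getNodeType == ELEMENT_NODE) {
        if (!defined $gi || $node->getNodeName eq $gi) {
            return $node if --$level <= 0;
        }
        $node = $node->getParentNode;
    }
    return undef;    # not enough matching ancestors
}

my $doc = XML::DOM::Parser->new->parse(
    '<article><section><section><para>x</para></section></section></article>');
my $para  = $doc->getElementsByTagName('para')->item(0);
my $outer = ancestor_sketch($para, 'section', 2);   # second <section> up
my $top   = ancestor_sketch($para, undef, 3);       # third element up
print $outer->getParentNode->getNodeName, "\n";     # prints article
print $top->getNodeName, "\n";                      # prints article
$doc->dispose;
```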

children ($root [, $gi [, $aname [, $aval]]])

Returns the children of XML::DOM::Node $root. Parameter $gi is the name of the children, or a reference to an array of names; if it is undef, the name of the children is not significant. Parameter $aname is the name of an attribute that the children must have; if it is undef, the attribute name is not significant. Parameter $aval is the value that attribute $aname (or any attribute, if $aname is undef) must have; if it is undef, the attribute value is not significant. Returns a reference to an array of XML::DOM::Node.

child ($root [, $gi [, $aname [, $aval]]])

Returns a child of XML::DOM::Node $root. The parameters $gi, $aname and $aval have the same meaning as in function children. Returns an XML::DOM::Node or undef.
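The selector rules for $gi, $aname and $aval can be expressed as one predicate applied to each child. The sketch below is a hypothetical reimplementation (matches_sketch, children_sketch and child_sketch are illustrative names, not the module's code); it assumes that child returns the first matching child and that text, comment and other non-element children never match.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical selector predicate. $gi: a name, a reference to an array of
# names, or undef. $aname/$aval: attribute name and value, undef = ignored.
sub matches_sketch {
    my ($node, $gi, $aname, $aval) = @_;
    return 0 unless $node->getNodeType == ELEMENT_NODE;
    if (defined $gi) {
        my $name = $node->getNodeName;
        return 0 unless grep { $_ eq $name } (ref $gi ? @$gi : ($gi));
    }
    if (defined $aname) {
        my $attr = $node->getAttributeNode($aname);
        return 0 unless defined $attr;
        return 0 if defined $aval && $attr->getValue ne $aval;
    }
    elsif (defined $aval) {
        # No attribute name given: any attribute with the value $aval matches.
        my $attrs = $node->getAttributes or return 0;
        my @values = map { $attrs->item($_)->getValue } 0 .. $attrs->getLength() - 1;
        return 0 unless grep { $_ eq $aval } @values;
    }
    return 1;
}

# Hypothetical children(): reference to an array of matching child nodes.
sub children_sketch {
    my ($root, @sel) = @_;
    return [ grep { matches_sketch($_, @sel) } $root->getChildNodes ];
}

# Hypothetical child(): assumed to be the first matching child, or undef.
sub child_sketch {
    my ($root, @sel) = @_;
    return children_sketch($root, @sel)->[0];
}

my $doc = XML::DOM::Parser->new->parse(
    '<sect><title>T</title><para role="note">a</para><para>b</para></sect>');
my $sect  = $doc->getDocumentElement;
my $paras = children_sketch($sect, 'para');                 # both <para>
my $notes = children_sketch($sect, undef, 'role', 'note');  # first <para>
my $title = child_sketch($sect, ['title', 'subtitle']);
print scalar(@$paras), ' ', scalar(@$notes), ' ', $title->getNodeName, "\n";  # prints 2 1 title
$doc->dispose;
```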

search_elem_nodes ($root [, $gi [, $aname [, $aval]]])

Searches for all elements in the subtree of XML::DOM::Node $root. The parameters $gi, $aname and $aval have the same meaning as in function children. Returns a reference to an array of XML::DOM::Node.
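The traversal behind a subtree search can be sketched with an explicit stack, which returns matches in document order without recursion. To stay short, this hypothetical search_elem_nodes_sketch supports only the $gi selector (a name, a reference to an array of names, or undef) and omits the attribute selectors; it also assumes that $root itself is not part of the result.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical search_elem_nodes() limited to the $gi selector. Returns a
# reference to an array of matching descendants of $root in document order.
# Assumption: $root itself is not included.
sub search_elem_nodes_sketch {
    my ($root, $gi) = @_;
    my %want = map { ($_ => 1) } (ref $gi ? @$gi : defined $gi ? ($gi) : ());
    my @found;
    my @stack = reverse $root->getChildNodes;    # next node to visit on top
    while (my $node = pop @stack) {
        next unless $node->getNodeType == ELEMENT_NODE;
        push @found, $node if !defined $gi || $want{ $node->getNodeName };
        push @stack, reverse $node->getChildNodes;   # children before siblings
    }
    return \@found;
}

my $doc = XML::DOM::Parser->new->parse('<article><section><title>A</title>'
    . '<subsection><title>B</title></subsection></section></article>');
my $sects = search_elem_nodes_sketch($doc->getDocumentElement,
                                     ['section', 'subsection']);
print join(',', map { $_->getNodeName } @$sects), "\n";   # prints section,subsection
$doc->dispose;
```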

scanSubTree ($root, $begin, $end, $param)

Scans the subtree starting from XML::DOM::Node $root. Function $begin (if not undef) is called every time the scan enters a node; function $end (if not undef) is called every time the scan leaves a node. Both callbacks receive two parameters: the current node and the user-defined parameter $param. A callback should return -1, 0 or 1. The normal return value is 0. If the return value is -1, the scan is aborted; if it is 1, the scan skips the subtree of the current node. For every 'begin' event the corresponding 'end' event is called, even if the scan is aborted. The function does not return a value; a result can be passed back indirectly through $param.
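The callback protocol can be illustrated with a small recursive sketch. scan_sketch is a hypothetical name; unlike the documented function, it returns -1 after an abort, which is how the abort propagates up the recursion while every node that received 'begin' still receives 'end'. The usage counts elements through $param, skips the subtree of <skip> and aborts at <stop>.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical scanSubTree(). $begin and $end are code refs or undef and are
# called as ($node, $param); they return -1 (abort), 0 (continue) or 1 (skip
# the subtree). The sketch returns -1 if the scan was aborted, else 0.
sub scan_sketch {
    my ($node, $begin, $end, $param) = @_;
    my $rc = $begin ? $begin->($node, $param) : 0;
    if ($rc == 0) {
        for my $kid ($node->getChildNodes) {
            if (scan_sketch($kid, $begin, $end, $param) < 0) {
                $rc = -1;    # a descendant aborted: stop visiting siblings
                last;
            }
        }
    }
    # 'end' is delivered even when the scan is being aborted.
    my $end_rc = $end ? $end->($node, $param) : 0;
    return ($rc < 0 || $end_rc < 0) ? -1 : 0;
}

my $doc = XML::DOM::Parser->new->parse('<a><skip><b/><b/></skip><c/><stop/><d/></a>');
my %stats = (begin => 0, end => 0);
scan_sketch(
    $doc->getDocumentElement,
    sub {    # begin: count elements, skip <skip>, abort at <stop>
        my ($node, $st) = @_;
        $st->{begin}++;
        my $name = $node->getNodeName;
        return $name eq 'stop' ? -1 : $name eq 'skip' ? 1 : 0;
    },
    sub { $_[1]{end}++; return 0 },    # end: count departures
    \%stats,
);
print "$stats{begin} $stats{end}\n";   # prints 4 4
$doc->dispose;
```

Both counts are 4: <b> and <d> are never entered, but <stop> and <a> still receive their 'end' events after the abort.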

content ($node)

Returns the text content of the subtree of XML::DOM::Node $node. For example, the text content of '<para>The <emph>test</emph></para>' is 'The test'.
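A recursive sketch of the idea: concatenate the data of text and CDATA section nodes in document order. content_sketch is a hypothetical name, and ignoring comments and processing instructions is an assumption about the original function's behaviour.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical content(): concatenated data of all text and CDATA section
# nodes under $node, in document order. Comments, processing instructions
# and attributes contribute nothing (an assumption about the original).
sub content_sketch {
    my ($node) = @_;
    my $type = $node->getNodeType;
    return $node->getData
        if $type == TEXT_NODE || $type == CDATA_SECTION_NODE;
    return join '', map { content_sketch($_) } $node->getChildNodes;
}

my $doc = XML::DOM::Parser->new->parse('<para>The <emph>test</emph></para>');
print content_sketch($doc->getDocumentElement), "\n";   # prints The test
$doc->dispose;
```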

from_8bitDOM_to_string ($node)

Returns a string with the XML representation of the subtree of XML::DOM::Node $node. It is nearly the same as XML::DOM::toString, but it assumes that the DOM tree contains 8-bit text, not UTF-8. See http://uucode.com/xml/perl/index.html#8bit for more information about 8-bit DOM.


AUTHOR

Oleg A. Paraschenko <olpa uucode com>