XML::LibXML
Matt Sergeant, Christian Glahn
Version 1.58
Copyright 2001-2004 AxKit.com Ltd; 2002-2004 Christian Glahn
Introduction
README
This module implements a Perl interface to the Gnome libxml2
library. The libxml2 library provides interfaces for parsing and
manipulating XML files. This module allows Perl programmers to make use of
its highly capable validating XML parser and its high-performance DOM
implementation.
Important Notes
XML::LibXML was almost entirely reimplemented between version 1.40
and version 1.49. This may cause problems on some production machines.
Version 1.50 applied many compatibility fixes, so programs
written for XML::LibXML 1.40 or earlier should run with version 1.50 again.
Dependencies
Prior to installation you MUST have installed the libxml2 library.
You can get the latest libxml2 version from
http://xmlsoft.org
Without libxml2 installed this module will neither build nor run.
Also XML::LibXML requires the following packages:
XML::LibXML::Common - general functions used by various
XML::LibXML modules
XML::SAX - DOM building support from SAX
XML::NamespaceSupport - DOM building support from SAX
These packages are required. If one is missing some tests will
fail.
Again, libxml2 is required to make XML::LibXML work. The library
is not just required to build XML::LibXML, it has to be accessible
at runtime as well. Because of this you need to make sure libxml2 is
installed properly. To test this, run the xmllint program on your
system. xmllint is shipped with libxml2 and therefore should be
available.
Installation
To install XML::LibXML just follow the standard installation
routine for Perl modules:
perl Makefile.PL
make
make test
make install # as superuser
Note that you have to rebuild XML::LibXML once you upgrade
libxml2. This avoids problems with binary incompatibilities between
releases of the library.
Notes On libxml2 Versions
libxml2 claims binary compatibility between its patch levels.
This is not entirely true:
First of all, XML::LibXML requires at least libxml2 2.4.25. Since
most distributors ship ancient libxml2 versions, most users will
need to upgrade the prebuilt packages of their installation.
If you already run an older version of XML::LibXML and wish
to upgrade to a bug-fixed version of libxml2, note that libxml2 2.4.25
and the 2.5.x versions are not 100% binary compatible. If you intend to
upgrade to such a version, you will need to rebuild XML::LibXML (and
XML::LibXML::Common) as well.
Users of perl 5.005_03 and perl 5.6.1 with thread support should
also avoid libxml2 version 2.4.25 and use later versions
instead.
If your libxml2 installation is not within your $PATH, you can
set the environment variable XMLPREFIX=$YOURLIBXMLPREFIX to make
XML::LibXML recognize the correct libxml2 version in use.
For example,
perl Makefile.PL XMLPREFIX=/usr/brand-new
will ask '/usr/brand-new/bin/xml2-config' about your
real libxml2 configuration.
Try to avoid setting INC and LIBS on the command line. Doing so
skips the configuration tests, so there will be no report if the
given installation is known to be broken.
Which Version of libxml2 should be used?
XML::LibXML is tested against many versions of libxml2 before it
is released. Thus there are versions of libxml2 that are known not to
work properly with XML::LibXML. The Makefile.PL keeps a blacklist of
these broken libxml2 versions.
If you have one of these versions, you will be notified during
installation. You may find that XML::LibXML builds and tests fine in
your particular environment. But if XML::LibXML is run in such an
environment, there will be no support at all!
The following versions are tested:
past 2.4.20: tested; working
2.4.25: tested; not working
past 2.4.25: tested; working
past 2.5.0: tested; broken attribute handling
version 2.5.5: tested; tests pass, but known as broken
up to version 2.5.11: tested; working
version 2.6.0: tested; not working
up to version 2.6.2: tested; working
version 2.6.3: tested; not working
version 2.6.4: tested; not working (XML Schema errors)
version 2.6.5: tested; not working (broken XIncludes)
up to version 2.6.8: tested; working
It can happen that an older version of libxml2 passes all tests
under certain conditions. This is no reason to assume that the version
works on all platforms. Versions of libxml2 are marked as not
working for good reasons.
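The status list above can be turned into a quick pre-flight check. The following is only an illustrative sketch: the hash holds the exact-version entries that the list marks as "not working" or "known as broken", the helper name is_known_broken() is made up for this example, and the real blacklist kept in Makefile.PL may well differ.

```perl
use strict;
use warnings;

# Exact versions the list above marks as not working or known as broken.
# Range entries (such as "past 2.5.0") are deliberately left out.
my %KNOWN_BROKEN = map { $_ => 1 } qw(2.4.25 2.5.5 2.6.0 2.6.3 2.6.4 2.6.5);

# Hypothetical helper: returns 1 if the dotted libxml2 version is listed
# above, 0 otherwise.
sub is_known_broken {
    my ($dotted_version) = @_;
    return exists $KNOWN_BROKEN{$dotted_version} ? 1 : 0;
}

for my $version (qw(2.6.2 2.6.3)) {
    printf "libxml2 %s: %s\n", $version,
        is_known_broken($version) ? 'known broken' : 'not listed as broken';
}
```

A version that is not listed is not thereby known to work; the check only mirrors the entries above.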
Notes for Microsoft Windows
Thanks to Randy Kobes there is a precompiled PPM package
available on
http://theoryx5.uwinnipeg.ca/ppmpackages/
Usually it takes a little time to build the package for the
latest release.
Notes for Mac OS X
Due to refactoring of the module, XML::LibXML will not run on some
versions of Mac OS X anymore. It appears this is related to special
linker options for that OS prior to version 10.2.2. Since I don't have
full access to this OS, help and patches from OS X gurus are highly
appreciated.
It is confirmed that XML::LibXML builds and runs without
problems since Mac OS X 10.2.6.
Notes for HPUX
XML::LibXML requires libxml2 2.4.25 or later. That means there
may not be a usable binary libxml2 package for HPUX and
XML::LibXML. For some reason the HPUX cc will not compile libxml2
correctly, which will force you to recompile perl with gcc (if you
haven't already done that).
Additionally I received the following Note from Rozi Kovesdi:
Here is my report if someone else runs into the same problem:
Finally I am done with installing all the libraries and XML Perl
modules
The combination that worked best for me was:
gcc
GNU make
Most importantly - before trying to install Perl modules that depend on
libxml2:
must set SHLIB_PATH to include the path to libxml2 shared library
assuming that you used the default:
export SHLIB_PATH=/usr/local/lib
also, make sure that the config files have execute permission:
/usr/local/bin/xml2-config
/usr/local/bin/xslt-config
they did not have +x after they were installed by 'make install'
and it took me a while to realize that this was my problem
or one can use:
perl Makefile.PL LIBS='-L/path/to/lib' INC='-I/path/to/include'
Contact
For suggestions etc. you may contact the maintainer directly
christian.glahn@uibk.ac.at
For bug reports, please use the CPAN request tracker on
http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-LibXML
Also XML::LibXML issues are discussed among other things on the
perl XML mailing list (perl-xml@listserv.ActiveState.com).
In case of problems you should check the archives of that list first.
Many problems are already discussed there. You can find the list's
archives at http://mailarchive.activestate.com/browse/perl-xml/
Package History
Versions < 0.98 were maintained by Matt Sergeant
Versions >= 0.98 and < 1.49 were maintained by Matt Sergeant and
Christian Glahn
Versions >= 1.49 are maintained by Christian Glahn
Versions > 1.56 are co-maintained by Petr Pajas
Patches and Developer Version
As XML::LibXML is open source software, help and patches are
appreciated. If you find a bug in the current release, make sure this
bug still exists in the developer version of XML::LibXML. This version
can be checked out from CVS with
cvs -d:pserver:anonymous@axkit.org:/home/cvs -z3 co XML-LibXML
Note that this account does not allow direct commits.
Please consider the tests as correct. If any test fails it is most
certainly related to a bug.
If you find documentation bugs, please fix them in the libxml.dkb
file, stored in the docs directory.
Known Issues
The push-parser implementation causes memory leaks.
License
LICENSE
This is free software, you may use it and distribute it under the
same terms as Perl itself.
Copyright 2001-2003 AxKit.com Ltd, All rights reserved.
Disclaimer
THIS PROGRAM IS DISTRIBUTED IN THE HOPE THAT IT WILL BE USEFUL,
BUT WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED WARRANTY OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Perl Binding for libxml2
XML::LibXML
Synopsis
use XML::LibXML;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string(<<'EOT');
<some-xml/>
EOT
Description
This module is an interface to the gnome libxml2 DOM and SAX
parser and the DOM tree. It also provides an XML::XPath-like findnodes()
interface, providing access to the XPath API in libxml2. The module is
split into several packages which are not described in this section.
For further information, please check the following documentation:
XML::LibXML::Parser
Parsing XML Files with XML::LibXML
XML::LibXML::DOM
XML::LibXML DOM Implementation
XML::LibXML::SAX
XML::LibXML direct SAX parser
XML::LibXML::Document
XML::LibXML DOM Document Class
XML::LibXML::Node
Abstract Base Class of XML::LibXML Nodes
XML::LibXML::Element
XML::LibXML Class for Element Nodes
XML::LibXML::Text
XML::LibXML Class for Text Nodes
XML::LibXML::Comment
XML::LibXML Comment Nodes
XML::LibXML::CDATASection
XML::LibXML Class for CDATA Sections
XML::LibXML::Attr
XML::LibXML Attribute Class
XML::LibXML::DocumentFragment
XML::LibXML's DOM L2 Document Fragment Implementation
XML::LibXML::Namespace
XML::LibXML Namespace Implementation
XML::LibXML::PI
XML::LibXML Processing Instructions
XML::LibXML::Dtd
XML::LibXML DTD Support
XML::LibXML::RelaxNG
XML::LibXML frontend for RelaxNG schema validation
XML::LibXMLguts
Internal of the Perl Layer for libxml2 (not done yet)
Version Information
Sometimes it is useful to find out which version of libxml2
XML::LibXML was compiled for. In most cases this is for debugging or to
check whether a given installation provides all the functionality a
package needs. The functions XML::LibXML::LIBXML_DOTTED_VERSION and
XML::LibXML::LIBXML_VERSION provide this version information. Both
functions simply pass through the values of the similarly named macros of
libxml2.
XML::LibXML::LIBXML_DOTTED_VERSION
$Version_String = XML::LibXML::LIBXML_DOTTED_VERSION;
Returns the version string of the libxml2 version XML::LibXML
was compiled for. This will be "2.6.2" for "libxml2
2.6.2".
XML::LibXML::LIBXML_VERSION
$Version_ID = XML::LibXML::LIBXML_VERSION;
Returns the version id of the libxml2 version XML::LibXML
was compiled for. This will be "20602" for "libxml2
2.6.2". Don't confuse this version id with
$XML::LibXML::VERSION. The latter contains the version of
XML::LibXML itself while the former contains the version of libxml2
XML::LibXML was compiled for.
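The two values carry the same information. As a small sketch, assuming the numeric id is built as major * 10000 + minor * 100 + patch (which matches the "20602" for "2.6.2" example above), the dotted form can be recovered in plain Perl. The helper name dotted_from_id() is hypothetical and not part of XML::LibXML.

```perl
use strict;
use warnings;

# Hypothetical helper: split a numeric version id such as 20602 into its
# major, minor and patch parts and join them with dots.
sub dotted_from_id {
    my ($id) = @_;
    my $major = int($id / 10000);
    my $minor = int($id / 100) % 100;
    my $patch = $id % 100;
    return join '.', $major, $minor, $patch;
}

print dotted_from_id(20602), "\n";    # prints 2.6.2
```

The numeric form is convenient for comparisons such as "at least 2.4.25", since plain numeric comparison on the id then does the right thing.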
Related Modules
The modules described in this section are not part of the
XML::LibXML package itself. As they support some additional features,
they are mentioned here.
XML::LibXSLT
XSLT Processor using libxslt and XML::LibXML
XML::LibXML::Common
Common functions for XML::LibXML related Classes
XML::LibXML::Iterator
XML::LibXML Implementation of the DOM Traversal
Specification
XML::LibXML::XPathContext
Advanced XPath processing using libxml2 and XML::LibXML
XML::LibXML and XML::GDOME
Note: THE FUNCTIONS DESCRIBED HERE ARE STILL
EXPERIMENTAL
Although both modules make use of libxml2's XML capabilities,
the DOM implementations of the two modules are not compatible. It is
still possible to exchange nodes from one DOM to the other. The concept of
this exchange is similar to the function cloneNode(): the
particular node is copied at a low level to the opposite DOM
implementation.
Since the DOM implementations cannot coexist within one document,
one is forced to copy each node that should be used. Because you are
always keeping two nodes, this may have quite an impact on a machine's
memory usage.
XML::LibXML provides two functions to export or import GDOME
nodes: import_GDOME() and export_GDOME(). Both functions have two
parameters: the node and a flag for recursive import. The flag works as
in cloneNode().
The two functions allow XML::GDOME nodes to be exported and imported
explicitly. However, XML::LibXML also allows the transparent import of
XML::GDOME nodes in functions such as appendChild(), insertAfter() and
so on. While native nodes are automatically adopted in most functions,
XML::GDOME nodes are always cloned in advance. Thus if the original node
is modified after the operation, the node in the XML::LibXML document
will not reflect the change.
import_GDOME
$libxmlnode = XML::LibXML->import_GDOME( $node, $deep );
This clones an XML::GDOME node to a XML::LibXML node
explicitly.
export_GDOME
$gdomenode = XML::LibXML->export_GDOME( $node, $deep );
Clones an XML::LibXML node into an XML::GDOME node.
Parsing XML Data with XML::LibXML
XML::LibXML::Parser
Synopsis
use XML::LibXML;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string(<<'EOT');
<some-xml/>
EOT
my $fdoc = $parser->parse_file( $xmlfile );
my $fhdoc = $parser->parse_fh( $xmlstream );
my $fragment = $parser->parse_xml_chunk( $xml_wb_chunk );
Parsing
An XML document is read into a data structure such as a DOM tree by
a piece of software called a parser. XML::LibXML currently provides
four different parser interfaces:
A DOM Pull-Parser
A DOM Push-Parser
A SAX Parser
A DOM based SAX Parser.
Creating a Parser Instance
XML::LibXML provides an OO interface to the libxml2 parser
functions. Thus you have to create a parser instance before you can
parse any XML data.
new
$parser = XML::LibXML->new();
There is nothing much to say about the constructor. It
simply creates a new parser instance.
Although libxml2 uses mainly global flags to alter the
behaviour of the parser, each XML::LibXML parser instance has
its own flags or callbacks and does not interfere with other
instances.
DOM Parser
One of the common parser interfaces of XML::LibXML is the DOM
parser. This parser reads XML data into a DOM like datastructure, so
each tag can get accessed and transformed.
XML::LibXML's DOM parser is capable of parsing not only XML
data, but also (strict) HTML and SGML files. There are three ways to
parse documents: as a string, as a Perl filehandle, or as a filename.
The return value from each is an XML::LibXML::Document object, which is
a DOM object.
All of the functions listed below will throw an exception if the
document is invalid. To prevent this from terminating your program,
wrap the call in an eval{} block.
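A minimal sketch of that advice follows. It only uses new() and parse_string() as described in this document; the helper name try_parse() and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();

# Hypothetical helper: returns the parsed document, or undef if the
# parser threw an exception inside the eval block.
sub try_parse {
    my ($xml) = @_;
    my $doc = eval { $parser->parse_string($xml) };
    return $doc;
}

my $good = try_parse('<root><item/></root>');
my $bad  = try_parse('<root><item></root>');    # mismatched tags

print defined $good ? "first document parsed\n"  : "first document failed\n";
print defined $bad  ? "second document parsed\n" : "second document failed\n";
```

When the eval block is left by an exception, the error itself is available in $@ directly afterwards.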
parse_file
$doc = $parser->parse_file( $xmlfilename );
This function reads the file given by an absolute filename into
memory. It causes XML::LibXML to use libxml2's file parser instead
of letting perl read the file, as happens with parse_fh(). If you
need to parse files directly, this function is the faster
choice, since it is about 6-8 times faster than
parse_fh().
parse_fh
$doc = $parser->parse_fh( $io_fh );
parse_fh() parses an IOREF or a subclass of IO::Handle.
Because the data comes from an open handle, libxml2's
parser does not know about the base URI of the document. To set
the base URI one should use parse_fh() as follows:
my $doc = $parser->parse_fh( $io_fh, $baseuri );
parse_string
$doc = $parser->parse_string( $xmlstring);
This function is similar to parse_fh(), but it parses an
XML document that is available as a single string in memory.
Again, you can pass an optional base URI to the function.
my $doc = $parser->parse_string( $xmlstring, $baseuri );
parse_html_file
$doc = $parser->parse_html_file( $htmlfile );
Similar to parse_file() but parses HTML (strict)
documents.
parse_html_fh
$doc = $parser->parse_html_fh( $io_fh );
Similar to parse_fh() but parses HTML (strict) streams.
parse_html_string
$doc = $parser->parse_html_string( $htmlstring );
Similar to parse_string() but parses HTML (strict) strings.
parse_sgml_file
$doc = $parser->parse_sgml_file( $sgmlfile );
Similar to parse_file() but parses SGML documents.
parse_sgml_fh
$doc = $parser->parse_sgml_fh( $io_fh );
Similar to parse_fh() but parses SGML streams.
parse_sgml_string
$doc = $parser->parse_sgml_string( $sgmlstring );
Similar to parse_string() but parses SGML strings.
Parsing HTML may cause problems, especially if the ampersand
('&') is used. This is a common problem if HTML code is
parsed that contains links to CGI scripts. Such links cause the parser
to throw errors. In such cases libxml2 still parses the entire
document as if there were no error, but the error causes XML::LibXML to
stop the parsing process. However, the document is not lost. Such HTML
documents should be parsed using the recover
flag. By default recovering is deactivated.
The functions described above are implemented to parse well-formed
documents. In some cases a program receives well-balanced XML
instead of a well-formed document (e.g. an XML fragment from a
database). With XML::LibXML there is no need to wrap such fragments
in a root element in your own code, because XML::LibXML can parse
well-balanced XML fragments directly.
parse_balanced_chunk
$fragment = $parser->parse_balanced_chunk( $wbxmlstring );
This function parses a well balanced XML string into a
XML::LibXML::DocumentFragment.
parse_xml_chunk
$fragment = $parser->parse_xml_chunk( $wbxmlstring );
This is the old name of parse_balanced_chunk(). Because it
may cause confusion with the push parser interface, this
function should not be used anymore.
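As a short sketch of parse_balanced_chunk(), the following parses two sibling elements that have no common root. The input string and variable names are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();

# Two sibling elements without a common root element:
# well balanced, but not a well-formed document.
my $fragment = $parser->parse_balanced_chunk('<a>one</a><b>two</b>');

# The fragment holds both elements as its direct children.
my @names = map { $_->nodeName } $fragment->childNodes;
print join(', ', @names), "\n";
```

The resulting XML::LibXML::DocumentFragment can then be handed to DOM functions such as appendChild() to insert its children into an existing document.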
By default XML::LibXML does not process XInclude tags within an
XML document (see the options section below). XML::LibXML allows you
to post-process a document to expand XInclude tags.
process_xincludes
$parser->process_xincludes( $doc );
After a document is parsed into a DOM structure, you may
want to expand the document's XInclude tags. This function
processes the given document structure and expands all XInclude
tags (or throws an error) by using the flags and callbacks of
the given parser instance.
Note that the resulting tree contains some extra nodes (of
type XML_XINCLUDE_START and XML_XINCLUDE_END) after successfully
processing the document. These nodes indicate where data was
included into the original tree. If the document is serialized,
these extra nodes will not show up.
Remember: a document with processed XIncludes differs from
the original document after serialization, because the original
XInclude tags will not get restored!
If the parser flag "expand_xinclude" is set to 1,
you do not need to post-process the parsed document.
processXIncludes
$parser->processXIncludes( $doc );
This is an alias for process_xincludes, with a Java-like
function name.
Push Parser
XML::LibXML provides a push parser interface. Rather than
pulling the data from a given source the push parser waits for the
data to be pushed into it.
This allows one to parse large documents without waiting for the
parser to finish. The interface is especially useful if a program
needs to preprocess the incoming pieces of XML (e.g. to detect
document boundaries).
While the XML::LibXML parse_*() functions require the data to be
well-formed XML, the push parser will take any arbitrary string that
contains some XML data. The only requirement is that all the pushed
strings together form a well-formed document. With the push parser
interface a program can interrupt the parsing process as required,
where the parse_*() functions do not give enough flexibility.
Unlike the pull parser implemented in parse_fh() or
parse_file(), the push parser is not able to detect the
document's end itself. Thus the calling program needs to indicate
explicitly when the parsing is done.
In XML::LibXML this is done by a single function:
parse_chunk
$parser->parse_chunk($string, $terminate);
parse_chunk() tries to parse a given chunk of data, which
isn't necessarily well-balanced data. The function takes two
parameters: the chunk of data as a string and, optionally, a
termination flag. If the termination flag is set to a true value
(e.g. 1), the parsing will be stopped and the resulting document
will be returned, as the following example shows:
my $parser = XML::LibXML->new;
for my $string ( "<", "foo", ' bar="hello world"', "/>") {
    $parser->parse_chunk( $string );
}
my $doc = $parser->parse_chunk("", 1); # terminate the parsing
Internally XML::LibXML provides three functions that control the
push parser process:
start_push
$parser->start_push();
Initializes the push parser.
push
$parser->push(@data);
This function pushes the data stored inside the array to
libxml2's parser. Each entry in @data must be a normal
scalar!
finish_push
$doc = $parser->finish_push( $recover );
This function returns the result of the parsing process.
If this function is called without a parameter it will complain
about non-well-formed documents. If $recover is 1, the push
parser can be used to restore broken or non-well-formed (XML)
documents, as the following examples show:
eval {
    $parser->push( "<foo>", "bar" );
    $doc = $parser->finish_push(); # will report broken XML
};
if ( $@ ) {
    # ...
}
This can be annoying if the closing tag is missed by
accident. The following code will restore the document:
eval {
    $parser->push( "<foo>", "bar" );
    $doc = $parser->finish_push(1); # will return the data parsed
                                    # unless an error happened
};
print $doc->toString(); # returns "<foo>bar</foo>"
Of course finish_push() will return nothing if there was
no data pushed to the parser before.
DOM based SAX Parser
XML::LibXML provides a DOM based SAX parser. The SAX parser is
defined in XML::LibXML::SAX::Parser. As it is not a stream based
parser, it parses documents into a DOM and traverses the DOM tree
instead.
The API of this parser is exactly the same as any other Perl
SAX2 parser. See XML::SAX::Intro for details.
Aside from the regular parsing methods, you can access the DOM
tree traverser directly, using the generate() method:
my $doc = build_yourself_a_document();
my $saxparser = XML::LibXML::SAX::Parser->new( ... );
$saxparser->generate( $doc );
This is useful for serializing DOM trees, for example trees that
you have already processed, or that you have as a result of
XSLT processing.
WARNING
This is NOT a streaming SAX parser. As I said above, this parser
reads the entire document into a DOM and serialises it. Some people
couldn't read that in the paragraph above so I've added this
warning.
If you want a streaming SAX parser, look at the XML::LibXML::SAX
man page.
Serialization
XML::LibXML provides some functions to serialize nodes and
documents. The serialization functions are described on the
XML::LibXML::Node manpage or the XML::LibXML::Document manpage.
XML::LibXML checks three global flags that alter the serialization
process:
skipXMLDeclaration
skipDTD
setTagCompression
Of these three flags, only setTagCompression is available for
all serialization functions.
Because XML::LibXML does not set these flags itself, one has to define
them locally as the following example shows:
local $XML::LibXML::skipXMLDeclaration = 1;
local $XML::LibXML::skipDTD = 1;
local $XML::LibXML::setTagCompression = 1;
If skipXMLDeclaration is defined and not '0', the XML
declaration is omitted during serialization.
If skipDTD is defined and not '0', an existing DTD will
not be serialized with the document.
If setTagCompression is defined and not '0', empty tags are
displayed as open and closing tags rather than the shortcut. For
example the empty tag foo will be rendered as
<foo></foo> rather than
<foo/>.
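The effect of setTagCompression can be sketched as follows. The example relies only on the flag name and the toString() serialization function mentioned in this document; the variable names are made up, and the block scope around local is one possible way to limit the flag to a single call.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML->new()->parse_string('<foo/>');
my $root = $doc->documentElement;

my $expanded;
{
    # local restores the previous flag value when this block ends.
    local $XML::LibXML::setTagCompression = 1;
    $expanded = $root->toString();
}

# Outside the block the flag is back at its previous value.
my $compact = $root->toString();

print "$expanded\n$compact\n";
```

Serializing the element node rather than the whole document keeps the XML declaration out of the output, so only the tag form differs between the two strings.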
Parser Options
LibXML options are global (unfortunately this is a limitation of
the underlying implementation, not this interface). They can be
set using either $parser->option(...) or XML::LibXML->option(...);
both are treated in the same manner. Note that even two parser processes
will share some of the same options, so be careful out there!
Every option returns the previous value, and can be called without
parameters to get the current value.
validation
$parser->validation(1);
Turn validation on (or off). Defaults to off.
recover
$parser->recover(1);
Turn the parser's recover mode on (or off). Defaults to off.
This allows one to parse broken XML data into memory. This
switch will only work with XML data rather than HTML data. Also,
validation will be switched off automatically.
The recover mode helps to recover documents that are almost
well formed very efficiently. An example is a document that
forgets to close the document tag (or any other tag inside the
document). The recover mode of XML::LibXML has problems restoring
documents that are more like well-balanced chunks.
XML::LibXML will only parse until the first fatal error
occurs.
expand_entities
$parser->expand_entities(0);
Turn entity expansion on or off, enabled by default. If
entity expansion is off, any external parsed entities in the
document are left as entities. Probably not very useful for most
purposes.
keep_blanks
$parser->keep_blanks(0);
Allows you to turn off XML::LibXML's default behaviour
of maintaining whitespace in the document.
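The difference can be made visible by counting the child nodes of an indented element before and after switching the option off. This is only a sketch with a made-up input string; which whitespace the parser treats as ignorable is decided by libxml2.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = "<root>\n  <item/>\n</root>";

my $parser = XML::LibXML->new();

# Default behaviour: whitespace-only text nodes are kept in the tree.
my @with_blanks = $parser->parse_string($xml)->documentElement->childNodes;

# Ask the parser to drop ignorable whitespace, then parse again.
$parser->keep_blanks(0);
my @without_blanks = $parser->parse_string($xml)->documentElement->childNodes;

printf "children with blanks kept: %d, with blanks dropped: %d\n",
    scalar @with_blanks, scalar @without_blanks;
```

A single parser instance is used on purpose, so the sketch does not depend on whether the option is stored per instance or globally.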
pedantic_parser
$parser->pedantic_parser(1);
You can make XML::LibXML more pedantic if you want to.
line_numbers
$parser->line_numbers(1);
If this option is activated XML::LibXML will store the line
number of a node. This gives more information about where a validation
error occurred. It can also be used to find out the
position of a node after parsing (see also
XML::LibXML::Node::line_number()).
By default line numbering is switched off (0).
load_ext_dtd
$parser->load_ext_dtd(1);
Load external DTD subsets while parsing.
complete_attributes
$parser->complete_attributes(1);
Complete the elements' attribute lists with the ones
defaulted from the DTDs. By default, this option is enabled.
expand_xinclude
$parser->expand_xinclude(1);
Expands XInclude tags immediately while parsing the
document. This flag ensures that the parser callbacks are used
while parsing the included document.
load_catalog
$parser->load_catalog( $catalog_file );
Will use $catalog_file as a catalog during all parsing
processes. Using a catalog will significantly speed up parsing
processes if many external resources are loaded into the parsed
documents (such as DTDs or XIncludes).
Note that catalogs will not be available if an external
entity handler was specified. At the current state it is not
possible to make use of both types of resolving systems at the
same time.
base_uri
$parser->base_uri( $your_base_uri );
In case of parsing strings or file handles, XML::LibXML
doesn't know about the base uri of the document. To make
relative references such as XIncludes work, one has to set a
separate base URI, that is then used for the parsed documents.
gdome_dom
$parser->gdome_dom(1);
THIS FLAG IS EXPERIMENTAL!
Although quite powerful, XML::LibXML's DOM implementation
is limited if one needs or wants full DOM level 2 or level 3
support. XML::GDOME is based on libxml2 as well but provides a
rather complete DOM implementation by wrapping libgdome. This
allows you to make use of XML::LibXML's full parser options
and XML::GDOME's DOM implementation at the same time.
To make use of this function, one has to install libgdome
and configure XML::LibXML to use this library. For this you need
to rebuild XML::LibXML!
clean_namespaces
$parser->clean_namespaces( 1 );
libxml2 2.6.0 and later can strip redundant namespace
declarations from the DOM tree. To do this, one has to set
clean_namespaces() to 1 (TRUE). By default no namespace cleanup is
done.
Input Callbacks
If libxml2 has to load external documents during parsing, this
may cause strange results if the location is not an HTTP, FTP or
relative location. To get around this limitation, one may add one's own
input handler to open, read and close particular locations or URI
classes.
The input callbacks are used whenever LibXML has to get
something other than external parsed entities from somewhere. The
input callbacks in LibXML are stacked on top of the original input
callbacks within the libxml library. This means that if you decide not
to use your own callbacks (see match()), then you can revert to the
default way of handling input. This allows you, for example, to
handle only certain URI schemes.
Callbacks are only used on files, not on strings or
filehandles. This is because LibXML requires the match event to find
out which callback set shall be used for the current input
stream. LibXML can decide this only before the stream is opened. For
LibXML, strings and filehandles are already opened streams.
The following callbacks are defined:
match_callback
$parser->match_callback($subref);
If you want to handle the URI, simply return a true value
from this callback.
open_callback
$parser->open_callback($subref);
Open something and return it to handle that resource.
read_callback
$parser->read_callback($subref);
Read a certain number of bytes from the resource. This
callback is called even if the entire document has already been read.
This callback has to return a string which will be parsed by the
libxml2 parser.
close_callback
$parser->close_callback($subref);
Close the handle associated with the resource.
One must not create a new parser instance
and parse some XML data from within any callback. This is forbidden,
because the new parser will override the existing callbacks and will
leave the calling parser in an undefined state. Most likely memory
violations will follow and break the running parsing process without
returning control to the perl layer.
The following example explains the concept a bit. It is a purely
fictitious example that uses a MyScheme::Handler object that responds
to methods similar to an IO::Handle.
$parser->match_callback(\&match_uri);
$parser->open_callback(\&open_uri);
$parser->read_callback(\&read_uri);
$parser->close_callback(\&close_uri);
# Claim every URI that uses the fictitious myscheme: scheme.
sub match_uri {
    my $uri = shift;
    return $uri =~ /^myscheme:/;
}
# Return a handler object for the resource.
sub open_uri {
    my $uri = shift;
    return MyScheme::Handler->new($uri);
}
# Read up to $length bytes and hand them to the parser.
sub read_uri {
    my $handler = shift;
    my $length = shift;
    my $buffer;
    $handler->read($buffer, $length);
    return $buffer;
}
# Release the handler once the parser is done with the resource.
sub close_uri {
    my $handler = shift;
    $handler->close();
}
A more realistic example can be found in the "example"
directory.
Since the parser requires all callbacks to be defined, it is also
possible to set all callbacks with a single call of callbacks(). This
would simplify the example code to:
$parser->callbacks( \&match_uri, \&open_uri, \&read_uri, \&close_uri);
All functions that are used to set the callbacks, can also be
used to retrieve the callbacks from the parser.
Optionally it is possible to apply global callbacks on the
XML::LibXML class level. This allows multiple parsers to share the same
callbacks. To set these global callbacks one can use the callback
access functions directly on the class.
XML::LibXML->callbacks( \&match_uri, \&open_uri, \&read_uri, \&close_uri);
The previous code snippet will set the callbacks from the first
example as global callbacks.
Error Reporting
XML::LibXML throws exceptions during parsing, validation or XPath
processing (and on some other occasions). These errors can be caught by
using eval blocks. The error will then be stored in
$@. Alternatively one can use the get_last_error()
function of XML::LibXML. It will return the same string that is stored
in $@.
Note that the use of get_last_error() still requires eval blocks,
since these function groups will die() on errors.
XML::LibXML throws errors as they occur and does not wait for the user
to test for them. This is a very common misunderstanding in the use of
XML::LibXML. If the eval is omitted, XML::LibXML will always halt your
script by "croaking" (see Carp man page for details).
Also note that an increasing number of functions throw errors if bad
data is passed. If you cannot ensure that valid data is passed to
XML::LibXML, you should wrap these functions in eval.
get_last_error() can be called either by the class itself or by a
parser instance:
$errstring = XML::LibXML->get_last_error();
$errstring = $parser->get_last_error();
However, XML::LibXML exceptions are global. That means if
get_last_error() is called on a parser instance, the last
global error will be returned. This is not
necessarily the error caused by the parser instance itself.
XML::LibXML direct SAX parser
XML::LibXML::SAX
Description
XML::LibXML provides an interface to libxml2's direct SAX interface.
Through this interface it is possible to generate SAX events directly
while parsing a document. While using the SAX parser XML::LibXML will
not create a DOM document tree.
Such an interface is useful if very large XML documents have to be
processed and no DOM functions are required. By using this interface it
is possible to read data stored within an XML document directly into the
application data structures without loading the whole document into memory.
The SAX interface of XML::LibXML is based on the famous XML::SAX
interface. It uses the generic interface as provided by XML::SAX::Base.
In addition to the generic functions, which are only able to
process entire documents, XML::LibXML::SAX provides
parse_chunk(). This method generates SAX events
from well-balanced data such as is often provided by databases.
NOTE: At the moment XML::LibXML provides only
an incomplete interface to libxml2's native SAX implementation. The
current implementation has not been tested in production environments. It may
cause significant memory problems or show wrong behaviour. If you run
into specific problems using this part of XML::LibXML, let me know.
Building DOM trees from SAX events.
XML::LibXML::SAX::Builder
Synopsis
my $builder = XML::LibXML::SAX::Builder->new();
my $gen = XML::Generator::DBI->new(Handler => $builder, dbh => $dbh);
$gen->execute("SELECT * FROM Users");
my $doc = $builder->result();
Description
This is a SAX handler that generates a DOM tree from SAX events.
Usage is as above. Input is accepted from any SAX1 or SAX2 event
generator.
Building DOM trees from SAX events is quite easy with
XML::LibXML::SAX::Builder. The class is designed as a SAX2 final handler,
not as a filter!
Since SAX is strictly stream oriented, you should not expect
anything to return from a generator. Instead you have to ask the builder
instance directly to get the document built.
XML::LibXML::SAX::Builder's result() function holds the document
generated from the last SAX stream.
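As a hedged sketch of the same idea without a database: any SAX generator can feed the builder. Here XML::LibXML::SAX is assumed to act as the generator (it accepts the usual XML::SAX::Base Handler option), and the input string is invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::SAX;
use XML::LibXML::SAX::Builder;

# The builder is the final handler of the SAX pipeline.
my $builder = XML::LibXML::SAX::Builder->new();

# Assumption: XML::LibXML::SAX is used as the event generator here.
my $generator = XML::LibXML::SAX->new( Handler => $builder );
$generator->parse_string('<users><user name="alice"/></users>');

# The generator's return value is ignored; the builder is asked
# directly for the document it has built.
my $doc = $builder->result();
print $doc->documentElement->nodeName, "\n";
```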
XML::LibXML DOM Implementation
XML::LibXML::DOM
Description
XML::LibXML provides a lightweight interface to
modify a node of the document tree generated by the
XML::LibXML parser. This interface follows the DOM
Level 3 specification as far as possible. In addition to the specified functions,
XML::LibXML supports some functions that are handier to use in the
Perl environment.
One also has to remember that XML::LibXML is an interface to
libxml2 nodes, which actually reside on the C level of XML::LibXML. This
means each node is a reference to a structure that is neither a Perl hash
nor an array. The only way to access the values of these structures is through
the DOM interface provided by XML::LibXML. This also means that one
can't simply inherit from an XML::LibXML node and add
new member variables as if they were hash keys.
The DOM interface of XML::LibXML does not intend to implement a
full DOM interface as is done by XML::GDOME and used for full-featured
applications. Rather, it offers a simple way to build or
modify documents that are created by XML::LibXML's parser.
Another goal of the XML::LibXML interface is to make the
interfaces of libxml2 available to the Perl community. This also
includes workarounds for some features where libxml2 assumes more
control over the C level than most Perl users have.
One of the most important aspects of the XML::LibXML DOM interface
is that the interfaces try to follow the DOM Level 3 specification
rather strictly. This means the interface functions are named as the DOM
specification says and not what widespread Java interfaces claim to be
standard. For several functions that have only a single
interface conforming to the DOM spec, XML::LibXML provides an
additional Java-style alias.
Some function interfaces are also left over from early
stages of XML::LibXML. These interfaces are kept
for compatibility reasons only. They might
disappear in a future version of XML::LibXML, so users are
requested to switch over to the official functions.
More recent versions of Perl (5.6.1 or higher) support
special flags to distinguish between UTF8 and so-called binary data.
For these versions XML::LibXML provides functionality to make efficient
use of these flags: If a document has set an encoding other than UTF8,
all strings that are not already in UTF8 are implicitly converted from the
document encoding to UTF8. On output these strings are returned
as UTF8 unless the user explicitly requests the original (i.e.
document) encoding.
Older versions of Perl (such as 5.00503 or earlier) do not support
these flags. If XML::LibXML is built for these versions, all strings
have to be encoded to UTF8 manually before they are passed to any DOM
functions.
NOTE: XML::LibXML's magic encoding may
not work on all platforms. Some platforms are known to have a broken
iconv(), which is partly used by libxml2. To test if your platform works
correctly with your language encoding, build a simple document in the
particular encoding and try to parse it with XML::LibXML. If your
document gets parsed without causing any segmentation faults, bus
errors or whatever else your OS throws, the encoding is handled correctly
on your platform. An example of such a test can be
found in test 19encoding.t of the distribution.
Namespaces and XML::LibXML's DOM implementation
XML::LibXML's DOM implementation follows the DOM
implementation of libxml2. This is important to know if namespaces are
used. Namespaces cannot be declared on a document node. This is basically
because XPath doesn't know about document nodes. Therefore
namespaces have to be declared on element nodes. This can happen
explicitly by using XML::LibXML::Element's setNamespace() function or
more or less implicitly by using XML::LibXML::Document's
createElementNS() or createAttributeNS() function. If a namespace is
not declared on the documentElement, the namespace will be locally
declared for the newly created node. In the case of attributes this may look
a bit confusing, since these nodes cannot have namespace declarations
themselves. In this case the namespace is internally applied to the
attribute and later declared on the node the attribute is appended to.
The following example may explain this a bit:
my $doc = XML::LibXML->createDocument;
my $root = $doc->createElementNS( "", "foo" );
$doc->setDocumentElement( $root );
my $attr = $doc->createAttributeNS( "bar", "bar:foo", "test" );
$root->setAttributeNodeNS( $attr );
This piece of code will result in the following document:
<?xml version="1.0"?>
<foo xmlns:bar="bar" bar:foo="test"/>
Note that the namespace is declared on the document element during
the setAttributeNodeNS() call.
Here it is important to repeat the specification: While working
with namespaces you should use the namespace aware functions instead of
the simplified versions. For example you should never
use setAttributeNode() but setAttributeNodeNS().
XML::LibXML DOM Document Class
XML::LibXML::Document
The Document Class is in most cases the result of a parsing process.
But sometimes it is necessary to create a Document from scratch. The DOM
Document Class provides functions that conform to the DOM Core naming
style.
It inherits all functions from XML::LibXML::Node
as specified in the DOM specification. This enables access to the nodes
besides the root element on document level - a DTD
for example. The support for these nodes is limited at the moment.
While nodes are generally bound to a document in the DOM concept, it
is suggested that one should always create a node not bound to any
document. There is no need of really including the node to the document,
but once the node is bound to a document, it is quite safe that all
strings have the correct encoding. If an unbound textnode with an iso
encoded string is created (e.g. with $CLASS->new()), the
toString function may not return the expected result.
All this seems like a limitation as long as UTF8 encoding is
assured. If iso encoded strings come into play it is much safer to use the
node creation functions of XML::LibXML::Document.
new
$dom = XML::LibXML::Document->new( $version, $encoding );
alias for createDocument()
createDocument
$dom = XML::LibXML::Document->createDocument( $version, $encoding );
The constructor for the document class. As parameters it takes
the version string and (optionally) the encoding string. Simply
calling createDocument() will create the
document:
<?xml version="your version" encoding="your encoding"?>
Both parameters are optional. The default value for
$version is 1.0, of
course. If the $encoding parameter is not set,
the encoding will be left unset, which means UTF8 is implied.
Calling createDocument() without any
parameters will result in the following code:
<?xml version="1.0"?>
Alternatively one can call this constructor directly from the
XML::LibXML class level, to avoid some typing. This will not have
any effect on the class instance, which is always
XML::LibXML::Document.
my $document = XML::LibXML->createDocument( "1.0", "UTF8" );
is therefore a shortcut for
my $document = XML::LibXML::Document->createDocument( "1.0", "UTF8" );
encoding
$strEncoding = $doc->encoding();
returns the encoding string of the document.
my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" );
print $doc->encoding; # prints ISO-8859-15
Optionally this function can be accessed by
actualEncoding or getEncoding.
setEncoding
$doc->setEncoding($new_encoding);
From time to time it is useful to change the effective
encoding of a document. This method provides the interface to
manipulate the encoding of a document.
Note that this function has to be used very carefully, because
you can't simply convert one encoding into any other: some
(or even all) characters may not exist in the new encoding.
XML::LibXML will not test if the operation is allowed or possible
for the given document. The only switching assured to work is to
UTF8.
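A short sketch of switching a document to UTF-8, the one conversion described above as assured to work. The element name is made up for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Start with a Latin-1 document.
my $doc = XML::LibXML::Document->new( '1.0', 'ISO-8859-1' );
$doc->setDocumentElement( $doc->createElement('greeting') );
my $before = $doc->encoding;

# Switch the effective encoding to UTF-8.
$doc->setEncoding('UTF-8');
my $after = $doc->encoding;

print "before: $before, after: $after\n";
print $doc->toString;
```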
version
$strVersion = $doc->version();
returns the version string of the document
getVersion() is an alternative form of
this function.
standalone
$doc->standalone
This function returns the numerical value of the standalone
attribute in a document's XML declaration. It returns 1
if standalone="yes" was found, 0 if
standalone="no" was found and -1 if
standalone was not specified (default on creation).
setStandalone
$doc->setStandalone($numvalue);
Through this method it is possible to alter the value of a
document's standalone attribute. Set it to 1 to
set standalone="yes", to 0 to set
standalone="no" or to -1 to
remove the standalone attribute from the XML declaration.
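The three standalone states can be tried out with a small sketch like the following; the root element name is only an illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new( '1.0', 'UTF-8' );
$doc->setDocumentElement( $doc->createElement('root') );

# A freshly created document has no standalone attribute.
my $initial = $doc->standalone;

# Request standalone="yes" in the XML declaration.
$doc->setStandalone(1);
my $declared = $doc->toString;

# Remove the attribute again.
$doc->setStandalone(-1);
my $removed = $doc->toString;

print "initial value: $initial\n";
print $declared;
print $removed;
```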
compression
my $compression = $doc->compression;
libxml2 allows reading of documents directly from gzipped
files. In this case the compression variable is set to the
compression level of that file (0-8). If XML::LibXML parsed a
different source or the file wasn't compressed, the returned
value will be -1.
setCompression
$doc->setCompression($ziplevel);
If one intends to write the document directly to a file, it is
possible to set the compression level for a given document. This
level can be in the range from 0 to 8. If XML::LibXML should not try
to compress use -1 (default).
Note that this feature will only work if
libxml2 is compiled with zlib support and toFile() is used for
output.
toString
$docstring = $dom->toString($format);
toString is a deparsing function, so the
DOM Tree can be translated into a string, ready for output.
The optional $format parameter sets the
indenting of the output. This parameter is expected to be an
integer value that specifies whether and how indentation
should be used. The format parameter can have three different values
if it is used:
If $format is 0, then the document is dumped as it was
originally parsed.
If $format is 1, libxml2 will add ignorable whitespace, so
the nodes' content is easier to read. Existing text nodes will not be
altered.
If $format is 2 (or higher), libxml2 will act as with $format == 1
but it adds a leading and a trailing linebreak to each text node.
libxml2 uses a hardcoded indentation of 2 space characters per
indentation level. This value cannot be altered at runtime.
NOTE: XML::LibXML::Document::toString
returns the data in the document encoding rather than UTF8! If you
want UTF8 encoded XML, you have to change the encoding by using
setEncoding().
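To compare the first two $format values, a sketch along these lines may help. The input document is invented and contains no text nodes, so libxml2 is free to indent it.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->new->parse_string('<a><b><c/></b></a>');

# $format = 0: serialized as parsed, without added whitespace.
my $plain = $doc->toString(0);

# $format = 1: libxml2 adds ignorable whitespace for readability.
my $pretty = $doc->toString(1);

print $plain;
print $pretty;
```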
toStringC14N
$c14nstr = $doc->toStringC14N($comment_flag,$xpath);
A variation of toString that returns the canonized form of the given
document.
serialize
$str = $doc->serialize($format);
Alternative form of toString(). This function name was added to be
more conformant with libxml2's examples.
serialize_c14n
$c14nstr = $doc->serialize_c14n($comment_flag,$xpath);
Alternative form of toStringC14N().
toFile
$state = $doc->toFile($filename, $format);
This function is similar to toString(), but it writes the
document directly to a file. This function is very useful
if one needs to store large documents.
The format parameter has the same behaviour as in toString().
toFH
$state = $doc->toFH($fh, $format);
This function is similar to toString(), but it writes the
document directly to a filehandle or a stream.
The format parameter has the same behaviour as in toString().
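A minimal sketch of both output functions, writing to temporary files created with File::Temp. The document content is made up; the second file is read back to show that the output is a complete document.

```perl
use strict;
use warnings;
use XML::LibXML;
use File::Temp qw(tempfile);

my $doc = XML::LibXML->new->parse_string('<log><entry>started</entry></log>');

# toFH(): write to an already opened filehandle.
my ( $fh, $fh_name ) = tempfile( UNLINK => 1 );
$doc->toFH( $fh, 0 );
close $fh or die "cannot close $fh_name: $!";

# toFile(): let XML::LibXML open and write the file itself.
my ( $tmp, $file_name ) = tempfile( UNLINK => 1 );
close $tmp;
$doc->toFile( $file_name, 0 );

# Read one of the files back to check the round trip.
my $reread = XML::LibXML->new->parse_file($file_name);
print $reread->findvalue('/log/entry'), "\n";
```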
toStringHTML
$str = $document->toStringHTML();
toStringHTML deparses the tree to a
string as HTML. With this method indenting is automatic and managed
by libxml2 internally.
serialize_html
$str = $document->serialize_html();
Alternative form of toStringHTML().
is_valid
$bool = $dom->is_valid();
Returns either TRUE or FALSE depending on whether the DOM Tree
is a valid Document or not.
You may also pass in a XML::LibXML::Dtd object, to validate
against an external DTD:
if (!$dom->is_valid($dtd)) {
warn("document is not valid!");
}
validate
$dom->validate();
This is an exception throwing equivalent of is_valid. If the
document is not valid it will throw an exception containing the
error. This allows you much better error reporting than simply
is_valid or not.
Again, you may pass in a DTD object.
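The difference between is_valid() and validate() might be illustrated as below. The DTD and both documents are invented for the example; the DTD object is created with XML::LibXML::Dtd->parse_string().

```perl
use strict;
use warnings;
use XML::LibXML;

# A tiny external DTD: <note> must contain exactly one <to>.
my $dtd = XML::LibXML::Dtd->parse_string(<<'DTD');
<!ELEMENT note (to)>
<!ELEMENT to (#PCDATA)>
DTD

my $parser = XML::LibXML->new();
my $good   = $parser->parse_string('<note><to>Bob</to></note>');
my $bad    = $parser->parse_string('<note><from>Al</from></note>');

# is_valid() only reports true or false.
my $good_ok = $good->is_valid($dtd);
my $bad_ok  = $bad->is_valid($dtd);

# validate() throws, so the reason ends up in $@.
eval { $bad->validate($dtd) };
my $reason = $@;

print "good: ", ( $good_ok ? "valid" : "invalid" ), "\n";
print "bad:  ", ( $bad_ok  ? "valid" : "invalid" ), "\n";
print "reason: $reason\n";
```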
documentElement
$root = $dom->documentElement();
Returns the root element of the Document. A document can have
just one root element to contain the document's data.
Optionally one can use getDocumentElement.
setDocumentElement
$dom->setDocumentElement( $root );
This function enables you to set the root element for a
document. The function supports the import of a node from a
different document tree.
createElement
$element = $dom->createElement( $nodename );
This function creates a new Element Node bound to the DOM with
the name $nodename.
createElementNS
$element = $dom->createElementNS( $namespaceURI, $qname );
This function creates a new Element Node bound to the DOM with
the name $qname and placed in the given
namespace.
createTextNode
$text = $dom->createTextNode( $content_text );
Equivalent to createElement, but it
creates a Text Node bound to the DOM.
createComment
$comment = $dom->createComment( $comment_text );
Equivalent to createElement, but it
creates a Comment Node bound to the DOM.
createAttribute
$attrnode = $doc->createAttribute($name [,$value]);
Creates a new Attribute node.
createAttributeNS
$attrnode = $doc->createAttributeNS( $namespaceURI, $name [,$value] );
Creates an Attribute bound to a namespace.
createDocumentFragment
$fragment = $doc->createDocumentFragment()
This function creates a DocumentFragment.
createCDATASection
$cdata = $dom->createCDATASection( $cdata_content );
Similar to createTextNode and createComment, this function
creates a CDataSection bound to the current DOM.
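The node creation functions described so far can be combined as in the following sketch. All names and contents are illustrative only.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $root = $doc->createElement('note');
$doc->setDocumentElement($root);

# Every node is created through the document, so it is bound to it
# and its strings are handled in the document's encoding.
$root->appendChild( $doc->createComment('draft') );
$root->appendChild( $doc->createTextNode('1 < 2') );
$root->appendChild( $doc->createCDATASection('a & b') );

my $xml = $root->toString;
print "$xml\n";
```

Note how the text node is escaped on output while the CDATA section is written verbatim.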
createProcessingInstruction
my $pi = $doc->createProcessingInstruction( $target, $data );
Creates a processing instruction node.
Since this method is quite long one may use its short form
createPI().
createEntityReference
my $entref = $doc->createEntityReference($refname);
If a document has a DTD specified, one can create entity
references by using this function. If one wants to add an entity
reference to the document, this reference has to be created by this
function.
An entity reference is unique to a document and cannot be
passed to other documents as other nodes can be passed.
NOTE: Text content containing something
that looks like an entity reference will not be expanded to a real
entity reference unless it is a predefined entity:
my $string = "&foo;";
$some_element->appendText( $string );
print $some_element->textContent; # prints "&foo;"
createInternalSubset
$dtd = $document->createInternalSubset( $rootnode, $public, $system);
This function creates and adds an internal subset to the given
document. Because the function automatically adds the DTD to the
document there is no need to add the created node explicitly to the
document.
my $document = XML::LibXML::Document->new();
my $dtd = $document->createInternalSubset( "foo", undef, "foo.dtd" );
will result in the following XML document:
<?xml version="1.0"?>
<!DOCTYPE foo SYSTEM "foo.dtd">
By setting the public parameter it is possible to set PUBLIC
dtds to a given document. So
my $document = XML::LibXML::Document->new();
my $dtd = $document->createInternalSubset( "foo", "-//FOO//DTD FOO 0.1//EN", undef );
will cause the following declaration to be created on the
document:
<?xml version="1.0"?>
<!DOCTYPE foo PUBLIC "-//FOO//DTD FOO 0.1//EN">
createExternalSubset
$dtd = $document->createExternalSubset( $rootnode, $public, $system);
This function is similar to createInternalSubset()
but this DTD is considered to be external and is therefore not added
to the document itself. Nevertheless it can be used for validation
purposes.
importNode
$document->importNode( $node );
If a node is not part of a document, it can be imported to
another document. As specified in DOM Level 2 Specification the Node
will not be altered or removed from its original document ($node->cloneNode(1)
will get called implicitly).
NOTE: Don't try to use importNode()
to import subtrees that contain an entity reference - even if the
entity reference is the root node of the subtree. This will cause
serious problems to your program. This is a limitation of libxml2
and not of XML::LibXML itself.
adoptNode
$document->adoptNode( $node );
If a node is not part of a document, it can be imported to
another document. As specified in the DOM Level 3 Specification the Node
will not be altered but it will be removed from its original document.
After a document adopted a node, the node, its attributes and
all its descendants belong to the new document. Because the node
does not belong to the old document, it will be unlinked from its
old location first.
NOTE: Don't try to use adoptNode() to
import subtrees that contain entity references - even if the entity
reference is the root node of the subtree. This will cause serious
problems to your program. This is a limitation of libxml2 and not of
XML::LibXML itself.
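The difference between importNode() and adoptNode() can be sketched like this. The two documents are invented for the example and contain no entity references.

```perl
use strict;
use warnings;
use XML::LibXML;

my $src = XML::LibXML->new->parse_string('<src><item id="1"/></src>');
my $dst = XML::LibXML::Document->new('1.0');
$dst->setDocumentElement( $dst->createElement('dst') );

my $item = $src->documentElement->firstChild;

# importNode() copies: the original stays in the source document.
my $copy = $dst->importNode($item);
$dst->documentElement->appendChild($copy);
my $still_there = $src->documentElement->hasChildNodes;

# adoptNode() moves: the original is unlinked from the source document.
my $moved = $dst->adoptNode($item);
$dst->documentElement->appendChild($moved);
my $now_empty = !$src->documentElement->hasChildNodes;

print $dst->documentElement->toString, "\n";
```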
externalSubset
my $dtd = $doc->externalSubset;
If a document has an external subset defined it will be
returned by this function.
NOTE: Dtd nodes are no ordinary nodes in
libxml2. The support for these nodes in XML::LibXML is still
limited. In particular one may not want to use common node functions on
doctype declaration nodes!
internalSubset
my $dtd = $doc->internalSubset;
If a document has an internal subset defined it will be
returned by this function.
NOTE: Dtd nodes are no ordinary nodes in
libxml2. The support for these nodes in XML::LibXML is still
limited. In particular one may not want to use common node functions on
doctype declaration nodes!
setExternalSubset
$doc->setExternalSubset($dtd);
EXPERIMENTAL!
This method sets a DTD node as an external subset of the given
document.
setInternalSubset
$doc->setInternalSubset($dtd);
EXPERIMENTAL!
This method sets a DTD node as an internal subset of the given
document.
removeExternalSubset
my $dtd = $doc->removeExternalSubset();
EXPERIMENTAL!
If a document has an external subset defined it can be removed
from the document by using this function. The removed dtd node will
be returned.
removeInternalSubset
my $dtd = $doc->removeInternalSubset();
EXPERIMENTAL!
If a document has an internal subset defined it can be removed
from the document by using this function. The removed dtd node will
be returned.
getElementsByTagName
my @nodelist = $doc->getElementsByTagName($tagname);
Implements the DOM Level 2 function.
In SCALAR context this function returns a
XML::LibXML::NodeList object.
getElementsByTagNameNS
my @nodelist = $doc->getElementsByTagNameNS($nsURI,$tagname);
Implements the DOM Level 2 function.
In SCALAR context this function returns a
XML::LibXML::NodeList object.
getElementsByLocalName
my @nodelist = $doc->getElementsByLocalName($localname);
This allows the fetching of all nodes from a given document
with the given Localname.
In SCALAR context this function returns a
XML::LibXML::NodeList object.
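The three lookup functions differ in how they treat namespaces. The following sketch uses an invented document with one prefixed and two unprefixed book elements; the counts are stored in variables rather than assumed.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->new->parse_string(
    '<lib xmlns:x="urn:x"><book/><x:book/><book/></lib>'
);

# Matches on the qualified name, so x:book is not a "book".
my @by_name = $doc->getElementsByTagName('book');

# Matches on namespace URI plus local name.
my @by_ns = $doc->getElementsByTagNameNS( 'urn:x', 'book' );

# Matches on the local name only, whatever the namespace.
my @by_local = $doc->getElementsByLocalName('book');

printf "by name: %d, by namespace: %d, by local name: %d\n",
    scalar @by_name, scalar @by_ns, scalar @by_local;
```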
getElementsById
my $node = $doc->getElementsById($id);
This allows the fetching of the node at a given position in
the DOM.
Note: The Id of a node might change while manipulating the
document.
indexElements
$dom->indexElements();
This function causes libxml2 to stamp all elements in a
document with their document position index which considerably
speeds up XPath queries for large documents. It should only be used
with static documents that won't be further changed by any DOM
methods, because once a document is indexed, XPath will always
prefer the index to other methods of determining the document order
of nodes. XPath could therefore return improperly ordered node-lists
when applied on a document that has been changed after being
indexed. It is of course possible to use this method to re-index a
modified document before using it with XPath again. This function is
not a part of the DOM specification.
This function returns the number of elements indexed, -1 if an error
occurred, or -2 if this feature is not available in the running
libxml2.
Abstract Base Class of XML::LibXML Nodes
XML::LibXML::Node
XML::LibXML::Node defines functions that are common to all Node
Types. An XML::LibXML::Node should never be created standalone, but as an
instance of a high level class such as XML::LibXML::Element or XML::LibXML::Text.
The class itself should provide only common functionality. In XML::LibXML
each node is part either of a document or a document-fragment. Because of
this there is no node without a parent. This may cause confusion with
"unbound" nodes.
nodeName
$name = $node->nodeName;
Returns the node's name. This function is aware of
namespaces and returns the full name of the current node (prefix:localname).
setNodeName
$node->setNodeName( $newName );
In very limited situations, it is useful to change a node's
name. In the DOM specification this should throw an error. This
function is aware of namespaces.
isSameNode
$bool = $node->isSameNode( $other_node );
Returns TRUE (1) if the given nodes refer to the same node
structure, otherwise FALSE (0) is returned.
isEqual
$bool = $node->isEqual( $other_node );
Deprecated version of isSameNode().
NOTE: isEqual will change behaviour to
follow the DOM specification.
nodeValue
$content = $node->nodeValue;
If the node has any content (such as stored in a
text node) it can get requested through this
function.
NOTE: Element Nodes have no content per
definition. To get the text value of an Element use textContent()
instead!
textContent
$content = $node->textContent;
This function returns the content of all text nodes in the
descendants of the given node as specified in DOM.
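The difference between nodeValue and textContent shows up on an element with mixed content, as in this small sketch with made-up input.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->new->parse_string('<p>Hello <b>big</b> world</p>');
my $p   = $doc->documentElement;

# textContent collects the text of all descendant text nodes.
my $all_text = $p->textContent;

# nodeValue is meant for nodes that carry content themselves,
# such as the first text node below <p>.
my $first_text = $p->firstChild->nodeValue;

print "textContent: '$all_text'\n";
print "nodeValue of first child: '$first_text'\n";
```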
line_number
$lineno = $node->line_number();
This function returns the line number where the tag was found
during parsing. If a node is added to the document the line number
is 0. Problems may occur if a node from one document is passed to
another one.
Note: line_number() is special to XML::LibXML and not part of
the DOM specification.
If the line_numbers flag of the parser was not activated
before parsing, line_number() will always return 0.
nodeType
$type = $node->nodeType;
Returns the node's type. The possible types are described
in the libxml2 tree.h documentation. The return
value of this function is a numeric value. Therefore it differs from
the result of Perl's ref function.
unbindNode
$node->unbindNode()
Unbinds the Node from its siblings and Parent, but not from
the Document it belongs to. If the node is not inserted into the DOM
afterwards it will be lost after the program terminates. From a low
level view, the unbound node is stripped from the context it is in and
inserted into a (hidden) document-fragment.
removeChild
$childnode = $node->removeChild( $childnode )
This will unbind the Child Node from its parent
$node. The function returns the unbound node.
If $childnode is not a child of the given Node the
function will fail.
replaceChild
$oldnode = $node->replaceChild( $newNode, $oldNode )
Replaces the $oldNode with the
$newNode. The $oldNode
will be unbound from the Node. This function differs from the DOM L2
specification in one case: if the new node is not part of the
document, the node will be imported first.
replaceNode
$node->replaceNode($newNode);
This function is very similar to replaceChild(), but it
replaces the node itself rather than a childnode. This is useful if
a node found by any XPath function, should be replaced.
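A brief sketch of removeChild(), replaceChild() and replaceNode() on an invented three-element list:

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML->new->parse_string('<list><old/><gone/><self/></list>');
my $root = $doc->documentElement;
my ( $old, $gone, $self ) = $root->childNodes;

# removeChild() unbinds the child and returns it.
my $removed = $root->removeChild($gone);

# replaceChild() is called on the parent and returns the old node.
my $replaced = $root->replaceChild( $doc->createElement('new'), $old );

# replaceNode() is called on the node that should be replaced.
$self->replaceNode( $doc->createElement('other') );

my $xml = $root->toString;
print "$xml\n";
```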
appendChild
$childnode = $node->appendChild( $childnode );
The function will add the $childnode to
the end of $node's children. The function
should fail if the new childnode is already a child of
$node. This function differs from the DOM L2
specification in one case: if the new node is not part of the
document, the node will be imported first.
addChild
$childnode = $node->addChild( $childnode );
As an alternative to appendChild() one can use the addChild()
function. This function is a bit faster, because it avoids all DOM
conformity checks. Therefore this function is quite useful if one
builds XML documents in memory where the order and ownership (ownerDocument)
is assured.
addChild() uses libxml2's own xmlAddChild() function. Thus
it has to be used with extra care: If a text node is added to a node
and the node itself or its last childnode is as well a text node,
the node to add will be merged with the one already available. The
current node will be removed from memory after this action. Because
perl is not aware of this action, the perl instance is still
available. XML::LibXML will catch the loss of a node and refuse to
run any function called on that node.
my $t1 = $doc->createTextNode( "foo" );
my $t2 = $doc->createTextNode( "bar" );
$t1->addChild( $t2 ); # is ok
my $val = $t2->nodeValue(); # will fail, script dies
Also addChild() will not check if the added node belongs to
the same document as the node it will be added to. This could lead
to inconsistent documents and in worse cases even to memory
violations, if one does not keep track of this issue.
Although this sounds like a lot of trouble, addChild() is
useful if a document is built from a stream, such as happens
sometimes in SAX handlers or filters.
If you are not sure about the source of your nodes, you better
stay with appendChild(), because this function is more user friendly
in the sense of being more error tolerant.
addNewChild
$node = $parent->addNewChild( $nsURI, $name );
Similar to addChild(), this function uses
low level libxml2 functionality to provide faster interface for DOM
building. addNewChild() uses
xmlNewChild() to create a new node on a given
parent element.
addNewChild() has two parameters $nsURI and $name, where
$nsURI is an (optional) namespace URI. $name is the fully qualified
element name; addNewChild() will determine the correct prefix if
necessary.
The function returns the newly created node.
This function is very useful for DOM building, where a created
node can be directly associated with its parent.
NOTE this function is not part of the DOM
specification and its use will limit your code to XML::LibXML.
addSibling
$node->addSibling($newNode);
addSibling() allows adding an additional node to the end of a
nodelist, defined by the given node.
cloneNode
$newnode = $node->cloneNode( $deep )
cloneNode creates a copy of
$node. When $deep is set to 1 (true) the
function will copy all childnodes as well. If $deep is 0 only the
current node will be copied.
cloneNode will not copy any namespace
information if it is not run recursively.
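The effect of the $deep flag can be seen in a sketch like this one, using a made-up two-level document without namespaces.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->new->parse_string('<a id="1"><b/></a>');
my $el  = $doc->documentElement;

# $deep = 0: only the element itself is copied.
my $shallow = $el->cloneNode(0);

# $deep = 1: the element and all of its children are copied.
my $deep = $el->cloneNode(1);

print "shallow copy has children: ", ( $shallow->hasChildNodes ? "yes" : "no" ), "\n";
print "deep copy has children:    ", ( $deep->hasChildNodes    ? "yes" : "no" ), "\n";
```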
parentNode
$parentnode = $node->parentNode;
Returns simply the Parent Node of the current node.
nextSibling
$nextnode = $node->nextSibling()
Returns the next sibling if any.
previousSibling
$prevnode = $node->previousSibling()
Analogous to nextSibling the function
returns the previous sibling if any.
hasChildNodes
$boolean = $node->hasChildNodes();
If the current node has Childnodes this function returns TRUE
(1), otherwise it returns FALSE (0, not undef).
firstChild
$childnode = $node->firstChild;
If a node has childnodes this function will return the first
node in the childlist.
lastChild
$childnode = $node->lastChild;
If the $node has childnodes this function
returns the last child node.
ownerDocument
$documentnode = $node->ownerDocument;
Through this function it is always possible to access the
document the current node is bound to.
getOwner
$node = $node->getOwner;
This function returns the node the current node is associated
with. In most cases this will be a document node or a document
fragment node.
setOwnerDocument
$node->setOwnerDocument( $doc );
This function binds a node to another DOM. This method unbinds
the node first, if it is already bound to another document.
This function is the opposite call of
XML::LibXML::Document's adoptNode() function. Because of this it
has the same limitations with Entity References as adoptNode().
insertBefore
$node->insertBefore( $newNode, $refNode )
The method inserts $newNode before
$refNode. If $refNode is
undefined, the newNode will be set as the new last child of the
parent node. This function differs from the DOM L2 specification in
one case: if the new node is not part of the document, the node will
be imported first, automatically.
$refNode has to be passed to the function even if it is
undefined:
$node->insertBefore( $newNode, undef ); # the same as $node->appendChild( $newNode );
$node->insertBefore( $newNode ); # wrong
Note that the reference node has to be a direct child of the
node the function is called on. Also, $newNode is not allowed to be
an ancestor of the new parent node.
insertAfter
$node->insertAfter( $newNode, $refNode )
The method inserts $newNode after
$refNode. If $refNode is
undefined, the newNode will be set as the new last child of the
parent node.
Note that $refNode has to be passed explicitly even if it is
undef.
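The insertion rules above, including the explicit undef reference node, might look like this in practice. Element names are illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML->new->parse_string('<list><b/></list>');
my $root = $doc->documentElement;
my $ref  = $root->firstChild;    # the <b/> element

# Insert relative to an existing direct child of $root.
$root->insertBefore( $doc->createElement('a'), $ref );
$root->insertAfter( $doc->createElement('c'), $ref );

# An undefined reference node appends; it still has to be passed.
$root->insertBefore( $doc->createElement('d'), undef );

my $xml = $root->toString;
print "$xml\n";
```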
findnodes
@nodes = $node->findnodes( $xpath_statement );
findnodes performs the xpath statement on
the current node and returns the result as an array. In scalar
context returns a XML::LibXML::NodeList object.
find
$result = $node->find( $xpath );
find performs the xpath expression using
the current node as the context of the expression, and returns the
result depending on what type of result the XPath expression had.
For example, the XPath "1 * 3 + 52" results in a
XML::LibXML::Number object being returned.
Other expressions might return a XML::LibXML::Boolean
object, or a XML::LibXML::Literal object (a
string). Each of those objects uses Perl's overload feature to
"do the right thing" in different contexts.
findvalue
print $node->findvalue( $xpath );
findvalue is exactly equivalent to:
$node->find( $xpath )->to_literal;
That is, it returns the literal value of the results. This
enables you to ensure that you get a string back from your search,
allowing certain shortcuts. This could be used as the equivalent of
XSLT's <xsl:value-of select="some_xpath"/>.
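The three XPath functions return different kinds of results. The following sketch, on an invented inventory document, shows one call of each.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->new->parse_string(
    '<inv><item price="3"/><item price="4"/></inv>'
);

# findnodes(): a list of nodes in list context.
my @items = $doc->findnodes('/inv/item');

# find(): the result type depends on the expression; a sum is a number.
my $sum = $doc->find('sum(/inv/item/@price)');

# findvalue(): always the literal (string) value.
my $first = $doc->findvalue('/inv/item[1]/@price');

print "items: ", scalar(@items), "\n";
print "sum is a ", ref($sum), " with value ", $sum->value, "\n";
print "first price: $first\n";
```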
childNodes
@childnodes = $node->childNodes;
childNodes implements a more intuitive
interface to the child nodes of the current node. It enables you to
pass all children directly to a map or
grep. If this function is called in scalar
context, an XML::LibXML::NodeList object will be
returned.
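For instance, the children can be filtered with grep and transformed with map in one pass. This is a sketch on a made-up paragraph; the class check via isa is just one way to pick out element nodes.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical mixed content: text nodes and two inline elements.
my $doc = XML::LibXML->new->parse_string(
    '<p>Hello <b>big</b> <i>wide</i> world</p>');
my $p = $doc->documentElement;

# List context: feed the children straight into grep and map.
my @names = map  { $_->nodeName }
            grep { $_->isa('XML::LibXML::Element') } $p->childNodes;

# Scalar context: a NodeList object covering text and element children.
my $all   = $p->childNodes;
my $total = $all->size;

print "elements: @names, children in total: $total\n";
```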
toString
$xmlstring = $node->toString($format,$docencoding);
This is the equivalent of XML::LibXML::Document::toString
for a single node. This means a node and all its child nodes will be
dumped into the result string.
In addition to the $format flag of XML::LibXML::Document,
this version accepts the optional $docencoding flag. If this flag is
set, this function returns the string in its original encoding (the
encoding of the document) rather than UTF-8.
toStringC14N
$c14nstring = $node->toStringC14N($with_comments, $xpath_expression);
The function is similar to toString(). Instead of simply
serializing the document tree, it transforms it as specified
in the XML-C14N Specification. Such a transformation is known as
canonicalization.
If $with_comments is 0 or not defined, the result document
will not contain any comments that exist in the original document.
To include comments in the canonicalized document, $with_comments has
to be set to 1.
The parameter $xpath_expression defines the nodeset of nodes
that should be visible in the resulting document. This can be used
to filter out some nodes. Note that only the nodes that
are part of the nodeset will be included in the result document.
Their child nodes will not exist in the resulting document unless
they are part of the nodeset defined by the XPath expression.
If $xpath_expression is omitted or empty, toStringC14N() will
include all nodes in the given subtree.
No serializing flags will be recognized by this function!
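The effect of the $with_comments flag can be sketched on a small made-up document. Canonical XML sorts the attributes and writes empty elements as start and end tag pairs, which is visible in the output.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input: unsorted attributes, a comment and an empty element.
my $doc = XML::LibXML->new->parse_string(
    '<root b="2" a="1"><!-- note --><child/></root>');

my $plain         = $doc->toStringC14N(0);   # comments dropped
my $with_comments = $doc->toStringC14N(1);   # comments kept

print "$plain\n$with_comments\n";
```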
serialize
$str = $doc->serialize($format);
Alternative form of toString(). This function name was added to
conform better with libxml2's examples.
serialize_c14n
$c14nstr = $doc->serialize_c14n($comment_flag,$xpath);
Alternative form of toStringC14N().
localname
$localname = $node->localname;
Returns the local name of a tag. This is the part after the
colon.
prefix
$nameprefix = $node->prefix;
Returns the prefix of a tag. This is the part before the
colon.
namespaceURI
$uri = $node->namespaceURI()
Returns the URI of the current namespace.
hasAttributes
$boolean = $node->hasAttributes();
Returns 1 (TRUE) if the current node has any attributes set,
otherwise 0 (FALSE) is returned.
attributes
@attributelist = $node->attributes();
This function returns all attributes and namespace
declarations assigned to the given node.
Because XML::LibXML does not implement namespace declarations
and attributes the same way, you have to test what kind of
node you are handling when accessing the function's result.
If this function is called in array context, the attribute
nodes are returned as an array. In scalar context the function will
return an XML::LibXML::NamedNodeMap object.
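One way to make that test is to check the class of each returned object. The sketch below assumes that namespace declarations come back as XML::LibXML::Namespace objects; the element and its attributes are made up.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical element with one namespace declaration and two attributes.
my $doc = XML::LibXML->new->parse_string(
    '<e xmlns:x="urn:x" id="7" x:role="demo"/>');
my $e = $doc->documentElement;

my ( @attrs, @decls );
for my $n ( $e->attributes ) {
    if ( $n->isa('XML::LibXML::Namespace') ) {
        # Namespace declaration: bound prefix and URI.
        push @decls, $n->getLocalName . '=' . $n->getData;
    }
    else {
        # Ordinary attribute node.
        push @attrs, $n->nodeName . '=' . $n->getValue;
    }
}

print "attributes: @{[ sort @attrs ]}\n";
print "declarations: @decls\n";
```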
lookupNamespaceURI
$URI = $node->lookupNamespaceURI( $prefix );
Find a namespace URI by its prefix starting at the current
node.
lookupNamespacePrefix
$prefix = $node->lookupNamespacePrefix( $URI );
Find a namespace prefix by its URI starting at the current
node.
NOTE: Only the namespace URIs are meant to
be unique. The prefix is only document related. Also, the document
might have more than a single prefix defined for a namespace.
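Both lookups search from the current node upwards, so a declaration on an ancestor is found from a descendant. A small sketch with a hypothetical namespace:

```perl
use strict;
use warnings;
use XML::LibXML;

# The namespace is declared on the root and used on the child.
my $doc = XML::LibXML->new->parse_string(
    '<a:root xmlns:a="urn:a"><a:leaf/></a:root>');
my $leaf = $doc->documentElement->firstChild;

my $uri    = $leaf->lookupNamespaceURI('a');
my $prefix = $leaf->lookupNamespacePrefix('urn:a');

print "prefix a maps to $uri, URI urn:a maps to $prefix\n";
```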
iterator
$iter = $node->iterator;
This function has been deprecated since XML::LibXML 1.54. It is only
a dummy function that will be removed entirely in one of the next
versions.
To make use of iterator functions, use the XML::LibXML::Iterator
module available on CPAN.
normalize
$node->normalize;
This function normalizes adjacent text nodes. This function is
not as strict as libxml2's xmlTextMerge() function, since it
will not free a node that is still referenced by the Perl layer.
getNamespaces
@nslist = $node->getNamespaces;
If a node has any namespaces defined, this function will
return these namespaces. Note that this will not return all
namespaces that are in scope, but only the ones declared explicitly
for that node.
Although getNamespaces is available for all nodes, it only
makes sense if used with element nodes.
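The distinction between declared and in-scope namespaces can be sketched like this. The document is made up; the child has urn:a in scope but declares only urn:b itself.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->new->parse_string(
    '<root xmlns:a="urn:a"><kid xmlns:b="urn:b"/></root>');
my $kid = $doc->documentElement->firstChild;

# Only the declaration made on <kid> itself is reported.
my @declared = map { $_->getData } $kid->getNamespaces;

# The inherited namespace is still reachable through a lookup.
my $inherited = $kid->lookupNamespaceURI('a');

print "declared on kid: @declared, inherited: $inherited\n";
```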
removeChildNodes
$node->removeChildNodes();
This function is not specified for any DOM level: it removes
all child nodes from a node in a single step. Unlike the libxml2
function itself (xmlFreeNodeList), this function will not
immediately remove the nodes from memory. This prevents
memory violations if there are nodes still referred to from
the Perl level.
XML::LibXML Class for Element Nodes
XML::LibXML::Element
new
$node = XML::LibXML::Element->new( $name )
This function creates a new node unbound to any DOM.
setAttribute
$node->setAttribute( $aname, $avalue );
This method sets or replaces the node's attribute
$aname to the value $avalue.
setAttributeNS
$node->setAttributeNS( $nsURI, $aname, $avalue );
Namespace version of setAttribute.
getAttribute
$avalue = $node->getAttribute( $aname );
If $node has an attribute with the name
$aname, the value of this attribute will get
returned.
getAttributeNS
$avalue = $node->getAttributeNS( $nsURI, $aname );
Namespace version of getAttribute.
getAttributeNode
$attrnode = $node->getAttributeNode( $aname );
Returns the attribute as a node if the attribute exists. If
the attribute does not exist, undef will be
returned.
getAttributeNodeNS
$attrnode = $node->getAttributeNodeNS( $namespaceURI, $aname );
Namespace version of getAttributeNode.
removeAttribute
$node->removeAttribute( $aname );
The method removes the attribute $aname
from the node's attribute list, if the attribute can be found.
removeAttributeNS
$node->removeAttributeNS( $nsURI, $aname );
Namespace version of removeAttribute.
hasAttribute
$boolean = $node->hasAttribute( $aname );
This function tests if the named attribute is set for the node.
If the attribute is specified, TRUE (1) will be returned, otherwise
the return value is FALSE (0).
hasAttributeNS
$boolean = $node->hasAttributeNS( $nsURI, $aname );
Namespace version of hasAttribute.
getChildrenByTagName
@nodes = $node->getChildrenByTagName($tagname);
The function gives direct access to all child nodes of the
current node with the same tagname. It makes things a lot easier if
you need to handle big datasets.
If this function is called in SCALAR context, it returns the
number of elements found.
getChildrenByTagNameNS
@nodes = $node->getChildrenByTagNameNS($nsURI,$tagname);
Namespace version of getChildrenByTagName.
If this function is called in SCALAR context, it returns the
number of Elements found.
getElementsByTagName
@nodes = $node->getElementsByTagName($tagname);
This function is part of the spec. It fetches all descendants
of a node with a given tagname. If you are as confused about
tagname as I was: tagname is a qualified
tagname, which in case of namespace usage consists of prefix and
local name.
In SCALAR context this function returns an
XML::LibXML::NodeList object.
getElementsByTagNameNS
@nodes = $node->getElementsByTagNameNS($nsURI,$localname);
Namespace version of getElementsByTagName
as found in the DOM spec.
In SCALAR context this function returns an
XML::LibXML::NodeList object.
getElementsByLocalName
@nodes = $node->getElementsByLocalName($localname);
This function is not found in the DOM specification. It is a
mix of getElementsByTagName and getElementsByTagNameNS. It will
fetch all tags matching the given local name. This allows one to
select tags with the same local name across namespace borders.
In SCALAR context this function returns an
XML::LibXML::NodeList object.
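The following sketch contrasts the four lookup functions on one made-up document that contains an unqualified item, a namespaced x:item and a nested item. All calls are made in list context, so the counts are simply the sizes of the returned lists.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document; urn:x is an arbitrary namespace URI.
my $doc = XML::LibXML->new->parse_string(
    '<r xmlns:x="urn:x"><item/><x:item/><box><item/></box></r>');
my $r = $doc->documentElement;

my @children = $r->getChildrenByTagName('item');             # direct children only
my @by_name  = $r->getElementsByTagName('item');             # qualified name "item"
my @by_ns    = $r->getElementsByTagNameNS( 'urn:x', 'item' );
my @by_local = $r->getElementsByLocalName('item');           # any namespace

printf "children=%d tagname=%d ns=%d local=%d\n",
    scalar @children, scalar @by_name, scalar @by_ns, scalar @by_local;
```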
appendWellBalancedChunk
$node->appendWellBalancedChunk( $chunk )
Sometimes it is necessary to append a string-coded XML tree to
a node. appendWellBalancedChunk will do the
trick for you. But this is only done if the string is
well-balanced.
Note that appendWellBalancedChunk() is only kept for
compatibility reasons. Implicitly it uses
my $fragment = $parser->parse_xml_chunk( $chunk );
$node->appendChild( $fragment );
This form is more explicit and makes it easier to control the
flow of a script.
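Both forms can be compared in a small sketch. The body element and the chunks are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;
my $doc    = $parser->parse_string('<body/>');
my $body   = $doc->documentElement;

# Explicit form: parse the chunk into a fragment, then append it.
my $fragment = $parser->parse_xml_chunk('<p>one</p><p>two</p>');
$body->appendChild($fragment);

# Compatibility form: parsing and appending in one call.
$body->appendWellBalancedChunk('<p>three</p>');

my $xml = $body->toString;
print "$xml\n";
```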
appendText
$node->appendText( $PCDATA );
Alias for appendTextNode().
appendTextNode
$node->appendTextNode( $PCDATA );
This wrapper function lets you add a string directly to an
element node.
appendTextChild
$node->appendTextChild( $childname , $PCDATA )
Somewhat similar to appendTextNode: it
lets you add a child element that contains only a text node
directly by specifying the name and the text content.
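A minimal sketch of the two text helpers side by side; the person and name elements are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $root = $doc->createElement('person');
$doc->setDocumentElement($root);

$root->appendTextChild( 'name', 'Ada' );   # adds <name>Ada</name>
$root->appendTextNode('!');                # adds a bare text node

my $xml = $root->toString;
print "$xml\n";
```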
setNamespace
$node->setNamespace( $nsURI , $nsPrefix, $activate )
setNamespace() allows one to apply a namespace to an element.
The function takes three parameters: the namespace URI, which is
required, and two optional values: the prefix, which is the namespace
prefix as it should be used in child elements or attributes, and
the additional activate parameter.
The activate parameter is most useful: if this parameter is
set to FALSE (0), the namespace is simply added to the namespace list
of the node, while the element's namespace itself is not
altered. Nevertheless, activate is set to TRUE (1) by default. In
this case the namespace is automatically used as the node's effective
namespace. This means the namespace prefix is added to the node name,
and if there was a namespace already active for the node, it will
be replaced (but not removed from the global namespace list).
The following example may clarify this:
my $e1 = $doc->createElement("bar");
$e1->setNamespace("http://foobar.org", "foo");
results in
<foo:bar xmlns:foo="http://foobar.org"/>
while
my $e2 = $doc->createElement("bar");
$e2->setNamespace("http://foobar.org", "foo", 0);
results only in
<bar xmlns:foo="http://foobar.org"/>
By using $activate == 0 it is possible to apply multiple
namespace declarations to a single element.
Alternatively you can call setAttribute() simply to declare a
new namespace for a node, without activating it:
$e2->setAttribute( "xmlns:foo", "http://foobar.org" );
has the same result as
$e2->setNamespace( "http://foobar.org", "foo", 0 );
XML::LibXML Class for Text Nodes
XML::LibXML::Text
Unlike the DOM specification, XML::LibXML implements the text
node as the base class of all character data nodes. Therefore there exists
no CharacterData class. This allows one to use all methods that are
available for text nodes for comments or CDATA sections as well.
new
$text = XML::LibXML::Text->new( $content );
The constructor of the class. It creates an unbound text node.
data
$nodedata = $text->data;
Although there exists the nodeValue
attribute in the Node class, the DOM specification defines data as a
separate attribute. XML::LibXML implements
these two attributes not as different attributes but as aliases,
as libxml2 does. Therefore
$text->data;
and
$text->nodeValue;
will have the same result and are not different entities.
setData($string)
$text->setData( $text_content );
This function sets or replaces the text content of a node. The
node has to be of the type "text", "cdata" or
"comment".
substringData($offset,$length)
$text->substringData($offset, $length);
Extracts a range of data from the node (DOM spec). This
function takes the two parameters $offset and $length and returns
the substring, if available.
If the node contains no data or $offset refers to a
nonexistent string index, this function will return
undef. If $length is out of range,
substringData will return the data starting at
$offset instead of causing an error.
appendData($string)
$text->appendData( $somedata );
Appends a string to the end of the existing data. If the
current text node contains no data, this function has the same
effect as setData.
insertData($offset,$string)
$text->insertData($offset, $string);
Inserts the parameter $string at the given $offset of the
existing data of the node. This operation will not remove existing
data, but it shifts the data that follows $offset.
The $offset has to be a positive value. If $offset is out of
range, insertData will have the same behaviour
as appendData.
deleteData($offset, $length)
$text->deleteData($offset, $length);
This method removes a chunk from the existing node data at the
given offset. The $length parameter tells how many characters
should be removed from the string.
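The offset-based methods described so far can be sketched on an unbound text node. Offsets are counted from zero here, as in the DOM; the sample string is arbitrary.

```perl
use strict;
use warnings;
use XML::LibXML;

my $text = XML::LibXML::Text->new('foobar');

my $sub = $text->substringData( 1, 2 );   # two characters starting at offset 1

$text->appendData('baz');
my $after_append = $text->data;

$text->insertData( 3, '-' );              # existing data after offset 3 moves right
my $after_insert = $text->data;

$text->deleteData( 3, 1 );                # removes the "-" again
my $after_delete = $text->data;

print join( ' | ', $sub, $after_append, $after_insert, $after_delete ), "\n";
```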
deleteDataString($string, [$all])
$text->deleteDataString($remstring, $all);
This method removes a chunk from the existing node data. Since
the DOM spec is quite unhandy if you already know
which string to remove from a text node, this
method allows more perlish code :)
The function takes two parameters: $string
and, optionally, the $all flag. If $all is not set,
undef or 0,
deleteDataString will remove only the first
occurrence of $string. If $all is TRUE,
deleteDataString will remove all occurrences of
$string from the node data.
replaceData($offset, $length, $string)
$text->replaceData($offset, $length, $string);
The DOM style version to replace node data.
replaceDataString($oldstring, $newstring, [$all])
$text->replaceDataString($old, $new, $flag);
The more programmer friendly version of replaceData() :)
Instead of giving offsets and length, one can specify the exact
string ($oldstring) to be replaced.
Additionally the $all flag allows one to replace
all occurrences of $oldstring.
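A sketch comparing the DOM style replaceData() with the string-based variants, including the effect of the $all flag. The sample strings are made up.

```perl
use strict;
use warnings;
use XML::LibXML;

# DOM style: replace three characters starting at offset 0.
my $dom = XML::LibXML::Text->new('foobar');
$dom->replaceData( 0, 3, 'FOO' );

# String style without $all: only the first match is touched.
my $first = XML::LibXML::Text->new('a-b-c');
$first->deleteDataString('-');

# String style with $all set: every match is touched.
my $every = XML::LibXML::Text->new('a-b-c');
$every->deleteDataString( '-', 1 );

my $swap = XML::LibXML::Text->new('one fish, two fish');
$swap->replaceDataString( 'fish', 'bird', 1 );

print join( ' | ', map { $_->data } $dom, $first, $every, $swap ), "\n";
```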
replaceDataRegEx( $search_cond, $replace_cond, $reflags )
$text->replaceDataRegEx( $search_cond, $replace_cond, $reflags );
This method replaces the node's data using a
simple regular expression. Optionally, this
function accepts flags that will be added to the
replace statement.
NOTE: This is a shortcut for
my $datastr = $node->getData();
$datastr =~ s/somecond/replacement/g; # 'g' is just an example for any flag
$node->setData( $datastr );
This function can make things easier to read for simple
replacements. For more complex variants it is recommended to use the
code snippet above.
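Two small uses of replaceDataRegEx, sketched on unbound text nodes with made-up content. The search and replace parts are passed as single-quoted strings so that Perl does not interpolate them before the call.

```perl
use strict;
use warnings;
use XML::LibXML;

# Collapse runs of whitespace; 'g' is passed as the flag string.
my $spaces = XML::LibXML::Text->new('a   b    c');
$spaces->replaceDataRegEx( '\s+', ' ', 'g' );

# Reorder a date using capture groups in the replacement.
my $date = XML::LibXML::Text->new('2004-03-15');
$date->replaceDataRegEx( '(\d+)-(\d+)-(\d+)', '$3.$2.$1' );

print $spaces->data, ' | ', $date->data, "\n";
```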
XML::LibXML Comment Class
XML::LibXML::Comment
This class provides all functions of XML::LibXML::Text,
but for comment nodes. This can be done since only the output of the
node types is different, but not the data structure. :-)
new
$node = XML::LibXML::Comment->new( $content );
The constructor is the only provided function for this
package. It is required, because libxml2 treats
text nodes and comment nodes slightly differently.
XML::LibXML Class for CDATA Sections
XML::LibXML::CDATASection
This class provides all functions of XML::LibXML::Text,
but for CDATA nodes.
new
$node = XML::LibXML::CDATASection->new( $content );
The constructor is the only provided function for this
package. It is required, because libxml2 treats
the different textnode types slightly differently.
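As a sketch, both constructors can be used next to each other, and a text method can be called on the comment node. The content strings and the r element are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $root = $doc->createElement('r');
$doc->setDocumentElement($root);

my $comment = XML::LibXML::Comment->new(' draft ');
my $cdata   = XML::LibXML::CDATASection->new('1 < 2');

# A method inherited from XML::LibXML::Text, applied to a comment.
$comment->replaceDataString( 'draft', 'final' );

$root->appendChild($comment);
$root->appendChild($cdata);

my $xml = $root->toString;
print "$xml\n";
```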
XML::LibXML Attribute Class
XML::LibXML::Attr
This is the interface to handle attributes like ordinary nodes. The
naming of the class relies on the W3C DOM documentation.
new
$attr = XML::LibXML::Attr->new($name [,$value]);
Class constructor. If you need to work with ISO encoded
strings, you should always use the
createAttribute of XML::LibXML::Document.
getValue
$string = $attr->getValue();
Returns the value stored for the attribute. If undef is
returned, the attribute has no value, which is different from being
not specified.
value
$value = $attr->value;
Alias for getValue()
setValue
$attr->setValue( $string );
This is needed to set a new attribute value. If ISO encoded
strings are passed as parameter, the node has to be bound to a
document, otherwise the encoding might be done incorrectly.
getOwnerElement
$node = $attr->getOwnerElement();
Returns the node the attribute belongs to. If the attribute is
not bound to a node, undef will be returned. Overriding the
underlying implementation, the parentNode
function will return undef instead of the owner element.
setNamespace
$attr->setNamespace($nsURI, $prefix);
This function activates a namespace for the given attribute.
If the namespace was not previously declared in the context of the
attribute, the call will be silently ignored. In this case you
may wish to call setNamespace() on the ownerElement.
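A short sketch of working with an attribute as a node: reading it, changing it and getting back to its element. The element and attribute names are made up.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->new->parse_string('<e id="7"/>');
my $e   = $doc->documentElement;

my $attr   = $e->getAttributeNode('id');
my $before = $attr->getValue;

# Changing the attribute node changes the element it belongs to.
$attr->setValue('8');
my $after = $e->getAttribute('id');

my $owner = $attr->getOwnerElement->nodeName;

print "before=$before after=$after owner=$owner\n";
```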
XML::LibXML's DOM L2 Document Fragment Implementation
XML::LibXML::DocumentFragment
This class is a helper class as described in the DOM Level 2
Specification. It is implemented as a node without a name. All adding,
inserting or replacing functions are aware of document fragments now.
In addition, all unbound nodes (all nodes that do
not belong to any document subtree) are implicit members of document
fragments.
XML::LibXML Namespace Implementation
XML::LibXML::Namespace
Namespace nodes are returned by either
$element->findnodes('namespace::foo') or
$node->getNamespaces().
The namespace node API is not part of any current DOM API, and so it
is quite minimal. It should be noted that namespace nodes are
not a subclass of XML::LibXML::Node; however,
namespace nodes act a lot like attribute nodes, and similarly named
methods will return what you would expect if you treated the namespace
node as an attribute.
new
my $ns = XML::LibXML::Namespace->new($nsURI);
Creates a new Namespace node. Note that this is not a
'node' in the way an attribute or an element node is. Therefore you
cannot call all XML::LibXML::Node functions. All functions
available for this node are listed below.
Optionally you can pass the prefix to the namespace
constructor. If this second parameter is omitted you will create a
so-called default namespace. Note that the newly created namespace is
not bound to any document or node, therefore you should not expect
it to be available in an existing document.
getName
print $ns->getName()
Returns "xmlns:prefix", where prefix is the prefix for
this namespace.
name
print $ns->name()
Alias for getName()
prefix
print $ns->prefix()
Returns the prefix bound to this namespace declaration.
getLocalName
$localname = $ns->getLocalName()
Alias for prefix()
getData
print $ns->getData()
Returns the URI of the namespace.
getValue
print $ns->getValue()
Alias for getData()
value
print $ns->value()
Alias for getData()
uri
print $ns->uri()
Alias for getData()
getNamespaceURI
$known_uri = $ns->getNamespaceURI()
Returns the string "http://www.w3.org/2000/xmlns/"
getPrefix
$known_prefix = $ns->getPrefix()
Returns the string "xmlns"
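The accessors can be tried on a freshly constructed namespace node. This sketch uses a hypothetical URI and prefix and sticks to getName, getLocalName, getData, getNamespaceURI and getPrefix.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical namespace, not bound to any document.
my $ns = XML::LibXML::Namespace->new( 'urn:example', 'ex' );

my %seen = (
    name      => $ns->getName,           # attribute-style name of the declaration
    localname => $ns->getLocalName,      # the declared prefix
    data      => $ns->getData,           # the declared URI
    ns_uri    => $ns->getNamespaceURI,   # fixed URI of the xmlns namespace
    ns_prefix => $ns->getPrefix,         # fixed string
);

print "$_: $seen{$_}\n" for sort keys %seen;
```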
XML::LibXML Processing Instructions
XML::LibXML::PI
Processing instructions are implemented in XML::LibXML with read
and write access. The PI data is the PI without the PI target (as
specified in XML 1.0 [17]) as a string. This string can be accessed with
getData as implemented in XML::LibXML::Node.
The write access is aware of the fact that many processing
instructions have attribute-like data. Therefore, besides the
interface that conforms to the DOM spec, setData() provides a way to
pass a set of named parameters.
So the code segment
my $pi = $dom->createProcessingInstruction("abc");
$pi->setData(foo=>'bar', foobar=>'foobar');
$dom->appendChild( $pi );
will result in the following PI in the DOM:
<?abc foo="bar" foobar="foobar"?>
This is how it is specified in the DOM specification. This three
step interface temporarily creates a node in Perl space. This can be
avoided by using the insertProcessingInstruction() method. Instead of
the three calls described above, the call
$dom->insertProcessingInstruction("abc",'foo="bar" foobar="foobar"');
will have the same result as above.
XML::LibXML::PI's implementation of setData() differs a bit from
the standard version as available in XML::LibXML::Node():
setData
$pinode->setData( $data_string );
$pinode->setData( name=>string_value [...] );
This method allows one to change the content data of a PI.
In addition to the interface specified for DOM Level 2, the method
provides a named parameter interface to set the data. This
parameter list is converted into a string before it is appended to
the PI.
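Both calling conventions of setData() are sketched below. The targets and data are made up. The named parameter call uses a single pair, so the sketch does not depend on the order in which several pairs would be written out.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $root = $doc->createElement('root');
$doc->setDocumentElement($root);

# String form: the data is passed as one ready-made string.
my $style = $doc->createProcessingInstruction('xml-stylesheet');
$style->setData('type="text/xsl" href="style.xsl"');
$root->appendChild($style);

# Named parameter form: the pair is converted into attribute-like data.
my $abc = $doc->createProcessingInstruction('abc');
$abc->setData( foo => 'bar' );
$root->appendChild($abc);

my $abc_data = $abc->getData;
my $xml      = $root->toString;
print "$xml\n";
```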
XML::LibXML DTD Handling
XML::LibXML::Dtd
This class holds a DTD. You may parse a DTD from either a string, or
from an external SYSTEM identifier.
No support is available as yet for parsing from a filehandle.
XML::LibXML::Dtd is a sub-class of Node, so all the methods
available to nodes (particularly toString()) are available to Dtd objects.
new
$dtd = XML::LibXML::Dtd->new($public_id, $system_id)
Parse a DTD from the system identifier, and return a DTD
object that you can pass to $doc->is_valid() or
$doc->validate().
my $dtd = XML::LibXML::Dtd->new(
"SOME // Public / ID / 1.0",
"test.dtd"
);
my $doc = XML::LibXML->new->parse_file("test.xml");
$doc->validate($dtd);
parse_string
$dtd = XML::LibXML::Dtd->parse_string($dtd_str)
The same as new() above, except you can parse a DTD from a
string.
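A minimal sketch of validating against a DTD held in a string. The note vocabulary is made up. is_valid() is used for a yes/no answer and validate() inside eval to capture the error message.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical DTD for a tiny note format.
my $dtd = XML::LibXML::Dtd->parse_string(<<'DTD');
<!ELEMENT note (to, body)>
<!ELEMENT to   (#PCDATA)>
<!ELEMENT body (#PCDATA)>
DTD

my $parser = XML::LibXML->new;
my $good   = $parser->parse_string('<note><to>Bob</to><body>Hi</body></note>');
my $bad    = $parser->parse_string('<note><body>Hi</body></note>');

my $good_ok = $good->is_valid($dtd) ? 1 : 0;
my $bad_ok  = $bad->is_valid($dtd)  ? 1 : 0;

# validate() dies on invalid documents, so it is wrapped in eval.
my $error = '';
eval { $bad->validate($dtd); 1 } or $error = "$@";

print "good=$good_ok bad=$bad_ok\n";
print "validation error: $error\n" if length $error;
```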
RelaxNG Schema Validation
XML::LibXML::RelaxNG
The XML::LibXML::RelaxNG class is a tiny frontend to libxml2's
RelaxNG implementation. Currently it supports only schema parsing and
document validation.
new
$rngschema = XML::LibXML::RelaxNG->new( location => $filename_or_url );
$rngschema = XML::LibXML::RelaxNG->new( string => $xmlschemastring );
$rngschema = XML::LibXML::RelaxNG->new( DOM => $doc );
The constructor of XML::LibXML::RelaxNG may be called with
one of three parameters. The parameter tells the class from
which source it should generate a validation schema. It is
important that each schema has only a single source.
The location parameter allows one to parse a schema from the
filesystem or a URL.
The string parameter will parse the schema from the given XML
string.
The DOM parameter allows one to parse the schema from a preparsed
XML::LibXML::Document.
Note that the constructor will die() if the schema does not
meet the constraints of the RelaxNG specification.
validate
eval { $rngschema->validate( $doc ); };
This function allows one to validate a document against the given
RelaxNG schema. If this function succeeds, it will return 0,
otherwise it will die() and report the errors found. Because of this,
validate() should always be wrapped in an eval block.
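The eval pattern can be wrapped in a small helper. This is a sketch: the schema, the sample documents and the helper name passes_rng are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical RelaxNG schema: a <note> holding a single <to> element.
my $rng = XML::LibXML::RelaxNG->new( string => <<'RNG' );
<element name="note" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="to"><text/></element>
</element>
RNG

# Hypothetical helper: returns 1 if the XML string validates, 0 otherwise.
sub passes_rng {
    my ( $schema, $xml ) = @_;
    my $doc = XML::LibXML->new->parse_string($xml);
    return eval { $schema->validate($doc); 1 } ? 1 : 0;
}

my $good = passes_rng( $rng, '<note><to>Bob</to></note>' );
my $bad  = passes_rng( $rng, '<note><from>Eve</from></note>' );

print "good=$good bad=$bad\n";
```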
XML Schema Validation
XML::LibXML::Schema
The XML::LibXML::Schema class is a tiny frontend to libxml2's
XML Schema implementation. Currently it supports only schema parsing and
document validation.
new
$xmlschema = XML::LibXML::Schema->new( location => $filename_or_url );
$xmlschema = XML::LibXML::Schema->new( string => $xmlschemastring );
The constructor of XML::LibXML::Schema may be called with
one of two parameters. The parameter tells the class from
which source it should generate a validation schema. It is
important that each schema has only a single source.
The location parameter allows one to parse a schema from the
filesystem or a URL.
The string parameter will parse the schema from the given XML
string.
Note that the constructor will die() if the schema does not
meet the constraints of the XML Schema specification.
validate
eval { $xmlschema->validate( $doc ); };
This function allows one to validate a document against the given
XML Schema. If this function succeeds, it will return 0, otherwise
it will die() and report the errors found. Because of this,
validate() should always be wrapped in an eval block.
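A sketch of the same pattern for XML Schema, keeping the error message of a failed validation. The schema and the documents are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical schema: a single <count> element holding an integer.
my $xmlschema = XML::LibXML::Schema->new( string => <<'XSD' );
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="count" type="xs:integer"/>
</xs:schema>
XSD

my $parser = XML::LibXML->new;
my %result;
for my $xml ( '<count>42</count>', '<count>many</count>' ) {
    my $doc = $parser->parse_string($xml);
    # validate() dies on failure; the message ends up in $@.
    $result{$xml} = eval { $xmlschema->validate($doc); 1 } ? 'valid' : "invalid: $@";
}

print "$_ => $result{$_}\n" for sort keys %result;
```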