Libxml2 Parser Interfaces Tutorial

This document provides an overview of the different parser interfaces provided by libxml2. There are two parsers available, one for XML and one for HTML. They can be used through three groups of APIs offering callbacks, streaming, or tree results, and there are several ways to provide the data to the parser. This document describes the set of interfaces available in version 2.6.5.

Description of the parsers

The parsers are the core piece of the library: they consume the data, analyze and check its content and structure, and return the information and errors to the application in a structured fashion. The C structure xmlParserCtxt that drives this process is public, but it should preferably be used through the available APIs. The same structure serves both the HTML and the XML parsers, though most of its data is used only for XML parsing.

The callback-based SAX(2) interface

The SAX callback interface is the lowest-level interface available from the parsers; all the other interfaces are built on top of this layer. It is very fast but somewhat complex to use, because of the callback programming model and the lack of advanced features such as validation. The principle is that, as the parser progresses through the document data, it informs the application of what it finds by invoking the callbacks that were registered when the parser was built.

The xmlReader interface

The tree interface

Parser in pull mode

Parser in push mode

Daniel Veillard

$Id: parsers.html,v 1.1.1.1 2004/02/26 20:58:30 rbraun Exp $