Describe the functions and capabilities of the APIs included within JAXP.


The Java API for XML Processing (JAXP) is for processing XML data using applications written in the Java programming language. JAXP leverages the parser standards Simple API for XML Parsing (SAX) and Document Object Model (DOM) so that you can choose to parse your data as a stream of events or to build an object representation of it. JAXP also supports the Extensible Stylesheet Language Transformations (XSLT) standard, giving you control over the presentation of the data and enabling you to convert the data to other XML documents or to other formats, such as HTML. JAXP also provides namespace support, allowing you to work with DTDs that might otherwise have naming conflicts.

Streaming API for XML (StAX) provides is the latest API in the JAXP family.

The main JAXP APIs are defined in the javax.xml.parsers package. That package contains vendor-neutral factory classes - SAXParserFactory, DocumentBuilderFactory, and TransformerFactory -which give you a SAXParser, a DocumentBuilder, and an XSLT transformer, respectively. DocumentBuilder, in turn, creates a DOM-compliant Document object.

The factory APIs let you plug in an XML implementation offered by another vendor without changing your source code. The implementation you get depends on the setting of the javax.xml.parsers.SAXParserFactory, javax.xml.parsers.DocumentBuilderFactory, and javax.xml.transform.TransformerFactory system properties, using System.setProperties() in the code, or -DpropertyName="..." on the command line. The default values (unless overridden at runtime on the command line or in the code) point to Sun's implementation.

The Simple API for XML (SAX) APIs

The basic outline of the SAX parsing APIs are shown below. To start the process, an instance of the SAXParserFactory class is used to generate an instance of the parser.

The Simple API for XML (SAX)

The parser wraps a SAXReader object. When the parser's parse() method is invoked, the reader invokes one of several callback methods implemented in the application. Those methods are defined by the interfaces ContentHandler, ErrorHandler, DTDHandler, and EntityResolver.

Here is a summary of the key SAX APIs:

A typical application implements most of the ContentHandler methods, at a minimum. Because the default implementations of the interfaces ignore all inputs except for fatal errors, a robust implementation may also want to implement the ErrorHandler methods.

When to Use SAX

SAX is fast and efficient, but its event model makes it most useful for such state-independent filtering. For example, a SAX parser calls one method in your application when an element tag is encountered and calls a different method when text is found. If the processing you're doing is state-independent (meaning that it does not depend on the elements have come before), then SAX works fine.

On the other hand, for state-dependent processing, where the program needs to do one thing with the data under element A but something different with the data under element B, then a pull parser such as the Streaming API for XML (StAX) would be a better choice. With a pull parser, you get the next node, whatever it happens to be, at any point in the code that you ask for it. So it's easy to vary the way you process text (for example), because you can process it multiple places in the program.

SAX requires much less memory than DOM, because SAX does not construct an internal representation (tree structure) of the XML data, as a DOM does. Instead, SAX simply sends data to the application as it is read; your application can then do whatever it wants to do with the data it sees.

Pull parsers and the SAX API both act like a serial I/O stream. You see the data as it streams in, but you can't go back to an earlier position or leap ahead to a different position. In general, such parsers work well when you simply want to read data and have the application act on it.

But when you need to modify an XML structure - especially when you need to modify it interactively - an in-memory structure makes more sense. DOM is one such model. However, although DOM provides many powerful capabilities for large-scale documents (like books and articles), it also requires a lot of complex coding.

The Document Object Model (DOM) APIs

Figure below shows the DOM APIs in action:


You use the javax.xml.parsers.DocumentBuilderFactory class to get a DocumentBuilder instance, and you use that instance to produce a Document object that conforms to the DOM specification. The builder you get, in fact, is determined by the system property javax.xml.parsers.DocumentBuilderFactory, which selects the factory implementation that is used to produce the builder.

static Document document;
public static void main(String argv[]) {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    try { 
        DocumentBuilder builder = factory.newDocumentBuilder();
        document = builder.parse( new File(argv[0]) );
    } catch (SAXParseException spe) {

You can also use the DocumentBuilder.newDocument() method to create an empty Document that implements the org.w3c.dom.Document interface. Alternatively, you can use one of the builder's parse methods to create a Document from existing XML data.

public static void buildDom() {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    try {
        DocumentBuilder builder = factory.newDocumentBuilder();
        document = builder.newDocument();  
        Element root = (Element) document.createElement("rootElement"); 
        root.appendChild(document.createTextNode(" "));
        document.createTextNode("text") );
    } catch (ParserConfigurationException pce) {
        // Parser with specified options can't be built

When to Use DOM

DOM is platform-neutral and language-neutral API that enables programs to dynamically update the contents of XML documents.

DOM creates an in-memory object representation of an entire XML document. This allows extreme flexibility in parsing, navigating, and updating the contents of a document. DOM's drawbacks are high memory requirements and the need for more powerful processing capabilities.

The Extensible Stylesheet Language Transformations (XSLT) APIs

Figure below shows the XSLT APIs in action:


A TransformerFactory object is instantiated and used to create a Transformer. The source object is the input to the transformation process. A source object can be created from a SAX reader, from a DOM, or from an input stream.

Similarly, the result object is the result of the transformation process. That object can be a SAX event handler, a DOM, or an output stream.

When the transformer is created, it can be created from a set of transformation instructions, in which case the specified transformations are carried out. If it is created without any specific instructions, then the transformer object simply copies the source to the result.

Here is a description of the packages that make up the JAXP Transformation APIs:


The StAX API exposes methods for iterative, event-based processing of XML documents. XML documents are treated as a filtered series of events, and infoset states can be stored in a procedural fashion. Moreover, unlike SAX, the StAX API is bidirectional, meaning that it can both read and write XML documents.

The StAX API is really two distinct API sets: a cursor API and an iterator API.

Comparing Cursor and Iterator APIs

Before choosing between the cursor and iterator APIs, you should note a few things that you can do with the iterator API that you cannot do with cursor API:

Similarly, keep some general recommendations in mind when making your choice:

When to Use StAX

StAX's bidirectional features, small memory footprint, and low processor requirements give it an advantage over APIs such as JAXB or DOM. StAX is particularly effective in extracting a small set of information from a large document. The primary drawback in using StAX is that you get a narrow view of the document - essentially you have to know what processing you will do before reading the XML document. Another drawback is that StAX is difficult to use if you return XML documents that follow complex schema.

Professional hosting     Belorussian informational portal         Free SCWCD 1.4 Study Guide     Free SCDJWS 1.4 Study Guide     SCDJWS 1.4 Quiz     Free IBM Certified Associate Developer Study Guide     IBM Test 000-287. Enterprise Application Development with IBM WebSphere Studio, V5.0 Study Guide     SCDJWS 5.0 Quiz