Posted to commons-dev@ws.apache.org by ve...@apache.org on 2009/07/26 15:41:30 UTC

svn commit: r797928 - /webservices/commons/trunk/modules/axiom/src/docbkx/devguide.xml

Author: veithen
Date: Sun Jul 26 13:41:30 2009
New Revision: 797928

URL: http://svn.apache.org/viewvc?rev=797928&view=rev
Log:
Added some StAX related information to the dev guide.

Modified:
    webservices/commons/trunk/modules/axiom/src/docbkx/devguide.xml

Modified: webservices/commons/trunk/modules/axiom/src/docbkx/devguide.xml
URL: http://svn.apache.org/viewvc/webservices/commons/trunk/modules/axiom/src/docbkx/devguide.xml?rev=797928&r1=797927&r2=797928&view=diff
==============================================================================
--- webservices/commons/trunk/modules/axiom/src/docbkx/devguide.xml (original)
+++ webservices/commons/trunk/modules/axiom/src/docbkx/devguide.xml Sun Jul 26 13:41:30 2009
@@ -85,4 +85,230 @@
             </variablelist>
         </section>
     </chapter>
+    
+    <chapter>
+        <title>The StAX specification</title>
+        <para>
+            The StAX specification comprises two parts: a specification document titled <quote>Streaming API
+            For XML JSR-173 Specification</quote> and a Javadoc describing the API. Both can be downloaded from the
+            <ulink url="http://jcp.org/en/jsr/detail?id=173">JSR-173 page</ulink>. Since StAX is part of Java 6,
+            the Javadocs can also be viewed
+            <ulink url="http://java.sun.com/javase/6/docs/api/javax/xml/stream/package-summary.html">online</ulink>. 
+        </para>
+        <section>
+            <title>Semantics of the <methodname>setPrefix</methodname> method</title>
+            <para>
+                One of the more obscure parts of the StAX specification is the meaning of the
+                <methodname>setPrefix</methodname><footnote><para>For simplicity, we only discuss
+                <methodname>setPrefix</methodname> here. The same remarks also apply to
+                <methodname>setDefaultNamespace</methodname>.</para></footnote> method defined by <classname>XMLStreamWriter</classname>.
+                To understand how this method works, it is necessary to look at different parts of the specification:
+            </para>
+            <itemizedlist>
+                <listitem>
+                    <para>
+                        The Javadoc of the <methodname>setPrefix</methodname> method.
+                    </para>
+                </listitem>
+                <listitem>
+                    <para>
+                        The table shown in the Javadoc of the <classname>XMLStreamWriter</classname> class
+                        in Java 6<footnote><para>This table is not included in the Javadoc in the original StAX
+                        specification.</para></footnote>.
+                    </para>
+                </listitem>
+                <listitem>
+                    <para>
+                        Section 5.2.2, <quote>Binding Prefixes</quote> of the specification.
+                    </para>
+                </listitem>
+                <listitem>
+                    <para>
+                        The example shown in section 5.3.2, <quote>XMLStreamWriter</quote> of the specification.
+                    </para>
+                </listitem>
+            </itemizedlist>
+            <para>
+                In addition, it is important to note the following facts:
+            </para>
+            <itemizedlist>
+                <listitem>
+                    <para>
+                        The terms <firstterm>defaulting prefixes</firstterm> used in section 5.2.2 of the
+                        specification and <firstterm>namespace repairing</firstterm> used in the Javadocs
+                        of <classname>XMLStreamWriter</classname> are synonyms.
+                    </para>
+                </listitem>
+                <listitem>
+                    <para>
+                        The methods writing namespace qualified information items, i.e.
+                        <methodname>writeStartElement</methodname>, <methodname>writeEmptyElement</methodname>
+                        and <methodname>writeAttribute</methodname> all come in two variants: one that
+                        takes a namespace URI and a prefix as arguments and one that only takes a
+                        namespace URI, but no prefix.
+                    </para>
+                </listitem>
+            </itemizedlist>
+            <para>
+                The purpose of the <methodname>setPrefix</methodname> method is simply to define the prefixes that
+                will be used by the variants of the <methodname>writeStartElement</methodname>,
+                <methodname>writeEmptyElement</methodname> and <methodname>writeAttribute</methodname> methods
+                that only take a namespace URI (and the local name). This becomes clear by looking at the
+                table in the <classname>XMLStreamWriter</classname> Javadoc. Note that a call to
+                <methodname>setPrefix</methodname> doesn't cause any output and it is still necessary
+                to use <methodname>writeNamespace</methodname> to actually write the necessary
+                namespace declarations. Otherwise the produced document will not be well formed with
+                respect to namespaces.
+            </para>
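+            <para>
+                As a minimal sketch of this point, the following code binds the prefix with
+                <methodname>setPrefix</methodname> and also writes the declaration with
+                <methodname>writeNamespace</methodname>, so that elements written with the
+                URI-only variants can be resolved in the output. The class name
+                <classname>SetPrefixDemo</classname> and its methods are hypothetical, and the exact
+                serialization may differ between StAX implementations:
+            </para>

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class, not part of StAX or Axiom.
public class SetPrefixDemo {
    public static String write() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("root");
        // Tell the writer which prefix to use for urn:ns1; this writes nothing.
        writer.setPrefix("p", "urn:ns1");
        // Actually emit the xmlns:p declaration on the root element.
        writer.writeNamespace("p", "urn:ns1");
        // URI-only variants: the writer looks up the prefix bound above.
        writer.writeEmptyElement("urn:ns1", "element1");
        writer.writeEmptyElement("urn:ns1", "element2");
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write());
    }
}
```

+            <para>
+                Removing the call to <methodname>writeNamespace</methodname> from this sketch leaves
+                the prefixed element names in place but drops the declaration they depend on.
+            </para>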
+            <para>
+                The Javadoc of the <methodname>setPrefix</methodname> method also clearly defines the scope
+                of the prefix bindings defined using that method: a prefix bound using
+                <methodname>setPrefix</methodname> remains valid until the invocation of
+                <methodname>writeEndElement</methodname> corresponding to the last invocation of
+                <methodname>writeStartElement</methodname>. While not explicitly mentioned in the
+                specification, it is clear that a prefix binding may be masked by another binding
+                for the same prefix defined in a nested element.
+            </para>
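+            <para>
+                The scoping rule can be observed through <methodname>getPrefix</methodname>, which
+                reports the prefix currently bound to a namespace URI. The sketch below (the class
+                <classname>PrefixScopeDemo</classname> is hypothetical) records the lookups at several
+                points in the stream. Under the rules described above, one would expect the binding
+                made in the nested element to disappear once that element is closed, and the outer
+                binding to be visible again:
+            </para>

```java
import java.io.StringWriter;
import java.util.Arrays;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class illustrating the scope of setPrefix bindings.
public class PrefixScopeDemo {
    // Returns the prefixes reported by the writer at several points in the stream.
    public static String[] probe() throws XMLStreamException {
        XMLStreamWriter writer =
            XMLOutputFactory.newInstance().createXMLStreamWriter(new StringWriter());
        writer.writeStartElement("root");
        writer.setPrefix("p", "urn:ns1");              // scoped to root
        String inRoot = writer.getPrefix("urn:ns1");

        writer.writeStartElement("child");
        writer.setPrefix("p", "urn:ns2");              // masks the outer binding of p
        String inChild = writer.getPrefix("urn:ns2");
        writer.writeEndElement();                      // ends the scope of the inner binding

        String ns1AfterChild = writer.getPrefix("urn:ns1");
        String ns2AfterChild = writer.getPrefix("urn:ns2");
        writer.writeEndElement();
        writer.close();
        return new String[] { inRoot, inChild, ns1AfterChild, ns2AfterChild };
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(Arrays.toString(probe()));
    }
}
```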
+            <para>
+                An aspect that may cause confusion is the fact that in the example shown in section
+                5.3.2 of the specification, the calls to <methodname>setPrefix</methodname> (and
+                <methodname>setDefaultNamespace</methodname>) all appear immediately before a
+                call to <methodname>writeStartElement</methodname> or <methodname>writeEmptyElement</methodname>.
+                This may lead people to incorrectly believe that a prefix binding defined using
+                <methodname>setPrefix</methodname> only applies to the next element
+                written<footnote><para>Another factor that contributes to the confusion is that in SAX,
+                prefix mappings are always generated before the corresponding <methodname>startElement</methodname>
+                event and that their scope ends with the corresponding <methodname>endElement</methodname>
+                event. This is so because the <classname>ContentHandler</classname> interface specifies that
+                <quote>all <methodname>startPrefixMapping</methodname> events will occur immediately before the
+                corresponding <methodname>startElement</methodname> event, and all <methodname>endPrefixMapping</methodname>
+                events will occur immediately after the corresponding <methodname>endElement</methodname>
+                event</quote>.</para></footnote>.
+                This interpretation is clearly in contradiction with the <methodname>setPrefix</methodname>
+                Javadoc, unless one assumes that <quote>the current START_ELEMENT / END_ELEMENT pair</quote>
+                means the element opened by a call to <methodname>writeStartElement</methodname> immediately following
+                the call to <methodname>setPrefix</methodname>. This, however, would be a rather arbitrary
+                reading of the Javadoc.
+            </para>
+            <para>
+                The correctness of the comments in the previous paragraph can be checked using the following
+                code snippet:
+            </para>
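+<programlisting>// Requires javax.xml.stream.XMLOutputFactory and javax.xml.stream.XMLStreamWriter.
+XMLOutputFactory f = XMLOutputFactory.newInstance();
+XMLStreamWriter writer = f.createXMLStreamWriter(System.out);
+writer.writeStartElement("root");
+// Bind the prefix in the scope of root; no namespace declaration is written.
+writer.setPrefix("p", "urn:ns1");
+// Both elements use the URI-only variant and rely on the binding above.
+writer.writeEmptyElement("urn:ns1", "element1");
+writer.writeEmptyElement("urn:ns1", "element2");
+writer.writeEndElement();
+writer.flush();
+writer.close();</programlisting>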
+            <para>
+                This produces the following output<footnote><para>This has been tested with
+                Woodstox 3.2.9, SJSXP 1.0.1 and version 1.2.0 of the reference
+                implementation.</para></footnote>:
+            </para>
+<screen><![CDATA[<root><p:element1/><p:element2/></root>]]></screen>
+            <para>
+                Since the code doesn't call <methodname>writeNamespace</methodname>, the output is obviously not
+                well formed with respect to namespaces, but it also clearly shows that the scope of the
+                prefix binding for <literal>p</literal> extends to the end of the
+                <sgmltag class="element">root</sgmltag> element and is not limited to
+                <sgmltag class="element">element1</sgmltag>.
+            </para>
+            <para>
+                To avoid unexpected results and keep the code maintainable, it is in general advisable to keep
+                the calls to <methodname>setPrefix</methodname> and <methodname>writeNamespace</methodname> aligned,
+                i.e. to make sure that the scope (in <classname>XMLStreamWriter</classname>) of the prefix binding
+                defined by <methodname>setPrefix</methodname> is compatible with the scope (in the produced
+                document) of the namespace declaration written by the corresponding call
+                to <methodname>writeNamespace</methodname>. This makes it necessary to write code like this:
+            </para>
+<programlisting>writer.writeStartElement("p", "element1", "urn:ns1");
+writer.setPrefix("p", "urn:ns1");
+writer.writeNamespace("p", "urn:ns1");</programlisting>
+            <para>
+                As can be seen from this code snippet, keeping the two scopes in sync makes it necessary to use
+                the <methodname>writeStartElement</methodname> variant which takes an explicit prefix. Note that
+                this somewhat conflicts with the purpose of the <methodname>setPrefix</methodname> method;
+                one may consider this a flaw in the design of the StAX API.
+            </para>
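+            <para>
+                One way to avoid repeating these three calls is to wrap them in a small helper. The
+                following is only a sketch: the class <classname>NamespaceAwareWriting</classname> and its
+                methods are hypothetical and are not part of StAX or Axiom.
+            </para>

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of StAX or Axiom.
public final class NamespaceAwareWriting {
    private NamespaceAwareWriting() {}

    // Starts an element and declares its namespace on that same element, so that
    // the writer's prefix binding and the written declaration share one scope.
    public static void startElementWithDeclaration(XMLStreamWriter writer,
            String prefix, String localName, String namespaceURI) throws XMLStreamException {
        writer.writeStartElement(prefix, localName, namespaceURI);
        writer.setPrefix(prefix, namespaceURI);
        writer.writeNamespace(prefix, namespaceURI);
    }

    // Small demonstration: the nested element uses the URI-only variant and
    // relies on the binding established by the helper.
    public static String demo() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        startElementWithDeclaration(writer, "p", "element1", "urn:ns1");
        writer.writeEmptyElement("urn:ns1", "child");
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(demo());
    }
}
```

+            <para>
+                Because the binding is made after <methodname>writeStartElement</methodname>, it ends
+                together with the element that carries the declaration.
+            </para>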
+        </section>
+        <section>
+            <title>The three <classname>XMLStreamWriter</classname> usage patterns</title>
+            <para>
+                Drawing on the conclusions of the previous section and taking into account that
+                <classname>XMLStreamWriter</classname> also has a <quote>namespace repairing</quote>
+                mode, one can see that there are in fact three different ways to use
+                <classname>XMLStreamWriter</classname>. These usage patterns correspond to the
+                three bullets in section 5.2.2 of the StAX specification<footnote><para>The content
+                of this section is largely based on a <ulink url="http://markmail.org/message/olsdl3p3gciqqeob">reply
+                posted by Tatu Saloranta on the Axiom mailing list</ulink>. Tatu is the main developer of the
+                Woodstox project.</para></footnote>:
+            </para>
+            <orderedlist>
+                <listitem>
+                    <para>
+                        In the <quote>namespace repairing</quote> mode (enabled by the
+                        <varname>javax.xml.stream.isRepairingNamespaces</varname> property), the writer
+                        takes care of all namespace bindings and declarations, with minimal help from
+                        the calling code. This will always produce output that is well-formed with respect
+                        to namespaces. On the other hand, this adds some overhead and the result may
+                        depend on the particular StAX implementation (though the result produced by
+                        different implementations will be equivalent).
+                    </para>
+                    <para>
+                        In repairing mode the calling code should avoid writing namespaces explicitly
+                        and leave that job to the writer. There is also no need to call
+                        <methodname>setPrefix</methodname>, except to suggest a preferred prefix for
+                        a namespace URI. All variants of <methodname>writeStartElement</methodname>,
+                        <methodname>writeEmptyElement</methodname> and <methodname>writeAttribute</methodname>
+                        may be used in this mode, but the implementation can choose whatever prefix mapping
+                        it wants, as long as the output results in proper URI mapping for elements and
+                        attributes.
+                    </para>
+                </listitem>
+                <listitem>
+                    <para>
+                        Only use the variants of the writer methods that take an explicit prefix together
+                        with the namespace URI. In this usage pattern, <methodname>setPrefix</methodname>
+                        is not used at all and it is the responsibility of the calling code to keep
+                        track of prefix bindings.
+                    </para>
+                    <para>
+                        Note that this approach is difficult to implement when different parts of the output document
+                        will be produced by different components (or even different libraries). Indeed, when
+                        passing the <classname>XMLStreamWriter</classname> from one method or component
+                        to the other, it will also be necessary to pass additional information about the
+                        prefix mappings in scope at that moment, unless it is acceptable to let the
+                        called method write (potentially redundant) namespace declarations for all namespaces
+                        it uses.
+                    </para>
+                </listitem>
+                <listitem>
+                    <para>
+                        Use <methodname>setPrefix</methodname> to keep track of prefix bindings and make sure that
+                        the bindings are in sync with the namespace declarations that have been written,
+                        i.e. always use <methodname>setPrefix</methodname> immediately before or immediately
+                        after each call to <methodname>writeNamespace</methodname>. Note that the code is
+                        still free to use all variants of <methodname>writeStartElement</methodname>,
+                        <methodname>writeEmptyElement</methodname> and <methodname>writeAttribute</methodname>;
+                        it only needs to make sure that the usage it makes of these methods is consistent with
+                        the prefix bindings in scope.
+                    </para>
+                    <para>
+                        The advantage of this approach is that it makes it possible to write modular code: when a
+                        method receives an <classname>XMLStreamWriter</classname> object (to write
+                        part of the document), it can use
+                        the namespace context of that writer (i.e. <methodname>getPrefix</methodname>
+                        and <methodname>getNamespaceContext</methodname>) to determine which namespace
+                        declarations are currently in scope in the output document and to avoid
+                        redundant or conflicting namespace declarations. Note that in order to do so,
+                        such code will have to check for an existing prefix binding before starting
+                        to use a namespace.
+                    </para>
+                </listitem>
+            </orderedlist>
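+            <para>
+                The third pattern can be sketched as follows. The class
+                <classname>ModularWriterDemo</classname>, the method <methodname>writeItem</methodname>
+                and the prefixes <literal>p</literal> and <literal>q</literal> are assumptions made for
+                illustration only. The component checks for an existing binding before it starts using
+                a namespace and writes a declaration only when none is in scope:
+            </para>

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical component that writes part of a document using the third pattern.
public class ModularWriterDemo {
    static final String NS = "urn:ns1";

    // Writes an element in NS, reusing a prefix already in scope if the caller
    // has bound one, and declaring the namespace on the element otherwise.
    static void writeItem(XMLStreamWriter writer, String localName) throws XMLStreamException {
        String prefix = writer.getPrefix(NS);
        if (prefix == null) {
            // No binding in scope: declare one here and keep setPrefix and
            // writeNamespace in sync. The prefix "p" is an arbitrary choice;
            // real code would also have to make sure that it is not already
            // bound to a different URI.
            writer.writeStartElement("p", localName, NS);
            writer.setPrefix("p", NS);
            writer.writeNamespace("p", NS);
        } else {
            // A binding is in scope: no new declaration is needed.
            writer.writeStartElement(prefix, localName, NS);
        }
        writer.writeEndElement();
    }

    public static String write() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("root");
        writeItem(writer, "first");    // nothing bound yet: declares the namespace locally
        writer.writeStartElement("q", "group", NS);
        writer.setPrefix("q", NS);
        writer.writeNamespace("q", NS);
        writeItem(writer, "second");   // reuses the prefix bound by the caller
        writer.writeEndElement();
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write());
    }
}
```

+            <para>
+                In this sketch the second call to <methodname>writeItem</methodname> writes no
+                namespace declaration of its own, because the caller has already bound a prefix to
+                the namespace and written the matching declaration.
+            </para>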
+        </section>
+    </chapter>
 </book>
\ No newline at end of file