Posted to commits@xalan.apache.org by sh...@apache.org on 2014/05/16 18:11:35 UTC
svn commit: r1595253 [14/18] - in /xalan/java/branches/WebSite: ./ xalan-j/
xalan-j/design/ xalan-j/design/resources/ xalan-j/resources/ xalan-j/xsltc/
xalan-j/xsltc/resources/
Added: xalan/java/branches/WebSite/xalan-j/xsltc/xsl_whitespace_design.html
URL: http://svn.apache.org/viewvc/xalan/java/branches/WebSite/xalan-j/xsltc/xsl_whitespace_design.html?rev=1595253&view=auto
==============================================================================
--- xalan/java/branches/WebSite/xalan-j/xsltc/xsl_whitespace_design.html (added)
+++ xalan/java/branches/WebSite/xalan-j/xsltc/xsl_whitespace_design.html Fri May 16 16:11:33 2014
@@ -0,0 +1,471 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html>
+<head>
+<title>ASF: <xsl:strip/preserve-space></title>
+<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
+<meta http-equiv="Content-Style-Type" content="text/css" />
+<link rel="stylesheet" type="text/css" href="resources/apache-xalan.css" />
+</head>
+<!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ -->
+<body>
+<div id="title">
+<table class="HdrTitle">
+<tbody>
+<tr>
+<th rowspan="2">
+<a href="../index.html">
+<img alt="Trademark Logo" src="resources/XalanJ-Logo-tm.png" width="190" height="90" />
+</a>
+</th>
+<th text-align="center" width="75%">
+<a href="index.html">XSLTC Design</a>
+</th>
+</tr>
+<tr>
+<td valign="middle"><xsl:strip/preserve-space></td>
+</tr>
+</tbody>
+</table>
+<table class="HdrButtons" align="center" border="1">
+<tbody>
+<tr>
+<td>
+<a href="http://www.apache.org">Apache Foundation</a>
+</td>
+<td>
+<a href="http://xalan.apache.org">Xalan Project</a>
+</td>
+<td>
+<a href="http://xerces.apache.org">Xerces Project</a>
+</td>
+<td>
+<a href="http://www.w3.org/TR">Web Consortium</a>
+</td>
+<td>
+<a href="http://www.oasis-open.org/standards">Oasis Open</a>
+</td>
+</tr>
+</tbody>
+</table>
+</div>
+<div id="navLeft">
+<ul>
+<li>
+<a href="index.html">Overview</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_compiler.html">Compiler design</a>
+</li></ul><hr /><ul>
+<li>Whitespace<br />
+</li>
+<li>
+<a href="xsl_sort_design.html">xsl:sort</a>
+</li>
+<li>
+<a href="xsl_key_design.html">Keys</a>
+</li>
+<li>
+<a href="xsl_comment_design.html">Comment design</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsl_lang_design.html">lang()</a>
+</li>
+<li>
+<a href="xsl_unparsed_design.html">Unparsed entities</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsl_if_design.html">If design</a>
+</li>
+<li>
+<a href="xsl_choose_design.html">Choose|When|Otherwise design</a>
+</li>
+<li>
+<a href="xsl_include_design.html">Include|Import design</a>
+</li>
+<li>
+<a href="xsl_variable_design.html">Variable|Param design</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_runtime.html">Runtime</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_dom.html">Internal DOM</a>
+</li>
+<li>
+<a href="xsltc_namespace.html">Namespaces</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_trax.html">Translet & TrAX</a>
+</li>
+<li>
+<a href="xsltc_predicates.html">XPath Predicates</a>
+</li>
+<li>
+<a href="xsltc_iterators.html">Xsltc Iterators</a>
+</li>
+<li>
+<a href="xsltc_native_api.html">Xsltc Native API</a>
+</li>
+<li>
+<a href="xsltc_trax_api.html">Xsltc TrAX API</a>
+</li>
+<li>
+<a href="xsltc_performance.html">Performance Hints</a>
+</li>
+</ul>
+</div>
+<div id="content">
+<h2><xsl:strip/preserve-space></h2>
+
+ <ul>
+ <li>
+<a href="#functionality">Functionality</a>
+</li>
+ <li>
+<a href="#identify">Identifying strippable whitespace nodes</a>
+</li>
+ <li>
+<a href="#which">Determining which nodes to strip</a>
+</li>
+ <li>
+<a href="#strip">Stripping nodes</a>
+</li>
+ <li>
+<a href="#filter">Filtering whitespace nodes</a>
+</li>
+ </ul>
+
+ <a name="functionality"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Functionality</h3>
+
+ <p>The <code><xsl:strip-space></code> and <code><xsl:preserve-space></code>
+ elements control how whitespace nodes in the source XML document are
+ handled. They have no effect on whitespace in the XSLT stylesheet itself.
+ Both can occur only as top-level elements, possibly more than once, and
+ both are always empty.</p>
+
+ <p>Both elements take a single attribute, "elements", which contains a
+ whitespace-separated list of the element names whose whitespace nodes should
+ be stripped from, or preserved in, the source document. Each name takes one
+ of these three forms (NameTest format):</p>
+
+ <ul>
+ <li>
+ All whitespace nodes:
+ <code>elements="*"</code>
+ </li>
+ <li>
+ All whitespace nodes with a namespace:
+ <code>elements="<namespace>:*"</code>
+ </li>
+ <li>
+ Specific whitespace nodes: <code>elements="<qname>"</code>
+ </li>
+ </ul>
+
+ <a name="identify"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Identifying strippable whitespace nodes</h3>
+
+ <p>The DOM detects all text nodes and assigns them the type <code>TEXT</code>.
+ All text nodes are scanned to detect whitespace-only nodes. A text node is
+ considered a whitespace node only if it consists entirely of characters from
+ the set { 0x09, 0x0a, 0x0d, 0x20 }. The DOM implementation class has a static
+ method that tests a single character against this set:</p>
+
+<blockquote class="source">
+<pre>
+ private static final boolean isWhitespaceChar(char c) {
+     return c == 0x20 || c == 0x0A || c == 0x0D || c == 0x09;
+ }
+</pre>
+</blockquote>
+
+ <p>The characters are tested in the order in which they are most likely to
+ occur.</p>
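+
+ <p>As a minimal sketch of how that per-character test could be applied to a
+ whole text node, the hypothetical helper below scans a slice of a character
+ buffer. The class name <code>WhitespaceScan</code>, the method
+ <code>isWhitespaceOnly()</code> and the buffer layout are assumptions made
+ for illustration; only the character test itself is taken from the snippet
+ above.</p>
+
```java
// Hypothetical helper; only isWhitespaceChar() mirrors the method shown above.
final class WhitespaceScan {
    private WhitespaceScan() {}

    // Same test as in the design text: space, LF, CR, tab.
    static boolean isWhitespaceChar(char c) {
        return c == 0x20 || c == 0x0A || c == 0x0D || c == 0x09;
    }

    // A text node qualifies only if every character in its slice passes the test.
    // Assumption: an empty slice is treated as whitespace-only.
    static boolean isWhitespaceOnly(char[] text, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            if (!isWhitespaceChar(text[i])) {
                return false; // first non-whitespace character decides
            }
        }
        return true;
    }
}
```
+
+ <p>The loop stops at the first non-whitespace character, so ordinary text
+ nodes are usually rejected after a single comparison.</p>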
+
+ <p> The DOM has a bit-array that is used to tag text-nodes as strippable
+ whitespace nodes:</p>
+
+ <blockquote class="source">
+<pre>private int[] _whitespace;</pre>
+</blockquote>
+
+ <p>There are two methods in the DOM implementation class for accessing this
+ bit-array: <code>markWhitespace(node)</code> and <code>isWhitespace(node)</code>.
+ The array is resized together with all other arrays in the DOM by the
+ <code>DOM.resizeArrays()</code> method. The bits in the array are set in the
+ <code>DOM.maybeCreateTextNode()</code> method. This method must know whether
+ the current node is located under an element with an
+ <code>xml:space="<value>"</code> attribute in the DOM, in which
+ case it is not a strippable whitespace node.</p>
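+
+ <p>The bit-array idea can be sketched as follows. This is not the DOM
+ implementation itself: the class name <code>WhitespaceBits</code>, the
+ <code>resize()</code> method and the layout of 32 node flags per
+ <code>int</code> are assumptions; only the field name
+ <code>_whitespace</code> and the two accessor names come from the text
+ above.</p>
+
```java
import java.util.Arrays;

// Hypothetical stand-in for the DOM's whitespace bit-array.
final class WhitespaceBits {
    private int[] _whitespace;

    WhitespaceBits(int nodeCount) {
        // One bit per node, 32 nodes per int (assumed layout).
        _whitespace = new int[(nodeCount + 31) >>> 5];
    }

    // Tag a text node as a strippable whitespace node.
    void markWhitespace(int node) {
        _whitespace[node >>> 5] |= 1 << (node & 31);
    }

    // Test whether a node has been tagged.
    boolean isWhitespace(int node) {
        return (_whitespace[node >>> 5] & (1 << (node & 31))) != 0;
    }

    // Grow the array when the DOM grows; bits already set are kept.
    void resize(int newNodeCount) {
        _whitespace = Arrays.copyOf(_whitespace, (newNodeCount + 31) >>> 5);
    }
}
```
+
+ <p>Packing the flags this way costs one bit per node, which matters when
+ the same array has to be resized along with every other per-node array in
+ the DOM.</p>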
+
+ <p>An auxiliary class, WhitespaceHandler, is used for this purpose. The class
+ works somewhat like a stack: you "push" a new strip/preserve setting
+ together with the node in which this setting was determined. Every time
+ the DOM builder encounters an <code>xml:space</code> attribute,
+ it invokes a method on an instance of the WhitespaceHandler class to
+ signal that a new preserve/strip setting has been encountered. This is done
+ in the <code>makeAttributeNode()</code> method. The whitespace handler stores the
+ new setting and pushes the current element node on its stack. When the
+ DOM builder closes an element (in <code>endElement()</code>), it invokes
+ another method of the whitespace handler to check whether the strip/preserve
+ setting is still valid. If the setting is no longer valid (we are closing the
+ element whose node id is on the top of the stack), the handler inverts the
+ setting and pops the element node id off the stack. The previous
+ strip/preserve setting is then valid again, and the id of the node where that
+ setting was defined is on the top of the stack.</p>
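+
+ <p>A rough sketch of such a stack-based handler is given below. The class
+ and method names (<code>XmlSpaceTracker</code>, <code>xmlSpace()</code>,
+ <code>preserveSpace()</code>) are hypothetical, and the sketch assumes that
+ a node is pushed only when the setting actually changes, which is what
+ makes "invert on pop" restore the previous setting.</p>
+
```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical tracker for xml:space settings while a DOM is being built.
final class XmlSpaceTracker {
    // Node ids of the elements at which the setting changed.
    private final Deque<Integer> nodes = new ArrayDeque<>();
    // Assumption: outside any xml:space scope, whitespace is not forced to be preserved.
    private boolean preserve = false;

    // Called when an xml:space attribute is seen on element `node`.
    void xmlSpace(int node, String value) {
        boolean wanted = "preserve".equals(value);
        if (wanted != preserve) {   // only a change needs to be remembered
            preserve = wanted;
            nodes.push(node);
        }
    }

    // Called when element `node` is closed.
    void endElement(int node) {
        if (!nodes.isEmpty() && nodes.peek() == node) {
            nodes.pop();
            preserve = !preserve;   // restore the setting in force outside this element
        }
    }

    boolean preserveSpace() {
        return preserve;
    }
}
```
+
+ <p>Because only changes are recorded, the stack stays shallow even for
+ deeply nested documents that repeat the same <code>xml:space</code>
+ value.</p>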
+
+ <a name="which"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Determining which nodes to strip</h3>
+
+ <p>A text node is never stripped unless it contains only whitespace
+ characters (Unicode characters 0x09, 0x0A, 0x0D and 0x20). Stripping a text
+ node means that the node disappears from the DOM: it is never
+ included in the output and it is ignored by all functions such as
+ <code>count()</code>. A text node is preserved if any of the following apply:</p>
+
+ <ul>
+ <li>
+ the element name of the parent of the text node is in the set of
+ elements listed in <code><xsl:preserve-space></code>
+ </li>
+ <li>
+ the text node contains at least one non-whitespace character
+ </li>
+ <li>
+ an ancestor of the whitespace text node has an attribute of
+ <code>xml:space="preserve"</code>, and no closer ancestor has an
+ attribute of <code>xml:space="default"</code>.
+ </li>
+ </ul>
+
+ <p>Otherwise, the text node is stripped. Initially the set of
+ whitespace-preserving element names contains all element names, so the
+ default behaviour is to preserve all whitespace text nodes.</p>
+
+ <p>This seems simple enough, but resolving conflicts between matching
+ <code><xsl:strip-space></code> and <code><xsl:preserve-space></code>
+ elements requires a lot of thought. Our first consideration is import
+ precedence; the match with the highest import precedence is always chosen.
+ Import precedence is determined by the order in which the compared elements
+ are visited. (In this case those elements are the top-level
+ <code><xsl:strip-space></code> and <code><xsl:preserve-space></code>
+ elements.) This example is taken from the XSLT recommendation:</p>
+
+ <ul>
+ <li>stylesheet A imports stylesheets B and C in that order;</li>
+ <li>stylesheet B imports stylesheet D;</li>
+ <li>stylesheet C imports stylesheet E.</li>
+ </ul>
+
+ <p>Then the order of import precedence (lowest first) is D, B, E, C, A.</p>
+
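+ <p>Our next consideration is the priority of the NameTests (the NameTest
+ syntax comes from the XPath specification, the default priorities from the
+ XSLT recommendation):</p>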
+ <ul>
+ <li>
+ <code>elements="<qname>"</code> has priority 0
+ </li>
+ <li>
+ <code>elements="<namespace>:*"</code> has priority -0.25
+ </li>
+ <li>
+ <code>elements="*"</code> has priority -0.5
+ </li>
+ </ul>
+
+ <p>It is considered an error if the decision is still ambiguous after this,
+ and it is up to the implementors to decide what the appropriate action is.</p>
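+
+ <p>The two-step resolution described above (import precedence first, then
+ NameTest priority) can be sketched as follows. The classes
+ <code>SpaceRule</code> and <code>SpaceRuleResolver</code> are hypothetical,
+ prefixes are compared as plain text rather than as namespace URIs, and on a
+ remaining tie the sketch simply keeps the later rule, which is only one
+ possible choice of "appropriate action".</p>
+
```java
import java.util.List;

// Hypothetical model of one name listed in an elements="..." attribute.
final class SpaceRule {
    final String nameTest;       // "*", "prefix:*" or a QName
    final boolean strip;         // true for xsl:strip-space, false for xsl:preserve-space
    final int importPrecedence;  // higher value wins

    SpaceRule(String nameTest, boolean strip, int importPrecedence) {
        this.nameTest = nameTest;
        this.strip = strip;
        this.importPrecedence = importPrecedence;
    }

    // Default priorities as listed in the text above.
    double priority() {
        if (nameTest.equals("*")) return -0.5;
        if (nameTest.endsWith(":*")) return -0.25;
        return 0.0;
    }

    // Simplification: prefixes are compared as text, not as namespace URIs.
    boolean matches(String qname) {
        if (nameTest.equals("*")) return true;
        if (nameTest.endsWith(":*")) {
            return qname.startsWith(nameTest.substring(0, nameTest.length() - 1));
        }
        return nameTest.equals(qname);
    }
}

final class SpaceRuleResolver {
    private SpaceRuleResolver() {}

    // Decides whether whitespace-only text under the given parent element is stripped.
    static boolean shouldStrip(List<SpaceRule> rules, String parentQName) {
        SpaceRule best = null;
        for (SpaceRule rule : rules) {
            if (!rule.matches(parentQName)) continue;
            if (best == null
                    || rule.importPrecedence > best.importPrecedence
                    || (rule.importPrecedence == best.importPrecedence
                        && rule.priority() >= best.priority())) {
                best = rule; // on a full tie the later rule is kept (assumed recovery)
            }
        }
        // No matching rule: the default is to preserve.
        return best != null && best.strip;
    }
}
```
+
+ <p>Checking import precedence before priority means that a broad
+ <code>*</code> rule in an importing stylesheet overrides a specific name in
+ an imported one.</p>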
+
+ <p>Despite all this complexity, the normal usage of these elements is quite
+ simple: either preserve all whitespace nodes but one type:</p>
+
+ <blockquote class="source">
+<pre><xsl:strip-space elements="foo"/></pre>
+</blockquote>
+
+ <p>or strip all whitespace nodes but one type:</p>
+
+ <blockquote class="source">
+<pre>
+ <xsl:strip-space elements="*"/>
+ <xsl:preserve-space elements="foo"/></pre>
+</blockquote>
+
+ <a name="strip"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Stripping nodes</h3>
+
+ <p>The ultimate goal of our design would be to totally screen all stripped
+ nodes from the translet; to either physically remove them from the DOM or to
+ make it appear as if they are not there. The first approach will cause
+ problems in cases where multiple translets access the same DOM. In the future
+ we wish to let translets run within servlets / JSPs with a common DOM cache.
+ This DOM cache will keep copies of DOMs in memory to prevent the same XML
+ file from being downloaded and parsed several times. This is a scenario we
+ might see:</p>
+
+ <p>
+<img src="DOMInterface.gif" alt="DOMInterface.gif" />
+</p>
+ <p>
+<b>
+<i>Figure 1: Multiple translets accessing a common pool of DOMs</i>
+</b>
+</p>
+
+ <p>The three translets running on this host access a common pool of 4 DOMs.
+ The DOMs are accessed through a common DOM interface. Translets accessing
+ a single DOM will have a DOMAdapter and a single DOMImpl object behind this
+ interface, while translets accessing several DOMs will be given a MultiDOM
+ and a set of DOMImpl objects.</p>
+
+ <p>The translet to the left may want to strip some nodes from the shared DOM
+ in the cache, while the other translets may want to preserve all whitespace
+ nodes. Our initial thought then is to keep the DOM as it is and somehow
+ shield the left-hand translet from all the whitespace nodes it does not want to
+ process. There are a few ways in which we can accomplish this:</p>
+
+ <ul>
+ <li>
+ The translet can, prior to starting to traverse the DOM, send a reference
+ to the tables containing information on which nodes we want stripped to
+ the DOM interface. The DOM interface is then responsible for hiding all
+ stripped whitespace nodes from the iterators and the translet. A problem
+ with this approach is that we want to omit the DOM interface layer if
+ the translet is only accessing a single DOM. The DOM interface layer will
+ only be instantiated by the translet if the stylesheet contains a call
+ to the <code>document()</code> function.<br />
+<br />
+ </li>
+ <li>
+ The translet can provide its iterators with information on which nodes it
+ does not want to see. The translet is still shielded from unwanted
+ whitespace nodes, but it has the hassle of passing extra information
+ to most iterators it ever instantiates. Note that not all iterators
+ need to be aware of whitespace nodes in this case. If you have a look at
+ the figure again you will see that only the first-level iterator (that is,
+ the one closest to the DOM or DOM interface) will have to strip off
+ whitespace nodes. However, there may be several iterators that operate
+ directly on the DOM (invoked by code handling XSL functions such as
+ <code>count()</code>) and every single one of those will need to be told
+ which whitespace nodes the translet does not want to see.<br />
+<br />
+ </li>
+ <li>
+ The third approach will take advantage of the fact that not all
+ translets will want to strip whitespace nodes. The most effective way of
+ removing unwanted whitespace nodes is to do it once and for all, before
+ the actual traversal of the DOM starts. This can be done by making a
+ clone of the DOM with exclusive-access rights for this translet only. We
+ still gain performance from the cache because we do not have to pay the
+ cost of the delay caused by downloading and parsing the XML source file.
+ The cost we have to pay is the time needed for the actual cloning and the
+ extra memory we use.<br />
+<br />
+ Normally one would imagine the translet (or the wrapper class that
+ invokes the translet) calls the DOM cache with just a URL and receives
+ a reference to an instantiated DOM. The cache will either have built
+ this DOM on demand or just passed back a reference to an existing tree.
+ In this case the DOM cache would need an extra call that a translet would use
+ to clone a DOM, passing the existing DOM reference to the cache and
+ receiving a new reference to the cloned DOM. The translet can then do
+ whatever it wants with this DOM (the cache need not even keep a reference
+ to this tree).
+ </li>
+ </ul>
+
+ <p>We are lucky enough to be able to combine the first two approaches. All
+ iterators that directly access the DOM (axis iterators) are instantiated by
+ calls to the DOM interface layer (the DOM class). The actual iterators are
+ created in the DOM implementation layer (the DOMImpl class). So, we can pass
+ references to the preserve/strip whitespace tables to the DOM, and the DOM
+ will make sure that all axis iterators return node sets with respect to these
+ tables.</p>
+ <a name="filter"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Filtering whitespace nodes</h3>
+
+ <p>For each axis iterator and for <code>DOM.makeStringValue()</code> and
+ <code>DOM.stringValueAux()</code> we must apply a filter for eliminating all
+ unwanted whitespace nodes. To achieve this we must build a very efficient
+ predicate for determining if the current node should be stripped or not. This
+ predicate is built by <code>Whitespace.compilePredicate()</code>. This method is
+ static and builds a predicate for a vector of WhitespaceRule objects. (The
+ WhitespaceRule class is defined within the Whitespace class.) Each
+ WhitespaceRule object contains information for a single element listed
+ in an <code><xsl:strip/preserve-space></code> element, and is broken down
+ into the following information:</p>
+
+ <ul>
+ <li>the namespace (can be the default namespace)</li>
+ <li>the element name or "<code>*</code>"</li>
+ <li>the type of rule; NS:EL, NS:<code>*</code> or <code>*</code>
+</li>
+ <li>the priority of the rule (based on import precedence and type)</li>
+ <li>the action; either strip or preserve</li>
+ </ul>
+
+ <p>The Vector of WhitespaceRules is arranged in order of priority and
+ redundant rules are removed. A predicate method is then compiled into the
+ translet as:</p>
+
+<blockquote class="source">
+<pre>
+ public boolean stripSpace(int node);
+</pre>
+</blockquote>
+
+ <p>Unfortunately this method cannot be declared static.</p>
+
+ <p>When the Stylesheet object compiles the <code>topLevel()</code> method of the
+ translet it checks for the existence of the <code>stripSpace()</code> method. If
+ this method exists, <code>topLevel()</code> will be compiled to pass the
+ translet to the DOM as a StripWhitespaceFilter (the translet implements this
+ interface when the <code>stripSpace()</code> method is compiled).</p>
+
+ <p>All axis iterators and the <code>DOM.makeStringValue()</code> and
+ <code>DOM.stringValueAux()</code> methods check for the existence of this filter
+ (it is kept in a global variable in the DOM implementation class) and take
+ the appropriate action. The methods in the DOM for returning axis iterators
+ will place a StrippingIterator on top of the axis iterator if the filter is
+ present, and the two methods just mentioned will return empty strings for
+ whitespace nodes that should be stripped.</p>
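+
+ <p>The idea of placing a stripping iterator on top of an axis iterator can
+ be sketched as a plain decorator. This is an illustration only: it uses
+ <code>java.util.Iterator</code> as a stand-in for XSLTC's own iterator
+ types, and the names <code>NodeStripFilter</code> and
+ <code>StrippingNodeIterator</code> are hypothetical. Only the
+ <code>stripSpace(int node)</code> signature is taken from the text
+ above.</p>
+
```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical single-method filter, modelled on stripSpace(int node).
interface NodeStripFilter {
    boolean stripSpace(int node);
}

// Wraps a source of node ids and silently skips every node the filter wants stripped.
final class StrippingNodeIterator implements Iterator<Integer> {
    private final Iterator<Integer> source;
    private final NodeStripFilter filter;
    private Integer lookahead; // next node to hand out, or null when exhausted

    StrippingNodeIterator(Iterator<Integer> source, NodeStripFilter filter) {
        this.source = source;
        this.filter = filter;
        advance();
    }

    // Move to the next node that survives the filter.
    private void advance() {
        lookahead = null;
        while (source.hasNext()) {
            Integer node = source.next();
            if (!filter.stripSpace(node)) {
                lookahead = node;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return lookahead != null;
    }

    @Override
    public Integer next() {
        if (lookahead == null) {
            throw new NoSuchElementException();
        }
        Integer result = lookahead;
        advance();
        return result;
    }
}
```
+
+ <p>With this arrangement the code consuming the iterator never sees a
+ stripped node, so functions such as <code>count()</code> need no
+ whitespace-specific logic of their own.</p>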
+
+
+<p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+</div>
+<div id="footer">Copyright © 1999-2014 The Apache Software Foundation<br />Apache, Xalan, and the Feather logo are trademarks of The Apache Software Foundation<div class="small">Web Page created on - Thu 2014-05-15</div>
+</div>
+</body>
+</html>
Propchange: xalan/java/branches/WebSite/xalan-j/xsltc/xsl_whitespace_design.html
------------------------------------------------------------------------------
svn:eol-style = native
Added: xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_compiler.html
URL: http://svn.apache.org/viewvc/xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_compiler.html?rev=1595253&view=auto
==============================================================================
--- xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_compiler.html (added)
+++ xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_compiler.html Fri May 16 16:11:33 2014
@@ -0,0 +1,560 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html>
+<head>
+<title>ASF: XSLTC Compiler Design</title>
+<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
+<meta http-equiv="Content-Style-Type" content="text/css" />
+<link rel="stylesheet" type="text/css" href="resources/apache-xalan.css" />
+</head>
+<!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ -->
+<body>
+<div id="title">
+<table class="HdrTitle">
+<tbody>
+<tr>
+<th rowspan="2">
+<a href="../index.html">
+<img alt="Trademark Logo" src="resources/XalanJ-Logo-tm.png" width="190" height="90" />
+</a>
+</th>
+<th text-align="center" width="75%">
+<a href="index.html">XSLTC Design</a>
+</th>
+</tr>
+<tr>
+<td valign="middle">XSLTC Compiler Design</td>
+</tr>
+</tbody>
+</table>
+<table class="HdrButtons" align="center" border="1">
+<tbody>
+<tr>
+<td>
+<a href="http://www.apache.org">Apache Foundation</a>
+</td>
+<td>
+<a href="http://xalan.apache.org">Xalan Project</a>
+</td>
+<td>
+<a href="http://xerces.apache.org">Xerces Project</a>
+</td>
+<td>
+<a href="http://www.w3.org/TR">Web Consortium</a>
+</td>
+<td>
+<a href="http://www.oasis-open.org/standards">Oasis Open</a>
+</td>
+</tr>
+</tbody>
+</table>
+</div>
+<div id="navLeft">
+<ul>
+<li>
+<a href="index.html">Overview</a>
+</li></ul><hr /><ul>
+<li>Compiler design<br />
+</li></ul><hr /><ul>
+<li>
+<a href="xsl_whitespace_design.html">Whitespace</a>
+</li>
+<li>
+<a href="xsl_sort_design.html">xsl:sort</a>
+</li>
+<li>
+<a href="xsl_key_design.html">Keys</a>
+</li>
+<li>
+<a href="xsl_comment_design.html">Comment design</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsl_lang_design.html">lang()</a>
+</li>
+<li>
+<a href="xsl_unparsed_design.html">Unparsed entities</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsl_if_design.html">If design</a>
+</li>
+<li>
+<a href="xsl_choose_design.html">Choose|When|Otherwise design</a>
+</li>
+<li>
+<a href="xsl_include_design.html">Include|Import design</a>
+</li>
+<li>
+<a href="xsl_variable_design.html">Variable|Param design</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_runtime.html">Runtime</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_dom.html">Internal DOM</a>
+</li>
+<li>
+<a href="xsltc_namespace.html">Namespaces</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_trax.html">Translet & TrAX</a>
+</li>
+<li>
+<a href="xsltc_predicates.html">XPath Predicates</a>
+</li>
+<li>
+<a href="xsltc_iterators.html">Xsltc Iterators</a>
+</li>
+<li>
+<a href="xsltc_native_api.html">Xsltc Native API</a>
+</li>
+<li>
+<a href="xsltc_trax_api.html">Xsltc TrAX API</a>
+</li>
+<li>
+<a href="xsltc_performance.html">Performance Hints</a>
+</li>
+</ul>
+</div>
+<div id="content">
+<h2>XSLTC Compiler Design</h2>
+
+ <ul>
+ <li>
+<a href="#overview">Compiler Overview</a>
+</li>
+ <li>
+<a href="#ast">Building the Abstract Syntax Tree</a>
+</li>
+ <li>
+<a href="#typecheck">Type-check and Cast Expressions</a>
+</li>
+ <li>
+<a href="#compile">JVM byte-code generation</a>
+</li>
+ </ul>
+
+
+
+ <a name="overview"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Compiler overview</h3>
+
+ <p>The main component of the XSLTC compiler is the class</p>
+ <ul>
+ <li>
+<code>org.apache.xalan.xsltc.compiler.XSLTC</code>
+</li>
+ </ul>
+
+ <p>This class uses three parsers to consume the input stylesheet(s):</p>
+
+ <ul>
+ <li>
+<code>javax.xml.parsers.SAXParser</code>
+</li>
+ </ul>
+
+ <p>is used to parse the stylesheet document and pass its contents to
+ the compiler as basic SAX2 events.</p>
+
+ <ul>
+ <li>
+<code>org.apache.xalan.xsltc.compiler.XPathParser</code>
+</li>
+ </ul>
+
+ <p> is a parser used to parse XPath expressions and patterns. This parser
+ is generated using JavaCUP and JavaLEX from Princeton University.</p>
+
+ <ul>
+ <li>
+<code>org.apache.xalan.xsltc.compiler.Parser</code>
+</li>
+ </ul>
+
+ <p>is a wrapper for the other two parsers. This parser is responsible for
+ using the other two parsers to build the compiler's abstract syntax tree
+ (which is described in more detail in the next section of this document).
+ </p>
+
+
+
+
+ <a name="ast"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Building an Abstract Syntax Tree</h3>
+
+ <p>An abstract syntax tree (AST) is a data structure commonly used by
+ compilers to separate the parse phase from the later phases of the
+ compilation. The AST has one node for each parsed token from the stylesheet
+ and can easily be traversed during the type-checking and bytecode-generation
+ stages.</p>
+
+ <ul>
+ <li>
+ <a href="#mapping">Mapping stylesheet elements to AST nodes</a>
+ </li>
+ <li>
+ <a href="#domxsl">Building the AST from AST nodes</a>
+ </li>
+ <li>
+ <a href="#mapping">Mapping XPath expressions and patterns to additional AST nodes</a>
+ </li>
+ </ul>
+
+ <p>The SAX parser passes the contents of the stylesheet to XSLTC's main
+ parser. The SAX events represent a decomposition of the XML document that
+ contains the stylesheet. The main parser needs to create one AST node from
+ each node that it receives from the SAX parser. It also needs to use the
+ XPath parser to decompose attributes that contain XPath expressions and
+ patterns. Remember that XSLT is in effect two languages: XML and XPath,
+ and one parser is needed for each of these languages. The SAX parser breaks
+ down the stylesheet document, the XPath parser breaks down XPath expressions
+ and patterns, and the main parser maps the decomposed elements into nodes
+ in the abstract syntax tree.</p>
+
+ <a name="mapping"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Mapping stylesheet elements to AST nodes</h4>
+
+ <p>Every element that is defined in the XSLT 1.0 spec is represented by
+ a class in the <code>org.apache.xalan.xsltc.compiler</code> package. The
+ main parser class contains a <code>Hashtable</code> that maps XSL
+ elements into Java classes that make up the nodes in the AST. These Java
+ classes all reside in the <code>org.apache.xalan.xsltc.compiler</code>
+ package and extend either the <code>TopLevelElement</code> or the
+ <code>Instruction</code> classes. (Both these classes extend the
+ <code>SyntaxTreeNode</code> class.)</p>
+
+ <p>The mapping from XSL element names to Java classes/AST nodes is set up
+ in the <code>initStdClasses()</code> method of the main parser:</p>
+<blockquote class="source">
+<pre>
+ private void initStdClasses() {
+     try {
+         initStdClass("template", "Template");
+         initStdClass("param", "Param");
+         initStdClass("with-param", "WithParam");
+         initStdClass("variable", "Variable");
+         initStdClass("output", "Output");
+         :
+         :
+         :
+     }
+     catch (ClassNotFoundException e) {
+         // error handling is not shown in the original
+     }
+ }
+
+ // Maps one XSL element name to the compiler class that represents it.
+ private void initStdClass(String elementName, String className)
+     throws ClassNotFoundException {
+     _classes.put(elementName,
+                  Class.forName(COMPILER_PACKAGE + '.' + className));
+ }</pre>
+</blockquote>
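+
+ <p>To show how such a name-to-class table can later be used to create AST
+ nodes, here is a small self-contained sketch. The class
+ <code>ElementClassRegistry</code> and its methods are hypothetical, and the
+ usage example registers a JDK class purely as a stand-in, since the real
+ compiler classes are not available outside XSLTC.</p>
+
```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry; mirrors the idea of the _classes table above.
final class ElementClassRegistry {
    private final String packageName;
    private final Map<String, Class<?>> classes = new HashMap<>();

    ElementClassRegistry(String packageName) {
        this.packageName = packageName;
    }

    // Resolve the class once, when the table is set up.
    void register(String elementName, String className) {
        try {
            classes.put(elementName, Class.forName(packageName + '.' + className));
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("No class for element " + elementName, e);
        }
    }

    // Returns a fresh node object for the element, or null if the element is unknown.
    Object newNode(String elementName) {
        Class<?> nodeClass = classes.get(elementName);
        if (nodeClass == null) {
            return null;
        }
        try {
            return nodeClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + nodeClass.getName(), e);
        }
    }
}
```
+
+ <p>For example, <code>new ElementClassRegistry("java.util")</code> followed
+ by <code>register("template", "ArrayList")</code> makes
+ <code>newNode("template")</code> return a new <code>ArrayList</code>; in the
+ compiler the package would be the compiler package and the class an AST
+ node class.</p>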
+
+
+
+ <a name="domxsl"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Building the AST from AST nodes</h4>
+ <p>The parser builds an AST from the various syntax tree nodes. Each node
+ contains a reference to its parent node, a vector containing references
+ to all child nodes and a structure containing all attribute nodes:</p>
+<blockquote class="source">
+<pre>
+ protected SyntaxTreeNode _parent; // Parent node
+ private Vector _contents; // Child nodes
+ protected Attributes _attributes; // Attributes of this element</pre>
+</blockquote>
+
+
+ <p>These variables should be accessed using these methods:</p>
+<blockquote class="source">
+<pre>
+ protected final SyntaxTreeNode getParent();
+ protected final Vector getContents();
+ protected String getAttribute(String qname);
+ protected Attributes getAttributes();</pre>
+</blockquote>
+
+ <p>At this time the AST only contains nodes that represent the XSL elements
+ from the stylesheet. A SAX parser is generic: it can only handle XML
+ and will not break up and identify XPath patterns/expressions. Each XSL
+ instruction gets its own node in the AST, and the XPath patterns/expressions
+ are stored as attributes of these nodes. A stylesheet looking like this:</p>
+<blockquote class="source">
+<pre>
+ <xsl:stylesheet .......>
+     <xsl:template match="chapter">
+         <xsl:text>Chapter</xsl:text>
+         <xsl:value-of select="."/>
+     </xsl:template>
+ </xsl:stylesheet></pre>
+</blockquote>
+
+ <p>will be stored in the AST as indicated in the following picture:</p>
+ <p>
+<img src="ast_stage1.gif" alt="ast_stage1.gif" />
+</p>
+ <p>
+<b>
+<i>Figure 1: The AST in its first stage</i>
+</b>
+</p>
+
+ <p>All objects that make up the nodes in the initial AST have a
+ <code>parseContents()</code> method. This method is responsible for:</p>
+
+ <ul>
+ <li>parsing the values of those attributes that contain XPath expressions
+ or patterns, breaking each expression/pattern into AST nodes and inserting
+ them into the tree.</li>
+ <li>reading/checking all other required attributes</li>
+ <li>propagating the <code>parseContents()</code> call down the tree</li>
+ </ul>
+
+
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Mapping XPath expressions and patterns to additional AST nodes</h4>
+
+ <p>The nodes that represent the XPath expressions and patterns extend
+ either the <code>Expression</code> or <code>Pattern</code> class
+ respectively. These nodes are not appended to the <code>_contents</code>
+ vector of each node, but rather stored as individual references in each
+ AST element node. One example is the <code>ForEach</code> class that
+ represents the <code><xsl:for-each></code> element. This class has
+ a variable that contains a reference to the AST sub-tree that represents
+ its <code>select</code> attribute:</p>
+<blockquote class="source">
+<pre>
+ private Expression _select;</pre>
+</blockquote>
+
+ <p>There is no standard way of storing these XPath expressions and each
+ AST node that contains one or more XPath expression/pattern must handle
+ that itself. This handling basically involves passing the attribute's
+ value to the XPath parser and receiving back an AST sub-tree.</p>
+
+ <p>With all XPath expressions/patterns expanded, the AST will look somewhat
+ like this:</p>
+
+ <p>
+<img src="ast_stage2.gif" alt="ast_stage2.gif" />
+</p>
+ <p>
+<b>
+<i>Figure 2: The AST in its second stage</i>
+</b>
+</p>
+
+
+
+
+
+
+ <a name="typecheck"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Type-check and Cast Expressions</h3>
+
+ <p>In many cases we will need to typecast the top node in the expression
+ sub-tree to suit the expected result-type of the expression, or to typecast
+ child nodes to suit the allowed types for the various operators in the
+ expression. This is done by calling <code>typeCheck()</code> on the root node in the
+ XSL tree. Each SyntaxTreeNode node is responsible for inserting type-cast
+ nodes between itself and its child nodes or XPath nodes. These type-cast
+ nodes will convert the output types of the child/XPath nodes to the expected
+ input type of the parent node. Let us look at our AST again and the node that
+ represents the <code><xsl:value-of></code> element. This element
+ expects to receive a string from its <code>select</code> XPath expression,
+ but the <code>Step</code> expression will return either a node-set or a
+ single node. An extra node is inserted into the AST to perform the
+ necessary type conversions:</p>
+
+ <p>
+<img src="ast_stage3.gif" alt="ast_stage3.gif" />
+</p>
+ <p>
+<b>
+<i>Figure 3: XPath expression type cast</i>
+</b>
+</p>
+
+ <p>The <code>typeCheck()</code> method of each SyntaxTreeNode object will
+ call <code>typeCheck()</code> on each of its XPath expressions. This method
+ will return the native type returned by the expression. The AST node will
+ insert an additional type-conversion node if the return-type does not match
+ the expected data-type. Each possible return type is represented by a class
+ in the <code>org.apache.xalan.xsltc.compiler.util</code> package. These
+ classes all contain methods that will generate bytecodes needed to perform
+ the actual type conversions (at runtime). The type-cast nodes in the AST
+ mainly consist of calls to these methods.</p>
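+ <p>The insertion of a type-cast node can be pictured with a minimal sketch.
+ The names <code>ExprType</code>, <code>Expr</code>, <code>StepExpr</code>,
+ <code>CastExpr</code> and <code>ValueOfNode</code> are hypothetical stand-ins
+ chosen for this example and are not the compiler's actual classes. The sketch
+ only models the decision to wrap an expression, not the bytecode generation:</p>

```java
// Hypothetical, simplified stand-ins for the compiler's AST classes.
enum ExprType { NODE_SET, STRING }

abstract class Expr {
    // Returns the native result type of this expression.
    abstract ExprType typeCheck();
}

// Stand-in for a location step, which yields a node-set.
class StepExpr extends Expr {
    ExprType typeCheck() { return ExprType.NODE_SET; }
}

// A cast node wraps an expression and reports the converted type.
class CastExpr extends Expr {
    final Expr inner;
    final ExprType target;

    CastExpr(Expr inner, ExprType target) {
        this.inner = inner;
        this.target = target;
    }

    ExprType typeCheck() { return target; }
}

// Stand-in for the node that represents xsl:value-of, which expects a string.
class ValueOfNode {
    Expr select;

    ValueOfNode(Expr select) { this.select = select; }

    void typeCheck() {
        ExprType actual = select.typeCheck();
        if (actual != ExprType.STRING) {
            // Insert a conversion node between this node and its expression.
            select = new CastExpr(select, ExprType.STRING);
        }
    }
}
```

+ <p>After <code>typeCheck()</code> has run, the parent only ever sees an
+ expression of the type it expects, which is the property the real type-cast
+ nodes provide.</p>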
+
+
+
+
+ <a name="compile"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>JVM byte-code generation</h3>
+
+ <ul>
+ <li>
+<a href="#stylesheet">Compiling the stylesheet</a>
+</li>
+ <li>
+<a href="#toplevel">Compiling top-level elements</a>
+</li>
+ <li>
+<a href="#templates">Compiling template code</a>
+</li>
+ <li>
+<a href="#instructions">Compiling instructions, functions, expressions and patterns</a>
+</li>
+ </ul>
+
+ <p>Every node in the AST extends the <code>SyntaxTreeNode</code> base class
+ and implements the <code>translate()</code> method. This method is
+ responsible for outputting the actual bytecodes that make up the
+ functionality required for each element, function, expression or pattern.
+ </p>
+
+ <a name="stylesheet"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Compiling the stylesheet</h4>
+ <p>Some nodes in the AST require more complex code than others. The best
+ example is the <code>&lt;xsl:stylesheet&gt;</code> element. The code that
+ represents this element has to tie together the code that is generated by
+ all the other elements and generate the actual class definition for the main
+ translet class. The <code>Stylesheet</code> class generates the translet's
+ constructor and methods that handle all top-level elements.</p>
+
+
+ <a name="toplevel"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Compiling top-level elements</h4>
+ <p>The bytecode that handles top-level elements must be generated before any
+ other code. The <code>translate()</code> methods in these classes are
+ mainly called from these methods in the Stylesheet class:</p>
+<blockquote class="source">
+<pre>
+ private String compileBuildKeys(ClassGenerator);
+ private String compileTopLevel(ClassGenerator, Enumeration);
+ private void compileConstructor(ClassGenerator, Output);</pre>
+</blockquote>
+
+ <p>These methods handle most top-level elements, such as global variables
+ and parameters, <code>&lt;xsl:output&gt;</code> and
+ <code>&lt;xsl:decimal-format&gt;</code> instructions.</p>
+
+
+ <a name="templates"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Compiling template code</h4>
+ <p>All XPath patterns in <code>&lt;xsl:template&gt;</code>
+ elements are converted into numeric values (known as the pattern's
+ kernel 'type'). All templates with identical pattern kernel types are
+ grouped together and inserted into a table known as a test sequence.
+ (The table of test sequences is found in the Mode class in the compiler
+ package. There will be one such table for each mode that is used in the
+ stylesheet). This table is used to build a big <code>switch()</code>
+ statement in the translet's <code>applyTemplates()</code> method. This
+ method is initially called with the root node of the input document.</p>
+
+ <p>The <code>applyTemplates()</code> method determines the node's type and
+ passes this type to the <code>switch()</code> statement to look up the
+ matching template. The test sequence code (the <code>TestSeq</code> class)
+ is responsible for inserting bytecodes to find one matching template
+ in cases where more than one template matches the current node type.</p>
+
+ <p>There may be several templates that share the same pattern kernel type.
+ Here are a few examples of templates with patterns that all have the same
+ kernel type:</p>
+<blockquote class="source">
+<pre>
+ &lt;xsl:template match="A/C"&gt;
+ &lt;xsl:template match="A/B/C"&gt;
+ &lt;xsl:template match="A | C"&gt;</pre>
+</blockquote>
+
+ <p>All these templates will be grouped under the type for
+ <code>&lt;C&gt;</code> and will all get the same kernel type (the type for
+ <code>"C"</code>). The last template will be grouped both under
+ <code>"C"</code> and <code>"A"</code>, since it matches either element.
+ If the type identifier for <code>"C"</code> in this case is 8, all these
+ templates will be put under <code>case 8:</code> in
+ <code>applyTemplates()</code>'s big <code>switch()</code> statement. The
+ <code>TestSeq</code> class will insert some code under the
+ <code>case 8:</code> statement (similar to if's and then's) in order to
+ determine which of the three templates to trigger.</p>
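+ <p>The grouping step can be illustrated with a small sketch. It is not the
+ code in the Mode or TestSeq classes; the class <code>KernelGrouping</code> and
+ its methods are hypothetical, it groups by element name rather than by
+ numeric type, and it assumes patterns that use only <code>/</code> and
+ <code>|</code>:</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class KernelGrouping {
    // Returns the last step of each alternative in a simple pattern,
    // e.g. "A/B/C" gives [C] and "A | C" gives [A, C].
    static List<String> kernels(String pattern) {
        List<String> result = new ArrayList<>();
        for (String alternative : pattern.split("\\|")) {
            String[] steps = alternative.trim().split("/");
            result.add(steps[steps.length - 1].trim());
        }
        return result;
    }

    // Groups patterns under every kernel name they can match. Each group
    // corresponds to one case of the switch() described above.
    static Map<String, List<String>> group(List<String> patterns) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String pattern : patterns) {
            for (String kernel : kernels(pattern)) {
                groups.computeIfAbsent(kernel, k -> new ArrayList<>()).add(pattern);
            }
        }
        return groups;
    }
}
```

+ <p>For the three example patterns, the group for <code>"C"</code> holds all
+ three templates and the group for <code>"A"</code> holds only the last one.
+ Choosing among the templates inside one group is the part left to the test
+ sequence code.</p>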
+
+
+ <a name="instructions"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Compiling instructions, functions, expressions and patterns</h4>
+
+ <p>The template code is generated by calling <code>translate()</code> on
+ each <code>Template</code> object in the abstract syntax tree. This call
+ will be propagated down the abstract syntax tree and every element will
+ output the bytecodes necessary to complete its task.</p>
+
+ <p>The Java Virtual Machine is stack-based, which goes hand-in-hand with
+ the tree structure of a stylesheet and the AST. A node in the AST will
+ call <code>translate()</code> on its child nodes and any XPath nodes before
+ it generates its own bytecodes. In that way the correct sequence of JVM
+ instructions is generated. Each one of the child nodes is responsible for
+ creating code that leaves the node's output value (if any) on the stack.
+ The typical procedure for the parent node is to create JVM code that
+ consumes these values off the stack and then leaves its own output on the
+ stack (for its parent).</p>
+
+ <p>The tree-structure of the stylesheet is in this way closely tied with
+ the stack-based JVM. The design does not offer any obvious way of extending
+ the compiler to output code for other non-stack-based VMs or processors.</p>
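+ <p>The children-first order of <code>translate()</code> calls can be sketched
+ with a tiny arithmetic tree. The classes <code>Node</code>,
+ <code>Constant</code> and <code>Add</code> are hypothetical, and the emitted
+ strings are illustrative mnemonics rather than real generated bytecode:</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-AST used only to show the order of code generation.
abstract class Node {
    final List<Node> children = new ArrayList<>();

    // Emits code that leaves this node's value on the operand stack.
    abstract void translate(List<String> out);
}

class Constant extends Node {
    final int value;

    Constant(int value) { this.value = value; }

    void translate(List<String> out) {
        out.add("push " + value);
    }
}

class Add extends Node {
    Add(Node left, Node right) {
        children.add(left);
        children.add(right);
    }

    void translate(List<String> out) {
        // Children first: each one leaves a single value on the stack.
        for (Node child : children) {
            child.translate(out);
        }
        // Then this node consumes both values and pushes the sum.
        out.add("iadd");
    }
}
```

+ <p>Translating <code>1 + (2 + 3)</code> this way emits the three pushes in
+ source order followed by two additions, which is exactly the order a stack
+ machine needs.</p>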
+
+
+
+
+<p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+</div>
+<div id="footer">Copyright © 1999-2014 The Apache Software Foundation<br />Apache, Xalan, and the Feather logo are trademarks of The Apache Software Foundation<div class="small">Web Page created on - Thu 2014-05-15</div>
+</div>
+</body>
+</html>
Propchange: xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_compiler.html
------------------------------------------------------------------------------
svn:eol-style = native
Added: xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_dom.html
URL: http://svn.apache.org/viewvc/xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_dom.html?rev=1595253&view=auto
==============================================================================
--- xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_dom.html (added)
+++ xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_dom.html Fri May 16 16:11:33 2014
@@ -0,0 +1,748 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html>
+<head>
+<title>ASF: XSLTC Internal DOM</title>
+<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
+<meta http-equiv="Content-Style-Type" content="text/css" />
+<link rel="stylesheet" type="text/css" href="resources/apache-xalan.css" />
+</head>
+<!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ -->
+<body>
+<div id="title">
+<table class="HdrTitle">
+<tbody>
+<tr>
+<th rowspan="2">
+<a href="../index.html">
+<img alt="Trademark Logo" src="resources/XalanJ-Logo-tm.png" width="190" height="90" />
+</a>
+</th>
+<th text-align="center" width="75%">
+<a href="index.html">XSLTC Design</a>
+</th>
+</tr>
+<tr>
+<td valign="middle">XSLTC Internal DOM</td>
+</tr>
+</tbody>
+</table>
+<table class="HdrButtons" align="center" border="1">
+<tbody>
+<tr>
+<td>
+<a href="http://www.apache.org">Apache Foundation</a>
+</td>
+<td>
+<a href="http://xalan.apache.org">Xalan Project</a>
+</td>
+<td>
+<a href="http://xerces.apache.org">Xerces Project</a>
+</td>
+<td>
+<a href="http://www.w3.org/TR">Web Consortium</a>
+</td>
+<td>
+<a href="http://www.oasis-open.org/standards">Oasis Open</a>
+</td>
+</tr>
+</tbody>
+</table>
+</div>
+<div id="navLeft">
+<ul>
+<li>
+<a href="index.html">Overview</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_compiler.html">Compiler design</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsl_whitespace_design.html">Whitespace</a>
+</li>
+<li>
+<a href="xsl_sort_design.html">xsl:sort</a>
+</li>
+<li>
+<a href="xsl_key_design.html">Keys</a>
+</li>
+<li>
+<a href="xsl_comment_design.html">Comment design</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsl_lang_design.html">lang()</a>
+</li>
+<li>
+<a href="xsl_unparsed_design.html">Unparsed entities</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsl_if_design.html">If design</a>
+</li>
+<li>
+<a href="xsl_choose_design.html">Choose|When|Otherwise design</a>
+</li>
+<li>
+<a href="xsl_include_design.html">Include|Import design</a>
+</li>
+<li>
+<a href="xsl_variable_design.html">Variable|Param design</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_runtime.html">Runtime</a>
+</li></ul><hr /><ul>
+<li>Internal DOM<br />
+</li>
+<li>
+<a href="xsltc_namespace.html">Namespaces</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_trax.html">Translet &amp; TrAX</a>
+</li>
+<li>
+<a href="xsltc_predicates.html">XPath Predicates</a>
+</li>
+<li>
+<a href="xsltc_iterators.html">Xsltc Iterators</a>
+</li>
+<li>
+<a href="xsltc_native_api.html">Xsltc Native API</a>
+</li>
+<li>
+<a href="xsltc_trax_api.html">Xsltc TrAX API</a>
+</li>
+<li>
+<a href="xsltc_performance.html">Performance Hints</a>
+</li>
+</ul>
+</div>
+<div id="content">
+<h2>XSLTC Internal DOM</h2>
+
+ <ul>
+ <li>
+<a href="#functionality">General functionality</a>
+</li>
+ <li>
+<a href="#components">Components of the internal DOM</a>
+</li>
+ <li>
+<a href="#structure">Internal structure</a>
+</li>
+ <li>
+<a href="#navigation">Tree navigation</a>
+</li>
+ <li>
+<a href="#namespaces">Namespaces</a>
+</li>
+ <li>
+<a href="#w3c">W3C DOM2 navigation support</a>
+</li>
+ <li>
+<a href="#adapter">The DOM adapter - DOMAdapter</a>
+</li>
+ <li>
+<a href="#multiplexer">The DOM multiplexer - MultiDOM</a>
+</li>
+ <li>
+<a href="#builder">The DOM builder - DOMImpl$DOMBuilder</a>
+</li>
+ </ul>
+
+ <a name="functionality"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>General functionality</h3>
+ <p>The internal DOM gives the translet access to the XML document(s) it has
+ to transform. The translet accesses the internal DOM through the interface
+ specified in DOM.java. There is also an interface for DOM caches,
+ specified in DOMCache.java.</p>
+
+ <a name="components"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Components of the internal DOM</h3>
+
+ <p>The DOM interface is implemented by the first three classes listed below;
+ the fourth, DocumentCache, implements the DOMCache interface instead:</p>
+ <ul>
+ <li>
+<b>org.apache.xalan.xsltc.dom.DOMImpl</b>
+<br />
+<br />
+ This is the main DOM class. An instance of this class contains the nodes
+ of a <b>single</b> XML document.<br />
+<br />
+ </li>
+ <li>
+<b>org.apache.xalan.xsltc.dom.MultiDOM</b>
+<br />
+<br />
+ This class is best described as a DOM multiplexer. XSLTC was initially
+ designed to operate on a single XML document, and the initial DOM and
+ the DOM interface were designed and implemented without the
+ <code>document()</code> function in mind. This class will allow a translet
+ to access multiple DOMs through the original DOM interface.<br />
+<br />
+ </li>
+ <li>
+<b>org.apache.xalan.xsltc.dom.DOMAdapter</b>
+<br />
+<br />
+ The DOM adapter is a mediator between a DOMImpl or a MultiDOM object and
+ a single translet. A DOMAdapter object contains mappings and reverse
+ mappings between node types in the DOM(s) and node types in the translet.
+ This mediator is needed to allow several translets access to a single DOM.
+ <br />
+<br />
+ </li>
+ <li>
+<b>org.apache.xalan.xsltc.dom.DocumentCache</b>
+<br />
+<br />
+ A sample DOM cache (implementing DOMCache) that is used with our sample
+ transformation applications.
+ </li>
+ </ul>
+
+ <p>
+<img src="DOMInterface.gif" alt="DOMInterface.gif" />
+</p>
+ <p>
+<b>
+<i>Figure 1: Main components of the internal DOM</i>
+</b>
+</p>
+
+ <p>The figure above shows how several translets can access one or more
+ internal DOMs from a shared pool of cached DOMs. A translet can also access a
+ DOM tree outside of a cache. The Stylesheet class that represents the XSL
+ stylesheet to compile contains a flag that indicates if the translet uses the
+ <code>document()</code> function. The code compiled into the translet will act
+ accordingly and instantiate a MultiDOM object if needed (this code is compiled
+ in the compiler's <code>Stylesheet.compileTransform()</code> method).</p>
+
+ <a name="structure"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Internal Structure</h3>
+ <ul>
+ <li>
+<a href="#node-id">Node identification</a>
+</li>
+ <li>
+<a href="#element-nodes">Element nodes</a>
+</li>
+ <li>
+<a href="#attribute-nodes">Attribute nodes</a>
+</li>
+ <li>
+<a href="#text-nodes">Text nodes</a>
+</li>
+ <li>
+<a href="#comment-nodes">Comment nodes</a>
+</li>
+ <li>
+<a href="#pi">Processing instructions</a></li>
+ </ul>
+ <a name="node-id"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Node identification</h4>
+
+ <p>Each node in the DOM is represented by an integer. This integer is an
+ index into a series of arrays that describe the node. Most important is
+ the <code>_type[]</code> array, which holds the (DOM internal) node type. There
+ are some general node types that are described in the DOM.java interface:</p>
+
+<blockquote class="source">
+<pre>
+ public final static int ROOT = 0;
+ public final static int TEXT = 1;
+ public final static int UNUSED = 2;
+ public final static int ELEMENT = 3;
+ public final static int ATTRIBUTE = 4;
+ public final static int PROCESSING_INSTRUCTION = 5;
+ public final static int COMMENT = 6;
+ public final static int NTYPES = 7;
+</pre>
+</blockquote>
+
+ <p>Element and attribute nodes will be assigned types based on their expanded
+ QNames. The <code>_type[]</code> array is used for this:</p>
+
+<blockquote class="source">
+<pre>
+ int type = _type[node]; // get node type
+</pre>
+</blockquote>
+
+ <p>The node type can be used to look up the element/attribute name in the
+ element/attribute name array <code>_namesArray[]</code>:</p>
+
+<blockquote class="source">
+<pre>
+ String name = _namesArray[type-NTYPES]; // get node element name
+</pre>
+</blockquote>
+
+ <p>The resulting string contains the full, expanded QName of the element or
+ attribute. Retrieving the namespace URI of an element/attribute is done in a
+ very similar fashion:</p>
+
+<blockquote class="source">
+<pre>
+ int nstype = _namespace[type-NTYPES]; // get namespace type
+ String namespace = _nsNamesArray[nstype]; // get node namespace name
+</pre>
+</blockquote>
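+ <p>The three lookups above can be put together in a short, self-contained
+ sketch. The class <code>NameLookupSketch</code>, the table contents and the
+ exact string form of the expanded QName are assumptions made for the example;
+ only the array roles and the <code>NTYPES</code> offset come from the text:</p>

```java
class NameLookupSketch {
    static final int NTYPES = 7;  // as declared in DOM.java

    // Hypothetical table contents for a small example document.
    // Node 0 is the root; nodes 1 and 3 share a name, node 2 has another.
    static final int[] type = { 0, 7, 8, 7 };                      // node -> type
    static final String[] namesArray = { "http://example.org/ns:doc", "para" };
    static final int[] namespace = { 1, 0 };                       // name -> namespace type
    static final String[] nsNamesArray = { "", "http://example.org/ns" };

    // Valid only for element/attribute nodes, whose type is NTYPES or higher.
    static String name(int node) {
        return namesArray[type[node] - NTYPES];
    }

    // Valid only for element/attribute nodes, whose type is NTYPES or higher.
    static String namespaceUri(int node) {
        return nsNamesArray[namespace[type[node] - NTYPES]];
    }
}
```

+ <p>Because names are stored once per type rather than once per node, nodes 1
+ and 3 in this example resolve to the same array entries.</p>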
+ <a name="element-nodes"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Element nodes</h4>
+
+ <p>The contents of an element node (child nodes) can be identified using
+ the <code>_offsetOrChild[]</code> and <code>_nextSibling[]</code> arrays. The
+ <code>_offsetOrChild[]</code> array will give you the first child of an element
+ node:</p>
+
+<blockquote class="source">
+<pre>
+ int child = _offsetOrChild[node]; // first child
+ child = _nextSibling[child]; // next child
+</pre>
+</blockquote>
+
+ <p>The last child will have a "<code>_nextSibling[]</code>" of 0 (zero).
+ This value is OK since the root node (the 0 node) will not be a child of
+ any element.</p>
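+ <p>Walking all children of an element then becomes a simple loop that stops
+ at the 0 marker. The sketch below uses a hypothetical class
+ <code>ChildWalkSketch</code> and a hand-built tree that contains only element
+ nodes, so that <code>offsetOrChild</code> never has to act as a text
+ offset:</p>

```java
import java.util.ArrayList;
import java.util.List;

class ChildWalkSketch {
    // Hypothetical element-only tree: root(0) -> a(1) -> { b(2), c(3) }.
    // Index 0 is the root and doubles as the "no node" marker.
    static final int[] offsetOrChild = { 1, 2, 0, 0 };
    static final int[] nextSibling   = { 0, 0, 3, 0 };

    // Collects the children of 'node' in document order.
    static List<Integer> children(int node) {
        List<Integer> result = new ArrayList<>();
        for (int child = offsetOrChild[node]; child != 0; child = nextSibling[child]) {
            result.add(child);
        }
        return result;
    }
}
```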
+
+ <a name="attribute-nodes"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Attribute nodes</h4>
+
+ <p>The first attribute node of an element is found by a lookup in the
+ <code>_lengthOrAttr[]</code> array using the node index:</p>
+
+<blockquote class="source">
+<pre>
+ int attribute = _lengthOrAttr[node]; // first attribute
+ attribute = _nextSibling[attribute]; // next attribute
+</pre>
+</blockquote>
+
+ <p>The names of attributes are contained in the <code>_namesArray[]</code> just
+ like the names of element nodes. The values of attributes are stored the same
+ way as text nodes:</p>
+
+<blockquote class="source">
+<pre>
+ int offset = _offsetOrChild[attribute]; // offset into character array
+ int length = _lengthOrAttr[attribute]; // length of attribute value
+ String value = new String(_text, offset, length);
+</pre>
+</blockquote>
+ <a name="text-nodes"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Text nodes</h4>
+
+ <p>Text nodes are stored identically to attribute values. See the previous
+ section on <a href="#attribute-nodes">attribute nodes</a>.</p>
+ <a name="comment-nodes"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Comment nodes</h4>
+
+ <p>The internal DOM currently does <b>not</b> contain comment nodes. Yes, I
+ am quite aware that the DOM has a type assigned to comment nodes, but comments
+ are still not inserted into the DOM.</p>
+ <a name="pi"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Processing instructions</h4>
+
+ <p>Processing instructions are handled as text nodes. These nodes are stored
+ identically to attribute values. See the previous section on
+ <a href="#attribute-nodes">attribute nodes</a>.</p>
+
+ <a name="navigation"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Tree navigation</h3>
+
+ <p>The DOM implementation contains a series of iterators that implement the
+ XPath axes. All these iterators implement the NodeIterator interface and
+ extend the NodeIteratorBase base class. These iterators do the job of
+ navigating the tree using the <code>_offsetOrChild[]</code>, <code>_nextSibling[]</code>
+ and <code>_parent[]</code> arrays. All iterators that handle XPath axes are
+ implemented as private inner classes of DOMImpl. The translet uses a handful
+ of methods to instantiate these iterators:</p>
+
+<blockquote class="source">
+<pre>
+ public NodeIterator getIterator();
+ public NodeIterator getChildren(final int node);
+ public NodeIterator getTypedChildren(final int type);
+ public NodeIterator getAxisIterator(final int axis);
+ public NodeIterator getTypedAxisIterator(final int axis, final int type);
+ public NodeIterator getNthDescendant(int node, int n);
+ public NodeIterator getNamespaceAxisIterator(final int axis, final int ns);
+ public NodeIterator orderNodes(NodeIterator source, int node);
+</pre>
+</blockquote>
+
+ <p>There are a few iterators in addition to these, such as sorting/ordering
+ iterators and filtering iterators. These iterators are implemented in
+ separate classes and can be instantiated directly by the translet.</p>
+
+ <a name="namespaces"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Namespaces</h3>
+
+ <p>Namespace support was added to the internal DOM at a late stage, and the
+ design and implementation of the DOM bears a few scars because of this.
+ There is a separate <a href="xsltc_namespace.html">design
+ document</a> that covers namespaces.</p>
+
+ <a name="w3c"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>W3C DOM2 navigation support</h3>
+
+ <p>The DOM has a few methods that give basic W3C-type DOM navigation. These
+ methods are:</p>
+
+<blockquote class="source">
+<pre>
+ public Node makeNode(int index);
+ public Node makeNode(NodeIterator iter);
+ public NodeList makeNodeList(int index);
+ public NodeList makeNodeList(NodeIterator iter);
+</pre>
+</blockquote>
+
+ <p>These methods return instances of inner classes of the DOM that implement
+ the W3C Node and NodeList interfaces.</p>
+
+ <a name="adapter"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>The DOM adapter - DOMAdapter</h3>
+ <ul>
+ <li>
+<a href="#translet-dom">Translet/DOM type mapping</a>
+</li>
+ <li>
+<a href="#whitespace">Whitespace text-node stripping</a>
+</li>
+ <li>
+<a href="#method-mapping">Method mapping</a>
+</li>
+ </ul>
+ <a name="translet-dom"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Translet/DOM type mapping</h4>
+
+ <p>The DOMAdapter class performs the mappings between DOM and translet node
+ types, and vice versa. These mappings are necessary in order for the translet
+ to correctly identify an element/attribute in the DOM and for the DOM to
+ correctly identify the element/attribute type of a typed iterator requested
+ by the translet. Note that the DOMAdapter also maps translet namespace types
+ to DOM namespace types, and vice versa.</p>
+
+ <p>The DOMAdapter class has four global tables that hold the translet/DOM
+ type and namespace-type mappings. If the DOM knows an element as type
+ 19, the DOMAdapter will translate this to some other integer using the
+ <code>_mapping[]</code> array:</p>
+
+<blockquote class="source">
+<pre>
+ int transletType = _mapping[domType];
+</pre>
+</blockquote>
+
+ <p>This action will be performed when the DOM is asked what type a specific
+ node is. The reverse is done when the translet wants an iterator for a specific
+ node type. The DOMAdapter must translate the translet-type to the type used
+ internally in the DOM by looking up the <code>_reverse[]</code> array:</p>
+
+<blockquote class="source">
+<pre>
+ int domType = _reverse[transletType];
+</pre>
+</blockquote>
+
+ <p>There are two additional mapping tables: <code>_NSmapping[]</code> and
+ <code>_NSreverse[]</code> that do the same for namespace types.</p>
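+ <p>One way to picture how such a pair of tables could be derived is to match
+ the names known to each side. The sketch below is an assumption about the
+ construction, not the DOMAdapter's actual code: the class
+ <code>TypeMappingSketch</code>, the method <code>buildTable()</code> and the
+ fallback values are hypothetical. Only <code>ELEMENT</code> and
+ <code>NTYPES</code> are taken from the constants listed earlier:</p>

```java
import java.util.HashMap;
import java.util.Map;

class TypeMappingSketch {
    static final int ELEMENT = 3;  // general element type from DOM.java
    static final int NTYPES = 7;   // first type number available for names

    // Builds a table that converts a type number on the 'from' side into the
    // type number the 'to' side uses for the same name. Names missing on the
    // 'to' side get 'fallback' (a choice made for this sketch).
    static int[] buildTable(String[] from, String[] to, int fallback) {
        Map<String, Integer> toTypes = new HashMap<>();
        for (int i = 0; i < to.length; i++) {
            toTypes.put(to[i], i + NTYPES);
        }
        int[] table = new int[from.length + NTYPES];
        for (int t = 0; t < NTYPES; t++) {
            table[t] = t;  // general node types are the same on both sides
        }
        for (int i = 0; i < from.length; i++) {
            table[i + NTYPES] = toTypes.getOrDefault(from[i], fallback);
        }
        return table;
    }
}
```

+ <p>Calling <code>buildTable(domNames, transletNames, ELEMENT)</code> would
+ play the role of <code>_mapping[]</code>, and swapping the first two
+ arguments would play the role of <code>_reverse[]</code>. The same idea
+ applies to the namespace tables.</p>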
+ <a name="whitespace"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Whitespace text-node stripping</h4>
+
+ <p>The DOMAdapter class has the additional function of stripping whitespace
+ nodes in the DOM. This functionality had to be put in the DOMAdapter, as
+ different translets will have different preferences for node stripping.</p>
+ <a name="method-mapping"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Method mapping</h4>
+
+ <p>The DOMAdapter class implements the same <code>DOM</code> interface as the
+ DOMImpl class. A DOMAdapter object will look like a DOMImpl tree, but the
+ translet can access it directly without being concerned with type mapping
+ and whitespace stripping. The <code>getTypedChildren()</code> method demonstrates very
+ well how this is done:</p>
+
+<blockquote class="source">
+<pre>
+ public NodeIterator getTypedChildren(int type) {
+     // Get the DOM type for the requested typed iterator
+     final int domType = _reverse[type];
+     // Now get the typed child iterator from the DOMImpl object
+     NodeIterator iterator = _domImpl.getTypedChildren(domType);
+     // Wrap the iterator in a WS stripping iterator if child-nodes are text nodes
+     if ((domType == DOM.TEXT) &amp;&amp; (_filter != null))
+         iterator = _domImpl.strippingIterator(iterator, _mapping, _filter);
+     return iterator;
+ }
+</pre>
+</blockquote>
+
+ <a name="multiplexer"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>The DOM multiplexer - MultiDOM</h3>
+
+ <p>The DOM multiplexer class is only used when the compiled stylesheet uses
+ the <code>document()</code> function. An instance of the MultiDOM class also
+ implements the DOM interface, so that it can be accessed in the same way
+ as a DOMAdapter object.</p>
+
+ <p>A node in the DOM is identified by an integer. The upper 8 bits of this
+ integer are used to identify the DOM to which the node belongs, while the
+ lower 24 bits are used to identify the node within the DOM:</p>
+ <table border="1">
+ <tr>
+ <td class="content" rowspan="1" colspan="1">31-24</td>
+ <td class="content" rowspan="1" colspan="1">23-16</td>
+ <td class="content" rowspan="1" colspan="1">15-8</td>
+ <td class="content" rowspan="1" colspan="1">7-0</td>
+ </tr>
+ <tr>
+ <td class="content" rowspan="1" colspan="1">DOM id</td>
+ <td class="content" rowspan="1" colspan="3">node id</td>
+ </tr>
+ </table>
+
+ <p>The DOM multiplexer has an array of DOMAdapter objects. The topmost 8
+ bits of the identifier are used to find the correct DOM from the array. Then
+ the lower 24 bits are used in calls to methods in the DOMAdapter object:</p>
+
+<blockquote class="source">
+<pre>
+ public int getParent(int node) {
+ return _adapters[node &gt;&gt;&gt; 24].getParent(node &amp; 0x00ffffff) | (node &amp; 0xff000000);
+ }
+</pre>
+</blockquote>
+
+ <p>Note that the node identifier returned by this method has the same upper 8
+ bits as the input node. This is why we <code>OR</code> the result from
+ <code>DOMAdapter.getParent()</code> with the top 8 bits of the input node.</p>
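+ <p>The bit layout can be exercised in isolation. The helper class
+ <code>NodeIdSketch</code> and its method names are hypothetical; the masks and
+ the 8/24 split are the ones described above:</p>

```java
class NodeIdSketch {
    static final int NODE_MASK = 0x00ffffff;  // lower 24 bits: node within a DOM
    static final int DOM_MASK  = 0xff000000;  // upper 8 bits: which DOM

    // Combines a DOM id (0-255) and a node id (0 to 2^24-1) into one int.
    static int pack(int domId, int nodeId) {
        return (domId << 24) | (nodeId & NODE_MASK);
    }

    // Unsigned shift, so DOM ids of 128 and above do not come out negative.
    static int domOf(int node) {
        return node >>> 24;
    }

    static int nodeOf(int node) {
        return node & NODE_MASK;
    }

    // Re-attaches the DOM id of 'original' to a node id local to that DOM.
    static int withSameDom(int original, int localNode) {
        return localNode | (original & DOM_MASK);
    }
}
```

+ <p>The unsigned shift matters: with a signed shift, any DOM id of 128 or more
+ would produce a negative array index.</p>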
+
+ <a name="builder"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>The DOM builder - DOMImpl$DOMBuilder</h3>
+ <ul>
+ <li>
+<a href="#startelement">startElement()</a>
+</li>
+ <li>
+<a href="#endelement">endElement()</a>
+</li>
+ <li>
+<a href="#startprefixmapping">startPrefixMapping()</a>
+</li>
+ <li>
+<a href="#endprefixmapping">endPrefixMapping()</a>
+</li>
+ <li>
+<a href="#characters">characters()</a>
+</li>
+ <li>
+<a href="#startdocument">startDocument()</a>
+</li>
+ <li>
+<a href="#enddocument">endDocument()</a>
+</li>
+ </ul>
+
+ <p>The DOM builder is an inner class of the DOM implementation. The builder
+ implements the SAX2 <code>ContentHandler</code> interface and populates the DOM
+ by receiving SAX2 events from a SAX2 parser (presently Xerces). An instance
+ of the DOM builder class can be retrieved from the <code>DOMImpl.getBuilder()</code>
+ method, and this handler can be set as an XMLReader's content handler:</p>
+
+<blockquote class="source">
+<pre>
+ final SAXParserFactory factory = SAXParserFactory.newInstance();
+ final SAXParser parser = factory.newSAXParser();
+ final XMLReader reader = parser.getXMLReader();
+ final DOMImpl dom = new DOMImpl();
+ reader.setContentHandler(dom.getBuilder());
+</pre>
+</blockquote>
+
+ <p>The DOM builder will start to populate the DOM once the XML parser starts
+ generating SAX2 events:</p>
+ <a name="startelement"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>startElement()</h4>
+
+ <p>This method can be called in one of two ways: either with the expanded
+ QName (the element's separate uri and local name are supplied) or as a
+ normal QName (one String in the format prefix:local-name). The DOM stores
+ elements as expanded QNames so it needs to know the element's namespace URI.
+ Since the URI is not supplied in the second case, we have to keep track of
+ namespace prefix/uri mappings while we're building the DOM. See
+ <code>
+<a href="#startprefixmapping">startPrefixMapping()</a>
+</code> below for details on
+ namespace handling.</p>
+
+ <p>The <code>startElement()</code> method inserts the element as a child of the current
+ parent element, creates attribute nodes for all attributes in the supplied
+ "<code>Attributes</code>" attribute list (by a series of calls to
+ <code>makeAttributeNode()</code>), and finally creates the actual element node
+ (by calling <code>internElement()</code>, which inserts a new entry in the
+ <code>_type[]</code> array).</p>
+ <a name="endelement"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>endElement()</h4>
+
+ <p>This method does some cleanup after the <code>startElement()</code> method,
+ such as reverting <code>xml:space</code> settings and linking the element's
+ child nodes.</p>
+ <a name="startprefixmapping"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>startPrefixMapping()</h4>
+
+ <p>This method is called for each namespace declaration in the source
+ document. The parser should call this method before the prefix is referenced
+ in a QName that is passed to the <code>startElement()</code> call. Namespace
+ prefix/uri mappings are stored in a Hashtable structure. Namespace prefixes
+ are used as the keys in the Hashtable, and each key maps to a Stack that
+ contains the various URIs that the prefix maps to. The URI on top of the
+ stack is the URI that the prefix currently maps to.</p>
+
+
+ <p>
+<img src="namespace_stack.gif" alt="namespace_stack.gif" />
+</p>
+ <p>
+<b>
+<i>Figure 2: Namespace handling in the DOM builder</i>
+</b>
+</p>
+
+
+ <p>Each call to <code>startPrefixMapping()</code> results in a lookup in the
+ Hashtable (using the prefix), and a <code>push()</code> of the URI onto the
+ Stack that the prefix maps to.</p>
+ <a name="endprefixmapping"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>endPrefixMapping()</h4>
+
+ <p>A namespace prefix/uri mapping is closed by locating the Stack for the
+ prefix, and then <code>pop()</code>'ing the topmost URI off this Stack.</p>
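+ <p>The push and pop behaviour of the two prefix-mapping callbacks can be
+ sketched as follows. The class <code>PrefixMapSketch</code> and its
+ <code>lookup()</code> method are hypothetical, and the sketch swaps in
+ <code>HashMap</code> and <code>ArrayDeque</code> where the text describes a
+ Hashtable of Stacks:</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class PrefixMapSketch {
    // One stack of URIs per prefix; the top is the mapping currently in scope.
    private final Map<String, Deque<String>> stacks = new HashMap<>();

    void startPrefixMapping(String prefix, String uri) {
        stacks.computeIfAbsent(prefix, p -> new ArrayDeque<>()).push(uri);
    }

    void endPrefixMapping(String prefix) {
        Deque<String> stack = stacks.get(prefix);
        if (stack != null && !stack.isEmpty()) {
            stack.pop();  // the enclosing declaration becomes current again
        }
    }

    // Returns the URI the prefix currently maps to, or null if it is unbound.
    String lookup(String prefix) {
        Deque<String> stack = stacks.get(prefix);
        return (stack == null || stack.isEmpty()) ? null : stack.peek();
    }
}
```

+ <p>Keeping a stack per prefix means a nested redeclaration of the same prefix
+ shadows the outer one only until its matching
+ <code>endPrefixMapping()</code> call.</p>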
+ <a name="characters"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>characters()</h4>
+
+ <p>Text nodes are stored as simple character sequences in the character array
+ <code>_text[]</code>. The start and length of a node's text can be determined by
+ using the node index to look up <code>_offsetOrChild[]</code> and
+ <code>_lengthOrAttr[]</code>.</p>
+
+ <p>We want to re-use character sequences if two or more text nodes have
+ identical content. This can be achieved by having two different text node
+ indexes map to the same character sequence. The <code>maybeReuseText()</code>
+ method is always called before a new character string is stored in the
+ <code>_text[]</code> array. This method will locate the offset of an existing
+ instance of a character sequence.</p>
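+ <p>The effect of re-using character sequences can be shown with a small
+ sketch. How <code>maybeReuseText()</code> actually locates an existing
+ sequence is not described here, so the hash-map lookup below is an assumption,
+ and the class <code>TextPoolSketch</code> with its methods is
+ hypothetical:</p>

```java
import java.util.HashMap;
import java.util.Map;

class TextPoolSketch {
    private final StringBuilder text = new StringBuilder();        // shared character store
    private final Map<String, Integer> offsets = new HashMap<>();  // content -> offset

    // Returns the offset of 'content' in the store, appending it only when
    // an identical sequence has not been stored before.
    int store(String content) {
        Integer existing = offsets.get(content);
        if (existing != null) {
            return existing;
        }
        int offset = text.length();
        text.append(content);
        offsets.put(content, offset);
        return offset;
    }

    // Reads a node's text back from its offset and length.
    String read(int offset, int length) {
        return text.substring(offset, offset + length);
    }

    int size() {
        return text.length();
    }
}
```

+ <p>Two text nodes with identical content end up with the same offset and
+ length, so the shared store grows only once for them.</p>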
+ <a name="startdocument"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>startDocument()</h4>
+
+ <p>This method initialises a bunch of data structures that are used by the
+ builder. It also pushes the default namespace on the namespace stack (so that
+ the "" prefix maps to the <code>null</code> namespace).</p>
+ <a name="enddocument"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>endDocument()</h4>
+
+ <p>This method builds the <code>_namesArray[]</code>, <code>_namespace[]</code>
+ and <code>_nsNamesArray[]</code> structures from temporary data structures used
+ in the DOM builder.</p>
+
+
+
+<p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+</div>
+<div id="footer">Copyright © 1999-2014 The Apache Software Foundation<br />Apache, Xalan, and the Feather logo are trademarks of The Apache Software Foundation<div class="small">Web Page created on - Thu 2014-05-15</div>
+</div>
+</body>
+</html>
Propchange: xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_dom.html
------------------------------------------------------------------------------
svn:eol-style = native
Added: xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_iterators.html
URL: http://svn.apache.org/viewvc/xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_iterators.html?rev=1595253&view=auto
==============================================================================
--- xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_iterators.html (added)
+++ xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_iterators.html Fri May 16 16:11:33 2014
@@ -0,0 +1,573 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html>
+<head>
+<title>ASF: XSLTC node iterators</title>
+<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
+<meta http-equiv="Content-Style-Type" content="text/css" />
+<link rel="stylesheet" type="text/css" href="resources/apache-xalan.css" />
+</head>
+<!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ -->
+<body>
+<div id="title">
+<table class="HdrTitle">
+<tbody>
+<tr>
+<th rowspan="2">
+<a href="../index.html">
+<img alt="Trademark Logo" src="resources/XalanJ-Logo-tm.png" width="190" height="90" />
+</a>
+</th>
+<th text-align="center" width="75%">
+<a href="index.html">XSLTC Design</a>
+</th>
+</tr>
+<tr>
+<td valign="middle">XSLTC node iterators</td>
+</tr>
+</tbody>
+</table>
+<table class="HdrButtons" align="center" border="1">
+<tbody>
+<tr>
+<td>
+<a href="http://www.apache.org">Apache Foundation</a>
+</td>
+<td>
+<a href="http://xalan.apache.org">Xalan Project</a>
+</td>
+<td>
+<a href="http://xerces.apache.org">Xerces Project</a>
+</td>
+<td>
+<a href="http://www.w3.org/TR">Web Consortium</a>
+</td>
+<td>
+<a href="http://www.oasis-open.org/standards">Oasis Open</a>
+</td>
+</tr>
+</tbody>
+</table>
+</div>
+<div id="navLeft">
+<ul>
+<li>
+<a href="index.html">Overview</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_compiler.html">Compiler design</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsl_whitespace_design.html">Whitespace</a>
+</li>
+<li>
+<a href="xsl_sort_design.html">xsl:sort</a>
+</li>
+<li>
+<a href="xsl_key_design.html">Keys</a>
+</li>
+<li>
+<a href="xsl_comment_design.html">Comment design</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsl_lang_design.html">lang()</a>
+</li>
+<li>
+<a href="xsl_unparsed_design.html">Unparsed entities</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsl_if_design.html">If design</a>
+</li>
+<li>
+<a href="xsl_choose_design.html">Choose|When|Otherwise design</a>
+</li>
+<li>
+<a href="xsl_include_design.html">Include|Import design</a>
+</li>
+<li>
+<a href="xsl_variable_design.html">Variable|Param design</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_runtime.html">Runtime</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_dom.html">Internal DOM</a>
+</li>
+<li>
+<a href="xsltc_namespace.html">Namespaces</a>
+</li></ul><hr /><ul>
+<li>
+<a href="xsltc_trax.html">Translet & TrAX</a>
+</li>
+<li>
+<a href="xsltc_predicates.html">XPath Predicates</a>
+</li>
+<li>Xsltc Iterators<br />
+</li>
+<li>
+<a href="xsltc_native_api.html">Xsltc Native API</a>
+</li>
+<li>
+<a href="xsltc_trax_api.html">Xsltc TrAX API</a>
+</li>
+<li>
+<a href="xsltc_performance.html">Performance Hints</a>
+</li>
+</ul>
+</div>
+<div id="content">
+<h2>XSLTC node iterators</h2>
+
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Contents</h3>
+
+ <p>This document describes the function of XSLTC's node iterators. It also
+ describes the <code>NodeIterator</code> interface and covers some
+ implementations of this interface in detail:</p>
+
+ <ul>
+ <li>
+<a href="#purpose">Node iterator function</a>
+</li>
+ <li>
+<a href="#interface">NodeIterator interface</a>
+</li>
+ <li>
+<a href="#baseclass">Node iterator base class</a>
+</li>
+ <li>
+<a href="#details">Implementation details</a>
+</li>
+ </ul>
+
+
+
+
+
+ <a name="purpose"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Node Iterator Function</h3>
+
+ <p>Node iterators have several functions in XSLTC. The most obvious is
+ acting as a placeholder for node-sets. Node iterators also act as a link
+ between the translet and the DOM(s). They can act as filters (implementing
+ predicates), they contain the functionality necessary to cover all XPath
+ axes, and they even serve as a front-end to XSLTC's node-indexing mechanism
+ (for the <code>id()</code> and <code>key()</code> functions).</p>
+
+
+
+
+ <a name="interface"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Node Iterator Interface</h3>
+
+ <p>The node iterator interface is defined in
+ <code>org.apache.xalan.xsltc.NodeIterator</code>.</p>
+
+ <p>The most basic operations in the <code>NodeIterator</code> interface are
+ for setting the iterator's start-node. The "start-node" is
+ an index into the DOM. This index, and the axis of the iterator, determine
+ the node-set that the iterator contains. The axis is programmed into the
+ various node iterator implementations, while the start-node can be set by
+ calling:</p>
+<blockquote class="source">
+<pre>
+ public NodeIterator setStartNode(int node);</pre>
+</blockquote>
+
+ <p>Once the start node is set the node-set can be traversed by a sequence of
+ calls to:</p>
+<blockquote class="source">
+<pre>
+ public int next();</pre>
+</blockquote>
+
+ <p>This method will return the constant <code>NodeIterator.END</code> when
+ the whole node-set has been returned. The iterator can be reset to the start
+ of the node-set by calling:</p>
+<blockquote class="source">
+<pre>
+ public NodeIterator reset();</pre>
+</blockquote>
+
+ <p>Two additional methods are provided to set the position within the
+ node-set. The first method below will mark the current node in the
+ node-set, while the second will (at any point) set the iterator's position
+ back to that node.</p>
+<blockquote class="source">
+<pre>
+ public void setMark();
+ public void gotoMark();</pre>
+</blockquote>
+
+ <p>Every node iterator implements two methods that provide the
+ functionality behind XPath's <code>position()</code> and
+ <code>last()</code> functions.</p>
+<blockquote class="source">
+<pre>
+ public int getPosition();
+ public int getLast();</pre>
+</blockquote>
+
+ <p>The <code>getLast()</code> method returns the number of nodes in the
+ set, while <code>getPosition()</code> returns the current position
+ within the node-set. The value returned by <code>getPosition()</code> for
+ the first node in the set is always 1 (one), and the value returned for the
+ last node in the set is always the same value as is returned by
+ <code>getLast()</code>.</p>
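+ <p>A typical traversal could look like the sketch below. It does not use
+ the real <code>NodeIterator</code> interface: <code>SimpleNodeIterator</code>
+ and <code>ArrayNodeIterator</code> are hypothetical stand-ins, and the value
+ chosen for <code>END</code> is an assumption made for the example.</p>

```java
// Hypothetical, simplified stand-in for the NodeIterator interface.
interface SimpleNodeIterator {
    int END = -1; // assumed sentinel; the real constant is NodeIterator.END
    int next();
    SimpleNodeIterator reset();
    int getPosition();
    int getLast();
}

// Iterates over a fixed array of node indexes.
class ArrayNodeIterator implements SimpleNodeIterator {
    private final int[] nodes;
    private int position = 0; // 0 until the first node has been returned

    ArrayNodeIterator(int... nodes) {
        this.nodes = nodes.clone();
    }

    public int next() {
        return position < nodes.length ? nodes[position++] : END;
    }

    public SimpleNodeIterator reset() {
        position = 0;
        return this;
    }

    public int getPosition() { return position; }

    public int getLast() { return nodes.length; }
}

class TraversalDemo {
    public static void main(String[] args) {
        SimpleNodeIterator it = new ArrayNodeIterator(7, 9, 12);
        // Walk the node-set until the END marker is returned.
        for (int node = it.next(); node != SimpleNodeIterator.END; node = it.next()) {
            System.out.println("node " + node + " at position "
                    + it.getPosition() + " of " + it.getLast());
        }
    }
}
```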
+
+ <p>All node iterators that implement an XPath axis will return the node-set
+ in the natural order of the axis. For example, the iterator implementing the
+ ancestor axis will return nodes in reverse document order (bottom to
+ top), while the iterator implementing the descendant axis will return
+ nodes in document order. The node iterator interface has a method that can
+ be used to determine if an iterator returns nodes in reverse document order:
+ </p>
+<blockquote class="source">
+<pre>
+ public boolean isReverse();</pre>
+</blockquote>
+
+ <p>Two methods are provided for when node iterators are encapsulated inside
+ a variable or parameter. To understand the purpose behind these two methods
+ we should have a look at a sample XML document and stylesheet first:</p>
+ <blockquote class="source">
+<pre>
+ <?xml version="1.0"?>
+ <foo>
+ <bar>
+ <baz>A</baz>
+ <baz>B</baz>
+ </bar>
+ <bar>
+ <baz>C</baz>
+ <baz>D</baz>
+ </bar>
+ </foo>
+
+ <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
+
+ <xsl:template match="foo">
+ <xsl:variable name="my-nodes" select="//foo/bar/baz"/>
+ <xsl:for-each select="bar">
+ <xsl:for-each select="baz">
+ <xsl:value-of select="."/>
+ </xsl:for-each>
+ <xsl:for-each select="$my-nodes">
+ <xsl:value-of select="."/>
+ </xsl:for-each>
+ </xsl:for-each>
+ </xsl:template>
+
+ </xsl:stylesheet></pre>
+</blockquote>
+
+ <p>Now, there are three iterators at work here. The first iterator is the
+ one that is wrapped inside the variable <code>my-nodes</code> - this
+ iterator contains all <code><baz/></code> elements in the
+ document. The second iterator contains all <code><bar></code>
+ elements under the current element (this is the iterator used by the
+ outer <code>for-each</code> loop). The third and last iterator is the one
+ used by the first of the inner <code>for-each</code> loops. When the outer
+ loop is run the first time, this third iterator will be initialized to
+ contain the first two <code><baz></code> elements under the context
+ node (the first <code><bar></code> element). Iterators are by default
+ restarted from the current node when used inside a <code>for-each</code>
+ loop like this. But what about the iterator inside the variable
+ <code>my-nodes</code>? The variable should keep its assigned value, no
+ matter what the context node is. In order to prevent the iterator from being
+ reset, we must use a mechanism to block calls to the
+ <code>setStartNode()</code> method. This is done in three steps:</p>
+
+ <ul>
+ <li>The iterator is created and initialized when the variable gets
+ assigned its value (node-set).</li>
+ <li>When the variable is read, the iterator is copied (cloned). The
+ original iterator inside the variable is never used directly. This is
+ to make sure that the iterator inside the variable is always in its
+ original state when read.</li>
+ <li>The iterator clone is marked as not restartable to prevent it from
+ being restarted when used to iterate the <code><xsl:for-each></code>
+ element loop.</li>
+ </ul>
+
+ <p>These are the two methods used for the three steps above:</p>
+<blockquote class="source">
+<pre>
+ public NodeIterator cloneIterator();
+ public void setRestartable(boolean isRestartable);</pre>
+</blockquote>
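+ <p>The three steps can be illustrated with a small sketch. The classes
+ below are hypothetical and greatly simplified; in particular, the pretend
+ node-set (the two node indexes following the start node) and the
+ <code>END</code> value are assumptions made for the example.</p>

```java
// Hypothetical sketch of the clone-and-block mechanism; not the XSLTC classes.
class VariableIteratorSketch {
    static final int END = -1;        // assumed sentinel for this sketch
    private int startNode;
    private int cursor = 0;
    private boolean restartable = true;

    // Ignored once the iterator has been marked as not restartable.
    VariableIteratorSketch setStartNode(int node) {
        if (restartable) {
            startNode = node;
            cursor = 0;
        }
        return this;
    }

    // Pretend node-set: the two node indexes that follow the start node.
    int next() {
        return cursor < 2 ? startNode + ++cursor : END;
    }

    // Copies the node-set definition; the copy starts with a fresh cursor.
    VariableIteratorSketch cloneIterator() {
        VariableIteratorSketch copy = new VariableIteratorSketch();
        copy.startNode = startNode;
        return copy;
    }

    void setRestartable(boolean isRestartable) {
        restartable = isRestartable;
    }
}

class VariableReadSketch {
    // Steps 2 and 3: reading the variable clones the stored iterator
    // and marks the clone as not restartable.
    static VariableIteratorSketch read(VariableIteratorSketch stored) {
        VariableIteratorSketch copy = stored.cloneIterator();
        copy.setRestartable(false);
        return copy;
    }
}
```

+ <p>A later call to <code>setStartNode()</code> on the copy, such as the
+ one a <code>for-each</code> loop would make, then leaves the node-set
+ unchanged.</p>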
+
+ <p>Special care must be taken when implementing these methods in some
+ iterators. The <code>StepIterator</code> class is the best example of this.
+ This iterator wraps two other iterators; one of which is used to generate
+ start-nodes for the other - so one of the encapsulated node iterators must
+ always remain restartable - even when used inside variables. The
+ <code>StepIterator</code> class is described in detail later in this
+ document.</p>
+
+
+
+
+
+
+ <a name="baseclass"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Node Iterator Base Class</h3>
+
+ <p>A node iterator base class is provided to contain some common
+ functionality. The base class implements the node iterator interface, and
+ has a few additional methods:</p>
+<blockquote class="source">
+<pre>
+ public NodeIterator includeSelf();
+ protected final int returnNode(final int node);
+ protected final NodeIterator resetPosition();</pre>
+</blockquote>
+
+ <p>The <code>includeSelf()</code> method is used with axis iterators that
+ implement a pair of related axes, such as <code>ancestor</code> and
+ <code>ancestor-or-self</code>. One common implementation is used for both
+ axes, and this method is used to signal that the "self" node should
+ also be included in the node-set.</p>
+
+ <p>The <code>returnNode()</code> method is called by the implementation of
+ the <code>next()</code> method. <code>returnNode()</code> increments an
+ internal node counter/cursor that keeps track of the current position within
+ the node set. This counter/cursor is then used by the
+ <code>getPosition()</code> implementation to return the current position.
+ The node cursor can be reset by calling <code>resetPosition()</code>. This
+ method is normally called by an iterator's <code>reset()</code> method.</p>
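+ <p>The interplay between <code>next()</code>, <code>returnNode()</code>
+ and <code>resetPosition()</code> could be sketched as shown below. The
+ class names, the countdown node-set and the <code>END</code> value are
+ assumptions made for illustration; this is not the actual base class.</p>

```java
// Hypothetical base class sketch; method names mirror the text above.
abstract class IteratorBaseSketch {
    static final int END = -1;     // assumed sentinel for this sketch
    private int position = 0;      // cursor used by getPosition()

    // Called by next() implementations for every node handed out.
    protected final int returnNode(final int node) {
        position++;
        return node;
    }

    // Puts the position cursor back to its initial state.
    protected final IteratorBaseSketch resetPosition() {
        position = 0;
        return this;
    }

    public int getPosition() { return position; }

    public abstract int next();
    public abstract IteratorBaseSketch reset();
}

// Toy subclass that returns the node indexes first, first-1, ..., 1.
class CountdownIterator extends IteratorBaseSketch {
    private final int first;
    private int current;

    CountdownIterator(int first) {
        this.first = first;
        this.current = first;
    }

    public int next() {
        return current > 0 ? returnNode(current--) : END;
    }

    public IteratorBaseSketch reset() {
        current = first;
        return resetPosition();
    }
}
```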
+
+
+
+
+
+ <a name="details"></a>
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h3>Node Iterator Implementation Details</h3>
+
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Axis iterators</h4>
+
+ <p>All axis iterators are implemented as inner classes of the internal
+ DOM implementation <code>org.apache.xalan.xsltc.dom.DOMImpl</code>. In this
+ way all axis iterator classes have direct access to the internal node-type
+ and navigation arrays of the DOM:</p>
+<blockquote class="source">
+<pre>
+ private short[] _type; // Node types
+ private short[] _namespace; // Namespace URI types
+ private short[] _prefix; // Namespace prefix types
+
+ private int[] _parent; // Index of a node's parent
+ private int[] _nextSibling; // Index of a node's next sibling node
+ private int[] _offsetOrChild; // Index of an element's first child node
+ private int[] _lengthOrAttr; // Index of an element's first attribute node</pre>
+</blockquote>
+
+ <p>The axis iterators can be instantiated by calling either of these two
+ methods of the DOM:</p>
+<blockquote class="source">
+<pre>
+ public NodeIterator getAxisIterator(final int axis);
+ public NodeIterator getTypedAxisIterator(final int axis, final int type);</pre>
+</blockquote>
+
+
+
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>StepIterator</h4>
+
+ <p>The <code>StepIterator</code> is used to chain other iterators. A
+ very basic example is this XPath expression:</p>
+<blockquote class="source">
+<pre>
+ <xsl:for-each select="foo/bar"></pre>
+</blockquote>
+
+ <p>To generate the appropriate node-set for this loop we need three
+ iterators. The compiler will generate code that first creates a typed axis
+ iterator; the axis will be child and the type will be that assigned
+ to <code><foo></code> elements. Then a second typed axis iterator will
+ be created; this is also a child iterator, but this one has the type
+ assigned to <code><bar></code> elements. The third iterator is a
+ step iterator that encapsulates the two axis iterators. The step iterator is
+ then initialized with the context node.</p>
+
+ <p>The step iterator will use the first axis iterator to generate
+ start-nodes for the second axis iterator. In plain English this means that
+ the step iterator will scan all <code>foo</code> elements for any
+ <code>bar</code> child elements. When a <code>StepIterator</code> is
+ initialized with a start-node it passes the start node to the
+ <code>setStartNode()</code> method of its <code>_source</code> iterator (left).
+ It then calls <code>next()</code> on that iterator to get the start-node
+ for the <code>_iterator</code> iterator (right):</p>
+<blockquote class="source">
+<pre>
+ // Set start node for left-hand iterator...
+ _source.setStartNode(_startNode);
+ // ... and get start node for right-hand iterator from left-hand.
+ _iterator.setStartNode(_source.next());</pre>
+</blockquote>
+
+ <p>The step iterator will keep returning nodes from its right iterator until
+ it runs out of nodes. Then a new start-node is retrieved by again calling
+ <code>next()</code> on the <code>_source</code> iterator. This is why the
+ right-hand iterator always has to be restartable - even if the step iterator
+ is placed inside a variable or parameter. This becomes even more complicated
+ for step iterators that encapsulate other step iterators. We'll make our
+ previous example a bit more interesting:</p>
+<blockquote class="source">
+<pre>
+ <xsl:for-each select="foo/bar[@name='cat and cage']/baz"></pre>
+</blockquote>
+
+ <p>This will result in an iterator-tree similar to this:</p>
+
+ <p>
+<img src="iterator_stack.gif" alt="iterator_stack.gif" />
+</p>
+ <p>
+<b>
+<i>Figure 1: Stacked step iterators</i>
+</b>
+</p>
+
+ <p>The "foo" iterator is used to supply the second step
+ iterator with start nodes. The second step iterator will pass these start
+ nodes to the "bar" iterator, which will be used to get the
+ start nodes for the third step iterator, and so on.</p>
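+ <p>The chaining described in this section could be sketched as follows for
+ the simple <code>foo/bar</code> case. This is an illustration, not the
+ <code>StepIterator</code> source: the interface, the class names, the
+ hand-built tree held in two arrays and the <code>END</code> value are all
+ assumptions made for the example.</p>

```java
// Hypothetical sketch of iterator chaining; not the XSLTC StepIterator source.
interface ChainIterator {
    int END = -1;                       // assumed sentinel for this sketch
    ChainIterator setStartNode(int node);
    int next();
}

// Typed child axis over a tiny hand-built tree stored in arrays.
class TypedChildIterator implements ChainIterator {
    // node index:                  0       1      2      3      4      5
    static final String[] NAME   = {"root", "foo", "foo", "bar", "baz", "bar"};
    static final int[]    PARENT = {-1,     0,     0,     1,     1,     2};

    private final String name;
    private int parent = END;
    private int cursor = 0;

    TypedChildIterator(String name) { this.name = name; }

    public ChainIterator setStartNode(int node) {
        parent = node;
        cursor = 0;
        return this;
    }

    public int next() {
        // Scan for the next child of the start node with the wanted name.
        while (parent != END && cursor < NAME.length) {
            int node = cursor++;
            if (PARENT[node] == parent && NAME[node].equals(name)) {
                return node;
            }
        }
        return END;
    }
}

// Uses the left-hand iterator to produce start nodes for the right-hand one.
class StepSketch implements ChainIterator {
    private final ChainIterator source;    // left-hand side
    private final ChainIterator iterator;  // right-hand side, must stay restartable

    StepSketch(ChainIterator source, ChainIterator iterator) {
        this.source = source;
        this.iterator = iterator;
    }

    public ChainIterator setStartNode(int node) {
        source.setStartNode(node);
        iterator.setStartNode(source.next());
        return this;
    }

    public int next() {
        for (;;) {
            int node = iterator.next();
            if (node != END) {
                return node;
            }
            int start = source.next();   // right side exhausted: fetch a new start node
            if (start == END) {
                return END;
            }
            iterator.setStartNode(start);
        }
    }
}
```

+ <p>Because a <code>StepSketch</code> is itself a
+ <code>ChainIterator</code>, it can be used as the source of another step,
+ which gives the stacked arrangement shown in Figure 1.</p>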
+
+
+
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>Iterators for Filtering/Predicates</h4>
+
+ <p>The <code>org.apache.xalan.xsltc.dom</code> package contains a few
+ iterators that are used to implement predicates and filters. Such iterators
+ are normally placed on top of another iterator, and return only those nodes
+ that match a specific node value, position, etc.
+ These iterators include:</p>
+
+ <ul>
+ <li>NthIterator</li>
+ <li>NodeValueIterator</li>
+ <li>FilteredStepIterator</li>
+ <li>CurrentNodeListIterator</li>
+ </ul>
+
+ <p>The last one is the most interesting. This iterator is used to implement
+ chained predicates, such as:</p>
+<blockquote class="source">
+<pre>
+ <xsl:value-of select="foo[@blob='boo'][2]"/></pre>
+</blockquote>
+
+ <p>The first predicate reduces the node set from containing all
+ <code><foo></code> elements, to containing only those elements that
+ have a "blob" attribute with the value 'boo'. The
+ <code>CurrentNodeListIterator</code> is used to contain this reduced
+ node-set. The iterator is constructed by passing it a source iterator (in
+ this case an iterator that contains all <code><foo></code> elements)
+ and a filter that implements the predicate (<code>@blob = 'boo'</code>).</p>
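+ <p>The effect of stacking the two predicates could be sketched like this.
+ The classes are hypothetical and do not reproduce
+ <code>CurrentNodeListIterator</code>; an "even node index" test stands in
+ for the attribute comparison, and the <code>END</code> value is an
+ assumption. The point of the sketch is that the position test
+ (<code>[2]</code>) counts within the already reduced node-set.</p>

```java
import java.util.function.IntPredicate;

// Hypothetical sketch of stacked predicates; not the XSLTC implementation.
interface PredIterator {
    int END = -1; // assumed sentinel for this sketch
    int next();
}

// Source iterator over a fixed list of node indexes.
class ListSource implements PredIterator {
    private final int[] nodes;
    private int cursor = 0;

    ListSource(int... nodes) { this.nodes = nodes.clone(); }

    public int next() {
        return cursor < nodes.length ? nodes[cursor++] : END;
    }
}

// First predicate: passes on only the nodes accepted by the filter.
class FilterSketch implements PredIterator {
    private final PredIterator source;
    private final IntPredicate filter;

    FilterSketch(PredIterator source, IntPredicate filter) {
        this.source = source;
        this.filter = filter;
    }

    public int next() {
        int node;
        while ((node = source.next()) != END && !filter.test(node)) {
            // skip nodes rejected by the predicate
        }
        return node;
    }
}

// Second predicate: returns only the n-th node of the iterator it wraps.
class NthSketch implements PredIterator {
    private final PredIterator source;
    private final int n;
    private boolean done = false;

    NthSketch(PredIterator source, int n) {
        this.source = source;
        this.n = n;
    }

    public int next() {
        if (done) {
            return END;
        }
        done = true;
        int node = END;
        for (int i = 0; i < n; i++) {
            node = source.next();
            if (node == END) {
                break;      // fewer than n nodes available
            }
        }
        return node;
    }
}
```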
+
+
+
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>SortingIterator</h4>
+
+ <p>The sorting iterator is one of the main functional components behind the
+ implementation of the <code><xsl:sort></code> element. This element,
+ including the sorting iterator, is described in detail in the
+ <code><xsl:sort></code>
+ <a href="xsl_sort_design.html">design document</a>.</p>
+
+
+
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>SingletonIterator</h4>
+
+ <p>The singleton iterator is a wrapper for a single node. The node passed
+ in to the <code>setStartNode()</code> method is the only node that will be
+ returned by the <code>next()</code> method. The singleton iterator is used
+ mainly for node to node-set type conversions.</p>
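+ <p>In outline, such a wrapper needs little more than a flag that records
+ whether the node has been returned. The class below is a hypothetical
+ sketch with an assumed <code>END</code> value, not the
+ <code>SingletonIterator</code> source.</p>

```java
// Hypothetical sketch of a single-node iterator.
class SingletonSketch {
    static final int END = -1;     // assumed sentinel for this sketch
    private int node = END;
    private boolean returned = false;

    // The start node is the only node in the node-set.
    SingletonSketch setStartNode(int startNode) {
        node = startNode;
        returned = false;
        return this;
    }

    int next() {
        if (returned) {
            return END;
        }
        returned = true;
        return node;
    }
}
```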
+
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>UnionIterator</h4>
+
+ <p>The union iterator is used to contain unions of node-sets contained in
+ other iterators. Some of the methods in this iterator are unnecessarily
+ complicated. The <code>next()</code> method contains an algorithm for
+ ensuring that the union node-set is returned in document order. We might be
+ better off simply wrapping the union iterator inside a duplicate filter
+ iterator, but there could be some performance implications. Worth checking.
+ </p>
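+ <p>One way to produce a union in document order is an ordinary merge of
+ two already ordered node lists that drops duplicates. The sketch below is
+ not the algorithm used by <code>UnionIterator</code>; it assumes that a
+ lower node index means an earlier position in document order, and the
+ class and method names are made up for the example.</p>

```java
import java.util.Arrays;

// Hypothetical sketch of an ordered, duplicate-free union of two node lists.
class UnionSketch {
    // Both inputs must already be in ascending order (assumed to be
    // document order in this sketch).
    static int[] union(int[] left, int[] right) {
        int[] out = new int[left.length + right.length];
        int i = 0, j = 0, n = 0;
        while (i < left.length || j < right.length) {
            int node;
            if (j >= right.length || (i < left.length && left[i] <= right[j])) {
                node = left[i++];
            } else {
                node = right[j++];
            }
            // Skip a node that was just emitted from the other list.
            if (n == 0 || out[n - 1] != node) {
                out[n++] = node;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```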
+
+
+
+ <p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+<h4>KeyIndex</h4>
+
+ <p>This is not just a node iterator. An index used for keys and ids will
+ return a set of nodes that are contained within the named index and that
+ share a certain property. The <code>KeyIndex</code> implements the node
+ iterator interface, so that these nodes can be returned and handled just
+ like any other node set. See the
+ <a href="xsl_key_design.html">design document</a> for
+ <code><xsl:key></code>, <code>key()</code> and <code>id()</code>
+ for further details.</p>
+
+
+
+
+
+<p align="right" size="2">
+<a href="#content">(top)</a>
+</p>
+</div>
+<div id="footer">Copyright © 1999-2014 The Apache Software Foundation<br />Apache, Xalan, and the Feather logo are trademarks of The Apache Software Foundation<div class="small">Web Page created on - Thu 2014-05-15</div>
+</div>
+</body>
+</html>
Propchange: xalan/java/branches/WebSite/xalan-j/xsltc/xsltc_iterators.html
------------------------------------------------------------------------------
svn:eol-style = native