Posted to commits@camel.apache.org by bu...@apache.org on 2014/06/30 17:17:56 UTC
svn commit: r914438 - in /websites/production/camel/content: cache/main.pageCache splitter.html

Author: buildbot
Date: Mon Jun 30 15:17:56 2014
New Revision: 914438

Log:
Production update by buildbot for camel

Modified:
    websites/production/camel/content/cache/main.pageCache
    websites/production/camel/content/splitter.html

Modified: websites/production/camel/content/cache/main.pageCache
==============================================================================
Binary files - no diff available.

Modified: websites/production/camel/content/splitter.html
==============================================================================
--- websites/production/camel/content/splitter.html (original)
+++ websites/production/camel/content/splitter.html Mon Jun 30 15:17:56 2014
@@ -165,7 +165,7 @@ from(&quot;direct:streaming&quot;)
      .split(beanExpression(new MyCustomIteratorFactory(),  &quot;iterator&quot;))
      .streaming().to(&quot;activemq:my.parts&quot;)
 ]]></script>
-</div></div><h4 id="Splitter-StreamingbigXMLpayloadsusingTokenizerlanguage">Streaming big XML payloads using Tokenizer language</h4><p><strong>Available as of Camel 2.9</strong><br clear="none"> If you have a big XML payload, from a file source, and want to split it in streaming mode, then you can use the Tokenizer language with start/end tokens to do this with low memory footprint.</p>    <div class="aui-message success shadowed information-macro">
+</div></div><h4 id="Splitter-StreamingbigXMLpayloadsusingTokenizerlanguage">Streaming big XML payloads using Tokenizer language</h4><p>There are two tokenizers that can be used to tokenize an XML payload. The first tokenizer uses the same principle as in the text tokenizer to scan the XML payload and extract a sequence of tokens.</p><p><strong>Available as of Camel 2.9</strong><br clear="none"> If you have a big XML payload, from a file source, and want to split it in streaming mode, then you can use the Tokenizer language with start/end tokens to do this with low memory footprint.</p>    <div class="aui-message success shadowed information-macro">
                     <p class="title">StAX component</p>
                             <span class="aui-icon icon-success">Icon</span>
                 <div class="message-content">
@@ -200,7 +200,7 @@ from(&quot;direct:streaming&quot;)
   &lt;/split&gt;
 &lt;/route&gt;
 ]]></script>
-</div></div><p>Notice the <code>tokenizeXML</code> method which will split the file using the tag name of the child node, which mean it will grab the content between the <code>&lt;order&gt;</code> and <code>&lt;/order&gt;</code> tags (incl. the tokens). So for example a splitted message would be as follows:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
+</div></div><p>Notice the <code>tokenizeXML</code> method which will split the file using the tag name of the child node (more precisely speaking, the local name of the element without its namespace prefix if any), which means it will grab the content between the <code>&lt;order&gt;</code> and <code>&lt;/order&gt;</code> tags (incl. the tokens). So for example a split message would be as follows:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
 <script class="theme: Default; brush: xml; gutter: false" type="syntaxhighlighter"><![CDATA[  &lt;order&gt;
     &lt;!-- order stuff here --&gt;
   &lt;/order&gt;
@@ -219,7 +219,31 @@ from(&quot;direct:streaming&quot;)
     .split().tokenizeXML(&quot;order&quot;, &quot;orders&quot;).streaming()
        .to(&quot;activemq:queue:order&quot;);
 ]]></script>
-</div></div><p><span style="line-height: 1.4285715;">Available as of Camel 2.13.1, you can set the above inheritNamsepaceTagName property to "*" to&#160;include the preceding context in each token (i.e., generating each token enclosed in its ancestor elements). It is noted that each token must share the same ancestor elements in this case.</span></p><h4 id="Splitter-SplittingfilesbygroupingNlinestogether">Splitting files by grouping N lines together</h4><p><strong>Available as of Camel 2.10</strong></p><p>The <a shape="rect" href="tokenizer.html">Tokenizer</a> language has a new option <code>group</code> that allows you to group N parts together, for example to split big files into chunks of 1000 lines.</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
+</div></div><p><span style="line-height: 1.4285715;">Available as of Camel 2.13.1, you can set the above inheritNamespaceTagName property to "*" to&#160;include the preceding context in each token (i.e., generating each token enclosed in its ancestor elements). Note that each token must share the same ancestor elements in this case.</span></p><p><span style="line-height: 1.4285715;">&#160;</span><span style="line-height: 1.4285715;">The above tokenizer works well on simple structures but has some inherent limitations in handling more complex XML structures.</span></p><p><strong>Available as of Camel 2.14</strong></p><p>The second tokenizer uses a StAX parser to overcome these limitations. This tokenizer recognizes XML namespaces and also supports complex XML structures.</p><p>To split using this tokenizer at {<a shape="rect" rel="nofollow">urn:shop}order</a>, we can write:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
+<script class="theme: Default; brush: java; gutter: false" type="syntaxhighlighter"><![CDATA[  Namespaces ns = new Namespaces(&quot;ns1&quot;, &quot;urn:shop&quot;);
+  ...
+  from(&quot;file:inbox&quot;)
+    .split().xtokenize(&quot;//ns1:order&quot;, &#39;i&#39;, ns).streaming()
+      .to(&quot;activemq:queue:order&quot;);]]></script>
+</div></div><p><span style="line-height: 1.4285715;">Two arguments control the behavior of the tokenizer. The first argument specifies the element using a path notation. This path notation uses a subset of XPath with wildcard support. The second argument represents the extraction mode. The available extraction modes are:</span></p><div class="table-wrap"><table class="confluenceTable"><tbody><tr><th colspan="1" rowspan="1" class="confluenceTh">mode</th><th colspan="1" rowspan="1" class="confluenceTh">description</th></tr><tr><td colspan="1" rowspan="1" class="confluenceTd">i</td><td colspan="1" rowspan="1" class="confluenceTd">injecting the contextual namespace bindings into the extracted token (default)</td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd">w</td><td colspan="1" rowspan="1" class="confluenceTd">wrapping the extracted token in its ancestor context</td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd">u</td><td colspan="1" rowspan="1" class="confluenceTd">unwrapping the extracted token to its child content</td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd">t</td><td colspan="1" rowspan="1" class="confluenceTd">extracting the text content of the specified element</td></tr></tbody></table></div><p><span style="line-height: 1.4285715;">&#160;</span><span style="line-height: 1.4285715;">Given the following input XML:</span></p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
+<script class="theme: Default; brush: xml; gutter: false" type="syntaxhighlighter"><![CDATA[&lt;m:orders xmlns:m=&quot;urn:shop&quot; xmlns:cat=&quot;urn:shop:catalog&quot;&gt;
+  &lt;m:order&gt;&lt;id&gt;123&lt;/id&gt;&lt;date&gt;2014-02-25&lt;/date&gt;...&lt;/m:order&gt;
+...]]></script>
+</div></div><p><span style="line-height: 1.4285715;">Each mode will result in the following tokens,&#160;</span></p><div class="table-wrap"><table class="confluenceTable"><tbody><tr><td colspan="1" rowspan="1" class="confluenceTd">i</td><td colspan="1" rowspan="1" class="confluenceTd">&lt;m:order&#160;<a shape="rect" rel="nofollow">xmlns:m="urn:shop</a>"&#160;<a shape="rect" rel="nofollow">xmlns:cat="urn:shop:catalog</a>"&gt;&lt;id&gt;123&lt;/id&gt;&lt;date&gt;2014-02-25&lt;/date&gt;...&lt;/m:order&gt;</td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd">w</td><td colspan="1" rowspan="1" class="confluenceTd"><p>&lt;m:orders&#160;<a shape="rect" rel="nofollow">xmlns:m="urn:shop</a>"&#160;<a shape="rect" rel="nofollow">xmlns:cat="urn:shop:catalog</a>"&gt;<br clear="none"> &lt;m:order&gt;&lt;id&gt;123&lt;/id&gt;&lt;date&gt;2014-02-25&lt;/date&gt;...&lt;/m:order&gt;<span style="line-height: 1.4285715;">&lt;/m:orders&gt;</span></p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd">u</td><td colspan="1" rowspan="1" class="confluenceTd">&lt;id&gt;123&lt;/id&gt;&lt;date&gt;2014-02-25&lt;/date&gt;...</td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd">t</td><td colspan="1" rowspan="1" class="confluenceTd">1232014-02-25...</td></tr></tbody></table></div><p><span style="line-height: 1.4285715;">&#160;</span><span style="line-height: 1.4285715;">In XML DSL, the equivalent route would be written as follows:</span></p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
+<script class="theme: Default; brush: xml; gutter: false" type="syntaxhighlighter"><![CDATA[&lt;camelContext xmlns:ns1=&quot;urn:shop&quot;&gt;
+  &lt;route&gt;
+    &lt;from uri=&quot;file:inbox&quot;/&gt;
+    &lt;split streaming=&quot;true&quot;&gt;
+      &lt;xtokenize&gt;//ns1:order&lt;/xtokenize&gt;
+      &lt;to uri=&quot;activemq:queue:order&quot;/&gt;
+    &lt;/split&gt;
+  &lt;/route&gt;
+&lt;/camelContext&gt;]]></script>
+</div></div><p><span style="line-height: 1.4285715;">&#160;</span><span style="line-height: 1.4285715;">or setting the extraction mode explicitly as</span></p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
+<script class="theme: Default; brush: xml; gutter: false" type="syntaxhighlighter"><![CDATA[    ...
+    &lt;xtokenize mode=&quot;i&quot;&gt;//ns1:order&lt;/xtokenize&gt;
+    ...]]></script>
+</div></div><p>Note that this StAX-based tokenizer uses the StAX Location API and requires a StAX Reader implementation (e.g., Woodstox) that correctly reports the beginning of each event.</p><h4 id="Splitter-SplittingfilesbygroupingNlinestogether">Splitting files by grouping N lines together</h4><p><strong>Available as of Camel 2.10</strong></p><p>The <a shape="rect" href="tokenizer.html">Tokenizer</a> language has a new option <code>group</code> that allows you to group N parts together, for example to split big files into chunks of 1000 lines.</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
 <script class="theme: Default; brush: java; gutter: false" type="syntaxhighlighter"><![CDATA[  from(&quot;file:inbox&quot;)
     .split().tokenize(&quot;\n&quot;, 1000).streaming()
        .to(&quot;activemq:queue:order&quot;);
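
The group option described in the last hunk can be pictured with a small plain-Java sketch. This is not Camel code: LineGrouper and group are hypothetical names introduced for illustration, and the sketch assumes that grouped parts are rejoined with the same delimiter, which the page does not state. Unlike the streaming splitter, it holds the whole input in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Camel): splits text on a literal
// delimiter and regroups the parts into chunks of at most `size` parts.
class LineGrouper {

    static List<String> group(String text, String delimiter, int size) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int count = 0;

        // Pattern.quote treats the delimiter as a literal, not a regex.
        for (String part : text.split(Pattern.quote(delimiter))) {
            if (count > 0) {
                // Assumption: parts in a chunk are rejoined with the delimiter.
                current.append(delimiter);
            }
            current.append(part);
            count++;

            if (count == size) {
                chunks.add(current.toString());
                current.setLength(0);
                count = 0;
            }
        }

        // Emit the last, possibly shorter, chunk.
        if (count > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

Calling LineGrouper.group(text, "\n", 1000) on a file's content would yield chunks of up to 1000 lines each, with the final chunk holding whatever remains.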