Posted to commits@camel.apache.org by bu...@apache.org on 2015/12/11 12:19:20 UTC

svn commit: r975264 - in /websites/production/camel/content: apache-spark.html cache/main.pageCache

Author: buildbot
Date: Fri Dec 11 11:19:20 2015
New Revision: 975264

Log:
Production update by buildbot for camel

Modified:
    websites/production/camel/content/apache-spark.html
    websites/production/camel/content/cache/main.pageCache

Modified: websites/production/camel/content/apache-spark.html
==============================================================================
--- websites/production/camel/content/apache-spark.html (original)
+++ websites/production/camel/content/apache-spark.html Fri Dec 11 11:19:20 2015
@@ -85,17 +85,33 @@
 	<tbody>
         <tr>
         <td valign="top" width="100%">
-<div class="wiki-content maincontent"><h2 id="ApacheSpark-ApacheSparkcomponent">Apache Spark component</h2><div class="confluence-information-macro confluence-information-macro-information"><span class="aui-icon aui-icon-small aui-iconfont-info confluence-information-macro-icon"></span><div class="confluence-information-macro-body"><p>&#160;Apache Spark component is available starting from Camel <strong>2.17</strong>.</p></div></div><p>&#160;</p><p><span style="line-height: 1.5625;font-size: 16.0px;">This documentation page covers the <a shape="rect" class="external-link" href="http://spark.apache.org/">Apache Spark</a> component for the Apache Camel. The main purpose of the Spark integration with Camel is to provide a bridge between Camel connectors and Spark tasks. In particular Camel connector provides a way to route message from various transports, dynamically choose a task to execute, use incoming message as input data for that task and finally deliver the results of the execut
 ion back to the Camel pipeline.</span></p><h3 id="ApacheSpark-Supportedarchitecturalstyles"><span>Supported architectural styles</span></h3><p><span style="line-height: 1.5625;font-size: 16.0px;">Spark component can be used as a driver application deployed into an application server (or executed as a fat jar).</span></p><p><span style="line-height: 1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" height="250" src="apache-spark.data/camel_spark_driver.png" data-image-src="/confluence/download/attachments/61331559/camel_spark_driver.png?version=2&amp;modificationDate=1449478362000&amp;api=v2" data-unresolved-comment-count="0" data-linked-resource-id="61331563" data-linked-resource-version="2" data-linked-resource-type="attachment" data-linked-resource-default-alias="camel_spark_driver.png" data-base-url="https://cwiki.apache.org/confluence" data-linked-resource-content-type="image/png" data
 -linked-resource-container-id="61331559" data-linked-resource-container-version="4"></span><br clear="none"></span></p><p><span style="line-height: 1.5625;font-size: 16.0px;">Spark component can also be submitted as a job directly into the Spark cluster.</span></p><p><span style="line-height: 1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" height="250" src="apache-spark.data/camel_spark_cluster.png" data-image-src="/confluence/download/attachments/61331559/camel_spark_cluster.png?version=1&amp;modificationDate=1449478393000&amp;api=v2" data-unresolved-comment-count="0" data-linked-resource-id="61331565" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="camel_spark_cluster.png" data-base-url="https://cwiki.apache.org/confluence" data-linked-resource-content-type="image/png" data-linked-resource-container-id="61331559" data-linked-r
 esource-container-version="4"></span><br clear="none"></span></p><p><span style="line-height: 1.5625;font-size: 16.0px;">While Spark component is primary designed to work as a <em>long running job</em>&#160;serving as an bridge between Spark cluster and the other endpoints, you can also use it as a <em>fire-once</em> short job. &#160;</span></p><div><span><br clear="none"></span></div><p><span style="line-height: 1.5625;font-size: 16.0px;">&#160;</span></p><h3 id="ApacheSpark-RunningSparkinOSGiservers"><span>Running Spark in OSGi servers</span></h3><p><span style="line-height: 1.5625;font-size: 16.0px;">&#160;</span></p><p>Currently the Spark component doesn't support execution in the OSGi container. Spark has been designed to be executed as a fat jar, usually submitted as a job to a cluster. For those reasons running Spark in an OSGi server is at least challenging and is not support by Camel as well.</p><p><span style="line-height: 1.5625;font-size: 16.0px;"><br clear="none"><br cl
 ear="none"></span></p><h3 id="ApacheSpark-KuraRouteractivator"><span style="line-height: 1.5625;font-size: 16.0px;">KuraRouter activator</span></h3><p>Bundles deployed to the Eclipse&#160;Kura&#160;are usually <a shape="rect" class="external-link" href="http://eclipse.github.io/kura/doc/hello-example.html#create-java-class" rel="nofollow">developed as bundle activators</a>. So the easiest way to deploy Apache Camel routes into the Kura is to create an OSGi bundle containing the class extending <code>org.apache.camel.kura.KuraRouter</code> class:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
-<script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[public class MyKuraRouter extends KuraRouter {
-
-  @Override
-  public void configure() throws Exception {
-    from(&quot;timer:trigger&quot;).
-      to(&quot;netty-http:http://app.mydatacenter.com/api&quot;);
-  }
-
+<div class="wiki-content maincontent"><h2 id="ApacheSpark-ApacheSparkcomponent">Apache Spark component</h2><div class="confluence-information-macro confluence-information-macro-information"><span class="aui-icon aui-icon-small aui-iconfont-info confluence-information-macro-icon"></span><div class="confluence-information-macro-body"><p>&#160;Apache Spark component is available starting from Camel <strong>2.17</strong>.</p></div></div><p>&#160;</p><p><span style="line-height: 1.5625;font-size: 16.0px;">This documentation page covers the <a shape="rect" class="external-link" href="http://spark.apache.org/">Apache Spark</a> component for the Apache Camel. The main purpose of the Spark integration with Camel is to provide a bridge between Camel connectors and Spark tasks. In particular Camel connector provides a way to route message from various transports, dynamically choose a task to execute, use incoming message as input data for that task and finally deliver the results of the execut
 ion back to the Camel pipeline.</span></p><h3 id="ApacheSpark-Supportedarchitecturalstyles"><span>Supported architectural styles</span></h3><p><span style="line-height: 1.5625;font-size: 16.0px;">Spark component can be used as a driver application deployed into an application server (or executed as a fat jar).</span></p><p><span style="line-height: 1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" height="250" src="apache-spark.data/camel_spark_driver.png" data-image-src="/confluence/download/attachments/61331559/camel_spark_driver.png?version=2&amp;modificationDate=1449478362000&amp;api=v2" data-unresolved-comment-count="0" data-linked-resource-id="61331563" data-linked-resource-version="2" data-linked-resource-type="attachment" data-linked-resource-default-alias="camel_spark_driver.png" data-base-url="https://cwiki.apache.org/confluence" data-linked-resource-content-type="image/png" data
 -linked-resource-container-id="61331559" data-linked-resource-container-version="6"></span><br clear="none"></span></p><p><span style="line-height: 1.5625;font-size: 16.0px;">Spark component can also be submitted as a job directly into the Spark cluster.</span></p><p><span style="line-height: 1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" height="250" src="apache-spark.data/camel_spark_cluster.png" data-image-src="/confluence/download/attachments/61331559/camel_spark_cluster.png?version=1&amp;modificationDate=1449478393000&amp;api=v2" data-unresolved-comment-count="0" data-linked-resource-id="61331565" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="camel_spark_cluster.png" data-base-url="https://cwiki.apache.org/confluence" data-linked-resource-content-type="image/png" data-linked-resource-container-id="61331559" data-linked-r
 esource-container-version="6"></span><br clear="none"></span></p><p><span style="line-height: 1.5625;font-size: 16.0px;">While Spark component is primary designed to work as a <em>long running job</em>&#160;serving as an bridge between Spark cluster and the other endpoints, you can also use it as a <em>fire-once</em> short job. &#160;</span></p><div>&#160;</div><h3 id="ApacheSpark-RunningSparkinOSGiservers"><span>Running Spark in OSGi servers</span></h3><p>Currently the Spark component doesn't support execution in the OSGi container. Spark has been designed to be executed as a fat jar, usually submitted as a job to a cluster. For those reasons running Spark in an OSGi server is at least challenging and is not support by Camel as well.</p><p><span style="line-height: 1.5625;font-size: 16.0px;">&#160;</span></p><h3 id="ApacheSpark-URIformat">URI format</h3><p><span style="line-height: 1.5625;font-size: 16.0px;">&#160;</span></p><p>Currently the Spark component supports only producers 
 - it it intended to invoke a Spark job and return results. You can call RDD, data frame or Hive SQL job.</p><p><span style="line-height: 1.5625;font-size: 16.0px;">&#160;</span></p><div><p>&#160;</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark URI format</b></div><div class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[spark:{rdd|dataframe|hive}]]></script>
+</div></div><p>&#160;</p></div><h3 id="ApacheSpark-RDDjobs">RDD jobs&#160;</h3><div>To invoke an RDD job, use the following URI:</div><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark RDD producer</b></div><div class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[spark:rdd?rdd=#testFileRdd&amp;rddCallback=#transformation]]></script>
+</div></div><p>Where the <code>rdd</code> option refers to the name of an RDD instance (a subclass of <code>org.apache.spark.api.java.AbstractJavaRDDLike</code>) from the Camel registry, while <code>rddCallback</code> refers to an implementation of the <code>org.apache.camel.component.spark.RddCallback</code> interface (also from the registry). The RDD callback provides a single method used to apply incoming messages to the given RDD. The result of the callback computation is saved as the body of the exchange.</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark RDD callback</b></div><div class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[public interface RddCallback&lt;T&gt; {
+    T onRdd(AbstractJavaRDDLike rdd, Object... payloads);
 }]]></script>
-</div></div><p>Keep in mind that <code>KuraRouter</code> implements the&#160;<code>org.osgi.framework.BundleActivator</code>&#160;interface, so you need to register its&#160;<code>start</code>&#160;and&#160;<code>stop</code>&#160;lifecycle methods while&#160;<a shape="rect" class="external-link" href="http://eclipse.github.io/kura/doc/hello-example.html#create-component-class" rel="nofollow">creating Kura bundle component class</a>.</p><p>Kura router starts its own OSGi-aware <code>CamelContext</code>. It means that for every class extending <code>KuraRouter</code>, there will be a dedicated <code>CamelContext</code> instance. Ideally we recommend to deploy one <code>KuraRouter</code> per OSGi bundle.</p><h3 id="ApacheSpark-DeployingKuraRouter">Deploying KuraRouter</h3><p>Bundle containing your Kura router class should import the following packages in the OSGi manifest:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
+</div></div><p>The following snippet demonstrates how to send a message as input to the job and return the results:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Calling a Spark job</b></div><div class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[String pattern = &quot;job input&quot;;
+long linesCount = producerTemplate.requestBody(&quot;spark:rdd?rdd=#myRdd&amp;rddCallback=#countLinesContaining&quot;, pattern, long.class);]]></script>
+</div></div><p>The RDD callback for the snippet above, registered as a Spring bean, could look as follows:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark RDD callback</b></div><div class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[@Bean
+RddCallback&lt;Long&gt; countLinesContaining() {
+    return new RddCallback&lt;Long&gt;() {
+        Long onRdd(AbstractJavaRDDLike rdd, Object... payloads) {
+            String pattern = (String) payloads[0];
+            return rdd.filter({line -&gt; line.contains(pattern)}).count();
+        }
+    }
+}]]></script>
+</div></div><p>The RDD definition in Spring could look as follows:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark RDD definition</b></div><div class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[@Bean
+AbstractJavaRDDLike myRdd(JavaSparkContext sparkContext) {
+  return sparkContext.textFile(&quot;testrdd.txt&quot;);
+}]]></script>
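+</div></div><p>The <code>myRdd</code> bean above expects a <code>JavaSparkContext</code> to be injected. A minimal local-mode context definition could look as follows (an illustrative sketch, not part of the original page; the application name and master URL are assumptions):</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark context definition (sketch)</b></div><div class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[@Bean
+JavaSparkContext sparkContext() {
+    // Illustrative sketch: a local Spark context for testing; in a real
+    // deployment the master URL would point at the cluster instead of local[*].
+    SparkConf conf = new SparkConf().setAppName(&quot;camel-spark&quot;).setMaster(&quot;local[*]&quot;);
+    return new JavaSparkContext(conf);
+}]]></script>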
+</div></div><h3 id="ApacheSpark-DeployingKuraRouter">Deploying KuraRouter</h3><p>Bundle containing your Kura router class should import the following packages in the OSGi manifest:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
 <script class="brush: xml; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[Import-Package: org.osgi.framework;version=&quot;1.3.0&quot;,
   org.slf4j;version=&quot;1.6.4&quot;,
   org.apache.camel,org.apache.camel.impl,org.apache.camel.core.osgi,org.apache.camel.builder,org.apache.camel.model,

Modified: websites/production/camel/content/cache/main.pageCache
==============================================================================
Binary files - no diff available.