Posted to commits@storm.apache.org by na...@apache.org on 2014/05/25 19:47:13 UTC
svn commit: r1597454 [9/9] - in /incubator/storm/site: ./ publish/ publish/about/ publish/documentation/

Modified: incubator/storm/site/publish/documentation/Tutorial.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Tutorial.html?rev=1597454&r1=1597453&r2=1597454&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Tutorial.html (original)
+++ incubator/storm/site/publish/documentation/Tutorial.html Sun May 25 17:47:12 2014
@@ -65,210 +65,215 @@
   </ul>
 </div>
 <div id="aboutcontent">
-<p>In this tutorial, you'll learn how to create Storm topologies and deploy them to a Storm cluster. Java will be the main language used, but a few examples will use Python to illustrate Storm's multi-language capabilities.</p>
+<p>In this tutorial, you&#8217;ll learn how to create Storm topologies and deploy them to a Storm cluster. Java will be the main language used, but a few examples will use Python to illustrate Storm&#8217;s multi-language capabilities.</p>
 
-<h2>Preliminaries</h2>
+<h2 id="preliminaries">Preliminaries</h2>
 
-<p>This tutorial uses examples from the <a href="http://github.com/nathanmarz/storm-starter">storm-starter</a> project. It's recommended that you clone the project and follow along with the examples. Read <a href="Setting-up-development-environment.html">Setting up a development environment</a> and <a href="Creating-a-new-Storm-project.html">Creating a new Storm project</a> to get your machine set up.</p>
+<p>This tutorial uses examples from the <a href="http://github.com/nathanmarz/storm-starter">storm-starter</a> project. It&#8217;s recommended that you clone the project and follow along with the examples. Read <a href="Setting-up-development-environment.html">Setting up a development environment</a> and <a href="Creating-a-new-Storm-project.html">Creating a new Storm project</a> to get your machine set up. </p>
 
-<h2>Components of a Storm cluster</h2>
+<h2 id="components-of-a-storm-cluster">Components of a Storm cluster</h2>
 
-<p>A Storm cluster is superficially similar to a Hadoop cluster. Whereas on Hadoop you run "MapReduce jobs", on Storm you run "topologies". "Jobs" and "topologies" themselves are very different -- one key difference is that a MapReduce job eventually finishes, whereas a topology processes messages forever (or until you kill it).</p>
+<p>A Storm cluster is superficially similar to a Hadoop cluster. Whereas on Hadoop you run &#8220;MapReduce jobs&#8221;, on Storm you run &#8220;topologies&#8221;. &#8220;Jobs&#8221; and &#8220;topologies&#8221; themselves are very different &#8211; one key difference is that a MapReduce job eventually finishes, whereas a topology processes messages forever (or until you kill it).</p>
 
-<p>There are two kinds of nodes on a Storm cluster: the master node and the worker nodes. The master node runs a daemon called "Nimbus" that is similar to Hadoop's "JobTracker". Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.</p>
+<p>There are two kinds of nodes on a Storm cluster: the master node and the worker nodes. The master node runs a daemon called &#8220;Nimbus&#8221; that is similar to Hadoop&#8217;s &#8220;JobTracker&#8221;. Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.</p>
 
-<p>Each worker node runs a daemon called the "Supervisor". The supervisor listens for work assigned to its machine and starts and stops worker processes as necessary based on what Nimbus has assigned to it. Each worker process executes a subset of a topology; a running topology consists of many worker processes spread across many machines.</p>
+<p>Each worker node runs a daemon called the &#8220;Supervisor&#8221;. The supervisor listens for work assigned to its machine and starts and stops worker processes as necessary based on what Nimbus has assigned to it. Each worker process executes a subset of a topology; a running topology consists of many worker processes spread across many machines.</p>
 
 <p><img src="images/storm-cluster.png" alt="Storm cluster" /></p>
 
-<p>All coordination between Nimbus and the Supervisors is done through a <a href="http://zookeeper.apache.org/">Zookeeper</a> cluster. Additionally, the Nimbus daemon and Supervisor daemons are fail-fast and stateless; all state is kept in Zookeeper or on local disk. This means you can kill -9 Nimbus or the Supervisors and they'll start back up like nothing happened. This design leads to Storm clusters being incredibly stable.</p>
+<p>All coordination between Nimbus and the Supervisors is done through a <a href="http://zookeeper.apache.org/">Zookeeper</a> cluster. Additionally, the Nimbus daemon and Supervisor daemons are fail-fast and stateless; all state is kept in Zookeeper or on local disk. This means you can kill -9 Nimbus or the Supervisors and they&#8217;ll start back up like nothing happened. This design leads to Storm clusters being incredibly stable.</p>
 
-<h2>Topologies</h2>
+<h2 id="topologies">Topologies</h2>
 
-<p>To do realtime computation on Storm, you create what are called "topologies". A topology is a graph of computation. Each node in a topology contains processing logic, and links between nodes indicate how data should be passed around between nodes.</p>
+<p>To do realtime computation on Storm, you create what are called &#8220;topologies&#8221;. A topology is a graph of computation. Each node in a topology contains processing logic, and links between nodes indicate how data should be passed around between nodes.</p>
 
 <p>Running a topology is straightforward. First, you package all your code and dependencies into a single jar. Then, you run a command like the following:</p>
 
-<pre><code>storm jar all-my-code.jar backtype.storm.MyTopology arg1 arg2
-</code></pre>
+<p><code>
+storm jar all-my-code.jar backtype.storm.MyTopology arg1 arg2
+</code></p>
 
 <p>This runs the class <code>backtype.storm.MyTopology</code> with the arguments <code>arg1</code> and <code>arg2</code>. The main function of the class defines the topology and submits it to Nimbus. The <code>storm jar</code> part takes care of connecting to Nimbus and uploading the jar.</p>
 
 <p>Since topology definitions are just Thrift structs, and Nimbus is a Thrift service, you can create and submit topologies using any programming language. The above example is the easiest way to do it from a JVM-based language. See <a href="Running-topologies-on-a-production-cluster.html">Running topologies on a production cluster</a> for more information on starting and stopping topologies.</p>
 
-<h2>Streams</h2>
+<h2 id="streams">Streams</h2>
 
-<p>The core abstraction in Storm is the "stream". A stream is an unbounded sequence of tuples. Storm provides the primitives for transforming a stream into a new stream in a distributed and reliable way. For example, you may transform a stream of tweets into a stream of trending topics.</p>
+<p>The core abstraction in Storm is the &#8220;stream&#8221;. A stream is an unbounded sequence of tuples. Storm provides the primitives for transforming a stream into a new stream in a distributed and reliable way. For example, you may transform a stream of tweets into a stream of trending topics.</p>
 
-<p>The basic primitives Storm provides for doing stream transformations are "spouts" and "bolts". Spouts and bolts have interfaces that you implement to run your application-specific logic.</p>
+<p>The basic primitives Storm provides for doing stream transformations are &#8220;spouts&#8221; and &#8220;bolts&#8221;. Spouts and bolts have interfaces that you implement to run your application-specific logic.</p>
 
 <p>A spout is a source of streams. For example, a spout may read tuples off of a <a href="http://github.com/nathanmarz/storm-kestrel">Kestrel</a> queue and emit them as a stream. Or a spout may connect to the Twitter API and emit a stream of tweets.</p>
 
 <p>A bolt consumes any number of input streams, does some processing, and possibly emits new streams. Complex stream transformations, like computing a stream of trending topics from a stream of tweets, require multiple steps and thus multiple bolts. Bolts can do anything: run functions, filter tuples, do streaming aggregations, do streaming joins, talk to databases, and more.</p>
 
-<p>Networks of spouts and bolts are packaged into a "topology" which is the top-level abstraction that you submit to Storm clusters for execution. A topology is a graph of stream transformations where each node is a spout or bolt. Edges in the graph indicate which bolts are subscribing to which streams. When a spout or bolt emits a tuple to a stream, it sends the tuple to every bolt that subscribed to that stream.</p>
+<p>Networks of spouts and bolts are packaged into a &#8220;topology&#8221; which is the top-level abstraction that you submit to Storm clusters for execution. A topology is a graph of stream transformations where each node is a spout or bolt. Edges in the graph indicate which bolts are subscribing to which streams. When a spout or bolt emits a tuple to a stream, it sends the tuple to every bolt that subscribed to that stream.</p>
 
 <p><img src="images/topology.png" alt="A Storm topology" /></p>
 
-<p>Links between nodes in your topology indicate how tuples should be passed around. For example, if there is a link between Spout A and Bolt B, a link from Spout A to Bolt C, and a link from Bolt B to Bolt C, then every time Spout A emits a tuple, it will send the tuple to both Bolt B and Bolt C. All of Bolt B's output tuples will go to Bolt C as well.</p>
+<p>Links between nodes in your topology indicate how tuples should be passed around. For example, if there is a link between Spout A and Bolt B, a link from Spout A to Bolt C, and a link from Bolt B to Bolt C, then every time Spout A emits a tuple, it will send the tuple to both Bolt B and Bolt C. All of Bolt B&#8217;s output tuples will go to Bolt C as well.</p>
 
 <p>Each node in a Storm topology executes in parallel. In your topology, you can specify how much parallelism you want for each node, and then Storm will spawn that number of threads across the cluster to do the execution.</p>
 
 <p>A topology runs forever, or until you kill it. Storm will automatically reassign any failed tasks. Additionally, Storm guarantees that there will be no data loss, even if machines go down and messages are dropped.</p>
 
-<h2>Data model</h2>
+<h2 id="data-model">Data model</h2>
 
 <p>Storm uses tuples as its data model. A tuple is a named list of values, and a field in a tuple can be an object of any type. Out of the box, Storm supports all the primitive types, strings, and byte arrays as tuple field values. To use an object of another type, you just need to implement <a href="Serialization.html">a serializer</a> for the type.</p>
 
-<p>Every node in a topology must declare the output fields for the tuples it emits. For example, this bolt declares that it emits 2-tuples with the fields "double" and "triple":</p>
+<p>Every node in a topology must declare the output fields for the tuples it emits. For example, this bolt declares that it emits 2-tuples with the fields &#8220;double&#8221; and &#8220;triple&#8221;:</p>
 
-<pre><code class="java">public class DoubleAndTripleBolt extends BaseRichBolt {
-    private OutputCollector _collector;
+<p>```java
+public class DoubleAndTripleBolt extends BaseRichBolt {
+    private OutputCollector _collector;</p>
+
+<pre><code>@Override
+public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
+    _collector = collector;
+}
 
-    @Override
-    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
-        _collector = collector;
-    }
-
-    @Override
-    public void execute(Tuple input) {
-        int val = input.getInteger(0);        
-        _collector.emit(input, new Values(val*2, val*3));
-        _collector.ack(input);
-    }
-
-    @Override
-    public void declareOutputFields(OutputFieldsDeclarer declarer) {
-        declarer.declare(new Fields("double", "triple"));
-    }    
+@Override
+public void execute(Tuple input) {
+    int val = input.getInteger(0);        
+    _collector.emit(input, new Values(val*2, val*3));
+    _collector.ack(input);
 }
+
+@Override
+public void declareOutputFields(OutputFieldsDeclarer declarer) {
+    declarer.declare(new Fields("double", "triple"));
+}     } ```
 </code></pre>
 
 <p>The <code>declareOutputFields</code> function declares the output fields <code>["double", "triple"]</code> for the component. The rest of the bolt will be explained in the upcoming sections.</p>
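To make the idea of a tuple as a named list of values concrete, here is a minimal plain-Java sketch. It is an illustration only and does not use Storm's own `Fields`, `Values`, or `Tuple` classes; `NamedTupleSketch` and its `emit` method are hypothetical names. It pairs each declared field name with the value emitted at the same position, which is the relationship between `declareOutputFields` and `emit` in the bolt above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; Storm's own Fields, Values and Tuple classes are not used.
class NamedTupleSketch {
    // Field names, in the order the bolt declares them.
    static final List<String> FIELDS = Arrays.asList("double", "triple");

    // Pairs each declared field name with the value at the same position.
    static Map<String, Integer> emit(int val) {
        List<Integer> values = Arrays.asList(val * 2, val * 3);
        Map<String, Integer> tuple = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.size(); i++) {
            tuple.put(FIELDS.get(i), values.get(i));
        }
        return tuple;
    }

    public static void main(String[] args) {
        System.out.println(emit(5)); // prints {double=10, triple=15}
    }
}
```

The number of emitted values has to match the number of declared fields, which is why the sketch walks both lists by the same index.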
 
-<h2>A simple topology</h2>
+<h2 id="a-simple-topology">A simple topology</h2>
 
-<p>Let's take a look at a simple topology to explore the concepts more and see how the code shapes up. Let's look at the <code>ExclamationTopology</code> definition from storm-starter:</p>
+<p>Let&#8217;s take a look at a simple topology to explore the concepts more and see how the code shapes up. Let&#8217;s look at the <code>ExclamationTopology</code> definition from storm-starter:</p>
 
-<pre><code class="java">TopologyBuilder builder = new TopologyBuilder();        
+<p><code>java
+TopologyBuilder builder = new TopologyBuilder();        
 builder.setSpout("words", new TestWordSpout(), 10);        
 builder.setBolt("exclaim1", new ExclamationBolt(), 3)
         .shuffleGrouping("words");
 builder.setBolt("exclaim2", new ExclamationBolt(), 2)
         .shuffleGrouping("exclaim1");
-</code></pre>
+</code></p>
 
-<p>This topology contains a spout and two bolts. The spout emits words, and each bolt appends the string "!!!" to its input. The nodes are arranged in a line: the spout emits to the first bolt which then emits to the second bolt. If the spout emits the tuples ["bob"] and ["john"], then the second bolt will emit the words ["bob!!!!!!"] and ["john!!!!!!"].</p>
+<p>This topology contains a spout and two bolts. The spout emits words, and each bolt appends the string &#8220;!!!&#8221; to its input. The nodes are arranged in a line: the spout emits to the first bolt which then emits to the second bolt. If the spout emits the tuples [&#8220;bob&#8221;] and [&#8220;john&#8221;], then the second bolt will emit the words [&#8220;bob!!!!!!&#8221;] and [&#8220;john!!!!!!&#8221;].</p>
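The data flow described here can be traced without a Storm cluster. The following is a plain-Java sketch, not Storm code: `exclaim` is a hypothetical stand-in for what a single `ExclamationBolt` does to a word, and it is applied twice to mimic the path through "exclaim1" and then "exclaim2".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; no Storm classes are used.
class ExclamationChainSketch {
    // Hypothetical stand-in for one ExclamationBolt: appends "!!!" to its input.
    static String exclaim(String word) {
        return word + "!!!";
    }

    // Passes each word through two stages, mirroring "exclaim1" then "exclaim2".
    static List<String> runChain(List<String> words) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            out.add(exclaim(exclaim(w)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runChain(Arrays.asList("bob", "john"))); // prints [bob!!!!!!, john!!!!!!]
    }
}
```

Each stage adds three exclamation marks, so a word that passes through both bolts ends with six.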
 
-<p>This code defines the nodes using the <code>setSpout</code> and <code>setBolt</code> methods. These methods take as input a user-specified id, an object containing the processing logic, and the amount of parallelism you want for the node. In this example, the spout is given id "words" and the bolts are given ids "exclaim1" and "exclaim2".</p>
+<p>This code defines the nodes using the <code>setSpout</code> and <code>setBolt</code> methods. These methods take as input a user-specified id, an object containing the processing logic, and the amount of parallelism you want for the node. In this example, the spout is given id &#8220;words&#8221; and the bolts are given ids &#8220;exclaim1&#8221; and &#8220;exclaim2&#8221;. </p>
 
-<p>The object containing the processing logic implements the <a href="/apidocs/backtype/storm/topology/IRichSpout.html">IRichSpout</a> interface for spouts and the <a href="/apidocs/backtype/storm/topology/IRichBolt.html">IRichBolt</a> interface for bolts.</p>
+<p>The object containing the processing logic implements the <a href="/apidocs/backtype/storm/topology/IRichSpout.html">IRichSpout</a> interface for spouts and the <a href="/apidocs/backtype/storm/topology/IRichBolt.html">IRichBolt</a> interface for bolts. </p>
 
 <p>The last parameter, how much parallelism you want for the node, is optional. It indicates how many threads should execute that component across the cluster. If you omit it, Storm will only allocate one thread for that node.</p>
 
-<p><code>setBolt</code> returns an <a href="/apidocs/backtype/storm/topology/InputDeclarer.html">InputDeclarer</a> object that is used to define the inputs to the Bolt. Here, component "exclaim1" declares that it wants to read all the tuples emitted by component "words" using a shuffle grouping, and component "exclaim2" declares that it wants to read all the tuples emitted by component "exclaim1" using a shuffle grouping. "shuffle grouping" means that tuples should be randomly distributed from the input tasks to the bolt's tasks. There are many ways to group data between components. These will be explained in a few sections.</p>
+<p><code>setBolt</code> returns an <a href="/apidocs/backtype/storm/topology/InputDeclarer.html">InputDeclarer</a> object that is used to define the inputs to the Bolt. Here, component &#8220;exclaim1&#8221; declares that it wants to read all the tuples emitted by component &#8220;words&#8221; using a shuffle grouping, and component &#8220;exclaim2&#8221; declares that it wants to read all the tuples emitted by component &#8220;exclaim1&#8221; using a shuffle grouping. &#8220;shuffle grouping&#8221; means that tuples should be randomly distributed from the input tasks to the bolt&#8217;s tasks. There are many ways to group data between components. These will be explained in a few sections.</p>
 
-<p>If you wanted component "exclaim2" to read all the tuples emitted by both component "words" and component "exclaim1", you would write component "exclaim2"'s definition like this:</p>
+<p>If you wanted component &#8220;exclaim2&#8221; to read all the tuples emitted by both component &#8220;words&#8221; and component &#8220;exclaim1&#8221;, you would write component &#8220;exclaim2&#8221;&#8217;s definition like this:</p>
 
-<pre><code class="java">builder.setBolt("exclaim2", new ExclamationBolt(), 5)
+<p><code>java
+builder.setBolt("exclaim2", new ExclamationBolt(), 5)
             .shuffleGrouping("words")
             .shuffleGrouping("exclaim1");
-</code></pre>
+</code></p>
 
 <p>As you can see, input declarations can be chained to specify multiple sources for the Bolt.</p>
 
-<p>Let's dig into the implementations of the spouts and bolts in this topology. Spouts are responsible for emitting new messages into the topology. <code>TestWordSpout</code> in this topology emits a random word from the list ["nathan", "mike", "jackson", "golda", "bertels"] as a 1-tuple every 100ms. The implementation of <code>nextTuple()</code> in TestWordSpout looks like this:</p>
+<p>Let&#8217;s dig into the implementations of the spouts and bolts in this topology. Spouts are responsible for emitting new messages into the topology. <code>TestWordSpout</code> in this topology emits a random word from the list [&#8220;nathan&#8221;, &#8220;mike&#8221;, &#8220;jackson&#8221;, &#8220;golda&#8221;, &#8220;bertels&#8221;] as a 1-tuple every 100ms. The implementation of <code>nextTuple()</code> in TestWordSpout looks like this:</p>
 
-<pre><code class="java">public void nextTuple() {
+<p><code>java
+public void nextTuple() {
     Utils.sleep(100);
     final String[] words = new String[] {"nathan", "mike", "jackson", "golda", "bertels"};
     final Random rand = new Random();
     final String word = words[rand.nextInt(words.length)];
     _collector.emit(new Values(word));
 }
-</code></pre>
+</code></p>
 
 <p>As you can see, the implementation is very straightforward.</p>
 
-<p><code>ExclamationBolt</code> appends the string "!!!" to its input. Let's take a look at the full implementation for <code>ExclamationBolt</code>:</p>
+<p><code>ExclamationBolt</code> appends the string &#8220;!!!&#8221; to its input. Let&#8217;s take a look at the full implementation for <code>ExclamationBolt</code>:</p>
 
-<pre><code class="java">public static class ExclamationBolt implements IRichBolt {
-    OutputCollector _collector;
+<p>```java
+public static class ExclamationBolt implements IRichBolt {
+    OutputCollector _collector;</p>
 
-    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
-        _collector = collector;
-    }
-
-    public void execute(Tuple tuple) {
-        _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
-        _collector.ack(tuple);
-    }
-
-    public void cleanup() {
-    }
-
-    public void declareOutputFields(OutputFieldsDeclarer declarer) {
-        declarer.declare(new Fields("word"));
-    }
-
-    public Map getComponentConfiguration() {
-        return null;
-    }
+<pre><code>public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
+    _collector = collector;
 }
+
+public void execute(Tuple tuple) {
+    _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
+    _collector.ack(tuple);
+}
+
+public void cleanup() {
+}
+
+public void declareOutputFields(OutputFieldsDeclarer declarer) {
+    declarer.declare(new Fields("word"));
+}
+
+public Map getComponentConfiguration() {
+    return null;
+} } ```
 </code></pre>
 
-<p>The <code>prepare</code> method provides the bolt with an <code>OutputCollector</code> that is used for emitting tuples from this bolt. Tuples can be emitted at any time from the bolt -- in the <code>prepare</code>, <code>execute</code>, or <code>cleanup</code> methods, or even asynchronously in another thread. This <code>prepare</code> implementation simply saves the <code>OutputCollector</code> as an instance variable to be used later on in the <code>execute</code> method.</p>
+<p>The <code>prepare</code> method provides the bolt with an <code>OutputCollector</code> that is used for emitting tuples from this bolt. Tuples can be emitted at any time from the bolt &#8211; in the <code>prepare</code>, <code>execute</code>, or <code>cleanup</code> methods, or even asynchronously in another thread. This <code>prepare</code> implementation simply saves the <code>OutputCollector</code> as an instance variable to be used later on in the <code>execute</code> method.</p>
 
-<p>The <code>execute</code> method receives a tuple from one of the bolt's inputs. The <code>ExclamationBolt</code> grabs the first field from the tuple and emits a new tuple with the string "!!!" appended to it. If you implement a bolt that subscribes to multiple input sources, you can find out which component the <a href="/apidocs/backtype/storm/tuple/Tuple.html">Tuple</a> came from by using the <code>Tuple#getSourceComponent</code> method.</p>
+<p>The <code>execute</code> method receives a tuple from one of the bolt&#8217;s inputs. The <code>ExclamationBolt</code> grabs the first field from the tuple and emits a new tuple with the string &#8220;!!!&#8221; appended to it. If you implement a bolt that subscribes to multiple input sources, you can find out which component the <a href="/apidocs/backtype/storm/tuple/Tuple.html">Tuple</a> came from by using the <code>Tuple#getSourceComponent</code> method.</p>
 
-<p>There are a few other things going on in the <code>execute</code> method, namely that the input tuple is passed as the first argument to <code>emit</code> and the input tuple is acked on the final line. These are part of Storm's reliability API for guaranteeing no data loss and will be explained later in this tutorial.</p>
+<p>There are a few other things going on in the <code>execute</code> method, namely that the input tuple is passed as the first argument to <code>emit</code> and the input tuple is acked on the final line. These are part of Storm&#8217;s reliability API for guaranteeing no data loss and will be explained later in this tutorial. </p>
 
-<p>The <code>cleanup</code> method is called when a Bolt is being shut down and should clean up any resources that were opened. There's no guarantee that this method will be called on the cluster: for example, if the machine the task is running on blows up, there's no way to invoke the method. The <code>cleanup</code> method is intended for when you run topologies in <a href="Local-mode.html">local mode</a> (where a Storm cluster is simulated in process), and you want to be able to run and kill many topologies without suffering any resource leaks.</p>
+<p>The <code>cleanup</code> method is called when a Bolt is being shut down and should clean up any resources that were opened. There&#8217;s no guarantee that this method will be called on the cluster: for example, if the machine the task is running on blows up, there&#8217;s no way to invoke the method. The <code>cleanup</code> method is intended for when you run topologies in <a href="Local-mode.html">local mode</a> (where a Storm cluster is simulated in process), and you want to be able to run and kill many topologies without suffering any resource leaks.</p>
 
-<p>The <code>declareOutputFields</code> method declares that the <code>ExclamationBolt</code> emits 1-tuples with one field called "word".</p>
+<p>The <code>declareOutputFields</code> method declares that the <code>ExclamationBolt</code> emits 1-tuples with one field called &#8220;word&#8221;.</p>
 
 <p>The <code>getComponentConfiguration</code> method allows you to configure various aspects of how this component runs. This is a more advanced topic that is explained further on <a href="Configuration.html">Configuration</a>.</p>
 
 <p>Methods like <code>cleanup</code> and <code>getComponentConfiguration</code> are often not needed in a bolt implementation. You can define bolts more succinctly by using a base class that provides default implementations where appropriate. <code>ExclamationBolt</code> can be written more succinctly by extending <code>BaseRichBolt</code>, like so:</p>
 
-<pre><code class="java">public static class ExclamationBolt extends BaseRichBolt {
-    OutputCollector _collector;
+<p>```java
+public static class ExclamationBolt extends BaseRichBolt {
+    OutputCollector _collector;</p>
 
-    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
-        _collector = collector;
-    }
-
-    public void execute(Tuple tuple) {
-        _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
-        _collector.ack(tuple);
-    }
-
-    public void declareOutputFields(OutputFieldsDeclarer declarer) {
-        declarer.declare(new Fields("word"));
-    }    
+<pre><code>public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
+    _collector = collector;
 }
+
+public void execute(Tuple tuple) {
+    _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
+    _collector.ack(tuple);
+}
+
+public void declareOutputFields(OutputFieldsDeclarer declarer) {
+    declarer.declare(new Fields("word"));
+}     } ```
 </code></pre>
 
-<h2>Running ExclamationTopology in local mode</h2>
+<h2 id="running-exclamationtopology-in-local-mode">Running ExclamationTopology in local mode</h2>
 
-<p>Let's see how to run the <code>ExclamationTopology</code> in local mode and see that it's working.</p>
+<p>Let&#8217;s see how to run the <code>ExclamationTopology</code> in local mode and see that it&#8217;s working.</p>
 
-<p>Storm has two modes of operation: local mode and distributed mode. In local mode, Storm executes completely in process by simulating worker nodes with threads. Local mode is useful for testing and development of topologies. When you run the topologies in storm-starter, they'll run in local mode and you'll be able to see what messages each component is emitting. You can read more about running topologies in local mode on <a href="Local-mode.html">Local mode</a>.</p>
+<p>Storm has two modes of operation: local mode and distributed mode. In local mode, Storm executes completely in process by simulating worker nodes with threads. Local mode is useful for testing and development of topologies. When you run the topologies in storm-starter, they&#8217;ll run in local mode and you&#8217;ll be able to see what messages each component is emitting. You can read more about running topologies in local mode on <a href="Local-mode.html">Local mode</a>.</p>
 
-<p>In distributed mode, Storm operates as a cluster of machines. When you submit a topology to the master, you also submit all the code necessary to run the topology. The master will take care of distributing your code and allocating workers to run your topology. If workers go down, the master will reassign them somewhere else. You can read more about running topologies on a cluster on <a href="Running-topologies-on-a-production-cluster.html">Running topologies on a production cluster</a>.</p>
+<p>In distributed mode, Storm operates as a cluster of machines. When you submit a topology to the master, you also submit all the code necessary to run the topology. The master will take care of distributing your code and allocating workers to run your topology. If workers go down, the master will reassign them somewhere else. You can read more about running topologies on a cluster on <a href="Running-topologies-on-a-production-cluster.html">Running topologies on a production cluster</a>. </p>
 
-<p>Here's the code that runs <code>ExclamationTopology</code> in local mode:</p>
+<p>Here&#8217;s the code that runs <code>ExclamationTopology</code> in local mode:</p>
 
-<pre><code class="java">Config conf = new Config();
+<p>```java
+Config conf = new Config();
 conf.setDebug(true);
-conf.setNumWorkers(2);
+conf.setNumWorkers(2);</p>
 
-LocalCluster cluster = new LocalCluster();
-cluster.submitTopology("test", conf, builder.createTopology());
+<p>LocalCluster cluster = new LocalCluster();
+cluster.submitTopology(&#8220;test&#8221;, conf, builder.createTopology());
 Utils.sleep(10000);
-cluster.killTopology("test");
+cluster.killTopology(&#8220;test&#8221;);
 cluster.shutdown();
-</code></pre>
+```</p>
 
 <p>First, the code defines an in-process cluster by creating a <code>LocalCluster</code> object. Submitting topologies to this virtual cluster is identical to submitting topologies to distributed clusters. It submits a topology to the <code>LocalCluster</code> by calling <code>submitTopology</code>, which takes as arguments a name for the running topology, a configuration for the topology, and then the topology itself.</p>
 
@@ -277,16 +282,15 @@ cluster.shutdown();
 <p>The configuration is used to tune various aspects of the running topology. The two configurations specified here are very common:</p>
 
 <ol>
-<li><strong>TOPOLOGY_WORKERS</strong> (set with <code>setNumWorkers</code>) specifies how many <em>processes</em> you want allocated around the cluster to execute the topology. Each component in the topology will execute as many <em>threads</em>. The number of threads allocated to a given component is configured through the <code>setBolt</code> and <code>setSpout</code> methods. Those <em>threads</em> exist within worker <em>processes</em>. Each worker <em>process</em> contains within it some number of <em>threads</em> for some number of components. For instance, you may have 300 threads specified across all your components and 50 worker processes specified in your config. Each worker process will execute 6 threads, each of which could belong to a different component. You tune the performance of Storm topologies by tweaking the parallelism for each component and the number of worker processes those threads should run within.</li>
-<li><strong>TOPOLOGY_DEBUG</strong> (set with <code>setDebug</code>), when set to true, tells Storm to log every message emitted by a component. This is useful in local mode when testing topologies, but you probably want to keep this turned off when running topologies on the cluster.</li>
+  <li><strong>TOPOLOGY_WORKERS</strong> (set with <code>setNumWorkers</code>) specifies how many <em>processes</em> you want allocated around the cluster to execute the topology. Each component in the topology will execute as many <em>threads</em>. The number of threads allocated to a given component is configured through the <code>setBolt</code> and <code>setSpout</code> methods. Those <em>threads</em> exist within worker <em>processes</em>. Each worker <em>process</em> contains within it some number of <em>threads</em> for some number of components. For instance, you may have 300 threads specified across all your components and 50 worker processes specified in your config. Each worker process will execute 6 threads, each of which could belong to a different component. You tune the performance of Storm topologies by tweaking the parallelism for each component and the number of worker processes those threads should run within.</li>
+  <li><strong>TOPOLOGY_DEBUG</strong> (set with <code>setDebug</code>), when set to true, tells Storm to log every message emitted by a component. This is useful in local mode when testing topologies, but you probably want to keep this turned off when running topologies on the cluster.</li>
 </ol>
 
-
-<p>There's many other configurations you can set for the topology. The various configurations are detailed on <a href="/apidocs/backtype/storm/Config.html">the Javadoc for Config</a>.</p>
+<p>There&#8217;s many other configurations you can set for the topology. The various configurations are detailed on <a href="/apidocs/backtype/storm/Config.html">the Javadoc for Config</a>.</p>
 
 <p>To learn about how to set up your development environment so that you can run topologies in local mode (such as in Eclipse), see <a href="Creating-a-new-Storm-project.html">Creating a new Storm project</a>.</p>
 
-<h2>Stream groupings</h2>
+<h2 id="stream-groupings">Stream groupings</h2>
 
 <p>A stream grouping tells a topology how to send tuples between two components. Remember, spouts and bolts execute in parallel as many tasks across the cluster. If you look at how a topology is executing at the task level, it looks something like this:</p>
 
@@ -294,74 +298,76 @@ cluster.shutdown();
 
 <p>When a task for Bolt A emits a tuple to Bolt B, which task should it send the tuple to?</p>
 
-<p>A "stream grouping" answers this question by telling Storm how to send tuples between sets of tasks. Before we dig into the different kinds of stream groupings, let's take a look at another topology from <a href="http://github.com/nathanmarz/storm-starter">storm-starter</a>. This <a href="https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/WordCountTopology.java">WordCountTopology</a> reads sentences from a spout and streams out of the <code>WordCount</code> bolt the total number of times it has seen each word so far:</p>
+<p>A &#8220;stream grouping&#8221; answers this question by telling Storm how to send tuples between sets of tasks. Before we dig into the different kinds of stream groupings, let&#8217;s take a look at another topology from <a href="http://github.com/nathanmarz/storm-starter">storm-starter</a>. This <a href="https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/WordCountTopology.java">WordCountTopology</a> reads sentences from a spout and streams out of the <code>WordCount</code> bolt the total number of times it has seen each word so far:</p>
 
-<pre><code class="java">TopologyBuilder builder = new TopologyBuilder();
+<p>```java
+TopologyBuilder builder = new TopologyBuilder();</p>
 
-builder.setSpout("sentences", new RandomSentenceSpout(), 5);        
-builder.setBolt("split", new SplitSentence(), 8)
-        .shuffleGrouping("sentences");
-builder.setBolt("count", new WordCount(), 12)
-        .fieldsGrouping("split", new Fields("word"));
-</code></pre>
+<p>builder.setSpout(&quot;sentences&quot;, new RandomSentenceSpout(), 5);      <br />
+builder.setBolt(&quot;split&quot;, new SplitSentence(), 8)
+        .shuffleGrouping(&quot;sentences&quot;);
+builder.setBolt(&quot;count&quot;, new WordCount(), 12)
+        .fieldsGrouping(&quot;split&quot;, new Fields(&quot;word&quot;));
+```</p>
 
 <p><code>SplitSentence</code> emits a tuple for each word in each sentence it receives, and <code>WordCount</code> keeps a map in memory from word to count. Each time <code>WordCount</code> receives a word, it updates its state and emits the new word count.</p>
 
-<p>There's a few different kinds of stream groupings.</p>
+<p>There&#8217;s a few different kinds of stream groupings.</p>
 
-<p>The simplest kind of grouping is called a "shuffle grouping" which sends the tuple to a random task. A shuffle grouping is used in the <code>WordCountTopology</code> to send tuples from <code>RandomSentenceSpout</code> to the <code>SplitSentence</code> bolt. It has the effect of evenly distributing the work of processing the tuples across all of <code>SplitSentence</code> bolt's tasks.</p>
+<p>The simplest kind of grouping is called a &#8220;shuffle grouping&#8221; which sends the tuple to a random task. A shuffle grouping is used in the <code>WordCountTopology</code> to send tuples from <code>RandomSentenceSpout</code> to the <code>SplitSentence</code> bolt. It has the effect of evenly distributing the work of processing the tuples across all of <code>SplitSentence</code> bolt&#8217;s tasks.</p>
 
-<p>A more interesting kind of grouping is the "fields grouping". A fields grouping is used between the <code>SplitSentence</code> bolt and the <code>WordCount</code> bolt. It is critical for the functioning of the <code>WordCount</code> bolt that the same word always go to the same task. Otherwise, more than one task will see the same word, and they'll each emit incorrect values for the count since each has incomplete information. A fields grouping lets you group a stream by a subset of its fields. This causes equal values for that subset of fields to go to the same task. Since <code>WordCount</code> subscribes to <code>SplitSentence</code>'s output stream using a fields grouping on the "word" field, the same word always goes to the same task and the bolt produces the correct output.</p>
+<p>A more interesting kind of grouping is the &#8220;fields grouping&#8221;. A fields grouping is used between the <code>SplitSentence</code> bolt and the <code>WordCount</code> bolt. It is critical for the functioning of the <code>WordCount</code> bolt that the same word always go to the same task. Otherwise, more than one task will see the same word, and they&#8217;ll each emit incorrect values for the count since each has incomplete information. A fields grouping lets you group a stream by a subset of its fields. This causes equal values for that subset of fields to go to the same task. Since <code>WordCount</code> subscribes to <code>SplitSentence</code>&#8217;s output stream using a fields grouping on the &#8220;word&#8221; field, the same word always goes to the same task and the bolt produces the correct output.</p>
 
 <p>Fields groupings are the basis of implementing streaming joins and streaming aggregations as well as a plethora of other use cases. Under the hood, fields groupings are implemented using mod hashing: the hash of the grouping fields, taken modulo the number of target tasks, selects the task.</p>
 
-<p>There's a few other kinds of stream groupings. You can read more about them on <a href="Concepts.html">Concepts</a>.</p>
+<p>There&#8217;s a few other kinds of stream groupings. You can read more about them on <a href="Concepts.html">Concepts</a>. </p>
 
-<h2>Defining Bolts in other languages</h2>
+<h2 id="defining-bolts-in-other-languages">Defining Bolts in other languages</h2>
 
-<p>Bolts can be defined in any language. Bolts written in another language are executed as subprocesses, and Storm communicates with those subprocesses with JSON messages over stdin/stdout. The communication protocol just requires an ~100 line adapter library, and Storm ships with adapter libraries for Ruby, Python, and Fancy.</p>
+<p>Bolts can be defined in any language. Bolts written in another language are executed as subprocesses, and Storm communicates with those subprocesses with JSON messages over stdin/stdout. The communication protocol just requires an ~100 line adapter library, and Storm ships with adapter libraries for Ruby, Python, and Fancy. </p>
 
-<p>Here's the definition of the <code>SplitSentence</code> bolt from <code>WordCountTopology</code>:</p>
+<p>Here&#8217;s the definition of the <code>SplitSentence</code> bolt from <code>WordCountTopology</code>:</p>
 
-<pre><code class="java">public static class SplitSentence extends ShellBolt implements IRichBolt {
+<p>```java
+public static class SplitSentence extends ShellBolt implements IRichBolt {
     public SplitSentence() {
-        super("python", "splitsentence.py");
-    }
+        super(&quot;python&quot;, &quot;splitsentence.py&quot;);
+    }</p>
 
-    public void declareOutputFields(OutputFieldsDeclarer declarer) {
-        declarer.declare(new Fields("word"));
-    }
-}
+<pre><code>public void declareOutputFields(OutputFieldsDeclarer declarer) {
+    declarer.declare(new Fields("word"));
+} } ```
 </code></pre>
 
-<p><code>SplitSentence</code> extends <code>ShellBolt</code> and declares that it runs using <code>python</code> with the argument <code>splitsentence.py</code>. Here's the implementation of <code>splitsentence.py</code>:</p>
+<p><code>SplitSentence</code> extends <code>ShellBolt</code> and declares that it runs using <code>python</code> with the argument <code>splitsentence.py</code>. Here&#8217;s the implementation of <code>splitsentence.py</code>:</p>
 
-<pre><code class="python">import storm
+<p>```python
+import storm</p>
 
-class SplitSentenceBolt(storm.BasicBolt):
+<p>class SplitSentenceBolt(storm.BasicBolt):
     def process(self, tup):
-        words = tup.values[0].split(" ")
+        words = tup.values[0].split(&quot; &quot;)
         for word in words:
-          storm.emit([word])
+          storm.emit([word])</p>
 
-SplitSentenceBolt().run()
-</code></pre>
+<p>SplitSentenceBolt().run()
+```</p>
 
 <p>For more information on writing spouts and bolts in other languages, and to learn about how to create topologies in other languages (and avoid the JVM completely), see <a href="Using-non-JVM-languages-with-Storm.html">Using non-JVM languages with Storm</a>.</p>
 
-<h2>Guaranteeing message processing</h2>
+<h2 id="guaranteeing-message-processing">Guaranteeing message processing</h2>
 
-<p>Earlier on in this tutorial, we skipped over a few aspects of how tuples are emitted. Those aspects were part of Storm's reliability API: how Storm guarantees that every message coming off a spout will be fully processed. See <a href="Guaranteeing-message-processing.html">Guaranteeing message processing</a> for information on how this works and what you have to do as a user to take advantage of Storm's reliability capabilities.</p>
+<p>Earlier on in this tutorial, we skipped over a few aspects of how tuples are emitted. Those aspects were part of Storm&#8217;s reliability API: how Storm guarantees that every message coming off a spout will be fully processed. See <a href="Guaranteeing-message-processing.html">Guaranteeing message processing</a> for information on how this works and what you have to do as a user to take advantage of Storm&#8217;s reliability capabilities.</p>
 
-<h2>Transactional topologies</h2>
+<h2 id="transactional-topologies">Transactional topologies</h2>
 
-<p>Storm guarantees that every message will be played through the topology at least once. A common question asked is "how do you do things like counting on top of Storm? Won't you overcount?" Storm has a feature called transactional topologies that lets you achieve exactly-once messaging semantics for most computations. Read more about transactional topologies <a href="Transactional-topologies.html">here</a>.</p>
+<p>Storm guarantees that every message will be played through the topology at least once. A common question asked is &#8220;how do you do things like counting on top of Storm? Won&#8217;t you overcount?&#8221; Storm has a feature called transactional topologies that lets you achieve exactly-once messaging semantics for most computations. Read more about transactional topologies <a href="Transactional-topologies.html">here</a>. </p>
 
-<h2>Distributed RPC</h2>
+<h2 id="distributed-rpc">Distributed RPC</h2>
 
-<p>This tutorial showed how to do basic stream processing on top of Storm. There's lots more things you can do with Storm's primitives. One of the most interesting applications of Storm is Distributed RPC, where you parallelize the computation of intense functions on the fly. Read more about Distributed RPC <a href="Distributed-RPC.html">here</a>.</p>
+<p>This tutorial showed how to do basic stream processing on top of Storm. There&#8217;s lots more things you can do with Storm&#8217;s primitives. One of the most interesting applications of Storm is Distributed RPC, where you parallelize the computation of intense functions on the fly. Read more about Distributed RPC <a href="Distributed-RPC.html">here</a>. </p>
 
-<h2>Conclusion</h2>
+<h2 id="conclusion">Conclusion</h2>
 
 <p>This tutorial gave a broad overview of developing, testing, and deploying Storm topologies. The rest of the documentation dives deeper into all the aspects of using Storm.</p>
 

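Note on the Tutorial.html text above: it says fields groupings are implemented using mod hashing. The following is only a minimal Java sketch of that idea, under stated assumptions. The class FieldsGroupingSketch and the method chooseTask are hypothetical names made up for this illustration, and the hash used is plain List.hashCode(), which is not claimed to match what Storm does internally. The sketch only shows why equal field values always map to the same task.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; hypothetical names, not Storm's implementation.
final class FieldsGroupingSketch {

    // Maps the values of the grouping fields to a task index in [0, numTasks).
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        // Assumption: 12 target tasks, mirroring the parallelism of the "count" bolt.
        int numTasks = 12;
        for (String word : new String[] {"the", "cow", "the", "moon", "cow"}) {
            int task = chooseTask(Arrays.<Object>asList(word), numTasks);
            System.out.println(word + " -> task " + task);
        }
    }
}
```

Running main prints one task index per word. Repeated words print the same index, which is the property the WordCount bolt relies on to keep a correct count per word.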
Modified: incubator/storm/site/publish/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Understanding-the-parallelism-of-a-Storm-topology.html?rev=1597454&r1=1597453&r2=1597454&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Understanding-the-parallelism-of-a-Storm-topology.html (original)
+++ incubator/storm/site/publish/documentation/Understanding-the-parallelism-of-a-Storm-topology.html Sun May 25 17:47:12 2014
@@ -65,17 +65,16 @@
   </ul>
 </div>
 <div id="aboutcontent">
-<h1>What makes a running topology: worker processes, executors and tasks</h1>
+<h1 id="what-makes-a-running-topology-worker-processes-executors-and-tasks">What makes a running topology: worker processes, executors and tasks</h1>
 
 <p>Storm distinguishes between the following three main entities that are used to actually run a topology in a Storm cluster:</p>
 
 <ol>
-<li>Worker processes</li>
-<li>Executors (threads)</li>
-<li>Tasks</li>
+  <li>Worker processes</li>
+  <li>Executors (threads)</li>
+  <li>Tasks</li>
 </ol>
 
-
 <p>Here is a simple illustration of their relationships:</p>
 
 <p><img src="images/relationships-worker-processes-executors-tasks.png" alt="The relationships of worker processes, executors (threads) and tasks in Storm" /></p>
@@ -86,66 +85,61 @@
 
 <p>A <em>task</em> performs the actual data processing: each spout or bolt that you implement in your code executes as many tasks across the cluster. The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means that the following condition holds true: <code>#threads ≤ #tasks</code>. By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.</p>
 
-<h1>Configuring the parallelism of a topology</h1>
+<h1 id="configuring-the-parallelism-of-a-topology">Configuring the parallelism of a topology</h1>
 
-<p>Note that in Storm's terminology "parallelism" is specifically used to describe the so-called <em>parallelism hint</em>, which means the initial number of executors (threads) of a component. In this document, though, we use the term "parallelism" in a more general sense to describe how you can configure not only the number of executors but also the number of worker processes and the number of tasks of a Storm topology. We will specifically call out when "parallelism" is used in the normal, narrow definition of Storm.</p>
+<p>Note that in Storm&#8217;s terminology &#8220;parallelism&#8221; is specifically used to describe the so-called <em>parallelism hint</em>, which means the initial number of executors (threads) of a component. In this document, though, we use the term &#8220;parallelism&#8221; in a more general sense to describe how you can configure not only the number of executors but also the number of worker processes and the number of tasks of a Storm topology. We will specifically call out when &#8220;parallelism&#8221; is used in the normal, narrow definition of Storm.</p>
 
 <p>The following sections give an overview of the various configuration options and how to set them in your code. There is more than one way of setting these options though, and the table lists only some of them. Storm currently has the following <a href="Configuration.html">order of precedence for configuration settings</a>: <code>defaults.yaml</code> &lt; <code>storm.yaml</code> &lt; topology-specific configuration &lt; internal component-specific configuration &lt; external component-specific configuration.</p>
 
-<h2>Number of worker processes</h2>
-
-<ul>
-<li>Description: How many worker processes to create <em>for the topology</em> across machines in the cluster.</li>
-<li>Configuration option: <a href="/apidocs/backtype/storm/Config.html#TOPOLOGY_WORKERS">TOPOLOGY_WORKERS</a></li>
-<li>How to set in your code (examples):
+<h2 id="number-of-worker-processes">Number of worker processes</h2>
 
 <ul>
-<li><a href="/apidocs/backtype/storm/Config.html">Config#setNumWorkers</a></li>
+  <li>Description: How many worker processes to create <em>for the topology</em> across machines in the cluster.</li>
+  <li>Configuration option: <a href="/apidocs/backtype/storm/Config.html#TOPOLOGY_WORKERS">TOPOLOGY_WORKERS</a></li>
+  <li>How to set in your code (examples):
+    <ul>
+      <li><a href="/apidocs/backtype/storm/Config.html">Config#setNumWorkers</a></li>
+    </ul>
+  </li>
 </ul>
-</li>
-</ul>
-
 
-<h2>Number of executors (threads)</h2>
+<h2 id="number-of-executors-threads">Number of executors (threads)</h2>
 
 <ul>
-<li>Description: How many executors to spawn <em>per component</em>.</li>
-<li>Configuration option: ?</li>
-<li>How to set in your code (examples):
-
-<ul>
-<li><a href="/apidocs/backtype/storm/topology/TopologyBuilder.html">TopologyBuilder#setSpout()</a></li>
-<li><a href="/apidocs/backtype/storm/topology/TopologyBuilder.html">TopologyBuilder#setBolt()</a></li>
-<li>Note that as of Storm 0.8 the <code>parallelism_hint</code> parameter now specifies the initial number of executors (not tasks!) for that bolt.</li>
-</ul>
-</li>
+  <li>Description: How many executors to spawn <em>per component</em>.</li>
+  <li>Configuration option: ?</li>
+  <li>How to set in your code (examples):
+    <ul>
+      <li><a href="/apidocs/backtype/storm/topology/TopologyBuilder.html">TopologyBuilder#setSpout()</a></li>
+      <li><a href="/apidocs/backtype/storm/topology/TopologyBuilder.html">TopologyBuilder#setBolt()</a></li>
+      <li>Note that as of Storm 0.8 the <code>parallelism_hint</code> parameter now specifies the initial number of executors (not tasks!) for that bolt.</li>
+    </ul>
+  </li>
 </ul>
 
-
-<h2>Number of tasks</h2>
-
-<ul>
-<li>Description: How many tasks to create <em>per component</em>.</li>
-<li>Configuration option: <a href="/apidocs/backtype/storm/Config.html#TOPOLOGY_TASKS">TOPOLOGY_TASKS</a></li>
-<li>How to set in your code (examples):
+<h2 id="number-of-tasks">Number of tasks</h2>
 
 <ul>
-<li><a href="/apidocs/backtype/storm/topology/ComponentConfigurationDeclarer.html">ComponentConfigurationDeclarer#setNumTasks()</a></li>
+  <li>Description: How many tasks to create <em>per component</em>.</li>
+  <li>Configuration option: <a href="/apidocs/backtype/storm/Config.html#TOPOLOGY_TASKS">TOPOLOGY_TASKS</a></li>
+  <li>How to set in your code (examples):
+    <ul>
+      <li><a href="/apidocs/backtype/storm/topology/ComponentConfigurationDeclarer.html">ComponentConfigurationDeclarer#setNumTasks()</a></li>
+    </ul>
+  </li>
 </ul>
-</li>
-</ul>
-
 
 <p>Here is an example code snippet to show these settings in practice:</p>
 
-<pre><code class="java">topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
+<p><code>java
+topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
                .setNumTasks(4)
                .shuffleGrouping("blue-spout");
-</code></pre>
+</code></p>
 
 <p>In the above code we configured Storm to run the bolt <code>GreenBolt</code> with an initial number of two executors and four associated tasks. Storm will run two tasks per executor (thread). If you do not explicitly configure the number of tasks, Storm will run by default one task per executor.</p>
 
-<h1>Example of a running topology</h1>
+<h1 id="example-of-a-running-topology">Example of a running topology</h1>
 
 <p>The following illustration shows what a simple topology looks like in operation. The topology consists of three components: one spout called <code>BlueSpout</code> and two bolts called <code>GreenBolt</code> and <code>YellowBolt</code>. The components are linked such that <code>BlueSpout</code> sends its output to <code>GreenBolt</code>, which in turn sends its own output to <code>YellowBolt</code>.</p>
 
@@ -153,62 +147,62 @@
 
 <p>The <code>GreenBolt</code> was configured as per the code snippet above whereas <code>BlueSpout</code> and <code>YellowBolt</code> only set the parallelism hint (number of executors). Here is the relevant code:</p>
 
-<pre><code class="java">Config conf = new Config();
-conf.setNumWorkers(2); // use two worker processes
+<p>```java
+Config conf = new Config();
+conf.setNumWorkers(2); // use two worker processes</p>
 
-topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2
+<p>topologyBuilder.setSpout(&quot;blue-spout&quot;, new BlueSpout(), 2); // set parallelism hint to 2</p>
 
-topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
+<p>topologyBuilder.setBolt(&quot;green-bolt&quot;, new GreenBolt(), 2)
                .setNumTasks(4)
-               .shuffleGrouping("blue-spout");
+               .shuffleGrouping(&quot;blue-spout&quot;);</p>
 
-topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
-               .shuffleGrouping("green-bolt");
+<p>topologyBuilder.setBolt(&quot;yellow-bolt&quot;, new YellowBolt(), 6)
+               .shuffleGrouping(&quot;green-bolt&quot;);</p>
 
-StormSubmitter.submitTopology(
-        "mytopology",
+<p>StormSubmitter.submitTopology(
+        &quot;mytopology&quot;,
         conf,
         topologyBuilder.createTopology()
     );
-</code></pre>
+```</p>
 
 <p>And of course Storm comes with additional configuration settings to control the parallelism of a topology, including:</p>
 
 <ul>
-<li><a href="/apidocs/backtype/storm/Config.html#TOPOLOGY_MAX_TASK_PARALLELISM">TOPOLOGY_MAX_TASK_PARALLELISM</a>: This setting puts a ceiling on the number of executors that can be spawned for a single component. It is typically used during testing to limit the number of threads spawned when running a topology in local mode. You can set this option via e.g. <a href="/apidocs/backtype/storm/Config.html">Config#setMaxTaskParallelism()</a>.</li>
+  <li><a href="/apidocs/backtype/storm/Config.html#TOPOLOGY_MAX_TASK_PARALLELISM">TOPOLOGY_MAX_TASK_PARALLELISM</a>: This setting puts a ceiling on the number of executors that can be spawned for a single component. It is typically used during testing to limit the number of threads spawned when running a topology in local mode. You can set this option via e.g. <a href="/apidocs/backtype/storm/Config.html">Config#setMaxTaskParallelism()</a>.</li>
 </ul>
 
-
-<h1>How to change the parallelism of a running topology</h1>
+<h1 id="how-to-change-the-parallelism-of-a-running-topology">How to change the parallelism of a running topology</h1>
 
 <p>A nifty feature of Storm is that you can increase or decrease the number of worker processes and/or executors without being required to restart the cluster or the topology. The act of doing so is called rebalancing.</p>
 
 <p>You have two options to rebalance a topology:</p>
 
 <ol>
-<li>Use the Storm web UI to rebalance the topology.</li>
-<li>Use the CLI tool storm rebalance as described below.</li>
+  <li>Use the Storm web UI to rebalance the topology.</li>
+  <li>Use the CLI tool storm rebalance as described below.</li>
 </ol>
 
-
 <p>Here is an example of using the CLI tool:</p>
 
-<pre><code># Reconfigure the topology "mytopology" to use 5 worker processes,
-# the spout "blue-spout" to use 3 executors and
-# the bolt "yellow-bolt" to use 10 executors.
-
-$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
-</code></pre>
-
-<h1>References for this article</h1>
-
-<ul>
-<li><a href="Concepts.html">Concepts</a></li>
-<li><a href="Configuration.html">Configuration</a></li>
-<li><a href="Running-topologies-on-a-production-cluster.html">Running topologies on a production cluster</a></li>
-<li><a href="Local-mode.html">Local mode</a></li>
-<li><a href="Tutorial.html">Tutorial</a></li>
-<li><a href="/apidocs/">Storm API documentation</a>, most notably the class <code>Config</code></li>
+<p>```
+# Reconfigure the topology &#8220;mytopology&#8221; to use 5 worker processes,
+# the spout &#8220;blue-spout&#8221; to use 3 executors and
+# the bolt &#8220;yellow-bolt&#8221; to use 10 executors.</p>
+
+<p>$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
+```</p>
+
+<h1 id="references-for-this-article">References for this article</h1>
+
+<ul>
+  <li><a href="Concepts.html">Concepts</a></li>
+  <li><a href="Configuration.html">Configuration</a></li>
+  <li><a href="Running-topologies-on-a-production-cluster.html">Running topologies on a production cluster</a></li>
+  <li><a href="Local-mode.html">Local mode</a></li>
+  <li><a href="Tutorial.html">Tutorial</a></li>
+  <li><a href="/apidocs/">Storm API documentation</a>, most notably the class <code>Config</code></li>
 </ul>
 
 

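Note on the parallelism example above (two worker processes, blue-spout with 2 executors, green-bolt with 2 executors and 4 tasks, yellow-bolt with 6 executors): the arithmetic behind it can be sketched in plain Java. This is a hedged illustration, not Storm code. ParallelismSketch and its methods are hypothetical names, and the sketch assumes tasks divide evenly among executors and executors spread evenly across workers.

```java
// Illustrative arithmetic only; hypothetical names, not part of the Storm API.
final class ParallelismSketch {

    // Tasks handled by each executor, assuming an even split.
    static int tasksPerExecutor(int numTasks, int numExecutors) {
        return numTasks / numExecutors;
    }

    // Executors hosted by each worker process, assuming an even spread.
    static int executorsPerWorker(int totalExecutors, int numWorkers) {
        return totalExecutors / numWorkers;
    }

    public static void main(String[] args) {
        int workers = 2;            // conf.setNumWorkers(2)
        int blueExecutors = 2;      // parallelism hint of blue-spout
        int greenExecutors = 2;     // parallelism hint of green-bolt
        int greenTasks = 4;         // setNumTasks(4) on green-bolt
        int yellowExecutors = 6;    // parallelism hint of yellow-bolt

        int totalExecutors = blueExecutors + greenExecutors + yellowExecutors;
        System.out.println("green-bolt tasks per executor: "
                + tasksPerExecutor(greenTasks, greenExecutors));
        System.out.println("executors per worker: "
                + executorsPerWorker(totalExecutors, workers));
    }
}
```

With the numbers from the example, each green-bolt executor runs two tasks, and the ten executors are spread across the two worker processes.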
Modified: incubator/storm/site/publish/documentation/Using-non-JVM-languages-with-Storm.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Using-non-JVM-languages-with-Storm.html?rev=1597454&r1=1597453&r2=1597454&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Using-non-JVM-languages-with-Storm.html (original)
+++ incubator/storm/site/publish/documentation/Using-non-JVM-languages-with-Storm.html Sun May 25 17:47:12 2014
@@ -66,73 +66,71 @@
 </div>
 <div id="aboutcontent">
 <ul>
-<li>two pieces: creating topologies and implementing spouts and bolts in other languages</li>
-<li>creating topologies in another language is easy since topologies are just thrift structures (link to storm.thrift)</li>
-<li>implementing spouts and bolts in another language is called a "multilang components" or "shelling"
-
-<ul>
-<li>Here's a specification of the protocol: <a href="Multilang-protocol.html">Multilang protocol</a></li>
-<li>the thrift structure lets you define multilang components explicitly as a program and a script (e.g., python and the file implementing your bolt)</li>
-<li>In Java, you override ShellBolt or ShellSpout to create multilang components
-
-<ul>
-<li>note that output fields declarations happens in the thrift structure, so in Java you create multilang components like the following:
-
-<ul>
-<li> declare fields in java, processing code in the other language by specifying it in constructor of shellbolt</li>
-</ul>
-</li>
-</ul>
-</li>
-<li>multilang uses json messages over stdin/stdout to communicate with the subprocess</li>
-<li>storm comes with ruby, python, and fancy adapters that implement the protocol. show an example of python
-
-<ul>
-<li>python supports emitting, anchoring, acking, and logging</li>
-</ul>
-</li>
-</ul>
-</li>
-<li>"storm shell" command makes constructing jar and uploading to nimbus easy
-
-<ul>
-<li>makes jar and uploads it</li>
-<li>calls your program with host/port of nimbus and the jarfile id</li>
+  <li>two pieces: creating topologies and implementing spouts and bolts in other languages</li>
+  <li>creating topologies in another language is easy since topologies are just thrift structures (link to storm.thrift)</li>
+  <li>implementing spouts and bolts in another language is called a &#8220;multilang components&#8221; or &#8220;shelling&#8221;
+    <ul>
+      <li>Here&#8217;s a specification of the protocol: <a href="Multilang-protocol.html">Multilang protocol</a></li>
+      <li>the thrift structure lets you define multilang components explicitly as a program and a script (e.g., python and the file implementing your bolt)</li>
+      <li>In Java, you override ShellBolt or ShellSpout to create multilang components
+        <ul>
+          <li>note that output fields declarations happens in the thrift structure, so in Java you create multilang components like the following:
+            <ul>
+              <li>declare fields in java, processing code in the other language by specifying it in constructor of shellbolt</li>
+            </ul>
+          </li>
+        </ul>
+      </li>
+      <li>multilang uses json messages over stdin/stdout to communicate with the subprocess</li>
+      <li>storm comes with ruby, python, and fancy adapters that implement the protocol. show an example of python
+        <ul>
+          <li>python supports emitting, anchoring, acking, and logging</li>
+        </ul>
+      </li>
+    </ul>
+  </li>
+  <li>&#8220;storm shell&#8221; command makes constructing jar and uploading to nimbus easy
+    <ul>
+      <li>makes jar and uploads it</li>
+      <li>calls your program with host/port of nimbus and the jarfile id</li>
+    </ul>
+  </li>
 </ul>
-</li>
-</ul>
-
 
-<h2>Notes on implementing a DSL in a non-JVM language</h2>
+<h2 id="notes-on-implementing-a-dsl-in-a-non-jvm-language">Notes on implementing a DSL in a non-JVM language</h2>
 
 <p>The right place to start is src/storm.thrift. Since Storm topologies are just Thrift structures, and Nimbus is a Thrift daemon, you can create and submit topologies in any language.</p>
 
 <p>When you create the Thrift structs for spouts and bolts, the code for the spout or bolt is specified in the ComponentObject struct:</p>
 
-<pre><code>union ComponentObject {
+<p><code>
+union ComponentObject {
   1: binary serialized_java;
   2: ShellComponent shell;
   3: JavaObject java_object;
 }
-</code></pre>
+</code></p>
 
-<p>For a non-JVM DSL, you would want to make use of "2" and "3". ShellComponent lets you specify a script to run that component (e.g., your python code). And JavaObject lets you specify native java spouts and bolts for the component (and Storm will use reflection to create that spout or bolt).</p>
+<p>For a non-JVM DSL, you would want to make use of &#8220;2&#8221; and &#8220;3&#8221;. ShellComponent lets you specify a script to run that component (e.g., your python code). And JavaObject lets you specify native java spouts and bolts for the component (and Storm will use reflection to create that spout or bolt).</p>
 
-<p>There's a "storm shell" command that will help with submitting a topology. Its usage is like this:</p>
+<p>There&#8217;s a &#8220;storm shell&#8221; command that will help with submitting a topology. Its usage is like this:</p>
 
-<pre><code>storm shell resources/ python topology.py arg1 arg2
-</code></pre>
+<p><code>
+storm shell resources/ python topology.py arg1 arg2
+</code></p>
 
 <p>storm shell will then package resources/ into a jar, upload the jar to Nimbus, and call your topology.py script like this:</p>
 
-<pre><code>python topology.py arg1 arg2 {nimbus-host} {nimbus-port} {uploaded-jar-location}
-</code></pre>
+<p><code>
+python topology.py arg1 arg2 {nimbus-host} {nimbus-port} {uploaded-jar-location}
+</code></p>
 
-<p>Then you can connect to Nimbus using the Thrift API and submit the topology, passing {uploaded-jar-location} into the submitTopology method. For reference, here's the submitTopology definition:</p>
+<p>Then you can connect to Nimbus using the Thrift API and submit the topology, passing {uploaded-jar-location} into the submitTopology method. For reference, here&#8217;s the submitTopology definition:</p>
 
-<pre><code>void submitTopology(1: string name, 2: string uploadedJarLocation, 3: string jsonConf, 4: StormTopology topology)
+<p><code>
+void submitTopology(1: string name, 2: string uploadedJarLocation, 3: string jsonConf, 4: StormTopology topology)
     throws (1: AlreadyAliveException e, 2: InvalidTopologyException ite);
-</code></pre>
+</code></p>
 
 </div>
 </div>
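
Note on the "storm shell" invocation above: the topology program receives its own arguments first, followed by {nimbus-host}, {nimbus-port} and {uploaded-jar-location}. The following Java sketch shows one way such a program could split that argument list. It is an illustration under assumptions: ShellArgsSketch and its fields are hypothetical names, the sample values in main are invented, and the same layout would apply to a program written in any other language.

```java
import java.util.Arrays;

// Illustrative sketch only; hypothetical names and sample values.
final class ShellArgsSketch {
    final String[] userArgs;           // arg1 arg2 ... as passed to "storm shell"
    final String nimbusHost;           // {nimbus-host}
    final int nimbusPort;              // {nimbus-port}
    final String uploadedJarLocation;  // {uploaded-jar-location}

    private ShellArgsSketch(String[] userArgs, String host, int port, String jar) {
        this.userArgs = userArgs;
        this.nimbusHost = host;
        this.nimbusPort = port;
        this.uploadedJarLocation = jar;
    }

    // The last three arguments are appended by "storm shell"; the rest are the user's.
    static ShellArgsSketch parse(String[] argv) {
        if (argv.length < 3) {
            throw new IllegalArgumentException("expected at least host, port and jar location");
        }
        int n = argv.length;
        return new ShellArgsSketch(
                Arrays.copyOfRange(argv, 0, n - 3),
                argv[n - 3],
                Integer.parseInt(argv[n - 2]),
                argv[n - 1]);
    }

    public static void main(String[] args) {
        // Invented sample values, for illustration only.
        ShellArgsSketch parsed = parse(new String[] {
                "arg1", "arg2", "nimbus.example.org", "6627", "/tmp/uploaded.jar"});
        System.out.println(parsed.nimbusHost + ":" + parsed.nimbusPort
                + " jar=" + parsed.uploadedJarLocation
                + " userArgs=" + Arrays.toString(parsed.userArgs));
    }
}
```

The parsed jar location is the value to pass as uploadedJarLocation when calling submitTopology over Thrift.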