Posted to commits@storm.apache.org by pt...@apache.org on 2014/05/27 20:39:09 UTC

svn commit: r1597847 [4/6] - in /incubator/storm/site: _posts/ publish/ publish/2012/08/02/ publish/2012/09/06/ publish/2013/01/11/ publish/2013/12/08/ publish/2014/04/10/ publish/2014/04/17/ publish/2014/04/19/ publish/2014/04/21/ publish/2014/04/22/ ...

Modified: incubator/storm/site/publish/documentation/Multilang-protocol.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Multilang-protocol.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Multilang-protocol.html (original)
+++ incubator/storm/site/publish/documentation/Multilang-protocol.html Tue May 27 18:39:07 2014
@@ -74,7 +74,7 @@
 <p>Support for multiple languages is implemented via the ShellBolt,
 ShellSpout, and ShellProcess classes.  These classes implement the
 IBolt and ISpout interfaces and the protocol for executing a script or
-program via the shell using Java&#8217;s ProcessBuilder class.</p>
+program via the shell using Java’s ProcessBuilder class.</p>
 
 <h2 id="output-fields">Output fields</h2>
 
@@ -104,8 +104,8 @@ directory just needs to be on the classp
 <ul>
   <li>Both ends of this protocol use a line-reading mechanism, so be sure to
 trim off newlines from the input and to append them to your output.</li>
-  <li>All JSON inputs and outputs are terminated by a single line containing &#8220;end&#8221;. Note that this delimiter is not itself JSON encoded.</li>
-  <li>The bullet points below are written from the perspective of the script writer&#8217;s
+  <li>All JSON inputs and outputs are terminated by a single line containing “end”. Note that this delimiter is not itself JSON encoded.</li>
+  <li>The bullet points below are written from the perspective of the script writer’s
 STDIN and STDOUT.</li>
 </ul>
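 
 <p>To make the framing concrete, here is a minimal sketch of reading and writing one message under this protocol. It is written in Java for illustration only (a real shell component would typically be in a non-JVM language); the class and helper names are hypothetical, and any JSON library can parse the payload:</p>
 
 <p><code>
 import java.io.BufferedReader;
 import java.io.InputStreamReader;
 
 public class MultilangFraming {
     // Read lines until the bare "end" delimiter and return the JSON payload.
     public static String readMessage(BufferedReader in) throws Exception {
         StringBuilder json = new StringBuilder();
         String line;
         while ((line = in.readLine()) != null) {  // readLine() strips the newline
             if (line.equals("end")) break;        // the delimiter is not itself JSON encoded
             json.append(line).append("\n");
         }
         return json.toString();
     }
 
     // Write a JSON message followed by the "end" delimiter.
     public static void writeMessage(String json) {
         System.out.println(json);                 // println() appends the required newline
         System.out.println("end");
         System.out.flush();
     }
 
     public static void main(String[] args) throws Exception {
         BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
         String json = readMessage(in);  // one complete JSON message, ready for any JSON parser
         writeMessage("{\"command\": \"log\", \"msg\": \"got a message\"}");
     }
 }
 </code></p>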
 
@@ -153,19 +153,19 @@ file lets the supervisor know the PID so
   <li>STDIN: Either a next, ack, or fail command.</li>
 </ul>
 
-<p>&#8220;next&#8221; is the equivalent of ISpout&#8217;s <code>nextTuple</code>. It looks like:</p>
+<p>“next” is the equivalent of ISpout’s <code>nextTuple</code>. It looks like:</p>
 
 <p><code>
 {"command": "next"}
 </code></p>
 
-<p>&#8220;ack&#8221; looks like:</p>
+<p>“ack” looks like:</p>
 
 <p><code>
 {"command": "ack", "id": "1231231"}
 </code></p>
 
-<p>&#8220;fail&#8221; looks like:</p>
+<p>“fail” looks like:</p>
 
 <p><code>
 {"command": "fail", "id": "1231231"}
@@ -195,7 +195,7 @@ be a sequence of emits and logs.</li>
 
 <p>If not doing an emit direct, you will immediately receive on STDIN, as a JSON array, the task ids to which the tuple was emitted.</p>
 
-<p>A &#8220;log&#8221; will log a message in the worker log. It looks like:</p>
+<p>A “log” will log a message in the worker log. It looks like:</p>
 
 <p><code>
 {
@@ -206,7 +206,7 @@ be a sequence of emits and logs.</li>
 </code></p>
 
 <ul>
-  <li>STDOUT: a &#8220;sync&#8221; command ends the sequence of emits and logs. It looks like:</li>
+  <li>STDOUT: a “sync” command ends the sequence of emits and logs. It looks like:</li>
 </ul>
 
 <p><code>
@@ -286,7 +286,7 @@ emits, however.</p>
 }
 </code></p>
 
-<p>A &#8220;log&#8221; will log a message in the worker log. It looks like:</p>
+<p>A “log” will log a message in the worker log. It looks like:</p>
 
 <p><code>
 {
@@ -298,7 +298,7 @@ emits, however.</p>
 
 <ul>
   <li>Note that, as of version 0.7.1, there is no longer any need for a
-shell bolt to &#8216;sync&#8217;.</li>
+shell bolt to ‘sync’.</li>
 </ul>
 
 </div>

Modified: incubator/storm/site/publish/documentation/Project-ideas.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Project-ideas.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Project-ideas.html (original)
+++ incubator/storm/site/publish/documentation/Project-ideas.html Tue May 27 18:39:07 2014
@@ -66,9 +66,9 @@
 </div>
 <div id="aboutcontent">
 <ul>
-  <li><strong>DSLs for non-JVM languages:</strong> These DSL&#8217;s should be all-inclusive and not require any Java for the creation of topologies, spouts, or bolts. Since topologies are <a href="http://thrift.apache.org/">Thrift</a> structs, Nimbus is a Thrift service, and bolts can be written in any language, this is possible.</li>
+  <li><strong>DSLs for non-JVM languages:</strong> These DSLs should be all-inclusive and not require any Java for the creation of topologies, spouts, or bolts. Since topologies are <a href="http://thrift.apache.org/">Thrift</a> structs, Nimbus is a Thrift service, and bolts can be written in any language, this is possible.</li>
   <li><strong>Online machine learning algorithms:</strong> Something like <a href="http://mahout.apache.org/">Mahout</a> but for online algorithms</li>
-  <li><strong>Suite of performance benchmarks:</strong> These benchmarks should test Storm&#8217;s performance on CPU and IO intensive workloads. There should be benchmarks for different classes of applications, such as stream processing (where throughput is the priority) and distributed RPC (where latency is the priority). </li>
+  <li><strong>Suite of performance benchmarks:</strong> These benchmarks should test Storm’s performance on CPU and IO intensive workloads. There should be benchmarks for different classes of applications, such as stream processing (where throughput is the priority) and distributed RPC (where latency is the priority). </li>
 </ul>
 
 </div>

Modified: incubator/storm/site/publish/documentation/Rationale.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Rationale.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Rationale.html (original)
+++ incubator/storm/site/publish/documentation/Rationale.html Tue May 27 18:39:07 2014
@@ -65,9 +65,9 @@
   </ul>
 </div>
 <div id="aboutcontent">
-<p>The past decade has seen a revolution in data processing. MapReduce, Hadoop, and related technologies have made it possible to store and process data at scales previously unthinkable. Unfortunately, these data processing technologies are not realtime systems, nor are they meant to be. There&#8217;s no hack that will turn Hadoop into a realtime system; realtime data processing has a fundamentally different set of requirements than batch processing.</p>
+<p>The past decade has seen a revolution in data processing. MapReduce, Hadoop, and related technologies have made it possible to store and process data at scales previously unthinkable. Unfortunately, these data processing technologies are not realtime systems, nor are they meant to be. There’s no hack that will turn Hadoop into a realtime system; realtime data processing has a fundamentally different set of requirements than batch processing.</p>
 
-<p>However, realtime data processing at massive scale is becoming more and more of a requirement for businesses. The lack of a &#8220;Hadoop of realtime&#8221; has become the biggest hole in the data processing ecosystem.</p>
+<p>However, realtime data processing at massive scale is becoming more and more of a requirement for businesses. The lack of a “Hadoop of realtime” has become the biggest hole in the data processing ecosystem.</p>
 
 <p>Storm fills that hole.</p>
 
@@ -75,27 +75,27 @@
 
 <ol>
   <li><strong>Tedious</strong>: You spend most of your development time configuring where to send messages, deploying workers, and deploying intermediate queues. The realtime processing logic that you care about corresponds to a relatively small percentage of your codebase.</li>
-  <li><strong>Brittle</strong>: There&#8217;s little fault-tolerance. You&#8217;re responsible for keeping each worker and queue up.</li>
+  <li><strong>Brittle</strong>: There’s little fault-tolerance. You’re responsible for keeping each worker and queue up.</li>
   <li><strong>Painful to scale</strong>: When the message throughput gets too high for a single worker or queue, you need to partition how the data is spread around. You need to reconfigure the other workers to know the new locations to send messages. This introduces moving parts and new pieces that can fail.</li>
 </ol>
 
-<p>Although the queues and workers paradigm breaks down for large numbers of messages, message processing is clearly the fundamental paradigm for realtime computation. The question is: how do you do it in a way that doesn&#8217;t lose data, scales to huge volumes of messages, and is dead-simple to use and operate?</p>
+<p>Although the queues and workers paradigm breaks down for large numbers of messages, message processing is clearly the fundamental paradigm for realtime computation. The question is: how do you do it in a way that doesn’t lose data, scales to huge volumes of messages, and is dead-simple to use and operate?</p>
 
 <p>Storm satisfies these goals. </p>
 
 <h2 id="why-storm-is-important">Why Storm is important</h2>
 
-<p>Storm exposes a set of primitives for doing realtime computation. Like how MapReduce greatly eases the writing of parallel batch processing, Storm&#8217;s primitives greatly ease the writing of parallel realtime computation.</p>
+<p>Storm exposes a set of primitives for doing realtime computation. Just as MapReduce greatly eases the writing of parallel batch processing, Storm’s primitives greatly ease the writing of parallel realtime computation.</p>
 
 <p>The key properties of Storm are:</p>
 
 <ol>
-  <li><strong>Extremely broad set of use cases</strong>: Storm can be used for processing messages and updating databases (stream processing), doing a continuous query on data streams and streaming the results into clients (continuous computation), parallelizing an intense query like a search query on the fly (distributed RPC), and more. Storm&#8217;s small set of primitives satisfy a stunning number of use cases.</li>
-  <li><strong>Scalable</strong>: Storm scales to massive numbers of messages per second. To scale a topology, all you have to do is add machines and increase the parallelism settings of the topology. As an example of Storm&#8217;s scale, one of Storm&#8217;s initial applications processed 1,000,000 messages per second on a 10 node cluster, including hundreds of database calls per second as part of the topology. Storm&#8217;s usage of Zookeeper for cluster coordination makes it scale to much larger cluster sizes.</li>
+  <li><strong>Extremely broad set of use cases</strong>: Storm can be used for processing messages and updating databases (stream processing), doing a continuous query on data streams and streaming the results into clients (continuous computation), parallelizing an intense query like a search query on the fly (distributed RPC), and more. Storm’s small set of primitives satisfies a stunning number of use cases.</li>
+  <li><strong>Scalable</strong>: Storm scales to massive numbers of messages per second. To scale a topology, all you have to do is add machines and increase the parallelism settings of the topology. As an example of Storm’s scale, one of Storm’s initial applications processed 1,000,000 messages per second on a 10 node cluster, including hundreds of database calls per second as part of the topology. Storm’s usage of Zookeeper for cluster coordination makes it scale to much larger cluster sizes.</li>
   <li><strong>Guarantees no data loss</strong>: A realtime system must have strong guarantees about data being successfully processed. A system that drops data has a very limited set of use cases. Storm guarantees that every message will be processed, and this is in direct contrast with other systems like S4. </li>
   <li><strong>Extremely robust</strong>: Unlike systems like Hadoop, which are notorious for being difficult to manage, Storm clusters just work. It is an explicit goal of the Storm project to make the user experience of managing Storm clusters as painless as possible.</li>
   <li><strong>Fault-tolerant</strong>: If there are faults during execution of your computation, Storm will reassign tasks as necessary. Storm makes sure that a computation can run forever (or until you kill the computation).</li>
-  <li><strong>Programming language agnostic</strong>: Robust and scalable realtime processing shouldn&#8217;t be limited to a single platform. Storm topologies and processing components can be defined in any language, making Storm accessible to nearly anyone.</li>
+  <li><strong>Programming language agnostic</strong>: Robust and scalable realtime processing shouldn’t be limited to a single platform. Storm topologies and processing components can be defined in any language, making Storm accessible to nearly anyone.</li>
 </ol>
 
 </div>

Modified: incubator/storm/site/publish/documentation/Running-topologies-on-a-production-cluster.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Running-topologies-on-a-production-cluster.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Running-topologies-on-a-production-cluster.html (original)
+++ incubator/storm/site/publish/documentation/Running-topologies-on-a-production-cluster.html Tue May 27 18:39:07 2014
@@ -78,9 +78,9 @@ conf.setMaxSpoutPending(5000);
 StormSubmitter.submitTopology("mytopology", conf, topology);
 </code></p>
 
-<p>3) Create a jar containing your code and all the dependencies of your code (except for Storm &#8211; the Storm jars will be added to the classpath on the worker nodes).</p>
+<p>3) Create a jar containing your code and all the dependencies of your code (except for Storm – the Storm jars will be added to the classpath on the worker nodes).</p>
 
-<p>If you&#8217;re using Maven, the <a href="http://maven.apache.org/plugins/maven-assembly-plugin/">Maven Assembly Plugin</a> can do the packaging for you. Just add this to your pom.xml:</p>
+<p>If you’re using Maven, the <a href="http://maven.apache.org/plugins/maven-assembly-plugin/">Maven Assembly Plugin</a> can do the packaging for you. Just add this to your pom.xml:</p>
 
 <p>```xml</p>
 <plugin>
@@ -103,19 +103,19 @@ Then run mvn assembly:assembly to get an
 
 <p><code>storm jar path/to/allmycode.jar org.me.MyTopology arg1 arg2 arg3</code></p>
 
-<p><code>storm jar</code> will submit the jar to the cluster and configure the <code>StormSubmitter</code> class to talk to the right cluster. In this example, after uploading the jar <code>storm jar</code> calls the main function on <code>org.me.MyTopology</code> with the arguments &#8220;arg1&#8221;, &#8220;arg2&#8221;, and &#8220;arg3&#8221;.</p>
+<p><code>storm jar</code> will submit the jar to the cluster and configure the <code>StormSubmitter</code> class to talk to the right cluster. In this example, after uploading the jar <code>storm jar</code> calls the main function on <code>org.me.MyTopology</code> with the arguments “arg1”, “arg2”, and “arg3”.</p>
 
 <p>You can find out how to configure your <code>storm</code> client to talk to a Storm cluster on <a href="Setting-up-development-environment.html">Setting up development environment</a>.</p>
 
 <h3 id="common-configurations">Common configurations</h3>
 
-<p>There are a variety of configurations you can set per topology. A list of all the configurations you can set can be found <a href="/apidocs/backtype/storm/Config.html">here</a>. The ones prefixed with &#8220;TOPOLOGY&#8221; can be overridden on a topology-specific basis (the other ones are cluster configurations and cannot be overridden). Here are some common ones that are set for a topology:</p>
+<p>There are a variety of configurations you can set per topology. A list of all the configurations you can set can be found <a href="/apidocs/backtype/storm/Config.html">here</a>. The ones prefixed with “TOPOLOGY” can be overridden on a topology-specific basis (the other ones are cluster configurations and cannot be overridden). Here are some common ones that are set for a topology (a code sketch follows the list):</p>
 
 <ol>
   <li><strong>Config.TOPOLOGY_WORKERS</strong>: This sets the number of worker processes to use to execute the topology. For example, if you set this to 25, there will be 25 Java processes across the cluster executing all the tasks. If you have a combined parallelism of 150 across all components in the topology, each worker process will have 6 tasks running within it as threads.</li>
-  <li><strong>Config.TOPOLOGY_ACKERS</strong>: This sets the number of tasks that will track tuple trees and detect when a spout tuple has been fully processed. Ackers are an integral part of Storm&#8217;s reliability model and you can read more about them on <a href="Guaranteeing-message-processing.html">Guaranteeing message processing</a>.</li>
+  <li><strong>Config.TOPOLOGY_ACKERS</strong>: This sets the number of tasks that will track tuple trees and detect when a spout tuple has been fully processed. Ackers are an integral part of Storm’s reliability model and you can read more about them on <a href="Guaranteeing-message-processing.html">Guaranteeing message processing</a>.</li>
   <li><strong>Config.TOPOLOGY_MAX_SPOUT_PENDING</strong>: This sets the maximum number of spout tuples that can be pending on a single spout task at once (pending means the tuple has not been acked or failed yet). It is highly recommended you set this config to prevent queue explosion.</li>
-  <li><strong>Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS</strong>: This is the maximum amount of time a spout tuple has to be fully completed before it is considered failed. This value defaults to 30 seconds, which is sufficient for most topologies. See <a href="Guaranteeing-message-processing.html">Guaranteeing message processing</a> for more information on how Storm&#8217;s reliability model works.</li>
+  <li><strong>Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS</strong>: This is the maximum amount of time a spout tuple has to be fully completed before it is considered failed. This value defaults to 30 seconds, which is sufficient for most topologies. See <a href="Guaranteeing-message-processing.html">Guaranteeing message processing</a> for more information on how Storm’s reliability model works.</li>
   <li><strong>Config.TOPOLOGY_SERIALIZATIONS</strong>: You can register more serializers to Storm using this config so that you can use custom types within tuples.</li>
 </ol>
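 
 <p>As a rough sketch, here is how those options can be set in code before submitting a topology, assuming the setter helpers on <code>Config</code> (the values are illustrative, not recommendations):</p>
 
 <p><code>
 Config conf = new Config();
 conf.setNumWorkers(25);          // Config.TOPOLOGY_WORKERS
 conf.setNumAckers(4);            // Config.TOPOLOGY_ACKERS
 conf.setMaxSpoutPending(5000);   // Config.TOPOLOGY_MAX_SPOUT_PENDING
 conf.setMessageTimeoutSecs(30);  // Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS
 StormSubmitter.submitTopology("mytopology", conf, topology);
 </code></p>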
 
@@ -127,7 +127,7 @@ Then run mvn assembly:assembly to get an
 
 <p>Give the same name to <code>storm kill</code> as you used when submitting the topology.</p>
 
-<p>Storm won&#8217;t kill the topology immediately. Instead, it deactivates all the spouts so that they don&#8217;t emit any more tuples, and then Storm waits Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before destroying all the workers. This gives the topology enough time to complete any tuples it was processing when it got killed.</p>
+<p>Storm won’t kill the topology immediately. Instead, it deactivates all the spouts so that they don’t emit any more tuples, and then Storm waits Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before destroying all the workers. This gives the topology enough time to complete any tuples it was processing when it got killed.</p>
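 
 <p>For example, if the topology was submitted as “mytopology”:</p>
 
 <p><code>storm kill mytopology</code></p>
 
 <p>The <code>storm kill</code> command also accepts a <code>-w</code> flag to override the wait time before the workers are destroyed.</p>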
 
 <h3 id="updating-a-running-topology">Updating a running topology</h3>
 

Modified: incubator/storm/site/publish/documentation/Serialization-(prior-to-0.6.0).html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Serialization-%28prior-to-0.6.0%29.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Serialization-(prior-to-0.6.0).html (original)
+++ incubator/storm/site/publish/documentation/Serialization-(prior-to-0.6.0).html Tue May 27 18:39:07 2014
@@ -65,21 +65,21 @@
   </ul>
 </div>
 <div id="aboutcontent">
-<p>Tuples can be comprised of objects of any types. Since Storm is a distributed system, it needs to know how to serialize and deserialize objects when they&#8217;re passed between tasks. By default Storm can serialize ints, shorts, longs, floats, doubles, bools, bytes, strings, and byte arrays, but if you want to use another type in your tuples, you&#8217;ll need to implement a custom serializer.</p>
+<p>Tuples can be composed of objects of any type. Since Storm is a distributed system, it needs to know how to serialize and deserialize objects when they’re passed between tasks. By default Storm can serialize ints, shorts, longs, floats, doubles, bools, bytes, strings, and byte arrays, but if you want to use another type in your tuples, you’ll need to implement a custom serializer.</p>
 
 <h3 id="dynamic-typing">Dynamic typing</h3>
 
-<p>There are no type declarations for fields in a Tuple. You put objects in fields and Storm figures out the serialization dynamically. Before we get to the interface for serialization, let&#8217;s spend a moment understanding why Storm&#8217;s tuples are dynamically typed.</p>
+<p>There are no type declarations for fields in a Tuple. You put objects in fields and Storm figures out the serialization dynamically. Before we get to the interface for serialization, let’s spend a moment understanding why Storm’s tuples are dynamically typed.</p>
 
-<p>Adding static typing to tuple fields would add large amount of complexity to Storm&#8217;s API. Hadoop, for example, statically types its keys and values but requires a huge amount of annotations on the part of the user. Hadoop&#8217;s API is a burden to use and the &#8220;type safety&#8221; isn&#8217;t worth it. Dynamic typing is simply easier to use.</p>
+<p>Adding static typing to tuple fields would add a large amount of complexity to Storm’s API. Hadoop, for example, statically types its keys and values but requires a huge amount of annotations on the part of the user. Hadoop’s API is a burden to use and the “type safety” isn’t worth it. Dynamic typing is simply easier to use.</p>
 
-<p>Further than that, it&#8217;s not possible to statically type Storm&#8217;s tuples in any reasonable way. Suppose a Bolt subscribes to multiple streams. The tuples from all those streams may have different types across the fields. When a Bolt receives a <code>Tuple</code> in <code>execute</code>, that tuple could have come from any stream and so could have any combination of types. There might be some reflection magic you can do to declare a different method for every tuple stream a bolt subscribes to, but Storm opts for the simpler, straightforward approach of dynamic typing.</p>
+<p>Further than that, it’s not possible to statically type Storm’s tuples in any reasonable way. Suppose a Bolt subscribes to multiple streams. The tuples from all those streams may have different types across the fields. When a Bolt receives a <code>Tuple</code> in <code>execute</code>, that tuple could have come from any stream and so could have any combination of types. There might be some reflection magic you can do to declare a different method for every tuple stream a bolt subscribes to, but Storm opts for the simpler, straightforward approach of dynamic typing.</p>
 
 <p>Finally, another reason for using dynamic typing is so Storm can be used in a straightforward manner from dynamically typed languages like Clojure and JRuby.</p>
 
 <h3 id="custom-serialization">Custom serialization</h3>
 
-<p>Let&#8217;s dive into Storm&#8217;s API for defining custom serializations. There are two steps you need to take as a user to create a custom serialization: implement the serializer, and register the serializer to Storm.</p>
+<p>Let’s dive into Storm’s API for defining custom serializations. There are two steps you need to take as a user to create a custom serialization: implement the serializer, and register the serializer to Storm.</p>
 
 <h4 id="creating-a-serializer">Creating a serializer</h4>
 
@@ -95,9 +95,9 @@ public interface ISerialization&lt;T&gt;
 }
 </code></p>
 
-<p>Storm uses the <code>accept</code> method to determine if a type can be serialized by this serializer. Remember, Storm&#8217;s tuples are dynamically typed so Storm determines what serializer to use at runtime.</p>
+<p>Storm uses the <code>accept</code> method to determine if a type can be serialized by this serializer. Remember, Storm’s tuples are dynamically typed so Storm determines what serializer to use at runtime.</p>
 
-<p><code>serialize</code> writes the object out to the output stream in binary format. The field must be written in a way such that it can be deserialized later. For example, if you&#8217;re writing out a list of objects, you&#8217;ll need to write out the size of the list first so that you know how many elements to deserialize.</p>
+<p><code>serialize</code> writes the object out to the output stream in binary format. The field must be written in a way such that it can be deserialized later. For example, if you’re writing out a list of objects, you’ll need to write out the size of the list first so that you know how many elements to deserialize.</p>
 
 <p><code>deserialize</code> reads the serialized object off of the stream and returns it.</p>
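 
 <p>As a sketch, here is what a serializer for <code>java.util.UUID</code> might look like. The signatures follow the <code>accept</code>/<code>serialize</code>/<code>deserialize</code> description above, and the class name is illustrative:</p>
 
 <p><code>
 import java.io.DataInputStream;
 import java.io.DataOutputStream;
 import java.io.IOException;
 import java.util.UUID;
 
 // ISerialization is Storm's serialization interface (import path per your Storm version).
 public class UUIDSerialization implements ISerialization&lt;UUID&gt; {
     public boolean accept(Class c) {
         return UUID.class.equals(c);
     }
 
     public void serialize(UUID uuid, DataOutputStream stream) throws IOException {
         // Write the two halves so they can be read back in the same order.
         stream.writeLong(uuid.getMostSignificantBits());
         stream.writeLong(uuid.getLeastSignificantBits());
     }
 
     public UUID deserialize(DataInputStream stream) throws IOException {
         return new UUID(stream.readLong(), stream.readLong());
     }
 }
 </code></p>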
 
@@ -111,7 +111,7 @@ public interface ISerialization&lt;T&gt;
 
 <p>Storm provides helpers for registering serializers in a topology config. The <a href="/apidocs/backtype/storm/Config.html">Config</a> class has a method called <code>addSerialization</code> that takes in a serializer class to add to the config.</p>
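 
 <p>Continuing the sketch above, registration might look like:</p>
 
 <p><code>
 Config conf = new Config();
 conf.addSerialization(UUIDSerialization.class);
 </code></p>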
 
-<p>There&#8217;s an advanced config called Config.TOPOLOGY_SKIP_MISSING_SERIALIZATIONS. If you set this to true, Storm will ignore any serializations that are registered but do not have their code available on the classpath. Otherwise, Storm will throw errors when it can&#8217;t find a serialization. This is useful if you run many topologies on a cluster that each have different serializations, but you want to declare all the serializations across all topologies in the <code>storm.yaml</code> files.</p>
+<p>There’s an advanced config called Config.TOPOLOGY_SKIP_MISSING_SERIALIZATIONS. If you set this to true, Storm will ignore any serializations that are registered but do not have their code available on the classpath. Otherwise, Storm will throw errors when it can’t find a serialization. This is useful if you run many topologies on a cluster that each have different serializations, but you want to declare all the serializations across all topologies in the <code>storm.yaml</code> files.</p>
 
 </div>
 </div>

Modified: incubator/storm/site/publish/documentation/Serialization.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Serialization.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Serialization.html (original)
+++ incubator/storm/site/publish/documentation/Serialization.html Tue May 27 18:39:07 2014
@@ -67,34 +67,34 @@
 <div id="aboutcontent">
 <p>This page is about how the serialization system in Storm works for versions 0.6.0 and onwards. Storm used a different serialization system prior to 0.6.0 which is documented on <a href="Serialization-(prior-to-0.6.0).html">Serialization (prior to 0.6.0)</a>. </p>
 
-<p>Tuples can be comprised of objects of any types. Since Storm is a distributed system, it needs to know how to serialize and deserialize objects when they&#8217;re passed between tasks.</p>
+<p>Tuples can be composed of objects of any type. Since Storm is a distributed system, it needs to know how to serialize and deserialize objects when they’re passed between tasks.</p>
 
 <p>Storm uses <a href="http://code.google.com/p/kryo/">Kryo</a> for serialization. Kryo is a flexible and fast serialization library that produces small serializations.</p>
 
-<p>By default, Storm can serialize primitive types, strings, byte arrays, ArrayList, HashMap, HashSet, and the Clojure collection types. If you want to use another type in your tuples, you&#8217;ll need to register a custom serializer.</p>
+<p>By default, Storm can serialize primitive types, strings, byte arrays, ArrayList, HashMap, HashSet, and the Clojure collection types. If you want to use another type in your tuples, you’ll need to register a custom serializer.</p>
 
 <h3 id="dynamic-typing">Dynamic typing</h3>
 
-<p>There are no type declarations for fields in a Tuple. You put objects in fields and Storm figures out the serialization dynamically. Before we get to the interface for serialization, let&#8217;s spend a moment understanding why Storm&#8217;s tuples are dynamically typed.</p>
+<p>There are no type declarations for fields in a Tuple. You put objects in fields and Storm figures out the serialization dynamically. Before we get to the interface for serialization, let’s spend a moment understanding why Storm’s tuples are dynamically typed.</p>
 
-<p>Adding static typing to tuple fields would add large amount of complexity to Storm&#8217;s API. Hadoop, for example, statically types its keys and values but requires a huge amount of annotations on the part of the user. Hadoop&#8217;s API is a burden to use and the &#8220;type safety&#8221; isn&#8217;t worth it. Dynamic typing is simply easier to use.</p>
+<p>Adding static typing to tuple fields would add a large amount of complexity to Storm’s API. Hadoop, for example, statically types its keys and values but requires a huge amount of annotations on the part of the user. Hadoop’s API is a burden to use and the “type safety” isn’t worth it. Dynamic typing is simply easier to use.</p>
 
-<p>Further than that, it&#8217;s not possible to statically type Storm&#8217;s tuples in any reasonable way. Suppose a Bolt subscribes to multiple streams. The tuples from all those streams may have different types across the fields. When a Bolt receives a <code>Tuple</code> in <code>execute</code>, that tuple could have come from any stream and so could have any combination of types. There might be some reflection magic you can do to declare a different method for every tuple stream a bolt subscribes to, but Storm opts for the simpler, straightforward approach of dynamic typing.</p>
+<p>Further than that, it’s not possible to statically type Storm’s tuples in any reasonable way. Suppose a Bolt subscribes to multiple streams. The tuples from all those streams may have different types across the fields. When a Bolt receives a <code>Tuple</code> in <code>execute</code>, that tuple could have come from any stream and so could have any combination of types. There might be some reflection magic you can do to declare a different method for every tuple stream a bolt subscribes to, but Storm opts for the simpler, straightforward approach of dynamic typing.</p>
 
 <p>Finally, another reason for using dynamic typing is so Storm can be used in a straightforward manner from dynamically typed languages like Clojure and JRuby.</p>
 
 <h3 id="custom-serialization">Custom serialization</h3>
 
-<p>As mentioned, Storm uses Kryo for serialization. To implement custom serializers, you need to register new serializers with Kryo. It&#8217;s highly recommended that you read over <a href="http://code.google.com/p/kryo/">Kryo&#8217;s home page</a> to understand how it handles custom serialization.</p>
+<p>As mentioned, Storm uses Kryo for serialization. To implement custom serializers, you need to register new serializers with Kryo. It’s highly recommended that you read over <a href="http://code.google.com/p/kryo/">Kryo’s home page</a> to understand how it handles custom serialization.</p>
 
-<p>Adding custom serializers is done through the &#8220;topology.kryo.register&#8221; property in your topology config. It takes a list of registrations, where each registration can take one of two forms:</p>
+<p>Adding custom serializers is done through the “topology.kryo.register” property in your topology config. It takes a list of registrations, where each registration can take one of two forms:</p>
 
 <ol>
-  <li>The name of a class to register. In this case, Storm will use Kryo&#8217;s <code>FieldsSerializer</code> to serialize the class. This may or may not be optimal for the class &#8211; see the Kryo docs for more details.</li>
+  <li>The name of a class to register. In this case, Storm will use Kryo’s <code>FieldsSerializer</code> to serialize the class. This may or may not be optimal for the class – see the Kryo docs for more details.</li>
   <li>A map from the name of a class to register to an implementation of <a href="http://code.google.com/p/kryo/source/browse/trunk/src/com/esotericsoftware/kryo/Serializer.java">com.esotericsoftware.kryo.Serializer</a>.</li>
 </ol>
 
-<p>Let&#8217;s look at an example.</p>
+<p>Let’s look at an example.</p>
 
 <p><code>
 topology.kryo.register:
@@ -107,23 +107,23 @@ topology.kryo.register:
 
 <p>Storm provides helpers for registering serializers in a topology config. The <a href="/apidocs/backtype/storm/Config.html">Config</a> class has a method called <code>registerSerialization</code> that takes in a registration to add to the config.</p>
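 
 <p>For example, with <code>MyType</code>, <code>MyType2</code>, and <code>MyTypeSerializer</code> standing in for your own classes:</p>
 
 <p><code>
 Config conf = new Config();
 // Registers MyType with Kryo's default FieldsSerializer.
 conf.registerSerialization(MyType.class);
 // Registers MyType2 with a custom com.esotericsoftware.kryo.Serializer.
 conf.registerSerialization(MyType2.class, MyTypeSerializer.class);
 </code></p>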
 
-<p>There&#8217;s an advanced config called <code>Config.TOPOLOGY_SKIP_MISSING_KRYO_REGISTRATIONS</code>. If you set this to true, Storm will ignore any serializations that are registered but do not have their code available on the classpath. Otherwise, Storm will throw errors when it can&#8217;t find a serialization. This is useful if you run many topologies on a cluster that each have different serializations, but you want to declare all the serializations across all topologies in the <code>storm.yaml</code> files.</p>
+<p>There’s an advanced config called <code>Config.TOPOLOGY_SKIP_MISSING_KRYO_REGISTRATIONS</code>. If you set this to true, Storm will ignore any serializations that are registered but do not have their code available on the classpath. Otherwise, Storm will throw errors when it can’t find a serialization. This is useful if you run many topologies on a cluster that each have different serializations, but you want to declare all the serializations across all topologies in the <code>storm.yaml</code> files.</p>
 
 <h3 id="java-serialization">Java serialization</h3>
 
-<p>If Storm encounters a type for which it doesn&#8217;t have a serialization registered, it will use Java serialization if possible. If the object can&#8217;t be serialized with Java serialization, then Storm will throw an error.</p>
+<p>If Storm encounters a type for which it doesn’t have a serialization registered, it will use Java serialization if possible. If the object can’t be serialized with Java serialization, then Storm will throw an error.</p>
 
-<p>Beware that Java serialization is extremely expensive, both in terms of CPU cost as well as the size of the serialized object. It is highly recommended that you register custom serializers when you put the topology in production. The Java serialization behavior is there so that it&#8217;s easy to prototype new topologies.</p>
+<p>Beware that Java serialization is extremely expensive, both in terms of CPU cost as well as the size of the serialized object. It is highly recommended that you register custom serializers when you put the topology in production. The Java serialization behavior is there so that it’s easy to prototype new topologies.</p>
 
 <p>You can turn off the behavior to fall back on Java serialization by setting the <code>Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION</code> config to false.</p>
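 
 <p>For example, in a topology config (<code>Config</code> is a map, so the constant can be set directly):</p>
 
 <p><code>
 conf.put(Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION, false);
 </code></p>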
 
 <h3 id="component-specific-serialization-registrations">Component-specific serialization registrations</h3>
 
-<p>Storm 0.7.0 lets you set component-specific configurations (read more about this at <a href="Configuration.html">Configuration</a>). Of course, if one component defines a serialization that serialization will need to be available to other bolts &#8211; otherwise they won&#8217;t be able to receive messages from that component!</p>
+<p>Storm 0.7.0 lets you set component-specific configurations (read more about this at <a href="Configuration.html">Configuration</a>). Of course, if one component defines a serialization, that serialization will need to be available to other bolts – otherwise they won’t be able to receive messages from that component!</p>
 
 <p>When a topology is submitted, a single set of serializations is chosen to be used by all components in the topology for sending messages. This is done by merging the component-specific serializer registrations with the regular set of serialization registrations. If two components define serializers for the same class, one of the serializers is chosen arbitrarily.</p>
 
-<p>To force a serializer for a particular class if there&#8217;s a conflict between two component-specific registrations, just define the serializer you want to use in the topology-specific configuration. The topology-specific configuration has precedence over component-specific configurations for serialization registrations.</p>
+<p>To force a serializer for a particular class if there’s a conflict between two component-specific registrations, just define the serializer you want to use in the topology-specific configuration. The topology-specific configuration has precedence over component-specific configurations for serialization registrations.</p>
 
 </div>
 </div>

Modified: incubator/storm/site/publish/documentation/Setting-up-a-Storm-cluster.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Setting-up-a-Storm-cluster.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Setting-up-a-Storm-cluster.html (original)
+++ incubator/storm/site/publish/documentation/Setting-up-a-Storm-cluster.html Tue May 27 18:39:07 2014
@@ -65,18 +65,18 @@
   </ul>
 </div>
 <div id="aboutcontent">
-<p>This page outlines the steps for getting a Storm cluster up and running. If you&#8217;re on AWS, you should check out the <a href="https://github.com/nathanmarz/storm-deploy/wiki">storm-deploy</a> project. <a href="https://github.com/nathanmarz/storm-deploy/wiki">storm-deploy</a> completely automates the provisioning, configuration, and installation of Storm clusters on EC2. It also sets up Ganglia for you so you can monitor CPU, disk, and network usage.</p>
+<p>This page outlines the steps for getting a Storm cluster up and running. If you’re on AWS, you should check out the <a href="https://github.com/nathanmarz/storm-deploy/wiki">storm-deploy</a> project. <a href="https://github.com/nathanmarz/storm-deploy/wiki">storm-deploy</a> completely automates the provisioning, configuration, and installation of Storm clusters on EC2. It also sets up Ganglia for you so you can monitor CPU, disk, and network usage.</p>
 
 <p>If you run into difficulties with your Storm cluster, first check for a solution in the <a href="Troubleshooting.html">Troubleshooting</a> page. Otherwise, email the mailing list.</p>
 
-<p>Here&#8217;s a summary of the steps for setting up a Storm cluster:</p>
+<p>Here’s a summary of the steps for setting up a Storm cluster:</p>
 
 <ol>
   <li>Set up a Zookeeper cluster</li>
   <li>Install dependencies on Nimbus and worker machines</li>
   <li>Download and extract a Storm release to Nimbus and worker machines</li>
   <li>Fill in mandatory configurations into storm.yaml</li>
-  <li>Launch daemons under supervision using &#8220;storm&#8221; script and a supervisor of your choice</li>
+  <li>Launch daemons under supervision using “storm” script and a supervisor of your choice</li>
 </ol>
 
 <h3 id="set-up-a-zookeeper-cluster">Set up a Zookeeper cluster</h3>
@@ -86,13 +86,13 @@
 <p>A few notes about Zookeeper deployment:</p>
 
 <ol>
-  <li>It&#8217;s critical that you run Zookeeper under supervision, since Zookeeper is fail-fast and will exit the process if it encounters any error case. See <a href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_supervision">here</a> for more details. </li>
-  <li>It&#8217;s critical that you set up a cron to compact Zookeeper&#8217;s data and transaction logs. The Zookeeper daemon does not do this on its own, and if you don&#8217;t set up a cron, Zookeeper will quickly run out of disk space. See <a href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_maintenance">here</a> for more details.</li>
+  <li>It’s critical that you run Zookeeper under supervision, since Zookeeper is fail-fast and will exit the process if it encounters any error case. See <a href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_supervision">here</a> for more details. </li>
+  <li>It’s critical that you set up a cron to compact Zookeeper’s data and transaction logs. The Zookeeper daemon does not do this on its own, and if you don’t set up a cron, Zookeeper will quickly run out of disk space. See <a href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_maintenance">here</a> for more details.</li>
 </ol>
 
 <h3 id="install-dependencies-on-nimbus-and-worker-machines">Install dependencies on Nimbus and worker machines</h3>
 
-<p>Next you need to install Storm&#8217;s dependencies on Nimbus and the worker machines. These are:</p>
+<p>Next you need to install Storm’s dependencies on Nimbus and the worker machines. These are:</p>
 
 <ol>
   <li>Java 6</li>
@@ -107,7 +107,7 @@
 
 <h3 id="fill-in-mandatory-configurations-into-stormyaml">Fill in mandatory configurations into storm.yaml</h3>
 
-<p>The Storm release contains a file at <code>conf/storm.yaml</code> that configures the Storm daemons. You can see the default configuration values <a href="https://github.com/apache/incubator-storm/blob/master/conf/defaults.yaml">here</a>. storm.yaml overrides anything in defaults.yaml. There&#8217;s a few configurations that are mandatory to get a working cluster:</p>
+<p>The Storm release contains a file at <code>conf/storm.yaml</code> that configures the Storm daemons. You can see the default configuration values <a href="https://github.com/apache/incubator-storm/blob/master/conf/defaults.yaml">here</a>. storm.yaml overrides anything in defaults.yaml. There are a few configurations that are mandatory to get a working cluster (a consolidated example follows the individual settings):</p>
 
 <p>1) <strong>storm.zookeeper.servers</strong>: This is a list of the hosts in the Zookeeper cluster for your Storm cluster. It should look something like:</p>
 
@@ -141,14 +141,14 @@ supervisor.slots.ports:
     - 6703
 </code></p>
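 
 <p>Putting the mandatory settings together, a minimal <code>storm.yaml</code> might look like this (hosts and paths are illustrative):</p>
 
 <p><code>
 storm.zookeeper.servers:
     - "zk1.example.com"
     - "zk2.example.com"
 nimbus.host: "nimbus.example.com"
 storm.local.dir: "/mnt/storm"
 supervisor.slots.ports:
     - 6700
     - 6701
     - 6702
     - 6703
 </code></p>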
 
-<h3 id="launch-daemons-under-supervision-using-storm-script-and-a-supervisor-of-your-choice">Launch daemons under supervision using &#8220;storm&#8221; script and a supervisor of your choice</h3>
+<h3 id="launch-daemons-under-supervision-using-storm-script-and-a-supervisor-of-your-choice">Launch daemons under supervision using “storm” script and a supervisor of your choice</h3>
 
-<p>The last step is to launch all the Storm daemons. It is critical that you run each of these daemons under supervision. Storm is a <strong>fail-fast</strong> system which means the processes will halt whenever an unexpected error is encountered. Storm is designed so that it can safely halt at any point and recover correctly when the process is restarted. This is why Storm keeps no state in-process &#8211; if Nimbus or the Supervisors restart, the running topologies are unaffected. Here&#8217;s how to run the Storm daemons:</p>
+<p>The last step is to launch all the Storm daemons. It is critical that you run each of these daemons under supervision. Storm is a <strong>fail-fast</strong> system which means the processes will halt whenever an unexpected error is encountered. Storm is designed so that it can safely halt at any point and recover correctly when the process is restarted. This is why Storm keeps no state in-process – if Nimbus or the Supervisors restart, the running topologies are unaffected. Here’s how to run the Storm daemons:</p>
 
 <ol>
-  <li><strong>Nimbus</strong>: Run the command &#8220;bin/storm nimbus&#8221; under supervision on the master machine.</li>
-  <li><strong>Supervisor</strong>: Run the command &#8220;bin/storm supervisor&#8221; under supervision on each worker machine. The supervisor daemon is responsible for starting and stopping worker processes on that machine.</li>
-  <li><strong>UI</strong>: Run the Storm UI (a site you can access from the browser that gives diagnostics on the cluster and topologies) by running the command &#8220;bin/storm ui&#8221; under supervision. The UI can be accessed by navigating your web browser to http://{nimbus host}:8080. </li>
+  <li><strong>Nimbus</strong>: Run the command “bin/storm nimbus” under supervision on the master machine.</li>
+  <li><strong>Supervisor</strong>: Run the command “bin/storm supervisor” under supervision on each worker machine. The supervisor daemon is responsible for starting and stopping worker processes on that machine.</li>
+  <li><strong>UI</strong>: Run the Storm UI (a site you can access from the browser that gives diagnostics on the cluster and topologies) by running the command “bin/storm ui” under supervision. The UI can be accessed by navigating your web browser to http://{nimbus host}:8080. </li>
 </ol>
 
 <p>As you can see, running the daemons is very straightforward. The daemons will log to the logs/ directory under wherever you extracted the Storm release.</p>

Modified: incubator/storm/site/publish/documentation/Setting-up-development-environment.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Setting-up-development-environment.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Setting-up-development-environment.html (original)
+++ incubator/storm/site/publish/documentation/Setting-up-development-environment.html Tue May 27 18:39:07 2014
@@ -80,7 +80,7 @@
 
 <p>A Storm development environment has everything installed so that you can develop and test Storm topologies in local mode, package topologies for execution on a remote cluster, and submit/kill topologies on a remote cluster.</p>
 
-<p>Let&#8217;s quickly go over the relationship between your machine and a remote cluster. A Storm cluster is managed by a master node called &#8220;Nimbus&#8221;. Your machine communicates with Nimbus to submit code (packaged as a jar) and topologies for execution on the cluster, and Nimbus will take care of distributing that code around the cluster and assigning workers to run your topology. Your machine uses a command line client called <code>storm</code> to communicate with Nimbus. The <code>storm</code> client is only used for remote mode; it is not used for developing and testing topologies in local mode.</p>
+<p>Let’s quickly go over the relationship between your machine and a remote cluster. A Storm cluster is managed by a master node called “Nimbus”. Your machine communicates with Nimbus to submit code (packaged as a jar) and topologies for execution on the cluster, and Nimbus will take care of distributing that code around the cluster and assigning workers to run your topology. Your machine uses a command line client called <code>storm</code> to communicate with Nimbus. The <code>storm</code> client is only used for remote mode; it is not used for developing and testing topologies in local mode.</p>
 
 <h3 id="installing-a-storm-release-locally">Installing a Storm release locally</h3>
 
@@ -96,7 +96,7 @@
 nimbus.host: "123.45.678.890"
 </code></p>
 
-<p>Alternatively, if you use the <a href="https://github.com/nathanmarz/storm-deploy">storm-deploy</a> project to provision Storm clusters on AWS, it will automatically set up your ~/.storm/storm.yaml file. You can manually attach to a Storm cluster (or switch between multiple clusters) using the &#8220;attach&#8221; command, like so:</p>
+<p>Alternatively, if you use the <a href="https://github.com/nathanmarz/storm-deploy">storm-deploy</a> project to provision Storm clusters on AWS, it will automatically set up your ~/.storm/storm.yaml file. You can manually attach to a Storm cluster (or switch between multiple clusters) using the “attach” command, like so:</p>
 
 <p><code>
 lein run :deploy --attach --name mystormcluster

Modified: incubator/storm/site/publish/documentation/Storm-multi-language-protocol-(versions-0.7.0-and-below).html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Storm-multi-language-protocol-%28versions-0.7.0-and-below%29.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Storm-multi-language-protocol-(versions-0.7.0-and-below).html (original)
+++ incubator/storm/site/publish/documentation/Storm-multi-language-protocol-(versions-0.7.0-and-below).html Tue May 27 18:39:07 2014
@@ -73,7 +73,7 @@
 
 <p>Support for multiple languages is implemented via the ShellBolt class.  This
 class implements the IBolt interface and implements the protocol for
-executing a script or program via the shell using Java&#8217;s ProcessBuilder class.</p>
+executing a script or program via the shell using Java’s ProcessBuilder class.</p>
 
 <h2 id="output-fields">Output fields</h2>
 
@@ -99,14 +99,14 @@ directory just needs to be on the classp
 <p>Notes:
 * Both ends of this protocol use a line-reading mechanism, so be sure to
 trim off newlines from the input and to append them to your output.
-* All JSON inputs and outputs are terminated by a single line contained &#8220;end&#8221;.
-* The bullet points below are written from the perspective of the script writer&#8217;s
+* All JSON inputs and outputs are terminated by a single line containing “end”.
+* The bullet points below are written from the perspective of the script writer’s
 STDIN and STDOUT.</p>
 
 <ul>
   <li>Your script will be executed by the Bolt.</li>
   <li>STDIN: A string representing a path. This is a PID directory.
-Your script should create an empty file named with it&#8217;s pid in this directory. e.g.
+Your script should create an empty file named with its PID in this directory, e.g. if
 the PID is 1234, an empty file named 1234 is created in the directory. This
 file lets the supervisor know the PID so it can shutdown the process later on.</li>
   <li>STDOUT: Your PID. This is not JSON encoded, just a string. ShellBolt will log the PID to its log.</li>
@@ -169,19 +169,19 @@ file lets the supervisor know the PID so
 }
 </code></p>
 
-<p>A &#8220;log&#8221; will log a message in the worker log. It looks like:</p>
+<p>A “log” will log a message in the worker log. It looks like:</p>
 
 <p>```
 {
-	&#8220;command&#8221;: &#8220;log&#8221;,
+	"command": "log",
 	// the message to log
-	&#8220;msg&#8221;: &#8220;hello world!&#8221;</p>
+	"msg": "hello world!"</p>
 
 <p>}
 ```</p>
 
 <ul>
-  <li>STDOUT: emit &#8220;sync&#8221; as a single line by itself when the bolt has finished emitting/acking/failing and is ready for the next input</li>
+  <li>STDOUT: emit “sync” as a single line by itself when the bolt has finished emitting/acking/failing and is ready for the next input</li>
 </ul>
 
 <h3 id="sync">sync</h3>

Modified: incubator/storm/site/publish/documentation/Structure-of-the-codebase.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Structure-of-the-codebase.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Structure-of-the-codebase.html (original)
+++ incubator/storm/site/publish/documentation/Structure-of-the-codebase.html Tue May 27 18:39:07 2014
@@ -65,25 +65,25 @@
   </ul>
 </div>
 <div id="aboutcontent">
-<p>There are three distinct layers to Storm&#8217;s codebase.</p>
+<p>There are three distinct layers to Storm’s codebase.</p>
 
 <p>First, Storm was designed from the very beginning to be compatible with multiple languages. Nimbus is a Thrift service and topologies are defined as Thrift structures. The usage of Thrift allows Storm to be used from any language.</p>
 
-<p>Second, all of Storm&#8217;s interfaces are specified as Java interfaces. So even though there&#8217;s a lot of Clojure in Storm&#8217;s implementation, all usage must go through the Java API. This means that every feature of Storm is always available via Java.</p>
+<p>Second, all of Storm’s interfaces are specified as Java interfaces. So even though there’s a lot of Clojure in Storm’s implementation, all usage must go through the Java API. This means that every feature of Storm is always available via Java.</p>
 
-<p>Third, Storm&#8217;s implementation is largely in Clojure. Line-wise, Storm is about half Java code, half Clojure code. But Clojure is much more expressive, so in reality the great majority of the implementation logic is in Clojure. </p>
+<p>Third, Storm’s implementation is largely in Clojure. Line-wise, Storm is about half Java code, half Clojure code. But Clojure is much more expressive, so in reality the great majority of the implementation logic is in Clojure. </p>
 
 <p>The following sections explain each of these layers in more detail.</p>
 
 <h3 id="stormthrift">storm.thrift</h3>
 
-<p>The first place to look to understand the structure of Storm&#8217;s codebase is the <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/storm.thrift">storm.thrift</a> file.</p>
+<p>The first place to look to understand the structure of Storm’s codebase is the <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/storm.thrift">storm.thrift</a> file.</p>
 
-<p>Storm uses <a href="https://github.com/nathanmarz/thrift/tree/storm">this fork</a> of Thrift (branch &#8216;storm&#8217;) to produce the generated code. This &#8220;fork&#8221; is actually Thrift 7 with all the Java packages renamed to be <code>org.apache.thrift7</code>. Otherwise, it&#8217;s identical to Thrift 7. This fork was done because of the lack of backwards compatibility in Thrift and the need for many people to use other versions of Thrift in their Storm topologies.</p>
+<p>Storm uses <a href="https://github.com/nathanmarz/thrift/tree/storm">this fork</a> of Thrift (branch ‘storm’) to produce the generated code. This “fork” is actually Thrift 7 with all the Java packages renamed to be <code>org.apache.thrift7</code>. Otherwise, it’s identical to Thrift 7. This fork was done because of the lack of backwards compatibility in Thrift and the need for many people to use other versions of Thrift in their Storm topologies.</p>
 
-<p>Every spout or bolt in a topology is given a user-specified identifier called the &#8220;component id&#8221;. The component id is used to specify subscriptions from a bolt to the output streams of other spouts or bolts. A <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/storm.thrift#L91">StormTopology</a> structure contains a map from component id to component for each type of component (spouts and bolts).</p>
+<p>Every spout or bolt in a topology is given a user-specified identifier called the “component id”. The component id is used to specify subscriptions from a bolt to the output streams of other spouts or bolts. A <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/storm.thrift#L91">StormTopology</a> structure contains a map from component id to component for each type of component (spouts and bolts).</p>
 
-<p>Spouts and bolts have the same Thrift definition, so let&#8217;s just take a look at the <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/storm.thrift#L79">Thrift definition for bolts</a>. It contains a <code>ComponentObject</code> struct and a <code>ComponentCommon</code> struct.</p>
+<p>Spouts and bolts have the same Thrift definition, so let’s just take a look at the <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/storm.thrift#L79">Thrift definition for bolts</a>. It contains a <code>ComponentObject</code> struct and a <code>ComponentCommon</code> struct.</p>
 
 <p>The <code>ComponentObject</code> defines the implementation for the bolt. It can be one of three types:</p>
 
@@ -96,13 +96,13 @@
 <p><code>ComponentCommon</code> defines everything else for this component. This includes:</p>
 
 <ol>
-  <li>What streams this component emits and the metadata for each stream (whether it&#8217;s a direct stream, the fields declaration)</li>
+  <li>What streams this component emits and the metadata for each stream (whether it’s a direct stream, the fields declaration)</li>
   <li>What streams this component consumes (specified as a map from component_id:stream_id to the stream grouping to use)</li>
   <li>The parallelism for this component</li>
   <li>The component-specific <a href="https://github.com/apache/incubator-storm/wiki/Configuration">configuration</a> for this component</li>
 </ol>
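+
+<p>To make this mapping concrete, here is a hedged Java sketch of how each of the four pieces above gets populated through <code>TopologyBuilder</code> (the component ids and the spout/bolt classes are invented for illustration):</p>
+
+<p><code>
+import backtype.storm.topology.TopologyBuilder;
+
+TopologyBuilder builder = new TopologyBuilder();
+
+// 1. The spout's declareOutputFields call becomes the stream/fields metadata.
+builder.setSpout("sentences", new RandomSentenceSpout(), 2); // 3. parallelism hint
+
+builder.setBolt("splitter", new SplitSentenceBolt(), 4)      // 3. parallelism hint
+       // 2. a subscription: component id "sentences", default stream, shuffle grouping
+       .shuffleGrouping("sentences")
+       // 4. component-specific configuration
+       .addConfiguration("topology.tick.tuple.freq.secs", 60);
+</code></p>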
 
-<p>Note that the structure spouts also have a <code>ComponentCommon</code> field, and so spouts can also have declarations to consume other input streams. Yet the Storm Java API does not provide a way for spouts to consume other streams, and if you put any input declarations there for a spout you would get an error when you tried to submit the topology. The reason that spouts have an input declarations field is not for users to use, but for Storm itself to use. Storm adds implicit streams and bolts to the topology to set up the <a href="https://github.com/apache/incubator-storm/wiki/Guaranteeing-message-processing">acking framework</a>, and two of these implicit streams are from the acker bolt to each spout in the topology. The acker sends &#8220;ack&#8221; or &#8220;fail&#8221; messages along these streams whenever a tuple tree is detected to be completed or failed. The code that transforms the user&#8217;s topology into the runtime topology is located <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/common.clj#L279">here</a>.</p>
+<p>Note that the spout structure also has a <code>ComponentCommon</code> field, so spouts can also have declarations to consume other input streams. Yet the Storm Java API does not provide a way for spouts to consume other streams, and if you put any input declarations there for a spout you would get an error when you tried to submit the topology. The reason that spouts have an input declarations field is not for users to use, but for Storm itself to use. Storm adds implicit streams and bolts to the topology to set up the <a href="https://github.com/apache/incubator-storm/wiki/Guaranteeing-message-processing">acking framework</a>, and two of these implicit streams are from the acker bolt to each spout in the topology. The acker sends “ack” or “fail” messages along these streams whenever a tuple tree is detected to be completed or failed. The code that transforms the user’s topology into the runtime topology is located <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/common.clj#L279">here</a>.</p>
 
 <h3 id="java-interfaces">Java interfaces</h3>
 
@@ -125,7 +125,7 @@
 
 <p>Spouts and bolts are serialized into the Thrift definition of the topology as described above. </p>
 
-<p>One subtle aspect of the interfaces is the difference between <code>IBolt</code> and <code>ISpout</code> vs. <code>IRichBolt</code> and <code>IRichSpout</code>. The main difference between them is the addition of the <code>declareOutputFields</code> method in the &#8220;Rich&#8221; versions of the interfaces. The reason for the split is that the output fields declaration for each output stream needs to be part of the Thrift struct (so it can be specified from any language), but as a user you want to be able to declare the streams as part of your class. What <code>TopologyBuilder</code> does when constructing the Thrift representation is call <code>declareOutputFields</code> to get the declaration and convert it into the Thrift structure. The conversion happens <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/topology/TopologyBuilder.java#L205">at this portion</a> of the <code>TopologyBuilder</code> code. </p>
+<p>One subtle aspect of the interfaces is the difference between <code>IBolt</code> and <code>ISpout</code> vs. <code>IRichBolt</code> and <code>IRichSpout</code>. The main difference between them is the addition of the <code>declareOutputFields</code> method in the “Rich” versions of the interfaces. The reason for the split is that the output fields declaration for each output stream needs to be part of the Thrift struct (so it can be specified from any language), but as a user you want to be able to declare the streams as part of your class. What <code>TopologyBuilder</code> does when constructing the Thrift representation is call <code>declareOutputFields</code> to get the declaration and convert it into the Thrift structure. The conversion happens <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/topology/TopologyBuilder.java#L205">at this portion</a> of the <code>TopologyBuilder</code> code. </p>
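+
+<p>As a hedged illustration (the class and field names are invented), a “Rich” bolt might look like this; <code>TopologyBuilder</code> calls its <code>declareOutputFields</code> and converts the declaration into the Thrift stream metadata:</p>
+
+<p><code>
+import java.util.Map;
+import backtype.storm.task.OutputCollector;
+import backtype.storm.task.TopologyContext;
+import backtype.storm.topology.OutputFieldsDeclarer;
+import backtype.storm.topology.base.BaseRichBolt;
+import backtype.storm.tuple.Fields;
+import backtype.storm.tuple.Tuple;
+import backtype.storm.tuple.Values;
+
+public class WordCountBolt extends BaseRichBolt {
+  private OutputCollector collector;
+
+  public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
+    this.collector = collector;
+  }
+
+  public void execute(Tuple tuple) {
+    collector.emit(tuple, new Values(tuple.getString(0), 1)); // anchored emit
+    collector.ack(tuple);
+  }
+
+  // The declaration below ends up in the Thrift struct as the fields
+  // for this bolt's default output stream.
+  public void declareOutputFields(OutputFieldsDeclarer declarer) {
+    declarer.declare(new Fields("word", "count"));
+  }
+}
+</code></p>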
 
 <h3 id="implementation">Implementation</h3>
 
@@ -133,7 +133,7 @@
 
 <p>The implementation of Storm, on the other hand, is primarily in Clojure. While the codebase is about 50% Java and 50% Clojure in terms of LOC, most of the implementation logic is in Clojure. There are two notable exceptions: the <a href="https://github.com/apache/incubator-storm/wiki/Distributed-RPC">DRPC</a> and <a href="https://github.com/apache/incubator-storm/wiki/Transactional-topologies">transactional topologies</a> implementations. These are implemented purely in Java, to serve as an illustration of how to implement a higher level abstraction on Storm. The DRPC and transactional topologies implementations are in the <a href="https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/coordination">backtype.storm.coordination</a>, <a href="https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/drpc">backtype.storm.drpc</a>, and <a href="https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/transactional">backtype.storm.transactional</a> packages.</p>
 
-<p>Here&#8217;s a summary of the purpose of the main Java packages and Clojure namespace:</p>
+<p>Here’s a summary of the purpose of the main Java packages and Clojure namespace:</p>
 
 <h4 id="java-packages">Java packages</h4>
 
@@ -153,13 +153,13 @@
 
 <p><a href="https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/task">backtype.storm.task</a>: Definition of bolt and associated interfaces (like <code>OutputCollector</code>). Also contains <code>ShellBolt</code> which implements the protocol for defining bolts in non-JVM languages. Finally, <code>TopologyContext</code> is defined here as well, which is provided to spouts and bolts so they can get data about the topology and its execution at runtime.</p>
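+
+<p>For example, a non-JVM bolt is exposed to a topology through a thin Java wrapper around <code>ShellBolt</code>; a minimal hedged sketch, assuming a <code>splitsentence.py</code> script that speaks the multilang protocol:</p>
+
+<p><code>
+import java.util.Map;
+import backtype.storm.task.ShellBolt;
+import backtype.storm.topology.IRichBolt;
+import backtype.storm.topology.OutputFieldsDeclarer;
+import backtype.storm.tuple.Fields;
+
+public class SplitSentence extends ShellBolt implements IRichBolt {
+  public SplitSentence() {
+    // ShellBolt launches the subprocess and handles the protocol.
+    super("python", "splitsentence.py"); // hypothetical script
+  }
+
+  public void declareOutputFields(OutputFieldsDeclarer declarer) {
+    declarer.declare(new Fields("word"));
+  }
+
+  public Map getComponentConfiguration() {
+    return null; // no component-specific configuration
+  }
+}
+</code></p>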
 
-<p><a href="https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/testing">backtype.storm.testing</a>: Contains a variety of test bolts and utilities used in Storm&#8217;s unit tests.</p>
+<p><a href="https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/testing">backtype.storm.testing</a>: Contains a variety of test bolts and utilities used in Storm’s unit tests.</p>
 
-<p><a href="https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/topology">backtype.storm.topology</a>: Java layer over the underlying Thrift structure to provide a clean, pure-Java API to Storm (users don&#8217;t have to know about Thrift). <code>TopologyBuilder</code> is here as well as the helpful base classes for the different spouts and bolts. The slightly-higher level <code>IBasicBolt</code> interface is here, which is a simpler way to write certain kinds of bolts.</p>
+<p><a href="https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/topology">backtype.storm.topology</a>: Java layer over the underlying Thrift structure to provide a clean, pure-Java API to Storm (users don’t have to know about Thrift). <code>TopologyBuilder</code> is here as well as the helpful base classes for the different spouts and bolts. The slightly higher-level <code>IBasicBolt</code> interface is here too; it is a simpler way to write certain kinds of bolts (sketched below).</p>
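+
+<p>A rough sketch of the <code>IBasicBolt</code> style (the class name is invented); the collector anchors emits to the input tuple and acks it automatically when <code>execute</code> returns:</p>
+
+<p><code>
+import backtype.storm.topology.BasicOutputCollector;
+import backtype.storm.topology.OutputFieldsDeclarer;
+import backtype.storm.topology.base.BaseBasicBolt;
+import backtype.storm.tuple.Fields;
+import backtype.storm.tuple.Tuple;
+import backtype.storm.tuple.Values;
+
+public class ExclamationBolt extends BaseBasicBolt {
+  public void execute(Tuple tuple, BasicOutputCollector collector) {
+    // No explicit ack needed; the framework acks after this method returns.
+    collector.emit(new Values(tuple.getString(0) + "!!!"));
+  }
+
+  public void declareOutputFields(OutputFieldsDeclarer declarer) {
+    declarer.declare(new Fields("word"));
+  }
+}
+</code></p>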
 
 <p><a href="https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/transactional">backtype.storm.transactional</a>: Implementation of transactional topologies.</p>
 
-<p><a href="https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/tuple">backtype.storm.tuple</a>: Implementation of Storm&#8217;s tuple data model.</p>
+<p><a href="https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/tuple">backtype.storm.tuple</a>: Implementation of Storm’s tuple data model.</p>
 
 <p><a href="https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/utils">backtype.storm.utils</a>: Data structures and miscellaneous utilities used throughout the codebase.</p>
 
@@ -169,15 +169,15 @@
 
 <p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/clojure.clj">backtype.storm.clojure</a>: Implementation of the Clojure DSL for Storm.</p>
 
-<p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/cluster.clj">backtype.storm.cluster</a>: All Zookeeper logic used in Storm daemons is encapsulated in this file. This code manages how cluster state (like what tasks are running where, what spout/bolt each task runs as) is mapped to the Zookeeper &#8220;filesystem&#8221; API.</p>
+<p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/cluster.clj">backtype.storm.cluster</a>: All Zookeeper logic used in Storm daemons is encapsulated in this file. This code manages how cluster state (like what tasks are running where, what spout/bolt each task runs as) is mapped to the Zookeeper “filesystem” API.</p>
 
 <p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/command">backtype.storm.command.*</a>: These namespaces implement various commands for the <code>storm</code> command line client. These implementations are very short.</p>
 
 <p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/config.clj">backtype.storm.config</a>: Implementation of config reading/parsing code for Clojure. Also has utility functions for determining what local paths the nimbus/supervisor daemons should use for various things; for example, the <code>master-inbox</code> function returns the local path that Nimbus should use when jars are uploaded to it.</p>
 
-<p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/acker.clj">backtype.storm.daemon.acker</a>: Implementation of the &#8220;acker&#8221; bolt, which is a key part of how Storm guarantees data processing.</p>
+<p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/acker.clj">backtype.storm.daemon.acker</a>: Implementation of the “acker” bolt, which is a key part of how Storm guarantees data processing.</p>
 
-<p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/common.clj">backtype.storm.daemon.common</a>: Implementation of common functions used in Storm daemons, like getting the id for a topology based on the name, mapping a user&#8217;s topology into the one that actually executes (with implicit acking streams and acker bolt added - see <code>system-topology!</code> function), and definitions for the various heartbeat and other structures persisted by Storm.</p>
+<p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/common.clj">backtype.storm.daemon.common</a>: Implementation of common functions used in Storm daemons, like getting the id for a topology based on the name, mapping a user’s topology into the one that actually executes (with implicit acking streams and acker bolt added - see <code>system-topology!</code> function), and definitions for the various heartbeat and other structures persisted by Storm.</p>
 
 <p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/drpc.clj">backtype.storm.daemon.drpc</a>: Implementation of the DRPC server for use with DRPC topologies.</p>
 
@@ -197,17 +197,17 @@
 
 <p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/stats.clj">backtype.storm.stats</a>: Implementation of stats rollup routines used when sending stats to ZK for use by the UI. Does things like windowed and rolling aggregations at multiple granularities.</p>
 
-<p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/testing.clj">backtype.storm.testing</a>: Implementation of facilities used to test Storm topologies. Includes time simulation, <code>complete-topology</code> for running a fixed set of tuples through a topology and capturing the output, tracker topologies for having fine grained control over detecting when a cluster is &#8220;idle&#8221;, and other utilities.</p>
+<p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/testing.clj">backtype.storm.testing</a>: Implementation of facilities used to test Storm topologies. Includes time simulation, <code>complete-topology</code> for running a fixed set of tuples through a topology and capturing the output, tracker topologies for fine-grained control over detecting when a cluster is “idle”, and other utilities.</p>
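+
+<p>A minimal hedged sketch of driving this namespace from Java via the <code>backtype.storm.Testing</code> facade (the <code>buildTopology</code> helper and the component ids are assumptions):</p>
+
+<p><code>
+import java.util.Map;
+import backtype.storm.ILocalCluster;
+import backtype.storm.Testing;
+import backtype.storm.testing.CompleteTopologyParam;
+import backtype.storm.testing.MockedSources;
+import backtype.storm.testing.TestJob;
+import backtype.storm.tuple.Values;
+
+Testing.withSimulatedTimeLocalCluster(new TestJob() {
+  public void run(ILocalCluster cluster) throws Exception {
+    MockedSources mocked = new MockedSources();
+    mocked.addMockData("sentences", new Values("the cow")); // fixed input tuples
+
+    CompleteTopologyParam param = new CompleteTopologyParam();
+    param.setMockedSources(mocked);
+
+    // Run the topology until the cluster is idle and capture all output.
+    Map result = Testing.completeTopology(cluster, buildTopology(), param);
+    System.out.println(Testing.readTuples(result, "splitter"));
+  }
+});
+</code></p>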
 
 <p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/thrift.clj">backtype.storm.thrift</a>: Clojure wrappers around the generated Thrift API to make working with Thrift structures more pleasant.</p>
 
-<p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/timer.clj">backtype.storm.timer</a>: Implementation of a background timer to execute functions in the future or on a recurring interval. Storm couldn&#8217;t use the <a href="http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Timer.html">Timer</a> class because it needed integration with time simulation in order to be able to unit test Nimbus and the Supervisor.</p>
+<p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/timer.clj">backtype.storm.timer</a>: Implementation of a background timer to execute functions in the future or on a recurring interval. Storm couldn’t use the <a href="http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Timer.html">Timer</a> class because it needed integration with time simulation in order to be able to unit test Nimbus and the Supervisor.</p>
 
 <p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/ui">backtype.storm.ui.*</a>: Implementation of the Storm UI. Completely independent from the rest of the codebase; uses the Nimbus Thrift API to get data.</p>
 
 <p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/util.clj">backtype.storm.util</a>: Contains generic utility functions used throughout the code base.</p>
 
-<p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/zookeeper.clj">backtype.storm.zookeeper</a>: Clojure wrapper around the Zookeeper API and implements some &#8220;high-level&#8221; stuff like &#8220;mkdirs&#8221; and &#8220;delete-recursive&#8221;.</p>
+<p><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/zookeeper.clj">backtype.storm.zookeeper</a>: Clojure wrapper around the Zookeeper API that implements some “high-level” operations like “mkdirs” and “delete-recursive”.</p>
 
 </div>
 </div>