Posted to commits@storm.apache.org by pt...@apache.org on 2014/05/27 20:39:09 UTC

svn commit: r1597847 [5/6] - in /incubator/storm/site: _posts/ publish/ publish/2012/08/02/ publish/2012/09/06/ publish/2013/01/11/ publish/2013/12/08/ publish/2014/04/10/ publish/2014/04/17/ publish/2014/04/19/ publish/2014/04/21/ publish/2014/04/22/ ...

Modified: incubator/storm/site/publish/documentation/Transactional-topologies.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Transactional-topologies.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Transactional-topologies.html (original)
+++ incubator/storm/site/publish/documentation/Transactional-topologies.html Tue May 27 18:39:07 2014
@@ -65,67 +65,67 @@
   </ul>
 </div>
 <div id="aboutcontent">
-<p><strong>NOTE</strong>: Transactional topologies have been deprecated &#8211; use the <a href="Trident-tutorial.html">Trident</a> framework instead.</p>
+<p><strong>NOTE</strong>: Transactional topologies have been deprecated – use the <a href="Trident-tutorial.html">Trident</a> framework instead.</p>
 
 <hr />
 
-<p>Storm <a href="Guaranteeing-message-processing.html">guarantees data processing</a> by providing an at least once processing guarantee. The most common question asked about Storm is &#8220;Given that tuples can be replayed, how do you do things like counting on top of Storm? Won&#8217;t you overcount?&#8221;</p>
+<p>Storm <a href="Guaranteeing-message-processing.html">guarantees data processing</a> by providing an at-least-once processing guarantee. The most common question asked about Storm is “Given that tuples can be replayed, how do you do things like counting on top of Storm? Won’t you overcount?”</p>
 
 <p>Storm 0.7.0 introduces transactional topologies, which enable you to get exactly-once messaging semantics for pretty much any computation. So you can do things like counting in a fully-accurate, scalable, and fault-tolerant way.</p>
 
-<p>Like <a href="Distributed-RPC.html">Distributed RPC</a>, transactional topologies aren&#8217;t so much a feature of Storm as they are a higher level abstraction built on top of Storm&#8217;s primitives of streams, spouts, bolts, and topologies.</p>
+<p>Like <a href="Distributed-RPC.html">Distributed RPC</a>, transactional topologies aren’t so much a feature of Storm as they are a higher level abstraction built on top of Storm’s primitives of streams, spouts, bolts, and topologies.</p>
 
 <p>This page explains the transactional topology abstraction, how to use the API, and provides details as to its implementation.</p>
 
 <h2 id="concepts">Concepts</h2>
 
-<p>Let&#8217;s build up to Storm&#8217;s abstraction for transactional topologies one step at a time. Let&#8217;s start by looking at the simplest possible approach, and then we&#8217;ll iterate on the design until we reach Storm&#8217;s design.</p>
+<p>Let’s build up to Storm’s abstraction for transactional topologies one step at a time. Let’s start by looking at the simplest possible approach, and then we’ll iterate on the design until we reach Storm’s design.</p>
 
 <h3 id="design-1">Design 1</h3>
 
-<p>The core idea behind transactional topologies is to provide a <em>strong ordering</em> on the processing of data. The simplest manifestation of this, and the first design we&#8217;ll look at, is processing the tuples one at a time and not moving on to the next tuple until the current tuple has been successfully processed by the topology.</p>
+<p>The core idea behind transactional topologies is to provide a <em>strong ordering</em> on the processing of data. The simplest manifestation of this, and the first design we’ll look at, is processing the tuples one at a time and not moving on to the next tuple until the current tuple has been successfully processed by the topology.</p>
 
 <p>Each tuple is associated with a transaction id. If the tuple fails and needs to be replayed, then it is emitted with the exact same transaction id. A transaction id is an integer that increments for every tuple, so the first tuple will have transaction id <code>1</code>, the second id <code>2</code>, and so on.</p>
 
-<p>The strong ordering of tuples gives you the capability to achieve exactly-once semantics even in the case of tuple replay. Let&#8217;s look at an example of how you would do this.</p>
+<p>The strong ordering of tuples gives you the capability to achieve exactly-once semantics even in the case of tuple replay. Let’s look at an example of how you would do this.</p>
 
 <p>Suppose you want to do a global count of the tuples in the stream. Instead of storing just the count in the database, you store the count and the latest transaction id together as one value in the database. When your code updates the count in the db, it should update the count <em>only if the transaction id in the database differs from the transaction id for the tuple currently being processed</em>. Consider the two cases:</p>
 
 <ol>
-  <li><em>The transaction id in the database is different than the current transaction id:</em> Because of the strong ordering of transactions, we know for sure that the current tuple isn&#8217;t represented in that count. So we can safely increment the count and update the transaction id.</li>
+  <li><em>The transaction id in the database is different than the current transaction id:</em> Because of the strong ordering of transactions, we know for sure that the current tuple isn’t represented in that count. So we can safely increment the count and update the transaction id.</li>
   <li><em>The transaction id is the same as the current transaction id:</em> Then we know that this tuple is already incorporated into the count and can skip the update. The tuple must have failed after updating the database but before reporting success back to Storm.</li>
 </ol>
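 
 <p>As a sketch, the update logic described above could look like this (using a hypothetical <code>Database</code> wrapper; the names are illustrative, not part of Storm’s API):</p>
 
 <p><code>java
 // Stored value: the count paired with the txid of the last successful update.
 class CountValue {
   long count;
   long txid;
 }
 
 // Database is a hypothetical key/value store wrapper, not part of Storm.
 void updateCount(Database db, long currentTxid) {
   CountValue stored = db.get("global-count");
   // Case 2: same txid means this transaction already updated the count; skip.
   if (stored.txid != currentTxid) {
     // Case 1: the txid differs, so this tuple is not yet in the count.
     stored.count += 1;
     stored.txid = currentTxid;
     db.put("global-count", stored);
   }
 }
 </code></p>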
 
 <p>This logic and the strong ordering of transactions ensure that the count in the database will be accurate even if tuples are replayed. Credit for this trick of storing a transaction id in the database along with the value goes to the Kafka devs; see <a href="http://incubator.apache.org/kafka/07/design.html">this design document</a>.</p>
 
-<p>Furthermore, notice that the topology can safely update many sources of state in the same transaction and achieve exactly-once semantics. If there&#8217;s a failure, any updates that already succeeded will skip on the retry, and any updates that failed will properly retry. For example, if you were processing a stream of tweeted urls, you could update a database that stores a tweet count for each url as well as a database that stores a tweet count for each domain.</p>
+<p>Furthermore, notice that the topology can safely update many sources of state in the same transaction and achieve exactly-once semantics. If there’s a failure, any updates that already succeeded will be skipped on the retry, and any updates that failed will properly retry. For example, if you were processing a stream of tweeted urls, you could update a database that stores a tweet count for each url as well as a database that stores a tweet count for each domain.</p>
 
-<p>There is a significant problem though with this design of processing one tuple at time. Having to wait for each tuple to be <em>completely processed</em> before moving on to the next one is horribly inefficient. It entails a huge amount of database calls (at least one per tuple), and this design makes very little use of the parallelization capabilities of Storm. So it isn&#8217;t very scalable.</p>
+<p>There is a significant problem, though, with this design of processing one tuple at a time. Having to wait for each tuple to be <em>completely processed</em> before moving on to the next one is horribly inefficient. It entails a huge number of database calls (at least one per tuple), and this design makes very little use of the parallelization capabilities of Storm. So it isn’t very scalable.</p>
 
 <h3 id="design-2">Design 2</h3>
 
-<p>Instead of processing one tuple at a time, a better approach is to process a batch of tuples for each transaction. So if you&#8217;re doing a global count, you would increment the count by the number of tuples in the entire batch. If a batch fails, you replay the exact batch that failed. Instead of assigning a transaction id to each tuple, you assign a transaction id to each batch, and the processing of the batches is strongly ordered. Here&#8217;s a diagram of this design:</p>
+<p>Instead of processing one tuple at a time, a better approach is to process a batch of tuples for each transaction. So if you’re doing a global count, you would increment the count by the number of tuples in the entire batch. If a batch fails, you replay the exact batch that failed. Instead of assigning a transaction id to each tuple, you assign a transaction id to each batch, and the processing of the batches is strongly ordered. Here’s a diagram of this design:</p>
 
 <p><img src="images/transactional-batches.png" alt="Storm cluster" /></p>
 
-<p>So if you&#8217;re processing 1000 tuples per batch, your application will do 1000x less database operations than design 1. Additionally, it takes advantage of Storm&#8217;s parallelization capabilities as the computation for each batch can be parallelized.</p>
+<p>So if you’re processing 1000 tuples per batch, your application will do 1000x fewer database operations than design 1. Additionally, it takes advantage of Storm’s parallelization capabilities as the computation for each batch can be parallelized.</p>
 
-<p>While this design is significantly better than design 1, it&#8217;s still not as resource-efficient as possible. The workers in the topology spend a lot of time being idle waiting for the other portions of the computation to finish. For example, in a topology like this:</p>
+<p>While this design is significantly better than design 1, it’s still not as resource-efficient as possible. The workers in the topology spend a lot of time being idle waiting for the other portions of the computation to finish. For example, in a topology like this:</p>
 
 <p><img src="images/transactional-design-2.png" alt="Storm cluster" /></p>
 
 <p>After bolt 1 finishes its portion of the processing, it will be idle until the rest of the bolts finish and the next batch can be emitted from the spout.</p>
 
-<h3 id="design-3-storms-design">Design 3 (Storm&#8217;s design)</h3>
+<h3 id="design-3-storms-design">Design 3 (Storm’s design)</h3>
 
-<p>A key realization is that not all the work for processing batches of tuples needs to be strongly ordered. For example, when computing a global count, there&#8217;s two parts to the computation:</p>
+<p>A key realization is that not all the work for processing batches of tuples needs to be strongly ordered. For example, when computing a global count, there are two parts to the computation:</p>
 
 <ol>
   <li>Computing the partial count for the batch</li>
   <li>Updating the global count in the database with the partial count</li>
 </ol>
 
-<p>The computation of #2 needs to be strongly ordered across the batches, but there&#8217;s no reason you shouldn&#8217;t be able to <em>pipeline</em> the computation of the batches by computing #1 for many batches in parallel. So while batch 1 is working on updating the database, batches 2 through 10 can compute their partial counts.</p>
+<p>The computation of #2 needs to be strongly ordered across the batches, but there’s no reason you shouldn’t be able to <em>pipeline</em> the computation of the batches by computing #1 for many batches in parallel. So while batch 1 is working on updating the database, batches 2 through 10 can compute their partial counts.</p>
 
 <p>Storm accomplishes this distinction by breaking the computation of a batch into two phases:</p>
 
@@ -134,7 +134,7 @@
   <li>The commit phase: The commit phases for batches are strongly ordered. So the commit for batch 2 is not done until the commit for batch 1 has been successful.</li>
 </ol>
 
-<p>The two phases together are called a &#8220;transaction&#8221;. Many batches can be in the processing phase at a given moment, but only one batch can be in the commit phase. If there&#8217;s any failure in the processing or commit phase for a batch, the entire transaction is replayed (both phases).</p>
+<p>The two phases together are called a “transaction”. Many batches can be in the processing phase at a given moment, but only one batch can be in the commit phase. If there’s any failure in the processing or commit phase for a batch, the entire transaction is replayed (both phases).</p>
 
 <h2 id="design-details">Design details</h2>
 
@@ -143,15 +143,15 @@
 <ol>
   <li><em>Manages state:</em> Storm stores in Zookeeper all the state necessary to do transactional topologies. This includes the current transaction id as well as the metadata defining the parameters for each batch.</li>
   <li><em>Coordinates the transactions:</em> Storm will manage everything necessary to determine which transactions should be processing or committing at any point.</li>
-  <li><em>Fault detection:</em> Storm leverages the acking framework to efficiently determine when a batch has successfully processed, successfully committed, or failed. Storm will then replay batches appropriately. You don&#8217;t have to do any acking or anchoring &#8211; Storm manages all of this for you.</li>
+  <li><em>Fault detection:</em> Storm leverages the acking framework to efficiently determine when a batch has been successfully processed, successfully committed, or has failed. Storm will then replay batches appropriately. You don’t have to do any acking or anchoring – Storm manages all of this for you.</li>
   <li><em>First class batch processing API</em>: Storm layers an API on top of regular bolts to allow for batch processing of tuples. Storm manages all the coordination for determining when a task has received all the tuples for that particular transaction. Storm will also take care of cleaning up any accumulated state for each transaction (like the partial counts).</li>
 </ol>
 
-<p>Finally, another thing to note is that transactional topologies require a source queue that can replay an exact batch of messages. Technologies like <a href="https://github.com/robey/kestrel">Kestrel</a> can&#8217;t do this. <a href="http://incubator.apache.org/kafka/index.html">Apache Kafka</a> is a perfect fit for this kind of spout, and <a href="https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka">storm-kafka</a> in <a href="https://github.com/nathanmarz/storm-contrib">storm-contrib</a> contains a transactional spout implementation for Kafka.</p>
+<p>Finally, another thing to note is that transactional topologies require a source queue that can replay an exact batch of messages. Technologies like <a href="https://github.com/robey/kestrel">Kestrel</a> can’t do this. <a href="http://incubator.apache.org/kafka/index.html">Apache Kafka</a> is a perfect fit for this kind of spout, and <a href="https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka">storm-kafka</a> in <a href="https://github.com/nathanmarz/storm-contrib">storm-contrib</a> contains a transactional spout implementation for Kafka.</p>
 
 <h2 id="the-basics-through-example">The basics through example</h2>
 
-<p>You build transactional topologies by using <a href="/apidocs/backtype/storm/transactional/TransactionalTopologyBuilder.html">TransactionalTopologyBuilder</a>. Here&#8217;s the transactional topology definition for a topology that computes the global count of tuples from the input stream. This code comes from <a href="https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/TransactionalGlobalCount.java">TransactionalGlobalCount</a> in storm-starter.</p>
+<p>You build transactional topologies by using <a href="/apidocs/backtype/storm/transactional/TransactionalTopologyBuilder.html">TransactionalTopologyBuilder</a>. Here’s the transactional topology definition for a topology that computes the global count of tuples from the input stream. This code comes from <a href="https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/TransactionalGlobalCount.java">TransactionalGlobalCount</a> in storm-starter.</p>
 
 <p><code>java
 MemoryTransactionalSpout spout = new MemoryTransactionalSpout(DATA, new Fields("word"), PARTITION_TAKE_PER_BATCH);
@@ -168,7 +168,7 @@ builder.setBolt("sum", new UpdateGlobalC
 
 <p>Now on to the bolts. This topology parallelizes the computation of the global count. The first bolt, <code>BatchCount</code>, randomly partitions the input stream using a shuffle grouping and emits the count for each partition. The second bolt, <code>UpdateGlobalCount</code>, does a global grouping and sums together the partial counts to get the count for the batch. It then updates the global count in the database if necessary.</p>
 
-<p>Here&#8217;s the definition of <code>BatchCount</code>:</p>
+<p>Here’s the definition of <code>BatchCount</code>:</p>
 
 <p>```java
 public static class BatchCount extends BaseBatchBolt {
@@ -199,7 +199,7 @@ public void declareOutputFields(OutputFi
 } } ```
 </code></pre>
 
-<p>A new instance of this object is created for every batch that&#8217;s being processed. The actual bolt this runs within is called <a href="https://github.com/apache/incubator-storm/blob/0.7.0/src/jvm/backtype/storm/coordination/BatchBoltExecutor.java">BatchBoltExecutor</a> and manages the creation and cleanup for these objects.</p>
+<p>A new instance of this object is created for every batch that’s being processed. The actual bolt this runs within is called <a href="https://github.com/apache/incubator-storm/blob/0.7.0/src/jvm/backtype/storm/coordination/BatchBoltExecutor.java">BatchBoltExecutor</a> and manages the creation and cleanup for these objects.</p>
 
 <p>The <code>prepare</code> method parameterizes this batch bolt with the Storm config, the topology context, an output collector, and the id for this batch of tuples. In the case of transactional topologies, the id will be a <a href="/apidocs/backtype/storm/transactional/TransactionAttempt.html">TransactionAttempt</a> object. The batch bolt abstraction can be used in Distributed RPC as well which uses a different type of id for the batches. <code>BatchBolt</code> can actually be parameterized with the type of the id, so if you only intend to use the batch bolt for transactional topologies, you can extend <code>BaseTransactionalBolt</code> which has this definition:</p>
 
@@ -210,15 +210,15 @@ public abstract class BaseTransactionalB
 
 <p>All tuples emitted within a transactional topology must have the <code>TransactionAttempt</code> as the first field of the tuple. This lets Storm identify which tuples belong to which batches. So when you emit tuples you need to make sure to meet this requirement.</p>
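 
 <p>For example, a batch bolt emitting a partial count would pass the attempt through as the first value. A sketch, assuming the <code>TransactionAttempt</code> was saved into an <code>_attempt</code> field in <code>prepare</code>:</p>
 
 <p><code>java
 // The TransactionAttempt received in prepare() must lead every emitted tuple.
 _collector.emit(new Values(_attempt, _count));
 </code></p>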
 
-<p>The <code>TransactionAttempt</code> contains two values: the &#8220;transaction id&#8221; and the &#8220;attempt id&#8221;. The &#8220;transaction id&#8221; is the unique id chosen for this batch and is the same no matter how many times the batch is replayed. The &#8220;attempt id&#8221; is a unique id for this particular batch of tuples and lets Storm distinguish tuples from different emissions of the same batch. Without the attempt id, Storm could confuse a replay of a batch with tuples from a prior time that batch was emitted. This would be disastrous.</p>
+<p>The <code>TransactionAttempt</code> contains two values: the “transaction id” and the “attempt id”. The “transaction id” is the unique id chosen for this batch and is the same no matter how many times the batch is replayed. The “attempt id” is a unique id for this particular batch of tuples and lets Storm distinguish tuples from different emissions of the same batch. Without the attempt id, Storm could confuse a replay of a batch with tuples from a prior time that batch was emitted. This would be disastrous.</p>
 
-<p>The transaction id increases by 1 for every batch emitted. So the first batch has id &#8220;1&#8221;, the second has id &#8220;2&#8221;, and so on.</p>
+<p>The transaction id increases by 1 for every batch emitted. So the first batch has id “1”, the second has id “2”, and so on.</p>
 
 <p>The <code>execute</code> method is called for every tuple in the batch. You should accumulate state for the batch in a local instance variable every time this method is called. The <code>BatchCount</code> bolt increments a local counter variable for every tuple.</p>
 
 <p>Finally, <code>finishBatch</code> is called when the task has received all tuples intended for it for this particular batch. <code>BatchCount</code> emits the partial count to the output stream when this method is called.</p>
 
-<p>Here&#8217;s the definition of <code>UpdateGlobalCount</code>:</p>
+<p>Here’s the definition of <code>UpdateGlobalCount</code>:</p>
 
 <p>```java
 public static class UpdateGlobalCount extends BaseTransactionalBolt implements ICommitter {
@@ -280,30 +280,30 @@ public void declareOutputFields(OutputFi
 <p>There are three kinds of bolts possible in a transactional topology:</p>
 
 <ol>
-  <li><a href="/apidocs/backtype/storm/topology/base/BaseBasicBolt.html">BasicBolt</a>: This bolt doesn&#8217;t deal with batches of tuples and just emits tuples based on a single tuple of input.</li>
+  <li><a href="/apidocs/backtype/storm/topology/base/BaseBasicBolt.html">BasicBolt</a>: This bolt doesn’t deal with batches of tuples and just emits tuples based on a single tuple of input.</li>
   <li><a href="/apidocs/backtype/storm/topology/base/BaseBatchBolt.html">BatchBolt</a>: This bolt processes batches of tuples. <code>execute</code> is called for each tuple, and <code>finishBatch</code> is called when the batch is complete.</li>
-  <li>BatchBolt&#8217;s that are marked as committers: The only difference between this bolt and a regular batch bolt is when <code>finishBatch</code> is called. A committer bolt has <code>finishedBatch</code> called during the commit phase. The commit phase is guaranteed to occur only after all prior batches have successfully committed, and it will be retried until all bolts in the topology succeed the commit for the batch. There are two ways to make a <code>BatchBolt</code> a committer, by having the <code>BatchBolt</code> implement the <a href="/apidocs/backtype/storm/transactional/ICommitter.html">ICommitter</a> marker interface, or by using the <code>setCommiterBolt</code> method in <code>TransactionalTopologyBuilder</code>.</li>
+  <li>BatchBolts that are marked as committers: The only difference between this bolt and a regular batch bolt is when <code>finishBatch</code> is called. A committer bolt has <code>finishBatch</code> called during the commit phase. The commit phase is guaranteed to occur only after all prior batches have successfully committed, and it will be retried until all bolts in the topology succeed in committing the batch. There are two ways to make a <code>BatchBolt</code> a committer: by having the <code>BatchBolt</code> implement the <a href="/apidocs/backtype/storm/transactional/ICommitter.html">ICommitter</a> marker interface, or by using the <code>setCommitterBolt</code> method in <code>TransactionalTopologyBuilder</code>.</li>
 </ol>
 
 <h4 id="processing-phase-vs-commit-phase-in-bolts">Processing phase vs. commit phase in bolts</h4>
 
-<p>To nail down the difference between the processing phase and commit phase of a transaction, let&#8217;s look at an example topology:</p>
+<p>To nail down the difference between the processing phase and commit phase of a transaction, let’s look at an example topology:</p>
 
 <p><img src="images/transactional-commit-flow.png" alt="Storm cluster" /></p>
 
 <p>In this topology, only the bolts with a red outline are committers.</p>
 
-<p>During the processing phase, bolt A will process the complete batch from the spout, call <code>finishBatch</code> and send its tuples to bolts B and C. Bolt B is a committer so it will process all the tuples but finishBatch won&#8217;t be called. Bolt C also will not have <code>finishBatch</code> called because it doesn&#8217;t know if it has received all the tuples from Bolt B yet (because Bolt B is waiting for the transaction to commit). Finally, Bolt D will receive any tuples Bolt C emitted during invocations of its <code>execute</code> method.</p>
+<p>During the processing phase, bolt A will process the complete batch from the spout, call <code>finishBatch</code>, and send its tuples to bolts B and C. Bolt B is a committer, so it will process all the tuples, but <code>finishBatch</code> won’t be called. Bolt C also will not have <code>finishBatch</code> called because it doesn’t know if it has received all the tuples from Bolt B yet (because Bolt B is waiting for the transaction to commit). Finally, Bolt D will receive any tuples Bolt C emitted during invocations of its <code>execute</code> method.</p>
 
 <p>When the batch commits, <code>finishBatch</code> is called on Bolt B. Once it finishes, Bolt C can now detect that it has received all the tuples and will call <code>finishBatch</code>. Finally, Bolt D will receive its complete batch and call <code>finishBatch</code>.</p>
 
-<p>Notice that even though Bolt D is a committer, it doesn&#8217;t have to wait for a second commit message when it receives the whole batch. Since it receives the whole batch during the commit phase, it goes ahead and completes the transaction.</p>
+<p>Notice that even though Bolt D is a committer, it doesn’t have to wait for a second commit message when it receives the whole batch. Since it receives the whole batch during the commit phase, it goes ahead and completes the transaction.</p>
 
 <p>Committer bolts act just like batch bolts during the commit phase. The only difference between committer bolts and batch bolts is that committer bolts will not call <code>finishBatch</code> during the processing phase of a transaction.</p>
 
 <h4 id="acking">Acking</h4>
 
-<p>Notice that you don&#8217;t have to do any acking or anchoring when working with transactional topologies. Storm manages all of that underneath the hood. The acking strategy is heavily optimized.</p>
+<p>Notice that you don’t have to do any acking or anchoring when working with transactional topologies. Storm manages all of that under the hood. The acking strategy is heavily optimized.</p>
 
 <h4 id="failing-a-transaction">Failing a transaction</h4>
 
@@ -317,7 +317,7 @@ public void declareOutputFields(OutputFi
 
 <p><img src="images/transactional-spout-structure.png" alt="Storm cluster" /></p>
 
-<p>The coordinator on the left is a regular Storm spout that emits a tuple whenever a batch should be emitted for a transaction. The emitters execute as a regular Storm bolt and are responsible for emitting the actual tuples for the batch. The emitters subscribe to the &#8220;batch emit&#8221; stream of the coordinator using an all grouping.</p>
+<p>The coordinator on the left is a regular Storm spout that emits a tuple whenever a batch should be emitted for a transaction. The emitters execute as a regular Storm bolt and are responsible for emitting the actual tuples for the batch. The emitters subscribe to the “batch emit” stream of the coordinator using an all grouping.</p>
 
 <p>A <code>TransactionalSpout</code> must be idempotent with respect to the tuples it emits, which requires it to store a small amount of state. The state is stored in Zookeeper.</p>
 
@@ -329,14 +329,14 @@ public void declareOutputFields(OutputFi
 
 <h3 id="configuration">Configuration</h3>
 
-<p>There&#8217;s two important bits of configuration for transactional topologies:</p>
+<p>There are two important bits of configuration for transactional topologies:</p>
 
 <ol>
-  <li><em>Zookeeper:</em> By default, transactional topologies will store state in the same Zookeeper instance as used to manage the Storm cluster. You can override this with the &#8220;transactional.zookeeper.servers&#8221; and &#8220;transactional.zookeeper.port&#8221; configs.</li>
-  <li><em>Number of active batches permissible at once:</em> You must set a limit to the number of batches that can be processed at once. You configure this using the &#8220;topology.max.spout.pending&#8221; config. If you don&#8217;t set this config, it will default to 1.</li>
+  <li><em>Zookeeper:</em> By default, transactional topologies will store state in the same Zookeeper instance as used to manage the Storm cluster. You can override this with the “transactional.zookeeper.servers” and “transactional.zookeeper.port” configs.</li>
+  <li><em>Number of active batches permissible at once:</em> You must set a limit to the number of batches that can be processed at once. You configure this using the “topology.max.spout.pending” config. If you don’t set this config, it will default to 1.</li>
 </ol>
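 
 <p>A sketch of setting these configs when building a topology (assuming the standard <code>backtype.storm.Config</code> and <code>java.util.Arrays</code> imports; the Zookeeper addresses are placeholders):</p>
 
 <p><code>java
 Config conf = new Config();
 // Optionally keep transactional state in a separate Zookeeper ensemble.
 conf.put(Config.TRANSACTIONAL_ZOOKEEPER_SERVERS, Arrays.asList("zk1.example.com", "zk2.example.com"));
 conf.put(Config.TRANSACTIONAL_ZOOKEEPER_PORT, 2181);
 // Allow up to 3 batches to be active at once (defaults to 1).
 conf.setMaxSpoutPending(3);
 </code></p>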
 
-<h2 id="what-if-you-cant-emit-the-same-batch-of-tuples-for-a-given-transaction-id">What if you can&#8217;t emit the same batch of tuples for a given transaction id?</h2>
+<h2 id="what-if-you-cant-emit-the-same-batch-of-tuples-for-a-given-transaction-id">What if you can’t emit the same batch of tuples for a given transaction id?</h2>
 
 <p>So far the discussion around transactional topologies has assumed that you can always emit the exact same batch of tuples for the same transaction id. So what do you do if this is not possible?</p>
 
@@ -344,7 +344,7 @@ public void declareOutputFields(OutputFi
 
 <p>It turns out that you can still achieve exactly-once messaging semantics in your processing with a non-idempotent transactional spout, although this requires a bit more work on your part in developing the topology.</p>
 
-<p>If a batch can change for a given transaction id, then the logic we&#8217;ve been using so far of &#8220;skip the update if the transaction id in the database is the same as the id for the current transaction&#8221; is no longer valid. This is because the current batch is different than the batch for the last time the transaction was committed, so the result will not necessarily be the same. You can fix this problem by storing a little bit more state in the database. Let&#8217;s again use the example of storing a global count in the database and suppose the partial count for the batch is stored in the <code>partialCount</code> variable.</p>
+<p>If a batch can change for a given transaction id, then the logic we’ve been using so far of “skip the update if the transaction id in the database is the same as the id for the current transaction” is no longer valid. This is because the current batch is different than the batch for the last time the transaction was committed, so the result will not necessarily be the same. You can fix this problem by storing a little bit more state in the database. Let’s again use the example of storing a global count in the database and suppose the partial count for the batch is stored in the <code>partialCount</code> variable.</p>
 
 <p>Instead of storing a value in the database that looks like this:</p>
 
@@ -374,7 +374,7 @@ class Value {
 
 <p>This logic works because once you commit a particular transaction id for the first time, all prior transaction ids will never be committed again.</p>
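 
 <p>Concretely, a sketch of that commit logic for the global count (the field names are illustrative):</p>
 
 <p><code>java
 class Value {
   long prevCount; // count as of the previous committed transaction
   long currCount; // count including the latest committed transaction
   long txid;      // id of the transaction that last updated this value
 }
 
 void commit(Value val, long txid, long partialCount) {
   if (val.txid == txid) {
     // Same transaction replayed, possibly with a different batch:
     // recompute on top of the previous value instead of skipping.
     val.currCount = val.prevCount + partialCount;
   } else {
     // New transaction: roll the current value into the previous slot.
     val.prevCount = val.currCount;
     val.currCount += partialCount;
     val.txid = txid;
   }
 }
 </code></p>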
 
-<p>There&#8217;s a few more subtle aspects of transactional topologies that make opaque transactional spouts possible.</p>
+<p>There are a few more subtle aspects of transactional topologies that make opaque transactional spouts possible.</p>
 
 <p>When a transaction fails, all subsequent transactions in the processing phase are considered failed as well. Each of those transactions will be re-emitted and reprocessed. Without this behavior, the following situation could happen:</p>
 
@@ -403,26 +403,26 @@ class Value {
 
 <p>By failing all subsequent transactions on failure, no tuples are skipped. This also shows that a requirement of transactional spouts is that they always emit where the last transaction left off.</p>
 
-<p>A non-idempotent transactional spout is more concisely referred to as an &#8220;OpaqueTransactionalSpout&#8221; (opaque is the opposite of idempotent). <a href="/apidocs/backtype/storm/transactional/partitioned/IOpaquePartitionedTransactionalSpout.html">IOpaquePartitionedTransactionalSpout</a> is an interface for implementing opaque partitioned transactional spouts, of which <a href="https://github.com/nathanmarz/storm-contrib/blob/kafka0.7/storm-kafka/src/jvm/storm/kafka/OpaqueTransactionalKafkaSpout.java">OpaqueTransactionalKafkaSpout</a> is an example. <code>OpaqueTransactionalKafkaSpout</code> can withstand losing individual Kafka nodes without sacrificing accuracy as long as you use the update strategy as explained in this section.</p>
+<p>A non-idempotent transactional spout is more concisely referred to as an “OpaqueTransactionalSpout” (opaque is the opposite of idempotent). <a href="/apidocs/backtype/storm/transactional/partitioned/IOpaquePartitionedTransactionalSpout.html">IOpaquePartitionedTransactionalSpout</a> is an interface for implementing opaque partitioned transactional spouts, of which <a href="https://github.com/nathanmarz/storm-contrib/blob/kafka0.7/storm-kafka/src/jvm/storm/kafka/OpaqueTransactionalKafkaSpout.java">OpaqueTransactionalKafkaSpout</a> is an example. <code>OpaqueTransactionalKafkaSpout</code> can withstand losing individual Kafka nodes without sacrificing accuracy as long as you use the update strategy as explained in this section.</p>
 
 <h2 id="implementation">Implementation</h2>
 
-<p>The implementation for transactional topologies is very elegant. Managing the commit protocol, detecting failures, and pipelining batches seem complex, but everything turns out to be a straightforward mapping to Storm&#8217;s primitives.</p>
+<p>The implementation for transactional topologies is very elegant. Managing the commit protocol, detecting failures, and pipelining batches seem complex, but everything turns out to be a straightforward mapping to Storm’s primitives.</p>
 
 <p>How the data flow works:</p>
 
-<p>Here&#8217;s how transactional spout works:</p>
+<p>Here’s how the transactional spout works:</p>
 
 <ol>
   <li>The transactional spout is a subtopology consisting of a coordinator spout and an emitter bolt</li>
   <li>The coordinator is a regular spout with a parallelism of 1</li>
-  <li>The emitter is a bolt with a parallelism of P, connected to the coordinator&#8217;s &#8220;batch&#8221; stream using an all grouping</li>
-  <li>When the coordinator determines it&#8217;s time to enter the processing phase for a transaction, it emits a tuple containing the TransactionAttempt and the metadata for that transaction to the &#8220;batch&#8221; stream</li>
-  <li>Because of the all grouping, every single emitter task receives the notification that it&#8217;s time to emit its portion of the tuples for that transaction attempt</li>
-  <li>Storm automatically manages the anchoring/acking necessary throughout the whole topology to determine when a transaction has completed the processing phase. The key here is that *the root tuple was created by the coordinator, so the coordinator will receive an &#8220;ack&#8221; if the processing phase succeeds, and a &#8220;fail&#8221; if it doesn&#8217;t succeed for any reason (failure or timeout).</li>
-  <li>If the processing phase succeeds, and all prior transactions have successfully committed, the coordinator emits a tuple containing the TransactionAttempt to the &#8220;commit&#8221; stream.</li>
+  <li>The emitter is a bolt with a parallelism of P, connected to the coordinator’s “batch” stream using an all grouping</li>
+  <li>When the coordinator determines it’s time to enter the processing phase for a transaction, it emits a tuple containing the TransactionAttempt and the metadata for that transaction to the “batch” stream</li>
+  <li>Because of the all grouping, every single emitter task receives the notification that it’s time to emit its portion of the tuples for that transaction attempt</li>
+  <li>Storm automatically manages the anchoring/acking necessary throughout the whole topology to determine when a transaction has completed the processing phase. The key here is that <em>the root tuple was created by the coordinator</em>, so the coordinator will receive an “ack” if the processing phase succeeds, and a “fail” if it doesn’t succeed for any reason (failure or timeout).</li>
+  <li>If the processing phase succeeds, and all prior transactions have successfully committed, the coordinator emits a tuple containing the TransactionAttempt to the “commit” stream.</li>
   <li>All committing bolts subscribe to the commit stream using an all grouping, so that they will all receive a notification when the commit happens.</li>
-  <li>Like the processing phase, the coordinator uses the acking framework to determine whether the commit phase succeeded or not. If it receives an &#8220;ack&#8221;, it marks that transaction as complete in zookeeper.</li>
+  <li>Like the processing phase, the coordinator uses the acking framework to determine whether the commit phase succeeded or not. If it receives an “ack”, it marks that transaction as complete in zookeeper.</li>
 </ol>
 
 <p>More notes:</p>
@@ -442,8 +442,8 @@ class Value {
   <li>CoordinatedBolt is used to detect when a bolt has received all the tuples for a particular batch.
     <ul>
       <li>this is the same abstraction that is used in DRPC</li>
-      <li>for commiting bolts, it waits to receive a tuple from the coordinator&#8217;s commit stream before calling finishbatch</li>
-      <li>so it can&#8217;t call finishbatch until it&#8217;s received all tuples from all subscribed components AND its received the commit stream tuple (for committers). this ensures that it can&#8217;t prematurely call finishBatch</li>
+      <li>for committing bolts, it waits to receive a tuple from the coordinator’s commit stream before calling <code>finishBatch</code></li>
+      <li>so it can’t call <code>finishBatch</code> until it’s received all tuples from all subscribed components AND it’s received the commit stream tuple (for committers). this ensures that it can’t prematurely call <code>finishBatch</code></li>
     </ul>
   </li>
 </ul>

Modified: incubator/storm/site/publish/documentation/Trident-API-Overview.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Trident-API-Overview.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Trident-API-Overview.html (original)
+++ incubator/storm/site/publish/documentation/Trident-API-Overview.html Tue May 27 18:39:07 2014
@@ -67,13 +67,13 @@
 <div id="aboutcontent">
 <h1 id="trident-api-overview">Trident API overview</h1>
 
-<p>The core data model in Trident is the &#8220;Stream&#8221;, processed as a series of batches. A stream is partitioned among the nodes in the cluster, and operations applied to a stream are applied in parallel across each partition.</p>
+<p>The core data model in Trident is the “Stream”, processed as a series of batches. A stream is partitioned among the nodes in the cluster, and operations applied to a stream are applied in parallel across each partition.</p>
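 
 <p>A stream is created from a spout. For example (a sketch, where <code>spout</code> is any of the spout types described in the Trident spouts doc):</p>
 
 <p><code>java
 TridentTopology topology = new TridentTopology();
 Stream mystream = topology.newStream("myspoutid", spout);
 </code></p>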
 
 <p>There are five kinds of operations in Trident:</p>
 
 <ol>
   <li>Operations that apply locally to each partition and cause no network transfer</li>
-  <li>Repartitioning operations that repartition a stream but otherwise don&#8217;t change the contents (involves network transfer)</li>
+  <li>Repartitioning operations that repartition a stream but otherwise don’t change the contents (involves network transfer)</li>
   <li>Aggregation operations that do network transfer as part of the operation</li>
   <li>Operations on grouped streams</li>
   <li>Merges and joins</li>
@@ -97,7 +97,7 @@ public class MyFunction extends BaseFunc
 }
 </code></p>
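 
 <p>The class body is mostly elided by the diff context above; a sketch of a function consistent with the example output below (it emits the integers from 0 up to, but not including, its input value):</p>
 
 <p><code>java
 public class MyFunction extends BaseFunction {
   @Override
   public void execute(TridentTuple tuple, TridentCollector collector) {
     // Emit one tuple for each integer in [0, b).
     for (int i = 0; i &lt; tuple.getInteger(0); i++) {
       collector.emit(new Values(i));
     }
   }
 }
 </code></p>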
 
-<p>Now suppose you have a stream in the variable &#8220;mystream&#8221; with the fields [&#8220;a&#8221;, &#8220;b&#8221;, &#8220;c&#8221;] with the following tuples:</p>
+<p>Now suppose you have a stream in the variable “mystream” with the fields [“a”, “b”, “c”] with the following tuples:</p>
 
 <p><code>
 [1, 2, 3]
@@ -111,7 +111,7 @@ public class MyFunction extends BaseFunc
 mystream.each(new Fields("b"), new MyFunction(), new Fields("d"))
 </code></p>
 
-<p>The resulting tuples would have fields [&#8220;a&#8221;, &#8220;b&#8221;, &#8220;c&#8221;, &#8220;d&#8221;] and look like this:</p>
+<p>The resulting tuples would have fields [“a”, “b”, “c”, “d”] and look like this:</p>
 
 <p><code>
 [1, 2, 3, 0]
@@ -131,7 +131,7 @@ public class MyFilter extends BaseFuncti
 }
 </code></p>
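 
 <p>The filter body is likewise elided; here is a sketch using Trident’s <code>BaseFilter</code>, whose <code>isKeep</code> method decides whether to keep each tuple (the condition is illustrative):</p>
 
 <p><code>java
 public class MyFilter extends BaseFilter {
   @Override
   public boolean isKeep(TridentTuple tuple) {
     // Keep tuples whose first input field is 1 and second is 2.
     return tuple.getInteger(0) == 1 && tuple.getInteger(1) == 2;
   }
 }
 </code></p>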
 
-<p>Now suppose you had these tuples with fields [&#8220;a&#8221;, &#8220;b&#8221;, &#8220;c&#8221;]:</p>
+<p>Now suppose you had these tuples with fields [“a”, “b”, “c”]:</p>
 
 <p><code>
 [1, 2, 3]
@@ -159,24 +159,24 @@ mystream.each(new Fields("b", "a"), new 
 mystream.partitionAggregate(new Fields("b"), new Sum(), new Fields("sum"))
 </code></p>
 
-<p>Suppose the input stream contained fields [&#8220;a&#8221;, &#8220;b&#8221;] and the following partitions of tuples:</p>
+<p>Suppose the input stream contained fields [“a”, “b”] and the following partitions of tuples:</p>
 
 <p>```
 Partition 0:
-[&#8220;a&#8221;, 1]
-[&#8220;b&#8221;, 2]</p>
+[“a”, 1]
+[“b”, 2]</p>
 
 <p>Partition 1:
-[&#8220;a&#8221;, 3]
-[&#8220;c&#8221;, 8]</p>
+[“a”, 3]
+[“c”, 8]</p>
 
 <p>Partition 2:
-[&#8220;e&#8221;, 1]
-[&#8220;d&#8221;, 9]
-[&#8220;d&#8221;, 10]
+[“e”, 1]
+[“d”, 9]
+[“d”, 10]
 ```</p>
 
-<p>Then the output stream of that code would contain these tuples with one field called &#8220;sum&#8221;:</p>
+<p>Then the output stream of that code would contain these tuples with one field called “sum”:</p>
 
 <p>```
 Partition 0:
@@ -191,7 +191,7 @@ Partition 0:
 
 <p>There are three different interfaces for defining aggregators: CombinerAggregator, ReducerAggregator, and Aggregator.</p>
 
-<p>Here&#8217;s the interface for CombinerAggregator:</p>
+<p>Here’s the interface for CombinerAggregator:</p>
 
 <p><code>java
 public interface CombinerAggregator&lt;T&gt; extends Serializable {
@@ -201,7 +201,7 @@ public interface CombinerAggregator&lt;T
 }
 </code></p>
 
-<p>A CombinerAggregator returns a single tuple with a single field as output. CombinerAggregators run the init function on each input tuple and use the combine function to combine values until there&#8217;s only one value left. If there&#8217;s no tuples in the partition, the CombinerAggregator emits the output of the zero function. For example, here&#8217;s the implementation of Count:</p>
+<p>A CombinerAggregator returns a single tuple with a single field as output. CombinerAggregators run the init function on each input tuple and use the combine function to combine values until there’s only one value left. If there are no tuples in the partition, the CombinerAggregator emits the output of the zero function. For example, here’s the implementation of Count:</p>
 
 <p>```java
 public class Count implements CombinerAggregator&lt;Long&gt; {
@@ -229,7 +229,7 @@ public interface ReducerAggregator&lt;T&
 }
 </code></p>
 
-<p>A ReducerAggregator produces an initial value with init, and then it iterates on that value for each input tuple to produce a single tuple with a single value as output. For example, here&#8217;s how you would define Count as a ReducerAggregator:</p>
+<p>A ReducerAggregator produces an initial value with init, and then it iterates on that value for each input tuple to produce a single tuple with a single value as output. For example, here’s how you would define Count as a ReducerAggregator:</p>
 
 <p>```java
 public class Count implements ReducerAggregator&lt;Long&gt; {
@@ -242,7 +242,7 @@ public class Count implements ReducerAgg
 } } ```
 </code></pre>
 
-<p>ReducerAggregator can also be used with persistentAggregate, as you&#8217;ll see later.</p>
+<p>ReducerAggregator can also be used with persistentAggregate, as you’ll see later.</p>
 
 <p>The most general interface for performing aggregations is Aggregator, which looks like this:</p>
 
@@ -262,7 +262,7 @@ public interface Aggregator&lt;T&gt; ext
   <li>The complete method is called when all tuples for the batch partition have been processed by aggregate. </li>
 </ol>
 
-<p>Here&#8217;s how you would implement Count as an Aggregator:</p>
+<p>Here’s how you would implement Count as an Aggregator:</p>
 
 <p>```java
 public class CountAgg extends BaseAggregator&lt;CountState&gt; {
@@ -292,7 +292,7 @@ mystream.chainedAgg()
         .chainEnd()
 </code></p>
 
-<p>This code will run the Count and Sum aggregators on each partition. The output will contain a single tuple with the fields [&#8220;count&#8221;, &#8220;sum&#8221;].</p>
+<p>This code will run the Count and Sum aggregators on each partition. The output will contain a single tuple with the fields [“count”, “sum”].</p>
 
 <h3 id="statequery-and-partitionpersist">stateQuery and partitionPersist</h3>
 
@@ -300,13 +300,13 @@ mystream.chainedAgg()
 
 <h3 id="projection">projection</h3>
 
-<p>The projection method on Stream keeps only the fields specified in the operation. If you had a Stream with fields [&#8220;a&#8221;, &#8220;b&#8221;, &#8220;c&#8221;, &#8220;d&#8221;] and you ran this code:</p>
+<p>The projection method on Stream keeps only the fields specified in the operation. If you had a Stream with fields [“a”, “b”, “c”, “d”] and you ran this code:</p>
 
 <p><code>java
 mystream.project(new Fields("b", "d"))
 </code></p>
 
-<p>The output stream would contain only the fields [&#8220;b&#8221;, &#8220;d&#8221;].</p>
+<p>The output stream would contain only the fields [“b”, “d”].</p>
 
 <h2 id="repartitioning-operations">Repartitioning operations</h2>
 
@@ -325,9 +325,9 @@ mystream.project(new Fields("b", "d"))
 
 <p>Trident has aggregate and persistentAggregate methods for doing aggregations on Streams. aggregate is run on each batch of the stream in isolation, while persistentAggregate will aggregate across all tuples in all batches in the stream and store the result in a source of state.</p>
 
-<p>Running aggregate on a Stream does a global aggregation. When you use a ReducerAggregator or an Aggregator, the stream is first repartitioned into a single partition, and then the aggregation function is run on that partition. When you use a CombinerAggregator, on the other hand, first Trident will compute partial aggregations of each partition, then repartition to a single partition, and then finish the aggregation after the network transfer. CombinerAggregator&#8217;s are far more efficient and should be used when possible.</p>
+<p>Running aggregate on a Stream does a global aggregation. When you use a ReducerAggregator or an Aggregator, the stream is first repartitioned into a single partition, and then the aggregation function is run on that partition. When you use a CombinerAggregator, on the other hand, first Trident will compute partial aggregations of each partition, then repartition to a single partition, and then finish the aggregation after the network transfer. CombinerAggregators are far more efficient and should be used when possible.</p>
 
-<p>Here&#8217;s an example of using aggregate to get a global count for a batch:</p>
+<p>Here’s an example of using aggregate to get a global count for a batch:</p>
 
 <p><code>java
 mystream.aggregate(new Count(), new Fields("count"))
@@ -339,7 +339,7 @@ mystream.aggregate(new Count(), new Fiel
 
 <h2 id="operations-on-grouped-streams">Operations on grouped streams</h2>
 
-<p>The groupBy operation repartitions the stream by doing a partitionBy on the specified fields, and then within each partition groups tuples together whose group fields are equal. For example, here&#8217;s an illustration of a groupBy operation:</p>
+<p>The groupBy operation repartitions the stream by doing a partitionBy on the specified fields, and then within each partition groups tuples together whose group fields are equal. For example, here’s an illustration of a groupBy operation:</p>
 
 <p><img src="images/grouping.png" alt="Grouping" /></p>
 
@@ -357,26 +357,26 @@ topology.merge(stream1, stream2, stream3
 
 <p>Trident will name the output fields of the new, merged stream as the output fields of the first stream.</p>
 
-<p>Another way to combine streams is with a join. Now, a standard join, like the kind from SQL, require finite input. So they don&#8217;t make sense with infinite streams. Joins in Trident only apply within each small batch that comes off of the spout. </p>
+<p>Another way to combine streams is with a join. Now, standard joins, like the kind from SQL, require finite input, so they don’t make sense with infinite streams. Joins in Trident only apply within each small batch that comes off of the spout.</p>
 
-<p>Here&#8217;s an example join between a stream containing fields [&#8220;key&#8221;, &#8220;val1&#8221;, &#8220;val2&#8221;] and another stream containing [&#8220;x&#8221;, &#8220;val1&#8221;]:</p>
+<p>Here’s an example join between a stream containing fields [“key”, “val1”, “val2”] and another stream containing [“x”, “val1”]:</p>
 
 <p><code>java
 topology.join(stream1, new Fields("key"), stream2, new Fields("x"), new Fields("key", "a", "b", "c"));
 </code></p>
 
-<p>This joins stream1 and stream2 together using &#8220;key&#8221; and &#8220;x&#8221; as the join fields for each respective stream. Then, Trident requires that all the output fields of the new stream be named, since the input streams could have overlapping field names. The tuples emitted from the join will contain:</p>
+<p>This joins stream1 and stream2 together using “key” and “x” as the join fields for each respective stream. Trident requires that all the output fields of the new stream be named, since the input streams could have overlapping field names. The tuples emitted from the join will contain:</p>
 
 <ol>
-  <li>First, the list of join fields. In this case, &#8220;key&#8221; corresponds to &#8220;key&#8221; from stream1 and &#8220;x&#8221; from stream2.</li>
-  <li>Next, a list of all non-join fields from all streams, in order of how the streams were passed to the join method. In this case, &#8220;a&#8221; and &#8220;b&#8221; correspond to &#8220;val1&#8221; and &#8220;val2&#8221; from stream1, and &#8220;c&#8221; corresponds to &#8220;val1&#8221; from stream2.</li>
+  <li>First, the list of join fields. In this case, “key” corresponds to “key” from stream1 and “x” from stream2.</li>
+  <li>Next, a list of all non-join fields from all streams, in order of how the streams were passed to the join method. In this case, “a” and “b” correspond to “val1” and “val2” from stream1, and “c” corresponds to “val1” from stream2.</li>
 </ol>
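 
 <p>For instance, with illustrative values, joining the tuple ["key1", 10, 20] from stream1 with ["key1", 30] from stream2 would emit:</p>
 
 <p><code>
 ["key1", 10, 20, 30]
 </code></p>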
 
 <p>When a join happens between streams originating from different spouts, those spouts will be synchronized with how they emit batches. That is, a batch of processing will include tuples from each spout.</p>
 
-<p>You might be wondering – how do you do something like a &#8220;windowed join&#8221;, where tuples from one side of the join are joined against the last hour of tuples from the other side of the join.</p>
+<p>You might be wondering – how do you do something like a “windowed join”, where tuples from one side of the join are joined against the last hour of tuples from the other side of the join?</p>
 
-<p>To do this, you would make use of partitionPersist and stateQuery. The last hour of tuples from one side of the join would be stored and rotated in a source of state, keyed by the join field. Then the stateQuery would do lookups by the join field to perform the &#8220;join&#8221;.</p>
+<p>To do this, you would make use of partitionPersist and stateQuery. The last hour of tuples from one side of the join would be stored and rotated in a source of state, keyed by the join field. Then the stateQuery would do lookups by the join field to perform the “join”.</p>
 
 </div>
 </div>

Modified: incubator/storm/site/publish/documentation/Trident-spouts.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Trident-spouts.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Trident-spouts.html (original)
+++ incubator/storm/site/publish/documentation/Trident-spouts.html Tue May 27 18:39:07 2014
@@ -90,22 +90,22 @@ topology.newStream("myspoutid", new MyRi
 
 <h2 id="pipelining">Pipelining</h2>
 
-<p>By default, Trident processes a single batch at a time, waiting for the batch to succeed or fail before trying another batch. You can get significantly higher throughput – and lower latency of processing of each batch – by pipelining the batches. You configure the maximum amount of batches to be processed simultaneously with the &#8220;topology.max.spout.pending&#8221; property. </p>
+<p>By default, Trident processes a single batch at a time, waiting for the batch to succeed or fail before trying another batch. You can get significantly higher throughput – and lower latency of processing of each batch – by pipelining the batches. You configure the maximum number of batches to be processed simultaneously with the “topology.max.spout.pending” property.</p>
 
-<p>Even while processing multiple batches simultaneously, Trident will order any state updates taking place in the topology among batches. For example, suppose you&#8217;re doing a global count aggregation into a database. The idea is that while you&#8217;re updating the count in the database for batch 1, you can still be computing the partial counts for batches 2 through 10. Trident won&#8217;t move on to the state updates for batch 2 until the state updates for batch 1 have succeeded. This is essential for achieving exactly-once processing semantics, as outline in <a href="Trident-state.html">Trident state doc</a>.</p>
+<p>Even while processing multiple batches simultaneously, Trident will order any state updates taking place in the topology among batches. For example, suppose you’re doing a global count aggregation into a database. The idea is that while you’re updating the count in the database for batch 1, you can still be computing the partial counts for batches 2 through 10. Trident won’t move on to the state updates for batch 2 until the state updates for batch 1 have succeeded. This is essential for achieving exactly-once processing semantics, as outlined in the <a href="Trident-state.html">Trident state doc</a>.</p>
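 
 <p>A sketch of enabling pipelining when submitting a Trident topology (assuming <code>topology</code> is a <code>TridentTopology</code>):</p>
 
 <p><code>java
 Config conf = new Config();
 conf.setMaxSpoutPending(10); // process up to 10 batches simultaneously
 StormSubmitter.submitTopology("mytopology", conf, topology.build());
 </code></p>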
 
 <h2 id="trident-spout-types">Trident spout types</h2>
 
 <p>The following spout APIs are available:</p>
 
 <ol>
-  <li><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/spout/ITridentSpout.java">ITridentSpout</a>: The most general API that can support transactional or opaque transactional semantics. Generally you&#8217;ll use one of the partitioned flavors of this API rather than this one directly.</li>
+  <li><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/spout/ITridentSpout.java">ITridentSpout</a>: The most general API that can support transactional or opaque transactional semantics. Generally you’ll use one of the partitioned flavors of this API rather than this one directly.</li>
   <li><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/spout/IBatchSpout.java">IBatchSpout</a>: A non-transactional spout that emits batches of tuples at a time</li>
   <li><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/spout/IPartitionedTridentSpout.java">IPartitionedTridentSpout</a>: A transactional spout that reads from a partitioned data source (like a cluster of Kafka servers)</li>
   <li><a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/spout/IOpaquePartitionedTridentSpout.java">IOpaquePartitionedTridentSpout</a>: An opaque transactional spout that reads from a partitioned data source</li>
 </ol>
 
-<p>And, like mentioned in the beginning of this tutorial, you can use regular IRichSpout&#8217;s as well.</p>
+<p>And, as mentioned at the beginning of this tutorial, you can use regular IRichSpouts as well.</p>
 
 
 </div>

Modified: incubator/storm/site/publish/documentation/Trident-state.html
URL: http://svn.apache.org/viewvc/incubator/storm/site/publish/documentation/Trident-state.html?rev=1597847&r1=1597846&r2=1597847&view=diff
==============================================================================
--- incubator/storm/site/publish/documentation/Trident-state.html (original)
+++ incubator/storm/site/publish/documentation/Trident-state.html Tue May 27 18:39:07 2014
@@ -67,45 +67,45 @@
 <div id="aboutcontent">
 <h1 id="state-in-trident">State in Trident</h1>
 
-<p>Trident has first-class abstractions for reading from and writing to stateful sources. The state can either be internal to the topology – e.g., kept in-memory and backed by HDFS – or externally stored in a database like Memcached or Cassandra. There&#8217;s no difference in the Trident API for either case.</p>
+<p>Trident has first-class abstractions for reading from and writing to stateful sources. The state can either be internal to the topology – e.g., kept in-memory and backed by HDFS – or externally stored in a database like Memcached or Cassandra. There’s no difference in the Trident API for either case.</p>
 
 <p>Trident manages state in a fault-tolerant way so that state updates are idempotent in the face of retries and failures. This lets you reason about Trident topologies as if each message were processed exactly once.</p>
 
-<p>There&#8217;s various levels of fault-tolerance possible when doing state updates. Before getting to those, let&#8217;s look at an example that illustrates the tricks necessary to achieve exactly-once semantics. Suppose that you&#8217;re doing a count aggregation of your stream and want to store the running count in a database. Now suppose you store in the database a single value representing the count, and every time you process a new tuple you increment the count.</p>
+<p>There are various levels of fault-tolerance possible when doing state updates. Before getting to those, let’s look at an example that illustrates the tricks necessary to achieve exactly-once semantics. Suppose that you’re doing a count aggregation of your stream and want to store the running count in a database. Now suppose you store in the database a single value representing the count, and every time you process a new tuple you increment the count.</p>
 
-<p>When failures occur, tuples will be replayed. This brings up a problem when doing state updates (or anything with side effects) – you have no idea if you&#8217;ve ever successfully updated the state based on this tuple before. Perhaps you never processed the tuple before, in which case you should increment the count. Perhaps you&#8217;ve processed the tuple and successfully incremented the count, but the tuple failed processing in another step. In this case, you should not increment the count. Or perhaps you saw the tuple before but got an error when updating the database. In this case, you <em>should</em> update the database.</p>
+<p>When failures occur, tuples will be replayed. This brings up a problem when doing state updates (or anything with side effects) – you have no idea if you’ve ever successfully updated the state based on this tuple before. Perhaps you never processed the tuple before, in which case you should increment the count. Perhaps you’ve processed the tuple and successfully incremented the count, but the tuple failed processing in another step. In this case, you should not increment the count. Or perhaps you saw the tuple before but got an error when updating the database. In this case, you <em>should</em> update the database.</p>
 
 <p>By just storing the count in the database, you have no idea whether or not this tuple has been processed before. So you need more information in order to make the right decision. Trident provides the following semantics which are sufficient for achieving exactly-once processing semantics:</p>
 
 <ol>
   <li>Tuples are processed as small batches (see <a href="Trident-tutorial.html">the tutorial</a>)</li>
-  <li>Each batch of tuples is given a unique id called the &#8220;transaction id&#8221; (txid). If the batch is replayed, it is given the exact same txid.</li>
-  <li>State updates are ordered among batches. That is, the state updates for batch 3 won&#8217;t be applied until the state updates for batch 2 have succeeded.</li>
+  <li>Each batch of tuples is given a unique id called the “transaction id” (txid). If the batch is replayed, it is given the exact same txid.</li>
+  <li>State updates are ordered among batches. That is, the state updates for batch 3 won’t be applied until the state updates for batch 2 have succeeded.</li>
 </ol>
 
-<p>With these primitives, your State implementation can detect whether or not the batch of tuples has been processed before and take the appropriate action to update the state in a consistent way. The action you take depends on the exact semantics provided by your input spouts as to what&#8217;s in each batch. There&#8217;s three kinds of spouts possible with respect to fault-tolerance: &#8220;non-transactional&#8221;, &#8220;transactional&#8221;, and &#8220;opaque transactional&#8221;. Likewise, there&#8217;s three kinds of state possible with respect to fault-tolerance: &#8220;non-transactional&#8221;, &#8220;transactional&#8221;, and &#8220;opaque transactional&#8221;. Let&#8217;s take a look at each spout type and see what kind of fault-tolerance you can achieve with each.</p>
+<p>With these primitives, your State implementation can detect whether or not the batch of tuples has been processed before and take the appropriate action to update the state in a consistent way. The action you take depends on the exact semantics provided by your input spouts as to what’s in each batch. There are three kinds of spouts possible with respect to fault-tolerance: “non-transactional”, “transactional”, and “opaque transactional”. Likewise, there are three kinds of state possible with respect to fault-tolerance: “non-transactional”, “transactional”, and “opaque transactional”. Let’s take a look at each spout type and see what kind of fault-tolerance you can achieve with each.</p>
 
 <h2 id="transactional-spouts">Transactional spouts</h2>
 
-<p>Remember, Trident processes tuples as small batches with each batch being given a unique transaction id. The properties of spouts vary according to the guarantees they can provide as to what&#8217;s in each batch. A transactional spout has the following properties:</p>
+<p>Remember, Trident processes tuples as small batches with each batch being given a unique transaction id. The properties of spouts vary according to the guarantees they can provide as to what’s in each batch. A transactional spout has the following properties:</p>
 
 <ol>
   <li>Batches for a given txid are always the same. Replays of a batch for a txid will emit the exact same set of tuples as the first time that batch was emitted for that txid.</li>
-  <li>There&#8217;s no overlap between batches of tuples (tuples are in one batch or another, never multiple).</li>
+  <li>There’s no overlap between batches of tuples (tuples are in one batch or another, never multiple).</li>
   <li>Every tuple is in a batch (no tuples are skipped).</li>
 </ol>
 
 <p>This is a pretty easy type of spout to understand: the stream is divided into fixed batches that never change. storm-contrib has <a href="https://github.com/nathanmarz/storm-contrib/blob/master/storm-kafka/src/jvm/storm/kafka/trident/TransactionalTridentKafkaSpout.java">an implementation of a transactional spout</a> for Kafka.</p>
 
-<p>You might be wondering – why wouldn&#8217;t you just always use a transactional spout? They&#8217;re simple and easy to understand. One reason you might not use one is because they&#8217;re not necessarily very fault-tolerant. For example, the way TransactionalTridentKafkaSpout works is the batch for a txid will contain tuples from all the Kafka partitions for a topic. Once a batch has been emitted, any time that batch is re-emitted in the future the exact same set of tuples must be emitted to meet the semantics of transactional spouts. Now suppose a batch is emitted from TransactionalTridentKafkaSpout, the batch fails to process, and at the same time one of the Kafka nodes goes down. You&#8217;re now incapable of replaying the same batch as you did before (since the node is down and some partitions for the topic are not unavailable), and processing will halt. </p>
+<p>You might be wondering – why wouldn’t you just always use a transactional spout? They’re simple and easy to understand. One reason you might not use one is that they’re not necessarily very fault-tolerant. For example, the way TransactionalTridentKafkaSpout works is that the batch for a txid contains tuples from all the Kafka partitions for a topic. Once a batch has been emitted, any time that batch is re-emitted in the future the exact same set of tuples must be emitted to meet the semantics of transactional spouts. Now suppose a batch is emitted from TransactionalTridentKafkaSpout, the batch fails to process, and at the same time one of the Kafka nodes goes down. You’re now incapable of replaying the same batch as you did before (since the node is down and some partitions for the topic are unavailable), and processing will halt. </p>
 
-<p>This is why &#8220;opaque transactional&#8221; spouts exist – they are fault-tolerant to losing source nodes while still allowing you to achieve exactly-once processing semantics. We&#8217;ll cover those spouts in the next section though.</p>
+<p>This is why “opaque transactional” spouts exist – they are fault-tolerant to losing source nodes while still allowing you to achieve exactly-once processing semantics. We’ll cover those spouts in the next section though.</p>
 
 <p>(One side note – once Kafka supports replication, it will be possible to have transactional spouts that are fault-tolerant to node failure, but that feature does not exist yet.)</p>
 
-<p>Before we get to &#8220;opaque transactional&#8221; spouts, let&#8217;s look at how you would design a State implementation that has exactly-once semantics for transactional spouts. This State type is called a &#8220;transactional state&#8221; and takes advantage of the fact that any given txid is always associated with the exact same set of tuples.</p>
+<p>Before we get to “opaque transactional” spouts, let’s look at how you would design a State implementation that has exactly-once semantics for transactional spouts. This State type is called a “transactional state” and takes advantage of the fact that any given txid is always associated with the exact same set of tuples.</p>
 
-<p>Suppose your topology computes word count and you want to store the word counts in a key/value database. The key will be the word, and the value will contain the count. You&#8217;ve already seen that storing just the count as the value isn&#8217;t sufficient to know whether you&#8217;ve processed a batch of tuples before. Instead, what you can do is store the transaction id with the count in the database as an atomic value. Then, when updating the count, you can just compare the transaction id in the database with the transaction id for the current batch. If they&#8217;re the same, you skip the update – because of the strong ordering, you know for sure that the value in the database incorporates the current batch. If they&#8217;re different, you increment the count. This logic works because the batch for a txid never changes, and Trident ensures that state updates are ordered among batches.</p>
+<p>Suppose your topology computes word count and you want to store the word counts in a key/value database. The key will be the word, and the value will contain the count. You’ve already seen that storing just the count as the value isn’t sufficient to know whether you’ve processed a batch of tuples before. Instead, what you can do is store the transaction id with the count in the database as an atomic value. Then, when updating the count, you can just compare the transaction id in the database with the transaction id for the current batch. If they’re the same, you skip the update – because of the strong ordering, you know for sure that the value in the database incorporates the current batch. If they’re different, you increment the count. This logic works because the batch for a txid never changes, and Trident ensures that state updates are ordered among batches.</p>
 
 <p>Consider this example of why it works. Suppose you are processing txid 3 which consists of the following batch of tuples:</p>
 
@@ -123,7 +123,7 @@ dog =&gt; [count=4, txid=3]
 apple =&gt; [count=10, txid=2]
 </code></p>
 
-<p>The txid associated with &#8220;man&#8221; is txid 1. Since the current txid is 3, you know for sure that this batch of tuples is not represented in that count. So you can go ahead and increment the count by 2 and update the txid. On the other hand, the txid for &#8220;dog&#8221; is the same as the current txid. So you know for sure that the increment from the current batch is already represented in the database for the &#8220;dog&#8221; key. So you can skip the update. After completing updates, the database looks like this:</p>
+<p>The txid associated with “man” is txid 1. Since the current txid is 3, you know for sure that this batch of tuples is not represented in that count. So you can go ahead and increment the count by 2 and update the txid. On the other hand, the txid for “dog” is the same as the current txid. So you know for sure that the increment from the current batch is already represented in the database for the “dog” key. So you can skip the update. After completing updates, the database looks like this:</p>
 
 <p><code>
 man =&gt; [count=5, txid=3]
@@ -131,21 +131,21 @@ dog =&gt; [count=4, txid=3]
 apple =&gt; [count=10, txid=2]
 </code></p>
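 
 <p>Sketched as code (StoredValue and db are hypothetical stand-ins for your key/value schema – Trident’s TransactionalMap, described later, implements this logic for you):</p>
 
 <p><code>java
 void applyCount(String word, long partialCount, long currentTxid) {
     StoredValue stored = db.get(word);  // null if this word has never been counted
     if (stored == null) {
         db.put(word, new StoredValue(partialCount, currentTxid));
     } else if (stored.txid == currentTxid) {
         return;  // strong ordering: this batch is already reflected in the stored count
     } else {
         db.put(word, new StoredValue(stored.count + partialCount, currentTxid));
     }
 }
 </code></p>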
 
-<p>Let&#8217;s now look at opaque transactional spouts and how to design states for that type of spout.</p>
+<p>Let’s now look at opaque transactional spouts and how to design states for that type of spout.</p>
 
 <h2 id="opaque-transactional-spouts">Opaque transactional spouts</h2>
 
 <p>As described before, an opaque transactional spout cannot guarantee that the batch of tuples for a txid remains constant. An opaque transactional spout has the following property:</p>
 
 <ol>
-  <li>Every tuple is <em>successfully</em> processed in exactly one batch. However, it&#8217;s possible for a tuple to fail to process in one batch and then succeed to process in a later batch.</li>
+  <li>Every tuple is <em>successfully</em> processed in exactly one batch. However, it’s possible for a tuple to fail to process in one batch and then succeed in a later batch.</li>
 </ol>
 
-<p><a href="https://github.com/nathanmarz/storm-contrib/blob/master/storm-kafka/src/jvm/storm/kafka/trident/OpaqueTridentKafkaSpout.java">OpaqueTridentKafkaSpout</a> is a spout that has this property and is fault-tolerant to losing Kafka nodes. Whenever it&#8217;s time for OpaqueTridentKafkaSpout to emit a batch, it emits tuples starting from where the last batch finished emitting. This ensures that no tuple is ever skipped or successfully processed by multiple batches.</p>
+<p><a href="https://github.com/nathanmarz/storm-contrib/blob/master/storm-kafka/src/jvm/storm/kafka/trident/OpaqueTridentKafkaSpout.java">OpaqueTridentKafkaSpout</a> is a spout that has this property and is fault-tolerant to losing Kafka nodes. Whenever it’s time for OpaqueTridentKafkaSpout to emit a batch, it emits tuples starting from where the last batch finished emitting. This ensures that no tuple is ever skipped or successfully processed by multiple batches.</p>
 
-<p>With opaque transactional spouts, it&#8217;s no longer possible to use the trick of skipping state updates if the transaction id in the database is the same as the transaction id for the current batch. This is because the batch may have changed between state updates.</p>
+<p>With opaque transactional spouts, it’s no longer possible to use the trick of skipping state updates if the transaction id in the database is the same as the transaction id for the current batch. This is because the batch may have changed between state updates.</p>
 
-<p>What you can do is store more state in the database. Rather than store a value and transaction id in the database, you instead store a value, transaction id, and the previous value in the database. Let&#8217;s again use the example of storing a count in the database. Suppose the partial count for your batch is &#8220;2&#8221; and it&#8217;s time to apply a state update. Suppose the value in the database looks like this:</p>
+<p>What you can do is store more state in the database. Rather than storing just a value and transaction id, you store a value, a transaction id, and the previous value in the database. Let’s again use the example of storing a count in the database. Suppose the partial count for your batch is “2” and it’s time to apply a state update. Suppose the value in the database looks like this:</p>
 
 <p><code>
 { value = 4,
@@ -154,7 +154,7 @@ apple =&gt; [count=10, txid=2]
 }
 </code></p>
 
-<p>Suppose your current txid is 3, different than what&#8217;s in the database. In this case, you set &#8220;prevValue&#8221; equal to &#8220;value&#8221;, increment &#8220;value&#8221; by your partial count, and update the txid. The new database value will look like this:</p>
+<p>Suppose your current txid is 3, different from what’s in the database. In this case, you set “prevValue” equal to “value”, increment “value” by your partial count, and update the txid. The new database value will look like this:</p>
 
 <p><code>
 { value = 6,
@@ -163,7 +163,7 @@ apple =&gt; [count=10, txid=2]
 }
 </code></p>
 
-<p>Now suppose your current txid is 2, equal to what&#8217;s in the database. Now you know that the &#8220;value&#8221; in the database contains an update from a previous batch for your current txid, but that batch may have been different so you have to ignore it. What you do in this case is increment &#8220;prevValue&#8221; by your partial count to compute the new &#8220;value&#8221;. You then set the value in the database to this:</p>
+<p>Now suppose your current txid is 2, equal to what’s in the database. Now you know that the “value” in the database incorporates an update from a previous attempt at your current batch, but that attempt may have contained different tuples, so you have to ignore its contribution. What you do in this case is increment “prevValue” by your partial count to compute the new “value”. You then set the value in the database to this:</p>
 
 <p><code>
 { value = 3,
@@ -176,7 +176,7 @@ apple =&gt; [count=10, txid=2]
 
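 <p>Sketched as code, the opaque update rule looks roughly like this (StoredValue and db are again hypothetical – Trident’s OpaqueMap, described later, implements the real version):</p>
 
 <p><code>java
 void applyOpaque(String key, long partial, long currentTxid) {
     StoredValue s = db.get(key);
     if (s == null) {
         db.put(key, new StoredValue(partial, 0, currentTxid));
     } else if (s.txid == currentTxid) {
         // Same txid: the stored value may reflect a different batch for this
         // txid, so recompute from prevValue rather than value.
         db.put(key, new StoredValue(s.prevValue + partial, s.prevValue, currentTxid));
     } else {
         // New txid: roll the value forward and remember it as prevValue.
         db.put(key, new StoredValue(s.value + partial, s.value, currentTxid));
     }
 }
 </code></p>
 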
 <h2 id="non-transactional-spouts">Non-transactional spouts</h2>
 
-<p>Non-transactional spouts don&#8217;t provide any guarantees about what&#8217;s in each batch. So it might have at-most-once processing, in which case tuples are not retried after failed batches. Or it might have at-least-once processing, where tuples can be processed successfully by multiple batches. There&#8217;s no way to achieve exactly-once semantics for this kind of spout.</p>
+<p>Non-transactional spouts don’t provide any guarantees about what’s in each batch. A non-transactional spout might provide at-most-once processing, in which case tuples are not retried after failed batches. Or it might provide at-least-once processing, where tuples can be processed successfully by multiple batches. There’s no way to achieve exactly-once semantics with this kind of spout.</p>
 
 <h2 id="summary-of-spout-and-state-types">Summary of spout and state types</h2>
 
@@ -190,7 +190,7 @@ apple =&gt; [count=10, txid=2]
 
 <h2 id="state-apis">State APIs</h2>
 
-<p>You&#8217;ve seen the intricacies of what it takes to achieve exactly-once semantics. The nice thing about Trident is that it internalizes all the fault-tolerance logic within the State – as a user you don&#8217;t have to deal with comparing txids, storing multiple values in the database, or anything like that. You can write code like this:</p>
+<p>You’ve seen the intricacies of what it takes to achieve exactly-once semantics. The nice thing about Trident is that it internalizes all the fault-tolerance logic within the State – as a user you don’t have to deal with comparing txids, storing multiple values in the database, or anything like that. You can write code like this:</p>
 
 <p><code>java
 TridentTopology topology = new TridentTopology();        
@@ -213,7 +213,7 @@ public interface State {
 }
 </code></p>
 
-<p>You&#8217;re told when a state update is beginning, when a state update is ending, and you&#8217;re given the txid in each case. Trident assumes nothing about how your state works, what kind of methods there are to update it, and what kind of methods there are to read from it.</p>
+<p>You’re told when a state update is beginning, when a state update is ending, and you’re given the txid in each case. Trident assumes nothing about how your state works, what kind of methods there are to update it, and what kind of methods there are to read from it.</p>
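+
+<p>As a trivial illustration (a hypothetical State that does nothing but log commit boundaries):</p>
+
+<p><code>java
+public class LoggingState implements State {
+    public void beginCommit(Long txid) {
+        System.out.println("Beginning state updates for txid " + txid);
+    }
+
+    public void commit(Long txid) {
+        System.out.println("State updates for txid " + txid + " succeeded");
+    }
+}
+</code></p>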
 
 <p>Suppose you have a home-grown database that contains user location information and you want to be able to access it from Trident. Your State implementation would have methods for getting and setting user information:</p>
 
@@ -244,7 +244,7 @@ public class LocationDBFactory implement
 }
 </code></p>
 
-<p>Trident provides the QueryFunction interface for writing Trident operations that query a source of state, and the StateUpdater interface for writing Trident operations that update a source of state. For example, let&#8217;s write an operation &#8220;QueryLocation&#8221; that queries the LocationDB for the locations of users. Let&#8217;s start off with how you would use it in a topology. Let&#8217;s say this topology consumes an input stream of userids:</p>
+<p>Trident provides the QueryFunction interface for writing Trident operations that query a source of state, and the StateUpdater interface for writing Trident operations that update a source of state. For example, let’s write an operation “QueryLocation” that queries the LocationDB for the locations of users. Let’s start off with how you would use it in a topology. Let’s say this topology consumes an input stream of userids:</p>
 
 <p><code>java
 TridentTopology topology = new TridentTopology();
@@ -253,7 +253,7 @@ topology.newStream("myspout", spout)
         .stateQuery(locations, new Fields("userid"), new QueryLocation(), new Fields("location"))
 </code></p>
 
-<p>Now let&#8217;s take a look at what the implementation of QueryLocation would look like:</p>
+<p>Now let’s take a look at what the implementation of QueryLocation would look like:</p>
 
 <p><code>java
 public class QueryLocation extends BaseQueryFunction&lt;LocationDB, String&gt; {
@@ -270,9 +270,9 @@ public class QueryLocation extends BaseQ
   }
 }
 </code></p>
 
-<p>QueryFunction&#8217;s execute in two steps. First, Trident collects a batch of reads together and passes them to batchRetrieve. In this case, batchRetrieve will receive multiple user ids. batchRetrieve is expected to return a list of results that&#8217;s the same size as the list of input tuples. The first element of the result list corresponds to the result for the first input tuple, the second is the result for the second input tuple, and so on.</p>
+<p>QueryFunctions execute in two steps. First, Trident collects a batch of reads together and passes them to batchRetrieve. In this case, batchRetrieve will receive multiple user ids. batchRetrieve is expected to return a list of results that’s the same size as the list of input tuples. The first element of the result list corresponds to the result for the first input tuple, the second is the result for the second input tuple, and so on.</p>
 
-<p>You can see that this code doesn&#8217;t take advantage of the batching that Trident does, since it just queries the LocationDB one at a time. So a better way to write the LocationDB would be like this:</p>
+<p>You can see that this code doesn’t take advantage of the batching that Trident does, since it just queries the LocationDB one userid at a time. So a better way to write the LocationDB would be like this:</p>
 
 <p><code>java
 public class LocationDB implements State {
@@ -310,7 +310,7 @@ public class QueryLocation extends BaseQ
 
 <p>This code will be much more efficient by reducing roundtrips to the database. </p>
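 
 <p>For completeness, here is a sketch of a QueryLocation that takes advantage of that batching, assuming the batched LocationDB exposes a bulk lookup method (bulkGetLocations is a hypothetical name):</p>
 
 <p><code>java
 public class QueryLocation extends BaseQueryFunction&lt;LocationDB, String&gt; {
     public List&lt;String&gt; batchRetrieve(LocationDB state, List&lt;TridentTuple&gt; inputs) {
         // Collect every userid in the batch and issue a single bulk query.
         List&lt;Long&gt; userIds = new ArrayList&lt;Long&gt;();
         for (TridentTuple input : inputs) {
             userIds.add(input.getLong(0));
         }
         return state.bulkGetLocations(userIds);
     }
 
     public void execute(TridentTuple tuple, String location, TridentCollector collector) {
         collector.emit(new Values(location));
     }
 }
 </code></p>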
 
-<p>To update state, you make use of the StateUpdater interface. Here&#8217;s a StateUpdater that updates a LocationDB with new location information:</p>
+<p>To update state, you make use of the StateUpdater interface. Here’s a StateUpdater that updates a LocationDB with new location information:</p>
 
 <p><code>java
 public class LocationUpdater extends BaseStateUpdater&lt;LocationDB&gt; {
@@ -326,7 +326,7 @@ public class LocationUpdater extends Bas
 }
 </code></p>
 
-<p>Here&#8217;s how you would use this operation in a Trident topology:</p>
+<p>Here’s how you would use this operation in a Trident topology:</p>
 
 <p><code>java
 TridentTopology topology = new TridentTopology();
@@ -339,11 +339,11 @@ TridentState locations = 
 
 <p>partitionPersist returns a TridentState object representing the location db being updated by the Trident topology. You could then use this state in stateQuery operations elsewhere in the topology. </p>
 
-<p>You can also see that StateUpdaters are given a TridentCollector. Tuples emitted to this collector go to the &#8220;new values stream&#8221;. In this case, there&#8217;s nothing interesting to emit to that stream, but if you were doing something like updating counts in a database, you could emit the updated counts to that stream. You can then get access to the new values stream for further processing via the TridentState#newValuesStream method.</p>
+<p>You can also see that StateUpdaters are given a TridentCollector. Tuples emitted to this collector go to the “new values stream”. In this case, there’s nothing interesting to emit to that stream, but if you were doing something like updating counts in a database, you could emit the updated counts to that stream. You can then get access to the new values stream for further processing via the TridentState#newValuesStream method.</p>
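+
+<p>For example (a sketch – the spout variables and stream names here are hypothetical):</p>
+
+<p><code>java
+TridentState locations =
+    topology.newStream("locationUpdates", locationsSpout)
+        .partitionPersist(new LocationDBFactory(), new Fields("userid", "location"), new LocationUpdater());
+
+// The same state can be queried from elsewhere in the topology...
+topology.newStream("locationQueries", querySpout)
+    .stateQuery(locations, new Fields("userid"), new QueryLocation(), new Fields("location"));
+
+// ...and whatever the StateUpdater emitted is available for further processing.
+Stream newValues = locations.newValuesStream();
+</code></p>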
 
 <h2 id="persistentaggregate">persistentAggregate</h2>
 
-<p>Trident has another method for updating States called persistentAggregate. You&#8217;ve seen this used in the streaming word count example, shown again below:</p>
+<p>Trident has another method for updating States called persistentAggregate. You’ve seen this used in the streaming word count example, shown again below:</p>
 
 <p><code>java
 TridentTopology topology = new TridentTopology();        
@@ -354,7 +354,7 @@ TridentState wordCounts =
         .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
 </code></p>
 
-<p>persistentAggregate is an additional abstraction built on top of partitionPersist that knows how to take a Trident aggregator and use it to apply updates to the source of state. In this case, since this is a grouped stream, Trident expects the state you provide to implement the &#8220;MapState&#8221; interface. The grouping fields will be the keys in the state, and the aggregation result will be the values in the state. The &#8220;MapState&#8221; interface looks like this:</p>
+<p>persistentAggregate is an additional abstraction built on top of partitionPersist that knows how to take a Trident aggregator and use it to apply updates to the source of state. In this case, since this is a grouped stream, Trident expects the state you provide to implement the “MapState” interface. The grouping fields will be the keys in the state, and the aggregation result will be the values in the state. The “MapState” interface looks like this:</p>
 
 <p><code>java
 public interface MapState&lt;T&gt; extends State {
@@ -364,7 +364,7 @@ public interface MapState&lt;T&gt; exten
 }
 </code></p>
 
-<p>When you do aggregations on non-grouped streams (a global aggregation), Trident expects your State object to implement the &#8220;Snapshottable&#8221; interface:</p>
+<p>When you do aggregations on non-grouped streams (a global aggregation), Trident expects your State object to implement the “Snapshottable” interface:</p>
 
 <p><code>java
 public interface Snapshottable&lt;T&gt; extends State {
@@ -378,7 +378,7 @@ public interface Snapshottable&lt;T&gt; 
 
 <h2 id="implementing-map-states">Implementing Map States</h2>
 
-<p>Trident makes it easy to implement MapState&#8217;s, doing almost all the work for you. The OpaqueMap, TransactionalMap, and NonTransactionalMap classes implement all the logic for doing the respective fault-tolerance logic. You simply provide these classes with an IBackingMap implementation that knows how to do multiGets and multiPuts of the respective key/values. IBackingMap looks like this:</p>
+<p>Trident makes it easy to implement MapStates, doing almost all the work for you. The OpaqueMap, TransactionalMap, and NonTransactionalMap classes each implement the respective fault-tolerance logic. You simply provide these classes with an IBackingMap implementation that knows how to do multiGets and multiPuts of the respective key/values. IBackingMap looks like this:</p>
 
 <p><code>java
 public interface IBackingMap&lt;T&gt; {
@@ -387,7 +387,7 @@ public interface IBackingMap&lt;T&gt; {
 }
 </code></p>
 
-<p>OpaqueMap&#8217;s will call multiPut with <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/state/OpaqueValue.java">OpaqueValue</a>&#8217;s for the vals, TransactionalMap&#8217;s will give <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/state/TransactionalValue.java">TransactionalValue</a>&#8217;s for the vals, and NonTransactionalMaps will just pass the objects from the topology through.</p>
+<p>OpaqueMaps will call multiPut with <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/state/OpaqueValue.java">OpaqueValue</a>s for the vals, TransactionalMaps will give <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/state/TransactionalValue.java">TransactionalValue</a>s for the vals, and NonTransactionalMaps will just pass the objects from the topology through.</p>
 
 <p>Trident also provides the <a href="https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/state/map/CachedMap.java">CachedMap</a> class to do automatic LRU caching of map key/vals.</p>
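 
 <p>Putting the pieces together – a sketch, where MyBackingMap is a hypothetical IBackingMap implementation over your database – an opaque MapState with caching could be assembled like this:</p>
 
 <p><code>java
 IBackingMap&lt;OpaqueValue&gt; backing = new MyBackingMap();
 // Wrap the backing map in an LRU cache to cut down on multiGet round trips.
 IBackingMap&lt;OpaqueValue&gt; cached = new CachedMap&lt;OpaqueValue&gt;(backing, 1000);
 // OpaqueMap layers the opaque fault-tolerance logic on top of the backing map.
 MapState&lt;Object&gt; state = OpaqueMap.build(cached);
 </code></p>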