Posted to commits@storm.apache.org by pt...@apache.org on 2016/01/20 02:06:53 UTC

[13/22] storm git commit: remove 'docs' directory

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Concepts.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Concepts.md b/docs/documentation/Concepts.md
deleted file mode 100644
index bd7ed3d..0000000
--- a/docs/documentation/Concepts.md
+++ /dev/null
@@ -1,118 +0,0 @@
----
-title: Concepts
-layout: documentation
-documentation: true
----
-
-This page lists the main concepts of Storm and links to resources where you can find more information. The concepts discussed are:
-
-1. Topologies
-2. Streams
-3. Spouts
-4. Bolts
-5. Stream groupings
-6. Reliability
-7. Tasks
-8. Workers
-
-### Topologies
-
-The logic for a realtime application is packaged into a Storm topology. A Storm topology is analogous to a MapReduce job. One key difference is that a MapReduce job eventually finishes, whereas a topology runs forever (or until you kill it, of course). A topology is a graph of spouts and bolts that are connected with stream groupings. These concepts are described below.
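-
-As a minimal sketch (the `RandomSentenceSpout` and `WordCountBolt` classes here are placeholders, not part of Storm), a topology is wired together with `TopologyBuilder` and submitted with `StormSubmitter`:
-
-```java
-TopologyBuilder builder = new TopologyBuilder();
-builder.setSpout("sentences", new RandomSentenceSpout(), 2);   // spout with parallelism 2
-builder.setBolt("counts", new WordCountBolt(), 4)              // bolt with parallelism 4
-       .shuffleGrouping("sentences");                          // a stream grouping connects them
-
-Config conf = new Config();
-StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
-```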
-
-**Resources:**
-
-* [TopologyBuilder](/javadoc/apidocs/backtype/storm/topology/TopologyBuilder.html): use this class to construct topologies in Java
-* [Running topologies on a production cluster](Running-topologies-on-a-production-cluster.html)
-* [Local mode](Local-mode.html): Read this to learn how to develop and test topologies in local mode.
-
-### Streams
-
-The stream is the core abstraction in Storm. A stream is an unbounded sequence of tuples that is processed and created in parallel in a distributed fashion. Streams are defined with a schema that names the fields in the stream's tuples. By default, tuples can contain integers, longs, shorts, bytes, strings, doubles, floats, booleans, and byte arrays. You can also define your own serializers so that custom types can be used natively within tuples.
-
-Every stream is given an id when declared. Since single-stream spouts and bolts are so common, [OutputFieldsDeclarer](/javadoc/apidocs/backtype/storm/topology/OutputFieldsDeclarer.html) has convenience methods for declaring a single stream without specifying an id. In this case, the stream is given the default id of "default".
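-
-As a small sketch (field and stream names are illustrative), a component declares its streams and their schemas in `declareOutputFields`:
-
-```java
-public void declareOutputFields(OutputFieldsDeclarer declarer) {
-    // the default stream, given the id "default"
-    declarer.declare(new Fields("word", "count"));
-    // an additional, explicitly named stream
-    declarer.declareStream("errors", new Fields("word", "error-msg"));
-}
-```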
-
-
-**Resources:**
-
-* [Tuple](/javadoc/apidocs/backtype/storm/tuple/Tuple.html): streams are composed of tuples
-* [OutputFieldsDeclarer](/javadoc/apidocs/backtype/storm/topology/OutputFieldsDeclarer.html): used to declare streams and their schemas
-* [Serialization](Serialization.html): Information about Storm's dynamic typing of tuples and declaring custom serializations
-* [ISerialization](/javadoc/apidocs/backtype/storm/serialization/ISerialization.html): custom serializers must implement this interface
-* [CONFIG.TOPOLOGY_SERIALIZATIONS](/javadoc/apidocs/backtype/storm/Config.html#TOPOLOGY_SERIALIZATIONS): custom serializers can be registered using this configuration
-
-### Spouts
-
-A spout is a source of streams in a topology. Generally spouts will read tuples from an external source (e.g. a Kestrel queue or the Twitter API) and emit them into the topology. Spouts can either be __reliable__ or __unreliable__. A reliable spout is capable of replaying a tuple if it failed to be processed by Storm, whereas an unreliable spout forgets about the tuple as soon as it is emitted.
-
-Spouts can emit more than one stream. To do so, declare multiple streams using the `declareStream` method of [OutputFieldsDeclarer](/javadoc/apidocs/backtype/storm/topology/OutputFieldsDeclarer.html) and specify the stream to emit to when using the `emit` method on [SpoutOutputCollector](/javadoc/apidocs/backtype/storm/spout/SpoutOutputCollector.html).
-
-The main method on spouts is `nextTuple`. `nextTuple` either emits a new tuple into the topology or simply returns if there are no new tuples to emit. It is imperative that `nextTuple` does not block for any spout implementation, because Storm calls all the spout methods on the same thread.
-
-The other main methods on spouts are `ack` and `fail`. These are called when Storm detects that a tuple emitted from the spout either successfully completed through the topology or failed to be completed. `ack` and `fail` are only called for reliable spouts. See [the Javadoc](/javadoc/apidocs/backtype/storm/spout/ISpout.html) for more information.
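-
-Putting those methods together, a minimal reliable spout might look like the following sketch (the `QueueClient` and its `poll`/`delete`/`requeue` calls are hypothetical stand-ins for whatever external source you read from):
-
-```java
-public class QueueSpout extends BaseRichSpout {
-    SpoutOutputCollector _collector;
-    QueueClient _queue;   // hypothetical client for an external queue
-
-    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
-        _collector = collector;
-        _queue = new QueueClient(conf);
-    }
-
-    public void nextTuple() {
-        QueueMessage msg = _queue.poll();   // non-blocking read; nextTuple must never block
-        if (msg == null) return;            // nothing to emit right now
-        // emitting with a message id makes this spout reliable: Storm can ask for a replay
-        _collector.emit(new Values(msg.body()), msg.id());
-    }
-
-    public void ack(Object msgId)  { _queue.delete(msgId); }   // tuple tree fully processed
-    public void fail(Object msgId) { _queue.requeue(msgId); }  // tuple tree failed; replay it
-
-    public void declareOutputFields(OutputFieldsDeclarer declarer) {
-        declarer.declare(new Fields("body"));
-    }
-}
-```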
-
-**Resources:**
-
-* [IRichSpout](/javadoc/apidocs/backtype/storm/topology/IRichSpout.html): this is the interface that spouts must implement.
-* [Guaranteeing message processing](Guaranteeing-message-processing.html)
-
-### Bolts
-
-All processing in topologies is done in bolts. Bolts can do anything: filtering, functions, aggregations, joins, talking to databases, and more. 
-
-Bolts can do simple stream transformations. Doing complex stream transformations often requires multiple steps and thus multiple bolts. For example, transforming a stream of tweets into a stream of trending images requires at least two steps: a bolt to do a rolling count of retweets for each image, and one or more bolts to stream out the top X images (you can do this particular stream transformation in a more scalable way with three bolts than with two). 
-
-Bolts can emit more than one stream. To do so, declare multiple streams using the `declareStream` method of [OutputFieldsDeclarer](/javadoc/apidocs/backtype/storm/topology/OutputFieldsDeclarer.html) and specify the stream to emit to when using the `emit` method on [OutputCollector](/javadoc/apidocs/backtype/storm/task/OutputCollector.html).
-
-When you declare a bolt's input streams, you always subscribe to specific streams of another component. If you want to subscribe to all the streams of another component, you have to subscribe to each one individually. [InputDeclarer](/javadoc/apidocs/backtype/storm/topology/InputDeclarer.html) has syntactic sugar for subscribing to streams declared on the default stream id. Saying `declarer.shuffleGrouping("1")` subscribes to the default stream on component "1" and is equivalent to `declarer.shuffleGrouping("1", DEFAULT_STREAM_ID)`.
-
-The main method in bolts is the `execute` method, which takes a new tuple as input. Bolts emit new tuples using the [OutputCollector](/javadoc/apidocs/backtype/storm/task/OutputCollector.html) object. Bolts must call the `ack` method on the `OutputCollector` for every tuple they process so that Storm knows when tuples are completed (and can eventually determine that it's safe to ack the original spout tuples). For the common case of processing an input tuple, emitting 0 or more tuples based on that tuple, and then acking the input tuple, Storm provides an [IBasicBolt](/javadoc/apidocs/backtype/storm/topology/IBasicBolt.html) interface which does the acking automatically.
-
-Please note that [OutputCollector](/javadoc/apidocs/backtype/storm/task/OutputCollector.html) is not thread-safe, and all emits, acks, and fails must happen on the same thread. Please refer to [Troubleshooting](Troubleshooting.html) for more details.
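-
-For the common emit-then-ack pattern described above, a `BaseBasicBolt` keeps the code short because anchoring and acking are handled for you. A sketch (the filtering logic is just an example):
-
-```java
-public class FilterShortWords extends BaseBasicBolt {
-    public void execute(Tuple tuple, BasicOutputCollector collector) {
-        String word = tuple.getString(0);
-        if (word.length() > 3) {
-            collector.emit(new Values(word));   // emitted tuples are anchored to the input automatically
-        }
-        // the input tuple is acked automatically when execute returns
-    }
-
-    public void declareOutputFields(OutputFieldsDeclarer declarer) {
-        declarer.declare(new Fields("word"));
-    }
-}
-```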
-
-**Resources:**
-
-* [IRichBolt](/javadoc/apidocs/backtype/storm/topology/IRichBolt.html): this is the general interface for bolts.
-* [IBasicBolt](/javadoc/apidocs/backtype/storm/topology/IBasicBolt.html): this is a convenience interface for defining bolts that do filtering or simple functions.
-* [OutputCollector](/javadoc/apidocs/backtype/storm/task/OutputCollector.html): bolts emit tuples to their output streams using an instance of this class
-* [Guaranteeing message processing](Guaranteeing-message-processing.html)
-
-### Stream groupings
-
-Part of defining a topology is specifying for each bolt which streams it should receive as input. A stream grouping defines how that stream should be partitioned among the bolt's tasks.
-
-There are eight built-in stream groupings in Storm, and you can implement a custom stream grouping by implementing the [CustomStreamGrouping](/javadoc/apidocs/backtype/storm/grouping/CustomStreamGrouping.html) interface:
-
-1. **Shuffle grouping**: Tuples are randomly distributed across the bolt's tasks in a way such that each task is guaranteed to get an equal number of tuples.
-2. **Fields grouping**: The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id"'s may go to different tasks.
-3. **Partial Key grouping**: The stream is partitioned by the fields specified in the grouping, like the Fields grouping, but the load is balanced between two downstream bolts, which provides better utilization of resources when the incoming data is skewed. [This paper](https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf) provides a good explanation of how it works and the advantages it provides.
-4. **All grouping**: The stream is replicated across all the bolt's tasks. Use this grouping with care.
-5. **Global grouping**: The entire stream goes to a single one of the bolt's tasks. Specifically, it goes to the task with the lowest id.
-6. **None grouping**: This grouping specifies that you don't care how the stream is grouped. Currently, none groupings are equivalent to shuffle groupings. Eventually though, Storm will push down bolts with none groupings to execute in the same thread as the bolt or spout they subscribe from (when possible).
-7. **Direct grouping**: This is a special kind of grouping. A stream grouped this way means that the __producer__ of the tuple decides which task of the consumer will receive this tuple. Direct groupings can only be declared on streams that have been declared as direct streams. Tuples emitted to a direct stream must be emitted using one of the [emitDirect](/javadoc/apidocs/backtype/storm/task/OutputCollector.html#emitDirect(int, int, java.util.List)) methods. A bolt can get the task ids of its consumers by either using the provided [TopologyContext](/javadoc/apidocs/backtype/storm/task/TopologyContext.html) or by keeping track of the output of the `emit` method in [OutputCollector](/javadoc/apidocs/backtype/storm/task/OutputCollector.html) (which returns the task ids that the tuple was sent to).
-8. **Local or shuffle grouping**: If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks. Otherwise, this acts like a normal shuffle grouping.
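-
-A sketch of how several of these groupings are declared when wiring a topology (the spout and bolt classes are placeholders):
-
-```java
-TopologyBuilder builder = new TopologyBuilder();
-builder.setSpout("events", new MySpout());
-builder.setBolt("random", new MyBolt(), 4)
-       .shuffleGrouping("events");                          // shuffle grouping
-builder.setBolt("by-user", new MyBolt(), 4)
-       .fieldsGrouping("events", new Fields("user-id"));    // fields grouping
-builder.setBolt("broadcast", new MyBolt(), 4)
-       .allGrouping("events");                              // all grouping
-builder.setBolt("single", new MyBolt(), 1)
-       .globalGrouping("events");                           // global grouping
-builder.setBolt("local", new MyBolt(), 4)
-       .localOrShuffleGrouping("events");                   // local or shuffle grouping
-```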
-
-**Resources:**
-
-* [TopologyBuilder](/javadoc/apidocs/backtype/storm/topology/TopologyBuilder.html): use this class to define topologies
-* [InputDeclarer](/javadoc/apidocs/backtype/storm/topology/InputDeclarer.html): this object is returned whenever `setBolt` is called on `TopologyBuilder` and is used for declaring a bolt's input streams and how those streams should be grouped
-* [CoordinatedBolt](/javadoc/apidocs/backtype/storm/task/CoordinatedBolt.html): this bolt is useful for distributed RPC topologies and makes heavy use of direct streams and direct groupings
-
-### Reliability
-
-Storm guarantees that every spout tuple will be fully processed by the topology. It does this by tracking the tree of tuples triggered by every spout tuple and determining when that tree of tuples has been successfully completed. Every topology has a "message timeout" associated with it. If Storm fails to detect that a spout tuple has been completed within that timeout, then it fails the tuple and replays it later. 
-
-To take advantage of Storm's reliability capabilities, you must tell Storm when new edges in a tuple tree are being created and tell Storm whenever you've finished processing an individual tuple. These are done using the [OutputCollector](/javadoc/apidocs/backtype/storm/task/OutputCollector.html) object that bolts use to emit tuples. Anchoring is done in the `emit` method, and you declare that you're finished with a tuple using the `ack` method.
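-
-Inside a bolt's `execute` method, anchoring and acking look roughly like this sketch (assuming the bolt saved its `OutputCollector` in a `_collector` field during `prepare`):
-
-```java
-public void execute(Tuple input) {
-    // passing the input tuple as the first argument anchors the new tuple,
-    // adding an edge to the tuple tree rooted at the spout tuple
-    _collector.emit(input, new Values(input.getString(0) + "!"));
-    // acking tells Storm this bolt is finished with the input tuple
-    _collector.ack(input);
-}
-```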
-
-This is all explained in much more detail in [Guaranteeing message processing](Guaranteeing-message-processing.html). 
-
-### Tasks
-
-Each spout or bolt executes as some number of tasks across the cluster. Each task corresponds to one thread of execution, and stream groupings define how to send tuples from one set of tasks to another set of tasks. You set the parallelism for each spout or bolt in the `setSpout` and `setBolt` methods of [TopologyBuilder](/javadoc/apidocs/backtype/storm/topology/TopologyBuilder.html).
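-
-For example (component names are placeholders, and a `TopologyBuilder` named `builder` is assumed), the parallelism is the third argument to `setSpout`/`setBolt`, and `setNumTasks` can additionally fix the number of tasks for a component:
-
-```java
-builder.setSpout("sentences", new RandomSentenceSpout(), 2);   // 2 parallel spout tasks
-builder.setBolt("split", new SplitSentence(), 4)               // parallelism of 4
-       .setNumTasks(8)                                         // optionally pin the task count
-       .shuffleGrouping("sentences");
-```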
-
-### Workers
-
-Topologies execute across one or more worker processes. Each worker process is a physical JVM and executes a subset of all the tasks for the topology. For example, if the combined parallelism of the topology is 300 and 50 workers are allocated, then each worker will execute 6 tasks (as threads within the worker). Storm tries to spread the tasks evenly across all the workers.
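-
-A sketch of the example above, setting 50 workers for the topology through its configuration (assuming a `builder` as in the Topologies section):
-
-```java
-Config conf = new Config();
-conf.setNumWorkers(50);   // sets Config.TOPOLOGY_WORKERS for this topology
-StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
-```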
-
-**Resources:**
-
-* [Config.TOPOLOGY_WORKERS](/javadoc/apidocs/backtype/storm/Config.html#TOPOLOGY_WORKERS): this config sets the number of workers to allocate for executing the topology

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Configuration.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Configuration.md b/docs/documentation/Configuration.md
deleted file mode 100644
index 6d85aa1..0000000
--- a/docs/documentation/Configuration.md
+++ /dev/null
@@ -1,31 +0,0 @@
----
-title: Configuration
-layout: documentation
-documentation: true
----
-Storm has a variety of configurations for tweaking the behavior of Nimbus, supervisors, and running topologies. Some configurations are system configurations and cannot be modified on a topology-by-topology basis, whereas other configurations can be modified per topology. 
-
-Every configuration has a default value defined in [defaults.yaml](https://github.com/apache/storm/blob/master/conf/defaults.yaml) in the Storm codebase. You can override these configurations by defining a storm.yaml in the classpath of Nimbus and the supervisors. Finally, you can define a topology-specific configuration that you submit along with your topology when using [StormSubmitter](/javadoc/apidocs/backtype/storm/StormSubmitter.html). However, the topology-specific configuration can only override configs prefixed with "TOPOLOGY".
-
-Storm 0.7.0 and onwards lets you override configuration on a per-bolt/per-spout basis. The only configurations that can be overridden this way are:
-
-1. "topology.debug"
-2. "topology.max.spout.pending"
-3. "topology.max.task.parallelism"
-4. "topology.kryo.register": This works a little bit differently than the other ones, since the serializations will be available to all components in the topology. More details on [Serialization](Serialization.html). 
-
-The Java API lets you specify component-specific configurations in two ways (both are sketched below):
-
-1. *Internally:* Override `getComponentConfiguration` in any spout or bolt and return the component-specific configuration map.
-2. *Externally:* `setSpout` and `setBolt` in `TopologyBuilder` return an object with methods `addConfiguration` and `addConfigurations` that can be used to override the configurations for the component.
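-
-A sketch of both approaches (the bolt class, the chosen config values, and the surrounding `builder` are just examples):
-
-```java
-// Internally: the component returns its own configuration map
-public class DebugBolt extends BaseBasicBolt {
-    @Override
-    public Map<String, Object> getComponentConfiguration() {
-        Map<String, Object> conf = new HashMap<String, Object>();
-        conf.put("topology.debug", true);
-        return conf;
-    }
-
-    public void execute(Tuple tuple, BasicOutputCollector collector) { }
-    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
-}
-
-// Externally: override configuration where the component is declared
-builder.setBolt("debug-bolt", new DebugBolt(), 2)
-       .addConfiguration("topology.max.task.parallelism", 2);
-```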
-
-The preference order for configuration values is defaults.yaml < storm.yaml < topology-specific configuration < internal component-specific configuration < external component-specific configuration. 
-
-
-**Resources:**
-
-* [Config](/javadoc/apidocs/backtype/storm/Config.html): a listing of all configurations as well as a helper class for creating topology specific configurations
-* [defaults.yaml](https://github.com/apache/storm/blob/master/conf/defaults.yaml): the default values for all configurations
-* [Setting up a Storm cluster](Setting-up-a-Storm-cluster.html): explains how to create and configure a Storm cluster
-* [Running topologies on a production cluster](Running-topologies-on-a-production-cluster.html): lists useful configurations when running topologies on a cluster
-* [Local mode](Local-mode.html): lists useful configurations when using local mode
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Contributing-to-Storm.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Contributing-to-Storm.md b/docs/documentation/Contributing-to-Storm.md
deleted file mode 100644
index fdc5835..0000000
--- a/docs/documentation/Contributing-to-Storm.md
+++ /dev/null
@@ -1,33 +0,0 @@
----
-title: Contributing
-layout: documentation
-documentation: true
----
-
-### Getting started with contributing
-
-Some of the issues on the [issue tracker](https://issues.apache.org/jira/browse/STORM) are marked with the "Newbie" label. If you're interested in contributing to Storm but don't know where to begin, these are good issues to start with. These issues are a great way to get your feet wet with learning the codebase because they require learning about only an isolated portion of the codebase and are a relatively small amount of work.
-
-### Learning the codebase
-
-The [Implementation docs](Implementation-docs.html) section of the wiki gives detailed walkthroughs of the codebase. Reading through these docs is highly recommended to understand the codebase.
-
-### Contribution process
-
-Contributions to the Storm codebase should be sent as [GitHub](https://github.com/apache/storm) pull requests. If there are any problems with the pull request, we can iterate on it using GitHub's commenting features.
-
-For small patches, feel free to submit pull requests directly for them. For larger contributions, please use the following process. The idea behind this process is to prevent any wasted work and catch design issues early on:
-
-1. Open an issue on the [issue tracker](https://issues.apache.org/jira/browse/STORM) if one doesn't exist already
-2. Comment on the issue with your plan for implementing the issue. Explain what pieces of the codebase you're going to touch and how everything is going to fit together.
-3. Storm committers will iterate with you on the design to make sure you're on the right track
-4. Implement your issue, submit a pull request, and iterate from there.
-
-### Modules built on top of Storm
-
-Modules built on top of Storm (like spouts, bolts, etc.) that aren't appropriate for Storm core can be maintained as your own project or as part of [@stormprocessor](https://github.com/stormprocessor). To become part of @stormprocessor, put your project on your own GitHub account and then send an email to the mailing list proposing to make it part of @stormprocessor. The community can then discuss whether it's useful enough to be included; if so, you'll be added to the @stormprocessor organization and can maintain your project there. The advantage of hosting your module in @stormprocessor is that it will be easier for potential users to find your project.
-
-### Contributing documentation
-
-Documentation contributions are very welcome! The best way to send contributions is as emails through the mailing list.
-

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Creating-a-new-Storm-project.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Creating-a-new-Storm-project.md b/docs/documentation/Creating-a-new-Storm-project.md
deleted file mode 100644
index b6186b5..0000000
--- a/docs/documentation/Creating-a-new-Storm-project.md
+++ /dev/null
@@ -1,25 +0,0 @@
----
-title: Creating a New Storm Project
-layout: documentation
-documentation: true
----
-This page outlines how to set up a Storm project for development. The steps are:
-
-1. Add Storm jars to classpath
-2. If using multilang, add multilang dir to classpath
-
-Follow along to see how to set up the [storm-starter](https://github.com/apache/storm/blob/master/examples/storm-starter) project in Eclipse.
-
-### Add Storm jars to classpath
-
-You'll need the Storm jars on your classpath to develop Storm topologies. Using [Maven](Maven.html) is highly recommended. [Here's an example](https://github.com/apache/storm/blob/master/examples/storm-starter/pom.xml) of how to set up your pom.xml for a Storm project. If you don't want to use Maven, you can include the jars from the Storm release on your classpath.
-
-To set up the classpath in Eclipse, create a new Java project, include `src/jvm/` as a source path, and make sure all the jars in `lib/` and `lib/dev/` are in the `Referenced Libraries` section of the project.
-
-### If using multilang, add multilang dir to classpath
-
-If you implement spouts or bolts in languages other than Java, then those implementations should be under the `multilang/resources/` directory of the project. For Storm to find these files in local mode, the `resources/` dir needs to be on the classpath. You can do this in Eclipse by adding `multilang/` as a source folder. You may also need to add `multilang/resources/` as a source directory.
-
-For more information on writing topologies in other languages, see [Using non-JVM languages with Storm](Using-non-JVM-languages-with-Storm.html).
-
-To test that everything is working in Eclipse, you should now be able to `Run` the `WordCountTopology.java` file. You will see messages being emitted at the console for 10 seconds.

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/DSLs-and-multilang-adapters.md
----------------------------------------------------------------------
diff --git a/docs/documentation/DSLs-and-multilang-adapters.md b/docs/documentation/DSLs-and-multilang-adapters.md
deleted file mode 100644
index 0ed5450..0000000
--- a/docs/documentation/DSLs-and-multilang-adapters.md
+++ /dev/null
@@ -1,10 +0,0 @@
----
-title: Storm DSLs and Multi-Lang Adapters
-layout: documentation
-documentation: true
----
-* [Scala DSL](https://github.com/velvia/ScalaStorm)
-* [JRuby DSL](https://github.com/colinsurprenant/redstorm)
-* [Clojure DSL](Clojure-DSL.html)
-* [Storm/Esper integration](https://github.com/tomdz/storm-esper): Streaming SQL on top of Storm
-* [io-storm](https://github.com/dan-blanchard/io-storm): Perl multilang adapter

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Defining-a-non-jvm-language-dsl-for-storm.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Defining-a-non-jvm-language-dsl-for-storm.md b/docs/documentation/Defining-a-non-jvm-language-dsl-for-storm.md
deleted file mode 100644
index 77eb392..0000000
--- a/docs/documentation/Defining-a-non-jvm-language-dsl-for-storm.md
+++ /dev/null
@@ -1,38 +0,0 @@
----
-title: Defining a Non-JVM DSL for Storm
-layout: documentation
-documentation: true
----
-The right place to start to learn how to make a non-JVM DSL for Storm is [storm-core/src/storm.thrift](https://github.com/apache/storm/blob/master/storm-core/src/storm.thrift). Since Storm topologies are just Thrift structures, and Nimbus is a Thrift daemon, you can create and submit topologies in any language.
-
-When you create the Thrift structs for spouts and bolts, the code for the spout or bolt is specified in the ComponentObject struct:
-
-```
-union ComponentObject {
-  1: binary serialized_java;
-  2: ShellComponent shell;
-  3: JavaObject java_object;
-}
-```
-
-For a Python DSL, you would want to make use of "2" and "3". ShellComponent lets you specify a script to run that component (e.g., your Python code). And JavaObject lets you specify native Java spouts and bolts for the component (and Storm will use reflection to create that spout or bolt).
-
-There's a "storm shell" command that will help with submitting a topology. Its usage is like this:
-
-```
-storm shell resources/ python topology.py arg1 arg2
-```
-
-`storm shell` will then package `resources/` into a jar, upload the jar to Nimbus, and call your `topology.py` script like this:
-
-```
-python topology.py arg1 arg2 {nimbus-host} {nimbus-port} {uploaded-jar-location}
-```
-
-Then you can connect to Nimbus using the Thrift API and submit the topology, passing {uploaded-jar-location} into the submitTopology method. For reference, here's the submitTopology definition:
-
-```java
-void submitTopology(1: string name, 2: string uploadedJarLocation, 3: string jsonConf, 4: StormTopology topology) throws (1: AlreadyAliveException e, 2: InvalidTopologyException ite);
-```
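-
-As a sketch, the same call made from JVM-based code through the generated Thrift client looks roughly like this (here `uploadedJarLocation` is the value passed to your script, `topology` is the StormTopology struct your DSL built, and error handling is omitted):
-
-```java
-Map conf = Utils.readStormConfig();   // reads defaults.yaml plus storm.yaml
-Nimbus.Client client = NimbusClient.getConfiguredClient(conf).getClient();
-String jsonConf = JSONValue.toJSONString(new HashMap());   // topology config as JSON
-client.submitTopology("my-topology", uploadedJarLocation, jsonConf, topology);
-```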
-
-Finally, one of the key things to do in a non-JVM DSL is to make it easy to define the entire topology in one file (the bolts, spouts, and the definition of the topology).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Distributed-RPC.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Distributed-RPC.md b/docs/documentation/Distributed-RPC.md
deleted file mode 100644
index 484178b..0000000
--- a/docs/documentation/Distributed-RPC.md
+++ /dev/null
@@ -1,199 +0,0 @@
----
-title: Distributed RPC
-layout: documentation
-documentation: true
----
-The idea behind distributed RPC (DRPC) is to parallelize the computation of really intense functions on the fly using Storm. The Storm topology takes in as input a stream of function arguments, and it emits an output stream of the results for each of those function calls. 
-
-DRPC is not so much a feature of Storm as it is a pattern expressed from Storm's primitives of streams, spouts, bolts, and topologies. DRPC could have been packaged as a separate library from Storm, but it's so useful that it's bundled with Storm.
-
-### High level overview
-
-Distributed RPC is coordinated by a "DRPC server" (Storm comes packaged with an implementation of this). The DRPC server coordinates receiving an RPC request, sending the request to the Storm topology, receiving the results from the Storm topology, and sending the results back to the waiting client. From a client's perspective, a distributed RPC call looks just like a regular RPC call. For example, here's how a client would compute the results for the "reach" function with the argument "http://twitter.com":
-
-```java
-DRPCClient client = new DRPCClient("drpc-host", 3772);
-String result = client.execute("reach", "http://twitter.com");
-```
-
-The distributed RPC workflow looks like this:
-
-![Tasks in a topology](images/drpc-workflow.png)
-
-A client sends the DRPC server the name of the function to execute and the arguments to that function. The topology implementing that function uses a `DRPCSpout` to receive a function invocation stream from the DRPC server. Each function invocation is tagged with a unique id by the DRPC server. The topology then computes the result and at the end of the topology a bolt called `ReturnResults` connects to the DRPC server and gives it the result for the function invocation id. The DRPC server then uses the id to match up that result with which client is waiting, unblocks the waiting client, and sends it the result.
-
-### LinearDRPCTopologyBuilder
-
-Storm comes with a topology builder called [LinearDRPCTopologyBuilder](/javadoc/apidocs/backtype/storm/drpc/LinearDRPCTopologyBuilder.html) that automates almost all the steps involved for doing DRPC. These include:
-
-1. Setting up the spout
-2. Returning the results to the DRPC server
-3. Providing functionality to bolts for doing finite aggregations over groups of tuples
-
-Let's look at a simple example. Here's the implementation of a DRPC topology that returns its input argument with a "!" appended:
-
-```java
-public static class ExclaimBolt extends BaseBasicBolt {
-    public void execute(Tuple tuple, BasicOutputCollector collector) {
-        String input = tuple.getString(1);
-        collector.emit(new Values(tuple.getValue(0), input + "!"));
-    }
-
-    public void declareOutputFields(OutputFieldsDeclarer declarer) {
-        declarer.declare(new Fields("id", "result"));
-    }
-}
-
-public static void main(String[] args) throws Exception {
-    LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("exclamation");
-    builder.addBolt(new ExclaimBolt(), 3);
-    // ...
-}
-```
-
-As you can see, there's very little to it. When creating the `LinearDRPCTopologyBuilder`, you tell it the name of the DRPC function for the topology. A single DRPC server can coordinate many functions, and the function name distinguishes the functions from one another. The first bolt you declare will take in as input 2-tuples, where the first field is the request id and the second field is the arguments for that request. `LinearDRPCTopologyBuilder` expects the last bolt to emit an output stream containing 2-tuples of the form [id, result]. Finally, all intermediate tuples must contain the request id as the first field.
-
-In this example, `ExclaimBolt` simply appends a "!" to the second field of the tuple. `LinearDRPCTopologyBuilder` handles the rest of the coordination of connecting to the DRPC server and sending results back.
-
-### Local mode DRPC
-
-DRPC can be run in local mode. Here's how to run the above example in local mode:
-
-```java
-LocalDRPC drpc = new LocalDRPC();
-LocalCluster cluster = new LocalCluster();
-
-cluster.submitTopology("drpc-demo", conf, builder.createLocalTopology(drpc));
-
-System.out.println("Results for 'hello':" + drpc.execute("exclamation", "hello"));
-
-cluster.shutdown();
-drpc.shutdown();
-```
-
-First you create a `LocalDRPC` object. This object simulates a DRPC server in process, just like how `LocalCluster` simulates a Storm cluster in process. Then you create the `LocalCluster` to run the topology in local mode. `LinearDRPCTopologyBuilder` has separate methods for creating local topologies and remote topologies. In local mode the `LocalDRPC` object does not bind to any ports so the topology needs to know about the object to communicate with it. This is why `createLocalTopology` takes in the `LocalDRPC` object as input.
-
-After launching the topology, you can do DRPC invocations using the `execute` method on `LocalDRPC`.
-
-### Remote mode DRPC
-
-Using DRPC on an actual cluster is also straightforward. There are three steps:
-
-1. Launch DRPC server(s)
-2. Configure the locations of the DRPC servers
-3. Submit DRPC topologies to Storm cluster
-
-Launching a DRPC server can be done with the `storm` script and is just like launching Nimbus or the UI:
-
-```
-bin/storm drpc
-```
-
-Next, you need to configure your Storm cluster to know the locations of the DRPC server(s). This is how `DRPCSpout` knows from where to read function invocations. This can be done through the `storm.yaml` file or the topology configurations. Configuring this through the `storm.yaml` looks something like this:
-
-```yaml
-drpc.servers:
-  - "drpc1.foo.com"
-  - "drpc2.foo.com"
-```
-
-Finally, you launch DRPC topologies using `StormSubmitter` just like you launch any other topology. To run the above example in remote mode, you do something like this:
-
-```java
-StormSubmitter.submitTopology("exclamation-drpc", conf, builder.createRemoteTopology());
-```
-
-`createRemoteTopology` is used to create topologies suitable for Storm clusters.
-
-### A more complex example
-
-The exclamation DRPC example was a toy example for illustrating the concepts of DRPC. Let's look at a more complex example which really needs the parallelism a Storm cluster provides for computing the DRPC function. The example we'll look at is computing the reach of a URL on Twitter.
-
-The reach of a URL is the number of unique people exposed to a URL on Twitter. To compute reach, you need to:
-
-1. Get all the people who tweeted the URL
-2. Get all the followers of all those people
-3. Unique the set of followers
-4. Count the unique set of followers
-
-A single reach computation can involve thousands of database calls and tens of millions of follower records during the computation. It's a really, really intense computation. As you're about to see, implementing this function on top of Storm is dead simple. On a single machine, reach can take minutes to compute; on a Storm cluster, you can compute reach for even the hardest URLs in a couple seconds.
-
-A sample reach topology is defined in storm-starter [here](https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/storm/starter/ReachTopology.java). Here's how you define the reach topology:
-
-```java
-LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("reach");
-builder.addBolt(new GetTweeters(), 3);
-builder.addBolt(new GetFollowers(), 12)
-        .shuffleGrouping();
-builder.addBolt(new PartialUniquer(), 6)
-        .fieldsGrouping(new Fields("id", "follower"));
-builder.addBolt(new CountAggregator(), 2)
-        .fieldsGrouping(new Fields("id"));
-```
-
-The topology executes as four steps:
-
-1. `GetTweeters` gets the users who tweeted the URL. It transforms an input stream of `[id, url]` into an output stream of `[id, tweeter]`. Each `url` tuple will map to many `tweeter` tuples.
-2. `GetFollowers` gets the followers for the tweeters. It transforms an input stream of `[id, tweeter]` into an output stream of `[id, follower]`. Across all the tasks, there may of course be duplication of follower tuples when someone follows multiple people who tweeted the same URL.
-3. `PartialUniquer` groups the followers stream by the follower id. This has the effect of the same follower going to the same task. So each task of `PartialUniquer` will receive mutually independent sets of followers. Once `PartialUniquer` receives all the follower tuples directed at it for the request id, it emits the unique count of its subset of followers.
-4. Finally, `CountAggregator` receives the partial counts from each of the `PartialUniquer` tasks and sums them up to complete the reach computation.
-
-Let's take a look at the `PartialUniquer` bolt:
-
-```java
-public class PartialUniquer extends BaseBatchBolt {
-    BatchOutputCollector _collector;
-    Object _id;
-    Set<String> _followers = new HashSet<String>();
-    
-    @Override
-    public void prepare(Map conf, TopologyContext context, BatchOutputCollector collector, Object id) {
-        _collector = collector;
-        _id = id;
-    }
-
-    @Override
-    public void execute(Tuple tuple) {
-        _followers.add(tuple.getString(1));
-    }
-    
-    @Override
-    public void finishBatch() {
-        _collector.emit(new Values(_id, _followers.size()));
-    }
-
-    @Override
-    public void declareOutputFields(OutputFieldsDeclarer declarer) {
-        declarer.declare(new Fields("id", "partial-count"));
-    }
-}
-```
-
-`PartialUniquer` implements `IBatchBolt` by extending `BaseBatchBolt`. A batch bolt provides a first-class API for processing a batch of tuples as a concrete unit. A new instance of the batch bolt is created for each request id, and Storm takes care of cleaning up the instances when appropriate. 
-
-When `PartialUniquer` receives a follower tuple in the `execute` method, it adds it to the set for the request id in an internal `HashSet`. 
-
-Batch bolts provide the `finishBatch` method which is called after all the tuples for this batch targeted at this task have been processed. In the callback, `PartialUniquer` emits a single tuple containing the unique count for its subset of follower ids.
-
-Under the hood, `CoordinatedBolt` is used to detect when a given bolt has received all of the tuples for any given request id. `CoordinatedBolt` makes use of direct streams to manage this coordination.
-
-The rest of the topology should be self-explanatory. As you can see, every single step of the reach computation is done in parallel, and defining the DRPC topology was extremely simple.
-
-### Non-linear DRPC topologies
-
-`LinearDRPCTopologyBuilder` only handles "linear" DRPC topologies, where the computation is expressed as a sequence of steps (like reach). It's not hard to imagine functions that would require a more complicated topology with branching and merging of the bolts. For now, to do this you'll need to drop down into using `CoordinatedBolt` directly. Be sure to talk about your use case for non-linear DRPC topologies on the mailing list to inform the construction of more general abstractions for DRPC topologies.
-
-### How LinearDRPCTopologyBuilder works
-
-* DRPCSpout emits [args, return-info]. return-info is the host and port of the DRPC server as well as the id generated by the DRPC server
-* It constructs a topology consisting of:
-  * DRPCSpout
-  * PrepareRequest (generates a request id and creates a stream for the return info and a stream for the args)
-  * CoordinatedBolt wrappers and direct groupings
-  * JoinResult (joins the result with the return info)
-  * ReturnResult (connects to the DRPC server and returns the result)
-* LinearDRPCTopologyBuilder is a good example of a higher level abstraction built on top of Storm's primitives
-
-### Advanced
-* KeyedFairBolt for weaving the processing of multiple requests at the same time
-* How to use `CoordinatedBolt` directly
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Documentation.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Documentation.md b/docs/documentation/Documentation.md
deleted file mode 100644
index ab555c1..0000000
--- a/docs/documentation/Documentation.md
+++ /dev/null
@@ -1,56 +0,0 @@
----
-title: Documentation
-layout: documentation
-documentation: true
----
-### Basics of Storm
-
-* [Javadoc](/javadoc/apidocs/index.html)
-* [Concepts](Concepts.html)
-* [Configuration](Configuration.html)
-* [Guaranteeing message processing](Guaranteeing-message-processing.html)
-* [Fault-tolerance](Fault-tolerance.html)
-* [Command line client](Command-line-client.html)
-* [Understanding the parallelism of a Storm topology](Understanding-the-parallelism-of-a-Storm-topology.html)
-* [FAQ](FAQ.html)
-
-### Trident
-
-Trident is an alternative interface to Storm. It provides exactly-once processing, "transactional" datastore persistence, and a set of common stream analytics operations.
-
-* [Trident Tutorial](Trident-tutorial.html)     -- basic concepts and walkthrough
-* [Trident API Overview](Trident-API-Overview.html) -- operations for transforming and orchestrating data
-* [Trident State](Trident-state.html)        -- exactly-once processing and fast, persistent aggregation
-* [Trident spouts](Trident-spouts.html)       -- transactional and non-transactional data intake
-
-### Setup and deploying
-
-* [Setting up a Storm cluster](Setting-up-a-Storm-cluster.html)
-* [Local mode](Local-mode.html)
-* [Troubleshooting](Troubleshooting.html)
-* [Running topologies on a production cluster](Running-topologies-on-a-production-cluster.html)
-* [Building Storm](Maven.html) with Maven
-
-### Intermediate
-
-* [Serialization](Serialization.html)
-* [Common patterns](Common-patterns.html)
-* [Clojure DSL](Clojure-DSL.html)
-* [Using non-JVM languages with Storm](Using-non-JVM-languages-with-Storm.html)
-* [Distributed RPC](Distributed-RPC.html)
-* [Transactional topologies](Transactional-topologies.html)
-* [Kestrel and Storm](Kestrel-and-Storm.html)
-* [Direct groupings](Direct-groupings.html)
-* [Hooks](Hooks.html)
-* [Metrics](Metrics.html)
-* [Lifecycle of a trident tuple]()
-* [UI REST API](ui-rest-api.html)
-* [Logs](Logs.html)
-* [Dynamic Log Level Settings](dynamic-log-level-settings.html)
-* [Dynamic Worker Profiling](dynamic-worker-profiling.html)
-
-### Advanced
-
-* [Defining a non-JVM language DSL for Storm](Defining-a-non-jvm-language-dsl-for-storm.html)
-* [Multilang protocol](Multilang-protocol.html) (how to provide support for another language)
-* [Implementation docs](Implementation-docs.html)

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/FAQ.md
----------------------------------------------------------------------
diff --git a/docs/documentation/FAQ.md b/docs/documentation/FAQ.md
deleted file mode 100644
index a65da1e..0000000
--- a/docs/documentation/FAQ.md
+++ /dev/null
@@ -1,130 +0,0 @@
----
-title: FAQ
-layout: documentation
-documentation: true
----
-
-## Best Practices
-
-### What rules of thumb can you give me for configuring Storm+Trident?
-
-* Make the number of workers a multiple of the number of machines; the parallelism a multiple of the number of workers; and the number of Kafka partitions a multiple of the spout parallelism
-* Use one worker per topology per machine
-* Start with fewer, larger aggregators, one per machine with workers on it
-* Use the isolation scheduler
-* Use one acker per worker -- 0.9 makes that the default, but earlier versions do not.
-* Enable GC logging; you should see very few major GCs if things are in reasonable shape.
-* Set the Trident batch millis to about 50% of your typical end-to-end latency.
-* Start with a max spout pending that is for sure too small -- one for trident, or the number of executors for storm -- and increase it until you stop seeing changes in the flow. You'll probably end up with something near `2*(throughput in recs/sec)*(end-to-end latency)` (2x the Little's law capacity).
-
-### What are some of the best ways to get a worker to mysteriously and bafflingly die?
-
-* Do you have write access to the log directory?
-* Are you blowing out your heap?
-* Are all the right libraries installed on all of the workers?
-* Is the zookeeper hostname still set to localhost?
-* Did you supply a correct, unique hostname -- one that resolves back to the machine -- to each worker, and put it in the storm conf file?
-* Have you opened firewall/securitygroup permissions _bidirectionally_ among a) all the workers, b) the storm master, c) zookeeper? Also, from the workers to any kafka/kestrel/database/etc that your topology accesses? Use netcat to poke the appropriate ports and be sure. 
-
-### Halp! I cannot see:
-
-* **my logs** Logs by default go to $STORM_HOME/logs. Check that you have write permissions to that directory. They are configured in 
-    * log4j2/{cluster, worker}.xml (> 0.9);
-    * logback/cluster.xml (0.9);
-    * log4j/*.properties in earlier versions (< 0.9).
-* **final JVM settings** Add the `-XX:+PrintFlagsFinal` commandline option in the childopts (see the conf file)
-* **final Java system properties** Add `Properties props = System.getProperties(); props.list(System.out);` near where you build your topology.
-
-### How many Workers should I use?
-
-The total number of workers is set by the supervisors -- there's some number of JVM slots each supervisor will superintend. The thing you set on the topology is how many worker slots it will try to claim.
-
-There's no great reason to use more than one worker per topology per machine.
-
-With one topology running on three 8-core nodes, and parallelism hint 24, each bolt gets 8 executors per machine, i.e. one for each core. There are three big benefits to running three workers (with 8 assigned executors each) compared to running, say, 24 workers (one assigned executor each).
-
-First, data that is repartitioned (shuffles or group-bys) to executors in the same worker will not have to hit the transfer buffer. Instead, tuples are deposited directly from send to receive buffer. That's a big win. By contrast, if the destination executor were on the same machine in a different worker, it would have to go send -> worker transfer -> local socket -> worker recv -> exec recv buffer. It doesn't hit the network card, but it's not as big a win as when executors are in the same worker.
-
-Second, you're typically better off with three aggregators having very large backing cache than having twenty-four aggregators having small backing caches. This reduces the effect of skew, and improves LRU efficiency.
-
-Lastly, fewer workers reduces control flow chatter.
-
-## Topology
-
-### Can a Trident topology have Multiple Streams?
-
-> Can a Trident Topology work like a workflow with conditional paths (if-else)? e.g. A Spout (S1) connects to a bolt (B0) which based on certain values in the incoming tuple routes them to either bolt (B1) or bolt (B2) but not both.
-
-A Trident "each" operator returns a Stream object, which you can store in a variable. You can then run multiple eaches on the same Stream to split it, e.g.: 
-
-        Stream s = topology.each(...).groupBy(...).aggregate(...) 
-        Stream branch1 = s.each(..., FilterA) 
-        Stream branch2 = s.each(..., FilterB) 
-
-You can join streams with join, merge or multiReduce.
-
-At the time of writing, you can't emit to multiple output streams from Trident -- see [STORM-68](https://issues.apache.org/jira/browse/STORM-68)
-
-### Why am I getting a NotSerializableException/IllegalStateException when my topology is being started up?
-
-Within the Storm lifecycle, the topology is instantiated and then serialized to byte format to be stored in ZooKeeper, prior to the topology being executed. Within this step, if a spout or bolt within the topology has an initialized unserializable property, serialization will fail. If there is a need for a field that is unserializable, initialize it within the bolt or spout's prepare method, which is run after the topology is delivered to the worker.
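-
-A sketch of that pattern (the `DatabaseClient` class and the `db.host` config key are hypothetical): mark the field `transient` and construct it in `prepare`, not in the constructor:
-
-```java
-public class LookupBolt extends BaseRichBolt {
-    private transient DatabaseClient _client;   // not serialized with the topology
-    private OutputCollector _collector;
-
-    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
-        _collector = collector;
-        _client = new DatabaseClient((String) conf.get("db.host"));   // created on the worker
-    }
-
-    public void execute(Tuple tuple) {
-        _collector.emit(tuple, new Values(_client.lookup(tuple.getString(0))));
-        _collector.ack(tuple);
-    }
-
-    public void declareOutputFields(OutputFieldsDeclarer declarer) {
-        declarer.declare(new Fields("value"));
-    }
-}
-```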
-
-## Spouts
-
-### What is a coordinator, and why are there several?
-
-A trident-spout is actually run within a storm _bolt_. The storm-spout of a trident topology is the MasterBatchCoordinator -- it coordinates trident batches and is the same no matter what spouts you use. A batch is born when the MBC dispenses a seed tuple to each of the spout-coordinators. The spout-coordinator bolts know how your particular spouts should cooperate -- so in the kafka case, it's what helps figure out what partition and offset range each spout should pull from.
-
-### What can I store into the spout's metadata record?
-
-You should only store static data, and as little of it as possible, into the metadata record (note: maybe you _can_ store more interesting things; you shouldn't, though)
-
-### How often is the 'emitPartitionBatchNew' function called?
-
-Since the MBC is the actual spout, all the tuples in a batch are just members of its tupletree. That means Storm's "max spout pending" config effectively defines the number of concurrent batches Trident runs. The MBC emits a new batch if it has fewer than max-spout-pending tuples pending and if at least one [trident batch interval](https://github.com/apache/storm/blob/master/conf/defaults.yaml#L115)'s worth of seconds has passed since the last batch.
-
-### If nothing was emitted does Trident slow down the calls?
-
-Yes, there's a pluggable "spout wait strategy"; the default is to sleep for a [configurable amount of time](https://github.com/apache/storm/blob/master/conf/defaults.yaml#L110)
-
-### OK, then what is the trident batch interval for?
-
-You know how computers of the 486 era had a [turbo button](http://en.wikipedia.org/wiki/Turbo_button) on them? It's like that. 
-
-Actually, it has two practical uses. One is to throttle spouts that poll a remote source without throttling processing. For example, we have a spout that looks in a given S3 bucket for a new batch-uploaded file to read, linebreak and emit. We don't want it hitting S3 more than every few seconds: files don't show up more than once every few minutes, and a batch takes a few seconds to process.
-
-The other is to limit overpressure on the internal queues during startup or under a heavy burst load -- if the spouts spring to life and suddenly jam ten batches' worth of records into the system, you could have a mass of less-urgent tuples from batch 7 clog up the transfer buffer and prevent the $commit tuple from batch 3 from getting through (or even just the regular old tuples from batch 3). What we do is set the trident batch interval to about half the typical end-to-end processing latency -- if it takes 600ms to process a batch, it's OK to only kick off a batch every 300ms.
-
-Note that this is a cap, not an additional delay -- with a period of 300ms, if your batch takes 258ms Trident will only delay an additional 42ms.
-
-### How do you set the batch size?
-
-Trident doesn't place its own limits on the batch count. In the case of the Kafka spout, the max fetch bytes size divided by the average record size defines an effective number of records per subbatch partition.
-
-### How do I resize a batch?
-
-The trident batch is a somewhat overloaded facility. Together with the number of partitions, the batch size is constrained by or serves to define
-
-1. the unit of transactional safety (tuples at risk vs time)
-2. per partition, an effective windowing mechanism for windowed stream analytics
-3. per partition, the number of simultaneous queries that will be made by a partitionQuery, partitionPersist, etc;
-4. per partition, the number of records convenient for the spout to dispatch at the same time;
-
-You can't change the overall batch size once generated, but you can change the number of partitions -- do a shuffle and then change the parallelism hint
-
-## Time Series
-
-### How do I aggregate events by time?
-
-If you have records with an immutable timestamp, and you would like to count, average or otherwise aggregate them into discrete time buckets, Trident is an excellent and scalable solution.
-
-Write an `Each` function that turns the timestamp into a time bucket: if the bucket size was "by hour", then the timestamp `2013-08-08 12:34:56` would be mapped to the `2013-08-08 12:00:00` time bucket, and so would everything else in the twelve o'clock hour. Then group on that timebucket and use a grouped persistentAggregate. The persistentAggregate uses a local cacheMap backed by a data store. Groups with many records require very few reads from the data store, and use efficient bulk reads and writes; as long as your data feed is relatively prompt, Trident will make very efficient use of memory and network. Even if a server drops off line for a day, then delivers that full day's worth of data in a rush, the old results will be calmly retrieved and updated -- and without interfering with calculating the current results.
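-
-A sketch of such an `Each` function for hourly buckets (field names and the downstream wiring shown in the comment are illustrative):
-
-```java
-public class HourBucket extends BaseFunction {
-    public void execute(TridentTuple tuple, TridentCollector collector) {
-        long ts = tuple.getLong(0);                      // immutable event timestamp in millis
-        long hourMillis = 60L * 60L * 1000L;
-        collector.emit(new Values((ts / hourMillis) * hourMillis));   // truncate to the hour
-    }
-}
-
-// stream.each(new Fields("timestamp"), new HourBucket(), new Fields("hour-bucket"))
-//       .groupBy(new Fields("hour-bucket"))
-//       .persistentAggregate(stateFactory, new Count(), new Fields("count"));
-```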
-
-### How can I know that all records for a time bucket have been received?
-
-You cannot know that all events are collected -- this is an epistemological challenge, not a distributed systems challenge. You can:
-
-* Set a time limit using domain knowledge
-* Introduce a _punctuation_: a record known to come after all records in the given time bucket. Trident uses this scheme to know when a batch is complete. If you, for instance, receive records from a set of sensors, each in order for that sensor, then once all sensors have sent you a 3:02:xx or later timestamp, you know you can commit the earlier buckets. 
-* When possible, make your process incremental: each value that comes in makes the answer more and more true. A Trident ReducerAggregator is an operator that takes a prior result and a set of new records and returns a new result. This lets the result be cached and serialized to a datastore; if a server drops off line for a day and then comes back with a full day's worth of data in a rush, the old results will be calmly retrieved and updated.
-* Lambda architecture: Record all events into an archival store (S3, HBase, HDFS) on receipt. In the fast layer, once the time window is clear, process the bucket to get an actionable answer, and ignore everything older than the time window. Periodically run a global aggregation to calculate a "correct" answer.

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Fault-tolerance.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Fault-tolerance.md b/docs/documentation/Fault-tolerance.md
deleted file mode 100644
index d70fd1d..0000000
--- a/docs/documentation/Fault-tolerance.md
+++ /dev/null
@@ -1,30 +0,0 @@
----
-title: Fault Tolerance
-layout: documentation
-documentation: true
----
-This page explains the design details of Storm that make it a fault-tolerant system.
-
-## What happens when a worker dies?
-
-When a worker dies, the supervisor will restart it. If it continuously fails on startup and is unable to heartbeat to Nimbus, Nimbus will reassign the worker to another machine.
-
-## What happens when a node dies?
-
-The tasks assigned to that machine will time-out and Nimbus will reassign those tasks to other machines.
-
-## What happens when Nimbus or Supervisor daemons die?
-
-The Nimbus and Supervisor daemons are designed to be fail-fast (process self-destructs whenever any unexpected situation is encountered) and stateless (all state is kept in Zookeeper or on disk). As described in [Setting up a Storm cluster](Setting-up-a-Storm-cluster.html), the Nimbus and Supervisor daemons must be run under supervision using a tool like daemontools or monit. So if the Nimbus or Supervisor daemons die, they restart like nothing happened.
-
-Most notably, no worker processes are affected by the death of Nimbus or the Supervisors. This is in contrast to Hadoop, where if the JobTracker dies, all the running jobs are lost. 
-
-## Is Nimbus a single point of failure?
-
-If you lose the Nimbus node, the workers will still continue to function. Additionally, supervisors will continue to restart workers if they die. However, without Nimbus, workers won't be reassigned to other machines when necessary (like if you lose a worker machine). 
-
-So the answer is that Nimbus is "sort of" a SPOF. In practice, it's not a big deal since nothing catastrophic happens when the Nimbus daemon dies. There are plans to make Nimbus highly available in the future.
-
-## How does Storm guarantee data processing?
-
-Storm provides mechanisms to guarantee data processing even if nodes die or messages are lost. See [Guaranteeing message processing](Guaranteeing-message-processing.html) for the details.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Guaranteeing-message-processing.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Guaranteeing-message-processing.md b/docs/documentation/Guaranteeing-message-processing.md
deleted file mode 100644
index 9ade3fa..0000000
--- a/docs/documentation/Guaranteeing-message-processing.md
+++ /dev/null
@@ -1,181 +0,0 @@
----
-title: Guaranteeing Message Processing
-layout: documentation
-documentation: true
----
-Storm guarantees that each message coming off a spout will be fully processed. This page describes how Storm accomplishes this guarantee and what you have to do as a user to benefit from Storm's reliability capabilities.
-
-### What does it mean for a message to be "fully processed"?
-
-A tuple coming off a spout can trigger thousands of tuples to be created based on it. Consider, for example, the streaming word count topology:
-
-```java
-TopologyBuilder builder = new TopologyBuilder();
-builder.setSpout("sentences", new KestrelSpout("kestrel.backtype.com",
-                                               22133,
-                                               "sentence_queue",
-                                               new StringScheme()));
-builder.setBolt("split", new SplitSentence(), 10)
-        .shuffleGrouping("sentences");
-builder.setBolt("count", new WordCount(), 20)
-        .fieldsGrouping("split", new Fields("word"));
-```
-
-This topology reads sentences off of a Kestrel queue, splits the sentences into their constituent words, and then emits for each word the number of times it has seen that word before. A tuple coming off the spout triggers many tuples being created based on it: a tuple for each word in the sentence and a tuple for the updated count for each word. The tree of messages looks something like this:
-
-![Tuple tree](images/tuple_tree.png)
-
-Storm considers a tuple coming off a spout "fully processed" when the tuple tree has been exhausted and every message in the tree has been processed. A tuple is considered failed when its tree of messages fails to be fully processed within a specified timeout. This timeout can be configured on a topology-specific basis using the [Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS](/javadoc/apidocs/backtype/storm/Config.html#TOPOLOGY_MESSAGE_TIMEOUT_SECS) configuration and defaults to 30 seconds.
-
-### What happens if a message is fully processed or fails to be fully processed?
-
-To understand this question, let's take a look at the lifecycle of a tuple coming off of a spout. For reference, here is the interface that spouts implement (see the [Javadoc](/javadoc/apidocs/backtype/storm/spout/ISpout.html) for more information):
-
-```java
-public interface ISpout extends Serializable {
-    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
-    void close();
-    void nextTuple();
-    void ack(Object msgId);
-    void fail(Object msgId);
-}
-```
-
-First, Storm requests a tuple from the `Spout` by calling the `nextTuple` method on the `Spout`. The `Spout` uses the `SpoutOutputCollector` provided in the `open` method to emit a tuple to one of its output streams. When emitting a tuple, the `Spout` provides a "message id" that will be used to identify the tuple later. For example, the `KestrelSpout` reads a message off of the kestrel queue and emits as the "message id" the id provided by Kestrel for the message. Emitting a message to the `SpoutOutputCollector` looks like this:
-
-```java
-_collector.emit(new Values("field1", "field2", 3) , msgId);
-```
-
-Next, the tuple gets sent to consuming bolts and Storm takes care of tracking the tree of messages that is created. If Storm detects that a tuple is fully processed, Storm will call the `ack` method on the originating `Spout` task with the message id that the `Spout` provided to Storm. Likewise, if the tuple times out, Storm will call the `fail` method on the `Spout`. Note that a tuple will be acked or failed by the exact same `Spout` task that created it. So even if a `Spout` is executing many tasks across the cluster, a tuple won't be acked or failed by a different task than the one that created it.
-
-Let's use `KestrelSpout` again to see what a `Spout` needs to do to guarantee message processing. When `KestrelSpout` takes a message off the Kestrel queue, it "opens" the message. This means the message is not actually taken off the queue yet, but instead placed in a "pending" state waiting for acknowledgement that the message is completed. While in the pending state, a message will not be sent to other consumers of the queue. Additionally, if a client disconnects all pending messages for that client are put back on the queue. When a message is opened, Kestrel provides the client with the data for the message as well as a unique id for the message. The `KestrelSpout` uses that exact id as the "message id" for the tuple when emitting the tuple to the `SpoutOutputCollector`. Sometime later on, when `ack` or `fail` are called on the `KestrelSpout`, the `KestrelSpout` sends an ack or fail message to Kestrel with the message id to take the message off the queue or have it put back on.
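-
-To make this contract concrete, here is a rough sketch of a reliable spout (not the actual `KestrelSpout` code; the `MessageQueue` client is hypothetical). It keeps emitted messages in a pending map keyed by message id, forgets them on `ack`, and re-queues them on `fail`:
-
-```java
-public class ReliableQueueSpout extends BaseRichSpout {
-    private SpoutOutputCollector _collector;
-    private MessageQueue _queue;                       // hypothetical external queue client
-    private Map<Object, String> _pending = new HashMap<Object, String>();
-
-    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
-        _collector = collector;
-        _queue = new MessageQueue();
-    }
-
-    public void nextTuple() {
-        String msg = _queue.poll();
-        if(msg != null) {
-            Object msgId = UUID.randomUUID();
-            _pending.put(msgId, msg);
-            // Passing a message id tells Storm to track the tuple tree for this tuple.
-            _collector.emit(new Values(msg), msgId);
-        }
-    }
-
-    public void ack(Object msgId) {
-        // Fully processed: safe to forget the message.
-        _pending.remove(msgId);
-    }
-
-    public void fail(Object msgId) {
-        // Failed or timed out: put the message back so it gets replayed.
-        _queue.add(_pending.remove(msgId));
-    }
-
-    public void declareOutputFields(OutputFieldsDeclarer declarer) {
-        declarer.declare(new Fields("sentence"));
-    }
-}
-```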
-
-### What is Storm's reliability API?
-
-There are two things you have to do as a user to benefit from Storm's reliability capabilities. First, you need to tell Storm whenever you're creating a new link in the tree of tuples. Second, you need to tell Storm when you have finished processing an individual tuple. By doing both of these things, Storm can detect when the tree of tuples is fully processed and can ack or fail the spout tuple appropriately. Storm's API provides a concise way of doing both of these tasks.
-
-Specifying a link in the tuple tree is called _anchoring_. Anchoring is done at the same time you emit a new tuple. Let's use the following bolt as an example. This bolt splits a tuple containing a sentence into a tuple for each word:
-
-```java
-public class SplitSentence extends BaseRichBolt {
-    OutputCollector _collector;
-
-    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
-        _collector = collector;
-    }
-
-    public void execute(Tuple tuple) {
-        String sentence = tuple.getString(0);
-        for(String word: sentence.split(" ")) {
-            _collector.emit(tuple, new Values(word));
-        }
-        _collector.ack(tuple);
-    }
-
-    public void declareOutputFields(OutputFieldsDeclarer declarer) {
-        declarer.declare(new Fields("word"));
-    }
-}
-```
-
-Each word tuple is _anchored_ by specifying the input tuple as the first argument to `emit`. Since the word tuple is anchored, the spout tuple at the root of the tree will be replayed later on if the word tuple failed to be processed downstream. In contrast, let's look at what happens if the word tuple is emitted like this:
-
-```java
-_collector.emit(new Values(word));
-```
-
-Emitting the word tuple this way causes it to be _unanchored_. If the tuple fails to be processed downstream, the root tuple will not be replayed. Depending on the fault-tolerance guarantees you need in your topology, sometimes it's appropriate to emit an unanchored tuple.
-
-An output tuple can be anchored to more than one input tuple. This is useful when doing streaming joins or aggregations. A multi-anchored tuple failing to be processed will cause multiple tuples to be replayed from the spouts. Multi-anchoring is done by specifying a list of tuples rather than just a single tuple. For example:
-
-```java
-List<Tuple> anchors = new ArrayList<Tuple>();
-anchors.add(tuple1);
-anchors.add(tuple2);
-_collector.emit(anchors, new Values(1, 2, 3));
-```
-
-Multi-anchoring adds the output tuple into multiple tuple trees. Note that it's also possible for multi-anchoring to break the tree structure and create tuple DAGs, like so:
-
-![Tuple DAG](images/tuple-dag.png)
-
-Storm's implementation works for DAGs as well as trees (pre-release it only worked for trees, and the name "tuple tree" stuck).
-
-Anchoring is how you specify the tuple tree -- the next and final piece to Storm's reliability API is specifying when you've finished processing an individual tuple in the tuple tree. This is done by using the `ack` and `fail` methods on the `OutputCollector`. If you look back at the `SplitSentence` example, you can see that the input tuple is acked after all the word tuples are emitted.
-
-You can use the `fail` method on the `OutputCollector` to immediately fail the spout tuple at the root of the tuple tree. For example, your application may choose to catch an exception from a database client and explicitly fail the input tuple. By failing the tuple explicitly, the spout tuple can be replayed faster than if you waited for the tuple to time-out.
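-
-For instance, a bolt that writes each tuple to an external store might look something like the following sketch, where `store.insert` stands in for a hypothetical database call:
-
-```java
-public void execute(Tuple tuple) {
-    try {
-        store.insert(tuple.getString(0));   // hypothetical database call
-        _collector.ack(tuple);
-    } catch(Exception e) {
-        // Fail explicitly so the spout tuple is replayed without waiting for the timeout.
-        _collector.fail(tuple);
-    }
-}
-```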
-
-Every tuple you process must be acked or failed. Storm uses memory to track each tuple, so if you don't ack/fail every tuple, the task will eventually run out of memory. 
-
-A lot of bolts follow a common pattern of reading an input tuple, emitting tuples based on it, and then acking the tuple at the end of the `execute` method. These bolts fall into the categories of filters and simple functions. Storm has an interface called `BasicBolt` that encapsulates this pattern for you. The `SplitSentence` example can be written as a `BasicBolt` as follows:
-
-```java
-public class SplitSentence extends BaseBasicBolt {
-    public void execute(Tuple tuple, BasicOutputCollector collector) {
-        String sentence = tuple.getString(0);
-        for(String word: sentence.split(" ")) {
-            collector.emit(new Values(word));
-        }
-    }
-
-    public void declareOutputFields(OutputFieldsDeclarer declarer) {
-        declarer.declare(new Fields("word"));
-    }
-}
-```
-
-This implementation is simpler than the implementation from before and is semantically identical. Tuples emitted to `BasicOutputCollector` are automatically anchored to the input tuple, and the input tuple is acked for you automatically when the execute method completes.
-
-In contrast, bolts that do aggregations or joins may delay acking a tuple until after it has computed a result based on a bunch of tuples. Aggregations and joins will commonly multi-anchor their output tuples as well. These things fall outside the simpler pattern of `IBasicBolt`.
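-
-As a rough sketch of that pattern (the bolt, its batch size of 100, and the numeric input field are illustrative, not taken from the Storm codebase), an aggregating bolt buffers its inputs, multi-anchors the aggregate it emits, and only acks the buffered tuples afterwards:
-
-```java
-public class SumBolt extends BaseRichBolt {
-    OutputCollector _collector;
-    List<Tuple> _buffer = new ArrayList<Tuple>();
-    long _sum = 0;
-
-    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
-        _collector = collector;
-    }
-
-    public void execute(Tuple tuple) {
-        _buffer.add(tuple);
-        _sum += tuple.getLong(0);
-        if(_buffer.size() >= 100) {
-            // Multi-anchor the aggregate to every tuple that contributed to it...
-            _collector.emit(_buffer, new Values(_sum));
-            // ...and only then ack the buffered inputs.
-            for(Tuple t: _buffer) {
-                _collector.ack(t);
-            }
-            _buffer.clear();
-            _sum = 0;
-        }
-    }
-
-    public void declareOutputFields(OutputFieldsDeclarer declarer) {
-        declarer.declare(new Fields("sum"));
-    }
-}
-```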
-
-### How do I make my applications work correctly given that tuples can be replayed?
-
-As always in software design, the answer is "it depends." Storm 0.7.0 introduced the "transactional topologies" feature, which enables you to get fully fault-tolerant exactly-once messaging semantics for most computations. Read more about transactional topologies [here](Transactional-topologies.html). 
-
-
-### How does Storm implement reliability in an efficient way?
-
-A Storm topology has a set of special "acker" tasks that track the DAG of tuples for every spout tuple. When an acker sees that a DAG is complete, it sends a message to the spout task that created the spout tuple to ack the message. You can set the number of acker executors for a topology in the topology configuration using [Config.TOPOLOGY_ACKER_EXECUTORS](/javadoc/apidocs/backtype/storm/Config.html#TOPOLOGY_ACKER_EXECUTORS). Storm defaults TOPOLOGY_ACKER_EXECUTORS to be equal to the number of workers configured in the topology -- you will need to increase this number for topologies processing large amounts of messages.
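-
-For example, the acker count can be set through the corresponding helper on `Config` when building the topology configuration (a minimal sketch; 4 is only an illustrative value):
-
-```java
-Config conf = new Config();
-// Equivalent to setting Config.TOPOLOGY_ACKER_EXECUTORS directly.
-conf.setNumAckers(4);
-```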
-
-The best way to understand Storm's reliability implementation is to look at the lifecycle of tuples and tuple DAGs. When a tuple is created in a topology, whether in a spout or a bolt, it is given a random 64 bit id. These ids are used by ackers to track the tuple DAG for every spout tuple.
-
-Every tuple knows the ids of all the spout tuples whose tuple trees it is part of. When you emit a new tuple in a bolt, the spout tuple ids from the tuple's anchors are copied into the new tuple. When a tuple is acked, it sends a message to the appropriate acker tasks with information about how the tuple tree changed. In particular, it tells the acker "I am now completed within the tree for this spout tuple, and here are the new tuples in the tree that were anchored to me".
-
-For example, if tuples "D" and "E" were created based on tuple "C", here's how the tuple tree changes when "C" is acked: 
-
-![What happens on an ack](images/ack_tree.png)
-
-Since "C" is removed from the tree at the same time that "D" and "E" are added to it, the tree can never be prematurely completed.
-
-There are a few more details to how Storm tracks tuple trees. As mentioned already, you can have an arbitrary number of acker tasks in a topology. This leads to the following question: when a tuple is acked in the topology, how does it know to which acker task to send that information? 
-
-Storm uses mod hashing to map a spout tuple id to an acker task. Since every tuple carries with it the spout tuple ids of all the trees it exists within, it knows which acker tasks to communicate with.
-
-Another detail of Storm is how the acker tasks track which spout tasks are responsible for each spout tuple they're tracking. When a spout task emits a new tuple, it simply sends a message to the appropriate acker telling it that its task id is responsible for that spout tuple. Then when an acker sees a tree has been completed, it knows to which task id to send the completion message.
-
-Acker tasks do not track the tree of tuples explicitly. For large tuple trees with tens of thousands of nodes (or more), tracking all the tuple trees could overwhelm the memory used by the ackers. Instead, the ackers take a different strategy that only requires a fixed amount of space per spout tuple (about 20 bytes). This tracking algorithm is the key to how Storm works and is one of its major breakthroughs.
-
-An acker task stores a map from a spout tuple id to a pair of values. The first value is the task id that created the spout tuple which is used later on to send completion messages. The second value is a 64 bit number called the "ack val". The ack val is a representation of the state of the entire tuple tree, no matter how big or how small.  It is simply the xor of all tuple ids that have been created and/or acked in the tree.
-
-When an acker task sees that an "ack val" has become 0, then it knows that the tuple tree is completed. Since tuple ids are random 64 bit numbers, the chances of an "ack val" accidentally becoming 0 are extremely small. If you work the math, at 10K acks per second, it will take 50,000,000 years until a mistake is made. And even then, it will only cause data loss if that tuple happens to fail in the topology.
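-
-Concretely, the update is just an XOR. The following toy sketch (illustrative ids only, not the acker's actual code) mirrors the "C"/"D"/"E" example above:
-
-```java
-long ackVal = 0;
-
-// Tuple "C" is anchored into the tree: its random 64 bit id is XORed in.
-long idC = 7129481921394823833L;
-ackVal ^= idC;
-
-// "C" is acked while reporting its two new anchored children "D" and "E":
-// "C" is XORed out at the same moment "D" and "E" are XORed in.
-long idD = 4210984321098432109L, idE = -912834098213409821L;
-ackVal ^= idC ^ idD ^ idE;
-
-// "D" and "E" are later acked with no children of their own.
-ackVal ^= idD ^ idE;
-
-// Every id has now been XORed in exactly twice, so the value is back to 0
-// and the acker knows the tree is complete.
-assert ackVal == 0;
-```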
-
-Now that you understand the reliability algorithm, let's go over all the failure cases and see how in each case Storm avoids data loss:
-
-- **A tuple isn't acked because the task died**: In this case the spout tuple ids at the root of the trees for the failed tuple will time out and be replayed.
-- **Acker task dies**: In this case all the spout tuples the acker was tracking will time out and be replayed.
-- **Spout task dies**: In this case the source that the spout talks to is responsible for replaying the messages. For example, queues like Kestrel and RabbitMQ will place all pending messages back on the queue when a client disconnects.
-
-As you have seen, Storm's reliability mechanisms are completely distributed, scalable, and fault-tolerant. 
-
-### Tuning reliability
-
-Acker tasks are lightweight, so you don't need very many of them in a topology. You can track their performance through the Storm UI (component id "__acker"). If the throughput doesn't look right, you'll need to add more acker tasks. 
-
-If reliability isn't important to you -- that is, you don't care about losing tuples in failure situations -- then you can improve performance by not tracking the tuple tree for spout tuples. Not tracking a tuple tree halves the number of messages transferred since normally there's an ack message for every tuple in the tuple tree. Additionally, it requires fewer ids to be kept in each downstream tuple, reducing bandwidth usage.
-
-There are three ways to remove reliability. The first is to set Config.TOPOLOGY_ACKER_EXECUTORS to 0. In this case, Storm will call the `ack` method on the spout immediately after the spout emits a tuple. The tuple tree won't be tracked.
-
-The second way is to remove reliability on a message by message basis. You can turn off tracking for an individual spout tuple by omitting a message id in the `SpoutOutputCollector.emit` method.
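-
-In code, the difference is simply whether a message id is passed to `emit` (a sketch reusing the earlier spout emit):
-
-```java
-// Tracked: Storm builds a tuple tree and will call ack or fail with this id.
-_collector.emit(new Values("field1", "field2", 3), msgId);
-
-// Untracked: with no message id, the tuple tree is not tracked and the spout is never asked to replay it.
-_collector.emit(new Values("field1", "field2", 3));
-```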
-
-Finally, if you don't care if a particular subset of the tuples downstream in the topology fail to be processed, you can emit them as unanchored tuples. Since they're not anchored to any spout tuples, they won't cause any spout tuples to fail if they aren't acked.

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Home.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Home.md b/docs/documentation/Home.md
deleted file mode 100644
index 452dfba..0000000
--- a/docs/documentation/Home.md
+++ /dev/null
@@ -1,69 +0,0 @@
----
-title: Storm Documentation
-layout: documentation
----
-Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, [is used by many companies](/documentation/Powered-By.html), and is a lot of fun to use!
-
-### Read these first
-
-* [Rationale](Rationale.html)
-* [Tutorial](Tutorial.html)
-* [Setting up development environment](Setting-up-development-environment.html)
-* [Creating a new Storm project](Creating-a-new-Storm-project.html)
-
-### Documentation
-
-* [Documentation Index](/doc-index.html)
-* [Manual](Documentation.html)
-* [Javadoc](/javadoc/apidocs/index.html)
-* [FAQ](FAQ.html)
-
-### Getting help
-
-__NOTE:__ The google groups account storm-user@googlegroups.com is now officially deprecated in favor of the Apache-hosted user/dev mailing lists.
-
-#### Storm Users
-Storm users should send messages and subscribe to [user@storm.apache.org](mailto:user@storm.apache.org).
-
-You can subscribe to this list by sending an email to [user-subscribe@storm.apache.org](mailto:user-subscribe@storm.apache.org). Likewise, you can cancel a subscription by sending an email to [user-unsubscribe@storm.apache.org](mailto:user-unsubscribe@storm.apache.org).
-
-You can view the archives of the mailing list [here](http://mail-archives.apache.org/mod_mbox/storm-user/).
-
-#### Storm Developers
-Storm developers should send messages and subscribe to [dev@storm.apache.org](mailto:dev@storm.apache.org).
-
-You can subscribe to this list by sending an email to [dev-subscribe@storm.apache.org](mailto:dev-subscribe@storm.apache.org). Likewise, you can cancel a subscription by sending an email to [dev-unsubscribe@storm.apache.org](mailto:dev-unsubscribe@storm.apache.org).
-
-You can view the archives of the mailing list [here](http://mail-archives.apache.org/mod_mbox/storm-dev/).
-
-#### Which list should I send/subscribe to?
-If you are using a pre-built binary distribution of Storm, then chances are you should send questions, comments, storm-related announcements, etc. to [user@storm.apache.org](mailto:user@storm.apache.org).
-
-If you are building Storm from source, developing new features, or otherwise hacking the Storm source code, then [dev@storm.apache.org](mailto:dev@storm.apache.org) is more appropriate.
-
-#### What will happen with storm-user@googlegroups.com?
-All existing messages will remain archived there, and can be accessed/searched [here](https://groups.google.com/forum/#!forum/storm-user).
-
-New messages sent to storm-user@googlegroups.com will either be rejected/bounced or replied to with a message to direct the email to the appropriate Apache-hosted group.
-
-#### IRC
-You can also come to the #storm-user room on [freenode](http://freenode.net/). You can usually find a Storm developer there to help you out.
-
-
-
-### Related projects
-
-* [storm-contrib](https://github.com/nathanmarz/storm-contrib)
-* [storm-deploy](http://github.com/nathanmarz/storm-deploy): One click deploys for Storm clusters on AWS
-* [Spout implementations](Spout-implementations.html)
-* [DSLs and multilang adapters](DSLs-and-multilang-adapters.html)
-* [Serializers](Serializers.html)
-
-### Contributing to Storm
-
-* [Contributing to Storm](Contributing-to-Storm.html)
-* [Project ideas](Project-ideas.html)
-
-### Powered by Storm
-
-[Companies and projects powered by Storm](Powered-By.html)
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Hooks.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Hooks.md b/docs/documentation/Hooks.md
deleted file mode 100644
index e923348..0000000
--- a/docs/documentation/Hooks.md
+++ /dev/null
@@ -1,9 +0,0 @@
----
-title: Hooks
-layout: documentation
-documentation: true
----
-Storm provides hooks with which you can insert custom code to run on any number of events within Storm. You create a hook by extending the [BaseTaskHook](/javadoc/apidocs/backtype/storm/hooks/BaseTaskHook.html) class and overriding the appropriate method for the event you want to catch. There are two ways to register your hook:
-
-1. In the `open` method of your spout or the `prepare` method of your bolt, using the [TopologyContext#addTaskHook](/javadoc/apidocs/backtype/storm/task/TopologyContext.html) method.
-2. Through the Storm configuration using the ["topology.auto.task.hooks"](/javadoc/apidocs/backtype/storm/Config.html#TOPOLOGY_AUTO_TASK_HOOKS) config. These hooks are automatically registered in every spout or bolt, and are useful for doing things like integrating with a custom monitoring system.
\ No newline at end of file
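-
-As a rough sketch of the first approach (method names follow `BaseTaskHook` and the `backtype.storm.hooks.info` classes; verify the exact signatures against the Javadoc), a hook that counts acked bolt tuples might look like this, registered from a bolt's `prepare` method:
-
-```java
-public class AckCountingHook extends BaseTaskHook {
-    private long ackCount = 0;
-
-    @Override
-    public void boltAck(BoltAckInfo info) {
-        // Called each time this task acks a tuple; a real hook would forward
-        // the count to a monitoring system rather than just incrementing it.
-        ackCount++;
-    }
-}
-
-// In the bolt that should carry the hook:
-public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
-    context.addTaskHook(new AckCountingHook());
-}
-```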

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Implementation-docs.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Implementation-docs.md b/docs/documentation/Implementation-docs.md
deleted file mode 100644
index 9c3a427..0000000
--- a/docs/documentation/Implementation-docs.md
+++ /dev/null
@@ -1,20 +0,0 @@
----
-title: Storm Internal Implementation
-layout: documentation
-documentation: true
----
-This section of the wiki is dedicated to explaining how Storm is implemented. You should have a good grasp of how to use Storm before reading these sections. 
-
-- [Structure of the codebase](Structure-of-the-codebase.html)
-- [Lifecycle of a topology](Lifecycle-of-a-topology.html)
-- [Message passing implementation](Message-passing-implementation.html)
-- [Acking framework implementation](Acking-framework-implementation.html)
-- [Metrics](Metrics.html)
-- How transactional topologies work
-   - subtopology for TransactionalSpout
-   - how state is stored in ZK
-   - subtleties around what to do when emitting batches out of order
-- Unit testing
-  - time simulation
-  - complete-topology
-  - tracker clusters

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Installing-native-dependencies.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Installing-native-dependencies.md b/docs/documentation/Installing-native-dependencies.md
deleted file mode 100644
index 3207b8e..0000000
--- a/docs/documentation/Installing-native-dependencies.md
+++ /dev/null
@@ -1,38 +0,0 @@
----
-layout: documentation
----
-The native dependencies are only needed on actual Storm clusters. When running Storm in local mode, Storm uses a pure Java messaging system so that you don't need to install native dependencies on your development machine.
-
-Installing ZeroMQ and JZMQ is usually straightforward. Sometimes, however, people run into issues with autoconf and get strange errors. If you run into any issues, please email the [Storm mailing list](http://groups.google.com/group/storm-user) or come get help in the #storm-user room on freenode. 
-
-Storm has been tested with ZeroMQ 2.1.7, and this is the recommended ZeroMQ release to install. You can download a ZeroMQ release [here](http://download.zeromq.org/). Installing ZeroMQ should look something like this:
-
-```
-wget http://download.zeromq.org/zeromq-2.1.7.tar.gz
-tar -xzf zeromq-2.1.7.tar.gz
-cd zeromq-2.1.7
-./configure
-make
-sudo make install
-```
-
-JZMQ provides the Java bindings for ZeroMQ. JZMQ doesn't have any releases (we're working with them on that), so there is a risk of a regression if you install from the master branch. To avoid this, you should instead install from [this fork](http://github.com/nathanmarz/jzmq), which is tested to work with Storm. Installing JZMQ should look something like this:
-
-```
-#install jzmq
-git clone https://github.com/nathanmarz/jzmq.git
-cd jzmq
-./autogen.sh
-./configure
-make
-sudo make install
-```
-
-To get the JZMQ build to work, you may need to do one or all of the following:
-
-1. Set the JAVA_HOME environment variable appropriately
-2. Install the Java dev package (more info [here](http://codeslinger.posterous.com/getting-zeromq-and-jzmq-running-on-mac-os-x) for Mac OS X users)
-3. Upgrade autoconf on your machine
-4. Follow the instructions in [this blog post](http://blog.pmorelli.com/getting-zeromq-and-jzmq-running-on-mac-os-x)
-
-If you run into any errors when running `./configure`, [this thread](http://stackoverflow.com/questions/3522248/how-do-i-compile-jzmq-for-zeromq-on-osx) may provide a solution.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/storm/blob/93e0d028/docs/documentation/Kestrel-and-Storm.md
----------------------------------------------------------------------
diff --git a/docs/documentation/Kestrel-and-Storm.md b/docs/documentation/Kestrel-and-Storm.md
deleted file mode 100644
index f8eea2c..0000000
--- a/docs/documentation/Kestrel-and-Storm.md
+++ /dev/null
@@ -1,200 +0,0 @@
----
-title: Storm and Kestrel
-layout: documentation
-documentation: true
----
-This page explains how to use Storm to consume items from a Kestrel cluster.
-
-## Preliminaries
-### Storm
-This tutorial uses examples from the [storm-kestrel](https://github.com/nathanmarz/storm-kestrel) project and the [storm-starter](http://github.com/apache/storm/blob/master/examples/storm-starter) project. It's recommended that you clone those projects and follow along with the examples. Read [Setting up development environment](Setting-up-development-environment.html) and [Creating a new Storm project](https://github.com/apache/storm/wiki/Creating-a-new-Storm-project) to get your machine set up.
-### Kestrel
-It assumes you are able to run a Kestrel server locally as described [here](https://github.com/nathanmarz/storm-kestrel).
-
-## Kestrel Server and Queue
-A single Kestrel server has a set of queues. A Kestrel queue is a very simple message queue that runs on the JVM and uses the memcache protocol (with some extensions) to talk to clients. For details, look at the implementation of the [KestrelThriftClient](https://github.com/nathanmarz/storm-kestrel/blob/master/src/jvm/backtype/storm/spout/KestrelThriftClient.java) class provided in the [storm-kestrel](https://github.com/nathanmarz/storm-kestrel) project.
-
-Each queue is strictly ordered, following the FIFO (first in, first out) principle. For performance, items are cached in system memory, though only the first 128MB is kept in memory. When the server is stopped, the queue state is stored in a journal file.
-
-Further details can be found [here](https://github.com/nathanmarz/kestrel/blob/master/docs/guide.md).
-
-Kestrel is:
-
-* fast
-* small
-* durable
-* reliable
-
-For instance, Twitter uses Kestrel as the backbone of its messaging infrastructure as described [here](http://bhavin.directi.com/notes-on-kestrel-the-open-source-twitter-queue/).
-
-## Add items to Kestrel
-First, we need a program that can add items to a Kestrel queue. The following method makes use of the KestrelClient implementation in [storm-kestrel](https://github.com/nathanmarz/storm-kestrel). It adds sentences, chosen randomly from an array of five possible sentences, to a Kestrel queue.
-
-```
-    private static void queueSentenceItems(KestrelClient kestrelClient, String queueName)
-            throws ParseError, IOException {
-
-        String[] sentences = new String[] {
-                "the cow jumped over the moon",
-                "an apple a day keeps the doctor away",
-                "four score and seven years ago",
-                "snow white and the seven dwarfs",
-                "i am at two with nature"};
-
-        Random _rand = new Random();
-
-        for(int i=1; i<=10; i++){
-
-            String sentence = sentences[_rand.nextInt(sentences.length)];
-
-            String val = "ID " + i + " " + sentence;
-
-            boolean queueSuccess = kestrelClient.queue(queueName, val);
-
-            System.out.println("queueSuccess=" + queueSuccess + " [" + val + "]");
-        }
-    }
-```
-
-## Remove items from Kestrel
-
-This method dequeues items from a queue without removing them.
-```
-    private static void dequeueItems(KestrelClient kestrelClient, String queueName)
-            throws IOException, ParseError {
-        for(int i=1; i<=12; i++){
-
-            Item item = kestrelClient.dequeue(queueName);
-
-            if(item==null){
-                System.out.println("The queue (" + queueName + ") contains no items.");
-            }
-            else
-            {
-                byte[] data = item._data;
-
-                String receivedVal = new String(data);
-
-                System.out.println("receivedItem=" + receivedVal);
-            }
-        }
-    }
-```
-
-This method dequeues items from a queue and then removes them.
-```
-    private static void dequeueAndRemoveItems(KestrelClient kestrelClient, String queueName)
-            throws IOException, ParseError {
-        for(int i=1; i<=12; i++){
-
-            Item item = kestrelClient.dequeue(queueName);
-
-            if(item==null){
-                System.out.println("The queue (" + queueName + ") contains no items.");
-            }
-            else
-            {
-                int itemID = item._id;
-
-                byte[] data = item._data;
-
-                String receivedVal = new String(data);
-
-                // Acking the item tells Kestrel to remove it from the queue permanently.
-                kestrelClient.ack(queueName, itemID);
-
-                System.out.println("receivedItem=" + receivedVal);
-            }
-        }
-    }
-```
-
-## Add Items continuously to Kestrel
-
-This is our final program; it continuously adds sentence items to a queue called **sentence_queue** on a locally running Kestrel server.
-
-To stop it, type a closing bracket character ']' in the console and hit 'Enter'.
-
-```
-    import java.io.IOException;
-    import java.io.InputStream;
-    import java.util.Random;
-
-    import backtype.storm.spout.KestrelClient;
-    import backtype.storm.spout.KestrelClient.Item;
-    import backtype.storm.spout.KestrelClient.ParseError;
-
-    public class AddSentenceItemsToKestrel {
-
-        /**
-         * @param args
-         */
-        public static void main(String[] args) {
-
-            InputStream is = System.in;
-
-            char closing_bracket = ']';
-
-            int val = closing_bracket;
-
-            boolean aux = true;
-
-            try {
-
-                KestrelClient kestrelClient = null;
-                String queueName = "sentence_queue";
-
-                while(aux){
-
-                    kestrelClient = new KestrelClient("localhost",22133);
-
-                    queueSentenceItems(kestrelClient, queueName);
-
-                    kestrelClient.close();
-
-                    Thread.sleep(1000);
-
-                    if(is.available()>0){
-                        if(val==is.read())
-                            aux=false;
-                    }
-                }
-            } catch (IOException e) {
-                e.printStackTrace();
-            } catch (ParseError e) {
-                e.printStackTrace();
-            } catch (InterruptedException e) {
-                e.printStackTrace();
-            }
-
-            System.out.println("end");
-        }
-    }
-```
-## Using KestrelSpout
-
-This topology reads sentences off of a Kestrel queue using KestrelSpout, splits the sentences into their constituent words (Bolt: SplitSentence), and then emits for each word the number of times it has seen that word before (Bolt: WordCount). How data is processed is described in detail in [Guaranteeing message processing](Guaranteeing-message-processing.html).
-
-```
-    TopologyBuilder builder = new TopologyBuilder();
-    builder.setSpout("sentences", new KestrelSpout("localhost",22133,"sentence_queue",new StringScheme()));
-    builder.setBolt("split", new SplitSentence(), 10)
-            .shuffleGrouping("sentences");
-    builder.setBolt("count", new WordCount(), 20)
-            .fieldsGrouping("split", new Fields("word"));
-```
-
-## Execution
-
-First, start your local Kestrel server in production or development mode.
-
-Then, wait about 5 seconds to avoid a ConnectException.
-
-Now execute the program to add items to the queue and launch the Storm topology. The order in which you launch the programs is of no importance.
-
-If you run the topology with TOPOLOGY_DEBUG enabled, you should see tuples being emitted in the topology.
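-
-For reference, a minimal local-mode launch with debugging enabled might look like the following sketch (the topology name "kestrel-wordcount" is arbitrary):
-
-```
-    Config conf = new Config();
-    conf.setDebug(true);   // enables TOPOLOGY_DEBUG so emitted tuples are logged
-
-    LocalCluster cluster = new LocalCluster();
-    cluster.submitTopology("kestrel-wordcount", conf, builder.createTopology());
-```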