Posted to commits@helix.apache.org by rm...@apache.org on 2013/05/30 06:58:44 UTC

[7/9] git commit: Minor edits

Minor edits


Project: http://git-wip-us.apache.org/repos/asf/incubator-helix/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-helix/commit/676533b4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-helix/tree/676533b4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-helix/diff/676533b4

Branch: refs/heads/master
Commit: 676533b4f5c68ffadb0c4ac51827fe7bdfb6814f
Parents: 905f197
Author: Bob Schulman <bs...@linkedin.com>
Authored: Wed May 29 10:05:56 2013 -0700
Committer: Bob Schulman <bs...@linkedin.com>
Committed: Wed May 29 10:05:56 2013 -0700

----------------------------------------------------------------------
 src/site/markdown/Architecture.md                  |   91 +++---
 src/site/markdown/Concepts.md                      |   83 +++---
 src/site/markdown/Quickstart.md                    |  114 ++++++-
 src/site/markdown/index.md                         |   63 +----
 src/site/markdown/involved/building.md             |   30 --
 src/site/markdown/recipes/lock_manager.md          |  253 ---------------
 .../markdown/recipes/rabbitmq_consumer_group.md    |  227 -------------
 .../recipes/rsync_replicated_file_store.md         |  165 ----------
 src/site/markdown/recipes/service_discovery.md     |  191 -----------
 src/site/markdown/recipes/task_dag_execution.md    |  204 ------------
 src/site/markdown/tutorial_participant.md          |    4 +-
 src/site/markdown/tutorial_spectator.md            |    8 +-
 src/site/markdown/tutorial_throttling.md           |    1 -
 13 files changed, 196 insertions(+), 1238 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-helix/blob/676533b4/src/site/markdown/Architecture.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/Architecture.md b/src/site/markdown/Architecture.md
index bf7b4d0..7acf590 100644
--- a/src/site/markdown/Architecture.md
+++ b/src/site/markdown/Architecture.md
@@ -18,11 +18,11 @@ under the License.
 -->
 
 
-Helix aims to provide the following abilities to a distributed system
+Helix aims to provide the following abilities to a distributed system:
 
-* Auto management of a cluster hosting partitioned, replicated resources
+* Automatic management of a cluster hosting partitioned, replicated resources.
 * Soft and hard failure detection and handling.
-* Automatic load balancing via smart placement of resources on servers(nodes) based on server capacity and resource profile (size of partition, access patterns, etc)
+* Automatic load balancing via smart placement of resources on servers (nodes) based on server capacity and resource profile (size of partition, access patterns, etc.).
 * Centralized config management and self discovery. Eliminates the need to modify config on each node.
 * Fault tolerance and optimized rebalancing during cluster expansion.
 * Manages entire operational lifecycle of a node. Addition, start, stop, enable/disable without downtime.
@@ -57,10 +57,10 @@ We have divided Helix in 3 logical components based on their responsibility
 3. CONTROLLER: The controller observes and controls the PARTICIPANT nodes. It is responsible for coordinating all transitions in the cluster and ensuring that state constraints are satisfied and cluster stability is maintained. 
 
 
-These are simply logical components and can be deployed as per the system requirements. For example.
+These are simply logical components and can be deployed as per the system requirements. For example:
 
-1. Controller can be deployed as a separate service. 
-2. Controller can be deployed along with Participant but only one Controller will be active at any given time.
+1. Controller can be deployed as a separate service
+2. Controller can be deployed along with a Participant but only one Controller will be active at any given time.
 
 Both have pros and cons, which will be discussed later, and one can choose the mode of deployment as per system needs.
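+
+As a rough sketch, an embedded controller can be started from inside a node process using HelixControllerMain (the ZooKeeper address and cluster name below are placeholders):
+
+```
+import org.apache.helix.HelixManager;
+import org.apache.helix.controller.HelixControllerMain;
+
+public class EmbeddedController
+{
+  public static void main(String[] args) throws Exception
+  {
+    // Start a controller in this process; when several participants do this,
+    // Helix ensures that only one controller is active at a time.
+    HelixManager controller =
+        HelixControllerMain.startHelixController("localhost:2199",
+                                                 "MYCLUSTER",
+                                                 "controller",
+                                                 HelixControllerMain.STANDALONE);
+    Thread.currentThread().join();
+  }
+}
+```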
 
@@ -69,13 +69,13 @@ Both have pros and cons, which will be discussed later and one can chose the mod
 
 We need a distributed store to maintain the state of the cluster and a notification system to notify if there is any change in the cluster state. Helix uses Zookeeper to achieve this functionality.
 
-Zookeeper provides
+Zookeeper provides:
 
 * A way to represent PERSISTENT state which basically remains until it is deleted.
 * A way to represent TRANSIENT/EPHEMERAL state which vanishes when the process that created the STATE dies.
 * Notification mechanism when there is a change in PERSISTENT/EPHEMERAL STATE
 
-The name space provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (/). Every node[ZNODE] in ZooKeeper's name space is identified by a path.
+The namespace provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (/). Every node (ZNODE) in ZooKeeper's namespace is identified by a path.
 
 More info on Zookeeper can be found at http://zookeeper.apache.org
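+
+To make the PERSISTENT/EPHEMERAL distinction concrete, here is a rough sketch using the ZkClient wrapper that Helix itself builds on (the paths and port below are made up for illustration):
+
+```
+import java.util.List;
+
+import org.I0Itec.zkclient.IZkChildListener;
+import org.I0Itec.zkclient.ZkClient;
+
+public class ZkStateDemo
+{
+  public static void main(String[] args) throws Exception
+  {
+    ZkClient zkclient = new ZkClient("localhost:2199");
+
+    // PERSISTENT state: remains until it is explicitly deleted
+    if (!zkclient.exists("/demo"))
+    {
+      zkclient.createPersistent("/demo");
+    }
+
+    // EPHEMERAL state: vanishes when the session that created it dies
+    zkclient.createEphemeral("/demo/localhost_12913");
+
+    // Notification when the children of /demo change
+    zkclient.subscribeChildChanges("/demo", new IZkChildListener()
+    {
+      public void handleChildChange(String parentPath, List<String> currentChilds)
+      {
+        System.out.println(parentPath + " now has children: " + currentChilds);
+      }
+    });
+
+    Thread.sleep(60000); // keep the session alive long enough to observe notifications
+    zkclient.close();
+  }
+}
+```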
 
@@ -83,14 +83,14 @@ More info on Zookeeper can be found here http://zookeeper.apache.org
 
 Even though the concept of Resource, Partition, Replicas is common to most distributed systems, one thing that differentiates one distributed system from another is the way each partition is assigned a state and the constraints on each state.
 
-For example, 
+For example:
 
-1. If a system is serving READ ONLY data then all partitions replicas are equal and they can either be ONLINE or OFFLINE.
-2. If a system takes BOTH READ and WRITES but ensure that WRITES go through only one partition then the states will be MASTER,SLAVE and OFFLINE. Writes go through the MASTER and is replicated to the SLAVES. Optionally, READS can go through SLAVES  
+1. If a system is serving READ ONLY data, then all of a partition's replicas are equal and they can either be ONLINE or OFFLINE.
+2. If a system takes BOTH READS and WRITES but ensures that WRITES go through only one partition, then the states will be MASTER, SLAVE and OFFLINE. Writes go through the MASTER and are replicated to the SLAVES. Optionally, READS can go through the SLAVES.
 
-Apart from defining State for each partition, the transition path to each STATE can be application specific. For example, in order to become master it might be a requirement to first become a SLAVE. This ensures that if the SLAVE does not have the data as part of OFFLINE-SLAVE transition it can bootstrap data from other nodes in the system.
+Apart from defining STATE for each partition, the transition path to each STATE can be application specific. For example, in order to become MASTER it might be a requirement to first become a SLAVE. This ensures that if the SLAVE does not have the data, it can bootstrap it from other nodes in the system as part of the OFFLINE-SLAVE transition.
 
-Helix provides a way to configure an application specific state machine along with constraints on each state. Along with constraints on State, Helix also provides a way to specify constraints on transitions.(More on this later)
+Helix provides a way to configure an application specific state machine along with constraints on each state. Along with constraints on STATE, Helix also provides a way to specify constraints on transitions.  (More on this later.)
 
 ```
           OFFLINE  | SLAVE  |  MASTER  
@@ -113,25 +113,25 @@ MASTER  | SLAVE    | SLAVE  |   N/A   |
 
 The following terminologies are used in Helix to model a state machine.
 
-* IDEALSTATE:The state in which we need the cluster to be in if all nodes are up and running. In other words, all state constraints are satisfied.
-* CURRENSTATE:Represents the current state of each node in the cluster 
-* EXTERNALVIEW:Represents the combined view of CURRENTSTATE of all nodes.  
+* IDEALSTATE: The state in which we need the cluster to be if all nodes are up and running. In other words, all state constraints are satisfied.
+* CURRENTSTATE: Represents the current state of each node in the cluster.
+* EXTERNALVIEW: Represents the combined view of CURRENTSTATE of all nodes.  
 
-Goal of Helix is always to make the CURRENTSTATE of the system same as the IDEALSTATE. Some scenarios where this may not be true are
+The goal of Helix is always to make the CURRENTSTATE of the system the same as the IDEALSTATE. Some scenarios where this may not be true are:
 
 * When all nodes are down
 * When one or more nodes fail
-* New nodes are added and the partitions need to be reassigned.
+* New nodes are added and the partitions need to be reassigned
 
 ### IDEALSTATE
 
-Helix lets the application define the IdealState on a resource basis which basically consists of
+Helix lets the application define the IdealState on a per-resource basis, which consists of:
 
-* List of partitions. Example 64
-* Number of replicas for each partition. Example 3
+* List of partitions. Example: 64
+* Number of replicas for each partition. Example: 3
 * Node and State for each replica.
 
-Example
+Example:
 
 * Partition-1, replica-1, Master, Node-1
 * Partition-1, replica-2, Slave, Node-2
@@ -144,7 +144,7 @@ Helix comes with various algorithms to automatically assign the partitions to no
 
 ### CURRENTSTATE
 
-Every Instance in the cluster hosts one or more partitions of a resource. Each of the partitions has a State associated with it.
+Every instance in the cluster hosts one or more partitions of a resource. Each of the partitions has a State associated with it.
 
 Example Node-1
 
@@ -169,20 +169,19 @@ External clients needs to know the state of each partition in the cluster and th
 
 Mode of operation in a cluster
 
-A node process can be one of the following
+A node process can be one of the following:
 
-* PARTICIPANT: The process registers itself in the cluster and acts on the messages received in its queue and updates the current state.  Example:Storage Node
-* SPECTATOR: The process is simply interested in the changes in the externalView. Router is a spectator of Storage cluster.
-* CONTROLLER:This process actively controls the cluster by reacting to changes in Cluster State and sending messages to PARTICIPANTS
+* PARTICIPANT: The process registers itself in the cluster and acts on the messages received in its queue and updates the current state.  Example: Storage Node
+* SPECTATOR: The process is simply interested in the changes in the ExternalView. The Router is a spectator of the Storage cluster.
+* CONTROLLER: This process actively controls the cluster by reacting to changes in Cluster State and sending messages to PARTICIPANTS.
 
 
 ### Participant Node Process
 
-
 * When Node starts up, it registers itself under LIVEINSTANCES
 * After registering, it waits for new Messages in the message queue
-* When it receives a message, it will perform the required task as indicated in the message.
-* After the task is completed, depending on the task outcome it updates the CURRENTSTATE.
+* When it receives a message, it will perform the required task as indicated in the message
+* After the task is completed, depending on the task outcome it updates the CURRENTSTATE
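+
+A minimal participant sketch (cluster name, host and port are placeholders; the factory method name is an assumption and may differ across Helix versions):
+
+```
+import org.apache.helix.HelixManager;
+import org.apache.helix.HelixManagerFactory;
+import org.apache.helix.InstanceType;
+import org.apache.helix.NotificationContext;
+import org.apache.helix.model.Message;
+import org.apache.helix.participant.statemachine.StateModel;
+import org.apache.helix.participant.statemachine.StateModelFactory;
+import org.apache.helix.participant.statemachine.Transition;
+
+public class ParticipantProcess
+{
+  // One state model is created per partition; Helix invokes these callbacks
+  // when a message arrives and then records the new CURRENTSTATE.
+  public static class MasterSlaveModel extends StateModel
+  {
+    @Transition(to = "SLAVE", from = "OFFLINE")
+    public void onBecomeSlaveFromOffline(Message message, NotificationContext context)
+    {
+      System.out.println("Became SLAVE for " + message.getPartitionName());
+    }
+
+    @Transition(to = "MASTER", from = "SLAVE")
+    public void onBecomeMasterFromSlave(Message message, NotificationContext context)
+    {
+      System.out.println("Became MASTER for " + message.getPartitionName());
+    }
+  }
+
+  public static class MasterSlaveModelFactory extends StateModelFactory<MasterSlaveModel>
+  {
+    public MasterSlaveModel createNewStateModel(String partitionName)
+    {
+      return new MasterSlaveModel();
+    }
+  }
+
+  public static void main(String[] args) throws Exception
+  {
+    HelixManager manager =
+        HelixManagerFactory.getZKHelixManager("MYCLUSTER", "localhost_12913",
+                                              InstanceType.PARTICIPANT, "localhost:2199");
+    manager.getStateMachineEngine().registerStateModelFactory("MasterSlave",
+                                                              new MasterSlaveModelFactory());
+    manager.connect();             // registers this node under LIVEINSTANCES
+    Thread.currentThread().join(); // wait for messages from the controller
+  }
+}
+```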
 
 ### Controller Process
 
@@ -192,10 +191,10 @@ A node process can be one of the following
 
 ### Spectator Process
 
-* When the process starts it asks cluster manager agent to be notified of changes in ExternalView
-* When ever it receives a notification it reads the externalView and performs required duties. For Router, it updates its routing table.
+* When the process starts, it asks the cluster manager agent to be notified of changes in ExternalView
+* Whenever it receives a notification, it reads the ExternalView and performs the required duties. For the Router, it updates its routing table.
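+
+A rough spectator sketch, assuming the ExternalViewChangeListener callback interface (cluster, instance and resource names are placeholders):
+
+```
+import java.util.List;
+
+import org.apache.helix.ExternalViewChangeListener;
+import org.apache.helix.HelixManager;
+import org.apache.helix.HelixManagerFactory;
+import org.apache.helix.InstanceType;
+import org.apache.helix.NotificationContext;
+import org.apache.helix.model.ExternalView;
+
+public class RouterSpectator
+{
+  public static void main(String[] args) throws Exception
+  {
+    HelixManager manager =
+        HelixManagerFactory.getZKHelixManager("MYCLUSTER", "router_1",
+                                              InstanceType.SPECTATOR, "localhost:2199");
+    manager.connect();
+
+    // Ask to be notified whenever the ExternalView changes
+    manager.addExternalViewChangeListener(new ExternalViewChangeListener()
+    {
+      public void onExternalViewChange(List<ExternalView> externalViewList,
+                                       NotificationContext changeContext)
+      {
+        // A router would rebuild its routing table here
+        for (ExternalView view : externalViewList)
+        {
+          System.out.println("ExternalView changed for resource " + view.getResourceName());
+        }
+      }
+    });
+
+    Thread.currentThread().join();
+  }
+}
+```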
 
-h4. Interaction between controller, participant and spectator
+#### Interaction between controller, participant and spectator
 
 The following picture shows how controllers, participants and spectators interact with each other.
 
@@ -203,22 +202,22 @@ The following picture shows how controllers, participants and spectators interac
 
 ## Core algorithm
 
-
-* Controller gets the Ideal State and the Current State of active storage nodes from ZK
-* Compute the delta between Ideal State and Current State for each partition across all participant nodes
-* For each partition compute tasks based on State Machine Table. Its possible to configure priority on the state Transition. For example in case of Master Slave:
-    * Attempt Mastership transfer if possible without violating constraint.
+* Controller gets the IdealState and the CurrentState of active storage nodes from Zookeeper
+* Compute the delta between IdealState and CurrentState for each partition across all participant nodes
+* For each partition compute tasks based on the State Machine Table. It's possible to configure priority on the state transitions. For example, in case of Master-Slave:
+    * Attempt mastership transfer if possible without violating constraint.
     * Partition Addition
     * Drop Partition 
-* Add the tasks in parallel if possible to respective queue for each storage node keeping in mind
-* The tasks added are mutually independent.
-* If a task is dependent on another task being completed do not add that task.
-* After any task is completed by Participant, Controllers gets notified of the change and State Transition algorithm is re-run until the current state is same as Ideal State.
+* Add the tasks in parallel if possible to the respective queue for each storage node (if the tasks added are mutually independent)
+* If a task is dependent on another task being completed, do not add that task
+* After any task is completed by a Participant, the Controller gets notified of the change and the State Transition algorithm is re-run until the CurrentState is the same as the IdealState.
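+
+The delta computation above can be pictured with a toy example (illustrative only, not Helix's actual implementation):
+
+```
+import java.util.HashMap;
+import java.util.Map;
+
+public class RebalanceSketch
+{
+  public static void main(String[] args)
+  {
+    // State of one partition on each node, keyed by node name
+    Map<String, String> idealState = new HashMap<String, String>();
+    idealState.put("Node-1", "MASTER");
+    idealState.put("Node-2", "SLAVE");
+
+    Map<String, String> currentState = new HashMap<String, String>();
+    currentState.put("Node-1", "SLAVE"); // Node-1 has not been promoted yet
+    currentState.put("Node-2", "SLAVE");
+
+    // For every node whose CurrentState differs from the IdealState,
+    // a state transition message would be queued for that participant
+    for (Map.Entry<String, String> entry : idealState.entrySet())
+    {
+      String want = entry.getValue();
+      String have = currentState.get(entry.getKey());
+      if (!want.equals(have))
+      {
+        System.out.println("Send transition " + have + "->" + want + " to " + entry.getKey());
+      }
+    }
+  }
+}
+```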
 
 ## Helix znode layout
 
 Helix organizes znodes under clusterName in multiple levels. 
-The top level (under clusterName) znodes are all Helix defined and in upper case
+
+The top level (under clusterName) znodes are all Helix defined and in upper case:
+
 * PROPERTYSTORE: application property store
 * STATEMODELDEFS: state model definitions
 * INSTANCES: instance runtime information including current state and messages
@@ -229,15 +228,17 @@ The top level (under clusterName) znodes are all Helix defined and in upper case
 * CONTROLLER: cluster controller runtime information
 
 Under INSTANCES, there are runtime znodes for each instance. An instance organizes znodes as follows:
+
 * CURRENTSTATES
- * sessionId
-  * resourceName
+    * sessionId
+    * resourceName
 * ERRORS
 * STATUSUPDATES
 * MESSAGES
 * HEALTHREPORT
 
 Under CONFIGS, there are different scopes of configurations:
+
 * RESOURCE: contains resource scope configurations
 * CLUSTER: contains cluster scope configurations
 * PARTICIPANT: contains participant scope configurations
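+
+A quick way to explore this layout is to list the children of the cluster znode directly (a rough sketch using the raw ZkClient; the cluster name and port are placeholders):
+
+```
+import org.I0Itec.zkclient.ZkClient;
+
+public class ZnodeLayoutBrowser
+{
+  public static void main(String[] args)
+  {
+    ZkClient zkclient = new ZkClient("localhost:2199");
+    String clusterName = "MYCLUSTER";
+
+    // Top-level, Helix-defined znodes (PROPERTYSTORE, INSTANCES, CONFIGS, ...)
+    for (String child : zkclient.getChildren("/" + clusterName))
+    {
+      System.out.println("/" + clusterName + "/" + child);
+    }
+
+    // Configuration scopes under CONFIGS
+    for (String scope : zkclient.getChildren("/" + clusterName + "/CONFIGS"))
+    {
+      System.out.println("/" + clusterName + "/CONFIGS/" + scope);
+    }
+
+    zkclient.close();
+  }
+}
+```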

http://git-wip-us.apache.org/repos/asf/incubator-helix/blob/676533b4/src/site/markdown/Concepts.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/Concepts.md b/src/site/markdown/Concepts.md
index 9fb8eb7..24f15a3 100644
--- a/src/site/markdown/Concepts.md
+++ b/src/site/markdown/Concepts.md
@@ -17,16 +17,16 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Helix is based on the simple fact that a given task has the following attributes associated with it 
+Helix is based on the idea that a given task has the following attributes associated with it:
 
-* Location of the task, for example it runs on  Node N1
-* State, for examples its running, stopped etc.
+* _Location of the task_. For example, it runs on Node N1
+* _State_. For example, it is running, stopped, etc.
 
-A task is referred to as a 'resource'. 
+In Helix terminology, a task is referred to as a _resource_.
 
-### IDEALSTATE
+### IdealState
 
-Ideal state simply allows one to map tasks to location and state. A standard way of expressing this in Helix is
+IdealState simply allows one to map tasks to location and state. A standard way of expressing this in Helix:
 
 ```
   "TASK_NAME" : {
@@ -34,7 +34,7 @@ Ideal state simply allows one to map tasks to location and state. A standard way
   }
 
 ```
-Consider a simple case where you want to launch a task 'myTask' on node 'N1'. The idealstate for this can be expressed as follows
+Consider a simple case where you want to launch a task 'myTask' on node 'N1'. The IdealState for this can be expressed as follows:
 
 ```
 {
@@ -46,11 +46,11 @@ Consider a simple case where you want to launch a task 'myTask' on node 'N1'. Th
   }
 }
 ```
-#### PARTITION
+### Partition
 
-If this task get too big to fit on one box, you might want to divide it into subTasks. Each subTask is referred to as 'partition' in Helix. Lets say you want to divide the task into 3 subTasks/partitions, the idealstate can be changed as shown below. 
+If this task gets too big to fit on one box, you might want to divide it into subTasks. Each subTask is referred to as a _partition_ in Helix. Let's say you want to divide the task into 3 subTasks/partitions. The IdealState can then be changed as shown below.
 
-'myTask_0', 'myTask_1', 'myTask_2' are logical names representing the partitions of myTask. Each tasks runs on N1,N2 and N3 respectively.
+'myTask_0', 'myTask_1' and 'myTask_2' are logical names representing the partitions of myTask. They run on N1, N2 and N3, respectively.
 
 ```
 {
@@ -72,13 +72,13 @@ If this task get too big to fit on one box, you might want to divide it into sub
 }
 ```
 
-#### REPLICA
+### Replica
 
-Partitioning allows one to split the data/task into multiple subparts. But lets say the request rate each partition increases. The common solution is to have multiple copies for each partition. Helix refers to the copy of a partition as 'replica'. Adding replica also increases the availability of the system during failures. One can see this methodology employed often in Search systems. The index is divided into shard and each shard has multiple copies.
+Partitioning allows one to split the data/task into multiple subparts. But let's say the request rate on each partition increases. The common solution is to have multiple copies for each partition. Helix refers to the copy of a partition as a _replica_.  Adding a replica also increases the availability of the system during failures. One can see this methodology employed often in Search systems. The index is divided into shards, and each shard has multiple copies.
 
-Lets say you want to add one additional replica for each task. The idealstate can simply be changed as shown below. 
+Let's say you want to add one additional replica for each task. The IdealState can simply be changed as shown below.
 
-For increasing the availability of the system, its better to place replica of given partition on different nodes.
+For increasing the availability of the system, it's better to place the replicas of a given partition on different nodes.
 
 ```
 {
@@ -104,11 +104,11 @@ For increasing the availability of the system, its better to place replica of gi
 }
 ```
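+
+The same partitioned, replicated resource can also be created through the admin API and the mapping left to Helix (a sketch; the ZooKeeper address and cluster name are placeholders, and the cluster and its nodes are assumed to exist already):
+
+```
+import org.apache.helix.manager.zk.ZKHelixAdmin;
+
+public class CreatePartitionedResource
+{
+  public static void main(String[] args)
+  {
+    ZKHelixAdmin admin = new ZKHelixAdmin("localhost:2199");
+
+    // 3 partitions of myTask, managed with the OnlineOffline state model,
+    // with placement computed automatically by Helix
+    admin.addResource("mycluster", "myTask", 3, "OnlineOffline", "AUTO_REBALANCE");
+
+    // 2 replicas per partition; Helix fills in the IdealState mapping
+    admin.rebalance("mycluster", "myTask", 2);
+  }
+}
+```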
 
-#### STATE 
+### State 
 
-Now lets take a slightly complicated scenario where a task represents a database.  Unlike an index which is in general read only, database supports both reads and write. Keeping the data consistent among the replica is crucial in distributed data stores. One commonly applied technique is to assign one replica as MASTER and remaining as SLAVE. All writes go to MASTER and are then replicated to SLAVE.
+Now let's take a slightly complicated scenario where a task represents a database.  Unlike an index which is in general read-only, a database supports both reads and writes. Keeping the data consistent among the replicas is crucial in distributed data stores. One commonly applied technique is to assign one replica as MASTER and remaining replicas as SLAVE. All writes go to the MASTER and are then replicated to the SLAVE replicas.
 
-Helix allows one to assign different states to each replica. Lets say you have two mysql instances N1 and N2 where one will serve as MASTER and another as SLAVE. The ideal state can be changed to
+Helix allows one to assign different states to each replica. Let's say you have two MySQL instances N1 and N2, where one will serve as MASTER and another as SLAVE. The IdealState can be changed to:
 
 ```
 {
@@ -128,18 +128,17 @@ Helix allows one to assign different states to each replica. Lets say you have t
 ```
 
 
-### STATE MACHINE and TRANSITIONS
+### State Machine and Transitions
 
-Idealstate allows one to exactly specify the desired state of the cluster. Given an idealstate, Helix takes up the responsibility of ensuring that cluster reaches idealstate. Helix CONTROLLER reads the idealstate and then commands the PARTICIPANT to take appropriate actions to move from one state to another until it matches Idealstate. These actions are referred to as 'transitions' in Helix.
+IdealState allows one to exactly specify the desired state of the cluster. Given an IdealState, Helix takes up the responsibility of ensuring that the cluster reaches the IdealState.  The Helix _controller_ reads the IdealState and then commands the Participant to take appropriate actions to move from one state to another until it matches the IdealState.  These actions are referred to as _transitions_ in Helix.
 
-Next logical question is, how does the CONTROLLER compute the transitions required to get to idealstate. This is where finite state machine concept comes in. Helix allows applications to plug in FSM. A state machine consists of the following
+The next logical question is:  how does the _controller_ compute the transitions required to get to IdealState?  This is where the finite state machine concept comes in. Helix allows applications to plug in a finite state machine.  A state machine consists of the following:
 
-* STATE : Describes the role of a replica
-* TRANSITION: An action that allows a replica to move from one STATE to another, thus changing its role.
+* State: Describes the role of a replica
+* Transition: An action that allows a replica to move from one State to another, thus changing its role.
 
 Here is an example of the MASTERSLAVE state machine:
 
-
 ```
           OFFLINE  | SLAVE  |  MASTER  
          _____________________________
@@ -155,7 +154,7 @@ MASTER  | SLAVE    | SLAVE  |   N/A   |
 
 ```
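+
+Such a state machine can also be defined programmatically, assuming the StateModelDefinition.Builder API (a rough sketch; the ZooKeeper address and cluster name are placeholders):
+
+```
+import org.apache.helix.manager.zk.ZKHelixAdmin;
+import org.apache.helix.model.StateModelDefinition;
+
+public class DefineMasterSlave
+{
+  public static void main(String[] args)
+  {
+    StateModelDefinition.Builder builder = new StateModelDefinition.Builder("MasterSlave");
+
+    // States and the initial state
+    builder.addState("MASTER", 1);
+    builder.addState("SLAVE", 2);
+    builder.addState("OFFLINE", 3);
+    builder.initialState("OFFLINE");
+
+    // Legal transitions, mirroring the table above
+    builder.addTransition("OFFLINE", "SLAVE");
+    builder.addTransition("SLAVE", "MASTER");
+    builder.addTransition("MASTER", "SLAVE");
+    builder.addTransition("SLAVE", "OFFLINE");
+
+    // Constraints: at most 1 MASTER per partition, SLAVE count follows the replica count
+    builder.upperBound("MASTER", 1);
+    builder.dynamicUpperBound("SLAVE", "R");
+
+    ZKHelixAdmin admin = new ZKHelixAdmin("localhost:2199");
+    admin.addStateModelDef("mycluster", "MasterSlave", builder.build());
+  }
+}
+```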
 
-Helix allows each resource to be associated with one state machine. This means you can have one resource as a index and another as database in the same cluster. One can associate each resource with a state machine as follows
+Helix allows each resource to be associated with one state machine. This means you can have one resource as an index and another as a database in the same cluster. One can associate each resource with a state machine as follows:
 
 ```
 {
@@ -175,12 +174,12 @@ Helix allows each resource to be associated with one state machine. This means y
 
 ```
 
-### CURRENT STATE
+### Current State
 
-Currentstate of a resource simply represents its actual state at a PARTICIPANT. In the below example, 
+CurrentState of a resource simply represents its actual state at a PARTICIPANT. In the below example:
 
-* 'INSTANCE_NAME' : Unique name representing the process.
-* 'SESSION_ID': Id that is automatically assigned every time a process joins the cluster. 
+* INSTANCE_NAME: Unique name representing the process
+* SESSION_ID: ID that is automatically assigned every time a process joins the cluster
 
 ```
 {
@@ -203,11 +202,11 @@ Currentstate of a resource simply represents its actual state at a PARTICIPANT.
   }
 }
 ```
-Each node in the cluster has its own Current state.
+Each node in the cluster has its own CurrentState.
 
-### EXTERNAL VIEW
+### External View
 
-In order to communicate with the PARTICIPANTs, external clients need to know the current state of each of the PARTICIPANT. The external clients are referred to as SPECTATORS. In order to make the life of SPECTATOR simple, Helix provides EXTERNALVIEW that is an aggregated view of the current state across all nodes. The EXTERNALVIEW has similar format as IDEALSTATE.
+In order to communicate with the PARTICIPANTs, external clients need to know the current state of each of the PARTICIPANTs. The external clients are referred to as SPECTATORS. To make the life of a SPECTATOR simple, Helix provides an EXTERNALVIEW, which is an aggregated view of the current state across all nodes. The EXTERNALVIEW has a format similar to IDEALSTATE.
 
 ```
 {
@@ -232,29 +231,29 @@ In order to communicate with the PARTICIPANTs, external clients need to know the
 }
 ```
 
-### REBALANCER
+### Rebalancer
 
-The core component of Helix is the CONTROLLER which runs the REBALANCER algorithm on every cluster event. Cluster event can be one of the following
+The core component of Helix is the CONTROLLER, which runs the REBALANCER algorithm on every cluster event. A cluster event can be one of the following:
 
 * Nodes start/stop and soft/hard failures
 * New nodes are added/removed
 * Ideal state changes
 
-There are few more like config changes etc but the key point to take away is there are many ways to trigger the rebalancer.
+There are a few more, such as config changes.  The key takeaway: there are many ways to trigger the rebalancer.
 
-When a rebalancer is run it simply does the following
+When the rebalancer runs, it simply does the following:
 
-* Compares the idealstate and current state
-* Computes the transitions required to reach the idealstate.
-* Issues the transitions to PARTICIPANT
+* Compares the IdealState and the CurrentState
+* Computes the transitions required to reach the IdealState
+* Issues the transitions to each PARTICIPANT
 
-The above steps happen for every change in the system. Once current state matches the idealstate the system is considered stable which implies  'IDEALSTATE = CURRENTSTATE = EXTERNALVIEW'
+The above steps happen for every change in the system. Once the current state matches the IdealState, the system is considered stable, which implies 'IDEALSTATE = CURRENTSTATE = EXTERNALVIEW'.
 
-### DYNAMIC IDEALSTATE
+### Dynamic IdealState
 
-One of the things that makes Helix powerful is that idealstate can be changed dynamically. This means one can listen to cluster events like node failures and dynamically change the ideal state. Helix will then take care of triggering the respective transitions in the system.
+One of the things that makes Helix powerful is that IdealState can be changed dynamically. This means one can listen to cluster events like node failures and dynamically change the IdealState. Helix will then take care of triggering the respective transitions in the system.
 
-Helix comes with few algorithms to automatically compute the idealstate based on the constraints. For e.g. if you have a resource 3 partitions and 2 replicas, Helix can automatically compute the idealstate based on the nodes that are currently active. See features page to find out more about various execution modes of Helix like AUTO_REBALANCE, AUTO and CUSTOM. 
+Helix comes with a few algorithms to automatically compute the IdealState based on the constraints. For example, if you have a resource with 3 partitions and 2 replicas, Helix can automatically compute the IdealState based on the nodes that are currently active. See the [tutorial](./tutorial_rebalance.html) to find out more about the various execution modes of Helix, such as AUTO_REBALANCE, AUTO and CUSTOM.
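+
+As a sketch of what this looks like in practice, one could listen for live-instance changes and ask Helix to recompute the assignment whenever the set of live nodes changes (names below are placeholders, and the myTask resource from the examples above is assumed to exist):
+
+```
+import java.util.List;
+
+import org.apache.helix.HelixManager;
+import org.apache.helix.HelixManagerFactory;
+import org.apache.helix.InstanceType;
+import org.apache.helix.LiveInstanceChangeListener;
+import org.apache.helix.NotificationContext;
+import org.apache.helix.manager.zk.ZKHelixAdmin;
+import org.apache.helix.model.LiveInstance;
+
+public class DynamicRebalance
+{
+  public static void main(String[] args) throws Exception
+  {
+    final ZKHelixAdmin admin = new ZKHelixAdmin("localhost:2199");
+    HelixManager manager =
+        HelixManagerFactory.getZKHelixManager("mycluster", "idealstate-updater",
+                                              InstanceType.SPECTATOR, "localhost:2199");
+    manager.connect();
+
+    // Whenever nodes join or fail, ask Helix to recompute the IdealState
+    manager.addLiveInstanceChangeListener(new LiveInstanceChangeListener()
+    {
+      public void onLiveInstanceChange(List<LiveInstance> liveInstances,
+                                       NotificationContext changeContext)
+      {
+        admin.rebalance("mycluster", "myTask", 2);
+      }
+    });
+
+    Thread.currentThread().join();
+  }
+}
+```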
 
 
 

http://git-wip-us.apache.org/repos/asf/incubator-helix/blob/676533b4/src/site/markdown/Quickstart.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/Quickstart.md b/src/site/markdown/Quickstart.md
index 6807c77..65410c3 100644
--- a/src/site/markdown/Quickstart.md
+++ b/src/site/markdown/Quickstart.md
@@ -264,6 +264,36 @@ ExternalView for myDB:
 {
   "id" : "myDB",
   "mapFields" : {
+    "myDB_0" : {
+      "localhost_12913" : "SLAVE",
+      "localhost_12914" : "MASTER",
+      "localhost_12915" : "SLAVE"
+    },
+    "myDB_1" : {
+      "localhost_12913" : "SLAVE",
+      "localhost_12914" : "SLAVE",
+      "localhost_12915" : "MASTER"
+    },
+    "myDB_2" : {
+      "localhost_12913" : "MASTER",
+      "localhost_12914" : "SLAVE",
+      "localhost_12915" : "SLAVE"
+    },
+    "myDB_3" : {
+      "localhost_12913" : "SLAVE",
+      "localhost_12914" : "SLAVE",
+      "localhost_12915" : "MASTER"
+    },
+    "myDB_4" : {
+      "localhost_12913" : "MASTER",
+      "localhost_12914" : "SLAVE",
+      "localhost_12915" : "SLAVE"
+    },
+    "myDB_5" : {
+      "localhost_12913" : "SLAVE",
+      "localhost_12914" : "MASTER",
+      "localhost_12915" : "SLAVE"
+    }
   },
   "listFields" : {
   },
@@ -286,6 +316,14 @@ Next, we\'ll show how Helix does the work that you\'d otherwise have to build in
     ./helix-admin.sh --zkSvr localhost:2199  --addNode MYCLUSTER localhost:12917
     ./helix-admin.sh --zkSvr localhost:2199  --addNode MYCLUSTER localhost:12918
 
+And start up these instances:
+
+    # start up each instance.  These are mock implementations that are actively managed by Helix
+    ./start-helix-participant.sh --zkSvr localhost:2199 --cluster MYCLUSTER --host localhost --port 12916 --stateModelType MasterSlave 2>&1 > /tmp/participant_12916.log
+    ./start-helix-participant.sh --zkSvr localhost:2199 --cluster MYCLUSTER --host localhost --port 12917 --stateModelType MasterSlave 2>&1 > /tmp/participant_12917.log
+    ./start-helix-participant.sh --zkSvr localhost:2199 --cluster MYCLUSTER --host localhost --port 12918 --stateModelType MasterSlave 2>&1 > /tmp/participant_12918.log
+
+
 And now, let Helix do the work for you.  To shift the work, simply rebalance.  After the rebalance, each node will have one master and two slaves.
 
     ./helix-admin.sh --zkSvr localhost:2199 --rebalance MYCLUSTER myDB 3
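+
+If you prefer to drive this from Java instead of the shell, the same rebalance can be issued through ZKHelixAdmin (a sketch mirroring the command above):
+
+```
+import org.apache.helix.manager.zk.ZKHelixAdmin;
+
+public class RebalanceMyDB
+{
+  public static void main(String[] args)
+  {
+    ZKHelixAdmin admin = new ZKHelixAdmin("localhost:2199");
+    // Roughly equivalent to: ./helix-admin.sh --zkSvr localhost:2199 --rebalance MYCLUSTER myDB 3
+    admin.rebalance("MYCLUSTER", "myDB", 3);
+  }
+}
+```
+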
@@ -354,6 +392,36 @@ ExternalView for myDB:
 {
   "id" : "myDB",
   "mapFields" : {
+    "myDB_0" : {
+      "localhost_12913" : "SLAVE",
+      "localhost_12914" : "SLAVE",
+      "localhost_12917" : "MASTER"
+    },
+    "myDB_1" : {
+      "localhost_12916" : "SLAVE",
+      "localhost_12917" : "SLAVE",
+      "localhost_12918" : "MASTER"
+    },
+    "myDB_2" : {
+      "localhost_12913" : "MASTER",
+      "localhost_12917" : "SLAVE",
+      "localhost_12918" : "SLAVE"
+    },
+    "myDB_3" : {
+      "localhost_12915" : "MASTER",
+      "localhost_12917" : "SLAVE",
+      "localhost_12918" : "SLAVE"
+    },
+    "myDB_4" : {
+      "localhost_12916" : "MASTER",
+      "localhost_12917" : "SLAVE",
+      "localhost_12918" : "SLAVE"
+    },
+    "myDB_5" : {
+      "localhost_12913" : "SLAVE",
+      "localhost_12914" : "MASTER",
+      "localhost_12915" : "SLAVE"
+    }
   },
   "listFields" : {
   },
@@ -369,9 +437,7 @@ Mission accomplished.  The partitions are nicely balanced.
 
 Building a fault tolerant system isn't trivial, but with Helix, it's easy.  Helix detects a failed instance, and triggers mastership transfer automatically.
 
-First, let's fail an instance:
-
-_KILL A NODE (need to find the pid via listInstanceInfo)_
+First, let's fail an instance.  In this example, we'll kill localhost:12918 to simulate a failure.
 
 We lost localhost:12918, so myDB_1 lost its MASTER.  Helix can fix that: it will transfer mastership to a healthy node that is currently a SLAVE, say localhost:12917.  Helix balances the load as best it can, given there are 6 partitions on 5 nodes.  Let's see:
 
@@ -390,23 +456,23 @@ IdealState for myDB:
     },
     "myDB_1" : {
       "localhost_12916" : "SLAVE",
-      "localhost_12917" : "MASTER"
-      "localhost_12918" : "OFFLINE"
+      "localhost_12917" : "SLAVE",
+      "localhost_12918" : "MASTER"
     },
     "myDB_2" : {
       "localhost_12913" : "MASTER",
       "localhost_12917" : "SLAVE",
-      "localhost_12918" : "OFFLINE"
+      "localhost_12918" : "SLAVE"
     },
     "myDB_3" : {
       "localhost_12915" : "MASTER",
       "localhost_12917" : "SLAVE",
-      "localhost_12918" : "OFFLINE"
+      "localhost_12918" : "SLAVE"
     },
     "myDB_4" : {
       "localhost_12916" : "MASTER",
       "localhost_12917" : "SLAVE",
-      "localhost_12918" : "OFFLINE"
+      "localhost_12918" : "SLAVE"
     },
     "myDB_5" : {
       "localhost_12913" : "SLAVE",
@@ -416,9 +482,9 @@ IdealState for myDB:
   },
   "listFields" : {
     "myDB_0" : [ "localhost_12917", "localhost_12913", "localhost_12914" ],
-    "myDB_1" : [ "localhost_12917", "localhost_12916", "localhost_12918" ],
-    "myDB_2" : [ "localhost_12913", "localhost_12917", "localhost_12918" ],
-    "myDB_3" : [ "localhost_12915", "localhost_12917", "localhost_12918" ],
+    "myDB_1" : [ "localhost_12918", "localhost_12917", "localhost_12916" ],
+    "myDB_2" : [ "localhost_12913", "localhost_12918", "localhost_12917" ],
+    "myDB_3" : [ "localhost_12915", "localhost_12918", "localhost_12917" ],
     "myDB_4" : [ "localhost_12916", "localhost_12917", "localhost_12918" ],
     "myDB_5" : [ "localhost_12914", "localhost_12915", "localhost_12913" ]
   },
@@ -435,6 +501,32 @@ ExternalView for myDB:
 {
   "id" : "myDB",
   "mapFields" : {
+    "myDB_0" : {
+      "localhost_12913" : "SLAVE",
+      "localhost_12914" : "SLAVE",
+      "localhost_12917" : "MASTER"
+    },
+    "myDB_1" : {
+      "localhost_12916" : "SLAVE",
+      "localhost_12917" : "MASTER"
+    },
+    "myDB_2" : {
+      "localhost_12913" : "MASTER",
+      "localhost_12917" : "SLAVE"
+    },
+    "myDB_3" : {
+      "localhost_12915" : "MASTER",
+      "localhost_12917" : "SLAVE"
+    },
+    "myDB_4" : {
+      "localhost_12916" : "MASTER",
+      "localhost_12917" : "SLAVE"
+    },
+    "myDB_5" : {
+      "localhost_12913" : "SLAVE",
+      "localhost_12914" : "MASTER",
+      "localhost_12915" : "SLAVE"
+    }
   },
   "listFields" : {
   },

http://git-wip-us.apache.org/repos/asf/incubator-helix/blob/676533b4/src/site/markdown/index.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/index.md b/src/site/markdown/index.md
index 1aeeab5..f07c9cb 100644
--- a/src/site/markdown/index.md
+++ b/src/site/markdown/index.md
@@ -23,49 +23,6 @@ Navigating the Documentation
 ### Conceptual Understanding
 
 [Concepts / Terminology](./Concepts.html)
-[Architecture](./Architecture.html)
-
-### Hands-on Helix
-
-[Quickstart](./Quickstart.html)
-
-[Tutorial](./Tutorial.html)
-
-  * [Chapter 1: Participant](./tutorial1.html)
-  * [Chapter 2: Spectator](./tutorial2.html)
-  * [Chapter 3: Controller](./tutorial3.html)
-  * [Chapter 4: Rebalancing Algorithms](./tutorial4.html)
-  * [Chapter 5: State Models](./tutorial5.html)
-  * [Chapter 6: Messaging](./tutorial6.html)
-  * [Chapter 7: Customized Health Check](./tutorial7.html)
-  * [Chapter 8: Throttling](./tutorial8.html)
-  * [Chapter 9: Admin Interface](./tutorial9.html)
-  * [Chapter 10: Controller Deployment](./tutorial10.html)
-
-[Javadocs](http://helix.incubator.apache.org/apidocs/index.html)
-
-### Recipes
-
-[Distributed lock manager](./recipes/lock_manager.html)
-
-[Rabbit MQ consumer group](./recipes/rabbitmq_consumer_group.html)
-
-[Rsync replicated file store](./recipes/rsync_replicated_file_store.html)
-
-[Service discovery](./recipes/service_discovery.html)
-
-
-What Is Helix
---------------
-Helix is a generic _cluster management_ framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. 
-
-
-What Is Cluster Management
---------------------------
-To understand Helix, first you need to understand what is _cluster management_.  A distributed system typically runs on multiple nodes for the following reasons:
-=======
-
-[Concepts / Terminology](./Concepts.html)
 
 [Architecture](./Architecture.html)
 
@@ -87,26 +44,14 @@ To understand Helix, first you need to understand what is _cluster management_.
 
 [Service discovery](./recipes/service_discovery.html)
 
+[Distributed Task DAG Execution](./task_dag_execution.html)
+
 
 What Is Helix
 --------------
 Helix is a generic _cluster management_ framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. 
->>>>>>> 93a8770e41e61337525234945193afc2280a3c3d
 
-* scalability
-* fault tolerance
-* load balancing
-
-<<<<<<< HEAD
-Each node performs one or more of the primary function of the cluster, such as storing/serving data, producing/consuming data streams, etc.  Once configured for your system, Helix acts is the global brain for the system.  It is designed to make decisions that cannot be made in isolation.  Examples of decisions that require global knowledge and coordination:
-
-* scheduling of maintainence tasks, such as backups, garbage collection, file consolidation, index rebuilds
-* repartitioning of data or resources across the cluster
-* informing dependent systems of changes so they can react appropriately to cluster changes
-* throttling system tasks and changes
 
-While it is possible to integrate these functions into the distributed system, it complicates the code.  Helix has abstracted common cluster management tasks, enabling the system builder to model the desired behavior in a declarative state model, and let Helix manage the coordination.  The result is less new code to write, and a robust, highly operable system.
-=======
 What Is Cluster Management
 --------------------------
 To understand Helix, first you need to understand what is _cluster management_.  A distributed system typically runs on multiple nodes for the following reasons:
@@ -116,19 +61,15 @@ To understand Helix, first you need to understand what is _cluster management_.
 * load balancing
 
 Each node performs one or more of the primary functions of the cluster, such as storing/serving data, producing/consuming data streams, etc.  Once configured for your system, Helix acts as the global brain for the system.  It is designed to make decisions that cannot be made in isolation.  Examples of decisions that require global knowledge and coordination:
->>>>>>> 93a8770e41e61337525234945193afc2280a3c3d
 
 * scheduling of maintenance tasks, such as backups, garbage collection, file consolidation, index rebuilds
 * repartitioning of data or resources across the cluster
 * informing dependent systems of changes so they can react appropriately to cluster changes
 * throttling system tasks and changes
 
-<<<<<<< HEAD
-=======
 While it is possible to integrate these functions into the distributed system, it complicates the code.  Helix has abstracted common cluster management tasks, enabling the system builder to model the desired behavior in a declarative state model, and let Helix manage the coordination.  The result is less new code to write, and a robust, highly operable system.
 
 
->>>>>>> 93a8770e41e61337525234945193afc2280a3c3d
 Key Features of Helix
 ---------------------
 1. Automatic assignment of resource/partition to nodes

http://git-wip-us.apache.org/repos/asf/incubator-helix/blob/676533b4/src/site/markdown/involved/building.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/involved/building.md b/src/site/markdown/involved/building.md
deleted file mode 100644
index 06f0589..0000000
--- a/src/site/markdown/involved/building.md
+++ /dev/null
@@ -1,30 +0,0 @@
-<!---
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-
-Building Apache Helix
---------------
-
-First you need to install Apache Maven.
-
-To install jars locally:
-mvn clean install (-DskipTests if you don't want to run tests).
-
-
-   

http://git-wip-us.apache.org/repos/asf/incubator-helix/blob/676533b4/src/site/markdown/recipes/lock_manager.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/recipes/lock_manager.md b/src/site/markdown/recipes/lock_manager.md
deleted file mode 100644
index 84420dd..0000000
--- a/src/site/markdown/recipes/lock_manager.md
+++ /dev/null
@@ -1,253 +0,0 @@
-<!---
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-Distributed lock manager
-------------------------
-Distributed locks are used to synchronize accesses shared resources. Most applications use Zookeeper to model the distributed locks. 
-
-The simplest way to model a lock using zookeeper is (See Zookeeper leader recipe for an exact and more advanced solution)
-
-* Each process tries to create an emphemeral node.
-* If can successfully create it then, it acquires the lock
-* Else it will watch on the znode and try to acquire the lock again if the current lock holder disappears 
-
-This is good enough if there is only one lock. But in practice, an application will need many such locks. Distributing and managing the locks among difference process becomes challenging. Extending such a solution to many locks will result in
-
-* Uneven distribution of locks among nodes, the node that starts first will acquire all the lock. Nodes that start later will be idle.
-* When a node fails, how the locks will be distributed among remaining nodes is not predicable. 
-* When new nodes are added the current nodes dont relinquish the locks so that new nodes can acquire some locks
-
-In other words we want a system to satisfy the following requirements.
-
-* Distribute locks evenly among all nodes to get better hardware utilization
-* If a node fails, the locks that were acquired by that node should be evenly distributed among other nodes
-* If nodes are added, locks must be evenly re-distributed among nodes.
-
-Helix provides a simple and elegant solution to this problem. Simply specify the number of locks and Helix will ensure that above constraints are satisfied. 
-
-To quickly see this working run the lock-manager-demo script where 12 locks are evenly distributed among three nodes, and when a node fails, the locks get re-distributed among remaining two nodes. Note that Helix does not re-shuffle the locks completely, instead it simply distributes the locks relinquished by dead node among 2 remaining nodes evenly.
-
-----------------------------------------------------------------------------------------
-
-#### Short version
- This version starts multiple threads with in same process to simulate a multi node deployment. Try the long version to get a better idea of how it works.
- 
-```
-git clone https://git-wip-us.apache.org/repos/asf/incubator-helix.git
-cd incubator-helix
-mvn clean install package -DskipTests
-cd recipes/distributed-lock-manager/target/distributed-lock-manager-pkg/bin
-chmod +x *
-./lock-manager-demo
-```
-
-##### Output
-
-```
-./lock-manager-demo 
-STARTING localhost_12000
-STARTING localhost_12002
-STARTING localhost_12001
-STARTED localhost_12000
-STARTED localhost_12002
-STARTED localhost_12001
-localhost_12001 acquired lock:lock-group_3
-localhost_12000 acquired lock:lock-group_8
-localhost_12001 acquired lock:lock-group_2
-localhost_12001 acquired lock:lock-group_4
-localhost_12002 acquired lock:lock-group_1
-localhost_12002 acquired lock:lock-group_10
-localhost_12000 acquired lock:lock-group_7
-localhost_12001 acquired lock:lock-group_5
-localhost_12002 acquired lock:lock-group_11
-localhost_12000 acquired lock:lock-group_6
-localhost_12002 acquired lock:lock-group_0
-localhost_12000 acquired lock:lock-group_9
-lockName    acquired By
-======================================
-lock-group_0    localhost_12002
-lock-group_1    localhost_12002
-lock-group_10    localhost_12002
-lock-group_11    localhost_12002
-lock-group_2    localhost_12001
-lock-group_3    localhost_12001
-lock-group_4    localhost_12001
-lock-group_5    localhost_12001
-lock-group_6    localhost_12000
-lock-group_7    localhost_12000
-lock-group_8    localhost_12000
-lock-group_9    localhost_12000
-Stopping localhost_12000
-localhost_12000 Interrupted
-localhost_12001 acquired lock:lock-group_9
-localhost_12001 acquired lock:lock-group_8
-localhost_12002 acquired lock:lock-group_6
-localhost_12002 acquired lock:lock-group_7
-lockName    acquired By
-======================================
-lock-group_0    localhost_12002
-lock-group_1    localhost_12002
-lock-group_10    localhost_12002
-lock-group_11    localhost_12002
-lock-group_2    localhost_12001
-lock-group_3    localhost_12001
-lock-group_4    localhost_12001
-lock-group_5    localhost_12001
-lock-group_6    localhost_12002
-lock-group_7    localhost_12002
-lock-group_8    localhost_12001
-lock-group_9    localhost_12001
-
-```
-
-----------------------------------------------------------------------------------------
-
-#### Long version
-This provides more details on how to setup the cluster and where to plugin application code.
-
-##### start zookeeper
-
-```
-./start-standalone-zookeeper 2199
-```
-
-##### Create a cluster
-
-```
-./helix-admin --zkSvr localhost:2199 --addCluster lock-manager-demo
-```
-
-##### Create a lock group
-
-Create a lock group and specify the number of locks in the lock group. 
-
-```
-./helix-admin --zkSvr localhost:2199  --addResource lock-manager-demo lock-group 6 OnlineOffline AUTO_REBALANCE
-```
-
-##### Start the nodes
-
-Create a Lock class that handles the callbacks. 
-
-```
-
-public class Lock extends StateModel
-{
-  private String lockName;
-
-  public Lock(String lockName)
-  {
-    this.lockName = lockName;
-  }
-
-  public void lock(Message m, NotificationContext context)
-  {
-    System.out.println(" acquired lock:"+ lockName );
-  }
-
-  public void release(Message m, NotificationContext context)
-  {
-    System.out.println(" releasing lock:"+ lockName );
-  }
-
-}
-
-```
-
-LockFactory that creates the lock
- 
-```
-public class LockFactory extends StateModelFactory<Lock>{
-    
-    /* Instantiates the lock handler, one per lockName*/
-    public Lock create(String lockName)
-    {
-        return new Lock(lockName);
-    }   
-}
-```
-
-At node start up, simply join the cluster and helix will invoke the appropriate callbacks on Lock instance. One can start any number of nodes and Helix detects that a new node has joined the cluster and re-distributes the locks automatically.
-
-```
-public class LockProcess{
-
-  public static void main(String[] args) throws Exception {
-    String zkAddress= "localhost:2199";
-    String clusterName = "lock-manager-demo";
-    //Give a unique id to each process, most commonly used format hostname_port
-    String instanceName ="localhost_12000";
-    ZKHelixAdmin helixAdmin = new ZKHelixAdmin(zkAddress);
-    //configure the instance and provide some metadata 
-    InstanceConfig config = new InstanceConfig(instanceName);
-    config.setHostName("localhost");
-    config.setPort("12000");
-    helixAdmin.addInstance(clusterName, config);
-    //join the cluster
-    HelixManager manager;
-    manager = HelixManagerFactory.getZKHelixManager(clusterName,
-                                                  instanceName,
-                                                  InstanceType.PARTICIPANT,
-                                                  zkAddress);
-    manager.getStateMachineEngine().registerStateModelFactory("OnlineOffline", new LockFactory());
-    manager.connect();
-    Thread.currentThread().join();
-    }
-
-}
-```
-
-##### Start the controller
-
-Controller can be started either as a separate process or can be embedded within each node process
-
-###### Separate process
-This is recommended when number of nodes in the cluster >100. For fault tolerance, you can run multiple controllers on different boxes.
-
-```
-./run-helix-controller --zkSvr localhost:2199 --cluster lock-manager-demo 2>&1 > /tmp/controller.log &
-```
-
-###### Embedded within the node process
-This is recommended when the number of nodes in the cluster is less than 100. To start a controller from each process, simply add the following lines to MyClass
-
-```
-public class LockProcess{
-
-  public static void main(String[] args) throws Exception {
-    String zkAddress= "localhost:2199";
-    String clusterName = "lock-manager-demo";
-    .
-    .
-    manager.connect();
-    HelixManager controller;
-    controller = HelixControllerMain.startHelixController(zkAddress, 
-                                                          clusterName,
-                                                          "controller", 
-                                                          HelixControllerMain.STANDALONE);
-    Thread.currentThread().join();
-  }
-}
-```
-
-----------------------------------------------------------------------------------------
-
-
-
-
-

http://git-wip-us.apache.org/repos/asf/incubator-helix/blob/676533b4/src/site/markdown/recipes/rabbitmq_consumer_group.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/recipes/rabbitmq_consumer_group.md b/src/site/markdown/recipes/rabbitmq_consumer_group.md
deleted file mode 100644
index ec3053a..0000000
--- a/src/site/markdown/recipes/rabbitmq_consumer_group.md
+++ /dev/null
@@ -1,227 +0,0 @@
-<!---
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-
-RabbitMQ Consumer Group
-=======================
-
-[RabbitMQ](http://www.rabbitmq.com/) is a well known Open source software the provides robust messaging for applications.
-
-One of the commonly implemented recipes using this software is a work queue.  http://www.rabbitmq.com/tutorials/tutorial-four-java.html describes the use case where
-
-* A producer sends a message with a routing key. 
-* The message is routed to the queue whose binding key exactly matches the routing key of the message.	
-* There are multiple consumers and each consumer is interested in processing only a subset of the messages by binding to the interested keys
-
-The example provided [here](http://www.rabbitmq.com/tutorials/tutorial-four-java.html) describes how multiple consumers can be started to process all the messages.
-
-While this works, in production systems one needs the following 
-
-* Ability to handle failures: when a consumers fails another consumer must be started or the other consumers must start processing these messages that should have been processed by the failed consumer.
-* When the existing consumers cannot keep up with the task generation rate, new consumers will be added. The tasks must be redistributed among all the consumers. 
-
-In this recipe, we demonstrate handling of consumer failures and new consumer additions using Helix.
-
-Mapping this usecase to Helix is pretty easy as the binding key/routing key is equivalent to a partition. 
-
-Let's take an example. Lets say the queue has 6 partitions, and we have 2 consumers to process all the queues. 
-What we want is all 6 queues to be evenly divided among 2 consumers. 
-Eventually when the system scales, we add more consumers to keep up. This will make each consumer process tasks from 2 queues.
-Now let's say that a consumer failed which reduces the number of active consumers to 2. This means each consumer must process 3 queues.
-
-We showcase how such a dynamic App can be developed using Helix. Even though we use rabbitmq as the pub/sub system one can extend this solution to other pub/sub systems.
-
-Try it
-======
-
-```
-git clone https://git-wip-us.apache.org/repos/asf/incubator-helix.git
-cd incubator-helix
-mvn clean install package -DskipTests
-cd recipes/rabbitmq-consumer-group/bin
-chmod +x *
-export HELIX_PKG_ROOT=`pwd`/helix-core/target/helix-core-pkg
-export HELIX_RABBITMQ_ROOT=`pwd`/recipes/rabbitmq-consumer-group/
-chmod +x $HELIX_PKG_ROOT/bin/*
-chmod +x $HELIX_RABBITMQ_ROOT/bin/*
-```
-
-
-Install Rabbit MQ
-----------------
-
-Setting up RabbitMQ on a local box is straightforward. You can find the instructions here
-http://www.rabbitmq.com/download.html
-
-Start ZK
---------
-Start zookeeper at port 2199
-
-```
-$HELIX_PKG_ROOT/bin/start-standalone-zookeeper 2199
-```
-
-Setup the consumer group cluster
---------------------------------
-This will setup the cluster by creating a "rabbitmq-consumer-group" cluster and adds a "topic" with "6" queues. 
-
-```
-$HELIX_RABBITMQ_ROOT/bin/setup-cluster.sh localhost:2199 
-```
-
-Add consumers
--------------
-Start 2 consumers in 2 different terminals. Each consumer is given a unique id.
-
-```
-//start-consumer.sh zookeeperAddress (e.g. localhost:2181) consumerId , rabbitmqServer (e.g. localhost)
-$HELIX_RABBITMQ_ROOT/bin/start-consumer.sh localhost:2199 0 localhost 
-$HELIX_RABBITMQ_ROOT/bin/start-consumer.sh localhost:2199 1 localhost 
-
-```
-
-Start HelixController
---------------------
-Now start a Helix controller that starts managing the "rabbitmq-consumer-group" cluster.
-
-```
-$HELIX_RABBITMQ_ROOT/bin/start-cluster-manager.sh localhost:2199
-```
-
-Send messages to the Topic
---------------------------
-
-Start sending messages to the topic. This script randomly selects a routing key (1-6) and sends the message to topic. 
-Based on the key, messages gets routed to the appropriate queue.
-
-```
-$HELIX_RABBITMQ_ROOT/bin/send-message.sh localhost 20
-```
-
-After running this, you should see all 20 messages being processed by 2 consumers. 
-
-Add another consumer
---------------------
-Once a new consumer is started, helix detects it. In order to balance the load between 3 consumers, it deallocates 1 partition from the existing consumers and allocates it to the new consumer. We see that
-each consumer is now processing only 2 queues.
-Helix makes sure that old nodes are asked to stop consuming before the new consumer is asked to start consuming for a given partition. But the transitions for each partition can happen in parallel.
-
-```
-$HELIX_RABBITMQ_ROOT/bin/start-consumer.sh localhost:2199 2 localhost
-```
-
-Send messages again to the topic.
-
-```
-$HELIX_RABBITMQ_ROOT/bin/send-message.sh localhost 100
-```
-
-You should see that messages are now received by all 3 consumers.
-
-Stop a consumer
----------------
-In any terminal press CTRL^C and notice that Helix detects the consumer failure and distributes the 2 partitions that were processed by failed consumer to the remaining 2 active consumers.
-
-
-How does it work
-================
-
-Find the entire code [here](https://git-wip-us.apache.org/repos/asf?p=incubator-helix.git;a=tree;f=recipes/rabbitmq-consumer-group/src/main/java/org/apache/helix/recipes/rabbitmq). 
- 
-Cluster setup
--------------
-This step creates znode on zookeeper for the cluster and adds the state model. We use online offline state model since there is no need for other states. The consumer is either processing a queue or it is not.
-
-It creates a resource called "rabbitmq-consumer-group" with 6 partitions. The execution mode is set to AUTO_REBALANCE. This means that Helix controls the assignment of partitions to consumers and automatically distributes the partitions evenly among the active consumers. When a consumer is added or removed, it ensures that a minimum number of partitions are shuffled.
-
-```
-      zkclient = new ZkClient(zkAddr, ZkClient.DEFAULT_SESSION_TIMEOUT,
-          ZkClient.DEFAULT_CONNECTION_TIMEOUT, new ZNRecordSerializer());
-      ZKHelixAdmin admin = new ZKHelixAdmin(zkclient);
-      
-      // add cluster
-      admin.addCluster(clusterName, true);
-
-      // add state model definition
-      StateModelConfigGenerator generator = new StateModelConfigGenerator();
-      admin.addStateModelDef(clusterName, "OnlineOffline",
-          new StateModelDefinition(generator.generateConfigForOnlineOffline()));
-
-      // add resource "topic" which has 6 partitions
-      String resourceName = "rabbitmq-consumer-group";
-      admin.addResource(clusterName, resourceName, 6, "OnlineOffline", "AUTO_REBALANCE");
-```
-
-Starting the consumers
-----------------------
-The only things a consumer needs to know are the ZooKeeper address, the cluster name, and its consumer id; nothing else.
-
-```
-   _manager =
-          HelixManagerFactory.getZKHelixManager(_clusterName,
-                                                _consumerId,
-                                                InstanceType.PARTICIPANT,
-                                                _zkAddr);
-
-      StateMachineEngine stateMach = _manager.getStateMachineEngine();
-      ConsumerStateModelFactory modelFactory =
-          new ConsumerStateModelFactory(_consumerId, _mqServer);
-      stateMach.registerStateModelFactory("OnlineOffline", modelFactory);
-
-      _manager.connect();
-
-```
-
-Once the consumer has registered the state model and the controller is started, the consumer starts getting callbacks (onBecomeOnlineFromOffline) for the partitions it needs to host. All it needs to do as part of the callback is to start consuming messages from the appropriate queue. Similarly, when the controller deallocates a partition from a consumer, it fires onBecomeOfflineFromOnline for that partition.
-As part of this transition, the consumer stops consuming from that queue.
-
-```
- @Transition(to = "ONLINE", from = "OFFLINE")
-  public void onBecomeOnlineFromOffline(Message message, NotificationContext context)
-  {
-    LOG.debug(_consumerId + " becomes ONLINE from OFFLINE for " + _partition);
-
-    if (_thread == null)
-    {
-      LOG.debug("Starting ConsumerThread for " + _partition + "...");
-      _thread = new ConsumerThread(_partition, _mqServer, _consumerId);
-      _thread.start();
-      LOG.debug("Starting ConsumerThread for " + _partition + " done");
-
-    }
-  }
-
-  @Transition(to = "OFFLINE", from = "ONLINE")
-  public void onBecomeOfflineFromOnline(Message message, NotificationContext context)
-      throws InterruptedException
-  {
-    LOG.debug(_consumerId + " becomes OFFLINE from ONLINE for " + _partition);
-
-    if (_thread != null)
-    {
-      LOG.debug("Stopping " + _consumerId + " for " + _partition + "...");
-
-      _thread.interrupt();
-      _thread.join(2000);
-      _thread = null;
-      LOG.debug("Stopping " +  _consumerId + " for " + _partition + " done");
-
-    }
-  }
-```
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-helix/blob/676533b4/src/site/markdown/recipes/rsync_replicated_file_store.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/recipes/rsync_replicated_file_store.md b/src/site/markdown/recipes/rsync_replicated_file_store.md
deleted file mode 100644
index f8a74a0..0000000
--- a/src/site/markdown/recipes/rsync_replicated_file_store.md
+++ /dev/null
@@ -1,165 +0,0 @@
-<!---
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-Near real time rsync replicated file system
-===========================================
-
-Quickdemo
----------
-
-* This demo starts 3 instances with ids ```localhost_12001, localhost_12002, localhost_12003```
-* Each instance stores its files under ```/tmp/<id>/filestore```
-* ```localhost_12001``` is designated as the master, and ```localhost_12002``` and ```localhost_12003``` are the slaves.
-* Files written to the master are replicated to the slaves automatically. In this demo, a.txt and b.txt are written to ```/tmp/localhost_12001/filestore``` and they get replicated to the other folders.
-* When the master is stopped, ```localhost_12002``` is promoted to master.
-* The other slave ```localhost_12003``` stops replicating from ```localhost_12001``` and starts replicating from the new master ```localhost_12002```.
-* Files written to the new master ```localhost_12002``` are replicated to ```localhost_12003```.
-* In the end state of this quick demo, ```localhost_12002``` is the master and ```localhost_12003``` is the slave. Manually create files under ```/tmp/localhost_12002/filestore``` and see that they appear in ```/tmp/localhost_12003/filestore```.
-* Ignore the interrupted exceptions on the console :-).
-
-
-```
-git clone https://git-wip-us.apache.org/repos/asf/incubator-helix.git
-cd incubator-helix/recipes/rsync-replicated-file-system
-mvn clean install package -DskipTests
-cd target/rsync-replicated-file-system-pkg/bin
-chmod +x *
-./quickdemo
-```
-
-Overview
---------
-
-There are many applications that require storage for a large number of relatively small data files. Examples include media stores for small videos, images, mail attachments, etc. Each of these objects is typically kilobytes, often no larger than a few megabytes. An additional distinguishing feature of these use cases is that files are typically only added or deleted, rarely updated; when updates do happen, they do not have any concurrency requirements.
-
-These are much simpler requirements than what general-purpose distributed file systems have to satisfy, including concurrent access to files, random access for reads and updates, POSIX compliance, etc. To satisfy those requirements, general DFSs are quite complex and expensive to build and maintain.
-
-A different kind of distributed file system is HDFS, which is inspired by Google's GFS. HDFS is one of the most widely used distributed file systems and forms the main data storage platform for Hadoop. It is primarily aimed at processing very large data sets and distributes files across a cluster of commodity servers by splitting up files into fixed-size chunks. HDFS is not particularly well suited for storing a very large number of relatively tiny files.
-
-### File Store
-
-As we have pointed out, it's possible to build a vastly simpler system for the class of applications that have simpler requirements:
-
-* Large number of files but each file is relatively small.
-* Access is limited to create, delete and get entire files.
-* No updates to files that are already created (or it's feasible to delete the old file and create a new one).
- 
-
-We call this system a Partitioned File Store (PFS) to distinguish it from other distributed file systems. This system needs to provide the following features:
-
-* CRD (create, read, delete) access to a large number of small files
-* Scalability: Files should be distributed across a large number of commodity servers based on the storage requirement.
-* Fault-tolerance: Each file should be replicated on multiple servers so that individual server failures do not reduce availability.
-* Elasticity: It should be possible to add capacity to the cluster easily.
- 
-
-Apache Helix is a generic cluster management framework that makes it very easy to provide the scalability, fault-tolerance and elasticity features. 
-Rsync can be easily used as a replication channel between servers so that each file gets replicated on multiple servers.
-
-Design
-------
-
-At a high level:
-
-* Partition the file system based on the file name.
-* At any time only a single writer can write to a partition; we call this writer the master.
-* For redundancy, we need additional replicas called slaves. Slaves can optionally serve reads.
-* Slaves replicate data from the master.
-* When a master fails, a slave gets promoted to master.
-
-### Transaction log
-
-Every write on the master results in the creation/deletion of one or more files. In order to maintain timeline consistency, slaves need to apply the changes in the same order.
-To facilitate this, the master logs each transaction in a file, and each transaction is associated with a 64-bit id in which the 32 LSBs represent a sequence number and the 32 MSBs represent the generation number.
-The sequence gets incremented on every transaction, and the generation is incremented when a new master is elected.
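-
-A minimal sketch of this id layout (not the recipe's actual code): the generation occupies the upper 32 bits and the sequence the lower 32 bits.
-
-```
-public final class TxnId
-{
-  // Pack generation (32 MSBs) and sequence (32 LSBs) into a single 64-bit id
-  public static long make(int generation, int sequence)
-  {
-    return ((long) generation << 32) | (sequence & 0xFFFFFFFFL);
-  }
-
-  public static int generation(long txnId)
-  {
-    return (int) (txnId >>> 32);
-  }
-
-  public static int sequence(long txnId)
-  {
-    return (int) (txnId & 0xFFFFFFFFL);
-  }
-}
-```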
-
-### Replication
-
-Replication is required for the slaves to keep up with the changes on the master. Every time the slave applies a change, it checkpoints the last applied transaction id.
-During restarts, this allows the slave to pull changes starting from the last checkpointed id. Like the master, the slave logs each transaction to its transaction log, but instead of generating a new transaction id, it uses the id generated by the master.
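-
-A minimal sketch of such a checkpoint, assuming a single checkpoint file under ```check_point_dir``` (the recipe's own format may differ):
-
-```
-import java.nio.charset.Charset;
-import java.nio.file.Files;
-import java.nio.file.Path;
-import java.nio.file.Paths;
-
-public class CheckPoint
-{
-  private static final Charset UTF8 = Charset.forName("UTF-8");
-  private final Path _checkPointFile;
-
-  public CheckPoint(String checkPointDir)
-  {
-    _checkPointFile = Paths.get(checkPointDir, "last_applied_txn_id");
-  }
-
-  // Record the last applied transaction id so replication can resume from here
-  public void save(long txnId) throws Exception
-  {
-    Files.write(_checkPointFile, Long.toString(txnId).getBytes(UTF8));
-  }
-
-  // Returns -1 if nothing has been applied yet
-  public long load() throws Exception
-  {
-    if (!Files.exists(_checkPointFile))
-    {
-      return -1L;
-    }
-    return Long.parseLong(new String(Files.readAllBytes(_checkPointFile), UTF8).trim());
-  }
-}
-```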
-
-
-### Fail over
-
-When a master fails, a slave is promoted to master. If the previous master node is reachable, the new master flushes all the
-changes from the previous master before taking up mastership. The new master records the end transaction id of the current generation and then starts a new generation
-with the sequence starting from 1. After this, the new master begins accepting writes.
-
-
-![Partitioned File Store](../images/PFS-Generic.png)
-
-
-
-Rsync based solution
--------------------
-
-![Rsync based File Store](../images/RSYNC_BASED_PFS.png)
-
-
-This application demonstrates a file store that uses rsync as the replication mechanism. One can envision a similar system where, instead of using rsync, one implements a custom solution to notify the slaves of changes and provides an API to pull the changed files.
-
-#### Concept
-* file_store_dir: Root directory for the actual data files 
-* change_log_dir: The transaction logs are generated under this folder.
-* check_point_dir: The slave stores the checkpoints (last processed transaction) here.
-
-#### Master
-* File server: This component supports file uploads and downloads and writes the files to ```file_store_dir```. It is not included in this application; the idea is that most applications have their own way of implementing this component, with some business logic associated with it. It is not hard to come up with such a component if needed.
-* File store watcher: This component watches the ```file_store_dir``` directory on the local file system for any changes and notifies the registered listeners of the changes (a minimal sketch follows this list).
-* Change Log Generator: This registers as a listener of the File Store Watcher and, on each notification, logs the changes into a file under ```change_log_dir```.
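-
-A minimal sketch of such a watcher using Java NIO, with assumed names; the recipe's own implementation may differ:
-
-```
-import java.nio.file.Path;
-import java.nio.file.StandardWatchEventKinds;
-import java.nio.file.WatchEvent;
-import java.nio.file.WatchKey;
-import java.nio.file.WatchService;
-
-public class FileStoreWatcher
-{
-  public interface Listener
-  {
-    void onChange(WatchEvent.Kind<?> kind, Path file);
-  }
-
-  // Watches file_store_dir and notifies the listener of creates/deletes
-  public static void watch(Path fileStoreDir, Listener listener) throws Exception
-  {
-    WatchService watcher = fileStoreDir.getFileSystem().newWatchService();
-    fileStoreDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE,
-        StandardWatchEventKinds.ENTRY_DELETE);
-    while (true)
-    {
-      WatchKey key = watcher.take(); // blocks until something changes
-      for (WatchEvent<?> event : key.pollEvents())
-      {
-        if (event.kind() == StandardWatchEventKinds.OVERFLOW)
-        {
-          continue; // events may have been lost; a real implementation would rescan
-        }
-        listener.onChange(event.kind(), fileStoreDir.resolve((Path) event.context()));
-      }
-      key.reset();
-    }
-  }
-}
-```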
-
-#### Slave
-* File server: This component on the slave only supports reads.
-* Cluster state observer: The slave observes the cluster state and knows who the current master is.
-* Replicator: This has three subcomponents
-    - Periodic rsync of change log: This is a background process that periodically rsyncs the ```change_log_dir``` of the master to its local directory.
-    - Change Log Watcher: This watches the ```change_log_dir``` for changes and notifies the registered listeners of the change.
-    - On demand rsync invoker: This is registered as a listener of the Change Log Watcher and on every change invokes rsync to sync only the changed file.
-
-
-#### Coordination
-
-The coordination between nodes is done by Helix. Helix does the partition management and assigns each partition to multiple nodes based on the replication factor. It elects one of the nodes as master and designates the others as slaves.
-It provides notifications to each node in the form of state transitions (Offline to Slave, Slave to Master). It also provides notifications when there is a change in cluster state.
-This allows a slave to stop replicating from the current master and start replicating from the new master.
-
-In this application, we have only one partition, but it's very easy to extend it to support multiple partitions. By partitioning the file store, one can add new nodes and Helix will automatically
-redistribute partitions among the nodes. To summarize, Helix provides partition management and fault tolerance, and facilitates automated cluster expansion.
-

http://git-wip-us.apache.org/repos/asf/incubator-helix/blob/676533b4/src/site/markdown/recipes/service_discovery.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/recipes/service_discovery.md b/src/site/markdown/recipes/service_discovery.md
deleted file mode 100644
index 8e06ead..0000000
--- a/src/site/markdown/recipes/service_discovery.md
+++ /dev/null
@@ -1,191 +0,0 @@
-<!---
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-Service Discovery
------------------
-
-One of the common usages of ZooKeeper is to enable service discovery.
-The basic idea is that when a server starts up, it advertises its configuration/metadata, such as host name and port, on ZooKeeper.
-This allows clients to dynamically discover the servers that are currently active. One can think of this as a service registry to which a server registers when it starts and
-from which it is automatically deregistered when it shuts down or crashes. In many cases it serves as an alternative to VIPs.
-
-The core idea behind this is to use ZooKeeper ephemeral nodes. An ephemeral node is created when the server registers, and all its metadata is put into the znode.
-When the server shuts down, ZooKeeper automatically removes the znode.
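-
-As a rough sketch, registering a server with an ephemeral node using the plain ZooKeeper API might look like the following; the paths, metadata format, and class name are illustrative, not what Helix or this recipe actually uses:
-
-```
-import org.apache.zookeeper.CreateMode;
-import org.apache.zookeeper.WatchedEvent;
-import org.apache.zookeeper.Watcher;
-import org.apache.zookeeper.ZooDefs;
-import org.apache.zookeeper.ZooKeeper;
-
-public class ServiceRegistration implements Watcher
-{
-  @Override
-  public void process(WatchedEvent event)
-  {
-    // A real implementation would handle session expiry here (re-register, etc.)
-  }
-
-  public static void main(String[] args) throws Exception
-  {
-    ZooKeeper zk = new ZooKeeper("localhost:2199", 30000, new ServiceRegistration());
-
-    // Assumes the parent path /services/myServiceName already exists as a persistent node
-    String metadata = "{\"host\":\"host.x.y.z\",\"port\":12000}";
-    zk.create("/services/myServiceName/host.x.y.z_12000", metadata.getBytes("UTF-8"),
-        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL); // removed when the session ends
-
-    Thread.sleep(Long.MAX_VALUE); // keep the session, and hence the registration, alive
-  }
-}
-```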
-
-There are two ways clients can dynamically discover the active servers:
-
-#### ZOOKEEPER WATCH
-
-Clients can set a child watch under a specific path on ZooKeeper.
-When a new service is registered or deregistered, ZooKeeper notifies the client via a watch event and the client can read the list of services. Even though this looks trivial,
-there are a lot of things one needs to keep in mind, like ensuring that the watch is set back on ZooKeeper before reading data from ZooKeeper.
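-
-A minimal sketch of the WATCH approach (names and paths are illustrative): the child watch is re-armed by the same ```getChildren()``` call that reads the current list, so a later change always triggers a new notification.
-
-```
-import java.util.List;
-
-import org.apache.zookeeper.WatchedEvent;
-import org.apache.zookeeper.Watcher;
-import org.apache.zookeeper.ZooKeeper;
-
-public class ServiceWatcher implements Watcher
-{
-  private final ZooKeeper _zk;
-  private final String _servicePath; // e.g. /services/myServiceName
-
-  public ServiceWatcher(ZooKeeper zk, String servicePath)
-  {
-    _zk = zk;
-    _servicePath = servicePath;
-  }
-
-  // Passing 'this' as the watcher re-sets the child watch as part of the read
-  public List<String> readAndWatch() throws Exception
-  {
-    return _zk.getChildren(_servicePath, this);
-  }
-
-  @Override
-  public void process(WatchedEvent event)
-  {
-    try
-    {
-      List<String> services = readAndWatch(); // re-arm the watch, then react to the new list
-      System.out.println("Active services: " + services);
-    }
-    catch (Exception e)
-    {
-      e.printStackTrace();
-    }
-  }
-}
-```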
-
-
-#### POLL
-
-Another approach is for the client to periodically read the ZooKeeper path and get the list of services.
-
-
-Both approaches have pros and cons. For example, setting a watch might trigger a herd effect if there are a large number of clients; this is at its worst when servers are starting up.
-But the good thing about setting a watch is that clients are immediately notified of a change, which is not true in the case of polling.
-In some cases, having both WATCH and POLL makes sense: WATCH allows one to get notifications as soon as possible, while POLL provides a safety net if a watch event is missed because of a code bug or because ZooKeeper fails to notify.
-
-##### Other important scenarios to take care of
-* What happens when the ZooKeeper session expires? All the watches/ephemeral nodes previously added/created by this server are lost.
-One needs to add the watches again, recreate the ephemeral nodes, etc.
-* Due to network issues or Java GC pauses, session expiry might happen again and again; this is also known as flapping. It's important for the server to detect this and deregister itself.
-
-##### Other operational things to consider
-* What if the node is behaving badly? One might kill the server, but that loses the ability to debug it.
-It would be nice to have the ability to mark a server as disabled, so that clients know the node is disabled and will not contact it.
- 
-#### Configuration ownership
-
-This is an important aspect that is often ignored in the initial stages of development. Commonly, the service discovery pattern means that servers start up with some configuration and then simply put their configuration/metadata in ZooKeeper. While this works well in the beginning,
-configuration management becomes very difficult since the servers themselves are statically configured; any change in server configuration implies restarting the server. Ideally, it would be nice to have the ability to change configuration dynamically without having to restart a server.
-
-Ideally you want a hybrid solution: a node starts with minimal configuration and gets the rest of its configuration from ZooKeeper.
-
-### How to use Helix to achieve this
-
-Even though Helix has a higher-level abstraction in terms of state machines, constraints, and objectives,
-service discovery is one of the things that has existed since we started.
-The controller uses the exact mechanism described above to discover when new servers join the cluster.
-We create these znodes under /CLUSTERNAME/LIVEINSTANCES.
-Since at any time there is only one controller, we use a ZK watch to track the liveness of a server.
-
-This recipe simply demonstrates how one can re-use that part to implement service discovery. It demonstrates multiple modes of service discovery:
-
-* POLL: The client reads from ZooKeeper at regular intervals (30 seconds). Use this if you have hundreds of clients (a minimal sketch follows this list).
-* WATCH: The client sets up a watcher and gets notified of changes. Use this if you have tens of clients.
-* NONE: This does neither of the above, but reads directly from ZooKeeper whenever needed.
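-
-A minimal sketch of the POLL mode, with an illustrative path and interval:
-
-```
-import java.util.List;
-import java.util.concurrent.Executors;
-import java.util.concurrent.ScheduledExecutorService;
-import java.util.concurrent.TimeUnit;
-
-import org.apache.zookeeper.WatchedEvent;
-import org.apache.zookeeper.Watcher;
-import org.apache.zookeeper.ZooKeeper;
-
-public class PollingDiscovery implements Watcher
-{
-  @Override
-  public void process(WatchedEvent event)
-  {
-    // No watches are used in POLL mode; this only receives connection state events
-  }
-
-  public static void main(String[] args) throws Exception
-  {
-    final ZooKeeper zk = new ZooKeeper("localhost:2199", 30000, new PollingDiscovery());
-    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
-    scheduler.scheduleAtFixedRate(new Runnable()
-    {
-      @Override
-      public void run()
-      {
-        try
-        {
-          // No watch is set; simply re-read the child list every poll interval
-          List<String> services = zk.getChildren("/services/myServiceName", false);
-          System.out.println("Active services: " + services);
-        }
-        catch (Exception e)
-        {
-          e.printStackTrace();
-        }
-      }
-    }, 0, 30, TimeUnit.SECONDS);
-  }
-}
-```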
-
-Helix provides these additional features compared to other implementations available elsewhere:
-
-* It has the concept of disabling a node, which means that a badly behaving node can be disabled using the Helix admin API.
-* It automatically detects if a node connects/disconnects from ZooKeeper repeatedly and disables the node.
-* Configuration management
-    * Allows one to set configuration via the admin API at various granularities, like cluster, instance, resource, and partition.
-    * Configuration can be dynamically changed.
-    * Notifies the server when configuration changes.
-
-
-##### checkout and build
-
-```
-git clone https://git-wip-us.apache.org/repos/asf/incubator-helix.git
-cd incubator-helix
-mvn clean install package -DskipTests
-cd recipes/service-discovery/target/service-discovery-pkg/bin
-chmod +x *
-```
-
-##### start zookeeper
-
-```
-./start-standalone-zookeeper 2199
-```
-
-#### Run the demo
-
-```
-./service-discovery-demo.sh
-```
-
-#### Output
-
-```
-START:Service discovery demo mode:WATCH
-	Registering service
-		host.x.y.z_12000
-		host.x.y.z_12001
-		host.x.y.z_12002
-		host.x.y.z_12003
-		host.x.y.z_12004
-	SERVICES AVAILABLE
-		SERVICENAME 	HOST 			PORT
-		myServiceName 	host.x.y.z 		12000
-		myServiceName 	host.x.y.z 		12001
-		myServiceName 	host.x.y.z 		12002
-		myServiceName 	host.x.y.z 		12003
-		myServiceName 	host.x.y.z 		12004
-	Deregistering service:
-		host.x.y.z_12002
-	SERVICES AVAILABLE
-		SERVICENAME 	HOST 			PORT
-		myServiceName 	host.x.y.z 		12000
-		myServiceName 	host.x.y.z 		12001
-		myServiceName 	host.x.y.z 		12003
-		myServiceName 	host.x.y.z 		12004
-	Registering service:host.x.y.z_12002
-END:Service discovery demo mode:WATCH
-=============================================
-START:Service discovery demo mode:POLL
-	Registering service
-		host.x.y.z_12000
-		host.x.y.z_12001
-		host.x.y.z_12002
-		host.x.y.z_12003
-		host.x.y.z_12004
-	SERVICES AVAILABLE
-		SERVICENAME 	HOST 			PORT
-		myServiceName 	host.x.y.z 		12000
-		myServiceName 	host.x.y.z 		12001
-		myServiceName 	host.x.y.z 		12002
-		myServiceName 	host.x.y.z 		12003
-		myServiceName 	host.x.y.z 		12004
-	Deregistering service:
-		host.x.y.z_12002
-	Sleeping for poll interval:30000
-	SERVICES AVAILABLE
-		SERVICENAME 	HOST 			PORT
-		myServiceName 	host.x.y.z 		12000
-		myServiceName 	host.x.y.z 		12001
-		myServiceName 	host.x.y.z 		12003
-		myServiceName 	host.x.y.z 		12004
-	Registering service:host.x.y.z_12002
-END:Service discovery demo mode:POLL
-=============================================
-START:Service discovery demo mode:NONE
-	Registering service
-		host.x.y.z_12000
-		host.x.y.z_12001
-		host.x.y.z_12002
-		host.x.y.z_12003
-		host.x.y.z_12004
-	SERVICES AVAILABLE
-		SERVICENAME 	HOST 			PORT
-		myServiceName 	host.x.y.z 		12000
-		myServiceName 	host.x.y.z 		12001
-		myServiceName 	host.x.y.z 		12002
-		myServiceName 	host.x.y.z 		12003
-		myServiceName 	host.x.y.z 		12004
-	Deregistering service:
-		host.x.y.z_12000
-	SERVICES AVAILABLE
-		SERVICENAME 	HOST 			PORT
-		myServiceName 	host.x.y.z 		12001
-		myServiceName 	host.x.y.z 		12002
-		myServiceName 	host.x.y.z 		12003
-		myServiceName 	host.x.y.z 		12004
-	Registering service:host.x.y.z_12000
-END:Service discovery demo mode:NONE
-=============================================
-
-```
-