Posted to commits@helix.apache.org by rm...@apache.org on 2013/05/30 06:58:41 UTC

[4/9] git commit: Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/incubator-helix

Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/incubator-helix

Conflicts:
	src/site/markdown/Tutorial.md
	src/site/markdown/index.md


Project: http://git-wip-us.apache.org/repos/asf/incubator-helix/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-helix/commit/6a2ddc6e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-helix/tree/6a2ddc6e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-helix/diff/6a2ddc6e

Branch: refs/heads/master
Commit: 6a2ddc6ed18dbfe6911ba00110f8962ef9f3c5ea
Parents: d3d846b 93a8770
Author: Bob Schulman <bs...@linkedin.com>
Authored: Wed May 22 23:31:25 2013 -0700
Committer: Bob Schulman <bs...@linkedin.com>
Committed: Wed May 22 23:31:25 2013 -0700

----------------------------------------------------------------------
 src/site/markdown/Tutorial.md |  199 ------------------------------------
 src/site/markdown/index.md    |   67 +++++++++++--
 2 files changed, 60 insertions(+), 206 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-helix/blob/6a2ddc6e/src/site/markdown/Tutorial.md
----------------------------------------------------------------------
diff --cc src/site/markdown/Tutorial.md
index b5ee150,27f9fd9..0000000
deleted file mode 100644,100644
--- a/src/site/markdown/Tutorial.md
+++ /dev/null
@@@ -1,199 -1,199 +1,0 @@@
--<!---
--Licensed to the Apache Software Foundation (ASF) under one
--or more contributor license agreements.  See the NOTICE file
--distributed with this work for additional information
--regarding copyright ownership.  The ASF licenses this file
--to you under the Apache License, Version 2.0 (the
--"License"); you may not use this file except in compliance
--with the License.  You may obtain a copy of the License at
--
--  http://www.apache.org/licenses/LICENSE-2.0
--
--Unless required by applicable law or agreed to in writing,
--software distributed under the License is distributed on an
--"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
--KIND, either express or implied.  See the License for the
--specific language governing permissions and limitations
--under the License.
---->
--
--# Helix Tutorial
--
--In this tutorial, we will cover the roles of a Helix-managed cluster, and show the code you need to write to integrate with it.  In many cases, there is a simple default behavior that is often appropriate, but you can also customize the behavior.
--
--Convention: we first cover the _basic_ approach, which is the easiest to implement.  Then, we'll describe _advanced_ options, which give you more control over the system behavior, but require you to write more code.
--
--
--### Prerequisites
--
- 1. Read [Concepts/Terminology](./concepts.html) and [Architecture](./architecture.html)
- 2. Read the [Quickstart guide](./quickstart.html) to learn how Helix models and manages a cluster
- 3. Install Helix source.  See: [Quickstart](./quickstart.html) for the steps.
- 
- ### Tutorial Outline
- 
- 1. [Participant](./tutorial_participant.html)
- 2. [Spectator](./tutorial_spectator.html)
- 3. [Controller](./tutorial_controller.html)
- 4. [Rebalancing Algorithms](./tutorial_rebalance.html)
- 5. [State Machines](./tutorial_state.html)
- 6. [Messaging](./tutorial_messaging.html)
- 7. [Customized health check](./tutorial_health.html)
- 8. [Throttling](./tutorial_throttling.html)
- 9. [Application Property Store](./tutorial_propstore.html)
- 10. [Admin Interface](./tutorial_admin.html)
- 
- ### Preliminaries
- 
- First, we need to set up the system.  Let\'s walk through the steps in building a distributed system using Helix.
- 
- ### Start Zookeeper
- 
- This starts a zookeeper in standalone mode. For production deployment, see [Apache Zookeeper](http://zookeeper.apache.org) for instructions.
- 
- ```
-     ./start-standalone-zookeeper.sh 2199 &
- ```
- 
- ### Create a cluster
- 
- Creating a cluster will define the cluster in appropriate znodes on zookeeper.   
- 
- Using the java API:
- 
- ```
-     // Create setup tool instance
-     // Note: ZK_ADDRESS is the host:port of Zookeeper
-     String ZK_ADDRESS = "localhost:2199";
-     admin = new ZKHelixAdmin(ZK_ADDRESS);
- 
-     String CLUSTER_NAME = "helix-demo";
-     //Create cluster namespace in zookeeper
-     admin.addCluster(CLUSTER_NAME);
- ```
- 
- OR
- 
- Using the command-line interface:
- 
- ```
-     ./helix-admin.sh --zkSvr localhost:2199 --addCluster helix-demo 
- ```
- 
- 
- ### Configure the nodes of the cluster
- 
- First we\'ll add new nodes to the cluster, then configure the nodes in the cluster. Each node in the cluster must be uniquely identifiable. 
- The most commonly used convention is hostname:port.
- 
- ```
-     String CLUSTER_NAME = "helix-demo";
-     int NUM_NODES = 2;
-     String hosts[] = new String[]{"localhost","localhost"};
-     String ports[] = new String[]{7000,7001};
-     for (int i = 0; i < NUM_NODES; i++)
-     {
-       
-       InstanceConfig instanceConfig = new InstanceConfig(hosts[i]+ "_" + ports[i]);
-       instanceConfig.setHostName(hosts[i]);
-       instanceConfig.setPort(ports[i]);
-       instanceConfig.setInstanceEnabled(true);
- 
-       //Add additional system specific configuration if needed. These can be accessed during the node start up.
-       instanceConfig.getRecord().setSimpleField("key", "value");
-       admin.addInstance(CLUSTER_NAME, instanceConfig);
-       
-     }
- ```
- 
- ### Configure the resource
- 
- A _resource_ represents the actual task performed by the nodes. It can be a database, index, topic, queue or any other processing entity.
- A _resource_ can be divided into many sub-parts known as _partitions_.
- 
- 
- #### Define the _state model_ and _constraints_
- 
- For scalability and fault tolerance, each partition can have one or more replicas. 
- The _state model_ allows one to declare the system behavior by first enumerating the various STATES, and the TRANSITIONS between them.
- A simple model is ONLINE-OFFLINE where ONLINE means the task is active and OFFLINE means it\'s not active.
- You can also specify how many replicas must be in each state, these are known as _constraints_.
- For example, in a search system, one might need more than one node serving the same index to handle the load.
- 
- The allowed states: 
- 
- * MASTER
- * SLAVE
- * OFFLINE
- 
- The allowed transitions: 
- 
- * OFFLINE to SLAVE
- * SLAVE to OFFLINE
- * SLAVE to MASTER
- * MASTER to SLAVE
- 
- The constraints:
- 
- * no more than 1 MASTER per partition
- * the rest of the replicas should be slaves
- 
- The following snippet shows how to declare the _state model_ and _constraints_ for the MASTER-SLAVE model.
- 
- ```
- 
-     StateModelDefinition.Builder builder = new StateModelDefinition.Builder(STATE_MODEL_NAME);
- 
-     // Add states and their rank to indicate priority. A lower rank corresponds to a higher priority
-     builder.addState(MASTER, 1);
-     builder.addState(SLAVE, 2);
-     builder.addState(OFFLINE);
- 
-     // Set the initial state when the node starts
-     builder.initialState(OFFLINE);
- 
-     // Add transitions between the states.
-     builder.addTransition(OFFLINE, SLAVE);
-     builder.addTransition(SLAVE, OFFLINE);
-     builder.addTransition(SLAVE, MASTER);
-     builder.addTransition(MASTER, SLAVE);
- 
-     // set constraints on states.
- 
-     // static constraint: upper bound of 1 MASTER
-     builder.upperBound(MASTER, 1);
- 
-     // dynamic constraint: R means it should be derived based on the replication factor for the cluster
-     // this allows a different replication factor for each resource without 
-     // having to define a new state model
-     //
-     builder.dynamicUpperBound(SLAVE, "R");
- 
-     StateModelDefinition statemodelDefinition = builder.build();
-     admin.addStateModelDef(CLUSTER_NAME, STATE_MODEL_NAME, myStateModel);
- ```
- 
- #### Assigning partitions to nodes
- 
- The final goal of Helix is to ensure that the constraints on the state model are satisfied. 
- Helix does this by assigning a STATE to a partition (such as MASTER, SLAVE), and placing it on a particular node.
- 
- There are 3 assignment modes Helix can operate on
- 
- * AUTO_REBALANCE: Helix decides the placement and state of a partition.
- * AUTO: Application decides the placement but Helix decides the state of a partition.
- * CUSTOM: Application controls the placement and state of a partition.
- 
- For more info on the assignment modes, see [chapter 4](./tutorial4.html) of the tutorial.
- 
- ```
-     String RESOURCE_NAME = "MyDB";
-     int NUM_PARTITIONS = 6;
-     STATE_MODEL_NAME = "MasterSlave";
-     String MODE = "AUTO";
-     int NUM_REPLICAS = 2;
- 
-     admin.addResource(CLUSTER_NAME, RESOURCE_NAME, NUM_PARTITIONS, STATE_MODEL_NAME, MODE);
-     admin.rebalance(CLUSTER_NAME, RESOURCE_NAME, NUM_REPLICAS);
- ```
- 
 -1. Read [Concepts/Terminology](./Concepts.html) and [Architecture](./Architecture.html)
 -2. Read the [Quickstart guide](./Quickstart.html) to learn how Helix models and manages a cluster
 -3. Install Helix source.  See: [Quickstart](./Quickstart.html) for the steps.
 -
 -### Tutorial Outline
 -
 -1. [Participant](./tutorial_participant.html)
 -2. [Spectator](./tutorial_spectator.html)
 -3. [Controller](./tutorial_controller.html)
 -4. [Rebalancing Algorithms](./tutorial_rebalance.html)
 -5. [State Machines](./tutorial_state.html)
 -6. [Messaging](./tutorial_messaging.html)
 -7. [Customized health check](./tutorial_health.html)
 -8. [Throttling](./tutorial_throttling.html)
 -9. [Application Property Store](./tutorial_propstore.html)
 -10. [Admin Interface](./tutorial_admin.html)
 -
 -### Preliminaries
 -
 -First, we need to set up the system.  Let\'s walk through the steps in building a distributed system using Helix.
 -
 -### Start Zookeeper
 -
 -This starts a zookeeper in standalone mode. For production deployment, see [Apache Zookeeper](http://zookeeper.apache.org) for instructions.
 -
 -```
 -    ./start-standalone-zookeeper.sh 2199 &
 -```
 -
 -### Create a cluster
 -
 -Creating a cluster will define the cluster in appropriate znodes on zookeeper.   
 -
 -Using the java API:
 -
 -```
 -    // Create setup tool instance
 -    // Note: ZK_ADDRESS is the host:port of Zookeeper
 -    String ZK_ADDRESS = "localhost:2199";
 -    admin = new ZKHelixAdmin(ZK_ADDRESS);
 -
 -    String CLUSTER_NAME = "helix-demo";
 -    //Create cluster namespace in zookeeper
 -    admin.addCluster(CLUSTER_NAME);
 -```
 -
 -OR
 -
 -Using the command-line interface:
 -
 -```
 -    ./helix-admin.sh --zkSvr localhost:2199 --addCluster helix-demo 
 -```
 -
 -
 -### Configure the nodes of the cluster
 -
 -First we\'ll add new nodes to the cluster, then configure the nodes in the cluster. Each node in the cluster must be uniquely identifiable. 
 -The most commonly used convention is hostname:port.
 -
 -```
 -    String CLUSTER_NAME = "helix-demo";
 -    int NUM_NODES = 2;
 -    String hosts[] = new String[]{"localhost","localhost"};
 -    String ports[] = new String[]{7000,7001};
 -    for (int i = 0; i < NUM_NODES; i++)
 -    {
 -      
 -      InstanceConfig instanceConfig = new InstanceConfig(hosts[i]+ "_" + ports[i]);
 -      instanceConfig.setHostName(hosts[i]);
 -      instanceConfig.setPort(ports[i]);
 -      instanceConfig.setInstanceEnabled(true);
 -
 -      //Add additional system specific configuration if needed. These can be accessed during the node start up.
 -      instanceConfig.getRecord().setSimpleField("key", "value");
 -      admin.addInstance(CLUSTER_NAME, instanceConfig);
 -      
 -    }
 -```
 -
 -### Configure the resource
 -
 -A _resource_ represents the actual task performed by the nodes. It can be a database, index, topic, queue or any other processing entity.
 -A _resource_ can be divided into many sub-parts known as _partitions_.
 -
 -
 -#### Define the _state model_ and _constraints_
 -
 -For scalability and fault tolerance, each partition can have one or more replicas. 
 -The _state model_ allows one to declare the system behavior by first enumerating the various STATES, and the TRANSITIONS between them.
 -A simple model is ONLINE-OFFLINE where ONLINE means the task is active and OFFLINE means it\'s not active.
 -You can also specify how many replicas must be in each state, these are known as _constraints_.
 -For example, in a search system, one might need more than one node serving the same index to handle the load.
 -
 -The allowed states: 
 -
 -* MASTER
 -* SLAVE
 -* OFFLINE
 -
 -The allowed transitions: 
 -
 -* OFFLINE to SLAVE
 -* SLAVE to OFFLINE
 -* SLAVE to MASTER
 -* MASTER to SLAVE
 -
 -The constraints:
 -
 -* no more than 1 MASTER per partition
 -* the rest of the replicas should be slaves
 -
 -The following snippet shows how to declare the _state model_ and _constraints_ for the MASTER-SLAVE model.
 -
 -```
 -
 -    StateModelDefinition.Builder builder = new StateModelDefinition.Builder(STATE_MODEL_NAME);
 -
 -    // Add states and their rank to indicate priority. A lower rank corresponds to a higher priority
 -    builder.addState(MASTER, 1);
 -    builder.addState(SLAVE, 2);
 -    builder.addState(OFFLINE);
 -
 -    // Set the initial state when the node starts
 -    builder.initialState(OFFLINE);
 -
 -    // Add transitions between the states.
 -    builder.addTransition(OFFLINE, SLAVE);
 -    builder.addTransition(SLAVE, OFFLINE);
 -    builder.addTransition(SLAVE, MASTER);
 -    builder.addTransition(MASTER, SLAVE);
 -
 -    // set constraints on states.
 -
 -    // static constraint: upper bound of 1 MASTER
 -    builder.upperBound(MASTER, 1);
 -
 -    // dynamic constraint: R means it should be derived based on the replication factor for the cluster
 -    // this allows a different replication factor for each resource without 
 -    // having to define a new state model
 -    //
 -    builder.dynamicUpperBound(SLAVE, "R");
 -
 -    StateModelDefinition statemodelDefinition = builder.build();
 -    admin.addStateModelDef(CLUSTER_NAME, STATE_MODEL_NAME, myStateModel);
 -```
 -
 -#### Assigning partitions to nodes
 -
 -The final goal of Helix is to ensure that the constraints on the state model are satisfied. 
 -Helix does this by assigning a STATE to a partition (such as MASTER, SLAVE), and placing it on a particular node.
 -
 -There are 3 assignment modes Helix can operate on
 -
 -* AUTO_REBALANCE: Helix decides the placement and state of a partition.
 -* AUTO: Application decides the placement but Helix decides the state of a partition.
 -* CUSTOM: Application controls the placement and state of a partition.
 -
 -For more info on the assignment modes, see [Rebalancing Algorithms](./tutorial_rebalance.html) of the tutorial.
 -
 -```
 -    String RESOURCE_NAME = "MyDB";
 -    int NUM_PARTITIONS = 6;
 -    STATE_MODEL_NAME = "MasterSlave";
 -    String MODE = "AUTO";
 -    int NUM_REPLICAS = 2;
 -
 -    admin.addResource(CLUSTER_NAME, RESOURCE_NAME, NUM_PARTITIONS, STATE_MODEL_NAME, MODE);
 -    admin.rebalance(CLUSTER_NAME, RESOURCE_NAME, NUM_REPLICAS);
 -```
 -

http://git-wip-us.apache.org/repos/asf/incubator-helix/blob/6a2ddc6e/src/site/markdown/index.md
----------------------------------------------------------------------
diff --cc src/site/markdown/index.md
index 72a8a00,cbcb6ba..1aeeab5
--- a/src/site/markdown/index.md
+++ b/src/site/markdown/index.md
@@@ -22,63 -22,52 +22,113 @@@ Navigating the Documentatio
  
  ### Conceptual Understanding
  
- [Concepts / Terminology](./concepts.html)
- 
- [Architecture](./architecture.html)
+ [Concepts / Terminology](./Concepts.html)
++[Architecture](./Architecture.html)
 +
 +### Hands-on Helix
 +
- [Quickstart](./quickstart.html)
++[Quickstart](./Quickstart.html)
 +
- [Tutorial](./tutorial.html)
++[Tutorial](./Tutorial.html)
 +
 +  * [Chapter 1: Participant](./tutorial1.html)
 +  * [Chapter 2: Spectator](./tutorial2.html)
 +  * [Chapter 3: Controller](./tutorial3.html)
 +  * [Chapter 4: Rebalancing Algorithms](./tutorial4.html)
 +  * [Chapter 5: State Models](./tutorial5.html)
 +  * [Chapter 6: Messaging](./tutorial6.html)
 +  * [Chapter 7: Customized Health Check](./tutorial7.html)
 +  * [Chapter 8: Throttling](./tutorial8.html)
 +  * [Chapter 9: Admin Interface](./tutorial9.html)
 +  * [Chapter 10: Controller Deployment](./tutorial10.html)
 +
 +[Javadocs](http://helix.incubator.apache.org/apidocs/index.html)
 +
 +### Recipes
 +
 +[Distributed lock manager](./recipes/lock_manager.html)
 +
 +[Rabbit MQ consumer group](./recipes/rabbitmq_consumer_group.html)
 +
 +[Rsync replicated file store](./recipes/rsync_replicated_file_store.html)
 +
 +[Service discovery](./recipes/service_discovery.html)
 +
 +
 +What Is Helix
 +--------------
 +Helix is a generic _cluster management_ framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. 
 +
 +
 +What Is Cluster Management
 +--------------------------
 +To understand Helix, first you need to understand what is _cluster management_.  A distributed system typically runs on multiple nodes for the following reasons:
++=======
++
++[Concepts / Terminology](./Concepts.html)
+ 
+ [Architecture](./Architecture.html)
+ 
+ ### Hands-on Helix
+ 
+ [Quickstart](./Quickstart.html)
+ 
+ [Tutorial](./Tutorial.html)
+ 
+ [Javadocs](http://helix.incubator.apache.org/apidocs/index.html)
+ 
+ ### Recipes
+ 
+ [Distributed lock manager](./recipes/lock_manager.html)
+ 
+ [Rabbit MQ consumer group](./recipes/rabbitmq_consumer_group.html)
+ 
+ [Rsync replicated file store](./recipes/rsync_replicated_file_store.html)
+ 
+ [Service discovery](./recipes/service_discovery.html)
+ 
+ 
+ What Is Helix
+ --------------
+ Helix is a generic _cluster management_ framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. 
++>>>>>>> 93a8770e41e61337525234945193afc2280a3c3d
  
 +* scalability
 +* fault tolerance
 +* load balancing
 +
++<<<<<<< HEAD
 +Each node performs one or more of the primary function of the cluster, such as storing/serving data, producing/consuming data streams, etc.  Once configured for your system, Helix acts is the global brain for the system.  It is designed to make decisions that cannot be made in isolation.  Examples of decisions that require global knowledge and coordination:
 +
 +* scheduling of maintainence tasks, such as backups, garbage collection, file consolidation, index rebuilds
 +* repartitioning of data or resources across the cluster
 +* informing dependent systems of changes so they can react appropriately to cluster changes
 +* throttling system tasks and changes
  
 +While it is possible to integrate these functions into the distributed system, it complicates the code.  Helix has abstracted common cluster management tasks, enabling the system builder to model the desired behavior in a declarative state model, and let Helix manage the coordination.  The result is less new code to write, and a robust, highly operable system.
++=======
+ What Is Cluster Management
+ --------------------------
+ To understand Helix, first you need to understand what is _cluster management_.  A distributed system typically runs on multiple nodes for the following reasons:
+ 
+ * scalability
+ * fault tolerance
+ * load balancing
+ 
+ Each node performs one or more of the primary function of the cluster, such as storing/serving data, producing/consuming data streams, etc.  Once configured for your system, Helix acts as the global brain for the system.  It is designed to make decisions that cannot be made in isolation.  Examples of decisions that require global knowledge and coordination:
++>>>>>>> 93a8770e41e61337525234945193afc2280a3c3d
+ 
+ * scheduling of maintainence tasks, such as backups, garbage collection, file consolidation, index rebuilds
+ * repartitioning of data or resources across the cluster
+ * informing dependent systems of changes so they can react appropriately to cluster changes
+ * throttling system tasks and changes
+ 
++<<<<<<< HEAD
++=======
+ While it is possible to integrate these functions into the distributed system, it complicates the code.  Helix has abstracted common cluster management tasks, enabling the system builder to model the desired behavior in a declarative state model, and let Helix manage the coordination.  The result is less new code to write, and a robust, highly operable system.
  
  
++>>>>>>> 93a8770e41e61337525234945193afc2280a3c3d
  Key Features of Helix
  ---------------------
  1. Automatic assignment of resource/partition to nodes