Posted to commits@kafka.apache.org by ne...@apache.org on 2014/03/15 21:45:31 UTC

svn commit: r1577937 - /kafka/site/081/ops.html

Author: nehanarkhede
Date: Sat Mar 15 20:45:30 2014
New Revision: 1577937

URL: http://svn.apache.org/r1577937
Log:
Improved documentation for partition reassignment in 0.8.1

Modified:
    kafka/site/081/ops.html

Modified: kafka/site/081/ops.html
URL: http://svn.apache.org/viewvc/kafka/site/081/ops.html?rev=1577937&r1=1577936&r2=1577937&view=diff
==============================================================================
--- kafka/site/081/ops.html (original)
+++ kafka/site/081/ops.html Sat Mar 15 20:45:30 2014
@@ -112,16 +112,34 @@ my-group        my-topic                
 
 <h4><a id="basic_ops_cluster_expansion">Expanding your cluster</a></h4>
 
-Adding servers to a Kafka cluster is easy, just assign them a unique broker id and start up Kafka on your the new servers. However these new servers will not automatically be assigned any data partitions, so unless partitions are moved to them they won't be doing any work until new topics are created. So usually when you add machines to your cluster you will want to migrate some existing data to these machines.
+Adding servers to a Kafka cluster is easy, just assign them a unique broker id and start up Kafka on your new servers. However these new servers will not automatically be assigned any data partitions, so unless partitions are moved to them they won't be doing any work until new topics are created. So usually when you add machines to your cluster you will want to migrate some existing data to these machines.
 <p>
 The process of migrating data is manually initiated but fully automated. Under the covers what happens is that Kafka will add the new server as a follower of the partition it is migrating and allow it to fully replicate the existing data in that partition. When the new server has fully replicated the contents of this partition and joined the in-sync replica set, one of the existing replicas will delete its partition's data.
-
+<p>
+The partition reassignment tool can be used to move partitions across brokers. An ideal partition distribution would ensure even data load and partition sizes across all brokers. In 0.8.1, the partition reassignment tool does not have the capability to automatically study the data distribution in a Kafka cluster and move partitions around to attain an even load distribution. As such, the admin has to figure out which topics or partitions should be moved around. 
+<p>
+The partition reassignment tool can run in 3 mutually exclusive modes -
+<ul>
+<li>--generate: In this mode, given a list of topics and a list of brokers, the tool generates a candidate reassignment to move all partitions of the specified topics to the new brokers. This option merely provides a convenient way to generate a partition reassignment plan given a list of topics and target brokers.</li>
+<li>--execute: In this mode, the tool kicks off the reassignment of partitions based on the user-provided reassignment plan (supplied using the --reassignment-json-file option). This can be either a custom reassignment plan hand crafted by the admin or one produced by the --generate option</li>
+<li>--verify: In this mode, the tool verifies the status of the reassignment for all partitions listed during the last --execute. The status can be one of successfully completed, failed, or in progress</li>
+</ul>
 <h5>Automatically migrating data to new machines</h5>
-The partition reassignment tool can be used to move some topics off of the current set of brokers to the newly added brokers. When used to do this, the user should provide a list of topics that should be moved to the new set of brokers and a target list of new brokers. The tool then evenly distributes all partitions for the given list of topics to the new set of brokers. During this move, the replication factor of the topic is kept constant. Effectively the replicas for all partitions are moved from the old set of brokers to the newly added brokers. 
-
-For example, the following will move all partitions for topics foo1,foo2 to the new set of brokers 5,6. At the end of this move, all partitions for topics foo1 and foo2 will only exist on brokers 5,6
+The partition reassignment tool can be used to move some topics off of the current set of brokers to the newly added brokers. This is typically useful while expanding an existing cluster since it is easier to move entire topics to the new set of brokers than to move one partition at a time. When used to do this, the user should provide a list of topics that should be moved to the new set of brokers and a target list of new brokers. The tool then evenly distributes all partitions for the given list of topics across the new set of brokers. During this move, the replication factor of the topic is kept constant. Effectively the replicas for all partitions for the input list of topics are moved from the old set of brokers to the newly added brokers. 
+<p>
+For example, the following will move all partitions for topics foo1,foo2 to the new set of brokers 5,6. At the end of this move, all partitions for topics foo1 and foo2 will <i>only</i> exist on brokers 5,6
+<p>
+Since the tool accepts the input list of topics as a json file, you first need to identify the topics you want to move and create the json file as follows-
+<pre>
+> cat topics-to-move.json
+{"topics":
+     [{"topic": "foo1"},{"topic": "foo2"}],
+     "version":1
+}
+</pre>
+Once the json file is ready, use the partition reassignment tool to generate a candidate assignment-
 <pre>
-bin/kafka-reassign-partitions.sh --topics-to-move-json-file topics-to-move.json --broker-list "5,6" --generate 
+> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file topics-to-move.json --broker-list "5,6" --generate 
 Current partition replica assignment
 
 {"version":1,"partitions":[{"topic":"foo1","partition":2,"replicas":[1,2]},{"topic":"foo1","partition":0,"replicas":[3,4]},{"topic":"foo2","partition":2,"replicas":[1,2]},{"topic":"foo2","partition":0,"replicas":[3,4]},{"topic":"foo1","partition":1,"replicas":[2,3]},{"topic":"foo2","partition":1,"replicas":[2,3]}]}
@@ -129,18 +147,11 @@ Current partition replica assignment
 Proposed partition reassignment configuration
 
 {"version":1,"partitions":[{"topic":"foo1","partition":2,"replicas":[5,6]},{"topic":"foo1","partition":0,"replicas":[5,6]},{"topic":"foo2","partition":2,"replicas":[5,6]},{"topic":"foo2","partition":0,"replicas":[5,6]},{"topic":"foo1","partition":1,"replicas":[5,6]},{"topic":"foo2","partition":1,"replicas":[5,6]}]}
-
-cat topics-to-move.json
-{"topics":
-     [{"topic": "foo1"},{"topic": "foo2"}],
-     "version":1
-}
 </pre>
 <p>
-The tool generates a candidate assignment that will move all partitions from topics foo1,foo2 to brokers 5,6. Note, however, that at this point, the partition movement has not started, it merely tells you the current assignment and the proposed new assignment. The current assignment should be saved in case you want to rollback to it. The new assignment should be input to the tool with the --execute option as follows
-
+The tool generates a candidate assignment that will move all partitions from topics foo1,foo2 to brokers 5,6. Note, however, that at this point the partition movement has not started; the tool merely tells you the current assignment and the proposed new assignment. The current assignment should be saved in case you want to roll back to it. The new assignment should be saved in a json file (e.g. expand-cluster-reassignment.json) to be input to the tool with the --execute option as follows-
 <pre>
-bin/kafka-reassign-partitions.sh --reassignment-json-file expand-cluster-reassignment.json --execute
+> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file expand-cluster-reassignment.json --execute
 Current partition replica assignment
 
 {"version":1,"partitions":[{"topic":"foo1","partition":2,"replicas":[1,2]},{"topic":"foo1","partition":0,"replicas":[3,4]},{"topic":"foo2","partition":2,"replicas":[1,2]},{"topic":"foo2","partition":0,"replicas":[3,4]},{"topic":"foo1","partition":1,"replicas":[2,3]},{"topic":"foo2","partition":1,"replicas":[2,3]}]}
@@ -150,10 +161,9 @@ Successfully started reassignment of par
 {"version":1,"partitions":[{"topic":"foo1","partition":2,"replicas":[5,6]},{"topic":"foo1","partition":0,"replicas":[5,6]},{"topic":"foo2","partition":2,"replicas":[5,6]},{"topic":"foo2","partition":0,"replicas":[5,6]},{"topic":"foo1","partition":1,"replicas":[5,6]},{"topic":"foo2","partition":1,"replicas":[5,6]}]}
 </pre>
 <p>
-The --verify option can be used with the tool to check the status of the partition reassignment. Note that the same expand-cluster-reassignment.json (used with the --execute option) should be used with the --verify option
-
+Finally, the --verify option can be used with the tool to check the status of the partition reassignment. Note that the same expand-cluster-reassignment.json (used with the --execute option) should be used with the --verify option
 <pre>
-bin/kafka-reassign-partitions.sh --reassignment-json-file expand-cluster-reassignment.json --verify
+> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file expand-cluster-reassignment.json --verify
 Status of partition reassignment:
 Reassignment of partition [foo1,0] completed successfully
 Reassignment of partition [foo1,1] is in progress
@@ -164,15 +174,18 @@ Reassignment of partition [foo2,2] compl
 </pre>
 
 <h5>Custom partition assignment and migration</h5>
-The partition reassignment tool can also be used to selectively move replicas of a partition to a specific set of brokers. When used in this manner, it is assumed that the user knows the reassignment and does not require the tool to generate a candidate reassignment, effectively skipping the --generate step and moving straight to the --execute step
+The partition reassignment tool can also be used to selectively move replicas of a partition to a specific set of brokers. When used in this manner, it is assumed that the user knows the reassignment plan and does not require the tool to generate a candidate reassignment, effectively skipping the --generate step and moving straight to the --execute step
 <p>
-For example, the following moves partition 0 of topic foo1 to brokers 5,6 and partition 1 of topic foo2 to brokers 2,3
-
+For example, the following moves partition 0 of topic foo1 to brokers 5,6 and partition 1 of topic foo2 to brokers 2,3
+<p>
+The first step is to hand craft the custom reassignment plan in a json file-
 <pre>
-cat custom-reassignment.json
+> cat custom-reassignment.json
 {"version":1,"partitions":[{"topic":"foo1","partition":0,"replicas":[5,6]},{"topic":"foo2","partition":1,"replicas":[2,3]}]}
-
-bin/kafka-reassign-partitions.sh --reassignment-json-file custom-reassignment.json --execute
+</pre>
+Then, use the json file with the --execute option to start the reassignment process-
+<pre>
+> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file custom-reassignment.json --execute
 Current partition replica assignment
 
 {"version":1,"partitions":[{"topic":"foo1","partition":0,"replicas":[1,2]},{"topic":"foo2","partition":1,"replicas":[3,4]}]}
@@ -183,16 +196,15 @@ Successfully started reassignment of par
 </pre>
 <p>
 The --verify option can be used with the tool to check the status of the partition reassignment. Note that the same custom-reassignment.json (used with the --execute option) should be used with the --verify option
-
 <pre>
-bin/kafka-reassign-partitions.sh --reassignment-json-file custom-reassignment.json --verify
+> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file custom-reassignment.json --verify
 Status of partition reassignment:
 Reassignment of partition [foo1,0] completed successfully
 Reassignment of partition [foo2,1] completed successfully 
 </pre>
 
 <h5>Decommissioning machines</h5>
-The partition reassignment tool does not have the ability to decommission machines yet. As such, you cannot decommission machines without effectively reducing the replication factor of the partitions that existed on the decommissioned machine. We plan to add support for this in 0.8.2
+The partition reassignment tool does not have the ability to automatically generate a reassignment plan for decommissioning brokers yet. As such, the admin has to come up with a reassignment plan to move the replicas for all partitions hosted on the broker to be decommissioned to the rest of the brokers. This can be relatively tedious, as the reassignment needs to ensure that the replicas are not all moved from the decommissioned broker to a single other broker. To make this process effortless, we plan to add tooling support for decommissioning brokers in 0.8.2.
 
 <h3><a id="datacenters">6.2 Datacenters</a></h3>