Posted to commits@kudu.apache.org by mp...@apache.org on 2017/09/18 19:31:26 UTC

kudu git commit: docs: clarify steps for changing master from multi-master deployment

Repository: kudu
Updated Branches:
  refs/heads/master 0f0f54eef -> 41a41fdf7


docs: clarify steps for changing master from multi-master deployment

The current docs for multi-master migration discuss moving up from a
single-master deployment to multi-master, but some users may want to
move in the other direction. Until now, such users have had to rely on the
existing docs and their imagination to work through this. I've added docs
specifying the process and parameters to do so.

Additionally, this patch clarifies steps for multi-master recovery in
case the cluster was configured without DNS aliases.

Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Reviewed-on: http://gerrit.cloudera.org:8080/8032
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Reviewed-by: Mike Percy <mp...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/41a41fdf
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/41a41fdf
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/41a41fdf

Branch: refs/heads/master
Commit: 41a41fdf79f641f3d43f4730d533a3df4796a689
Parents: 0f0f54e
Author: Andrew Wong <aw...@cloudera.com>
Authored: Wed Sep 6 15:59:14 2017 -0700
Committer: Mike Percy <mp...@apache.org>
Committed: Mon Sep 18 19:31:04 2017 +0000

----------------------------------------------------------------------
 docs/administration.adoc | 86 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 77 insertions(+), 9 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/41a41fdf/docs/administration.adoc
----------------------------------------------------------------------
diff --git a/docs/administration.adoc b/docs/administration.adoc
index 138809e..26d2743 100644
--- a/docs/administration.adoc
+++ b/docs/administration.adoc
@@ -254,8 +254,8 @@ $ kudu fs dump uuid --fs_wal_dir=/var/lib/kudu/master 2>/dev/null
   or an alias in /etc/hosts. The alias should be an abstract representation of the master (e.g.
   `master-1`).
 +
-WARNING: Without DNS aliases it is not possible to recover from permanent master failures, and as
-such it is highly recommended.
+WARNING: Without DNS aliases it is not possible to recover from permanent master failures without
+bringing the cluster down for maintenance, and as such, it is highly recommended.
 +
 . Perform the following preparatory steps for each new master:
 * Choose an unused machine in the cluster. The master generates very little load so it can be
@@ -267,6 +267,7 @@ such it is highly recommended.
 * Choose and record the port the master should use for RPCs.
 * Optional: configure a DNS alias for the master (e.g. `master-2`, `master-3`, etc).
 
+[[perform-the-migration]]
 ==== Perform the migration
 
 . Stop all the Kudu processes in the entire cluster.
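
For illustration, the DNS alias recommended in the preparatory steps above can be as simple as an
/etc/hosts entry present on every node in the cluster; the host names and addresses here are
hypothetical:

  192.0.2.11   master-1
  192.0.2.12   master-2
  192.0.2.13   master-3

Because the alias is an abstract name rather than a physical host name, a dead master can later be
replaced by pointing its alias at a new machine, which is what the recovery workflow below relies on.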
@@ -379,11 +380,12 @@ master.
 
 Due to https://issues.apache.org/jira/browse/KUDU-1620[KUDU-1620], it is not possible to perform
 this workflow without also restarting the live masters. As such, the workflow requires a
-maintenance window, albeit a brief one as masters generally restart quickly.
+maintenance window, albeit a potentially brief one if the cluster was set up with DNS aliases.
 
-WARNING: Kudu does not yet support Raft configuration changes for masters. As such, it is only
-possible to replace a master if the deployment was created with DNS aliases. See the
-<<migrate_to_multi_master,multi-master migration workflow>> for more details.
+WARNING: Kudu does not yet support live Raft configuration changes for masters. As such, it is only
+possible to replace a master if the deployment was created with DNS aliases or if every node in the
+cluster is first shut down. See the <<migrate_to_multi_master,multi-master migration workflow>> for
+more details on deploying with DNS aliases.
 
 WARNING: The workflow presupposes at least basic familiarity with Kudu configuration management. If
 using Cloudera Manager (CM), the workflow also presupposes familiarity with it.
@@ -393,6 +395,11 @@ WARNING: All of the command line steps below should be executed as the Kudu UNIX
 
 ==== Prepare for the recovery
 
+. If the deployment was configured without DNS aliases, perform the following steps:
+* Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
+  will be unavailable.
+* Shut down all Kudu tablet server processes in the cluster.
+
 . Ensure that the dead master is well and truly dead. Take whatever steps needed to prevent it from
   accidentally restarting; this can be quite dangerous for the cluster post-recovery.
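
For the tablet server shutdown in the first step, use whatever service manager runs the daemons; on
a deployment using the stock kudu-tserver service scripts (an assumption, and not applicable when CM
manages the processes), it might look like:

  $ sudo service kudu-tserver stop   # run on every tablet server node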
 
@@ -503,12 +510,19 @@ $ kudu local_replica copy_from_remote --fs_wal_dir=/var/lib/kudu/master 00000000
   master's alias.
 * Add the port number (separated by a colon) if using a non-default RPC port value.
 
-. Reconfigure the DNS alias for the dead master to point at the replacement master.
+. If the cluster was set up with DNS aliases, reconfigure the DNS alias for the dead master to point
+  at the replacement master.
+
+. If the cluster was set up without DNS aliases, perform the following steps:
+* Stop the remaining live masters.
+* Rewrite the Raft configurations on these masters to include the replacement master. See Step 4 of
+  <<perform-the-migration, Perform the Migration>> for more details.
 
 . Start the replacement master.
 
-. Restart the existing live masters. This results in a brief availability outage, but it should
-  last only as long as it takes for the masters to come back up.
+. Restart the remaining masters in the new multi-master deployment. While the masters are shut down,
+  there will be an availability outage, but it should last only as long as it takes for the masters
+  to come back up.
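
The Raft rewrite referenced above uses the same `kudu local_replica cmeta rewrite_raft_config`
invocation as the migration workflow's Step 4. As a sketch only, with placeholder UUIDs, hypothetical
`master-N` aliases, and the default RPC port 7051:

  $ kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/var/lib/kudu/master \
      00000000000000000000000000000000 \
      4aab798a69e94fab8d77069edff28ce0:master-1:7051 \
      f5624e05f40649b79a757629a69d061e:master-2:7051 \
      988d8ac6530f426cbe180be5ba52033d:master-3:7051

Each positional argument after the all-zero master tablet ID names one peer of the new configuration
as uuid:hostname:port, and the command is run on each stopped master against its own --fs_wal_dir
(and --fs_data_dirs, if configured separately).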
 
 Congratulations, the dead master has been replaced! To verify that all masters are working properly,
 consider performing the following sanity checks:
@@ -520,6 +534,60 @@ consider performing the following sanity checks:
 * Run a Kudu system check (ksck) on the cluster using the `kudu` command line
   tool. See <<ksck>> for more details.
 
+=== Removing Kudu Masters from a Multi-Master Deployment
+
+In the event that a multi-master deployment has been overallocated nodes, the following steps should
+be taken to remove the unwanted masters.
+
+WARNING: In planning the new multi-master configuration, keep in mind that the number of masters
+should be odd and that three or five node master configurations are recommended.
+
+WARNING: Dropping the number of masters below the number of masters currently needed for a Raft
+majority can incur data loss. To mitigate this, ensure that the leader master is not removed during
+this process.
+
+==== Prepare for the removal
+
+. Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
+will be unavailable.
+
+. Identify the UUID and RPC address of the current leader of the multi-master deployment by
+visiting the `/masters` page of any master's web UI. This master must not be removed during this
+process; its removal may result in severe data loss.
+
+. Stop all the Kudu processes in the entire cluster.
+
+. If using CM, remove the unwanted Kudu master.
+
+==== Perform the removal
+
+. Rewrite the Raft configuration on the remaining masters to include only the remaining masters. See
+Step 4 of <<perform-the-migration,Perform the Migration>> for more details.
+
+. Remove the data directories and WAL directory on the unwanted masters. This is a precaution to
+ensure that they cannot start up again and interfere with the new multi-master deployment.
+
+. Modify the value of the `master_addresses` configuration parameter for the masters of the new
+multi-master deployment. If migrating to a single-master deployment, the `master_addresses` flag
+should be omitted entirely.
+
+. Start all of the masters that were not removed.
+
+. Modify the value of the `tserver_master_addrs` configuration parameter for the tablet servers to
+remove any unwanted masters.
+
+. Start all of the tablet servers.
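
For the `master_addresses` and `tserver_master_addrs` changes above, the flag values might look like
the following sketch (host names and the default port 7051 are placeholders; under CM these values
are edited in the service configuration rather than in gflag files):

  # On each remaining master; omit the flag entirely if only one master remains.
  --master_addresses=master-1:7051,master-2:7051,master-3:7051

  # On every tablet server, listing only the remaining masters.
  --tserver_master_addrs=master-1:7051,master-2:7051,master-3:7051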
+
+Congratulations, the masters have now been removed! To verify that all masters are working properly,
+consider performing the following sanity checks:
+
+* Using a browser, visit each master's web UI. Look at the /masters page. All of the masters should
+  be listed there with one master in the LEADER role and the others in the FOLLOWER role. The
+  contents of /masters on each master should be the same.
+
+* Run a Kudu system check (ksck) on the cluster using the `kudu` command line
+  tool. See <<ksck>> for more details.
+
 [[ksck]]
 === Checking Cluster Health with `ksck`
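
As a minimal example of the ksck sanity check referenced in both workflows above, assuming three
masters reachable at hypothetical `master-N` aliases on the default port:

  $ kudu cluster ksck master-1:7051,master-2:7051,master-3:7051

The tool contacts the masters and every tablet server and reports any tables or tablets that are not
in a healthy state.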