Posted to commits@kudu.apache.org by ad...@apache.org on 2017/09/14 04:34:18 UTC

kudu git commit: docs: split disk failure from disk config changes

Repository: kudu
Updated Branches:
  refs/heads/master eed74ee6c -> d45eb2700


docs: split disk failure from disk config changes

The administration notes combined Kudu's handling of disk failures
with instructions for rebuilding a tserver with a new directory
configuration. While related, these two topics are separate and should
be documented as such.

Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Reviewed-on: http://gerrit.cloudera.org:8080/7984
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/d45eb270
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/d45eb270
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/d45eb270

Branch: refs/heads/master
Commit: d45eb27006756780d38ed0e019842c4223f2dffd
Parents: eed74ee
Author: Andrew Wong <aw...@cloudera.com>
Authored: Wed Sep 6 14:40:41 2017 -0700
Committer: Adar Dembo <ad...@cloudera.com>
Committed: Thu Sep 14 04:32:39 2017 +0000

----------------------------------------------------------------------
 docs/administration.adoc | 55 ++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 21 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/d45eb270/docs/administration.adoc
----------------------------------------------------------------------
diff --git a/docs/administration.adoc b/docs/administration.adoc
index f3fad35..138809e 100644
--- a/docs/administration.adoc
+++ b/docs/administration.adoc
@@ -586,37 +586,50 @@ be done with the following command:
 $ kudu cluster ksck --checksum_scan --tables IntegrationTestBigLinkedList master-01.example.com,master-02.example.com,master-03.example.com
 ----
 
-[[disk_failure_recovery]]
-=== Recovering from Disk Failure
-
-// TODO(dan): revise this once KUDU-616 is fixed.
-Kudu tablet servers are not resilient to disk failure. When a disk containing a
-data directory or the write-ahead log (WAL) dies, the entire tablet server must
-be rebuilt. Kudu will automatically re-replicate tablets on other servers after
-a tablet server fails, but manual intervention is needed in order to restore the
-failed tablet server to a running state.
-
-The first step to restoring a tablet server after a disk failure is to replace
-the failed disk, or remove the failed disk from the data-directory and/or WAL
-configuration. Next, the contents of the data directories and WAL directory must
-be removed. For example, if the tablet server is configured with
-`--fs_wal_dir=/data/0/kudu-tserver-wal` and
+[[change_dir_config]]
+=== Changing Directory Configurations
+// TODO(awong): revise this when KUDU-2062 is fixed.
+Kudu does not allow for the addition or removal of directories on existing
+master or tablet servers. In order to start a server with a different directory
+configuration from what it was created with, the server needs to be rebuilt.
+
+WARNING: Before proceeding, ensure the contents of the directories are backed
+up, either as a copy or in the form of other tablet replicas.
+
+The first step to starting up a server with a new directory configuration is
+emptying all of the server's existing directories. For example, if a tablet
+server is configured with `--fs_wal_dir=/data/0/kudu-tserver-wal` and
 `--fs_data_dirs=/data/1/kudu-tserver,/data/2/kudu-tserver`, the following
-commands will remove the data directories and WAL directory contents:
+commands will remove the write-ahead-log (WAL) directory's and data
+directories' contents:
 
 [source,bash]
 ----
 $ rm -rf /data/0/kudu-tserver-wal/* /data/1/kudu-tserver/* /data/2/kudu-tserver/*
 ----
 
-After the WAL and data directories are emptied, the tablet server process can be
-started. When Kudu is installed using system packages, `service` is typically
-used:
+After the WAL and data directories are emptied, and any new directories are
+created with the appropriate permissions, the server process can be
+started with the new directory configuration. When Kudu is installed using
+system packages, `service` is typically used:
 
 [source,bash]
 ----
 $ sudo service kudu-tserver start
 ----
 
-Once the tablet server is running again, new tablet replicas will be created on
-it as necessary.
+[[disk_failure_recovery]]
+=== Recovering from Disk Failure
+
+// TODO(awong): revise this when KUDU-616 is fixed.
+Kudu tablet servers are not resilient to disk failure. When a disk containing a
+data directory or WAL fails, the server will crash, and the entire server must
+be rebuilt. Kudu will automatically re-replicate tablets on other servers after
+a tablet server fails, but manual intervention is needed in order to restore the
+failed tablet server to a running state.
+
+To rebuild the tablet server after a disk failure, the failed disk needs to be
+replaced or removed from the data-directory and/or WAL configuration. Once this
+is complete the server needs to be rebuilt with this new configuration. See the
+section on <<change_dir_config,Changing Directory Configurations>> for more
+details.
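
Note on the directory-emptying step documented above: the following is a
minimal sketch, not part of the commit, of how that step might be wrapped
with a few safety checks. The helper name empty_dirs is hypothetical and is
not a Kudu tool. It assumes the server process is already stopped and that
the directory contents have been backed up or are replicated elsewhere, as
the WARNING in the patch requires.

```shell
#!/usr/bin/env bash
# Sketch only: empty_dirs is a hypothetical helper, not part of Kudu.
# It removes the contents of each directory it is given while keeping
# the directory itself (and therefore its ownership and permissions).

empty_dirs() {
  local dir
  for dir in "$@"; do
    # Refuse empty, root, or missing paths so that a bad variable
    # cannot turn into a deletion rooted at "/".
    if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
      echo "refusing to empty invalid directory: '$dir'" >&2
      return 1
    fi
    # Delete everything below the directory, including dotfiles.
    find "$dir" -mindepth 1 -delete
  done
}

# Example with the paths used in the documentation. Run only after the
# server is stopped and the data is backed up:
#   empty_dirs /data/0/kudu-tserver-wal /data/1/kudu-tserver /data/2/kudu-tserver
#   sudo service kudu-tserver start
```

Unlike the `rm -rf dir/*` form in the documentation, whose glob skips hidden
files by default in bash, this sketch uses `find` so that dotfiles are removed
as well. Whether that difference matters for a given Kudu deployment is not
stated in the patch.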