You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by aw...@apache.org on 2018/05/23 23:20:25 UTC

kudu git commit: KUDU-2359: allow startup with missing data dirs

Repository: kudu
Updated Branches:
  refs/heads/master b7cf3b2e4 -> 006937d33


KUDU-2359: allow startup with missing data dirs

Context
-------
As a part of previous disk failure work, Kudu currently supports opening
the FS layout in the face of EIO, EROFS, etc. POSIX codes when reading
FS and block manager instance files. The directory manager, in such
cases, labels the data directory in which the bad file resides as failed
in memory. The check that enforces consistency between path instance
metadata files (PIMFs) accounts for such failed directories.

Separately, the introduction of the `kudu fs update_dirs` tool expanded
the logic used to open the FS layout to serve two additional purposes
when running the tool:
- To open the FS layout as the user specified it on a best-effort basis,
  ignoring any inconsistencies across PIMFs. This mode allows Kudu to
  stage the requested FS layout and test whether existing tablets would
  break under the given layout.
- To actually update the FS layout on disk to match the user input.
  Walking down the FS-layout-opening path in this mode, Kudu will
  create any missing files or directories it encounters along the way.
  In this mode, the PIMF consistency check is performed after updating
  the appropriate instance files.

The above behavior is currently encapsulated in FsManager::Open() and
the consequent call to DataDirManager::Open().

As mentioned in the JIRA, not accounted for by this previous work, a
disk failure can present itself as a failure to read directory entries
(POSIX code ENOENT), leading to NotFound errors. Therein lies some
conflict in the above logic:
- when opening the FS layout normally (i.e. not running the update tool)
  and a NotFound error is encountered (e.g. the user asks for
  --fs_data_dirs=/a,/b but only /a can be found), it would be desirable
  to treat the affected directory how we currently treat failed
  directories
- however, when updating directories, NotFound statuses are still meant
  to indicate missing directories to be created.

The questions then become, "How should we treat ENOENT/NotFound errors?
Should they be different than disk failures? If so, how?" There are a
couple options to consider:
1. Lump ENOENT in with the list of POSIX codes that signify a disk
   error. This change would mean that semantically, a failed directory
   is no different than a missing one, and all of the above mentioned
   codepaths must be updated to account for this. Additionally, this
   would affect all other codepaths that may yield ENOENT -- codepaths
   that previously triggered disk failure handling would trigger that
   handling for ENOENT errors.
2. Treat NotFound errors as a special case during update/open only. When
   reading FS or block manager instance files, extend disk error
   checking to also check for NotFound errors, and treat the errors
   similarly, creating in-memory "unhealthy" instances. To open the FS
   normally, these unhealthy instances will simply correspond to failed
   directories. To update the FS layout, we can leverage the fact that
   the FS update tool doesn't support running with any failed
   directories, attempting to create all directories and files for _all_
   unhealthy instances (both missing and failed) will correctly yield
   success in the face of only missing directories and failure in the
   presence of any failed directory.

This patch implements Option 2 instead of Option 1. ENOENT is still a
file-specific code, and so lumping it in to trigger DataDir-wide error
handling, even if it may be indicative of disk-wide failure, seems
inappropriate. It could also be argued that if NotFound errors are
discovered at server runtime (e.g. because some malicious user is
deleting files left and right), maybe the best course of action wouldn't
be to treat the directory as failed, but to crash Kudu altogether.

Notes on the new stuff
----------------------
The newly supported scenario is different than starting up Kudu with
extra or missing entries in `fs_data_dirs`, which is still not supported
unless running the update tool.

Examples:
- If an existing server were configured with --fs_data_dirs=/a,/b,/c,
  and it were restarted such that only /a,/b existed on disk, Kudu will
  start up and list /a,/b,/c, and note that /c is failed.
- If the above server were restarted with --fs_data_dirs=/a,/b, even if
  only /a,/b existed on disk, Kudu would fail to start up until running
  `kudu fs update_dirs <other flags> --fs_data_dirs=/a,/b`

Some changes in this patch include:
- methods involved in loading PIMFs now treat missing instance files as
  "unhealthy", the same way they treat files that fail due to disk
  errors
- DataDirManager::LoadInstances() has been updated to treat missing
  PIMFs as "unhealthy", the same way they treat PIMFs that yield disk
  errors, returning the successfully loaded, "unhealthy" instances
- a side effect of this is that all user-specified data dirs will have
  in-memory DataDirs created for them, even if they don't exist on disk
- various codepaths that previously ended FsManager::Open() with an
  IOError/Corruption because all drives were failed will now return
  NotFound, indicating Kudu should attempt to create a new FS layout.
  This means that a server that has lost all of its data dirs to disk
  failures is semantically equivalent to a brand new server; Kudu will
  attempt to create a new FS layout in these cases
- as a byproduct of the above changes, when opening the FS layout in
  ENFORCE_CONSISTENCY mode with an extra entry in `fs_data_dirs`, Kudu
  will fail later than before, at the integrity check, and yield an
  IOError instead of a NotFound error
- the UUID and UUID index assignment for missing directories has been
  updated when opening the speculative directory manager; see
  DataDirManager::Open() for more details.

Change-Id: I61a71265c3cc34a7b72320149770a814ec7f8351
Reviewed-on: http://gerrit.cloudera.org:8080/10340
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/006937d3
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/006937d3
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/006937d3

Branch: refs/heads/master
Commit: 006937d3338caeab76cac4768eedf685106ee4d6
Parents: b7cf3b2
Author: Andrew Wong <aw...@cloudera.com>
Authored: Thu May 3 16:00:34 2018 -0700
Committer: Andrew Wong <aw...@cloudera.com>
Committed: Wed May 23 23:18:34 2018 +0000

----------------------------------------------------------------------
 src/kudu/fs/block_manager_util.cc |  15 ++-
 src/kudu/fs/data_dirs-test.cc     |   5 +-
 src/kudu/fs/data_dirs.cc          | 196 +++++++++++++++++----------------
 src/kudu/fs/data_dirs.h           |  26 +++--
 src/kudu/fs/fs_manager-test.cc    | 158 ++++++++++++++++++++++----
 src/kudu/fs/fs_manager.cc         |   7 +-
 src/kudu/tools/kudu-tool-test.cc  |   2 +-
 7 files changed, 267 insertions(+), 142 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/006937d3/src/kudu/fs/block_manager_util.cc
----------------------------------------------------------------------
diff --git a/src/kudu/fs/block_manager_util.cc b/src/kudu/fs/block_manager_util.cc
index 6c7b5bd..c2afbec 100644
--- a/src/kudu/fs/block_manager_util.cc
+++ b/src/kudu/fs/block_manager_util.cc
@@ -49,8 +49,13 @@ using std::unordered_map;
 using std::vector;
 using strings::Substitute;
 
-// Evaluates 'status_expr' and if it results in a disk failure, logs a message
-// and fails the instance, returning with no error.
+// Evaluates 'status_expr' and if it results in a disk-failure error, logs a
+// message and marks the instance as unhealthy, returning with no error.
+//
+// Note: A disk failure may thwart attempts to read directory entries at the OS
+// level, leading to NotFound errors when reading the instance files. As such,
+// we treat missing instances the same way we treat those that yield more
+// blatant disk failure POSIX codes.
 //
 // Note: if a non-disk-failure error is produced, the instance will remain
 // healthy. These errors should be handled externally.
@@ -58,9 +63,9 @@ using strings::Substitute;
   const Status& _s = (status_expr); \
   if (PREDICT_FALSE(!_s.ok())) { \
     const Status _s_prepended = _s.CloneAndPrepend(msg); \
-    if (_s_prepended.IsDiskFailure()) { \
+    if (_s.IsNotFound() || _s.IsDiskFailure()) { \
       health_status_ = _s_prepended; \
-      LOG(ERROR) << "Instance failed: " << _s_prepended.ToString(); \
+      LOG(INFO) << "Instance is unhealthy: " << _s_prepended.ToString(); \
       return Status::OK(); \
     } \
     return _s_prepended; \
@@ -194,7 +199,7 @@ Status PathInstanceMetadataFile::CheckIntegrity(
     }
   }
   if (first_healthy == -1) {
-    return Status::IOError("All data directories are unhealthy");
+    return Status::NotFound("no healthy data directories found");
   }
 
   // Map of instance UUID to path instance structure. Tracks duplicate UUIDs.

http://git-wip-us.apache.org/repos/asf/kudu/blob/006937d3/src/kudu/fs/data_dirs-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/fs/data_dirs-test.cc b/src/kudu/fs/data_dirs-test.cc
index 7d35f09..f0955bd 100644
--- a/src/kudu/fs/data_dirs-test.cc
+++ b/src/kudu/fs/data_dirs-test.cc
@@ -415,12 +415,13 @@ TEST_F(DataDirManagerTest, TestOpenWithFailedDirs) {
   ASSERT_OK(OpenDataDirManager());
   ASSERT_EQ(kNumDirs - 1, dd_manager_->GetFailedDataDirs().size());
 
-  // Ensure that when all data directories fail, the server will crash.
+  // Ensure that when there are no healthy data directories, the open will
+  // yield an error.
   FLAGS_env_inject_eio_globs = JoinStrings(JoinPathSegmentsV(test_roots_, "**"), ",");
   Status s = DataDirManager::OpenExistingForTests(env_, test_roots_,
       DataDirManagerOptions(), &dd_manager_);
   ASSERT_STR_CONTAINS(s.ToString(), "could not open directory manager");
-  ASSERT_TRUE(s.IsIOError());
+  ASSERT_TRUE(s.IsNotFound());
 }
 
 class TooManyDataDirManagerTest : public DataDirManagerTest {

http://git-wip-us.apache.org/repos/asf/kudu/blob/006937d3/src/kudu/fs/data_dirs.cc
----------------------------------------------------------------------
diff --git a/src/kudu/fs/data_dirs.cc b/src/kudu/fs/data_dirs.cc
index 8e1acd8..49139ea 100644
--- a/src/kudu/fs/data_dirs.cc
+++ b/src/kudu/fs/data_dirs.cc
@@ -445,13 +445,13 @@ Status DataDirManager::Create() {
     all_uuids.emplace_back(uuid);
     root_uuid_pairs_to_create.emplace_back(r.path, std::move(uuid));
   }
-  RETURN_NOT_OK_PREPEND(CreateNewDataDirectoriesAndUpdateExistingOnes(
+  RETURN_NOT_OK_PREPEND(CreateNewDataDirectoriesAndUpdateInstances(
       std::move(root_uuid_pairs_to_create), {}, std::move(all_uuids)),
                         "could not create new data directories");
   return Status::OK();
 }
 
-Status DataDirManager::CreateNewDataDirectoriesAndUpdateExistingOnes(
+Status DataDirManager::CreateNewDataDirectoriesAndUpdateInstances(
     vector<pair<string, string>> root_uuid_pairs_to_create,
     vector<unique_ptr<PathInstanceMetadataFile>> instances_to_update,
     vector<string> all_uuids) {
@@ -493,7 +493,7 @@ Status DataDirManager::CreateNewDataDirectoriesAndUpdateExistingOnes(
   }
 
   // Update existing instances, if any.
-  RETURN_NOT_OK_PREPEND(UpdateExistingInstances(
+  RETURN_NOT_OK_PREPEND(UpdateInstances(
       std::move(instances_to_update), std::move(all_uuids)),
                         "could not update existing data directories");
 
@@ -508,7 +508,7 @@ Status DataDirManager::CreateNewDataDirectoriesAndUpdateExistingOnes(
   return Status::OK();
 }
 
-Status DataDirManager::UpdateExistingInstances(
+Status DataDirManager::UpdateInstances(
     vector<unique_ptr<PathInstanceMetadataFile>> instances_to_update,
     vector<string> new_all_uuids) {
   // Prepare a scoped cleanup for managing instance metadata copies.
@@ -531,6 +531,9 @@ Status DataDirManager::UpdateExistingInstances(
   WritableFileOptions opts;
   opts.sync_on_close = true;
   for (const auto& instance : instances_to_update) {
+    if (!instance->healthy()) {
+      continue;
+    }
     const string& instance_filename = instance->path();
     string copy_filename = instance_filename + kTmpInfix;
     RETURN_NOT_OK_PREPEND(env_util::CopyFile(
@@ -541,6 +544,9 @@ Status DataDirManager::UpdateExistingInstances(
 
   // Update existing instance metadata files with the new value of all_uuids.
   for (const auto& instance : instances_to_update) {
+    if (!instance->healthy()) {
+      continue;
+    }
     const string& instance_filename = instance->path();
     string copy_filename = instance_filename + kTmpInfix;
 
@@ -570,7 +576,6 @@ Status DataDirManager::UpdateExistingInstances(
 }
 
 Status DataDirManager::LoadInstances(
-    vector<string>* missing_roots,
     vector<unique_ptr<PathInstanceMetadataFile>>* loaded_instances) {
   LockMode lock_mode;
   if (!FLAGS_fs_lock_data_dirs) {
@@ -592,21 +597,16 @@ Status DataDirManager::LoadInstances(
     if (PREDICT_FALSE(!root.status.ok())) {
       instance->SetInstanceFailed(root.status);
     } else {
-      // This may return OK and mark 'instance' as failed.
-      Status s = instance->LoadFromDisk();
-      if (s.IsNotFound() &&
-          opts_.consistency_check != ConsistencyCheckBehavior::ENFORCE_CONSISTENCY) {
-        // A missing instance is not tolerated if we've been asked to enforce
-        // consistency checks.
-        missing_roots_tmp.emplace_back(root.path);
-        continue;
-      }
-      RETURN_NOT_OK_PREPEND(s, Substitute("could not load $0", instance_filename));
+      // This may return OK and mark 'instance' as unhealthy if the file could
+      // not be loaded (e.g. not found, disk errors).
+      RETURN_NOT_OK_PREPEND(instance->LoadFromDisk(),
+                            Substitute("could not load $0", instance_filename));
     }
 
     // Try locking the instance.
     if (instance->healthy() && lock_mode != LockMode::NONE) {
-      // This may return OK and mark 'instance' as failed.
+      // This may return OK and mark 'instance' as unhealthy if the file could
+      // not be locked due to non-locking issues (e.g. disk errors).
       Status s = instance->Lock();
       if (!s.ok()) {
         if (lock_mode == LockMode::OPTIONAL) {
@@ -622,11 +622,6 @@ Status DataDirManager::LoadInstances(
     loaded_instances_tmp.emplace_back(std::move(instance));
   }
 
-  // Check that at least a single data directory exists and is healthy.
-  if (loaded_instances_tmp.empty()) {
-    return Status::NotFound("could not open directory manager; none of the "
-                            "provided data directories could be found.");
-  }
   int num_healthy_instances = 0;
   for (const auto& instance : loaded_instances_tmp) {
     if (instance->healthy()) {
@@ -634,10 +629,9 @@ Status DataDirManager::LoadInstances(
     }
   }
   if (num_healthy_instances == 0) {
-    return Status::IOError("could not open directory manager; all data "
-                           "directories failed");
+    return Status::NotFound("could not open directory manager; no healthy "
+                            "data directories found");
   }
-  missing_roots->swap(missing_roots_tmp);
   loaded_instances->swap(loaded_instances_tmp);
   return Status::OK();
 }
@@ -646,9 +640,8 @@ Status DataDirManager::Open() {
   const int kMaxDataDirs = opts_.block_manager_type == "file" ? (1 << 16) - 1 : kint32max;
 
   // Find and load existing data directory instances.
-  vector<string> missing_roots;
   vector<unique_ptr<PathInstanceMetadataFile>> loaded_instances;
-  RETURN_NOT_OK(LoadInstances(&missing_roots, &loaded_instances));
+  RETURN_NOT_OK(LoadInstances(&loaded_instances));
 
   // Add new or remove existing data directories, if desired.
   if (opts_.consistency_check == ConsistencyCheckBehavior::UPDATE_ON_DISK) {
@@ -666,18 +659,19 @@ Status DataDirManager::Open() {
     vector<string> new_all_uuids;
     vector<pair<string, string>> root_uuid_pairs_to_create;
     for (const auto& i : loaded_instances) {
+      if (i->health_status().IsNotFound()) {
+        string uuid = gen.Next();
+        new_all_uuids.emplace_back(uuid);
+        root_uuid_pairs_to_create.emplace_back(DirName(i->dir()), std::move(uuid));
+        continue;
+      }
       RETURN_NOT_OK_PREPEND(
           i->health_status(),
           "found failed data directory while adding new data directories");
       new_all_uuids.emplace_back(i->metadata()->path_set().uuid());
     }
-    for (const auto& m : missing_roots) {
-      string uuid = gen.Next();
-      new_all_uuids.emplace_back(uuid);
-      root_uuid_pairs_to_create.emplace_back(m, std::move(uuid));
-    }
     RETURN_NOT_OK_PREPEND(
-        CreateNewDataDirectoriesAndUpdateExistingOnes(
+        CreateNewDataDirectoriesAndUpdateInstances(
             std::move(root_uuid_pairs_to_create),
             std::move(loaded_instances),
             std::move(new_all_uuids)),
@@ -688,9 +682,11 @@ Status DataDirManager::Open() {
     //
     // Note: 'loaded_instances' must be cleared to unlock the instance files.
     loaded_instances.clear();
-    missing_roots.clear();
-    RETURN_NOT_OK(LoadInstances(&missing_roots, &loaded_instances));
-    DCHECK(missing_roots.empty());
+    RETURN_NOT_OK(LoadInstances(&loaded_instances));
+    for (const auto& i : loaded_instances) {
+      RETURN_NOT_OK_PREPEND(i->health_status(),
+          "found failed data directory after updating data directories");
+    }
   }
 
   // Check the integrity of all loaded instances.
@@ -765,74 +761,80 @@ Status DataDirManager::Open() {
     InsertOrDie(&tablets_by_uuid_idx_map, idx, {});
   };
 
-  vector<DataDir*> unassigned_dirs;
-  int first_healthy = -1;
-  // Assign a uuid index to each healthy instance.
-  for (int dir = 0; dir < dds.size(); dir++) {
-    const auto& dd = dds[dir];
-    if (PREDICT_FALSE(!dd->instance()->healthy())) {
-      // Keep track of failed directories so we can assign them UUIDs later.
-      unassigned_dirs.push_back(dd.get());
-      continue;
-    }
-    if (first_healthy == -1) {
-      first_healthy = dir;
-    }
-    const PathSetPB& path_set = dd->instance()->metadata()->path_set();
-    int idx = -1;
-    for (int i = 0; i < path_set.all_uuids_size(); i++) {
-      if (path_set.uuid() == path_set.all_uuids(i)) {
-        idx = i;
-        break;
+  if (opts_.consistency_check != ConsistencyCheckBehavior::IGNORE_INCONSISTENCY) {
+    // If we're not in IGNORE_INCONSISTENCY mode, we're guaranteed that the
+    // healthy instances match from the above integrity check, so we can assign
+    // each healthy directory a UUID in accordance with its instance file.
+    //
+    // A directory may not have been assigned a UUID because its instance file
+    // could not be read, in which case, we track it and assign a UUID to it
+    // later if we can.
+    vector<DataDir*> unassigned_dirs;
+    int first_healthy = -1;
+    for (int dir = 0; dir < dds.size(); dir++) {
+      const auto& dd = dds[dir];
+      if (PREDICT_FALSE(!dd->instance()->healthy())) {
+        // Keep track of failed directories so we can assign them UUIDs later.
+        unassigned_dirs.push_back(dd.get());
+        continue;
       }
+      if (first_healthy == -1) {
+        first_healthy = dir;
+      }
+      const PathSetPB& path_set = dd->instance()->metadata()->path_set();
+      int idx = -1;
+      for (int i = 0; i < path_set.all_uuids_size(); i++) {
+        if (path_set.uuid() == path_set.all_uuids(i)) {
+          idx = i;
+          break;
+        }
+      }
+      if (idx == -1) {
+        return Status::IOError(Substitute(
+            "corrupt path set for data directory $0: uuid $1 not found in path set",
+            dd->dir(), path_set.uuid()));
+      }
+      if (idx > kMaxDataDirs) {
+        return Status::NotSupported(
+            Substitute("block manager supports a maximum of $0 paths", kMaxDataDirs));
+      }
+      insert_to_maps(idx, path_set.uuid(), dd.get());
     }
-    if (idx == -1) {
-      return Status::IOError(Substitute(
-          "corrupt path set for data directory $0: uuid $1 not found in path set",
-          dd->dir(), path_set.uuid()));
-    }
-    if (idx > kMaxDataDirs) {
-      return Status::NotSupported(
-          Substitute("block manager supports a maximum of $0 paths", kMaxDataDirs));
-    }
-    insert_to_maps(idx, path_set.uuid(), dd.get());
-  }
-  CHECK_NE(first_healthy, -1); // Guaranteed by LoadInstances().
-
-  // If the uuid index was not assigned, assign it to a failed directory. Use
-  // the path set from the first healthy instance.
-  PathSetPB path_set = dds[first_healthy]->instance()->metadata()->path_set();
-  int failed_dir_idx = 0;
-  for (int uuid_idx = 0; uuid_idx < path_set.all_uuids_size(); uuid_idx++) {
-    if (!ContainsKey(uuid_by_idx, uuid_idx)) {
-      const string& unassigned_uuid = path_set.all_uuids(uuid_idx);
-      if (failed_dir_idx == unassigned_dirs.size()) {
-        // This uuid's data directory is missing outright, which can only
-        // happen when opts_.consistency_check is IGNORE_INCONSISTENCY (i.e. if
-        // this DataDirManager is trialing a new data dir configuration that
-        // removes an existing data dir). In such a case, we don't bother
-        // tracking the missing directory at all.
-        //
-        // If opts_.consistency_check is UPDATE_ON_DISK, the on-disk instances
-        // were updated and reloaded above, ensuring that the list of data
-        // directories now excludes the missing directory.
-        //
-        // If opts_.consistency_check is ENFORCE_CONSISTENCY, the
-        // CheckIntegrity call above requires that on-disk instances and the
-        // list of data directories match, so there cannot be a missing data directory.
-        continue;
+    CHECK_NE(first_healthy, -1); // Guaranteed by LoadInstances().
+
+    // If the uuid index was not assigned, assign it to a failed directory. Use
+    // the path set from the first healthy instance.
+    PathSetPB path_set = dds[first_healthy]->instance()->metadata()->path_set();
+    int failed_dir_idx = 0;
+    for (int uuid_idx = 0; uuid_idx < path_set.all_uuids_size(); uuid_idx++) {
+      if (!ContainsKey(uuid_by_idx, uuid_idx)) {
+        const string& unassigned_uuid = path_set.all_uuids(uuid_idx);
+        insert_to_maps(uuid_idx, unassigned_uuid, unassigned_dirs[failed_dir_idx]);
+
+        // Record the directory as failed.
+        if (metrics_) {
+          metrics_->data_dirs_failed->IncrementBy(1);
+        }
+        InsertOrDie(&failed_data_dirs, uuid_idx);
+        failed_dir_idx++;
       }
-      insert_to_maps(uuid_idx, unassigned_uuid, unassigned_dirs[failed_dir_idx]);
-
-      // Record the directory as failed.
-      if (metrics_) {
-        metrics_->data_dirs_failed->IncrementBy(1);
+    }
+    CHECK_EQ(unassigned_dirs.size(), failed_dir_idx);
+  } else {
+    // If we are in IGNORE_INCONSISTENCY mode, all bets are off. The most we
+    // can do is make a best effort assignment of data dirs to UUIDs based on
+    // the ones that are healthy, and for the sake of completeness, assign
+    // artificial UUIDs to the unhealthy ones.
+    for (int dir = 0; dir < dds.size(); dir++) {
+      DataDir* dd = dds[dir].get();
+      if (dd->instance()->healthy()) {
+        insert_to_maps(dir, dd->instance()->metadata()->path_set().uuid(), dd);
+      } else {
+        insert_to_maps(dir, Substitute("<unknown uuid $0>", dir), dd);
+        InsertOrDie(&failed_data_dirs, dir);
       }
-      InsertOrDie(&failed_data_dirs, uuid_idx);
-      failed_dir_idx++;
     }
   }
-  CHECK_EQ(unassigned_dirs.size(), failed_dir_idx);
 
   data_dirs_.swap(dds);
   uuid_by_idx_.swap(uuid_by_idx);

http://git-wip-us.apache.org/repos/asf/kudu/blob/006937d3/src/kudu/fs/data_dirs.h
----------------------------------------------------------------------
diff --git a/src/kudu/fs/data_dirs.h b/src/kudu/fs/data_dirs.h
index 0398e15..2b2fc75 100644
--- a/src/kudu/fs/data_dirs.h
+++ b/src/kudu/fs/data_dirs.h
@@ -420,31 +420,33 @@ class DataDirManager {
 
   // Loads the instance files for each data directory.
   //
-  // If consistency checking is disabled, missing instance files are tolerated
-  // and will be written to 'missing_roots' on success.
+  // On success, 'loaded_instances' contains loaded instance objects. It also
+  // includes instance files that failed to load because they were missing or
+  // because of a disk failure; they are still considered "loaded" and are
+  // labeled unhealthy internally.
   //
-  // On success, 'loaded_instances' contains all successfully loaded instance files.
-  //
-  // Returns an error if an instance file fails to load or if all instance
-  // files are unhealthy.
-  Status LoadInstances(std::vector<std::string>* missing_roots,
+  // Returns an error if an instance file fails in an irreconcileable way (e.g.
+  // the file is locked), or if none of the instance files are healthy.
+  Status LoadInstances(
       std::vector<std::unique_ptr<PathInstanceMetadataFile>>* loaded_instances);
 
   // Initializes new data directories specified by 'root_uuid_pairs_to_create'
-  // and updates the on-disk instance files of existing data directories
-  // specified by 'instances_to_update' using the contents of 'all_uuids'.
+  // and updates the on-disk instance files of data directories specified by
+  // 'instances_to_update' using the contents of 'all_uuids', skipping any
+  // unhealthy instance files.
   //
   // Returns an error if any disk operations fail.
-  Status CreateNewDataDirectoriesAndUpdateExistingOnes(
+  Status CreateNewDataDirectoriesAndUpdateInstances(
       std::vector<std::pair<std::string, std::string>> root_uuid_pairs_to_create,
       std::vector<std::unique_ptr<PathInstanceMetadataFile>> instances_to_update,
       std::vector<std::string> all_uuids);
 
   // Updates the on-disk instance files specified by 'instances_to_update'
-  // using the contents of 'new_all_uuids'.
+  // using the contents of 'new_all_uuids', skipping any unhealthy instance
+  // files.
   //
   // Returns an error if any disk operations fail.
-  Status UpdateExistingInstances(
+  Status UpdateInstances(
       std::vector<std::unique_ptr<PathInstanceMetadataFile>> instances_to_update,
       std::vector<std::string> new_all_uuids);
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/006937d3/src/kudu/fs/fs_manager-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/fs/fs_manager-test.cc b/src/kudu/fs/fs_manager-test.cc
index 5a76883..6e1bfd3 100644
--- a/src/kudu/fs/fs_manager-test.cc
+++ b/src/kudu/fs/fs_manager-test.cc
@@ -40,6 +40,7 @@
 #include "kudu/gutil/map-util.h"
 #include "kudu/gutil/port.h"
 #include "kudu/gutil/stringprintf.h"
+#include "kudu/gutil/strings/join.h"
 #include "kudu/gutil/strings/substitute.h"
 #include "kudu/gutil/strings/util.h"
 #include "kudu/util/env.h"
@@ -428,7 +429,7 @@ TEST_F(FsManagerTestBase, TestOpenWithNoBlockManagerInstances) {
     new_opts.consistency_check = check_behavior;
     ReinitFsManagerWithOpts(new_opts);
     Status s = fs_manager()->Open();
-    ASSERT_STR_CONTAINS(s.ToString(), "none of the provided data directories could be found");
+    ASSERT_STR_CONTAINS(s.ToString(), "no healthy data directories found");
     ASSERT_TRUE(s.IsNotFound());
 
     // Once we supply the WAL directory as a data directory, we can open successfully.
@@ -438,25 +439,131 @@ TEST_F(FsManagerTestBase, TestOpenWithNoBlockManagerInstances) {
   }
 }
 
-TEST_F(FsManagerTestBase, TestOpenWithFailedDirs) {
-  // Successfully create an FS layout.
-  string wal_path = GetTestPath("wals");
-  vector<string> data_paths = { GetTestPath("data1"), GetTestPath("data2"), GetTestPath("data3") };
-  for (const string& path : data_paths) {
-    env_->CreateDir(path);
+// Test the behavior when we fail to open a data directory for some reason (its
+// mountpoint failed, it's missing, etc). Kudu should allow this and open up
+// with failed data directories listed.
+TEST_F(FsManagerTestBase, TestOpenWithUnhealthyDataDir) {
+  // Successfully create a multi-directory FS layout.
+  const string new_root = GetTestPath("new_root");
+  FsManagerOpts opts;
+  opts.wal_root = fs_root_;
+  opts.data_roots = { fs_root_, new_root };
+  opts.consistency_check = ConsistencyCheckBehavior::UPDATE_ON_DISK;
+  ReinitFsManagerWithOpts(opts);
+  ASSERT_OK(fs_manager()->Open());
+  string new_root_uuid;
+  ASSERT_TRUE(fs_manager()->dd_manager()->FindUuidByRoot(new_root, &new_root_uuid));
+
+  // Fail the new directory. Kudu should have no problem starting up with this
+  // and should list one as failed.
+  FLAGS_env_inject_eio_globs = JoinPathSegments(new_root, "**");
+  FLAGS_env_inject_eio = 1.0;
+  opts.consistency_check = ConsistencyCheckBehavior::ENFORCE_CONSISTENCY;
+  ReinitFsManagerWithOpts(opts);
+  ASSERT_OK(fs_manager()->Open());
+  ASSERT_EQ(1, fs_manager()->dd_manager()->GetFailedDataDirs().size());
+
+  // Now remove the new directory on disk. Similarly, Kudu should have no
+  // problem starting up and it should list one failed data directory.
+  FLAGS_env_inject_eio = 0;
+  ASSERT_OK(env_->DeleteRecursively(new_root));
+  ReinitFsManagerWithOpts(opts);
+  ASSERT_OK(fs_manager()->Open());
+  ASSERT_EQ(1, fs_manager()->dd_manager()->GetFailedDataDirs().size());
+
+  // Now let's simulate the operator replacing the drive. The update tool will
+  // be run and the new directory, even at the same mountpoint, will be
+  // assigned a new UUID.
+  //
+  // At this point, our remaining healthy instance file should know about two
+  // data directories. Kudu should detect one missing and create a new one.
+  // Let's update and ensure we get a new UUID.
+  opts.consistency_check = ConsistencyCheckBehavior::UPDATE_ON_DISK;
+  ReinitFsManagerWithOpts(opts);
+  ASSERT_OK(fs_manager()->Open());
+  ASSERT_EQ(0, fs_manager()->dd_manager()->GetFailedDataDirs().size());
+  string new_root_uuid_post_update;
+  ASSERT_TRUE(fs_manager()->dd_manager()->FindUuidByRoot(new_root, &new_root_uuid_post_update));
+  ASSERT_NE(new_root_uuid, new_root_uuid_post_update);
+
+  // Now let's try failing all the directories. Kudu should yield an error,
+  // complaining it couldn't find any healthy data directories.
+  FLAGS_env_inject_eio_globs = JoinStrings(JoinPathSegmentsV(opts.data_roots, "**"), ",");
+  FLAGS_env_inject_eio = 1.0;
+  opts.consistency_check = ConsistencyCheckBehavior::ENFORCE_CONSISTENCY;
+  ReinitFsManagerWithOpts(opts);
+  Status s = fs_manager()->Open();
+  ASSERT_TRUE(s.IsNotFound()) << s.ToString();
+  ASSERT_STR_CONTAINS(s.ToString(), "could not find a healthy instance file");
+
+  // Upon returning from FsManager::Open() with a NotFound error, Kudu will
+  // attempt to create a new FS layout. With bad mountpoints, this should fail.
+  s = fs_manager()->CreateInitialFileSystemLayout();
+  ASSERT_TRUE(s.IsIOError()) << s.ToString();
+  ASSERT_STR_CONTAINS(s.ToString(), "cannot create FS layout");
+
+  // The above behavior should be seen if the data directories are missing...
+  FLAGS_env_inject_eio = 0;
+  for (const auto& root : opts.data_roots) {
+    ASSERT_OK(env_->DeleteRecursively(root));
   }
-  vector<string> data_roots = { GetTestPath("data1/roots"), GetTestPath("data2/roots"),
-                                GetTestPath("data3/roots") };
-  ReinitFsManagerWithPaths(wal_path, data_roots);
+  ReinitFsManagerWithOpts(opts);
+  s = fs_manager()->Open();
+  ASSERT_TRUE(s.IsNotFound()) << s.ToString();
+  ASSERT_STR_CONTAINS(s.ToString(), "could not find a healthy instance file");
+
+  // ...except we should be able to successfully create a new FS layout.
   ASSERT_OK(fs_manager()->CreateInitialFileSystemLayout());
+  ASSERT_EQ(0, fs_manager()->dd_manager()->GetFailedDataDirs().size());
+}
 
-  // Now fail one of the directories.
-  FLAGS_crash_on_eio = false;
+// When we canonicalize a directory, we actually canonicalize the directory's
+// parent directory; as such, canonicalization can fail if the parent directory
+// can't be read (e.g. due to a disk error or because it's flat out missing).
+// In such cases, we should still be able to open the FS layout.
+TEST_F(FsManagerTestBase, TestOpenWithCanonicalizationFailure) {
+  // Create some parent directories and subdirectories.
+  const string dir1 = GetTestPath("test1");
+  const string dir2 = GetTestPath("test2");
+  ASSERT_OK(env_->CreateDir(dir1));
+  ASSERT_OK(env_->CreateDir(dir2));
+  const string subdir1 = GetTestPath("test1/subdir");
+  const string subdir2 = GetTestPath("test2/subdir");
+  FsManagerOpts opts;
+  opts.wal_root = subdir1;
+  opts.data_roots = { subdir1, subdir2 };
+  ReinitFsManagerWithOpts(opts);
+  ASSERT_OK(fs_manager()->CreateInitialFileSystemLayout());
+
+  // Fail the canonicalization by injecting errors to a parent directory.
+  ReinitFsManagerWithOpts(opts);
+  FLAGS_env_inject_eio_globs = JoinPathSegments(dir2, "**");
   FLAGS_env_inject_eio = 1.0;
-  FLAGS_env_inject_eio_globs = JoinPathSegments(Substitute("$0,$0", data_paths[1]), "**");
-  ReinitFsManagerWithPaths(wal_path, data_roots);
-  ASSERT_OK(fs_manager()->Open(nullptr));
+  ASSERT_OK(fs_manager()->Open());
+  ASSERT_EQ(1, fs_manager()->dd_manager()->GetFailedDataDirs().size());
+  FLAGS_env_inject_eio = 0;
+
+  // Now fail the canonicalization by deleting a parent directory. This
+  // simulates the mountpoint disappearing.
+  ASSERT_OK(env_->DeleteRecursively(dir2));
+  ReinitFsManagerWithOpts(opts);
+  ASSERT_OK(fs_manager()->Open());
   ASSERT_EQ(1, fs_manager()->dd_manager()->GetFailedDataDirs().size());
+
+  // In both of the above failures, the appropriate steps would be to run the
+  // update tool after ensuring the bad mountpoint is replaced with a healthy
+  // one. Until that happens, we won't be able to update the data dirs.
+  opts.consistency_check = ConsistencyCheckBehavior::UPDATE_ON_DISK;
+  ReinitFsManagerWithOpts(opts);
+  Status s = fs_manager()->Open();
+  ASSERT_TRUE(s.IsNotFound()) << s.ToString();
+  ASSERT_STR_CONTAINS(s.ToString(), "could not add new data directories");
+
+  // Let's try that again, but with the appropriate mountpoint/directory.
+  ASSERT_OK(env_->CreateDir(dir2));
+  ReinitFsManagerWithOpts(opts);
+  ASSERT_OK(fs_manager()->Open());
+  ASSERT_EQ(0, fs_manager()->dd_manager()->GetFailedDataDirs().size());
 }
 
 TEST_F(FsManagerTestBase, TestTmpFilesCleanup) {
@@ -598,8 +705,8 @@ TEST_F(FsManagerTestBase, TestAddRemoveDataDirs) {
   opts.data_roots = { fs_root_, new_path1 };
   ReinitFsManagerWithOpts(opts);
   Status s = fs_manager()->Open();
-  ASSERT_TRUE(s.IsNotFound());
-  ASSERT_STR_CONTAINS(s.ToString(), fs_manager()->GetInstanceMetadataPath(new_path1));
+  ASSERT_TRUE(s.IsIOError()) << s.ToString();
+  ASSERT_STR_CONTAINS(s.ToString(), "2 data directories provided, but expected 1");
 
   // This time allow new data directories to be created.
   opts.consistency_check = ConsistencyCheckBehavior::UPDATE_ON_DISK;
@@ -627,14 +734,22 @@ TEST_F(FsManagerTestBase, TestAddRemoveDataDirs) {
   ReinitFsManagerWithOpts(opts);
   ASSERT_OK(fs_manager()->Open());
   ASSERT_EQ(2, fs_manager()->dd_manager()->GetDataDirs().size());
+  ASSERT_EQ(0, fs_manager()->dd_manager()->GetFailedDataDirs().size());
 
-  // Try to open with a new data dir although an existing data dir has failed;
-  // this should fail.
+  // Open the FS layout with an existing, failed data dir; this should be fine,
+  // but should report a single failed directory.
   FLAGS_crash_on_eio = false;
   FLAGS_env_inject_eio = 1.0;
   FLAGS_env_inject_eio_globs = JoinPathSegments(new_path2, "**");
+  opts.consistency_check = ConsistencyCheckBehavior::ENFORCE_CONSISTENCY;
+  ReinitFsManagerWithOpts(opts);
+  ASSERT_OK(fs_manager()->Open());
+  ASSERT_EQ(1, fs_manager()->dd_manager()->GetFailedDataDirs().size());
 
+  // Now try to add a new data dir with an existing, failed data dir; this
+  // should fail.
   const string new_path3 = GetTestPath("new_path3");
+  opts.consistency_check = ConsistencyCheckBehavior::UPDATE_ON_DISK;
   opts.data_roots = { fs_root_, new_path2, new_path3 };
   ReinitFsManagerWithOpts(opts);
   s = fs_manager()->Open();
@@ -729,6 +844,7 @@ TEST_F(FsManagerTestBase, TestAddRemoveSpeculative) {
   ReinitFsManagerWithOpts(opts);
   ASSERT_OK(fs_manager()->Open());
   ASSERT_EQ(3, fs_manager()->dd_manager()->GetDataDirs().size());
+  ASSERT_EQ(1, fs_manager()->dd_manager()->GetFailedDataDirs().size());
 
   // Neither of those attempts should have changed the on-disk state. Verify
   // this by retrying all three combinations with consistency checking
@@ -750,8 +866,8 @@ TEST_F(FsManagerTestBase, TestAddRemoveSpeculative) {
       ASSERT_EQ(2, fs_manager()->dd_manager()->GetDataDirs().size());
     } else {
       // The third data directory has no instance file.
-      ASSERT_TRUE(s.IsNotFound()) << s.ToString();
-      ASSERT_STR_CONTAINS(s.ToString(), "No such file or directory");
+      ASSERT_TRUE(s.IsIOError()) << s.ToString();
+      ASSERT_STR_CONTAINS(s.ToString(), "3 data directories provided, but expected 2");
     }
   }
 }

http://git-wip-us.apache.org/repos/asf/kudu/blob/006937d3/src/kudu/fs/fs_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/fs/fs_manager.cc b/src/kudu/fs/fs_manager.cc
index 64f963f..a17ba7f 100644
--- a/src/kudu/fs/fs_manager.cc
+++ b/src/kudu/fs/fs_manager.cc
@@ -217,7 +217,7 @@ Status FsManager::Init() {
     string canonicalized;
     Status s = env_->Canonicalize(DirName(root), &canonicalized);
     if (PREDICT_FALSE(!s.ok())) {
-      if (s.IsDiskFailure()) {
+      if (s.IsNotFound() || s.IsDiskFailure()) {
         // If the directory fails to canonicalize due to disk failure, store
         // the non-canonicalized form and the returned error.
         canonicalized = DirName(root);
@@ -325,8 +325,7 @@ Status FsManager::Open(FsReport* report) {
     Status s = pb_util::ReadPBContainerFromPath(env_, GetInstanceMetadataPath(root.path),
                                                 pb.get());
     if (PREDICT_FALSE(!s.ok())) {
-      if (s.IsNotFound() &&
-          opts_.consistency_check != ConsistencyCheckBehavior::ENFORCE_CONSISTENCY) {
+      if (s.IsNotFound()) {
         missing_roots.emplace_back(root);
         continue;
       }
@@ -348,7 +347,7 @@ Status FsManager::Open(FsReport* report) {
   }
 
   if (!metadata_) {
-    return Status::Corruption("All instance files are missing");
+    return Status::NotFound("could not find a healthy instance file");
   }
 
   // Ensure all of the ancillary directories exist.

http://git-wip-us.apache.org/repos/asf/kudu/blob/006937d3/src/kudu/tools/kudu-tool-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tools/kudu-tool-test.cc b/src/kudu/tools/kudu-tool-test.cc
index 9f596a2..026b5d2 100644
--- a/src/kudu/tools/kudu-tool-test.cc
+++ b/src/kudu/tools/kudu-tool-test.cc
@@ -2494,7 +2494,7 @@ TEST_F(ToolTest, TestFsSwappingDirectoriesFailsGracefully) {
   Status s = RunTool(Substitute(
       "fs update_dirs --fs_wal_dir=$0 --fs_data_dirs=$1",
       wal_root, new_data_root_no_wal), nullptr, &stderr);
-  ASSERT_STR_CONTAINS(stderr, "none of the provided data directories could be found");
+  ASSERT_STR_CONTAINS(stderr, "no healthy data directories found");
 
   // If we instead try to add the directory to the existing list of
   // directories, Kudu should allow it.