Posted to commits@kudu.apache.org by mp...@apache.org on 2017/09/08 04:49:50 UTC

[1/2] kudu git commit: web ui: fix "percentage consumed" calculation

Repository: kudu
Updated Branches:
  refs/heads/master c85ac94b1 -> 2108767bf


web ui: fix "percentage consumed" calculation

There was a misplaced cast, so the division of consumption by limit was
using an integer rather than floating point calculation. This results in
the "percentage consumed" always showing 0.
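
For illustration only, the effect of the misplaced cast can be reproduced
in isolation. This is a minimal sketch, not part of the commit: the
function names are hypothetical, and it assumes both operands are 64-bit
integers as the message implies. Whenever consumption is below the limit,
the integer quotient truncates to 0 before the cast is applied.

```cpp
#include <cstdint>

// Buggy form: the division is performed in integer arithmetic, and only the
// already-truncated quotient is converted to double.
// (Hypothetical helper name; assumes hard_limit > 0, as the caller checks.)
double PercentageBuggy(int64_t current_consumption, int64_t hard_limit) {
  return 100 * static_cast<double>(current_consumption / hard_limit);
}

// Fixed form: the numerator is converted to double first, so the division
// that follows is a floating-point division.
double PercentageFixed(int64_t current_consumption, int64_t hard_limit) {
  return 100 * static_cast<double>(current_consumption) / hard_limit;
}
```

With a consumption of 512 bytes against a limit of 1024 bytes, the buggy
form yields 0 while the fixed form yields 50.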

Change-Id: I07d5b7d5f44548120a9b31bfef43e23051e27d8e
Reviewed-on: http://gerrit.cloudera.org:8080/7987
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <da...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/98b308fd
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/98b308fd
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/98b308fd

Branch: refs/heads/master
Commit: 98b308fd82df5ee9b0f3cdcfc1748137e23c3efc
Parents: c85ac94
Author: Todd Lipcon <to...@apache.org>
Authored: Wed Sep 6 17:02:52 2017 -0700
Committer: Todd Lipcon <to...@apache.org>
Committed: Fri Sep 8 00:58:01 2017 +0000

----------------------------------------------------------------------
 src/kudu/server/default-path-handlers.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/98b308fd/src/kudu/server/default-path-handlers.cc
----------------------------------------------------------------------
diff --git a/src/kudu/server/default-path-handlers.cc b/src/kudu/server/default-path-handlers.cc
index 1d53456..d3050f1 100644
--- a/src/kudu/server/default-path-handlers.cc
+++ b/src/kudu/server/default-path-handlers.cc
@@ -167,7 +167,7 @@ static void MemTrackersHandler(const Webserver::WebRequest& /*req*/, std::ostrin
   *output << Substitute("  <tr><th>Memory limit</th><td>$0</td></tr>\n",
                         HumanReadableNumBytes::ToString(hard_limit));
   if (hard_limit > 0) {
-    double percentage = 100 * static_cast<double>(current_consumption / hard_limit);
+    double percentage = 100 * static_cast<double>(current_consumption) / hard_limit;
     *output << Substitute("  <tr><th>Percentage consumed</th><td>$0%</td></tr>\n",
                           StringPrintf("%.2f", percentage));
   }


[2/2] kudu git commit: KUDU-2123. Auto-vivify cmeta on tombstoned replicas if doesn't exist at startup

Posted by mp...@apache.org.
KUDU-2123. Auto-vivify cmeta on tombstoned replicas if doesn't exist at startup

It is possible for tombstoned replicas to legitimately not have a cmeta
file as a result of crashing during a first tablet copy, or failing a
tablet copy operation in an older version of Kudu. Not having a cmeta
file results in those tombstoned replicas being unable to vote in Raft
leader elections. We remedy this by creating a cmeta object (with an
empty config) at startup time. The empty config is safe for a tombstoned
replica, because the config doesn't affect a replica's ability to vote
in a leader election. Additionally, if the tombstoned replica were ever
to be overwritten by a tablet copy operation, that would also result in
overwriting the config stored in the local cmeta with a valid Raft
config. Finally, all of this assumes that the nonexistence of a cmeta
file guarantees that the replica has never voted in a leader election.

As an optimization, the cmeta is created with the NO_FLUSH_ON_CREATE
flag, meaning that it will only be flushed to disk if the replica ever
votes.
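
The deferred-flush idea can be sketched with a toy model. Everything below
is hypothetical and only loosely mirrors the commit: ToyDisk stands in for
the filesystem, ToyCMeta for ConsensusMetadata, and Vote() for whatever
code path persists a vote. It is meant to show the intended behavior, not
Kudu's actual implementation.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the on-disk cmeta files: tablet id -> persisted term.
using ToyDisk = std::map<std::string, int64_t>;

enum class ToyCreateMode { FLUSH_ON_CREATE, NO_FLUSH_ON_CREATE };

class ToyCMeta {
 public:
  // Creates an in-memory object. Returns nullptr if a record for 'tablet_id'
  // is already on disk, so Create() never clobbers existing state.
  static std::unique_ptr<ToyCMeta> Create(ToyDisk* disk,
                                          const std::string& tablet_id,
                                          int64_t term,
                                          ToyCreateMode mode) {
    if (disk->count(tablet_id) > 0) return nullptr;
    std::unique_ptr<ToyCMeta> cmeta(new ToyCMeta(disk, tablet_id, term));
    // Only the eager mode touches the disk at creation time.
    if (mode == ToyCreateMode::FLUSH_ON_CREATE) cmeta->Flush();
    return cmeta;
  }

  // Voting changes durable state, so it always flushes.
  void Vote(int64_t new_term) {
    term_ = new_term;
    Flush();
  }

  // Writes the current term to the toy disk.
  void Flush() { (*disk_)[tablet_id_] = term_; }

 private:
  ToyCMeta(ToyDisk* disk, std::string tablet_id, int64_t term)
      : disk_(disk), tablet_id_(std::move(tablet_id)), term_(term) {}

  ToyDisk* disk_;
  std::string tablet_id_;
  int64_t term_;
};
```

In this model, an object created with NO_FLUSH_ON_CREATE leaves the toy
disk untouched until the first Vote() call, which matches the optimization
described above.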

The following changes had to be made to ConsensusMetadata and the
ConsensusMetadataManager to support the above functionality:

* Enabled deferred flush on Create() by defining a create mode called
  NO_FLUSH_ON_CREATE.
* Made some additional method arguments optional, for convenience.

The following tests have been added:

* A unit test for ConsensusMetadataManager::LoadOrCreate().
* A unit test for ConsensusMetadataCreateMode::NO_FLUSH_ON_CREATE.
* A test that crashes the target of a tablet copy after writing the
  superblock and before writing the cmeta file. The tablet server is
  restarted and the replica is expected to be able to vote while
  tombstoned.

Previously-written tests that verify ConsensusMetadata::Create() will
not clobber an existing file still pass, and an additional test was
added for unflushed cmeta instances.
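
The load-or-create and no-clobber semantics that these tests exercise can
be modeled roughly as follows. This is a simplified sketch under stated
assumptions: ToyRecord and ToyManager are hypothetical names, boolean
returns replace Status, and the record holds only a term rather than a
full Raft config.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified record; the real cmeta also holds a Raft config.
struct ToyRecord {
  int64_t term;
};

class ToyManager {
 public:
  // Returns false if no record exists for 'tablet_id'. 'out' is optional.
  bool Load(const std::string& tablet_id, ToyRecord* out = nullptr) const {
    auto it = records_.find(tablet_id);
    if (it == records_.end()) return false;
    if (out) *out = it->second;
    return true;
  }

  // Returns false if a record already exists; never overwrites it.
  bool Create(const std::string& tablet_id, int64_t initial_term,
              ToyRecord* out = nullptr) {
    auto result = records_.emplace(tablet_id, ToyRecord{initial_term});
    if (!result.second) return false;
    if (out) *out = result.first->second;
    return true;
  }

  // Loads the record if present; otherwise creates it. When the record
  // already exists, 'initial_term' is ignored.
  bool LoadOrCreate(const std::string& tablet_id, int64_t initial_term,
                    ToyRecord* out = nullptr) {
    if (Load(tablet_id, out)) return true;
    return Create(tablet_id, initial_term, out);
  }

 private:
  std::map<std::string, ToyRecord> records_;
};
```

A second LoadOrCreate() call with a different term returns the record made
by the first call, and a plain Create() on an existing id fails rather
than overwriting it.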

Change-Id: I8ff6255b1fcbb12417b82853bcde9b239291492b
Reviewed-on: http://gerrit.cloudera.org:8080/7988
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/2108767b
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/2108767b
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/2108767b

Branch: refs/heads/master
Commit: 2108767bf5331e0f3beccd56a987cb413cca380a
Parents: 98b308f
Author: Mike Percy <mp...@apache.org>
Authored: Tue Sep 5 15:50:00 2017 -0700
Committer: Mike Percy <mp...@apache.org>
Committed: Fri Sep 8 04:49:14 2017 +0000

----------------------------------------------------------------------
 src/kudu/consensus/consensus_meta-test.cc       |  47 ++++--
 src/kudu/consensus/consensus_meta.cc            |  21 ++-
 src/kudu/consensus/consensus_meta.h             |  19 ++-
 .../consensus/consensus_meta_manager-test.cc    |  68 ++++++--
 src/kudu/consensus/consensus_meta_manager.cc    |  29 +++-
 src/kudu/consensus/consensus_meta_manager.h     |  24 ++-
 .../consensus/raft_consensus_quorum-test.cc     |   3 +-
 src/kudu/integration-tests/CMakeLists.txt       |   1 +
 .../tombstoned_voting-itest.cc                  | 166 +++++++++++++++++++
 src/kudu/master/sys_catalog.cc                  |   3 +-
 src/kudu/tablet/tablet_bootstrap-test.cc        |   3 +-
 src/kudu/tablet/tablet_replica-test.cc          |   6 +-
 src/kudu/tserver/tablet_copy_client.cc          |  15 +-
 .../tserver/tablet_copy_source_session-test.cc  |   7 +-
 src/kudu/tserver/ts_tablet_manager.cc           |  25 ++-
 15 files changed, 370 insertions(+), 67 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/consensus/consensus_meta-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/consensus_meta-test.cc b/src/kudu/consensus/consensus_meta-test.cc
index 5598702..297dfa5 100644
--- a/src/kudu/consensus/consensus_meta-test.cc
+++ b/src/kudu/consensus/consensus_meta-test.cc
@@ -94,9 +94,8 @@ void ConsensusMetadataTest::AssertValuesEqual(const scoped_refptr<ConsensusMetad
 TEST_F(ConsensusMetadataTest, TestCreateLoad) {
   // Create the file.
   {
-    scoped_refptr<ConsensusMetadata> cmeta;
     ASSERT_OK(ConsensusMetadata::Create(&fs_manager_, kTabletId, fs_manager_.uuid(),
-                                        config_, kInitialTerm, &cmeta));
+                                        config_, kInitialTerm));
   }
 
   // Load the file.
@@ -105,23 +104,41 @@ TEST_F(ConsensusMetadataTest, TestCreateLoad) {
   ASSERT_VALUES_EQUAL(cmeta, kInvalidOpIdIndex, fs_manager_.uuid(), kInitialTerm);
 }
 
+// Test deferred creation.
+TEST_F(ConsensusMetadataTest, TestDeferredCreateLoad) {
+  // Create the cmeta object, but not the file.
+  scoped_refptr<ConsensusMetadata> writer;
+  ASSERT_OK(ConsensusMetadata::Create(&fs_manager_, kTabletId, fs_manager_.uuid(),
+                                      config_, kInitialTerm,
+                                      ConsensusMetadataCreateMode::NO_FLUSH_ON_CREATE,
+                                      &writer));
+
+  // Try to load the file: it should not be there.
+  scoped_refptr<ConsensusMetadata> reader;
+  Status s = ConsensusMetadata::Load(&fs_manager_, kTabletId, fs_manager_.uuid(), &reader);
+  ASSERT_TRUE(s.IsNotFound()) << s.ToString();
+
+  // Flush; now the file will be there.
+  ASSERT_OK(writer->Flush());
+  ASSERT_OK(ConsensusMetadata::Load(&fs_manager_, kTabletId, fs_manager_.uuid(), &reader));
+  ASSERT_VALUES_EQUAL(reader, kInvalidOpIdIndex, fs_manager_.uuid(), kInitialTerm);
+}
+
 // Ensure that Create() will not overwrite an existing file.
 TEST_F(ConsensusMetadataTest, TestCreateNoOverwrite) {
-  scoped_refptr<ConsensusMetadata> cmeta;
   // Create the consensus metadata file.
   ASSERT_OK(ConsensusMetadata::Create(&fs_manager_, kTabletId, fs_manager_.uuid(),
-                                      config_, kInitialTerm, &cmeta));
+                                      config_, kInitialTerm));
   // Try to create it again.
   Status s = ConsensusMetadata::Create(&fs_manager_, kTabletId, fs_manager_.uuid(),
-                                       config_, kInitialTerm, &cmeta);
+                                       config_, kInitialTerm);
   ASSERT_TRUE(s.IsAlreadyPresent()) << s.ToString();
   ASSERT_STR_MATCHES(s.ToString(), "Unable to write consensus meta file.*already exists");
 }
 
 // Ensure that we get an error when loading a file that doesn't exist.
 TEST_F(ConsensusMetadataTest, TestFailedLoad) {
-  scoped_refptr<ConsensusMetadata> cmeta;
-  Status s = ConsensusMetadata::Load(&fs_manager_, kTabletId, fs_manager_.uuid(), &cmeta);
+  Status s = ConsensusMetadata::Load(&fs_manager_, kTabletId, fs_manager_.uuid());
   ASSERT_TRUE(s.IsNotFound()) << "Unexpected status: " << s.ToString();
   LOG(INFO) << "Expected failure: " << s.ToString();
 }
@@ -131,7 +148,9 @@ TEST_F(ConsensusMetadataTest, TestFlush) {
   const int64_t kNewTerm = 4;
   scoped_refptr<ConsensusMetadata> cmeta;
   ASSERT_OK(ConsensusMetadata::Create(&fs_manager_, kTabletId, fs_manager_.uuid(),
-                                      config_, kInitialTerm, &cmeta));
+                                      config_, kInitialTerm,
+                                      ConsensusMetadataCreateMode::FLUSH_ON_CREATE,
+                                      &cmeta));
   cmeta->set_current_term(kNewTerm);
 
   // We are sort of "breaking the rules" by having multiple ConsensusMetadata
@@ -173,7 +192,9 @@ TEST_F(ConsensusMetadataTest, TestActiveRole) {
 
   scoped_refptr<ConsensusMetadata> cmeta;
   ASSERT_OK(ConsensusMetadata::Create(&fs_manager_, kTabletId, peer_uuid,
-                                      config1, kInitialTerm, &cmeta));
+                                      config1, kInitialTerm,
+                                      ConsensusMetadataCreateMode::FLUSH_ON_CREATE,
+                                      &cmeta));
 
   ASSERT_EQ(4, cmeta->CountVotersInConfig(COMMITTED_CONFIG));
   ASSERT_EQ(0, cmeta->GetConfigOpIdIndex(COMMITTED_CONFIG));
@@ -237,7 +258,9 @@ TEST_F(ConsensusMetadataTest, TestToConsensusStatePB) {
   committed_config.set_opid_index(1);
   scoped_refptr<ConsensusMetadata> cmeta;
   ASSERT_OK(ConsensusMetadata::Create(&fs_manager_, kTabletId, peer_uuid,
-                                      committed_config, kInitialTerm, &cmeta));
+                                      committed_config, kInitialTerm,
+                                      ConsensusMetadataCreateMode::FLUSH_ON_CREATE,
+                                      &cmeta));
 
   uuids.push_back(peer_uuid);
   RaftConfigPB pending_config = BuildConfig(uuids);
@@ -285,7 +308,9 @@ TEST_F(ConsensusMetadataTest, TestMergeCommittedConsensusStatePB) {
   committed_config.set_opid_index(1);
   scoped_refptr<ConsensusMetadata> cmeta;
   ASSERT_OK(ConsensusMetadata::Create(&fs_manager_, kTabletId, "e",
-                                      committed_config, 1, &cmeta));
+                                      committed_config, 1,
+                                      ConsensusMetadataCreateMode::FLUSH_ON_CREATE,
+                                      &cmeta));
 
   uuids.emplace_back("e");
   RaftConfigPB pending_config = BuildConfig(uuids);

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/consensus/consensus_meta.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/consensus_meta.cc b/src/kudu/consensus/consensus_meta.cc
index b2d53b0..697ebae 100644
--- a/src/kudu/consensus/consensus_meta.cc
+++ b/src/kudu/consensus/consensus_meta.cc
@@ -258,7 +258,7 @@ void ConsensusMetadata::MergeCommittedConsensusStatePB(const ConsensusStatePB& c
   clear_pending_config_unlocked();
 }
 
-Status ConsensusMetadata::Flush(FlushMode mode) {
+Status ConsensusMetadata::Flush(FlushMode flush_mode) {
   lock_guard<Mutex> l(lock_);
   MAYBE_FAULT(FLAGS_fault_crash_before_cmeta_flush);
   SCOPED_LOG_SLOW_EXECUTION_PREFIX(WARNING, 500, LogPrefix(), "flushing consensus metadata");
@@ -283,7 +283,7 @@ Status ConsensusMetadata::Flush(FlushMode mode) {
   string meta_file_path = fs_manager_->GetConsensusMetadataPath(tablet_id_);
   RETURN_NOT_OK_PREPEND(pb_util::WritePBContainerToPath(
       fs_manager_->env(), meta_file_path, pb_,
-      mode == OVERWRITE ? pb_util::OVERWRITE : pb_util::NO_OVERWRITE,
+      flush_mode == OVERWRITE ? pb_util::OVERWRITE : pb_util::NO_OVERWRITE,
       // We use FLAGS_log_force_fsync_all here because the consensus metadata is
       // essentially an extension of the primary durability mechanism of the
       // consensus subsystem: the WAL. Using the same flag ensures that the WAL
@@ -309,12 +309,23 @@ Status ConsensusMetadata::Create(FsManager* fs_manager,
                                  const std::string& peer_uuid,
                                  const RaftConfigPB& config,
                                  int64_t current_term,
+                                 ConsensusMetadataCreateMode create_mode,
                                  scoped_refptr<ConsensusMetadata>* cmeta_out) {
+
   scoped_refptr<ConsensusMetadata> cmeta(new ConsensusMetadata(fs_manager, tablet_id, peer_uuid));
   cmeta->set_committed_config(config);
   cmeta->set_current_term(current_term);
-  RETURN_NOT_OK(cmeta->Flush(NO_OVERWRITE)); // Create() should not clobber.
-  cmeta_out->swap(cmeta);
+
+  if (create_mode == ConsensusMetadataCreateMode::FLUSH_ON_CREATE) {
+    RETURN_NOT_OK(cmeta->Flush(NO_OVERWRITE)); // Create() should not clobber.
+  } else {
+    // Sanity check: ensure that there is no cmeta file currently on disk.
+    const string& path = fs_manager->GetConsensusMetadataPath(tablet_id);
+    if (fs_manager->env()->FileExists(path)) {
+      return Status::AlreadyPresent(Substitute("File $0 already exists", path));
+    }
+  }
+  if (cmeta_out) *cmeta_out = std::move(cmeta);
   return Status::OK();
 }
 
@@ -327,7 +338,7 @@ Status ConsensusMetadata::Load(FsManager* fs_manager,
                                                  fs_manager->GetConsensusMetadataPath(tablet_id),
                                                  &cmeta->pb_));
   cmeta->UpdateActiveRole(); // Needs to happen here as we sidestep the accessor APIs.
-  cmeta_out->swap(cmeta);
+  if (cmeta_out) *cmeta_out = std::move(cmeta);
   return Status::OK();
 }
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/consensus/consensus_meta.h
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/consensus_meta.h b/src/kudu/consensus/consensus_meta.h
index 3a37e05..8127269 100644
--- a/src/kudu/consensus/consensus_meta.h
+++ b/src/kudu/consensus/consensus_meta.h
@@ -37,6 +37,11 @@ namespace consensus {
 class ConsensusMetadataManager; // IWYU pragma: keep
 class ConsensusMetadataTest;    // IWYU pragma: keep
 
+enum class ConsensusMetadataCreateMode {
+  FLUSH_ON_CREATE,
+  NO_FLUSH_ON_CREATE,
+};
+
 // Provides methods to read, write, and persist consensus-related metadata.
 // This partly corresponds to Raft Figure 2's "Persistent state on all servers".
 //
@@ -144,7 +149,7 @@ class ConsensusMetadata : public RefCountedThreadSafe<ConsensusMetadata> {
   void MergeCommittedConsensusStatePB(const ConsensusStatePB& cstate);
 
   // Persist current state of the protobuf to disk.
-  Status Flush(FlushMode mode = OVERWRITE);
+  Status Flush(FlushMode flush_mode = OVERWRITE);
 
   int64_t flush_count_for_tests() const {
     return flush_count_for_tests_;
@@ -155,6 +160,7 @@ class ConsensusMetadata : public RefCountedThreadSafe<ConsensusMetadata> {
   friend class ConsensusMetadataManager;
 
   FRIEND_TEST(ConsensusMetadataTest, TestCreateLoad);
+  FRIEND_TEST(ConsensusMetadataTest, TestDeferredCreateLoad);
   FRIEND_TEST(ConsensusMetadataTest, TestCreateNoOverwrite);
   FRIEND_TEST(ConsensusMetadataTest, TestFailedLoad);
   FRIEND_TEST(ConsensusMetadataTest, TestFlush);
@@ -166,13 +172,18 @@ class ConsensusMetadata : public RefCountedThreadSafe<ConsensusMetadata> {
                     std::string peer_uuid);
 
   // Create a ConsensusMetadata object with provided initial state.
-  // Encoded PB is flushed to disk before returning.
+  // If 'create_mode' is set to FLUSH_ON_CREATE, the encoded PB is flushed to
+  // disk before returning. Otherwise, if 'create_mode' is set to
+  // NO_FLUSH_ON_CREATE, the caller must explicitly call Flush() on the
+  // returned object to get the bytes onto disk.
   static Status Create(FsManager* fs_manager,
                        const std::string& tablet_id,
                        const std::string& peer_uuid,
                        const RaftConfigPB& config,
                        int64_t current_term,
-                       scoped_refptr<ConsensusMetadata>* cmeta_out);
+                       ConsensusMetadataCreateMode create_mode =
+                           ConsensusMetadataCreateMode::FLUSH_ON_CREATE,
+                       scoped_refptr<ConsensusMetadata>* cmeta_out = nullptr);
 
   // Load a ConsensusMetadata object from disk.
   // Returns Status::NotFound if the file could not be found. May return other
@@ -180,7 +191,7 @@ class ConsensusMetadata : public RefCountedThreadSafe<ConsensusMetadata> {
   static Status Load(FsManager* fs_manager,
                      const std::string& tablet_id,
                      const std::string& peer_uuid,
-                     scoped_refptr<ConsensusMetadata>* cmeta_out);
+                     scoped_refptr<ConsensusMetadata>* cmeta_out = nullptr);
 
   // Delete the ConsensusMetadata file associated with the given tablet from
   // disk. Returns Status::NotFound if the on-disk data is not found.

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/consensus/consensus_meta_manager-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/consensus_meta_manager-test.cc b/src/kudu/consensus/consensus_meta_manager-test.cc
index f77c12c..4f84f53 100644
--- a/src/kudu/consensus/consensus_meta_manager-test.cc
+++ b/src/kudu/consensus/consensus_meta_manager-test.cc
@@ -74,7 +74,9 @@ TEST_F(ConsensusMetadataManagerTest, TestCreateLoad) {
   ASSERT_TRUE(s.IsNotFound()) << s.ToString();
 
   // Create a new ConsensusMetadata instance.
-  ASSERT_OK(cmeta_manager_->Create(kTabletId, config_, kInitialTerm, &cmeta));
+  ASSERT_OK(cmeta_manager_->Create(kTabletId, config_, kInitialTerm,
+                                   ConsensusMetadataCreateMode::FLUSH_ON_CREATE,
+                                   &cmeta));
 
   // Load it back.
   ASSERT_OK(cmeta_manager_->Load(kTabletId, &cmeta));
@@ -85,33 +87,73 @@ TEST_F(ConsensusMetadataManagerTest, TestCreateLoad) {
       << DiffRaftConfigs(config_, cmeta->CommittedConfig());
 }
 
+// Test the LoadOrCreate() API.
+TEST_F(ConsensusMetadataManagerTest, TestLoadOrCreate) {
+  // Initial Load() should fail due to non-existence.
+  Status s = cmeta_manager_->Load(kTabletId);
+  ASSERT_TRUE(s.IsNotFound()) << s.ToString();
+
+  {
+    // Create as needed (this call will perform the creation).
+    scoped_refptr<ConsensusMetadata> cmeta;
+    ASSERT_OK(cmeta_manager_->LoadOrCreate(kTabletId, config_, kInitialTerm,
+                                           ConsensusMetadataCreateMode::FLUSH_ON_CREATE,
+                                           &cmeta));
+    ASSERT_TRUE(cmeta); // Ensure that the create path returns a valid cmeta.
+  }
+
+  // Load (this should not need to perform the creation).
+  scoped_refptr<ConsensusMetadata> cmeta;
+  ASSERT_OK(cmeta_manager_->LoadOrCreate(kTabletId,
+                                         /*config=*/ RaftConfigPB(), // Empty config.
+                                         /*initial_term=*/ 123,      // Different term.
+                                         ConsensusMetadataCreateMode::FLUSH_ON_CREATE,
+                                         &cmeta));
+  ASSERT_TRUE(cmeta); // Ensure that the load path returns a valid cmeta.
+
+  // Ensure we got the results of what we requested to create in our first
+  // LoadOrCreate() call, above, not the second call.
+  ASSERT_EQ(kInitialTerm, cmeta->current_term());
+  ASSERT_TRUE(MessageDifferencer::Equals(config_, cmeta->CommittedConfig()))
+      << DiffRaftConfigs(config_, cmeta->CommittedConfig());
+}
+
 // Test Delete.
 TEST_F(ConsensusMetadataManagerTest, TestDelete) {
   // Create a ConsensusMetadata instance.
-  scoped_refptr<ConsensusMetadata> cmeta;
-  ASSERT_OK(cmeta_manager_->Create(kTabletId, config_, kInitialTerm, &cmeta));
+  ASSERT_OK(cmeta_manager_->Create(kTabletId, config_, kInitialTerm));
 
   // Now delete it.
   ASSERT_OK(cmeta_manager_->Delete(kTabletId));
 
   // Can't load it because it's gone.
-  Status s = cmeta_manager_->Load(kTabletId, &cmeta);
+  Status s = cmeta_manager_->Load(kTabletId);
   ASSERT_TRUE(s.IsNotFound()) << s.ToString();
 }
 
+// Test attempting to create multiple "unflushed" cmeta instances.
+TEST_F(ConsensusMetadataManagerTest, TestCreateMultipleUnFlushedCMetas) {
+  ASSERT_OK(cmeta_manager_->Create(kTabletId, config_, kInitialTerm,
+                                   ConsensusMetadataCreateMode::NO_FLUSH_ON_CREATE));
+  Status s = cmeta_manager_->Create(kTabletId, config_, kInitialTerm,
+                                    ConsensusMetadataCreateMode::NO_FLUSH_ON_CREATE);
+  ASSERT_TRUE(s.IsAlreadyPresent()) << s.ToString();
+  ASSERT_STR_CONTAINS(s.ToString(), "exists");
+}
+
 // Test that we can't clobber (overwrite) an existing cmeta.
 TEST_F(ConsensusMetadataManagerTest, TestNoClobber) {
   // Create a ConsensusMetadata instance.
-  {
-    scoped_refptr<ConsensusMetadata> cmeta;
-    ASSERT_OK(cmeta_manager_->Create(kTabletId, config_, kInitialTerm, &cmeta));
+  ASSERT_OK(cmeta_manager_->Create(kTabletId, config_, kInitialTerm));
+
+  // Creating it again should fail, both in FLUSH_ON_CREATE and
+  // NO_FLUSH_ON_CREATE modes.
+  for (auto create_mode : { ConsensusMetadataCreateMode::FLUSH_ON_CREATE,
+                            ConsensusMetadataCreateMode::NO_FLUSH_ON_CREATE }) {
+    Status s = cmeta_manager_->Create(kTabletId, config_, kInitialTerm, create_mode);
+    ASSERT_TRUE(s.IsAlreadyPresent()) << s.ToString();
+    ASSERT_STR_CONTAINS(s.ToString(), "already exists");
   }
-
-  // Creating it again should fail.
-  scoped_refptr<ConsensusMetadata> cmeta;
-  Status s = cmeta_manager_->Create(kTabletId, config_, kInitialTerm, &cmeta);
-  ASSERT_TRUE(s.IsAlreadyPresent()) << s.ToString();
-  ASSERT_STR_CONTAINS(s.ToString(), "already exists");
 }
 
 } // namespace consensus

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/consensus/consensus_meta_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/consensus_meta_manager.cc b/src/kudu/consensus/consensus_meta_manager.cc
index 683ebd1..1aaf6c4 100644
--- a/src/kudu/consensus/consensus_meta_manager.cc
+++ b/src/kudu/consensus/consensus_meta_manager.cc
@@ -40,30 +40,33 @@ ConsensusMetadataManager::ConsensusMetadataManager(FsManager* fs_manager)
 
 Status ConsensusMetadataManager::Create(const string& tablet_id,
                                         const RaftConfigPB& config,
-                                        int64_t current_term,
+                                        int64_t initial_term,
+                                        ConsensusMetadataCreateMode create_mode,
                                         scoped_refptr<ConsensusMetadata>* cmeta_out) {
   scoped_refptr<ConsensusMetadata> cmeta;
   RETURN_NOT_OK_PREPEND(ConsensusMetadata::Create(fs_manager_, tablet_id, fs_manager_->uuid(),
-                                                  config, current_term, &cmeta),
+                                                  config, initial_term, create_mode,
+                                                  &cmeta),
                         Substitute("Unable to create consensus metadata for tablet $0", tablet_id));
 
   lock_guard<Mutex> l(lock_);
-  InsertOrDie(&cmeta_cache_, tablet_id, cmeta);
+  if (!InsertIfNotPresent(&cmeta_cache_, tablet_id, cmeta)) {
+    return Status::AlreadyPresent(Substitute("ConsensusMetadata instance for $0 already exists",
+                                             tablet_id));
+  }
   if (cmeta_out) *cmeta_out = std::move(cmeta);
   return Status::OK();
 }
 
 Status ConsensusMetadataManager::Load(const string& tablet_id,
                                       scoped_refptr<ConsensusMetadata>* cmeta_out) {
-  DCHECK(cmeta_out);
-
   {
     lock_guard<Mutex> l(lock_);
 
     // Try to get the cmeta instance from cache first.
     scoped_refptr<ConsensusMetadata>* cached_cmeta = FindOrNull(cmeta_cache_, tablet_id);
     if (cached_cmeta) {
-      *cmeta_out = *cached_cmeta;
+      if (cmeta_out) *cmeta_out = *cached_cmeta;
       return Status::OK();
     }
   }
@@ -82,10 +85,22 @@ Status ConsensusMetadataManager::Load(const string& tablet_id,
     InsertOrDie(&cmeta_cache_, tablet_id, cmeta);
   }
 
-  *cmeta_out = std::move(cmeta);
+  if (cmeta_out) *cmeta_out = std::move(cmeta);
   return Status::OK();
 }
 
+Status ConsensusMetadataManager::LoadOrCreate(const string& tablet_id,
+                                              const RaftConfigPB& config,
+                                              int64_t initial_term,
+                                              ConsensusMetadataCreateMode create_mode,
+                                              scoped_refptr<ConsensusMetadata>* cmeta_out) {
+  Status s = Load(tablet_id, cmeta_out);
+  if (s.IsNotFound()) {
+    return Create(tablet_id, config, initial_term, create_mode, cmeta_out);
+  }
+  return s;
+}
+
 Status ConsensusMetadataManager::Delete(const string& tablet_id) {
   {
     lock_guard<Mutex> l(lock_);

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/consensus/consensus_meta_manager.h
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/consensus_meta_manager.h b/src/kudu/consensus/consensus_meta_manager.h
index 8f04722..2f28ee7 100644
--- a/src/kudu/consensus/consensus_meta_manager.h
+++ b/src/kudu/consensus/consensus_meta_manager.h
@@ -20,6 +20,7 @@
 #include <string>
 #include <unordered_map>
 
+#include "kudu/consensus/consensus_meta.h"
 #include "kudu/gutil/macros.h"
 #include "kudu/gutil/ref_counted.h"
 #include "kudu/util/mutex.h"
@@ -29,7 +30,6 @@ class FsManager;
 class Status;
 
 namespace consensus {
-class ConsensusMetadata;
 class RaftConfigPB;
 
 // API and implementation for a consensus metadata "manager" that controls
@@ -53,16 +53,30 @@ class ConsensusMetadataManager : public RefCountedThreadSafe<ConsensusMetadataMa
   // Returns an error if a ConsensusMetadata instance with that key already exists.
   Status Create(const std::string& tablet_id,
                 const RaftConfigPB& config,
-                int64_t current_term,
+                int64_t initial_term,
+                ConsensusMetadataCreateMode create_mode =
+                    ConsensusMetadataCreateMode::FLUSH_ON_CREATE,
                 scoped_refptr<ConsensusMetadata>* cmeta_out = nullptr);
 
   // Load the ConsensusMetadata instance keyed by 'tablet_id'.
-  // Returns an error if it cannot be found.
+  // Returns an error if it cannot be found, either in 'cmeta_cache_' or on
+  // disk.
   Status Load(const std::string& tablet_id,
-              scoped_refptr<ConsensusMetadata>* cmeta_out);
+              scoped_refptr<ConsensusMetadata>* cmeta_out = nullptr);
+
+  // Load the ConsensusMetadata instance keyed by 'tablet_id' if it exists,
+  // otherwise create it using the given parameters 'config' and
+  // 'initial_term'. If the instance already exists, those parameters are
+  // ignored.
+  Status LoadOrCreate(const std::string& tablet_id,
+                      const RaftConfigPB& config,
+                      int64_t initial_term,
+                      ConsensusMetadataCreateMode create_mode =
+                          ConsensusMetadataCreateMode::FLUSH_ON_CREATE,
+                      scoped_refptr<ConsensusMetadata>* cmeta_out = nullptr);
 
   // Permanently delete the ConsensusMetadata instance keyed by 'tablet_id'.
-  // Returns Status::NotFound if the instance cannot be found.
+  // Returns Status::NotFound if the instance does not exist on disk.
   // Returns another error if the cmeta instance exists but cannot be deleted
   // for some reason, perhaps due to a permissions or I/O-related issue.
   Status Delete(const std::string& tablet_id);

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/consensus/raft_consensus_quorum-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/raft_consensus_quorum-test.cc b/src/kudu/consensus/raft_consensus_quorum-test.cc
index a6ecf6f..1c7684d 100644
--- a/src/kudu/consensus/raft_consensus_quorum-test.cc
+++ b/src/kudu/consensus/raft_consensus_quorum-test.cc
@@ -188,8 +188,7 @@ class RaftConsensusQuorumTest : public KuduTest {
     CHECK_EQ(config_.peers_size(), cmeta_managers_.size());
     CHECK_EQ(config_.peers_size(), fs_managers_.size());
     for (int i = 0; i < config_.peers_size(); i++) {
-      scoped_refptr<ConsensusMetadata> cmeta;
-      RETURN_NOT_OK(cmeta_managers_[i]->Create(kTestTablet, config_, kMinimumTerm, &cmeta));
+      RETURN_NOT_OK(cmeta_managers_[i]->Create(kTestTablet, config_, kMinimumTerm));
 
       RaftPeerPB local_peer_pb;
       RETURN_NOT_OK(GetRaftConfigMember(config_, fs_managers_[i]->uuid(), &local_peer_pb));

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/integration-tests/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/CMakeLists.txt b/src/kudu/integration-tests/CMakeLists.txt
index 85023da..4326e74 100644
--- a/src/kudu/integration-tests/CMakeLists.txt
+++ b/src/kudu/integration-tests/CMakeLists.txt
@@ -97,6 +97,7 @@ ADD_KUDU_TEST(tablet_copy_client_session-itest)
 ADD_KUDU_TEST(tablet_history_gc-itest)
 ADD_KUDU_TEST(tablet_replacement-itest)
 ADD_KUDU_TEST(tombstoned_voting-imc-itest)
+ADD_KUDU_TEST(tombstoned_voting-itest)
 ADD_KUDU_TEST(tombstoned_voting-stress-test RUN_SERIAL true)
 ADD_KUDU_TEST(token_signer-itest RESOURCE_LOCK "master-rpc-ports")
 ADD_KUDU_TEST(ts_recovery-itest)

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/integration-tests/tombstoned_voting-itest.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/tombstoned_voting-itest.cc b/src/kudu/integration-tests/tombstoned_voting-itest.cc
new file mode 100644
index 0000000..815540d
--- /dev/null
+++ b/src/kudu/integration-tests/tombstoned_voting-itest.cc
@@ -0,0 +1,166 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cstdint>
+#include <memory>
+#include <set>
+#include <string>
+#include <unordered_map>
+
+#include <gtest/gtest.h>
+
+#include "kudu/consensus/consensus-test-util.h"
+#include "kudu/consensus/opid.pb.h"
+#include "kudu/consensus/opid_util.h"
+#include "kudu/gutil/map-util.h"
+#include "kudu/integration-tests/cluster_itest_util.h"
+#include "kudu/integration-tests/external_mini_cluster-itest-base.h"
+#include "kudu/integration-tests/external_mini_cluster.h"
+#include "kudu/integration-tests/external_mini_cluster_fs_inspector.h"
+#include "kudu/integration-tests/test_workload.h"
+#include "kudu/master/master.pb.h"
+#include "kudu/tablet/metadata.pb.h"
+#include "kudu/util/monotime.h"
+#include "kudu/util/status.h"
+#include "kudu/util/test_macros.h"
+#include "kudu/util/test_util.h"
+
+using kudu::consensus::MakeOpId;
+using kudu::itest::TServerDetails;
+using kudu::tablet::TABLET_DATA_COPYING;
+using kudu::tablet::TABLET_DATA_TOMBSTONED;
+using kudu::tablet::TabletSuperBlockPB;
+using std::set;
+using std::string;
+
+namespace kudu {
+
+class TombstonedVotingITest : public ExternalMiniClusterITestBase {
+};
+
+// Test that a replica that crashes during a first-time tablet copy after
+// persisting a superblock but before persisting a cmeta file will be
+// tombstoned and able to vote after the tablet server is restarted.
+// See KUDU-2123.
+TEST_F(TombstonedVotingITest, TestTombstonedReplicaWithoutCMetaCanVote) {
+  const MonoDelta kTimeout = MonoDelta::FromSeconds(30);
+
+  // Cause tablet copy operations to crash the target.
+  NO_FATALS(StartCluster({"--tablet_copy_fault_crash_before_write_cmeta=1.0"}, {},
+                         /*num_tablet_servers=*/ 4));
+  TestWorkload workload(cluster_.get());
+  workload.Setup();
+  // Load some data and ensure we have elected a leader.
+  workload.Start();
+  ASSERT_EVENTUALLY([&] {
+    ASSERT_GE(workload.batches_completed(), 10);
+  });
+  workload.StopAndJoin();
+
+  // Wait for all 3 replicas to come up and figure out where they landed.
+  ASSERT_OK(inspect_->WaitForReplicaCount(3));
+  master::GetTableLocationsResponsePB table_locations;
+  ASSERT_OK(itest::GetTableLocations(cluster_->master_proxy(), TestWorkload::kDefaultTableName,
+                                     kTimeout, &table_locations));
+  ASSERT_EQ(1, table_locations.tablet_locations_size());
+  string tablet_id = table_locations.tablet_locations(0).tablet_id();
+  set<string> replica_uuids;
+  for (const auto& replica : table_locations.tablet_locations(0).replicas()) {
+    replica_uuids.insert(replica.ts_info().permanent_uuid());
+  }
+
+  // Figure out which tablet server didn't get a replica. We will use it for
+  // testing.
+  string new_replica_uuid;
+  for (int i = 0; i < cluster_->num_tablet_servers(); i++) {
+    if (!ContainsKey(replica_uuids, cluster_->tablet_server(i)->uuid())) {
+      new_replica_uuid = cluster_->tablet_server(i)->uuid();
+      break;
+    }
+  }
+  ASSERT_FALSE(new_replica_uuid.empty());
+  auto new_replica_ts = ts_map_[new_replica_uuid];
+  auto new_replica_ets = cluster_->tablet_server_by_uuid(new_replica_uuid);
+
+  // Initiating a tablet copy operation will crash the target replica before
+  // it gets a chance to persist a cmeta file. But it will have a superblock.
+  ASSERT_EVENTUALLY([&] {
+    TServerDetails* leader_ts;
+    ASSERT_OK(FindTabletLeader(ts_map_, tablet_id, kTimeout, &leader_ts));
+    ExternalTabletServer* leader_ets = cluster_->tablet_server_by_uuid(leader_ts->uuid());
+    Status s = itest::StartTabletCopy(new_replica_ts, tablet_id, leader_ts->uuid(),
+                                      leader_ets->bound_rpc_hostport(),
+                                      /*caller_term=*/ 0, // Any term will do.
+                                      kTimeout);
+    ASSERT_FALSE(s.ok()) << s.ToString(); // Crashed.
+    ASSERT_OK(new_replica_ets->WaitForInjectedCrash(MonoDelta::FromSeconds(2)));
+  });
+
+  cluster_->Shutdown();
+
+  // Verify that there is a superblock but no cmeta.
+  int new_replica_idx = cluster_->tablet_server_index_by_uuid(new_replica_uuid);
+  ASSERT_OK(inspect_->CheckTabletDataStateOnTS(new_replica_idx, tablet_id,
+                                               { TABLET_DATA_COPYING }));
+  ASSERT_FALSE(inspect_->DoesConsensusMetaExistForTabletOnTS(new_replica_idx, tablet_id));
+
+  // Restart only the replica that was the target of the tablet copy. It should
+  // be able to vote.
+  ASSERT_OK(new_replica_ets->Restart());
+  ASSERT_OK(inspect_->WaitForTabletDataStateOnTS(new_replica_idx, tablet_id,
+                                                 { TABLET_DATA_TOMBSTONED },
+                                                 kTimeout));
+  TabletSuperBlockPB superblock_pb;
+  ASSERT_OK(inspect_->ReadTabletSuperBlockOnTS(new_replica_idx, tablet_id, &superblock_pb));
+
+  // The tombstoned replica should be using 1.0 as its last-logged OpId.
+  ASSERT_TRUE(superblock_pb.has_tombstone_last_logged_opid());
+  ASSERT_OPID_EQ(MakeOpId(1, 0), superblock_pb.tombstone_last_logged_opid());
+
+  const string kCandidateUuid = "X";
+  const int64_t kCandidateTerm = 2;
+
+  // We may need to retry due to waiting for RaftConsensus to initialize.
+  ASSERT_EVENTUALLY([&] {
+    // Initially, even when the replica is running, cmeta should not exist on disk.
+    ASSERT_FALSE(inspect_->DoesConsensusMetaExistForTabletOnTS(new_replica_idx, tablet_id));
+
+    // Should vote no to OpId 0.0 because it's smaller than 1.0.
+    Status s = itest::RequestVote(new_replica_ts, tablet_id, kCandidateUuid,
+                                  kCandidateTerm,
+                                  /*last_logged_opid=*/ MakeOpId(0, 0),
+                                  /*ignore_live_leader=*/ true,
+                                  /*is_pre_election=*/ true,
+                                  MonoDelta::FromSeconds(5));
+    ASSERT_FALSE(s.ok()) << s.ToString();
+    ASSERT_STR_MATCHES(s.ToString(), "Denying vote.*greater than that of the candidate");
+
+    // Should vote yes to 2.2 because it's larger than 1.0.
+    s = itest::RequestVote(new_replica_ts, tablet_id, kCandidateUuid,
+                           kCandidateTerm,
+                           /*last_logged_opid=*/ MakeOpId(2, 2),
+                           /*ignore_live_leader=*/ true,
+                           /*is_pre_election=*/ true,
+                           MonoDelta::FromSeconds(5));
+    ASSERT_TRUE(s.ok()) << s.ToString();
+
+    // After voting yes, cmeta should exist.
+    ASSERT_TRUE(inspect_->DoesConsensusMetaExistForTabletOnTS(new_replica_idx, tablet_id));
+  });
+}
+
+} // namespace kudu
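
Note on the test above: the two RequestVote calls rely on the rule that a
voter denies a candidate whose last-logged OpId is behind its own, which is
why the replica tombstoned at 1.0 denies 0.0 and grants 2.2. The block below
is a minimal standalone sketch of that comparison, assuming an OpId is
ordered by term first and index second. It is not Kudu's implementation;
SketchOpId, CandidateLogIsUpToDate and CheckSketch are hypothetical names
introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for an OpId: a (term, index) pair.
struct SketchOpId {
  int64_t term;
  int64_t index;
};

// Returns true if the candidate's last-logged OpId is at least as recent as
// the voter's, comparing term first and index second.
inline bool CandidateLogIsUpToDate(const SketchOpId& candidate,
                                   const SketchOpId& voter) {
  return std::tie(candidate.term, candidate.index) >=
         std::tie(voter.term, voter.index);
}

// Mirrors the two requests in the test: a voter whose last-logged OpId is
// 1.0 denies a candidate at 0.0 and grants a candidate at 2.2.
inline void CheckSketch() {
  const SketchOpId voter{1, 0};
  assert(!CandidateLogIsUpToDate({0, 0}, voter));
  assert(CandidateLogIsUpToDate({2, 2}, voter));
}
```

Calling CheckSketch() from a test or main() exercises both cases. A real
voter also checks the candidate's term and any vote already cast, which this
sketch leaves out.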

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/master/sys_catalog.cc
----------------------------------------------------------------------
diff --git a/src/kudu/master/sys_catalog.cc b/src/kudu/master/sys_catalog.cc
index eeb95e0..421e82e 100644
--- a/src/kudu/master/sys_catalog.cc
+++ b/src/kudu/master/sys_catalog.cc
@@ -256,8 +256,7 @@ Status SysCatalogTable::CreateNew(FsManager *fs_manager) {
   }
 
   string tablet_id = metadata->tablet_id();
-  scoped_refptr<ConsensusMetadata> cmeta;
-  RETURN_NOT_OK_PREPEND(cmeta_manager_->Create(tablet_id, config, consensus::kMinimumTerm, &cmeta),
+  RETURN_NOT_OK_PREPEND(cmeta_manager_->Create(tablet_id, config, consensus::kMinimumTerm),
                         "Unable to persist consensus metadata for tablet " + tablet_id);
 
   return SetupTablet(metadata);

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/tablet/tablet_bootstrap-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/tablet_bootstrap-test.cc b/src/kudu/tablet/tablet_bootstrap-test.cc
index 4af600a..e8131e0 100644
--- a/src/kudu/tablet/tablet_bootstrap-test.cc
+++ b/src/kudu/tablet/tablet_bootstrap-test.cc
@@ -142,8 +142,7 @@ class BootstrapTest : public LogTestBase {
     peer->set_permanent_uuid(meta->fs_manager()->uuid());
     peer->set_member_type(consensus::RaftPeerPB::VOTER);
 
-    scoped_refptr<ConsensusMetadata> cmeta;
-    RETURN_NOT_OK_PREPEND(cmeta_manager_->Create(meta->tablet_id(), config, kMinimumTerm, &cmeta),
+    RETURN_NOT_OK_PREPEND(cmeta_manager_->Create(meta->tablet_id(), config, kMinimumTerm),
                           "Unable to create consensus metadata");
 
     return Status::OK();

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/tablet/tablet_replica-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/tablet_replica-test.cc b/src/kudu/tablet/tablet_replica-test.cc
index df07384..a3c6b80 100644
--- a/src/kudu/tablet/tablet_replica-test.cc
+++ b/src/kudu/tablet/tablet_replica-test.cc
@@ -35,7 +35,6 @@
 #include "kudu/common/wire_protocol.h"
 #include "kudu/common/wire_protocol.pb.h"
 #include "kudu/consensus/consensus.pb.h"
-#include "kudu/consensus/consensus_meta.h"
 #include "kudu/consensus/consensus_meta_manager.h"
 #include "kudu/consensus/log.h"
 #include "kudu/consensus/log_anchor_registry.h"
@@ -83,7 +82,6 @@ namespace tablet {
 
 using consensus::CommitMsg;
 using consensus::ConsensusBootstrapInfo;
-using consensus::ConsensusMetadata;
 using consensus::ConsensusMetadataManager;
 using consensus::OpId;
 using consensus::RECEIVED_OPID;
@@ -137,9 +135,7 @@ class TabletReplicaTest : public KuduTabletTest {
     scoped_refptr<ConsensusMetadataManager> cmeta_manager(
         new ConsensusMetadataManager(tablet()->metadata()->fs_manager()));
 
-    scoped_refptr<ConsensusMetadata> cmeta;
-    ASSERT_OK(cmeta_manager->Create(tablet()->tablet_id(), config, consensus::kMinimumTerm,
-                                    &cmeta));
+    ASSERT_OK(cmeta_manager->Create(tablet()->tablet_id(), config, consensus::kMinimumTerm));
 
     // "Bootstrap" and start the TabletReplica.
     tablet_replica_.reset(

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/tserver/tablet_copy_client.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tserver/tablet_copy_client.cc b/src/kudu/tserver/tablet_copy_client.cc
index 3110d71..b8aa6a9 100644
--- a/src/kudu/tserver/tablet_copy_client.cc
+++ b/src/kudu/tserver/tablet_copy_client.cc
@@ -86,6 +86,13 @@ DEFINE_double(tablet_copy_fault_crash_on_fetch_all, 0.0,
 TAG_FLAG(tablet_copy_fault_crash_on_fetch_all, unsafe);
 TAG_FLAG(tablet_copy_fault_crash_on_fetch_all, runtime);
 
+DEFINE_double(tablet_copy_fault_crash_before_write_cmeta, 0.0,
+              "Fraction of the time that the server will crash before the "
+              "TabletCopyClient persists the ConsensusMetadata file. "
+              "(For testing only!)");
+TAG_FLAG(tablet_copy_fault_crash_before_write_cmeta, unsafe);
+TAG_FLAG(tablet_copy_fault_crash_before_write_cmeta, runtime);
+
 DECLARE_int32(tablet_copy_transfer_chunk_size_bytes);
 
 METRIC_DEFINE_counter(server, tablet_copy_bytes_fetched,
@@ -318,7 +325,7 @@ Status TabletCopyClient::Start(const HostPort& copy_source_addr,
     // HACK: Set the initial tombstoned last-logged OpId to 1.0 when copying a
     // replica for the first time, so that if the tablet copy fails, the
     // tombstoned replica will still be able to vote.
-    // TODO(mpercy): Give this particular OpId a name.
+    // TODO(KUDU-2122): Give this particular OpId a name.
     *superblock_->mutable_tombstone_last_logged_opid() = MakeOpId(1, 0);
     Partition partition;
     Partition::FromPB(superblock_->partition(), &partition);
@@ -592,13 +599,13 @@ Status TabletCopyClient::DownloadWAL(uint64_t wal_segment_seqno) {
 }
 
 Status TabletCopyClient::WriteConsensusMetadata() {
+  MAYBE_FAULT(FLAGS_tablet_copy_fault_crash_before_write_cmeta);
+
   // If we didn't find a previous consensus meta file, create one.
   if (!cmeta_) {
-    scoped_refptr<ConsensusMetadata> cmeta;
     return cmeta_manager_->Create(tablet_id_,
                                   remote_cstate_->committed_config(),
-                                  remote_cstate_->current_term(),
-                                  &cmeta);
+                                  remote_cstate_->current_term());
   }
 
   // Otherwise, update the consensus metadata to reflect the config and term

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/tserver/tablet_copy_source_session-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tserver/tablet_copy_source_session-test.cc b/src/kudu/tserver/tablet_copy_source_session-test.cc
index 9d73e9a..8526762 100644
--- a/src/kudu/tserver/tablet_copy_source_session-test.cc
+++ b/src/kudu/tserver/tablet_copy_source_session-test.cc
@@ -33,7 +33,6 @@
 #include "kudu/common/schema.h"
 #include "kudu/common/wire_protocol.h"
 #include "kudu/common/wire_protocol.pb.h"
-#include "kudu/consensus/consensus_meta.h"
 #include "kudu/consensus/consensus_meta_manager.h"
 #include "kudu/consensus/log.h"
 #include "kudu/consensus/log_anchor_registry.h"
@@ -88,10 +87,10 @@ class BlockIdPB;
 
 namespace tserver {
 
-using consensus::ConsensusMetadata;
 using consensus::ConsensusMetadataManager;
 using consensus::RaftConfigPB;
 using consensus::RaftPeerPB;
+using consensus::kMinimumTerm;
 using fs::ReadableBlock;
 using log::Log;
 using log::LogOptions;
@@ -153,9 +152,7 @@ class TabletCopyTest : public KuduTabletTest {
 
     scoped_refptr<ConsensusMetadataManager> cmeta_manager(
         new ConsensusMetadataManager(fs_manager()));
-    scoped_refptr<ConsensusMetadata> cmeta;
-    ASSERT_OK(cmeta_manager->Create(tablet()->tablet_id(),
-                                    config, consensus::kMinimumTerm, &cmeta));
+    ASSERT_OK(cmeta_manager->Create(tablet()->tablet_id(), config, kMinimumTerm));
 
     tablet_replica_.reset(
         new TabletReplica(tablet()->metadata(),

http://git-wip-us.apache.org/repos/asf/kudu/blob/2108767b/src/kudu/tserver/ts_tablet_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tserver/ts_tablet_manager.cc b/src/kudu/tserver/ts_tablet_manager.cc
index 8e23e61..d8d4bd4 100644
--- a/src/kudu/tserver/ts_tablet_manager.cc
+++ b/src/kudu/tserver/ts_tablet_manager.cc
@@ -124,6 +124,7 @@ class Tablet;
 }
 
 using consensus::ConsensusMetadata;
+using consensus::ConsensusMetadataCreateMode;
 using consensus::ConsensusMetadataManager;
 using consensus::OpId;
 using consensus::OpIdToString;
@@ -296,8 +297,7 @@ Status TSTabletManager::CreateNewTablet(const string& table_id,
 
   // We must persist the consensus metadata to disk before starting a new
   // tablet's TabletReplica and RaftConsensus implementation.
-  scoped_refptr<ConsensusMetadata> cmeta;
-  RETURN_NOT_OK_PREPEND(cmeta_manager_->Create(tablet_id, config, kMinimumTerm, &cmeta),
+  RETURN_NOT_OK_PREPEND(cmeta_manager_->Create(tablet_id, config, kMinimumTerm),
                         "Unable to create new ConsensusMetadata for tablet " + tablet_id);
   scoped_refptr<TabletReplica> new_replica;
   RETURN_NOT_OK(CreateAndRegisterTabletReplica(meta, NEW_REPLICA, &new_replica));
@@ -1073,6 +1073,27 @@ Status TSTabletManager::HandleNonReadyTabletOnStartup(const scoped_refptr<Tablet
     data_state = TABLET_DATA_TOMBSTONED;
   }
 
+  if (data_state == TABLET_DATA_TOMBSTONED) {
+    // It is possible for tombstoned replicas to legitimately not have a cmeta
+    // file as a result of crashing during a first tablet copy, or failing a
+    // tablet copy operation in an older version of Kudu. Not having a cmeta
+    // file results in those tombstoned replicas being unable to vote in Raft
+    // leader elections. We remedy this by creating a cmeta object (with an
+    // empty config) at startup time. The empty config is safe for a tombstoned
+    // replica, because the config doesn't affect a replica's ability to vote
+    // in a leader election. Additionally, if the tombstoned replica were ever
+    // to be overwritten by a tablet copy operation, that would also result in
+    // overwriting the config stored in the local cmeta with a valid Raft
+    // config. Finally, all of this assumes that the nonexistence of a cmeta
+    // file guarantees that the replica has never voted in a leader election.
+    //
+    // As an optimization, the cmeta is created with the NO_FLUSH_ON_CREATE
+    // flag, meaning that it will only be flushed to disk if the replica ever
+    // votes.
+    RETURN_NOT_OK(cmeta_manager_->LoadOrCreate(tablet_id, RaftConfigPB(), kMinimumTerm,
+                                               ConsensusMetadataCreateMode::NO_FLUSH_ON_CREATE));
+  }
+
   if (!skip_deletion) {
     // Passing no OpId will retain the last_logged_opid that was previously in the metadata.
     RETURN_NOT_OK(DeleteTabletData(meta, cmeta_manager_, data_state, boost::none));
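
Note on the hunk above: the comment describes creating the cmeta in memory at
startup and deferring the disk write until the replica actually votes. The
block below is a rough standalone sketch of that "create without flushing"
idea, under the assumption that a flush can be modeled as a boolean. It is
not Kudu's ConsensusMetadataManager; SketchCMeta, SketchCMetaManager and
CreateMode are hypothetical names, and only the NO_FLUSH_ON_CREATE spelling
is taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical create modes; only NO_FLUSH_ON_CREATE appears in the patch.
enum class CreateMode { FLUSH_ON_CREATE, NO_FLUSH_ON_CREATE };

// Hypothetical in-memory consensus metadata. "Flushing" is modeled as a flag
// rather than a real file write.
class SketchCMeta {
 public:
  explicit SketchCMeta(int64_t term) : term_(term) {}

  void Flush() { flushed_ = true; }

  // Recording a vote must be durable, so it always flushes.
  void RecordVote(const std::string& candidate_uuid) {
    assert(!candidate_uuid.empty());
    voted_for_ = candidate_uuid;
    Flush();
  }

  int64_t term() const { return term_; }
  const std::string& voted_for() const { return voted_for_; }
  bool flushed() const { return flushed_; }

 private:
  int64_t term_;
  std::string voted_for_;
  bool flushed_ = false;
};

// Hypothetical manager that returns the cached instance if one exists and
// otherwise creates one, flushing immediately only when asked to.
class SketchCMetaManager {
 public:
  std::shared_ptr<SketchCMeta> LoadOrCreate(const std::string& tablet_id,
                                            int64_t term, CreateMode mode) {
    auto it = cache_.find(tablet_id);
    if (it != cache_.end()) return it->second;
    auto cmeta = std::make_shared<SketchCMeta>(term);
    if (mode == CreateMode::FLUSH_ON_CREATE) cmeta->Flush();
    cache_.emplace(tablet_id, cmeta);
    return cmeta;
  }

 private:
  std::map<std::string, std::shared_ptr<SketchCMeta>> cache_;
};
```

With NO_FLUSH_ON_CREATE, a tombstoned replica that never votes leaves nothing
on disk, which matches what the new integration test checks before the first
granted vote.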