You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by to...@apache.org on 2017/05/19 00:32:46 UTC

[1/3] kudu git commit: [security-faults-itest] fixes on test flakiness

Repository: kudu
Updated Branches:
  refs/heads/master 733afeb82 -> 65c3aec9a


[security-faults-itest] fixes on test flakiness

Couple of updates to make the test less flaky and have more coverage:

  * Increased TTL of Kerberos ticket to 120 seconds: in case of inferior
    VMs the test sometimes could not make a couple of smoke test
    iterations in ~60 seconds.

  * Enabled newly introduced --rpc_reopen_outbound_connections flag
    to make servers authenticate the client upon every RPC call.

Test results prior to this fix with --stress_cpu_threads=8
(DEBUG build, 18 out of 1024 fail):
  http://dist-test.cloudera.org//job?job_id=aserbin.1494950598.5510

Test results after this fix --stress_cpu_threads=8
(DEBUG build,  0 out of 1024 fail):
  http://dist-test.cloudera.org//job?job_id=aserbin.1494948803.23142

Change-Id: I4f3ff1c65cd695a367e52804dae951e1b962452d
Reviewed-on: http://gerrit.cloudera.org:8080/6895
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/068767b8
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/068767b8
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/068767b8

Branch: refs/heads/master
Commit: 068767b815ee75f988a406bfe4541e9602e0d831
Parents: 733afeb
Author: Alexey Serbin <as...@cloudera.com>
Authored: Mon May 15 23:36:03 2017 -0700
Committer: Todd Lipcon <to...@apache.org>
Committed: Thu May 18 17:10:49 2017 +0000

----------------------------------------------------------------------
 src/kudu/integration-tests/security-faults-itest.cc | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/068767b8/src/kudu/integration-tests/security-faults-itest.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/security-faults-itest.cc b/src/kudu/integration-tests/security-faults-itest.cc
index f68f21d..93a9566 100644
--- a/src/kudu/integration-tests/security-faults-itest.cc
+++ b/src/kudu/integration-tests/security-faults-itest.cc
@@ -30,6 +30,7 @@
 #include "kudu/tablet/key_value_test_schema.h"
 #include "kudu/util/test_util.h"
 
+DECLARE_bool(rpc_reopen_outbound_connections);
 DECLARE_bool(rpc_trace_negotiation);
 
 using kudu::client::KuduClient;
@@ -52,10 +53,14 @@ namespace kudu {
 class SecurityComponentsFaultsITest : public KuduTest {
  public:
   SecurityComponentsFaultsITest()
-      : krb_lifetime_seconds_(64),
+      : krb_lifetime_seconds_(120),
         num_masters_(3),
         num_tservers_(3) {
 
+    // Reopen client-->server connections on every RPC. This is to make sure the
+    // servers authenticate the client on every RPC call.
+    FLAGS_rpc_reopen_outbound_connections = true;
+
     // Want to run with Kerberos enabled.
     cluster_opts_.enable_kerberos = true;
 


[3/3] kudu git commit: log: reduce segment size from 64MB to 8MB

Posted by to...@apache.org.
log: reduce segment size from 64MB to 8MB

Currently, we always retain a minimum of one segment of the WAL, even if
a tablet is cold and has not performed any writes in a long time. If a
server has hundreds or thousands of tablets, keeping a 64MB segment for
each tablet adds up to a lot of "wasted" disk space, especially if the
WAL is configured to an expensive disk such as an SSD.

In addition to wasting space, the current code always re-reads all live
WAL segments at startup. Solving this has been an open problem for quite
some time, but there are various subtleties (described in KUDU-38). So,
as a band-aid, reducing the size of the segment will linearly reduce the
amount of work during bootstrap of "cold" tablets.

In summary, the expectation is that, in a "cold" server, this will
reduce WAL disk space usage and startup time by approximately a factor
of 8.

I verified this by making these same changes in the configuration of a server
with ~6000 cold tablets. The disk usage of the WAL disk went from 381GB to
48GB, with a similar factor reduction in startup time.

Because the segment size is reduced by a factor of 8, the patch also
increases the max retention segment count by the same factor (from 10 to
80). This has one risk, in that we currently keep an open file
descriptor for every retained segment. However, in typical workloads,
the number of tablets with a high rate of active writes is not in the
hundreds or thousands, and thus the total number of file descriptors is
still likely to be manageable. Nevertheless, this patch adds a TODO that
we should consider a FileCache for these descriptors if we start to see
the usage be problematic.

So, if reducing to 8MB is good, why not go further, like 4MB or 1MB? The
assumption is that 8MB is still large enough to get good sequential IO
throughput on both read and write, and large enough to limit the FD
usage as described above. If we add an FD cache at some point, we could
consider reducing it further, particularly if running on SSDs where the
sequential throughput is less affected by size.

Although it's possible that a user had configured max_segments_to_retain
based on their own disk space requirements, the flag is marked
experimental, so I don't think we have to worry about compatibility in
that respect. We should consider changing the units here to be MB rather
than segment-count, so that the value is more robust.

Change-Id: Iadcda6b085e69cae5a15d54bb4c945d7605d5f98
Reviewed-on: http://gerrit.cloudera.org:8080/6911
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <mp...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/65c3aec9
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/65c3aec9
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/65c3aec9

Branch: refs/heads/master
Commit: 65c3aec9aeb7cb684cac98add6f48b737ae0de85
Parents: 1aa0ebb
Author: Todd Lipcon <to...@cloudera.com>
Authored: Wed May 17 12:23:01 2017 -0700
Committer: Todd Lipcon <to...@apache.org>
Committed: Fri May 19 00:32:28 2017 +0000

----------------------------------------------------------------------
 src/kudu/consensus/consensus_queue.cc           |  1 -
 src/kudu/consensus/log.cc                       |  4 +-
 src/kudu/consensus/log_util.cc                  |  4 +-
 .../integration-tests/raft_consensus-itest.cc   | 43 ++++++++++++++++++++
 4 files changed, 48 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/65c3aec9/src/kudu/consensus/consensus_queue.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/consensus_queue.cc b/src/kudu/consensus/consensus_queue.cc
index e752eb2..8fa6bef 100644
--- a/src/kudu/consensus/consensus_queue.cc
+++ b/src/kudu/consensus/consensus_queue.cc
@@ -445,7 +445,6 @@ Status PeerMessageQueue::RequestForPeer(const string& uuid,
       request->mutable_ops()->AddAllocated(msg->get());
     }
     msg_refs->swap(messages);
-    DCHECK_LE(request->ByteSize(), FLAGS_consensus_max_batch_size_bytes);
   }
 
   DCHECK(preceding_id.IsInitialized());

http://git-wip-us.apache.org/repos/asf/kudu/blob/65c3aec9/src/kudu/consensus/log.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/log.cc b/src/kudu/consensus/log.cc
index 23f80e2..f6dd641 100644
--- a/src/kudu/consensus/log.cc
+++ b/src/kudu/consensus/log.cc
@@ -63,7 +63,7 @@ DEFINE_int32(log_min_segments_to_retain, 1,
 TAG_FLAG(log_min_segments_to_retain, runtime);
 TAG_FLAG(log_min_segments_to_retain, advanced);
 
-DEFINE_int32(log_max_segments_to_retain, 10,
+DEFINE_int32(log_max_segments_to_retain, 80,
              "The maximum number of past log segments to keep at all times for "
              "the purposes of catching up other peers.");
 TAG_FLAG(log_max_segments_to_retain, runtime);
@@ -1051,6 +1051,8 @@ Status Log::SwitchToAllocatedSegment() {
   }
 
   // Open the segment we just created in readable form and add it to the reader.
+  // TODO(todd): consider using a global FileCache here? With short log segments and
+  // lots of tablets, this file descriptor usage may add up.
   unique_ptr<RandomAccessFile> readable_file;
 
   RandomAccessFileOptions opts;

http://git-wip-us.apache.org/repos/asf/kudu/blob/65c3aec9/src/kudu/consensus/log_util.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/log_util.cc b/src/kudu/consensus/log_util.cc
index 8ef177c..bf20545 100644
--- a/src/kudu/consensus/log_util.cc
+++ b/src/kudu/consensus/log_util.cc
@@ -45,8 +45,8 @@
 #include "kudu/util/pb_util.h"
 #include "kudu/util/scoped_cleanup.h"
 
-DEFINE_int32(log_segment_size_mb, 64,
-             "The default segment size for log roll-overs, in MB");
+DEFINE_int32(log_segment_size_mb, 8,
+             "The default size for log segments, in MB");
 TAG_FLAG(log_segment_size_mb, advanced);
 
 DEFINE_bool(log_force_fsync_all, false,

http://git-wip-us.apache.org/repos/asf/kudu/blob/65c3aec9/src/kudu/integration-tests/raft_consensus-itest.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/raft_consensus-itest.cc b/src/kudu/integration-tests/raft_consensus-itest.cc
index d461451..ae3ca79 100644
--- a/src/kudu/integration-tests/raft_consensus-itest.cc
+++ b/src/kudu/integration-tests/raft_consensus-itest.cc
@@ -2508,6 +2508,49 @@ TEST_F(RaftConsensusITest, TestSlowLeader) {
   SleepFor(MonoDelta::FromSeconds(60));
 }
 
+// Test write batches just below the maximum limit.
+TEST_F(RaftConsensusITest, TestLargeBatches) {
+  vector<string> ts_flags = {
+    // We write 128KB cells in this test, so bump the limit, and disable compression.
+    "--max_cell_size_bytes=1000000",
+    "--log_segment_size_mb=1",
+    "--log_compression_codec=none",
+    "--log_min_segments_to_retain=100", // disable GC of logs.
+  };
+  BuildAndStart(ts_flags);
+
+  const int64_t kBatchSize = 40; // Write 40 * 128kb = 5MB per batch.
+  const int64_t kNumBatchesToWrite = 100;
+  TestWorkload workload(cluster_.get());
+  workload.set_table_name(kTableId);
+  workload.set_payload_bytes(128 * 1024); // Write ops of size 128KB.
+  workload.set_write_batch_size(kBatchSize);
+  workload.set_num_write_threads(1);
+  workload.Setup();
+  workload.Start();
+  LOG(INFO) << "Waiting until we've written enough data...";
+  while (workload.rows_inserted() < kBatchSize * kNumBatchesToWrite) {
+    SleepFor(MonoDelta::FromMilliseconds(100));
+  }
+  workload.StopAndJoin();
+
+  // Verify replication.
+  ClusterVerifier v(cluster_.get());
+  NO_FATALS(v.CheckCluster());
+  NO_FATALS(v.CheckRowCount(workload.table_name(),
+                            ClusterVerifier::EXACTLY,
+                            workload.rows_inserted()));
+
+  int num_wals = inspect_->CountFilesInWALDirForTS(0, tablet_id_, "wal-*");
+  int num_batches = workload.rows_inserted() / kBatchSize;
+  // The number of WALs should be similar to 'num_batches'. We can't make
+  // an exact assertion because async preallocation may take a small amount
+  // of time, in which case it's possible to put more than one batch in a
+  // single WAL.
+  ASSERT_GE(num_wals, num_batches / 2);
+  ASSERT_LE(num_wals, num_batches + 2);
+}
+
 void RaftConsensusITest::InsertPayloadIgnoreErrors(int start_row, int num_rows, int payload_size) {
   shared_ptr<KuduTable> table;
   CHECK_OK(client_->OpenTable(kTableId, &table));


[2/3] kudu git commit: KUDU-2001 Refactor rowset size estimates

Posted by to...@apache.org.
KUDU-2001 Refactor rowset size estimates

Currently, Rowset::EstimateOnDiskSize() serves two purposes:
1. An estimate of the total size of the rowset, which is
exposed when rolled into the tablet's on-disk size metric.
2. An estimate of the benefit of compaction.
These two purposes conflicted-- the compaction size counts only
base data and redo deltas that are relevant for compaction, so
e.g. undo deltas are omitted from the estimate.

This patch separates these two purposes. EstimateOnDiskSize()
remains the method for purpose #1, while a new method
EstimateCompactionSize() is introduced for purpose #2.
EstimateOnDiskSize now includes undo deltas, and so is more
accurate than before (however, there's more work to do: see
KUDU-1755).

There should be no changes to compaction policy as a result of
this patch.

Change-Id: I59001adadb9a768a464e7b2cf0f0a5df0ef5393a
Reviewed-on: http://gerrit.cloudera.org:8080/6850
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/1aa0ebb9
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/1aa0ebb9
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/1aa0ebb9

Branch: refs/heads/master
Commit: 1aa0ebb91fc072b4cd3ed629721b8263a6ba1ffe
Parents: 068767b
Author: Will Berkeley <wd...@apache.org>
Authored: Wed May 10 17:32:42 2017 -0700
Committer: Will Berkeley <wd...@gmail.com>
Committed: Thu May 18 20:05:42 2017 +0000

----------------------------------------------------------------------
 src/kudu/tablet/compaction_policy-test.cc |  2 +-
 src/kudu/tablet/delta_tracker.cc          | 12 ++++++++
 src/kudu/tablet/delta_tracker.h           |  4 +++
 src/kudu/tablet/diskrowset-test.cc        | 40 ++++++++++++++++++++++++++
 src/kudu/tablet/diskrowset.cc             | 12 ++++++--
 src/kudu/tablet/diskrowset.h              | 10 ++++---
 src/kudu/tablet/memrowset.h               |  4 +++
 src/kudu/tablet/mock-rowsets.h            |  8 ++++++
 src/kudu/tablet/rowset.cc                 | 10 ++++++-
 src/kudu/tablet/rowset.h                  |  5 ++++
 src/kudu/tablet/rowset_info.cc            |  2 +-
 src/kudu/tablet/tablet.cc                 |  7 ++---
 12 files changed, 101 insertions(+), 15 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/1aa0ebb9/src/kudu/tablet/compaction_policy-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/compaction_policy-test.cc b/src/kudu/tablet/compaction_policy-test.cc
index 9d74c60..b34d1bd 100644
--- a/src/kudu/tablet/compaction_policy-test.cc
+++ b/src/kudu/tablet/compaction_policy-test.cc
@@ -107,7 +107,7 @@ TEST(TestCompactionPolicy, TestYcsbCompaction) {
     LOG(INFO) << "quality=" << quality;
     int total_size = 0;
     for (const auto* rs : picked) {
-      total_size += rs->EstimateOnDiskSize() / 1024 / 1024;
+      total_size += rs->EstimateCompactionSize() / 1024 / 1024;
     }
     ASSERT_LE(total_size, budget_mb);
     qualities.push_back(quality);

http://git-wip-us.apache.org/repos/asf/kudu/blob/1aa0ebb9/src/kudu/tablet/delta_tracker.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/delta_tracker.cc b/src/kudu/tablet/delta_tracker.cc
index c4e3f07..1996842 100644
--- a/src/kudu/tablet/delta_tracker.cc
+++ b/src/kudu/tablet/delta_tracker.cc
@@ -740,6 +740,18 @@ uint64_t DeltaTracker::EstimateOnDiskSize() const {
   for (const shared_ptr<DeltaStore>& ds : redo_delta_stores_) {
     size += ds->EstimateSize();
   }
+  for (const shared_ptr<DeltaStore>& ds : undo_delta_stores_) {
+    size += ds->EstimateSize();
+  }
+  return size;
+}
+
+uint64_t DeltaTracker::EstimateRedoDeltaOnDiskSize() const {
+  shared_lock<rw_spinlock> lock(component_lock_);
+  uint64_t size = 0;
+  for (const shared_ptr<DeltaStore>& ds : redo_delta_stores_) {
+    size += ds->EstimateSize();
+  }
   return size;
 }
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/1aa0ebb9/src/kudu/tablet/delta_tracker.h
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/delta_tracker.h b/src/kudu/tablet/delta_tracker.h
index 50b5505..a71b67c 100644
--- a/src/kudu/tablet/delta_tracker.h
+++ b/src/kudu/tablet/delta_tracker.h
@@ -216,8 +216,12 @@ class DeltaTracker {
   // Return the number of redo delta stores, not including the DeltaMemStore.
   size_t CountRedoDeltaStores() const;
 
+  // Estimate the number of bytes on disk of all delta blocks.
   uint64_t EstimateOnDiskSize() const;
 
+  // Estimate the number of bytes on disk of REDO deltas.
+  uint64_t EstimateRedoDeltaOnDiskSize() const;
+
   // Retrieves the list of column indexes that currently have updates.
   void GetColumnIdsWithUpdates(std::vector<ColumnId>* col_ids) const;
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/1aa0ebb9/src/kudu/tablet/diskrowset-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/diskrowset-test.cc b/src/kudu/tablet/diskrowset-test.cc
index 3b68307..f7a5ecc 100644
--- a/src/kudu/tablet/diskrowset-test.cc
+++ b/src/kudu/tablet/diskrowset-test.cc
@@ -569,5 +569,45 @@ TEST_F(TestRowSet, TestGCAncientStores) {
   ASSERT_EQ(0, dt->CountRedoDeltaStores());
 }
 
+TEST_F(TestRowSet, TestDiskSizeEstimation) {
+  // Force the files to be opened so the stats are read.
+  FLAGS_cfile_lazy_open = false;
+
+  // Write a rowset.
+  WriteTestRowSet();
+  shared_ptr<DiskRowSet> rs;
+  ASSERT_OK(OpenTestRowSet(&rs));
+
+  // Write a first delta file.
+  UpdateExistingRows(rs.get(), FLAGS_update_fraction, nullptr);
+  ASSERT_OK(rs->FlushDeltas());
+
+  // The rowset consists of base data and REDO deltas, so
+  // 1. the delta tracker's on-disk estimate should be the same as the on-disk estimate for REDOs.
+  // 2. the rowset's on-disk estimate and the sum of the base data and REDO estimates should equal.
+  ASSERT_EQ(rs->delta_tracker()->EstimateOnDiskSize(),
+            rs->delta_tracker()->EstimateRedoDeltaOnDiskSize());
+  ASSERT_EQ(rs->EstimateOnDiskSize(),
+            rs->EstimateBaseDataDiskSize() + rs->EstimateRedoDeltaDiskSize());
+
+  // Convert the REDO delta to an UNDO delta.
+  // REDO size should be zero, but there should be UNDOs, so the on-disk size of the rowset
+  // should be larger than the base data.
+  ASSERT_OK(rs->MajorCompactDeltaStores(HistoryGcOpts::Disabled()));
+  ASSERT_EQ(0, rs->EstimateRedoDeltaDiskSize());
+  ASSERT_GT(rs->EstimateOnDiskSize(), rs->EstimateBaseDataDiskSize());
+
+  // Write a second delta file.
+  UpdateExistingRows(rs.get(), FLAGS_update_fraction, nullptr);
+  ASSERT_OK(rs->FlushDeltas());
+
+  // There's base data, REDOs, and UNDOs, so the delta tracker and rowset's sizes should be larger
+  // than estimates counting only base data and REDOs.
+  ASSERT_GT(rs->delta_tracker()->EstimateOnDiskSize(),
+            rs->delta_tracker()->EstimateRedoDeltaOnDiskSize());
+  ASSERT_GT(rs->EstimateOnDiskSize(),
+            rs->EstimateBaseDataDiskSize() + rs->EstimateRedoDeltaDiskSize());
+}
+
 } // namespace tablet
 } // namespace kudu

http://git-wip-us.apache.org/repos/asf/kudu/blob/1aa0ebb9/src/kudu/tablet/diskrowset.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/diskrowset.cc b/src/kudu/tablet/diskrowset.cc
index 0b40684..7f7329e 100644
--- a/src/kudu/tablet/diskrowset.cc
+++ b/src/kudu/tablet/diskrowset.cc
@@ -680,10 +680,10 @@ uint64_t DiskRowSet::EstimateBaseDataDiskSize() const {
   return base_data_->EstimateOnDiskSize();
 }
 
-uint64_t DiskRowSet::EstimateDeltaDiskSize() const {
+uint64_t DiskRowSet::EstimateRedoDeltaDiskSize() const {
   DCHECK(open_);
   shared_lock<rw_spinlock> l(component_lock_);
-  return delta_tracker_->EstimateOnDiskSize();
+  return delta_tracker_->EstimateRedoDeltaOnDiskSize();
 }
 
 uint64_t DiskRowSet::EstimateOnDiskSize() const {
@@ -692,6 +692,12 @@ uint64_t DiskRowSet::EstimateOnDiskSize() const {
   return base_data_->EstimateOnDiskSize() + delta_tracker_->EstimateOnDiskSize();
 }
 
+uint64_t DiskRowSet::EstimateCompactionSize() const {
+  DCHECK(open_);
+  shared_lock<rw_spinlock> l(component_lock_);
+  return base_data_->EstimateOnDiskSize() + delta_tracker_->EstimateRedoDeltaOnDiskSize();
+}
+
 size_t DiskRowSet::DeltaMemStoreSize() const {
   DCHECK(open_);
   return delta_tracker_->DeltaMemStoreSize();
@@ -739,7 +745,7 @@ double DiskRowSet::DeltaStoresCompactionPerfImprovementScore(DeltaCompactionType
     delta_tracker_->GetColumnIdsWithUpdates(&col_ids_with_updates);
     // If we have files but no updates, we don't want to major compact.
     if (!col_ids_with_updates.empty()) {
-      double ratio = static_cast<double>(EstimateDeltaDiskSize()) / base_data_size;
+      double ratio = static_cast<double>(EstimateRedoDeltaDiskSize()) / base_data_size;
       if (ratio >= FLAGS_tablet_delta_store_major_compact_min_ratio) {
         perf_improv = ratio;
       }

http://git-wip-us.apache.org/repos/asf/kudu/blob/1aa0ebb9/src/kudu/tablet/diskrowset.h
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/diskrowset.h b/src/kudu/tablet/diskrowset.h
index be14c7d..ad722b7 100644
--- a/src/kudu/tablet/diskrowset.h
+++ b/src/kudu/tablet/diskrowset.h
@@ -331,13 +331,15 @@ class DiskRowSet : public RowSet {
   // Estimate the number of bytes on-disk for the base data.
   uint64_t EstimateBaseDataDiskSize() const;
 
-  // Estimate the number of bytes on-disk for the delta stores.
-  uint64_t EstimateDeltaDiskSize() const;
+  // Estimate the number of bytes on-disk of REDO deltas.
+  uint64_t EstimateRedoDeltaDiskSize() const;
 
-  // Estimate the total number of bytes on-disk, excluding the bloom files and the ad hoc index.
-  // TODO Offer a version that has the real total disk space usage.
+  // Estimate the total number of bytes on-disk. Excludes the bloom files and the ad hoc index.
+  // TODO(wdberkeley) Offer a version that has the real total disk space usage. See KUDU-1755.
   uint64_t EstimateOnDiskSize() const OVERRIDE;
 
+  uint64_t EstimateCompactionSize() const OVERRIDE;
+
   size_t DeltaMemStoreSize() const OVERRIDE;
 
   bool DeltaMemStoreEmpty() const OVERRIDE;

http://git-wip-us.apache.org/repos/asf/kudu/blob/1aa0ebb9/src/kudu/tablet/memrowset.h
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/memrowset.h b/src/kudu/tablet/memrowset.h
index 5e180c5..8027e30 100644
--- a/src/kudu/tablet/memrowset.h
+++ b/src/kudu/tablet/memrowset.h
@@ -237,6 +237,10 @@ class MemRowSet : public RowSet,
     return 0;
   }
 
+  uint64_t EstimateCompactionSize() const OVERRIDE {
+    return 0;
+  }
+
   std::mutex *compact_flush_lock() OVERRIDE {
     return &compact_flush_lock_;
   }

http://git-wip-us.apache.org/repos/asf/kudu/blob/1aa0ebb9/src/kudu/tablet/mock-rowsets.h
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/mock-rowsets.h b/src/kudu/tablet/mock-rowsets.h
index c93f985..4201508 100644
--- a/src/kudu/tablet/mock-rowsets.h
+++ b/src/kudu/tablet/mock-rowsets.h
@@ -79,6 +79,10 @@ class MockRowSet : public RowSet {
     LOG(FATAL) << "Unimplemented";
     return 0;
   }
+  virtual uint64_t EstimateCompactionSize() const OVERRIDE {
+    LOG(FATAL) << "Unimplemented";
+    return 0;
+  }
   virtual std::mutex *compact_flush_lock() OVERRIDE {
     LOG(FATAL) << "Unimplemented";
     return NULL;
@@ -164,6 +168,10 @@ class MockDiskRowSet : public MockRowSet {
     return size_;
   }
 
+  virtual uint64_t EstimateCompactionSize() const OVERRIDE {
+    return size_;
+  }
+
   virtual std::string ToString() const OVERRIDE {
     return strings::Substitute("mock[$0, $1]",
                                Slice(first_key_).ToDebugString(),

http://git-wip-us.apache.org/repos/asf/kudu/blob/1aa0ebb9/src/kudu/tablet/rowset.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/rowset.cc b/src/kudu/tablet/rowset.cc
index 06c7b2b..52681ce 100644
--- a/src/kudu/tablet/rowset.cc
+++ b/src/kudu/tablet/rowset.cc
@@ -208,11 +208,19 @@ Status DuplicatingRowSet::GetBounds(string* min_encoded_key,
 }
 
 uint64_t DuplicatingRowSet::EstimateOnDiskSize() const {
+  uint64_t size = 0;
+  for (const shared_ptr<RowSet> &rs : new_rowsets_) {
+    size += rs->EstimateOnDiskSize();
+  }
+  return size;
+}
+
+uint64_t DuplicatingRowSet::EstimateCompactionSize() const {
   // The actual value of this doesn't matter, since it won't be selected
   // for compaction.
   uint64_t size = 0;
   for (const shared_ptr<RowSet> &rs : new_rowsets_) {
-    size += rs->EstimateOnDiskSize();
+    size += rs->EstimateCompactionSize();
   }
   return size;
 }

http://git-wip-us.apache.org/repos/asf/kudu/blob/1aa0ebb9/src/kudu/tablet/rowset.h
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/rowset.h b/src/kudu/tablet/rowset.h
index cd6dcaf..46bdcb7 100644
--- a/src/kudu/tablet/rowset.h
+++ b/src/kudu/tablet/rowset.h
@@ -117,6 +117,9 @@ class RowSet {
   // Estimate the number of bytes on-disk
   virtual uint64_t EstimateOnDiskSize() const = 0;
 
+  // Estimate the number of bytes relevant for compaction.
+  virtual uint64_t EstimateCompactionSize() const = 0;
+
   // Return the lock used for including this DiskRowSet in a compaction.
   // This prevents multiple compactions and flushes from trying to include
   // the same rowset.
@@ -328,6 +331,8 @@ class DuplicatingRowSet : public RowSet {
 
   uint64_t EstimateOnDiskSize() const OVERRIDE;
 
+  uint64_t EstimateCompactionSize() const OVERRIDE;
+
   string ToString() const OVERRIDE;
 
   virtual Status DebugDump(vector<string> *lines = NULL) OVERRIDE;

http://git-wip-us.apache.org/repos/asf/kudu/blob/1aa0ebb9/src/kudu/tablet/rowset_info.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/rowset_info.cc b/src/kudu/tablet/rowset_info.cc
index f414dea..0fa88d5 100644
--- a/src/kudu/tablet/rowset_info.cc
+++ b/src/kudu/tablet/rowset_info.cc
@@ -254,7 +254,7 @@ void RowSetInfo::CollectOrdered(const RowSetTree& tree,
 
 RowSetInfo::RowSetInfo(RowSet* rs, double init_cdf)
   : rowset_(rs),
-    size_bytes_(rs->EstimateOnDiskSize()),
+    size_bytes_(rs->EstimateCompactionSize()),
     size_mb_(std::max(implicit_cast<int>(size_bytes_ / 1024 / 1024), kMinSizeMb)),
     cdf_min_key_(init_cdf),
     cdf_max_key_(init_cdf) {

http://git-wip-us.apache.org/repos/asf/kudu/blob/1aa0ebb9/src/kudu/tablet/tablet.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/tablet.cc b/src/kudu/tablet/tablet.cc
index f5c1fb1..5503dc3 100644
--- a/src/kudu/tablet/tablet.cc
+++ b/src/kudu/tablet/tablet.cc
@@ -146,12 +146,9 @@ METRIC_DEFINE_gauge_size(tablet, memrowset_size, "MemRowSet Memory Usage",
                          "Size of this tablet's memrowset");
 METRIC_DEFINE_gauge_size(tablet, on_disk_size, "Tablet Size On Disk",
                          kudu::MetricUnit::kBytes,
-                         "Size of this tablet on disk.");
+                         "Space used by this tablet's data blocks.");
 
-using base::subtle::Barrier_AtomicIncrement;
 using kudu::MaintenanceManager;
-using kudu::consensus::OpId;
-using kudu::consensus::MaximumOpId;
 using kudu::log::LogAnchorRegistry;
 using kudu::server::HybridClock;
 using std::shared_ptr;
@@ -1470,7 +1467,7 @@ Status Tablet::DoMergeCompactionOrFlush(const RowSetsInCompaction &input,
   if (input.num_rowsets() > 1) {
     MAYBE_FAULT(FLAGS_fault_crash_before_flush_tablet_meta_after_compaction);
   } else if (input.num_rowsets() == 1 &&
-             input.rowsets()[0]->EstimateOnDiskSize() == 0) {
+             input.rowsets()[0]->EstimateCompactionSize() == 0) {
     MAYBE_FAULT(FLAGS_fault_crash_before_flush_tablet_meta_after_flush_mrs);
   }