Posted to commits@impala.apache.org by ta...@apache.org on 2016/12/15 23:01:13 UTC

[47/50] [abbrv] incubator-impala git commit: IMPALA-4633: Change broken gflag default for Kudu client mem

IMPALA-4633: Change broken gflag default for Kudu client mem

We discovered that the current Kudu client defaults in the
KuduTableSink cause a large number of queries to time out
and fail. The current default value of the 'mutation buffer
size' is 100MB, which results in higher write throughput
than Kudu can currently handle on large clusters. Decreasing
the value of this flag sends more RPCs for the same amount
of data, which throttles the load on Kudu. On 200 nodes,
most queries could not complete with the previous 100MB
value because of timeouts; the same queries did not time out
with a 10MB buffer size. The 10MB value appears to work well
on 9-node stress tests as well.
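
To make the effect of the new default concrete, the buffer arithmetic from
the KuduTableSink::Open() hunk in this commit can be sketched in isolation.
This is only an illustration, not the Impala code: PlanBuffers, BufferPlan
and kIndividualBufferSize are hypothetical names, and the 7MB value is
assumed from the source comment because the definition of
INDIVIDUAL_BUFFER_SIZE is not shown in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 7MB per buffer, taken from the comment in
// KuduTableSink::Open(); the real INDIVIDUAL_BUFFER_SIZE definition is not
// part of this diff.
constexpr int64_t kIndividualBufferSize = 7LL * 1024 * 1024;

// Hypothetical holder for the two derived values.
struct BufferPlan {
  int num_buffers;         // how many 7MB buffers fit in the total space
  double flush_watermark;  // fraction of the total space that triggers a flush
};

// Mirrors the integer division and the "at least one buffer" clamp from the
// hunk: num_buffers = total / individual, watermark = 1.0 / num_buffers.
inline BufferPlan PlanBuffers(int64_t total_buffer_bytes) {
  int num_buffers = static_cast<int>(total_buffer_bytes / kIndividualBufferSize);
  if (num_buffers == 0) num_buffers = 1;
  return BufferPlan{num_buffers, 1.0 / num_buffers};
}
```

Under these assumptions, PlanBuffers(100 * 1024 * 1024) gives 14 buffers and
a watermark of 1.0 / 14, while PlanBuffers(10 * 1024 * 1024) gives a single
buffer and a watermark of 1.0, because 10MB / 7MB truncates to 1.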

Change-Id: I0b3544f9a93c82e347f6e97540d6b561c30d09fd
Reviewed-on: http://gerrit.cloudera.org:8080/5503
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Internal Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/4fa9270e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/4fa9270e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/4fa9270e

Branch: refs/heads/hadoop-next
Commit: 4fa9270e647b9c097295dcc13d97136cca3139ad
Parents: 6c5f8e3
Author: Matthew Jacobs <mj...@cloudera.com>
Authored: Thu Dec 8 20:32:45 2016 -0800
Committer: Internal Jenkins <cl...@gerrit.cloudera.org>
Committed: Thu Dec 15 04:39:22 2016 +0000

----------------------------------------------------------------------
 be/src/exec/kudu-table-sink.cc | 5 +++--
 be/src/exec/kudu-table-sink.h  | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/4fa9270e/be/src/exec/kudu-table-sink.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/kudu-table-sink.cc b/be/src/exec/kudu-table-sink.cc
index 20bbe69..699f00a 100644
--- a/be/src/exec/kudu-table-sink.cc
+++ b/be/src/exec/kudu-table-sink.cc
@@ -31,7 +31,7 @@
 
 #include "common/names.h"
 
-#define DEFAULT_KUDU_MUTATION_BUFFER_SIZE 100 * 1024 * 1024
+#define DEFAULT_KUDU_MUTATION_BUFFER_SIZE 10 * 1024 * 1024
 
 DEFINE_int32(kudu_mutation_buffer_size, DEFAULT_KUDU_MUTATION_BUFFER_SIZE,
     "The size (bytes) of the Kudu client buffer for mutations.");
@@ -173,10 +173,11 @@ Status KuduTableSink::Open(RuntimeState* state) {
   // Internally, the Kudu client keeps one or more buffers for writing operations. When a
   // single buffer is flushed, it is locked (that space cannot be reused) until all
   // operations within it complete, so it is important to have a number of buffers. In
-  // our testing, we found that allowing a total of 100MB of buffer space to provide good
+  // our testing, we found that allowing a total of 10MB of buffer space to provide good
   // results; this is the default.  Then, because of some existing 8MB limits in Kudu, we
   // want to have that total space broken up into 7MB buffers (INDIVIDUAL_BUFFER_SIZE).
   // The mutation flush watermark is set to flush every INDIVIDUAL_BUFFER_SIZE.
+  // TODO: simplify/remove this logic when Kudu simplifies the API (KUDU-1808).
   int num_buffers = FLAGS_kudu_mutation_buffer_size / INDIVIDUAL_BUFFER_SIZE;
   if (num_buffers == 0) num_buffers = 1;
   KUDU_RETURN_IF_ERROR(session_->SetMutationBufferFlushWatermark(1.0 / num_buffers),

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/4fa9270e/be/src/exec/kudu-table-sink.h
----------------------------------------------------------------------
diff --git a/be/src/exec/kudu-table-sink.h b/be/src/exec/kudu-table-sink.h
index 2f539bc..3dd831f 100644
--- a/be/src/exec/kudu-table-sink.h
+++ b/be/src/exec/kudu-table-sink.h
@@ -37,10 +37,10 @@ namespace impala {
 /// requires specifying a mutation buffer size and a buffer flush watermark percentage in
 /// the Kudu client. The mutation buffer needs to be large enough to buffer rows sent to
 /// all destination nodes because the buffer accounting is not specified per-tablet
-/// server (KUDU-1693). Tests showed that 100MB was a good default, and this is
+/// server (KUDU-1693). Tests showed that 10MB was a good default, and this is
 /// configurable via the gflag --kudu_mutation_buffer_size. The buffer flush watermark
 /// percentage is set to a value that results in Kudu flushing after 7MB is in a
-/// buffer for a particular destination (of the 100MB of the total mutation buffer space)
+/// buffer for a particular destination (of the 10MB of the total mutation buffer space)
 /// because Kudu currently has some 8MB buffer limits.
 ///
 /// Kudu doesn't have transactions yet, so some rows may fail to write while others are