Posted to commits@kudu.apache.org by to...@apache.org on 2017/06/12 18:17:04 UTC

kudu git commit: KUDU-2037 fix flake in ts_recovery-itest

Repository: kudu
Updated Branches:
  refs/heads/master 65c1edaf0 -> 681f05b43


KUDU-2037 fix flake in ts_recovery-itest

Fixed a flake in the TsRecoveryITest.TestRestartWithOrphanedReplicates
scenario.  The write operation timeout was set to 100 ms, and for an
ASAN/TSAN build that was below the reasonable minimum needed to
successfully complete the majority of write operations.

The issue of bloating the client- and the master-side queue with
GetTableLocations() requests will be addressed separately, with a
new integration test to cover the specific issue (see below).

Prior to the KUDU-1034 fix, the client kept retrying the operation
against the same tablet server again and again, without invalidating
the entry in its meta-cache.

After the KUDU-1034 fix, the client started marking the tserver as failed
and switching to another one, calling GetTableLocations() after every
failure since that was the only available tablet server. In the test
scenario, the master was not responding fast enough to sustain the rate
of adding new entries into the client- and the master-side queues,
so eventually the client timed out on the GetTableLocations() calls.
As a result, the expected tablet crash did not happen because there
were too few write operations to trigger the crash of the tablet server.

Having a short write timeout is not essential for the test.  Bumping
the write operation timeout from 100 to 1000 ms allows the majority
of write operations to succeed even in TSAN/ASAN builds and avoids
needless retries on the client side.
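
To illustrate the reasoning about the timeout, here is a minimal
sketch that is not Kudu code.  CountWritesWithinTimeout() and
ExampleSlowBuildLatenciesMs() are hypothetical helpers, and the
latencies are made-up numbers standing in for a slow sanitizer build,
not measurements from the test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Kudu): counts how many simulated
// writes complete within the given per-operation timeout.
std::size_t CountWritesWithinTimeout(const std::vector<int>& latencies_ms,
                                     int timeout_ms) {
  assert(timeout_ms > 0);  // a non-positive timeout makes no sense here
  std::size_t completed = 0;
  for (int latency_ms : latencies_ms) {
    if (latency_ms <= timeout_ms) {
      ++completed;
    }
  }
  return completed;
}

// Assumed, illustrative write latencies (in ms) for a slow build.
// These are invented values, not data from the actual test runs.
std::vector<int> ExampleSlowBuildLatenciesMs() {
  return {80, 150, 240, 310, 420, 650, 900, 1200};
}
```

With these assumed latencies, a 100 ms timeout lets only a small
fraction of the writes finish, while a 1000 ms timeout lets most of
them through, which is the effect the change relies on.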

Change-Id: I6c5449dc9b47062ea9389b25a1b9d906d9de64d9
Reviewed-on: http://gerrit.cloudera.org:8080/7138
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Todd Lipcon <to...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/681f05b4
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/681f05b4
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/681f05b4

Branch: refs/heads/master
Commit: 681f05b431a6fe62370feb439dd0756d9eefe07d
Parents: 65c1eda
Author: Alexey Serbin <as...@cloudera.com>
Authored: Thu Jun 8 20:13:14 2017 -0700
Committer: Todd Lipcon <to...@apache.org>
Committed: Sat Jun 10 01:18:20 2017 +0000

----------------------------------------------------------------------
 src/kudu/integration-tests/ts_recovery-itest.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/681f05b4/src/kudu/integration-tests/ts_recovery-itest.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/ts_recovery-itest.cc b/src/kudu/integration-tests/ts_recovery-itest.cc
index e996ad1..9391717 100644
--- a/src/kudu/integration-tests/ts_recovery-itest.cc
+++ b/src/kudu/integration-tests/ts_recovery-itest.cc
@@ -86,7 +86,7 @@ TEST_F(TsRecoveryITest, TestRestartWithOrphanedReplicates) {
   TestWorkload work(cluster_.get());
   work.set_num_replicas(1);
   work.set_num_write_threads(4);
-  work.set_write_timeout_millis(100);
+  work.set_write_timeout_millis(1000);
   work.set_timeout_allowed(true);
   work.Setup();