You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by ad...@apache.org on 2018/09/26 21:36:18 UTC
kudu git commit: create-table-stress-test: deflake
TestConcurrentCreateTableAndReloadMetadata
Repository: kudu
Updated Branches:
refs/heads/master 801d00536 -> c9f1772e7
create-table-stress-test: deflake TestConcurrentCreateTableAndReloadMetadata
First, if the ASSERT_OK fired, the test would hang. No bueno.
Second, in TSAN environments the test would sometimes fail with a table
already present error, typically happening either 30s or 60s after that
table was created. I believe this is due to a client-side timeout _after_
the table was created but _before_ its RPC response was sent. We could
try to address this by bumping the client's timeout, but because
reload_metadata_thread is DoS'ing the cluster, that doesn't seem robust.
Instead, let's have the thread ease up in TSAN environments.
Without the change, the test failed 4/1000 times (--stress_cpu_threads=4).
With the change, the test failed 0/1000 times (--stress_cpu_threads=8).
Change-Id: I607a8e278e8e8f370cda512f365a97fbb81e1ae3
Reviewed-on: http://gerrit.cloudera.org:8080/11523
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Tested-by: Kudu Jenkins
Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/c9f1772e
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/c9f1772e
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/c9f1772e
Branch: refs/heads/master
Commit: c9f1772e7683d8e2d2467c565091aeb8d12d6af3
Parents: 801d005
Author: Adar Dembo <ad...@cloudera.com>
Authored: Wed Sep 26 11:11:59 2018 -0700
Committer: Adar Dembo <ad...@cloudera.com>
Committed: Wed Sep 26 21:35:28 2018 +0000
----------------------------------------------------------------------
.../create-table-stress-test.cc | 23 ++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/kudu/blob/c9f1772e/src/kudu/integration-tests/create-table-stress-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/create-table-stress-test.cc b/src/kudu/integration-tests/create-table-stress-test.cc
index 35cbdf0..3bf6c58 100644
--- a/src/kudu/integration-tests/create-table-stress-test.cc
+++ b/src/kudu/integration-tests/create-table-stress-test.cc
@@ -53,6 +53,7 @@
#include "kudu/util/monotime.h"
#include "kudu/util/net/sockaddr.h"
#include "kudu/util/pb_util.h"
+#include "kudu/util/scoped_cleanup.h"
#include "kudu/util/status.h"
#include "kudu/util/stopwatch.h"
#include "kudu/util/test_macros.h"
@@ -359,10 +360,20 @@ TEST_F(CreateTableStressTest, TestConcurrentCreateTableAndReloadMetadata) {
CHECK_OK(cluster_->mini_master()->master()->catalog_manager()->
VisitTablesAndTablets());
- // Give table creation a chance to run.
- SleepFor(MonoDelta::FromMilliseconds(1 + rand() % 10));
+ // Give table creation a chance to run. TSAN is especially brutal; so
+ // let's sleep for longer in such environments.
+#ifdef THREAD_SANITIZER
+ int sleep_ms = 10 + rand() % 91;
+#else
+ int sleep_ms = 1 + rand() % 10;
+#endif
+ SleepFor(MonoDelta::FromMilliseconds(sleep_ms));
}
});
+ SCOPED_CLEANUP({
+ stop.Store(true);
+ reload_metadata_thread.join();
+ });
for (int num_tables_created = 0; num_tables_created < 20;) {
string table_name = Substitute("test-$0", num_tables_created);
@@ -388,6 +399,12 @@ TEST_F(CreateTableStressTest, TestConcurrentCreateTableAndReloadMetadata) {
// it's hard do deduce the universal timeout constant and we prefer to
// not introduce test flakiness.
//
+ // Note: on timeout, we don't actually know with certainty whether the
+ // table was created or not. It's possible for the RPC to be accepted but
+ // processed very slowly by the master, for the client to eventually give
+ // up and time out the request, all while the master continues and
+ // eventually finishes creating the table.
+ //
// TODO(aserbin): update the test keeping its racy essence but making it
// cleaner regarding this timeout&retry workaround.
continue;
@@ -395,8 +412,6 @@ TEST_F(CreateTableStressTest, TestConcurrentCreateTableAndReloadMetadata) {
ASSERT_OK(s);
num_tables_created++;
}
- stop.Store(true);
- reload_metadata_thread.join();
}
} // namespace kudu