You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by al...@apache.org on 2022/08/23 15:05:51 UTC
[kudu] 01/04: [tests] fix flakiness in TestTabletCopyEncryptedServers
This is an automated email from the ASF dual-hosted git repository.
alexey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git
commit 2f9f62c7a74661b781550216302eb90833516ad7
Author: Alexey Serbin <al...@apache.org>
AuthorDate: Thu Aug 11 16:30:29 2022 -0700
[tests] fix flakiness in TestTabletCopyEncryptedServers
The TabletCopyITest.TestTabletCopyEncryptedServers scenario deletes
a tablet, and then checks to see that the tablet data state is
TABLET_DATA_COPYING. However, it's possible for the remote bootstrap
to complete so quickly that it's already TABLET_DATA_READY at the time
of sampling, so from time to time the test failed with
src/kudu/integration-tests/tablet_copy-itest.cc:1014: Failure
Failed
Bad status: Timed out: Timed out after 30.002s waiting for correct tablet state: Illegal state: State TABLET_DATA_READY unexpected, expected TABLET_DATA_COPYING
This patch updates the assertion to allow both the COPYING and READY
tablet data states.
Without the patch, the test was about 7% flaky [1]. With the patch,
it's not flaky [2].
[1] http://dist-test.cloudera.org/job?job_id=aserbin.1660260668.94650
[2] http://dist-test.cloudera.org/job?job_id=aserbin.1660261249.109365
Change-Id: I22933cc9cb727711ee5fb45c811c2a759958fdfa
Reviewed-on: http://gerrit.cloudera.org:8080/18842
Tested-by: Alexey Serbin <al...@apache.org>
Reviewed-by: Yingchun Lai <ac...@gmail.com>
Reviewed-by: Abhishek Chennaka <ac...@cloudera.com>
Reviewed-by: Attila Bukor <ab...@apache.org>
---
src/kudu/integration-tests/tablet_copy-itest.cc | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/src/kudu/integration-tests/tablet_copy-itest.cc b/src/kudu/integration-tests/tablet_copy-itest.cc
index 5ba3f1ea9..ba3e4be45 100644
--- a/src/kudu/integration-tests/tablet_copy-itest.cc
+++ b/src/kudu/integration-tests/tablet_copy-itest.cc
@@ -27,6 +27,7 @@
#include <set>
#include <string>
#include <thread>
+#include <type_traits>
#include <unordered_map>
#include <utility>
#include <vector>
@@ -1003,15 +1004,17 @@ TEST_F(TabletCopyITest, TestTabletCopyEncryptedServers) {
ExternalTabletServer* replica_ets = cluster_->tablet_server(2);
TServerDetails* replica_ts = ts_map_[replica_ets->uuid()];
ASSERT_OK(WaitForNumTabletsOnTS(replica_ts, 1, timeout, &tablets));
- string tablet_id = tablets[0].tablet_status().tablet_id();
+ const auto& tablet_id = tablets[0].tablet_status().tablet_id();
// Tombstone the follower.
LOG(INFO) << "Tombstoning follower tablet " << tablet_id << " on TS " << replica_ts->uuid();
ASSERT_OK(DeleteTablet(replica_ts, tablet_id, TABLET_DATA_TOMBSTONED, timeout));
- // Wait for tablet copy to start.
- ASSERT_OK(inspect_->WaitForTabletDataStateOnTS(2, tablet_id,
- { tablet::TABLET_DATA_COPYING }, timeout));
+ // Wait for tablet copy to start. The copying might complete fast or there
+ // might be some scheduler anomalies, so here it's necessary to count in
+ // TABLET_DATA_COPYING --> TABLET_DATA_READY transitions as well.
+ ASSERT_OK(inspect_->WaitForTabletDataStateOnTS(
+ 2, tablet_id, { tablet::TABLET_DATA_COPYING, tablet::TABLET_DATA_READY }, timeout));
workload.StopAndJoin();
ASSERT_OK(WaitForServersToAgree(timeout, ts_map_, tablet_id, 1));