Posted to commits@kudu.apache.org by ad...@apache.org on 2019/01/23 17:54:21 UTC

[kudu] 07/08: KUDU-2665: deflake block_manager-stress-test

This is an automated email from the ASF dual-hosted git repository.

adar pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit 04bfc8c3b81e7c3ee816f607383a4eaaef92c8a2
Author: Adar Dembo <ad...@cloudera.com>
AuthorDate: Tue Jan 22 16:07:53 2019 -0800

    KUDU-2665: deflake block_manager-stress-test
    
    After commit 0c501979b was merged, this test became very flaky (failing about
    50% of the time in some environments). I think it's due to the new nature of
    the log block manager, which may now delete dead containers in the background.
    
    Specifically, if two transactions delete the last blocks from a full
    container, it's possible for one to get scheduled for an (asynchronous) hole
    punch, and for the other to set the container as dead. Later, when the
    hole punch runs, the container's last ref will be dropped, causing the dead
    container to be deleted.
    
    While perhaps surprising, this new behavior is desirable, and it's now
    incorrect to assume that a cessation in user threads implies an end to LBM
    activity. block_manager-stress-test makes this assumption by using the
    LBMCorruptor to inject inconsistencies after test threads have been joined.
    To fix, we must explicitly quiesce the LBM; destroying it will do the trick.
    
    What is surprising is that, for the life of me, I can't reproduce the
    failure. Not locally, not on a CentOS 6.6 machine, not looped in dist-test
    with stress threads, not ever. I even tried adding some "creative" sleep
    calls in a few places to tickle the race, to no avail.
    
    Change-Id: I0be328f740056cd6b64c9881759225c8b961a935
    Reviewed-on: http://gerrit.cloudera.org:8080/12254
    Reviewed-by: helifu <hz...@corp.netease.com>
    Tested-by: Adar Dembo <ad...@cloudera.com>
    Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
 src/kudu/fs/block_manager-stress-test.cc | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/kudu/fs/block_manager-stress-test.cc b/src/kudu/fs/block_manager-stress-test.cc
index c841ce5..9d424f6 100644
--- a/src/kudu/fs/block_manager-stress-test.cc
+++ b/src/kudu/fs/block_manager-stress-test.cc
@@ -546,6 +546,11 @@ TYPED_TEST(BlockManagerStressTest, StressTest) {
   LOG(INFO) << "Running on fresh block manager";
   checker.Start();
   this->RunTest(FLAGS_test_duration_secs / kNumStarts);
+
+  // Quiesce the block manager before injecting inconsistencies so that the two
+  // don't interfere with one another.
+  this->bm_.reset();
+
   NO_FATALS(this->InjectNonFatalInconsistencies());
 
   for (int i = 1; i < kNumStarts; i++) {
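
The quiescing idea from the commit message can be illustrated with a minimal, self-contained C++ sketch. It is not Kudu code: ToyBlockManager, SubmitDeferred, and RunDeferredThenQuiesce are hypothetical names invented for this illustration, and the single worker thread is only an assumed stand-in for the log block manager's background work (hole punching, dead container deletion). The point it tries to show is that background work can still be in flight after the submitting thread has finished, and that destroying the owning object is one way to wait for it.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a block manager that defers cleanup work
// to a background thread. Not the real Kudu LogBlockManager.
class ToyBlockManager {
 public:
  ToyBlockManager() : worker_([this] { Run(); }) {}

  // Destruction drains all queued work and joins the worker, so no
  // background activity can outlive the object.
  ~ToyBlockManager() {
    {
      std::lock_guard<std::mutex> l(mu_);
      shutdown_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // Queues a task to run later on the background thread.
  void SubmitDeferred(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> l(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> l(mu_);
    while (true) {
      cv_.wait(l, [this] { return shutdown_ || !tasks_.empty(); });
      if (tasks_.empty()) return;  // shutdown requested and queue drained
      auto task = std::move(tasks_.front());
      tasks_.pop();
      l.unlock();
      task();  // run outside the lock so submitters are not blocked
      l.lock();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool shutdown_ = false;
  std::thread worker_;  // declared last so the other members exist first
};

// Submits num_tasks deferred "deletions", quiesces by destroying the
// manager, and returns how many deletions had completed at that point.
int RunDeferredThenQuiesce(int num_tasks) {
  assert(num_tasks >= 0);
  std::atomic<int> deleted{0};
  auto bm = std::make_unique<ToyBlockManager>();
  for (int i = 0; i < num_tasks; i++) {
    bm->SubmitDeferred([&deleted] {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
      deleted++;
    });
  }
  // The submitting thread is done, but `deleted` may still be changing.
  bm.reset();  // quiesce: the destructor joins the worker after the drain
  return deleted.load();
}
```

Calling RunDeferredThenQuiesce(5) returns 5 because the destructor drains the queue before joining; reading the counter before the reset() call could observe any value from 0 to 5. The test in the diff relies on the same property when it calls this->bm_.reset() before injecting inconsistencies.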