Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/11/21 20:10:39 UTC

[kudu-CR] consensus_peers-test: fix flake in TSAN builds

Hello Mike Percy, Alexey Serbin,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/8625

to review the following change.


Change subject: consensus_peers-test: fix flake in TSAN builds
......................................................................

consensus_peers-test: fix flake in TSAN builds

TSAN builds were frequently flaky, failing with the following error:

==8581==ERROR: AddressSanitizer: heap-use-after-free on address 0x615000003fe8 at pc 0x7fde311d3229 bp 0x7fde16b56a10 sp 0x7fde16b56a08
READ of size 8 at 0x615000003fe8 thread T12 (test-raft-pool )
    #0 0x7fde311d3228 in std::_Hashtable<std::string, std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPeer*>, std::allocator<std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPeer*> >, std::__detail::_Se
    #1 0x7fde311d4c68 in std::_Hashtable<std::string, std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPeer*>, std::allocator<std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPeer*> >, std::__detail::_Se
    #2 0x7fde311c45bc in std::unordered_map<std::string, kudu::consensus::PeerMessageQueue::TrackedPeer*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPee
    #3 0x7fde311b55e4 in kudu::consensus::PeerMessageQueue::UntrackPeer(std::string const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_queue.cc:241:23
    #4 0x7fde311a580c in kudu::consensus::Peer::Close() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers.cc:446:11
    #5 0x7fde311a599a in kudu::consensus::Peer::~Peer() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers.cc:450:3
    #6 0x7fde311af50d in std::_Sp_counted_ptr<kudu::consensus::Peer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /opt/rh/devtoolset-3/root/usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../../../include/c++/4.9.2/bits/shared_ptr_base.h:373:9
    #7 0x560f75 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /opt/rh/devtoolset-3/root/usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../../../include/c++/4.9.2/bits/shared_ptr_base.h:149:6
    #9 0x569231 in boost::function0<void>::~function0() /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/uninstrumented/include/boost/function/function_template.hpp:763:34
    #10 0x568777 in kudu::consensus::TestPeerProxy::Respond(kudu::consensus::TestPeerProxy::Method) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus-test-util.h:157:3
    #11 0x56f740 in kudu::consensus::DelayablePeerProxy<kudu::consensus::NoOpTestPeerProxy>::RespondUnlessDelayed(kudu::consensus::TestPeerProxy::Method) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus-test-util.h:1
    #17 0x7fde2689fb3e in kudu::Thread::SuperviseThread(void*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/thread.cc:624:3

0x615000003fe8 is located 232 bytes inside of 488-byte region [0x615000003f00,0x6150000040e8)
freed by thread T0 here:
    #0 0x551210 in operator delete(void*) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:126
    #1 0x55c522 in kudu::consensus::ConsensusPeersTest::~ConsensusPeersTest() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers-test.cc:74:7
    #2 0x55ad89 in kudu::consensus::ConsensusPeersTest_TestRemotePeer_Test::~ConsensusPeersTest_TestRemotePeer_Test() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers-test.cc:174:1

previously allocated by thread T0 here:
    #0 0x5504d0 in operator new(unsigned long) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:82
    #1 0x55b3e8 in kudu::consensus::ConsensusPeersTest::SetUp() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers-test.cc:100:26

Specifically, the test destructor was destroying the ConsensusQueue before
letting tasks drain from the "raft_pool_". One of those tasks might still have
been associated with one of the peers owned by the queue.

This just adds the appropriate Wait() call to let the tasks drain before
destruction.
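
For illustration, a minimal sketch of the fix pattern follows. This is not the
exact patch: only raft_pool_ and the Wait() call are named in the message
above, so the base class, the TearDown() placement, and the member type are
assumptions.

    class ConsensusPeersTest : public KuduTest {
     protected:
      virtual void TearDown() override {
        // Block until every queued raft_pool_ task has finished. Without
        // this, a late task can call back into the ConsensusQueue (and the
        // peers it owns) after the test destructor has freed them, which is
        // the heap-use-after-free shown in the trace above.
        raft_pool_->Wait();
        KuduTest::TearDown();
      }

      // Illustrative member; the real declaration may differ.
      std::unique_ptr<ThreadPool> raft_pool_;
    };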

To test, I ran:

./build-support/dist_test.py loop -n 100 build/latest/bin/consensus_peers-test --gtest_filter=\*RemotePeer\* --stress-cpu-threads=8 --gtest_repeat=100

Prior to this fix, 100/100 failed[1]. After the fix, 100/100 succeeded[2].

[1] http://dist-test.cloudera.org/job?job_id=todd.1511292638.95033
[2] http://dist-test.cloudera.org/job?job_id=todd.1511292986.102150

Change-Id: I33f5e2da3d2d6c275ece20265b6a455e2c4b967d
---
M src/kudu/consensus/consensus_peers-test.cc
1 file changed, 5 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/25/8625/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I33f5e2da3d2d6c275ece20265b6a455e2c4b967d
Gerrit-Change-Number: 8625
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] consensus_peers-test: fix flake in TSAN builds

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8625 )

Change subject: consensus_peers-test: fix flake in TSAN builds
......................................................................

consensus_peers-test: fix flake in TSAN builds

Change-Id: I33f5e2da3d2d6c275ece20265b6a455e2c4b967d
Reviewed-on: http://gerrit.cloudera.org:8080/8625
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <mp...@apache.org>
---
M src/kudu/consensus/consensus_peers-test.cc
1 file changed, 5 insertions(+), 0 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Mike Percy: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/8625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I33f5e2da3d2d6c275ece20265b6a455e2c4b967d
Gerrit-Change-Number: 8625
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] consensus_peers-test: fix flake in TSAN builds

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8625 )

Change subject: consensus_peers-test: fix flake in TSAN builds
......................................................................


Patch Set 1: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I33f5e2da3d2d6c275ece20265b6a455e2c4b967d
Gerrit-Change-Number: 8625
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Tue, 21 Nov 2017 20:51:11 +0000
Gerrit-HasComments: No