Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2017/05/16 18:50:04 UTC
[jira] [Commented] (KUDU-1294) CHECK failure on TransactionTracker
memtracker with unreleased consumption
[ https://issues.apache.org/jira/browse/KUDU-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012919#comment-16012919 ]
Todd Lipcon commented on KUDU-1294:
-----------------------------------
[~aserbin] hit a similar issue which caused use-after-free due to the same underlying problem. I looked into it and I think the issue is the following:
When the last transaction finishes, it runs TransactionTracker::Release():
{code}
void TransactionTracker::Release(TransactionDriver* driver) {
  DecrementCounters(*driver);
  State st;
  {
    // Remove the transaction from the map, retaining the state for use
    // below.
    std::lock_guard<simple_spinlock> l(lock_);
    st = FindOrDie(pending_txns_, driver);
    if (PREDICT_FALSE(pending_txns_.erase(driver) != 1)) {
      LOG(FATAL) << "Could not remove pending transaction from map: "
                 << driver->ToStringUnlocked();
    }
  }
  if (mem_tracker_) {
    mem_tracker_->Release(st.memory_footprint);
  }
}
{code}
This removes the transaction from pending_txns_ (under lock_) before it releases the transaction's memory from mem_tracker_ (outside the lock).
However, the TabletReplica::Delete path has a sequence like:
{code}
// TODO: KUDU-183: Keep track of the pending tasks and send an "abort" message.
LOG_SLOW_EXECUTION(WARNING, 1000,
    Substitute("TabletReplica: tablet $0: Waiting for Transactions to complete", tablet_id())) {
  txn_tracker_.WaitForAllToFinish();
}
...
// Only mark the peer as SHUTDOWN when all other components have shut down.
{
  std::lock_guard<simple_spinlock> lock(lock_);
  // Release mem tracker resources.
  consensus_.reset();
  tablet_.reset();
  state_ = SHUTDOWN;
}
{code}
i.e., it uses WaitForAllToFinish() as a sort of barrier to make sure no transactions are still running. However, WaitForAllToFinish() only waits for pending_txns_ to be empty.
So, we can hit the interleaving:
- T1: a transaction removes itself from pending_txns_
- T2: DeleteReplica returns from WaitForAllToFinish(), and then deletes the TabletReplica, which deletes TransactionTracker
-- T2 gets "unreleased consumption" because T1 has not yet gone on to call mem_tracker_->Release()
If we disable the memtracker for the test, we get a use-after-free instead, because T1 evaluates 'if (mem_tracker_)' on a now-destructed TransactionTracker instance.
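One way to close that window would be to release the memory while still holding the lock, before the entry is erased from the map, so that an empty map implies nothing is left to release. The following is only a minimal standalone sketch of that ordering and not the actual Kudu code or the committed fix: FakeMemTracker, SketchTracker and RunSketch are hypothetical names, std::mutex stands in for simple_spinlock, and the wait loop is a plain poll.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for kudu::MemTracker: it only counts bytes.
class FakeMemTracker {
 public:
  void Consume(int64_t bytes) { consumption_ += bytes; }
  void Release(int64_t bytes) { consumption_ -= bytes; }
  int64_t consumption() const { return consumption_.load(); }

 private:
  std::atomic<int64_t> consumption_{0};
};

// Hypothetical, simplified tracker keyed by an integer transaction id.
class SketchTracker {
 public:
  explicit SketchTracker(FakeMemTracker* mem) : mem_(mem) {}

  void Add(int id, int64_t footprint) {
    std::lock_guard<std::mutex> l(lock_);
    mem_->Consume(footprint);
    pending_[id] = footprint;
  }

  void Release(int id) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = pending_.find(id);
    assert(it != pending_.end());
    // Release the memory first, still under the lock...
    mem_->Release(it->second);
    // ...and only then make the transaction invisible to waiters.
    pending_.erase(it);
  }

  // Polls until the map is empty. Because Release() does all of its work
  // under lock_, observing an empty map here means the memory has already
  // been released.
  void WaitForAllToFinish() {
    for (;;) {
      {
        std::lock_guard<std::mutex> l(lock_);
        if (pending_.empty()) return;
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }

 private:
  std::mutex lock_;
  std::unordered_map<int, int64_t> pending_;
  FakeMemTracker* mem_;
};

// Returns the consumption seen by the "delete" thread right after the barrier.
int64_t RunSketch() {
  FakeMemTracker mem;
  SketchTracker tracker(&mem);
  tracker.Add(1, 2400);
  std::thread t1([&tracker] { tracker.Release(1); });  // T1
  tracker.WaitForAllToFinish();                        // T2
  int64_t seen = mem.consumption();
  t1.join();
  return seen;
}
```

With this ordering, the value T2 reads immediately after WaitForAllToFinish() is 0 rather than 2400. The sketch only illustrates the ordering; it does not model the rest of the TabletReplica shutdown sequence.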
> CHECK failure on TransactionTracker memtracker with unreleased consumption
> --------------------------------------------------------------------------
>
> Key: KUDU-1294
> URL: https://issues.apache.org/jira/browse/KUDU-1294
> Project: Kudu
> Issue Type: Bug
> Components: tablet
> Affects Versions: 0.6.0
> Reporter: Todd Lipcon
>
> {code}
> DeleteTableTest.TestDeleteTableWithConcurrentWrites: mem_tracker.cc:187] Check failed: consumption() == 0 Memory tracker txn_tracker->tablet-9446603547684183ba2053888b40696f->server->root has unreleased consumption 2400
> @ 0x7effe68a03b8 kudu::MemTracker::~MemTracker() at ??:0
> @ 0x7effe68a3acc std::_Sp_counted_ptr<>::_M_dispose() at ??:0
> @ 0x7effe9f74911 std::_Sp_counted_base<>::_M_release() at ??:0
> @ 0x7effe9f748c7 std::__shared_count<>::~__shared_count() at ??:0
> @ 0x7effe9fafdee std::__shared_ptr<>::~__shared_ptr() at ??:0
> @ 0x7effe93b8531 kudu::tablet::TransactionTracker::~TransactionTracker() at ??:0
> @ 0x7effe93a614b kudu::tablet::TabletPeer::~TabletPeer() at ??:0
> @ 0x7effe93a62ea kudu::tablet::TabletPeer::~TabletPeer() at ??:0
> @ 0x7effe9f89e48 kudu::RefCountedThreadSafe<>::DeleteInternal() at ??:0
> @ 0x7effe9f89e0a kudu::DefaultRefCountedThreadSafeTraits<>::Destruct() at ??:0
> @ 0x7effe9f89dda kudu::RefCountedThreadSafe<>::Release() at ??:0
> @ 0x7effe9f8743b scoped_refptr<>::~scoped_refptr() at ??:0
> @ 0x7effe9fd4f42 kudu::tserver::TSTabletManager::DeleteTablet() at ??:0
> {code}