Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2017/05/16 18:50:04 UTC

[jira] [Commented] (KUDU-1294) CHECK failure on TransactionTracker memtracker with unreleased consumption

    [ https://issues.apache.org/jira/browse/KUDU-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012919#comment-16012919 ] 

Todd Lipcon commented on KUDU-1294:
-----------------------------------

[~aserbin] hit a similar issue that caused a use-after-free, due to the same underlying problem. I looked into it, and I think the issue is the following:

When the last transaction finishes, it runs TransactionTracker::Release():
{code}
void TransactionTracker::Release(TransactionDriver* driver) {
  DecrementCounters(*driver);

  State st;
  {
    // Remove the transaction from the map, retaining the state for use
    // below.
    std::lock_guard<simple_spinlock> l(lock_);
    st = FindOrDie(pending_txns_, driver);
    if (PREDICT_FALSE(pending_txns_.erase(driver) != 1)) {
      LOG(FATAL) << "Could not remove pending transaction from map: "
          << driver->ToStringUnlocked();
    }
  }

  if (mem_tracker_) {
    mem_tracker_->Release(st.memory_footprint);
  }
}
{code}

This removes the transaction from pending_txns_ before it releases the transaction's memory from mem_tracker_.
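To make the window concrete, here is a minimal single-threaded sketch. It assumes a toy tracker rather than Kudu's real classes: an int id stands in for TransactionDriver*, std::mutex for simple_spinlock, and an atomic counter for the MemTracker. The hypothetical in_gap hook runs between the erase and the memory release, and it sees an empty map while the footprint (2400 bytes, borrowed from the CHECK message in the issue description) is still charged.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <utility>

// Hypothetical toy model, not Kudu's classes: an int id stands in for
// TransactionDriver*, std::mutex for simple_spinlock, and an atomic
// counter for MemTracker consumption.
class ToyTracker {
 public:
  void Add(int id, int64_t bytes) {
    std::lock_guard<std::mutex> l(lock_);
    pending_[id] = bytes;
    consumption_ += bytes;
  }

  // Same ordering as TransactionTracker::Release(): erase under the lock,
  // release memory afterwards. 'in_gap' is a hypothetical hook that runs
  // between the two steps.
  void Release(int id, const std::function<void()>& in_gap) {
    int64_t bytes;
    {
      std::lock_guard<std::mutex> l(lock_);
      auto it = pending_.find(id);
      assert(it != pending_.end());
      bytes = it->second;
      pending_.erase(it);
    }
    in_gap();               // the map is already empty here...
    consumption_ -= bytes;  // ...but the memory is only released now
  }

  size_t NumPending() {
    std::lock_guard<std::mutex> l(lock_);
    return pending_.size();
  }
  int64_t Consumption() const { return consumption_.load(); }

 private:
  std::mutex lock_;
  std::map<int, int64_t> pending_;
  std::atomic<int64_t> consumption_{0};
};

// Returns {pending count, consumption} as observed inside the gap.
std::pair<size_t, int64_t> ObserveGap() {
  ToyTracker t;
  t.Add(1, 2400);
  std::pair<size_t, int64_t> seen{};
  t.Release(1, [&] { seen = std::make_pair(t.NumPending(), t.Consumption()); });
  return seen;
}
```

ObserveGap() returns {0, 2400}: no pending transactions, yet the memory is still charged to the tracker.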

However, the TabletReplica::Delete path has a sequence like:

{code}
  // TODO: KUDU-183: Keep track of the pending tasks and send an "abort" message.
  LOG_SLOW_EXECUTION(WARNING, 1000,
      Substitute("TabletReplica: tablet $0: Waiting for Transactions to complete", tablet_id())) {
    txn_tracker_.WaitForAllToFinish();
  }
  ...
  // Only mark the peer as SHUTDOWN when all other components have shut down.
  {
    std::lock_guard<simple_spinlock> lock(lock_);
    // Release mem tracker resources.
    consensus_.reset();
    tablet_.reset();
    state_ = SHUTDOWN;
  }
{code}

i.e., it uses WaitForAllToFinish() as a sort of barrier to make sure no more transactions are running. However, WaitForAllToFinish() only waits for pending_txns_ to be empty, and a transaction is removed from pending_txns_ before it releases its memory.

So we can hit the following interleaving:

- T1: a transaction removes itself from pending_txns_
- T2: DeleteReplica returns from WaitForAllToFinish(), and then deletes the TabletReplica, which deletes the TransactionTracker
-- the MemTracker destructor CHECK fails with "unreleased consumption", because T1 has not yet gone on to call mem_tracker_->Release()
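The interleaving can be forced deterministically in a toy model. This is a sketch, not Kudu code: it assumes WaitForAllToFinish() can be modeled with a std::condition_variable (a hypothetical stand-in for however the real method waits), and it uses a std::promise to park T1 in the gap between the erase and the memory release. Unlike TabletReplica, T2 here lets T1 finish before the tracker is destroyed, so the sketch itself stays free of use-after-free.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <future>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical toy model, not Kudu code: the map and its lock stand in for
// pending_txns_/lock_, and an atomic counter stands in for the MemTracker.
struct ToyTracker {
  std::mutex lock;
  std::condition_variable cv;
  std::map<int, int64_t> pending;
  std::atomic<int64_t> consumption{0};

  void Add(int id, int64_t bytes) {
    std::lock_guard<std::mutex> l(lock);
    pending[id] = bytes;
    consumption += bytes;
  }

  // Like WaitForAllToFinish(): only waits for the map to become empty.
  void WaitForAllToFinish() {
    std::unique_lock<std::mutex> l(lock);
    cv.wait(l, [this] { return pending.empty(); });
  }
};

// Forces the interleaving: T1 erases its entry, then stalls until T2 has
// returned from WaitForAllToFinish() and sampled the consumption.
int64_t ConsumptionSeenAfterBarrier() {
  ToyTracker tracker;
  tracker.Add(1, 2400);

  std::promise<void> barrier_passed;
  std::future<void> barrier_passed_f = barrier_passed.get_future();

  std::thread t1([&] {
    int64_t bytes;
    {
      std::lock_guard<std::mutex> l(tracker.lock);
      bytes = tracker.pending.at(1);
      tracker.pending.erase(1);
    }
    tracker.cv.notify_all();
    barrier_passed_f.wait();        // T1 parked in the gap
    tracker.consumption -= bytes;   // the late MemTracker release
  });

  tracker.WaitForAllToFinish();     // T2 gets through the "barrier"
  int64_t seen = tracker.consumption.load();
  barrier_passed.set_value();       // let T1 finish before 'tracker' dies
  t1.join();
  return seen;
}
```

ConsumptionSeenAfterBarrier() returns 2400: T2 gets past the "barrier" while the transaction's memory is still charged, which is the state the MemTracker destructor CHECK trips on.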

If we disable the memtracker for the test, we get a use-after-free instead, because T1 evaluates 'if (mem_tracker_)' on a TransactionTracker instance that has already been destroyed.
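One ordering that would close this window, sketched here as a hypothetical and not necessarily the fix adopted for KUDU-1294, is to release the memory before the entry leaves pending_txns_, inside the same critical section. That way nothing in Release() reads tracker members after a waiter can observe an empty map. The sketch is a toy model (std::mutex instead of simple_spinlock, an atomic counter instead of MemTracker, a condition variable standing in for WaitForAllToFinish()).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical toy model, not Kudu code, showing one possible reordering.
struct ReorderedTracker {
  std::mutex lock;
  std::condition_variable cv;
  std::map<int, int64_t> pending;
  std::atomic<int64_t> consumption{0};

  void Add(int id, int64_t bytes) {
    std::lock_guard<std::mutex> l(lock);
    pending[id] = bytes;
    consumption += bytes;
  }

  // Release the memory *before* the entry leaves the map, under the same
  // lock, so any thread that sees an empty map also sees the release.
  void Release(int id) {
    std::lock_guard<std::mutex> l(lock);
    auto it = pending.find(id);
    assert(it != pending.end());
    consumption -= it->second;
    pending.erase(it);
    // Notify while still holding the lock: apart from the unlock itself,
    // no member is touched once a waiter can observe the empty map.
    cv.notify_all();
  }

  void WaitForAllToFinish() {
    std::unique_lock<std::mutex> l(lock);
    cv.wait(l, [this] { return pending.empty(); });
  }
};

int64_t ConsumptionSeenAfterBarrier() {
  ReorderedTracker tracker;
  tracker.Add(1, 2400);
  std::thread t1([&] { tracker.Release(1); });
  tracker.WaitForAllToFinish();
  int64_t seen = tracker.consumption.load();  // release already happened
  t1.join();
  return seen;
}
```

Here ConsumptionSeenAfterBarrier() always returns 0, because the waiter's predicate can only become true after the release has happened under the same lock.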

> CHECK failure on TransactionTracker memtracker with unreleased consumption
> --------------------------------------------------------------------------
>
>                 Key: KUDU-1294
>                 URL: https://issues.apache.org/jira/browse/KUDU-1294
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: 0.6.0
>            Reporter: Todd Lipcon
>
> {code}
> DeleteTableTest.TestDeleteTableWithConcurrentWrites: mem_tracker.cc:187] Check failed: consumption() == 0 Memory tracker txn_tracker->tablet-9446603547684183ba2053888b40696f->server->root has unreleased consumption 2400
>     @     0x7effe68a03b8  kudu::MemTracker::~MemTracker() at ??:0
>     @     0x7effe68a3acc  std::_Sp_counted_ptr<>::_M_dispose() at ??:0
>     @     0x7effe9f74911  std::_Sp_counted_base<>::_M_release() at ??:0
>     @     0x7effe9f748c7  std::__shared_count<>::~__shared_count() at ??:0
>     @     0x7effe9fafdee  std::__shared_ptr<>::~__shared_ptr() at ??:0
>     @     0x7effe93b8531  kudu::tablet::TransactionTracker::~TransactionTracker() at ??:0
>     @     0x7effe93a614b  kudu::tablet::TabletPeer::~TabletPeer() at ??:0
>     @     0x7effe93a62ea  kudu::tablet::TabletPeer::~TabletPeer() at ??:0
>     @     0x7effe9f89e48  kudu::RefCountedThreadSafe<>::DeleteInternal() at ??:0
>     @     0x7effe9f89e0a  kudu::DefaultRefCountedThreadSafeTraits<>::Destruct() at ??:0
>     @     0x7effe9f89dda  kudu::RefCountedThreadSafe<>::Release() at ??:0
>     @     0x7effe9f8743b  scoped_refptr<>::~scoped_refptr() at ??:0
>     @     0x7effe9fd4f42  kudu::tserver::TSTabletManager::DeleteTablet() at ??:0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)