You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2016/10/04 00:43:57 UTC

[kudu-CR] mem tracker: fix race between FindTracker() and destructor

Hello Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/4614

to review the following change.

Change subject: mem_tracker: fix race between FindTracker() and destructor
......................................................................

mem_tracker: fix race between FindTracker() and destructor

In very rare cases, it's possible for an interleaving between these two
functions to lead to a recursive lock acquisition of child_trackers_lock_
in the destructor.

For example:
1. The tracker hierarchy contains one parent (P) and one child (C1).
2. Thread 1 creates a second child (C2) parented to P. It has the sole ref
   to C2.
3. Thread 2 calls FindTracker() looking for C1.
4. Thread 2 runs as far as the loop in FindTrackerUnlocked(), getting
   descheduled just as it has locked a ref to C2. It also holds P's
   child_trackers_lock_.
5. Thread 1 is rescheduled and drops its ref to C2.
6. Thread 2 is rescheduled. It also drops its ref to C2, which was the last
   ref, so it runs C2's destructor. This acquires P's child_trackers_lock_,
   which it already owns. Boom.

The fix is simple: when looking up a tracker, make a local copy of the list
of children while holding child_trackers_lock_, then iterate without it.
It gets a little complicated due to the unusual requirements of
FindOrCreateTracker(); I ended up introducing a special static lock to
handle that case. The rest of the changes are basically code motion.

Unfortunately, the new test does not fail reliably without the fix. In my
testing, it failed maybe once every few hundred runs. But it doesn't fail at
all with the fix in place.

Change-Id: I69610f782c7a00d161bfb48d2487c29c0309c985
---
M src/kudu/consensus/log_cache.cc
M src/kudu/util/cache.cc
M src/kudu/util/mem_tracker-test.cc
M src/kudu/util/mem_tracker.cc
M src/kudu/util/mem_tracker.h
5 files changed, 85 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/4614/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I69610f782c7a00d161bfb48d2487c29c0309c985
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] mem tracker: fix race between FindTracker() and destructor

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: mem_tracker: fix race between FindTracker() and destructor
......................................................................


mem_tracker: fix race between FindTracker() and destructor

In very rare cases, it's possible for an interleaving between these two
functions to lead to a recursive lock acquisition of child_trackers_lock_
in the destructor.

For example:
1. The tracker hierarchy contains one parent (P) and one child (C1).
2. Thread 1 creates a second child (C2) parented to P. It has the sole ref
   to C2.
3. Thread 2 calls FindTracker() looking for C1.
4. Thread 2 runs as far as the loop in FindTrackerUnlocked(), getting
   descheduled just as it has locked a ref to C2. It also holds P's
   child_trackers_lock_.
5. Thread 1 is rescheduled and drops its ref to C2.
6. Thread 2 is rescheduled. It also drops its ref to C2, which was the last
   ref, so it runs C2's destructor. This acquires P's child_trackers_lock_,
   which it already owns. Boom.

The fix is simple: when looking up a tracker, make a local copy of the list
of children while holding child_trackers_lock_, then iterate without it.
It gets a little complicated due to the unusual requirements of
FindOrCreateTracker(); I ended up introducing a special static lock to
handle that case. The rest of the changes are basically code motion.

Unfortunately, the new test does not fail reliably without the fix. In my
testing, it failed maybe once every few hundred runs. But it doesn't fail at
all with the fix in place.

Change-Id: I69610f782c7a00d161bfb48d2487c29c0309c985
Reviewed-on: http://gerrit.cloudera.org:8080/4614
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/consensus/log_cache.cc
M src/kudu/util/cache.cc
M src/kudu/util/mem_tracker-test.cc
M src/kudu/util/mem_tracker.cc
M src/kudu/util/mem_tracker.h
5 files changed, 78 insertions(+), 56 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I69610f782c7a00d161bfb48d2487c29c0309c985
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] mem tracker: fix race between FindTracker() and destructor

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: mem_tracker: fix race between FindTracker() and destructor
......................................................................


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69610f782c7a00d161bfb48d2487c29c0309c985
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] mem tracker: fix race between FindTracker() and destructor

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has uploaded a new patch set (#2).

Change subject: mem_tracker: fix race between FindTracker() and destructor
......................................................................

mem_tracker: fix race between FindTracker() and destructor

In very rare cases, it's possible for an interleaving between these two
functions to lead to a recursive lock acquisition of child_trackers_lock_
in the destructor.

For example:
1. The tracker hierarchy contains one parent (P) and one child (C1).
2. Thread 1 creates a second child (C2) parented to P. It has the sole ref
   to C2.
3. Thread 2 calls FindTracker() looking for C1.
4. Thread 2 runs as far as the loop in FindTrackerUnlocked(), getting
   descheduled just as it has locked a ref to C2. It also holds P's
   child_trackers_lock_.
5. Thread 1 is rescheduled and drops its ref to C2.
6. Thread 2 is rescheduled. It also drops its ref to C2, which was the last
   ref, so it runs C2's destructor. This acquires P's child_trackers_lock_,
   which it already owns. Boom.

The fix is simple: when looking up a tracker, make a local copy of the list
of children while holding child_trackers_lock_, then iterate without it.
It gets a little complicated due to the unusual requirements of
FindOrCreateTracker(); I ended up introducing a special static lock to
handle that case. The rest of the changes are basically code motion.

Unfortunately, the new test does not fail reliably without the fix. In my
testing, it failed maybe once every few hundred runs. But it doesn't fail at
all with the fix in place.

Change-Id: I69610f782c7a00d161bfb48d2487c29c0309c985
---
M src/kudu/consensus/log_cache.cc
M src/kudu/util/cache.cc
M src/kudu/util/mem_tracker-test.cc
M src/kudu/util/mem_tracker.cc
M src/kudu/util/mem_tracker.h
5 files changed, 78 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/4614/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I69610f782c7a00d161bfb48d2487c29c0309c985
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] mem tracker: fix race between FindTracker() and destructor

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: mem_tracker: fix race between FindTracker() and destructor
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4614/1/src/kudu/util/mem_tracker-test.cc
File src/kudu/util/mem_tracker-test.cc:

PS1, Line 348: num_runs
> this variable seems poorly named. perhaps thread_idx?
Well, my original intent here was to run the test 20 times for a shorter duration each time, rather than once for one second.

Along the way I think I made some changes to use it to reproduce my bug too. But, I'm actually going to restore it to be the way it was, so that it's still very clearly the regression test for commit e284a93. I had no good reason to modify it the way I did, and the num_runs thing can be had via gtest_repeat anyway.


-- 
To view, visit http://gerrit.cloudera.org:8080/4614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69610f782c7a00d161bfb48d2487c29c0309c985
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] mem tracker: fix race between FindTracker() and destructor

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: mem_tracker: fix race between FindTracker() and destructor
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4614/1/src/kudu/util/mem_tracker-test.cc
File src/kudu/util/mem_tracker-test.cc:

PS1, Line 348: num_runs
this variable seems poorly named. perhaps thread_idx?


-- 
To view, visit http://gerrit.cloudera.org:8080/4614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69610f782c7a00d161bfb48d2487c29c0309c985
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] mem tracker: fix race between FindTracker() and destructor

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: mem_tracker: fix race between FindTracker() and destructor
......................................................................


Patch Set 1: Verified+1

Overriding Jenkins, codegen-test failures due to thirdparty collisions with 1.0.x builds.

-- 
To view, visit http://gerrit.cloudera.org:8080/4614
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69610f782c7a00d161bfb48d2487c29c0309c985
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No