You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by to...@apache.org on 2016/02/19 05:48:52 UTC

incubator-kudu git commit: KUDU-1340. Fix a lock order inversion handling timed out RPCs

Repository: incubator-kudu
Updated Branches:
  refs/heads/master 6991cd432 -> d43ec1e9d


KUDU-1340. Fix a lock order inversion handling timed out RPCs

This fixes a lock order inversion reported by TSAN with detect_deadlocks=1:

WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=4232)
  Cycle in lock order graph: M11532 (0x7d440006dd88) => M10486 (0x7d4c00053fb0) => M11532

  Mutex M10486 acquired here while holding mutex M11532 in thread T6:
    #3 kudu::lock_guard<kudu::simple_spinlock>::lock_guard(kudu::simple_spinlock*) /home/todd/git/kudu/build/tsan/../../src/kudu/util/locks.h:241:5 (libtserver.so+0x000000130339)
    #4 kudu::rpc::RpcController::timeout() const /home/todd/git/kudu/build/tsan/../../src/kudu/rpc/rpc_controller.cc:91:31 (libkrpc.so+0x0000000d4506)
    #5 kudu::rpc::OutboundCall::SetTimedOut() /home/todd/git/kudu/build/tsan/../../src/kudu/rpc/outbound_call.cc:262:9 (libkrpc.so+0x00000009d825)

  Mutex M11532 acquired here while holding mutex M10486 in thread T132:
    #3 kudu::lock_guard<kudu::simple_spinlock>::lock_guard(kudu::simple_spinlock*) /home/todd/git/kudu/build/tsan/../../src/kudu/util/locks.h:241:5 (libtserver.so+0x000000130339)
    #4 kudu::rpc::OutboundCall::IsFinished() const /home/todd/git/kudu/build/tsan/../../src/kudu/rpc/outbound_call.cc:274:31 (libkrpc.so+0x00000009c785)
    #5 kudu::rpc::RpcController::finished() const /home/todd/git/kudu/build/tsan/../../src/kudu/rpc/rpc_controller.cc:57:12 (libkrpc.so+0x0000000d41d1)

This would be impossible to trigger in real life, since the only context
in which we call RpcController::finished() is when CHECK()ing that
it is the case. Thus, it wouldn't ever race with a SetTimedOut() call.

However, fixing this is good hygiene, because it seems to be the only lock
order inversion reported by TSAN. With it fixed, I was able to enable
'detect_deadlocks'. As a quick test of whether the deadlock detector is
stable, I submitted 200 copies of linked_list-test:
http://dist-test.cloudera.org//job?job_id=todd.1455851252.15158

Change-Id: I882016e42462da07013a80ba87716773046cd25a
Reviewed-on: http://gerrit.cloudera.org:8080/2240
Reviewed-by: Jean-Daniel Cryans
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Todd Lipcon <to...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/d43ec1e9
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/d43ec1e9
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/d43ec1e9

Branch: refs/heads/master
Commit: d43ec1e9d5408c146c48bd4f74c1e6ccedb9611b
Parents: 6991cd4
Author: Todd Lipcon <to...@apache.org>
Authored: Thu Feb 18 18:59:26 2016 -0800
Committer: Todd Lipcon <to...@apache.org>
Committed: Fri Feb 19 04:46:08 2016 +0000

----------------------------------------------------------------------
 build-support/run-test.sh     | 7 -------
 src/kudu/rpc/outbound_call.cc | 5 ++++-
 2 files changed, 4 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/d43ec1e9/build-support/run-test.sh
----------------------------------------------------------------------
diff --git a/build-support/run-test.sh b/build-support/run-test.sh
index 486a4a3..946af40 100755
--- a/build-support/run-test.sh
+++ b/build-support/run-test.sh
@@ -102,13 +102,6 @@ if [ -z "$ASAN_SYMBOLIZER_PATH" ]; then
 fi
 
 # Configure TSAN (ignored if this isn't a TSAN build).
-#
-# Deadlock detection (new in clang 3.5) is disabled because:
-# 1. The clang 3.5 deadlock detector crashes in some Kudu unit tests. It
-#    needs compiler-rt commits c4c3dfd, 9a8efe3, and possibly others.
-# 2. Many unit tests report lock-order-inversion warnings; they should be
-#    fixed before reenabling the detector.
-TSAN_OPTIONS="$TSAN_OPTIONS detect_deadlocks=0"
 TSAN_OPTIONS="$TSAN_OPTIONS suppressions=$SOURCE_ROOT/build-support/tsan-suppressions.txt"
 TSAN_OPTIONS="$TSAN_OPTIONS history_size=7"
 TSAN_OPTIONS="$TSAN_OPTIONS external_symbolizer_path=$ASAN_SYMBOLIZER_PATH"

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/d43ec1e9/src/kudu/rpc/outbound_call.cc
----------------------------------------------------------------------
diff --git a/src/kudu/rpc/outbound_call.cc b/src/kudu/rpc/outbound_call.cc
index e608dad..8c2ef57 100644
--- a/src/kudu/rpc/outbound_call.cc
+++ b/src/kudu/rpc/outbound_call.cc
@@ -253,13 +253,16 @@ void OutboundCall::SetFailed(const Status &status,
 }
 
 void OutboundCall::SetTimedOut() {
+  // We have to fetch timeout outside the lock to avoid a lock
+  // order inversion between this class and RpcController.
+  MonoDelta timeout = controller_->timeout();
   {
     lock_guard<simple_spinlock> l(&lock_);
     status_ = Status::TimedOut(Substitute(
         "$0 RPC to $1 timed out after $2",
         remote_method_.method_name(),
         conn_id_.remote().ToString(),
-        controller_->timeout().ToString()));
+        timeout.ToString()));
     set_state_unlocked(TIMED_OUT);
   }
   CallCallback();