You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2018/03/30 03:19:40 UTC

[kudu-CR] KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp

Hello David Ribeiro Alves,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/9866

to review the following change.


Change subject: KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp
......................................................................

KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp

This fixes the following TSAN race:

WARNING: ThreadSanitizer: data race (pid=17822)  Read of size 1 at 0x7b4c000054e8 by thread T59 (mutexes: write M1750):
...
    #3 strings::internal::SubstituteArg::SubstituteArg(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/strings/substitute.h:76 (libtserver.so+0x9edb0)
    #4 kudu::MaintenanceManager::LogPrefix() const /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:545:31 (libkudu_util.so+0x167791)
    #5 kudu::MaintenanceManager::UnregisterOp(kudu::MaintenanceOp*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:235:3 (libkudu_util.so+0x165963)
    #6 kudu::MaintenanceOp::Unregister() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:123:13 (libkudu_util.so+0x1654fe)
    #7 kudu::tablet::Tablet::UnregisterMaintenanceOps() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet.cc:1405:9 (libtablet.so+0xfb5af)
    #8 kudu::tablet::TabletReplica::Stop() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet_replica.cc:271:25 (libtablet.so+0x146e66)
    #9 kudu::tserver::TSTabletManager::DeleteTablet(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > c

  Previous write of size 8 at 0x7b4c000054e8 by main thread:
    #0 memcpy /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:655 (kudu-tserver+0x449e4c)
    #1 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__move_assign(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, std::__1::integral_constant<bool, true>) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/string:2044:18 (libkudu_util.so+0x16664d)
    #2 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::operator=(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/string:2055 (libkudu_util.so+0x16664d)
    #3 kudu::MaintenanceManager::Init(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:169 (libkudu_util.so+0x16664d)
...

The race is on the 'server_uuid_' field in the MaintenanceManager. This is set
during startup, but was being set _after_ calls such as UnregisterOp could be
made as seen above. That means the UnregisterOp call could either see an empty
UUID or even crash due to the above race.

This simply rejiggers the MaintenanceManager startup to take the UUID in as a
constructor parameter instead, and to instantiate the object slightly later
during startup.

Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16
---
M src/kudu/master/master.cc
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
5 files changed, 24 insertions(+), 16 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/9866/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9866
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16
Gerrit-Change-Number: 9866
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>

[kudu-CR] KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9866 )

Change subject: KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9866/1/src/kudu/tserver/tablet_server.cc
File src/kudu/tserver/tablet_server.cc:

http://gerrit.cloudera.org:8080/#/c/9866/1/src/kudu/tserver/tablet_server.cc@97
PS1, Line 97:   maintenance_manager_.reset(new MaintenanceManager(
> I think this should move up above the path handler registration, because th
yea, it doesn't start actually accepting connections until you call Start() which happens in ServerBase::Start(), called by TabletServer::Start(), which happens after Init()



-- 
To view, visit http://gerrit.cloudera.org:8080/9866
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16
Gerrit-Change-Number: 9866
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 30 Mar 2018 16:06:12 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9866 )

Change subject: KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp
......................................................................


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9866/1/src/kudu/tserver/tablet_server.cc
File src/kudu/tserver/tablet_server.cc:

http://gerrit.cloudera.org:8080/#/c/9866/1/src/kudu/tserver/tablet_server.cc@97
PS1, Line 97:   maintenance_manager_.reset(new MaintenanceManager(
I think this should move up above the path handler registration, because the MM can be dereferenced by a handler.

Unless we're not actually listening on any ports yet, in which case it's fine.



-- 
To view, visit http://gerrit.cloudera.org:8080/9866
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16
Gerrit-Change-Number: 9866
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Fri, 30 Mar 2018 14:59:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9866 )

Change subject: KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp
......................................................................

KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp

This fixes the following TSAN race:

WARNING: ThreadSanitizer: data race (pid=17822)  Read of size 1 at 0x7b4c000054e8 by thread T59 (mutexes: write M1750):
...
    #3 strings::internal::SubstituteArg::SubstituteArg(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/strings/substitute.h:76 (libtserver.so+0x9edb0)
    #4 kudu::MaintenanceManager::LogPrefix() const /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:545:31 (libkudu_util.so+0x167791)
    #5 kudu::MaintenanceManager::UnregisterOp(kudu::MaintenanceOp*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:235:3 (libkudu_util.so+0x165963)
    #6 kudu::MaintenanceOp::Unregister() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:123:13 (libkudu_util.so+0x1654fe)
    #7 kudu::tablet::Tablet::UnregisterMaintenanceOps() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet.cc:1405:9 (libtablet.so+0xfb5af)
    #8 kudu::tablet::TabletReplica::Stop() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet_replica.cc:271:25 (libtablet.so+0x146e66)
    #9 kudu::tserver::TSTabletManager::DeleteTablet(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > c

  Previous write of size 8 at 0x7b4c000054e8 by main thread:
    #0 memcpy /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:655 (kudu-tserver+0x449e4c)
    #1 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__move_assign(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, std::__1::integral_constant<bool, true>) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/string:2044:18 (libkudu_util.so+0x16664d)
    #2 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::operator=(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/string:2055 (libkudu_util.so+0x16664d)
    #3 kudu::MaintenanceManager::Init(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:169 (libkudu_util.so+0x16664d)
...

The race is on the 'server_uuid_' field in the MaintenanceManager. This is set
during startup, but was being set _after_ calls such as UnregisterOp could be
made as seen above. That means the UnregisterOp call could either see an empty
UUID or even crash due to the above race.

This simply rejiggers the MaintenanceManager startup to take the UUID in as a
constructor parameter instead, and to instantiate the object slightly later
during startup.

Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16
Reviewed-on: http://gerrit.cloudera.org:8080/9866
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/master/master.cc
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
5 files changed, 24 insertions(+), 16 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/9866
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16
Gerrit-Change-Number: 9866
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>