You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2020/12/11 21:05:42 UTC

[kudu-CR](branch-1.12.x) [client] make metacache reset safe

Hello Andrew Wong,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/16863

to review the following change.


Change subject: [client] make metacache reset safe
......................................................................

[client] make metacache reset safe

I noticed that the newly added TxnManagerTest.BeginManyTransactions
test scenario started failing with ASAN heap-use-after-free warnings.

After looking a that, it turned out that the original code was assuming
the cache wouldn't be ever reset before calling the MetaCache's
destructor.  However, changelist 232474a51 introduced a new method
MetaCache::ClearCache() and since then the method is being called upon
altering a table if the partitioning scheme has been updated.

This patch resolves the issue by introducing so-called tablet server
registry that's never reset indeed, where entries in the tablet server
cache are just references to the entries in the registry (they are
raw pointers, actually).

The newly added test scenario was reliably producing AddressSanitizer's
heap-use-after-free warnings every time I ran it using ASAN build.
Below is a snapshot of the relevant traces captured when running the
new test scenario without the changes in the client metacache.

  AddressSanitizer: heap-use-after-free on address 0x608000129e20 at pc 0x00000078bd54 bp 0x7fa731d0b240 sp 0x7fa731d0b238
  READ of size 4 at 0x608000129e20 thread T149 (rpc reactor-146)
      #0 0x78bd53 in base::subtle::NoBarrier_Load(int const volatile*) src/kudu/gutil/atomicops-internals-x86.h:200:10
      #1 0x7fa79520e227 in base::SpinLock::SpinLoop(long, int*) src/kudu/gutil/spinlock.cc:86:10
      #2 0x7fa79520e38b in base::SpinLock::SlowLock() src/kudu/gutil/spinlock.cc:104:25
      #3 0x7fa7a099aab0 in std::unique_lock<kudu::simple_spinlock>::lock() ../../../include/c++/8/bits/std_mutex.h:267:17
      #4 0x7fa7a0991e3e in std::unique_lock<kudu::simple_spinlock>::unique_lock(kudu::simple_spinlock&) ../../../include/c++/8/bits/std_mutex.h:197:2
      #5 0x7fa7a0abfda1 in kudu::client::internal::RemoteTabletServer::InitProxy(kudu::client::KuduClient*, std::function<void (kudu::Status const&)> const&) src/kudu/client/meta_cache.cc:145:39
      #6 0x7fa7a0ac60f5 in kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, kudu::MonoTime const&) src/kudu/client/meta_cache.cc:524:11
      #7 0x7fa7a09b2dcf in kudu::rpc::RetriableRpc<kudu::client::internal::RemoteTabletServer, kudu::tserver::WriteRequestPB, kudu::tserver::WriteResponsePB>::SendRpc() src/kudu/rpc/retriable_rpc.h:163:19
      #8 0x7fa7a09ac6cc in kudu::client::internal::Batcher::FlushBuffer(kudu::client::internal::RemoteTablet*, std::vector<kudu::client::internal::InFlightOp*, std::allocator<kudu::client::internal::InFlightOp*> > const&) src/kudu/client/batcher.cc:911:8
      #9 0x7fa7a09a9e38 in kudu::client::internal::Batcher::FlushBuffersIfReady() src/kudu/client/batcher.cc:884:5
      #10 0x7fa7a09abd2d in kudu::client::internal::Batcher::TabletLookupFinished(kudu::client::internal::InFlightOp*, kudu::Status const&) src/kudu/client/batcher.cc:851:3
      #11 0x7fa7a0acec66 in kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status const&) src/kudu/client/meta_cache.cc:923:3
      #12 0x7fa7a0ab8800 in kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB, kudu::master::GetTableLocationsResponsePB>::SendRpc()::'lambda'()::operator()() const src/kudu/client/master_proxy_rpc.cc:130:26

  0x608000129e20 is located 0 bytes inside of 96-byte region [0x608000129e20,0x608000129e80)
  freed by thread T163 here:
      #0 0x65c650 in operator delete(void*) thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:160:399
      #1 0x7fa7a0aec859 in void STLDeleteContainerPairSecondPointers<std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true> >(std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true>, std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true>) src/kudu/gutil/stl_util.h:199:5
      #2 0x7fa7a0ad96d1 in void STLDeleteValues<std::unordered_map<std::string, kudu::client::internal::RemoteTabletServer*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*> > > >(std::unordered_map<std::string, kudu::client::internal::RemoteTabletServer*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*> > >*) src/kudu/gutil/stl_util.h:400:3
      #3 0x7fa7a0ad4188 in kudu::client::internal::MetaCache::ClearCache() src/kudu/client/meta_cache.cc:1257:3

  previously allocated by thread T149 (rpc reactor-146) here:
      #0 0x65bc98 in operator new(unsigned long) thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:99:386
      #1 0x7fa7a0ac7bcb in kudu::client::internal::MetaCache::UpdateTabletServerUnlocked(kudu::master::TSInfoPB const&) src/kudu/client/meta_cache.cc:596:48
      #2 0x7fa7a0ad0802 in kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse(kudu::client::KuduTable const*, std::string const&, bool, kudu::master::GetTableLocationsResponsePB const&, kudu::client::internal::MetaCacheEntry*, int) src/kudu/client/meta_cache.cc:1030:7
      #3 0x7fa7a0acf9c0 in kudu::client::internal::MetaCache::ProcessLookupResponse(kudu::client::internal::LookupRpc const&, kudu::client::internal::MetaCacheEntry*, int) src/kudu/client/meta_cache.cc:941:10
      #4 0x7fa7a0ace64e in kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status const&) src/kudu/client/meta_cache.cc:911:31
      #5 0x7fa7a0ab8800 in kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB, kudu::master::GetTableLocationsResponsePB>::SendRpc()::'lambda'()::operator()() const src/kudu/client/master_proxy_rpc.cc:130:26
      #6 0x7fa79b2c1620 in kudu::rpc::OutboundCall::CallCallback() src/kudu/rpc/outbound_call.cc:274:5
      #7 0x7fa79b2c1af0 in kudu::rpc::OutboundCall::SetResponse(std::unique_ptr<kudu::rpc::CallResponse, std::default_delete<kudu::rpc::CallResponse> >) src/kudu/rpc/outbound_call.cc:306:5
      #8 0x7fa79b26ef5e in kudu::rpc::Connection::HandleCallResponse(std::unique_ptr<kudu::rpc::InboundTransfer, std::default_delete<kudu::rpc::InboundTransfer> >) src/kudu/rpc/connection.cc:735:14
      #9 0x7fa79b26e0d6 in kudu::rpc::Connection::ReadHandler(ev::io&, int) src/kudu/rpc/connection.cc:673:7

  SUMMARY: AddressSanitizer: heap-use-after-free src/kudu/gutil/atomicops-internals-x86.h:200:10 in base::subtle::NoBarrier_Load(int const volatile*)
  Shadow bytes around the buggy address:
    0x0c108001d370: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
    0x0c108001d380: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c108001d390: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
    0x0c108001d3a0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c108001d3b0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  =>0x0c108001d3c0: fa fa fa fa[fd]fd fd fd fd fd fd fd fd fd fd fd
    0x0c108001d3d0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
    0x0c108001d3e0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c108001d3f0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
    0x0c108001d400: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
    0x0c108001d410: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa

Change-Id: I03ec9318526fbfc2da9b068eb3bbd9cd996efbca
Reviewed-on: http://gerrit.cloudera.org:8080/16839
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>
(cherry picked from commit ae45cd15c2703e3cfaf62aabb20ba79b961b3135)
  Conflicts:
    src/kudu/client/client-test.cc
    src/kudu/client/client.h
    src/kudu/client/meta_cache.cc
    src/kudu/client/meta_cache.h
(cherry picked from commit b845d2fcbfd3e2c7ac1ae22d5d7fe75ebefb63dc)
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
M src/kudu/client/meta_cache.cc
M src/kudu/client/meta_cache.h
4 files changed, 67 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/16863/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.12.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I03ec9318526fbfc2da9b068eb3bbd9cd996efbca
Gerrit-Change-Number: 16863
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>

[kudu-CR](branch-1.12.x) [client] make metacache reset safe

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16863 )

Change subject: [client] make metacache reset safe
......................................................................


Patch Set 1: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.12.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I03ec9318526fbfc2da9b068eb3bbd9cd996efbca
Gerrit-Change-Number: 16863
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 12 Dec 2020 00:25:20 +0000
Gerrit-HasComments: No

[kudu-CR](branch-1.12.x) [client] make metacache reset safe

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16863 )

Change subject: [client] make metacache reset safe
......................................................................

[client] make metacache reset safe

I noticed that the newly added TxnManagerTest.BeginManyTransactions
test scenario started failing with ASAN heap-use-after-free warnings.

After looking a that, it turned out that the original code was assuming
the cache wouldn't be ever reset before calling the MetaCache's
destructor.  However, changelist 232474a51 introduced a new method
MetaCache::ClearCache() and since then the method is being called upon
altering a table if the partitioning scheme has been updated.

This patch resolves the issue by introducing so-called tablet server
registry that's never reset indeed, where entries in the tablet server
cache are just references to the entries in the registry (they are
raw pointers, actually).

The newly added test scenario was reliably producing AddressSanitizer's
heap-use-after-free warnings every time I ran it using ASAN build.
Below is a snapshot of the relevant traces captured when running the
new test scenario without the changes in the client metacache.

  AddressSanitizer: heap-use-after-free on address 0x608000129e20 at pc 0x00000078bd54 bp 0x7fa731d0b240 sp 0x7fa731d0b238
  READ of size 4 at 0x608000129e20 thread T149 (rpc reactor-146)
      #0 0x78bd53 in base::subtle::NoBarrier_Load(int const volatile*) src/kudu/gutil/atomicops-internals-x86.h:200:10
      #1 0x7fa79520e227 in base::SpinLock::SpinLoop(long, int*) src/kudu/gutil/spinlock.cc:86:10
      #2 0x7fa79520e38b in base::SpinLock::SlowLock() src/kudu/gutil/spinlock.cc:104:25
      #3 0x7fa7a099aab0 in std::unique_lock<kudu::simple_spinlock>::lock() ../../../include/c++/8/bits/std_mutex.h:267:17
      #4 0x7fa7a0991e3e in std::unique_lock<kudu::simple_spinlock>::unique_lock(kudu::simple_spinlock&) ../../../include/c++/8/bits/std_mutex.h:197:2
      #5 0x7fa7a0abfda1 in kudu::client::internal::RemoteTabletServer::InitProxy(kudu::client::KuduClient*, std::function<void (kudu::Status const&)> const&) src/kudu/client/meta_cache.cc:145:39
      #6 0x7fa7a0ac60f5 in kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, kudu::MonoTime const&) src/kudu/client/meta_cache.cc:524:11
      #7 0x7fa7a09b2dcf in kudu::rpc::RetriableRpc<kudu::client::internal::RemoteTabletServer, kudu::tserver::WriteRequestPB, kudu::tserver::WriteResponsePB>::SendRpc() src/kudu/rpc/retriable_rpc.h:163:19
      #8 0x7fa7a09ac6cc in kudu::client::internal::Batcher::FlushBuffer(kudu::client::internal::RemoteTablet*, std::vector<kudu::client::internal::InFlightOp*, std::allocator<kudu::client::internal::InFlightOp*> > const&) src/kudu/client/batcher.cc:911:8
      #9 0x7fa7a09a9e38 in kudu::client::internal::Batcher::FlushBuffersIfReady() src/kudu/client/batcher.cc:884:5
      #10 0x7fa7a09abd2d in kudu::client::internal::Batcher::TabletLookupFinished(kudu::client::internal::InFlightOp*, kudu::Status const&) src/kudu/client/batcher.cc:851:3
      #11 0x7fa7a0acec66 in kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status const&) src/kudu/client/meta_cache.cc:923:3
      #12 0x7fa7a0ab8800 in kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB, kudu::master::GetTableLocationsResponsePB>::SendRpc()::'lambda'()::operator()() const src/kudu/client/master_proxy_rpc.cc:130:26

  0x608000129e20 is located 0 bytes inside of 96-byte region [0x608000129e20,0x608000129e80)
  freed by thread T163 here:
      #0 0x65c650 in operator delete(void*) thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:160:399
      #1 0x7fa7a0aec859 in void STLDeleteContainerPairSecondPointers<std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true> >(std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true>, std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true>) src/kudu/gutil/stl_util.h:199:5
      #2 0x7fa7a0ad96d1 in void STLDeleteValues<std::unordered_map<std::string, kudu::client::internal::RemoteTabletServer*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*> > > >(std::unordered_map<std::string, kudu::client::internal::RemoteTabletServer*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*> > >*) src/kudu/gutil/stl_util.h:400:3
      #3 0x7fa7a0ad4188 in kudu::client::internal::MetaCache::ClearCache() src/kudu/client/meta_cache.cc:1257:3

  previously allocated by thread T149 (rpc reactor-146) here:
      #0 0x65bc98 in operator new(unsigned long) thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:99:386
      #1 0x7fa7a0ac7bcb in kudu::client::internal::MetaCache::UpdateTabletServerUnlocked(kudu::master::TSInfoPB const&) src/kudu/client/meta_cache.cc:596:48
      #2 0x7fa7a0ad0802 in kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse(kudu::client::KuduTable const*, std::string const&, bool, kudu::master::GetTableLocationsResponsePB const&, kudu::client::internal::MetaCacheEntry*, int) src/kudu/client/meta_cache.cc:1030:7
      #3 0x7fa7a0acf9c0 in kudu::client::internal::MetaCache::ProcessLookupResponse(kudu::client::internal::LookupRpc const&, kudu::client::internal::MetaCacheEntry*, int) src/kudu/client/meta_cache.cc:941:10
      #4 0x7fa7a0ace64e in kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status const&) src/kudu/client/meta_cache.cc:911:31
      #5 0x7fa7a0ab8800 in kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB, kudu::master::GetTableLocationsResponsePB>::SendRpc()::'lambda'()::operator()() const src/kudu/client/master_proxy_rpc.cc:130:26
      #6 0x7fa79b2c1620 in kudu::rpc::OutboundCall::CallCallback() src/kudu/rpc/outbound_call.cc:274:5
      #7 0x7fa79b2c1af0 in kudu::rpc::OutboundCall::SetResponse(std::unique_ptr<kudu::rpc::CallResponse, std::default_delete<kudu::rpc::CallResponse> >) src/kudu/rpc/outbound_call.cc:306:5
      #8 0x7fa79b26ef5e in kudu::rpc::Connection::HandleCallResponse(std::unique_ptr<kudu::rpc::InboundTransfer, std::default_delete<kudu::rpc::InboundTransfer> >) src/kudu/rpc/connection.cc:735:14
      #9 0x7fa79b26e0d6 in kudu::rpc::Connection::ReadHandler(ev::io&, int) src/kudu/rpc/connection.cc:673:7

  SUMMARY: AddressSanitizer: heap-use-after-free src/kudu/gutil/atomicops-internals-x86.h:200:10 in base::subtle::NoBarrier_Load(int const volatile*)
  Shadow bytes around the buggy address:
    0x0c108001d370: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
    0x0c108001d380: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c108001d390: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
    0x0c108001d3a0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c108001d3b0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  =>0x0c108001d3c0: fa fa fa fa[fd]fd fd fd fd fd fd fd fd fd fd fd
    0x0c108001d3d0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
    0x0c108001d3e0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c108001d3f0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
    0x0c108001d400: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
    0x0c108001d410: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa

Change-Id: I03ec9318526fbfc2da9b068eb3bbd9cd996efbca
Reviewed-on: http://gerrit.cloudera.org:8080/16839
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>
(cherry picked from commit ae45cd15c2703e3cfaf62aabb20ba79b961b3135)
  Conflicts:
    src/kudu/client/client-test.cc
    src/kudu/client/client.h
    src/kudu/client/meta_cache.cc
    src/kudu/client/meta_cache.h
(cherry picked from commit b845d2fcbfd3e2c7ac1ae22d5d7fe75ebefb63dc)
Reviewed-on: http://gerrit.cloudera.org:8080/16863
Tested-by: Kudu Jenkins
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
M src/kudu/client/meta_cache.cc
M src/kudu/client/meta_cache.h
4 files changed, 67 insertions(+), 15 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/16863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.12.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I03ec9318526fbfc2da9b068eb3bbd9cd996efbca
Gerrit-Change-Number: 16863
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)