Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2021/03/04 06:09:01 UTC

[kudu-CR](branch-1.13.x) WIP [client] add test scenario to expose bug in meta-cache

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17152 )


Change subject: WIP [client] add test scenario to expose bug in meta-cache
......................................................................

WIP [client] add test scenario to expose bug in meta-cache

WIP:
  * fix the issue
  * enable the scenario to catch any regressions

This patch adds a scenario reproducing a SIGABRT crash in the Kudu client
when working with scan tokens which contain information about
tablet locations.

The scenario is disabled for now because it simply crashes otherwise.
The bug doesn't manifest itself as a crash in Kudu 1.14 and the current
upstream HEAD because the crash culprit (the FindOrDie() call)
was changed into FindOrNull() with changelist
https://github.com/apache/kudu/commit/2a558768f8aa00068e72ccd1327081f07ba46b03

I haven't yet checked whether there are any other issues due to stale
scan tokens in Kudu 1.14, but at least Kudu client doesn't crash with
this scenario.

The crash stack is the following on macOS:

  * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430
    frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120
    frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail() at logging.cc:1474:3
    frame #4: 0x000000010ca19121 libglog.0.dylib`google::LogMessage::SendToLog() [inlined] google::LogMessage::Fail() at logging.cc:1488:3
    frame #5: 0x000000010ca1911b libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442
    frame #6: 0x000000010ca19815 libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5
    frame #7: 0x000000010ca1d76f libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2023:5
    frame #8: 0x000000010ca1a5f9 libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2022:37
    frame #9: 0x0000000103e365e3 libkudu_client.dylib`std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > >::mapped_type& FindOrDie<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3
    frame #10: 0x0000000103e34cbb libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse() at meta_cache.cc:943:23
    frame #11: 0x0000000103e86166 libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at scan_token-internal.cc:192:35
    frame #12: 0x0000000103e88051 libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner() at scan_token-internal.cc:111:10
    frame #13: 0x0000000103d55d3c libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at client.cc:1879:10

Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
---
M src/kudu/client/scan_token-test.cc
1 file changed, 129 insertions(+), 14 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/17152/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17152
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR](branch-1.13.x) KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17152 )

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc@932
PS6, Line 932:       if (remote.get() != nullptr) {
> Yep: in pre-KUDU-1802 case, the crash could only happen if the range partit
A small update: in the pre-KUDU-1802 case, a crash could only happen if a range was dropped and replaced with the same range again _and_ the UUID of the new tablet was the same as that of the old one.  The latter is virtually impossible because the new tablet's UUID would differ.  So, in the pre-KUDU-1802 case this would not trigger.




Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 07 Mar 2021 15:45:23 +0000
Gerrit-HasComments: Yes

[kudu-CR](branch-1.13.x) KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17152 )

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................


Patch Set 7: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/scan_token-test.cc
File src/kudu/client/scan_token-test.cc:

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/scan_token-test.cc@1052
PS6, Line 1052:       expected_row_cou
> OK, I guess keeping it uniform makes more sense: replaced with FALLTHROUGH_
Thanks for the follow-up!




Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 08 Mar 2021 06:22:48 +0000
Gerrit-HasComments: Yes

[kudu-CR](branch-1.13.x) KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17152

to look at the new patch set (#6).

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................

KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

This patch fixes an issue resulting in a SIGABRT crash in Kudu client
when working with stale scan tokens which contain information about
tablet locations for a table (see KUDU-1802) whose range partition
was dropped.  The patch also adds a test scenario reproducing the crash;
now it passes and can catch future regressions.

This patch is a follow-up to d23ee5d38ddc4317f431dd65df0c825c00cc968a.

Before the change in src/kudu/client/meta_cache.cc was back-ported from
Kudu 1.14 as part of this fix, the scenario crashed with SIGABRT,
producing a stack trace similar to the following (this one
was captured on macOS):

  * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430
    frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120
    frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail() at logging.cc:1474:3
    frame #4: 0x000000010ca19121 libglog.0.dylib`google::LogMessage::SendToLog() [inlined] google::LogMessage::Fail() at logging.cc:1488:3
    frame #5: 0x000000010ca1911b libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442
    frame #6: 0x000000010ca19815 libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5
    frame #7: 0x000000010ca1d76f libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2023:5
    frame #8: 0x000000010ca1a5f9 libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2022:37
    frame #9: 0x0000000103e365e3 libkudu_client.dylib`std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > >::mapped_type& FindOrDie<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3
    frame #10: 0x0000000103e34cbb libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse() at meta_cache.cc:943:23
    frame #11: 0x0000000103e86166 libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at scan_token-internal.cc:192:35
    frame #12: 0x0000000103e88051 libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner() at scan_token-internal.cc:111:10
    frame #13: 0x0000000103d55d3c libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at client.cc:1879:10

Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
---
M src/kudu/client/meta_cache.cc
M src/kudu/client/scan_token-test.cc
2 files changed, 265 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/17152/6

Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR](branch-1.13.x) WIP [client] add test scenario to expose bug in meta-cache

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17152

to look at the new patch set (#2).

Change subject: WIP [client] add test scenario to expose bug in meta-cache
......................................................................

WIP [client] add test scenario to expose bug in meta-cache

DONT_BUILD

WIP:
  * fix the issue
  * enable the scenario to catch any regressions

This patch adds a scenario reproducing a SIGABRT crash in the Kudu client
when working with scan tokens which contain information about
tablet locations.

The scenario is disabled for now because it simply crashes otherwise.
The bug doesn't manifest itself as a crash in Kudu 1.14 and the current
upstream HEAD because the crash culprit (the FindOrDie() call)
was changed with changelist
https://github.com/apache/kudu/commit/2a558768f8aa00068e72ccd1327081f07ba46b03

I haven't yet checked whether there are any other issues due to stale
scan tokens in Kudu 1.14, but at least Kudu client doesn't crash with
this scenario.

The crash stack is the following on macOS:

  * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430
    frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120
    frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail() at logging.cc:1474:3
    frame #4: 0x000000010ca19121 libglog.0.dylib`google::LogMessage::SendToLog() [inlined] google::LogMessage::Fail() at logging.cc:1488:3
    frame #5: 0x000000010ca1911b libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442
    frame #6: 0x000000010ca19815 libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5
    frame #7: 0x000000010ca1d76f libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2023:5
    frame #8: 0x000000010ca1a5f9 libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2022:37
    frame #9: 0x0000000103e365e3 libkudu_client.dylib`std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > >::mapped_type& FindOrDie<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3
    frame #10: 0x0000000103e34cbb libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse() at meta_cache.cc:943:23
    frame #11: 0x0000000103e86166 libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at scan_token-internal.cc:192:35
    frame #12: 0x0000000103e88051 libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner() at scan_token-internal.cc:111:10
    frame #13: 0x0000000103d55d3c libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at client.cc:1879:10

Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
---
M src/kudu/client/scan_token-test.cc
1 file changed, 129 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/17152/2

Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR](branch-1.13.x) KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17152 )

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................


Patch Set 7: Verified+1

unrelated build failures



Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 07 Mar 2021 15:45:45 +0000
Gerrit-HasComments: No

[kudu-CR](branch-1.13.x) KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17152 )

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................


Patch Set 6: Verified+1



Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 06 Mar 2021 22:51:33 +0000
Gerrit-HasComments: No

[kudu-CR](branch-1.13.x) KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17152 )

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................


Patch Set 6:

(5 comments)

> (2 comments)
 > 
 > Thanks for the fix and test. This test should also be added to
 > master right?
 > 
 > It looks like there are test failures due to unrelated python infra
 > issues on the branch.

Yep, I'm planning to back-port the test scenario to the main branch from 1.13.x: the 'fix' is already there.

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc@932
PS6, Line 932:       if (remote.get() != nullptr) {
> Ah, so this issue only exists if a range partition was dropped and then rep
Yep: in the pre-KUDU-1802 case, the crash could only happen if the range partition was dropped and then replaced with the same range again.  After KUDU-1802, the crash happened even if a range was just dropped, when the client instance was fed 'stale' scan tokens.

In the test scenario, the RANGE_DROPPED case is the crux of the issue; the rest are just variations on top.  I added sub-scenarios for larger and smaller ranges just to explicitly document how that case works w.r.t. discovering the corresponding 'mapped' ranges, so it's clear which rows are read given a set of 'stale' scan tokens.
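
As a rough illustration of the RANGE_DROPPED case, here is a toy model in C++.  It is only a sketch under assumptions: ToyMetaCache, AddTablet(), DropRange() and ProcessTokenLocation() are hypothetical names, and the premise that a by-ID entry may outlive the by-key entry is a simplification for illustration, not a description of the actual MetaCache internals.  It shows why an unconditional lookup by the tablet's lower bound is unsafe once a 'stale' token refers to a dropped range.

```cpp
#include <map>
#include <string>

// Toy model only; NOT the real kudu::client::internal::MetaCache.
struct ToyMetaCache {
  std::map<std::string, std::string> tablets_by_id;   // tablet ID -> lower bound
  std::map<std::string, std::string> tablets_by_key;  // lower bound -> tablet ID

  // Register a tablet in both indexes.
  void AddTablet(const std::string& id, const std::string& lower_bound) {
    tablets_by_id[id] = lower_bound;
    tablets_by_key[lower_bound] = id;
  }

  // Assumption: dropping a range removes the by-key entry while the
  // by-ID entry is still around.
  void DropRange(const std::string& lower_bound) {
    tablets_by_key.erase(lower_bound);
  }

  // Process a tablet location carried by a (possibly stale) scan token.
  // Returns false when the location is skipped because the by-key entry
  // is gone; a FindOrDie()-style lookup would abort at that point.
  bool ProcessTokenLocation(const std::string& id,
                            const std::string& lower_bound) {
    if (tablets_by_id.count(id) == 0) {
      AddTablet(id, lower_bound);  // tablet not seen before
      return true;
    }
    auto it = tablets_by_key.find(lower_bound);
    if (it == tablets_by_key.end()) {
      return false;  // stale location: nothing to refresh
    }
    it->second = id;  // refresh the existing entry
    return true;
  }
};
```

In this model, feeding a token for tablet "t1" after DropRange() on its lower bound returns false instead of terminating the process.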


http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc@943
PS6, Line 943:         auto* entry = FindOrNull(tablets_by_key, tablet_lower_bound);
> I generally disagree that we shouldn't use FindOrDie in production code. It
Yep, I agree: FindOrDie() should be used when there isn't a way to gracefully handle the situation otherwise.  For example, in many cases continuing further could damage the integrity of the data or even worse -- that's definitely a case to use FindOrDie().  In all other cases, it makes sense to use FindOrNull() or the like, sure.

FWIW, I think it makes sense to revise the usage of FindOrDie() in the code since I suspect that in many cases FindOrDie() is used only because it was easier to write it like that and not handle the error otherwise, even if it were possible to handle the error condition gracefully.
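
To make the distinction concrete, here is a minimal sketch of the two lookup styles.  These are simplified stand-ins, not the actual gutil map-util.h code: FindOrNullSketch(), FindOrDieSketch() and LookupTabletId() are hypothetical names used only for illustration.

```cpp
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>

// Returns a pointer to the mapped value, or nullptr if the key is absent.
template <typename Map>
const typename Map::mapped_type* FindOrNullSketch(
    const Map& m, const typename Map::key_type& key) {
  auto it = m.find(key);
  return it == m.end() ? nullptr : &it->second;
}

// Returns a reference to the mapped value; aborts if the key is absent.
template <typename Map>
const typename Map::mapped_type& FindOrDieSketch(
    const Map& m, const typename Map::key_type& key) {
  auto it = m.find(key);
  if (it == m.end()) {
    std::fprintf(stderr, "key not found\n");
    std::abort();  // terminates the process with SIGABRT
  }
  return it->second;
}

// Hypothetical caller that handles a missing key gracefully:
// reports 'not found' to its own caller instead of crashing.
inline bool LookupTabletId(
    const std::map<std::string, std::string>& tablets_by_key,
    const std::string& lower_bound,
    std::string* tablet_id) {
  const std::string* entry = FindOrNullSketch(tablets_by_key, lower_bound);
  if (entry == nullptr) {
    return false;
  }
  *tablet_id = *entry;
  return true;
}
```

With the pointer-returning variant the caller decides what a missing key means; with the aborting variant the whole process goes down, which is only appropriate when the key is guaranteed to be present.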


http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc@955
PS6, Line 955: 							
> nit: replace with spaces
Done


http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/scan_token-test.cc
File src/kudu/client/scan_token-test.cc:

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/scan_token-test.cc@950
PS6, Line 950:        
> nit: spacing
Done


http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/scan_token-test.cc@1052
PS6, Line 1052:       [[fallthrough]];
> nit: maybe use FALLTHROUGH_INTENDED? Or maybe we should deprecate that in f
OK, I guess keeping it uniform makes more sense: replaced with FALLTHROUGH_INTENDED.  I think I can post a separate patch replacing FALLTHROUGH_INTENDED with [[fallthrough]].
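
For reference, a small sketch of the two spellings.  SKETCH_FALLTHROUGH is a hypothetical stand-in for a project-wide macro such as FALLTHROUGH_INTENDED (its real definition may differ), and ExpectedRows() is a made-up function, not code from scan_token-test.cc.

```cpp
// Hypothetical macro: use the standard attribute when the compiler
// reports support for it, otherwise expand to a no-op statement.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(fallthrough)
#    define SKETCH_FALLTHROUGH [[fallthrough]]
#  endif
#endif
#ifndef SKETCH_FALLTHROUGH
#  define SKETCH_FALLTHROUGH do {} while (0)
#endif

// Made-up example: case 0 intentionally falls through into case 1.
inline int ExpectedRows(int scenario) {
  int rows = 0;
  switch (scenario) {
    case 0:
      rows += 10;
      SKETCH_FALLTHROUGH;  // marks the fall-through as deliberate
    case 1:
      rows += 5;
      break;
    default:
      break;
  }
  return rows;
}
```

Either spelling documents the intent and silences implicit fall-through warnings where the attribute is supported; the macro additionally keeps older compilers building.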




Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 07 Mar 2021 07:18:25 +0000
Gerrit-HasComments: Yes

[kudu-CR](branch-1.13.x) WIP [client] add test scenario to expose bug in meta-cache

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17152

to look at the new patch set (#4).

Change subject: WIP [client] add test scenario to expose bug in meta-cache
......................................................................

WIP [client] add test scenario to expose bug in meta-cache

DONT_BUILD

WIP:
  * file JIRA ticket and use it in the commit description
  * fix the issue
  * enable the scenario to catch any regressions

This patch adds a scenario reproducing a SIGABRT crash in Kudu client
when working with stale scan tokens which contain information about
tablet locations for a table whose range partition was dropped.

Tablet location information is put into scan tokens since addressing
KUDU-1802 with d23ee5d38ddc4317f431dd65df0c825c00cc968a.

As of now, the scenario crashes with SIGABRT when running.  BTW,
the issue doesn't manifest itself in a crash in Kudu 1.14 and current
HEAD version in upstream because the crash culprit (FindOrDie() call)
was removed with changelist 2a558768f8aa00068e72ccd1327081f07ba46b03.

I haven't yet checked whether there are any other issues due to stale
scan tokens.

The crash stack is the following on macOS:

  * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430
    frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120
    frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail() at logging.cc:1474:3
    frame #4: 0x000000010ca19121 libglog.0.dylib`google::LogMessage::SendToLog() [inlined] google::LogMessage::Fail() at logging.cc:1488:3
    frame #5: 0x000000010ca1911b libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442
    frame #6: 0x000000010ca19815 libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5
    frame #7: 0x000000010ca1d76f libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2023:5
    frame #8: 0x000000010ca1a5f9 libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2022:37
    frame #9: 0x0000000103e365e3 libkudu_client.dylib`std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > >::mapped_type& FindOrDie<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3
    frame #10: 0x0000000103e34cbb libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse() at meta_cache.cc:943:23
    frame #11: 0x0000000103e86166 libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at scan_token-internal.cc:192:35
    frame #12: 0x0000000103e88051 libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner() at scan_token-internal.cc:111:10
    frame #13: 0x0000000103d55d3c libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at client.cc:1879:10

Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
---
M src/kudu/client/scan_token-test.cc
1 file changed, 129 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/17152/4

Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR](branch-1.13.x) KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................


Removed Verified-1 by Kudu Jenkins (120)

Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR](branch-1.13.x) KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17152 )

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................


Patch Set 6: Code-Review+1

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc@943
PS6, Line 943:         auto* entry = FindOrNull(tablets_by_key, tablet_lower_bound);
> Might be a good idea to audit the codebase for other usages of FindOrDie. W
I generally disagree that we shouldn't use FindOrDie in production code. It has its place in the codebase for areas where we are provably guaranteed that the element exists. That said, it is definitely important upon reviewing and in testing that we ensure that such a guarantee exists (and isn't reliant on assumptions that can be broken). Well used, FindOrDie has the nice property of telling readers that such a guarantee does exist, similar to our usages of other *OrDie methods.


http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc@955
PS6, Line 955: 							
nit: replace with spaces


http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/scan_token-test.cc
File src/kudu/client/scan_token-test.cc:

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/scan_token-test.cc@950
PS6, Line 950:        
nit: spacing


http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/scan_token-test.cc@1052
PS6, Line 1052:       [[fallthrough]];
nit: maybe use FALLTHROUGH_INTENDED? Or maybe we should deprecate that in favor of [[fallthrough]].




Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 06 Mar 2021 23:33:13 +0000
Gerrit-HasComments: Yes

[kudu-CR](branch-1.13.x) KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17152 )

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................


Patch Set 6: Code-Review+1

(2 comments)

Thanks for the fix and test. This test should also be added to master, right?

It looks like there are test failures due to unrelated python infra issues on the branch.

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc@932
PS6, Line 932:       if (remote.get() != nullptr) {
Ah, so this issue only exists if a range partition was dropped and then replaced with a new range that overlaps the dropped range or replaces the dropped range?


http://gerrit.cloudera.org:8080/#/c/17152/6/src/kudu/client/meta_cache.cc@943
PS6, Line 943:         auto* entry = FindOrNull(tablets_by_key, tablet_lower_bound);
Might be a good idea to audit the codebase for other usages of FindOrDie. We probably don't want that anywhere in production code if possible. Not related specifically to this patch, just a passing comment.




Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 06 Mar 2021 22:51:27 +0000
Gerrit-HasComments: Yes

[kudu-CR](branch-1.13.x) KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has removed a vote on this change.

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................


Removed Verified-1 by Kudu Jenkins (120)

Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR](branch-1.13.x) WIP [client] add test scenario to expose bug in meta-cache

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17152

to look at the new patch set (#3).

Change subject: WIP [client] add test scenario to expose bug in meta-cache
......................................................................

WIP [client] add test scenario to expose bug in meta-cache

DONT_BUILD

WIP:
  * file JIRA ticket and use it in the commit description
  * fix the issue
  * enable the scenario to catch any regressions

This patch adds a scenario reproducing a SIGABRT crash in Kudu client
when working with stale scan tokens which contain information about
tablet locations for a table whose range partition was dropped.

Tablet location information has been included in scan tokens since
KUDU-1802 was addressed with d23ee5d38ddc4317f431dd65df0c825c00cc968a.

As of now, the scenario crashes with SIGABRT when running.  BTW,
the issue doesn't manifest itself in a crash in Kudu 1.14 and current
HEAD version in upstream because the crash culprit (FindOrDie() call)
was removed with changelist 2a558768f8aa00068e72ccd1327081f07ba46b03.

I haven't yet checked whether there are any other issues due to stale
scan tokens.

The crash stack is the following on macOS:

  * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430
    frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120
    frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail() at logging.cc:1474:3
    frame #4: 0x000000010ca19121 libglog.0.dylib`google::LogMessage::SendToLog() [inlined] google::LogMessage::Fail() at logging.cc:1488:3
    frame #5: 0x000000010ca1911b libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442
    frame #6: 0x000000010ca19815 libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5
    frame #7: 0x000000010ca1d76f libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2023:5
    frame #8: 0x000000010ca1a5f9 libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2022:37
    frame #9: 0x0000000103e365e3 libkudu_client.dylib`std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > >::mapped_type& FindOrDie<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3
    frame #10: 0x0000000103e34cbb libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse() at meta_cache.cc:943:23
    frame #11: 0x0000000103e86166 libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at scan_token-internal.cc:192:35
    frame #12: 0x0000000103e88051 libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner() at scan_token-internal.cc:111:10
    frame #13: 0x0000000103d55d3c libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at client.cc:1879:10

Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
---
M src/kudu/client/scan_token-test.cc
1 file changed, 129 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/17152/3

Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR](branch-1.13.x) KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17152

to look at the new patch set (#7).

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................

KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

This patch fixes an issue resulting in a SIGABRT crash in Kudu client
when working with stale scan tokens which contain information about
tablet locations for a table (see KUDU-1802) whose range partition
was dropped.  The patch also adds a test scenario reproducing the crash;
now it passes and can catch future regressions.
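
The failure mode described above can be sketched as follows. This is a simplified, hypothetical model, not the code in meta_cache.cc: the names EntrySketch, KeyIndexSketch, RefreshResult, and RefreshFromToken are invented here. What the real client does when the by-key lookup misses is not shown in this thread, so the sketch simply reports the miss to the caller.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical cache entry for one tablet's key range.
struct EntrySketch {
  std::string tablet_id;
  int refresh_count = 0;
};

// Hypothetical index of cached entries, keyed by partition lower bound.
using KeyIndexSketch = std::map<std::string, EntrySketch>;

enum class RefreshResult { kRefreshed, kStaleLocation };

// Applies one tablet location taken from a scan token. If the lower bound is
// no longer indexed (for example, because its range partition was dropped),
// the location is reported as stale instead of aborting the process.
inline RefreshResult RefreshFromToken(KeyIndexSketch& by_key,
                                      const std::string& lower_bound,
                                      const std::string& tablet_id) {
  auto it = by_key.find(lower_bound);
  if (it == by_key.end()) {
    // A lookup that aborts on a missing key would crash here.
    return RefreshResult::kStaleLocation;
  }
  it->second.tablet_id = tablet_id;
  ++it->second.refresh_count;
  return RefreshResult::kRefreshed;
}
```

A caller that receives kStaleLocation in this model could, for instance, fall back to a fresh location lookup; whether the actual client does so is an assumption outside the scope of this sketch.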

This patch is a follow-up to d23ee5d38ddc4317f431dd65df0c825c00cc968a.

Before the change in src/kudu/client/meta_cache.cc was back-ported from
Kudu 1.14 as part of this fix, the scenario crashed with SIGABRT when
running, with a stack trace similar to the following (this one was
captured on macOS):

  * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430
    frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120
    frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail() at logging.cc:1474:3
    frame #4: 0x000000010ca19121 libglog.0.dylib`google::LogMessage::SendToLog() [inlined] google::LogMessage::Fail() at logging.cc:1488:3
    frame #5: 0x000000010ca1911b libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442
    frame #6: 0x000000010ca19815 libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5
    frame #7: 0x000000010ca1d76f libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2023:5
    frame #8: 0x000000010ca1a5f9 libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2022:37
    frame #9: 0x0000000103e365e3 libkudu_client.dylib`std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > >::mapped_type& FindOrDie<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3
    frame #10: 0x0000000103e34cbb libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse() at meta_cache.cc:943:23
    frame #11: 0x0000000103e86166 libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at scan_token-internal.cc:192:35
    frame #12: 0x0000000103e88051 libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner() at scan_token-internal.cc:111:10
    frame #13: 0x0000000103d55d3c libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at client.cc:1879:10

Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
---
M src/kudu/client/meta_cache.cc
M src/kudu/client/scan_token-test.cc
2 files changed, 264 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/17152/7

Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR](branch-1.13.x) KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17152

to look at the new patch set (#5).

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................

KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

This patch fixes an issue resulting in a SIGABRT crash in Kudu client
when working with stale scan tokens which contain information about
tablet locations for a table (see KUDU-1802) whose range partition
was dropped.  The patch also adds a test scenario reproducing the crash;
now it passes and can catch future regressions.

This patch is a follow-up to d23ee5d38ddc4317f431dd65df0c825c00cc968a.

Prior to the change in src/kudu/client/meta_cache.cc back-ported from
Kudu 1.14, the scenario crashed with SIGABRT when running, with a stack
trace similar to the following (this one was captured on macOS):

  * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430
    frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120
    frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail() at logging.cc:1474:3
    frame #4: 0x000000010ca19121 libglog.0.dylib`google::LogMessage::SendToLog() [inlined] google::LogMessage::Fail() at logging.cc:1488:3
    frame #5: 0x000000010ca1911b libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442
    frame #6: 0x000000010ca19815 libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5
    frame #7: 0x000000010ca1d76f libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2023:5
    frame #8: 0x000000010ca1a5f9 libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2022:37
    frame #9: 0x0000000103e365e3 libkudu_client.dylib`std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > >::mapped_type& FindOrDie<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3
    frame #10: 0x0000000103e34cbb libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse() at meta_cache.cc:943:23
    frame #11: 0x0000000103e86166 libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at scan_token-internal.cc:192:35
    frame #12: 0x0000000103e88051 libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner() at scan_token-internal.cc:111:10
    frame #13: 0x0000000103d55d3c libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at client.cc:1879:10

Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
---
M src/kudu/client/meta_cache.cc
M src/kudu/client/scan_token-test.cc
2 files changed, 264 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/17152/5

Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR](branch-1.13.x) KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17152 )

Change subject: KUDU-3254 fix bug in meta-cache exposed by KUDU-1802
......................................................................

KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

This patch fixes an issue resulting in a SIGABRT crash in Kudu client
when working with stale scan tokens which contain information about
tablet locations for a table (see KUDU-1802) whose range partition
was dropped.  The patch also adds a test scenario reproducing the crash;
now it passes and can catch future regressions.

This patch is a follow-up to d23ee5d38ddc4317f431dd65df0c825c00cc968a.

Before the change in src/kudu/client/meta_cache.cc was back-ported from
Kudu 1.14 as part of this fix, the scenario crashed with SIGABRT when
running, with a stack trace similar to the following (this one was
captured on macOS):

  * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430
    frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120
    frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail() at logging.cc:1474:3
    frame #4: 0x000000010ca19121 libglog.0.dylib`google::LogMessage::SendToLog() [inlined] google::LogMessage::Fail() at logging.cc:1488:3
    frame #5: 0x000000010ca1911b libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442
    frame #6: 0x000000010ca19815 libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5
    frame #7: 0x000000010ca1d76f libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2023:5
    frame #8: 0x000000010ca1a5f9 libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2022:37
    frame #9: 0x0000000103e365e3 libkudu_client.dylib`std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > >::mapped_type& FindOrDie<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3
    frame #10: 0x0000000103e34cbb libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse() at meta_cache.cc:943:23
    frame #11: 0x0000000103e86166 libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at scan_token-internal.cc:192:35
    frame #12: 0x0000000103e88051 libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner() at scan_token-internal.cc:111:10
    frame #13: 0x0000000103d55d3c libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at client.cc:1879:10

Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Reviewed-on: http://gerrit.cloudera.org:8080/17152
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M src/kudu/client/meta_cache.cc
M src/kudu/client/scan_token-test.cc
2 files changed, 264 insertions(+), 18 deletions(-)

Approvals:
  Alexey Serbin: Verified
  Andrew Wong: Looks good to me, approved


Gerrit-Project: kudu
Gerrit-Branch: branch-1.13.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Gerrit-Change-Number: 17152
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)