You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2017/11/29 05:00:38 UTC

[kudu-CR] transactions: check tablet before checking stopped

Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8673


Change subject: transactions: check tablet before checking stopped
......................................................................

transactions: check tablet before checking stopped

This patch cleans up an unsafe pointer access to Tablet by using the
commonly-used pattern of grabbing a std::shared_ptr<Tablet> and
operating on it within an appropriate scope.

Previously, I saw one failure in multiple thousands of dist-test runs of
an upcoming disk failure test case in exactly_once_writes-itest caused
by this.

Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
---
M src/kudu/tablet/transactions/transaction_driver.cc
1 file changed, 6 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/8673/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] transactions: check tablet before checking stopped

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8673 )

Change subject: transactions: check tablet before checking stopped
......................................................................


Patch Set 2:

> Patch Set 2:
> 
> Hrm, nullptr Tablet in the transaction driver is a case used in tests that shouldn't return errors.
> 
> In prod it doesn't really matter because when it gets to the prepare phase, the Tablet still _must_ exist in order to run (if the Tablet has been destroyed, the prepare pool token will also have been shut down).

But you saw it in a test run? Do you still have the stacktrace and the log?


-- 
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 29 Nov 2017 06:21:00 +0000
Gerrit-HasComments: No

[kudu-CR] transactions: check tablet before checking stopped

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has removed Misty Stanley-Jones from this change.  ( http://gerrit.cloudera.org:8080/8673 )

Change subject: transactions: check tablet before checking stopped
......................................................................


Removed reviewer Misty Stanley-Jones.
-- 
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] transactions: check tablet before checking stopped

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8673 )

Change subject: transactions: check tablet before checking stopped
......................................................................


Patch Set 3: Code-Review+2

thanks for the details!


-- 
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 29 Nov 2017 07:49:12 +0000
Gerrit-HasComments: No

[kudu-CR] transactions: check tablet before checking stopped

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8673 )

Change subject: transactions: check tablet before checking stopped
......................................................................


Patch Set 3:

Failure was a flake of ts_recovery-itest, it seems like.


-- 
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 29 Nov 2017 07:18:16 +0000
Gerrit-HasComments: No

[kudu-CR] transactions: check tablet before checking stopped

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8673 )

Change subject: transactions: check tablet before checking stopped
......................................................................


Patch Set 1: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 29 Nov 2017 05:05:21 +0000
Gerrit-HasComments: No

[kudu-CR] transactions: check tablet before checking stopped

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8673 )

Change subject: transactions: check tablet before checking stopped
......................................................................


Patch Set 2:

Yep, found here: http://dist-test.cloudera.org/job?job_id=awong.1511905845.38852, and also added the un-mangled stack trace in the commit message.

In prod it doesn't matter that the Tablet has been deleted/is null; it only matters that we've handled that case (and don't call HasBeenStopped), which wasn't the case before.


-- 
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 29 Nov 2017 06:51:28 +0000
Gerrit-HasComments: No

[kudu-CR] transactions: check tablet before checking stopped

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8673

to look at the new patch set (#3).

Change subject: transactions: check tablet before checking stopped
......................................................................

transactions: check tablet before checking stopped

This patch cleans up an unsafe pointer access to Tablet by using the
commonly-used pattern of grabbing a std::shared_ptr<Tablet> and
operating on it within an appropriate scope.

Previously, I saw one failure in multiple thousands of dist-test runs of
an upcoming disk failure test case in exactly_once_writes-itest caused
by this:

*** Aborted at 1511907337 (unix time) try "date -d @1511907337" if you are using GNU date ***
PC: @           0x4a327b __tsan_atomic32_compare_exchange_strong
*** SIGSEGV (@0x240) received by PID 11526 (TID 0x7f5a1a6f8700) from PID 576; stack trace: ***
    @           0x44c8e2 __tsan::CallUserSignalHandler() at
    @           0x44d84d rtl_sigaction() at
    @     0x7f5a3a1e9330 (unknown) at
    @           0x4a327b __tsan_atomic32_compare_exchange_strong at
    @     0x7f5a3e95ca74 base::SpinLock::Lock() at
    @     0x7f5a3e95ca19 kudu::simple_spinlock::lock() at
    @     0x7f5a3d42f055 kudu::tablet::Tablet::HasBeenStopped() at
    @     0x7f5a3d547015 kudu::tablet::TransactionDriver::Init() at
    @     0x7f5a3d520441 kudu::tablet::TabletReplica::NewLeaderTransactionDriver() at
    @     0x7f5a3d51fea9 kudu::tablet::TabletReplica::SubmitWrite() at
    @     0x7f5a3ea0edad kudu::tserver::TabletServiceImpl::Write() at
    @     0x7f5a38cea594 kudu::tserver::TabletServerServiceIf::TabletServerServiceIf()::$_3::operator()() at
    @     0x7f5a38cea4c9 _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_3PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_ at
    @     0x7f5a38cea228 std::__1::__function::__func<>::operator()() at
    @     0x7f5a37d6b023 std::__1::function<>::operator()() at
    @     0x7f5a37d69fba kudu::rpc::GeneratedServiceIf::Handle() at
    @     0x7f5a37d6e9f1 kudu::rpc::ServicePool::RunThread() at
    @     0x7f5a37d734a4 boost::_mfi::mf0<>::operator()() at
    @     0x7f5a37d7335e boost::_bi::list1<>::operator()<>() at
    @     0x7f5a37d732a2 boost::_bi::bind_t<>::operator()() at
    @     0x7f5a37d72e91 boost::detail::function::void_function_obj_invoker0<>::invoke() at
    @     0x7f5a37ca0549 boost::function0<>::operator()() at
    @     0x7f5a360b225e kudu::Thread::SuperviseThread() at
    @           0x44c7bc __tsan_thread_start_func at
    @     0x7f5a3a1e1184 start_thread at
    @     0x7f5a32273ffd clone at
    @                0x0 (unknown)

Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
---
M src/kudu/tablet/transactions/transaction_driver.cc
1 file changed, 6 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/8673/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] transactions: check tablet before checking stopped

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8673 )

Change subject: transactions: check tablet before checking stopped
......................................................................

transactions: check tablet before checking stopped

This patch cleans up an unsafe pointer access to Tablet by using the
commonly-used pattern of grabbing a std::shared_ptr<Tablet> and
operating on it within an appropriate scope.

Previously, I saw one failure in multiple thousands of dist-test runs of
an upcoming disk failure test case in exactly_once_writes-itest caused
by this:

*** Aborted at 1511907337 (unix time) try "date -d @1511907337" if you are using GNU date ***
PC: @           0x4a327b __tsan_atomic32_compare_exchange_strong
*** SIGSEGV (@0x240) received by PID 11526 (TID 0x7f5a1a6f8700) from PID 576; stack trace: ***
    @           0x44c8e2 __tsan::CallUserSignalHandler() at
    @           0x44d84d rtl_sigaction() at
    @     0x7f5a3a1e9330 (unknown) at
    @           0x4a327b __tsan_atomic32_compare_exchange_strong at
    @     0x7f5a3e95ca74 base::SpinLock::Lock() at
    @     0x7f5a3e95ca19 kudu::simple_spinlock::lock() at
    @     0x7f5a3d42f055 kudu::tablet::Tablet::HasBeenStopped() at
    @     0x7f5a3d547015 kudu::tablet::TransactionDriver::Init() at
    @     0x7f5a3d520441 kudu::tablet::TabletReplica::NewLeaderTransactionDriver() at
    @     0x7f5a3d51fea9 kudu::tablet::TabletReplica::SubmitWrite() at
    @     0x7f5a3ea0edad kudu::tserver::TabletServiceImpl::Write() at
    @     0x7f5a38cea594 kudu::tserver::TabletServerServiceIf::TabletServerServiceIf()::$_3::operator()() at
    @     0x7f5a38cea4c9 _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_3PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_ at
    @     0x7f5a38cea228 std::__1::__function::__func<>::operator()() at
    @     0x7f5a37d6b023 std::__1::function<>::operator()() at
    @     0x7f5a37d69fba kudu::rpc::GeneratedServiceIf::Handle() at
    @     0x7f5a37d6e9f1 kudu::rpc::ServicePool::RunThread() at
    @     0x7f5a37d734a4 boost::_mfi::mf0<>::operator()() at
    @     0x7f5a37d7335e boost::_bi::list1<>::operator()<>() at
    @     0x7f5a37d732a2 boost::_bi::bind_t<>::operator()() at
    @     0x7f5a37d72e91 boost::detail::function::void_function_obj_invoker0<>::invoke() at
    @     0x7f5a37ca0549 boost::function0<>::operator()() at
    @     0x7f5a360b225e kudu::Thread::SuperviseThread() at
    @           0x44c7bc __tsan_thread_start_func at
    @     0x7f5a3a1e1184 start_thread at
    @     0x7f5a32273ffd clone at
    @                0x0 (unknown)

Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Reviewed-on: http://gerrit.cloudera.org:8080/8673
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <mp...@apache.org>
---
M src/kudu/tablet/transactions/transaction_driver.cc
1 file changed, 6 insertions(+), 2 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Mike Percy: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] transactions: check tablet before checking stopped

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8673 )

Change subject: transactions: check tablet before checking stopped
......................................................................


Patch Set 2:

Hrm, nullptr Tablet in the transaction driver is a case used in tests that shouldn't return errors.

In prod it doesn't really matter because when it gets to the prepare phase, the Tablet still _must_ exist in order to run (if the Tablet has been destroyed, the prepare pool token will also have been shut down).


-- 
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 29 Nov 2017 06:09:21 +0000
Gerrit-HasComments: No

[kudu-CR] transactions: check tablet before checking stopped

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8673

to look at the new patch set (#2).

Change subject: transactions: check tablet before checking stopped
......................................................................

transactions: check tablet before checking stopped

This patch cleans up an unsafe pointer access to Tablet by using the
commonly-used pattern of grabbing a std::shared_ptr<Tablet> and
operating on it within an appropriate scope.

Previously, I saw one failure in multiple thousands of dist-test runs of
an upcoming disk failure test case in exactly_once_writes-itest caused
by this.

Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
---
M src/kudu/tablet/transactions/transaction_driver.cc
1 file changed, 6 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/8673/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>