You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2017/11/29 05:00:38 UTC
[kudu-CR] transactions: check tablet before checking stopped
Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8673
Change subject: transactions: check tablet before checking stopped
......................................................................
transactions: check tablet before checking stopped
This patch cleans up an unsafe pointer access to Tablet by using the
commonly-used pattern of grabbing a std::shared_ptr<Tablet> and
operating on it within an appropriate scope.
Previously, I saw one failure in multiple thousands of dist-test runs of
an upcoming disk failure test case in exactly_once_writes-itest caused
by this.
Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
---
M src/kudu/tablet/transactions/transaction_driver.cc
1 file changed, 6 insertions(+), 2 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/8673/1
--
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
[kudu-CR] transactions: check tablet before checking stopped
Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8673 )
Change subject: transactions: check tablet before checking stopped
......................................................................
Patch Set 2:
> Patch Set 2:
>
> Hrm, nullptr Tablet in the transaction driver is a case used in tests that shouldn't return errors.
>
> In prod it doesn't really matter because when it gets to the prepare phase, the Tablet still _must_ exist in order to run (if the Tablet has been destroyed, the prepare pool token will also have been shut down).
But you saw it in a test run? Do you still have the stacktrace and the log?
--
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 29 Nov 2017 06:21:00 +0000
Gerrit-HasComments: No
[kudu-CR] transactions: check tablet before checking stopped
Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has removed Misty Stanley-Jones from this change. ( http://gerrit.cloudera.org:8080/8673 )
Change subject: transactions: check tablet before checking stopped
......................................................................
Removed reviewer Misty Stanley-Jones.
--
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] transactions: check tablet before checking stopped
Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8673 )
Change subject: transactions: check tablet before checking stopped
......................................................................
Patch Set 3: Code-Review+2
thanks for the details!
--
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 29 Nov 2017 07:49:12 +0000
Gerrit-HasComments: No
[kudu-CR] transactions: check tablet before checking stopped
Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8673 )
Change subject: transactions: check tablet before checking stopped
......................................................................
Patch Set 3:
Failure was a flake of ts_recovery-itest, it seems like.
--
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 29 Nov 2017 07:18:16 +0000
Gerrit-HasComments: No
[kudu-CR] transactions: check tablet before checking stopped
Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8673 )
Change subject: transactions: check tablet before checking stopped
......................................................................
Patch Set 1: Code-Review+2
--
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 29 Nov 2017 05:05:21 +0000
Gerrit-HasComments: No
[kudu-CR] transactions: check tablet before checking stopped
Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8673 )
Change subject: transactions: check tablet before checking stopped
......................................................................
Patch Set 2:
Yep, found here: http://dist-test.cloudera.org/job?job_id=awong.1511905845.38852, and also added the un-mangled stack trace in the commit message.
In prod it doesn't matter that the Tablet has been deleted/is null; it only matters that we've handled that case (and don't call HasBeenStopped), which wasn't the case before.
--
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 29 Nov 2017 06:51:28 +0000
Gerrit-HasComments: No
[kudu-CR] transactions: check tablet before checking stopped
Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/8673
to look at the new patch set (#3).
Change subject: transactions: check tablet before checking stopped
......................................................................
transactions: check tablet before checking stopped
This patch cleans up an unsafe pointer access to Tablet by using the
commonly-used pattern of grabbing a std::shared_ptr<Tablet> and
operating on it within an appropriate scope.
Previously, I saw one failure in multiple thousands of dist-test runs of
an upcoming disk failure test case in exactly_once_writes-itest caused
by this:
*** Aborted at 1511907337 (unix time) try "date -d @1511907337" if you are using GNU date ***
PC: @ 0x4a327b __tsan_atomic32_compare_exchange_strong
*** SIGSEGV (@0x240) received by PID 11526 (TID 0x7f5a1a6f8700) from PID 576; stack trace: ***
@ 0x44c8e2 __tsan::CallUserSignalHandler() at
@ 0x44d84d rtl_sigaction() at
@ 0x7f5a3a1e9330 (unknown) at
@ 0x4a327b __tsan_atomic32_compare_exchange_strong at
@ 0x7f5a3e95ca74 base::SpinLock::Lock() at
@ 0x7f5a3e95ca19 kudu::simple_spinlock::lock() at
@ 0x7f5a3d42f055 kudu::tablet::Tablet::HasBeenStopped() at
@ 0x7f5a3d547015 kudu::tablet::TransactionDriver::Init() at
@ 0x7f5a3d520441 kudu::tablet::TabletReplica::NewLeaderTransactionDriver() at
@ 0x7f5a3d51fea9 kudu::tablet::TabletReplica::SubmitWrite() at
@ 0x7f5a3ea0edad kudu::tserver::TabletServiceImpl::Write() at
@ 0x7f5a38cea594 kudu::tserver::TabletServerServiceIf::TabletServerServiceIf()::$_3::operator()() at
@ 0x7f5a38cea4c9 _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_3PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_ at
@ 0x7f5a38cea228 std::__1::__function::__func<>::operator()() at
@ 0x7f5a37d6b023 std::__1::function<>::operator()() at
@ 0x7f5a37d69fba kudu::rpc::GeneratedServiceIf::Handle() at
@ 0x7f5a37d6e9f1 kudu::rpc::ServicePool::RunThread() at
@ 0x7f5a37d734a4 boost::_mfi::mf0<>::operator()() at
@ 0x7f5a37d7335e boost::_bi::list1<>::operator()<>() at
@ 0x7f5a37d732a2 boost::_bi::bind_t<>::operator()() at
@ 0x7f5a37d72e91 boost::detail::function::void_function_obj_invoker0<>::invoke() at
@ 0x7f5a37ca0549 boost::function0<>::operator()() at
@ 0x7f5a360b225e kudu::Thread::SuperviseThread() at
@ 0x44c7bc __tsan_thread_start_func at
@ 0x7f5a3a1e1184 start_thread at
@ 0x7f5a32273ffd clone at
@ 0x0 (unknown)
Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
---
M src/kudu/tablet/transactions/transaction_driver.cc
1 file changed, 6 insertions(+), 2 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/8673/3
--
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
[kudu-CR] transactions: check tablet before checking stopped
Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8673 )
Change subject: transactions: check tablet before checking stopped
......................................................................
transactions: check tablet before checking stopped
This patch cleans up an unsafe pointer access to Tablet by using the
commonly-used pattern of grabbing a std::shared_ptr<Tablet> and
operating on it within an appropriate scope.
Previously, I saw one failure in multiple thousands of dist-test runs of
an upcoming disk failure test case in exactly_once_writes-itest caused
by this:
*** Aborted at 1511907337 (unix time) try "date -d @1511907337" if you are using GNU date ***
PC: @ 0x4a327b __tsan_atomic32_compare_exchange_strong
*** SIGSEGV (@0x240) received by PID 11526 (TID 0x7f5a1a6f8700) from PID 576; stack trace: ***
@ 0x44c8e2 __tsan::CallUserSignalHandler() at
@ 0x44d84d rtl_sigaction() at
@ 0x7f5a3a1e9330 (unknown) at
@ 0x4a327b __tsan_atomic32_compare_exchange_strong at
@ 0x7f5a3e95ca74 base::SpinLock::Lock() at
@ 0x7f5a3e95ca19 kudu::simple_spinlock::lock() at
@ 0x7f5a3d42f055 kudu::tablet::Tablet::HasBeenStopped() at
@ 0x7f5a3d547015 kudu::tablet::TransactionDriver::Init() at
@ 0x7f5a3d520441 kudu::tablet::TabletReplica::NewLeaderTransactionDriver() at
@ 0x7f5a3d51fea9 kudu::tablet::TabletReplica::SubmitWrite() at
@ 0x7f5a3ea0edad kudu::tserver::TabletServiceImpl::Write() at
@ 0x7f5a38cea594 kudu::tserver::TabletServerServiceIf::TabletServerServiceIf()::$_3::operator()() at
@ 0x7f5a38cea4c9 _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_3PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_ at
@ 0x7f5a38cea228 std::__1::__function::__func<>::operator()() at
@ 0x7f5a37d6b023 std::__1::function<>::operator()() at
@ 0x7f5a37d69fba kudu::rpc::GeneratedServiceIf::Handle() at
@ 0x7f5a37d6e9f1 kudu::rpc::ServicePool::RunThread() at
@ 0x7f5a37d734a4 boost::_mfi::mf0<>::operator()() at
@ 0x7f5a37d7335e boost::_bi::list1<>::operator()<>() at
@ 0x7f5a37d732a2 boost::_bi::bind_t<>::operator()() at
@ 0x7f5a37d72e91 boost::detail::function::void_function_obj_invoker0<>::invoke() at
@ 0x7f5a37ca0549 boost::function0<>::operator()() at
@ 0x7f5a360b225e kudu::Thread::SuperviseThread() at
@ 0x44c7bc __tsan_thread_start_func at
@ 0x7f5a3a1e1184 start_thread at
@ 0x7f5a32273ffd clone at
@ 0x0 (unknown)
Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Reviewed-on: http://gerrit.cloudera.org:8080/8673
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <mp...@apache.org>
---
M src/kudu/tablet/transactions/transaction_driver.cc
1 file changed, 6 insertions(+), 2 deletions(-)
Approvals:
Kudu Jenkins: Verified
Mike Percy: Looks good to me, approved
--
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
[kudu-CR] transactions: check tablet before checking stopped
Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8673 )
Change subject: transactions: check tablet before checking stopped
......................................................................
Patch Set 2:
Hrm, nullptr Tablet in the transaction driver is a case used in tests that shouldn't return errors.
In prod it doesn't really matter because when it gets to the prepare phase, the Tablet still _must_ exist in order to run (if the Tablet has been destroyed, the prepare pool token will also have been shut down).
--
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 29 Nov 2017 06:09:21 +0000
Gerrit-HasComments: No
[kudu-CR] transactions: check tablet before checking stopped
Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/8673
to look at the new patch set (#2).
Change subject: transactions: check tablet before checking stopped
......................................................................
transactions: check tablet before checking stopped
This patch cleans up an unsafe pointer access to Tablet by using the
commonly-used pattern of grabbing a std::shared_ptr<Tablet> and
operating on it within an appropriate scope.
Previously, I saw one failure in multiple thousands of dist-test runs of
an upcoming disk failure test case in exactly_once_writes-itest caused
by this.
Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
---
M src/kudu/tablet/transactions/transaction_driver.cc
1 file changed, 6 insertions(+), 2 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/8673/2
--
To view, visit http://gerrit.cloudera.org:8080/8673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4821554e76c987fb8548384462b960f5c3b2c75f
Gerrit-Change-Number: 8673
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>