Posted to issues@kudu.apache.org by "Will Berkeley (JIRA)" <ji...@apache.org> on 2019/01/18 20:57:00 UTC
[jira] [Updated] (KUDU-2664) Tablet server crashed when running kudu remote_replica unsafe_change
[ https://issues.apache.org/jira/browse/KUDU-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Berkeley updated KUDU-2664:
--------------------------------
Description:
While trying to reproduce a different issue, I ran the following command:
{noformat}
for i in 0 1; do bin/kudu remote_replica unsafe_change_config 127.0.0.1:7250 3ccbce6a3116487cbcc79ab4280a2ee5 6ca21fa7dcf54761a5ec7017ff101a68 454b53ed77bd458a81a7710c892f214b; done
{noformat}
and encountered the following tablet server crash:
{noformat}
F0118 10:45:42.696043 280514560 raft_consensus.cc:1286] T 3ccbce6a3116487cbcc79ab4280a2ee5 P 6ca21fa7dcf54761a5ec7017ff101a68 [term 6 FOLLOWER]: Unexpected new leader in same term! Existing leader UUID: kudu-tools, new leader UUID: 454b53ed77bd458a81a7710c892f214b
*** Check failure stack trace: ***
@ 0x10c91247f google::LogMessageFatal::~LogMessageFatal()
@ 0x10c90f259 google::LogMessageFatal::~LogMessageFatal()
@ 0x108b74c05 kudu::consensus::RaftConsensus::CheckLeaderRequestUnlocked()
@ 0x108b6c180 kudu::consensus::RaftConsensus::UpdateReplica()
@ 0x108b6b459 kudu::consensus::RaftConsensus::Update()
@ 0x107cf5106 kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
@ 0x10b53b87d kudu::consensus::ConsensusServiceIf::ConsensusServiceIf()::$_1::operator()()
@ 0x10b53b819 _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_1PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_
@ 0x10b53b6a9 std::__1::__function::__func<>::operator()()
@ 0x10b843e07 std::__1::function<>::operator()()
@ 0x10b843a1a kudu::rpc::GeneratedServiceIf::Handle()
@ 0x10b846cb6 kudu::rpc::ServicePool::RunThread()
@ 0x10b849aa9 boost::_mfi::mf0<>::operator()()
@ 0x10b849a10 boost::_bi::list1<>::operator()<>()
@ 0x10b8499ba boost::_bi::bind_t<>::operator()()
@ 0x10b84979d boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x10b7bb1fa boost::function0<>::operator()()
@ 0x10c2cc2f5 kudu::Thread::SuperviseThread()
@ 0x7fff5dc09305 _pthread_body
@ 0x7fff5dc0c26f _pthread_start
@ 0x7fff5dc08415 thread_start
{noformat}
The target of the config change was TS 6ca21fa7dcf54761a5ec7017ff101a68 at address 127.0.0.1:7250, and I was trying to kick out one of the three replicas while fishing for a repro of the other issue.
I couldn't reproduce the crash, I wasn't able to capture a minidump or core dump, and I accidentally deleted the logs, so I'm afraid the above is all there is to go on.
It's expected that funny stuff can happen when using unsafe_change_config; it's unsafe. But it shouldn't be possible to crash the tablet server with it.
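The log line shows the follower in term 6 had already recorded "kudu-tools" as its leader for that term when a request from 454b53ed77bd458a81a7710c892f214b arrived in the same term, which tripped the "one leader per term" check fatally. The sketch below is a hypothetical, simplified model of that bookkeeping, assuming one wanted to reject the conflicting request instead of aborting the process. It is not Kudu's RaftConsensus::CheckLeaderRequestUnlocked; FollowerState, LeaderCheck, and CheckLeaderRequest are illustrative names only.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, simplified model of a follower's leader bookkeeping.
// NOT Kudu's RaftConsensus; all names here are illustrative.
struct FollowerState {
  int64_t current_term = 0;
  std::optional<std::string> leader_uuid;  // leader recognized in current_term
};

enum class LeaderCheck { kAccepted, kStaleTerm, kConflictingLeader };

// Validates a leader request identified by (caller_uuid, caller_term).
// A second leader in the same term is reported as a conflict so the
// caller can reject the RPC, rather than aborting the whole process.
LeaderCheck CheckLeaderRequest(FollowerState& state,
                               const std::string& caller_uuid,
                               int64_t caller_term) {
  if (caller_term < state.current_term) {
    return LeaderCheck::kStaleTerm;  // request from an older term
  }
  if (caller_term > state.current_term) {
    // A higher term wins: adopt the term and its leader.
    state.current_term = caller_term;
    state.leader_uuid = caller_uuid;
    return LeaderCheck::kAccepted;
  }
  // Same term: at most one leader may exist.
  if (state.leader_uuid && *state.leader_uuid != caller_uuid) {
    return LeaderCheck::kConflictingLeader;  // keep the existing leader
  }
  state.leader_uuid = caller_uuid;
  return LeaderCheck::kAccepted;
}
```

Fed the values from the log (term 6, existing leader "kudu-tools", new caller 454b53ed77bd458a81a7710c892f214b in term 6), this model returns kConflictingLeader and leaves the recorded leader unchanged. Rejecting the request only keeps the process alive; by itself it does not reconcile the replica with the real leader.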
> Tablet server crashed when running kudu remote_replica unsafe_change
> --------------------------------------------------------------------
>
> Key: KUDU-2664
> URL: https://issues.apache.org/jira/browse/KUDU-2664
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Will Berkeley
> Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)