Posted to issues@kudu.apache.org by "Alexey Serbin (Jira)" <ji...@apache.org> on 2019/11/18 20:44:00 UTC

[jira] [Updated] (KUDU-2998) RebalancingDuringElectionStormTest.RoundRobin sometimes crashes

     [ https://issues.apache.org/jira/browse/KUDU-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-2998:
--------------------------------
    Description: 
I saw the {{RebalancingDuringElectionStormTest.RoundRobin}} test crash in DEBUG configuration with the following error:

{noformat}
F1116 06:53:57.325479 11078 quorum_util.cc:167] Check failed: RaftPeerPB::NON_PARTICIPANT != GetConsensusRole(peer_uuid, cstate) (3 vs. 3) Peer fe4321fd981c466d86cd1fe2949868dc << not a participant in current_term: 25 leader_uuid: "422db10def4d4c95a5a5bfd2cb787aa2" committed_config { opid_index: 77 OBSOLETE_local: false peers { permanent_uuid: "f6d99d2a6f5542428e5e797972a0f53e" member_type: VOTER last_known_addr { host: "127.25.232.67" port: 41397 } } peers { permanent_uuid: "4084fddb6afb4aed80b27fc4bee3de1f" member_type: VOTER last_known_addr { host: "127.25.232.68" port: 39941 } } peers { permanent_uuid: "fe4321fd981c466d86cd1fe2949868dc" member_type: VOTER last_known_addr { host: "127.25.232.65" port: 40533 } attrs { replace: true } } peers { permanent_uuid: "422db10def4d4c95a5a5bfd2cb787aa2" member_type: VOTER last_known_addr { host: "127.25.232.70" port: 35983 } attrs { promote: false } } } pending_config { opid_index: 80 OBSOLETE_local: false peers { permanent_uuid: "f6d99d2a6f5542428e5e797972a0f53e" member_type: VOTER last_known_addr { host: "127.25.232.67" port: 41397 } } peers { permanent_uuid: "4084fddb6afb4aed80b27fc4bee3de1f" member_type: VOTER last_known_addr { host: "127.25.232.68" port: 39941 } } peers { permanent_uuid: "422db10def4d4c95a5a5bfd2cb787aa2" member_type: VOTER last_known_addr { host: "127.25.232.70" port: 35983 } attrs { promote: false } } }
{noformat}
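
One way to read the message above: peer fe4321fd981c466d86cd1fe2949868dc is listed in committed_config (opid_index 77) but is absent from pending_config (opid_index 80). The sketch below models that state with simplified stand-in types. It is not the code from quorum_util.cc: the type and function names are hypothetical, and it assumes that the role lookup consults the pending config whenever one is present, which this report does not confirm.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for RaftPeerPB::Role. Only the value 3 is taken from
// the log ("3 vs. 3" for NON_PARTICIPANT); the other values are arbitrary.
enum class Role { kFollower, kLeader, kLearner, kNonParticipant = 3 };

// Simplified Raft config: the opid_index and the UUIDs of the VOTER peers.
struct Config {
  long opid_index;
  std::vector<std::string> voters;
};

// Simplified consensus state mirroring the fields printed in the log.
struct ConsensusState {
  std::string leader_uuid;
  Config committed;
  bool has_pending;
  Config pending;
};

// Assumption: the "active" config is the pending one when it exists,
// otherwise the committed one.
const Config& ActiveConfig(const ConsensusState& cs) {
  return cs.has_pending ? cs.pending : cs.committed;
}

// Role of a peer as seen through the active config.
Role RoleOf(const std::string& uuid, const ConsensusState& cs) {
  const std::vector<std::string>& voters = ActiveConfig(cs).voters;
  if (std::find(voters.begin(), voters.end(), uuid) == voters.end()) {
    return Role::kNonParticipant;
  }
  return uuid == cs.leader_uuid ? Role::kLeader : Role::kFollower;
}

// Peers of the committed config that the active config no longer contains.
// Asserting "not NON_PARTICIPANT" for any of these peers would fail.
std::vector<std::string> CommittedButNotActive(const ConsensusState& cs) {
  std::vector<std::string> result;
  for (const std::string& uuid : cs.committed.voters) {
    if (RoleOf(uuid, cs) == Role::kNonParticipant) result.push_back(uuid);
  }
  return result;
}

// The consensus state exactly as printed in the fatal log message.
ConsensusState StateFromLog() {
  ConsensusState cs;
  cs.leader_uuid = "422db10def4d4c95a5a5bfd2cb787aa2";
  cs.committed = Config{77, {"f6d99d2a6f5542428e5e797972a0f53e",
                             "4084fddb6afb4aed80b27fc4bee3de1f",
                             "fe4321fd981c466d86cd1fe2949868dc",
                             "422db10def4d4c95a5a5bfd2cb787aa2"}};
  cs.has_pending = true;
  cs.pending = Config{80, {"f6d99d2a6f5542428e5e797972a0f53e",
                           "4084fddb6afb4aed80b27fc4bee3de1f",
                           "422db10def4d4c95a5a5bfd2cb787aa2"}};
  return cs;
}
```

Under these assumptions, CommittedButNotActive(StateFromLog()) contains only fe4321fd981c466d86cd1fe2949868dc, the peer named in the failed check. Any caller that walks the committed peers while resolving roles against the pending config would hit this case.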


The stack trace looked like the following:
{noformat}
    @     0x7f6598afa62d  google::LogMessage::Fail() at ??:0
    @     0x7f6598afc64c  google::LogMessage::SendToLog() at ??:0
    @     0x7f6598afa189  google::LogMessage::Flush() at ??:0
    @     0x7f6598afcfdf  google::LogMessageFatal::~LogMessageFatal() at ??:0
    @     0x7f6599a2c12c  kudu::consensus::GetParticipantRole() at ??:0
    @     0x7f659a549cae  kudu::master::CatalogManager::BuildLocationsForTablet() at ??:0
    @     0x7f6596765d8b  _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_6master15MasterServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E18_E9_M_invokeERKSt9_Any_dataS4_S5_S9_ at ??:0
    @     0x7f659a54a37b  kudu::master::CatalogManager::GetTabletLocations() at ??:0
    @     0x7f659a5de5ea  kudu::master::MasterServiceImpl::GetTabletLocations() at ??:0
    @     0x7f659675e742  _ZZN4kudu6master15MasterServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE4_clESG_SH_SJ_ at ??:0
    @     0x7f6596764a9f  _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_6master15MasterServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E4_E9_M_invokeERKSt9_Any_dataS4_S5_S9_ at ??:0
    @     0x7f659472cd16  std::function<>::operator()() at ??:0
    @     0x7f659472c547  kudu::rpc::GeneratedServiceIf::Handle() at ??:0
    @     0x7f659472f02e  kudu::rpc::ServicePool::RunThread() at ??:0
    @     0x7f65947303fd  boost::_mfi::mf0<>::operator()() at ??:0
    @     0x7f6594730224  boost::_bi::list1<>::operator()<>() at ??:0
    @     0x7f659473010b  boost::_bi::bind_t<>::operator()() at ??:0
    @     0x7f659473003a  boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
    @     0x7f6599599842  boost::function0<>::operator()() at ??:0
    @     0x7f65995965cb  kudu::Thread::SuperviseThread() at ??:0
    @     0x7f6595ca2184  start_thread at ??:0
    @     0x7f6598104ffd  clone at ??:0
{noformat}

The full log is attached.

  was:
I saw the {{RebalancingDuringElectionStormTest.RoundRobin}} tests crashed in DEBUG configuration with the following error:

noformat}
F1116 06:53:57.325479 11078 quorum_util.cc:167] Check failed: RaftPeerPB::NON_PARTICIPANT != GetConsensusRole(peer_uuid, cstate) (3 vs. 3) Peer fe4321fd981c466d86cd1fe2949868dc << not a participant in current_term: 25 leader_uuid: "422db10def4d4c95a5a5bfd2cb787aa2" committed_config { opid_index: 77 OBSOLETE_local: false peers { permanent_uuid: "f6d99d2a6f5542428e5e797972a0f53e" member_type: VOTER last_known_addr { host: "127.25.232.67" port: 41397 } } peers { permanent_uuid: "4084fddb6afb4aed80b27fc4bee3de1f" member_type: VOTER last_known_addr { host: "127.25.232.68" port: 39941 } } peers { permanent_uuid: "fe4321fd981c466d86cd1fe2949868dc" member_type: VOTER last_known_addr { host: "127.25.232.65" port: 40533 } attrs { replace: true } } peers { permanent_uuid: "422db10def4d4c95a5a5bfd2cb787aa2" member_type: VOTER last_known_addr { host: "127.25.232.70" port: 35983 } attrs { promote: false } } } pending_config { opid_index: 80 OBSOLETE_local: false peers { permanent_uuid: "f6d99d2a6f5542428e5e797972a0f53e" member_type: VOTER last_known_addr { host: "127.25.232.67" port: 41397 } } peers { permanent_uuid: "4084fddb6afb4aed80b27fc4bee3de1f" member_type: VOTER last_known_addr { host: "127.25.232.68" port: 39941 } } peers { permanent_uuid: "422db10def4d4c95a5a5bfd2cb787aa2" member_type: VOTER last_known_addr { host: "127.25.232.70" port: 35983 } attrs { promote: false } } }
{noformat}


The stack trace looked like the following:
{noformat}
    @     0x7f6598afa62d  google::LogMessage::Fail() at ??:0
    @     0x7f6598afc64c  google::LogMessage::SendToLog() at ??:0
    @     0x7f6598afa189  google::LogMessage::Flush() at ??:0
    @     0x7f6598afcfdf  google::LogMessageFatal::~LogMessageFatal() at ??:0
    @     0x7f6599a2c12c  kudu::consensus::GetParticipantRole() at ??:0
    @     0x7f659a549cae  kudu::master::CatalogManager::BuildLocationsForTablet() at ??:0
    @     0x7f6596765d8b  _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_6master15MasterServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E18_E9_M_invokeERKSt9_Any_dataS4_S5_S9_ at ??:0
    @     0x7f659a54a37b  kudu::master::CatalogManager::GetTabletLocations() at ??:0
    @     0x7f659a5de5ea  kudu::master::MasterServiceImpl::GetTabletLocations() at ??:0
    @     0x7f659675e742  _ZZN4kudu6master15MasterServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE4_clESG_SH_SJ_ at ??:0
    @     0x7f6596764a9f  _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_6master15MasterServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E4_E9_M_invokeERKSt9_Any_dataS4_S5_S9_ at ??:0
    @     0x7f659472cd16  std::function<>::operator()() at ??:0
    @     0x7f659472c547  kudu::rpc::GeneratedServiceIf::Handle() at ??:0
    @     0x7f659472f02e  kudu::rpc::ServicePool::RunThread() at ??:0
    @     0x7f65947303fd  boost::_mfi::mf0<>::operator()() at ??:0
    @     0x7f6594730224  boost::_bi::list1<>::operator()<>() at ??:0
    @     0x7f659473010b  boost::_bi::bind_t<>::operator()() at ??:0
    @     0x7f659473003a  boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
    @     0x7f6599599842  boost::function0<>::operator()() at ??:0
    @     0x7f65995965cb  kudu::Thread::SuperviseThread() at ??:0
    @     0x7f6595ca2184  start_thread at ??:0
    @     0x7f6598104ffd  clone at ??:0
{noformat}

The full log is attached.


> RebalancingDuringElectionStormTest.RoundRobin sometimes crashes
> ---------------------------------------------------------------
>
>                 Key: KUDU-2998
>                 URL: https://issues.apache.org/jira/browse/KUDU-2998
>             Project: Kudu
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.10.0, 1.10.1
>            Reporter: Alexey Serbin
>            Priority: Major
>         Attachments: rebalancer_tool-test.6.txt.xz
>


