Posted to issues@kudu.apache.org by "Andrew Wong (Jira)" <ji...@apache.org> on 2019/09/24 01:36:00 UTC

[jira] [Created] (KUDU-2952) TServers reporting replica stats may race with leadership change, hitting a DCHECK

Andrew Wong created KUDU-2952:
---------------------------------

             Summary: TServers reporting replica stats may race with leadership change, hitting a DCHECK
                 Key: KUDU-2952
                 URL: https://issues.apache.org/jira/browse/KUDU-2952
             Project: Kudu
          Issue Type: Bug
          Components: consensus, tserver
            Reporter: Andrew Wong
            Assignee: Andrew Wong


I have a precommit that failed with:
{code:java}
F0924 00:08:46.821594  9670 catalog_manager.cc:4239] Check failed: ts_desc->permanent_uuid() == report.consensus_state().leader_uuid() 
*** Check failure stack trace: ***
    @     0x7f5e442ea62d  google::LogMessage::Fail() at ??:0
    @     0x7f5e442ec64c  google::LogMessage::SendToLog() at ??:0
    @     0x7f5e442ea189  google::LogMessage::Flush() at ??:0
    @     0x7f5e442ecfdf  google::LogMessageFatal::~LogMessageFatal() at ??:0
    @     0x7f5e45d89a01  kudu::master::CatalogManager::ProcessTabletReport() at ??:0
    @     0x7f5e45e29ae7  kudu::master::MasterServiceImpl::TSHeartbeat() at ??:0
    @     0x7f5e41f29cbc  _ZZN4kudu6master15MasterServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE0_clESG_SH_SJ_ at ??:0
    @     0x7f5e41f3068b  _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_6master15MasterServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E0_E9_M_invokeERKSt9_Any_dataS4_S5_S9_ at ??:0
    @     0x7f5e3fea909e  std::function<>::operator()() at ??:0
    @     0x7f5e3fea88cf  kudu::rpc::GeneratedServiceIf::Handle() at ??:0
    @     0x7f5e3feab3b6  kudu::rpc::ServicePool::RunThread() at ??:0
    @     0x7f5e3feac785  boost::_mfi::mf0<>::operator()() at ??:0
    @     0x7f5e3feac5ac  boost::_bi::list1<>::operator()<>() at ??:0
    @     0x7f5e3feac493  boost::_bi::bind_t<>::operator()() at ??:0
    @     0x7f5e3feac3c2  boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
    @     0x7f5e44db28d2  boost::function0<>::operator()() at ??:0
    @     0x7f5e44daf65b  kudu::Thread::SuperviseThread() at ??:0
    @     0x7f5e41429184  start_thread at ??:0
    @     0x7f5e438f4ffd  clone at ??:0 

{code}
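Looking through the code, there appears to be a time-of-check-to-time-of-use (TOCTOU) race when the tablet server generates its tablet reports. Whether the replica is the leader, which determines whether it reports replica stats, and the consensus state placed in the report seem to be read at different times. If leadership changes in between, the master can receive replica stats alongside a consensus_state() whose leader_uuid() is not the reporting tablet server, which trips the check above.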
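To make the suspected interleaving concrete, here is a minimal, self-contained sketch. It assumes the race has the shape described in the Summary: the leadership check and the read of the consensus state happen under separate lock acquisitions. Every name in it (ConsensusState, TabletReport, Replica, BuildReportRacy, BuildReportSnapshot, ReportIsConsistent) is hypothetical and is not Kudu's real code. ReportIsConsistent only models the failed check in catalog_manager.cc. The `between` callback deterministically forces a leadership change inside the check/use window. BuildReportSnapshot shows one possible way to close the window, by deriving both fields from a single snapshot. It is not necessarily the fix that will land for this issue.

{code:cpp}
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins; these are NOT Kudu's real classes.
struct ConsensusState {
  std::string leader_uuid;
};

struct TabletReport {
  ConsensusState consensus_state;
  bool has_stats = false;  // assumption: replica stats are only sent by the leader
};

// Models the master-side check that failed: a report carrying replica
// stats must name the reporting tserver as the leader.
bool ReportIsConsistent(const std::string& reporter_uuid, const TabletReport& r) {
  return !r.has_stats || reporter_uuid == r.consensus_state.leader_uuid;
}

class Replica {
 public:
  Replica(std::string self_uuid, std::string leader_uuid)
      : self_uuid_(std::move(self_uuid)) {
    cstate_.leader_uuid = std::move(leader_uuid);
  }

  const std::string& uuid() const { return self_uuid_; }

  // Simulates a leadership change driven by consensus.
  void SetLeader(std::string uuid) {
    std::lock_guard<std::mutex> l(mu_);
    cstate_.leader_uuid = std::move(uuid);
  }

  bool IsLeader() const {
    std::lock_guard<std::mutex> l(mu_);
    return cstate_.leader_uuid == self_uuid_;
  }

  ConsensusState GetConsensusState() const {
    std::lock_guard<std::mutex> l(mu_);
    return cstate_;
  }

  // Racy: the check (IsLeader) and the use (GetConsensusState) take the lock
  // separately; 'between' runs in that window to force a leadership change.
  TabletReport BuildReportRacy(const std::function<void()>& between) const {
    TabletReport r;
    r.has_stats = IsLeader();                 // time of check
    between();
    r.consensus_state = GetConsensusState();  // time of use
    return r;
  }

  // Snapshot: both fields are derived from one consistent copy, so the
  // report may be stale but never contradicts itself.
  TabletReport BuildReportSnapshot(const std::function<void()>& between) const {
    const ConsensusState snap = GetConsensusState();
    between();
    TabletReport r;
    r.consensus_state = snap;
    r.has_stats = (snap.leader_uuid == self_uuid_);
    assert(ReportIsConsistent(self_uuid_, r));
    return r;
  }

 private:
  const std::string self_uuid_;
  mutable std::mutex mu_;
  ConsensusState cstate_;  // guarded by mu_
};
{code}

Consider a replica on "ts-A" that leads when the report starts, while the callback hands leadership to "ts-B". BuildReportRacy() then returns stats together with leader "ts-B", which fails ReportIsConsistent(). BuildReportSnapshot() instead reports the older, self-consistent view, and a subsequent heartbeat would carry the new leader.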



--
This message was sent by Atlassian Jira
(v8.3.4#803005)