Posted to dev@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2014/05/16 13:01:33 UTC

[jira] [Updated] (MESOS-1376) CHECK failure in the Registrar

     [ https://issues.apache.org/jira/browse/MESOS-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-1376:
------------------------------

    Sprint: Q2'14 Sprint 2

> CHECK failure in the Registrar
> ------------------------------
>
>                 Key: MESOS-1376
>                 URL: https://issues.apache.org/jira/browse/MESOS-1376
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.19.0
>            Reporter: Benjamin Mahler
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> {noformat}
> I0515 05:44:37.049137  7179 master.cpp:2301] Ignoring re-register slave message from slave 20140416-015639-1890854154-5050-1354-24152 at slave(1)@10.34.119.132:5051 (smf1-aep-35-sr1.prod.twitter.com) as readmission is already in progress
> E0515 05:44:37.271734  7168 registrar.cpp:500] Registrar aborting: Failed to update 'registry': Failed to perform store within 5secs
> F0515 05:44:37.271728  7170 master.cpp:2341] Failed to readmit slave 20140416-015639-1890854154-5050-1354-24133 at slave(1)@10.34.119.131:5051 (smf1-aep-31-sr4.prod.twitter.com): Failed to update 'registry': Failed to perform store within 5secs
> *** Check failure stack trace: ***
> F0515 05:44:37.272384 7168 owned.hpp:103] Check failed: data->t != NULL This owned pointer has already been shared
> *** Check failure stack trace: ***
>     @     0x7f687d06e2ad  google::LogMessage::Fail()
>     @     0x7f687d06e2ad  google::LogMessage::Fail()
>     @     0x7f687d0700f4  google::LogMessage::SendToLog()
>     @     0x7f687d0700f4  google::LogMessage::SendToLog()
>     @     0x7f687d06de9c  google::LogMessage::Flush()
>     @     0x7f687d06de9c  google::LogMessage::Flush()
>     @     0x7f687d0709e9  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f687d0709e9  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f687cc46182  process::Owned<>::get()
>     @     0x7f687cbdaa41  mesos::internal::master::Master::_reregisterSlave()
>     @     0x7f687cc46209  process::Owned<>::operator->()
>     @     0x7f687cbe987a  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERKSt6vectorINS5_12ExecutorInfoESaISG_EERKSF_INS6_4TaskESaISL_EERKSF_INS6_17Archive_FrameworkESaISQ_EERKNS0_6FutureIbEES9_SC_SI_SN_SS_SW_EEvRKNS0_3PIDIT_EEMS10_FvT0_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_T11_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
>     @     0x7f687cc39e05  mesos::internal::master::fail()
>     @     0x7f687cfa3c72  process::ProcessManager::resume()
>     @     0x7f687cc39f97  mesos::internal::master::RegistrarProcess::abort()
>     @     0x7f687cc3d77f  mesos::internal::master::RegistrarProcess::_update()
>     @     0x7f687cfa3f6c  process::schedule()
>     @     0x7f687c47883d  start_thread
>     @     0x7f687cc47b27  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master16RegistrarProcessERKNS0_6FutureI6OptionINS6_5state8protobuf8VariableINS6_8RegistryEEEEEESt5dequeINS0_5OwnedINS7_9OperationEEESaISN_EESH_SP_EEvRKNS0_3PIDIT_EEMSR_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
>     @     0x7f687b1e026d  clone
> {noformat}
> [~jieyu] pointed out the following problematic code:
> {code}
> // Helper for failing a deque of operations.
> void fail(deque<Owned<Operation> >* operations, const string& message)
> {
>   while (!operations->empty()) {
>     const Owned<Operation>& operation = operations->front(); // This reference becomes invalid!
>     operations->pop_front();
>     operation->fail(message);
>   }
> }
> {code}
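
The comment in the quoted helper points at the cause: `operation` is a reference to the deque's front element, and `pop_front()` destroys that element, so the later `operation->fail(message)` goes through a dangling reference. One way to avoid this is to copy the handle out of the deque before popping. The sketch below is only an illustration under stated assumptions, not the committed fix: `std::shared_ptr` stands in for `process::Owned`, and the `Operation` struct and the name `failAll` are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the registrar's Operation type; it only
// records the failure messages it receives.
struct Operation
{
  std::vector<std::string> failures;

  void fail(const std::string& message) { failures.push_back(message); }
};

// Assumption: std::shared_ptr is used here in place of process::Owned.
using OperationPtr = std::shared_ptr<Operation>;

// Fails every queued operation. The front handle is copied before
// pop_front(), so it stays valid after the deque destroys its element.
void failAll(std::deque<OperationPtr>* operations, const std::string& message)
{
  while (!operations->empty()) {
    OperationPtr operation = operations->front(); // A copy, not a reference.
    operations->pop_front();
    assert(operation != nullptr);
    operation->fail(message);
  }
}
```

Calling `failAll(&operations, message)` on a non-empty deque leaves it empty and delivers `message` to each operation exactly once. Copying the handle costs one reference-count increment per element, which is negligible next to the store timeout that triggers this path.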



--
This message was sent by Atlassian JIRA
(v6.2#6252)