Posted to issues@mesos.apache.org by "James Peach (JIRA)" <ji...@apache.org> on 2015/11/05 05:36:27 UTC

[jira] [Created] (MESOS-3834) slave upgrade framework checkpoint incompatibility

James Peach created MESOS-3834:
----------------------------------

             Summary: slave upgrade framework checkpoint incompatibility 
                 Key: MESOS-3834
                 URL: https://issues.apache.org/jira/browse/MESOS-3834
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 0.24.1
            Reporter: James Peach
            Assignee: James Peach


While upgrading from 0.22 to 0.25, we hit the following crash in the 0.24 slave:

{code}
F1104 05:20:49.162701  1153 slave.cpp:4175] Check failed: frameworkInfo.has_id()
*** Check failure stack trace: ***
    @     0x7fef9c294650  google::LogMessage::Fail()
    @     0x7fef9c29459f  google::LogMessage::SendToLog()
    @     0x7fef9c293fb0  google::LogMessage::Flush()
    @     0x7fef9c296ce4  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fef9b9a5492  mesos::internal::slave::Slave::recoverFramework()
    @     0x7fef9b9a3314  mesos::internal::slave::Slave::recover()
    @     0x7fef9b9d069c  _ZZN7process8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS4_5state5StateEES9_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSG_FSE_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESP_
    @     0x7fef9ba039f4  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS8_5state5StateEESD_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSK_FSI_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
{code}

As near as I can tell, what happened was this:

- 0.22 wrote {{framework.info}} without the FrameworkID
- 0.23 had a compatibility check, so it tolerated the missing ID
- 0.24 removed the compatibility check in MESOS-2259
- the framework checkpoint doesn't get rewritten during recovery, so when the 0.24 slave starts it still reads the 0.22 version
- 0.24 fails the {{frameworkInfo.has_id()}} check in {{Slave::recoverFramework()}} and aborts
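
To illustrate the sequence above, here is a minimal sketch of what a compatibility path during recovery might look like. It is not the actual Mesos code: {{FrameworkInfo}} and {{FrameworkState}} below are simplified, hypothetical stand-ins for the real protobuf and recovery-state types, {{recoverFrameworkInfo}} is an invented helper, and the sketch assumes the framework ID can still be obtained independently of {{framework.info}} (for example from the checkpoint directory layout).

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the checkpointed FrameworkInfo message.
struct FrameworkInfo {
  std::string name;
  std::optional<std::string> id;  // Absent in checkpoints written by 0.22.

  bool has_id() const { return id.has_value(); }
};

// Hypothetical stand-in for the recovered per-framework state.
// Assumption: the ID is known separately from framework.info.
struct FrameworkState {
  std::string id;
  std::optional<FrameworkInfo> info;
};

// Returns a FrameworkInfo that always carries an ID, backfilling it
// from the recovered state when the checkpoint predates the field.
FrameworkInfo recoverFrameworkInfo(const FrameworkState& state) {
  assert(state.info.has_value());

  FrameworkInfo info = *state.info;

  if (!info.has_id()) {
    // Old checkpoint: fill in the missing ID instead of aborting.
    info.id = state.id;
  }

  return info;
}
```

With a backfill like this, a check equivalent to {{frameworkInfo.has_id()}} would hold for both old and new checkpoints. Rewriting the checkpoint after recovery would be a separate step that this sketch does not cover.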


