Posted to issues@mesos.apache.org by "Michael Park (JIRA)" <ji...@apache.org> on 2016/02/27 01:38:18 UTC

[jira] [Updated] (MESOS-3834) slave upgrade framework checkpoint incompatibility

     [ https://issues.apache.org/jira/browse/MESOS-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Park updated MESOS-3834:
--------------------------------
    Fix Version/s: 0.24.2
                   0.25.1
                   0.26.1

> slave upgrade framework checkpoint incompatibility 
> ---------------------------------------------------
>
>                 Key: MESOS-3834
>                 URL: https://issues.apache.org/jira/browse/MESOS-3834
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.24.1
>            Reporter: James Peach
>            Assignee: James Peach
>             Fix For: 0.27.0, 0.26.1, 0.25.1, 0.24.2
>
>
> While upgrading from 0.22 to 0.25, we hit the following crash in the 0.24 slave:
> {code}
> F1104 05:20:49.162701  1153 slave.cpp:4175] Check failed: frameworkInfo.has_id()
> *** Check failure stack trace: ***
>     @     0x7fef9c294650  google::LogMessage::Fail()
>     @     0x7fef9c29459f  google::LogMessage::SendToLog()
>     @     0x7fef9c293fb0  google::LogMessage::Flush()
>     @     0x7fef9c296ce4  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7fef9b9a5492  mesos::internal::slave::Slave::recoverFramework()
>     @     0x7fef9b9a3314  mesos::internal::slave::Slave::recover()
>     @     0x7fef9b9d069c  _ZZN7process8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS4_5state5StateEES9_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSG_FSE_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESP_
>     @     0x7fef9ba039f4  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS8_5state5StateEESD_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSK_FSI_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> {code}
> As near as I can tell, this is what happened:
> - 0.22 wrote {{framework.info}} without the FrameworkID
> - 0.23 had a compatibility check, so it accepted the old checkpoint
> - 0.24 removed the compatibility check in MESOS-2259
> - the framework checkpoint doesn't get rewritten during recovery, so when the 0.24 slave starts it reads the 0.22 version
> - 0.24 fails the {{frameworkInfo.has_id()}} check and aborts
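
For illustration only, the kind of compatibility check described above might look roughly like the sketch below. It is not the actual Mesos code or the committed fix: {{FrameworkInfo}} here is a hypothetical plain struct standing in for the protobuf message, {{backfillFrameworkId}} is a made-up helper name, and the sketch assumes the framework ID can still be obtained from somewhere else in the recovered state (passed in as {{recoveredId}}).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the checkpointed FrameworkInfo message.
// Only the fields needed for this sketch are modelled.
struct FrameworkInfo {
  std::string name;
  std::string id;
  bool idSet = false;  // False for a checkpoint written without the FrameworkID.

  bool has_id() const { return idSet; }

  void set_id(const std::string& value) {
    id = value;
    idSet = true;
  }
};

// Hypothetical compatibility shim: if the checkpointed info has no ID,
// fill it in from an ID recovered elsewhere (an assumption of this sketch),
// so that a later has_id() check passes. Returns true if the info changed.
bool backfillFrameworkId(FrameworkInfo& info, const std::string& recoveredId) {
  assert(!recoveredId.empty());

  if (info.has_id()) {
    return false;  // Newer checkpoint: nothing to do.
  }

  info.set_id(recoveredId);
  return true;
}
```

Calling {{backfillFrameworkId}} before the {{has_id()}} check would let an old-format checkpoint pass recovery. Because the checkpoint on disk is not rewritten, the shim would have to run on every recovery until the framework is checkpointed again.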


