Posted to issues@mesos.apache.org by "Michael Park (JIRA)" <ji...@apache.org> on 2016/02/27 01:38:18 UTC
[jira] [Updated] (MESOS-3834) slave upgrade framework checkpoint incompatibility
[ https://issues.apache.org/jira/browse/MESOS-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Park updated MESOS-3834:
--------------------------------
Fix Version/s: 0.24.2, 0.25.1, 0.26.1
> slave upgrade framework checkpoint incompatibility
> ---------------------------------------------------
>
> Key: MESOS-3834
> URL: https://issues.apache.org/jira/browse/MESOS-3834
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.24.1
> Reporter: James Peach
> Assignee: James Peach
> Fix For: 0.27.0, 0.26.1, 0.25.1, 0.24.2
>
>
> We are upgrading from 0.22 to 0.25 and experienced the following crash in the 0.24 slave:
> {code}
> F1104 05:20:49.162701 1153 slave.cpp:4175] Check failed: frameworkInfo.has_id()
> *** Check failure stack trace: ***
> @ 0x7fef9c294650 google::LogMessage::Fail()
> @ 0x7fef9c29459f google::LogMessage::SendToLog()
> @ 0x7fef9c293fb0 google::LogMessage::Flush()
> @ 0x7fef9c296ce4 google::LogMessageFatal::~LogMessageFatal()
> @ 0x7fef9b9a5492 mesos::internal::slave::Slave::recoverFramework()
> @ 0x7fef9b9a3314 mesos::internal::slave::Slave::recover()
> @ 0x7fef9b9d069c _ZZN7process8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS4_5state5StateEES9_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSG_FSE_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESP_
> @ 0x7fef9ba039f4 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS8_5state5StateEESD_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSK_FSI_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> {code}
> As near as I can tell, what happened was this:
> - 0.22 wrote {{framework.info}} without the FrameworkID
> - 0.23 had a compatibility check, so it tolerated the missing FrameworkID
> - 0.24 removed the compatibility check in MESOS-2259
> - the framework checkpoint doesn't get rewritten during recovery, so when the 0.24 slave starts it still reads the 0.22 version
> - 0.24 asserts on {{frameworkInfo.has_id()}} and crashes
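
The sequence above can be illustrated with a minimal, self-contained sketch. It is not the Mesos code or the actual patch: FrameworkInfo here is a hypothetical stand-in for the protobuf message, and recoverFramework() and its frameworkIdFromState parameter are assumptions made for illustration. The sketch assumes the framework ID can be obtained from somewhere other than the checkpointed framework.info during recovery, and shows how a compatibility check could backfill the missing field instead of asserting.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the protobuf FrameworkInfo message.
// Only the optional `id` field matters for this illustration.
struct FrameworkInfo {
  std::string name;
  std::string id;
  bool idSet = false;

  bool has_id() const { return idSet; }

  void set_id(const std::string& value) {
    id = value;
    idSet = true;
  }
};

// Hypothetical recovery step with a compatibility check.
// `frameworkIdFromState` is assumed to be known from a source other than
// the checkpointed framework.info (an assumption of this sketch).
FrameworkInfo recoverFramework(FrameworkInfo info,
                               const std::string& frameworkIdFromState) {
  // An old checkpoint (as described for 0.22) has no FrameworkID:
  // backfill it rather than failing a has_id() check.
  if (!info.has_id()) {
    info.set_id(frameworkIdFromState);
  }

  // After the backfill, the invariant that crashed the 0.24 slave holds.
  assert(info.has_id());
  return info;
}
```

Calling recoverFramework() on a FrameworkInfo with no ID returns a copy carrying the supplied ID, while a FrameworkInfo that already has an ID is returned unchanged.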
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)