Posted to issues@mesos.apache.org by "Michael Park (JIRA)" <ji...@apache.org> on 2017/05/02 22:58:05 UTC

[jira] [Commented] (MESOS-7389) Mesos 1.2.0 crashes with pre-1.0 Mesos agents

    [ https://issues.apache.org/jira/browse/MESOS-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993958#comment-15993958 ] 

Michael Park commented on MESOS-7389:
-------------------------------------

[~neilc]: Pushing this off to target 1.4.0 and 1.3.1.

> Mesos 1.2.0 crashes with pre-1.0 Mesos agents
> ---------------------------------------------
>
>                 Key: MESOS-7389
>                 URL: https://issues.apache.org/jira/browse/MESOS-7389
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>         Environment: Ubuntu 14.04 
>            Reporter: Nicholas Studt
>            Assignee: Benjamin Mahler
>            Priority: Critical
>              Labels: mesosphere
>
> During an upgrade from 1.0.1 to 1.2.0, a single mesos-slave reregistering with the running leader caused the leader to terminate. All 3 masters suffered the same failure in turn, as the same slave node reregistered against each newly elected leader. This continued across the entire cluster until the offending slave node was removed and fixed. The fix was to remove the mesos directory on the slave node and then start the slave back up.
>  F0412 17:24:42.736600  6317 master.cpp:5701] Check failed: frameworks_.contains(task.framework_id())
>  *** Check failure stack trace: ***
>      @     0x7f59f944f94d  google::LogMessage::Fail()
>      @     0x7f59f945177d  google::LogMessage::SendToLog()
>      @     0x7f59f944f53c  google::LogMessage::Flush()
>      @     0x7f59f9452079  google::LogMessageFatal::~LogMessageFatal()
>  I0412 17:24:42.750300  6316 replica.cpp:693] Replica received learned notice for position 6896 from @0.0.0.0:0 
>      @     0x7f59f88f2341  mesos::internal::master::Master::_reregisterSlave()
>      @     0x7f59f88f488f  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERKSt6vectorINS5_8ResourceESaISG_EERKSF_INS5_12ExecutorInfoESaISL_EERKSF_INS5_4TaskESaISQ_EERKSF_INS5_13FrameworkInfoESaISV_EERKSF_INS6_17Archive_FrameworkESaIS10_EERKSsRKSF_INS5_20SlaveInfo_CapabilityESaIS17_EERKNS0_6FutureIbEES9_SC_SI_SN_SS_SX_S12_SsS19_S1D_EEvRKNS0_3PIDIT_EEMS1H_FvT0_T1_T2_T3_T4_T5_T6_T7_T8_T9_ET10_T11_T12_T13_T14_T15_T16_T17_T18_T19_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
>      @     0x7f59f93c3eb1  process::ProcessManager::resume()
>      @     0x7f59f93ccd57  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>      @     0x7f59f77cfa60  (unknown)
>      @     0x7f59f6fec184  start_thread
>      @     0x7f59f6d19bed  (unknown)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)