Posted to issues@mesos.apache.org by "Andreas Peters (Jira)" <ji...@apache.org> on 2021/10/04 10:08:00 UTC
[jira] [Commented] (MESOS-10231) Mesos master crashes during framework teardown
[ https://issues.apache.org/jira/browse/MESOS-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423868#comment-17423868 ]
Andreas Peters commented on MESOS-10231:
----------------------------------------
Can you show us the configuration you use to start Spark? It would be helpful so we can try to reproduce it ourselves.
> Mesos master crashes during framework teardown
> ----------------------------------------------
>
> Key: MESOS-10231
> URL: https://issues.apache.org/jira/browse/MESOS-10231
> Project: Mesos
> Issue Type: Bug
> Components: framework, master
> Affects Versions: 1.9.0
> Environment: CentOS Linux release 7.9.2009
> Mesos version - 1.9.0
> Reporter: Divyansh Jamuaar
> Priority: Major
>
> I have set up a Mesos cluster with a single Mesos master, and I submit Spark jobs to it in "cluster" mode.
> After running a few Spark jobs successfully, the Mesos master crashes while trying to shut down one of the Spark frameworks, with the following error -
>
> {code:java}
> F0928 14:34:57.678421 2093314 framework.cpp:671] Check failed: totalOfferedResources.filter(allocatedToRole).empty()
> *** Check failure stack trace: ***
> @ 0x7f1e024ded2e google::LogMessage::Fail()
> @ 0x7f1e024dec8d google::LogMessage::SendToLog()
> @ 0x7f1e024de637 google::LogMessage::Flush()
> @ 0x7f1e024e191c google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f1dff93978d mesos::internal::master::Framework::untrackUnderRole()
> @ 0x7f1dffad004b mesos::internal::master::Master::removeFramework()
> @ 0x7f1dfface859 mesos::internal::master::Master::teardown()
> @ 0x7f1dffa8ba25 mesos::internal::master::Master::receive()
> @ 0x7f1dffb2f1cf ProtobufProcess<>::handlerMutM<>()
> @ 0x7f1dffbe6809 std::__invoke_impl<>()
> @ 0x7f1dffbdae22 std::__invoke<>()
> @ 0x7f1dffbc8079 _ZNSt5_BindIFPFvPN5mesos8internal6master6MasterEMS3_FvRKN7process4UPIDEONS0_9scheduler4CallEES8_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEES4_SD_St12_PlaceholderILi1EESO_ILi2EEEE6__callIvJS8_SL_EJLm0ELm1ELm2ELm3EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @ 0x7f1dffbaaae5 std::_Bind<>::operator()<>()
> @ 0x7f1dffb833c9 std::_Function_handler<>::_M_invoke()
> @ 0x7f1dff330281 std::function<>::operator()()
> @ 0x7f1dffb13329 ProtobufProcess<>::consume()
> @ 0x7f1dffa85436 mesos::internal::master::Master::_consume()
> @ 0x7f1dffa84ad5 mesos::internal::master::Master::consume()
> @ 0x7f1dffafb9ae _ZNO7process12MessageEvent7consumeEPNS_13EventConsumerE
> @ 0x564c359f7002 process::ProcessBase::serve()
> @ 0x7f1e023a7bbd process::ProcessManager::resume()
> @ 0x7f1e023a407c _ZZN7process14ProcessManager12init_threadsEvENKUlvE_clEv
> @ 0x7f1e023cf1ba _ZSt13__invoke_implIvZN7process14ProcessManager12init_threadsEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
> @ 0x7f1e023cd9c9 _ZSt8__invokeIZN7process14ProcessManager12init_threadsEvEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS4_DpOS5_
> @ 0x7f1e023cc482 _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEE9_M_invokeIJLm0EEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x7f1e023cb53b _ZNSt6thread8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEEclEv
> @ 0x7f1e023ca3c4 _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvEUlvE_EEEEE6_M_runEv
> @ 0x7f1e051f419d execute_native_thread_routine
> @ 0x7f1df4200ea5 start_thread
> @ 0x7f1df3f2996d __clone
> {code}
>
>
> It seems that an assertion check (categorized as fatal) is failing, but I am not able to figure out its root cause.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)