Posted to issues@mesos.apache.org by "Charles Natali (Jira)" <ji...@apache.org> on 2021/10/16 11:51:00 UTC
[jira] [Assigned] (MESOS-9657) Launching a command task twice can crash the agent
[ https://issues.apache.org/jira/browse/MESOS-9657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Natali reassigned MESOS-9657:
-------------------------------------
Fix Version/s: 1.12.0
Assignee: Charles Natali
Resolution: Fixed
> Launching a command task twice can crash the agent
> --------------------------------------------------
>
> Key: MESOS-9657
> URL: https://issues.apache.org/jira/browse/MESOS-9657
> Project: Mesos
> Issue Type: Bug
> Reporter: Benno Evers
> Assignee: Charles Natali
> Priority: Major
> Fix For: 1.12.0
>
>
> When launching a command task, we verify that the framework has no existing executor for that task:
> {noformat}
> // We are dealing with command task; a new command executor will be
> // launched.
> CHECK(executor == nullptr);
> {noformat}
> Afterwards, an executor is created whose executor ID is the same as the task ID:
> {noformat}
> // (slave.cpp)
> // Either the master explicitly requests launching a new executor
> // or we are in the legacy case of launching one if there wasn't
> // one already. Either way, let's launch executor now.
> if (executor == nullptr) {
>   Try<Executor*> added = framework->addExecutor(executorInfo);
>   [...]
> {noformat}
> This means that if a task with the same task ID is relaunched before the executor from the first launch has been removed, the CHECK fails and the agent crashes:
> {noformat}
> F0315 16:39:32.822818 38112 slave.cpp:2865] Check failed: executor == nullptr
> *** Check failure stack trace: ***
> @ 0x7feb29a407af google::LogMessage::Flush()
> @ 0x7feb29a43c3f google::LogMessageFatal::~LogMessageFatal()
> @ 0x7feb28a5a886 mesos::internal::slave::Slave::__run()
> @ 0x7feb28af4f0e _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal5slave5SlaveERKNSA_13FrameworkInfoERKNSA_12ExecutorInfoERK6OptionINSA_8TaskInfoEERKSK_INSA_13TaskGroupInfoEERKSt6vectorINSB_19ResourceVersionUUIDESaISU_EERKSK_IbESG_SJ_SO_SS_SY_S11_EEvRKNS1_3PIDIT_EEMS13_FvT0_T1_T2_T3_T4_T5_EOT6_OT7_OT8_OT9_OT10_OT11_EUlOSE_OSH_OSM_OSQ_OSW_OSZ_S3_E_JSE_SH_SM_SQ_SW_SZ_St12_PlaceholderILi1EEEEEEclEOS3_
> @ 0x7feb2998a620 process::ProcessBase::consume()
> @ 0x7feb29987675 process::ProcessManager::resume()
> @ 0x7feb299a2d2b _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvE3$_8EEEEE6_M_runEv
> @ 0x7feb2632f523 (unknown)
> @ 0x7feb25e40594 start_thread
> @ 0x7feb25b73e6f __GI___clone
> Aborted (core dumped)
> {noformat}
> Instead of crashing, the agent should just drop the task with an appropriate error in this case.
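
The behaviour proposed above can be illustrated with a small standalone sketch. This is not the Mesos agent code and not the actual fix: `ToyFramework` and `launchCommandTask` are hypothetical names invented for illustration, and the sketch only assumes what the description states, namely that a command task's executor ID equals its task ID. A duplicate launch returns an error that the caller can use to drop the task, where the original code hit a fatal CHECK.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the agent's per-framework executor bookkeeping.
// This is NOT the real mesos::internal::slave::Framework.
struct ToyFramework {
  std::map<std::string, std::string> executors;  // executor ID -> state
};

// Hypothetical helper: returns an error message if the task must be dropped,
// or std::nullopt if a new command executor was registered for it.
std::optional<std::string> launchCommandTask(
    ToyFramework& framework, const std::string& taskId) {
  // For a command task, the executor ID is the same as the task ID.
  const std::string& executorId = taskId;

  if (framework.executors.count(executorId) > 0) {
    // The original code path aborts here via CHECK(executor == nullptr).
    // In this sketch the condition is reported to the caller instead.
    return "Executor '" + executorId + "' already exists; dropping task";
  }

  framework.executors.emplace(executorId, "REGISTERING");
  return std::nullopt;
}
```

Calling `launchCommandTask(framework, "task-1")` twice on the same `ToyFramework` registers one executor on the first call and returns an error on the second, leaving the existing executor untouched. In the real agent, the error would presumably be turned into a terminal status update for the dropped task, but which status and reason to use is outside what this sketch assumes.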
--
This message was sent by Atlassian Jira
(v8.3.4#803005)