Posted to issues@mesos.apache.org by "Greg Mann (JIRA)" <ji...@apache.org> on 2019/04/04 00:12:00 UTC
[jira] [Comment Edited] (MESOS-9635)
OperationReconciliationTest.AgentPendingOperationAfterMasterFailover is
flaky again (3x) due to orphan operations
[ https://issues.apache.org/jira/browse/MESOS-9635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803066#comment-16803066 ]
Greg Mann edited comment on MESOS-9635 at 4/4/19 12:11 AM:
-----------------------------------------------------------
I think this issue would be better addressed by allocating the recovered orphan operations at the time of framework recovery, rather than when an {{UpdateSlaveMessage}} is received. The following patch implements this approach:
https://reviews.apache.org/r/70325/
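To illustrate why the timing of the allocation matters, here is a rough toy model, not Mesos code. Every name in it ({{ToyAllocator}}, {{Event}}, {{simulate}}) is hypothetical, and it assumes the check failure happens when the operation's resources are recovered before the allocation triggered by {{UpdateSlaveMessage}} has been made:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for the allocator's per-agent
// bookkeeping. This is NOT the Mesos HierarchicalAllocatorProcess.
struct ToyAllocator {
  std::map<std::string, int> allocated;  // resource name -> allocated amount

  void allocate(const std::string& resource, int amount) {
    allocated[resource] += amount;
  }

  // Loosely mirrors the failing CHECK: resources can only be recovered if
  // they are currently tracked as allocated. Returns false where the real
  // allocator would abort.
  bool recover(const std::string& resource, int amount) {
    auto it = allocated.find(resource);
    if (it == allocated.end() || it->second < amount) {
      return false;
    }
    it->second -= amount;
    if (it->second == 0) {
      allocated.erase(it);
    }
    return true;
  }
};

// Hypothetical events seen by the master after a failover.
enum class Event { FrameworkRecovered, UpdateSlaveMessage, OperationFinished };

// Replays `events` in order. The orphan operation's resources are allocated
// when `allocateOn` is seen and recovered when the operation finishes.
// Returns true if every recovery found its resources allocated.
bool simulate(const std::vector<Event>& events, Event allocateOn) {
  ToyAllocator allocator;
  bool ok = true;
  for (Event event : events) {
    if (event == allocateOn) {
      allocator.allocate("disk", 200);
    }
    if (event == Event::OperationFinished) {
      ok = allocator.recover("disk", 200) && ok;
    }
  }
  return ok;
}
```

In this model, replaying the ordering {{FrameworkRecovered, OperationFinished, UpdateSlaveMessage}} with {{allocateOn = Event::UpdateSlaveMessage}} makes {{simulate}} return false, while {{allocateOn = Event::FrameworkRecovered}} makes it return true. The real ordering and bookkeeping in the master may differ from this sketch.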
was (Author: greggomann):
I think this issue would be better addressed by allocating the recovered orphan operations at the time of framework recovery, rather than when an {{UpdateSlaveMessage}} is received. The following patches implement this approach:
https://reviews.apache.org/r/70324/
https://reviews.apache.org/r/70325/
> OperationReconciliationTest.AgentPendingOperationAfterMasterFailover is flaky again (3x) due to orphan operations
> -----------------------------------------------------------------------------------------------------------------
>
> Key: MESOS-9635
> URL: https://issues.apache.org/jira/browse/MESOS-9635
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Benno Evers
> Assignee: Gastón Kleiman
> Priority: Blocker
> Labels: foundations, mesosphere
> Attachments: failure
>
>
> This test fails consistently when run while the system is stressed:
> {code}
> [ RUN ] ContentType/OperationReconciliationTest.AgentPendingOperationAfterMasterFailover/0
> F0305 08:10:07.670622 3982 hierarchical.cpp:1259] Check failed: slave.getAllocated().contains(resources) {} does not contain disk(allocated: default-role)[RAW(,,profile)]:200
> *** Check failure stack trace: ***
> @ 0x7f1120b0ce5e google::LogMessage::Fail()
> @ 0x7f1120b0cdbb google::LogMessage::SendToLog()
> @ 0x7f1120b0c7b5 google::LogMessage::Flush()
> @ 0x7f1120b0f578 google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f111e536f2a mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::recoverResources()
> @ 0x5580c2651c26 _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_11FrameworkIDERKNS1_7SlaveIDERKNS1_9ResourcesERK6OptionINS1_7FiltersEES8_SB_SE_SJ_EEvRKNS_3PIDIT_EEMSL_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_ENKUlOS6_OS9_OSC_OSH_PNS_11ProcessBaseEE_clES13_S14_S15_S16_S18_
> @ 0x5580c26c7e02 _ZN5cpp176invokeIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS3_11FrameworkIDERKNS3_7SlaveIDERKNS3_9ResourcesERK6OptionINS3_7FiltersEESA_SD_SG_SL_EEvRKNS1_3PIDIT_EEMSN_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOS8_OSB_OSE_OSJ_PNS1_11ProcessBaseEE_JS8_SB_SE_SJ_S1A_EEEDTclcl7forwardISN_Efp_Espcl7forwardIT0_Efp0_EEEOSN_DpOS1C_
> @ 0x5580c26c5b1e _ZN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS4_11FrameworkIDERKNS4_7SlaveIDERKNS4_9ResourcesERK6OptionINS4_7FiltersEESB_SE_SH_SM_EEvRKNS2_3PIDIT_EEMSO_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOS9_OSC_OSF_OSK_PNS2_11ProcessBaseEE_JS9_SC_SF_SK_St12_PlaceholderILi1EEEE13invoke_expandIS1C_St5tupleIJS9_SC_SF_SK_S1E_EES1H_IJOS1B_EEJLm0ELm1ELm2ELm3ELm4EEEEDTcl6invokecl7forwardISO_Efp_Espcl6expandcl3getIXT2_EEcl7forwardISS_Efp0_EEcl7forwardIST_Efp2_EEEEOSO_OSS_N5cpp1416integer_sequenceImJXspT2_EEEEOST_
> @ 0x5580c26c47ac _ZNO6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS4_11FrameworkIDERKNS4_7SlaveIDERKNS4_9ResourcesERK6OptionINS4_7FiltersEESB_SE_SH_SM_EEvRKNS2_3PIDIT_EEMSO_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOS9_OSC_OSF_OSK_PNS2_11ProcessBaseEE_JS9_SC_SF_SK_St12_PlaceholderILi1EEEEclIJS1B_EEEDTcl13invoke_expandcl4movedtdefpT1fEcl4movedtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1ELm2ELm3ELm4EEEE_Ecl16forward_as_tuplespcl7forwardIT_Efp_EEEEDpOS1K_
> @ 0x5580c26c3ad7 _ZN5cpp176invokeIN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS6_11FrameworkIDERKNS6_7SlaveIDERKNS6_9ResourcesERK6OptionINS6_7FiltersEESD_SG_SJ_SO_EEvRKNS4_3PIDIT_EEMSQ_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOSB_OSE_OSH_OSM_PNS4_11ProcessBaseEE_JSB_SE_SH_SM_St12_PlaceholderILi1EEEEEJS1D_EEEDTclcl7forwardISQ_Efp_Espcl7forwardIT0_Efp0_EEEOSQ_DpOS1I_
> @ 0x5580c26c32ad _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS7_11FrameworkIDERKNS7_7SlaveIDERKNS7_9ResourcesERK6OptionINS7_7FiltersEESE_SH_SK_SP_EEvRKNS5_3PIDIT_EEMSR_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOSC_OSF_OSI_OSN_PNS5_11ProcessBaseEE_JSC_SF_SI_SN_St12_PlaceholderILi1EEEEEJS1E_EEEvOSR_DpOT0_
> @ 0x5580c26c0a5e _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNSA_11FrameworkIDERKNSA_7SlaveIDERKNSA_9ResourcesERK6OptionINSA_7FiltersEESH_SK_SN_SS_EEvRKNS1_3PIDIT_EEMSU_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOSF_OSI_OSL_OSQ_S3_E_JSF_SI_SL_SQ_St12_PlaceholderILi1EEEEEEclEOS3_
> @ 0x7f1120a51c60 _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEEclES3_
> @ 0x7f1120a16a4e process::ProcessBase::consume()
> @ 0x7f1120a3d9d8 _ZNO7process13DispatchEvent7consumeEPNS_13EventConsumerE
> @ 0x5580c2284afa process::ProcessBase::serve()
> @ 0x7f1120a138db process::ProcessManager::resume()
> @ 0x7f1120a0fc28 _ZZN7process14ProcessManager12init_threadsEvENKUlvE_clEv
> @ 0x7f1120a375d0 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x7f1120a36734 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEclEv
> @ 0x7f1120a3569c _ZNSt6thread11_State_implISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> @ 0x7f111499276f (unknown)
> @ 0x7f111507273a start_thread
> @ 0x7f11140f8e7f __GI___clone
> {code}