Posted to issues@mesos.apache.org by "Yan Xu (JIRA)" <ji...@apache.org> on 2017/11/14 19:30:00 UTC

[jira] [Commented] (MESOS-8223) Master crashes when suppressed on subscribe is enabled.

    [ https://issues.apache.org/jira/browse/MESOS-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252014#comment-16252014 ] 

Yan Xu commented on MESOS-8223:
-------------------------------

The problem is that this [code|https://github.com/apache/mesos/blob/bb2deb3baafffb9a35d1dfbc35b0d43677b0b842/src/master/allocator/mesos/hierarchical.cpp#L447-L460] treats frameworks moving off a role and frameworks suppressing a role the same way. The former should untrack the framework under that role; the latter should not, because a later REVIVE still expects to find the framework in that role's sorter.
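
To illustrate the distinction, here is a minimal, self-contained sketch. It is not the Mesos allocator code: {{ToySorter}}, {{updateRoles}} and {{revive}} are hypothetical names made up for this example, and the real sorter interface differs. The assumption is only that each role has a sorter that tracks clients and can mark them active or inactive. Moving off a role removes the client; suppressing a role keeps it tracked but inactive, so a subsequent revive can still find it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a per-role sorter: each tracked client has an
// "active" flag. This is NOT the Mesos Sorter interface.
class ToySorter {
public:
  void add(const std::string& client) { clients_[client] = true; }
  void remove(const std::string& client) { clients_.erase(client); }

  bool contains(const std::string& client) const {
    return clients_.count(client) > 0;
  }

  bool isActive(const std::string& client) const {
    auto it = clients_.find(client);
    return it != clients_.end() && it->second;
  }

  // Both return false if the client is not tracked. The real sorter hits
  // a fatal CHECK in that situation instead.
  bool activate(const std::string& client) { return setActive(client, true); }
  bool deactivate(const std::string& client) { return setActive(client, false); }

private:
  bool setActive(const std::string& client, bool active) {
    auto it = clients_.find(client);
    if (it == clients_.end()) return false;
    it->second = active;
    return true;
  }

  std::map<std::string, bool> clients_;
};

// Applies a framework's new roles and suppressed roles to the sorters,
// handling "moved off a role" and "suppressed a role" differently.
void updateRoles(
    std::map<std::string, ToySorter>& sorters,
    const std::string& framework,
    const std::set<std::string>& oldRoles,
    const std::set<std::string>& newRoles,
    const std::set<std::string>& suppressedRoles) {
  // Moved off a role: untrack the framework under that role.
  for (const std::string& role : oldRoles) {
    if (newRoles.count(role) == 0) {
      sorters[role].remove(framework);
    }
  }

  // Still subscribed: keep the framework tracked, only toggle activity.
  for (const std::string& role : newRoles) {
    ToySorter& sorter = sorters[role];
    if (!sorter.contains(framework)) {
      sorter.add(framework);
    }
    assert(sorter.contains(framework));

    if (suppressedRoles.count(role) > 0) {
      sorter.deactivate(framework);
    } else {
      sorter.activate(framework);
    }
  }
}

// Revive succeeds only if the framework is still tracked under the role.
bool revive(
    std::map<std::string, ToySorter>& sorters,
    const std::string& framework,
    const std::string& role) {
  auto it = sorters.find(role);
  return it != sorters.end() && it->second.activate(framework);
}
```

With this split, a framework that resubscribes with role {{*}} suppressed stays in the sorter for {{*}} and can be revived later, whereas a framework that drops the role is removed and a revive for that role has nothing to activate.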

> Master crashes when suppressed on subscribe is enabled.
> -------------------------------------------------------
>
>                 Key: MESOS-8223
>                 URL: https://issues.apache.org/jira/browse/MESOS-8223
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.4.0
>            Reporter: Yan Xu
>            Priority: Critical
>
> Introduced in MESOS-7015, this feature is not actually turned on due to MESOS-8200. However, once MESOS-8200 is addressed and the feature is enabled, the master crashes with:
> {noformat:title=}
> I1113 17:17:37.240901 11285 master.cpp:3309] Disconnecting framework 40f7bdc0-e54b-46da-ace1-48162171baf4-0110 (test-framework)
> I1113 17:17:37.240911 11285 master.cpp:1435] Giving framework 40f7bdc0-e54b-46da-ace1-48162171baf4-0110 (test-framework) 3days to failover
> I1113 17:17:37.241953 11285 master.cpp:2612] Received subscription request for HTTP framework 'test-framework'
> I1113 17:17:37.242807 11285 master.cpp:2748] Subscribing framework 'test-framework' with checkpointing enabled, roles { * } suppressed and capabilities [ SHARED_RESOURCES, TASK_KILLING_STATE ]
> I1113 17:17:37.242820 11285 master.cpp:6994] Updating info for framework 40f7bdc0-e54b-46da-ace1-48162171baf4-0110
> I1113 17:17:37.252637 11270 hierarchical.cpp:380] Activated framework 40f7bdc0-e54b-46da-ace1-48162171baf4-0110
> I1113 17:17:37.272457 11289 master.cpp:7723] Performing implicit task state reconciliation for framework 40f7bdc0-e54b-46da-ace1-48162171baf4-0110 (test-framework)
> I1113 17:17:37.272507 11289 master.cpp:7723] Performing implicit task state reconciliation for framework 40f7bdc0-e54b-46da-ace1-48162171baf4-0110 (test-framework)
> I1113 17:17:41.966331 11271 master.cpp:5564] Processing REVIVE call for framework 40f7bdc0-e54b-46da-ace1-48162171baf4-0110 (test-framework)
> F1113 17:17:41.966380 11280 sorter.cpp:270] Check failed: 'find(clientPath)' Must be non NULL
> *** Check failure stack trace: ***
>     @     0x7f3467efd0dd  (unknown)
> {noformat}
> This happens when an unsuppressed framework reregisters with suppressed roles and then revives.


