Posted to issues@mesos.apache.org by "Yan Xu (JIRA)" <ji...@apache.org> on 2016/08/23 16:44:21 UTC

[jira] [Commented] (MESOS-5763) Task stuck in fetching is not cleaned up after --executor_registration_timeout.

    [ https://issues.apache.org/jira/browse/MESOS-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433168#comment-15433168 ] 

Yan Xu commented on MESOS-5763:
-------------------------------

[~megha.sharma] contributed a test for this.

{noformat:title=}
commit a064505e411fe78a257e9b336a888f1eeddaa949
Author: Megha Sharma <ms...@apple.com>
Date:   Mon Aug 22 14:51:07 2016 -0700

    Added test to simulate slow/unresponsive fetch.
    
    Added test to simulate the scenario of slow/unresponsive HDFS leading
    to executor register timeout and verify that slave gets notified of the
    failure.
    
    Review: https://reviews.apache.org/r/50000/
{noformat}

> Task stuck in fetching is not cleaned up after --executor_registration_timeout.
> -------------------------------------------------------------------------------
>
>                 Key: MESOS-5763
>                 URL: https://issues.apache.org/jira/browse/MESOS-5763
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.28.0, 1.0.0, 0.29.0
>            Reporter: Yan Xu
>            Assignee: Yan Xu
>            Priority: Blocker
>             Fix For: 0.28.3, 1.0.0, 0.27.4
>
>
> When the fetching process hangs forever, for example because of HDFS issues, the Mesos containerizer attempts to destroy the container and kill the executor after {{--executor_registration_timeout}}. However, this reliably fails for us: the executor is killed by the launcher destroy and the container is destroyed, but the agent never finds out that the executor has terminated, which leaves the task in the STAGING state forever.
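
The failure mode described above can be illustrated with a small toy model. This is only a sketch and not Mesos code: `ToyContainer`, `run_task`, the `notify_agent` flag, and the `grace` parameter are all hypothetical names invented for the illustration, and "FAILED" is just a label meaning "the agent learned of the termination". It assumes the essence of the bug is that destroy kills the hung fetch without ever completing the termination notification the agent is waiting on.

```python
import asyncio


class ToyContainer:
    """Hypothetical stand-in for a container; not Mesos code."""

    def __init__(self):
        # The agent waits on this future to learn that the executor is gone.
        self.termination = asyncio.get_running_loop().create_future()
        self.fetch = asyncio.create_task(self._fetch())

    async def _fetch(self):
        # Simulates a fetch that never returns (e.g. unresponsive HDFS).
        await asyncio.Event().wait()

    async def destroy(self, notify_agent: bool):
        # Kill the in-flight fetch and wait for it to finish cancelling.
        self.fetch.cancel()
        await asyncio.gather(self.fetch, return_exceptions=True)
        # The step the reported bug effectively skips: completing the
        # termination notification so the agent can react.
        if notify_agent and not self.termination.done():
            self.termination.set_result("destroyed while fetching")


async def run_task(registration_timeout: float,
                   notify_agent: bool,
                   grace: float = 0.05) -> str:
    container = ToyContainer()
    # The executor never registers because the fetch is stuck.
    await asyncio.sleep(registration_timeout)
    await container.destroy(notify_agent)
    try:
        # shield() keeps the timeout from cancelling the termination future.
        await asyncio.wait_for(asyncio.shield(container.termination),
                               timeout=grace)
        return "FAILED"   # agent was told; the task can be cleaned up
    except asyncio.TimeoutError:
        return "STAGING"  # agent never found out; the task is stuck


if __name__ == "__main__":
    print(asyncio.run(run_task(0.01, notify_agent=False)))
    print(asyncio.run(run_task(0.01, notify_agent=True)))
```

With `notify_agent=False` the toy task stays in STAGING, which mirrors the report; with `notify_agent=True` the agent side observes the termination and can move the task out of STAGING. The real test referenced in the commit above exercises the actual containerizer and fetcher rather than a model like this.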


