Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/03/14 20:38:13 UTC

[jira] [Created] (MESOS-396) Slave GarbageCollector needs to delete the parent executor directories. It currently only deletes the executor run directories.

Benjamin Mahler created MESOS-396:
-------------------------------------

             Summary: Slave GarbageCollector needs to delete the parent executor directories. It currently only deletes the executor run directories.
                 Key: MESOS-396
                 URL: https://issues.apache.org/jira/browse/MESOS-396
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Mahler
            Priority: Blocker


As a result, long-lived slaves accumulate a large number of empty executor directories. All that remains in each of these directories is a broken link to the 'latest' run.

Over time, as the number of empty executor directories on a slave approaches LINK_MAX, mkdir fails and the slave crashes, as was found in MESOS-391.
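
For illustration only, the remaining room under a parent directory can be estimated by comparing its link count with the filesystem's LINK_MAX. This is a minimal Python sketch, not Mesos code: the helper name subdir_headroom is hypothetical, and it assumes a POSIX system whose filesystem counts each subdirectory as one link on the parent (some filesystems do not, in which case the result is not meaningful).

```python
import os

def subdir_headroom(parent):
    """Estimate how many more subdirectories `parent` can hold.

    Hypothetical helper: assumes each subdirectory adds one hard link
    to `parent`, which holds on ext-style filesystems but not on all.
    """
    # Maximum link count the filesystem reports for this path.
    link_max = os.pathconf(parent, "PC_LINK_MAX")
    # Current link count of the parent directory.
    nlink = os.stat(parent).st_nlink
    return link_max - nlink
```

A value near zero would indicate that the next mkdir under that directory is likely to fail.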

The fix is to schedule the parent executor directories for deletion as well. However, the GC module does not know whether a parent executor directory can be deleted, because more tasks may have been launched with the same executor id after the directory was scheduled for deletion.
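
One possible way to sidestep that uncertainty is to re-check the directory at deletion time rather than at scheduling time. The Python sketch below is an assumption for illustration, not the fix adopted in Mesos: the function name remove_executor_dir_if_unused is hypothetical, and it assumes the run directories and the 'latest' link sit directly under the executor directory.

```python
import os

def remove_executor_dir_if_unused(executor_dir):
    """Remove executor_dir only if nothing but a broken 'latest' link remains.

    Returns True if the directory was removed, False otherwise.
    Hypothetical helper; the directory layout is an assumption.
    """
    entries = os.listdir(executor_dir)
    # Any entry other than 'latest' means a run directory still exists,
    # for example because a new task reused the same executor id.
    if any(name != "latest" for name in entries):
        return False
    latest = os.path.join(executor_dir, "latest")
    if "latest" in entries:
        # os.path.exists follows symlinks, so it is False for a broken link.
        if not os.path.islink(latest) or os.path.exists(latest):
            return False
        os.unlink(latest)
    try:
        # rmdir refuses to remove a non-empty directory, so a run that
        # appears between the check and this call is not destroyed.
        os.rmdir(executor_dir)
    except OSError:
        return False
    return True
```

The check and the removal are not atomic, so a real implementation would still need to coordinate with whatever code launches executors.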

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira