Posted to dev@mesos.apache.org by "Bernd Mathiske (JIRA)" <ji...@apache.org> on 2014/04/14 20:49:23 UTC

[jira] [Commented] (MESOS-336) Mesos slave should cache executors

    [ https://issues.apache.org/jira/browse/MESOS-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968660#comment-13968660 ] 

Bernd Mathiske commented on MESOS-336:
--------------------------------------

I suggest the following approach. All URI content gets downloaded into a per-slave fetcher result cache directory (short: fetch dir) instead of a per-executor work dir. Extraction of archives (e.g. *.tgz files) also happens per slave, inside the fetch dir. The extracted resources are then soft-linked into each executor's work dir.

How do we handle different users and the chmod-ing for them? There is a separate fetch subdir for each fetched URI/user combination. In the case of an archive, we extract and chmod once per user. If it's not an archive, we make a copy and chmod once per user. In any case, we download each URI only once, regardless of user settings.
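
To make the layout above concrete, here is a minimal Python sketch. It is not Mesos code: the function names, the "downloads"/"users" directory names, the hashed subdir names and the `download` callback are all assumptions made up for illustration. It only shows the non-archive case (copy plus chmod per user); an archive would be extracted at the marked spot instead.

```python
import hashlib
import os
import shutil


def _key(*parts):
    # Hypothetical naming scheme: a short hash of the identifying strings.
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()[:16]


def fetch(fetch_dir, uri, user, download):
    """Return the per-user cached copy of `uri`, downloading at most once.

    `download(uri, dest)` is a caller-supplied function (an assumption of
    this sketch) that writes the URI's content to `dest`.
    """
    name = os.path.basename(uri) or "resource"

    # One raw download per URI, shared by all users on this slave.
    raw = os.path.join(fetch_dir, "downloads", _key(uri), name)
    if not os.path.exists(raw):
        os.makedirs(os.path.dirname(raw), exist_ok=True)
        download(uri, raw)

    # One fetch subdir per URI/user combination.
    per_user = os.path.join(fetch_dir, "users", _key(uri, user), name)
    if not os.path.exists(per_user):
        os.makedirs(os.path.dirname(per_user), exist_ok=True)
        shutil.copyfile(raw, per_user)  # an archive would be extracted here
        os.chmod(per_user, 0o500)       # real code would also chown to `user`
    return per_user


def link_into_work_dir(cached_path, work_dir):
    """Soft-link a cached resource into an executor's work dir."""
    os.makedirs(work_dir, exist_ok=True)
    link = os.path.join(work_dir, os.path.basename(cached_path))
    if not os.path.lexists(link):
        os.symlink(cached_path, link)
    return link
```

Calling fetch() twice for the same URI with two different users yields two distinct per-user paths but triggers only one call to `download`.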

The main problem I am facing now is keeping track of which URIs have been downloaded and which fetch subdir each one ended up in. This info needs to be kept at least for the lifetime of the slave process. (There is no need to go beyond that: if a slave fails, we can simply wipe the entire fetch cache on recovery.) It would be simpler, and make for less fragile source code, if the fetcher were part of the slave program rather than a separate program. But I reckon we can still keep the required state in the slave's memory and use it to direct invocations of the fetcher program. We then have to be careful to keep what the fetcher does and what the slave knows in sync, though.
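
As a rough illustration of the bookkeeping described here, the slave-side state could be as small as an in-memory map that is discarded, together with the cache directory, on recovery. The class and method names below are hypothetical and the sketch is in Python rather than the slave's own language; it only shows the idea of recording an entry after the fetcher program has succeeded and wiping everything on restart.

```python
import os
import shutil


class FetchCacheIndex:
    """In-memory map from (uri, user) to fetch subdir.

    Nothing is written to disk, so the index lives only as long as the
    slave process does.
    """

    def __init__(self, fetch_dir):
        self.fetch_dir = fetch_dir
        self._entries = {}

    def recover(self):
        # The index did not survive the restart, so the cached files can
        # no longer be trusted: wipe the whole fetch dir and start empty.
        shutil.rmtree(self.fetch_dir, ignore_errors=True)
        os.makedirs(self.fetch_dir)
        self._entries.clear()

    def lookup(self, uri, user):
        # Returns None on a cache miss, i.e. the fetcher must be invoked.
        return self._entries.get((uri, user))

    def record(self, uri, user, subdir):
        # Call only after the fetcher program has reported success, so the
        # index never points at a partially populated subdir.
        self._entries[(uri, user)] = subdir
```

Recording entries only after a successful fetcher run is one way to keep the two sides in sync; a failed or interrupted fetch then simply looks like a cache miss on the next lookup.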


> Mesos slave should cache executors
> ----------------------------------
>
>                 Key: MESOS-336
>                 URL: https://issues.apache.org/jira/browse/MESOS-336
>             Project: Mesos
>          Issue Type: Improvement
>          Components: slave
>            Reporter: brian wickman
>            Assignee: Bernd Mathiske
>              Labels: newbie
>
> The slave should be smarter about how it handles pulling down executors. In our environment, executors rarely change, but the slave will always pull them down from HDFS regardless. This puts undue stress on our HDFS clusters, and is not resilient to reduced HDFS availability.


