Posted to issues@mesos.apache.org by "Adam B (JIRA)" <ji...@apache.org> on 2014/11/07 11:47:35 UTC

[jira] [Updated] (MESOS-1667) Extract from URI while downloading into work dir

     [ https://issues.apache.org/jira/browse/MESOS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MESOS-1667:
--------------------------
    Component/s: fetcher

> Extract from URI while downloading into work dir
> ------------------------------------------------
>
>                 Key: MESOS-1667
>                 URL: https://issues.apache.org/jira/browse/MESOS-1667
>             Project: Mesos
>          Issue Type: Improvement
>          Components: fetcher, slave
>    Affects Versions: 0.20.0
>         Environment: Every
>            Reporter: Bernd Mathiske
>              Labels: features, performance
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When the fetcher downloads an extractable archive, e.g. a tar file, it currently downloads the whole archive and only then starts extracting from it. But only the extracted result is needed for execution, so the disk space used for the downloaded copy of the archive is wasted. This can become critical for large archives.
> The general idea for solving this issue is to extract while downloading, without storing intermediate results on disk. Possibly, this can be achieved by arranging process pipes or by streaming the data through some extraction library code.
> However, with this approach the archive may have to be downloaded again every time it is needed, whereas with the existing but not yet committed patch for MESOS-336 (https://reviews.apache.org/r/21316/), the fetcher cache could just repeat the extraction without downloading more than once. Thus choosing in-stream extraction might result in an overall performance loss. We should therefore give users extra options in CommandInfo.URI to choose how to handle this.
> In some cases, it could be possible to reuse the extracted assets directly, also forgoing the repeated extraction. This could be handled with symlinks. Then extraction can happen during downloading, and neither repeated downloading nor repeated extraction occurs. The user has to be aware of the safety issue, though: any post-extraction modifications to the downloaded assets are visible to subsequent tasks. So an explicit flag in CommandInfo.URI is called for here as well.
> Ideally, this issue would be solved as a follow-up of MESOS-336, because some of the described benefits depend on it.
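
The in-stream extraction idea from the description can be illustrated with a minimal sketch. This is not the Mesos fetcher (which is written in C++) and not a proposed patch; it only shows the technique using Python's standard tarfile module in stream mode. The names ForwardOnlyStream and extract_while_downloading are hypothetical, and the in-memory stream stands in for a network download.

```python
import io
import tarfile
from pathlib import Path


class ForwardOnlyStream(io.RawIOBase):
    """Non-seekable byte stream; a stand-in for a network response body."""

    def __init__(self, payload: bytes) -> None:
        super().__init__()
        self._buffer = io.BytesIO(payload)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False

    def readinto(self, target) -> int:
        return self._buffer.readinto(target)


def extract_while_downloading(stream, work_dir: Path) -> list:
    """Extract a tar stream into work_dir without storing the archive itself.

    Hypothetical helper: returns the names of the extracted members.
    """
    extracted = []
    # Mode "r|*" reads the archive strictly front to back (no seeking) and
    # detects the compression transparently, so the source can be a pipe or
    # a socket instead of a file on disk.
    with tarfile.open(fileobj=stream, mode="r|*") as archive:
        for member in archive:
            # In stream mode each member must be extracted as it is
            # encountered, because the stream cannot be rewound.
            archive.extract(member, path=work_dir)
            extracted.append(member.name)
    return extracted
```

Only the extracted files end up in the work directory; the archive bytes exist only in the read buffer. A real implementation would also have to validate member paths from untrusted archives, which this sketch leaves out.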

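The symlink-based reuse of extracted assets, and the safety issue that comes with it, can be sketched in the same way. This assumes a POSIX system where unprivileged symlinks are allowed; link_cached_assets and the directory layout are hypothetical and do not reflect the actual fetcher cache design.

```python
import os
from pathlib import Path


def link_cached_assets(cache_dir: Path, sandbox_dir: Path) -> list:
    """Symlink each top-level entry of an extracted cache entry into a sandbox.

    Hypothetical helper: nothing is downloaded or extracted again, but every
    sandbox linked this way shares the same underlying files.
    """
    sandbox_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for entry in sorted(cache_dir.iterdir()):
        link = sandbox_dir / entry.name
        # Use an absolute target so the link stays valid wherever the
        # sandbox directory is located.
        os.symlink(entry.resolve(), link, target_is_directory=entry.is_dir())
        links.append(link)
    return links
```

Because the links point at shared files, a task that writes through its link changes what every later task sees, which is why the description asks for an explicit opt-in flag.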


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)