Posted to yarn-issues@hadoop.apache.org by "Miklos Szegedi (JIRA)" <ji...@apache.org> on 2017/12/23 00:09:00 UTC

[jira] [Comment Edited] (YARN-2185) Use pipes when localizing archives

    [ https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16302123#comment-16302123 ] 

Miklos Szegedi edited comment on YARN-2185 at 12/23/17 12:08 AM:
-----------------------------------------------------------------

Attaching my suggestion for how to solve this. The code streams the archive from HDFS as standard input to the tar and gzip commands, and it handles Windows as well. In addition, I create temporary files with permissions 700 instead of 755. I do not create any additional temporary directories for extraction; one is enough. One difference is that I use the jar command for zip files as well, so that Windows is handled properly. I also added a switch that disables the modification time check when -1 is specified as the timestamp. Finally, I copy files in parallel when localizing directories, to take advantage of the distributed storage in HDFS.
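A minimal sketch of the piping idea, assuming Java 11+ and a POSIX file system. The archive stream is copied straight into the standard input of an external unpacking command, so no local copy of the archive is written. In the NodeManager that stream would come from opening the file in HDFS; here it is any `InputStream`. The extraction directory is created with permissions 700, and `timestampMatches` treats `-1` as "skip the modification time check", mirroring the switch described above. `PipedUnpacker` and all of its methods are hypothetical names, not code from the YARN-2185 patch, and Windows handling is not covered.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.List;

// Hypothetical helper (not the YARN-2185 patch): unpack an archive by
// piping its bytes into a command's stdin instead of saving it first.
final class PipedUnpacker {
    private PipedUnpacker() {}

    /** Creates an extraction directory readable only by the owner (700). */
    static Path createPrivateDir(Path parent, String prefix) throws IOException {
        return Files.createTempDirectory(parent, prefix,
                PosixFilePermissions.asFileAttribute(
                        PosixFilePermissions.fromString("rwx------")));
    }

    /**
     * Runs {@code command} in {@code workDir} and copies {@code in} to its stdin.
     * stdout goes to {@code stdout} (a file or the parent), never to an unread
     * pipe, so the child cannot block on a full stdout buffer while we write.
     * Returns the command's exit code.
     */
    static int pipeInto(InputStream in, List<String> command, File workDir,
                        ProcessBuilder.Redirect stdout)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .directory(workDir)
                .redirectOutput(stdout)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        try (OutputStream stdin = p.getOutputStream()) {
            in.transferTo(stdin); // closing stdin signals end of input
        } catch (IOException e) {
            p.destroyForcibly(); // e.g. the command exited early (broken pipe)
            throw e;
        }
        return p.waitFor();
    }

    /** Assumed semantics: an expected timestamp of -1 disables the check. */
    static boolean timestampMatches(long expected, long actual) {
        return expected == -1L || expected == actual;
    }
}
```

For a gzipped tarball, a caller might use `pipeInto(stream, List.of("tar", "-xzf", "-"), dir.toFile(), ProcessBuilder.Redirect.INHERIT)`, where `-f -` tells tar to read the archive from standard input.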


was (Author: miklos.szegedi@cloudera.com):
Attaching my suggestion for how to solve this. The code streams the archive from HDFS as standard input to the tar and gzip commands, and it handles Windows as well. In addition, I create temporary files with permissions 700 instead of 755. I do not create any additional temporary directories for extraction; one is enough. One difference is that I use the jar command for zip files as well, so that Windows is handled properly. I also added a switch that disables the modification time check when -1 is specified as the timestamp.

> Use pipes when localizing archives
> ----------------------------------
>
>                 Key: YARN-2185
>                 URL: https://issues.apache.org/jira/browse/YARN-2185
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.4.0
>            Reporter: Jason Lowe
>            Assignee: Miklos Szegedi
>         Attachments: YARN-2185.000.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, and then removes it.  It would be more efficient to stream the data as it's being unpacked to avoid both the extra disk space requirements and the additional disk activity from storing the archive.
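The streaming idea in the description can also be illustrated without external processes. The following is a minimal sketch, assuming Java 11+, that uses `java.util.zip.ZipInputStream` to extract entries as bytes arrive from any `InputStream`, so the archive is never stored locally. This is a swapped-in pure-Java technique for illustration only: per the comment above, the attached patch hands zip files to the jar command instead. `StreamingUnzip` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Illustration only (not the YARN-2185 patch): unpack a ZIP while reading it.
final class StreamingUnzip {
    private StreamingUnzip() {}

    /** Extracts every entry read from {@code in} into {@code destDir}; returns the entry count. */
    static int unzip(InputStream in, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        int count = 0;
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // Reject names such as "../x" or absolute paths that escape destDir.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Reads only the current entry's bytes; does not close zin.
                    Files.copy(zin, target, StandardCopyOption.REPLACE_EXISTING);
                }
                count++;
            }
        }
        return count;
    }
}
```

Because each entry is written as soon as it is decoded, peak local disk usage is the size of the extracted contents alone, which is the saving the description asks for.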



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)