Posted to issues@hive.apache.org by "wan kun (JIRA)" <ji...@apache.org> on 2017/11/04 03:55:00 UTC

[jira] [Commented] (HIVE-17574) Avoid multiple copies of HDFS-based jars when localizing job-jars

    [ https://issues.apache.org/jira/browse/HIVE-17574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238753#comment-16238753 ] 

wan kun commented on HIVE-17574:
--------------------------------

Hi [~mithun], [~cdrome]:
I have a few questions and would appreciate your advice:
1. In MapReduce jobs, tmpJars has a similar problem. I think we could also reference the tmpJars files on HDFS directly.
2. For the destFS.copyFromLocalFile method in the Tez DagUtils class: if the source file system and the target file system are both HDFS, could the upload be skipped? Then the jars would not need to be uploaded again when MR jobs are submitted. (See the rough sketch after these questions.)
3. Could we set the resources' visibility to PUBLIC, so that they are downloaded only once by the NodeManager?
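
To make (2) and (3) more concrete, here is a rough sketch of what I have in mind, written against the standard Hadoop FileSystem and YARN APIs. The class and method names below are only illustrative, not the actual DagUtils code:

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class ResourceLocalizationSketch {

  // (2) Copy only when the resource is not already on the destination
  // file system; an HDFS-based jar is reused in place.
  static Path maybeCopy(Path src, Path destDir, FileSystem destFs, Configuration conf)
      throws IOException {
    FileSystem srcFs = src.getFileSystem(conf);
    if (srcFs.getUri().equals(destFs.getUri())) {
      return src;                          // already on the target HDFS: skip the upload
    }
    Path dest = new Path(destDir, src.getName());
    destFs.copyFromLocalFile(src, dest);   // genuinely local file: upload once
    return dest;
  }

  // (3) Register the (possibly reused) path with PUBLIC visibility so the
  // NodeManager downloads it once and shares it across containers.
  static LocalResource toPublicResource(Path path, FileSystem fs) throws IOException {
    FileStatus status = fs.getFileStatus(path);
    return LocalResource.newInstance(
        ConverterUtils.getYarnUrlFromPath(path),
        LocalResourceType.FILE,
        LocalResourceVisibility.PUBLIC,
        status.getLen(),
        status.getModificationTime());
  }
}
{code}

(As far as I know, for PUBLIC visibility the files and their parent directories on HDFS also need to be world-readable; otherwise the NodeManager rejects them from the public cache.)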

Thank you

> Avoid multiple copies of HDFS-based jars when localizing job-jars
> -----------------------------------------------------------------
>
>                 Key: HIVE-17574
>                 URL: https://issues.apache.org/jira/browse/HIVE-17574
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0, 3.0.0, 2.4.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Chris Drome
>            Priority: Major
>         Attachments: HIVE-17574.1-branch-2.2.patch, HIVE-17574.1-branch-2.patch, HIVE-17574.1.patch, HIVE-17574.2.patch
>
>
> Raising this on behalf of [~selinazh]. (For my own reference: YHIVE-1035.)
> This has to do with the classpaths of Hive actions run from Oozie, and affects scripts that add jars/resources from HDFS locations.
> As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) tend to be stored in HDFS paths, as are any custom user-libraries used in workflows. An {{ADD JAR|FILE|ARCHIVE}} statement in a Hive script causes the following steps to occur:
> # Files are downloaded from HDFS to local temp dir.
> # UDFs are resolved/validated.
> # All jars/files, including those just downloaded from HDFS, are shipped right back to HDFS-based scratch-directories, for job submission.
> For HDFS-based files, this is wasteful and time-consuming. #3 above should skip shipping HDFS-based resources, and add those directly to the Tez session.
> We have a patch that's being used internally at Yahoo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)