Posted to issues@hive.apache.org by "Mithun Radhakrishnan (JIRA)" <ji...@apache.org> on 2017/10/02 18:34:00 UTC

[jira] [Commented] (HIVE-17574) Avoid multiple copies of HDFS-based jars when localizing job-jars

    [ https://issues.apache.org/jira/browse/HIVE-17574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188584#comment-16188584 ] 

Mithun Radhakrishnan commented on HIVE-17574:
---------------------------------------------

I'm +1, but I'd like to be sure that there's no adverse effect on, say, LLAP execution because of this fix. Could [~thejas] or [~gopalv] take a look?

For reference, the point of this is to prevent user-jars from Oozie workflows from being copied several times between local FS and HDFS.

> Avoid multiple copies of HDFS-based jars when localizing job-jars
> -----------------------------------------------------------------
>
>                 Key: HIVE-17574
>                 URL: https://issues.apache.org/jira/browse/HIVE-17574
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0, 3.0.0, 2.4.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Chris Drome
>         Attachments: HIVE-17574.1-branch-2.2.patch, HIVE-17574.1-branch-2.patch, HIVE-17574.1.patch
>
>
> Raising this on behalf of [~selinazh]. (For my own reference: YHIVE-1035.)
> This has to do with the classpaths of Hive actions run from Oozie, and affects scripts that add jars/resources from HDFS locations.
> As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) tend to be stored in HDFS paths, as are any custom user-libraries used in workflows. An {{ADD JAR|FILE|ARCHIVE}} statement in a Hive script causes the following steps to occur:
> # Files are downloaded from HDFS to local temp dir.
> # UDFs are resolved/validated.
> # All jars/files, including those just downloaded from HDFS, are shipped right back to HDFS-based scratch-directories, for job submission.
> For HDFS-based files, this is wasteful and time-consuming. #3 above should skip shipping HDFS-based resources, and add those directly to the Tez session.
> We have a patch that's being used internally at Yahoo.
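
For illustration only, the change proposed for step #3 in the description above can be sketched as a partition of the session's resources into those already on HDFS and those that still need uploading. This is not the attached patch: the class name, method names, and sample paths are hypothetical, and the sketch uses only java.net.URI rather than Hadoop's FileSystem API. It also assumes a resource is HDFS-based only when its URI carries an explicit hdfs scheme, whereas a real implementation would likely have to resolve scheme-less paths against the configured default filesystem.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the HIVE-17574 patch.
class ResourcePartitioner {

    /** Holds the two groups of resources after the split. */
    static final class Partition {
        // Resources that could be handed to the job as-is (no re-upload).
        final List<String> alreadyOnHdfs = new ArrayList<>();
        // Resources that still have to be shipped to the scratch directory.
        final List<String> needsUpload = new ArrayList<>();
    }

    /**
     * Assumption: only an explicit "hdfs" scheme counts as HDFS-based.
     * Scheme-less paths are treated as local in this sketch.
     * URI.create throws IllegalArgumentException for malformed input
     * (for example, unescaped spaces).
     */
    static boolean isHdfs(String resource) {
        String scheme = URI.create(resource).getScheme();
        return scheme != null && scheme.equalsIgnoreCase("hdfs");
    }

    /** Splits resources so that HDFS-based ones skip the upload step. */
    static Partition partition(List<String> resources) {
        Partition p = new Partition();
        for (String r : resources) {
            if (isHdfs(r)) {
                p.alreadyOnHdfs.add(r);
            } else {
                p.needsUpload.add(r);
            }
        }
        return p;
    }
}
```

With an input such as hdfs://nn:8020/libs/udf.jar and file:///tmp/local.jar (both made-up paths), only the second would be queued for upload; the first would be passed straight through to the session.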



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)