Posted to issues@hive.apache.org by "mahesh kumar behera (JIRA)" <ji...@apache.org> on 2018/08/23 06:40:00 UTC

[jira] [Commented] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()

    [ https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589769#comment-16589769 ] 

mahesh kumar behera commented on HIVE-13704:
--------------------------------------------

[~ashutoshc] [~spena]

There seems to be a leak of the job object if we call run instead of execute. The cause is in DistCp's run method, which does not close the job it creates. As per this issue, the problem with calling execute is that setTargetPathExists is not called in execute. Can we do that, along with the other setup done in the run method, in Hive and then call distcp.execute instead of distcp.run?
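
To make the proposal concrete, here is a minimal, self-contained sketch of the calling pattern. It is a toy model only and does not use the Hadoop API: ToyCopyTool, Options, ToyJob and copyViaExecute are hypothetical names invented for illustration, and the behaviour of run (sets the flag, never closes the job) and execute (does not touch the flag) is modelled on what this comment and the issue description say, not on the actual DistCp source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy stand-in for a copy tool with a Tool-style run() and a lower-level execute(). Not Hadoop code. */
class ToyCopyTool {
    /** Hypothetical options holder; only the flag discussed in this issue is modelled. */
    static final class Options {
        final String targetPath;
        boolean targetPathExists; // stays false unless someone sets it
        Options(String targetPath) { this.targetPath = targetPath; }
    }

    /** Hypothetical job handle that records the flag it was created with and whether it was closed. */
    static final class ToyJob implements AutoCloseable {
        final boolean sawTargetPathExists;
        boolean closed;
        ToyJob(boolean sawTargetPathExists) { this.sawTargetPathExists = sawTargetPathExists; }
        @Override public void close() { closed = true; }
    }

    private final Set<String> existingPaths;
    final List<ToyJob> createdJobs = new ArrayList<>();

    ToyCopyTool(String... existingPaths) {
        this.existingPaths = new HashSet<>(Arrays.asList(existingPaths));
    }

    boolean exists(String path) { return existingPaths.contains(path); }

    /** Modelled run(): sets the target-exists flag, then executes, but drops the job without closing it. */
    int run(Options options) {
        options.targetPathExists = exists(options.targetPath);
        execute(options); // the returned job is never closed here (the leak described above)
        return 0;
    }

    /** Modelled execute(): creates and returns the job; it does not set the flag itself. */
    ToyJob execute(Options options) {
        ToyJob job = new ToyJob(options.targetPathExists);
        createdJobs.add(job);
        return job;
    }
}

class DistCpSketch {
    /** Proposed pattern: the caller does run()'s preparation itself, calls execute(), and closes the job. */
    static ToyCopyTool.ToyJob copyViaExecute(ToyCopyTool tool, String targetPath) {
        ToyCopyTool.Options options = new ToyCopyTool.Options(targetPath);
        options.targetPathExists = tool.exists(targetPath); // the step execute() skips
        try (ToyCopyTool.ToyJob job = tool.execute(options)) {
            return job; // closed by try-with-resources before the caller sees it
        }
    }

    public static void main(String[] args) {
        ToyCopyTool tool = new ToyCopyTool("/warehouse/t");

        tool.run(new ToyCopyTool.Options("/warehouse/t"));
        ToyCopyTool.ToyJob viaRun = tool.createdJobs.get(0);
        System.out.println("run():     flag=" + viaRun.sawTargetPathExists + " closed=" + viaRun.closed);

        ToyCopyTool.ToyJob viaExecute = copyViaExecute(tool, "/warehouse/t");
        System.out.println("execute(): flag=" + viaExecute.sawTargetPathExists + " closed=" + viaExecute.closed);
    }
}
```

In real code the caller-side preparation would have to mirror whatever the DistCp version in use does in run before it calls execute, which is the "other settings" part of the question above.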

//cc
 [~thejas] [~anishek] [~sankarh]

> Don't call DistCp.execute() instead of DistCp.run()
> ---------------------------------------------------
>
>                 Key: HIVE-13704
>                 URL: https://issues.apache.org/jira/browse/HIVE-13704
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Harsh J
>            Assignee: Sergio Peña
>            Priority: Critical
>             Fix For: 2.1.1, 2.2.0
>
>         Attachments: HIVE-13704.1.patch
>
>
> HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} method runs added logic that drives the state of {{SimpleCopyListing}} which runs in the driver, and of {{CopyCommitter}} which runs in the job runtime.
> When Hive ends up running DistCp for copy work (between non-matching filesystems or between encrypted and non-encrypted zones, for sizes above a configured value), this state not being set causes wrong paths to appear on the target (subdirectories named after the file, instead of just the file).
> Hive should call DistCp's Tool {{run}} method and not the {{execute}} method directly, to not skip the target exists flag that the {{setTargetPathExists}} call would set:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)