Posted to issues@hive.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2016/07/11 23:14:10 UTC

[jira] [Commented] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()

    [ https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371868#comment-15371868 ] 

Ashutosh Chauhan commented on HIVE-13704:
-----------------------------------------

+1

> Don't call DistCp.execute() instead of DistCp.run()
> ---------------------------------------------------
>
>                 Key: HIVE-13704
>                 URL: https://issues.apache.org/jira/browse/HIVE-13704
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Harsh J
>            Assignee: Sergio Peña
>            Priority: Critical
>         Attachments: HIVE-13704.1.patch
>
>
> HIVE-11607 switched Hive's DistCp invocation from {{run}} to {{execute}}. The {{run}} method performs additional logic that drives the state of {{SimpleCopyListing}}, which runs in the driver, and of {{CopyCommitter}}, which runs in the job runtime.
> When Hive runs DistCp for copy work (between non-matching filesystems, or between encrypted and non-encrypted zones, for sizes above a configured threshold), this state is never set, which causes wrong paths to appear on the target (subdirectories named after the file instead of just the file).
> Hive should call DistCp's Tool {{run}} method rather than calling {{execute}} directly, so that the target-exists flag set by the {{setTargetPathExists}} call is not skipped:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126
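To make the failure mode concrete, here is a minimal stdlib-only Java model. It is not Hadoop code: the classes DistCpRunVsExecute and ToyDistCp, their fields, and the single-file path rule are hypothetical simplifications. The model assumes the target-exists flag holds a default of true unless setTargetPathExists runs, and it reduces the SimpleCopyListing path logic to the case of one source file. Under those assumptions, skipping run() makes execute() treat a missing target as an existing directory, so the file ends up inside a subdirectory named after itself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo driver; listed first so single-file launch finds main().
class DistCpRunVsExecute {
    public static void main(String[] args) {
        // The target path does not exist yet on the simulated filesystem.
        Set<String> existing = new HashSet<>();

        // Through run(): the flag is derived first, so the file lands at the target.
        String viaRun = new ToyDistCp(existing, "/src/data.txt", "/dest/data.txt").run();

        // Calling execute() directly: the flag keeps its assumed default of true.
        String viaExecute = new ToyDistCp(existing, "/src/data.txt", "/dest/data.txt").execute();

        System.out.println("run():     " + viaRun);      // run():     /dest/data.txt
        System.out.println("execute(): " + viaExecute);  // execute(): /dest/data.txt/data.txt

        // Copying into a directory that already exists gives the same result either way.
        Set<String> withDir = new HashSet<>(Arrays.asList("/dest"));
        System.out.println("into dir:  " + new ToyDistCp(withDir, "/src/data.txt", "/dest").run());
    }
}

// Hypothetical stand-in for DistCp; only the run/execute split mirrors the issue.
class ToyDistCp {
    private final Set<String> existingPaths; // paths present on the simulated target FS
    private final String source;             // a single source file
    private final String target;             // requested destination path

    // Assumption: the flag reads as true until something explicitly sets it.
    private boolean targetPathExists = true;

    ToyDistCp(Set<String> existingPaths, String source, String target) {
        this.existingPaths = existingPaths;
        this.source = source;
        this.target = target;
    }

    // Shaped like Tool.run(): derive driver-side state, then delegate to execute().
    String run() {
        setTargetPathExists();
        return execute();
    }

    // Shaped like the setTargetPathExists() call named in the issue.
    private void setTargetPathExists() {
        targetPathExists = existingPaths.contains(target);
    }

    // Shaped like execute(): trusts whatever flag value is already set and
    // returns the final path the source file would be copied to.
    String execute() {
        String fileName = source.substring(source.lastIndexOf('/') + 1);
        // Simplified listing rule: an existing target is treated as a directory that
        // receives the file; a missing target becomes the copied file itself.
        return targetPathExists ? target + "/" + fileName : target;
    }
}
```

In this model, run() is the only code path that computes the flag, which is the point of routing Hive's copy through the Tool entry point rather than execute().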



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)