Posted to issues@hive.apache.org by "Lefty Leverenz (JIRA)" <ji...@apache.org> on 2016/01/03 11:32:39 UTC

[jira] [Commented] (HIVE-11940) "INSERT OVERWRITE" query is very slow because it creates one "distcp" per file to copy data from staging directory to target directory

    [ https://issues.apache.org/jira/browse/HIVE-11940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080389#comment-15080389 ] 

Lefty Leverenz commented on HIVE-11940:
---------------------------------------

[~prasanth_j] committed this to branch-1 on Dec. 8, 2015 (commit 445ed86f2b51bdcf8beed5291b1eb11be4fd2b61), so Fix Version/s should include 1.3.0.

> "INSERT OVERWRITE" query is very slow because it creates one "distcp" per file to copy data from staging directory to target directory
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11940
>                 URL: https://issues.apache.org/jira/browse/HIVE-11940
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.1
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>             Fix For: 2.0.0
>
>         Attachments: HIVE-11940.1.patch, HIVE-11940.2.patch
>
>
> When hive.exec.stagingdir is set to ".hive-staging", the staging directory is placed under the target directory. When an "INSERT OVERWRITE" query runs, Hive then grabs all files under the staging directory and copies them ONE BY ONE to the target directory.
> When hive.exec.stagingdir is set to "/tmp/hive", Hive simply does a RENAME operation, which is instant.
> This happens with files that are not encrypted. 
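
The two configurations contrasted in the description can be sketched as follows. This is only an illustration of the settings named in the report, assuming a Hive session; the table names target_table and source_table are hypothetical and do not appear in the issue.

```sql
-- Case 1 (slow, per the report): staging directory placed under the target directory.
-- Hive copies the staged files to the target directory one by one.
SET hive.exec.stagingdir=.hive-staging;
INSERT OVERWRITE TABLE target_table SELECT * FROM source_table;  -- hypothetical tables

-- Case 2 (fast, per the report): staging directory outside the target directory.
-- Hive moves the staged files with a single RENAME operation.
SET hive.exec.stagingdir=/tmp/hive;
INSERT OVERWRITE TABLE target_table SELECT * FROM source_table;  -- hypothetical tables
```

Whether the second setting is a suitable workaround on a given cluster is not stated in the report; it describes only the observed difference in behavior.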



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)