Posted to dev@hive.apache.org by "Sankar Hariappan (JIRA)" <ji...@apache.org> on 2017/06/14 18:49:00 UTC
[jira] [Created] (HIVE-16901) Distcp optimization - One distcp per ReplCopyTask
Sankar Hariappan created HIVE-16901:
---------------------------------------
Summary: Distcp optimization - One distcp per ReplCopyTask
Key: HIVE-16901
URL: https://issues.apache.org/jira/browse/HIVE-16901
Project: Hive
Issue Type: Sub-task
Components: Hive, repl
Affects Versions: 2.1.0
Reporter: Sankar Hariappan
Assignee: Sankar Hariappan
Fix For: 3.0.0
Currently, if a ReplCopyTask is created to copy a list of files, distcp is invoked once for each file. Instead, the whole list of source files should be passed to a single distcp invocation, which copies the files in parallel and should therefore give a significant performance gain.
If the copy of the file list fails, traverse the destination directory to find which files are missing or have mismatched checksums, then trigger a copy of those files one by one.
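
The proposed flow (one bulk copy, then a per-file fallback for whatever is missing or mismatched) could look roughly like the sketch below. It is only an illustration on a local file system, not Hive or distcp code: the names copy_file_list, files_needing_copy, file_checksum and local_single_copy are hypothetical, the bulk copier is passed in as a callable standing in for a single distcp run, a failure is assumed to surface as an exception, SHA-256 stands in for whatever checksum the file system provides, and the destination is assumed to be a flat directory with unique file names.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable, Iterable, List


def file_checksum(path: Path) -> str:
    """Hex SHA-256 of a file's contents (assumed stand-in for a file system checksum)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_needing_copy(sources: Iterable[Path], dest_dir: Path) -> List[Path]:
    """Return the sources that are missing in dest_dir or whose checksum differs."""
    pending = []
    for src in sources:
        dst = dest_dir / src.name
        if not dst.exists() or file_checksum(dst) != file_checksum(src):
            pending.append(src)
    return pending


def local_single_copy(src: Path, dest_dir: Path) -> None:
    """Copy one file into dest_dir (hypothetical per-file fallback copier)."""
    shutil.copy2(src, dest_dir / src.name)


def copy_file_list(
    sources: Iterable[Path],
    dest_dir: Path,
    bulk_copy: Callable[[List[Path], Path], None],
    single_copy: Callable[[Path, Path], None] = local_single_copy,
) -> List[Path]:
    """Attempt one bulk copy of all sources; on failure, re-copy only bad files.

    Returns the list of files that had to be copied individually.
    """
    sources = list(sources)
    try:
        # One invocation for the whole list instead of one per file.
        bulk_copy(sources, dest_dir)
        return []
    except Exception:
        # Assumption: a failed bulk copy raises. Inspect the destination and
        # retry only the files that are missing or have a checksum mismatch.
        retried = files_needing_copy(sources, dest_dir)
        for src in retried:
            single_copy(src, dest_dir)
        return retried
```

Files that the failed bulk copy already transferred correctly are skipped during the fallback, so only the remainder pays the per-file cost.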
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)