Posted to issues@hive.apache.org by "Sankar Hariappan (JIRA)" <ji...@apache.org> on 2017/09/29 16:58:00 UTC

[jira] [Comment Edited] (HIVE-16898) Validation of source file after distcp in repl load

    [ https://issues.apache.org/jira/browse/HIVE-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186070#comment-16186070 ] 

Sankar Hariappan edited comment on HIVE-16898 at 9/29/17 4:57 PM:
------------------------------------------------------------------

Added 8.patch with the following changes:
- Rebased against master.
- Fixed bugs in the handling of the FileNotFoundException flow after distCp.
- Some code clean-up.

*Note:* A couple of known issues are not handled here and will be tracked in a separate JIRA:
- The source file is changed twice during distCp so that its checksum after the copy matches the original, but the data actually copied was the intermediate version.
- If distCp fails with FileNotFoundException, it is assumed that no partially copied file exists in the destination. If the failure does leave partially copied data, the copy is still always redirected to read from the CM path, even if the source file exists.
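
For readers following the second point, here is a rough sketch of the fallback flow as described above. It is not the code from the patch: the `Copier` interface is a hypothetical stand-in for the distCp invocation, `cmPath` is assumed to be a location where a copy of the file is retained, and local `java.nio` paths are used instead of HDFS.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class CmFallbackCopy {
    /** Hypothetical stand-in for the distCp invocation. */
    public interface Copier {
        void copy(Path source, Path destination) throws IOException;
    }

    /**
     * Tries to copy from the source; if the source has disappeared, copies
     * from the CM path instead. Returns the path that was actually copied.
     * Assumption (as in the comment above): a failed first attempt leaves
     * no partially copied file in the destination.
     */
    public static Path copyWithCmFallback(Path source, Path cmPath,
                                          Path destination, Copier copier)
            throws IOException {
        try {
            copier.copy(source, destination);
            return source;
        } catch (FileNotFoundException | NoSuchFileException e) {
            // Source is gone: redirect the copy to read from the CM path.
            copier.copy(cmPath, destination);
            return cmPath;
        }
    }
}
```

The known issue above is exactly the case this sketch does not cover: if the failed attempt did leave partial data behind, the destination would need to be cleaned up before the second copy.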

Request [~thejas], [~anishek] to please review the same.
cc [~daijy]


was (Author: sankarh):
Added 8.patch with the following changes:
- Rebased against master.
- Fixed bugs in the handling of the FileNotFoundException flow after distCp.
- Some code clean-up.

Request [~thejas], [~anishek] to please review the same.
cc [~daijy]

> Validation of source file after distcp in repl load 
> ----------------------------------------------------
>
>                 Key: HIVE-16898
>                 URL: https://issues.apache.org/jira/browse/HIVE-16898
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.0.0
>            Reporter: anishek
>            Assignee: Sankar Hariappan
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16898.1.patch, HIVE-16898.2.patch, HIVE-16898.3.patch, HIVE-16898.4.patch, HIVE-16898.5.patch, HIVE-16898.6.patch, HIVE-16898.7.patch, HIVE-16898.8.patch
>
>
> Between deciding the source and destination paths for distcp and actually invoking distcp, the source file can change, so distcp might copy the wrong file to the destination. We should therefore add a check on the checksum of the source file path after distcp finishes, to make sure the file did not change during the copy process. If it has changed, take additional steps: delete the previously copied file on the destination, copy the new source, and repeat the same process until the correct file is copied.
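
The copy-validate-retry loop in the description could look roughly like the sketch below. This is only an illustration under stated assumptions, not the Hive implementation: `Copier` is a hypothetical stand-in for distcp, the checksum is a SHA-256 over local `java.nio` files rather than an HDFS file checksum, and the `maxAttempts` bound is an addition (the description itself repeats until the correct file is copied).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class VerifiedCopy {
    /** Hypothetical stand-in for the distcp invocation. */
    public interface Copier {
        void copy(Path source, Path destination) throws IOException;
    }

    /** SHA-256 of the whole file; assumed checksum for this sketch. */
    static byte[] checksum(Path file) throws IOException {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(Files.readAllBytes(file));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Copies source to destination and re-checks the source checksum after
     * the copy. If the source changed in the meantime, the copied file is
     * deleted and the copy is repeated. Returns the number of attempts used.
     */
    public static int copyWithValidation(Path source, Path destination,
                                         Copier copier, int maxAttempts)
            throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            byte[] before = checksum(source);
            copier.copy(source, destination);
            byte[] after = checksum(source);
            if (Arrays.equals(before, after)) {
                return attempt; // source unchanged across the copy
            }
            // Source changed during the copy: discard the copy and retry.
            Files.deleteIfExists(destination);
        }
        throw new IOException("Source kept changing after " + maxAttempts
                + " attempts: " + source);
    }
}
```

Note that comparing checksums before and after the copy cannot detect a source that changes twice and ends up with its original content, which is the first known issue listed in the comment above.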


