Posted to issues@hive.apache.org by "Sankar Hariappan (JIRA)" <ji...@apache.org> on 2017/07/14 21:11:00 UTC

[jira] [Comment Edited] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.

    [ https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088069#comment-16088069 ] 

Sankar Hariappan edited comment on HIVE-16990 at 7/14/17 9:10 PM:
------------------------------------------------------------------

Added 01.patch with the following updates:
- Setting the current repl state in TableSerializer and PartitionSerializer is now limited to bootstrap dump. For incremental dump, the repl state is set by load.
- Repl load tracks the modified metadata objects using the new UpdatedMetadataTracker object, which replaces the dbsUpdated and tablesUpdated maps.
- Added alter tasks to update the current repl state of the updated metadata objects. These alter tasks are added after each event is applied, which increases the number of tasks per event. As a result, the overall execution time of the replication test cases has also increased. Will try to optimise this later.
- ReplCopyTasks now throw an error if any of the listed files is missing from both the original path and the cmpath. Corrected the test cases to handle this failure case.
- Removed unused or dead code wherever found.
- Added a new test case to verify the repl status on failure and to ensure that a retry of the failed dump works after the fix.
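To illustrate the per-event ordering described in the second and third points, here is a minimal Python sketch. UpdatedMetadataTracker borrows its name from the patch, but its fields, the alter_tasks and plan_event helpers, and the task tuples are hypothetical stand-ins, not Hive APIs. The partition, then table, then database ordering of the alter tasks is also an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Task = Tuple[str, str, str]  # hypothetical shape: (kind, target, repl_id)


@dataclass
class UpdatedMetadataTracker:
    """Hypothetical model of the tracker: the objects one event touched."""
    repl_id: str
    db_name: str
    table_name: Optional[str] = None
    partitions: List[str] = field(default_factory=list)


def alter_tasks(tracker: UpdatedMetadataTracker) -> List[Task]:
    """One alter task per updated object, stamping the applied event ID as
    its repl state. Assumed order: partitions, then table, then database."""
    tasks: List[Task] = []
    if tracker.table_name is not None:
        table = f"{tracker.db_name}.{tracker.table_name}"
        for part in tracker.partitions:
            tasks.append(("alter_partition", f"{table}/{part}", tracker.repl_id))
        tasks.append(("alter_table", table, tracker.repl_id))
    tasks.append(("alter_database", tracker.db_name, tracker.repl_id))
    return tasks


def plan_event(event_tasks: List[Task], tracker: UpdatedMetadataTracker) -> List[Task]:
    # The event's own tasks carry no repl ID; the repl-state updates are
    # appended so they run only after all of the event's tasks.
    return list(event_tasks) + alter_tasks(tracker)
```

For one event that adds partition dt=2 to sales.orders, plan_event yields the event's own task followed by three alter tasks, which shows why the task count per event grows.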
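The stricter ReplCopyTasks check can be sketched in Python as follows. This is an illustration under assumptions: the cmpath is modeled as a flat directory that keeps files under their original names (the real change-management layout may differ), and copy_listed_files and MissingReplFileError are hypothetical names. Resolving every file before copying anything is a design choice of this sketch, so a missing file leaves the destination untouched.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


class MissingReplFileError(Exception):
    """Hypothetical error: a listed file is in neither the original path nor cmpath."""


def copy_listed_files(names: Iterable[str], orig_dir: Path, cm_dir: Path,
                      dest_dir: Path) -> List[Path]:
    """Copy each listed file from orig_dir, falling back to cm_dir.

    Assumes flat file names and a cmpath that keeps original names.
    """
    sources: List[Path] = []
    missing: List[str] = []
    for name in names:
        orig, cm = orig_dir / name, cm_dir / name
        if orig.is_file():
            sources.append(orig)   # prefer the original location
        elif cm.is_file():
            sources.append(cm)     # e.g. source was removed but kept in change management
        else:
            missing.append(name)   # absent from both: fail instead of skipping silently
    if missing:
        raise MissingReplFileError(
            f"missing from both original path and cmpath: {missing}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    for src in sources:
        target = dest_dir / src.name
        shutil.copy2(src, target)
        copied.append(target)
    return copied
```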

Requesting [~daijy]/[~sushanth]/[~anishek]/[~thejas] to review the patch!






> REPL LOAD should update last repl ID only after successful copy of data files.
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-16990
>                 URL: https://issues.apache.org/jira/browse/HIVE-16990
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive, repl
>    Affects Versions: 2.1.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>              Labels: DR, replication
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16990.01.patch
>
>
> REPL LOAD operations that include both metadata and data changes should follow the rule below:
> 1. Copy the metadata, excluding the last repl ID.
> 2. Copy the data files.
> 3. If steps 1 and 2 succeed, update the last repl ID of the object.
> This rule allows failed events to be re-applied by REPL LOAD and ensures that failures cause no data loss.
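A minimal Python sketch of the three-step rule, assuming a hypothetical in-memory ReplTarget with a last_repl_id field, a hypothetical event tuple shape, and a caller-supplied copy_file function; none of these are Hive APIs. The sketch also assumes that events whose ID is at or below the target's last repl ID are treated as already applied and skipped, which is what lets a failed load be retried.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical event shape: (event_id, metadata changes, data files)
Event = Tuple[int, Dict[str, str], List[str]]


@dataclass
class ReplTarget:
    """Hypothetical in-memory stand-in for one replicated object."""
    params: Dict[str, str] = field(default_factory=dict)
    files: Set[str] = field(default_factory=set)
    last_repl_id: Optional[int] = None


def apply_event(target: ReplTarget, event: Event,
                copy_file: Callable[[str], None]) -> None:
    event_id, new_params, data_files = event
    # Step 1: copy the metadata, excluding the last repl ID.
    target.params.update(new_params)
    # Step 2: copy the data files; a failed copy raises and aborts the event.
    for name in data_files:
        copy_file(name)
        target.files.add(name)  # a set, so re-copying on retry is harmless
    # Step 3: reached only when steps 1 and 2 both succeeded.
    target.last_repl_id = event_id


def repl_load(target: ReplTarget, events: List[Event],
              copy_file: Callable[[str], None]) -> None:
    for event in events:
        if target.last_repl_id is not None and event[0] <= target.last_repl_id:
            continue  # already applied on an earlier run
        apply_event(target, event, copy_file)
```

If a copy fails during event 11, last_repl_id stays at 10 even though event 11's metadata was already written, so a second repl_load over the same events skips event 10 and re-applies event 11 in full.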



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)