Posted to issues@hive.apache.org by "Sankar Hariappan (JIRA)" <ji...@apache.org> on 2018/04/24 19:14:00 UTC

[jira] [Comment Edited] (HIVE-18988) Support bootstrap replication of ACID tables

    [ https://issues.apache.org/jira/browse/HIVE-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449673#comment-16449673 ] 

Sankar Hariappan edited comment on HIVE-18988 at 4/24/18 7:13 PM:
------------------------------------------------------------------

Added 04.patch with
 * Logic to time out the open txns that were opened before bootstrap was triggered.
 * Replication of the write id state to the target, based on the validWriteIdlist of each ACID/MM table being replicated.

[~ekoifman], [~maheshk114], [~thejas]

Can you please review this patch?


> Support bootstrap replication of ACID tables
> --------------------------------------------
>
>                 Key: HIVE-18988
>                 URL: https://issues.apache.org/jira/browse/HIVE-18988
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: ACID, DR, pull-request-available, replication
>             Fix For: 3.1.0
>
>         Attachments: HIVE-18988.01.patch, HIVE-18988.02.patch, HIVE-18988.03.patch, HIVE-18988.04.patch
>
>
> Bootstrapping of ACID tables needs special handling to replicate a stable state of the data.
>  - If the ACID feature is enabled, perform the bootstrap dump for ACID tables within a read txn.
>  -> Dump table/partition metadata.
>  -> Get the list of valid data files for a table using the same logic that a read txn uses.
>  -> Dump the latest ValidWriteIdList as per the current read txn.
>  - Set the valid last replication state such that it doesn't miss any open txn started after the bootstrap dump was triggered.
>  - If any txns that were opened before the bootstrap dump was triggered are still ongoing, it is not guaranteed that the open_txn event is captured for them. Also, if these txns were opened for streaming ingest, the dumped ACID table data may include data of open txns, which impacts snapshot isolation at the target. To avoid that, the bootstrap dump should wait for a timeout (new configuration: hive.repl.bootstrap.dump.open.txn.timeout). After the timeout, just force-abort those txns and continue.
>  - If any force-aborted txn belongs to a streaming ingest case, the dumped ACID table data may contain aborted data too. So it is necessary to replicate the aborted write ids to the target, to mark that data as invalid for any readers.
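
The wait-then-abort step described above can be sketched roughly as follows. This is only an illustration in Python, not the logic in the attached patch: `TxnStore`, `wait_then_abort`, and `aborted_write_ids` are hypothetical names, the txn store is an in-memory stand-in for the metastore, and `timeout_s` plays the role of hive.repl.bootstrap.dump.open.txn.timeout.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TxnStore:
    """Hypothetical in-memory stand-in for the metastore's txn state."""
    open_txns: set = field(default_factory=set)
    aborted_txns: set = field(default_factory=set)

    def abort(self, txn_id):
        # Force-abort: the txn is no longer open and is recorded as aborted.
        self.open_txns.discard(txn_id)
        self.aborted_txns.add(txn_id)


def wait_then_abort(store, txns_before_dump, timeout_s, poll_s=0.01,
                    clock=time.monotonic, sleep=time.sleep):
    """Wait for txns opened before the dump to finish, up to timeout_s.

    Any txn from txns_before_dump that is still open once the timeout
    expires is force-aborted. Returns the sorted list of force-aborted ids.
    """
    deadline = clock() + timeout_s
    pending = set(txns_before_dump) & store.open_txns
    while pending and clock() < deadline:
        sleep(poll_s)
        pending &= store.open_txns  # drop txns that committed or aborted
    for txn_id in sorted(pending):
        store.abort(txn_id)
    return sorted(pending)


def aborted_write_ids(txn_to_write_id, aborted_txn_ids):
    """Write ids allocated by the force-aborted txns on one table.

    These are the ids the target would need to mark as aborted so that
    readers treat the corresponding data as invalid.
    """
    return sorted(txn_to_write_id[t] for t in aborted_txn_ids
                  if t in txn_to_write_id)
```

In this sketch, a caller would pass the ids of txns that were open when the dump was triggered, then feed the returned list, together with a per-table txn-to-write-id mapping (assumed here to be a plain dict), into `aborted_write_ids` to get the ids to replicate as aborted.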


