Posted to issues@hive.apache.org by "Sankar Hariappan (JIRA)" <ji...@apache.org> on 2018/05/01 05:43:00 UTC

[jira] [Updated] (HIVE-18988) Support bootstrap replication of ACID tables

     [ https://issues.apache.org/jira/browse/HIVE-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-18988:
------------------------------------
    Status: Open  (was: Patch Available)

> Support bootstrap replication of ACID tables
> --------------------------------------------
>
>                 Key: HIVE-18988
>                 URL: https://issues.apache.org/jira/browse/HIVE-18988
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: ACID, DR, pull-request-available, replication
>             Fix For: 3.1.0
>
>         Attachments: HIVE-18988.01.patch, HIVE-18988.02.patch, HIVE-18988.03.patch, HIVE-18988.04.patch, HIVE-18988.05.patch, HIVE-18988.06.patch
>
>
> Bootstrapping ACID tables needs special handling to replicate a stable state of the data.
>  - If the ACID feature is enabled, perform the bootstrap dump for ACID tables within a read txn.
>  -> Dump table/partition metadata.
>  -> Get the list of valid data files for a table using the same logic a read txn uses.
>  -> Dump the latest ValidWriteIdList as per the current read txn.
>  - Set the last valid replication state such that it doesn't miss any open txn started after the bootstrap dump was triggered.
>  - If any txns opened before the bootstrap dump was triggered are still ongoing, it is not guaranteed that the open_txn event is captured for these txns. Also, if these txns were opened for streaming ingest, the dumped ACID table data may include data of open txns, which impacts snapshot isolation at the target. To avoid that, the bootstrap dump should wait for a timeout (new configuration: hive.repl.bootstrap.dump.open.txn.timeout). After the timeout, just force abort those txns and continue.
>  - If any of the force-aborted txns belong to streaming ingest, the dumped ACID table data may contain aborted data too. So it is necessary to replicate the aborted write ids to the target to mark that data invalid for any readers.
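
The wait-then-abort step and the collection of aborted write ids described above can be sketched roughly as follows. This is an illustrative Python model only, not Hive's implementation: the function names, the is_open/abort callbacks, and the write_ids_by_txn mapping are all hypothetical, and timeout_s merely stands in for the value of hive.repl.bootstrap.dump.open.txn.timeout.

```python
import time
from typing import Callable, Dict, Iterable, List, Set, Tuple


def wait_then_abort_open_txns(
    txns_before_dump: Iterable[int],
    is_open: Callable[[int], bool],    # hypothetical: asks the txn store if a txn is still open
    abort: Callable[[int], None],      # hypothetical: force aborts a txn
    timeout_s: float,                  # stands in for hive.repl.bootstrap.dump.open.txn.timeout
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[int]:
    """Wait up to timeout_s for txns opened before the dump, then force abort the rest.

    Returns the ids of the txns that had to be force aborted.
    """
    deadline = clock() + timeout_s
    pending = {t for t in txns_before_dump if is_open(t)}
    # Poll until every pre-dump txn has finished or the timeout expires.
    while pending and clock() < deadline:
        sleep(poll_interval_s)
        pending = {t for t in pending if is_open(t)}
    # Whatever is still open after the timeout is force aborted.
    for txn_id in sorted(pending):
        abort(txn_id)
    return pending


def aborted_write_ids(
    aborted_txns: Iterable[int],
    write_ids_by_txn: Dict[int, List[Tuple[str, int]]],  # hypothetical: txn id -> [(table, write id)]
) -> Dict[str, List[int]]:
    """Group the write ids allocated by force-aborted txns per table.

    These are the ids that would have to reach the target so that readers
    treat the corresponding data as invalid.
    """
    result: Dict[str, List[int]] = {}
    for txn_id in sorted(aborted_txns):
        for table, write_id in write_ids_by_txn.get(txn_id, []):
            result.setdefault(table, []).append(write_id)
    return result
```

The clock and sleep parameters are injectable only so the sketch can be exercised without real waiting; a txn that commits on its own before the deadline is never aborted and contributes no aborted write ids.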



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)