Posted to dev@hive.apache.org by "Sankar Hariappan (JIRA)" <ji...@apache.org> on 2018/06/18 00:46:00 UTC

[jira] [Created] (HIVE-19927) Last Repl ID set by bootstrap dump is not proper and may cause loss of data if have ACID tables.

Sankar Hariappan created HIVE-19927:
---------------------------------------

             Summary: Last Repl ID set by bootstrap dump is not proper and may cause loss of data if have ACID tables.
                 Key: HIVE-19927
                 URL: https://issues.apache.org/jira/browse/HIVE-19927
             Project: Hive
          Issue Type: Sub-task
          Components: HiveServer2, Transactions
    Affects Versions: 3.1.0
            Reporter: Sankar Hariappan
            Assignee: Sankar Hariappan


Consider the following sequence during a bootstrap dump of ACID tables:
- Current session (REPL DUMP), Open txn (Txn1) - Event-10
- Another session (Session-2), Open txn (Txn2) - Event-11
- Session-2 -> Insert data (T1.D1) to ACID table. - Event-12
- Get lastReplId = last event ID logged. (Event-12)
- Session-2 -> Commit Txn (Txn2) - Event-13
- Dump ACID tables using the validTxnList captured for Txn1. --> This step skips all the data written by txns > Txn1. So, T1.D1 will be missing from the dump.
- Commit Txn (Txn1)
- REPL LOAD from the bootstrap dump will therefore not load T1.D1.
- Incremental REPL DUMP will start from Event-13 and hence miss the earlier events of Txn2 (Event-11 and Event-12), which was opened after Txn1. So, data T1.D1 will be lost forever.
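
The sequence above can be replayed in a small toy model. This is only an illustrative sketch, not Hive code: ToyMetastore, its methods, and the rule that commits also log an event (so committing Txn1 becomes Event-14) are assumptions made for the example. The snapshot rule is simplified to "only data written by txns below Txn1 is visible".

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Hypothetical stand-in for the notification log and txn manager."""
    next_event_id: int = 10   # start at 10 to mirror the event numbers above
    next_txn_id: int = 1
    events: list = field(default_factory=list)  # (event_id, kind, txn_id)
    rows: list = field(default_factory=list)    # (txn_id, data)

    def _log(self, kind, txn_id):
        self.events.append((self.next_event_id, kind, txn_id))
        self.next_event_id += 1

    def open_txn(self):
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        self._log("OPEN_TXN", txn_id)
        return txn_id

    def insert(self, txn_id, data):
        self.rows.append((txn_id, data))
        self._log("INSERT", txn_id)

    def commit(self, txn_id):
        self._log("COMMIT_TXN", txn_id)

    def last_event_id(self):
        return self.next_event_id - 1


def run_buggy_bootstrap():
    ms = ToyMetastore()
    txn1 = ms.open_txn()               # Event-10, REPL DUMP session
    txn2 = ms.open_txn()               # Event-11, Session-2
    ms.insert(txn2, "T1.D1")           # Event-12
    last_repl_id = ms.last_event_id()  # captured after Txn1 was opened
    ms.commit(txn2)                    # Event-13
    # Simplified snapshot for Txn1: data written by txns >= Txn1 is skipped.
    bootstrap_rows = [data for (txn, data) in ms.rows if txn < txn1]
    ms.commit(txn1)                    # Event-14 in this toy model
    # The next incremental dump replays only events after last_repl_id.
    replayed = [e for e in ms.events if e[0] > last_repl_id]
    return last_repl_id, bootstrap_rows, replayed


last_repl_id, bootstrap_rows, replayed = run_buggy_bootstrap()
print(last_repl_id, bootstrap_rows, replayed)
```

In this model the bootstrap dump contains no rows, and the incremental replay sees only the two commit events, so neither dump carries the insert of T1.D1.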

The proposal is to capture the lastReplId of the bootstrap dump before opening the current txn (Txn1), store it in the Driver context, and use it for the dump.
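
The effect of moving the capture point can be compared with a minimal sketch. The simulate function and its capture_before_open_txn flag are hypothetical names for this illustration only; the sketch does not model the Driver context, and it assumes event IDs start at 10 and that each commit logs one event.

```python
def simulate(capture_before_open_txn):
    """Toy replay of the sequence; returns (last_repl_id, replayed events)."""
    events = []  # (event_id, kind, txn_id)

    def log(kind, txn_id):
        events.append((10 + len(events), kind, txn_id))

    def last_event_id():
        # 9 stands for "whatever was logged before this example began"
        return events[-1][0] if events else 9

    last_repl_id = None
    if capture_before_open_txn:
        last_repl_id = last_event_id()  # proposed: capture before Txn1 opens
    log("OPEN_TXN", 1)                  # Event-10, REPL DUMP session
    log("OPEN_TXN", 2)                  # Event-11, Session-2
    log("INSERT", 2)                    # Event-12, writes T1.D1
    if not capture_before_open_txn:
        last_repl_id = last_event_id()  # reported behaviour: capture here
    log("COMMIT_TXN", 2)                # Event-13
    log("COMMIT_TXN", 1)                # Event-14 in this toy model

    # The incremental dump replays events strictly after last_repl_id.
    replayed = [(kind, txn) for (eid, kind, txn) in events if eid > last_repl_id]
    return last_repl_id, replayed


print(simulate(capture_before_open_txn=False))
print(simulate(capture_before_open_txn=True))
```

With the early capture, the incremental dump in this model starts before Event-10, so the open, insert, and commit events of Txn2 all fall inside its range.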



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)