Posted to dev@hive.apache.org by "Peter Varga (Jira)" <ji...@apache.org> on 2020/12/03 15:42:00 UTC

[jira] [Created] (HIVE-24481) Skipped compaction can cause data corruption with streaming

Peter Varga created HIVE-24481:
----------------------------------

             Summary: Skipped compaction can cause data corruption with streaming
                 Key: HIVE-24481
                 URL: https://issues.apache.org/jira/browse/HIVE-24481
             Project: Hive
          Issue Type: Bug
            Reporter: Peter Varga
            Assignee: Peter Varga


Timeline:
1. create a partitioned table, add one static partition
2. transaction 1 writes delta_1, and aborts
3. create a streaming connection with a transaction batch size of 3, using withStaticPartitionValues with the existing partition
4. beginTransaction, write, commitTransaction
5. beginTransaction, write, abortTransaction
6. beginTransaction, write, commitTransaction
7. close the connection; the row count of the table is 2
8. run a manual minor compaction on the partition. The compaction itself is skipped because the delta count is 1, but cleaning still runs because txn 1 is aborted
9. the cleaner removes both aborted records from txn_components
10. wait for acidhousekeeper to remove empty aborted txns
11. select * from table returns *3* records, reading the aborted record
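
The timeline above can be replayed with a toy model. This is a sketch under stated assumptions, not Hive code: the class and method names are hypothetical, the streaming transactions are assumed to get ids 2, 3 and 4, readers are assumed to skip only rows whose transaction is still recorded as aborted, and the cleaner is assumed to delete a delta only when every row in it belongs to an aborted transaction.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the timeline; every class and method here is hypothetical, not Hive code.
final class SkippedCompactionModel {

    // A delta directory: one entry per row, holding the id of the transaction that wrote it.
    static final class Delta {
        final List<Long> rowTxnIds = new ArrayList<>();
    }

    private final List<Delta> deltas = new ArrayList<>();
    // Transactions the metadata still remembers as aborted.
    private final Set<Long> abortedTxns = new HashSet<>();

    Delta newDelta() {
        Delta d = new Delta();
        deltas.add(d);
        return d;
    }

    void writeRow(Delta delta, long txnId) { delta.rowTxnIds.add(txnId); }

    void abort(long txnId) { abortedTxns.add(txnId); }

    // Assumed reader rule: skip rows whose transaction is still recorded as aborted.
    long visibleRows() {
        long count = 0;
        for (Delta d : deltas) {
            for (long txnId : d.rowTxnIds) {
                if (!abortedTxns.contains(txnId)) count++;
            }
        }
        return count;
    }

    // Assumed cleaner rule: only deltas made up entirely of aborted rows are deleted.
    void cleanFullyAbortedDeltas() {
        deltas.removeIf(d -> abortedTxns.containsAll(d.rowTxnIds));
    }

    // Models steps 9-10: the aborted-transaction metadata is dropped, files are untouched.
    void forgetAbortedTxns() { abortedTxns.clear(); }

    // Replays steps 2-11 and returns {count at step 7, count at step 11}.
    static long[] replayTimeline() {
        SkippedCompactionModel table = new SkippedCompactionModel();

        Delta delta1 = table.newDelta();      // step 2: txn 1 writes delta_1 and aborts
        table.writeRow(delta1, 1L);
        table.abort(1L);

        Delta streaming = table.newDelta();   // steps 3-6: one streaming batch of 3 txns
        table.writeRow(streaming, 2L);        // committed
        table.writeRow(streaming, 3L);        // aborted
        table.abort(3L);
        table.writeRow(streaming, 4L);        // committed

        long before = table.visibleRows();    // step 7

        table.cleanFullyAbortedDeltas();      // step 8: compaction skipped, cleaning runs
        table.forgetAbortedTxns();            // steps 9-10

        long after = table.visibleRows();     // step 11
        return new long[] {before, after};
    }

    public static void main(String[] args) {
        long[] counts = replayTimeline();
        System.out.println("before cleanup: " + counts[0]); // prints "before cleanup: 2"
        System.out.println("after cleanup: " + counts[1]);  // prints "after cleanup: 3"
    }
}
```

In this model the aborted streaming row sits in the same delta as committed rows, so it survives cleaning and becomes visible once nothing records its transaction as aborted.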



--
This message was sent by Atlassian Jira
(v8.3.4#803005)