Posted to dev@hive.apache.org by "Mahipal Jupalli (JIRA)" <ji...@apache.org> on 2016/10/15 23:01:20 UTC

[jira] [Created] (HIVE-14980) Minor compaction when triggered simultaneously on the same table/partition deletes data

Mahipal Jupalli created HIVE-14980:
--------------------------------------

             Summary: Minor compaction when triggered simultaneously on the same table/partition deletes data
                 Key: HIVE-14980
                 URL: https://issues.apache.org/jira/browse/HIVE-14980
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: 2.1.0
            Reporter: Mahipal Jupalli
            Assignee: Mahipal Jupalli
            Priority: Critical


I have two tables (TABLEA, TABLEB). If I manually trigger a compaction after each INSERT into TABLEB from TABLEA, the compactions are picked up asynchronously by arbitrary metastore instances and step on each other, which causes data to be deleted.

Example here: 
TABLEA - has 10k rows. 

insert into mj.tableb select * from mj.tablea;
alter table mj.tableb compact 'MINOR';
insert into mj.tableb select * from mj.tablea;
alter table mj.tableb compact 'MINOR';

Once all the compactions are complete, I should see 20k rows in TABLEB, but I see only 10k. Only the rows INSERTED before the last compaction persist; the old rows are deleted. I believe the old delta files are deleted.

To further confirm the bug, if I do only one compaction after two inserts, I see 20k rows in TABLEB.

Proposed Fix:
I have identified the bug in the code. The fix requires an additional check in the org.apache.hadoop.hive.ql.txn.compactor.Worker class for any active compactions on the table/partition. I will share the details of the fix once I have tested it.
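
To illustrate the shape of such a check (this is not the actual fix and not Hive code), here is a minimal sketch of an "is a compaction already active on this table/partition?" guard. The class name CompactionGuard and its methods are hypothetical. It also assumes a single JVM; since the compactions in this report run on different metastore instances, a real check would presumably have to consult shared state rather than an in-memory set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Hive: tracks which table/partition
// targets currently have a compaction in progress within this JVM.
class CompactionGuard {
    // Thread-safe set of keys for targets with an active compaction.
    private final Set<String> active = ConcurrentHashMap.newKeySet();

    // Builds a key such as "mj.tableb" or "mj.tableb/ds=1"
    // (partition may be null for an unpartitioned table).
    static String key(String db, String table, String partition) {
        String base = db + "." + table;
        return partition == null ? base : base + "/" + partition;
    }

    // Returns true only if no compaction was active on the target;
    // Set.add is atomic, so two concurrent callers cannot both win.
    boolean tryStart(String db, String table, String partition) {
        return active.add(key(db, table, partition));
    }

    // Marks the compaction on the target as finished.
    void finish(String db, String table, String partition) {
        active.remove(key(db, table, partition));
    }
}
```

A worker using this sketch would call tryStart before compacting mj.tableb, skip or requeue the request if it returns false, and call finish in a finally block once the compaction ends.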



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)