Posted to dev@hbase.apache.org by "Wellington Chevreuil (JIRA)" <ji...@apache.org> on 2018/03/28 21:27:00 UTC

[jira] [Created] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster

Wellington Chevreuil created HBASE-20305:
--------------------------------------------

             Summary: Add option to SyncTable that skip deletes on target cluster
                 Key: HBASE-20305
                 URL: https://issues.apache.org/jira/browse/HBASE-20305
             Project: HBase
          Issue Type: Improvement
          Components: mapreduce
    Affects Versions: 2.0.0-alpha-4
            Reporter: Wellington Chevreuil
            Assignee: Wellington Chevreuil


We had a situation where two clusters with active-active replication got out of sync, but both had data that should be kept. The tables in question never have data deleted, but ingestion had happened on both clusters, and some rows had even been updated.

In this scenario, a cell that is present in the table on only one of the clusters should not be deleted, but replayed on the other cluster. Also, for cells with the same identifier but different values, the most recent value should be kept. The current version of SyncTable is not applicable here, because it simply copies the whole state from source to target, thereby losing any additional rows that exist only on the target, as well as cell values on the target that received a more recent update. This could be solved by adding an option to SyncTable that skips deletes. This way, the additional cells not present on the source would still be kept. For cells with the same identifier but different values, it would just perform a Put for the cell version from the source, but client scans would still fetch the value with the most recent timestamp.
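
To make the intended behaviour concrete, here is a minimal sketch of the two modes using a toy in-memory model. It is not the SyncTable implementation and does not use the HBase API: the class SkipDeletesSketch, the Table type, the sync method and its skipDeletes flag, and the cell identifiers are all hypothetical names chosen for illustration. The only behaviour assumed from HBase is that a default read returns the version with the highest timestamp.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model; NOT the HBase API and NOT the SyncTable implementation.
final class SkipDeletesSketch {

    // Hypothetical stand-in for a table: cell identifier -> (timestamp -> value).
    static final class Table {
        final Map<String, TreeMap<Long, String>> cells = new TreeMap<>();

        void put(String cellId, long timestamp, String value) {
            cells.computeIfAbsent(cellId, k -> new TreeMap<>()).put(timestamp, value);
        }

        // Mimics a default read: the version with the highest timestamp wins.
        String latest(String cellId) {
            TreeMap<Long, String> versions = cells.get(cellId);
            return (versions == null || versions.isEmpty())
                    ? null
                    : versions.lastEntry().getValue();
        }
    }

    // Brings target in line with source. With skipDeletes, nothing is removed
    // from target; source cell versions are only added as puts.
    static void sync(Table source, Table target, boolean skipDeletes) {
        if (!skipDeletes) {
            // Current behaviour as described above: drop whatever source lacks.
            target.cells.keySet().retainAll(source.cells.keySet());
            for (Map.Entry<String, TreeMap<Long, String>> e : target.cells.entrySet()) {
                e.getValue().keySet().retainAll(source.cells.get(e.getKey()).keySet());
            }
        }
        // Replay every source cell version on target, keeping its timestamp.
        for (Map.Entry<String, TreeMap<Long, String>> e : source.cells.entrySet()) {
            for (Map.Entry<Long, String> v : e.getValue().entrySet()) {
                target.put(e.getKey(), v.getKey(), v.getValue());
            }
        }
    }

    // Builds a small diverged pair of tables and returns the synced target.
    static Table demo(boolean skipDeletes) {
        Table source = new Table();
        source.put("row1/cf:q", 10L, "a");    // exists only on source
        source.put("row2/cf:q", 10L, "old");  // older value on source

        Table target = new Table();
        target.put("row2/cf:q", 10L, "old");
        target.put("row2/cf:q", 20L, "new");  // more recent update on target
        target.put("row3/cf:q", 15L, "c");    // exists only on target

        sync(source, target, skipDeletes);
        return target;
    }

    public static void main(String[] args) {
        for (boolean skip : new boolean[] {false, true}) {
            Table t = demo(skip);
            System.out.println("skipDeletes=" + skip
                    + " row1=" + t.latest("row1/cf:q")
                    + " row2=" + t.latest("row2/cf:q")
                    + " row3=" + t.latest("row3/cf:q"));
        }
    }
}
```

In this model, syncing without deletes leaves the target-only cell and the newer target value in place, while the source-only cell is still copied over. Syncing in the other direction afterwards would replay the target-only data on the source.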

I'm attaching a patch with this additional option shortly. Please share your thoughts.


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)