Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2019/02/01 19:43:05 UTC

[jira] [Updated] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster

     [ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-20305:
-----------------------------------
    Fix Version/s:     (was: 1.5.0)
                   1.5.1

> Add option to SyncTable that skip deletes on target cluster
> -----------------------------------------------------------
>
>                 Key: HBASE-20305
>                 URL: https://issues.apache.org/jira/browse/HBASE-20305
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 2.0.0-alpha-4
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Minor
>             Fix For: 3.0.0, 2.2.0, 1.5.1
>
>         Attachments: 0001-HBASE-20305.master.001.patch, HBASE-20305.master.002.patch
>
>
> We had a situation where two clusters with active-active replication got out of sync, but both had data that should be kept. The tables in question never have data deleted, but ingestion had happened on both clusters, and some rows had even been updated.
> In this scenario, a cell that is present on only one of the clusters should not be deleted, but replayed on the other. Also, for cells with the same identifier but different values, the most recent value should be kept. The current version of SyncTable is not applicable here, because it simply copies the whole state from source to target, losing any additional rows that exist only on the target, as well as cell values that received a more recent update there. This could be solved by adding an option to SyncTable that skips deletes. That way, additional cells not present on the source would still be kept. For cells with the same identifier but different values, it would just perform a Put for the cell version from the source, and client scans would still fetch the version with the most recent timestamp.
> I will attach a patch with this additional option shortly. Please share your thoughts.
>  
>  
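The behaviour requested in the description can be illustrated with a small toy model. This is only a sketch under stated assumptions: it uses plain Java collections rather than the HBase client API or the real SyncTable job, and the class name SyncTableSketch, its methods, and the sample rows are all hypothetical. Each table is modelled as a map from a cell identifier to its timestamped versions, and a read returns the version with the newest timestamp.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: hypothetical names, no HBase API, not the actual SyncTable code.
final class SyncTableSketch {

    // A "table": cell identifier (row/family:qualifier) -> timestamp -> value.
    static Map<String, TreeMap<Long, String>> newTable() {
        return new TreeMap<>();
    }

    static void put(Map<String, TreeMap<Long, String>> table, String cell, long ts, String value) {
        table.computeIfAbsent(cell, k -> new TreeMap<>()).put(ts, value);
    }

    // What a single-version client read would see: the value with the newest timestamp.
    static String read(Map<String, TreeMap<Long, String>> table, String cell) {
        TreeMap<Long, String> versions = table.get(cell);
        return (versions == null || versions.isEmpty()) ? null : versions.lastEntry().getValue();
    }

    // Copies source state to target; the delete phase is optional.
    static void sync(Map<String, TreeMap<Long, String>> source,
                     Map<String, TreeMap<Long, String>> target,
                     boolean skipDeletes) {
        // Put phase: write every source cell version into the target.
        for (Map.Entry<String, TreeMap<Long, String>> cell : source.entrySet()) {
            for (Map.Entry<Long, String> version : cell.getValue().entrySet()) {
                put(target, cell.getKey(), version.getKey(), version.getValue());
            }
        }
        if (skipDeletes) {
            return; // target-only cells and newer target versions survive
        }
        // Delete phase: drop anything in the target that the source does not have.
        Iterator<Map.Entry<String, TreeMap<Long, String>>> it = target.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, TreeMap<Long, String>> cell = it.next();
            TreeMap<Long, String> sourceVersions = source.get(cell.getKey());
            if (sourceVersions == null) {
                it.remove();
            } else {
                cell.getValue().keySet().retainAll(sourceVersions.keySet());
            }
        }
    }

    // Builds the two out-of-sync clusters from the scenario and returns the synced target.
    static Map<String, TreeMap<Long, String>> demo(boolean skipDeletes) {
        Map<String, TreeMap<Long, String>> source = newTable();
        Map<String, TreeMap<Long, String>> target = newTable();
        put(source, "row1/cf:a", 10L, "old");   // older update, made on the source
        put(source, "row2/cf:a", 5L, "s");      // exists only on the source
        put(target, "row1/cf:a", 20L, "new");   // more recent update, made on the target
        put(target, "row3/cf:a", 7L, "t");      // exists only on the target
        sync(source, target, skipDeletes);
        return target;
    }

    public static void main(String[] args) {
        for (boolean skip : new boolean[] {false, true}) {
            Map<String, TreeMap<Long, String>> target = demo(skip);
            System.out.println("skipDeletes=" + skip
                    + " row1=" + read(target, "row1/cf:a")
                    + " row2=" + read(target, "row2/cf:a")
                    + " row3=" + read(target, "row3/cf:a"));
        }
    }
}
```

In this model, a full sync leaves the target with row1 reading "old" and row3 gone, which is the data loss described above. With deletes skipped, row3 is kept, row2 is replayed from the source, and row1 still reads "new" because the Put from the source carries an older timestamp.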



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)