Posted to issues@hbase.apache.org by "Dave Latham (JIRA)" <ji...@apache.org> on 2018/03/30 16:16:00 UTC

[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster

    [ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420663#comment-16420663 ] 

Dave Latham commented on HBASE-20305:
-------------------------------------

Thanks for the patch, Wellington.

It looks like it also fixes a nasty bug where dryRun doesn't appear to actually work. I hope that did not bite you when using the tool.

I think I'd prefer calling the option something like doDeletes (default true) rather than insertsOnly (default false). We could also add a similar doPuts option to let people do deletes but not puts, if they prefer.
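A toy model of how the proposed doDeletes/doPuts options could gate the mutations a sync job emits. This is not SyncTable's code and ignores how SyncTable actually finds differences. `SyncFlagsSketch`, `SyncOptions`, `Mutation` and `plan` are hypothetical names, and cells are simplified to a map from a coordinate string to a value. Only the option names and their default of true come from the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model, not HBase code: shows how doPuts/doDeletes flags
// could filter the mutations needed to make a target match a source.
public class SyncFlagsSketch {

  /** Hypothetical options named after the proposal; both default to true. */
  public record SyncOptions(boolean doPuts, boolean doDeletes) {
    public static SyncOptions defaults() {
      return new SyncOptions(true, true);
    }
  }

  /** A planned mutation: "PUT" or "DELETE" for one cell coordinate. */
  public record Mutation(String type, String cell) {}

  /** Compares source and target cells (coordinate -> value) and plans mutations. */
  public static List<Mutation> plan(Map<String, String> source,
                                    Map<String, String> target,
                                    SyncOptions opts) {
    List<Mutation> out = new ArrayList<>();
    // Cells missing from target, or present with a different value, need a Put.
    for (Map.Entry<String, String> e : new TreeMap<>(source).entrySet()) {
      String targetValue = target.get(e.getKey());
      if (opts.doPuts() && !e.getValue().equals(targetValue)) {
        out.add(new Mutation("PUT", e.getKey()));
      }
    }
    // Cells present only in target would be deleted, unless deletes are skipped.
    for (String cell : new TreeMap<>(target).keySet()) {
      if (opts.doDeletes() && !source.containsKey(cell)) {
        out.add(new Mutation("DELETE", cell));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> source = Map.of("row1/cf:a", "x", "row2/cf:a", "y");
    Map<String, String> target = Map.of("row1/cf:a", "x", "row3/cf:a", "z");
    System.out.println(plan(source, target, SyncOptions.defaults()));
    System.out.println(plan(source, target, new SyncOptions(true, false)));
  }
}
```

With the defaults, the plan contains a Put for row2 and a Delete for row3. With doDeletes=false, only the Put remains, which is the behaviour the issue asks for. Setting doPuts=false instead leaves only the Delete.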

+0.9 as is.

I'm going to hit the Submit Patch button to try to get Hadoop QA to take a pass.

> Add option to SyncTable that skip deletes on target cluster
> -----------------------------------------------------------
>
>                 Key: HBASE-20305
>                 URL: https://issues.apache.org/jira/browse/HBASE-20305
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 2.0.0-alpha-4
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Minor
>         Attachments: 0001-HBASE-20305.master.001.patch
>
>
> We had a situation where two clusters with active-active replication got out of sync, but both had data that should be kept. The tables in question never have data deleted, but ingestion had happened on both clusters, and some rows had even been updated.
> In this scenario, a cell that is present in the table on only one of the clusters should not be deleted, but replayed on the other. Also, for cells with the same identifier but different values, the most recent value should be kept. The current version of SyncTable is not applicable here, because it simply copies the whole state from source to target, losing any additional rows that exist only in target, as well as any cell values whose most recent update happened on target. This could be solved by adding an option to SyncTable to skip deletes. That way, the additional cells not present in source would still be kept. For cells with the same identifier but different values, it would just perform a Put of the source cell version, but client scans would still fetch the version with the most recent timestamp.
> I'm attaching a patch with this additional option shortly. Please share your thoughts.
>  
>  
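The skip-deletes approach described in the issue relies on HBase keeping timestamped versions per cell and returning the newest one on a default read. The following is a toy in-memory model of that reasoning, not HBase or SyncTable code. `SkipDeletesMergeSketch`, `syncWithoutDeletes` and `read` are hypothetical names. It also assumes that Puts replayed from source keep the source cells' original timestamps, and that the diverging versions carry different timestamps. A Put with an identical row, column and timestamp would overwrite the target value instead.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model, not HBase code: each column maps timestamp -> value,
// and a read returns the value with the highest timestamp.
public class SkipDeletesMergeSketch {

  /** Replays every source version into target as a Put; never deletes anything. */
  public static void syncWithoutDeletes(Map<String, NavigableMap<Long, String>> source,
                                        Map<String, NavigableMap<Long, String>> target) {
    for (Map.Entry<String, NavigableMap<Long, String>> col : source.entrySet()) {
      target.computeIfAbsent(col.getKey(), k -> new TreeMap<>()).putAll(col.getValue());
    }
  }

  /** Returns the newest value for a column, or null if the column is absent. */
  public static String read(Map<String, NavigableMap<Long, String>> table, String column) {
    NavigableMap<Long, String> versions = table.get(column);
    return (versions == null || versions.isEmpty()) ? null : versions.lastEntry().getValue();
  }

  public static void main(String[] args) {
    Map<String, NavigableMap<Long, String>> source = new TreeMap<>();
    Map<String, NavigableMap<Long, String>> target = new TreeMap<>();
    // row1/cf:a was written on the source cluster at ts=100 ...
    source.computeIfAbsent("row1/cf:a", k -> new TreeMap<>()).put(100L, "from-source");
    // ... and updated more recently on the target cluster at ts=200.
    target.computeIfAbsent("row1/cf:a", k -> new TreeMap<>()).put(200L, "from-target");
    // row2/cf:a only exists on the target cluster.
    target.computeIfAbsent("row2/cf:a", k -> new TreeMap<>()).put(150L, "target-only");

    syncWithoutDeletes(source, target);

    System.out.println(read(target, "row1/cf:a")); // from-target (ts 200 > 100)
    System.out.println(read(target, "row2/cf:a")); // target-only (never deleted)
  }
}
```

In a real table, whether the older source version is still retained afterwards depends on the column family's VERSIONS setting. The newest-timestamp read is what makes the scenario work either way.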



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)