Posted to issues@hbase.apache.org by "Alex Newman (JIRA)" <ji...@apache.org> on 2014/08/10 22:55:11 UTC

[jira] [Commented] (HBASE-11715) HBase should provide a tool to compare 2 remote tables.

    [ https://issues.apache.org/jira/browse/HBASE-11715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092200#comment-14092200 ] 

Alex Newman commented on HBASE-11715:
-------------------------------------

Could you provide some more details?

1. How is this table copied? Do we flush and just move the HFiles over?
2. What do we do if the tables are not equivalent? Is it enough to throw an error, or do we need to report which part of the table differs?
3. Do Merkle trees make sense for this type of thing?

> HBase should provide a tool to compare 2 remote tables.
> -------------------------------------------------------
>
>                 Key: HBASE-11715
>                 URL: https://issues.apache.org/jira/browse/HBASE-11715
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jean-Marc Spaggiari
>
> As discussed on the mailing list, when a table is copied to another cluster and needs to be validated against the original, only VerifyReplication can be used. However, this can take a very long time, since the data needs to be copied again.
> We should provide an easier and faster way to compare the tables.
> One option is to calculate hashes per range. The user defines the number of buckets, then we split the table into that many buckets and calculate a hash for each (as the partitioner already does). We can also optionally calculate an overall CRC to further reduce the chance of hash collisions.
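
A rough sketch of the per-range hashing idea described above, in plain Python rather than against the HBase client API. It assumes each table can be scanned as (row key, value) byte pairs in sorted key order and that both sides use the same split keys; bucket_hashes and differing_buckets are hypothetical names chosen for illustration.

```python
import hashlib
from bisect import bisect_right


def bucket_hashes(rows, split_keys):
    """Hash rows into len(split_keys) + 1 buckets.

    rows: iterable of (row_key, value) byte pairs in sorted key order.
    split_keys: sorted list of boundary keys; bucket i covers
    [split_keys[i - 1], split_keys[i]), so a boundary key starts a new bucket.
    """
    digests = [hashlib.sha256() for _ in range(len(split_keys) + 1)]
    for key, value in rows:
        digest = digests[bisect_right(split_keys, key)]
        # Length-prefix each field so (b"ab", b"c") and (b"a", b"bc") differ.
        digest.update(len(key).to_bytes(4, "big"))
        digest.update(key)
        digest.update(len(value).to_bytes(4, "big"))
        digest.update(value)
    return [d.hexdigest() for d in digests]


def differing_buckets(hashes_a, hashes_b):
    """Return the indices of buckets whose hashes do not match."""
    if len(hashes_a) != len(hashes_b):
        raise ValueError("both sides must use the same split keys")
    return [i for i, (a, b) in enumerate(zip(hashes_a, hashes_b)) if a != b]
```

For example, with split keys [b"g", b"n"], a row b"grape" whose value differs between the two tables would show up as bucket 1 only. Each cluster would compute its own list locally, so only the hashes cross the network, not the data. The digests depend on row order, so both sides must feed rows in the same (sorted) order.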


