Posted to dev@accumulo.apache.org by z11373 <z1...@outlook.com> on 2015/11/12 16:12:39 UTC

data comparison tool

We currently write to tables in 2 places (this may change once we adopt the
Accumulo 1.7 replication feature or another solution). Does Accumulo provide
a tool (or has someone already written one) to compare the data in both
tables, which live on 2 different Accumulo instances?
A naïve solution I can think of is to iterate over both tables (since they are
already sorted by row ID) and perform a 'merge'-style comparison, but it would
definitely save me time if someone has already written an implementation.

Thanks,
Z



--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/data-comparison-tool-tp15537.html
Sent from the Developers mailing list archive at Nabble.com.

Re: data comparison tool

Posted by Josh Elser <jo...@gmail.com>.
Yep, that's an easy way to check. It can just be slow depending on how 
much data you have.
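Here is a minimal sketch of the merge comparison described in the question. It assumes each table has already been scanned into a stream of (key, value) pairs in sorted order with unique keys. Plain comparable keys stand in for Accumulo keys. With real keys, the comparison must match the order in which the scanner returns entries. `merge_diff` is a hypothetical helper, not an Accumulo API.

```python
from typing import Any, Iterable, Iterator, Tuple

Entry = Tuple[Any, Any]  # (key, value); keys assumed sorted ascending and unique

_END = object()  # sentinel marking an exhausted stream


def merge_diff(a: Iterable[Entry], b: Iterable[Entry]) -> Iterator[tuple]:
    """Walk two key-sorted streams in lockstep and yield every difference."""
    it_a, it_b = iter(a), iter(b)
    ea, eb = next(it_a, _END), next(it_b, _END)
    while ea is not _END or eb is not _END:
        if eb is _END or (ea is not _END and ea[0] < eb[0]):
            # Key exists only in the first table.
            yield ("only_in_a", ea[0], ea[1])
            ea = next(it_a, _END)
        elif ea is _END or eb[0] < ea[0]:
            # Key exists only in the second table.
            yield ("only_in_b", eb[0], eb[1])
            eb = next(it_b, _END)
        else:
            # Same key on both sides: report it only if the values differ.
            if ea[1] != eb[1]:
                yield ("value_differs", ea[0], ea[1], eb[1])
            ea, eb = next(it_a, _END), next(it_b, _END)


if __name__ == "__main__":
    table_a = [("row1", "x"), ("row2", "y"), ("row4", "z")]
    table_b = [("row1", "x"), ("row2", "Y"), ("row3", "w")]
    for diff in merge_diff(table_a, table_b):
        print(diff)
```

For the sample tables, this prints `('value_differs', 'row2', 'y', 'Y')`, then `('only_in_b', 'row3', 'w')`, then `('only_in_a', 'row4', 'z')`. Each stream is read once, so memory use stays constant. The single sequential pass is why the run time grows with the amount of data.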

I tried to write a slightly more parallel approach to verifying this, 
based on a Merkle tree.

https://github.com/apache/accumulo/tree/master/test/system/merkle-replication

It's a little tricky, as the boundaries of each leaf node in the tree (a 
tablet) affect the root value of the tree. In other words, if the two 
tables don't have the same split points, the verification will fail even 
when the data is identical.
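To illustrate why split points matter, here is a minimal sketch of a tablet-level Merkle tree. It assumes each leaf is the SHA-256 hash of one tablet's sorted entries and that each parent hashes the concatenation of its two children. This is not the code in the merkle-replication test. `leaf_hash`, `merkle_root`, `split_into_tablets` and `table_root` are hypothetical helpers, and entries are simplified to (row, value) string pairs.

```python
import hashlib
from typing import List, Sequence, Tuple

Entry = Tuple[str, str]  # (row, value); a simplification of a full Accumulo key


def leaf_hash(entries: Sequence[Entry]) -> bytes:
    """Hash one leaf: all entries of one tablet, in sorted order."""
    h = hashlib.sha256()
    for row, value in entries:
        for field in (row, value):
            data = field.encode("utf-8")
            # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
            h.update(len(data).to_bytes(4, "big"))
            h.update(data)
    return h.digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Hash adjacent pairs level by level until a single root hash remains."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = list(leaves)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            # An unpaired last node is promoted unchanged (one common convention).
            parents.append(hashlib.sha256(b"".join(pair)).digest() if len(pair) == 2 else pair[0])
        level = parents
    return level[0]


def split_into_tablets(entries: Sequence[Entry], split_points: Sequence[str]) -> List[List[Entry]]:
    """Partition row-sorted entries; each tablet ends at (and includes) its split row."""
    tablets: List[List[Entry]] = []
    current: List[Entry] = []
    splits = list(split_points)
    for row, value in entries:
        # Close every tablet whose end row sorts before this row.
        while splits and row > splits[0]:
            tablets.append(current)
            current = []
            splits.pop(0)
        current.append((row, value))
    tablets.append(current)
    tablets.extend([] for _ in splits)  # unused split points yield empty tablets
    return tablets


def table_root(entries: Sequence[Entry], split_points: Sequence[str]) -> bytes:
    """Root hash of a table, given its data and its split points."""
    return merkle_root([leaf_hash(t) for t in split_into_tablets(entries, split_points)])


if __name__ == "__main__":
    data = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")]
    print(table_root(data, ["b"]) == table_root(data, ["b"]))  # True: same data, same splits
    print(table_root(data, ["b"]) == table_root(data, ["c"]))  # False: same data, different splits
```

Moving the split point from "b" to "c" leaves the data unchanged but puts different entries into each leaf. As a result, the leaf hashes, and therefore the root, no longer match. Each leaf can be hashed independently of the others, which is what makes this approach more parallel than a single merge pass.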
