Posted to user@sqoop.apache.org by tobe <to...@gmail.com> on 2014/08/12 09:56:22 UTC

Can sqoop validate the data from each database

An amateur question: can Sqoop validate the values of the data in each
database?

I have read
http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#validation and found
RowCountValidator. But what I want is to verify the values in each
database, not just the row counts.

If that is not supported now, how can I validate the data between the two
databases?

Re: Can sqoop validate the data from each database

Posted by Gwen Shapira <gs...@cloudera.com>.
Yeah, I think you are on your own there.

If you manage to come up with something generic, consider contributing
it back into Sqoop :)

On Tue, Aug 12, 2014 at 5:49 PM, tobe <to...@gmail.com> wrote:
> Thanks @Gwen.
>
> I want to compare the content between MySQL and HBase. It's not suitable to
> use row count or checksums because the values are increasing. I think I have
> to write a script to read from two databases and compare by myself.

Re: Can sqoop validate the data from each database

Posted by tobe <to...@gmail.com>.
Thanks @Gwen.

I want to compare the content of MySQL and HBase. Row counts or checksums
are not suitable because the values keep increasing. I think I will have to
write a script that reads from both databases and compares them myself.
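
A minimal sketch of what such a comparison script could look like, in
Python. It assumes each database has already been read into (row key, row)
pairs by whatever client you use; the MySQL and HBase reads themselves are
not shown. The names compare_rows, normalize and Diff are hypothetical and
are not part of Sqoop. HBase cell values are assumed to arrive as UTF-8
bytes, so both sides are normalized to strings before comparing.

```python
from typing import Any, Dict, Iterable, List, NamedTuple, Optional, Tuple

Row = Dict[str, Any]


class Diff(NamedTuple):
    missing_in_target: List[str]  # keys present only in the source
    missing_in_source: List[str]  # keys present only in the target
    mismatched: List[Tuple[str, str, Optional[str], Optional[str]]]


def normalize(value: Any) -> Optional[str]:
    """Bring values from both stores to a comparable form.

    Assumption: bytes (as an HBase client might return) are UTF-8 text.
    """
    if value is None:
        return None
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


def compare_rows(
    source_rows: Iterable[Tuple[str, Row]],
    target_rows: Iterable[Tuple[str, Row]],
    columns: List[str],
) -> Diff:
    """Compare two sets of keyed rows column by column."""
    source = dict(source_rows)
    target = dict(target_rows)
    missing_in_target = sorted(k for k in source if k not in target)
    missing_in_source = sorted(k for k in target if k not in source)
    mismatched = []
    for key in sorted(source.keys() & target.keys()):
        for col in columns:
            s = normalize(source[key].get(col))
            t = normalize(target[key].get(col))
            if s != t:
                mismatched.append((key, col, s, t))
    return Diff(missing_in_target, missing_in_source, mismatched)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two databases.
    mysql_rows = [("1", {"name": "a", "hits": 10}), ("2", {"name": "b", "hits": 20})]
    hbase_rows = [("1", {"name": b"a", "hits": b"10"}), ("2", {"name": b"b", "hits": b"21"})]
    print(compare_rows(mysql_rows, hbase_rows, ["name", "hits"]))
```

Because the values keep changing, a comparison like this is only meaningful
against a consistent snapshot, for example by restricting both reads to rows
last modified before a fixed cutoff time. Loading both sides into
dictionaries also assumes the data fits in memory; a larger table would
need to be compared in key ranges.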


On Wed, Aug 13, 2014 at 1:04 AM, Gwen Shapira <gs...@cloudera.com> wrote:

> By data validator you mean comparing entire table contents between
> HDFS and the database?
>
> This is not currently supported by Sqoop validators. Most users
> implement it by using Sqoop to re-load the table from HDFS to the
> database and do the comparison within the DB (typically using hashes
> or checksums).
>
> Gwen

Re: Can sqoop validate the data from each database

Posted by Gwen Shapira <gs...@cloudera.com>.
By data validator you mean comparing entire table contents between
HDFS and the database?

This is not currently supported by Sqoop validators. Most users
implement it by using Sqoop to re-load the table from HDFS into the
database and then doing the comparison within the DB (typically using
hashes or checksums).
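
To illustrate the hash/checksum idea only, here is a small Python sketch. It
is not Sqoop functionality and it runs outside the database, whereas the
approach described above does the comparison in SQL. The function names
row_digest and table_checksum are hypothetical. Each row is hashed over a
fixed column list, and the row digests are summed modulo 2**256 so that the
result does not depend on row order.

```python
import hashlib
from typing import Any, Dict, Iterable, List, Tuple


def row_digest(row: Dict[str, Any], columns: List[str]) -> str:
    """SHA-256 over the row's values, taken in a fixed column order."""
    h = hashlib.sha256()
    for col in columns:
        value = row.get(col)
        # Tag NULLs differently from real values so None != "None".
        field = b"\x00" if value is None else b"\x01" + str(value).encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(field).to_bytes(4, "big"))
        h.update(field)
    return h.hexdigest()


def table_checksum(rows: Iterable[Dict[str, Any]], columns: List[str]) -> Tuple[int, str]:
    """Return (row count, order-independent checksum) for a set of rows."""
    total = 0
    count = 0
    for row in rows:
        total = (total + int(row_digest(row, columns), 16)) % (1 << 256)
        count += 1
    return count, format(total, "064x")
```

Two copies of a table match when both the row count and the checksum agree.
A mismatch only says that something differs; finding which rows differ
means comparing the per-row digests, for example keyed by primary key.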

Gwen

On Tue, Aug 12, 2014 at 12:56 AM, tobe <to...@gmail.com> wrote:
> An amateur question, can sqoop validate the values of date from each
> database?
>
> I have read
> http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#validation and find
> out RowCountValidator. But what I want is to verify the values from each
> database.
>
> If it's not supported now, how can I validator the data from two databases?