You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@sqoop.apache.org by pratik khadloya <ti...@gmail.com> on 2014/09/16 20:19:58 UTC

Comparing sqoop's output to hdfs to data in mysql

Hello,

I am comparing the mysql data (by dumping into a file) to the textfile
imported by sqoop onto HDFS.
Am using the diff tool to do the same.

I observed the following differences:
mysql      -->       sqoop_text_output
\\n                       \n
\\t                        \n
\$                        $
0                         0.0

So, it seems like mysql auto escapes the output with a \. I got around that
by telling mysql not to do that so that i can compare properly. I had to
pass the --raw flag to mysql. Then the only difference i currently see is
that 0 being converted to 0.0 by sqoop (as mentioned in the docs).

How can i make mysql also convert the 0 to a 0.0 when it dumps to a csv?
Maybe the answer lines in the guts of sqoop that i can use myself. Or is it
possible to tell sqoop not to convert 0 to 0.0 ?

All in all, i am trying to verify the work done by sqoop for my
satisfaction. Once i verify the text data is being exported fine, i will
verify the same for the parquet format.

Thanks,
~Pratik

Re: Comparing sqoop's output to hdfs to data in mysql

Posted by pratik khadloya <ti...@gmail.com>.
Thanks Abraham. I was hoping to not have to do that since am dealing with
70+ queries.
In the mean time i am writing a script which uses sed to massage the files
and then compares.

Regards,
~Pratik

On Wed, Sep 17, 2014 at 12:18 PM, Abraham Elmahrek <ab...@cloudera.com> wrote:

> Try making your query a free-form query and casting the float to a string.
> Something like the following worked well for me:
>
> SELECT id, text, fl, CAST(fl AS CHAR(64)) FROM fl WHERE $CONDITIONS
>
> On Tue, Sep 16, 2014 at 2:41 PM, pratik khadloya <ti...@gmail.com>
> wrote:
>
>> Sure. The column in question is
>>
>> Field Type Null Default
>> usage_fee_percent double YES 0
>>
>> If we run a mysql select query the value is 0
>> If we cat the file on hdfs which is exported by sqoop the value is 0.0
>>
>> ======= Mysql Command =======
>> mysql -u <user> -p<pwd> -h <host> <db> --raw -e "SELECT
>> a.usage_fee_percent FROM accounts a" >
>> /home/pkhadloya/sqoop_out/mysql_accounts
>>
>> ======= Sqoop Command =======
>> bin/sqoop import -jt <jobtracker> --connect jdbc:mysql://.../<db>
>> --username <user> --password <pwd> --target-dir
>> /user/pkhadloya/sqoop/accounts --delete-target-dir --query "SELECT
>> a.usage_fee_percent FROM accounts a WHERE \$CONDITIONS" --num-mappers 1
>> --mapreduce-job-name accounts_sqoop_import --fields-terminated-by "\t"
>>  --as-textfile
>>
>> ======= Diff Command =======
>> bash -c "diff -U 0 <(tail -n +2 /home/pkhadloya/sqoop_out/mysql_accounts)
>> <(hadoop fs -cat /user/pkhadloya/sqoop/accounts/part-m-00*) >
>> /home/pkhadloya/sqoop_out/diff_accounts"
>>
>>
>> Thanks for looking into this.
>>
>> Regards,
>> Pratik
>>
>>
>> On Tue, Sep 16, 2014 at 2:11 PM, Abraham Elmahrek <ab...@cloudera.com>
>> wrote:
>>
>>> Hey there,
>>>
>>> Could you provide us with the table description (types) and the sqoop
>>> command you are running?
>>>
>>> -Abe
>>>
>>>
>>> On Tue, Sep 16, 2014 at 11:19 AM, pratik khadloya <ti...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am comparing the mysql data (by dumping into a file) to the textfile
>>>> imported by sqoop onto HDFS.
>>>> Am using the diff tool to do the same.
>>>>
>>>> I observed the following differences:
>>>> mysql      -->       sqoop_text_output
>>>> \\n                       \n
>>>> \\t                        \n
>>>> \$                        $
>>>> 0                         0.0
>>>>
>>>> So, it seems like mysql auto escapes the output with a \. I got around
>>>> that by telling mysql not to do that so that i can compare properly. I had
>>>> to pass the --raw flag to mysql. Then the only difference i currently see
>>>> is that 0 being converted to 0.0 by sqoop (as mentioned in the docs).
>>>>
>>>> How can i make mysql also convert the 0 to a 0.0 when it dumps to a
>>>> csv? Maybe the answer lines in the guts of sqoop that i can use myself. Or
>>>> is it possible to tell sqoop not to convert 0 to 0.0 ?
>>>>
>>>> All in all, i am trying to verify the work done by sqoop for my
>>>> satisfaction. Once i verify the text data is being exported fine, i will
>>>> verify the same for the parquet format.
>>>>
>>>> Thanks,
>>>> ~Pratik
>>>>
>>>
>>>
>>
>

Re: Comparing sqoop's output to hdfs to data in mysql

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Try making your query a free-form query and casting the float to a string.
Something like the following worked well for me:

SELECT id, text, fl, CAST(fl AS CHAR(64)) FROM fl WHERE $CONDITIONS

On Tue, Sep 16, 2014 at 2:41 PM, pratik khadloya <ti...@gmail.com>
wrote:

> Sure. The column in question is
>
> Field Type Null Default
> usage_fee_percent double YES 0
>
> If we run a mysql select query the value is 0
> If we cat the file on hdfs which is exported by sqoop the value is 0.0
>
> ======= Mysql Command =======
> mysql -u <user> -p<pwd> -h <host> <db> --raw -e "SELECT
> a.usage_fee_percent FROM accounts a" >
> /home/pkhadloya/sqoop_out/mysql_accounts
>
> ======= Sqoop Command =======
> bin/sqoop import -jt <jobtracker> --connect jdbc:mysql://.../<db>
> --username <user> --password <pwd> --target-dir
> /user/pkhadloya/sqoop/accounts --delete-target-dir --query "SELECT
> a.usage_fee_percent FROM accounts a WHERE \$CONDITIONS" --num-mappers 1
> --mapreduce-job-name accounts_sqoop_import --fields-terminated-by "\t"
>  --as-textfile
>
> ======= Diff Command =======
> bash -c "diff -U 0 <(tail -n +2 /home/pkhadloya/sqoop_out/mysql_accounts)
> <(hadoop fs -cat /user/pkhadloya/sqoop/accounts/part-m-00*) >
> /home/pkhadloya/sqoop_out/diff_accounts"
>
>
> Thanks for looking into this.
>
> Regards,
> Pratik
>
>
> On Tue, Sep 16, 2014 at 2:11 PM, Abraham Elmahrek <ab...@cloudera.com>
> wrote:
>
>> Hey there,
>>
>> Could you provide us with the table description (types) and the sqoop
>> command you are running?
>>
>> -Abe
>>
>>
>> On Tue, Sep 16, 2014 at 11:19 AM, pratik khadloya <ti...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I am comparing the mysql data (by dumping into a file) to the textfile
>>> imported by sqoop onto HDFS.
>>> Am using the diff tool to do the same.
>>>
>>> I observed the following differences:
>>> mysql      -->       sqoop_text_output
>>> \\n                       \n
>>> \\t                        \n
>>> \$                        $
>>> 0                         0.0
>>>
>>> So, it seems like mysql auto escapes the output with a \. I got around
>>> that by telling mysql not to do that so that i can compare properly. I had
>>> to pass the --raw flag to mysql. Then the only difference i currently see
>>> is that 0 being converted to 0.0 by sqoop (as mentioned in the docs).
>>>
>>> How can i make mysql also convert the 0 to a 0.0 when it dumps to a csv?
>>> Maybe the answer lines in the guts of sqoop that i can use myself. Or is it
>>> possible to tell sqoop not to convert 0 to 0.0 ?
>>>
>>> All in all, i am trying to verify the work done by sqoop for my
>>> satisfaction. Once i verify the text data is being exported fine, i will
>>> verify the same for the parquet format.
>>>
>>> Thanks,
>>> ~Pratik
>>>
>>
>>
>

Re: Comparing sqoop's output to hdfs to data in mysql

Posted by pratik khadloya <ti...@gmail.com>.
Sure. The column in question is

Field Type Null Default
usage_fee_percent double YES 0

If we run a mysql select query the value is 0
If we cat the file on hdfs which is exported by sqoop the value is 0.0

======= Mysql Command =======
mysql -u <user> -p<pwd> -h <host> <db> --raw -e "SELECT a.usage_fee_percent
FROM accounts a" > /home/pkhadloya/sqoop_out/mysql_accounts

======= Sqoop Command =======
bin/sqoop import -jt <jobtracker> --connect jdbc:mysql://.../<db>
--username <user> --password <pwd> --target-dir
/user/pkhadloya/sqoop/accounts --delete-target-dir --query "SELECT
a.usage_fee_percent FROM accounts a WHERE \$CONDITIONS" --num-mappers 1
--mapreduce-job-name accounts_sqoop_import --fields-terminated-by "\t"
 --as-textfile

======= Diff Command =======
bash -c "diff -U 0 <(tail -n +2 /home/pkhadloya/sqoop_out/mysql_accounts)
<(hadoop fs -cat /user/pkhadloya/sqoop/accounts/part-m-00*) >
/home/pkhadloya/sqoop_out/diff_accounts"


Thanks for looking into this.

Regards,
Pratik


On Tue, Sep 16, 2014 at 2:11 PM, Abraham Elmahrek <ab...@cloudera.com> wrote:

> Hey there,
>
> Could you provide us with the table description (types) and the sqoop
> command you are running?
>
> -Abe
>
>
> On Tue, Sep 16, 2014 at 11:19 AM, pratik khadloya <ti...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I am comparing the mysql data (by dumping into a file) to the textfile
>> imported by sqoop onto HDFS.
>> Am using the diff tool to do the same.
>>
>> I observed the following differences:
>> mysql      -->       sqoop_text_output
>> \\n                       \n
>> \\t                        \n
>> \$                        $
>> 0                         0.0
>>
>> So, it seems like mysql auto escapes the output with a \. I got around
>> that by telling mysql not to do that so that i can compare properly. I had
>> to pass the --raw flag to mysql. Then the only difference i currently see
>> is that 0 being converted to 0.0 by sqoop (as mentioned in the docs).
>>
>> How can i make mysql also convert the 0 to a 0.0 when it dumps to a csv?
>> Maybe the answer lines in the guts of sqoop that i can use myself. Or is it
>> possible to tell sqoop not to convert 0 to 0.0 ?
>>
>> All in all, i am trying to verify the work done by sqoop for my
>> satisfaction. Once i verify the text data is being exported fine, i will
>> verify the same for the parquet format.
>>
>> Thanks,
>> ~Pratik
>>
>
>

Re: Comparing sqoop's output to hdfs to data in mysql

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Hey there,

Could you provide us with the table description (types) and the sqoop
command you are running?

-Abe

On Tue, Sep 16, 2014 at 11:19 AM, pratik khadloya <ti...@gmail.com>
wrote:

> Hello,
>
> I am comparing the mysql data (by dumping into a file) to the textfile
> imported by sqoop onto HDFS.
> Am using the diff tool to do the same.
>
> I observed the following differences:
> mysql      -->       sqoop_text_output
> \\n                       \n
> \\t                        \n
> \$                        $
> 0                         0.0
>
> So, it seems like mysql auto escapes the output with a \. I got around
> that by telling mysql not to do that so that i can compare properly. I had
> to pass the --raw flag to mysql. Then the only difference i currently see
> is that 0 being converted to 0.0 by sqoop (as mentioned in the docs).
>
> How can i make mysql also convert the 0 to a 0.0 when it dumps to a csv?
> Maybe the answer lines in the guts of sqoop that i can use myself. Or is it
> possible to tell sqoop not to convert 0 to 0.0 ?
>
> All in all, i am trying to verify the work done by sqoop for my
> satisfaction. Once i verify the text data is being exported fine, i will
> verify the same for the parquet format.
>
> Thanks,
> ~Pratik
>